Re: substitution

Anthra Norell Mon, 18 Jan 2010 04:46:33 -0800

superpollo wrote:

hi.


what is the most pythonic way to substitute substrings?

eg: i want to apply:

foo --> bar
baz --> quux
quuux --> foo

so that:

fooxxxbazyyyquuux --> barxxxquuxyyyfoo

bye

Try the code below the dotted line. It does any number of substitutions and handles overlaps correctly (long over short).
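Why "long over short" needs care: a regex alternation in Python's re module keeps the first alternative that matches at a position, not the longest one. Here is a minimal sketch of the effect, using the 'ab' / 'abc' targets from the docstring example further down. The variable names are made up for illustration, and it is written to run under Python 3.

```python
import re

# Illustrative targets: 'ab' is a prefix of 'abc', so the two overlap.
targets = ['ab', 'abc']

# Alternatives are tried left to right and the first match wins,
# so listing the short target first hides the long one.
short_first = re.compile('|'.join(re.escape(t) for t in targets))

# Reverse lexicographic order puts every target ahead of its own prefixes.
long_first = re.compile(
    '|'.join(re.escape(t) for t in sorted(targets, reverse=True)))

short_first_hits = short_first.findall('xabcx')  # ['ab']
long_first_hits = long_first.findall('xabcx')    # ['abc']
```

The class below relies on the same ordering trick when it sorts its escaped targets in reverse.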



Your case:

>>> substitutions = (('foo', 'bar'), ('baz', 'quux'), ('quuux','foo')) # Sequence of doublets

>>> T = Translator (substitutions)   # Compile substitutions -> translator
>>> s = 'fooxxxbazyyyquuux'   # Your source string
>>> d = 'barxxxquuxyyyfoo'    # Your destination string
>>> print T (s)
barxxxquuxyyyfoo
>>> print T (s) == d
True
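If you only need this one case and not the full class, the same single-pass idea can be sketched with re.sub and a lookup function. This is a different, shorter technique than the Translator class, not a copy of it. The name substitute_all is hypothetical, and the sketch assumes at least one pair is given. It is written to run under Python 3.

```python
import re

def substitute_all(text, pairs):
    """Apply all (target, substitute) pairs in one pass over text."""
    table = dict(pairs)
    # Longest targets first, so a target is tried before any of its prefixes.
    ordered = sorted(table, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(t) for t in ordered))
    # Each match is looked up once; inserted text is never rescanned,
    # so 'quuux' -> 'foo' does not turn into 'bar' afterwards.
    return pattern.sub(lambda m: table[m.group(0)], text)

pairs = (('foo', 'bar'), ('baz', 'quux'), ('quuux', 'foo'))
result = substitute_all('fooxxxbazyyyquuux', pairs)
print(result)  # barxxxquuxyyyfoo
```

Chaining str.replace calls would not give this result, because the 'foo' produced by the third pair would be picked up again by the first.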


Frederic


-------------------------------------------------------------

class Translator:

   r"""

       Will translate any number of targets, handling them correctly if some overlap.


       Making Translator
           T = Translator (definitions, [eat = 1])

           'definitions' is a sequence of pairs: ((target,substitute),(t2, s2), ...)
           'eat' says whether untargeted sections pass (translator) or
               are skipped (extractor).

               Makes a translator by default (eat = False)

           T.eat is an instance attribute that can be changed at any time.
           Definitions example:
               (('a','A'),('b','B'),('ab','ab'),('abc','xyz'),   # ('ab','ab') see Tricks.
                ('\x0c', 'page break'), ('\r\n','\n'), ('   ','\t'))
           Order doesn't matter.

       Running
           translation = T (source)

       Tricks
           Deletion:  ('target', '')

           Exception: (('\n',''), ('\n\n','\n\n'))       # Eat LF except paragraph breaks.
           Exception: (('\n', '\r\n'), ('\r\n','\r\n'))  # Unix to DOS, would leave DOS unchanged

           Translation cascade:

               # Rejoin text lines per paragraph Unix or DOS, inserting inter-word space if missing
               Mark_LF = Translator((('\n','+LF+'),('\r\n','+LF+'),('\r\n\r\n','\r\n\r\n'),('\n\n','\n\n')))
                   # Pick positively identifiable mark for Unix and DOS end of lines
               Single_Space_Mark = Translator (((' +LF+', ' '),('+LF+',' '),('-+LF+', '')))

               no_lf_text = Single_Space_Mark (Mark_LF (text))
           Translation cascade:
               # Nesting calls
               reptiles = T_latin_english (T_german_latin (reptilien))

       Limitations

           1. The number of substitutions and the maximum size of input depend on the respective
               capabilities of the Python re module.
           2. Regular expressions will not work as such.

       Author:
           Frederic Rentsch ([email protected]).

"""


   def __init__ (self, definitions, eat = 0):

       '''
           definitions: a sequence of pairs of strings. ((target, substitute), (t, s), ...)
           eat: False (0) means translate: unaffected data passes unaltered.
                True (1) means extract: unaffected data doesn't pass (gets eaten).
                Extraction filters typically require substitutes to end with some separator,
                else they fuse together. (E.g. ' ', '\t' or '\n')
           'eat' is an attribute that can be switched anytime.
       '''

       self.eat = eat
       self.compile_sequence_of_pairs (definitions)

   def compile_sequence_of_pairs (self, definitions):

       '''
           Argument 'definitions' is a sequence of pairs:
           (('target 1', 'substitute 1'), ('t2', 's2'), ...)
           Order doesn't matter.
       '''

       import re

       self.definitions = definitions
       targets, substitutes = zip (*definitions)
       re_targets = [re.escape (item) for item in targets]
       re_targets.sort (reverse = True)   # Longer targets sort ahead of their own prefixes
       self.targets_set = set (targets)
       self.table = dict (definitions)
       regex_string = '|'.join (re_targets)
       self.regex = re.compile (regex_string, re.DOTALL)

   def __call__ (self, s):

       hits = self.regex.findall (s)
       nohits = self.regex.split (s)
       valid_hits = set (hits) & self.targets_set   # Ignore targets with illegal re modifiers.

       if valid_hits:
           substitutes = [self.table [item] for item in hits if item in valid_hits]
           if self.eat:
               return ''.join (substitutes)
           else:
               # Interleave untouched text and substitutes, then append the tail.
               zipped = zip (nohits, substitutes)
               return ''.join ([a + b for a, b in zipped]) + nohits [-1]
       else:
           if self.eat:
               return ''
           else:
               return s
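To make the __call__ logic easier to follow, here is a stripped-down sketch of its two modes outside the class, using the three pairs from the question. The regex and table are written out by hand here (the class builds them for you), and the variable names translated and extracted are only for illustration. It is written to run under Python 3.

```python
import re

# Hand-built equivalents of self.regex and self.table for the example pairs.
regex = re.compile('quuux|foo|baz')
table = {'foo': 'bar', 'baz': 'quux', 'quuux': 'foo'}
s = 'fooxxxbazyyyquuux'

hits = regex.findall(s)   # the matched targets, in order of appearance
nohits = regex.split(s)   # the text around them: always len(hits) + 1 items

substitutes = [table[h] for h in hits]

# eat = False: alternate untouched text and substitutes, then add the tail.
translated = ''.join(a + b for a, b in zip(nohits, substitutes)) + nohits[-1]

# eat = True: keep only the substitutes and drop everything in between.
extracted = ''.join(substitutes)
```

In extract mode the substitutes run together ('barquuxfoo' here), which is why the __init__ docstring suggests ending each substitute with a separator.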



--
http://mail.python.org/mailman/listinfo/python-list
