On Jun 11, 2:01 am, Lie Ryan <lie.1...@gmail.com> wrote:
> 504cr...@gmail.com wrote:
> > I've encountered a problem with my RegEx learning curve -- how to
> > escape hash characters # in strings being matched, e.g.:
> >
> > >>> string = re.escape('123#abc456')
> > >>> match = re.match('\d+', string)
> > >>> print match
> > <_sre.SRE_Match object at 0x00A6A800>
> > >>> print match.group()
> > 123
> >
> > The correct result should be:
> >
> > 123456
> >
> > I've tried to escape the hash symbol in the match string without
> > result.
> >
> > Any ideas? Is the answer something I overlooked in my lurching Python
> > schooling?
>
> As you're not being clear on what you wanted, I'm just guessing this is
> what you wanted:
>
> >>> s = '123#abc456'
> >>> re.match('\d+', re.sub('#\D+', '', s)).group()
> '123456'
> >>> s = '123#this is a comment and is ignored456'
> >>> re.match('\d+', re.sub('#\D+', '', s)).group()
> '123456'

Sorry I wasn't clearer. I really appreciate your reply; it provides half of what I'm hoping to learn.

The hash character is actually a useful hook for identifying a data entity in a scraping routine I'm developing, but it is not a character I want in the scrubbed data.

In my application, the hash distinguishes one string of alphanumeric characters from other alphanumeric strings. The strings I'm looking for are manually entered identifiers; a real machine-created identifier shouldn't contain the hash character. The correct form is 'A1234509', but it is often entered as just '#12345', because the first character (a letter in an alphabetic sequence for the month) and the last two characters (a two-digit year) can be assumed.

Matching the hash character in a RegEx is a way of trapping the string so it can be transformed into its correct machine-generated form.

I'm surprised it's been so difficult to find an example of the hash character in a RegEx string for exactly this type of situation, since it's so common in the real world for people to put a pound symbol in front of a number.

Thanks!
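
To make the goal concrete, here is a rough sketch of the kind of transformation described above. It is only an illustration under stated assumptions: the short form is taken to be '#' followed by exactly five digits (as in '#12345'), and the month letter 'A' and year '09' are hypothetical defaults, as are the names SHORT_ID and expand_short_ids. One thing that is certain: '#' is an ordinary literal in a Python regex and needs no escaping unless the pattern is compiled with re.VERBOSE, where an unescaped '#' starts a comment.

```python
import re

# Hypothetical defaults for the parts that are "assumed" when someone
# types the short form. Both values are illustrative only.
DEFAULT_MONTH_LETTER = "A"
DEFAULT_YEAR = "09"

# '#' is a plain literal here (no re.VERBOSE), so it is not escaped.
# Assumption: the short form is '#' plus exactly five digits.
# The lookarounds stop the match from starting inside a longer word
# or ending in the middle of a longer run of digits.
SHORT_ID = re.compile(r"(?<!\w)#(\d{5})(?!\d)")


def expand_short_ids(text, month=DEFAULT_MONTH_LETTER, year=DEFAULT_YEAR):
    """Rewrite '#12345' style identifiers into the 'A1234509' form."""
    # A function is used as the replacement so the digits in `year`
    # cannot be misread as part of a backreference.
    return SHORT_ID.sub(lambda m: month + m.group(1) + year, text)


print(expand_short_ids("see #12345 and A6789009"))
```

With these assumptions the hash acts purely as a hook: it is matched, used to locate the digits, and then dropped from the result. Identifiers already in the full form are left untouched because they contain no '#'.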