Re: template strings for matching?

2008-10-09 Thread Tino Wildenhain

Joe Strout wrote:
> Catching up on what's new in Python since I last used it a decade ago,
> I've just been reading up on template strings.  These are pretty cool!
> However, just as a template string has some advantages over %
> substitution for building a string, it seems like it would have
> advantages over manually constructing a regex for string matching.
>
> So... is there any way to use a template string for matching?  I
> expected something like:
>
>   templ = Template("The $object in $location falls mainly in the $subloc.")
>   d = templ.match(s)
>
> and then d would either be None (if s doesn't match), or a dictionary
> with values for 'object', 'location', and 'subloc'.
>
> But I couldn't find anything like that in the docs.  Am I overlooking
> something?

Yeah, it's a bit hard to spot:

http://docs.python.org/library/stdtypes.html#string-formatting-operations

HTH
Tino


--
http://mail.python.org/mailman/listinfo/python-list


Re: template strings for matching?

2008-10-09 Thread skip

Joe>   templ = Template("The $object in $location falls mainly in the $subloc.")
Joe>   d = templ.match(s)

Joe> and then d would either be None (if s doesn't match), or a
Joe> dictionary with values for 'object', 'location', and 'subloc'.

Joe> But I couldn't find anything like that in the docs.  Am I
Joe> overlooking something?

Nope, you're not missing anything.

Skip


Re: template strings for matching?

2008-10-09 Thread skip

Tino> Yeah, it's a bit hard to spot:

Tino> http://docs.python.org/library/stdtypes.html#string-formatting-operations

That shows how to use the template formatting as it currently exists.  To my
knowledge there is no support for the inverse operation, which is what Joe
asked about: given a string and a format string, assign the elements of the
string which correspond to the template elements to key/value pairs in a
dictionary.

Skip



Re: template strings for matching?

2008-10-09 Thread Robin Becker

Joe Strout wrote:
> Catching up on what's new in Python since I last used it a decade ago,
> I've just been reading up on template strings.  These are pretty cool!
> However, just as a template string has some advantages over %
> substitution for building a string, it seems like it would have
> advantages over manually constructing a regex for string matching.
>
> So... is there any way to use a template string for matching?  I
> expected something like:

...

you could use something like this to record the lookups

>>> class XDict(dict):
...     def __new__(cls,*args,**kwds):
...         self = dict.__new__(cls,*args,**kwds)
...         self.__record = set()
...         return self
...     def _record_clear(self):
...         self.__record.clear()
...     def __getitem__(self,k):
...         v = dict.__getitem__(self,k)
...         self.__record.add(k)
...         return v
...     def _record(self):
...         return self.__record
...
>>> x=XDict()
>>> x._record()
set([])
>>> x=XDict(a=1,b=2,c=3)
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>> '%(a)s %(c)s' % x
'1 3'
>>> x._record()
set(['a', 'c'])


a slight modification would allow your template match function to work even when 
some keys were missing in the dict. That would allow you to see which lookups 
failed as well.
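
One way that "slight modification" might look, as a rough Python 3 sketch
rather than Robin's actual code: a dict subclass that records successful and
failed lookups separately. The class name RecordingDict, the found/missing
attributes and the "<key?>" placeholder text are all invented here for
illustration.

```python
class RecordingDict(dict):
    """dict that remembers which keys were looked up, and which were absent."""

    def __init__(self, *args, **kwds):
        super().__init__(*args, **kwds)
        self.found = set()    # keys looked up and present
        self.missing = set()  # keys looked up but absent

    def __getitem__(self, key):
        if key in self:
            self.found.add(key)
            return super().__getitem__(key)
        # Record the failure and return a visible placeholder (an arbitrary
        # choice) so that %-formatting can carry on instead of raising KeyError.
        self.missing.add(key)
        return "<%s?>" % key


x = RecordingDict(a=1, b=2)
result = "%(a)s %(c)s" % x   # "c" is not in the dict
```

After the formatting call, x.found holds the keys that were used and
x.missing holds the ones the format string asked for but did not get. Note
that this still records lookups made while building a string; it does not
extract values from an existing one.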

--
Robin Becker



Re: template strings for matching?

2008-10-09 Thread Paul McGuire
Pyparsing makes building expressions with named fields pretty easy.

from pyparsing import Word, alphas

wrd = Word(alphas)

templ = "The" + wrd("object") + "in" + wrd("location") + \
    "stays mainly in the" + wrd("subloc") + "."

tests = """\
    The rain in Spain stays mainly in the plain.
    The snake in plane stays mainly in the cabin.
    In Hempstead, Haverford and Hampshire hurricanes hardly ever happen.
    """.splitlines()
for t in tests:
    t = t.strip()
    try:
        match = templ.parseString(t)
        print match.object
        print match.location
        print match.subloc
        print "Fields are: %(object)s %(location)s %(subloc)s" % match
    except:
        print "'" + t + "' is not a match."
    print

Read more about pyparsing at http://pyparsing.wikispaces.com.
-- Paul



Re: template strings for matching?

2008-10-09 Thread Tino Wildenhain

[EMAIL PROTECTED] wrote:

>     Tino> Yeah, it's a bit hard to spot:
>
>     Tino> http://docs.python.org/library/stdtypes.html#string-formatting-operations
>
> That shows how to use the template formatting as it currently exists.  To my
> knowledge there is no support for the inverse operation, which is what Joe
> asked about: given a string and a format string, assign the elements of the
> string which correspond to the template elements to key/value pairs in a
> dictionary.


??? can you elaborate? I don't see the problem.

"%(foo)s" % mapping

just looks up mapping["foo"] (that is, it calls mapping.__getitem__("foo")),
so if you have a dictionary with all possible values it just works. If you
want to do some fancy stuff just subclass and change the method
appropriately.

Regards
Tino




Re: template strings for matching?

2008-10-09 Thread Joe Strout

On Oct 9, 2008, at 7:05 AM, [EMAIL PROTECTED] wrote:


>     Tino> http://docs.python.org/library/stdtypes.html#string-formatting-operations
>
> That shows how to use the template formatting as it currently exists.  To my
> knowledge there is no support for the inverse operation, which is what Joe
> asked about: given a string and a format string, assign the elements of the
> string which correspond to the template elements to key/value pairs in a
> dictionary.


Right.

Well, what do y'all think?  It wouldn't be too hard to write this for  
myself, but it seems like the sort of thing Python ought to have built  
in.  Right on the Template class, so it doesn't add anything new to  
the global namespace; it just makes this class more useful.


I took a look at PEP 3101, which is more of a high-powered string
formatter (as the title says, "Advanced String Formatting"), and will be
considerably more intimidating for a beginner than Template.  So, even
if that goes through, perhaps Template will stick around, and being
able to use it in both directions could be quite handy.


Oh boy!  Could this be my very first PEP?  :)

Thanks for any opinions,
- Joe




Re: template strings for matching?

2008-10-09 Thread Peter Otten
Joe Strout wrote:

> Catching up on what's new in Python since I last used it a decade ago,
> I've just been reading up on template strings.  These are pretty
> cool!

I don't think they've gained much traction and expect them to be superseded
by PEP 3101 (see http://www.python.org/dev/peps/pep-3101/ )

> However, just as a template string has some advantages over %
> substitution for building a string, it seems like it would have
> advantages over manually constructing a regex for string matching.
>
> So... is there any way to use a template string for matching?  I
> expected something like:
>
>   templ = Template("The $object in $location falls mainly in the $subloc.")
>   d = templ.match(s)
>
> and then d would either be None (if s doesn't match), or a dictionary
> with values for 'object', 'location', and 'subloc'.
>
> But I couldn't find anything like that in the docs.  Am I overlooking
> something?

I don't think so. Here's a DIY implementation:

import re

def _replace(match):
    word = match.group(2)
    if word == "$":
        return "[$]"
    return "(?P<%s>.*)" % word

def extract(template, text):
    r = re.compile(r"([$]([$]|\w+))")
    r = r.sub(_replace, template)
    return re.compile(r).match(text).groupdict()


print extract("My $$ is on the $object in $location...",
              "My $ is on the biggest bird in the highest tree...")

As always with regular expressions I may be missing some corner cases...
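
To make a few of those corner cases concrete, here is the same function
ported to Python 3 (the port is the only intended change) together with three
inputs that behave in possibly surprising ways. The sample templates and the
variable names greedy, parens and mismatch are made up for this illustration.

```python
import re

def _replace(match):
    word = match.group(2)
    if word == "$":
        return "[$]"
    return "(?P<%s>.*)" % word

def extract(template, text):
    # Same logic as the function above, written for Python 3.
    r = re.compile(r"([$]([$]|\w+))")
    r = r.sub(_replace, template)
    return re.compile(r).match(text).groupdict()

# 1. ".*" is greedy, so the first field takes as much text as it can.
greedy = extract("$a and $b", "x and y and z")
# greedy == {'a': 'x and y', 'b': 'z'}

# 2. Literal text in the template is not escaped. The parentheses here become
#    a regex group, and the field swallows the real parentheses in the text.
parens = extract("f($arg)", "f(42)")
# parens == {'arg': '(42)'}

# 3. When the text does not match, re.match() returns None and the
#    .groupdict() call raises instead of reporting "no match".
try:
    extract("My $pet has $parasites", "Your dog has fleas")
    mismatch = "returned normally"
except AttributeError:
    mismatch = "raised AttributeError"
```

Escaping the literal parts of the template with re.escape and checking the
match object before calling groupdict() would address cases 2 and 3; case 1
is a matter of choosing between ".*" and ".*?".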

Peter


Re: template strings for matching?

2008-10-09 Thread skip

Tino> ??? can you elaborate? I don't see the problem.

Tino> "%(foo)s" % mapping

Joe wants to go in the other direction.  Using your example, he wants a
function which takes a string and a template string and returns a dict.
Here's a concrete example:

    s = "My dog has fleas"
    fmt = "My $pet has $parasites"
    d = fmt_extract(fmt, s)
    assert d['pet'] == 'dog'
    assert d['parasites'] == 'fleas'

Skip


template strings for matching?

2008-10-09 Thread Joe Strout
Catching up on what's new in Python since I last used it a decade ago,  
I've just been reading up on template strings.  These are pretty  
cool!  However, just as a template string has some advantages over %  
substitution for building a string, it seems like it would have  
advantages over manually constructing a regex for string matching.


So... is there any way to use a template string for matching?  I  
expected something like:


  templ = Template("The $object in $location falls mainly in the $subloc.")
  d = templ.match(s)

and then d would either be None (if s doesn't match), or a dictionary
with values for 'object', 'location', and 'subloc'.


But I couldn't find anything like that in the docs.  Am I overlooking  
something?


Thanks,
- Joe



Re: template strings for matching?

2008-10-09 Thread Joe Strout
Wow, this was harder than I thought (at least for a rusty Pythoneer  
like myself).  Here's my stab at an implementation.  Remember, the  
goal is to add a match method to Template which works like  
Template.substitute, but in reverse: given a string, if that string  
matches the template, then it should return a dictionary mapping each  
template field to the corresponding value in the given string.


Oh, and as one extra feature, I want to support a .greedy attribute  
on the Template object, which determines whether the matching of  
fields should be done in a greedy or non-greedy manner.



#!/usr/bin/python

from string import Template
import re

def templateMatch(self, s):
    # start by finding the fields in our template, and building a map
    # from field position (index) to field name.
    posToName = {}
    pos = 1
    for item in self.pattern.findall(self.template):
        # each item is a tuple where item 1 is the field name
        posToName[pos] = item[1]
        pos += 1

    # determine if we should match greedy or non-greedy
    greedy = False
    if self.__dict__.has_key('greedy'):
        greedy = self.greedy

    # now, build a regex pattern to compare against s
    # (taking care to escape any characters in our template that
    # would have special meaning in regex)
    pat = self.template.replace('.', '\\.')
    pat = pat.replace('(', '\\(')
    pat = pat.replace(')', '\\)') # there must be a better way...

    if greedy:
        pat = self.pattern.sub('(.*)', pat)
    else:
        pat = self.pattern.sub('(.*?)', pat)
    p = re.compile(pat)

    # try to match this to the given string
    match = p.match(s)
    if match is None: return None
    out = {}
    for i in posToName.keys():
        out[posToName[i]] = match.group(i)
    return out


Template.match = templateMatch

t = Template("The $object in $location falls mainly in the $subloc.")
print t.match( "The rain in Spain falls mainly in the train." )

This sort-of works, but it won't properly handle $$ in the template,  
and I'm not too sure whether it handles the ${fieldname} form,  
either.  Also, it only escapes '.', '(', and ')' in the template...  
there must be a better way of escaping all characters that have  
special meaning to RegEx, except for '$' (which is why I can't use  
re.escape).
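
One possible way around the escaping problem, sketched for Python 3 and not
taken from any post in this thread: walk over the placeholders with
Template.pattern.finditer() and apply re.escape only to the literal text
between them. Template.pattern exposes the named groups "escaped", "named",
"braced" and "invalid", which also covers $$ and ${fieldname}. The function
name template_match, the greedy keyword argument, the use of re.fullmatch
(the whole string must match) and the back-reference for repeated fields are
assumptions made for this example.

```python
import re
from string import Template

def template_match(template, text, greedy=False):
    """Return {field: value} if text matches template, else None."""
    wildcard = ".*" if greedy else ".*?"
    parts = []
    seen = set()
    pos = 0
    for m in Template.pattern.finditer(template):
        # Literal text before this placeholder can be escaped wholesale.
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group("named") or m.group("braced")
        if name:
            if name in seen:
                # A repeated field must repeat the value it captured earlier.
                parts.append("(?P=%s)" % name)
            else:
                seen.add(name)
                parts.append("(?P<%s>%s)" % (name, wildcard))
        elif m.group("escaped") is not None:
            parts.append(re.escape("$"))   # "$$" stands for a literal "$"
        else:
            raise ValueError("invalid placeholder at offset %d" % m.start())
        pos = m.end()
    parts.append(re.escape(template[pos:]))   # literal text after the last field
    match = re.fullmatch("".join(parts), text)
    return match.groupdict() if match else None


d = template_match("The $object in $location falls mainly in the $subloc.",
                   "The rain in Spain falls mainly in the train.")
# d == {'object': 'rain', 'location': 'Spain', 'subloc': 'train'}

lazy = template_match("$a and $b", "x and y and z")
# lazy == {'a': 'x', 'b': 'y and z'}

eager = template_match("$a and $b", "x and y and z", greedy=True)
# eager == {'a': 'x and y', 'b': 'z'}
```

Requiring a full match is a design choice: with a prefix match, a non-greedy
field at the very end of the template would always capture the empty string.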


Probably the rest of the code could be improved too.  I'm eager to  
hear your feedback.


Thanks,
- Joe




Re: template strings for matching?

2008-10-09 Thread MRAB
On Oct 9, 5:20 pm, Joe Strout [EMAIL PROTECTED] wrote:
 Wow, this was harder than I thought (at least for a rusty Pythoneer  
 like myself).  Here's my stab at an implementation.  Remember, the  
 goal is to add a match method to Template which works like  
 Template.substitute, but in reverse: given a string, if that string  
 matches the template, then it should return a dictionary mapping each  
 template field to the corresponding value in the given string.

 Oh, and as one extra feature, I want to support a .greedy attribute  
 on the Template object, which determines whether the matching of  
 fields should be done in a greedy or non-greedy manner.

 
 #!/usr/bin/python

 from string import Template
 import re

 def templateMatch(self, s):
         # start by finding the fields in our template, and building a map
         # from field position (index) to field name.
         posToName = {}
         pos = 1
         for item in self.pattern.findall(self.template):
                 # each item is a tuple where item 1 is the field name
                 posToName[pos] = item[1]
                 pos += 1

         # determine if we should match greedy or non-greedy
         greedy = False
         if self.__dict__.has_key('greedy'):
                 greedy = self.greedy

         # now, build a regex pattern to compare against s
         # (taking care to escape any characters in our template that
         # would have special meaning in regex)
         pat = self.template.replace('.', '\\.')
         pat = pat.replace('(', '\\(')
         pat = pat.replace(')', '\\)') # there must be a better way...

         if greedy:
                 pat = self.pattern.sub('(.*)', pat)
         else:
                 pat = self.pattern.sub('(.*?)', pat)
         p = re.compile(pat)

         # try to match this to the given string
         match = p.match(s)
         if match is None: return None
         out = {}
         for i in posToName.keys():
                 out[posToName[i]] = match.group(i)
         return out

 Template.match = templateMatch

 t = Template("The $object in $location falls mainly in the $subloc.")
 print t.match( "The rain in Spain falls mainly in the train." )
 

 This sort-of works, but it won't properly handle $$ in the template,  
 and I'm not too sure whether it handles the ${fieldname} form,  
 either.  Also, it only escapes '.', '(', and ')' in the template...  
 there must be a better way of escaping all characters that have  
 special meaning to RegEx, except for '$' (which is why I can't use  
 re.escape).

 Probably the rest of the code could be improved too.  I'm eager to  
 hear your feedback.

 Thanks,
 - Joe

How about something like:

import re

def placeholder(m):
    if m.group(1):
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):
        return "\\$"
    else:
        return re.escape(m.group(3))

regex = re.compile(r"\$(\w+)|(\$\$)")

t = "The $object in $location falls mainly in the $subloc."
print regex.sub(placeholder, t)
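
As posted, the regex defines only two groups, so the else branch with
m.group(3) is never reached and literal text passes through unescaped. A
possible completion for Python 3, offered as a sketch: add a third
alternative that consumes runs of literal text, then compile the result and
match against it. The third alternative "([^$]+)" and the helper name
to_regex are assumptions for this example, not part of the post above.

```python
import re

def placeholder(m):
    if m.group(1):                    # "$name" becomes a named group
        return "(?P<%s>.+)" % m.group(1)
    elif m.group(2):                  # "$$" becomes a literal dollar sign
        return "\\$"
    else:                             # a run of literal text gets escaped
        return re.escape(m.group(3))

# Third alternative added so that every non-"$" run goes through re.escape.
regex = re.compile(r"\$(\w+)|(\$\$)|([^$]+)")

def to_regex(template):
    """Translate a $-style template into a compiled regular expression."""
    return re.compile(regex.sub(placeholder, template))


t = "The $object in $location falls mainly in the $subloc."
m = to_regex(t).match("The rain in Spain falls mainly in the plain.")
fields = m.groupdict() if m else None
# fields == {'object': 'rain', 'location': 'Spain', 'subloc': 'plain'}
```

This keeps the one-pass re.sub design. It does not cover ${name}, field names
that are not valid group names (such as "$1"), or a stray "$" followed by a
non-word character, which is left in the pattern as an end-of-string anchor.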