Re: Newbie needs help with regex strings

2005-12-14 Thread Paul McGuire
This isn't a regex solution, but uses pyparsing instead.  Pyparsing
helps you construct recursive-descent parsers, and maintains a code
structure that is easy to compose, read, understand, maintain, and
remember what you did 6 months after you wrote it in the first place.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul


data = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'"""

from pyparsing import CaselessLiteral, Literal, Word, alphas, nums, \
    oneOf, quotedString, Group, Dict, delimitedList, removeQuotes

# define basic elements for parsing
pieName = Word(alphas)
qty = Word(nums)
yesNo = oneOf("yes no", caseless=True)
EQUALS = Literal("=").suppress()

# define separate pie attributes
pieEntry = CaselessLiteral("pie") + EQUALS + pieName
qtyEntry = CaselessLiteral("quantity") + EQUALS + qty
cookedEntry = CaselessLiteral("cooked") + EQUALS + yesNo
ingredientsEntry = CaselessLiteral("ingredients") + EQUALS + \
    quotedString.setParseAction(removeQuotes)
priceEntry = CaselessLiteral("price") + EQUALS + qty

# define overall list of alternative attributes
pieAttribute = pieEntry | qtyEntry | cookedEntry | ingredientsEntry | \
    priceEntry

# define each line as a list of attributes (comma delimiter is the
# default), grouping results by attribute
pieDataFormat = delimitedList( Group(pieAttribute) )

# parse each line in the input string, and create a dict of the results
for line in data.split("\n"):
    pieData = pieDataFormat.parseString(line)
    pieDict = dict( pieData.asList() )
    print pieDict

''' prints out:
{'cooked': 'yes', 'ingredients': 'sugar and cinnamon', 'pie': 'apple', 'quantity': '1'}
{'ingredients': 'peaches,powdered sugar', 'pie': 'peach', 'quantity': '2'}
{'cooked': 'no', 'price': '5', 'ingredients': 'cherries and sugar', 'pie': 'cherry', 'quantity': '3'}
'''

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newbie needs help with regex strings

2005-12-14 Thread Fredrik Lundh
Scott wrote:

> I have a file with lines in the following format.
>
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>
> I would like to pull out some of the values and write them to a csv
> file.
>
> For line in filea
>    pie = regex
>    quantity = regex
>    cooked = regex
>    ingredients = regex
>    fileb.write (quantity,pie,cooked,ingredients)
>
> How can I retrieve the values and assign them to a name?

here's a relatively straightforward re solution that gives you a dictionary
with the values for each line.

import re

for line in open("infile.txt"):
    d = {}
    for k, v1, v2 in re.findall(r"(\w+)=(?:(\w+)|'([^']*)')", line):
        d[k.lower()] = v1 or v2
    print d

(the pattern looks for alphanumeric characters (k) followed by an equal
sign followed by either a number of alphanumeric characters (v1), or text
inside single quotes (v2).  either v1 or v2 will be set)
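
To make the pattern concrete, here is a small sketch (written for Python 3, and using a hypothetical helper name, parse_line, that is not part of the original post) showing what re.findall returns for one of the sample lines and how "v1 or v2" picks whichever alternative matched:

```python
import re

# Same pattern as in the post: a key, "=", then either a bare word (v1)
# or text inside single quotes (v2).
PATTERN = re.compile(r"(\w+)=(?:(\w+)|'([^']*)')")


def parse_line(line):
    """Return a dict mapping lower-cased keys to values for one line."""
    d = {}
    for k, v1, v2 in PATTERN.findall(line):
        # The alternative that did not match comes back as an empty string.
        d[k.lower()] = v1 or v2
    return d


line = "Pie=peach,quantity=2,ingredients='peaches,powdered sugar'"
triples = PATTERN.findall(line)
print(triples)           # one (key, v1, v2) tuple per key=value pair
print(parse_line(line))
```

The comma inside the quoted ingredients value is harmless here because the quoted alternative consumes everything up to the closing quote.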

getting from dictionary to file is left as an exercise to the reader.
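
One possible way to do that exercise, sketched for Python 3 with the standard csv module. The column order is taken from the fileb.write(...) call in the question; the function name rows_to_csv and the in-memory buffer standing in for the output file are assumptions made for illustration:

```python
import csv
import io
import re

PATTERN = re.compile(r"(\w+)=(?:(\w+)|'([^']*)')")
# Column order taken from the question's fileb.write(...) call.
COLUMNS = ["quantity", "pie", "cooked", "ingredients"]


def rows_to_csv(lines, out):
    """Parse key=value lines and write the chosen columns to `out` as CSV."""
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        restval="",             # missing keys become empty cells
        extrasaction="ignore",  # keys not in COLUMNS (e.g. price) are dropped
    )
    for line in lines:
        row = {k.lower(): v1 or v2 for k, v1, v2 in PATTERN.findall(line)}
        writer.writerow(row)


sample = [
    "pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'",
    "Pie=peach,quantity=2,ingredients='peaches,powdered sugar'",
    "Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'",
]
buf = io.StringIO()  # stands in for the real output file
rows_to_csv(sample, buf)
csv_text = buf.getvalue()
print(csv_text)
```

Letting the csv writer do the quoting means a value such as 'peaches,powdered sugar' is written as a single quoted cell instead of being split at its comma.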

/F





Re: Newbie needs help with regex strings

2005-12-14 Thread Christopher Subich
Paul McGuire wrote:
> This isn't a regex solution, but uses pyparsing instead.  Pyparsing
> helps you construct recursive-descent parsers, and maintains a code
> structure that is easy to compose, read, understand, maintain, and
> remember what you did 6 months after you wrote it in the first place.
>
> Download pyparsing at http://pyparsing.sourceforge.net.


For the example listed, pyparsing is even overkill; the OP should 
probably use the csv module.


Re: Newbie needs help with regex strings

2005-12-14 Thread Dennis Benzinger
Catalina Scott A Contr AFCA/EVEO wrote:
> I have a file with lines in the following format.
>
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>
> I would like to pull out some of the values and write them to a csv
> file.
>
> For line in filea
>    pie = regex
>    quantity = regex
>    cooked = regex
>    ingredients = regex
>    fileb.write (quantity,pie,cooked,ingredients)
>
> How can I retrieve the values and assign them to a name?
>
> Thank you
> Scott

Try this:

import re
import StringIO

filea_string = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
pie=peach,quantity=2,ingredients='peaches,powdered sugar'
pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
"""

FIELDS = ("pie", "quantity", "cooked", "ingredients", "price")

field_regexes = {}

for field in FIELDS:
    # note: [^,\n]* stops at the first comma, so a quoted value that
    # contains a comma (e.g. 'peaches,powdered sugar') is cut short
    field_regexes[field] = re.compile("%s=([^,\n]*)" % field)

for line in StringIO.StringIO(filea_string):

    field_values = {}

    for field in FIELDS:
        match_object = field_regexes[field].search(line)

        if match_object is not None:
            field_values[field] = match_object.group(1)

    print field_values
    #fileb.write (quantity,pie,cooked,ingredients)



Bye,
Dennis


Re: Newbie needs help with regex strings

2005-12-14 Thread Dennis Benzinger
Christopher Subich wrote:
> Paul McGuire wrote:
>
> [...]
> For the example listed, pyparsing is even overkill; the OP should
> probably use the csv module.

But the OP wants to parse lines with key=value pairs, not simply lines
with comma-separated values. Using the csv module will just separate the
key=value pairs, and you would still have to take them apart.
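
A short sketch of that two-step process, written for Python 3 and using the first sample line (which has no comma inside its quoted value). The variable names are illustrative only:

```python
import csv

line = "pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'"

# Step 1: csv.reader only splits on commas; each field is still "key=value".
fields = next(csv.reader([line]))
print(fields)

# Step 2: the pairs still have to be taken apart by hand.
pairs = dict(field.split("=", 1) for field in fields)
print(pairs)
```

Note that the single quotes are still part of the ingredients value after the split, so a further cleanup step would be needed as well.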

Bye,
Dennis


Re: Newbie needs help with regex strings

2005-12-14 Thread Gerard Flanagan
Fredrik Lundh wrote:

> Scott wrote:
>
> > I have a file with lines in the following format.
> >
> > pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> > Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> > Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
> >
> > I would like to pull out some of the values and write them to a csv
> > file.
>
> here's a relatively straightforward re solution that gives you a dictionary
> with the values for each line.
>
> import re
>
> for line in open("infile.txt"):
>     d = {}
>     for k, v1, v2 in re.findall(r"(\w+)=(?:(\w+)|'([^']*)')", line):
>         d[k.lower()] = v1 or v2
>     print d


How about replacing

d={}

with

d = {'pie': ',', 'quantity': ',', 'cooked': ',', 'price': ',',
     'ingredients': '', 'eol': '\n'}

to get the appropriate commas for missing fields?
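
A variation on this idea, sketched for Python 3. It is not the exact dictionary suggested above: it assumes empty strings as the placeholders instead of literal commas, leaving the separators to whatever writes the row. The names FIELDS and parse_with_defaults are hypothetical:

```python
import re

PATTERN = re.compile(r"(\w+)=(?:(\w+)|'([^']*)')")
FIELDS = ("pie", "quantity", "cooked", "price", "ingredients")


def parse_with_defaults(line):
    """Parse one line, guaranteeing that every known field is present."""
    d = dict.fromkeys(FIELDS, "")  # every field starts out empty
    for k, v1, v2 in PATTERN.findall(line):
        d[k.lower()] = v1 or v2
    return d


row = parse_with_defaults(
    "Pie=peach,quantity=2,ingredients='peaches,powdered sugar'"
)
values = [row[name] for name in FIELDS]
print(values)
```

Because every row now has the same keys, the values can be emitted in a fixed order and missing fields simply show up as empty cells.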

Gerard



Re: Newbie needs help with regex strings

2005-12-14 Thread Michael Spencer
Dennis Benzinger wrote:
> Christopher Subich wrote:
>> Paul McGuire wrote:
>>
>> [...]
>> For the example listed, pyparsing is even overkill; the OP should
>> probably use the csv module.
>
> But the OP wants to parse lines with key=value pairs, not simply lines
> with comma separated values. Using the csv module will just separate the
> key=value pairs and you would still have to take them apart.
>
> Bye,
> Dennis

that, and csv.reader has another problem with this task:

  >>> import csv
  >>> csv.reader(["Pie=peach,quantity=2,ingredients='peaches,powdered sugar'"],
  ...            quotechar = "'").next()
  ['Pie=peach', 'quantity=2', "ingredients='peaches", "powdered sugar'"]

i.e., it doesn't allow separators within fields unless either the *whole*
field is quoted:

  >>> csv.reader(["Pie=peach,quantity=2,'ingredients=peaches,powdered sugar'"],
  ...            quotechar = "'").next()
  ['Pie=peach', 'quantity=2', 'ingredients=peaches,powdered sugar']

or the separator is escaped:

  >>> csv.reader(["Pie=peach,quantity=2,ingredients='peaches\,powdered sugar'"],
  ...            quotechar = "'", escapechar = "\\").next()
  ['Pie=peach', 'quantity=2', "ingredients='peaches,powdered sugar'"]


Michael



Re: Newbie needs help with regex strings

2005-12-14 Thread Michael Spencer
Catalina Scott A Contr AFCA/EVEO wrote:
> I have a file with lines in the following format.
>
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>
> I would like to pull out some of the values and write them to a csv
> file.
>
> For line in filea
>    pie = regex
>    quantity = regex
>    cooked = regex
>    ingredients = regex
>    fileb.write (quantity,pie,cooked,ingredients)
>
> How can I retrieve the values and assign them to a name?
>
> Thank you
> Scott

Here's a trick to parse this source, exploiting the fact that its syntax
mimics Python's keyword arguments.  All that's needed is a way to quote the
bare names:

  >>> class lazynames(dict):
  ...     def __getitem__(self, key):
  ...         if key in self:
  ...             return dict.__getitem__(self, key)
  ...         return "%s" % key # if name not found, return it as a str constant
  ...
  >>> d = lazynames(dict=dict, __builtins__ = None)
  >>>
  >>> source = """\
  ... pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
  ... Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
  ... Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
  ... """
  >>>
  >>> [eval("dict(%s)" % line, d) for line in source.splitlines()]
  [{'cooked': 'yes', 'ingredients': 'sugar and cinnamon', 'pie': 'apple',
    'quantity': 1},
   {'ingredients': 'peaches,powdered sugar', 'Pie': 'peach', 'quantity': 2},
   {'cooked': 'no', 'price': 5, 'ingredients': 'cherries and sugar',
    'Pie': 'cherry', 'quantity': 3}]
  >>>

Michael
