Re: Newbie needs help with regex strings
This isn't a regex solution, but uses pyparsing instead. Pyparsing helps you construct recursive-descent parsers, and maintains a code structure that is easy to compose, read, understand, maintain, and remember what you did six months after you wrote it in the first place. Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

    data = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
    Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
    Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'"""

    from pyparsing import (CaselessLiteral, Literal, Word, alphas, nums, oneOf,
                           quotedString, Group, delimitedList, removeQuotes)

    # define basic elements for parsing
    pieName = Word(alphas)
    qty = Word(nums)
    yesNo = oneOf("yes no", caseless=True)
    EQUALS = Literal("=").suppress()

    # define separate pie attributes
    pieEntry = CaselessLiteral("pie") + EQUALS + pieName
    qtyEntry = CaselessLiteral("quantity") + EQUALS + qty
    cookedEntry = CaselessLiteral("cooked") + EQUALS + yesNo
    ingredientsEntry = (CaselessLiteral("ingredients") + EQUALS +
                        quotedString.setParseAction(removeQuotes))
    priceEntry = CaselessLiteral("price") + EQUALS + qty

    # define overall list of alternative attributes
    pieAttribute = pieEntry | qtyEntry | cookedEntry | ingredientsEntry | priceEntry

    # define each line as a list of attributes (comma delimiter is the default),
    # grouping results by attribute
    pieDataFormat = delimitedList(Group(pieAttribute))

    # parse each line in the input string, and create a dict of the results
    for line in data.split("\n"):
        pieData = pieDataFormat.parseString(line)
        pieDict = dict(pieData.asList())
        print(pieDict)

    ''' prints out (key order may differ between Python versions):
    {'cooked': 'yes', 'ingredients': 'sugar and cinnamon', 'pie': 'apple', 'quantity': '1'}
    {'ingredients': 'peaches,powdered sugar', 'pie': 'peach', 'quantity': '2'}
    {'cooked': 'no', 'price': '5', 'ingredients': 'cherries and sugar', 'pie': 'cherry', 'quantity': '3'}
    '''

-- http://mail.python.org/mailman/listinfo/python-list
Re: Newbie needs help with regex strings
Scott wrote:

> I have a file with lines in the following format.
>
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>
> I would like to pull out some of the values and write them to a csv file.
>
>     for line in filea:
>         pie = regex
>         quantity = regex
>         cooked = regex
>         ingredients = regex
>         fileb.write(quantity, pie, cooked, ingredients)
>
> How can I retrieve the values and assign them to a name?

Here's a relatively straightforward re solution that gives you a dictionary with the values for each line.

    import re

    for line in open("infile.txt"):
        d = {}
        for k, v1, v2 in re.findall(r"(\w+)=(?:(\w+)|'([^']*)')", line):
            d[k.lower()] = v1 or v2
        print(d)

(The pattern looks for a run of word characters (k) followed by an equals sign, followed by either another run of word characters (v1) or text inside single quotes (v2). Only one of v1 and v2 will be set; the other comes back as an empty string, which is why "v1 or v2" works.)

Getting from dictionary to file is left as an exercise for the reader.

/F
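One possible way to do that last step, as a minimal sketch (not from the original post), is the standard library's csv.DictWriter. It reuses the pattern above; the column order (quantity, pie, cooked, ingredients) is an assumption taken from the OP's fileb.write(...) pseudocode, and the helper names parse_line and write_csv are hypothetical. DictWriter fills fields missing from a line with restval, drops extra keys such as price when extrasaction="ignore", and quotes values that contain commas.

```python
import csv
import io
import re

# key=bare_word or key='quoted text' (same pattern as in the post above)
PAIR_RE = re.compile(r"(\w+)=(?:(\w+)|'([^']*)')")

# Assumed column order, taken from the OP's fileb.write(...) pseudocode.
COLUMNS = ["quantity", "pie", "cooked", "ingredients"]


def parse_line(line):
    """Return a dict mapping lower-cased keys to values for one input line."""
    return {k.lower(): v1 or v2 for k, v1, v2 in PAIR_RE.findall(line)}


def write_csv(lines, out):
    """Write a header plus one CSV row per non-blank input line to `out`."""
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        restval="",             # missing fields (e.g. no cooked=) become empty cells
        extrasaction="ignore",  # keys not in COLUMNS (e.g. price) are dropped
    )
    writer.writeheader()
    for line in lines:
        if line.strip():
            writer.writerow(parse_line(line))


sample = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
"""

buf = io.StringIO()
write_csv(sample.splitlines(), buf)
print(buf.getvalue())
```

When writing to a real file instead of a StringIO, open it with newline="" (for example open("fileb.csv", "w", newline=""), file name hypothetical), as the csv module documentation recommends.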
Re: Newbie needs help with regex strings
Paul McGuire wrote:

> This isn't a regex solution, but uses pyparsing instead. Pyparsing helps you construct recursive-descent parsers, and maintains a code structure that is easy to compose, read, understand, maintain, and remember what you did six months after you wrote it in the first place. Download pyparsing at http://pyparsing.sourceforge.net.

For the example listed, pyparsing is even overkill; the OP should probably use the csv module.
Re: Newbie needs help with regex strings
Catalina Scott A Contr AFCA/EVEO wrote:

> I have a file with lines in the following format.
>
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>
> I would like to pull out some of the values and write them to a csv file.
>
>     for line in filea:
>         pie = regex
>         quantity = regex
>         cooked = regex
>         ingredients = regex
>         fileb.write(quantity, pie, cooked, ingredients)
>
> How can I retrieve the values and assign them to a name?
>
> Thank you
> Scott

Try this:

    import io
    import re

    filea_string = """pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
    pie=peach,quantity=2,ingredients='peaches,powdered sugar'
    pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
    """

    FIELDS = ("pie", "quantity", "cooked", "ingredients", "price")

    # One regex per field. A value is either single-quoted (group 1, quotes
    # stripped, may contain commas) or bare, running up to the next comma or
    # end of line (group 2).
    field_regexes = {}
    for field in FIELDS:
        field_regexes[field] = re.compile(r"%s=(?:'([^']*)'|([^,\n]*))" % field)

    for line in io.StringIO(filea_string):
        field_values = {}
        for field in FIELDS:
            match_object = field_regexes[field].search(line)
            if match_object is not None:
                quoted, bare = match_object.groups()
                field_values[field] = quoted if quoted is not None else bare
        print(field_values)
        #fileb.write(quantity, pie, cooked, ingredients)

Bye,
Dennis
Re: Newbie needs help with regex strings
Christopher Subich wrote:

> Paul McGuire wrote:
>> [...]
>
> For the example listed, pyparsing is even overkill; the OP should probably use the csv module.

But the OP wants to parse lines with key=value pairs, not simply lines with comma-separated values. Using the csv module will just separate the key=value pairs and you would still have to take them apart.

Bye,
Dennis
Re: Newbie needs help with regex strings
Fredrik Lundh wrote:

> Scott wrote:
>
>> I have a file with lines in the following format.
>>
>> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
>> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
>> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>>
>> I would like to pull out some of the values and write them to a csv file.
>
> Here's a relatively straightforward re solution that gives you a dictionary with the values for each line.
>
>     import re
>
>     for line in open("infile.txt"):
>         d = {}
>         for k, v1, v2 in re.findall(r"(\w+)=(?:(\w+)|'([^']*)')", line):
>             d[k.lower()] = v1 or v2
>         print(d)

How about replacing d = {} with

    d = {'pie': ',', 'quantity': ',', 'cooked': ',', 'price': ',', 'ingredients': '', 'eol': '\n'}

to get the appropriate commas for missing fields?

Gerard
Re: Newbie needs help with regex strings
Dennis Benzinger wrote:

> Christopher Subich wrote:
>
>> Paul McGuire wrote:
>>> [...]
>>
>> For the example listed, pyparsing is even overkill; the OP should probably use the csv module.
>
> But the OP wants to parse lines with key=value pairs, not simply lines with comma-separated values. Using the csv module will just separate the key=value pairs and you would still have to take them apart.
>
> Bye,
> Dennis

That, and csv.reader has another problem with this task:

    >>> import csv
    >>> next(csv.reader(["Pie=peach,quantity=2,ingredients='peaches,powdered sugar'"], quotechar="'"))
    ['Pie=peach', 'quantity=2', "ingredients='peaches", "powdered sugar'"]

That is, it doesn't allow separators within fields unless either the *whole* field is quoted:

    >>> next(csv.reader(["Pie=peach,quantity=2,'ingredients=peaches,powdered sugar'"], quotechar="'"))
    ['Pie=peach', 'quantity=2', 'ingredients=peaches,powdered sugar']

or the separator is escaped:

    >>> next(csv.reader([r"Pie=peach,quantity=2,ingredients='peaches\,powdered sugar'"], quotechar="'", escapechar="\\"))
    ['Pie=peach', 'quantity=2', "ingredients='peaches,powdered sugar'"]

Michael
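Another standard-library option, not raised in this thread, is shlex in POSIX mode: it handles a quote that begins in the middle of a token the way a shell does, so the comma inside 'peaches,powdered sugar' stays in one token and the quotes are removed. This is a minimal sketch assuming commas are the only separators; split_pairs is a hypothetical helper name. POSIX mode also treats a backslash as an escape character, which this sketch leaves enabled.

```python
import shlex


def split_pairs(line):
    """Split one line of key=value pairs into a dict (hypothetical helper)."""
    lexer = shlex.shlex(line.strip(), posix=True)
    lexer.whitespace = ","         # split on commas only, not on spaces
    lexer.whitespace_split = True  # a token runs until the next unquoted comma
    lexer.commenters = ""          # don't treat '#' as the start of a comment
    result = {}
    for token in lexer:
        # POSIX mode has already stripped the quotes around the value
        key, _, value = token.partition("=")
        result[key.lower()] = value
    return result


print(split_pairs("Pie=peach,quantity=2,ingredients='peaches,powdered sugar'"))
# {'pie': 'peach', 'quantity': '2', 'ingredients': 'peaches,powdered sugar'}
```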
Re: Newbie needs help with regex strings
Catalina Scott A Contr AFCA/EVEO wrote:

> I have a file with lines in the following format.
>
> pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
> Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
> Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
>
> I would like to pull out some of the values and write them to a csv file.
>
>     for line in filea:
>         pie = regex
>         quantity = regex
>         cooked = regex
>         ingredients = regex
>         fileb.write(quantity, pie, cooked, ingredients)
>
> How can I retrieve the values and assign them to a name?
>
> Thank you
> Scott

Here's a trick to parse this source, exploiting the fact that its syntax mimics Python's keyword arguments. All that's needed is a way to quote the bare names:

    >>> class lazynames(dict):
    ...     def __getitem__(self, key):
    ...         if key in self:
    ...             return dict.__getitem__(self, key)
    ...         return "%s" % key  # if name not found, return it as a str constant
    ...
    >>> d = lazynames(dict=dict, __builtins__=None)
    >>> source = """\
    ... pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
    ... Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
    ... Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
    ... """
    >>> [eval("dict(%s)" % line, d) for line in source.splitlines()]
    [{'cooked': 'yes', 'ingredients': 'sugar and cinnamon', 'pie': 'apple', 'quantity': 1},
     {'ingredients': 'peaches,powdered sugar', 'Pie': 'peach', 'quantity': 2},
     {'cooked': 'no', 'price': 5, 'ingredients': 'cherries and sugar', 'Pie': 'cherry', 'quantity': 3}]

Michael
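The keyword-argument observation can also be used without calling eval. As a minimal sketch of a swapped-in technique (not Michael's code), the standard library's ast module can parse each line as the argument list of a call expression without executing anything: bare names are kept as strings, as lazynames does, and values are converted with ast.literal_eval, which rejects anything that is not a plain literal. parse_kwargs_line and the placeholder call name f are hypothetical, and each line must still be valid Python keyword-argument syntax.

```python
import ast


def parse_kwargs_line(line):
    """Parse "a=b,c=1,d='x y'" as keyword arguments without executing anything.

    Hypothetical helper: the line is wrapped as f(...) only so that ast.parse
    sees a call expression; f is never looked up or called.
    """
    call = ast.parse("f(%s)" % line, mode="eval").body
    if not isinstance(call, ast.Call) or call.args:
        raise ValueError("not a list of key=value pairs: %r" % line)
    result = {}
    for kw in call.keywords:
        if kw.arg is None:  # a **mapping argument, not key=value
            raise ValueError("unexpected ** argument in %r" % line)
        if isinstance(kw.value, ast.Name):
            # bare word such as apple or yes: keep its name as a string
            result[kw.arg] = kw.value.id
        else:
            # literal such as 1 or 'sugar and cinnamon'; other expressions
            # (function calls, attribute access, ...) raise ValueError here
            result[kw.arg] = ast.literal_eval(kw.value)
    return result


source = """\
pie=apple,quantity=1,cooked=yes,ingredients='sugar and cinnamon'
Pie=peach,quantity=2,ingredients='peaches,powdered sugar'
Pie=cherry,quantity=3,cooked=no,price=5,ingredients='cherries and sugar'
"""

for line in source.splitlines():
    print(parse_kwargs_line(line))
```

As with the eval version, keys keep their original case (Pie versus pie) and numbers come back as ints.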