Re: re.search (works)|(doesn't work) depending on for loop order
On Mar 22, 5:03 pm, Gabriel Genellina wrote:
> On Sat, 22 Mar 2008 17:27:49 -0300, sgharvey wrote:
>
> Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/

Thanks for the pointer; I'll check it out.

> I'm not sure you can process a config file in this unstructured way; looks
> a lot easier if you look for [sections] and process sequentially lines
> inside sections.

It works though... now that I've fixed up all my ugly stuff, and a dumb logic error or two.

> The regular expressions look strange too. A comment may be empty. A
> setting too. There may be spaces around the = sign. Don't try to catch all
> in one go.

I didn't think about empty comments/settings... fixed now. It also seemed simpler to handle surrounding spaces after the match was found.

New version of the problematic part:

code
self.contents = []
content = {}
# Get the content in each line
for line in lines:
    for name in patterns:
        # Match each pattern to the current line
        match = patterns[name].search(line)
        if match:
            content[name] = match.group(0).strip()
    self.contents.append(content)
    content = {}
/code

new iniparsing.py http://pastebin.com/f445701d4
new ini_regexen_dicts.py http://pastebin.com/f1e41cd3d

> --
> Gabriel Genellina

Much thanks to all for the constructive criticism.

Samuel Harvey
--
http://mail.python.org/mailman/listinfo/python-list
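For readers who want to try the revised loop on its own, here is a minimal sketch in Python 3. The three regexes and the sample lines are assumptions made up for illustration; the poster's real patterns live in ini_regexen_dicts.py and are not reproduced in this thread. The function name parse_lines is also hypothetical.

```python
import re

# Hypothetical patterns, NOT the poster's originals.
patterns = {
    "section": re.compile(r"^\s*\[[^\]]+\]\s*$"),
    "setting": re.compile(r"^\s*[^#;\[=\s][^=]*=.*$"),
    "comment": re.compile(r"^\s*[#;].*$"),
}


def parse_lines(lines):
    """Return one dict per line, mapping pattern name to the stripped match."""
    contents = []
    for line in lines:
        content = {}
        for name, regex in patterns.items():
            # Try every pattern against the current line
            match = regex.search(line)
            if match:
                content[name] = match.group(0).strip()
        contents.append(content)
    return contents


sample = ["[General]", "Name = value ", "# a comment", ""]
parsed = parse_lines(sample)
```

Because the outer loop walks the lines, parsed keeps the order of the file, and a line that matches nothing (such as the empty one) yields an empty dict.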
re.search (works)|(doesn't work) depending on for loop order
... and by works, I mean works like I expect it to.

I'm writing my own cheesy config.ini parser because ConfigParser doesn't preserve case or order of sections, or order of options w/in sections.

What's confusing me is this:
If I try matching every line to one pattern at a time, all the patterns that are supposed to match, actually match.
If I try to match every pattern to one line at a time, only one pattern will match.

What am I not understanding about re.search?

Doesn't match properly:

code
# Iterate through each pattern for each line
for line in lines:
    for pattern in patterns:
        # Match each pattern to the current line
        match = patterns[pattern].search(line)
        if match:
            %s: %s % (pattern, str(match.groups()) )
/code

_Does_ match properly:

code
# Let's iterate through all the lines for each pattern
for pattern in pattern:
    for line in lines:
        # Match each pattern to the current line
        match = patterns[pattern].search(line)
        if match:
            %s: %s % (pattern, str(match.groups()) )
/code

Related code:
The whole src http://pastebin.com/f63298772
regexen and delimiters (imported into whole src) http://pastebin.com/f485ac180
Re: re.search (works)|(doesn't work) depending on for loop order
On Sat, 22 Mar 2008 13:27:49 -0700, sgharvey wrote:
> ... and by works, I mean works like I expect it to.
>
> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.
>
> What's confusing me is this:
> If I try matching every line to one pattern at a time, all the
> patterns that are supposed to match, actually match.
> If I try to match every pattern to one line at a time, only one
> pattern will match.
>
> What am I not understanding about re.search?

That has nothing to do with `re.search` but with how files work. A file has a current position marker that is advanced at each iteration to the next line in the file. When it is at the end, it stays there, so you can iterate only *once* over an open file unless you rewind it with the `seek()` method.

That only works on seekable files, and it's not a good idea anyway, because the overhead of re-reading the file is usually greater than the time it takes to iterate over in-memory data like the patterns.

Ciao,
Marc 'BlackJack' Rintsch
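The file-position behaviour described here can be seen in a few lines of Python 3. In this sketch io.StringIO stands in for an open text file, which is an assumption made so the example runs without touching the disk; a real file object opened with open() iterates the same way.

```python
import io

# io.StringIO plays the role of an open config file (assumption for the demo).
handle = io.StringIO("[main]\nkey = value\n")

first_pass = [line for line in handle]    # reads both lines
second_pass = [line for line in handle]   # position is already at end of file

handle.seek(0)                            # rewind to the start
third_pass = [line for line in handle]    # both lines are available again
```

The second pass comes back empty because the position marker is still at the end; only after seek(0) does iteration yield the lines again.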
Re: re.search (works)|(doesn't work) depending on for loop order
On Mar 23, 8:21 am, Marc 'BlackJack' Rintsch wrote:
> On Sat, 22 Mar 2008 13:27:49 -0700, sgharvey wrote:
>> ... and by works, I mean works like I expect it to.
>>
>> I'm writing my own cheesy config.ini parser because ConfigParser
>> doesn't preserve case or order of sections, or order of options w/in
>> sections.
>>
>> What's confusing me is this:
>> If I try matching every line to one pattern at a time, all the
>> patterns that are supposed to match, actually match.
>> If I try to match every pattern to one line at a time, only one
>> pattern will match.
>>
>> What am I not understanding about re.search?
>
> That has nothing to do with `re.search` but with how files work. A file
> has a current position marker that is advanced at each iteration to the
> next line in the file. When it is at the end, it stays there, so you can
> iterate only *once* over an open file unless you rewind it with the
> `seek()` method.
>
> That only works on seekable files, and it's not a good idea anyway,
> because the overhead of re-reading the file is usually greater than the
> time it takes to iterate over in-memory data like the patterns.

Unless the OP has changed the pastebin code since you read it, that has absolutely nothing to do with his problem -- his pastebin code slurps in the whole .ini file using file.readlines; it is not iterating over an open file.
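Since readlines() returns an ordinary list, either nesting of the two loops visits every (pattern, line) pair. The Python 3 sketch below illustrates that with made-up lines and made-up regexes (both are assumptions, not the poster's data): the set of matches is identical whichever loop is on the outside.

```python
import re

# A list such as readlines() would return, with newlines already stripped.
lines = ["[main]", "key = value", "; note"]

# Hypothetical patterns for illustration only.
patterns = {
    "section": re.compile(r"^\[.*\]$"),
    "setting": re.compile(r"^[^;\[].*=.*$"),
    "comment": re.compile(r"^;"),
}

# Lines in the outer loop, patterns in the inner loop
by_line = {
    (name, line)
    for line in lines
    for name in patterns
    if patterns[name].search(line)
}

# Patterns in the outer loop, lines in the inner loop
by_pattern = {
    (name, line)
    for name in patterns
    for line in lines
    if patterns[name].search(line)
}
```

If the two sets ever differ in a real program, the cause is in the surrounding logic rather than in the regex objects or the list.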
Re: re.search (works)|(doesn't work) depending on for loop order
On Mar 23, 7:27 am, sgharvey wrote:
> ... and by works, I mean works like I expect it to.

You haven't told us what you expect it to do. In any case, your subject heading indicates that the problem is 99.999% likely to be in your logic -- the converse would require the result of re.compile() to retain some memory of what it's seen before *AND* to act differently depending somehow on those memorised facts.

> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.
>
> What's confusing me is this:
> If I try matching every line to one pattern at a time, all the
> patterns that are supposed to match, actually match.
> If I try to match every pattern to one line at a time, only one
> pattern will match.
>
> What am I not understanding about re.search?

Its behaviour is not contingent on previous input.

The following pseudocode is not very useful; the corrections I have made below can be made only after reading the actual pastebin code :-( ... you are using the name pattern to refer both to a pattern name (e.g. 'setting') and to a compiled regex.

> Doesn't match properly:
> code
> # Iterate through each pattern for each line
> for line in lines:
>     for pattern in patterns:

you mean: for pattern_name in pattern_names:

>         # Match each pattern to the current line
>         match = patterns[pattern].search(line)

you mean: match = compiled_regexes[pattern_name].search(line)

>         if match:
>             %s: %s % (pattern, str(match.groups()) )

you mean: print pattern_name, match.groups()

> /code
>
> _Does_ match properly:
> code
[snip]
> /code
>
> Related code:
> The whole src http://pastebin.com/f63298772

This can't be the code that you ran, because it won't even compile. See comments in my update at http://pastebin.com/m77f0617a

By the way, you should be either (a) using *match* (not search) with a \Z at the end of each pattern or (b) checking that there is no extraneous guff at the end of the line ... otherwise a line like
    [blah] waffle
would be classified as a section.

Have you considered leading/trailing/embedded spaces?

> regexen and delimiters (imported into whole src) http://pastebin.com/f485ac180

HTH,
John
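Option (a) can be sketched as follows in Python 3. The two regexes and the helper names is_section_loose and is_section_strict are hypothetical, written only to contrast search() on an unanchored pattern with match() on a pattern that ends in \Z.

```python
import re

# Unanchored pattern used with search(): finds a bracketed name anywhere.
loose = re.compile(r"\[(\w+)\]")

# Pattern used with match(): anchored at the start by match() itself and at
# the end by \Z, with optional surrounding whitespace allowed.
strict = re.compile(r"\s*\[(\w+)\]\s*\Z")


def is_section_loose(line):
    return loose.search(line) is not None


def is_section_strict(line):
    return strict.match(line) is not None
```

With these definitions, is_section_loose("[blah] waffle") is true while is_section_strict("[blah] waffle") is false; both accept a plain "[blah]", and the strict version also tolerates leading and trailing spaces.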
Re: re.search (works)|(doesn't work) depending on for loop order
sgharvey wrote:
> ... and by works, I mean works like I expect it to.
>
> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.
>
> What's confusing me is this:
> If I try matching every line to one pattern at a time, all the
> patterns that are supposed to match, actually match.
> If I try to match every pattern to one line at a time, only one
> pattern will match.

I don't see that behavior when I try your code. I had to fix your pattern loading:

    patterns[pattern] = re.compile(pattern_strings[pattern], re.VERBOSE)

I would also recommend against using both the plural and singular variable names; it's bound to cause confusion eventually.

I also changed contents to self.contents so that it would be accessible outside the class.

The correct way to do it is to run each pattern against each line. This will maintain the order of the config.ini file. If you do it the other way, you will end up with everything ordered based on the patterns instead of the file.

I tried it with Python 2.5 on OS X from within TextMate and it ran as expected.

Brian

--
Software, Linux, Microcontrollers http://www.brianlane.com
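The ordering point can be checked with a small Python 3 sketch. The sample lines and the three regexes are assumptions for illustration; the sketch also relies on dicts keeping insertion order, which holds in Python 3.7 and later.

```python
import re

lines = ["; header", "[main]", "key = value", "[other]"]

# Hypothetical patterns, defined in the order section, setting, comment.
patterns = {
    "section": re.compile(r"^\[.*\]$"),
    "setting": re.compile(r"^\w+\s*=.*$"),
    "comment": re.compile(r"^;"),
}

# Lines outside, patterns inside: results follow the file.
line_outer = [
    line for line in lines for name in patterns if patterns[name].search(line)
]

# Patterns outside, lines inside: results are grouped by pattern.
pattern_outer = [
    line for name in patterns for line in lines if patterns[name].search(line)
]
```

Here line_outer reproduces the file order, while pattern_outer lists both sections first, then the setting, then the comment.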
Re: re.search (works)|(doesn't work) depending on for loop order
On Sat, 22 Mar 2008 17:27:49 -0300, sgharvey wrote:
> ... and by works, I mean works like I expect it to.
>
> I'm writing my own cheesy config.ini parser because ConfigParser
> doesn't preserve case or order of sections, or order of options w/in
> sections.

Take a look at ConfigObj http://pypi.python.org/pypi/ConfigObj/

Instead of:

    # Remove the '\n's from the end of each line
    lines = [line[0:line.__len__()-1] for line in lines]

line.__len__() is a crazy (and ugly) way of spelling len(line). The comment is misleading; you say you remove '\n's but you don't actually check for them. The last line in the file might not have a trailing \n. See this:

    lines = [line.rstrip('\n') for line in lines]

Usually trailing spaces are ignored too; so you end up writing:

    lines = [line.rstrip() for line in lines]

In this case:

    # Compile the regexen
    patterns = {}
    for pattern in pattern_strings:
        patterns.update(pattern: re.compile(pattern_strings[pattern], re.VERBOSE))

That code does not even compile. I got lost with all those similar names; try to choose meaningful ones. What about this:

    patterns = {}
    for name, regexpr in pattern_strings.iteritems():
        patterns[name] = re.compile(regexpr, re.VERBOSE)

or even:

    patterns = dict((name, re.compile(regexpr, re.VERBOSE))
                    for name, regexpr in pattern_strings.iteritems())

or even compile them directly when you define them.

I'm not sure you can process a config file in this unstructured way; looks a lot easier if you look for [sections] and process sequentially lines inside sections.

    if match:
        content.update({pattern: match.groups()})

I wonder where you got the idea of populating a dict that way. It's a basic operation:

    content[name] = value

The regular expressions look strange too. A comment may be empty. A setting too. There may be spaces around the = sign. Don't try to catch all in one go.

--
Gabriel Genellina
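For anyone reading this on Python 3, the suggestions above might look like the sketch below. The thread itself is Python 2 (hence iteritems()); here .items() and a dict comprehension are used instead. The pattern strings and sample lines are assumptions invented for the example and are not the poster's originals.

```python
import re

# Hypothetical pattern strings written for re.VERBOSE (whitespace is ignored).
pattern_strings = {
    "section": r"\[ (?P<name>[^\]]+) \]",
    "setting": r"(?P<key>[^=\s]+) \s* = \s* (?P<value>.*)",
}

# Python 3 spelling of the suggested construction: name -> compiled regex.
patterns = {
    name: re.compile(regexpr, re.VERBOSE)
    for name, regexpr in pattern_strings.items()
}

# The last line has no trailing newline; rstrip() copes with that and also
# drops trailing spaces.
raw_lines = ["[main]\n", "key = value  \n", "last line without newline"]
lines = [line.rstrip() for line in raw_lines]

# Plain item assignment instead of content.update({...})
content = {}
match = patterns["setting"].search(lines[1])
if match:
    content["setting"] = match.groups()
```

After this runs, lines holds the three strings without trailing whitespace and content["setting"] is the tuple ("key", "value").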