I'll chew on this. Thanks, got to go.
"Steve Holden" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > It's me wrote: > > > I am never very good with regular expressions. My head always hurts > > whenever I need to use it. > > > Well, they are a pain to more than just you, and the conventional advice > is "even when you are convinced you need to use REs, try and find > another way". > > > I need to read a data file and parse each data record. Each item on the > > data record begins with either a string, or a list of strings. I searched > > around and didn't see any existing Python packages that does that. > > scanf.py, for instance, can do standard items but doesn't know about list. > > So, I figure I might have to write a lex engine for it and of course I have > > to deal wit RE again. > > > Well, you haven't yet convinced me that you *have* to. Personally, I > think you just like trouble :-) > > > But I run into problem right from the start. To recognize a list, I need a > > RE for the string: > > > > 1) begin with [" (left bracket followed by a double quote with zero or more > > spaces in between) > > 2) followed by any characters until ] but only if that left bracket is not > > preceeded by the escape character \. > > > So the pattern is > > 1. If the line begins with a "[" it should end with a "]" > > 2. Otherwise, it shouldn't? > > I'm trying to gently point out that the syntax you want to accept isn't > actually very clear. If the format is "Python strings and lists of > strings" then you might want to use the Python lexer to parse them, but > that's quite an advanced topic. [too advanced for me :-] > > The problem is matching "up to a right bracket not preceded by a > backslash". This seems to require what's technically referred to as a > "negative lookbehind assertion" - in other words, a pattern that doesn't > match anything, but checks that a specific condition is false or fails. > > > So, I tried: > > > > ^\[[" "]*"[a-z,A-Z\,, ]*(\\\])*[a-z,A-Z\,, \"]*] > > > > and tested with: > > > > ["This line\] works"] > > > > but it fails with: > > > > ["This line fails"] > > > > I would have thought that: > > > > (\\\])* > > > > should work because it's zero or more incidence of the pattern \] > > > > Any help is greatly appreciated. > > > > Sorry for beign OT. I posted this question at the lex group and didn't get > > any response. I figure may be somebody would know around here. > > I'd start with baby steps. First of all, make sure that you can match > the individual strings. Then use that pattern, parenthesized to turn it > into a group, as a component in a more complex pattern. > > Do you want to treat "this is also \" a string" as an allowable string? > In that case you need a pattern that matches 'up to the first quotation > mark not preceded by a backslash" as well! > > Let's try matching a single string first: > > >>> s = re.compile(r'(".*?(?<!\\)")') > >>> s.match('"s1", "s2"').groups() > ('"s1"',) > > Note that I followed the "*" with a "?" to stop it being greedy, and > matching as many characters as it could. OK, does that work when we have > escaped quotation marks? > > >>> s.match(r'"s1\"\"", "s2"').groups() > ('"s1\\"\\""',) > > Apparently so. The negative lookbehind assertion stops a quote from > matching when it's preceded by a backslash. Can we match a > comma-separated list of such strings? > > >>> slpat = r'(".*?(?<!\\)")(?:, (".*?(?<!\\)"))*' > >>> s = re.compile(slpat) > > This is a bit trickier: here the second grouping beginning with "(?:" is > intended to ensure that only the strings that get matched are included > in the groups, not the separators, even though they must be grouped > together. The list *must* be separated by ", ", but you could alter the > pattern to allow zero or more whitespace characters. > > >>> s.match(r'"s1\"\"", "s2"').groups() > ('"s1\\"\\""', '"s2"') > > Well, that seems to work. Note that these patterns all ignore bracket > characters, so all you need to do now is to surround them with patterns > to match the opening and closing brackets, and you're done (I hope). > > Anyway, it'll give you a few ideas to work with. > > regards > Steve > -- > Steve Holden http://www.holdenweb.com/ > Python Web Programming http://pydish.holdenweb.com/ > Holden Web LLC +1 703 861 4237 +1 800 494 3119 -- http://mail.python.org/mailman/listinfo/python-list