
Re: [Tutor] Parsing problem

2005-07-26 Thread Liam Clarke

...Oh my gosh, that is awesome. Thanks so much. I had started playing
with the positioning of various patterns and using |, but it was
getting into the early AM, so I stopped. Prematurely, it seems.

I also got a point 0.1 second increase in speed by merging number and
identifier, as the values will always be treated as strings, and are
being written by a programme, so there's very little need to error
check what's being parsed in. And it felt nice to improve it a wee bit
myself. : )

Also interesting is that our processors, which aren't overly far apart
in clock speed, vary so greatly in processing this problem. Maybe Intel
is better *grin*

Much thanks sir. 

Regards, 

Liam Clarke

On 7/26/05, Paul McGuire [EMAIL PROTECTED] wrote:

Liam -

I made some changes and timed them, I think this problem is solvable.  (All
timings are done using your data, on a P4 800MHz laptop.)

1. Baseline, the current state, in the parser code you sent me:

bracketGroup << ( pp.Group( LBRACE + ( pp.empty ^ pp.OneOrMore(assignment) ^
pp.OneOrMore(identifier) ^ pp.OneOrMore(pp.dblQuotedString) ^
pp.OneOrMore(number) ^ pp.OneOrMore(bracketGroup) ) + RBRACE ) )

Time: 02:20.71 (mm:ss.sss)

Just for general edification, '^' means Or, and it will evaluate all the
alternatives and choose the longest match (in regexp docs, this is sometimes
referred to as greedy matching); '|' means MatchFirst, and it will only
evaluate alternatives until it finds a match (which I refer to as eager
matching).

In the past, I've had only slight results converting '^' to '|', but since
this is a recursive expression, evaluating all of the possible alternatives
can involve quite a bit of processing before selecting the longest.

2. Convert to '|', replace empty with ZeroOrMore:

bracketGroup << ( pp.Group( LBRACE + pp.ZeroOrMore( assignment | identifier
| pp.dblQuotedString | number | bracketGroup ) + RBRACE ) )

Time: 00:14.57

This is getting us somewhere!  Replaced empty and OneOrMore's with a single
ZeroOrMore, and changed from '^' to '|'.  Since there is no overlap of the
various alternatives *in their current order*, it is safe to use '|'.  (This
would not be the case if assignment came after identifier - this should be a
hint on how to resolve the 'empty' problem.)  One problem with this
expression is that it will permit mixed bracket groups, such as
{ A 10 b=1000 {} }.

3. Go back to baseline, change '^' to '|', *and put empty at the end*

bracketGroup << ( pp.Group( LBRACE + ( pp.OneOrMore(assignment) |
pp.OneOrMore(identifier) | pp.OneOrMore(pp.dblQuotedString) |
pp.OneOrMore(number) | pp.OneOrMore(bracketGroup) | pp.empty ) + RBRACE ) )

Time: 00:12.04

Best solution yet!  This is faster than #2, since, once a match is made on
the first term within a bracketGroup, all others in the group are expected
to be the same type.  Since '|' means take first match, we resolve empty's
"accept anything" behavior simply by putting it at the end of the list.

4. Make change in #3, also convert '^' to '|' in RHS.

RHS << ( pp.dblQuotedString | identifier | number | bracketGroup )

Time: 00:01.15

Voila!  I'm happy to say, this is the first time I've seen a 100X
improvement, mostly by replacing '^' by '|'.  While this is not *always*
possible (see the CORBA IDL parser in the examples directory), it is worth
the effort, especially with a recursive expression.

The one item to be wary of when using '|' is when expressions mask each
other.  The easiest example is when trying to parse numbers, which may be
integers or reals.  If I write the expression as (assuming that integers
will match a sequence of digits, and reals will match digits with a decimal
point and some more digits):

number = (integer | real)

I will never match a real number!  The integer expression masks the real,
and since it occurs first, it will match first.  The two solutions are:

number = (integer ^ real)

Or

number = (real | integer)

That is, use an Or, which will match the longest, or reorder the MatchFirst
to put the most restrictive expression first.

Welcome to pyparsing, please let me know how your project goes!

-- Paul

-Original Message-
From: Liam Clarke [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 25, 2005 8:31 AM
To: Paul McGuire
Subject: Re: [Tutor] Parsing problem

Hi Paul,

I've attached the latest version. It includes my sample data within the
file. The sample data came in at 8 minutes 32 seconds without Psyco, 5
minutes 25 with, on a 650MHz Athlon.

I was pondering whether breaking the test data down into separate bits via
some preprocessing and feeding the simpler data structures in would help at
all.

Unfortunately, as I'm using pp.empty to deal with empty bracket sets (which
were causing my 'expected } ' exceptions), using | matches to pp.empty
first.

I'm not sure how to get around the empty brackets without using that.

I also get the feeling that pyparsing was more designed for making parsing
small complex expressions easy, as opposed to my data churning. That said, I
can think of about ten different projects I'd played with before giving up
because of a problem
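
A minimal sketch of the masking caveat in Paul's message above, assuming a pyparsing 2.x or 3.x install. The `integer` and `real` definitions are illustrative stand-ins and are not the expressions from Liam's grammar; the camelCase method names follow the spelling used in this thread.

```python
import pyparsing as pp

# Illustrative definitions (assumptions, not taken from the thread's grammar).
integer = pp.Word(pp.nums)
real = pp.Combine(pp.Word(pp.nums) + "." + pp.Word(pp.nums))

masked = integer | real      # MatchFirst: integer matches first, real is never tried
longest = integer ^ real     # Or: all alternatives are tried, the longest match wins
reordered = real | integer   # MatchFirst with the more restrictive term first

masked_token = masked.parseString("3.14")[0]
longest_token = longest.parseString("3.14")[0]
reordered_token = reordered.parseString("3.14")[0]
plain_int_token = reordered.parseString("42")[0]

print(masked_token, longest_token, reordered_token, plain_int_token)
```

With `integer | real` only the leading digits are consumed, which is the failure mode Paul warns about; the other two forms return the whole number.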

Re: [Tutor] Parsing problem

2005-07-25 Thread Paul McGuire

Liam -

Great, this sounds like it's coming together.  Don't be discouraged, parsing
text like this has many forward/backward steps.

As far as stopping after one assignment, well, you might kick yourself over
this, but the answer is that you are no longer parsing just a single
assignment, but a list of them.  You cannot parse more than one assignment
with assignment as you have it, and you shouldn't.  Instead, expand the
scope of the parser to correspond to the expanded scope of input, as in:


listOfAssignments = OneOrMore( assignment )  


Now listOfAssignments is your root BNF, that you use to call parseString
against the contents of the input file.

Looking at your code, you might prefer to just enclose the contents inside
the braces inside an Optional, or a ZeroOrMore.  Seeing the other possible
elements that might be in your braces, will this work?  ZeroOrMore will take
care of the empty option, and recursively nesting RHS will avoid having to
repeat the other scalar entries.


RHS << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) ^
 identifier ^
 integer ^
 pp.Group( LBRACE + pp.ZeroOrMore( assignment ^ RHS ) + RBRACE ) )
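
The two suggestions above (a root expression built with OneOrMore, and ZeroOrMore inside the braces) can be sketched together as follows. This is only an illustration under assumptions: the sample string is made up, the quote-stripping parse action is left out for brevity, and `<<=` is the spelling current pyparsing releases offer alongside `<<`.

```python
import pyparsing as pp

EQUALS, LBRACE, RBRACE = map(pp.Suppress, "={}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums + "-+", pp.nums)

assignment = pp.Forward()
RHS = pp.Forward()
# ZeroOrMore lets an empty "{ }" match without needing pp.empty.
RHS <<= (pp.dblQuotedString ^ identifier ^ integer ^
         pp.Group(LBRACE + pp.ZeroOrMore(assignment ^ RHS) + RBRACE))
assignment <<= pp.Group(identifier + EQUALS + RHS)

# The root of the grammar is a list of assignments, not a single one.
listOfAssignments = pp.OneOrMore(assignment)

# Hypothetical sample input with two top-level sections.
sample = 'header = { name = "x" flags = { } } province = { id = 1 owner = ENG }'
parsed = listOfAssignments.parseString(sample, parseAll=True).asList()
section_names = [section[0] for section in parsed]
print(section_names)
```

Each top-level section comes back as its own group, so parsing no longer stops after `header`.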


-- Paul



-Original Message-
From: Liam Clarke [mailto:[EMAIL PROTECTED] 
Sent: Sunday, July 24, 2005 10:21 AM
To: Paul McGuire
Cc: tutor@python.org
Subject: Re: [Tutor] Parsing problem

Hi Paul, 

That is fantastic. It works, and using that pp.group is the key with the
nested braces. 

I just ran this on the actual file after adding a few more possible values
inside the group, and it parsed the entire header structure rather nicely.

Now this will probably sound silly, but from the bit 

header = {...
...
}

it continues on with 

province = {...
} 

and so forth. 

Now, once it reads up to the closing bracket of the header section, it
returns that parsed nicely. 
Is there a way I can tell it to continue onwards? I can see that it's
stopping at one group.

Pyparsing is wonderful, but boy... as learning curves go, I'm somewhat over
my head.

I've tried this - 

Code http://www.rafb.net/paste/results/3Dm7FF35.html
Current data http://www.rafb.net/paste/results/3cWyt169.html

assignment << (pp.OneOrMore(pp.Group( LHS + EQUALS + RHS )))

to try and continue the parsing, but no luck.

I've been running into the 

  File "c:\python24\Lib\site-packages\pyparsing.py", line 1427, in parseImpl
    raise maxException
pyparsing.ParseException: Expected } (at char 742), (line:35, col:5)

hassle again. From the CPU loading, I'm worried I've got myself something
very badly recursive going on, but I'm unsure of how to use validate()

I've noticed that a few of the sections in between contain values like this
- 

foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }

and so I've stuck pp.empty into my RHS possible values. What unintended side
effects may I get from using pp.empty? From the docs, it sounds like a
wildcard token, rather than matching a null.

Using pp.empty has resolved my apparent problem with empty {}'s causing my
favourite exception, but I'm just worried that I'm casting my net too wide.

Oh, and, if there's a way to get a 'last line parsed' value so as to start
parsing onwards, it would ease my day, as the only way I've found to get the
whole thing parsed is to use another x = { ... } around the whole of the
data, and now, I'm only getting the 'x' returned, so if I could parse by
section, it would help my understanding of what's happening. 

I'm still trial and error-ing a bit too much at the moment.

Regards, 

Liam Clarke





On 7/24/05, Paul McGuire [EMAIL PROTECTED] wrote:

Liam -

Glad you are sticking with pyparsing through some of these idiosyncrasies!

One thing that might simplify your life is if you are a bit more strict on
specifying your grammar, especially using pp.printables as the character set
for your various words and values.  Is this statement really valid?

Lw)r*)*dsflkj = sldjouwe)r#jdd

According to your grammar, it is.  Also, by using printables, you force your
user to insert whitespace between the assignment target and the equals sign.
I'm sure your users would like to enter a quick "a=1" once in a while, but
since there is no whitespace, it will all be slurped into the left-hand side
identifier.

Let's create two expressions, LHS and RHS, to dictate what is valid on the
left and right-hand side of the equals sign.  (Well, it turns out I create a
bunch of expressions here, in the process of defining LHS and RHS, but
hopefully, this will make some sense):

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums+"-+", pp.nums)
assignment = pp.Forward()
LHS = identifier
RHS = pp.Forward

Re: [Tutor] Parsing problem

2005-07-25 Thread Liam Clarke

Hi Paul, 

That is fantastic. It works, and using that pp.group is the key with the nested braces. 

I just ran this on the actual file after adding a few more possible
values inside the group, and it parsed the entire header structure
rather nicely.

Now this will probably sound silly, but from the bit 

header = {...
...
}

it continues on with 

province = {...
} 

and so forth. 

Now, once it reads up to the closing bracket of the header section, it returns that parsed nicely. 
Is there a way I can tell it to continue onwards? I can see that it's stopping at one group.

Pyparsing is wonderful, but boy... as learning curves go, I'm somewhat over my head.

I've tried this - 

Code http://www.rafb.net/paste/results/3Dm7FF35.html

Current data http://www.rafb.net/paste/results/3cWyt169.html

assignment << (pp.OneOrMore(pp.Group( LHS + EQUALS + RHS )))

to try and continue the parsing, but no luck.

I've been running into the  

  File "c:\python24\Lib\site-packages\pyparsing.py", line 1427, in parseImpl
    raise maxException
pyparsing.ParseException: Expected } (at char 742), (line:35, col:5)

hassle again. From the CPU loading, I'm worried I've got myself
something very badly recursive going on, but I'm unsure of how to use
validate()

I've noticed that a few of the sections in between contain values like this - 

foo = { BAR = { HUN = 10 SOB = 6 } oof = { HUN = { } SOB = 4 } }

and so I've stuck pp.empty into my RHS possible values. What unintended
side effects may I get from using pp.empty? From the docs, it sounds
like a wildcard token, rather than matching a null.

Using pp.empty has resolved my apparent problem with empty {}'s causing
my favourite exception, but I'm just worried that I'm casting my net
too wide.

Oh, and, if there's a way to get a 'last line parsed' value so as to
start parsing onwards, it would ease my day, as the only way I've found
to get the whole thing parsed is to use another x = { ... } around the
whole of the data, and now, I'm only getting the 'x' returned, so if I
could parse by section, it would help my understanding of what's
happening. 

I'm still trial and error-ing a bit too much at the moment.

Regards, 

Liam Clarke



On 7/24/05, Paul McGuire [EMAIL PROTECTED] wrote:

Liam -

Glad you are sticking with pyparsing through some of these idiosyncrasies!

One thing that might simplify your life is if you are a bit more strict on
specifying your grammar, especially using pp.printables as the character set
for your various words and values.  Is this statement really valid?

Lw)r*)*dsflkj = sldjouwe)r#jdd

According to your grammar, it is.  Also, by using printables, you force your
user to insert whitespace between the assignment target and the equals sign.
I'm sure your users would like to enter a quick "a=1" once in a while, but
since there is no whitespace, it will all be slurped into the left-hand side
identifier.

Let's create two expressions, LHS and RHS, to dictate what is valid on the
left and right-hand side of the equals sign.  (Well, it turns out I create a
bunch of expressions here, in the process of defining LHS and RHS, but
hopefully, this will make some sense):

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums+"-+", pp.nums)
assignment = pp.Forward()
LHS = identifier
RHS = pp.Forward().setName("RHS")
RHS << ( pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
pp.OneOrMore(assignment) + RBRACE ) )
assignment << pp.Group( LHS + EQUALS + RHS )

I leave it to you to flesh out what other possible value types can be
included in RHS.

Note also the use of the Group.  Try running this snippet with and without
Group and see how the results change.  I think using Group will help you to
build up a good parse tree for the matched tokens.

Lastly, please note in the '<<' assignment to RHS that the expression is
enclosed in parens.  I originally left this as

RHS << pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
pp.OneOrMore(assignment) + RBRACE )

And it failed to match!  A bug! In my own code!  The shame...

This fails because '<<' has a higher precedence than '^', so RHS only worked
if it was handed a quoted string.  Probably good practice to always enclose
in parens the expression being assigned to a Forward using '<<'.

-- Paul

-Original Message-
From: Liam Clarke [mailto:[EMAIL PROTECTED]]
Sent: Saturday, July 23, 2005 9:03 AM
To: Paul McGuire
Cc: tutor@python.org
Subject: Re: [Tutor] Parsing problem

*sigh* I just read the documentation more carefully and found the difference
between the | operator and the ^ operator.

Input -

j = { line = { foo = 10 bar = 20 } }

New code

sel = pp.Forward()
values = ((pp.Word(pp.printables) + pp.Suppress("=") +
pp.Word(pp.printables)) ^ sel)
sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
pp.OneOrMore(values) + pp.Suppress("}"))

Output -

(['j', 'line', 'foo', '10', 'bar', '20'], {})

My apologies for the deluge.

Regards,

Liam Clarke

On 7/24/05, Liam Clarke [EMAIL PROTECTED] wrote:

Hmmm

Re: [Tutor] Parsing problem

2005-07-25 Thread Paul McGuire

Liam -

I just uploaded an update to pyparsing, version 1.3.2, that should fix the
problem with using nested Dicts.  Now you won't need to use [0] to
dereference the 0'th element, just reference the nested elements as a.b.c,
or a[b][c].
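
For readers who want to try the nested access described here, a minimal sketch follows. It assumes a current pyparsing release (2.x or 3.x) rather than 1.3.2, which was not checked; the grammar and the sample string are illustrative and much smaller than Liam's real data.

```python
import pyparsing as pp

EQUALS, LBRACE, RBRACE = map(pp.Suppress, "={}")
key = pp.Word(pp.alphas)
value = pp.Forward()
entry = pp.Group(key + EQUALS + value)
# A braced block is itself a Dict of entries, so its keys become named results.
block = pp.Group(LBRACE + pp.Dict(pp.ZeroOrMore(entry)) + RBRACE)
value <<= pp.Word(pp.alphanums) | block
config = pp.Dict(pp.OneOrMore(entry))

parsed = config.parseString("country = { tag = ENG ai = { war = 60 ferocity = no } }")

tag = parsed.country.tag                 # attribute-style access
war = parsed["country"]["ai"]["war"]     # dictionary-style access
print(tag, war)
```

Both access styles reach the nested values without indexing element 0 along the way.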

-- Paul 



Re: [Tutor] Parsing problem

2005-07-25 Thread Paul McGuire

Liam -

Could you e-mail me your latest grammar?  The last version I had includes
this definition for RHS:

RHS << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) ^
 identifier ^
 integer ^
 pp.Group( LBRACE + pp.ZeroOrMore( assignment ^ RHS ) + RBRACE ) )

What happens if you replace the '^' operators with '|', as in:

RHS << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |
 identifier |
 integer |
 pp.Group( LBRACE + pp.ZeroOrMore( assignment | RHS ) + RBRACE ) )

I think earlier on, you needed to use '^' because your various terms were
fairly vague (you were still using Word(pp.printables), which would accept
just about anything).  But now I think there is little ambiguity between a
quoted string, identifier, etc., and simple '|' or MatchFirst's will do.
This is about the only optimization I can think of.

-- Paul
 

-Original Message-
From: Liam Clarke [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 25, 2005 7:38 AM
To: Paul McGuire
Cc: tutor@python.org
Subject: Re: [Tutor] Parsing problem

Hi Paul, 

Well various tweaks and such done, it parses perfectly, so much thanks, I
think I now have a rough understanding of the basics of pyparsing. 

Now, onto the fun part of optimising it. At the moment, I'm looking at 2 - 5
minutes to parse a 2000 line country section, and that's with psyco. Only
problem is, I have 157 country sections...

I am running a 650 MHz processor, so that isn't helping either. I read this
quote on http://pyparsing.sourceforge.net.

Thanks again for your help and thanks for writing pyparser! It seems my
code needed to be optimized and now I am able to parse a 200mb file in 3
seconds. Now I can stick my tongue out at the Perl guys ;)

I'm jealous, 200mb in 3 seconds, my file's only 4mb.

Are there any general approaches to optimisation that work well?

My current thinking is to use string methods to split the string into each
component section, and then parse each section to a bare minimum key, value.
ie - instead of parsing 

x = { foo = { bar = 10 bob = 20 } type = { z = { } y = { } }}

out fully, just parse to x:{ foo = { bar = 10 bob = 20 } type = { z = { }
y = { } }}

I'm thinking that would avoid the complicated nested structure I have now,
and I could parse data out of the string as needed, if needed at all.
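
The splitting idea above can be sketched with plain string handling and no parser at all. This is a rough illustration, not Liam's code: `split_sections` is a hypothetical helper, and it assumes every top-level item has the form `name = { ... }` and that braces never appear inside quoted strings.

```python
def split_sections(text):
    """Return (name, body) pairs for each top-level `name = { ... }` block."""
    sections = []
    depth = 0          # current brace nesting level
    name = None        # name of the section being scanned
    body_start = 0     # index just past the section's opening brace
    name_start = 0     # where the text holding the next section name begins
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                # Text before the brace is "name =", so drop the equals sign.
                name = text[name_start:i].replace("=", "").strip()
                body_start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                sections.append((name, text[body_start:i].strip()))
                name_start = i + 1
    return sections


data = "x = { foo = { bar = 10 bob = 20 } type = { z = { } y = { } }} w = { a = 1 }"
sections = split_sections(data)
for section_name, section_body in sections:
    print(section_name, "->", section_body)
```

Each body is left as an unparsed string, so a section only needs to go through the full grammar when its contents are actually wanted.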

Erk, I don't know, I've never had to optimise anything. 

Much thanks for creating pyparsing, and doubly thank-you for your assistance
in learning how to use it. 

Regards, 

Liam Clarke

On 7/25/05, Liam Clarke [EMAIL PROTECTED] wrote:

Hi Paul, 

My apologies, as I was jumping into my car after sending that email,
it clicked in my brain. 
Oh yeah... initial and body...

But good to know about how to accept valid numbers.

Sorry, getting a bit too quick to fire off emails here.

Regards, 

Liam Clarke


On 7/25/05, Paul McGuire [EMAIL PROTECTED] wrote:

Liam -

The two arguments to Word work this way:
- the first argument lists valid *initial* characters
- the second argument lists valid *body* or subsequent characters

For example, in the identifier definition,

identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")

identifiers *must* start with an alphabetic character, and then may be
followed by 0 or more alphanumeric or "_/:" or "." characters.  If only one
argument is supplied, then the same string of characters is used as both
initial and body.  Identifiers are very typical for 2 argument Word's, as
they often start with alphas, but then accept digits and other punctuation.
No whitespace is permitted within a Word.  The Word matching will end when a
non-body character is seen.

Using this definition:

integer = pp.Word(pp.nums+"-+.", pp.nums)

It will accept "+123", "-345", "678", and ".901".  But in a real number, a
period may occur anywhere in the number, not just as the initial character,
as in "3.14159".  So your bodyCharacters must also include a ".", as in:

integer = pp.Word(pp.nums+"-+.", pp.nums+".")

Let me say, though, that this is a very permissive definition of integer -
for one thing, we really should rename it something like number, since it
now accepts non-integers as well!  But also, there is no restriction on the
frequency of body characters.  This definition would accept a number that
looks like "3.4.3234.111.123.3234".  If you are certain

Re: [Tutor] Parsing problem

2005-07-25 Thread Paul McGuire

Liam -

The two arguments to Word work this way:
- the first argument lists valid *initial* characters
- the second argument lists valid *body* or subsequent characters

For example, in the identifier definition, 

identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")

identifiers *must* start with an alphabetic character, and then may be
followed by 0 or more alphanumeric or "_/:" or "." characters.  If only one
argument is supplied, then the same string of characters is used as both
initial and body.  Identifiers are very typical for 2 argument Word's, as
they often start with alphas, but then accept digits and other punctuation.
No whitespace is permitted within a Word.  The Word matching will end when a
non-body character is seen.
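
A short sketch of the initial/body behaviour just described, assuming pyparsing 2.x or 3.x. The input strings are made up for illustration.

```python
import pyparsing as pp

identifier = pp.Word(pp.alphas, pp.alphanums + "_/:.")

# Matching stops at the first non-body character (here, the space).
first_word = identifier.parseString("path/to:file.txt rest")[0]

# A digit is not a valid initial character for this Word, so the parse fails.
try:
    identifier.parseString("9lives")
    leading_digit_ok = True
except pp.ParseException:
    leading_digit_ok = False

# With one argument, the same characters are used for both initial and body.
digits_only = pp.Word(pp.nums).parseString("2005-07-25")[0]

print(first_word, leading_digit_ok, digits_only)
```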

Using this definition:

integer = pp.Word(pp.nums+"-+.", pp.nums)

It will accept "+123", "-345", "678", and ".901".  But in a real number, a
period may occur anywhere in the number, not just as the initial character,
as in "3.14159".  So your bodyCharacters must also include a ".", as in:

integer = pp.Word(pp.nums+"-+.", pp.nums+".")

Let me say, though, that this is a very permissive definition of integer -
for one thing, we really should rename it something like number, since it
now accepts non-integers as well!  But also, there is no restriction on the
frequency of body characters.  This definition would accept a number that
looks like 3.4.3234.111.123.3234.  If you are certain that you will only
receive valid inputs, then this simple definition will be fine.  But if you
will have to handle and reject erroneous inputs, then you might do better
with a number definition like:

number = Combine( Word( "+-"+nums, nums ) +
                  Optional( point + Optional( Word( nums ) ) ) )

This will handle "+123", "-345", "678", and "0.901", but not ".901".  If you
want to accept numbers that begin with "."s, then you'll need to tweak this
a bit further.

One last thing: you may want to start using setName() on some of your
expressions, as in:

number = Combine( Word( "+-"+nums, nums ) +
                  Optional( point + Optional( Word( nums ) ) )
                ).setName("number")

Note, this is *not* the same as setResultsName.  Here setName is attaching a
name to this pattern, so that when it appears in an exception, the name will
be used instead of an encoded pattern string (such as W:012345...).  No need
to do this for Literals, the literal string is used when it appears in an
exception.
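
The stricter number definition and the effect of setName can be tried with the sketch below, assuming pyparsing 2.x or 3.x. `point` is assumed to be `Literal(".")`, which the message does not spell out; `rejects` and the `digits` expression are hypothetical additions for illustration. The exact wording of exception messages varies between pyparsing versions, so only the name itself is relied on here.

```python
import pyparsing as pp

point = pp.Literal(".")  # assumed definition; not shown in the thread
number = pp.Combine(
    pp.Word("+-" + pp.nums, pp.nums)
    + pp.Optional(point + pp.Optional(pp.Word(pp.nums)))
).setName("number")

accepted = [number.parseString(s, parseAll=True)[0]
            for s in ("+123", "-345", "678", "0.901")]


def rejects(text):
    """Return True if `number` does not match the whole of `text`."""
    try:
        number.parseString(text, parseAll=True)
    except pp.ParseException:
        return True
    return False


rejected = [rejects(s) for s in (".901", "3.4.3234")]

# setName on a plain Word: the name appears in the error text
# in place of the encoded pattern string.
digits = pp.Word(pp.nums).setName("digits")
try:
    digits.parseString("abc")
    message = ""
except pp.ParseException as err:
    message = str(err)

print(accepted, rejected, message)
```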

-- Paul


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Parsing problem

2005-07-24 Thread Liam Clarke

Hi Paul, 

My apologies, as I was jumping into my car after sending that email, it clicked in my brain. 
Oh yeah... initial and body...

But good to know about how to accept valid numbers.

Sorry, getting a bit too quick to fire off emails here.

Regards, 

Liam Clarke

--
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.'

Re: [Tutor] Parsing problem

2005-07-23 Thread Liam Clarke

Howdy, 

I've attempted to follow your lead and have started from scratch, I
could just copy and paste your solution (which works pretty well), but
I want to understand what I'm doing *grin*

However, I've been hitting a couple of ruts in the path to
enlightenment. Is there a way to tell pyparsing to treat specific
escaped characters as just a slash followed by a letter? For the time
being I've converted all backslashes to forwardslashes, as it was
choking on \a in a file path.

But my latest hitch, takes this form (apologies for large traceback)

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "parse.py", line 336, in parse
    parsedEntries = dicts.parseString(test_data)
  File "c:\python24\Lib\site-packages\pyparsing.py", line 616, in parseString
    loc, tokens = self.parse( instring.expandtabs(), 0 )
  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
    return self.expr.parse( instring, loc, doActions )
  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "c:\python24\Lib\site-packages\pyparsing.py", line 1367, in parseImpl
    loc, exprtokens = e.parse( instring, loc, doActions )
  File "c:\python24\Lib\site-packages\pyparsing.py", line 558, in parse
    loc,tokens = self.parseImpl( instring, loc, doActions )
  File "c:\python24\Lib\site-packages\pyparsing.py", line 1518, in parseImpl
    return self.expr.parse( instring, loc, doActions )
  File "c:\python24\Lib\site-packages\pyparsing.py", line 560, in parse
    raise ParseException, ( instring, len(instring), self.errmsg, self )
ParseException: Expected } (at char 9909), (line:325, col:5)

The offending code can be found here (includes the data) - http://www.rafb.net/paste/results/L560wx80.html

It's like pyparsing isn't recognising a lot of my }'s, as if I add
another one, it throws the same error, same for adding another two...

No doubt I've done something silly, but any help in finding the tragic
flaw would be much appreciated. I need to get a parsingResults object
out so I can learn how to work with the basic structure!

Much regards,

Liam Clarke
Re: [Tutor] Parsing problem

2005-07-23 Thread Liam Clarke

Hmmm... just a quick update, I've been poking around and I'm obviously making some error of logic. 

Given a line - 

f = "j = { line = { foo = 10 bar = 20 } }"

And given the following code - 

select = pp.Forward()
select << (
    pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") +
    pp.OneOrMore( (pp.Word(pp.printables) + pp.Suppress("=") +
    pp.Word(pp.printables) ) | select ) + pp.Suppress("}") )

select.parseString(f) gives - 

(['j', 'line', '{', 'foo', '10', 'bar', '20'], {})

So I've got a bracket sneaking through there. Argh. My brain hurts. 

Is the | operator an exclusive or? 

Befuddled, 

Liam Clarke

Re: [Tutor] Parsing problem

2005-07-23 Thread Liam Clarke

*sigh* I just read the documentation more carefully and found the difference between the 
| operator and the ^ operator. 

Input - 

j = { line = { foo = 10 bar = 20 } }

New code

sel = pp.Forward()
values = ((pp.Word(pp.printables) + pp.Suppress("=") + pp.Word(pp.printables)) ^ sel)
sel << (pp.Word(pp.printables) + pp.Suppress("=") + pp.Suppress("{") + pp.OneOrMore(values) + pp.Suppress("}"))

Output - 

(['j', 'line', 'foo', '10', 'bar', '20'], {})
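
The difference between the two outputs can be reproduced without pyparsing. The sketch below is a toy stand-in, not pyparsing's implementation: match_pair, match_block, first_match and longest_match are hypothetical helpers written only for this illustration. first_match mimics '|' (take the first alternative that matches), while longest_match mimics '^' (try every alternative and keep the one that consumes the most input).

```python
def match_pair(tokens, pos):
    """Match `name = value`, where value may be ANY token (like Word(printables))."""
    if pos + 2 < len(tokens) and tokens[pos + 1] == "=":
        return pos + 3, [tokens[pos], tokens[pos + 2]]
    return None


def match_block(tokens, pos, choose):
    """Match `name = { item item ... }`, where each item is a pair or a nested block."""
    if not (pos + 3 < len(tokens) and tokens[pos + 1] == "=" and tokens[pos + 2] == "{"):
        return None
    out, cur = [tokens[pos]], pos + 3
    while True:
        # Both alternatives are tried; `choose` decides which result wins.
        hit = choose([match_pair(tokens, cur), match_block(tokens, cur, choose)])
        if hit is None:
            break
        cur, found = hit
        out.extend(found)
    if cur < len(tokens) and tokens[cur] == "}":
        return cur + 1, out
    return None


def first_match(results):
    """Stand-in for '|': the first alternative that matched wins."""
    for result in results:
        if result is not None:
            return result
    return None


def longest_match(results):
    """Stand-in for '^': the alternative that consumed the most tokens wins."""
    hits = [result for result in results if result is not None]
    return max(hits, key=lambda result: result[0]) if hits else None


tokens = "j = { line = { foo = 10 bar = 20 } }".split()
_, with_first = match_block(tokens, 0, first_match)
_, with_longest = match_block(tokens, 0, longest_match)
print(with_first)
print(with_longest)
```

With first_match, the pair rule accepts `line = {` because "{" is just another printable token, which is how the stray bracket gets into the result. With longest_match, the nested block wins because it consumes more of the input.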

My apologies for the deluge. 

Regards, 

Liam Clarke

Re: [Tutor] Parsing problem

2005-07-23 Thread Paul McGuire

Liam -

Glad you are sticking with pyparsing through some of these idiosyncrasies!

One thing that might simplify your life is being a bit more strict in
specifying your grammar, especially where you use pp.printables as the
character set for your various words and values.  Is this statement really valid?

Lw)r*)*dsflkj = sldjouwe)r#jdd

According to your grammar, it is.  Also, by using printables, you force your
user to insert whitespace between the assignment target and the equals sign.
I'm sure your users would like to enter a quick "a=1" once in a while, but
since there is no whitespace, it will all be slurped into the left-hand side
identifier.
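
A rough way to see the slurping effect without pyparsing is to compare two regular expressions. This is only an approximation: PRINTABLE_WORD and IDENTIFIER are hypothetical stand-ins for Word(printables) and a stricter identifier word, and first_word is a helper invented for this sketch.

```python
import re

# Any run of non-whitespace characters, roughly what Word(printables) accepts.
PRINTABLE_WORD = re.compile(r"\S+")
# A letter followed by letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def first_word(pattern, text):
    """Return the text matched at the start of `text`, or None if nothing matches."""
    match = pattern.match(text)
    return match.group(0) if match else None


loose = first_word(PRINTABLE_WORD, "a=1")   # swallows the "=" and the value too
strict = first_word(IDENTIFIER, "a=1")      # stops at the "="
print(loose, strict)
```

The loose pattern leaves nothing behind for the equals sign to match, while the strict one stops after the name.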

Let's create two expressions, LHS and RHS, to dictate what is valid on the
left and right-hand side of the equals sign.  (Well, it turns out I create a
bunch of expressions here, in the process of defining LHS and RHS, but
hopefully, this will make some sense):

EQUALS = pp.Suppress("=")
LBRACE = pp.Suppress("{")
RBRACE = pp.Suppress("}")
identifier = pp.Word(pp.alphas, pp.alphanums + "_")
integer = pp.Word(pp.nums+"-+", pp.nums)
assignment = pp.Forward()
LHS = identifier
RHS = pp.Forward().setName("RHS")
RHS << ( pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
pp.OneOrMore(assignment) + RBRACE ) )
assignment << pp.Group( LHS + EQUALS + RHS )

I leave it to you to flesh out what other possible value types can be
included in RHS.

Note also the use of the Group.  Try running this snippet with and without
Group and see how the results change.  I think using Group will help you to
build up a good parse tree for the matched tokens.

Lastly, please note in the '<<' assignment to RHS that the expression is
enclosed in parens.  I originally left this as

RHS << pp.dblQuotedString ^ identifier ^ integer ^ pp.Group( LBRACE +
pp.OneOrMore(assignment) + RBRACE )

And it failed to match!  A bug! In my own code!  The shame...

This fails because '<<' has a higher precedence than '^', so RHS only worked
if it was handed a quoted string.  Probably good practice to always enclose
in parens the expression being assigned to a Forward using '<<'.
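
The precedence point is plain Python behaviour and can be checked with a toy class. Expr below is a hypothetical stand-in invented for this sketch, not a pyparsing class; it only records what was handed to '<<'.

```python
class Expr:
    """Toy expression that remembers what was assigned to it with '<<'."""

    def __init__(self, name):
        self.name = name
        self.target = None

    def __xor__(self, other):
        # Build a combined expression, as '^' does for alternatives.
        return Expr("(%s ^ %s)" % (self.name, other.name))

    def __lshift__(self, other):
        # Record the definition handed to the forward declaration.
        self.target = other.name
        return self


rhs = Expr("RHS")
a, b = Expr("a"), Expr("b")

rhs << a ^ b            # Python reads this as (rhs << a) ^ b
without_parens = rhs.target

rhs << (a ^ b)          # the parens make the whole alternation the definition
with_parens = rhs.target

print(without_parens, with_parens)
```

Without the parens only the first alternative reaches '<<', which matches the symptom of RHS accepting nothing but a quoted string.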

-- Paul



[Tutor] Parsing problem

2005-07-21 Thread Paul McGuire

Liam, Kent, and Danny -

It sure looks like pyparsing is taking on a life of its own!  I can see I no
longer am the only one pitching pyparsing at some of these applications!

Yes, Liam, it is possible to create dictionary-like objects, that is,
ParseResults objects that have named values in them.  I looked into your
application, and the nested assignments seem very similar to a ConfigParser
type of structure.  Here is a pyparsing version that handles the test data
in your original post (I kept Danny Yoo's recursive list values, and added
recursive dictionary entries):

--
import pyparsing as pp

listValue = pp.Forward()
listSeq = ( pp.Suppress('{') + pp.Group(pp.ZeroOrMore(listValue)) +
            pp.Suppress('}') )
listValue << ( pp.dblQuotedString.setParseAction(pp.removeQuotes) |
               pp.Word(pp.alphanums) | listSeq )

keyName = pp.Word( pp.alphas )

entries = pp.Forward()
entrySeq = ( pp.Suppress('{') + pp.Group(pp.OneOrMore(entries)) +
             pp.Suppress('}') )
entries << pp.Dict(
    pp.OneOrMore(
        pp.Group( keyName + pp.Suppress('=') + (entrySeq | listValue) ) ) )
--


Dict is one of the most confusing classes to use, and there are some
examples in the examples directory that comes with pyparsing (see
dictExample2.py), but it is still tricky.  Here is some code to access your
input test data, repeated here for easy reference:

--
testdata = """\
country = {
tag = ENG
ai = {
flags = { }
combat = { DAU FRA ORL PRO }
continent = { }
area = { }
region = { "British Isles" NorthSeaSea ECAtlanticSea NAtlanticSea
TagoSea WCAtlanticSea }
war = 60
ferocity = no
}
}
"""

parsedEntries = entries.parseString(testdata)

def dumpEntries(dct,depth=0):
    keys = dct.keys()
    keys.sort()
    for k in keys:
        print ('  '*depth) + '- ' + k + ':',
        if isinstance(dct[k],pp.ParseResults):
            if dct[k][0].keys():
                print
                dumpEntries(dct[k][0],depth+1)
            else:
                print dct[k][0]
        else:
            print dct[k]

dumpEntries( parsedEntries )

print
print parsedEntries.country[0].tag
print parsedEntries.country[0].ai[0].war
print parsedEntries.country[0].ai[0].ferocity
--

This will print out:

--
- country:
  - ai:
    - area: []
    - combat: ['DAU', 'FRA', 'ORL', 'PRO']
    - continent: []
    - ferocity: no
    - flags: []
    - region: ['British Isles', 'NorthSeaSea', 'ECAtlanticSea', 'NAtlanticSea', 'TagoSea', 'WCAtlanticSea']
    - war: 60
  - tag: ENG

ENG
60
no
--

But I really dislike having to dereference those nested values using the
0'th element.  So I'm going to fix pyparsing so that in the next release,
you'll be able to reference the sub-elements as:

print parsedEntries.country.tag
print parsedEntries.country.ai.war
print parsedEntries.country.ai.ferocity

This *may* break some existing code, but Dict is not heavily used, based on
feedback from users, and this may make it more useful in general, especially
when data parses into nested Dict's.
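
The attribute-style access described above can be pictured with a small wrapper over plain nested dicts. This is only an illustration of the idea: AttrView is a hypothetical class made up for this sketch, and it says nothing about how ParseResults is actually implemented.

```python
class AttrView:
    """Read-only view that exposes nested dict keys as attributes."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so _data itself is safe.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name)
        # Wrap nested dicts so that access can be chained.
        return AttrView(value) if isinstance(value, dict) else value


parsed = {"country": {"tag": "ENG", "ai": {"war": "60", "ferocity": "no"}}}
view = AttrView(parsed)

print(view.country.tag)
print(view.country.ai.war)
print(view.country.ai.ferocity)
```

Each lookup returns either a leaf value or another view, so there is no intermediate [0] to dereference.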

Hope this sheds more light than confusion!
-- Paul McGuire

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Parsing problem

2005-07-20 Thread Liam Clarke

Well, I've been poking around and... well.. this is way better than writing complex regexes.

To suit my needs, I need something that can handle - 

foo = bar
foo = 20
foo = { bar 20 }
foo = { bar = 20 baz}
foo = {bar = 20 baz { dave henry}}

OK, so the last one's extreme. So far, I can handle down to foo = { bar
20 }, but it looks ugly, so some feedback on my very rough usage of
pyparsing would be great. 

>>> from pyparsing import Word, Suppress, ZeroOrMore, alphas, nums
>>> q = (Word(alphas) + Suppress("=") + ( ( Word(nums) |
... Word(alphas) ) | ( Suppress("{") + ZeroOrMore( Word(alphas)
... | Word(nums) ) + Suppress("}") ) ) )
>>> q.parseString("foo = bar").asList()
['foo', 'bar']
>>> q.parseString("a = 23").asList()
['a', '23']
>>> q.parseString(" foo = { bar baz 23 }").asList()
['foo', 'bar', 'baz', '23']

Yeech. 

I'm sure I can shorten that a whole lot ( I just found alphanums in the
manual, d'oh. ), but it works pretty good out of the box. Thanks for
the heads up.

Couple of queries -

I think I understand Danny's example of circular references. 

--
Value << (Symbol | Sequence)
Sequence << (pyparsing.Suppress("{") +
             pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
             pyparsing.Suppress("}"))
--

Sequence depends on Value for its *ahem* value, but Value depends on Sequence for its value, so I'll play with that.

Is anyone able to post an example of returning dictionaries from ParseResults? If so, it would be brilliant. 

The documentation states - 
the Dict class generates dictionary entries using the data of the
input text - in addition to ParseResults listed as [ [ a1, b1, c1, ...], [ a2, b2, c2, ...] ]
it also acts as a dictionary with entries defined as { a1 : [ b1, c1, ... ] }, { a2 : [ b2, c2, ... ] };

Problem is, I haven't figured out how to use it yet, I know I could use
pyparsing.Group(stuff) to ensure proper key:value pairings. 

Thanks for the pointers so far, feeling very chuffed with myself for
managing to get this far, I had strayed into VBA territory, it's nice
to work with real objects again. 

And of course, always open to being shown the simple, elegant way. ;)

Many thanks, 

Liam Clarke


Re: [Tutor] Parsing problem

2005-07-19 Thread Liam Clarke

Thanks guys, I daresay I will have a lot of questions regarding this,
but at least I have a point to start digging and a better shovel!

Cheers, 

Liam Clarke

[Tutor] Parsing problem

2005-07-18 Thread Liam Clarke

Hi all, 

I am a Europa Universalis II freak, and in attempting to recreate a
lost saved game, I had to delve into the mechanics of the save game
file. 
Which, luckily, is plain text. 

It's formatted like this - 

country = { 
 tag = ENG 
 ai = { 
 flags = { } 
 combat = { DAU FRA ORL PRO } 
 continent = { } 
 area = { } 
 region = { "British Isles" NorthSeaSea ECAtlanticSea NAtlanticSea TagoSea WCAtlanticSea } 
 war = 60 
 ferocity = no 
 }
}

Now, it tends to conform to certain rules, which make it a bit easier,
there's always a space either side of an equals sign and such forth,
which should hopefully make parsing stuff like - 

date = { year = 1421 month = july day = 7 }

a bit less complex, considering that it uses space to separate list items. 

What I need to do, is to turn this into a data structure, and I think
this relates to XML in a way. Basically, I want to parse the above (I
assume I'll be counting braces to find where I am) so that a country
object called ENG has a dictionary called ai, which points to lists,
integers, strings etc. and so forth. 

If anyone has any links to any (simple) examples of XML parsing or
similar which could give me pointers as to how to go about this, it'd
be much appreciated. 
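
For reference, the brace-counting idea can be sketched in a few lines of plain Python. This is a minimal hand-rolled sketch under stated assumptions, not a full reader for the save-game format: tokenize, parse_value and parse are hypothetical names, every value is kept as a string, a braced group containing key = value pairs becomes a dict, and any other group becomes a list.

```python
import re

# A quoted string, a single brace or equals sign, or a bare word.
TOKEN = re.compile(r'"[^"]*"|[{}=]|[^\s{}="]+')


def tokenize(text):
    """Split the text into quoted strings, braces, equals signs and words."""
    return TOKEN.findall(text)


def parse_value(tokens, pos):
    """Return (value, next_pos) for the word or braced group starting at pos."""
    if tokens[pos] != "{":
        return tokens[pos].strip('"'), pos + 1
    pos += 1
    pairs, items = {}, []
    while tokens[pos] != "}":
        if pos + 1 < len(tokens) and tokens[pos + 1] == "=":
            key = tokens[pos]
            pairs[key], pos = parse_value(tokens, pos + 2)
        else:
            item, pos = parse_value(tokens, pos)
            items.append(item)
    # Assumption: a group with key = value pairs is a dict, otherwise a list.
    return (pairs if pairs else items), pos + 1


def parse(text):
    """Parse a whole document by treating it as one outer braced group."""
    value, _ = parse_value(["{"] + tokenize(text) + ["}"], 0)
    return value


date = parse("date = { year = 1421 month = july day = 7 }")
print(date)
```

The recursion in parse_value does the brace counting implicitly: each "{" starts a nested call and each "}" returns from one.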

Regards, 

Liam Clarke


Re: [Tutor] Parsing problem

2005-07-18 Thread Kent Johnson

Liam Clarke wrote:
> What I need to do, is to turn this into a data structure, and I think 
> this relates to XML in a way. Basically, I want to parse the above (I 
> assume I'll be counting braces to find where I am) so that a country 
> object called ENG has a dictionary called ai, which points to lists, 
> integers, strings etc. and so forth. 
> 
> If anyone has any links to any (simple) examples of XML parsing or 
> similar which could give me pointers as to how to go about this, it'd be 
> much appreciated.

Take a look at pyparsing, I think it is the easiest Python parsing package.
http://pyparsing.sourceforge.net/

Kent


Re: [Tutor] Parsing problem

2005-07-18 Thread Danny Yoo



On Mon, 18 Jul 2005, Liam Clarke wrote:

> country = {
> tag = ENG
> ai = {
> flags = { }
> combat = { DAU FRA ORL PRO }
> continent = { }
> area = { }
> region = { "British Isles" NorthSeaSea ECAtlanticSea NAtlanticSea
> TagoSea WCAtlanticSea }
> war = 60
> ferocity = no
> }
> }

[Long message ahead; skip if you're not interested.]


Kent mentioned PyParsing,

http://pyparsing.sourceforge.net/

which is a really excellent system.  Here's a demo of what it can do, just
so you have a better idea what pyparsing is capable of.

(For the purposes of this demo, I'm doing 'import pyparsing', but in real
usage, I'd probably use 'from pyparsing import ...' just to make things
less verbose.)


Let's say that we want to recognize a simpler subset of the data that you
have there, something like:

{ fee fie foo fum }

And let's imagine that we have a function parse() that can take a string
like:

##
>>> testString = """
... { fee fie foo fum }
... """
##


This imaginary parse() function could turn that into something that looks
like a Python value, like this:

##
>>> parse(testString)
(["fee", "fie", "foo", "fum"])
##

That's our goal; does this make sense so far?  So how do we start?



Instead of going at the big goal of doing:

country = { fee fie foo fum }

let's start small by teaching our system how to recognize the innermost
parts, the small things like "fee" or "foo".  Let's start there:

##
>>> Symbol = pyparsing.Word(pyparsing.alphas)
##

We want a Symbol to be able to recognize a Word made up of alphabetic
letters.  Does this work?

##
>>> Symbol.parseString("fee")
(['fee'], {})
##

Symbol is now a thing that can parse a string, and return a list of
results in a pyparsing.ParseResults object.
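
A small sketch of where this Symbol rule stops matching, assuming pyparsing is installed. The name Token and the choice of pyparsing.alphanums are assumptions made for illustration and are not part of the original message; they only show one way a numeric value such as 60 from the sample data could be accepted.

```python
import pyparsing

Symbol = pyparsing.Word(pyparsing.alphas)

# Without parseAll=True, parsing stops after the first match.
first_only = Symbol.parseString("fee fie").asList()
print(first_only)

# Digits are not in pyparsing.alphas, so a value such as 60 is rejected.
try:
    Symbol.parseString("60")
    digits_ok = True
except pyparsing.ParseException:
    digits_ok = False
print(digits_ok)

# Assumption: a wider character set such as alphanums accepts both kinds.
Token = pyparsing.Word(pyparsing.alphanums)
print(Token.parseString("60").asList())
```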


Ok, if we can recognize Symbols, let's go for the jugular:

{ fee fie foo fum }


Let's call this a Sequence.

##
>>> Sequence = "{" + pyparsing.ZeroOrMore(Symbol) + "}"
##


A Sequence is made up of zero or more Symbols.


Wait, let's change that, for a moment, to "A Sequence is made up of zero
or more Values."  (You'll see why in a moment.  *grin*)



If we turn toward this strange way, then we need a definition for a Value:

##
>>> Value = Symbol
##

and now we can say that a Sequence is a bunch of Values:

##
>>> Sequence = "{" + pyparsing.ZeroOrMore(Value) + "}"
##


Let's try this out:

##
>>> Sequence.parseString('{ fee fie foo fum}')
(['{', 'fee', 'fie', 'foo', 'fum', '}'], {})
##


This is close, but it's not quite right: the problem is that we'd like to
somehow group the results all together in a list, and without the braces.
That is, we actually want to see:

[['fee', 'fie', 'foo', 'fum']]

in some form.  (Remember, we want a list of a single result, and that
result should be our Sequence.)


How do we get this working?  We have to tell pyparsing to Group the
middle elements together in a collection, and to suppress the braces
from the result.

Here we go:

##
>>> Sequence = (pyparsing.Suppress("{") +
...             pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
...             pyparsing.Suppress("}"))
##

Does this work?


##
>>> Sequence.parseString('{ fee fie foo fum}')
([(['fee', 'fie', 'foo', 'fum'], {})], {})
##


That looks a little messy and more nested than expected.


Actually, what's happening is that we're looking at that
pyparsing.ParseResults object, so there's more nesting in the string
representation than what's really there.  We can use the ParseResults's
asList() method to make it a little easier to see what the real result
value looks like:

##
>>> Sequence.parseString('{ fee fie foo fum}').asList()
[['fee', 'fie', 'foo', 'fum']]
##

That's better.
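
Collected into one script, the steps so far might look like the sketch below. It assumes pyparsing is installed and uses the same camelCase method names as the interactive session (parseString, asList); the variable names result and empty are illustrative only.

```python
import pyparsing

# A Symbol is a run of alphabetic characters; for now a Value is just a Symbol.
Symbol = pyparsing.Word(pyparsing.alphas)
Value = Symbol

# Suppress() drops the braces from the output; Group() nests the contents.
Sequence = (pyparsing.Suppress("{") +
            pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
            pyparsing.Suppress("}"))

result = Sequence.parseString("{ fee fie foo fum }").asList()
print(result)

# An empty pair of braces still produces one (empty) group.
empty = Sequence.parseString("{ }").asList()
print(empty)
```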



Out of curiosity, wouldn't it be neat if we could parse out something like
this?

 { fee fie {foo fum} }

*cough* *cough*

What we'd like to do is make Sequence itself a possible value.  The
problem is that then there's a little circularity involved:


### Illegal PyParsing pseudocode  ###
Value = Symbol | Sequence

Sequence = (pyparsing.Suppress("{") +
            pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
            pyparsing.Suppress("}"))
##

The problem is that Value can't be defined before Sequence is, and
vice-versa.  We break this problem by telling PyParsing "ok, the following
rules will come up soon" and forward-defining them:

##
>>> Value = pyparsing.Forward()
>>> Sequence = pyparsing.Forward()
##

and once we have these forward declarations, we can then reconnect them to
their real definitions by using '<<'.  (This looks bizarre, but it applies
just to rules that are Forward()ed.)

##
>>> Value << (Symbol | Sequence)
>>> Sequence << (pyparsing.Suppress("{") +
...              pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
...              pyparsing.Suppress("}"))
##


Let's try it:

##
>>> Value.parseString(' { fee fie {foo fum} } ').asList()
[['fee', 'fie', ['foo', 'fum']]]
##


Cool.
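
As a self-contained sketch, the recursive grammar can be run as a script like this (assuming pyparsing is installed; the variable names nested and deep are illustrative only). The parentheses around the right-hand side of '<<' matter, because '|' binds more loosely than '<<' in Python. Newer pyparsing releases also accept '<<=' for the same purpose.

```python
import pyparsing

Symbol = pyparsing.Word(pyparsing.alphas)

# Forward() creates placeholders so the two rules can refer to each other.
Value = pyparsing.Forward()
Sequence = pyparsing.Forward()

# '<<' fills in the placeholders with their real definitions.
Value << (Symbol | Sequence)
Sequence << (pyparsing.Suppress("{") +
             pyparsing.Group(pyparsing.ZeroOrMore(Value)) +
             pyparsing.Suppress("}"))

nested = Value.parseString(" { fee fie {foo fum} } ").asList()
print(nested)

# Nesting depth is not limited by the grammar itself.
deep = Value.parseString("{ { { a } } }").asList()
print(deep)
```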


Ok, that was a little artificial, but oh well.  The idea is we now know
how to say:

A Value is
