Re: pyparsing question: single word values with a double quoted string every once in a while

2009-05-27 Thread Piet van Oostrum
 hubritic colinland...@gmail.com (h) wrote:

h I want to parse a log that has entries like this:
h [2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail
h cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0
h rcpts=1
h routes=DL_UK_ALL,NOT_DL_UK_ALL,default_inbound,firewallsafe,mail01_mail02,spfsafe
h size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd hdr_mid=
h qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236
h elapsed=0.280


h the keywords will not always be the same. Also differing log levels
h will provide a different mix of keywords.

h This is good enough to get the majority of cases where there is a
h keyword, a = and then a value with no spaces:

h Group(Word(alphas + "+_-.").setResultsName("keyword") + Suppress
h (Literal("=")) + Optional(Word(printables)))

h Sometimes there is a subject, which is a quoted string. That is easy
h enough to get with this:
h dblQuotedString(ZeroOrMore(Word(printables) ) )

h My problem is combining them into one expression. Either I wind up
h with just the subject or I wind up with the keywords and their
h values, one of which is:

h subject, 'I'

h which is clearly not what I want.

h Do I scan each line twice, first looking for quotes ?


Use MatchFirst (the | operator).

I have also split it up to make it more readable:

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress(Literal("="))
value = dblQuotedString | Optional(Word(printables))

pattern = Group(kw + eq + value)
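
A minimal, self-contained sketch of the suggestion above, assuming the
subject value is wrapped in double quotes in the raw log. The sample line
is shortened from the original post, and the OneOrMore wrapper and the
pairs list are illustrative additions, not part of the reply:

```python
from pyparsing import (Group, Literal, OneOrMore, Optional, Suppress, Word,
                       alphas, dblQuotedString, printables)

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress(Literal("="))
# MatchFirst: try the quoted string first, fall back to a single word.
value = dblQuotedString | Optional(Word(printables))
pattern = Group(kw + eq + value)

# Shortened sample line (assumed to carry a double-quoted subject).
line = 'qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236'

result = OneOrMore(pattern).parseString(line)
pairs = [list(group) for group in result]
for keyword, val in pairs:
    print(keyword, "->", val)
```

dblQuotedString keeps the surrounding quote characters in the result; if
they are not wanted, pyparsing's removeQuotes parse action can be attached
to the quoted-string expression.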

-- 
Piet van Oostrum p...@cs.uu.nl
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org
-- 
http://mail.python.org/mailman/listinfo/python-list


pyparsing question: single word values with a double quoted string every once in a while

2009-05-19 Thread hubritic
I want to parse a log that has entries like this:

[2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail
cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0
rcpts=1
routes=DL_UK_ALL,NOT_DL_UK_ALL,default_inbound,firewallsafe,mail01_mail02,spfsafe
size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd hdr_mid=
qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236
elapsed=0.280


the keywords will not always be the same. Also differing log levels
will provide a different mix of keywords.

This is good enough to get the majority of cases where there is a
keyword, a = and then a value with no spaces:

Group(Word(alphas + "+_-.").setResultsName("keyword") + Suppress
(Literal("=")) + Optional(Word(printables)))

Sometimes there is a subject, which is a quoted string. That is easy
enough to get with this:
dblQuotedString(ZeroOrMore(Word(printables) ) )

My problem is combining them into one expression. Either I wind up
with just the subject or I wind up with the keywords and their
values, one of which is:

subject, 'I'

which is clearly not what I want.

Do I scan each line twice, first looking for quotes ?

Thanks




Pyparsing Question

2008-05-16 Thread Ant

Hi all,

I have a question on PyParsing. I am trying to create a parser for a 
hierarchical todo list format, but have hit a stumbling block. I have 
parsers for the header of the list (title and description), and the body 
(recursive descent on todo items).


Individually they are working fine, combined they throw an exception. 
The code follows:


#!/usr/bin/python
# parser.py
import pyparsing as pp

def grammar():
    underline = pp.Word("=").suppress()
    dotnum = pp.Combine(pp.Word(pp.nums) + ".")
    textline = pp.Combine(pp.Group(pp.Word(pp.alphas, pp.printables) +
                                   pp.restOfLine))

    number = pp.Group(pp.OneOrMore(dotnum))

    headtitle = textline
    headdescription = pp.ZeroOrMore(textline)
    head = pp.Group(headtitle + underline + headdescription)

    taskname = pp.OneOrMore(dotnum) + textline
    task = pp.Forward()
    subtask = pp.Group(dotnum + task)
    task << (taskname + pp.ZeroOrMore(subtask))
    maintask = pp.Group(pp.LineStart() + task)

    parser = pp.OneOrMore(maintask)

    return head, parser

text = """

My Title
========

Text on a longer line of several words.
More test
and more.

"""

text2 = """

1. Task 1
    1.1. Subtask
        1.1.1. More tasks.
    1.2. Another subtask
2. Task 2
    2.1. Subtask again
"""

head, parser = grammar()

print head.parseString(text)
print parser.parseString(text2)

comb = head + pp.OneOrMore(pp.LineStart() + pp.restOfLine) + parser
print comb.parseString(text + text2)

#===

Now the first two print statements output the parse tree as I would 
expect, but the combined parser fails with an exception:


Traceback (most recent call last):
  File "parser.py", line 50, in ?
    print comb.parseString(text + text2)
.
. [Stacktrace snipped]
.
    raise exc
pyparsing.ParseException: Expected start of line (at char 81), (line:9, col:1)


Any help appreciated!

Cheers,

--
Ant.


Re: Pyparsing Question

2008-05-16 Thread Paul McGuire
On May 16, 6:43 am, Ant [EMAIL PROTECTED] wrote:
 Hi all,

 I have a question on PyParsing. I am trying to create a parser for a
 hierarchical todo list format, but have hit a stumbling block. I have
 parsers for the header of the list (title and description), and the body
 (recursive descent on todo items).


LineStart *really* wants to be parsed at the beginning of a line.
Your textline reads up to but not including the LineEnd.  Try making
these changes.

1. Change textline to:

    textline = pp.Combine(
        pp.Group(pp.Word(pp.alphas, pp.printables) + pp.restOfLine)) + \
        pp.LineEnd().suppress()

2. Change comb to:

comb = head + parser

With these changes, my version of your code runs ok.
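
The point about restOfLine can be checked directly. This is only a small
sketch on a made-up two-line string (not the todo list from the post); it
uses scanString, which reports the end offset of each match, to show where
the two versions of the expression stop:

```python
import pyparsing as pp

sample = "first line\nsecond line\n"  # illustrative input, not from the thread

# restOfLine alone stops just before the newline.
bare = pp.Word(pp.alphas) + pp.restOfLine
bare_tokens, bare_start, bare_end = next(bare.scanString(sample))

# Adding a suppressed LineEnd consumes the newline as well.
closed = bare + pp.LineEnd().suppress()
closed_tokens, closed_start, closed_end = next(closed.scanString(sample))

print(repr(sample[bare_end]))  # the character left unparsed by the first form
print(closed_end - bare_end)   # how much further the second form advances
```

With the newline consumed by textline itself, the next expression starts
at the beginning of the following line, which is where LineStart expects
to be.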

-- Paul


Re: Pyparsing Question

2008-05-16 Thread castironpi
On May 16, 6:43 am, Ant [EMAIL PROTECTED] wrote:
 Hi all,

 I have a question on PyParsing. I am trying to create a parser for a
 hierarchical todo list format, but have hit a stumbling block. I have
 parsers for the header of the list (title and description), and the body
 (recursive descent on todo items).

 Individually they are working fine, combined they throw an exception.
 The code follows:

 #!/usr/bin/python
 # parser.py
 import pyparsing as pp

 def grammar():
      underline = pp.Word("=").suppress()
      dotnum = pp.Combine(pp.Word(pp.nums) + ".")
      textline = pp.Combine(pp.Group(pp.Word(pp.alphas, pp.printables) +
 pp.restOfLine))
      number = pp.Group(pp.OneOrMore(dotnum))

      headtitle = textline
      headdescription = pp.ZeroOrMore(textline)
      head = pp.Group(headtitle + underline + headdescription)

      taskname = pp.OneOrMore(dotnum) + textline
      task = pp.Forward()
      subtask = pp.Group(dotnum + task)
      task << (taskname + pp.ZeroOrMore(subtask))
      maintask = pp.Group(pp.LineStart() + task)

      parser = pp.OneOrMore(maintask)

      return head, parser

 text = """

 My Title
 ========

 Text on a longer line of several words.
 More test
 and more.

 """

 text2 = """

 1. Task 1
      1.1. Subtask
          1.1.1. More tasks.
      1.2. Another subtask
 2. Task 2
      2.1. Subtask again
 """
 head, parser = grammar()

 print head.parseString(text)
 print parser.parseString(text2)

 comb = head + pp.OneOrMore(pp.LineStart() + pp.restOfLine) + parser
 print comb.parseString(text + text2)

 #===

 Now the first two print statements output the parse tree as I would
 expect, but the combined parser fails with an exception:

 Traceback (most recent call last):
    File "parser.py", line 50, in ?
      print comb.parseString(text + text2)
 .
 . [Stacktrace snipped]
 .
      raise exc
 pyparsing.ParseException: Expected start of line (at char 81), (line:9,
 col:1)

 Any help appreciated!

 Cheers,

 --
 Ant.

I hold that the + operator should be overloaded for strings to include
newlines.  Python 3.0 print has parentheses around it; wouldn't it
make sense to take them out?


Re: Pyparsing Question

2008-05-16 Thread Ant

Hi Paul,


LineStart *really* wants to be parsed at the beginning of a line.
Your textline reads up to but not including the LineEnd.  Try making
these changes.

1. Change textline to:

    textline = pp.Combine(
        pp.Group(pp.Word(pp.alphas, pp.printables) + pp.restOfLine)) + \
        pp.LineEnd().suppress()


Ah - so restOfLine excludes the actual line ending does it?


2. Change comb to:

comb = head + parser


Yes - I'd got this originally. I added the garbage to try to fix the
problem and forgot to take it back out! Thanks for the advice - it works
fine now, and will provide a base for extending the list format.


Thanks,

Ant...


Re: Pyparsing Question

2008-05-16 Thread castironpi
On May 16, 10:45 am, Ant [EMAIL PROTECTED] wrote:
 Hi Paul,

  LineStart *really* wants to be parsed at the beginning of a line.
  Your textline reads up to but not including the LineEnd.  Try making
  these changes.

  1. Change textline to:

       textline = pp.Combine(
          pp.Group(pp.Word(pp.alphas, pp.printables) + pp.restOfLine)) + \
          pp.LineEnd().suppress()

 Ah - so restOfLine excludes the actual line ending does it?

  2. Change comb to:

      comb = head + parser

 Yes - I'd got this originally. I added the garbage to try to fix the
 problem and forgot to take it back out! Thanks for the advice - it works
   fine now, and will provide a base for extending the list format.

 Thanks,

 Ant...

There is a possibility that spirals can come from doubles, which could
be non-trivially useful, in par. in the Java library.  I won't see a
cent.  Can anyone start a thread to spin letters, and see what the
team looks like?  I want to animate spinners.  It's across
dimensions.  (per something.)  Swipe a cross in a fluid.  I'm draw
crosses.  Animate cubes to draw crosses.  I.e. swipe them.



Re: pyparsing question

2008-01-02 Thread hubritic
On Jan 1, 4:18 pm, John Machin [EMAIL PROTECTED] wrote:
 On Jan 2, 10:32 am, hubritic [EMAIL PROTECTED] wrote:

  The data I have has a fixed number of characters per field, so I could
  split it up that way, but wouldn't that defeat the purpose of using a
  parser?

 The purpose of a parser is to parse. Data in fixed columns does not
 need parsing.

   I am determined to become proficient with pyparsing so I am
  using it even when it could be considered overkill; thus, it has gone
  past mere utility now, this is a matter of principle!

 An extremely misguided principle.  Would you use an AK47 on the
 flies around your barbecue? A better principle is to choose the best
 tool for the job.

Your principle is no doubt the saner one for the real world, but your
example of an AK47 is a bit off. We generally know enough about an AK47
to know that it is not something to kill flies with. Consider, though,
if someone unfamiliar with the concept of guns and mayhem got an AK47
for Xmas and was only told that it was really good for killing things.
He would try it out and would discover that indeed it kills all sorts
of things. So he might try killing flies. Then he would discover the
limitations; those already familiar with guns would wonder why he would
waste his time.


Re: pyparsing question

2008-01-02 Thread Paul McGuire
On Jan 1, 5:32 pm, hubritic [EMAIL PROTECTED] wrote:
 I am trying to parse data that looks like this:

 IDENTIFIER    TIMESTAMP   T  C   RESOURCE_NAME   DESCRIPTION
 2BFA76F6     1208230607   T   S   SYSPROC                    SYSTEM SHUTDOWN BY USER
 A6D1BD62   1215230807     I   H                                            Firmware Event

<snip>

 The data I have has a fixed number of characters per field, so I could
 split it up that way, but wouldn't that defeat the purpose of using a
 parser?  

I think you have this backwards.  I use pyparsing for a lot of text
processing, but if it is not a good fit, or if str.split is all that
is required, there is no real rationale for using anything more
complicated.

 I am determined to become proficient with pyparsing so I am
 using it even when it could be considered overkill; thus, it has gone
 past mere utility now, this is a matter of principle!


Well, I'm glad you are driven to learn pyparsing if it kills you, but
John Machin has a good point.  This data is really so amenable to
something as simple as:

for line in logfile:
    id, timestamp, t, c, resource_and_description = line.split(None, 4)

that it is difficult to recommend pyparsing for this case.  The sample
you posted was space-delimited, but if it is tab-delimited, and there
is a pair of tabs between the "H" and "Firmware Event" on the second
line, then just use split("\t") for your data and be done.
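
As a rough illustration of the split approach, here it is applied to the
two sample lines as they appear in the post (the exact spacing is an
assumption, since the mail wrapped the lines). The records list is just a
holder for the example:

```python
lines = [
    "2BFA76F6     1208230607   T   S   SYSPROC                    SYSTEM SHUTDOWN BY USER",
    "A6D1BD62   1215230807     I   H                                            Firmware Event",
]

records = []
for line in lines:
    # maxsplit=4 leaves everything after the fourth field in one string.
    ident, timestamp, t, c, resource_and_description = line.split(None, 4)
    records.append((ident, timestamp, t, c, resource_and_description))

for record in records:
    print(record)
```

The first four fields come out cleanly; the last string still mixes the
resource name and the description, which is the ambiguity the rest of
this reply deals with.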

Still, pyparsing may be helpful in disambiguating that RESOURCE_NAME
and DESCRIPTION text.  One approach would be to enumerate (if
possible) the different values of RESOURCE_NAME.  Something like this:

ident = Word(alphanums)
timestamp = Word(nums,exact=10)

# I don't know what these are, I'm just getting the values
# from the sample text you posted
t_field = oneOf("T I")
c_field = oneOf("S H")

# I'm just guessing here, you'll need to provide the actual
# values from your log file
resource_name = oneOf("SYSPROC USERPROC IOSUBSYS whatever")

logline = ident("identifier") + timestamp("time") + \
    t_field("T") + c_field("C") + \
    Optional(resource_name, default="")("resource") + \
    Optional(restOfLine, default="")("description")
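
To see how the enumerated-resource grammar behaves on the two sample
lines, here is a self-contained run. The resource names other than
SYSPROC are guesses carried over from the reply, and the spacing in the
sample lines is shortened for readability:

```python
from pyparsing import Optional, Word, alphanums, nums, oneOf, restOfLine

ident = Word(alphanums)
timestamp = Word(nums, exact=10)
t_field = oneOf("T I")
c_field = oneOf("S H")
# Assumed resource names; only SYSPROC appears in the posted sample.
resource_name = oneOf("SYSPROC USERPROC IOSUBSYS")

logline = (ident("identifier") + timestamp("time")
           + t_field("T") + c_field("C")
           + Optional(resource_name, default="")("resource")
           + Optional(restOfLine, default="")("description"))

with_resource = logline.parseString(
    "2BFA76F6     1208230607   T   S   SYSPROC    SYSTEM SHUTDOWN BY USER")
without_resource = logline.parseString(
    "A6D1BD62   1215230807     I   H              Firmware Event")

# restOfLine keeps the leading blanks, so strip the description.
print(with_resource["resource"], "|", with_resource["description"].strip())
print(without_resource["resource"], "|", without_resource["description"].strip())
```

Because "Firmware" is not one of the listed resource names, the Optional
falls back to its empty default and the whole phrase ends up in the
description.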


Another tack to take might be to use a parse action on the resource
name, to verify the column position of the found token by using the
pyparsing method col:

def matchOnlyAtCol(n):
    def verifyCol(strg, locn, toks):
        if col(locn, strg) != n:
            raise ParseException(strg, locn,
                                 "matched token not at column %d" % n)
    return verifyCol

resource_name = Word(alphas).setParseAction(matchOnlyAtCol(35))
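
A small sketch of the column check in action. The helper is defined
locally under a different name (only_at_col, a hypothetical name for this
example), and the column number and input strings are made up so the
effect is easy to see:

```python
from pyparsing import ParseException, Word, alphas, col, nums

def only_at_col(n):
    """Build a parse action that rejects tokens not starting at column n."""
    def verify_col(strg, locn, toks):
        if col(locn, strg) != n:
            raise ParseException(strg, locn,
                                 "matched token not at column %d" % n)
    return verify_col

number = Word(nums)
name_at_col_5 = Word(alphas).setParseAction(only_at_col(5))
grammar = number + name_at_col_5

# "abc" starts at column 5 here (columns are counted from 1).
accepted = list(grammar.parseString("123 abc"))

# Two extra blanks push "abc" to column 7, so the parse action refuses it.
try:
    grammar.parseString("123   abc")
    rejected = False
except ParseException:
    rejected = True

print(accepted, rejected)
```

Wrapping such an expression in Optional gives a field that is matched
only when something really sits in that column.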

This will only work if your data really is columnar - the example text
that you posted isn't.  (Hmm, I like that matchOnlyAtCol method, I
think I'll add that to the next release of pyparsing...)

Here are some similar parsers that might give you some other ideas:
http://pyparsing.wikispaces.com/space/showimage/httpServerLogParser.py
http://mail.python.org/pipermail/python-list/2005-January/thread.html#301450

In the second link, I made a similar remark, that pyparsing may not be
the first tool to try, but the variability of the input file made the
non-pyparsing options pretty hairy-looking with special case code, so
in the end, pyparsing was no more complex to use.

Good luck!
-- Paul


pyparsing question

2008-01-01 Thread hubritic
I am trying to parse data that looks like this:

IDENTIFIER    TIMESTAMP   T  C   RESOURCE_NAME   DESCRIPTION
2BFA76F6     1208230607   T   S   SYSPROC                    SYSTEM SHUTDOWN BY USER
A6D1BD62   1215230807     I   H                                            Firmware Event

My problem is that sometimes there is a RESOURCE_NAME and sometimes
not, so I wind up with "Firmware" as my RESOURCE_NAME and "Event" as
my DESCRIPTION.  The formatting seems to use a set number of spaces.

I have tried making RESOURCE_NAME an Optional(Word(alphanums)) and
DESCRIPTION OneOrMore(Word(alphas)) + LineEnd(). So the question is,
how can I avoid having the first word of DESCRIPTION sucked into
RESOURCE_NAME when that field should be blank?


The data I have has a fixed number of characters per field, so I could
split it up that way, but wouldn't that defeat the purpose of using a
parser?  I am determined to become proficient with pyparsing so I am
using it even when it could be considered overkill; thus, it has gone
past mere utility now, this is a matter of principle!

thanks


Re: pyparsing question

2008-01-01 Thread Neil Cerutti
On Jan 1, 2008 6:32 PM, hubritic [EMAIL PROTECTED] wrote:

 I am trying to parse data that looks like this:

 IDENTIFIER    TIMESTAMP   T  C   RESOURCE_NAME   DESCRIPTION
 2BFA76F6     1208230607   T   S   SYSPROC                    SYSTEM SHUTDOWN BY USER
 A6D1BD62   1215230807     I   H                                            Firmware Event

 My problem is that sometimes there is a RESOURCE_NAME and sometimes
 not, so I wind up with "Firmware" as my RESOURCE_NAME and "Event" as
 my DESCRIPTION.  The formatting seems to use a set number of spaces.



 The data I have has a fixed number of characters per field, so I could
 split it up that way, but wouldn't that defeat the purpose of using a
 parser?  I am determined to become proficient with pyparsing so I am
 using it even when it could be considered overkill; thus, it has gone
 past mere utility now, this is a matter of principle!


If your data is really in fixed-size columns, then pyparsing is the wrong
tool.

There's no standard Python tool for reading and writing fixed-length field
flatfile data files, but it's pretty simple to use named slices to get at
the data.

identifier = slice(0, 8)
timestamp = slice(8, 18)
t = slice(18, 21)
c = slice(21, 24)
resource_name = slice(24, 35)
description = slice(35)

for line in file:
    line = line.rstrip("\n")
    print "id:", line[identifier]
    print "timestamp:", line[timestamp]
    ...etc...
-- 
Neil Cerutti

Re: pyparsing question

2008-01-01 Thread Neil Cerutti
On Jan 1, 2008 6:54 PM, Neil Cerutti [EMAIL PROTECTED] wrote:

 There's no standard Python tool for reading and writing fixed-length field
 flatfile data files, but it's pretty simple to use named slices to get at
 the data.

 identifier = slice(0, 8)
 timestamp = slice(8, 18)
 t = slice(18, 21)
 c = slice(21, 24)
 resource_name = slice(24, 35)
 description = slice(35)


Oops! I made an errant stab at the slice constructor. That last should be
'slice(35, None)'.
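
Putting the named slices together with the corrected last field, a
minimal sketch might look like this. The column widths are the ones
suggested in this thread, not verified against the real file, and the
sample record is built with format() so that it really is fixed-width:

```python
# Field positions as suggested in the thread (assumed, not verified).
FIELDS = {
    "identifier": slice(0, 8),
    "timestamp": slice(8, 18),
    "t": slice(18, 21),
    "c": slice(21, 24),
    "resource_name": slice(24, 35),
    "description": slice(35, None),  # open-ended: runs to the end of the line
}

def split_record(line):
    """Cut one fixed-width line into a dict of stripped field values."""
    line = line.rstrip("\n")
    return dict((name, line[span].strip()) for name, span in FIELDS.items())

# Hypothetical record with an empty RESOURCE_NAME column.
sample = "{:<8}{:<10}{:<3}{:<3}{:<11}{}\n".format(
    "A6D1BD62", "1215230807", "I", "H", "", "Firmware Event")

record = split_record(sample)
print(record)
```

An empty column simply yields an empty string, so the description is
never mistaken for the resource name.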

-- 
Neil Cerutti

Re: pyparsing question

2008-01-01 Thread John Machin
On Jan 2, 10:32 am, hubritic [EMAIL PROTECTED] wrote:

 The data I have has a fixed number of characters per field, so I could
 split it up that way, but wouldn't that defeat the purpose of using a
 parser?

The purpose of a parser is to parse. Data in fixed columns does not
need parsing.

  I am determined to become proficient with pyparsing so I am
 using it even when it could be considered overkill; thus, it has gone
 past mere utility now, this is a matter of principle!

An extremely misguided principle.  Would you use an AK47 on the
flies around your barbecue? A better principle is to choose the best
tool for the job.



Re: Pyparsing Question.

2006-11-23 Thread Ant

 Welcome to pyparsing!  The simplest way to implement a markup processor in
 pyparsing is to define the grammar of the markup, attach a parse action to
 each markup type to convert the original markup to the actual results, and
 then use transformString to run through the input and do the conversion.
 This discussion topic has some examples:
 http://pyparsing.wikispaces.com/message/view/home/31853.

Thanks for the pointers - I had a look through the examples on the
pyparsing website, but none seemed to show a simple example of this
kind of thing. The discussion topic you noted above is exactly the sort
of thing I was after!

Cheers,



Pyparsing Question.

2006-11-22 Thread Ant
I have a home-grown Wiki that I created as an exercise, with its own
wiki markup (actually just a clone of the Trac wiki markup). The wiki
text parser I wrote works nicely, but makes heavy use of regexes, tags
and stacks to parse the text. As such it is a bit of a maintainability
nightmare - adding new wiki constructs can be a bit painful.

So I thought I'd look into the pyparsing module, but can't find a
simple example of processing random text. For example, I want to parse
the following:

Some random text and '''some bold text''' and some more random text

into:

Some random text and <strong>some bold text</strong> and some more
random text

I have the following as a starting point:

from pyparsing import *

def parse(text):
    italics = QuotedString(quoteChar="''")

    parser = Optional(italics)

    parsed_text = parser.parseString(text)


print parse("Test this is '''bold''' but this is not.")

So if you could provide a bit of a starting point, I'd be grateful!

Cheers,



Re: Pyparsing Question.

2006-11-22 Thread Stefan Behnel
Ant wrote:
 So I thought I'd look into the pyparsing module, but can't find a
 simple example of processing random text.

Have you looked at the examples on the pyparsing web page?

Stefan


Re: Pyparsing Question.

2006-11-22 Thread Paul McGuire
Ant [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
I have a home-grown Wiki that I created as an exercise, with its own
 wiki markup (actually just a clone of the Trac wiki markup). The wiki
 text parser I wrote works nicely, but makes heavy use of regexes, tags
 and stacks to parse the text. As such it is a bit of a maintainability
 nightmare - adding new wiki constructs can be a bit painful.

 So I thought I'd look into the pyparsing module, but can't find a
 simple example of processing random text. For example, I want to parse
 the following:

 Some random text and '''some bold text''' and some more random text

 into:

 Some random text and <strong>some bold text</strong> and some more
 random text

 I have the following as a starting point:

 from pyparsing import *

 def parse(text):
     italics = QuotedString(quoteChar="''")

     parser = Optional(italics)

     parsed_text = parser.parseString(text)


 print parse("Test this is '''bold''' but this is not.")

 So if you could provide a bit of a starting point, I'd be grateful!

 Cheers,

Ant,

Welcome to pyparsing!  The simplest way to implement a markup processor in 
pyparsing is to define the grammar of the markup, attach a parse action to 
each markup type to convert the original markup to the actual results, and 
then use transformString to run through the input and do the conversion. 
This discussion topic has some examples: 
http://pyparsing.wikispaces.com/message/view/home/31853.
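
A minimal sketch of that recipe for the bold case only, assuming the
markup is exactly three single quotes on each side. The variable names
are illustrative, and a real wiki grammar would combine several such
expressions:

```python
from pyparsing import QuotedString

# Grammar for one markup type: text wrapped in ''' ... '''
bold = QuotedString("'''")

# Parse action: replace the matched markup with the HTML equivalent.
# QuotedString already strips the quote characters from the token.
bold.setParseAction(lambda toks: "<strong>%s</strong>" % toks[0])

text = "Some random text and '''some bold text''' and some more random text"

# transformString copies unmatched text through and substitutes matches.
html = bold.transformString(text)
print(html)
```

Further markup types can be handled the same way: define each
expression, give it its own parse action, and call transformString on
the alternation of all of them.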

-- Paul

