Re: Question: PLY and Data Parsing

David Beazley Mon, 23 Mar 2009 12:22:12 -0700

In principle, PLY might be applicable to parsing certain kinds of data
files (although it was never created with that use case in mind).
However, it would only really work if the input format can be
precisely described by a context-free grammar.  Sadly, a
lot of input formats don't fit neatly into that model.  For more
regular kinds of input (e.g., XML), the performance of PLY is probably
not going to be up to par with a dedicated parser.  I've never
benchmarked it, but I'd be willing to guess that the Python
ElementTree module (for parsing XML) would run more than an order of
magnitude faster than anything I could cook up in PLY.


For what it's worth, you should also take a serious look at using
Python generator functions for parsing data.  They are particularly
effective at breaking up input streams into records, defining
different parsing stages, and creating various kinds of processing
pipelines.  Based on the sample data you included in your message,
I'd probably be inclined to try to cook up some generator solution
myself.
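
As a rough illustration of that idea (a minimal sketch, not code from
the original message), the stages could look something like the
following.  The function names, the trimmed-down SAMPLE text, and the
assumption that every section is closed by a matching END_<name> line
are all hypothetical choices made for this example.

```python
# Trimmed version of the sample file from the quoted message (assumed layout).
SAMPLE = """\
!    Title Block
!    Name      = User

HEADER
   OUTPUTS
   out1    I    D    GRD    SMU1
   GLOB_VARS
   var1    'foo'
END_HEADER

DATA
   VAR    sweep1    0
   VAR    sweep2    0

   #sweep1    out1
   0          0.123
   0.25       0.456
END_DATA
"""

def strip_lines(lines):
    """Stage 1: drop blank lines and '!' title-block lines."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("!"):
            yield line

def sections(lines):
    """Stage 2: group lines into (name, body) records.

    Assumes each section starts with NAME and ends with END_NAME.
    """
    name, body = None, []
    for line in lines:
        if name is None:
            name, body = line, []
        elif line == "END_" + name:
            yield name, body
            name = None
        else:
            body.append(line)

def data_rows(body):
    """Stage 3: yield one dict per data row, keyed by the '#' column header."""
    columns = None
    for line in body:
        if line.startswith("#"):
            columns = line[1:].split()
        elif columns is not None:
            yield dict(zip(columns, map(float, line.split())))

def parse(lines):
    """Chain the stages; only the DATA section is decoded in this sketch."""
    for name, body in sections(strip_lines(lines)):
        if name == "DATA":
            yield from data_rows(body)

rows = list(parse(SAMPLE.splitlines()))
```

Each stage consumes the previous one lazily, so a change in the file
layout (say, a new section type) should only touch the stage that
knows about it.  Passing an open file object to parse() instead of
SAMPLE.splitlines() would work the same way.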

Cheers,
Dave


On Mar 23, 2009, at 1:55 PM, Jester_EE wrote:

>
> 'ply-hack' Members,
>
> I am looking for a little insight for potentially using PLY for
> parsing raw data files.  I am posting here because most of the
> documentation I have found concerning PLY, lex, yacc, etc. has dealt
> with language type lexing and parsing (e.g. C Language).  While I can
> extrapolate from what I have read that a data parser would be possible
> within the PLY construct (using quite a long token list and a number
> of lexer states), my question is whether I would gain anything from doing
> so.  The complexity seemingly goes up, but does the prospect of code
> reuse go up as well?  Would I be able to generalize class structures
> of parsers to inherit attributes?
>
> I have made many 'single-use' ~200 line parsers in Perl in the past
> but have always found them wanting.  Little nuances in the data file
> structure tended to force me to completely re-architect my script, in
> typical Perl fashion.  I'm eagerly looking to get away from that
> problem, thus the switch to Python and the search for a more general
> and easy to maintain parsing system.
>
> For an extremely small example, here is a block of something that
> would need to be processed:
> <data>
> !    Title Block
> !    Name      = User
> !    Date      = MM/DD/YYYY
>
> HEADER
>    INPUTS
>    sweep1    V    D    GRD    SMU1    0.1    LIN    1    0    1    5    0.25
>    sweep2    V    G    GRD    SMU2    0.001    LIN    2    0    5    6    1.0
>    OUTPUTS
>    out1    I    D    GRD    SMU1
>    GLOB_VARS
>    var1    'foo'
>    var2    'bar'
>    var3    'baz'
> END_HEADER
>
> DATA
>    VAR    sweep1    0
>    VAR    sweep2    0
>
>    #sweep1    out1
>    0          0.123
>    0.25       0.456
>    0.50       0.789
>    0.75       1.123
>    1.00       1.456
> END_DATA
> </data>
>
> Any thoughts are helpful!  Thank you for your time!
> - Jester_EE

