In principle, PLY might be applicable to parsing certain kinds of data files (although it was never designed with that use case in mind). However, it would only really work if the input format can be precisely described by a context-free grammar. Sadly, a lot of input formats don't neatly fit into that model. For more regular kinds of input (e.g., XML), PLY's performance is probably not going to be on par with a dedicated parser. I've never benchmarked it, but I'd be willing to guess that Python's ElementTree module (for parsing XML) would run more than an order of magnitude faster than anything I could cook up in PLY.
For what it's worth, you should also take a serious look at using Python generator functions for parsing data. They are particularly effective at breaking up input streams into records, defining different parsing stages, and creating various kinds of processing pipelines. Based on the sample data you included in your messages, I'd probably be inclined to try and cook up some generator solution myself.

Cheers,
Dave

On Mar 23, 2009, at 1:55 PM, Jester_EE wrote:

> 'ply-hack' Members,
>
> I am looking for a little insight for potentially using PLY for
> parsing raw data files. I am posting here because most of the
> documentation I have found concerning PLY, lex, yacc, etc. has dealt
> with language type lexing and parsing (e.g. C Language). While I can
> extrapolate from what I have read that a data parser would be possible
> within the PLY construct (using quite a long token list and a number
> of lexer states), my question is if I would gain anything from doing
> it? The complexity seemingly goes up, but does the prospect of code
> reuse go up as well? Would I be able to generalize class structures
> of parsers to inherit attributes?
>
> I have made many 'single-use' ~200 line parsers in Perl in the past
> but have always found them wanting. Little nuances to the data file
> structure tended to have me completely re-architect my script; in
> typical Perl fashion. I'm eagerly looking to get away from that
> problem, thus the switch to Python and the search for a more general
> and easy to maintain parsing system.
>
> For an extremely small example, here is a block of something that
> would need to be processed:
> <data>
> ! Title Block
> ! Name = User
> ! Date = MM/DD/YYYY
>
> HEADER
> INPUTS
> sweep1 V D GRD SMU1 0.1 LIN 1 0 1
> 5 0.25
> sweep2 V G GRD SMU2 0.001 LIN 2 0 5
> 6 1.0
> OUTPUTS
> out1 I D GRD SMU1
> GLOB_VARS
> var1 'foo'
> var2 'bar'
> var3 'baz'
> END_HEADER
>
> DATA
> VAR sweep1 0
> VAR sweep2 0
>
> #sweep1 out1
> 0 0.123
> 0.25 0.456
> 0.50 0.789
> 0.75 1.123
> 1.00 1.456
> END_DATA
> </data>
>
> Any thoughts are helpful! Thank you for your time!
> - Jester_EE
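As a rough illustration of the generator-pipeline idea from the reply, here is a minimal Python sketch that runs the sample data from the quoted message through three stages. First it cleans lines, then it groups HEADER/DATA blocks into records, and finally it parses each block. It uses plain Python rather than PLY and is not anyone's actual solution. The function names (`clean_lines`, `blocks`, `parse_header`, `parse_data`, `parse`) are hypothetical. Treating a line that starts with a digit as a continuation of the previous INPUTS entry (as `5 0.25` appears to continue `sweep1 ...`) is a guess about the format.

```python
from __future__ import annotations
from typing import Iterable, Iterator

# The sample file from the quoted message, used here as input.
SAMPLE = """\
! Title Block
! Name = User
! Date = MM/DD/YYYY

HEADER
INPUTS
sweep1 V D GRD SMU1 0.1 LIN 1 0 1
5 0.25
sweep2 V G GRD SMU2 0.001 LIN 2 0 5
6 1.0
OUTPUTS
out1 I D GRD SMU1
GLOB_VARS
var1 'foo'
var2 'bar'
var3 'baz'
END_HEADER

DATA
VAR sweep1 0
VAR sweep2 0

#sweep1 out1
0 0.123
0.25 0.456
0.50 0.789
0.75 1.123
1.00 1.456
END_DATA
"""


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    # Stage 1: strip whitespace, drop blank lines and "!" title-block comments.
    for line in lines:
        line = line.strip()
        if line and not line.startswith("!"):
            yield line


def blocks(lines: Iterable[str]) -> Iterator[tuple[str, list[str]]]:
    # Stage 2: group lines into records such as HEADER ... END_HEADER.
    name, body = None, []
    for line in lines:
        if name is None:
            name, body = line, []  # opening keyword, e.g. "HEADER"
        elif line == "END_" + name:
            yield name, body
            name = None
        else:
            body.append(line)


def parse_header(body: list[str]) -> dict[str, list[list[str]]]:
    # Stage 3a: split the header into INPUTS / OUTPUTS / GLOB_VARS entries.
    # Assumption: a line starting with a digit continues the previous entry.
    sections: dict[str, list[list[str]]] = {}
    current: list[list[str]] = []
    for line in body:
        tokens = line.split()
        if line in ("INPUTS", "OUTPUTS", "GLOB_VARS"):
            current = sections.setdefault(line, [])
        elif line[0].isdigit() and current:
            current[-1].extend(tokens)  # continuation line
        else:
            current.append(tokens)
    return sections


def parse_data(body: list[str]) -> dict:
    # Stage 3b: VAR settings, "#" column names, then numeric rows.
    out: dict = {"vars": {}, "columns": [], "rows": []}
    for line in body:
        tokens = line.split()
        if tokens[0] == "VAR":
            out["vars"][tokens[1]] = float(tokens[2])
        elif line.startswith("#"):
            out["columns"] = line[1:].split()
        else:
            out["rows"].append(tuple(float(t) for t in tokens))
    return out


def parse(text: str) -> dict:
    # Chain the generator stages; each block goes to its own parser.
    # An unknown block name raises KeyError.
    handlers = {"HEADER": parse_header, "DATA": parse_data}
    return {name: handlers[name](body)
            for name, body in blocks(clean_lines(text.splitlines()))}


result = parse(SAMPLE)
print(result["DATA"]["columns"], result["DATA"]["rows"][:2])
```

Running it prints `['sweep1', 'out1'] [(0.0, 0.123), (0.25, 0.456)]`. Because each stage is a separate generator or function, a new variant of the file format can often be handled by swapping one stage (for example, a different comment marker in `clean_lines` or an extra entry in `handlers`) rather than rewriting the whole script. That touches on the code-reuse question in the original post.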
