Peng Yu wrote:
> This is more or less just a list of parsers. I would like some detailed
> guidelines on which one to choose for various parsing problems.
>
> Regards,
> Peng

It depends on the parsing problem. Obviously you're not going to use an INI parser to work with XML, or vice versa. Likewise, some formats can be parsed in different ways; XML parsers, for example, are often built around a SAX or DOM model. The differences between them (hit Wikipedia) can affect the performance of your application more than learning how to use an XML parser's API can affect the hair on your head.

For flat data, a simple Unix-style rc or DOS-style INI file will often suffice, and writing a parser is fairly trivial; in fact, writing a config file parser is an excellent learning exercise for getting a feel for a given language's standard I/O, string handling, and type conversion features. These kinds of parsers tend to be pretty quick because of their simplicity, and writing a small but extremely fast one can be enjoyable at times; one of these days I need to do it in x86 assembly just for the hell of it. Python includes an INI parser in the standard library.

XML serves hierarchical data models well and is often handy, but writing code around the parsers can be a royal pain (IMHO anyway!). Popular parsers for XML include expat and libxml2; there is also a more "Pythonic" wrapper for libxml2/libxslt called lxml (py-lxml in some package repositories). Python also comes with parsers for XML.

Other formats such as JSON, YAML, heck even S-expressions could be used and parsed. Some programs only parse enough to slurp up code and eval it (not always smart, but sometimes useful).

In general, the issues to consider when selecting a parser for a given format are speed, size, and time: how long does it take to process the data set, how much memory (size) does it consume, and how much bloody time will it take to learn the API ;).

The best way to choose a parser is to experiment with several, test (and profile!) them against the project's actual data, then pick the one you like best out of those that are suitable for the task. Profiling can be very important.

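To make the "write one yourself, then measure" advice a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: parse_ini, parse_with_stdlib, and the SAMPLE data are names I made up for this example, and the hand-rolled parser handles nothing beyond sections, "key = value" pairs, and full-line comments. The standard library's configparser is the INI parser mentioned above, and timeit is used for a crude timing comparison; real profiling should use your own data.

```python
import configparser
import timeit

# Hypothetical sample data, invented for this example.
SAMPLE = """\
[server]
host = example.org
port = 8080

[paths]
; full-line comments start with ';' or '#'
root = /srv/data
"""


def parse_ini(text):
    """Minimal hand-written parser: sections, key = value pairs, comments."""
    data = {}
    section = None
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # skip blank lines and full-line comments
        if line.startswith("[") and line.endswith("]"):
            # Start (or reopen) a section; later keys are stored in it.
            section = data.setdefault(line[1:-1].strip(), {})
        elif "=" in line and section is not None:
            key, value = line.split("=", 1)
            section[key.strip()] = value.strip()
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    return data


def parse_with_stdlib(text):
    """Build the same dict-of-dicts shape using configparser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    print(parse_ini(SAMPLE))
    # Crude timing; the numbers depend on the machine and the data.
    for fn in (parse_ini, parse_with_stdlib):
        seconds = timeit.timeit(lambda: fn(SAMPLE), number=2000)
        print(f"{fn.__name__}: {seconds:.4f}s for 2000 runs")
```

Both functions return plain dicts of strings, so type conversion (turning "8080" into an int, say) is still left to the caller. Whichever one comes out faster on your box, the library version copes with far more of the format than this toy does, which is exactly the sort of trade-off worth weighing alongside the timings.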
--
TerryP

--
http://mail.python.org/mailman/listinfo/python-list