On 7/10/19 10:50 AM, Johannes Bauer wrote:
Hi list,

I'm looking for ideas as to a pretty, Pythonic solution for a specific
problem that I am solving over and over but where I'm never happy about
the solution in the end. It always works, but never is pretty. So see
this as an open-ended brainstorming question.

Here's the task: There's a custom file format. Each line can be parsed
individually and, given the current context, the meaning of each
individual line is always clearly distinguishable. I'll give an easy
example to demonstrate:


moo = koo
bar = foo
foo :=
    abc
    def
baz = abc

Let's say the root context knows only two regexes and give them names:

keyvalue: \w+ = \w+
start-multiblock: \w+ :=

The keyvalue is self-contained: when the line is successfully
parsed, all the information is present. The start-multiblock, however,
gives us only part of the puzzle, namely the name of the following
block. In the multiblock context, there are different regexes that can
happen (actually only one):

multiblock-item: \s+\w+

Now obviously when the block is finished, there's no delimiter. Its end
is implied by the multiblock-item regex not matching, and therefore we
backtrack to the previous parser (root parser) and can successfully
parse the last line baz = abc.

Especially consider that even though this is a simple example, generally
you'll have multiple contexts, many more regexes and especially nesting
inside these contexts.

Without using a parser generator (those are usually too much overhead
for the cases I deal with), what I usually end up doing is building a
state machine by hand. I.e., I keep track of the current context, match
its regexes and, upon no match, manually delegate the input line to the
backtracked matchers.

This results in AWFULLY ugly code. I'm wondering what your ideas are to
solve this neatly in a Pythonic fashion without having to rely on
third-party dependencies.

Cheers,
Joe


That's pretty much what I do. I generally make the parser a class and each state a method. Every line the parser takes out of the file it passes to self.statefn, which processes the line in the current context and updates self.statefn to a different method if necessary.
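
A minimal sketch of that approach, applied to the example format from the question. Only self.statefn comes from the description above; the class name, the state_root/state_multiblock methods, the feed() and parse() helpers and the dict-shaped result are assumptions made for illustration, not code from either poster.

```python
import re

# Regexes for the example format in the question. The item pattern uses
# \s+ so that the four-space indentation in the sample lines matches.
KEYVALUE = re.compile(r"(\w+) = (\w+)")
START_MULTIBLOCK = re.compile(r"(\w+) :=")
MULTIBLOCK_ITEM = re.compile(r"\s+(\w+)")


class Parser:
    """Hypothetical line parser: one method per state."""

    def __init__(self):
        self.result = {}
        self.block = None                  # list filled while inside a multiblock
        self.statefn = self.state_root     # method handling the current context

    def feed(self, line):
        line = line.rstrip()
        if line:                           # skip blank lines
            self.statefn(line)

    def state_root(self, line):
        m = KEYVALUE.fullmatch(line)
        if m:
            self.result[m.group(1)] = m.group(2)
            return
        m = START_MULTIBLOCK.fullmatch(line)
        if m:
            self.block = self.result[m.group(1)] = []
            self.statefn = self.state_multiblock
            return
        raise ValueError(f"cannot parse line: {line!r}")

    def state_multiblock(self, line):
        m = MULTIBLOCK_ITEM.fullmatch(line)
        if m:
            self.block.append(m.group(1))
            return
        # No match: the block ended implicitly. Switch back to the root
        # state and hand it the same line.
        self.statefn = self.state_root
        self.statefn(line)


def parse(text):
    parser = Parser()
    for line in text.splitlines():
        parser.feed(line)
    return parser.result
```

For nested contexts, one option would be to keep a stack of state methods instead of a single self.statefn, popping one level whenever the current state fails to match and retrying the line in the enclosing state.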

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.
--
https://mail.python.org/mailman/listinfo/python-list
