On Mon, Jul 15, 2019 at 8:47 PM Andrew Barnert <abarn...@yahoo.com> wrote:
> On Jul 15, 2019, at 18:44, Nam Nguyen <bits...@gmail.com> wrote:
> >
> > I have implemented a tiny (~200 SLOCs) package at
> > https://gitlab.com/nam-nguyen/parser_compynator that demonstrates
> > something like this is possible. There are several examples for you to
> > get a feel for it, as well as some early benchmark numbers to consider.
> > This is far smaller than any of the Python parsing libraries I have
> > looked at, yet more universal than many of them. I hope that it would
> > convert the skeptics ;).
>
> For at least some of your use cases, I don't think it's a problem that
> it's 70x slower than the custom parsers you'd be replacing. How often do
> you need to parse a million URLs in your inner loop? Also, if the
> function composition is really the performance hurdle, can you optimize
> that away relatively simply, just by building an explicit tree
> (expression-template style) and walking the tree in a __call__ method,
> rather than building an implicit tree of nested calls? (And that could
> be optimized further if needed, e.g. by turning the tree walk into a
> simple virtual machine where all of the fundamental operations are
> inlined into the loop, and maybe even accelerating that with C code.)
>
> But I do think it's a problem that there seems to be no way to usefully
> indicate failure to the caller, and I'm not sure that could be fixed as
> easily.

An empty set signifies that the parse has failed. Perhaps I have
misunderstood what you indicated here.

> Invalid inputs in your readme examples don't fail, they successfully
> return an empty set.

Because the library supports ambiguity, it can return more than one parse
result. The guarantee is that if it returns an empty set, the parse has
failed.

> There also doesn't seem to be any way to trigger a hard fail rather
> than a backtrack.

You can have a parser that raises an exception. None of the primitive
parsers do that, though.

> So I'm not sure how a real urlparse replacement could do the things the
> current one does, like raising a ValueError on https://abc.d[ef.ghi/
> complaining that the netloc looks like an invalid IPv6 address. (Maybe
> you could def a function that raises a ValueError and attach it as a
> where somewhere in the parser tree? But even if that works, wouldn't you
> get a meaningless exception that doesn't have any information about
> where in the source text or where in the parse tree it came from or why
> it was raised, and, as your readme says, a stack trace full of garbage?)

urlparse right now raises ValueError('Invalid IPv6 URL'). It does not
mention where in the source text the error comes from.

> Can you add failure handling without breaking the "~200LOC and easy to
> read" feature of the library, and without breaking the "easy to read
> once you grok parser combinators" feature of the parsers built with it?

This is a good request. I will have to play around with the idea more.
What I think could be the most challenging task is attributing a failure
to the appropriate rule(s) (e.g. expr expects "term + term", but the
input only has "term +"). I feel like some metadata about the grammar
might be required here, and that might be too unwieldy to provide in a
parser combinator formulation. Interestingly enough, regex doesn't have
anything like this either.

Cheers,
Nam
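A minimal, self-contained sketch of the conventions discussed above:
results are sets of (value, next position) pairs, an empty set means the
parse failed, a set with more than one element means the input parsed
ambiguously, and a hard failure has to be requested explicitly via a
raising combinator. The helper names here (lit, alt, then, require,
empty) are made up for illustration and are not parser_compynator's
actual API.

    def lit(c):
        # Match one literal character. Success yields a one-element set
        # of (value, next_position) pairs; failure yields an empty set.
        def parse(text, pos=0):
            return {(c, pos + 1)} if text[pos:pos + 1] == c else set()
        return parse

    def alt(p, q):
        # Alternation is just set union, so ambiguity shows up naturally
        # as a result set with more than one element.
        def parse(text, pos=0):
            return p(text, pos) | q(text, pos)
        return parse

    def then(p, q):
        # Sequence: feed every result of p into q and combine the values.
        def parse(text, pos=0):
            return {(v1 + v2, end2)
                    for v1, end1 in p(text, pos)
                    for v2, end2 in q(text, end1)}
        return parse

    def require(p, message):
        # Turn a would-be backtrack (empty set) into a hard error by
        # raising. None of the primitive parsers behave like this; the
        # grammar author has to opt in.
        def parse(text, pos=0):
            results = p(text, pos)
            if not results:
                raise ValueError('%s at position %d' % (message, pos))
            return results
        return parse

    def empty(text, pos=0):
        # Match nothing and always succeed.
        return {('', pos)}

    lit('a')('abc')              # {('a', 1)}           one parse
    lit('a')('xbc')              # set()                failure
    alt(lit('a'), empty)('abc')  # {('a', 1), ('', 0)}  two parses

    bracketed = then(lit('['), require(lit(']'), 'unclosed bracket'))
    bracketed('[]')              # {('[]', 2)}
    bracketed('[x')              # ValueError: unclosed bracket at position 1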
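For reference, the urlparse behaviour mentioned above is easy to
reproduce, and the error message indeed carries no position information:

    >>> from urllib.parse import urlparse
    >>> urlparse('https://abc.d[ef.ghi/')
    Traceback (most recent call last):
      ...
    ValueError: Invalid IPv6 URL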