[Python-ideas] Re: Universal parsing library in the stdlib to alleviate security issues

Nam Nguyen Sat, 20 Jul 2019 08:26:37 -0700

On Fri, Jul 19, 2019 at 8:01 PM Guido van Rossum <gu...@python.org> wrote:


> But regardless of the author's availability, do you think it would serve
> your purpose?
>
Yes, it would serve the goal here, though I would love to see a much
reduced feature set, and support for context-sensitive grammars.

> How does it compare to your own library?
>
First of all pyparsing has been around for 15+ years. It is often *the*
parsing library Python programmers turn to. My library has enjoyed life for
about 2 weeks.

Secondly, on the feature set that is deemed useful here, pyparsing has some
failure / exception and source location support. For example:

pyparsing.ParseException: Expected {{["-"]...
Re:('[+-]?\\d+(?:\\.\\d*)?(?:[eE][+-]?\\d+)?')} | Group:({Suppress:("(")
Forward: ... Suppress:(")")})} (at char 1), (line:1, col:2)

Pyparsing also ignores whitespace by default. These features are not
available in my library.
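
For illustration only, here is a minimal stdlib-only sketch of how a char
offset could be turned into the (line, col) pair shown in the exception
message above, together with whitespace skipping before a token. The names
ParseError, line_col, skip_whitespace and expect_digits are hypothetical; this
is neither pyparsing's implementation nor part of parser_compynator.

```python
class ParseError(Exception):
    """Hypothetical error type that records where parsing failed."""

    def __init__(self, message, text, offset):
        self.offset = offset
        self.line, self.col = line_col(text, offset)
        super().__init__(
            f"{message} (at char {offset}), (line:{self.line}, col:{self.col})"
        )


def line_col(text, offset):
    """Map a 0-based char offset to a 1-based (line, col) pair."""
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, so col is offset + 1.
    col = offset - text.rfind("\n", 0, offset)
    return line, col


def skip_whitespace(text, offset):
    """Return the first offset at or after `offset` that is not whitespace."""
    while offset < len(text) and text[offset].isspace():
        offset += 1
    return offset


def expect_digits(text, offset=0):
    """Skip leading whitespace, then read one run of digits.

    Returns (value, next_offset) or raises ParseError with a location.
    """
    start = skip_whitespace(text, offset)
    end = start
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == start:
        raise ParseError("Expected digits", text, start)
    return int(text[start:end]), end
```

With this helper, offset 1 on a single-line input maps to line 1, col 2, which
is the same convention as the "(at char 1), (line:1, col:2)" suffix above.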

Finally, pyparsing has some limitations of its own. Among them are the
inability to handle recursive grammars, ambiguities, and context-sensitive
grammars.
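
To make "context sensitive" concrete, below is a small stdlib-only sketch of
the XML-fragment case from the original proposal, where the end tag must match
the start tag. The function name parse_element and the restricted tag syntax
(no attributes, no self-closing tags) are assumptions made for this example; it
does not use the API of either library.

```python
import re

# Start tags only, without attributes: an assumption to keep the sketch small.
_OPEN = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>")


def parse_element(text, pos=0):
    """Parse <name>...</name> starting at pos.

    Returns ((name, children), next_pos), where children is a list of text
    strings and nested (name, children) tuples. Raises ValueError when an end
    tag does not match its start tag.
    """
    m = _OPEN.match(text, pos)
    if m is None:
        raise ValueError(f"expected a start tag at char {pos}")
    name = m.group(1)
    # The context-sensitive step: what we accept as the end tag depends on
    # the name that was just parsed.
    close = f"</{name}>"
    pos = m.end()
    children = []
    while True:
        if text.startswith(close, pos):
            return (name, children), pos + len(close)
        if text.startswith("</", pos):
            raise ValueError(f"expected {close} at char {pos}")
        if text.startswith("<", pos):
            child, pos = parse_element(text, pos)
            children.append(child)
            continue
        nxt = text.find("<", pos)
        if nxt == -1:
            raise ValueError(f"missing {close}")
        children.append(text[pos:nxt])
        pos = nxt
```

A context-free grammar with a fixed set of rules cannot express "the same name
again" for arbitrary tag names, which is why the end-tag pattern is built at
parse time here.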

> Maybe you could compare the timings?
>

I've just pushed out
https://gitlab.com/nam-nguyen/parser_compynator/commit/40d41e6acc61f721847265b8adc56fe47359fa34
to do that. Pyparsing seems to be about 2.5 times faster than my library for a
simple +-*/ grammar. The commit should give you some feel for how similar
the two are.
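
As a rough sketch of how such a timing could be taken with the stdlib timeit
module: the evaluate function below is a hypothetical hand-written stand-in for
a +-*/ parser, not the code in the commit, and time_parser is an assumed helper
name. Any callable that takes the input string can be passed in its place.

```python
import timeit


def evaluate(text):
    """Evaluate non-negative integers combined with + - * / (usual precedence)."""
    tokens = text.replace(" ", "")  # positions below refer to this stripped text
    pos = 0

    def number():
        nonlocal pos
        start = pos
        while pos < len(tokens) and tokens[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError(f"expected a number at position {start}")
        return int(tokens[start:pos])

    def term():
        nonlocal pos
        value = number()
        while pos < len(tokens) and tokens[pos] in "*/":
            op = tokens[pos]
            pos += 1
            rhs = number()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected input at position {pos}")
    return result


def time_parser(parse, text, number=1000, repeat=5):
    """Return the best observed per-call time, in seconds, of parse(text)."""
    runs = timeit.repeat(lambda: parse(text), number=number, repeat=repeat)
    return min(runs) / number
```

Calling time_parser once per library on the same input and dividing the two
results gives a ratio of the kind quoted above. Taking the minimum over several
repeats reduces noise from other processes.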

Cheers,
Nam

>
> On Fri, Jul 19, 2019 at 6:33 PM Nam Nguyen <bits...@gmail.com> wrote:
>
>> Yes, I have. PyParsing was the first one I turned to, for it has been
>> available for a very long time. I emailed the author, Paul McGuire, a few
>> times about this python-ideas thread too but never got a response.
>>
>> On Fri, Jul 19, 2019 at 9:36 AM Guido van Rossum <gu...@python.org>
>> wrote:
>>
>>> Have you looked into pyparsing (https://github.com/pyparsing/pyparsing)?
>>> It somehow looks relevant.
>>>
>>> On Mon, Jul 15, 2019 at 6:45 PM Nam Nguyen <bits...@gmail.com> wrote:
>>>
>>>> Hello list,
>>>>
>>>> I sent an email to this list two or three months ago about the same
>>>> idea. In that discussion, there was both skepticism and support. Since I
>>>> had some time during the previous long weekend, I have made my idea more
>>>> concrete and I thought I would try with the list again, after having run it
>>>> through some of you privately.
>>>>
>>>> GOAL: To have some parsing primitives in the stdlib so that other
>>>> modules in the stdlib itself can make use of them. This would alleviate
>>>> various security issues we have seen throughout the years.
>>>>
>>>> With that goal in mind, I opine that any parsing library for this
>>>> purpose should have the following characteristics:
>>>>
>>>> #. Can be expressed in code. My opinion is that it is hard to review
>>>> generated code. Code review is even more crucial in security contexts.
>>>>
>>>> #. Small and verifiable. This helps build trust in the code that is
>>>> meant to plug security holes.
>>>>
>>>> #. Less evolving. Being in the stdlib has a drawback: slow development
>>>> velocity. The library should be theoretically sound and stable
>>>> from the beginning.
>>>>
>>>> #. Universal. Most of the time we'll parse left-factored context-free
>>>> grammars, but sometimes we'll also want to parse context-sensitive grammars
>>>> such as short XML fragments in which end tags must match start tags.
>>>>
>>>> I have implemented a tiny (~200 SLOCs) package at
>>>> https://gitlab.com/nam-nguyen/parser_compynator that demonstrates
>>>> something like this is possible. There are several examples to give you
>>>> a feel for it, as well as some early benchmark numbers to consider. This is
>>>> far smaller than any of the Python parsing libraries I have looked at, yet
>>>> more universal than many of them. I hope that it would convert the skeptics
>>>> ;).
>>>>
>>>> Finally, my request to the list is: Please debate 1) whether we
>>>> want a small (even private, underscore-prefixed) parsing library in the
>>>> stdlib to help with tasks that are a little too complex for regexes, and 2)
>>>> if yes, what it should look like.
>>>>
>>>> I also welcome comments (naming, uses of operator overloading,
>>>> features, bikeshedding, etc.) on the above package ;).
>>>>
>>>> Thanks!
>>>> Nam
>>>> _______________________________________________
>>>> Python-ideas mailing list -- python-ideas@python.org
>>>> To unsubscribe send an email to python-ideas-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-ideas@python.org/message/2WFZPWUSW3CKGGP7P623GIHG5AK3NVCC/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>
>>>
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)
>>> *Pronouns: he/him/his **(why is my pronoun here?)*
>>> <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
>>>
>>
>

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UTKC34UDBFN7XU6RVWRRQKBGPOUBSMVT/
Code of Conduct: http://python.org/psf/codeofconduct/
