Hello list,

I sent an email to this list two or three months ago about the same idea.
In that discussion, there was both skepticism and support. Since I had
some time during the last long weekend, I have made my idea more
concrete, and after running it past some of you privately, I thought I
would try the list again.

GOAL: To have some parsing primitives in the stdlib that other modules
in the stdlib itself can make use of. This would help alleviate various
security issues we have seen over the years.

With that goal in mind, I believe that any parsing library for this purpose
should have the following characteristics:

#. Expressed in code. In my opinion, generated code is hard to review, and
code review is even more crucial in security contexts.

#. Small and verifiable. This helps build trust in the code that is meant
to plug security holes.

#. Little need to evolve. Being in the stdlib has a drawback: slow
development velocity. The library should therefore be theoretically sound
and stable from the beginning.

#. Universal. Most of the time we'll parse left-factored context-free
grammars, but sometimes we'll also want to parse context-sensitive grammars,
such as short XML fragments in which end tags must match start tags.
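To make the first and last points concrete, here is a minimal sketch of a grammar expressed directly in Python. It is written from scratch for this message and is not the API of the package linked below; `literal`, `name`, `text_content`, `pure`, `seq`, `bind` and `element` are all hypothetical names. A parser here is just a function from (text, position) to (value, new position), or None on failure. The `bind` combinator is what provides the context sensitivity: the parser for the closing tag is built from the tag name returned by the opening tag.

```python
from typing import Any, Callable, Optional, Tuple

# A parser is a function (text, pos) -> (value, new_pos), or None on failure.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def literal(s: str) -> Parser:
    """Match exactly the string s."""
    def parse(text: str, pos: int) -> Result:
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def name() -> Parser:
    """Match one or more ASCII letters (a simplified tag name)."""
    def parse(text: str, pos: int) -> Result:
        end = pos
        while end < len(text) and text[end].isascii() and text[end].isalpha():
            end += 1
        return (text[pos:end], end) if end > pos else None
    return parse


def text_content() -> Parser:
    """Match zero or more characters up to the next '<' (or end of input)."""
    def parse(text: str, pos: int) -> Result:
        end = text.find("<", pos)
        end = len(text) if end == -1 else end
        return text[pos:end], end
    return parse


def pure(value: Any) -> Parser:
    """Succeed without consuming input, returning value."""
    return lambda text, pos: (value, pos)


def seq(*parsers: Parser) -> Parser:
    """Run parsers in order; succeed with the list of their values."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def bind(p: Parser, make_next: Callable[[Any], Parser]) -> Parser:
    """Run p, then run the parser that make_next builds from p's value."""
    def parse(text: str, pos: int) -> Result:
        result = p(text, pos)
        if result is None:
            return None
        value, pos = result
        return make_next(value)(text, pos)
    return parse


def element() -> Parser:
    """Parse <tag>text</tag>, requiring the end tag to repeat the start tag."""
    start_tag = seq(literal("<"), name(), literal(">"))

    def after_start(parts: Any) -> Parser:
        tag = parts[1]  # parts == ["<", tag, ">"]
        end_tag = seq(literal("</"), literal(tag), literal(">"))
        return bind(text_content(),
                    lambda body: bind(end_tag, lambda _: pure((tag, body))))

    return bind(start_tag, after_start)
```

With this, `element()("<b>bold</b>", 0)` returns `(("b", "bold"), 11)`, while `element()("<a>x</b>", 0)` returns `None` because `</b>` does not match `<a>`. Nested elements and attributes are deliberately left out to keep the sketch short.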

I have implemented a tiny (~200 SLOC) package at
https://gitlab.com/nam-nguyen/parser_compynator that demonstrates something
like this is possible. It includes several examples to give you a feel for
it, as well as some early benchmark numbers to consider. It is far smaller
than any of the Python parsing libraries I have looked at, yet more
universal than many of them. I hope it will convert the skeptics ;).

Finally, my request to the list is to debate: 1) whether we want a
small (even private, underscore-prefixed) parsing library in the stdlib to
help with tasks that are a little too complex for regexes, and 2) if so,
what it should look like.
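Balanced parentheses are a classic example of a task that is a little too complex for regexes: Python's `re` module has no recursive patterns, so it cannot match arbitrarily deep nesting. A recursive-descent parser handles it in a few lines. This is a hand-written sketch, not code from the package; the grammar and the names `group` and `balanced` are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical grammar:  group := "(" group* ")"
# The parsed value is the maximum nesting depth of the group.


def group(text: str, pos: int) -> Optional[Tuple[int, int]]:
    """Parse one balanced group starting at pos; return (depth, new_pos) or None."""
    if not text.startswith("(", pos):
        return None
    pos += 1
    depth = 0
    # group*: keep parsing nested groups for as long as one matches.
    while (inner := group(text, pos)) is not None:
        inner_depth, pos = inner
        depth = max(depth, inner_depth)
    if not text.startswith(")", pos):
        return None
    return depth + 1, pos + 1


def balanced(text: str) -> Optional[int]:
    """Return the nesting depth if text is exactly one balanced group, else None."""
    result = group(text, 0)
    if result is None or result[1] != len(text):
        return None
    return result[0]
```

For example, `balanced("(()(()))")` returns 3 and `balanced("(()")` returns None. Because each nesting level uses one Python stack frame, very deep input hits the interpreter's recursion limit (1000 frames by default) and raises `RecursionError`, which is worth keeping in mind for security-sensitive inputs.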

I also welcome comments on the above package (naming, use of operator
overloading, features, bikeshedding, etc.) ;).
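On the operator-overloading question, here is one possible style, sketched under the assumption that parsers are wrapped in a small class: `+` for sequencing, `|` for ordered choice and `>>` for transforming a result. The class `P` and the helpers `char`, `one_of` and `apply_op` are hypothetical and do not describe the package's actual operators.

```python
from typing import Any, Callable, Optional, Tuple

# Result of a parse: (value, new_position) on success, None on failure.
Result = Optional[Tuple[Any, int]]


class P:
    """Hypothetical parser wrapper used to illustrate operator overloading."""

    def __init__(self, fn: Callable[[str, int], Result]) -> None:
        self.fn = fn

    def __call__(self, text: str, pos: int = 0) -> Result:
        return self.fn(text, pos)

    def __add__(self, other: "P") -> "P":
        # p + q: run p then q; succeed with the pair of their values.
        def parse(text: str, pos: int) -> Result:
            first = self(text, pos)
            if first is None:
                return None
            second = other(text, first[1])
            if second is None:
                return None
            return (first[0], second[0]), second[1]
        return P(parse)

    def __or__(self, other: "P") -> "P":
        # p | q: ordered choice; try p first and fall back to q.
        def parse(text: str, pos: int) -> Result:
            result = self(text, pos)
            return result if result is not None else other(text, pos)
        return P(parse)

    def __rshift__(self, f: Callable[[Any], Any]) -> "P":
        # p >> f: apply f to the value produced by p.
        def parse(text: str, pos: int) -> Result:
            result = self(text, pos)
            return None if result is None else (f(result[0]), result[1])
        return P(parse)


def char(c: str) -> P:
    """Match the literal string c."""
    return P(lambda text, pos: (c, pos + len(c)) if text.startswith(c, pos) else None)


def one_of(chars: str) -> P:
    """Match a single character that appears in chars."""
    return P(lambda text, pos: (text[pos], pos + 1)
             if pos < len(text) and text[pos] in chars else None)


def apply_op(value: Any) -> int:
    # value has the shape ((left, operator), right) because + groups to the left.
    (left, op), right = value
    return left + right if op == "+" else left - right


digit = one_of("0123456789") >> int
expr = (digit + (char("+") | char("-")) + digit) >> apply_op
```

Here `expr("3+4")` returns `(7, 3)` and `expr("9*5")` returns `None`. One design constraint worth noting is that Python's operator precedence is fixed: `+` binds tighter than `>>`, which binds tighter than `|`, so `a + b >> f | c` groups as `((a + b) >> f) | c` whether or not that is what the grammar author intended.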

Thanks!
Nam
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2WFZPWUSW3CKGGP7P623GIHG5AK3NVCC/