[Python-ideas] Fwd: Re: Universal parsing library in the stdlib to alleviate security issues

Nam Nguyen Sun, 28 Jul 2019 19:28:18 -0700

Forwarding to the list because Abusix initially blocked google.com.
Nam

---------- Forwarded message ---------
From: Nam Nguyen <[email protected]>
Date: Sun, Jul 28, 2019 at 10:18 AM
Subject: Re: [Python-ideas] Re: Universal parsing library in the stdlib to
alleviate security issues
To: Sebastian Kreft <[email protected]>
Cc: Paul Moore <[email protected]>, python-ideas <[email protected]>



Let's circle back to the beginning one last time ;).

On Thu, Jul 25, 2019 at 8:15 AM Sebastian Kreft <[email protected]> wrote:

> Nam, I think it'd be better to frame the proposal as a security
> enhancement. Stating some of the common bugs/gotchas found when manually
> implementing parsers, and the impact this has had on python over the years.
> Seeing a full list of security issues (CVEs) by module would give us a
> sense of how widespread the problem is.
>

Since my final exam was done this weekend, I gathered some more info into
this spreadsheet.

https://docs.google.com/spreadsheets/d/1TlWSf8iM7eIzEPXanJAP8Ztyzt4ZD28xFvUKeBuQtdA/

I think a strict parser can help with the majority of those problems. They
are in HTTP headers, emails, cookies, URLs, and even low-level socket code
(inet_aton).
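
A minimal sketch of what "strict" could mean for one of those inputs, an
HTTP header line. This is not code from the stdlib or from this thread:
parse_header_line, TCHAR, and VALUE_CHARS are hypothetical names, and the
value check is a deliberate simplification of RFC 7230 (it rejects obs-text
and obs-fold outright instead of handling them).

```python
import string

# token characters allowed in a field-name (RFC 7230 "tchar")
TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)

# Simplifying assumption: a field-value may contain only visible ASCII,
# space, and horizontal tab. CR, LF, NUL, and non-ASCII are all rejected.
VALUE_CHARS = frozenset(chr(c) for c in range(0x21, 0x7F)) | {" ", "\t"}


def parse_header_line(line):
    """Return (name, value) or raise ValueError; never guess at bad input."""
    name, sep, rest = line.partition(":")
    if not sep or not name or not set(name) <= TCHAR:
        raise ValueError("invalid header field-name: %r" % name)
    value = rest.strip(" \t")  # optional whitespace around the value
    if not set(value) <= VALUE_CHARS:
        raise ValueError("invalid character in header field-value")
    return name, value
```

With this sketch, parse_header_line("Host: example.com") returns the pair
("Host", "example.com"), while a value carrying an embedded CRLF raises
ValueError instead of being passed along.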


> Then survey the stdlib for what kind of grammars are currently being
> parsed, what ad-hoc parsing strategy are implemented and provide examples
> of whether having a general purpose parser would have prevented the
> security issues you have previously cited.
>

Most grammars I have seen here come straight from RFCs, which are in ABNF
and thus context-free. Current implementations are based on regexes or
string splitting. My previous example showed that at least issues 30500,
36216, and 36742 would have been non-issues had we started out with a
strict parser.
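
To illustrate how closely code can follow an ABNF rule, here is a small
sketch built on the IPv4address and dec-octet rules from RFC 3986. The
helper names (char_range, lit, seq, alt, matches) are hypothetical and not
taken from any existing library; the backtracking design is my own
assumption, chosen so that the alternatives can stay in the order the RFC
lists them.

```python
# RFC 3986:
#   IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
#   dec-octet   = DIGIT / %x31-39 DIGIT / "1" 2DIGIT
#               / "2" %x30-34 DIGIT / "25" %x30-35
#
# A parser here is a function (text, pos) -> iterator of end positions.
# Yielding every possible end position gives us backtracking for free.


def char_range(lo, hi):
    def parse(text, pos):
        if pos < len(text) and lo <= text[pos] <= hi:
            yield pos + 1
    return parse


def lit(s):
    def parse(text, pos):
        if text.startswith(s, pos):
            yield pos + len(s)
    return parse


def seq(*parsers):
    def parse(text, pos):
        if not parsers:
            yield pos
            return
        rest = seq(*parsers[1:])
        for mid in parsers[0](text, pos):
            yield from rest(text, mid)
    return parse


def alt(*parsers):
    def parse(text, pos):
        for parser in parsers:
            yield from parser(text, pos)
    return parse


def matches(parser, text):
    """True only if the parser can consume the whole input."""
    return any(end == len(text) for end in parser(text, 0))


DIGIT = char_range("0", "9")
dec_octet = alt(
    DIGIT,
    seq(char_range("1", "9"), DIGIT),
    seq(lit("1"), DIGIT, DIGIT),
    seq(lit("2"), char_range("0", "4"), DIGIT),
    seq(lit("25"), char_range("0", "5")),
)
IPv4address = seq(dec_octet, lit("."), dec_octet, lit("."),
                  dec_octet, lit("."), dec_octet)
```

Under these definitions matches(IPv4address, "127.0.0.1") is True, while
"127.1", "256.1.1.1", "01.2.3.4", and "1.2.3.4 x" are all rejected, because
none of them can be derived from the grammar.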


>
> Right now, it is not clear what the impact of such refactor would be, nor
> the worth of such attempt.
>

That's exactly the kind of response I'm looking for. It is okay to suggest
that the benefits aren't clear or that there are requirements X and Y that a
general parser won't be able to meet, but it's not convincing to brush
this aside because there is "existing, working code." Many of the bugs in
that sheet are still open. It's hard to say with a straight face that the
code is working, as I experienced with my own fix for 30500: I
just couldn't tell if it was doing the right thing.


>
> What others have said earlier is that you are the one that needs to
> provide some of the requirements for the proposed private parsing library.
> And from what I read from your emails you do have some ideas. For example,
> you want it to be easy to write and review (I guess here you would
> eventually like it to be a close translation from whatever is specified in
> the RFC or grammar specification).
>

Yes, that's the most important point because "readability counts." It's
hard to reason about correctness when there are many transformations
between the authoritative spec and the implementation. I definitely don't
want to touch regexes, string splits, and custom logic when I don't
understand why they were written that way in the first place. How do I, for
example, know what this regex is about?

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?


(It's from RFC 3986.)
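
For reference, a short sketch of what that regex does when applied with the
re module. The group numbers follow RFC 3986 Appendix B; split_uri and the
dictionary keys are hypothetical names used only for illustration.

```python
import re

# The regex from RFC 3986, Appendix B, unchanged.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split a URI reference into its five components (no validation)."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

For "http://www.ics.uci.edu/pub/ietf/uri/#Related" (the RFC's own example)
this gives scheme "http", authority "www.ics.uci.edu", path
"/pub/ietf/uri/", no query, and fragment "Related". Note that the regex
only splits; it does not validate. Given "http://exa mple.com\r\nX: y" it
reports the authority as "exa mple.com\r\nX: y" without complaint, which
is the kind of gap a grammar-driven parser would close.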

> But you also need to take into consideration some of the list's concerns,
> the parser library has to be performant, as a performance regression is
> likely not to be tolerable.
>

Absolutely. That's where I need input from the list. I have provided my
own set of requirements for such a parser library. I'm sure most of us have
different needs too. So if a parser library can help you, let's hear what
you want from it. If you think it can't, please let me understand why.

Thanks,
Nam


>
>
> On Thu, Jul 25, 2019 at 10:54 AM Nam Nguyen <[email protected]> wrote:
>
>> On Thu, Jul 25, 2019 at 2:32 AM Paul Moore <[email protected]> wrote:
>>
>>> On Thu, 25 Jul 2019 at 02:16, Nam Nguyen <[email protected]> wrote:
>>> > Back to my original requests to the list: 1) Whether we want to have a
>>> (possibly private) parsing library in the stdlib
>>>
>>> In the abstract, no. Propose a specific library, and that answer would
>>> change to "maybe".
>>>
>>
>> I have no specific library to propose. I'm looking for a list of features
>> such a library should have.
>>
>>
>>>
>>> > and 2) What features it should have.
>>>
>>> That question only makes sense if you get agreement to the abstract
>>> proposal that "we should add a parsing library". And as I said, I don't
>>> agree to that so I can't answer the second question.
>>>
>>
>> As Chris summarized it correctly, I am advocating for a general solution
>> to individual problems (which have the same nature). We can certainly solve
>> the problems when they are reported, or we can take a proactive approach to
>> make them less likely to occur. I am talking about a class of input
>> validation issues here and I thought parsing would be a very natural
>> solution to that. This is quite similar to a context-sensitive templating
>> library that prevents cross-site-scripting on the output side. So I don't
>> know why it is so hard (or what it takes) to convince people that it's a
>> good thing(tm).
>>
>>
>>>
>>> Generally, things go into the stdlib when they have been developed
>>> externally and proved their value. The bar for designing a whole
>>> library from scratch, "specifically" targeted at stdlib inclusion, is
>>> very high, and you're nowhere near reaching it IMO.
>>>
>>
>> This is a misunderstanding. I have not proposed any library, from-scratch
>> or existing, to be used. And on this note, please allow me to make it
>> clear one more time that I am not asking for a public-facing library
>> either.
>>
>>
>>>
>>> > These are good points to set as targets! What does it take for me to
>>> get the list to agree on one such set of criteria?
>>>
>>> You need to start by getting agreement on the premise that adding a
>>> newly-written parser to the stdlib is a good idea. And so far your
>>> *only* argument seems to be that "it will avoid a class of security
>>> bugs" which I find extremely unconvincing (and I get the impression
>>> others do, too).
>>
>>
>> Why? What is unconvincing about a parsing library being able to... parse
>> (and therefore, validate) inputs?
>>
>>
>>> But even if "using a real parser" was useful in that
>>> context, there's *still* no argument for writing one from scratch,
>>> rather than using an existing, proven library.
>>
>>
>> Never a goal.
>>
>>
>>> At the most basic
>>> level, what if there's a bug in your new parsing library? If we're
>>> using it in security-critical code, such a bug would be a
>>> vulnerability just like the ones you're suggesting your parser would
>>> avoid. Are you asking us to believe that your code will be robust
>>> enough to trust over code that's been used in production systems for
>>> years?
>>>
>>> I think you need to stop getting distracted by details, and focus on
>>> your stated initial request "Whether we want to have a (possibly
>>> private) parsing library in the stdlib". You don't seem to me to have
>>> persuaded anyone of this basic suggestion yet,
>>
>>
>> Good observation. How do I convince you that complex input validation
>> tasks should be left to a parser?
>>
>> Thanks!
>> Nam
>>
>> _______________________________________________
>> Python-ideas mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/[email protected]/message/FCPU4ZW43G3G6JZHJTD33MT7SYI3DBQY/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> --
> Sebastian Kreft
>

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/M6ITXY4JVK32L2JJ2UVXHJQR5CJ6OTUT/
Code of Conduct: http://python.org/psf/codeofconduct/
