[Python-ideas] Correct way for writing Python code without causing interpreter crashes due to parser stack overflow

Fiedler Roman Wed, 27 Jun 2018 00:05:36 -0700

Hello List,

Context: we are conducting machine learning experiments that generate a kind
of nested decision tree. As the tree includes specific decision elements
(which require custom code to evaluate), we decided to store the decision tree
(the result of the analysis) as generated Python code. The decision tree can
then be transferred to sensor nodes (detectors), which filter data according
to the decision tree by executing the given code.


Tracking down a crash when executing that generated code, we arrived at the
following simplified reproducer, which causes the interpreter to crash (on
Python 2 and 3) while loading the code, before execution starts:

#!/usr/bin/python2 -BEsStt
A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A([A(None)])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])])

The error message is:

s_push: parser stack overflow
MemoryError
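
For anyone who wants to find out where their own interpreter gives up, here is
a minimal sketch that builds expressions of the same shape as the reproducer
and tries to compile them. The helper names (build_nested, compiles,
first_failing_depth) are made up for this illustration, and which exception is
raised, and at which depth, may differ between Python versions. On some
versions very deep nesting may take the interpreter down instead of raising,
so it seems safer to keep the probed limit modest.

```python
def build_nested(depth):
    """Return source shaped like the reproducer: `depth` A([...]) wrappers
    around an innermost A(None)."""
    return "A([" * depth + "A(None)" + "])" * depth


def compiles(source):
    """Try to compile `source` as an expression without executing it.

    Returns (True, None) on success, or (False, exception class name).
    RuntimeError is listed because RecursionError is a subclass of it on
    Python 3.5+ and does not exist under that name on older versions.
    """
    try:
        compile(source, "<generated>", "eval")
    except (MemoryError, SyntaxError, RuntimeError) as exc:
        return False, type(exc).__name__
    return True, None


def first_failing_depth(limit):
    """Return (depth, error name) for the smallest depth in 1..limit that
    fails to compile, or None if every probed depth compiles."""
    for depth in range(1, limit + 1):
        ok, error = compiles(build_nested(depth))
        if not ok:
            return depth, error
    return None
```

Calling first_failing_depth(150) on a given interpreter would report the first
rejected depth for this particular expression shape only; other constructs may
hit a limit earlier or later.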

Despite the machine having 16GB of RAM, the code cannot be loaded. The current
workaround is to manually split the expression into two lines using an
intermediate variable, which gets it running again.
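
The intermediate-variable workaround could also be applied by the generator
itself, so that no emitted statement nests deeper than one call. A minimal
sketch, assuming the tree is available as nested Python lists (None for a
leaf, a list of children otherwise); that representation, the function name
emit_flat and the variable names n0, n1, ... are assumptions for illustration
and not our actual generator:

```python
def emit_flat(tree, ctor="A"):
    """Emit one assignment per node instead of one deeply nested expression.

    `tree` is None for a leaf or a list of child trees (assumed format).
    Returns (lines, root_name); each line nests at most one call deep.
    The traversal is iterative so that the generator itself does not run
    into Python's recursion limit on deep trees.
    """
    lines = []
    names = []                 # variable names of finished subtrees
    work = [(tree, False)]     # (node, children already scheduled?)
    while work:
        node, expanded = work.pop()
        if node is None:
            # Leaf: corresponds to A(None) in the reproducer.
            name = "n%d" % len(lines)
            lines.append("%s = %s(None)" % (name, ctor))
            names.append(name)
        elif not expanded:
            # Revisit this node once all of its children are emitted.
            work.append((node, True))
            for child in reversed(node):
                work.append((child, False))
        else:
            # The last len(node) names belong to this node's children.
            start = len(names) - len(node)
            children = names[start:]
            del names[start:]
            name = "n%d" % len(lines)
            lines.append("%s = %s([%s])" % (name, ctor, ", ".join(children)))
            names.append(name)
    return lines, names[0]
```

The price is one statement and one temporary name per node, and the result has
to be picked up from the variable named by root_name rather than from a single
expression.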

As discussed on the Python security list, crashes when loading such decision
trees, or mathematical formulas (see bug report [1]), should not be treated as
a security problem. Even though this case is not directly covered in the
Python security model documentation [2], it comes too close to "arbitrary code
execution", where Python does not attempt to provide any protection. Only a few
border cases of software might be affected, e.g. Python sandbox systems like
Zope/Plone, or maybe even Python-based smart contract blockchains like
Ethereum (I do not know if/where they use the default Python interpreter or
work derived from it). But in both cases they would also come too close to
violating the security model, so no changes to Python are required from this
side. The Python security list therefore suggested continuing the discussion
on this list.


Even with no security problem involved, the crash is still quite an annoyance.
Developing code generators can be a tedious task. It is then somewhat
frustrating when your generated code is not accepted by the interpreter, even
though you do not feel you are getting anywhere near system-relevant limits,
e.g. 50 elements in a line like the one above on a 16GB machine. You may adapt
the generator, but as the error does not say which limit you actually violated
(number of brackets, function calls, list definitions?), you can only
experiment or read the Python compiler code to figure that out. Even once you
fix it, you have no guarantee that you will not hit some other obscure limit
the next day, or that those limits will not change from one Python minor
version to the next, causing regressions.

Questions:

* Do you deem it possible/sensible to even attempt to write a Python code
generator that produces non-malicious, syntactically valid decision tree
code/mathematical formulas and still has a sufficiently high probability that
the Python interpreter will run that code, both now and in the near future
(regressions)?

* Assuming yes to the question above: when generating code, what is the
maximal nesting depth that a code generator can always expect to compile on
Python 2.7 and on 3.5 and later? Are there any other similar restrictions that
the code generator needs to consider? Or is generating source code not the
preferred solution anyway, and the code generator should e.g. produce binary
Python code directly? Note: in the end the exact same logic will run in the
Python process either way; it seems to be only a matter of how it is loaded
into the interpreter.

* If not possible/recommended/sensible, we might generate Java bytecode or
native x86 code instead, where the likelihood that the (virtual) CPU really
executes code that complies with the language specification (even with CPU
errata like the FDIV bug et al.) might be orders of magnitude higher than with
the Python interpreter.
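
Whatever the actual limit turns out to be, a generator could at least measure
how deeply its own output nests brackets and refuse to emit anything beyond a
self-chosen budget. A minimal sketch, assuming Python 3 and the standard
tokenize module; the function name max_bracket_depth is illustrative, and any
budget compared against its result would be our own choice, not a documented
interpreter limit:

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def max_bracket_depth(source):
    """Return the deepest bracket nesting found in `source`.

    Tokenizing (rather than counting characters) keeps brackets inside
    string literals and comments from being counted.
    """
    depth = 0
    deepest = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.OP:
            continue
        if tok.string in OPENERS:
            depth += 1
            deepest = max(deepest, depth)
        elif tok.string in CLOSERS:
            depth -= 1
    return deepest
```

Note that this only counts brackets; as said above, it is not clear from the
error message whether brackets are the quantity the parser actually limits.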

Any feedback appreciated!

Roman

[1] https://bugs.python.org/issue3971
[2] http://python-security.readthedocs.io/security.html#security-model
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
