[pypy-commit] pypy sandbox-lib: expand the interface, still only theoretical

2017-01-01 Thread arigo
Author: Armin Rigo 
Branch: sandbox-lib
Changeset: r89283:be4412e6ecf2
Date: 2016-12-29 19:02 +0100
http://bitbucket.org/pypy/pypy/changeset/be4412e6ecf2/

Log:expand the interface, still only theoretical

diff --git a/rpython/translator/rsandbox/src/part.h b/rpython/translator/rsandbox/src/part.h
--- a/rpython/translator/rsandbox/src/part.h
+++ b/rpython/translator/rsandbox/src/part.h
@@ -1,40 +1,46 @@
-/*** rpython/translator/rsandbox/src/part.h ***/
-
 #ifndef _RSANDBOX_H_
 #define _RSANDBOX_H_
 
 #ifndef RPY_SANDBOX_EXPORTED
-/* Common definitions when including this file from an external C project */
-
-#include 
-#include 
-
-#define RPY_SANDBOX_EXPORTED  extern
-
-typedef long Signed;
-typedef unsigned long Unsigned;
-
+#  define RPY_SANDBOX_EXPORTED  extern
 #endif
 
 
 /* ***
 
-   WARNING: Python is not meant to be a safe language.  For example,
-   think about making a custom code object with a random byte string and
-   trying to interpret that.  A sandboxed PyPy contains extra safety
-   checks that can detect such invalid operations before they cause
-   problems.  When such a case is detected, THE WHOLE PROCESS IS
+   A direct interface for safely embedding Python inside a larger
+   application written in C (or any other language which can access C
+   libraries).
+
+   For now, there is little support for more complex cases.  Notably,
+   any call to functions like open() or any attempt to do 'import' of
+   any non-builtin module will fail.  This interface is not meant to
+   "drop in" a large amount of existing Python code.  If you are looking
+   for this and are not concerned about security, look at CFFI
+   embedding: http://cffi.readthedocs.org/en/latest/embedding.html .
+   Instead, this interface is meant to run small amounts of untrusted
+   Python code from third-party sources.  (It is possible to rebuild a
+   module system on top of this interface, by writing a custom
+   __import__ hook in Python.  Similarly, you cannot return arbitrary
+   Python objects to C code, but you can make a Python-side data
+   structure like a list or a dict, and pass integer indices to C.)
+
+   WARNING: Python is originally not meant to be a safe language.  For
+   example, think about making a custom code object with a random byte
+   string and trying to interpret that.  A sandboxed PyPy contains extra
+   safety checks that can detect such invalid operations before they
+   cause problems.  When such a case is detected, THE WHOLE PROCESS IS
    ABORTED right now.  In the future, there should be a setjmp/longjmp
    alternative to this, but the details need a bit of care (e.g. it
    would still create memory leaks).
 
-   For now, you have to accept that the process can be aborted if
-   given malicious code.  Also, running several Python sources from
-   different sources in the same process is not recommended---there is
-   only one global state: malicious code can easily mangle the state
-   of the Python interpreter, influencing subsequent runs.  Unless you
-   are fine with both issues, you MUST run Python from subprocesses,
-   not from your main program.
+   For now, you have to accept that the process can be aborted if given
+   malicious code.  Also, running several Python codes from different
+   untrusted sources in the same process is not recommended---there is
+   only one global state: malicious code can easily mangle the state of
+   the PyPy interpreter, influencing subsequent runs.  Unless you are
+   fine with both issues, you MUST run Python from subprocesses, not
+   from your main program.
 
    Multi-threading issues: DO NOT USE FROM SEVERAL THREADS AT THE SAME
    TIME!  You need a lock.  If you use subprocesses, they will likely
@@ -150,6 +156,14 @@
 */
 RPY_SANDBOX_EXPORTED void rsandbox_result_bytes(char *buf, size_t bufsize);
 
+/* If the called function returns a tuple of values, then the above
+   'result' functions work on individual items in the tuple, initially
+   the 0th one.  This function changes the current item to
+   'current_item' if that is within bounds.  Returns the total length of
+   the tuple, or -1 if not a tuple.
+*/
+RPY_SANDBOX_EXPORTED int rsandbox_result_tuple_item(int current_item);
+
 /* When an exception occurred in rsandbox_open() or rsandbox_call(),
    return more information as a 'char *' string.  Same rules as
    rsandbox_result_bytes().  (Careful, you MUST NOT assume that the
@@ -163,14 +177,38 @@
 RPY_SANDBOX_EXPORTED void rsandbox_last_exception(char *buf, size_t bufsize,
   int traceback_limit);
 
+/* Installs a callback inside the module 'mod' under the name 'fnname'.
+   The Python code then sees a function 'fnname()' which invokes back
+   the C function given as the 'callback' parameter.  The 'callback' is
+   called with 'data' as sole argument (use NULL if you don't need
+   this).
+
+   When the Python 'fnname()' is

[pypy-commit] pypy sandbox-lib: string => bytes

2017-01-01 Thread arigo
Author: Armin Rigo 
Branch: sandbox-lib
Changeset: r89282:3d02cf9459c7
Date: 2016-12-28 17:59 +0100
http://bitbucket.org/pypy/pypy/changeset/3d02cf9459c7/

Log:string => bytes

diff --git a/rpython/translator/rsandbox/src/part.h b/rpython/translator/rsandbox/src/part.h
--- a/rpython/translator/rsandbox/src/part.h
+++ b/rpython/translator/rsandbox/src/part.h
@@ -20,7 +20,7 @@
 /* ***
 
    WARNING: Python is not meant to be a safe language.  For example,
-   think about making a custom code object with a random string and
+   think about making a custom code object with a random byte string and
    trying to interpret that.  A sandboxed PyPy contains extra safety
    checks that can detect such invalid operations before they cause
    problems.  When such a case is detected, THE WHOLE PROCESS IS
@@ -72,7 +72,7 @@
 
    rsandbox_module_t *compile_expression(const char *expression)
    {
-   rsandbox_push_string(expression);   // 'expression' is untrusted
+   rsandbox_push_bytes(expression);   // 'expression' is untrusted
    return rsandbox_open(
    "code = compile(args[0], '', 'eval')\n"
    "def evaluate(n):\n"
@@ -102,8 +102,8 @@
 */
 RPY_SANDBOX_EXPORTED void rsandbox_push_long(long);
 RPY_SANDBOX_EXPORTED void rsandbox_push_double(double);
-RPY_SANDBOX_EXPORTED void rsandbox_push_string(const char *);
-RPY_SANDBOX_EXPORTED void rsandbox_push_string_and_size(const char *, size_t);
+RPY_SANDBOX_EXPORTED void rsandbox_push_bytes(const char *);
+RPY_SANDBOX_EXPORTED void rsandbox_push_bytes_and_size(const char *, size_t);
 RPY_SANDBOX_EXPORTED void rsandbox_push_none(void);
 RPY_SANDBOX_EXPORTED void rsandbox_push_rw_buffer(char *, size_t);
 
@@ -122,24 +122,25 @@
    malicious code returning results like inf, nan, or 1e-323.) */
 RPY_SANDBOX_EXPORTED double rsandbox_result_double(void);
 
-/* Returns the length of the string returned in the previous
-   rsandbox_call().  If it was not a string, returns 0. */
-RPY_SANDBOX_EXPORTED size_t rsandbox_result_string_length(void);
+/* Returns the length of the byte string returned in the previous
+   rsandbox_call().  If it was not a byte string, returns 0. */
+RPY_SANDBOX_EXPORTED size_t rsandbox_result_bytes_length(void);
 
-/* Returns the data in the string.  This function always writes an
-   additional '\0'.  If the string is longer than 'bufsize-1', it is
+/* Returns the data in the byte string.  This function always writes an
+   additional '\0'.  If the byte string is longer than 'bufsize-1', it is
    truncated to 'bufsize-1' characters.
 
    For small human-readable strings you can call
-   rsandbox_result_string() with some fixed maximum size.  You get a
+   rsandbox_result_bytes() with some fixed maximum size.  You get a
    regular null-terminated 'char *' string.  (If it contains embedded
    '\0', it will appear truncated; if the Python function did not
-   return a string at all, it will be completely empty; but anyway
+   return a byte string at all, it will be completely empty; but anyway
    you MUST be ready to handle any malformed string at all.)
 
    For strings of larger sizes or strings that can meaningfully
-   contain embedded '\0', you should allocate a 'buf' of size
-   'rsandbox_result_string_length() + 1'.
+   contain embedded '\0', you should compute 'bufsize =
+   rsandbox_result_bytes_length() + 1' and allocate a buffer of this
+   length.
 
    To repeat: Be careful when reading strings from Python!  They can
    contain any character, so be sure to escape them correctly (or
@@ -147,17 +148,20 @@
    further.  Malicious code can return any string.  Your code must be
    ready for anything.  Err on the side of caution.
 */
-RPY_SANDBOX_EXPORTED void rsandbox_result_string(char *buf, size_t bufsize);
+RPY_SANDBOX_EXPORTED void rsandbox_result_bytes(char *buf, size_t bufsize);
 
 /* When an exception occurred in rsandbox_open() or rsandbox_call(),
-   return more information as a string.  Same rules as
-   rsandbox_result_string().  (Careful, you MUST NOT assume that the
+   return more information as a 'char *' string.  Same rules as
+   rsandbox_result_bytes().  (Careful, you MUST NOT assume that the
    string is well-formed: malicious code can make it contain anything.
    If you are copying it to a web page, for example, then a good idea
    is to replace any character not in a whitelist with '?'.)
+
+   If 'traceback_limit' is greater than zero, the output is a multiline
+   traceback like in standard Python, with up to 'traceback_limit' levels.
 */
 RPY_SANDBOX_EXPORTED void rsandbox_last_exception(char *buf, size_t bufsize,
-  int include_traceback);
+  int traceback_limit);
 
 
 //
___
pypy-commit mailing list
[email protected]
https
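
The comment on rsandbox_last_exception() above suggests replacing any
character outside a whitelist with '?' before copying an untrusted string
to, for example, a web page.  A minimal C sketch of that step follows.
Note that sanitize_untrusted() and the whitelist passed to it are
hypothetical names chosen for illustration; they are not part of part.h,
and the sketch only handles null-terminated strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT part of part.h): overwrite every byte of the
   null-terminated string 'buf' that does not appear in 'whitelist' with
   '?'.  Returns the number of bytes that were replaced. */
static size_t sanitize_untrusted(char *buf, const char *whitelist)
{
    size_t replaced = 0;
    char *p;

    for (p = buf; *p != '\0'; p++) {
        /* *p is never '\0' here, so strchr() cannot match the
           terminator of 'whitelist' by accident */
        if (strchr(whitelist, *p) == NULL) {
            *p = '?';
            replaced++;
        }
    }
    return replaced;
}
```

A caller would presumably fill a fixed-size buffer with
rsandbox_last_exception() first and then pass that buffer through such a
helper before displaying it.  Working in place keeps the sketch free of
allocations, at the cost of destroying the original bytes.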

[pypy-commit] pypy default: document branch

2017-01-01 Thread arigo
Author: Armin Rigo 
Branch: 
Changeset: r89286:999ff3b3f9a4
Date: 2017-01-01 11:31 +0100
http://bitbucket.org/pypy/pypy/changeset/999ff3b3f9a4/

Log:document branch

diff --git a/pypy/doc/whatsnew-head.rst b/pypy/doc/whatsnew-head.rst
--- a/pypy/doc/whatsnew-head.rst
+++ b/pypy/doc/whatsnew-head.rst
@@ -76,3 +76,8 @@
 PyMemoryViewObject with a PyBuffer attached so that the call to 
 ``PyMemoryView_GET_BUFFER`` does not leak a PyBuffer-sized piece of memory.
 Properly call ``bf_releasebuffer`` when not ``NULL``.
+
+.. branch: boehm-rawrefcount
+
+Support translations of cpyext with the Boehm GC (for special cases like
+revdb).


[pypy-commit] pypy default: hg merge boehm-rawrefcount

2017-01-01 Thread arigo
Author: Armin Rigo 
Branch: 
Changeset: r89284:a3aedbe6023d
Date: 2017-01-01 11:27 +0100
http://bitbucket.org/pypy/pypy/changeset/a3aedbe6023d/

Log:hg merge boehm-rawrefcount

A branch to add minimal support for rawrefcount in Boehm
translations. This is needed by revdb.

diff --git a/pypy/module/cpyext/state.py b/pypy/module/cpyext/state.py
--- a/pypy/module/cpyext/state.py
+++ b/pypy/module/cpyext/state.py
@@ -1,7 +1,7 @@
 from rpython.rlib.objectmodel import we_are_translated
 from rpython.rtyper.lltypesystem import rffi, lltype
 from pypy.interpreter.error import OperationError, oefmt
-from pypy.interpreter.executioncontext import AsyncAction
+from pypy.interpreter import executioncontext
 from rpython.rtyper.lltypesystem import lltype
 from rpython.rtyper.annlowlevel import llhelper
 from rpython.rlib.rdynload import DLLHANDLE
@@ -14,8 +14,9 @@
         self.reset()
         self.programname = lltype.nullptr(rffi.CCHARP.TO)
         self.version = lltype.nullptr(rffi.CCHARP.TO)
-        pyobj_dealloc_action = PyObjDeallocAction(space)
-        self.dealloc_trigger = lambda: pyobj_dealloc_action.fire()
+        if space.config.translation.gc != "boehm":
+            pyobj_dealloc_action = PyObjDeallocAction(space)
+            self.dealloc_trigger = lambda: pyobj_dealloc_action.fire()
 
     def reset(self):
         from pypy.module.cpyext.modsupport import PyMethodDef
@@ -67,6 +68,11 @@
             state.api_lib = str(api.build_bridge(self.space))
         else:
             api.setup_library(self.space)
+        #
+        if self.space.config.translation.gc == "boehm":
+            action = BoehmPyObjDeallocAction(self.space)
+            self.space.actionflag.register_periodic_action(action,
+                use_bytecode_counter=True)
 
     def install_dll(self, eci):
         """NOT_RPYTHON
@@ -84,8 +90,10 @@
         from pypy.module.cpyext.api import init_static_data_translated
 
         if we_are_translated():
-            rawrefcount.init(llhelper(rawrefcount.RAWREFCOUNT_DEALLOC_TRIGGER,
-                                      self.dealloc_trigger))
+            if space.config.translation.gc != "boehm":
+                rawrefcount.init(
+                    llhelper(rawrefcount.RAWREFCOUNT_DEALLOC_TRIGGER,
+                    self.dealloc_trigger))
             init_static_data_translated(space)
 
         setup_new_method_def(space)
@@ -143,15 +151,23 @@
         self.extensions[path] = w_copy
 
 
-class PyObjDeallocAction(AsyncAction):
+def _rawrefcount_perform(space):
+    from pypy.module.cpyext.pyobject import PyObject, decref
+    while True:
+        py_obj = rawrefcount.next_dead(PyObject)
+        if not py_obj:
+            break
+        decref(space, py_obj)
+
+class PyObjDeallocAction(executioncontext.AsyncAction):
     """An action that invokes _Py_Dealloc() on the dying PyObjects.
     """
+    def perform(self, executioncontext, frame):
+        _rawrefcount_perform(self.space)
 
+class BoehmPyObjDeallocAction(executioncontext.PeriodicAsyncAction):
+    # This variant is used with Boehm, which doesn't have the explicit
+    # callback.  Instead we must periodically check ourselves.
     def perform(self, executioncontext, frame):
-        from pypy.module.cpyext.pyobject import PyObject, decref
-
-        while True:
-            py_obj = rawrefcount.next_dead(PyObject)
-            if not py_obj:
-                break
-            decref(self.space, py_obj)
+        if we_are_translated():
+            _rawrefcount_perform(self.space)
diff --git a/rpython/rlib/rawrefcount.py b/rpython/rlib/rawrefcount.py
--- a/rpython/rlib/rawrefcount.py
+++ b/rpython/rlib/rawrefcount.py
@@ -4,10 +4,11 @@
 #  This is meant for pypy's cpyext module, but is a generally
 #  useful interface over our GC.  XXX "pypy" should be removed here
 #
-import sys, weakref
-from rpython.rtyper.lltypesystem import lltype, llmemory
+import sys, weakref, py
+from rpython.rtyper.lltypesystem import lltype, llmemory, rffi
 from rpython.rlib.objectmodel import we_are_translated, specialize, not_rpython
 from rpython.rtyper.extregistry import ExtRegistryEntry
+from rpython.translator.tool.cbuild import ExternalCompilationInfo
 from rpython.rlib import rgc
 
 
@@ -245,6 +246,11 @@
         v_p, v_ob = hop.inputargs(*hop.args_r)
         hop.exception_cannot_occur()
         hop.genop(name, [_unspec_p(hop, v_p), _unspec_ob(hop, v_ob)])
+        #
+        if hop.rtyper.annotator.translator.config.translation.gc == "boehm":
+            c_func = hop.inputconst(lltype.typeOf(func_boehm_eci),
+                                    func_boehm_eci)
+            hop.genop('direct_call', [c_func])
 
 
 class Entry(ExtRegistryEntry):
@@ -297,3 +303,10 @@
         v_ob = hop.genop('gc_rawrefcount_next_dead', [],
                          resulttype = llmemory.Address)
         return _spec_ob(hop, v_ob)
+
+src_dir = py.path.local(__file__).dirpath() / 'src'
+boehm_eci = ExternalCo

[pypy-commit] pypy default: Allow --gc=boehm with the cpyext module.

2017-01-01 Thread arigo
Author: Armin Rigo 
Branch: 
Changeset: r89285:257848776fca
Date: 2017-01-01 11:30 +0100
http://bitbucket.org/pypy/pypy/changeset/257848776fca/

Log:Allow --gc=boehm with the cpyext module.

diff --git a/pypy/goal/targetpypystandalone.py b/pypy/goal/targetpypystandalone.py
--- a/pypy/goal/targetpypystandalone.py
+++ b/pypy/goal/targetpypystandalone.py
@@ -305,9 +305,9 @@
         config.objspace.lonepycfiles = False
 
         if config.objspace.usemodules.cpyext:
-            if config.translation.gc != 'incminimark':
+            if config.translation.gc not in ('incminimark', 'boehm'):
                 raise Exception("The 'cpyext' module requires the 'incminimark'"
-                    " GC.  You need either 'targetpypystandalone.py"
+                    " 'boehm' GC.  You need either 'targetpypystandalone.py"
                     " --withoutmod-cpyext' or '--gc=incminimark'")
 
         config.translating = True