[Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Trent Nelson

The new memory API discussions (and PEP) warrant a quick pyparallel
update: a couple of weeks after PyCon, I came up with a solution for
the biggest show-stopper that has been plaguing pyparallel since its
inception: being able to detect the modification of "main thread"
Python objects from within a parallel context.

For example, `data.append(4)` in the example below will generate an
AssignmentError exception, because data is a main thread object, and
`data.append(4)` gets executed from within a parallel context::

data = [ 1, 2, 3 ]

def work():
    data.append(4)

async.submit_work(work)

The solution turned out to be deceptively simple:

  1.  Prior to running parallel threads, lock all "main thread"
  memory pages as read-only (via VirtualProtect on Windows,
  mprotect on POSIX).

  2.  Detect attempts to write to main thread pages during parallel
  thread execution (via SEH on Windows or a SIGSEGV trap on POSIX),
  and raise an exception instead (detection is done in the ceval
  frame exec loop).

  3.  Prior to returning control back to the main thread (which will
  be paused whilst all the parallel threads are running), unlock
  all the "main thread" pages.

  4.  Pause all parallel threads while the main thread runs.

  5.  Go back to 1.

I got a proof-of-concept working on Windows a while back (and also
played around with large page support in the same commit).  The main
changes were to obmalloc.c:


https://bitbucket.org/tpn/pyparallel/commits/0e70a0caa1c07dc0c14bb5c99cbe808c1c11779f#chg-Objects/obmalloc.c

The key was the introduction of two new API calls, intended to be
called by the pyparallel.c infrastructure:

_PyMem_LockMainThreadPages()
_PyMem_UnlockMainThreadPages()

The implementation is pretty simple:

+int
+_PyMem_LockMainThreadPages(void)
+{
+    DWORD old = 0;
+
+    if (!VirtualProtect(base_addr, nbytes_committed, PAGE_READONLY, &old)) {
+        PyErr_SetFromWindowsErr(0);
+        return -1;
+    }

Note the `base_addr` and `nbytes_committed` arguments.  Basically, I
re-organized obmalloc.c a little bit such that we never actually
call malloc() directly.  Instead, we exploit the ability to reserve
huge virtual address ranges without actually committing the memory,
giving us a fixed `base_addr` void pointer that we can pass to calls
like VirtualProtect or mprotect.

We then incrementally commit more pages as demand increases, and
simply adjust our `nbytes_committed` counter as we go along.  The
net effect is that we can call VirtualProtect/mprotect once, with a
single base void pointer and size_t range, and immediately affect the
protection of all memory pages that fall within that range.

As an added bonus, we also get a very cheap and elegant way to test
if a pointer (or any arbitrary memory address, actually) belongs to
the main thread's memory range (at least in comparison to the
existing _PyMem_InRange black magic).  (This is very useful for my
pyparallel infrastructure, which makes extensive use of conditional
logic based on address tests.)
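With a single base pointer and committed size, that membership test
collapses to two pointer comparisons.  A hypothetical helper (the real
pyparallel test may differ)::

```c
#include <stddef.h>

/* Does p fall inside the main thread's committed range?  Parameters
 * mirror the base_addr / nbytes_committed names used above. */
static int ptr_in_main_heap(const void *base_addr, size_t nbytes_committed,
                            const void *p) {
    const char *base = (const char *)base_addr;
    const char *cp   = (const char *)p;
    return cp >= base && cp < base + nbytes_committed;
}
```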

(Side-bar: a side-effect of the approach I've used in the proof-
 of-concept (by only having a single base addr pointer) is that
 we effectively limit the maximum memory we could eventually
 commit.

 I actually quite like this -- in fact, I'd like to tweak it
 such that we can actually expose min/max memory values to the
 Python interpreter at startup (analogous to the JVM).

 Having known upper bounds on maximum memory usage will vastly
 simplify some other areas of my pyparallel work (like the async
 socket stuff).

 For example, consider network programs these days that take a
 "max clients" configuration parameter.  That seems a bit
 backwards to me.

 It would be better if we simply said, "here, Python, you have
 1GB to work with".  That allows us to calculate how many
 clients we could simultaneously serve based on socket memory
 requirements, which allows for much more graceful behavior
 under load than leaving it open-ended.

 Maximum memory constraints would also be useful for the
 parallel.map(callable, iterable) stuff I've got in the works,
 as it'll allow us to optimally chunk work and assign to threads
 based on available memory.)
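(To make the "give Python 1GB" idea concrete, the client calculation
 is just a division; all numbers below are made up for illustration::

```c
#include <stddef.h>

/* Given a fixed memory budget and a per-client cost (socket buffers
 * plus per-connection object overhead), the serving capacity is known
 * up front instead of being an open-ended "max clients" knob. */
static size_t max_simultaneous_clients(size_t memory_budget,
                                       size_t per_client_bytes) {
    return memory_budget / per_client_bytes;
}
```

 e.g. a 1GB budget at 64KB per connection yields 16384 clients.)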

So, Victor, I'm interested to hear how the new API you're proposing
will affect this solution I've come up with for pyparallel; I'm
going to be absolutely dependent upon the ability to lock main
thread pages as read-only in one fell swoop -- am I still going to
be able to do that with your new API in place?

Regards,

Trent.

Re: [Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Nick Coghlan
On 19 June 2013 23:10, Trent Nelson  wrote:
> So, Victor, I'm interested to hear how the new API you're proposing
> will affect this solution I've come up with for pyparallel; I'm
> going to be absolutely dependent upon the ability to lock main
> thread pages as read-only in one fell swoop -- am I still going to
> be able to do that with your new API in place?

By default, nothing will change for the ordinary CPython runtime. It's
only if an embedding application starts messing with the allocators
that things might change, but at that point, pyparallel would break
anyway.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Charles-François Natali
2013/6/19 Trent Nelson :
>
> The new memory API discussions (and PEP) warrant a quick pyparallel
> update: a couple of weeks after PyCon, I came up with a solution for
> the biggest show-stopper that has been plaguing pyparallel since its
> inception: being able to detect the modification of "main thread"
> Python objects from within a parallel context.
>
> For example, `data.append(4)` in the example below will generate an
> AssignmentError exception, because data is a main thread object, and
> `data.append(4)` gets executed from within a parallel context::
>
> data = [ 1, 2, 3 ]
>
> def work():
>     data.append(4)
>
> async.submit_work(work)
>
> The solution turned out to be deceptively simple:
>
>   1.  Prior to running parallel threads, lock all "main thread"
>   memory pages as read-only (via VirtualProtect on Windows,
>   mprotect on POSIX).
>
>   2.  Detect attempts to write to main thread pages during parallel
>   thread execution (via SEH on Windows or a SIGSEGV trap on POSIX),
>   and raise an exception instead (detection is done in the ceval
>   frame exec loop).

Quick stupid question: because of refcounts, the pages will be written
to even in case of read-only access. How do you deal with this?

cf


Re: [Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Trent Nelson
Hi Charles-François!

Good to hear from you again.  It was actually your e-mail a few
months ago that acted as the initial catalyst for this memory
protection idea, so, thanks for that :-)

Answer below.

On Wed, Jun 19, 2013 at 07:01:49AM -0700, Charles-François Natali wrote:
> 2013/6/19 Trent Nelson :
> >
> > The new memory API discussions (and PEP) warrant a quick pyparallel
> > update: a couple of weeks after PyCon, I came up with a solution for
> > the biggest show-stopper that has been plaguing pyparallel since its
> > inception: being able to detect the modification of "main thread"
> > Python objects from within a parallel context.
> >
> > For example, `data.append(4)` in the example below will generate an
> > AssignmentError exception, because data is a main thread object, and
> > `data.append(4)` gets executed from within a parallel context::
> >
> > data = [ 1, 2, 3 ]
> >
> > def work():
> >     data.append(4)
> >
> > async.submit_work(work)
> >
> > The solution turned out to be deceptively simple:
> >
> >   1.  Prior to running parallel threads, lock all "main thread"
> >   memory pages as read-only (via VirtualProtect on Windows,
> >   mprotect on POSIX).
> >
> >   2.  Detect attempts to write to main thread pages during parallel
> >   thread execution (via SEH on Windows or a SIGSEGV trap on POSIX),
> >   and raise an exception instead (detection is done in the ceval
> >   frame exec loop).
> 
> Quick stupid question: because of refcounts, the pages will be written
> to even in case of read-only access. How do you deal with this?

Easy: I don't refcount in parallel contexts :-)

There's no need, for two reasons:

 1. All memory allocated in a parallel context is localized to a
private heap.  When the parallel context is finished, the entire
heap can be blown away in one fell swoop.  There's no need for
reference counting or GC because none of the objects will exist
after the parallel context completes.

 2. The main thread won't be running when parallel threads/contexts
are executing, which means main thread objects being accessed in
parallel contexts (read-only access is fine) won't be suddenly
free()'d or GC-collected or whatever.

You get credit for that second point; you asked a similar question a
few months ago that made me realize I absolutely couldn't have the
main thread running at the same time the parallel threads were
running.

Once I accepted that as a design constraint, everything else came
together nicely... "Hmmm, if the main thread isn't running, it won't
need write-access to any of its pages!  If we mark them read-only,
we could catch the traps/SEHs from parallel threads, then raise an
exception, ahh, simple!".

I'm both chuffed at how simple it is (considering it was *the* major
show-stopper), and miffed at how it managed to elude me for so long
;-)

Regards,

Trent.


Re: [Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Victor Stinner
>  1. All memory allocated in a parallel context is localized to a
> private heap.

How do you allocate memory in this "private" heap? Did you add new
functions to allocate memory?

Victor


Re: [Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Trent Nelson
On Wed, Jun 19, 2013 at 08:45:55AM -0700, Victor Stinner wrote:
> >  1. All memory allocated in a parallel context is localized to a
> > private heap.
> 
> How do you allocate memory in this "private" heap? Did you add new
> functions to allocate memory?

Yup:
_PyHeap_Malloc(): 
http://hg.python.org/sandbox/trent/file/0e70a0caa1c0/Python/pyparallel.c#l2365.

All memory operations (PyObject_New/Malloc etc) get intercepted
during parallel thread execution and redirected to _PyHeap_Malloc(),
which is a very simple slab allocator.  (No need for convoluted
buckets because we never free individual objects during parallel
execution; instead, we just blow everything away at the end.)
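A bump-allocator sketch with those same two properties (O(1)
allocation, single-sweep teardown) looks like this; it is not the
actual _PyHeap_Malloc code, and the names are invented::

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    char  *base;
    size_t size;
    size_t used;
} ParallelHeap;

static int px_heap_init(ParallelHeap *h, size_t size) {
    h->base = malloc(size);
    h->size = size;
    h->used = 0;
    return h->base ? 0 : -1;
}

static void *px_heap_malloc(ParallelHeap *h, size_t n) {
    n = (n + 15) & ~(size_t)15;     /* keep allocations 16-byte aligned */
    if (h->used + n > h->size)
        return NULL;                /* no fallback/refill in this sketch */
    void *p = h->base + h->used;
    h->used += n;                   /* just bump; individual frees are no-ops */
    return p;
}

static void px_heap_destroy(ParallelHeap *h) {
    free(h->base);                  /* the whole context, one sweep */
    h->base = NULL;
    h->size = h->used = 0;
}
```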

Trent.


Re: [Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Victor Stinner
"""
So, Victor, I'm interested to hear how the new API you're proposing
will affect this solution I've come up with for pyparallel; I'm
going to be absolutely dependent upon the ability to lock main
thread pages as read-only in one fell swoop -- am I still going to
be able to do that with your new API in place?
"""

2013/6/19 Trent Nelson :
> On Wed, Jun 19, 2013 at 08:45:55AM -0700, Victor Stinner wrote:
>> >  1. All memory allocated in a parallel context is localized to a
>> > private heap.
>>
>> How do you allocate memory in this "private" heap? Did you add new
>> functions to allocate memory?
>
> Yup:
> _PyHeap_Malloc(): 
> http://hg.python.org/sandbox/trent/file/0e70a0caa1c0/Python/pyparallel.c#l2365.
>
> All memory operations (PyObject_New/Malloc etc) get intercepted
> during parallel thread execution and redirected to _PyHeap_Malloc(),
> which is a very simple slab allocator.  (No need for convoluted
> buckets because we never free individual objects during parallel
> execution; instead, we just blow everything away at the end.)

Ok, so I don't think that the PEP 445 would change anything for you.

The following change might have an impact: If _PyHeap_Malloc is not
thread safe, replacing PyMem_Malloc() with PyMem_RawMalloc() when the
GIL is not held would avoid bugs in your code.

If you want to choose dynamically the allocator at runtime, you can
replace PyObject_Malloc allocator using:
-- 8< -
static void *
_PxMem_AllocMalloc(void *ctx, size_t size)
{
    PyMemBlockAllocator *alloc = (PyMemBlockAllocator *)ctx;
    if (Py_PXCTX)
        return _PxMem_Malloc(size);
    else
        return alloc->malloc(alloc->ctx, size);
}

...

PyMemBlockAllocator pyparallel_pyobject;

static void
setup_pyparallel_allocator(void)
{
    PyMemBlockAllocator alloc;
    PyObject_GetAllocator(&pyparallel_pyobject);
    alloc.ctx = &pyparallel_pyobject;
    alloc.malloc = _PxMem_AllocMalloc;
    ...
    PyObject_SetAllocator(&alloc);
}
-- 8< -

But I don't know if you want pyparallel to be an "optional" feature
chosen at runtime...

Victor


Re: [Python-Dev] pyparallel and new memory API discussions...

2013-06-19 Thread Trent Nelson
On Wed, Jun 19, 2013 at 09:20:15AM -0700, Victor Stinner wrote:
> """
> So, Victor, I'm interested to hear how the new API you're proposing
> will affect this solution I've come up with for pyparallel; I'm
> going to be absolutely dependent upon the ability to lock main
> thread pages as read-only in one fell swoop -- am I still going to
> be able to do that with your new API in place?
> """
> 
> 2013/6/19 Trent Nelson :
> > On Wed, Jun 19, 2013 at 08:45:55AM -0700, Victor Stinner wrote:
> >> >  1. All memory allocated in a parallel context is localized to a
> >> > private heap.
> >>
> >> How do you allocate memory in this "private" heap? Did you add new
> >> functions to allocate memory?
> >
> > Yup:
> > _PyHeap_Malloc(): 
> > http://hg.python.org/sandbox/trent/file/0e70a0caa1c0/Python/pyparallel.c#l2365.
> >
> > All memory operations (PyObject_New/Malloc etc) get intercepted
> > during parallel thread execution and redirected to _PyHeap_Malloc(),
> > which is a very simple slab allocator.  (No need for convoluted
> > buckets because we never free individual objects during parallel
> > execution; instead, we just blow everything away at the end.)
> 
> Ok, so I don't think that the PEP 445 would change anything for you.
> 
> The following change might have an impact: If _PyHeap_Malloc is not
> thread safe, replacing PyMem_Malloc() with PyMem_RawMalloc() when the
> GIL is not held would avoid bugs in your code.

Hmmm, well, _PyHeap_Malloc is sort of implicitly thread-safe, by
design, but I'm not sure if we're referring to the same sort of
thread-safe problem here.

For one, _PyHeap_Malloc won't ever run if the GIL isn't being held.

(Parallel threads are only allowed to run when the main thread has
 the GIL held and has relinquished control to parallel threads.)

Also, I interpret PyMem_RawMalloc() as a direct shortcut to
malloc() (or something else that returns void *s that are then
free()'d down the track).  Is that right?

I don't think that would impact pyparallel.

> If you want to choose dynamically the allocator at runtime, you can
> replace PyObject_Malloc allocator using:
> -- 8< -
> static void *
> _PxMem_AllocMalloc(void *ctx, size_t size)
> {
>     PyMemBlockAllocator *alloc = (PyMemBlockAllocator *)ctx;
>     if (Py_PXCTX)
>         return _PxMem_Malloc(size);
>     else
>         return alloc->malloc(alloc->ctx, size);
> }
> 
> ...
> 
> PyMemBlockAllocator pyparallel_pyobject;
> 
> static void
> setup_pyparallel_allocator(void)
> {
>     PyMemBlockAllocator alloc;
>     PyObject_GetAllocator(&pyparallel_pyobject);
>     alloc.ctx = &pyparallel_pyobject;
>     alloc.malloc = _PxMem_AllocMalloc;
>     ...
>     PyObject_SetAllocator(&alloc);
> }
> -- 8< -
> 
> But I don't know if you want pyparallel to be an "optional" feature
> chosen at runtime...

Hmmm, those code snippets are interesting.  Time for some more
homework.

Trent.