Re: [Python-ideas] Thread-safe generators

2017-04-17 Thread Greg Ewing

Nick Coghlan wrote:

> but at the cost of
> changing the nature of the workload in a given thread, and hence
> messing with the working set of objects it has active.


Does the working set of an individual thread matter to anything?
Or even an individual process? As far as the virtual memory
system is concerned, what matters is the working set of all
active code sharing the RAM.

--
Greg


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Thread-safe generators

2017-04-17 Thread Jim Baker
This is a bad idea in the generator itself, as commented earlier by others
here.

From a cross-implementation perspective, in Jython, different threads can
call next on a non-running generator, *so long as they coordinate with each
other external to any use of this generator*, and this works fine.

But any reliance on gi_running, as seen here, can at best be considered a
possible aid in detecting such races; it would not even come close to
preventing a race:
https://github.com/jythontools/jython/blob/master/src/org/python/core/PyGenerator.java#L146

(We don't even bother making gi_running a volatile to get actual
test-and-set style semantics, because really it makes no sense to pretend
otherwise; and why pay the performance penalty?)

The idea of putting generators behind a queue sounds reasonably workable -
the semantics then are the right ones, although implementing this
efficiently is the trick here.


On Sun, Apr 16, 2017 at 11:08 PM, Nick Coghlan  wrote:

> On 17 April 2017 at 08:00, Paul Moore  wrote:
> > On 15 April 2017 at 10:45, Nick Coghlan  wrote:
> >> So I'd be opposed to trying to make generator objects natively thread
> >> aware - as Stephen notes, the GIL is an implementation detail of
> >> CPython, so it isn't OK to rely on it when defining changes to
> >> language level semantics (in this case, whether or not it's OK to have
> >> multiple threads all calling the same generator without some form of
> >> external locking).
> >>
> >> However, it may make sense to explore possible options for offering a
> >> queue.AutoQueue type, where the queue always has a defined maximum
> >> size (defaulting to 1), disallows explicit calls to put(), and
> >> automatically populates itself based on an iterator supplied to the
> >> constructor. Once the input iterator raises StopIteration, then the
> >> queue will start reporting itself as being empty.
> >
> > +1 A generator that can have values pulled from it on different
> > threads sounds like a queue to me, so the AutoQueue class that wraps a
> > generator seems like a natural abstraction to work with. It also means
> > that the cost for thread safety is only paid by those applications
> > that need it.
>
> If someone did build something like this, it would be interesting to
> benchmark it against a more traditional producer thread model, where
> one thread is responsible for adding work items to the queue, while
> others are responsible for draining them.
>
> The trick is that an auto-queue would borrow execution time from the
> consumer threads when new values are needed, so you'd theoretically
> get fewer context switches between threads, but at the cost of
> changing the nature of the workload in a given thread, and hence
> messing with the working set of objects it has active.
>
> It may also pair well with the concurrent.futures.Executor model,
> which is already good for "go handle this predefined list of tasks",
> but currently less useful as a replacement for a message queue with a
> pool of workers.
>
> Setting the latter up yourself is currently still a bit tedious, since:
>
> 1. we don't have a standard threading Pool abstraction in the standard
> library, just the one tucked away as part of multiprocessing
> 2. while queue.Queue has native support for worker pools, we don't
> provide a pre-assembled version that makes it easy to say "here is the
> producer, here are the consumers, wire them together for me"
>
> There are good reasons for that (mainly that it's hard to come up with
> an abstraction that's useful in its own right without becoming so
> complex that you're on the verge of reinventing a task manager like
> celery or a distributed computation manager like dask), but at the
> same time, the notion of "input queue, worker pool, output queue" is
> one that comes up a *lot* across different concurrency models, so
> there's potential value in providing a low-barrier-to-entry
> introduction to that idiom as part of the standard library.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Thread-safe generators

2017-04-16 Thread Nick Coghlan
On 17 April 2017 at 08:00, Paul Moore  wrote:
> On 15 April 2017 at 10:45, Nick Coghlan  wrote:
>> So I'd be opposed to trying to make generator objects natively thread
>> aware - as Stephen notes, the GIL is an implementation detail of
>> CPython, so it isn't OK to rely on it when defining changes to
>> language level semantics (in this case, whether or not it's OK to have
>> multiple threads all calling the same generator without some form of
>> external locking).
>>
>> However, it may make sense to explore possible options for offering a
>> queue.AutoQueue type, where the queue always has a defined maximum
>> size (defaulting to 1), disallows explicit calls to put(), and
>> automatically populates itself based on an iterator supplied to the
>> constructor. Once the input iterator raises StopIteration, then the
>> queue will start reporting itself as being empty.
>
> +1 A generator that can have values pulled from it on different
> threads sounds like a queue to me, so the AutoQueue class that wraps a
> generator seems like a natural abstraction to work with. It also means
> that the cost for thread safety is only paid by those applications
> that need it.

If someone did build something like this, it would be interesting to
benchmark it against a more traditional producer thread model, where
one thread is responsible for adding work items to the queue, while
others are responsible for draining them.

The trick is that an auto-queue would borrow execution time from the
consumer threads when new values are needed, so you'd theoretically
get fewer context switches between threads, but at the cost of
changing the nature of the workload in a given thread, and hence
messing with the working set of objects it has active.

It may also pair well with the concurrent.futures.Executor model,
which is already good for "go handle this predefined list of tasks",
but currently less useful as a replacement for a message queue with a
pool of workers.
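
As a small illustration of the "predefined list of tasks" case, here is a
minimal sketch using concurrent.futures.ThreadPoolExecutor. The names
handle and tasks are hypothetical, chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(task):
    # Hypothetical stand-in for real per-task work.
    return task * 2


# The whole list of tasks is known before any work starts.
tasks = [1, 2, 3, 4, 5]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in the order of the inputs.
    results = list(pool.map(handle, tasks))
```

Note that Executor.map collects its input iterables immediately rather than
lazily, which is part of why it suits a fixed list better than an
open-ended stream of messages.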

Setting the latter up yourself is currently still a bit tedious, since:

1. we don't have a standard threading Pool abstraction in the standard
library, just the one tucked away as part of multiprocessing
2. while queue.Queue has native support for worker pools, we don't
provide a pre-assembled version that makes it easy to say "here is the
producer, here are the consumers, wire them together for me"

There are good reasons for that (mainly that it's hard to come up with
an abstraction that's useful in its own right without becoming so
complex that you're on the verge of reinventing a task manager like
celery or a distributed computation manager like dask), but at the
same time, the notion of "input queue, worker pool, output queue" is
one that comes up a *lot* across different concurrency models, so
there's potential value in providing a low-barrier-to-entry
introduction to that idiom as part of the standard library.
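
To make the "input queue, worker pool, output queue" idiom concrete, here
is a rough sketch built only from threading and queue.Queue. The helper
names (producer, worker, run_pipeline) and the sentinel object are
assumptions made for this illustration, not an existing stdlib API.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker that the input is finished


def producer(items, inbox, n_workers):
    # One dedicated thread feeds the input queue.
    for item in items:
        inbox.put(item)
    # One sentinel per worker so that every worker shuts down.
    for _ in range(n_workers):
        inbox.put(_STOP)


def worker(func, inbox, outbox):
    # Each worker drains the input queue until it sees the sentinel.
    while True:
        item = inbox.get()
        if item is _STOP:
            break
        outbox.put(func(item))


def run_pipeline(func, items, n_workers=4, maxsize=8):
    inbox = queue.Queue(maxsize=maxsize)  # bounded input queue
    outbox = queue.Queue()                # unbounded output queue
    workers = [
        threading.Thread(target=worker, args=(func, inbox, outbox))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    feeder = threading.Thread(target=producer, args=(items, inbox, n_workers))
    feeder.start()
    feeder.join()
    for w in workers:
        w.join()
    # All threads have finished, so the output queue can be drained safely.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


squares = run_pipeline(lambda x: x * x, (i for i in range(20)))
```

Results arrive in completion order, not input order, so callers that care
about ordering would need to tag items themselves.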

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Thread-safe generators

2017-04-16 Thread Paul Moore
On 15 April 2017 at 10:45, Nick Coghlan  wrote:
> So I'd be opposed to trying to make generator objects natively thread
> aware - as Stephen notes, the GIL is an implementation detail of
> CPython, so it isn't OK to rely on it when defining changes to
> language level semantics (in this case, whether or not it's OK to have
> multiple threads all calling the same generator without some form of
> external locking).
>
> However, it may make sense to explore possible options for offering a
> queue.AutoQueue type, where the queue always has a defined maximum
> size (defaulting to 1), disallows explicit calls to put(), and
> automatically populates itself based on an iterator supplied to the
> constructor. Once the input iterator raises StopIteration, then the
> queue will start reporting itself as being empty.

+1 A generator that can have values pulled from it on different
threads sounds like a queue to me, so the AutoQueue class that wraps a
generator seems like a natural abstraction to work with. It also means
that the cost for thread safety is only paid by those applications
that need it.

Paul


Re: [Python-ideas] Thread-safe generators

2017-04-16 Thread Guido van Rossum
I think the two shouldn't be mixed.

On Apr 16, 2017 7:58 AM, "Victor Stinner"  wrote:

> Thread safety is very complex and has an impact on performance. I dislike
> the idea of giving such a property to generators, which can have a complex
> next method.
>
> IMHO it's better to put a generator in a wrapper that adds thread safety.
>
> What do you think?
>
> Victor
>
> Le 14 avr. 2017 18:48, "Serhiy Storchaka"  a écrit :
>
>> When you use a generator from different threads, you can get a ValueError
>> "generator already executing". Getting this exception in a single thread
>> is a programming error, but in the case of different threads it could be
>> possible to wait until the other thread finishes executing the generator.
>> The generator can be made thread-safe by wrapping it in a class that
>> acquires a lock before calling the generator's __next__ method (for
>> example, see [1]). But this is not very efficient, of course.
>>
>> I am wondering if it is worth adding support for thread-safe generators to
>> the stdlib, either by providing a standard decorator (written in C for
>> efficiency) or by adding threading support directly to the generator
>> object. The latter may need to increase the size of the generator object
>> for a lock and a thread identifier (but maybe the GIL is enough), but
>> should not affect performance, since locking is used only when you are
>> faced with a generator running in another thread.
>>
>> This topic was already raised on Python-Dev [2] but didn't get very far.
>> There are a number of StackOverflow questions about threads and
>> generators. We have already encountered this issue in the stdlib: once in
>> regrtest with the -j option ([3], [4]), and another time after
>> reimplementing tempfile._RandomNameSequence as a generator [5].
>>
>> [1] http://anandology.com/blog/using-iterators-and-generators/
>> [2] https://mail.python.org/pipermail/python-dev/2004-February/042390.html
>> [3] https://bugs.python.org/issue7996
>> [4] https://bugs.python.org/issue15320
>> [5] https://bugs.python.org/issue30030
>>


Re: [Python-ideas] Thread-safe generators

2017-04-16 Thread Victor Stinner
Thread safety is very complex and has an impact on performance. I dislike
the idea of giving such a property to generators, which can have a complex
next method.

IMHO it's better to put a generator in a wrapper that adds thread safety.

What do you think?

Victor

Le 14 avr. 2017 18:48, "Serhiy Storchaka"  a écrit :

> When you use a generator from different threads, you can get a ValueError
> "generator already executing". Getting this exception in a single thread
> is a programming error, but in the case of different threads it could be
> possible to wait until the other thread finishes executing the generator.
> The generator can be made thread-safe by wrapping it in a class that
> acquires a lock before calling the generator's __next__ method (for
> example, see [1]). But this is not very efficient, of course.
>
> I am wondering if it is worth adding support for thread-safe generators to
> the stdlib, either by providing a standard decorator (written in C for
> efficiency) or by adding threading support directly to the generator
> object. The latter may need to increase the size of the generator object
> for a lock and a thread identifier (but maybe the GIL is enough), but
> should not affect performance, since locking is used only when you are
> faced with a generator running in another thread.
>
> This topic was already raised on Python-Dev [2] but didn't get very far.
> There are a number of StackOverflow questions about threads and generators.
> We have already encountered this issue in the stdlib: once in regrtest with
> the -j option ([3], [4]), and another time after reimplementing
> tempfile._RandomNameSequence as a generator [5].
>
> [1] http://anandology.com/blog/using-iterators-and-generators/
> [2] https://mail.python.org/pipermail/python-dev/2004-February/042390.html
> [3] https://bugs.python.org/issue7996
> [4] https://bugs.python.org/issue15320
> [5] https://bugs.python.org/issue30030
>


Re: [Python-ideas] Thread-safe generators

2017-04-15 Thread Guido van Rossum
My 2 cents' worth: don't even think about it.

On Apr 15, 2017 3:27 AM, "Serhiy Storchaka"  wrote:

> On 15.04.17 11:55, Stephen J. Turnbull wrote:
>
>> Serhiy Storchaka writes:
>>
>>  > The first thread just sets the running flag (as in current code). Due to
>>  > the GIL this doesn't need additional synchronization.
>>
>> Can we assume this lack of additional synchronization for other
>> implementations?  If not, do we care?
>>
>
> Other implementations should have atomic test-and-set operations for the
> running flag, or other ways to prevent a race condition. So yes, we can
> assume this.
>
>


Re: [Python-ideas] Thread-safe generators

2017-04-15 Thread Serhiy Storchaka

On 15.04.17 11:55, Stephen J. Turnbull wrote:

> Serhiy Storchaka writes:
>
>  > The first thread just sets the running flag (as in current code). Due to
>  > the GIL this doesn't need additional synchronization.
>
> Can we assume this lack of additional synchronization for other
> implementations?  If not, do we care?


Other implementations should have atomic test-and-set operations for the
running flag, or other ways to prevent a race condition. So yes, we can
assume this.





Re: [Python-ideas] Thread-safe generators

2017-04-15 Thread Nick Coghlan
On 15 April 2017 at 02:47, Serhiy Storchaka  wrote:
> When you use a generator from different threads, you can get a ValueError
> "generator already executing". Getting this exception in a single thread
> is a programming error, but in the case of different threads it could be
> possible to wait until the other thread finishes executing the generator.
> The generator can be made thread-safe by wrapping it in a class that
> acquires a lock before calling the generator's __next__ method (for
> example, see [1]). But this is not very efficient, of course.
>
> I am wondering if it is worth adding support for thread-safe generators to
> the stdlib, either by providing a standard decorator (written in C for
> efficiency) or by adding threading support directly to the generator
> object. The latter may need to increase the size of the generator object
> for a lock and a thread identifier (but maybe the GIL is enough), but
> should not affect performance, since locking is used only when you are
> faced with a generator running in another thread.

Allowing multiple worker threads to pull from the same work queue is a
general concurrency problem, and that's why we have queue.Queue in the
standard library: https://docs.python.org/3/library/queue.html

So I'd be opposed to trying to make generator objects natively thread
aware - as Stephen notes, the GIL is an implementation detail of
CPython, so it isn't OK to rely on it when defining changes to
language level semantics (in this case, whether or not it's OK to have
multiple threads all calling the same generator without some form of
external locking).

However, it may make sense to explore possible options for offering a
queue.AutoQueue type, where the queue always has a defined maximum
size (defaulting to 1), disallows explicit calls to put(), and
automatically populates itself based on an iterator supplied to the
constructor. Once the input iterator raises StopIteration, then the
queue will start reporting itself as being empty.

The benefit of going down that path is that it can be used with
arbitrary iterators (not just generators), and can be more easily
generalised to other synchronisation models (such as multiprocessing).
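
One possible shape for such a type is sketched below. No queue.AutoQueue
exists in the standard library; the class, its method set, and the choice
to raise queue.Empty once the iterator is exhausted are all assumptions
made for illustration. New values are produced inside get(), so the work of
advancing the iterator is done by whichever consumer thread asks for the
next item.

```python
import queue
import threading
from collections import deque


class AutoQueue:
    """Hypothetical sketch: a bounded queue that fills itself from an iterator."""

    def __init__(self, iterable, maxsize=1):
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self._it = iter(iterable)
        self._maxsize = maxsize
        self._buffer = deque()
        self._lock = threading.Lock()
        self._exhausted = False

    def _refill(self):
        # Caller must hold self._lock. Pull items until the buffer is full
        # or the input iterator raises StopIteration.
        while not self._exhausted and len(self._buffer) < self._maxsize:
            try:
                self._buffer.append(next(self._it))
            except StopIteration:
                self._exhausted = True

    def get(self):
        # The calling consumer thread advances the iterator itself.
        with self._lock:
            self._refill()
            if not self._buffer:
                raise queue.Empty
            return self._buffer.popleft()

    def put(self, item):
        # Explicit puts are disallowed; the queue populates itself.
        raise TypeError("AutoQueue does not accept explicit put() calls")

    def empty(self):
        with self._lock:
            self._refill()
            return not self._buffer


q = AutoQueue(x * x for x in range(5))
drained = [q.get() for _ in range(5)]
try:
    q.get()
    exhausted = False
except queue.Empty:
    exhausted = True
```

Because it only calls iter() and next() on its input, the same wrapper
works for any iterator, not just generators.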

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Thread-safe generators

2017-04-15 Thread Stephen J. Turnbull
Serhiy Storchaka writes:

 > The first thread just sets the running flag (as in current code). Due to
 > the GIL this doesn't need additional synchronization.

Can we assume this lack of additional synchronization for other
implementations?  If not, do we care?



Re: [Python-ideas] Thread-safe generators

2017-04-14 Thread Serhiy Storchaka

On 15.04.17 01:42, Greg Ewing wrote:

> Serhiy Storchaka wrote:
>
>> but should not affect performance since locking is used only when you
>> are faced with a generator running in another thread.
>
> I don't think that's true, because the first thread to use a
> generator has to lock it as well. And even if there is only
> one thread in existence when __next__ is called, another one
> could be created before it finishes and try to use the
> generator.


The first thread just sets the running flag (as in current code). Due to
the GIL this doesn't need additional synchronization. Other threads check
this flag and sleep rather than raising an exception. After finishing with
the generator, the first thread checks whether any other threads are
waiting and wakes one of them.
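
The behaviour described above can be emulated in pure Python as a rough
sketch. This is not the proposed C implementation: the WaitingIterator name
is hypothetical, and a threading.Condition stands in for the GIL-protected
flag, so even the first caller takes a lock here and the sketch says nothing
about the cost of the real proposal.

```python
import threading


class WaitingIterator:
    """Hypothetical wrapper: callers wait while another thread is inside next()."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cond = threading.Condition()
        self._running = False

    def __iter__(self):
        return self

    def __next__(self):
        with self._cond:
            # Sleep rather than raise while another thread is running
            # the underlying iterator.
            while self._running:
                self._cond.wait()
            self._running = True
        try:
            return next(self._it)
        finally:
            with self._cond:
                self._running = False
                # Wake one waiting thread, if there is one.
                self._cond.notify()


def numbers(n):
    for i in range(n):
        yield i


def drain(shared, out):
    for item in shared:
        out.append(item)


shared = WaitingIterator(numbers(500))
collected = []
threads = [
    threading.Thread(target=drain, args=(shared, collected)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```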





Re: [Python-ideas] Thread-safe generators

2017-04-14 Thread Greg Ewing

Serhiy Storchaka wrote:
> but should not affect performance since locking is used only when you
> are faced with a generator running in another thread.


I don't think that's true, because the first thread to use a
generator has to lock it as well. And even if there is only
one thread in existence when __next__ is called, another one
could be created before it finishes and try to use the
generator.
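
The scenario is easy to reproduce in current CPython. The sketch below
holds one thread inside __next__ while a second thread tries to use the
same generator; the event names and slow_gen are made up for the
demonstration.

```python
import threading

started = threading.Event()   # set once the generator body is running
release = threading.Event()   # set to let the generator body continue


def slow_gen():
    started.set()
    release.wait(timeout=5)
    yield 1


gen = slow_gen()
result = []

# The first thread enters the generator and blocks inside it.
t = threading.Thread(target=lambda: result.append(next(gen)))
t.start()
started.wait(timeout=5)

# A second thread (here, the main thread) now tries to use the generator.
try:
    next(gen)
    error = None
except ValueError as exc:
    error = str(exc)
finally:
    release.set()

t.join()
```

On CPython the second call fails with the ValueError discussed in this
thread instead of waiting for the first call to finish.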

--
Greg


[Python-ideas] Thread-safe generators

2017-04-14 Thread Serhiy Storchaka
When you use a generator from different threads, you can get a ValueError
"generator already executing". Getting this exception in a single thread
is a programming error, but in the case of different threads it could be
possible to wait until the other thread finishes executing the generator.
The generator can be made thread-safe by wrapping it in a class that
acquires a lock before calling the generator's __next__ method (for
example, see [1]). But this is not very efficient, of course.
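
A minimal sketch of that kind of wrapper is shown here. It is not the code
from [1]; the LockedIterator name and the small driver around it are
assumptions made for illustration.

```python
import threading


class LockedIterator:
    """Hypothetical wrapper that serializes next() calls with a lock."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        with self._lock:
            return next(self._it)


def count_up(n):
    for i in range(n):
        yield i


def drain(shared, out):
    for item in shared:
        out.append(item)


shared = LockedIterator(count_up(1000))
results = []
threads = [
    threading.Thread(target=drain, args=(shared, results)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every call pays for a lock acquisition, whether or not another thread is
actually competing for the generator.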


I am wondering if it is worth adding support for thread-safe generators to
the stdlib, either by providing a standard decorator (written in C for
efficiency) or by adding threading support directly to the generator
object. The latter may need to increase the size of the generator object
for a lock and a thread identifier (but maybe the GIL is enough), but
should not affect performance, since locking is used only when you are
faced with a generator running in another thread.


This topic was already raised on Python-Dev [2] but didn't get very far.
There are a number of StackOverflow questions about threads and
generators. We have already encountered this issue in the stdlib: once
in regrtest with the -j option ([3], [4]), and another time after
reimplementing tempfile._RandomNameSequence as a generator [5].


[1] http://anandology.com/blog/using-iterators-and-generators/
[2] https://mail.python.org/pipermail/python-dev/2004-February/042390.html
[3] https://bugs.python.org/issue7996
[4] https://bugs.python.org/issue15320
[5] https://bugs.python.org/issue30030
