[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2019-05-22 Thread Antoine Pitrou


Antoine Pitrou  added the comment:

Thank you for your contribution iunknwn!

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2019-05-22 Thread Antoine Pitrou


Antoine Pitrou  added the comment:


New changeset 904e34d4e6b6007986dcc585d5c553ee8ae06f95 by Antoine Pitrou (Sean) 
in branch 'master':
bpo-24882: Let ThreadPoolExecutor reuse idle threads before creating new thread 
(#6375)
https://github.com/python/cpython/commit/904e34d4e6b6007986dcc585d5c553ee8ae06f95


--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2019-05-18 Thread Antoine Pitrou


Antoine Pitrou  added the comment:

Thomas, I think that's a good argument, so perhaps we should do this (strive to 
reuse threads) after all.

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2019-05-18 Thread Pierre Glaser


Change by Pierre Glaser :


--
nosy: +pierreglaser




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2019-05-18 Thread Thomas


Thomas  added the comment:

We ran into this issue in the context of asyncio which uses an internal 
ThreadPoolExecutor to provide an asynchronous getaddrinfo / getnameinfo.

We observed an async application spawning more and more threads over several 
reconnects. With a maximum of 5 x CPUs, this amounted to dozens of threads, 
which easily looked like a resource leak.

At least in this scenario I would strongly prefer to correctly reuse idle 
threads. 

Spawning all possible threads on initialization in such a transparent case 
would be quite bad. Imagine a process-parallel daemon that runs an apparently 
single-threaded asyncio loop but then gets one of these executors just to do a 
single asyncio.getaddrinfo. If you run 80 instances on an 80-core machine, you 
get 32,000 extra implicit threads.

Now you can argue whether the default executor in asyncio is good as is, but if 
the executors properly reuse threads, it would be quite unlikely to be a 
practical problem.
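
For illustration, a minimal sketch of the arithmetic above and of one possible 
mitigation: installing a deliberately small default executor on the loop. The 
helper names (implicit_thread_ceiling, resolve_with_small_pool) and the pool 
size of 2 are assumptions made for this example, not part of asyncio or of the 
patch under discussion.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def implicit_thread_ceiling(instances, cpus, per_cpu=5):
    """Upper bound on executor threads if every instance fills its pool.

    per_cpu=5 mirrors the "5 x CPUs" maximum mentioned in the message.
    """
    return instances * per_cpu * cpus


async def resolve_with_small_pool(host, port, max_workers=2):
    """Resolve an address through a small, explicitly sized default executor."""
    loop = asyncio.get_running_loop()
    # Replace the implicit default executor with a bounded one (assumed size).
    loop.set_default_executor(ThreadPoolExecutor(max_workers=max_workers))
    return await loop.getaddrinfo(host, port)


if __name__ == "__main__":
    print(implicit_thread_ceiling(instances=80, cpus=80))
    print(asyncio.run(resolve_with_small_pool("127.0.0.1", 80)))
```

A bounded default executor caps the thread count regardless of whether idle 
threads are reused, at the cost of serializing lookups beyond the cap.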

--
nosy: +tilsche




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2019-05-08 Thread Brian Quinlan


Brian Quinlan  added the comment:

After playing with it for a while, https://github.com/python/cpython/pull/6375 
seems reasonable to me.

It needs tests and some documentation.

Antoine, are you still -1 because of the complexity increase?

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2019-05-08 Thread Brian Quinlan


Brian Quinlan  added the comment:

When I first wrote and started using ThreadPoolExecutor, I had a lot of code 
like this:

with ThreadPoolExecutor(max_workers=500) as e:
  e.map(download, images)

I didn't expect that `images` would be a large list but, if it was, I wanted 
all of the downloads to happen in parallel.

I didn't want to have to explicitly take into account the list size when 
starting the executor (e.g. max_workers=min(500, len(images))) but I also didn't 
want to create 500 threads up front when I only needed a few.

My use case involved transient ThreadPoolExecutors so I didn't have to worry 
about idle threads.

In principle, I'd be OK with trying to avoid unnecessary thread creation if the 
implementation can be simple and efficient enough.

https://github.com/python/cpython/pull/6375 seems simple enough but I haven't 
convinced myself that it works yet ;-)
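
For illustration, the explicit sizing that the message wanted to avoid could 
look roughly like the sketch below. The download function here is a 
hypothetical stand-in that performs no network access, and download_all is a 
name invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def download(image):
    """Hypothetical stand-in for a real download; does no I/O."""
    return f"fetched {image}"


def download_all(images, limit=500):
    """Run downloads in parallel with at most `limit` worker threads."""
    images = list(images)
    if not images:
        # max_workers must be greater than zero, so handle the empty case.
        return []
    workers = min(limit, len(images))
    with ThreadPoolExecutor(max_workers=workers) as e:
        return list(e.map(download, images))


if __name__ == "__main__":
    print(download_all(["a.png", "b.png", "c.png"]))
```

The extra bookkeeping (materializing the list, guarding against an empty 
input) is the kind of caller-side burden that lazy thread creation removes.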

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-24 Thread iunknwn

iunknwn  added the comment:

I feel like there are two reasonable options here:

1) We can implement a thread pool with basic resource tracking. This means idle 
threads get recycled, and threads that have been sitting idle for a while are 
terminated as demand drops, so resources can be reclaimed. This wouldn't 
require very much work - we would just need to modify some of the queue 
operations to have timeouts, and track the number of idle threads (most of 
that's already in my first PR). We could easily add min_threads and 
idle_thread_timeout as keyword options to the init routine. 

2) We can skip all tracking, and spin up a fixed number of threads at 
initialization. This removes the complexity of locks and counts, and means the 
thread pool executor will work identically to the process pool executor (which 
also eagerly spawns resources). If we want this, this is ready to go in the 
second PR. 

I personally like option 1 because it feels closer to other languages I've 
worked in, but I'd like a bit more guidance from the reviewers before 
proceeding.
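
For illustration, the queue-timeout idea behind option 1 can be sketched as a 
worker loop that gives up after a period without work. This is a simplified 
sketch, not the code from either PR; the worker and demo functions and the 
on_exit callback are assumptions made for this example, and only the 
idle_thread_timeout name comes from the message.

```python
import queue
import threading


def worker(work_queue, idle_thread_timeout, on_exit):
    """Run queued callables; exit after idle_thread_timeout seconds idle."""
    while True:
        try:
            task = work_queue.get(timeout=idle_thread_timeout)
        except queue.Empty:
            on_exit()  # tell the owner that this thread is going away
            return
        task()


def demo(idle_thread_timeout=0.2):
    """Queue one task, start a worker, and wait for it to time out."""
    work_queue = queue.Queue()
    results = []
    exited = threading.Event()
    work_queue.put(lambda: results.append("done"))
    thread = threading.Thread(
        target=worker, args=(work_queue, idle_thread_timeout, exited.set)
    )
    thread.start()
    thread.join(timeout=5)
    return results, exited.is_set(), thread.is_alive()


if __name__ == "__main__":
    print(demo())
```

A real pool would also have to handle the race in which a task is submitted 
just as the last worker times out, and would keep min_threads workers alive; 
both are omitted here.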

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-19 Thread INADA Naoki

INADA Naoki  added the comment:

Why not just remove the TODO comment?
Threads are cheap, but not zero-cost.

--
nosy: +inada.naoki




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-18 Thread iunknwn

iunknwn  added the comment:

Done - as recommended, I've opened a new PR that changes the behavior to spawn 
all worker threads when the executor is created. This eliminates all the thread 
logic from the submit function.

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-18 Thread iunknwn

Change by iunknwn :


--
pull_requests: +6224




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-13 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Creating a new PR would be cleaner IMHO.

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-13 Thread iunknwn

iunknwn  added the comment:

Alright - I'll put together another patch that removes the logic, and spins up 
all threads during initialization. 

Do you want me to create a completely new PR, or just update my existing one?

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-13 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> That said, if threads are cheap, why not just create all the worker threads 
> on initialization, and then remove all the logic entirely?

That would sound reasonable to me.  bquinlan has been absent for a long time, 
so I wouldn't expect an answer from him on this issue.

> Also, regarding the executor and thread-safety, there's an example in the 
> current docs showing a job being added to the executor from a worker thread

Actually, looking at the code again, submit() is protected by the 
shutdown_lock, so it seems it should be thread-safe.  That's on git master btw.
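
For illustration, a small sketch that calls submit() on one shared executor 
from several caller threads at once. A single clean run does not prove 
thread-safety; it only shows the usage pattern being discussed. The function 
name and the sizes used are assumptions made for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def submit_from_many_threads(callers=8, per_caller=50):
    """Submit squaring jobs from several threads; return the sorted results."""
    futures = []
    futures_lock = threading.Lock()  # protects our own list, not the executor

    with ThreadPoolExecutor(max_workers=4) as executor:

        def caller(base):
            for i in range(per_caller):
                future = executor.submit(pow, base + i, 2)
                with futures_lock:
                    futures.append(future)

        threads = [
            threading.Thread(target=caller, args=(n * per_caller,))
            for n in range(callers)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

    return sorted(future.result() for future in futures)


if __name__ == "__main__":
    squares = submit_from_many_threads()
    print(len(squares), squares[:5])
```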

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-13 Thread iunknwn

iunknwn  added the comment:

The existing behavior seems strange (and isn't well documented). The code had a 
TODO comment from bquinlan to implement idle thread recycling, so that was why 
I made the change. 

That said, if threads are cheap, why not just create all the worker threads on 
initialization, and then remove all the logic entirely?

Also, regarding the executor and thread-safety, there's an example in the 
current docs showing a job being added to the executor from a worker thread 
(it's part of the example on deadlocks, but it focuses on max worker count, not 
on the executor's thread-safety).

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-08 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Side note:

> One concern I do have - while writing the patch, I noticed the existing 
> submit method (specifically the adjust_thread_count function) isn't thread 
> safe.

True.  The executor is obviously thread-safe internally (as it handles multiple 
worker threads).  But the user should not /call/ it from multiple threads.

(most primitives exposed by the Python stdlib are not thread-safe, except for 
the simplest ones such as lists, dicts etc.)

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-08 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> If each worker thread ties up other resources in an application, such as 
> handles to server connections, conserving threads could have a significant 
> impact.

You may want to implement a pooling mechanism for those connections, 
independent of the thread pool.  It is also probably more flexible (you can 
implement whichever caching and lifetime logic benefits your application).
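
For illustration, one way such a connection pool could be kept independent of 
the thread pool is sketched below. FakeConnection, ConnectionPool, and 
run_queries are hypothetical names invented for this example; a real 
application would substitute its own connection type and lifetime logic.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an expensive server connection."""

    def __init__(self, ident):
        self.ident = ident

    def query(self, value):
        return value * 2


class ConnectionPool:
    """Fixed set of connections handed out to whichever thread asks."""

    def __init__(self, size):
        self._free = queue.Queue()
        for ident in range(size):
            self._free.put(FakeConnection(ident))

    @contextmanager
    def connection(self):
        conn = self._free.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._free.put(conn)  # always hand the connection back


def run_queries(values, threads=8, connections=2):
    """Use many worker threads but only `connections` connections."""
    pool = ConnectionPool(connections)

    def task(value):
        with pool.connection() as conn:
            return conn.query(value)

    with ThreadPoolExecutor(max_workers=threads) as executor:
        return list(executor.map(task, values))


if __name__ == "__main__":
    print(run_queries(range(10)))
```

With this split, the number of worker threads no longer determines how many 
connections are held open.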

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-08 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

I'm not fond of this proposal.  The existing behaviour is harmless; especially 
for a thread pool, since threads are cheap resources.  Improving the logic a 
bit might seem nice, but it also complicates the executor implementation a bit 
more.

Besides, once the N threads are spawned, they remain alive until the executor 
is shut down. So all it takes is a spike in incoming requests and you don't 
save resources anymore.

--
nosy: +tomMoral
versions: +Python 3.8




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-08 Thread Ned Deily

Change by Ned Deily :


--
nosy: +bquinlan, pitrou




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-05 Thread iunknwn

iunknwn  added the comment:

I've submitted a PR that should resolve this - it uses a simple atomic counter 
to ensure new threads are not created if existing threads are idle. 

One concern I do have - while writing the patch, I noticed the existing submit 
method (specifically the adjust_thread_count function) isn't thread safe. I've 
added more details in the PR.

--
components: +Library (Lib)




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-05 Thread iunknwn

Change by iunknwn :


--
nosy: +iunknwn




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2018-04-04 Thread Roundup Robot

Change by Roundup Robot :


--
keywords: +patch
pull_requests: +6087
stage:  -> patch review




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2016-10-21 Thread David MacKenzie

David MacKenzie added the comment:

If each worker thread ties up other resources in an application, such as 
handles to server connections, conserving threads could have a significant 
impact. That's the situation for an application I am involved with.

I've written and tested a patch to make this change, using a second Queue for 
the worker threads to notify the executor in the main thread by sending a None 
when they finish a WorkItem and are therefore idle and ready for more work. 
It's a fairly simple patch. It does add a little more overhead to executing a 
job, inevitably. I can submit the patch if there's interest. Otherwise, perhaps 
the TODO comment in thread.py should be rewritten to explain why it's not worth 
doing.

--
nosy: +dmacnet




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2016-10-21 Thread David MacKenzie

David MacKenzie added the comment:

This issue seems to overlap with issue 14119.

--




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2016-05-31 Thread Josh Rosenberg

Josh Rosenberg added the comment:

Is there a good reason to worry about overeager worker spawning? 
ProcessPoolExecutor spawns all workers when the first work item is submitted ( 
https://hg.python.org/cpython/file/3.4/Lib/concurrent/futures/process.py#l361 
), only ThreadPoolExecutor even makes an effort to limit the number of threads 
spawned. Threads are typically more lightweight than processes, and with the 
recent GIL improvements, the CPython specific costs associated with threads 
(particularly threads that are just sitting around waiting on a lock) are 
fairly minimal.

It just seems like if eager process spawning isn't a problem, neither is 
(cheaper) eager thread spawning.

--
nosy: +josh.r




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2016-05-30 Thread Torsten Landschoff

Torsten Landschoff added the comment:

For demonstration purposes, here is a small example specifically for Linux 
which shows how each request starts a new thread even though the client blocks 
for each result.
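
The attached many_threads.py is not reproduced in this archive. As a portable 
stand-in (an assumption-based sketch, not the attached file), the following 
counts how many distinct worker threads serve a series of strictly sequential 
requests. The function name and default sizes are invented for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_worker_threads(requests=10, max_workers=20):
    """Submit requests one at a time, blocking on each result, and return
    how many distinct worker threads handled them."""
    seen = set()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for _ in range(requests):
            # The client blocks here, so at most one task is ever in flight.
            seen.add(executor.submit(threading.get_ident).result())
    return len(seen)


if __name__ == "__main__":
    print("distinct worker threads used:", count_worker_threads())
```

With the behavior reported in this issue, the count is expected to equal the 
number of requests (up to max_workers); an executor that reuses idle threads 
should report far fewer, though the exact number can vary with timing.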

--
nosy: +torsten
Added file: http://bugs.python.org/file43061/many_threads.py




[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2015-08-17 Thread Matt Spitz

Changes by Matt Spitz mattsp...@gmail.com:


--
title: ThreadPoolExceutor doesn't reuse threads until #threads == max_workers 
-> ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24882
___



[issue24882] ThreadPoolExecutor doesn't reuse threads until #threads == max_workers

2015-08-17 Thread Matt Spitz

Matt Spitz added the comment:

On further investigation, it appears that we can't just check against the queue 
length, as it doesn't indicate whether threads are doing work or idle.

A change here will need a counter/semaphore to keep track of the number of 
idle/working threads, which may have negative performance implications.
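
For illustration, the counter/semaphore idea can be sketched with a toy pool 
that starts a new worker only when no existing worker is idle. TinyPool and 
its methods are hypothetical names for this example; this is not the 
ThreadPoolExecutor implementation, and error handling and shutdown are left 
out.

```python
import queue
import threading


class TinyPool:
    """Toy pool: start a new worker only when no existing worker is idle."""

    def __init__(self, max_workers):
        self._max_workers = max_workers
        self._work = queue.Queue()
        self._idle = threading.Semaphore(0)  # counts workers free for work
        self._threads = []
        self._lock = threading.Lock()

    def submit(self, fn, *args):
        done = threading.Event()
        box = {}
        with self._lock:
            self._work.put((fn, args, box, done))
            # Claim an idle worker if there is one; otherwise grow the pool.
            if not self._idle.acquire(blocking=False):
                if len(self._threads) < self._max_workers:
                    thread = threading.Thread(target=self._run, daemon=True)
                    thread.start()
                    self._threads.append(thread)
        return box, done

    def _run(self):
        while True:
            fn, args, box, done = self._work.get()
            box["result"] = fn(*args)  # an exception here would end the worker
            self._idle.release()  # mark this worker idle before signalling
            done.set()

    def thread_count(self):
        with self._lock:
            return len(self._threads)


if __name__ == "__main__":
    pool = TinyPool(max_workers=8)
    for i in range(20):
        box, done = pool.submit(pow, i, 2)
        done.wait()
    print("threads started:", pool.thread_count())
```

The per-task cost is one semaphore release in the worker and one non-blocking 
acquire in submit(), which is the kind of overhead the message is concerned 
about.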

--
