bkietz commented on pull request #10233: URL: https://github.com/apache/arrow/pull/10233#issuecomment-832989304
> Why do you need to be so closely tied to the thread pool capacity? We could certainly simplify by setting a static capacity. This can result in extraneous instances of `State` sitting in the cache and doing nothing until a set of tasks completed in the event of capacity decrease, since we wouldn't be able to react > What if the tasks are running on a non-thread pool Executor (we have an event loop inspired serial executor now and maybe we will have another kind of executor in the future)? This could be generalized to support arbitrary Executors without too much trouble, I think. If the general approach of BorrowSet seems acceptable, then I can add the OnThreadAdded etc callbacks and move MakeThreadLocalState to Executor > Are the construct and destroy functions just adding complexity. Any user that needs to do some special construct/destroy logic could do so in the constructor/destructor it would seem. Re construct: I was thinking of state like `unique_ptr<Encoder>` or so, whose default constructor just leaves it null. This isn't the end of the world; it'd just defer construction to the task: ```c++ auto encoder_set = pool->MakeThreadLocalState<unique_ptr<Encoder>>(...); auto encoded_gen = MakeMappedGenerator( MakeTransferredGenerator(batch_gen, pool.get()), [encoder_set](const std::shared_ptr<RecordBatch>& batch) { auto encoder = encoder_set->BorrowOne(); if (!encoder.get()) encoder.get() = Encoder::Make(); return encoder.get()->Encode(batch); }); ``` Re destroy: as currently written, this is also the "reduce" step (in the unit test I use it to update the global sum from a local sum). I could just rename it to "reduce" or "gather". If we want to support dynamic reduction of capacity then we'll need to keep this as a callback, otherwise going from 5 threads to 2 would fail to garbage collect 3 unused states until a workflow was completed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org