bkietz commented on pull request #10233:
URL: https://github.com/apache/arrow/pull/10233#issuecomment-832989304


   > Why do you need to be so closely tied to the thread pool capacity?
   
   We could certainly simplify by setting a static capacity. This can result in 
extraneous instances of `State` sitting in the cache and doing nothing until a 
set of tasks completed in the event of capacity decrease, since we wouldn't be 
able to react
   
   > What if the tasks are running on a non-thread pool Executor (we have an 
event loop inspired serial executor now and maybe we will have another kind of 
executor in the future)?
   
   This could be generalized to support arbitrary Executors without too much 
trouble, I think. If the general approach of BorrowSet seems acceptable, then I 
can add the OnThreadAdded etc callbacks and move MakeThreadLocalState to 
Executor
   
   > Are the construct and destroy functions just adding complexity. Any user 
that needs to do some special construct/destroy logic could do so in the 
constructor/destructor it would seem.
   
   Re construct: I was thinking of state like `unique_ptr<Encoder>` or so, 
whose default constructor just leaves it null. This isn't the end of the world; 
it'd just defer construction to the task:
   
   ```c++
   auto encoder_set = pool->MakeThreadLocalState<unique_ptr<Encoder>>(...);
   auto encoded_gen = MakeMappedGenerator(
       MakeTransferredGenerator(batch_gen, pool.get()),
       [encoder_set](const std::shared_ptr<RecordBatch>& batch) {
         auto encoder = encoder_set->BorrowOne();
         if (!encoder.get()) encoder.get() = Encoder::Make();
         return encoder.get()->Encode(batch);
       });
   ```
   
   Re destroy: as currently written, this is also the "reduce" step (in the 
unit test I use it to update the global sum from a local sum). I could just 
rename it to "reduce" or "gather". If we want to support dynamic reduction of 
capacity then we'll need to keep this as a callback, otherwise going from 5 
threads to 2 would fail to garbage collect 3 unused states until a workflow was 
completed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to