joellubi commented on issue #2084:
URL: https://github.com/apache/arrow-adbc/issues/2084#issuecomment-2298600431

   > @zeroshade Ok, makes sense. Although I'm here at the mercy of 
`adbc_ingest` pull for the next generator value. Although I would prefer to 
front-load the pre-processing. Because sometimes it can take quite a lot of 
time. I hope the stage creation or other stuff it does underneath will not time 
out. But I suppose it will work, I will check it out.
   
   By "front-load the pre-processing", do you mean that you would like to 
process the next batch while the current batch is being uploaded by 
`adbc_ingest`? Regardless of whether `adbc_ingest` "pulls" the batch or it is 
"pushed" as in your example, Python will inherently only do one thing at a 
time, assuming your pre-processing work is CPU-bound. In either case, you would 
need to use `threading` or `multiprocessing` to have Python pre-process the 
next batch while the current batch is ingesting. One potential way to do this 
would be to have the Python generator wrap a 
[Queue](https://docs.python.org/3/library/queue.html) that is filled by another 
thread or process, so the pre-processing is offloaded there.
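
   A minimal sketch of the queue-wrapping generator idea, using only the 
standard library. The names `prefetch`, `preprocess`, and `maxsize` are 
hypothetical and chosen for illustration; they are not part of ADBC. It assumes 
a background thread is enough, which only helps if the pre-processing or the 
upload releases the GIL; for pure-Python CPU-bound work a process-based variant 
would be needed instead.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(batches, preprocess, maxsize=2):
    """Yield preprocess(batch) for each batch, computed ahead in a worker thread.

    At most `maxsize` processed batches are buffered, so the worker stays a
    little ahead of the consumer without holding everything in memory.
    """
    q = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            for batch in batches:
                # Blocks while the queue is full, i.e. while the consumer
                # is still busy with earlier batches.
                q.put((None, preprocess(batch)))
        except Exception as exc:
            # Hand the error to the consumer instead of losing it.
            q.put((exc, None))
        else:
            q.put((None, _DONE))

    # Daemon thread: if the consumer stops early, the worker will not keep
    # the interpreter alive. A production version would want a clean shutdown.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        exc, item = q.get()
        if exc is not None:
            raise exc
        if item is _DONE:
            return
        yield item
```

   The resulting generator can be consumed like any other, for example 
`for batch in prefetch(raw_batches, my_preprocess): ...`, where `raw_batches` 
and `my_preprocess` stand in for your own source and function. The consumer 
still pulls one value at a time, but the next value is usually already waiting 
in the queue.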


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
