joellubi commented on issue #2084: URL: https://github.com/apache/arrow-adbc/issues/2084#issuecomment-2298600431
> @zeroshade Ok, makes sense. Although I'm here at the mercy of `adbc_ingest` pull for the next generator value. Although I would prefer to front-load the pre-processing. Because sometimes it can take quite a lot of time. I hope the stage creation or other stuff it does underneath will not time out. But I suppose it will work, I will check it out.

By "front-load the pre-processing", do you mean that you would like to process the next batch while the current batch is being uploaded by `adbc_ingest`?

Regardless of whether `adbc_ingest` "pulls" the batch or the batch is "pushed" as in your example, Python will inherently do only one thing at a time, assuming your pre-processing work is CPU-bound. In either case, you would need to use `threading` or `multiprocessing` to have Python pre-process the next batch while the current batch is being ingested.

One potential way to do this would be to have the Python generator wrap a [Queue](https://docs.python.org/3/library/queue.html), which lets the pre-processing be offloaded to another thread or process.
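As a rough illustration of the queue idea, here is a minimal sketch using a background thread. The names `prefetch`, `preprocess`, and `maxsize` are hypothetical and not part of ADBC. It assumes `batches` is any iterable of raw batches and `preprocess` is your own function. A thread only overlaps with the upload to the extent that the ingest call releases the GIL while it waits.

```python
import queue
import threading


def prefetch(batches, preprocess, maxsize=2):
    """Yield preprocess(batch) for each batch, computed ahead of time
    in a background thread (hypothetical helper, not an ADBC API)."""
    # Bounded queue: the worker stays at most `maxsize` batches ahead.
    q = queue.Queue(maxsize=maxsize)

    def worker():
        try:
            for batch in batches:
                q.put(("item", preprocess(batch)))
        except BaseException as exc:
            # Hand the error to the consumer instead of losing it.
            q.put(("error", exc))
        else:
            q.put(("done", None))

    # Daemon thread: if the consumer stops early, the worker may stay
    # blocked on q.put(), but it will not keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        kind, value = q.get()
        if kind == "done":
            return
        if kind == "error":
            raise value
        yield value
```

If the pre-processed batches are `pyarrow.RecordBatch` objects, the generator could then be wrapped with `pyarrow.RecordBatchReader.from_batches(schema, prefetch(raw_batches, preprocess))` and that reader passed to `adbc_ingest`. For pre-processing that is CPU-bound in pure Python, a separate process with a `multiprocessing.Queue` would likely be needed instead, since `queue.Queue` only works between threads.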
