Re: [PR] feat(csharp/src/Drivers/Apache/Spark): Add prefetch for direct result [arrow-adbc]

via GitHub Thu, 03 Apr 2025 13:50:48 -0700


birschick-bq commented on PR #2666:
URL: https://github.com/apache/arrow-adbc/pull/2666#issuecomment-2776262506


   > @birschick-bq, do I understand correctly that there's no reliable or safe 
way to get this ~40% perf improvement because of limitations in the Thrift 
library? What would we need to do to work around it, open multiple connections 
to the server?
   
   Yes, the limitation is in the Thrift library. The Transport layer allocates 
and disposes its buffers independently of the higher-level call. By the time a 
FetchNext call awaits its results, the buffer it expects to read may already 
have been destroyed by the next interleaved call.
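   
   The hazard can be illustrated with a deliberately simplified model. This is 
a sketch under stated assumptions, not Thrift's actual transport code: 
`SharedBufferTransport`, `BeginCall`, and `ReadResultAsync` are hypothetical 
names invented here, and a single byte stands in for a response buffer.
   
   ```csharp
   using System;
   using System.Threading.Tasks;

   // Hypothetical stand-in for a transport that owns ONE buffer for all calls.
   // Not the Thrift API; it only models the ownership problem.
   sealed class SharedBufferTransport
   {
       private byte[] _buffer = Array.Empty<byte>();

       // Starting a call replaces the buffer, discarding the previous call's.
       public void BeginCall(byte callId) => _buffer = new[] { callId };

       // Reading returns whatever buffer the transport holds right now.
       public async Task<byte> ReadResultAsync()
       {
           await Task.Yield();
           return _buffer[0];
       }
   }

   static class Demo
   {
       public static async Task<(byte first, byte second)> InterleaveAsync()
       {
           var transport = new SharedBufferTransport();
           transport.BeginCall(1); // call 1 starts
           transport.BeginCall(2); // prefetch: call 2 starts before call 1 is read
           byte first = await transport.ReadResultAsync();  // call 1 reads late
           byte second = await transport.ReadResultAsync();
           return (first, second);
       }

       public static async Task Main()
       {
           var (first, second) = await InterleaveAsync();
           Console.WriteLine($"call 1 read {first}, call 2 read {second}");
       }
   }
   ```
   
   In this model both reads observe the second call's data, because the first 
call's buffer was replaced before it was read. With real buffers that are 
disposed rather than replaced, the first read could instead fail outright.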
   
   My idea for solving this is to have the buffers allocated and disposed at 
the Client layer and passed down to the Transport layer. They would be created 
before the `send_` call and disposed after the high-level `recv_` call.
   
   For example, currently the generated code for 
[FetchResult](https://github.com/apache/arrow-adbc/blob/0a9d8c1e90afa5bb9a510c081f6c8500d4ff797d/csharp/src/Drivers/Apache/Thrift/Service/Rpc/Thrift/TCLIService.cs#L789)
 would change to look something like this ...
   
   ```csharp
   public async Task<TFetchResultsResp> FetchResults(TFetchResultsReq @req, CancellationToken cancellationToken = default)
   {
     using TransportBuffer inputBuffer = this.InputProtocol.Transport.AllocateBuffer();
     await send_FetchResults(@req, buffer: inputBuffer, cancellationToken);
     using TransportBuffer outputBuffer = this.OutputProtocol.Transport.AllocateBuffer();
     return await recv_FetchResults(buffer: outputBuffer, cancellationToken);
   }
   ```
   
   The challenge is changing all the code along the way to pass the buffers 
through the layers.



Re: [PR] feat(csharp/src/Drivers/Apache/Spark): Add prefetch for direct result [arrow-adbc]