[GitHub] [arrow] edponce commented on pull request #13857: ARROW-9773: [C++] Implement Take kernel for ChunkedArray

GitBox Fri, 23 Sep 2022 19:59:45 -0700


edponce commented on PR #13857:
URL: https://github.com/apache/arrow/pull/13857#issuecomment-1256841764


   From the results above, what information do we have before performing the 
Take operation that would let us select the appropriate strategy?
   * The main factor driving the differences is the index access order 
(random vs monotonic). I do not think we can identify a priori whether the 
take indices are monotonic or random. If we could, the choice of strategy 
would be clear. Please correct me if I'm wrong here.
   * The number of chunks and their sizes, which we can get from the chunked 
array.
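   
   On the first point, the order is not available from metadata, but it could 
be detected with a single pass over the indices. Below is a minimal sketch, 
assuming the indices are available as a plain `std::vector<int64_t>`; the 
function name `IsNonDecreasing` is hypothetical and not part of Arrow. The scan 
is O(n) in the number of indices, and whether that cost pays for itself has not 
been measured here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): returns true when every index is
// greater than or equal to its predecessor, i.e. the sequence is non-decreasing.
// Empty and single-element inputs are treated as monotonic.
bool IsNonDecreasing(const std::vector<int64_t>& indices) {
  for (std::size_t i = 1; i < indices.size(); ++i) {
    if (indices[i] < indices[i - 1]) {
      return false;  // Found a step backwards: treat the order as random.
    }
  }
  return true;
}
```

   The loop exits at the first out-of-order pair, so random indices are usually 
rejected after a few elements; only monotonic input pays for the full scan.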
   
   Now let's try to summarize, in a very hand-wavy way, some observations 
based on logical array size.
   
   **Random order**
   * 1K --> concat is ~2x faster
   * 10K --> concat is ~4x faster
   * 100K and 1M --> concat is ~1.5x faster
   
   **Monotonic order** 
   * 1K and 10K --> concat is significantly faster for up to tens of chunks; 
ChunkResolver is faster for 100 and 1K chunks
   * 100K and 1M --> ChunkResolver is ~1.5x faster
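   
   These observations could be folded into a selection heuristic along the 
following lines. This is only a sketch: `TakeStrategy` and `ChooseTakeStrategy` 
are hypothetical names, not Arrow APIs. The cutoffs (100K logical length, 100 
chunks) are taken from the measured points above, and the behavior between 
measured points (for example, between 10K and 100K elements, or between tens 
and 100 chunks) is an assumption.

```cpp
#include <cstdint>

// Hypothetical names for the two strategies compared above.
enum class TakeStrategy { kConcatenate, kChunkResolver };

// Sketch of a selector based on the rough observations above.
// The thresholds are assumptions drawn from the benchmarked sizes only.
TakeStrategy ChooseTakeStrategy(bool indices_monotonic, int64_t logical_length,
                                int64_t num_chunks) {
  // Random order: concat was faster at every measured size.
  if (!indices_monotonic) {
    return TakeStrategy::kConcatenate;
  }
  // Monotonic order, large arrays (100K and 1M): ChunkResolver was ~1.5x faster.
  if (logical_length >= 100000) {
    return TakeStrategy::kChunkResolver;
  }
  // Monotonic order, small arrays (1K and 10K): ChunkResolver only won with
  // many chunks (100 and 1K); concat won with up to tens of chunks.
  return num_chunks >= 100 ? TakeStrategy::kChunkResolver
                           : TakeStrategy::kConcatenate;
}
```

   For example, `ChooseTakeStrategy(true, 1000000, 10)` selects ChunkResolver, 
while `ChooseTakeStrategy(false, 1000000, 10)` selects concat.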


