edponce commented on PR #13857: URL: https://github.com/apache/arrow/pull/13857#issuecomment-1256841764
From the results above, what information do we have before performing the Take operation that could let us select the appropriate strategy?

* The main factor driving the differences is the order in which the indices are accessed (random vs. monotonic). I do not think we can determine a priori whether the take indices are monotonic or random. If we could, selecting a strategy would be straightforward. Please correct me if I'm wrong here.
* The number of chunks and their sizes, which we can get from the chunked array.

Now let's try to summarize, very roughly, some observations based on the logical array size.

**Random order**

* 1K --> concat is ~2x faster
* 10K --> concat is ~4x faster
* 100K and 1M --> concat is ~1.5x faster

**Monotonic order**

* 1K and 10K --> concat is significantly faster for up to tens of chunks; ChunkResolver is faster for 100 and 1K chunks
* 100K and 1M --> ChunkResolver is ~1.5x faster
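One way to read the summary above is as a small decision rule over the inputs known before the Take runs. The sketch below encodes that reading. It is illustrative only: `TakeStrategy`, `choose_take_strategy`, `LARGE_LENGTH` and `MANY_CHUNKS` are hypothetical names, not Arrow APIs. The cut-offs are assumptions read loosely off the benchmark numbers (nothing between 10K and 100K, or between tens of chunks and 100, was measured). The `indices_known_monotonic` flag assumes the caller somehow has that information, which, as noted above, is doubtful a priori.

```python
from enum import Enum


class TakeStrategy(Enum):
    # Hypothetical names for illustration; not an Arrow API.
    CONCATENATE = "concatenate"
    CHUNK_RESOLVER = "chunk_resolver"


# Assumed cut-offs, read loosely off the benchmark summary (not tuned values).
LARGE_LENGTH = 100_000  # the "100K and 1M" rows
MANY_CHUNKS = 100       # ChunkResolver won at 100 and 1K chunks


def choose_take_strategy(logical_length: int, num_chunks: int,
                         indices_known_monotonic: bool) -> TakeStrategy:
    """Pick a Take strategy from what is known before the Take runs."""
    # Random or unknown index order: concatenation won every random-order
    # case, so this sketch treats it as the default.
    if not indices_known_monotonic:
        return TakeStrategy.CONCATENATE
    # Monotonic indices on large arrays: ChunkResolver was ~1.5x faster.
    if logical_length >= LARGE_LENGTH:
        return TakeStrategy.CHUNK_RESOLVER
    # Monotonic indices on small arrays: concatenation wins for up to tens
    # of chunks, ChunkResolver wins once there are 100 or more.
    if num_chunks >= MANY_CHUNKS:
        return TakeStrategy.CHUNK_RESOLVER
    return TakeStrategy.CONCATENATE
```

For example, `choose_take_strategy(10_000, 1_000, True)` selects `CHUNK_RESOLVER`, while the same call with `indices_known_monotonic=False` falls back to `CONCATENATE`. Falling back to concatenation when the order is unknown reflects that concat won by 1.5x to 4x in the random case, while ChunkResolver's measured wins required monotonic indices.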
