wjones127 commented on PR #13857: URL: https://github.com/apache/arrow/pull/13857#issuecomment-1251448976
I started by working on the Take implementation for primitive values, so I could familiarize myself with how the Take kernels work. But based on the benchmark I added, it seems like I actually made performance much worse (except in the monotonic case)!

I suspect this is because having to use `ChunkResolver` for every index is more expensive than just copying the values into a contiguous array. Does that sound reasonable? Or is there something obviously wrong?

<details><summary> Benchmark results </summary>

Baseline:

```
--------------------------------------------------------------------------------------------------------------
Benchmark                                                    Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------
TakeChunkedInt64RandomIndicesNoNulls/4194304/1000     19362199 ns     19358250 ns           36 items_per_second=216.668M/s null_percent=0.1 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/10       19504339 ns     19494278 ns           36 items_per_second=215.156M/s null_percent=10 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/2        34162071 ns     34146150 ns           20 items_per_second=122.834M/s null_percent=50 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/1        10458465 ns     10455803 ns           66 items_per_second=401.146M/s null_percent=100 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/0        12260952 ns     12258093 ns           54 items_per_second=342.166M/s null_percent=0 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/1000   19419778 ns     19412389 ns           36 items_per_second=216.063M/s null_percent=0.1 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/10     29953237 ns     29944261 ns           23 items_per_second=140.07M/s null_percent=10 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/2      51350571 ns     51330500 ns           14 items_per_second=81.7117M/s null_percent=50 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/1       3319791 ns      3318972 ns          214 items_per_second=1.26374G/s null_percent=100 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/0      12277404 ns     12275145 ns           55 items_per_second=341.691M/s null_percent=0 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/1000         24581060 ns     24574690 ns           29 items_per_second=170.676M/s null_percent=0.1 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/10           22506711 ns     22501129 ns           31 items_per_second=186.404M/s null_percent=10 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/2            20736080 ns     20730853 ns           34 items_per_second=202.322M/s null_percent=50 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/1            16202271 ns     16196349 ns           43 items_per_second=258.966M/s null_percent=100 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/0            15727504 ns     15721614 ns           44 items_per_second=266.786M/s null_percent=0 size=4.1943M
```

Proposed:

```
--------------------------------------------------------------------------------------------------------------
Benchmark                                                    Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------
TakeChunkedInt64RandomIndicesNoNulls/4194304/1000    142831500 ns    142791200 ns            5 items_per_second=29.3737M/s null_percent=0.1 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/10      144134633 ns    144110400 ns            5 items_per_second=29.1048M/s null_percent=10 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/2       125704833 ns    125667167 ns            6 items_per_second=33.3763M/s null_percent=50 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/1        84408114 ns     84386875 ns            8 items_per_second=49.7033M/s null_percent=100 size=4.1943M
TakeChunkedInt64RandomIndicesNoNulls/4194304/0        88094063 ns     88072375 ns            8 items_per_second=47.6234M/s null_percent=0 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/1000  111903111 ns    111859500 ns            6 items_per_second=37.4962M/s null_percent=0.1 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/10    113359923 ns    113286667 ns            6 items_per_second=37.0238M/s null_percent=10 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/2      95110995 ns     95098625 ns            8 items_per_second=44.1048M/s null_percent=50 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/1       1613900 ns      1613515 ns          437 items_per_second=2.59948G/s null_percent=100 size=4.1943M
TakeChunkedInt64RandomIndicesWithNulls/4194304/0      88383021 ns     88365750 ns            8 items_per_second=47.4653M/s null_percent=0 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/1000         23783853 ns     23776276 ns           29 items_per_second=176.407M/s null_percent=0.1 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/10           24145126 ns     24140310 ns           29 items_per_second=173.747M/s null_percent=10 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/2            23058231 ns     23046233 ns           30 items_per_second=181.995M/s null_percent=50 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/1            16306472 ns     16301465 ns           43 items_per_second=257.296M/s null_percent=100 size=4.1943M
TakeChunkedInt64MonotonicIndices/4194304/0            15400245 ns     15398652 ns           46 items_per_second=272.381M/s null_percent=0 size=4.1943M
```

</details>
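
To make the hypothesis above concrete, here is a minimal sketch of the two strategies being compared. It is not Arrow's code: plain `std::vector`s stand in for chunked arrays, a binary search over chunk offsets stands in for what `ChunkResolver` does per index, and the names `ChunkOffsets`, `TakeByResolving`, and `TakeByConcatenating` are hypothetical. Nulls are ignored and indices are assumed to be in bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a chunked Int64 array (assumption: no nulls, no Arrow types).
using Chunks = std::vector<std::vector<int64_t>>;

// offsets[i] is the logical index of the first element of chunk i;
// offsets.back() is the total logical length.
std::vector<int64_t> ChunkOffsets(const Chunks& chunks) {
  std::vector<int64_t> offsets{0};
  for (const auto& chunk : chunks) {
    offsets.push_back(offsets.back() + static_cast<int64_t>(chunk.size()));
  }
  return offsets;
}

// Strategy A: resolve every index to (chunk, position) with a binary search.
// Costs O(log num_chunks) per index and reads from scattered buffers.
std::vector<int64_t> TakeByResolving(const Chunks& chunks,
                                     const std::vector<int64_t>& indices) {
  const std::vector<int64_t> offsets = ChunkOffsets(chunks);
  std::vector<int64_t> out;
  out.reserve(indices.size());
  for (int64_t idx : indices) {
    assert(idx >= 0 && idx < offsets.back());
    // First offset strictly greater than idx; the owning chunk is the one before.
    auto it = std::upper_bound(offsets.begin(), offsets.end(), idx);
    const auto chunk = static_cast<size_t>(it - offsets.begin()) - 1;
    out.push_back(chunks[chunk][static_cast<size_t>(idx - offsets[chunk])]);
  }
  return out;
}

// Strategy B: copy all chunks into one contiguous buffer once, then index
// directly. Costs one O(n) copy up front and O(1) per index afterwards.
std::vector<int64_t> TakeByConcatenating(const Chunks& chunks,
                                         const std::vector<int64_t>& indices) {
  std::vector<int64_t> flat;
  for (const auto& chunk : chunks) {
    flat.insert(flat.end(), chunk.begin(), chunk.end());
  }
  std::vector<int64_t> out;
  out.reserve(indices.size());
  for (int64_t idx : indices) {
    assert(idx >= 0 && idx < static_cast<int64_t>(flat.size()));
    out.push_back(flat[static_cast<size_t>(idx)]);
  }
  return out;
}
```

Both functions return the same values; they differ only in where the cost lands. With random indices, strategy A pays the search on every lookup, while with monotonic indices consecutive lookups tend to fall in the same chunk, which would be consistent with the monotonic benchmarks being roughly unchanged.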
