uchenily commented on PR #45918: URL: https://github.com/apache/arrow/pull/45918#issuecomment-2750256709
I ran a `hashjoin + hash aggr` test (join type: RIGHT_OUTER, no key matches). With the input batch size set to `1 << 15`, the `probe * build (4096 * 512)` scenario took only 17.8s (including data generation time), whereas the original serial approach took 471.8s.

Note that for this test I changed `kNumRowsPerScanTask` to `4 * 1024`. With the original value of `512 * 1024`, performance remained poor; the test ran so long that I could not measure its runtime. In other words, `kNumRowsPerScanTask` also has a significant impact on the results. However, since I could not determine a more reasonable value for this parameter, I did not modify `kNumRowsPerScanTask` in this PR.
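
A rough sketch of why `kNumRowsPerScanTask` might matter so much here. It rests on two assumptions that the comment does not state: that the scan phase is split into tasks of at most `kNumRowsPerScanTask` rows each, and that the build side holds 512 batches of `1 << 15` rows. The names `NumScanTasks`, `kRowsPerBatch`, and `kBuildRows` are hypothetical and are not Arrow APIs.

```cpp
#include <cstdint>

// Hypothetical helper: number of tasks needed to cover `num_rows` rows when
// each task handles at most `rows_per_task` rows (ceiling division).
constexpr int64_t NumScanTasks(int64_t num_rows, int64_t rows_per_task) {
  return (num_rows + rows_per_task - 1) / rows_per_task;
}

// Assumed reading of the test setup: 512 build-side batches of 1 << 15 rows.
constexpr int64_t kRowsPerBatch = int64_t{1} << 15;
constexpr int64_t kBuildRows = 512 * kRowsPerBatch;  // 16,777,216 rows

// Original value of kNumRowsPerScanTask: very few tasks to distribute.
static_assert(NumScanTasks(kBuildRows, 512 * 1024) == 32,
              "512 * 1024 rows per task gives 32 tasks");

// Value used in the test: many more, much smaller tasks.
static_assert(NumScanTasks(kBuildRows, 4 * 1024) == 4096,
              "4 * 1024 rows per task gives 4096 tasks");
```

Under these assumptions, the original value leaves only 32 units of work to spread across threads, while `4 * 1024` yields 4096. Whether task granularity is the actual cause of the slowdown would need to be confirmed by profiling.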
