[GitHub] [arrow-datafusion] ozankabak commented on pull request #7180: Top-K eager batch sorting

via GitHub Tue, 08 Aug 2023 06:44:15 -0700


ozankabak commented on PR #7180:
URL: https://github.com/apache/arrow-datafusion/pull/7180#issuecomment-1669647841


   > I'd like to emphasize that there are still regressions with this approach. 
In fact, for larger files (> 1GB) with K in the 1000-8000 range, the runtime 
seems to take the biggest hit, with probably negligible memory improvements (if 
any). Anecdotally, the original file I've been testing now shows a 
considerable speedup, but that is perhaps not a typical file size 
(146M). So it's a mixed bag really, and I'm not sure it's best for this to be 
merged as is.
   
   I agree. A lot of us are thinking about solving the "top K" problem these 
days, and I feel we should be able to find a solution that achieves the desired 
results without the regressions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
