ozankabak commented on PR #7180: URL: https://github.com/apache/arrow-datafusion/pull/7180#issuecomment-1669647841
> I'd like to emphasize that there are still regressions with this approach. In fact, in the case of larger files (> 1GB) with K in the 1000-8000 range, the runtime seems to be hit the most, with probably negligible memory improvements (if any). Anecdotally, the original file I've been testing does now show considerable speedup though, but that is perhaps not a typical file size (146M). So it's a mixed bag really, and I'm not sure it's best for this to be merged as is.

I agree. A lot of us are thinking about solving the "top K" problem these days, and I feel we should be able to find a solution that achieves the desired results without the regressions.
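For readers less familiar with the "top K" problem mentioned above, here is a minimal sketch of one common baseline: a bounded binary heap that keeps only the K best rows seen so far. This is an assumption for illustration only. It is not the approach taken in this PR and not a DataFusion API, and `top_k_smallest` is a hypothetical helper. The baseline does about O(n log K) work and holds O(K) values in memory, which is one reason the value of K (for example 1000-8000) can matter for both runtime and memory.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper (not DataFusion's API): returns the `k` smallest
/// values of `input` in ascending order, using a bounded max-heap.
fn top_k_smallest(input: &[i64], k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    // Max-heap holding at most k values. Its root is the largest of the
    // k smallest values seen so far, i.e. the threshold a new value must beat.
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k);
    for &v in input {
        if heap.len() < k {
            heap.push(v);
        } else if let Some(mut top) = heap.peek_mut() {
            // Replace the current maximum only if `v` is smaller; PeekMut
            // restores the heap order when it is dropped.
            if v < *top {
                *top = v;
            }
        }
    }
    // into_sorted_vec yields the retained values in ascending order.
    heap.into_sorted_vec()
}

fn main() {
    let data = [42, 7, 19, 3, 88, 7, 61, 1];
    // Prints [1, 3, 7]
    println!("{:?}", top_k_smallest(&data, 3));
}
```

A real operator would compare whole rows by the sort keys rather than single integers, but the heap mechanics are the same.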
