2010YOUY01 commented on issue #17597:
URL: https://github.com/apache/datafusion/issues/17597#issuecomment-3302602419
This looks like a memory leak, and it would be great to fix.

BTW, regarding `limit 20000` being slower than limit: I think the TopK operator's heap implementation will inevitably be slower than `SortExec` + `LimitExec` when k is large, due to constant factors in the implementation. IIRC, the planner currently always opts for the `TopK` path when there is a `LIMIT`. It would be better to use a heuristic to make this decision (for very large k, choose `SortExec` + `LimitExec` instead of `TopK`), and maybe also add an option to disable `TopK`.
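The following is a minimal Rust sketch of the trade-off. It assumes a single in-memory column of `i64` values rather than DataFusion's Arrow record batches. `top_k_heap` illustrates the bounded-heap idea behind a TopK operator, and `sort_then_limit` illustrates a full sort followed by a limit. `choose_strategy`, its `enable_topk` switch and its `max_topk_k` threshold are hypothetical. They are not an existing DataFusion API or configuration option.

```rust
use std::collections::BinaryHeap;

/// Hypothetical plan choice; not DataFusion's actual planner types.
#[derive(Debug, PartialEq)]
enum LimitStrategy {
    TopK,
    SortThenLimit,
}

/// Hypothetical heuristic: use TopK only when it is enabled and k is "small".
/// `max_topk_k` is an illustrative threshold, not a real DataFusion setting.
fn choose_strategy(k: usize, enable_topk: bool, max_topk_k: usize) -> LimitStrategy {
    if enable_topk && k <= max_topk_k {
        LimitStrategy::TopK
    } else {
        LimitStrategy::SortThenLimit
    }
}

/// Heap-based top-k: returns the k smallest values in ascending order,
/// keeping a max-heap of at most k elements (O(n log k) work).
fn top_k_heap(values: &[i64], k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    let mut heap = BinaryHeap::with_capacity(k);
    for &v in values {
        if heap.len() < k {
            heap.push(v);
        } else if let Some(&max) = heap.peek() {
            // Replace the current largest kept value if v is smaller.
            if v < max {
                heap.pop();
                heap.push(v);
            }
        }
    }
    heap.into_sorted_vec() // ascending order
}

/// Full sort followed by truncation, analogous to SortExec + LimitExec.
fn sort_then_limit(values: &[i64], k: usize) -> Vec<i64> {
    let mut v = values.to_vec();
    v.sort_unstable();
    v.truncate(k);
    v
}

fn main() {
    // Synthetic, deterministically scrambled input values.
    let data: Vec<i64> = (0..100_000i64).map(|i| (i * 7919) % 100_003).collect();
    for k in [20, 20_000] {
        // Both strategies must return the same k smallest values.
        assert_eq!(top_k_heap(&data, k), sort_then_limit(&data, k));
        println!("k = {k}: {:?}", choose_strategy(k, true, 10_000));
    }
}
```

The heap version avoids sorting all rows, but it pays heap-maintenance overhead on every row it keeps. As k approaches the input size, that overhead can outweigh a single well-optimized sort, which is the motivation for a size-based cutoff. The threshold of 10,000 used here is purely illustrative.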
