Alexander Ocsa created ARROW-14202:
--------------------------------------

             Summary: A more RAM-efficient top-k sink node
                 Key: ARROW-14202
                 URL: https://issues.apache.org/jira/browse/ARROW-14202
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 7.0.0
            Reporter: Alexander Ocsa

Mentioned here: https://github.com/apache/arrow/pull/11274#pullrequestreview-768267959

For example, a top-k implementation could periodically (whenever batches_ reaches some configurable number of rows) run through the buffered batches and discard the rows that can no longer be part of the top k. The way it is written now, it still requires buffering the entire dataset in memory (and/or spilling over).
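
The periodic-discard idea can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the sink node from the pull request: rows are reduced to plain int64 sort keys instead of Arrow batches, and the names BoundedTopK, compact_threshold, Consume and Finish are hypothetical. Once the buffer grows past the threshold, std::nth_element moves the k largest keys to the front and everything else is dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a top-k sink. Rows are plain int64 sort keys
// rather than Arrow batches; none of these names come from the Arrow codebase.
class BoundedTopK {
 public:
  BoundedTopK(std::size_t k, std::size_t compact_threshold)
      : k_(k), compact_threshold_(compact_threshold) {
    assert(k_ > 0 && compact_threshold_ >= k_);
  }

  // Buffer one incoming batch. When the buffer exceeds the threshold,
  // discard every row that can no longer be among the k largest.
  void Consume(const std::vector<std::int64_t>& batch) {
    rows_.insert(rows_.end(), batch.begin(), batch.end());
    if (rows_.size() > compact_threshold_) Compact();
  }

  // Return the k largest keys seen so far, in descending order.
  std::vector<std::int64_t> Finish() {
    Compact();
    std::sort(rows_.begin(), rows_.end(), std::greater<std::int64_t>());
    return rows_;
  }

  // Number of rows currently held in memory.
  std::size_t buffered_rows() const { return rows_.size(); }

 private:
  // Keep only the k largest buffered keys (in unspecified order).
  void Compact() {
    if (rows_.size() <= k_) return;
    auto nth = rows_.begin() + static_cast<std::ptrdiff_t>(k_);
    std::nth_element(rows_.begin(), nth, rows_.end(),
                     std::greater<std::int64_t>());
    rows_.resize(k_);
  }

  std::size_t k_;
  std::size_t compact_threshold_;
  std::vector<std::int64_t> rows_;
};
```

With this shape, the buffer never holds more than compact_threshold rows plus one incoming batch, rather than the whole dataset. A real sink node would additionally have to handle multi-column sort keys, null ordering, and slicing of Arrow batches, all of which this sketch leaves out.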