Re: [I] Further refine the Top K sort operator [datafusion]

via GitHub Mon, 24 Nov 2025 05:04:41 -0800


bharath-techie commented on issue #9417:
URL: https://github.com/apache/datafusion/issues/9417#issuecomment-3570701447


   Hi @alamb @Dandandan,
   We've been testing DataFusion on lower-RAM instances such as r6g.xlarge. We 
have allocated 10 GB to the DataFusion memory pool [greedy], and a bunch of ClickBench 
queries that get optimized via TopK fail with memory errors.
   
   For example, for the following query:
   ```
   SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, 
SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10;
   ```
   
   Partitions: 
   4 [equal to the number of cores] 
   
   Number of ClickBench partitions:
   8 [data split into 8 equal parts] 
   
   Plan: 
   
   ```
   [DEBUG] After: ProjectionExec: expr=[count(Int64(1))@0 as count(), UserID@1 
as UserID, SearchPhrase@2 as SearchPhrase]
     SortExec: TopK(fetch=100), expr=[count(Int64(1))@0 DESC NULLS LAST], 
preserve_partitioning=[false]
       ProjectionExec: expr=[count(Int64(1))@2 as count(Int64(1)), UserID@0 as 
UserID, SearchPhrase@1 as SearchPhrase]
         AggregateExec: mode=Single, gby=[UserID@0 as UserID, SearchPhrase@1 as 
SearchPhrase], aggr=[count(Int64(1))]
           ProjectionExec: expr=[UserID@1 as UserID, SearchPhrase@0 as 
SearchPhrase]
             CoalesceBatchesExec: target_batch_size=1024
               FilterExec: SearchPhrase@0 != 
                 DataSourceExec: file_groups={1 group: [[Users/... 
_parquet_file_merged_860.parquet:0..1919347491]]}, projection=[SearchPhrase, 
UserID], file_type=parquet, predicate=SearchPhrase@0 != , 
pruning_predicate=SearchPhrase_null_count@2 != row_count@3 AND 
(SearchPhrase_min@0 !=  OR  != SearchPhrase_max@1), 
required_guarantees=[SearchPhrase not in ()]
   
   ```
   Exception:
   
   ```
   Caused by: java.lang.RuntimeException: Resources exhausted: Additional 
allocation failed with top memory consumers (across reservations) as:
     TopK[0]#31330(can spill: false) consumed 3.4 GB, peak 3.4 GB,
     TopK[0]#31326(can spill: false) consumed 2.9 GB, peak 2.9 GB,
     GroupedHashAggregateStream[0] (count(1))#31327(can spill: true) consumed 
432.1 MB, peak 432.1 MB,
     GroupedHashAggregateStream[0] (count(1))#31335(can spill: true) consumed 
222.6 MB, peak 222.6 MB,
     GroupedHashAggregateStream[0] (count(1))#31339(can spill: true) consumed 
210.2 MB, peak 210.2 MB.
   Error: Failed to allocate additional 366.2 MB for TopK[0] with 2.9 GB 
already allocated for this reservation - 70.4 MB remain available for the total 
pool
           at 
org.opensearch.datafusion.RecordBatchStream.lambda$loadNextBatch$1(RecordBatchStream.java:105)
 ~[?:?]
   ```
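   
   A rough tally of the figures in the error above, as a sketch only. It 
assumes the reported sizes use binary units (1 GB = 1024 MB) and that the pool 
is exactly the 10 GB configured; the function names are made up for this 
illustration and are not DataFusion APIs.
   
   ```rust
   /// Convert a reported GB figure to MB, assuming binary units (1 GB = 1024 MB).
   fn gb_to_mb(gb: f64) -> f64 {
       gb * 1024.0
   }

   /// The five consumers listed in the error, in MB, with the two TopK
   /// reservations first and the three GroupedHashAggregateStream ones after.
   fn reported_consumers_mb() -> Vec<f64> {
       vec![gb_to_mb(3.4), gb_to_mb(2.9), 432.1, 222.6, 210.2]
   }

   /// Returns (memory held by the listed consumers, memory in use but not listed).
   fn tally_mb(pool_mb: f64, remaining_mb: f64, consumers_mb: &[f64]) -> (f64, f64) {
       let listed: f64 = consumers_mb.iter().sum();
       let in_use = pool_mb - remaining_mb;
       (listed, in_use - listed)
   }

   /// True if the requested allocation fits in what the pool has left.
   fn request_fits(requested_mb: f64, remaining_mb: f64) -> bool {
       requested_mb <= remaining_mb
   }

   fn main() {
       let pool_mb = gb_to_mb(10.0); // 10 GB pool, as configured
       let remaining_mb = 70.4; // "70.4 MB remain available"
       let requested_mb = 366.2; // "Failed to allocate additional 366.2 MB"

       let consumers = reported_consumers_mb();
       let (listed, unlisted) = tally_mb(pool_mb, remaining_mb, &consumers);
       let topk_mb = consumers[0] + consumers[1];

       println!(
           "TopK reservations: {:.1} MB ({:.0}% of pool)",
           topk_mb,
           100.0 * topk_mb / pool_mb
       );
       println!("Listed consumers:  {:.1} MB", listed);
       println!("Not listed:        {:.1} MB", unlisted);
       println!("Request fits:      {}", request_fits(requested_mb, remaining_mb));
   }
   ```
   
   Under these assumptions, the two non-spillable TopK reservations hold about 
6.3 GB, roughly 63% of the pool, and the 366.2 MB request cannot fit in the 
70.4 MB that remains. The reported figures are rounded, so the numbers are 
approximate.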
   
   Tried with both `Batch size: 8192` and `Batch size: 1024`.
   
   
   I did go through https://github.com/apache/datafusion/issues/9562, which 
seems complex. 
   
   Just wanted to get your opinions on how to tackle this problem and whether 
there are any issues already looking into this.
   
   Please suggest if there are any config / planning changes that could help too. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

