andygrove commented on PR #3536:
URL: https://github.com/apache/datafusion-comet/pull/3536#issuecomment-3910404829

   I ran the queries individually and compared memory usage between main and 
this PR.
   
   Key findings from Claude analysis of the results:
   
   
   1. The memory shift is not consistent; it is highly query-dependent. Some 
queries see a decrease in off-heap memory (Q4, Q10, Q11), while others see 
large increases (Q7, Q12, Q13). There is no single directional trend.
   2. Off-heap and JVM heap sometimes move inversely. Q11 is the clearest 
example: off-heap dropped 56.4% while JVM heap increased 127%. Q10 shows the 
same pattern (off-heap -72.7%, heap +36.5%). DataFusion 52 appears to shift 
work between native and JVM memory for certain query shapes.
   3. Join-heavy queries are most affected. The queries with the largest memory 
changes (Q4, Q7, Q10, Q11, Q12, Q13, Q21) all involve complex joins, correlated 
subqueries, or GROUP BY with HAVING. Simpler scan-and-aggregate queries (Q1, 
Q6) are stable. This points to changes in DataFusion 52's hash join and 
aggregate memory management.
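
For anyone repeating this comparison, the per-query check behind findings 1 and 2 can be sketched roughly as below. This is a minimal illustration, not the script used for the analysis: the helper names `pct_change` and `moves_inversely` and the 10% noise threshold are assumptions, and the only figures used are the Q10 and Q11 percentages quoted above.

```python
def pct_change(baseline: float, candidate: float) -> float:
    """Percent change from the baseline (main) to the candidate (this PR)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def moves_inversely(off_heap_pct: float, heap_pct: float,
                    threshold: float = 10.0) -> bool:
    """True when off-heap and JVM heap change in opposite directions.

    Both changes must be at least `threshold` percent in magnitude
    (an assumed cutoff to ignore run-to-run noise).
    """
    both_significant = (abs(off_heap_pct) >= threshold
                        and abs(heap_pct) >= threshold)
    opposite_sign = (off_heap_pct > 0) != (heap_pct > 0)
    return both_significant and opposite_sign


# (off-heap % change, JVM heap % change) as quoted in finding 2.
reported = {
    "Q10": (-72.7, 36.5),
    "Q11": (-56.4, 127.0),
}

inverse_queries = [q for q, (off_heap, heap) in reported.items()
                   if moves_inversely(off_heap, heap)]
print(inverse_queries)
```

Given raw peak-memory measurements for main and the PR, `pct_change` would produce the percentages first, and `moves_inversely` would then flag the queries worth a closer look.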
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

