lwz9103 opened a new issue, #8216:
URL: https://github.com/apache/incubator-gluten/issues/8216

   ### Backend
   
   CH (ClickHouse)
   
   ### Bug description
   
   ```
   set spark.sql.autoBroadcastJoinThreshold=-1;
   set spark.sql.shuffle.partitions=100;
   
   create table test_join(a int, b int, c int) using parquet;
   
   select * from 
      (select a from test_join group by a order by a),
      (select b from test_join group by b order by b),
      (select c from test_join group by c order by c)
   limit 10000;
   ```
   
   
![image](https://github.com/user-attachments/assets/f13bbc9a-94fe-4265-8e69-9a7448615623)
   
   With `spark.sql.shuffle.partitions=100`, CartesianColumnarBatchRDD outputs 100 * 100 * 100 = 1000000 partitions, so the subsequent ShuffleMapStage creates 1000000 tasks.
   
   But when we use vanilla Spark, CartesianRDD outputs only 1 partition.
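   The partition blow-up can be sketched with simple arithmetic. This is a hypothetical helper (not Gluten code), assuming each cartesian product multiplies the partition counts of its two inputs, which is how Spark's `CartesianRDD.getPartitions` behaves when the products are chained without an intermediate coalesce:
   
   ```python
   # Sketch: partition count of chained cartesian products, assuming each
   # product emits one partition per (left, right) partition pair.
   def cartesian_partitions(*inputs: int) -> int:
       result = 1
       for p in inputs:
           result *= p
       return result
   
   # Three subqueries, each shuffled into spark.sql.shuffle.partitions = 100
   # partitions, joined pairwise:
   print(cartesian_partitions(100, 100, 100))  # 1000000
   ```
   
   Every one of those partitions becomes a task in the next ShuffleMapStage, which is where the 1000000-task explosion reported above comes from.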
   
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
