wenwj0 opened a new issue, #10010:
URL: https://github.com/apache/incubator-gluten/issues/10010
### Backend
VL (Velox)
### Bug description
Running SQL like the query below fails with an OOM error.
```sql
select 20250530 ,key_id2, count(distinct key_id3)
from
(
select *
from xxxtable1
where dt between '20180101' and '20250530'
) a
left join
(
select *
from xxxtable2
where ds between '20180101' and '20250530'
) b on lower(a.key_id1)=lower(b.key_id1) and a.key_id2=b.key_id2
group by key_id2
```
The error message is:
`ExecutorLostFailure (executor 43 exited caused by one of the running tasks)
Reason: Container killed by YARN for exceeding physical memory limits. 6.0 GB
of 6 GB physical memory used. Consider boosting spark.executor.memoryOverhead.`
The scan size of these two tables is about 1 TB. I tried both shuffled hash
join and sort-merge join, and both failed. The same SQL runs successfully in
vanilla Spark.
The failing stage is the join, so I suspect something is wrong with spill.
<img width="726" alt="Image"
src="https://github.com/user-attachments/assets/4fd8c52a-b2a3-4355-bd8b-27e7e38a7e39"
/>
### Gluten version
Gluten-1.3
### Spark version
Spark-3.2.x
### Spark configurations
spark.memory.offHeap.enabled=true;
spark.memory.offHeap.size=3g;
spark.yarn.executor.memoryOverhead=2g;
spark.executor.memory=1g;
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager;
spark.sql.shuffle.partitions=500;
spark.io.compression.codec=zstd;
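
For reference, the 6 GB limit in the error message appears to be the sum of the three memory settings above. The following is a minimal sketch of that arithmetic, assuming Spark 3.x on YARN sizes the executor container as heap + memory overhead + off-heap size (PySpark memory is ignored because it is not set here). `parse_size` is a hypothetical helper written for this illustration, not a Spark or Gluten API.

```python
# Sketch: estimate the YARN container size requested per executor.
# Assumption: container = executor heap + memoryOverhead + off-heap size.

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Convert a Spark-style size string such as '3g' to bytes (hypothetical helper)."""
    value = value.strip().lower()
    return int(value[:-1]) * UNITS[value[-1]]


# Values copied from the configuration in this report.
conf = {
    "spark.executor.memory": "1g",
    "spark.yarn.executor.memoryOverhead": "2g",
    "spark.memory.offHeap.size": "3g",
}

container_bytes = sum(parse_size(v) for v in conf.values())
container_gib = container_bytes / UNITS["g"]
offheap_share = parse_size(conf["spark.memory.offHeap.size"]) / container_bytes

print(f"container limit ~ {container_gib:.1f} GiB, off-heap share {offheap_share:.0%}")
```

Under that assumption, any native allocation that is not accounted for in the 3g off-heap pool would have to fit into the 2g overhead, which may be why the container reaches exactly 6 GB.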
### System information
Gluten Version: 1.3.0
Commit: 98546a6d62e889d792d44715d90b1bf92f2e74e3
CMake Version: 3.28.3
System: Linux-4.9.0-14-amd64
Arch: x86_64
CPU Name: Model name: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.5.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.5.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
```bash
```