[ https://issues.apache.org/jira/browse/SPARK-40377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LLiu updated SPARK-40377:
-
Description:
Recently, we encountered some driver OOM failures. Some large tables were
compressed with Snappy and then used in a broadcast join, but their actual
(uncompressed) data volume was too large, which caused the driver to run out
of memory.
The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded to 8GB
and 51200, respectively. We could make these values configurable so that users
can set smaller limits for different scenarios and prevent broadcast joins on
some large tables, avoiding driver OOM.
was:
Recently, we encountered some driver OOM failures. Some tables with large data
volume were compressed with Snappy and then used in a broadcast join, but
their actual (uncompressed) data volume was too large, which caused the driver
to run out of memory.
The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded to 8GB
and 51200, respectively. We could make these values configurable so that users
can set smaller limits for different scenarios and prevent broadcast joins on
tables with large data volumes, avoiding driver OOM.
> Allow customizing maxBroadcastTableBytes and maxBroadcastRows
> ---
>
> Key: SPARK-40377
> URL: https://issues.apache.org/jira/browse/SPARK-40377
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: LLiu
> Priority: Major
> Attachments: 截屏2022-09-07 20.40.06.png, 截屏2022-09-07 20.40.16.png
>
>
> Recently, we encountered some driver OOM failures. Some large tables were
> compressed with Snappy and then used in a broadcast join, but their actual
> (uncompressed) data volume was too large, which caused the driver to run out
> of memory.
> The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded to 8GB
> and 51200, respectively. We could make these values configurable so that users
> can set smaller limits for different scenarios and prevent broadcast joins on
> some large tables, avoiding driver OOM.
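
The request above amounts to replacing two hardcoded limits with
configurable ones. The following is a minimal, library-free Python sketch of
that idea and is not Spark code. The names BroadcastLimits and can_broadcast
are hypothetical. The default values are copied from the description above
and have not been checked against the Spark source.

```python
from dataclasses import dataclass

# Defaults copied from the issue description; not verified against Spark itself.
DEFAULT_MAX_BROADCAST_TABLE_BYTES = 8 * 1024**3  # "8GB"
DEFAULT_MAX_BROADCAST_ROWS = 51200


@dataclass(frozen=True)
class BroadcastLimits:
    """Hypothetical user-configurable limits (illustrative names only)."""
    max_table_bytes: int = DEFAULT_MAX_BROADCAST_TABLE_BYTES
    max_rows: int = DEFAULT_MAX_BROADCAST_ROWS


def can_broadcast(num_rows: int, uncompressed_bytes: int,
                  limits: BroadcastLimits = BroadcastLimits()) -> bool:
    """Return True only if the table fits within both limits.

    The size is the uncompressed size, because that is what would
    occupy driver memory, regardless of how small the Snappy files are.
    """
    return (num_rows <= limits.max_rows
            and uncompressed_bytes <= limits.max_table_bytes)


# A table of 40,000 rows that expands to 6 GiB once decompressed.
table_rows, table_bytes = 40_000, 6 * 1024**3

# Checked against the defaults quoted in the description.
print(can_broadcast(table_rows, table_bytes))

# Checked against a smaller, user-chosen byte limit of 2 GiB.
strict = BroadcastLimits(max_table_bytes=2 * 1024**3)
print(can_broadcast(table_rows, table_bytes, strict))
```

With the defaults the example table passes the check. With the smaller byte
limit it is rejected, which is the behavior the description asks to make
possible through configuration.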