[jira] [Updated] (SPARK-40377) Allow customize maxBroadcastTableBytes and maxBroadcastRows

2022-09-08 Thread LeeeeLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LLiu updated SPARK-40377:
-
Description: 
Recently we have encountered driver OOM problems. Some large tables were 
compressed with Snappy, so their estimated size was small enough for a 
broadcast join to be chosen, but the actual decompressed data volume was too 
large, which caused the driver to OOM.

The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded at 
8GB and 512000000 respectively. We could make these values configurable, 
allowing smaller limits to be set for different scenarios and prohibiting 
broadcast joins for some large tables, to avoid driver OOM.
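As a minimal sketch of the proposal (not Spark's actual API), the broadcast 
guard could read its limits from the session conf instead of hardcoded 
constants. The config keys below are hypothetical names, a plain dict stands 
in for SQLConf, and the defaults mirror the hardcoded limits:

```python
# Hypothetical sketch of configurable broadcast limits. The config keys
# "spark.sql.broadcast.maxTableBytes" / "maxTableRows" are illustrative
# names, not real Spark options.

DEFAULT_MAX_BROADCAST_TABLE_BYTES = 8 * 1024 ** 3   # the hardcoded 8 GB limit
DEFAULT_MAX_BROADCAST_TABLE_ROWS = 512_000_000      # the hardcoded row limit

def broadcast_limits(conf):
    """Read the limits from a conf dict, falling back to today's defaults."""
    max_bytes = int(conf.get("spark.sql.broadcast.maxTableBytes",
                             DEFAULT_MAX_BROADCAST_TABLE_BYTES))
    max_rows = int(conf.get("spark.sql.broadcast.maxTableRows",
                            DEFAULT_MAX_BROADCAST_TABLE_ROWS))
    return max_bytes, max_rows

def within_broadcast_limits(conf, size_bytes, num_rows):
    """The guard the broadcast exchange would apply before building the
    broadcast relation; exceeding either limit would fail (or skip) the
    broadcast instead of OOM-ing the driver."""
    max_bytes, max_rows = broadcast_limits(conf)
    return size_bytes <= max_bytes and num_rows <= max_rows
```

With a per-job setting such as {"spark.sql.broadcast.maxTableBytes": "1073741824"} 
(1 GB), a table whose materialized size exceeds 1 GB would be rejected for 
broadcast even though it is under the current 8 GB default.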

  was:
Recently we have encountered driver OOM problems. Some tables with large data 
volume were compressed with Snappy and then a broadcast join was performed, 
but the actual data volume was too large, which caused the driver to OOM.

The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded at 
8GB and 512000000 respectively. We could make these values configurable, 
allowing smaller limits to be set for different scenarios and prohibiting 
broadcast joins for tables with large data volumes, to avoid driver OOM.


> Allow customize maxBroadcastTableBytes and maxBroadcastRows
> ---
>
> Key: SPARK-40377
> URL: https://issues.apache.org/jira/browse/SPARK-40377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: LLiu
>Priority: Major
> Attachments: 截屏2022-09-07 20.40.06.png, 截屏2022-09-07 20.40.16.png
>
>
> Recently we have encountered driver OOM problems. Some large tables were 
> compressed with Snappy, so their estimated size was small enough for a 
> broadcast join to be chosen, but the actual decompressed data volume was too 
> large, which caused the driver to OOM.
> The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded at 
> 8GB and 512000000 respectively. We could make these values configurable, 
> allowing smaller limits to be set for different scenarios and prohibiting 
> broadcast joins for some large tables, to avoid driver OOM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40377) Allow customize maxBroadcastTableBytes and maxBroadcastRows

2022-09-07 Thread LeeeeLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LLiu updated SPARK-40377:
-
Attachment: 截屏2022-09-07 20.40.06.png
截屏2022-09-07 20.40.16.png

> Allow customize maxBroadcastTableBytes and maxBroadcastRows
> ---
>
> Key: SPARK-40377
> URL: https://issues.apache.org/jira/browse/SPARK-40377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: LLiu
>Priority: Major
> Attachments: 截屏2022-09-07 20.40.06.png, 截屏2022-09-07 20.40.16.png
>
>
> Recently we have encountered driver OOM problems. Some tables with large 
> data volume were compressed with Snappy and then a broadcast join was 
> performed, but the actual data volume was too large, which caused the 
> driver to OOM.
> The values of maxBroadcastTableBytes and maxBroadcastRows are hardcoded at 
> 8GB and 512000000 respectively. We could make these values configurable, 
> allowing smaller limits to be set for different scenarios and prohibiting 
> broadcast joins for tables with large data volumes, to avoid driver OOM.


