[ https://issues.apache.org/jira/browse/SPARK-43911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang reassigned SPARK-43911:
-----------------------------------

    Assignee: mcdull_zhang

> Use toSet to deduplicate the iterator data to prevent the creation of large Array
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-43911
>                 URL: https://issues.apache.org/jira/browse/SPARK-43911
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: mcdull_zhang
>            Assignee: mcdull_zhang
>            Priority: Minor
>
> When SubqueryBroadcastExec reuses the keys of a broadcast HashedRelation for
> dynamic partition pruning, it puts all the keys in an Array and then calls
> distinct on that Array to remove the duplicates.
> In general, a broadcast HashedRelation may have many rows with a high
> repetition rate among the keys. The Array can therefore occupy a large
> amount of memory (memory that is not managed by MemoryManager), which may
> trigger an OOM.
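
For illustration only, the difference between the two deduplication strategies described in the issue can be sketched in plain Scala. This is not the Spark code: the object name DedupSketch, the helpers viaArray and viaSet, and the use of Long keys are all assumptions made for the example (the real keys are rows read from the HashedRelation).

```scala
// Hypothetical sketch; names and key type are assumptions, not Spark's code.
object DedupSketch {
  // Assumed shape of the approach the issue describes: materialize every key
  // (duplicates included) in an Array, then deduplicate it.
  def viaArray(keys: Iterator[Long]): Array[Long] =
    keys.toArray.distinct

  // Approach named in the issue title: deduplicate while consuming the
  // iterator, so the intermediate collection holds only distinct keys.
  def viaSet(keys: Iterator[Long]): Array[Long] =
    keys.toSet.toArray

  def main(args: Array[String]): Unit = {
    // One million keys, but only 100 distinct values (high repetition rate).
    def keys: Iterator[Long] =
      Iterator.range(0, 1000000).map(i => (i % 100).toLong)

    val a = viaArray(keys)
    val b = viaSet(keys)

    // Both strategies yield the same set of distinct keys.
    assert(a.length == 100 && b.length == 100)
    assert(a.toSet == b.toSet)
    println(s"distinct keys: ${b.length}")
  }
}
```

In this sketch viaArray briefly holds all one million elements, while viaSet never holds more than the 100 distinct ones. Note that a Set does not preserve the order in which keys were first seen, whereas distinct does, so the two results should be compared as sets.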