jin xing created SPARK-22334:
--------------------------------

             Summary: Check table size from HDFS in case the size in metastore is wrong
                 Key: SPARK-22334
                 URL: https://issues.apache.org/jira/browse/SPARK-22334
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: jin xing
Currently we use the table property {{totalSize}} to decide whether to use broadcast join. Table properties come from the metastore, but they can be wrong: Hive sometimes fails to update table properties even after producing data successfully (e.g., on NameNode timeout, see https://github.com/apache/hive/blob/branch-1.2/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L180). If {{totalSize}} in the table properties is much smaller than the real size on HDFS, Spark can choose a broadcast join by mistake and suffer OOM. Could we add a defensive config and check the size on HDFS when {{totalSize}} is below {{spark.sql.autoBroadcastJoinThreshold}}?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
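For illustration, the proposed defensive check could be sketched roughly as below. This is a hypothetical Python stand-in, not Spark code: the function names are invented, and a local directory walk stands in for what Spark would actually obtain from HDFS via the Hadoop `FileSystem.getContentSummary(path).getLength()` call.

```python
import os
import tempfile

def actual_size_on_fs(root: str) -> int:
    """Sum file sizes under a directory tree.

    Stand-in for the size Spark would read from HDFS via
    FileSystem.getContentSummary(path).getLength().
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def effective_table_size(metastore_total_size: int,
                         table_path: str,
                         broadcast_threshold: int) -> int:
    """Defensive size estimate.

    Trust 'totalSize' from the metastore only when it is at or above the
    broadcast threshold; otherwise double-check against the filesystem and
    take the larger value, so a stale, too-small 'totalSize' cannot trigger
    a broadcast join by mistake.
    """
    if metastore_total_size < broadcast_threshold:
        return max(metastore_total_size, actual_size_on_fs(table_path))
    return metastore_total_size
```

The key design point is that the extra filesystem call only happens when the metastore value is small enough to make a broadcast join possible, so tables whose {{totalSize}} already exceeds the threshold pay no extra cost.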