Davies Liu created SPARK-15392: ---------------------------------- Summary: The default value of size estimation is not good Key: SPARK-15392 URL: https://issues.apache.org/jira/browse/SPARK-15392 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Davies Liu
We use autoBroadcastJoinThreshold + 1L as the default value of size estimation, that is not good in 2.0, because we will calculate the size based on size of schema, then the estimation could be less than autoBroadcastJoinThreshold if you have an SELECT on top of an DataFrame created from RDD. We should use an even bigger default value, for example, MaxLong. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org