broadcast join in SparkSQL requires analyze table noscan
Hi there, I am looking at the Spark SQL setting spark.sql.autoBroadcastJoinThreshold. According to the programming guide: *Note that currently statistics are only supported for Hive Metastore tables where the command ANALYZE TABLE COMPUTE STATISTICS noscan has been run.* My question is: is the "NOSCAN" option a must? If I execute the "ANALYZE TABLE compute statistics" command in the Hive shell, will those statistics be used by Spark SQL to decide on a broadcast join? Thanks.
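The setting in question boils down to a size comparison: a join side is broadcast only if its estimated size in bytes is known and does not exceed spark.sql.autoBroadcastJoinThreshold. The plain-Python sketch below models that rule for illustration only; it is not Spark's planner code. The 10 MB default and "-1 disables broadcasting" follow Spark's documentation, while the function name `should_broadcast` and the treatment of a missing estimate as "do not broadcast" are assumptions of this sketch.

```python
from typing import Optional

# Documented default of spark.sql.autoBroadcastJoinThreshold: 10 MB.
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024


def should_broadcast(size_in_bytes: Optional[int],
                     threshold: int = DEFAULT_THRESHOLD_BYTES) -> bool:
    """Hypothetical model of the broadcast decision for one join side.

    size_in_bytes is the table's estimated size, or None when no
    statistics are available (e.g. ANALYZE TABLE was never run).
    """
    if threshold < 0:
        # Spark documents -1 as the way to disable broadcast joins;
        # this sketch treats any negative threshold the same way.
        return False
    if size_in_bytes is None:
        # Assumption: without a size estimate the table cannot be shown
        # to be small, so this sketch refuses to broadcast it.
        return False
    return size_in_bytes <= threshold


print(should_broadcast(2_000_000))                # small table
print(should_broadcast(50 * 1024 * 1024))         # larger than 10 MB
print(should_broadcast(2_000_000, threshold=-1))  # broadcasting disabled
print(should_broadcast(None))                     # no statistics at all
```

The last case is the one the question is really about: without any size statistic, even a tiny table is never considered for a broadcast join.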
Re: broadcast join in SparkSQL requires analyze table noscan
> My question is: is the "NOSCAN" option a must? If I execute "ANALYZE TABLE compute statistics" in the Hive shell, will those statistics be used by Spark SQL to decide on a broadcast join?

Yes, Spark SQL will only accept the simple noscan version. However, as long as the sizeInBytes statistic is present, we will use it.
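In other words, what matters is whether a size statistic exists on the table, not which ANALYZE variant produced it. The sketch below illustrates that idea by deriving a size estimate from a table's Hive parameters. `totalSize` and `rawDataSize` are keys Hive uses for table statistics, but the lookup order and fallback shown here are assumptions, and the function name and parameter values are hypothetical, not real metastore output.

```python
from typing import Mapping, Optional


def size_in_bytes_from_params(params: Mapping[str, str]) -> Optional[int]:
    """Hypothetical: derive a size estimate from Hive table parameters.

    Assumes 'totalSize' is preferred and 'rawDataSize' is the fallback;
    returns None when neither holds a positive number.
    """
    for key in ("totalSize", "rawDataSize"):
        value = params.get(key)
        if value is not None and int(value) > 0:
            return int(value)
    return None


# Illustrative parameters after the noscan variant (file-level stats only)
noscan_params = {"numFiles": "3", "totalSize": "524288"}
# Illustrative parameters after a full scan (row-level stats added)
full_params = {"numFiles": "3", "numRows": "1000",
               "rawDataSize": "480000", "totalSize": "524288"}
# A table that was never analyzed
no_stats_params: dict = {}

for name, params in [("noscan", noscan_params), ("full", full_params),
                     ("none", no_stats_params)]:
    print(name, size_in_bytes_from_params(params))
```

Under these assumptions the noscan and full-scan tables yield the same estimate, so either one could qualify for a broadcast join. The unanalyzed table yields no estimate.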
Re: broadcast join in SparkSQL requires analyze table noscan
Michael, thanks for the reply.

On Wed, Feb 10, 2016 at 11:44 AM, Michael Armbrust wrote:
> Yes, Spark SQL will only accept the simple noscan version. However, as long as the sizeInBytes statistic is present, we will use it.