broadcast join in SparkSQL requires analyze table noscan

2016-02-10 Thread Lan Jiang
Hi there, I am looking at the SparkSQL setting spark.sql.autoBroadcastJoinThreshold. According to the programming guide: *Note that currently statistics are only supported for Hive Metastore tables where the command ANALYZE TABLE <tableName> COMPUTE STATISTICS noscan has been run.* My question is that is

Re: broadcast join in SparkSQL requires analyze table noscan

2016-02-10 Thread Michael Armbrust
> My question is that is "NOSCAN" option a must? If I execute "ANALYZE TABLE compute statistics" command in Hive shell, is the statistics going to be used by SparkSQL to decide broadcast join?

Yes, Spark SQL will only accept the simple no scan version. However, as long as the sizeInBytes
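As a rough illustration of why sizeInBytes matters here, the sketch below models the size check behind spark.sql.autoBroadcastJoinThreshold in plain Python. It is not Spark's planner code: the helper name would_auto_broadcast is hypothetical, and treating a missing statistic as "do not broadcast" is a simplifying assumption. The 10 MB default and the use of -1 to disable automatic broadcasting follow the documented behavior of the setting.

```python
# Documented default of spark.sql.autoBroadcastJoinThreshold: 10 MB, in bytes.
DEFAULT_AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


def would_auto_broadcast(size_in_bytes, threshold=DEFAULT_AUTO_BROADCAST_JOIN_THRESHOLD):
    """Hypothetical helper: approximate the size check for an automatic broadcast join.

    size_in_bytes stands in for the table's sizeInBytes statistic; None means
    no statistic is available (assumption: such a table is never broadcast).
    """
    if threshold < 0:
        # A negative threshold (for example -1) disables automatic broadcasting.
        return False
    if size_in_bytes is None:
        # Without a size estimate there is nothing to compare against the threshold.
        return False
    return size_in_bytes <= threshold


if __name__ == "__main__":
    five_mb = 5 * 1024 * 1024
    print(would_auto_broadcast(five_mb))                 # small table, default threshold
    print(would_auto_broadcast(five_mb, threshold=-1))   # broadcasting disabled
    print(would_auto_broadcast(None))                    # no statistics recorded
```

The point of the sketch is only that the decision depends on a size estimate being present; how that estimate gets populated for a given table is what the thread is discussing.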

Re: broadcast join in SparkSQL requires analyze table noscan

2016-02-10 Thread Lan Jiang
Michael, thanks for the reply.

On Wed, Feb 10, 2016 at 11:44 AM, Michael Armbrust wrote:
>> My question is that is "NOSCAN" option a must? If I execute "ANALYZE TABLE compute statistics" command in Hive shell, is the statistics going to be used by SparkSQL to