Hi Patcharee,

Did you enable the predicate pushdown in the second method?

Thanks.

Zhan Zhang

On Oct 8, 2015, at 1:43 AM, patcharee <patcharee.thong...@uni.no> wrote:

> Hi,
> 
> I am using spark sql 1.5 to query a hive table stored as partitioned orc 
> file. We have the total files is about 6000 files and each file size is about 
> 245MB.
> 
> What is the difference between these two query methods below:
> 
> 1. Using query on hive table directly
> 
> hiveContext.sql("select col1, col2 from table1")
> 
> 2. Reading from orc file, register temp table and query from the temp table
> 
> val c = hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
> c.registerTempTable("regTable")
> hiveContext.sql("select col1, col2 from regTable")
> 
> When the number of files is large (query all from the total 6000 files) , the 
> second case is much slower then the first one. Any ideas why?
> 
> BR,
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to