Hi Patcharee,

Did you enable predicate pushdown in the second method?
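
In case it helps, here is a minimal sketch of what I mean, reusing the path and names from your second method. As far as I know, ORC predicate pushdown is controlled by the `spark.sql.orc.filterPushdown` setting and is off by default in Spark 1.5. The `where col1 > 0` filter is a hypothetical addition of mine, since pushdown can only skip data when the query has a predicate; your original query has none, so this setting alone may not explain the slowdown. The object name and app name are also just placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OrcPushdownSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder app name; master and other settings come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("orc-pushdown-sketch"))
    val hiveContext = new HiveContext(sc)

    // Ask the ORC data source to push filters down to the reader
    // (assumed to default to "false" in Spark 1.5).
    hiveContext.setConf("spark.sql.orc.filterPushdown", "true")

    // Second method from the original message: read the ORC files directly.
    val c = hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
    c.registerTempTable("regTable")

    // Hypothetical filter on col1 (its type is assumed to be numeric),
    // added only to give the reader a predicate to push down.
    val result = hiveContext.sql("select col1, col2 from regTable where col1 > 0")
    result.show()

    sc.stop()
  }
}
```

Calling `result.explain()` before `show()` prints the physical plan, which is one way to compare how the two methods scan the table.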
Thanks.

Zhan Zhang

On Oct 8, 2015, at 1:43 AM, patcharee <patcharee.thong...@uni.no> wrote:

> Hi,
>
> I am using Spark SQL 1.5 to query a Hive table stored as partitioned ORC
> files. There are about 6000 files in total, and each file is about
> 245 MB.
>
> What is the difference between the two query methods below?
>
> 1. Querying the Hive table directly
>
> hiveContext.sql("select col1, col2 from table1")
>
> 2. Reading from the ORC files, registering a temp table, and querying the
> temp table
>
> val c = hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
> c.registerTempTable("regTable")
> hiveContext.sql("select col1, col2 from regTable")
>
> When the number of files is large (querying all 6000 files), the
> second method is much slower than the first one. Any ideas why?
>
> BR,

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org