Re: sql query orc slow

Zhan Zhang Fri, 09 Oct 2015 09:32:25 -0700

Hi Patcharee,

From the query, it looks like only column pruning will be applied. 
Partition pruning and predicate pushdown have no effect. Do you see a big 
I/O difference between the two methods?


One possible reason for the speed difference is that the two paths use 
different versions of OrcInputFormat: the Hive path may use NewOrcInputFormat, 
while the Spark path uses OrcInputFormat.
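
A minimal sketch of how one might check this from spark-shell on Spark 1.5, 
assuming `hiveContext` and the table path are as in the quoted queries below. 
The WHERE clause is a hypothetical predicate added only for illustration, since 
the original queries have no filter to push down; the plans printed by 
`explain` will depend on the actual table.

```scala
// Sketch for spark-shell (Spark 1.5); assumes `hiveContext` already exists,
// as in the queries quoted later in this thread.

// ORC predicate pushdown is disabled by default in Spark 1.5.
hiveContext.setConf("spark.sql.orc.filterPushdown", "true")

// Second method from the thread: load the ORC files and register a temp table.
val c = hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
c.registerTempTable("regTable")

// Hypothetical predicate (not in the original queries): without a filter,
// only column pruning can apply.
val filtered = hiveContext.sql(
  "select col1, col2 from regTable where col1 is not null")

// Print the logical and physical plans for both paths so they can be compared.
filtered.explain(true)
hiveContext.sql("select col1, col2 from table1").explain(true)
```

Comparing the two printed plans, together with the bytes read per stage in the 
Spark UI, should show whether the two paths differ in I/O.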

Thanks.

Zhan Zhang

On Oct 8, 2015, at 11:55 PM, patcharee <patcharee.thong...@uni.no> wrote:

> Yes, predicate pushdown is enabled, but it still takes longer than the 
> first method.
> 
> BR,
> Patcharee
> 
> On 08. okt. 2015 18:43, Zhan Zhang wrote:
>> Hi Patcharee,
>> 
>> Did you enable the predicate pushdown in the second method?
>> 
>> Thanks.
>> 
>> Zhan Zhang
>> 
>> On Oct 8, 2015, at 1:43 AM, patcharee <patcharee.thong...@uni.no> wrote:
>> 
>>> Hi,
>>> 
>>> I am using Spark SQL 1.5 to query a Hive table stored as partitioned ORC 
>>> files. There are about 6000 files in total, and each file is about 
>>> 245MB.
>>> 
>>> What is the difference between these two query methods below:
>>> 
>>> 1. Querying the Hive table directly
>>> 
>>> hiveContext.sql("select col1, col2 from table1")
>>> 
>>> 2. Reading the ORC files, registering a temp table, and querying the temp table
>>> 
>>> val c = hiveContext.read.format("orc").load("/apps/hive/warehouse/table1")
>>> c.registerTempTable("regTable")
>>> hiveContext.sql("select col1, col2 from regTable")
>>> 
>>> When the number of files is large (querying all 6000 files), 
>>> the second case is much slower than the first one. Any ideas why?
>>> 
>>> BR,
>>> 
>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>> 
>>> 
> 
> 


