Re: sql query orc slow

2015-10-13 Thread Patcharee Thongtra
Hi Zhan Zhang, Could my problem (the ORC predicate is not generated from the WHERE clause even though spark.sql.orc.filterPushdown=true) be related to any of the factors below? - ORC file version (File Version: 0.12 with HIVE_8732) - Hive version (using Hive 1.2.1.2.3.0.0-2557) - orc table is

Re: sql query orc slow

2015-10-13 Thread Zhan Zhang
Hi Patcharee, I am not sure which side is wrong, driver or executor. If it is the executor side, the reason you mentioned may be possible. But if the driver side didn’t set the predicate at all, then something else is broken. Can you please file a JIRA with simple reproduction steps, and let me know

Re: sql query orc slow

2015-10-13 Thread Patcharee Thongtra
Hi Zhan Zhang, Here is the issue https://issues.apache.org/jira/browse/SPARK-11087 BR, Patcharee On 10/13/2015 06:47 PM, Zhan Zhang wrote: Hi Patcharee, I am not sure which side is wrong, driver or executor. If it is executor side, the reason you mentioned may be possible. But if the

Re: sql query orc slow

2015-10-09 Thread patcharee
Yes, predicate pushdown is enabled, but it still takes longer than the first method. BR, Patcharee On 8 Oct 2015 18:43, Zhan Zhang wrote: Hi Patcharee, Did you enable the predicate pushdown in the second method? Thanks. Zhan Zhang On Oct 8, 2015, at 1:43 AM, patcharee

Re: sql query orc slow

2015-10-09 Thread Zhan Zhang
Hi Patcharee, From the query, it looks like only column pruning will be applied. Partition pruning and predicate pushdown have no effect. Do you see a big IO difference between the two methods? The potential reason for the speed difference I can think of may be the different versions of

Re: sql query orc slow

2015-10-09 Thread Zhan Zhang
In your case, you manually set an AND pushdown, and the predicate is right based on your setting: leaf-0 = (EQUALS x 320). The right way is to enable the predicate pushdown as follows: sqlContext.setConf("spark.sql.orc.filterPushdown", "true") Thanks. Zhan Zhang On Oct 9, 2015, at 9:58
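
For readers following the thread, here is a minimal sketch of why a pushed-down predicate such as leaf-0 = (EQUALS x 320) matters. It assumes only that the reader can consult per-stripe min/max statistics for a column and skip stripes whose range cannot match. The `StripeStats` class, the `stripes_to_read` helper, and the sample ranges are hypothetical and purely illustrative; this is not the ORC reader or Spark code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StripeStats:
    """Hypothetical min/max statistics for column x in one stripe."""
    stripe_id: int
    x_min: int
    x_max: int


def stripes_to_read(stripes: List[StripeStats], x_value: int) -> List[int]:
    """Return ids of stripes whose [x_min, x_max] range could contain x_value."""
    return [s.stripe_id for s in stripes if s.x_min <= x_value <= s.x_max]


# Made-up statistics for three stripes (illustration only).
stripes = [
    StripeStats(0, 0, 199),
    StripeStats(1, 200, 399),
    StripeStats(2, 400, 599),
]

# With the predicate x = 320 available to the reader, two stripes can be skipped.
selected = stripes_to_read(stripes, 320)

# With no pushed-down predicate, nothing can be skipped and every stripe is read.
all_stripes = [s.stripe_id for s in stripes]

print(selected, all_stripes)
```

In this sketch the filter still has to be re-applied to the rows that are read; the statistics only rule out stripes that cannot contain a match.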

Re: sql query orc slow

2015-10-09 Thread patcharee
I set hiveContext.setConf("spark.sql.orc.filterPushdown", "true"), but the log shows no ORC pushdown predicate for my query with a WHERE clause: "15/10/09 19:16:01 DEBUG OrcInputFormat: No ORC pushdown predicate". I do not understand what is wrong with this. BR, Patcharee On 9 Oct 2015 19:10,

Re: sql query orc slow

2015-10-09 Thread patcharee
Hi Zhan Zhang, Actually my query has a WHERE clause: "select date, month, year, hh, (u*0.9122461 - v*-0.40964267), (v*0.9122461 + u*-0.40964267), z from 4D where x = 320 and y = 117 and zone == 2 and year=2009 and z >= 2 and z <= 8". Columns "x" and "y" are not partition columns, the others are

Re: sql query orc slow

2015-10-09 Thread Zhan Zhang
That is weird. Unfortunately, there is no debug info available on this part. Can you please open a JIRA to add some debug information on the driver side? Thanks. Zhan Zhang On Oct 9, 2015, at 10:22 AM, patcharee wrote: I set

Re: sql query orc slow

2015-10-08 Thread Zhan Zhang
Hi Patcharee, Did you enable the predicate pushdown in the second method? Thanks. Zhan Zhang On Oct 8, 2015, at 1:43 AM, patcharee wrote: Hi, I am using Spark SQL 1.5 to query a Hive table stored as a partitioned ORC file. We have the total files is about