[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

Zhan Zhang (JIRA) Thu, 15 Oct 2015 11:21:06 -0700

    [ https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959342#comment-14959342 ]


Zhan Zhang commented on SPARK-11087:
------------------------------------

[~patcharee] I tried a simple case with both partitioning and predicate pushdown, 
and did not hit the problem. The predicate is pushed down correctly. I will try 
the same table as yours to see whether it works.


case class Contact(name: String, phone: String)
case class Person(name: String, age: Int, contacts: Seq[Contact])

// spark-shell already imports this; elsewhere it is needed for toDF().
import sqlContext.implicits._

// 100 people, each with two contacts.
val records = (1 to 100).map { i =>
  Person(s"name_$i", i, (0 to 1).map { m => Contact(s"contact_$m", s"phone_$m") })
}

sqlContext.setConf("spark.sql.orc.filterPushdown", "true")

// Write ORC files partitioned by age, then read them back.
sc.parallelize(records).toDF().write.format("orc").partitionBy("age").save("peoplePartitioned")
val peoplePartitioned = sqlContext.read.format("orc").load("peoplePartitioned")
peoplePartitioned.registerTempTable("peoplePartitioned")

// The same two predicates, in both orders.
sqlContext.sql("SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'").count
sqlContext.sql("SELECT * FROM peoplePartitioned WHERE name = 'name_20' and age = 20").count

2015-10-15 10:40:45 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (LESS_THAN age 15)
expr = leaf-0

2015-10-15 10:48:20 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS name name_20)
expr = leaf-0

sqlContext.sql("SELECT name FROM people WHERE age == 15 and age < 16").count()

2015-10-15 10:58:35 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS age 15)
leaf-1 = (LESS_THAN age 16)

sqlContext.sql("SELECT name FROM people WHERE age < 15").count()
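
To check a longer log for these entries, the leaf predicates can be pulled out with a few lines of plain Scala. This is only a sketch that assumes the "leaf-N = (OP column literal)" layout of the log lines above; PushdownLogScan, LeafPredicate and leaves are made-up names and are not part of Spark or Hive.

```scala
// Hypothetical helper, not part of Spark or Hive: collects the leaf predicates
// found in "ORC pushdown predicate" log output.
object PushdownLogScan {
  final case class LeafPredicate(index: Int, op: String, column: String, literal: String)

  // Matches fragments such as "leaf-0 = (EQUALS name name_20)".
  // \s is used between tokens so that a log line wrapped by a mail client still matches.
  private val Leaf = """leaf-(\d+)\s*=\s*\((\w+)\s+(\w+)\s+([^)]*)\)""".r

  // Returns every leaf predicate in the given log text, in order of appearance.
  def leaves(log: String): List[LeafPredicate] =
    Leaf.findAllMatchIn(log).map { m =>
      LeafPredicate(m.group(1).toInt, m.group(2), m.group(3), m.group(4))
    }.toList

  def main(args: Array[String]): Unit = {
    val sample =
      """2015-10-15 10:58:35 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS age 15)
        |leaf-1 = (LESS_THAN age 16)""".stripMargin
    leaves(sample).foreach(println)
  }
}
```

An empty result for a query that has a where clause on a non-partition column would correspond to the "No ORC pushdown predicate" symptom reported in this issue.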

> spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate
> ---------------------------------------------------------------------
>
>                 Key: SPARK-11087
>                 URL: https://issues.apache.org/jira/browse/SPARK-11087
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>         Environment: orc file version 0.12 with HIVE_8732
> hive version 1.2.1.2.3.0.0-2557
>            Reporter: patcharee
>            Priority: Minor
>
> I have an external hive table stored as partitioned orc file (see the table 
> schema below). I tried to query from the table with where clause:
> hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
> hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")
> But from the log file with debug logging level on, the ORC pushdown predicate 
> was not generated. 
> Unfortunately my table was not sorted when I inserted the data, but I 
> expected the ORC pushdown predicate should be generated (because of the where 
> clause) though
> Table schema
> ================================
> hive> describe formatted 4D;
> OK
> # col_name                    data_type               comment             
>                
> date                  int                                         
> hh                    int                                         
> x                     int                                         
> y                     int                                         
> height                float                                       
> u                     float                                       
> v                     float                                       
> w                     float                                       
> ph                    float                                       
> phb                   float                                       
> t                     float                                       
> p                     float                                       
> pb                    float                                       
> qvapor                float                                       
> qgraup                float                                       
> qnice                 float                                       
> qnrain                float                                       
> tke_pbl               float                                       
> el_pbl                float                                       
> qcloud                float                                       
>                
> # Partition Information                
> # col_name                    data_type               comment             
>                
> zone                  int                                         
> z                     int                                         
> year                  int                                         
> month                 int                                         
>                
> # Detailed Table Information           
> Database:             default                  
> Owner:                patcharee                
> CreateTime:           Thu Jul 09 16:46:54 CEST 2015    
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D       
>  
> Table Type:           EXTERNAL_TABLE           
> Table Parameters:              
>       EXTERNAL                TRUE                
>       comment                 this table is imported from rwf_data/*/wrf/*
>       last_modified_by        patcharee           
>       last_modified_time      1439806692          
>       orc.compress            ZLIB                
>       transient_lastDdlTime   1439806692          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.ql.io.orc.OrcSerde        
> InputFormat:          org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
> OutputFormat:         org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        
>  
> Compressed:           No                       
> Num Buckets:          -1                       
> Bucket Columns:       []                       
> Sort Columns:         []                       
> Storage Desc Params:           
>       serialization.format    1                   
> Time taken: 0.388 seconds, Fetched: 58 row(s)
> ================================
> Data was inserted into this table by another spark job:
> df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D")
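
For reference, a rough sketch of which parts of the reported where clause would be candidates for the ORC pushdown predicate. It assumes that a filter on a partition column (zone here) is resolved by partition pruning rather than passed to the ORC reader, which appears consistent with the log above where only name shows up for the partitioned test table. FilterSplit, Eq and split are made-up names for illustration and do not reflect Spark's planner code.

```scala
// Hypothetical model, not Spark's planner: splits the conjuncts of a where
// clause into partition filters and data filters by the column they reference.
object FilterSplit {
  // One "column = value" conjunct.
  final case class Eq(column: String, value: Int)

  // Partition columns taken from the "describe formatted 4D" output in the issue.
  val partitionColumns: Set[String] = Set("zone", "z", "year", "month")

  // Returns (partition filters, data filters), each in the original order.
  def split(conjuncts: List[Eq]): (List[Eq], List[Eq]) =
    conjuncts.partition(f => partitionColumns.contains(f.column))

  def main(args: Array[String]): Unit = {
    // where zone = 2 and x = 320 and y = 117
    val (byPartition, byData) = split(List(Eq("zone", 2), Eq("x", 320), Eq("y", 117)))
    println(s"partition pruning: $byPartition")
    println(s"ORC pushdown candidates: $byData")
  }
}
```

Under that assumption, x = 320 and y = 117 are the conjuncts one would expect to see as leaves in the log for the reported query.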



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
