[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

Zhan Zhang (JIRA) Thu, 15 Oct 2015 13:57:31 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959599#comment-14959599
 ]


Zhan Zhang commented on SPARK-11087:
------------------------------------

[~patcharee] I try to duplicate your table as much as possible, but still 
didn't hit the problem. Please refer to the below for the details.

case class record(date: Int, hh: Int, x: Int, y: Int, height: Float, u: Float, 
w: Float, ph: Float, phb: Float, t: Float, p: Float, pb: Float, tke_pbl: Float, 
el_pbl: Float, qcloud: Float, zone: Int, z: Int, year: Int, month: Int)

val records = (1 to 100).map { i =>
record(i.toInt, i.toInt, i.toInt, i.toInt, i.toFloat, i.toFloat, i.toFloat, 
i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, 
i.toFloat, i.toInt, i.toInt, i.toInt, i.toInt)
}


sc.parallelize(records).toDF().write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("5D")
sc.parallelize(records).toDF().write.format("org.apache.spark.sql.hive.orc.DefaultSource").partitionBy("zone","z","year","month").save("4D")
val test = sqlContext.read.format("orc").load("4D")
2503   test.registerTempTable("4D")
2504   sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
2505  sqlContext.sql("select date, month, year, hh, u*0.9122461, u*-0.40964267, 
z from 4D where x = 320 and y = 117 and zone == 2 and year=2 and z >= 2 and z 
<= 8").show

2015-10-15 13:37:45 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = 
(EQUALS x 320)
leaf-1 = (EQUALS y 117)
expr = (and leaf-0 leaf-1)
2507   sqlContext.sql("select date, month, year, hh, u*0.9122461, 
u*-0.40964267, z from 5D where x = 321 and y = 118 and zone == 2 and year=2 and 
z >= 2 and z <= 8").show
2015-10-15 13:40:06 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = 
(EQUALS x 321)
leaf-1 = (EQUALS y 118)
expr = (and leaf-0 leaf-1)


> spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate
> ---------------------------------------------------------------------
>
>                 Key: SPARK-11087
>                 URL: https://issues.apache.org/jira/browse/SPARK-11087
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.1
>         Environment: orc file version 0.12 with HIVE_8732
> hive version 1.2.1.2.3.0.0-2557
>            Reporter: patcharee
>            Priority: Minor
>
> I have an external hive table stored as partitioned orc file (see the table 
> schema below). I tried to query from the table with where clause>
> hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
> hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 
> 117")). 
> But from the log file with debug logging level on, the ORC pushdown predicate 
> was not generated. 
> Unfortunately my table was not sorted when I inserted the data, but I 
> expected the ORC pushdown predicate should be generated (because of the where 
> clause) though
> Table schema
> ================================
> hive> describe formatted 4D;
> OK
> # col_name                    data_type               comment             
>                
> date                  int                                         
> hh                    int                                         
> x                     int                                         
> y                     int                                         
> height                float                                       
> u                     float                                       
> v                     float                                       
> w                     float                                       
> ph                    float                                       
> phb                   float                                       
> t                     float                                       
> p                     float                                       
> pb                    float                                       
> qvapor                float                                       
> qgraup                float                                       
> qnice                 float                                       
> qnrain                float                                       
> tke_pbl               float                                       
> el_pbl                float                                       
> qcloud                float                                       
>                
> # Partition Information                
> # col_name                    data_type               comment             
>                
> zone                  int                                         
> z                     int                                         
> year                  int                                         
> month                 int                                         
>                
> # Detailed Table Information           
> Database:             default                  
> Owner:                patcharee                
> CreateTime:           Thu Jul 09 16:46:54 CEST 2015    
> LastAccessTime:       UNKNOWN                  
> Protect Mode:         None                     
> Retention:            0                        
> Location:             hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D       
>  
> Table Type:           EXTERNAL_TABLE           
> Table Parameters:              
>       EXTERNAL                TRUE                
>       comment                 this table is imported from rwf_data/*/wrf/*
>       last_modified_by        patcharee           
>       last_modified_time      1439806692          
>       orc.compress            ZLIB                
>       transient_lastDdlTime   1439806692          
>                
> # Storage Information          
> SerDe Library:        org.apache.hadoop.hive.ql.io.orc.OrcSerde        
> InputFormat:          org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
> OutputFormat:         org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        
>  
> Compressed:           No                       
> Num Buckets:          -1                       
> Bucket Columns:       []                       
> Sort Columns:         []                       
> Storage Desc Params:           
>       serialization.format    1                   
> Time taken: 0.388 seconds, Fetched: 58 row(s)
> ================================
> Data was inserted into this table by another spark job>
> df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

Reply via email to