patcharee created SPARK-11087:
---------------------------------

             Summary: spark.sql.orc.filterPushdown does not work, No ORC 
pushdown predicate
                 Key: SPARK-11087
                 URL: https://issues.apache.org/jira/browse/SPARK-11087
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1
         Environment: orc file version 0.12 with HIVE_8732
hive version 1.2.1.2.3.0.0-2557
            Reporter: patcharee
            Priority: Minor


I have an external Hive table stored as partitioned ORC files (see the table 
schema below). I tried to query the table with a where clause:

hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")

However, with debug logging enabled, the log file shows that no ORC pushdown 
predicate was generated.

Unfortunately, my table was not sorted when I inserted the data, but I still 
expected the ORC pushdown predicate to be generated because of the where 
clause.

Table schema
================================
hive> describe formatted 4D;
OK
# col_name              data_type               comment             
                 
date                    int                                         
hh                      int                                         
x                       int                                         
y                       int                                         
height                  float                                       
u                       float                                       
v                       float                                       
w                       float                                       
ph                      float                                       
phb                     float                                       
t                       float                                       
p                       float                                       
pb                      float                                       
qvapor                  float                                       
qgraup                  float                                       
qnice                   float                                       
qnrain                  float                                       
tke_pbl                 float                                       
el_pbl                  float                                       
qcloud                  float                                       
                 
# Partition Information          
# col_name              data_type               comment             
                 
zone                    int                                         
z                       int                                         
year                    int                                         
month                   int                                         
                 
# Detailed Table Information             
Database:               default                  
Owner:                  patcharee                
CreateTime:             Thu Jul 09 16:46:54 CEST 2015    
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D       
 
Table Type:             EXTERNAL_TABLE           
Table Parameters:                
        EXTERNAL                TRUE                
        comment                 this table is imported from rwf_data/*/wrf/*
        last_modified_by        patcharee           
        last_modified_time      1439806692          
        orc.compress            ZLIB                
        transient_lastDdlTime   1439806692          
                 
# Storage Information            
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde        
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat        
 
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        serialization.format    1                   
Time taken: 0.388 seconds, Fetched: 58 row(s)

================================
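
In the schema above, zone is a partition column while x and y are ordinary 
data columns. The sketch below is plain Scala with no Spark dependency. It 
divides the predicates of the query along that line. The names Pred, 
PredicateSplit, partitionCols and splitPredicates are hypothetical and are not 
Spark APIs. It assumes, without confirmation from the Spark source, that 
predicates on partition columns would be served by partition pruning and that 
only predicates on data columns would be candidates for an ORC pushdown 
predicate.

```scala
// Hypothetical helper types for illustration only; these are not Spark APIs.
final case class Pred(column: String, value: Int)

object PredicateSplit {
  // Partition columns of table 4D, copied from the `describe formatted` output.
  val partitionCols: Set[String] = Set("zone", "z", "year", "month")

  // Returns (predicates on partition columns, predicates on data columns).
  def splitPredicates(preds: Seq[Pred]): (Seq[Pred], Seq[Pred]) =
    preds.partition(p => partitionCols.contains(p.column))

  // Predicates taken from: where zone = 2 and x = 320 and y = 117
  val query: Seq[Pred] = Seq(Pred("zone", 2), Pred("x", 320), Pred("y", 117))

  val (onPartitionCols, onDataCols) = splitPredicates(query)
}
```

Under that assumption, the x = 320 and y = 117 conditions are the ones that 
would be expected to show up in an ORC pushdown predicate in the debug log.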

Data was inserted into this table by another Spark job:

df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D")



