patcharee created SPARK-11087: --------------------------------- Summary: spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate Key: SPARK-11087 URL: https://issues.apache.org/jira/browse/SPARK-11087 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.1 Environment: orc file version 0.12 with HIVE_8732 hive version 1.2.1.2.3.0.0-2557 Reporter: patcharee Priority: Minor
I have an external hive table stored as partitioned orc file (see the table schema below). I tried to query from the table with where clause> hiveContext.setConf("spark.sql.orc.filterPushdown", "true") hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")). But from the log file with debug logging level on, the ORC pushdown predicate was not generated. Unfortunately my table was not sorted when I inserted the data, but I expected the ORC pushdown predicate should be generated (because of the where clause) though Table schema ================================ hive> describe formatted 4D; OK # col_name data_type comment date int hh int x int y int height float u float v float w float ph float phb float t float p float pb float qvapor float qgraup float qnice float qnrain float tke_pbl float el_pbl float qcloud float # Partition Information # col_name data_type comment zone int z int year int month int # Detailed Table Information Database: default Owner: patcharee CreateTime: Thu Jul 09 16:46:54 CEST 2015 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D Table Type: EXTERNAL_TABLE Table Parameters: EXTERNAL TRUE comment this table is imported from rwf_data/*/wrf/* last_modified_by patcharee last_modified_time 1439806692 orc.compress ZLIB transient_lastDdlTime 1439806692 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format 1 Time taken: 0.388 seconds, Fetched: 58 row(s) ================================ Data was inserted into this table by another spark job> df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D") -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org