[jira] [Issue Comment Deleted] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-11-06 Thread patcharee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patcharee updated SPARK-11087:
--
Comment: was deleted

(was: I found a scenario where the problem exists)

> spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate
> -
>
> Key: SPARK-11087
> URL: https://issues.apache.org/jira/browse/SPARK-11087
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: orc file version 0.12 with HIVE_8732
> hive version 1.2.1.2.3.0.0-2557
>Reporter: patcharee
>Priority: Minor
>
> I have an external hive table stored as partitioned orc file (see the table 
> schema below). I tried to query from the table with where clause>
> hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
> hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 
> 117")). 
> But from the log file with debug logging level on, the ORC pushdown predicate 
> was not generated. 
> Unfortunately my table was not sorted when I inserted the data, but I 
> expected the ORC pushdown predicate should be generated (because of the where 
> clause) though
> Table schema
> 
> hive> describe formatted 4D;
> OK
> # col_namedata_type   comment 
>
> date  int 
> hhint 
> x int 
> y int 
> heightfloat   
> u float   
> v float   
> w float   
> phfloat   
> phb   float   
> t float   
> p float   
> pbfloat   
> qvaporfloat   
> qgraupfloat   
> qnice float   
> qnrainfloat   
> tke_pbl   float   
> el_pblfloat   
> qcloudfloat   
>
> # Partition Information
> # col_namedata_type   comment 
>
> zone  int 
> z int 
> year  int 
> month int 
>
> # Detailed Table Information   
> Database: default  
> Owner:patcharee
> CreateTime:   Thu Jul 09 16:46:54 CEST 2015
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D   
>  
> Table Type:   EXTERNAL_TABLE   
> Table Parameters:  
>   EXTERNALTRUE
>   comment this table is imported from rwf_data/*/wrf/*
>   last_modified_bypatcharee   
>   last_modified_time  1439806692  
>   orc.compressZLIB
>   transient_lastDdlTime   1439806692  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>  
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   serialization.format1   
> Time taken: 0.388 seconds, Fetched: 58 row(s)
> 
> Data was inserted into this table by another spark job>
> 

[jira] [Issue Comment Deleted] (SPARK-11087) spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

2015-11-06 Thread patcharee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patcharee updated SPARK-11087:
--
Comment: was deleted

(was: Hi [~zzhan], the problem actually happens when I generates orc file by 
"saveAsTable()" method (because I need my orc file to be accessible through 
hive). See below>>

hive> create external table peopletable(name string, address string, phone 
string) partitioned by(age int) stored as orc location 
'/user/patcharee/peopletable';

On spark shell local mode>>
2501  sql("set hive.exec.dynamic.partition.mode=nonstrict")
2502  sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
2503  case class Person(name: String, age: Int, address: String, phone: String)
2504   val records = (1 to 100).map { i => Person(s"name_$i", i, s"address_$i", 
s"phone_$i" ) }
2505  
sc.parallelize(records).toDF().write.format("orc").mode("Append").partitionBy("age").saveAsTable("peopletable")
2506  val people = sqlContext.read.format("orc").load("peopletable")
2507  people.registerTempTable("people")
2508  sqlContext.sql("SELECT * FROM people WHERE age = 20 and name = 
'name_20'").count

It is true that if the orc file is generated by "save()" method, the predicate 
will be generated. But it is not for the case "saveAsTable()" method.

[~zzhan] can you please suggest how to fix this?)

> spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate
> -
>
> Key: SPARK-11087
> URL: https://issues.apache.org/jira/browse/SPARK-11087
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
> Environment: orc file version 0.12 with HIVE_8732
> hive version 1.2.1.2.3.0.0-2557
>Reporter: patcharee
>Priority: Minor
>
> I have an external hive table stored as partitioned orc file (see the table 
> schema below). I tried to query from the table with where clause>
> hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
> hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 
> 117")). 
> But from the log file with debug logging level on, the ORC pushdown predicate 
> was not generated. 
> Unfortunately my table was not sorted when I inserted the data, but I 
> expected the ORC pushdown predicate should be generated (because of the where 
> clause) though
> Table schema
> 
> hive> describe formatted 4D;
> OK
> # col_namedata_type   comment 
>
> date  int 
> hhint 
> x int 
> y int 
> heightfloat   
> u float   
> v float   
> w float   
> phfloat   
> phb   float   
> t float   
> p float   
> pbfloat   
> qvaporfloat   
> qgraupfloat   
> qnice float   
> qnrainfloat   
> tke_pbl   float   
> el_pblfloat   
> qcloudfloat   
>
> # Partition Information
> # col_namedata_type   comment 
>
> zone  int 
> z int 
> year  int 
> month int 
>
> # Detailed Table Information   
> Database: default  
> Owner:patcharee
> CreateTime:   Thu Jul 09 16:46:54 CEST 2015
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D   
>  
> Table Type:   EXTERNAL_TABLE   
>