[
https://issues.apache.org/jira/browse/SPARK-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
patcharee updated SPARK-11087:
--
Comment: was deleted
(was: Hi [~zzhan], the problem actually happens when I generate the ORC file with the
"saveAsTable()" method (because I need my ORC file to be accessible through
Hive). See below:
hive> create external table peopletable(name string, address string, phone
string) partitioned by(age int) stored as orc location
'/user/patcharee/peopletable';
In the Spark shell (local mode):
sql("set hive.exec.dynamic.partition.mode=nonstrict")
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
case class Person(name: String, age: Int, address: String, phone: String)
val records = (1 to 100).map { i => Person(s"name_$i", i, s"address_$i", s"phone_$i") }
sc.parallelize(records).toDF().write.format("orc").mode("Append").partitionBy("age").saveAsTable("peopletable")
val people = sqlContext.read.format("orc").load("peopletable")
people.registerTempTable("people")
sqlContext.sql("SELECT * FROM people WHERE age = 20 and name = 'name_20'").count
It is true that if the ORC file is generated by the "save()" method, the pushdown
predicate is generated. But it is not generated when the file is written with the
"saveAsTable()" method.
[~zzhan], can you please suggest how to fix this?)
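
For reference, a minimal sketch in plain Scala (no Spark dependency) of the expectation behind the query above. It assumes that a conjunct on a partition column (here "age") is answered by partition pruning, while a conjunct on a data column (here "name") is the candidate for an ORC pushdown predicate. The names EqualTo, Split and split are hypothetical and only illustrate the idea; they are not Spark's internal classes or API.

```scala
// Hypothetical model of a conjunctive WHERE clause; not Spark's API.
object PushdownSketch {
  // A single equality conjunct such as `age = 20`.
  final case class EqualTo(column: String, value: Any)

  // Result of dividing the conjuncts of a WHERE clause.
  final case class Split(partitionFilters: Seq[EqualTo], dataFilters: Seq[EqualTo])

  // Conjuncts on partition columns can be served by directory pruning;
  // the remaining ones are the candidates for an ORC pushdown predicate.
  def split(conjuncts: Seq[EqualTo], partitionColumns: Set[String]): Split = {
    val (onPartition, onData) =
      conjuncts.partition(c => partitionColumns.contains(c.column))
    Split(onPartition, onData)
  }

  def main(args: Array[String]): Unit = {
    // WHERE age = 20 and name = 'name_20' on a table partitioned by age.
    val where = Seq(EqualTo("age", 20), EqualTo("name", "name_20"))
    val result = split(where, partitionColumns = Set("age"))
    println(s"partition pruning:   ${result.partitionFilters}")
    println(s"pushdown candidates: ${result.dataFilters}")
  }
}
```

Under these assumptions, the conjunct on "name" is the one that would be expected to show up as an ORC pushdown predicate in the debug log.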
> spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate
> -
>
> Key: SPARK-11087
> URL: https://issues.apache.org/jira/browse/SPARK-11087
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.1
> Environment: orc file version 0.12 with HIVE_8732
> hive version 1.2.1.2.3.0.0-2557
> Reporter: patcharee
> Priority: Minor
>
> I have an external Hive table stored as partitioned ORC files (see the table
> schema below). I tried to query the table with a where clause:
> hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
> hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")
> But according to the log file, with debug logging enabled, the ORC pushdown
> predicate was not generated.
> Unfortunately my table was not sorted when I inserted the data, but I
> expected the ORC pushdown predicate to be generated anyway (because of the
> where clause).
> Table schema
>
> hive> describe formatted 4D;
> OK
> # col_name    data_type    comment
>
> date          int
> hh            int
> x             int
> y             int
> height        float
> u             float
> v             float
> w             float
> ph            float
> phb           float
> t             float
> p             float
> pb            float
> qvapor        float
> qgraup        float
> qnice         float
> qnrain        float
> tke_pbl       float
> el_pbl        float
> qcloud        float
>
> # Partition Information
> # col_name    data_type    comment
>
> zone          int
> z             int
> year          int
> month         int
>
> # Detailed Table Information
> Database: default
> Owner: patcharee
> CreateTime: Thu Jul 09 16:46:54 CEST 2015
> LastAccessTime: UNKNOWN
> Protect Mode: None
> Retention: 0
> Location: hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D
>
> Table Type: EXTERNAL_TABLE
>