Thank you! Any chance of this work being reviewed and integrated into the next
Spark release?

Best,
Ovidiu
> On 25 Jul 2016, at 12:20, Hyukjin Kwon <gurwls...@gmail.com> wrote:
> 
> Regarding question 1: it is possible but not supported yet. Please refer to
> https://github.com/apache/spark/pull/13775
> 
> Thanks!
> 
> 2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU 
> <ovidiu-cristian.ma...@inria.fr>:
> Hi,
> 
> Assuming I have some data in both ORC and Parquet formats, and some complex
> workflow that eventually combines the results of queries on these datasets, I
> would like to get the best execution. Looking at the default configs, I
> noticed:
> 
> 1) Vectorized query execution seems possible with Parquet only; can you
> confirm whether this is also possible with the ORC format?
> 
> parameter spark.sql.parquet.enableVectorizedReader
> [1] 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
> Hive assumes ORC, via the parameter hive.vectorized.execution.enabled
> [2] 
> https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution 
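As a sketch of what the parameter above controls: in a Spark 2.x spark-shell, the Parquet vectorized reader flag can be inspected and toggled at runtime (this assumes a running Spark session named `spark`, as spark-shell provides; only the parameter already cited above is used, and it applies to Parquet scans only):

```scala
// Sketch for a spark-shell session; `spark` is the SparkSession the shell provides.
// spark.sql.parquet.enableVectorizedReader is the parameter cited above.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")

// Read it back to confirm the session-level value.
println(spark.conf.get("spark.sql.parquet.enableVectorizedReader"))
```

Because this is a session-level SQLConf setting, it takes effect for subsequent Parquet scans in the same session without restarting the application.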
> 
> 2) Filter pushdown is enabled by default for Parquet only; why not also
> for ORC?
> spark.sql.parquet.filterPushdown=true
> spark.sql.orc.filterPushdown=false
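For reference, both pushdown settings quoted above can be set explicitly at submit time rather than relying on the defaults; a minimal sketch (only the two parameter names already shown are used, the `...` stands for the usual application arguments):

```shell
# Enable filter pushdown for both formats for this submission.
spark-submit \
  --conf spark.sql.parquet.filterPushdown=true \
  --conf spark.sql.orc.filterPushdown=true \
  ...
```

The same keys can instead be placed in conf/spark-defaults.conf to apply to every application.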
> 
> 3) Should I even try to process the ORC format with Spark, as it seems only
> Parquet has native support?
> 
> 
> Thank you!
> 
> Best,
> Ovidiu
> 
