Re: Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

2018-01-28 Thread Dongjoon Hyun
Hi, Nicolas.

Yes. Apache Spark 2.3 includes new sub-improvements for SPARK-20901 (Feature parity for ORC with Parquet). For your questions, the following three are related.

1. spark.sql.orc.impl="native"
   By default, the `native` ORC implementation (based on the latest ORC 1.4.1) is added.
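
As a rough illustration of the setting named above, the sketch below selects the `native` implementation for a session and then reads a path as ORC. It assumes `spark` is an already-created PySpark `SparkSession` on Spark 2.3 or later; the helper name `use_native_orc` and the constant `ORC_IMPL_KEY` are made up for this example and are not part of Spark.

```python
# Sketch only: `spark` is assumed to be an existing pyspark.sql.SparkSession.
# The helper name and constant below are hypothetical, not Spark APIs.
ORC_IMPL_KEY = "spark.sql.orc.impl"  # setting named in the message above


def use_native_orc(spark, path):
    """Select the native ORC implementation, then read `path` as ORC."""
    # spark.conf.set(key, value) changes the setting for this session.
    spark.conf.set(ORC_IMPL_KEY, "native")
    # Same DataFrameReader call as in the question quoted in this thread.
    return spark.read.format("orc").load(path)
```

With a live session this would be called as `df = use_native_orc(spark, "...")`, where the path is whatever ORC location applies. Whether the same setting also covers tables read through `spark.sql(...)` is not shown by this sketch.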

Re: Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

2018-01-28 Thread Nicolas Paris
Hi,

Thanks for this work. Will this affect both:
1) spark.read.format("orc").load("...")
2) spark.sql("select ... from my_orc_table_in_hive")
?

On 10 Jan 2018 at 20:14, Dongjoon Hyun wrote:
> Hi, All.
>
> Vectorized ORC Reader is now supported in Apache Spark 2.3.

Vectorized ORC Reader in Apache Spark 2.3 with Apache ORC 1.4.1.

2018-01-10 Thread Dongjoon Hyun
Hi, All.

Vectorized ORC Reader is now supported in Apache Spark 2.3.

https://issues.apache.org/jira/browse/SPARK-16060

It has been a long journey. From now on, Spark can read ORC files faster without a feature penalty. Thank you for all your support, especially Wenchen Fan. It's done by two