[ https://issues.apache.org/jira/browse/SPARK-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-21422:
----------------------------------
    Description:
Like Parquet, this issue aims to depend on the latest Apache ORC 1.4 for Apache Spark 2.3. There are two key benefits for now.

- Stability: Apache ORC 1.4.0 has many fixes, and we can rely more on the ORC community.
- Maintainability: Reduces the Hive dependency and allows old legacy code to be removed later.

Later, we can get the following two key benefits by adding a new ORCFileFormat in SPARK-20728, too.

- Usability: Users can use ORC data sources without the hive module, i.e., -Phive.
- Speed: Use both Spark ColumnarBatch and ORC RowBatch together. This is faster than the current implementation in Spark.

  was:
Like Parquet, this issue aims to depend on the latest Apache ORC 1.4 for Apache Spark 2.3.

> Depend on Apache ORC 1.4.0
> --------------------------
>
>                 Key: SPARK-21422
>                 URL: https://issues.apache.org/jira/browse/SPARK-21422
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 2.3.0
>            Reporter: Dongjoon Hyun