[ https://issues.apache.org/jira/browse/SPARK-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383627#comment-14383627 ]
Cheng Lian commented on SPARK-6566:
-----------------------------------

Hi [~k.shaposhni...@gmail.com], as described in SPARK-5463, we do want to upgrade Parquet. However, we currently have two concerns:
# The most recent Parquet RC release introduces subtle API incompatibilities related to filter push-down and Parquet metadata gathering. I believe handling these properly requires more work than the patch you provided if we want everything to work correctly with the best performance.
# We'd like to wait for the official release of Parquet 1.6.0. This is the first release of Parquet as an Apache top-level project, so it is taking more time than usual.

We will probably first upgrade to a recent 1.6.0 RC release in Spark master, and then switch to the official 1.6.0 release in Spark 1.4.0 (and Spark 1.3.2 if there is one).

> Update Spark to use the latest version of Parquet libraries
> -----------------------------------------------------------
>
>                 Key: SPARK-6566
>                 URL: https://issues.apache.org/jira/browse/SPARK-6566
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Konstantin Shaposhnikov
>
> There are a lot of bug fixes in the latest version of Parquet (1.6.0rc7), e.g. PARQUET-136.
> It would be good to update Spark to use the latest Parquet version.
> The following changes are required:
> {code}
> diff --git a/pom.xml b/pom.xml
> index 5ad39a9..095b519 100644
> --- a/pom.xml
> +++ b/pom.xml
> @@ -132,7 +132,7 @@
>      <!-- Version used for internal directory structure -->
>      <hive.version.short>0.13.1</hive.version.short>
>      <derby.version>10.10.1.1</derby.version>
> -    <parquet.version>1.6.0rc3</parquet.version>
> +    <parquet.version>1.6.0rc7</parquet.version>
>      <jblas.version>1.2.3</jblas.version>
>      <jetty.version>8.1.14.v20131031</jetty.version>
>      <orbit.version>3.0.0.v201112011016</orbit.version>
> {code}
> and
> {code}
> --- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
> +++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala
> @@ -480,7 +480,7 @@ private[parquet] class FilteringParquetRowInputFormat
>      globalMetaData = new GlobalMetaData(globalMetaData.getSchema,
>        mergedMetadata, globalMetaData.getCreatedBy)
>
> -    val readContext = getReadSupport(configuration).init(
> +    val readContext = ParquetInputFormat.getReadSupportInstance(configuration).init(
>        new InitContext(configuration,
>          globalMetaData.getKeyValueMetaData,
>          globalMetaData.getSchema))
> {code}
> I am happy to prepare a pull request if necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
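For context, the call-site migration in the second hunk can be sketched as follows. This is an illustrative fragment only, not code taken verbatim from Spark: it assumes the parquet-hadoop 1.6.0rc7 class names shown in the diff ({{ParquetInputFormat}}, {{InitContext}}), and the identifiers {{configuration}} and {{globalMetaData}} stand in for the surrounding Spark code.

{code}
// Sketch of the API change implied by the diff above (not runnable on its own;
// requires parquet-hadoop on the classpath).

// Before (1.6.0rc3): read support was obtained via the inherited helper
//   val readContext = getReadSupport(configuration).init(...)

// After (1.6.0rc7): use the static accessor on ParquetInputFormat instead
val readContext = ParquetInputFormat.getReadSupportInstance(configuration).init(
  new InitContext(
    configuration,
    globalMetaData.getKeyValueMetaData,
    globalMetaData.getSchema))
{code}

The practical consequence is that callers resolve the {{ReadSupport}} instance through {{ParquetInputFormat}} rather than through the method previously available on the input format subclass, which is one of the "subtle API incompatibilities" mentioned in the comment above.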