[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell
[ https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034624#comment-15034624 ]

swetha k commented on SPARK-5968:
-

[~lian cheng] Following are the dependencies and the versions that I am using. I want to know whether using a different version would help fix this. I see this error in my Spark batch job when I save Parquet files to HDFS.

{code}
<properties>
  <sparkVersion>1.5.2</sparkVersion>
  <avro.version>1.7.7</avro.version>
  <!-- 1.4.3: property name not preserved in the plain-text message -->
</properties>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${sparkVersion}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>${avro.version}</version>
</dependency>
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>parquet-avro</artifactId>
  <version>1.6.0rc7</version>
</dependency>
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>1.6.0rc7</version>
</dependency>
{code}

> Parquet warning in spark-shell
> ------------------------------
>
> Key: SPARK-5968
> URL: https://issues.apache.org/jira/browse/SPARK-5968
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: Michael Armbrust
> Assignee: Cheng Lian
> Priority: Critical
> Fix For: 1.3.0
>
> This may happen when the schema evolves, namely when appending new Parquet
> data with a different but compatible schema to existing Parquet files:
> {code}
> 15/02/23 23:29:24 WARN ParquetOutputCommitter: could not write summary file for rankings
> parquet.io.ParquetEncodingException: file:/Users/matei/workspace/apache-spark/rankings/part-r-1.parquet invalid: all the files must be contained in the root rankings
> at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
> at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
> at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
> {code}
> The reason is that the Spark SQL schemas stored in the Parquet key-value metadata
> differ. Parquet doesn't know how to "merge" this opaque user-defined
> metadata, so it just throws an exception and gives up writing summary files.
> Since the Parquet data source in Spark 1.3.0 supports schema merging, this is
> harmless. But it is kind of scary for the user. We should try to suppress
> this through the logger.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
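Because parquet-mr 1.6/1.7 logs through java.util.logging (JUL) rather than Log4j, Spark's log4j.properties does not control these messages. One commonly attempted workaround, regardless of dependency versions, is a JUL configuration file. This is a sketch only: the "parquet" logger name is inferred from the class names in the stack traces, and because of PARQUET-369 the setting may not take effect:

```properties
# jul.properties -- raise the logging threshold for the parquet logger tree
# so WARNING-level messages (e.g. from ParquetOutputCommitter) are dropped
parquet.level = SEVERE
```

The file can be passed to the driver with `--driver-java-options "-Djava.util.logging.config.file=jul.properties"` when submitting the job.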
[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell
[ https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001885#comment-15001885 ]

Cheng Lian commented on SPARK-5968:
---

As explained in the JIRA description, this issue shouldn't affect functionality.
[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell
[ https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000200#comment-15000200 ]

Cheng Lian commented on SPARK-5968:
---

It was once fixed via a rather hacky trick. Unfortunately, it came back after the upgrade to parquet-mr 1.7.0, and there doesn't seem to be a reliable way to override the log settings because of PARQUET-369, which prevents users and other libraries from redirecting the Parquet JUL logger via SLF4J. It's fixed in the most recent parquet-format master, but I'm afraid we have to wait for another parquet-format and parquet-mr release to fix this issue completely.
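Pending a proper upstream fix, the redirection problem described above can be illustrated from the application side. The sketch below attempts to silence the "parquet" JUL logger tree programmatically; the logger name is inferred from the stack traces in this thread, and, as the comment above notes, PARQUET-369 means parquet.Log may (re)install its own handler after this runs, so this is a best-effort workaround, not a guaranteed fix:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SilenceParquetLogging {
    /**
     * Attempt to silence parquet-mr's java.util.logging output by raising the
     * threshold on the parent "parquet" logger and detaching any handlers
     * installed on it. Child loggers such as
     * parquet.hadoop.ParquetOutputCommitter inherit the effective level.
     */
    public static void silenceParquet() {
        Logger parquetLogger = Logger.getLogger("parquet");
        parquetLogger.setLevel(Level.SEVERE);       // drop WARNING and below
        parquetLogger.setUseParentHandlers(false);  // bypass the root console handler
        for (Handler handler : parquetLogger.getHandlers()) {
            parquetLogger.removeHandler(handler);
        }
    }

    public static void main(String[] args) {
        silenceParquet();
        // WARNING is now below the effective threshold for the parquet tree.
        System.out.println(Logger.getLogger("parquet.hadoop.ParquetOutputCommitter")
                .isLoggable(Level.WARNING)); // → false
    }
}
```

Calling `silenceParquet()` early in the driver, before the first Parquet write, gives it the best chance of running before parquet-mr's static logger initialization.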
[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell
[ https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001294#comment-15001294 ]

swetha k commented on SPARK-5968:
-

[~lian cheng] Is this just a logger issue, or could it have any potential impact on functionality?
[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell
[ https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996160#comment-14996160 ]

swetha k commented on SPARK-5968:
-

[~marmbrus] How was this issue resolved? I still see the following issue when I try to save my Parquet file.

{code}
Nov 8, 2015 11:35:39 PM WARNING: parquet.hadoop.ParquetOutputCommitter: could not write summary file for active_sessions_current
parquet.io.ParquetEncodingException: maprfs:/user/testId/active_sessions_current/part-r-00142.parquet invalid: all the files must be contained in the root active_sessions_current
	at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
	at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
	at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
{code}
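The "all the files must be contained in the root" message in these traces comes from the footer-merging step in ParquetFileWriter. A simplified sketch of that containment check, reconstructed from the exception message rather than copied from parquet-mr (the method name and exact path handling here are assumptions), shows why a summary write can fail even though every part file was written successfully:

```java
import java.net.URI;

public class FooterRootCheck {
    /**
     * Simplified sketch of the check mergeFooters appears to perform: every
     * part file's path must fall under the summary file's root directory.
     * When it doesn't, parquet-mr throws ParquetEncodingException and the
     * summary file is skipped, producing the WARNING seen above.
     */
    public static boolean isContainedInRoot(String rootUri, String fileUri) {
        String root = URI.create(rootUri).getPath();
        String file = URI.create(fileUri).getPath();
        return root != null && file != null && file.startsWith(root);
    }

    public static void main(String[] args) {
        // Absolute root, file underneath it: passes.
        System.out.println(isContainedInRoot(
                "file:/user/rankings",
                "file:/user/rankings/part-r-1.parquet")); // → true
        // Relative root vs. absolute part-file path: fails the check,
        // matching the "root rankings" message in the issue description.
        System.out.println(isContainedInRoot(
                "rankings",
                "file:/Users/matei/workspace/apache-spark/rankings/part-r-1.parquet")); // → false
    }
}
```

Since only the summary-file write is skipped, the data files themselves are intact, which is consistent with the warning being harmless.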
[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell
[ https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335098#comment-14335098 ]

Apache Spark commented on SPARK-5968:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/4744