[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell

2015-12-01 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034624#comment-15034624
 ] 

swetha k commented on SPARK-5968:
-

[~lian cheng]

Following are the dependencies and versions I am using. I would like to know 
whether using a different version would help fix this. I see this error in my 
Spark batch job when I save Parquet files to HDFS.

{code:xml}
<properties>
  <sparkVersion>1.5.2</sparkVersion>
  <avro.version>1.7.7</avro.version>
  <!-- 1.4.3 (the property tag for this version is missing in the archive) -->
</properties>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${sparkVersion}</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>${avro.version}</version>
</dependency>

<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>parquet-avro</artifactId>
  <version>1.6.0rc7</version>
</dependency>

<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>1.6.0rc7</version>
</dependency>
{code}
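
Since the warning is emitted while ParquetOutputCommitter writes the optional 
summary files, one way to silence it regardless of the versions above is to 
skip summary-file generation entirely. A minimal sketch, assuming a parquet-mr 
version that honors the "parquet.enable.summary-metadata" job property; the app 
name and output path are illustrative:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object NoSummaryFilesExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("no-summary-files"))
    // Propagated to the Hadoop jobs backing Parquet writes, so
    // ParquetOutputCommitter skips _metadata and has nothing to warn about.
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    Seq((1, "a"), (2, "b")).toDF("id", "name")
      .write.parquet("hdfs:///tmp/no-summary-example")
    sc.stop()
  }
}
{code}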




> Parquet warning in spark-shell
> --
>
> Key: SPARK-5968
> URL: https://issues.apache.org/jira/browse/SPARK-5968
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Michael Armbrust
>Assignee: Cheng Lian
>Priority: Critical
> Fix For: 1.3.0
>
>
> This may happen in the case of schema evolution, namely appending new Parquet 
> data with a different but compatible schema to existing Parquet files:
> {code}
> 15/02/23 23:29:24 WARN ParquetOutputCommitter: could not write summary file 
> for rankings
> parquet.io.ParquetEncodingException: 
> file:/Users/matei/workspace/apache-spark/rankings/part-r-1.parquet 
> invalid: all the files must be contained in the root rankings
> at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
> at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
> at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
> {code}
> The reason is that the Spark SQL schemas stored in the Parquet key-value 
> metadata differ. Parquet doesn't know how to "merge" this opaque user-defined 
> metadata, so it just throws an exception and gives up writing summary files. 
> Since the Parquet data source in Spark 1.3.0 supports schema merging, this is 
> harmless, but it looks scary to the user. We should try to suppress this 
> through the logger.
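
To make the failure mode concrete, here is a minimal sketch of the 
append-with-evolved-schema pattern described above, assuming the Spark 1.4+ 
DataFrame writer API; the path and column names are illustrative:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object SchemaEvolutionRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("schema-evolution-repro"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val path = "file:///tmp/rankings"
    // First batch: two columns.
    Seq((1, "pageA")).toDF("rank", "url").write.parquet(path)
    // Second batch: a compatible superset schema. Each footer now carries a
    // different spark.sql schema in its key-value metadata, so the committer
    // cannot merge the footers into one summary file and logs the WARN above.
    Seq((2, "pageB", 0.9)).toDF("rank", "url", "score")
      .write.mode(SaveMode.Append).parquet(path)
    sc.stop()
  }
}
{code}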






[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell

2015-11-12 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001885#comment-15001885
 ] 

Cheng Lian commented on SPARK-5968:
---

As explained in the JIRA description, this issue shouldn't affect functionality.







[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell

2015-11-11 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000200#comment-15000200
 ] 

Cheng Lian commented on SPARK-5968:
---

It had once been fixed via a quite hacky trick. Unfortunately, it came back 
after upgrading to parquet-mr 1.7.0, and there doesn't seem to be a reliable 
way to override the log settings because of PARQUET-369, which prevents users 
and other libraries from redirecting the Parquet JUL logger via SLF4J. It's 
fixed in the most recent parquet-format master, but I'm afraid we have to wait 
for another parquet-format and parquet-mr release before this issue can be 
fixed completely.
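
For reference, the hacky trick mentioned above amounts to muting Parquet's 
java.util.logging output directly rather than going through SLF4J. A minimal 
sketch, assuming parquet.Log still attaches its handler to the "parquet" JUL 
logger from a static initializer (so that class must be loaded first, or the 
handler comes back); the object name is illustrative:

{code}
import java.util.logging.{Level, Logger}

object ParquetJulSilencer {
  def silence(): Unit = {
    // Load parquet.Log first so its static initializer has already
    // attached the handler we are about to remove.
    Class.forName("parquet.Log")
    val parquetLogger = Logger.getLogger("parquet")
    parquetLogger.getHandlers.foreach(parquetLogger.removeHandler)
    parquetLogger.setUseParentHandlers(false)
    parquetLogger.setLevel(Level.SEVERE)
  }
}
{code}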







[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell

2015-11-11 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001294#comment-15001294
 ] 

swetha k commented on SPARK-5968:
-

[~lian cheng]

Is this just a logger issue, or could it have any potential impact on 
functionality?







[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell

2015-11-08 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996160#comment-14996160
 ] 

swetha k commented on SPARK-5968:
-

[~marmbrus]

How was this issue resolved? I still see the following warning when I try to 
save my Parquet file:

{code}
Nov 8, 2015 11:35:39 PM WARNING: parquet.hadoop.ParquetOutputCommitter: could not write summary file for active_sessions_current
parquet.io.ParquetEncodingException: maprfs:/user/testId/active_sessions_current/part-r-00142.parquet invalid: all the files must be contained in the root active_sessions_current
at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
{code}
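
Because the trace goes through saveAsNewAPIHadoopDataset, the summary-file 
switch can also be set on the Hadoop job configuration used for that write. A 
minimal sketch, again assuming parquet-mr honors 
"parquet.enable.summary-metadata"; the output format, key/value classes, and 
output path still need to be configured for a real write:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.Job

object JobLevelSummarySwitch {
  // Builds a configuration for saveAsNewAPIHadoopDataset with Parquet
  // summary files disabled, so commitJob never attempts the footer merge.
  def configure(): Configuration = {
    val job = Job.getInstance()
    job.getConfiguration.set("parquet.enable.summary-metadata", "false")
    // ...set output format, key/value classes, and output path here...
    job.getConfiguration
  }
  // usage: pairRdd.saveAsNewAPIHadoopDataset(configure())
}
{code}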







[jira] [Commented] (SPARK-5968) Parquet warning in spark-shell

2015-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335098#comment-14335098
 ] 

Apache Spark commented on SPARK-5968:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/4744




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org