[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2017-01-25 Thread Swaranga Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839306#comment-15839306
 ] 

Swaranga Sarma commented on SPARK-11620:


I encountered this issue in Spark 2.0.2.

> parquet.hadoop.ParquetOutputCommitter.commitJob() throws 
> parquet.io.ParquetEncodingException
> 
>
> Key: SPARK-11620
> URL: https://issues.apache.org/jira/browse/SPARK-11620
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: swetha k
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2016-10-08 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557931#comment-15557931
 ] 

Hyukjin Kwon commented on SPARK-11620:
--

[~swethakasireddy] Could you please check if this still happens in the current 
master or latest versions?




[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-12-01 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034650#comment-15034650
 ] 

swetha k commented on SPARK-11620:
--

[~hyukjin.kwon]

The following code saves the Parquet files from my hourly batch to HDFS. It is
based on the GitHub example linked at the end.

// Imports assume the pre-1.7.0 (com.twitter) package names that appear in the
// stack trace; from Parquet 1.7.0 on these live under org.apache.parquet.*
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.mapreduce.Job
import parquet.avro.{AvroParquetOutputFormat, AvroWriteSupport}
import parquet.hadoop.ParquetOutputFormat

val job = Job.getInstance()
val filePath = "path"
val metricsPath: Path = new Path(filePath)
// Delete the output path if it already exists
val fs: FileSystem = FileSystem.get(job.getConfiguration)

if (fs.exists(metricsPath)) {
  fs.delete(metricsPath, true)
}

// Configure the ParquetOutputFormat to use Avro as the serialization format
ParquetOutputFormat.setWriteSupportClass(job, classOf[AvroWriteSupport])
// You need to pass the schema to AvroParquet when you are writing objects but
// not when you are reading them. The schema is saved in the Parquet file for
// future readers to use.
AvroParquetOutputFormat.setSchema(job, Metrics.SCHEMA$)

// Create a PairRDD with all keys set to null and wrap each Metrics in
// serializable objects
val metricsToBeSaved = metrics.map(metricRecord => (null,
  new SerializableMetrics(new Metrics(metricRecord._1, metricRecord._2._1,
    metricRecord._2._2))))

// coalesce returns a new RDD rather than modifying the original, so keep the
// result and save that
val coalesced = metricsToBeSaved.coalesce(1500)
// Save the RDD to a Parquet file in our temporary output directory
coalesced.saveAsNewAPIHadoopFile(filePath, classOf[Void], classOf[Metrics],
  classOf[ParquetOutputFormat[Metrics]], job.getConfiguration)


https://github.com/massie/spark-parquet-example/blob/master/src/main/scala/com/zenfractal/SparkParquetExample.scala




[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-20 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018051#comment-15018051
 ] 

swetha k commented on SPARK-11620:
--

[~hyukjin.kwon]

We use Spark 1.5.2 now and it still shows the same error. Which version of 
Parquet-Avro should be used for that?

Thanks,
Swetha




[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-20 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015468#comment-15015468
 ] 

Hyukjin Kwon commented on SPARK-11620:
--

[~swethakasireddy] It uses 1.6.0rc3. Could you please give me a more detailed
description, such as the command you ran and the full exception message?




[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-20 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019284#comment-15019284
 ] 

swetha k commented on SPARK-11620:
--

It is not an error. It is a WARNING and I see the following.

Nov 8, 2015 11:35:39 PM WARNING: parquet.hadoop.ParquetOutputCommitter: could not write summary file for active_sessions_current
parquet.io.ParquetEncodingException: maprfs:/user/testId/active_sessions_current/part-r-00142.parquet invalid: all the files must be contained in the root active_sessions_current
    at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
    at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
    at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
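
The message pairs a fully qualified part-file path
(maprfs:/user/testId/active_sessions_current/...) with a relative root
(active_sessions_current). One reading, which is an assumption and not
confirmed anywhere in this thread, is that the summary-file step compares the
two as plain strings by prefix, so a relative root can never match a qualified
file path. The sketch below is a simplified model of such a check, not the
library's code; the object and method names are hypothetical.

```scala
object RootCheckSketch {
  // Hypothetical model: a part file counts as "contained in the root" only if
  // its path string starts with the root path string.
  def isUnderRoot(filePath: String, root: String): Boolean =
    filePath.startsWith(root)

  def main(args: Array[String]): Unit = {
    val part =
      "maprfs:/user/testId/active_sessions_current/part-r-00142.parquet"

    // Relative root, as printed in the warning: the prefix test fails.
    println(isUnderRoot(part, "active_sessions_current"))

    // Fully qualified root: the prefix test succeeds.
    println(isUnderRoot(part, "maprfs:/user/testId/active_sessions_current"))
  }
}
```

If this reading is right, passing a fully qualified output path to
saveAsNewAPIHadoopFile might avoid the warning, but that has not been verified
here.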




[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-20 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019288#comment-15019288
 ] 

swetha k commented on SPARK-11620:
--

[~hyukjin.kwon]

If I use ParquetInputFormat.setReadSupportClass(job,
classOf[AvroReadSupport[PreviousPVTracker]]) with Parquet 1.7.0, I see the
following error. It looks like AvroReadSupport is not a part of Parquet 1.7.0.
My code is based on http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/.

 not found: type AvroReadSupport
[ERROR]   ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[PreviousPVTracker]])
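
One possible cause, not confirmed in this thread: Parquet 1.7.0 was the first
release published under the org.apache.parquet group, and its classes moved
from the parquet.* packages to org.apache.parquet.*. An import such as
parquet.avro.AvroReadSupport written against the com.twitter artifacts would
then no longer resolve. The sketch below only illustrates that rule of thumb;
the object and method names are hypothetical and it does not inspect any real
classpath.

```scala
object ParquetAvroPackage {
  // Parse "1.7.0" or "1.6.0rc3" into (major, minor). Only the first two
  // components are read, so a suffix on the third one is ignored.
  private def majorMinor(version: String): (Int, Int) = {
    val parts = version.split("\\.")
    val major = parts(0).toInt
    val minor = parts(1).takeWhile(_.isDigit).toInt
    (major, minor)
  }

  // Package that holds AvroReadSupport / AvroWriteSupport for a given
  // Parquet version, assuming the rename happened at 1.7.0.
  def avroPackage(parquetVersion: String): String = {
    val (major, minor) = majorMinor(parquetVersion)
    if (major > 1 || (major == 1 && minor >= 7)) "org.apache.parquet.avro"
    else "parquet.avro"
  }

  def main(args: Array[String]): Unit = {
    Seq("1.6.0rc3", "1.6.0", "1.7.0").foreach { v =>
      println(s"Parquet $v: import ${avroPackage(v)}.AvroReadSupport")
    }
  }
}
```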





[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-12 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002384#comment-15002384
 ] 

swetha k commented on SPARK-11620:
--

[~hyukjin.kwon]

We are using Spark 1.4.1 in one of our clusters. Which Parquet version should
be used for 1.4.1?




[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-11 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001746#comment-15001746
 ] 

Hyukjin Kwon commented on SPARK-11620:
--

Can you tell me your Spark version?

Spark 1.5.1 uses Parquet 1.7.0. You can get the matching parquet-avro library
from here:

http://mvnrepository.com/artifact/org.apache.parquet/parquet-avro




[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-09 Thread swetha k (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998015#comment-14998015
 ] 

swetha k commented on SPARK-11620:
--

I see the following warning message when I use parquet-avro. This is the
dependency that I use:

<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>parquet-avro</artifactId>
  <version>1.6.0</version>
</dependency>

Nov 8, 2015 11:35:39 PM WARNING: parquet.hadoop.ParquetOutputCommitter: could not write summary file for active_sessions_current
parquet.io.ParquetEncodingException: maprfs:/user/testId/active_sessions_current/part-r-00142.parquet invalid: all the files must be contained in the root active_sessions_current
    at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
    at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
    at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
