Re: Writing empty Dataframes doesn't save any _metadata files in Spark 1.5.1 and 1.6

2016-06-14 Thread Hyukjin Kwon
Oops, I just saw the link. It is actually not only for Spark 2.0.

To be clear, https://issues.apache.org/jira/browse/SPARK-15393 was a bit
different from your case (it was about writing an empty DataFrame with empty
partitions).

That issue was caused by https://github.com/apache/spark/pull/12855, which was
reverted.

I described your case in a comment on that JIRA.

2016-06-15 10:26 GMT+09:00 Hyukjin Kwon :

> Yeah, I have run into this case before. I guess this is related to
> https://issues.apache.org/jira/browse/SPARK-15393.
>
> 2016-06-15 8:46 GMT+09:00 antoniosi :
>
>> I tried the following code in both Spark 1.5.1 and Spark 1.6.0:
>>
>> import org.apache.spark.sql.types.{
>>   StructType, StructField, StringType, IntegerType}
>> import org.apache.spark.sql.Row
>>
>> val schema = StructType(
>>   StructField("k", StringType, true) ::
>>   StructField("v", IntegerType, false) :: Nil)
>>
>> // Bind the result so that `df` is defined for the write below.
>> val df = sqlContext.createDataFrame(sc.emptyRDD[Row], schema)
>> df.write.save("hdfs://xxx")
>>
>> Both 1.5.1 and 1.6.0 save only the _SUCCESS file; no _metadata files are
>> written. In 1.6.0, it also logs the following exception:
>>
>> 16/06/14 16:29:27 WARN ParquetOutputCommitter: could not write summary file for hdfs://xxx
>> java.lang.NullPointerException
>>   at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
>>   at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
>>   at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
>>   at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
>>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>>
>> I do not get this exception in 1.5.1, though.
>>
>> I see this bug https://issues.apache.org/jira/browse/SPARK-15393, but it
>> is for Spark 2.0. Is the same bug present in Spark 1.5.1 and 1.6?
>>
>> Is there a way to save an empty DataFrame properly?
>>
>> Thanks.
>>
>> Antonio.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Writing-empty-Dataframes-doesn-t-save-any-metadata-files-in-Spark-1-5-1-and-1-6-tp27169.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
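The stack trace suggests the NullPointerException comes from merging part-file footers into the summary when there are no footers at all: a DataFrame built from `sc.emptyRDD` has no partitions, so no part files are written, which matches the report that only `_SUCCESS` appears. The sketch below is a simplified model of that merge step, assuming the merged metadata starts out empty and is filled from the first footer. `Footer`, `SummaryModel`, and the field representation are hypothetical and are not parquet-mr classes.

```scala
// Hypothetical, simplified model of merging part-file footers into a
// _metadata summary. These are NOT parquet-mr classes.
final case class Footer(path: String, fields: Seq[(String, String)])

object SummaryModel {
  // Fold every footer's fields into one merged schema. The accumulator
  // starts as None, so with zero footers the result stays None.
  def mergeFooters(footers: Seq[Footer]): Option[Seq[(String, String)]] =
    footers.foldLeft(Option.empty[Seq[(String, String)]]) { (acc, footer) =>
      acc match {
        case None => Some(footer.fields)
        case Some(merged) =>
          // Simplified union: keep known fields, append unseen field names.
          Some(merged ++ footer.fields.filterNot(f => merged.exists(_._1 == f._1)))
      }
    }

  // A summary can only be produced when there is a merged schema to put in it.
  def summaryToWrite(footers: Seq[Footer]): Option[String] =
    mergeFooters(footers).map { fields =>
      fields.map { case (name, tpe) => s"$name:$tpe" }.mkString(", ")
    }
}
```

Using `Option` makes the "no footers" case explicit, whereas the trace points to a null being dereferenced inside `mergeFooters` in that situation.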


Re: Writing empty Dataframes doesn't save any _metadata files in Spark 1.5.1 and 1.6

2016-06-14 Thread Hyukjin Kwon
Yeah, I have run into this case before. I guess this is related to
https://issues.apache.org/jira/browse/SPARK-15393.

2016-06-15 8:46 GMT+09:00 antoniosi :

> I tried the following code in both Spark 1.5.1 and Spark 1.6.0:
>
> import org.apache.spark.sql.types.{
>   StructType, StructField, StringType, IntegerType}
> import org.apache.spark.sql.Row
>
> val schema = StructType(
>   StructField("k", StringType, true) ::
>   StructField("v", IntegerType, false) :: Nil)
>
> // Bind the result so that `df` is defined for the write below.
> val df = sqlContext.createDataFrame(sc.emptyRDD[Row], schema)
> df.write.save("hdfs://xxx")
>
> Both 1.5.1 and 1.6.0 save only the _SUCCESS file; no _metadata files are
> written. In 1.6.0, it also logs the following exception:
>
> 16/06/14 16:29:27 WARN ParquetOutputCommitter: could not write summary file for hdfs://xxx
> java.lang.NullPointerException
>   at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
>   at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
>   at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
>   at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
>   at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>
> I do not get this exception in 1.5.1, though.
>
> I see this bug https://issues.apache.org/jira/browse/SPARK-15393, but it
> is for Spark 2.0. Is the same bug present in Spark 1.5.1 and 1.6?
>
> Is there a way to save an empty DataFrame properly?
>
> Thanks.
>
> Antonio.
>
>
>
>
>
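To confirm which files a write actually produced (for example, that only `_SUCCESS` is present and no `_metadata` or part files), a small helper can list the output directory. This is a sketch that assumes a local filesystem path; an `hdfs://` path would need the Hadoop `FileSystem` API instead. `OutputCheck` is a hypothetical name, and the file names checked (`_SUCCESS`, `_metadata`, `_common_metadata`, `part-*`) are the usual outputs of a Parquet write.

```scala
import java.io.File

object OutputCheck {
  // Report which of the expected Parquet output files exist in a LOCAL directory.
  // listFiles() returns null for a missing or non-directory path, so wrap it in Option.
  def summarize(dir: File): Map[String, Boolean] = {
    val names: Set[String] =
      Option(dir.listFiles()).map(_.map(_.getName).toSet).getOrElse(Set.empty[String])
    Map(
      "_SUCCESS"         -> names.contains("_SUCCESS"),
      "_metadata"        -> names.contains("_metadata"),
      "_common_metadata" -> names.contains("_common_metadata"),
      "part files"       -> names.exists(_.startsWith("part-"))
    )
  }
}
```

For the empty-DataFrame write described in this thread, the expectation would be `_SUCCESS -> true` with the other three entries `false`.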