Error when saving as parquet to S3
After repartitioning a DataFrame in Spark 1.3.0 I get a .parquet exception when saving toAmazon's S3. The data that I try to write is 10G. logsForDate .repartition(10) .saveAsParquetFile(destination) // -- Exception here The exception I receive is: java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN at parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:137) at parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:129) at parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:173) at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:152) at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112) at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73) at org.apache.spark.sql.parquet.ParquetRelation2.org $apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:635) at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:649) at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:649) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) I would like to know what is the problem and how to solve it. *Cosmin Catalin SANDA* Software Systems Engineer Phone: +45.27.30.60.35
Error when saving as parquet to S3 from Spark
After repartitioning a *DataFrame* in *Spark 1.3.0* I get a *.parquet* exception when saving to*Amazon's S3*. The data that I try to write is 10G. logsForDate .repartition(10) .saveAsParquetFile(destination) // -- Exception here The exception I receive is: java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN at parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:137) at parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:129) at parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:173) at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:152) at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112) at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73) at org.apache.spark.sql.parquet.ParquetRelation2.org http://org.apache.spark.sql.parquet.parquetrelation2.org/$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:635) at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:649) at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:649) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) I would like to know what is the problem and how to solve it. *Cosmin Catalin SANDA* Software Systems Engineer Phone: +45.27.30.60.35
Error when saving as parquet to S3
After repartitioning a DataFrame in Spark 1.3.0 I get a .parquet exception when saving toAmazon's S3. The data that I try to write is 10G. logsForDate .repartition(10) .saveAsParquetFile(destination) // -- Exception here The exception I receive is: java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN at parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:137) at parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:129) at parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:173) at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:152) at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112) at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73) at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:635) at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:649) at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:649) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) I would like to know what is the problem and how to solve it. - https://www.linkedin.com/in/cosmincatalinsanda -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-saving-as-parquet-to-S3-tp22722.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org