     [ https://issues.apache.org/jira/browse/SPARK-9345 ]
Sean Owen updated SPARK-9345:
-----------------------------
    Component/s: SQL
                 Spark Shell

> Failure to cleanup on exceptions causes persistent I/O problems later on
> ------------------------------------------------------------------------
>
>                 Key: SPARK-9345
>                 URL: https://issues.apache.org/jira/browse/SPARK-9345
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 1.3.1, 1.4.0, 1.4.1
>         Environment: Ubuntu on AWS
>            Reporter: Simeon Simeonov
>            Priority: Minor
>
> When using spark-shell in local mode, I have observed the following behavior on a number of nodes:
> # Some operation generates an exception related to Spark SQL processing via {{HiveContext}}.
> # From that point on, nothing can be written to Hive with {{saveAsTable}} (a minimal sketch of such a write is included below).
> # Another identically configured installation of Spark on the same machine may not exhibit the problem at first but, after enough exceptions, it starts exhibiting the problem as well.
> # A fresh, identically configured installation of the same version on the same machine exhibits the problem immediately.
> The error is always an inability to create a temporary directory under the Hive warehouse path (note the {{file:}} scheme in the trace, i.e., the local file system rather than HDFS):
> {code}
> 15/07/25 16:03:35 ERROR InsertIntoHadoopFsRelation: Aborting task.
> java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/test/_temporary/0/_temporary/attempt_201507251603_0001_m_000001_0 (exists=false, cwd=file:/home/ubuntu)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
> 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
> 	at parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:154)
> 	at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:279)
> 	at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
> 	at org.apache.spark.sql.parquet.ParquetOutputWriter.<init>(newParquet.scala:83)
> 	at org.apache.spark.sql.parquet.ParquetRelation2$$anon$4.newInstance(newParquet.scala:229)
> 	at org.apache.spark.sql.sources.DefaultWriterContainer.initWriters(commands.scala:470)
> 	at org.apache.spark.sql.sources.BaseWriterContainer.executorSideSetup(commands.scala:360)
> 	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:172)
> 	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
> 	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:70)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 	...
> {code}
> The behavior does not seem related to HDFS itself, as it persists even after the HDFS volume is reformatted.
> The behavior is difficult to reproduce reliably, but it is consistently observable with sufficient Spark SQL experimentation (dozens of exceptions arising from Spark SQL processing).
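> For context, here is a minimal sketch of the kind of write that fails once a shell is in the bad state. The table name {{test}} and the sample data are assumptions for illustration; any Hive write via {{saveAsTable}} appears to hit the same error:
> {code}
> // Minimal spark-shell sketch (1.3.x/1.4.x APIs). The table name "test" and
> // the sample data are illustrative assumptions, not taken from the failing job.
> import org.apache.spark.sql.hive.HiveContext
>
> val hiveContext = new HiveContext(sc) // sc is provided by spark-shell
> import hiveContext.implicits._
>
> // Any small DataFrame works; the write itself is what fails.
> val df = sc.parallelize(1 to 10).map(i => (i, s"row_$i")).toDF("id", "name")
>
> // After a prior Spark SQL exception in the same shell, this fails with:
> // java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/test/_temporary/...
> df.saveAsTable("test")
> {code}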
> The likelihood of this happening goes up substantially if some Spark SQL operation runs out of memory, which suggests that the problem is related to cleanup after failed tasks.
> In this gist ([https://gist.github.com/ssimeonov/72a64947bc33628d2d11]) you can see how, on the same machine, identically configured 1.3.1 and 1.4.1 installations sharing the same HDFS file system and Hive metastore behave differently: 1.3.1 can write to Hive while 1.4.1 cannot. The behavior started on 1.4.1 after an out-of-memory exception during a large job.
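> A possible workaround (an assumption on my part, not a verified fix) is to delete the stale {{_temporary}} directory that an aborted job leaves behind under the table path before retrying the write. For example, from the shell:
> {code}
> // Hedged workaround sketch: remove the _temporary directory left behind by an
> // aborted job. The path matches the error message above; adjust it for your
> // warehouse location. This is an untested assumption, not a confirmed fix.
> import org.apache.hadoop.fs.Path
>
> val tmpDir = new Path("file:/user/hive/warehouse/test/_temporary")
> val fs = tmpDir.getFileSystem(sc.hadoopConfiguration)
>
> if (fs.exists(tmpDir)) {
>   fs.delete(tmpDir, true) // recursive delete of the leftover attempt directories
> }
> {code}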