It is Hadoop-2.4.0 with spark-1.3.0.
I found that the problem only happens if there are multiple nodes; if the cluster
has only one node, it works fine.
For example, if the cluster has a spark-master on machine A and a spark-worker
on machine B, the problem happens. If both spark-master and spark-worker are on
machine A, then there is no problem.
I do not use HDFS. I am just saving the RDD to a Windows shared folder:
rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")
with the T: drive mapped to \\10.196.119.230\myshare.
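One thing worth trying: on Windows, mapped drive letters like T: exist only within the logon session that created them, so a worker process running under a service account on machine B may not see the T: mapping at all, which would explain why the failure only shows up with multiple nodes. A minimal sketch of writing through the UNC path instead of the drive letter (this is a suggestion, not tested here; whether Hadoop's local filesystem accepts a file:// URI with a host authority on Windows is an assumption):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SaveToShare {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaveToShare"))
    val rdd = sc.parallelize(Seq(1, 2, 3))
    // Hypothetical workaround: address the share by UNC path rather than the
    // mapped T: drive, so the path does not depend on a per-session drive
    // mapping being present on every worker node.
    rdd.saveAsObjectFile("file://10.196.119.230/myshare/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")
    sc.stop()
  }
}
```

The account the workers actually run under (not the interactive logon) would still need write permission on the share for this to work.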
Ningjun
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, May 22, 2015 5:02 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: spark on Windows 2008 failed to save RDD to windows shared folder
The stack trace is related to hdfs.
Can you tell us which hadoop release you are using ?
Is this a secure cluster ?
Thanks
On Fri, May 22, 2015 at 1:55 PM, Wang, Ningjun (LNG-NPV)
<ningjun.w...@lexisnexis.com> wrote:
I use a spark standalone cluster on Windows 2008. I keep getting the
following error when trying to save an RDD to a Windows shared folder:
rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")
15/05/22 16:49:05 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12)
java.io.IOException: Mkdirs failed to create file:/T:/lab4-win02/IndexRoot01/tobacco-07/tmp/docs-150522204904805.op/_temporary/0/_temporary/attempt_201505221649_0012_m_00_12
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1071)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
The T: drive is mapped to a Windows shared folder, i.e. T: ->
\\10.196.119.230\myshare
The id running spark does have write permission to this folder. It works most
of the time but fails sometimes.
Can anybody tell me what the problem is here?
Please advise. Thanks.