RE: spark on Windows 2008 failed to save RDD to windows shared folder

2015-05-26 Thread Wang, Ningjun (LNG-NPV)
It is Hadoop 2.4.0 with Spark 1.3.0.

I found that the problem only happens when there are multiple nodes. If the
cluster has only one node, it works fine.

For example, if the cluster has a spark-master on machine A and a spark-worker
on machine B, the problem happens. If both the spark-master and the
spark-worker are on machine A, there is no problem.

I do not use HDFS. I am just saving the RDD to a Windows shared folder:

rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")

with the T: drive mapped to \\10.196.119.230\myshare
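
A possibility I have not ruled out (an untested sketch, not a confirmed fix):
Windows maps drive letters per logon session, so a spark-worker on machine B,
or one running as a service under another account, may not see the T: mapping
even when the share itself is reachable. Writing through the UNC path would
avoid depending on the mapping, although the exact number of slashes Hadoop's
local filesystem accepts in a UNC file URI needs verifying:

// Hypothetical alternative: address the share directly by UNC path so every
// node resolves the same location without a per-session drive mapping.
rdd.saveAsObjectFile("file://///10.196.119.230/myshare/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")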

Ningjun

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, May 22, 2015 5:02 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: spark on Windows 2008 failed to save RDD to windows shared folder

The stack trace is related to HDFS.

Can you tell us which Hadoop release you are using?

Is this a secure cluster?

Thanks

On Fri, May 22, 2015 at 1:55 PM, Wang, Ningjun (LNG-NPV) 
ningjun.w...@lexisnexis.com wrote:
I am using a Spark standalone cluster on Windows 2008. I keep getting the
following error when trying to save an RDD to a Windows shared folder:

rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")

15/05/22 16:49:05 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12)
java.io.IOException: Mkdirs failed to create file:/T:/lab4-win02/IndexRoot01/tobacco-07/tmp/docs-150522204904805.op/_temporary/0/_temporary/attempt_201505221649_0012_m_00_12
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1071)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
The T: drive is mapped to a Windows shared folder: T: -> \\10.196.119.230\myshare
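
From what I can tell, the "Mkdirs failed to create" message comes from Hadoop's
local filesystem: create() first calls mkdirs() on the parent directory and
throws this IOException whenever mkdirs() returns false. On a shared folder,
java.io.File.mkdirs() is not atomic across machines, so two tasks on different
nodes racing to create the same _temporary tree could make one call return
false even though the directory ends up existing. A minimal Scala sketch of a
race-tolerant check (a hypothetical helper, not Spark or Hadoop code):

// Hypothetical helper: treat mkdirs() as a success if the directory exists
// afterwards, since a concurrent creator on another node can make mkdirs()
// return false even when the directory is there a moment later.
def mkdirsTolerant(dir: java.io.File): Boolean =
  dir.mkdirs() || dir.isDirectory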

The ID running Spark does have write permission to this folder. It works most
of the time but fails sometimes.
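
To narrow down which node is failing, one diagnostic worth running (a sketch,
assuming the same T: mapping exists on every node) is to have each executor
attempt a directory creation itself and report its hostname:

// Diagnostic sketch: each partition's executor tries to create a directory
// under the share and reports (hostname, result) back to the driver.
val probe = sc.parallelize(1 to 4, 4).map { i =>
  val dir = new java.io.File(s"T:/lab4-win02/IndexRoot01/probe-$i")
  (java.net.InetAddress.getLocalHost.getHostName, dir.mkdirs())
}.collect()
probe.foreach(println)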

Can anybody tell me what the problem is here?

Please advise. Thanks.


