I am not familiar with that particular piece of code, but Spark's concurrency within an executor comes from multi-threading: one executor uses multiple threads to process tasks, and those tasks share the executor's JVM memory. So it isn't surprising that Spark needs some blocking/synchronization around that memory in places.

Yong
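Yong's point can be sketched with a toy example. The sketch below is plain Python purely for illustration (Spark's executors are JVM threads, and all names here are invented, not Spark's API): several task threads in one process share one in-memory registry, so a check-then-act on it must be synchronized, much like the synchronized section Jacek points at in BlockInfoManager.

```python
import threading

# Toy illustration (not Spark code): several "task" threads in one
# "executor" process share the same in-memory block registry, so
# registering a new block must be synchronized -- otherwise two tasks
# could both see the block as absent and both claim the write.
block_registry = {}           # shared executor-side state
registry_lock = threading.Lock()

def register_block(block_id, task_id):
    # The lock makes the check-then-act atomic.
    with registry_lock:
        if block_id in block_registry:
            return False      # another task already holds the write
        block_registry[block_id] = task_id
        return True

def run_task(task_id, results):
    results[task_id] = register_block("rdd_0_0", task_id)

results = {}
threads = [threading.Thread(target=run_task, args=(i, results))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one task wins the write for the block.
print(sum(1 for won in results.values() if won))   # prints 1
```

Without `registry_lock`, the `in`-check and the assignment could interleave across threads, and more than one task could believe it owns the block.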
> Date: Fri, 27 May 2016 20:21:46 +0200
> Subject: Re: Not able to write output to local filesystem from Standalone mode.
> From: ja...@japila.pl
> To: java8...@hotmail.com
> CC: math...@closetwork.org; stutiawas...@hcl.com; user@spark.apache.org
>
> Hi Yong,
>
> It makes sense...almost. :) I'm not sure how relevant it is, but just
> today I was reviewing the BlockInfoManager code with the locks for reading
> and writing, and my understanding of the code is that Spark is fine
> when there are multiple attempts to write new memory blocks
> (pages), with a mere synchronized code block. See
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala#L324-L325
>
> With that, it's not that simple to say "that just makes sense".
>
> p.s. The more I know, the fewer things "just make sense" to me.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Fri, May 27, 2016 at 3:42 AM, Yong Zhang <java8...@hotmail.com> wrote:
> > That just makes sense, doesn't it?
> >
> > The only place to do it is the driver. Otherwise, the executors would be
> > contending over which of them should create the directory.
> >
> > The coordinator (the driver in this case) is the best place for doing it.
> >
> > Yong
> >
> > ________________________________
> > From: math...@closetwork.org
> > Date: Wed, 25 May 2016 18:23:02 +0000
> > Subject: Re: Not able to write output to local filesystem from Standalone mode.
> > To: ja...@japila.pl
> > CC: stutiawas...@hcl.com; user@spark.apache.org
> >
> > Experience. I don't use Mesos or Yarn or Hadoop, so I don't know.
> >
> > On Wed, May 25, 2016 at 2:51 AM Jacek Laskowski <ja...@japila.pl> wrote:
> >
> > Hi Mathieu,
> >
> > Thanks a lot for the answer! I did *not* know it's the driver that
> > creates the directory.
> >
> > You said "standalone mode" - is this the case for the other modes,
> > yarn and mesos?
> >
> > p.s. Did you find it in the code or... just experienced it before? #curious
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > ----
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
> > Follow me at https://twitter.com/jaceklaskowski
> >
> > On Tue, May 24, 2016 at 4:04 PM, Mathieu Longtin <math...@closetwork.org> wrote:
> >> In standalone mode, executors assume they have access to a shared file
> >> system. The driver creates the directory and the executors write the files,
> >> so the executors end up not writing anything, since there is no local
> >> directory.
> >>
> >> On Tue, May 24, 2016 at 8:01 AM Stuti Awasthi <stutiawas...@hcl.com> wrote:
> >>>
> >>> Hi Jacek,
> >>>
> >>> The parent directory is already present; it's my home directory. I'm using
> >>> a Linux (Red Hat) machine, 64-bit.
> >>> I also noticed that a "test1" folder is created on my master with a
> >>> subdirectory "_temporary", which is empty, but on the slaves no such
> >>> directory is created under /home/stuti.
> >>>
> >>> Thanks
> >>> Stuti
> >>> ________________________________
> >>> From: Jacek Laskowski [ja...@japila.pl]
> >>> Sent: Tuesday, May 24, 2016 5:27 PM
> >>> To: Stuti Awasthi
> >>> Cc: user
> >>> Subject: Re: Not able to write output to local filesystem from Standalone mode.
> >>>
> >>> Hi,
> >>>
> >>> What happens when you create the parent directory /home/stuti? I think the
> >>> failure is due to missing parent directories. What's the OS?
> >>>
> >>> Jacek
> >>>
> >>> On 24 May 2016 11:27 a.m., "Stuti Awasthi" <stutiawas...@hcl.com> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I have a 3-node Spark 1.6 standalone-mode cluster with 1 master and 2
> >>> slaves, and I'm not using Hadoop as the filesystem. I'm able to launch the
> >>> shell, read the input file from the local filesystem, and perform
> >>> transformations successfully.
> >>> When I try to write my output to a local filesystem path, I
> >>> receive the error below.
> >>>
> >>> I searched the web and found a similar JIRA:
> >>> https://issues.apache.org/jira/browse/SPARK-2984 . Even though it shows
> >>> as resolved for Spark 1.3+, people have posted that the same issue still
> >>> persists in the latest versions.
> >>>
> >>> ERROR
> >>>
> >>> scala> data.saveAsTextFile("/home/stuti/test1")
> >>>
> >>> 16/05/24 05:03:42 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2,
> >>> server1): java.io.IOException: The temporary job-output directory
> >>> file:/home/stuti/test1/_temporary doesn't exist!
> >>>     at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
> >>>     at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
> >>>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
> >>>     at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
> >>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
> >>>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
> >>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> >>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
> >>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> >>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>     at java.lang.Thread.run(Thread.java:745)
> >>>
> >>> What is the best way to resolve this issue if I don't want to
> >>> have Hadoop installed, or is it mandatory to have Hadoop in order to write
> >>> output from a standalone-mode cluster?
> >>>
> >>> Please suggest.
> >>>
> >>> Thanks & Regards
> >>>
> >>> Stuti Awasthi
> >>
> >> --
> >> Mathieu Longtin
> >> 1-514-803-8977
> >
> > --
> > Mathieu Longtin
> > 1-514-803-8977
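The failure mode discussed in the thread (the driver creating `_temporary` on its own disk while executors look for it on theirs) can be simulated with a toy sketch. This is plain Python, not Spark code: the two temp directories stand in for the driver's and a worker's separate local filesystems, and all function names are invented for illustration.

```python
import os
import tempfile

# Toy illustration (not Spark code) of the failure Mathieu describes:
# the driver creates the _temporary job directory on *its* local
# filesystem, but each executor checks for it on its *own* local
# filesystem -- a different machine when there is no shared
# (NFS/HDFS/S3) filesystem underneath.
driver_root = tempfile.mkdtemp(prefix="driver_")      # driver's local disk
executor_root = tempfile.mkdtemp(prefix="executor_")  # a worker's local disk

def driver_setup_job(output_path):
    # Driver side: the output committer creates <output>/_temporary once.
    os.makedirs(os.path.join(driver_root, output_path, "_temporary"))

def executor_write_task(output_path, part):
    # Executor side: each task expects _temporary to already exist.
    tmp = os.path.join(executor_root, output_path, "_temporary")
    if not os.path.isdir(tmp):
        raise IOError("The temporary job-output directory %s doesn't exist!" % tmp)
    with open(os.path.join(tmp, "part-%05d" % part), "w") as f:
        f.write("data")

driver_setup_job("test1")
try:
    executor_write_task("test1", 0)
    failed = False
except IOError:
    failed = True   # same symptom as the stack trace in the thread
```

Under this reading, the usual fixes are to give every node the same view of the output path (an NFS mount, HDFS, S3, etc.) or, for small results, to collect them to the driver and write there.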