Hi,

I am running my app on a single machine first, before moving it to the cluster. At the moment, simultaneous actions are not working for me. Is this because I am using a single machine? I am using the FAIR scheduler, though.
2016-01-17 21:23 GMT+01:00 Mark Hamstra <m...@clearstorydata.com>:

> It can be far more than that (e.g. https://issues.apache.org/jira/browse/SPARK-11838), and is generally either unrecognized or a greatly under-appreciated and underused feature of Spark.
>
> On Sun, Jan 17, 2016 at 12:20 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> The re-use of shuffle files is always a nice surprise to me.
>>
>> On Sun, Jan 17, 2016 at 3:17 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>>> Same SparkContext means same pool of Workers. It's up to the Scheduler, not the SparkContext, whether the exact same Workers or Executors will be used to calculate simultaneous actions against the same RDD. It is likely that many of the same Workers and Executors will be used as the Scheduler tries to preserve data locality, but that is not guaranteed. In fact, what is most likely to happen is that the shared Stages and Tasks being calculated for the simultaneous actions will not actually be run at exactly the same time, which means that shuffle files produced for one action will be reused by the other(s), and repeated calculations will be avoided even without explicitly caching/persisting the RDD.
>>>
>>> On Sun, Jan 17, 2016 at 8:06 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> Same RDD means same SparkContext means same workers.
>>>>
>>>> Cache/persist the RDD to avoid repeated jobs.
>>>>
>>>> On Jan 17, 2016 5:21 AM, "Mennour Rostom" <mennou...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thank you all for your answers.
>>>>>
>>>>> If I understand correctly, actions (in my case foreach) can be run concurrently and simultaneously on the SAME RDD (which is logical, because RDDs are read-only objects). However, I want to know whether the same workers are used for the concurrent analysis.
>>>>>
>>>>> Thank you
>>>>>
>>>>> 2016-01-15 21:11 GMT+01:00 Jakob Odersky <joder...@gmail.com>:
>>>>>
>>>>>> I stand corrected.
>>>>>> How considerable are the benefits, though? Will the scheduler be able to dispatch jobs from both actions simultaneously (or on a when-workers-become-available basis)?
>>>>>>
>>>>>> On 15 January 2016 at 11:44, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>
>>>>>>> We run multiple actions on the same (cached) RDD all the time, I guess in different threads indeed (it's in Akka).
>>>>>>>
>>>>>>> On Fri, Jan 15, 2016 at 2:40 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> RDDs actually are thread-safe, and quite a few applications use them this way, e.g. the JDBC server.
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> On Jan 15, 2016, at 2:10 PM, Jakob Odersky <joder...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I don't think RDDs are thread-safe. More fundamentally, however, why would you want to run RDD actions in parallel? The idea behind RDDs is to provide you with an abstraction for computing parallel operations on distributed data. Even if you were to call actions from several threads at once, the individual executors of your Spark environment would still have to perform operations sequentially.
>>>>>>>>
>>>>>>>> As an alternative, I would suggest restructuring your RDD transformations to compute the required results in one single operation.
>>>>>>>>
>>>>>>>> On 15 January 2016 at 06:18, Jonathan Coveney <jcove...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Threads
>>>>>>>>>
>>>>>>>>> On Friday, January 15, 2016, Kira <mennou...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Can we run *simultaneous* actions on the *same RDD*? If yes, how can this be done?
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/simultaneous-actions-tp25977.html
>>>>>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.