Re: simultaneous actions

2016-01-18 Thread Debasish Das
Simultaneous actions work fine on a cluster if they are independent... In local mode I never paid attention, but the code path should be similar... On Jan 18, 2016 8:00 AM, "Koert Kuipers" wrote: > stacktrace? details? > > On Mon, Jan 18, 2016 at 5:58 AM, Mennour Rostom

Re: simultaneous actions

2016-01-18 Thread Mennour Rostom
Hi, I am running my app on a single machine first before moving it to the cluster; at the moment simultaneous actions are not working for me. Is this because I am using a single machine? I am using the FAIR scheduler, though. 2016-01-17 21:23 GMT+01:00 Mark Hamstra

Re: simultaneous actions

2016-01-18 Thread Koert Kuipers
stacktrace? details? On Mon, Jan 18, 2016 at 5:58 AM, Mennour Rostom wrote: > Hi, > > I am running my app on a single machine first before moving it to the > cluster; actually simultaneous actions are not working for me now; is this > coming from the fact that I am using a

Re: simultaneous actions

2016-01-17 Thread Koert Kuipers
the re-use of shuffle files is always a nice surprise to me On Sun, Jan 17, 2016 at 3:17 PM, Mark Hamstra wrote: > Same SparkContext means same pool of Workers. It's up to the Scheduler, > not the SparkContext, whether the exact same Workers or Executors will be > used

Re: simultaneous actions

2016-01-17 Thread Matei Zaharia
They'll be able to run concurrently and share workers / data. Take a look at http://spark.apache.org/docs/latest/job-scheduling.html for how scheduling happens across multiple running jobs in the same SparkContext. Matei > On Jan 17,

Re: simultaneous actions

2016-01-17 Thread Mark Hamstra
Same SparkContext means same pool of Workers. It's up to the Scheduler, not the SparkContext, whether the exact same Workers or Executors will be used to calculate simultaneous actions against the same RDD. It is likely that many of the same Workers and Executors will be used as the Scheduler

Re: simultaneous actions

2016-01-17 Thread Mark Hamstra
It can be far more than that (e.g. https://issues.apache.org/jira/browse/SPARK-11838), and is generally either unrecognized or a greatly under-appreciated and underused feature of Spark. On Sun, Jan 17, 2016 at 12:20 PM, Koert Kuipers wrote: > the re-use of shuffle files is

Re: simultaneous actions

2016-01-17 Thread Mennour Rostom
Hi, Thank you all for your answers. If I understand correctly, actions (in my case foreach) can be run concurrently and simultaneously on the SAME RDD (which is logical, because RDDs are read-only objects). However, I want to know whether the same workers are used for the concurrent analysis? Thank

Re: simultaneous actions

2016-01-17 Thread Koert Kuipers
Same RDD means same SparkContext means same workers. Cache/persist the RDD to avoid repeated jobs. On Jan 17, 2016 5:21 AM, "Mennour Rostom" wrote: > Hi, > > Thank you all for your answers, > > If I understand correctly, actions (in my case foreach) can be run > concurrently

Re: simultaneous actions

2016-01-15 Thread Jonathan Coveney
Threads On Friday, January 15, 2016, Kira wrote: > Hi, > > Can we run *simultaneous* actions on the *same RDD*? If yes, how can this > be done? > > Thank you, > Regards

Re: simultaneous actions

2016-01-15 Thread Sean Owen
Can you run N jobs depending on the same RDD in parallel on the driver? Certainly. The context / scheduling is thread-safe and the RDD is immutable. I've done this to, for example, build and evaluate a bunch of models simultaneously on a big cluster. On Fri, Jan 15, 2016 at 7:10 PM, Jakob Odersky

Re: simultaneous actions

2016-01-15 Thread Jakob Odersky
I don't think RDDs are threadsafe. More fundamentally however, why would you want to run RDD actions in parallel? The idea behind RDDs is to provide you with an abstraction for computing parallel operations on distributed data. Even if you were to call actions from several threads at once, the

Re: simultaneous actions

2016-01-15 Thread Jonathan Coveney
SparkContext is thread-safe. And RDDs just describe operations. While I generally agree that you want to model as much as possible as transformations, this is not always possible. And in that case, you have no option other than to use threads. Spark's designers should have made all actions

Re: simultaneous actions

2016-01-15 Thread Matei Zaharia
RDDs actually are thread-safe, and quite a few applications use them this way, e.g. the JDBC server. Matei > On Jan 15, 2016, at 2:10 PM, Jakob Odersky wrote: > > I don't think RDDs are threadsafe. > More fundamentally however, why would you want to run RDD actions in >

Re: simultaneous actions

2016-01-15 Thread Koert Kuipers
we run multiple actions on the same (cached) rdd all the time, i guess in different threads indeed (it's in akka) On Fri, Jan 15, 2016 at 2:40 PM, Matei Zaharia wrote: > RDDs actually are thread-safe, and quite a few applications use them this > way, e.g. the JDBC

Re: simultaneous actions

2016-01-15 Thread Jakob Odersky
I stand corrected. How considerable are the benefits though? Will the scheduler be able to dispatch jobs from both actions simultaneously (or on a when-workers-become-available basis)? On 15 January 2016 at 11:44, Koert Kuipers wrote: > we run multiple actions on the same

Re: simultaneous actions

2016-01-15 Thread Sean Owen
It makes sense if you're parallelizing jobs that have relatively few tasks, and have a lot of execution slots available. It makes sense to turn them loose all at once and try to use the parallelism available. There are downsides, eventually: for example, N jobs accessing one cached RDD may