Hi,

I am running my app on a single machine first, before moving it to the
cluster. Simultaneous actions are not working for me at the moment; is this
because I am using a single machine? I am using the FAIR scheduler,
though.
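
A minimal sketch of the thread-based approach suggested later in this
thread: each action is submitted from its own thread. It is illustrative
only. `run_actions_concurrently` is a hypothetical helper name, and a plain
Python list stands in for the RDD so the snippet runs without Spark; with
PySpark the callables would presumably be something like
`lambda: rdd.count()` or `lambda: rdd.foreach(f)`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_actions_concurrently(actions, max_workers=None):
    """Run each zero-argument callable in its own thread.

    Results are returned in the same order as `actions`.
    (Hypothetical helper, not part of any Spark API.)
    """
    if not actions:
        return []
    with ThreadPoolExecutor(max_workers=max_workers or len(actions)) as pool:
        futures = [pool.submit(action) for action in actions]
        # result() blocks until the action finishes and re-raises any
        # exception that was thrown inside it.
        return [future.result() for future in futures]


# Stand-in for a read-only RDD (assumption: used only so this runs locally).
data = list(range(10))

results = run_actions_concurrently([
    lambda: sum(data),   # first "action"
    lambda: len(data),   # second "action"
    lambda: max(data),   # third "action"
])
print(results)  # [45, 10, 9]
```

Submitting jobs from several threads only lets them overlap if the
application has more than one core available, so on a single machine it may
be worth checking that the master is something like `local[4]` rather than
plain `local` before suspecting the scheduler.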

2016-01-17 21:23 GMT+01:00 Mark Hamstra <m...@clearstorydata.com>:

> It can be far more than that (e.g.
> https://issues.apache.org/jira/browse/SPARK-11838), and is generally
> either unrecognized or a greatly under-appreciated and underused feature of
> Spark.
>
> On Sun, Jan 17, 2016 at 12:20 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> the re-use of shuffle files is always a nice surprise to me
>>
>> On Sun, Jan 17, 2016 at 3:17 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>>> Same SparkContext means same pool of Workers.  It's up to the Scheduler,
>>> not the SparkContext, whether the exact same Workers or Executors will be
>>> used to calculate simultaneous actions against the same RDD.  It is likely
>>> that many of the same Workers and Executors will be used as the Scheduler
>>> tries to preserve data locality, but that is not guaranteed.  In fact, what
>>> is most likely to happen is that the shared Stages and Tasks being
>>> calculated for the simultaneous actions will not actually be run at exactly
>>> the same time, which means that shuffle files produced for one action will
>>> be reused by the other(s), and repeated calculations will be avoided even
>>> without explicitly caching/persisting the RDD.
>>>
>>> On Sun, Jan 17, 2016 at 8:06 AM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>>
>>>> Same rdd means same sparkcontext means same workers
>>>>
>>>> Cache/persist the rdd to avoid repeated jobs
>>>> On Jan 17, 2016 5:21 AM, "Mennour Rostom" <mennou...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thank you all for your answers,
>>>>>
>>>>> If I understand correctly, actions (in my case foreach) can be run
>>>>> concurrently and simultaneously on the SAME RDD (which is logical,
>>>>> because RDDs are read-only objects). However, I want to know whether
>>>>> the same workers are used for the concurrent analysis?
>>>>>
>>>>> Thank you
>>>>>
>>>>> 2016-01-15 21:11 GMT+01:00 Jakob Odersky <joder...@gmail.com>:
>>>>>
>>>>>> I stand corrected. How considerable are the benefits though? Will the
>>>>>> scheduler be able to dispatch jobs from both actions simultaneously
>>>>>> (or on a when-workers-become-available basis)?
>>>>>>
>>>>>> On 15 January 2016 at 11:44, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>
>>>>>>> we run multiple actions on the same (cached) rdd all the time, i
>>>>>>> guess in different threads indeed (it's in akka)
>>>>>>>
>>>>>>> On Fri, Jan 15, 2016 at 2:40 PM, Matei Zaharia <
>>>>>>> matei.zaha...@gmail.com> wrote:
>>>>>>>
>>>>>>>> RDDs actually are thread-safe, and quite a few applications use
>>>>>>>> them this way, e.g. the JDBC server.
>>>>>>>>
>>>>>>>> Matei
>>>>>>>>
>>>>>>>> On Jan 15, 2016, at 2:10 PM, Jakob Odersky <joder...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I don't think RDDs are threadsafe.
>>>>>>>> More fundamentally, however, why would you want to run RDD actions
>>>>>>>> in parallel? The idea behind RDDs is to provide you with an
>>>>>>>> abstraction for computing parallel operations on distributed data.
>>>>>>>> Even if you were to call actions from several threads at once, the
>>>>>>>> individual executors of your Spark environment would still have to
>>>>>>>> perform operations sequentially.
>>>>>>>>
>>>>>>>> As an alternative, I would suggest restructuring your RDD
>>>>>>>> transformations to compute the required results in a single
>>>>>>>> operation.
>>>>>>>>
>>>>>>>> On 15 January 2016 at 06:18, Jonathan Coveney <jcove...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Threads
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Friday, January 15, 2016, Kira <mennou...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Can we run *simultaneous* actions on the *same RDD*? If yes, how
>>>>>>>>>> can this be done?
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Regards
>>>>>>>>>>
