I don't have much more info than what Shivaram said.  My sense is that,
over time, task launch overhead with Spark has slowly grown as Spark
supports more and more functionality.  However, I haven't seen it get as
high as the 100ms Michael quoted (maybe that figure was for jobs whose
tasks have much larger objects that take a long time to deserialize?).
Fortunately, the UI now quantifies this: if you click "Show Additional
Metrics", the scheduler delay (which basically captures the overhead of
shipping the task to the worker and getting the result back), the task
deserialization time, and the result serialization time are all shown,
and each of them is a part of the task launch overhead.  So you can use
the UI to get a sense of how large this overhead is for the workload
you're considering and whether it's worth optimizing.
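
If you want to pull the same numbers outside the UI, something along
these lines should also work (just a sketch from memory, so double-check
the metric names against the Spark version you're on; the scheduler
delay here is only an approximation of what the UI computes):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class LaunchOverheadListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    val metrics = taskEnd.taskMetrics
    if (info != null && metrics != null) {
      val deser = metrics.executorDeserializeTime
      val resultSer = metrics.resultSerializationTime
      // Everything in the task's wall-clock time that isn't execution,
      // deserialization, or result serialization: roughly what the UI
      // reports as "scheduler delay".
      val delay = math.max(
        0L, info.duration - metrics.executorRunTime - deser - resultSer)
      println(s"task ${info.taskId}: schedulerDelay=${delay}ms " +
        s"deserialize=${deser}ms resultSer=${resultSer}ms")
    }
  }
}

// sc.addSparkListener(new LaunchOverheadListener())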

-Kay

On Fri, Nov 7, 2014 at 9:43 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> I think Kay might be able to give a better answer. The most recent
> benchmark I remember had the number at somewhere between 8.6ms and
> 14.6ms depending on the Spark version (
> https://github.com/apache/spark/pull/2030#issuecomment-52715181). Another
> point to note is that this is the total time to run a null job, so this
> includes scheduling + task launch + time to send back results etc.
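>
> (For reference, a rough way to reproduce that kind of null-job
> measurement yourself, just as a sketch assuming an existing
> SparkContext sc, is to time single-task jobs that do no real work and
> take the median so the first-job warm-up doesn't skew things:)
>
>   val trials = 50
>   val times = (1 to trials).map { _ =>
>     val start = System.nanoTime()
>     sc.parallelize(Seq(1), numSlices = 1).count()  // one task, no real work
>     (System.nanoTime() - start) / 1e6              // elapsed milliseconds
>   }
>   println(f"median null-job time: ${times.sorted.apply(trials / 2)}%.1f ms")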
>
> Shivaram
>
> On Fri, Nov 7, 2014 at 9:23 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Hmm, relevant quote from section 3.3:
>>
>>> newer frameworks like Spark [35] reduce the overhead to 5ms. To support
>>> tasks that complete in hundreds of milliseconds, we argue for reducing
>>> task launch overhead even further to 1ms so that launch overhead
>>> constitutes at most 1% of task runtime. By maintaining an active thread
>>> pool for task execution on each worker node and caching binaries, task
>>> launch overhead can be reduced to the time to make a remote procedure call
>>> to the slave machine to launch the task. Today’s datacenter networks easily
>>> allow a RPC to complete within 1ms. In fact, recent work showed that 10μs
>>> RPCs are possible in the short term [26]; thus, with careful engineering,
>>> we believe task launch overheads of 50μs are attainable. 50μs task launch
>>> overheads would enable even smaller tasks that could read data from
>>> in-memory or from flash storage in order to complete in milliseconds.
>>
>>
>> So it looks like I misunderstood the current cost of task initialization.
>> It's already as low as 5ms (and not 100ms)?
>>
>> Nick
>>
>> On Fri, Nov 7, 2014 at 11:15 PM, Shivaram Venkataraman <
>> shiva...@eecs.berkeley.edu> wrote:
>>
>>>
>>>
>>> On Fri, Nov 7, 2014 at 8:04 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>>> Sounds good. I'm looking forward to tracking improvements in this area.
>>>>
>>>> Also, just to connect some more dots here, I just remembered that
>>>> there is currently an initiative to add an IndexedRDD
>>>> <https://issues.apache.org/jira/browse/SPARK-2365> interface. Some
>>>> interesting use cases mentioned there include (emphasis added):
>>>>
>>>> > To address these problems, we propose IndexedRDD, an efficient
>>>> > key-value store built on RDDs. IndexedRDD would extend RDD[(Long, V)]
>>>> > by enforcing key uniqueness and pre-indexing the entries for efficient
>>>> > joins and *point lookups, updates, and deletions*.
>>>>
>>>>
>>>> > GraphX would be the first user of IndexedRDD, since it currently
>>>> > implements a limited form of this functionality in VertexRDD. We
>>>> > envision a variety of other uses for IndexedRDD, including *streaming
>>>> > updates* to RDDs, *direct serving* from RDDs, and as an execution
>>>> > strategy for Spark SQL.
>>>>
>>>>
>>>> Maybe some day we'll have Spark clusters directly serving up point
>>>> lookups or updates. I imagine the tasks running on clusters like that
>>>> would be tiny and would benefit from very low task startup times and
>>>> scheduling latency.  Am I painting that picture correctly?
>>>>
>>> Yeah - we painted a similar picture in a short paper last year titled
>>> "The Case for Tiny Tasks in Compute Clusters"
>>> http://shivaram.org/publications/tinytasks-hotos13.pdf
>>>
>>>> Anyway, thanks for explaining the current status of Sparrow.
>>>>
>>>> Nick
>>>>
>>>
>>>
>>
>
