On Sun, Nov 9, 2014 at 1:51 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:
This causes a scalability vs. latency tradeoff - if your limit is 1000
tasks per second (simplifying from 1500), you could either configure
it to use 100 receivers at 100 ms batches (10 blocks/sec), or 1000
receivers at 1 second batches (1 block/sec).
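The tradeoff being described is simple arithmetic: each receiver produces one block (and hence one task) per block interval, so total scheduler load is receivers times blocks per second. A minimal sketch in plain Python (the helper name `tasks_per_sec` is mine, not a Spark API):

```python
# Back-of-envelope check of the receiver/batch tradeoff described above.
# Each receiver cuts one block per block interval, and each block becomes
# one task, so the scheduler sees:
#   tasks/sec = receivers * (1000 / block_interval_ms)
def tasks_per_sec(receivers, block_interval_ms):
    return receivers * (1000 / block_interval_ms)

# 100 receivers cutting blocks every 100 ms -> 1000 tasks/sec
print(tasks_per_sec(100, 100))    # 1000.0
# 1000 receivers cutting blocks every 1000 ms -> also 1000 tasks/sec
print(tasks_per_sec(1000, 1000))  # 1000.0
```

Both configurations saturate the same 1000 tasks/sec budget; the first buys lower latency (smaller blocks), the second buys more ingestion parallelism.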
Too bad Nick, I don't have anything immediately ready that tests Spark
Streaming with those extreme settings. :)
On Mon, Nov 10, 2014 at 9:56 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
On Sun, Nov 9, 2014 at 1:51 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:

> This causes a scalability vs. latency tradeoff [...]

> However, I haven't seen it be as high as the 100ms Michael quoted (maybe
> this was for jobs with tasks that have much larger objects that take a
> long time to deserialize?).

I was thinking more about the average end-to-end latency for launching a
query that has 100s of partitions. It's also [...]
I just watched Kay's talk from 2013 on Sparrow
https://www.youtube.com/watch?v=ayjH_bG-RC0. Is replacing Spark's native
scheduler with Sparrow still on the books?
The Sparrow repo https://github.com/radlab/sparrow hasn't been updated
recently, and I don't see any JIRA issues about it.
It would [...] larger clusters, such that Sparrow will be necessary!
-Kay
On Fri, Nov 7, 2014 at 3:05 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
> I just watched Kay's talk from 2013 on Sparrow
> https://www.youtube.com/watch?v=ayjH_bG-RC0. Is replacing Spark's native
> scheduler with Sparrow still on the books?

If, for example, you have a cluster of 100 machines, this means the
scheduler can launch 150 tasks per machine per second.

I don't know of any existing Spark clusters that have a large enough number
of machines [...]
On Fri, Nov 7, 2014 at 6:20 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
> If, for example, you have a cluster of 100 machines, this means the
> scheduler can launch 150 tasks per machine per second.
Did you mean 15 tasks per machine per second here? Or alternatively, 10
machines?
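For reference, the arithmetic behind the question, using the numbers quoted in the thread (a ~1500 tasks/sec cluster-wide scheduling limit):

```python
# A cluster-wide scheduling limit of ~1500 tasks/sec, as quoted earlier
# in the thread, spread over a 100-machine cluster:
scheduler_limit = 1500           # tasks/sec, whole cluster
machines = 100
print(scheduler_limit / machines)  # 15.0 -> hence "did you mean 15?"

# For 150 tasks per machine per second to hold, the cluster could have
# only this many machines:
print(scheduler_limit / 150)       # 10.0 -> "or alternatively, 10 machines?"
```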
On Fri, Nov 7, 2014 at 8:04 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Sounds good. I'm looking forward to tracking improvements in this area.

Also, just to connect some more dots here, I just remembered that there is
currently an initiative to add an IndexedRDD
https://issues.apache.org/jira/browse/SPARK-2365 interface. Some
interesting use cases are mentioned there [...]
Hmm, relevant quote from section 3.3:
> newer frameworks like Spark [35] reduce the overhead to 5ms. To support
> tasks that complete in hundreds of milliseconds, we argue for reducing
> task launch overhead even further to 1ms so that launch overhead
> constitutes at most 1% of task runtime. [...]
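The paper's 1% target falls out directly from the numbers in the quote; a quick check in plain Python:

```python
# Launch overhead as a fraction of task runtime, using the figures from
# the Sparrow quote above.
task_ms = 100.0      # a task that completes in "hundreds of milliseconds"

overhead_ms = 1.0    # the 1 ms overhead the paper argues for
print(overhead_ms / task_ms)  # 0.01 -> 1% of runtime, the paper's target

# Spark's ~5 ms overhead (per the quote) against the same task:
print(5.0 / task_ms)          # 0.05 -> 5% of runtime
```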
I think Kay might be able to give a better answer. The most recent
benchmark I remember had the number at somewhere between 8.6ms and
14.6ms depending on the Spark version (
https://github.com/apache/spark/pull/2030#issuecomment-52715181). Another
point to note is that this is the total time to [...]
I don't have much more info than what Shivaram said. My sense is that,
over time, task launch overhead with Spark has slowly grown as Spark
supports more and more functionality. However, I haven't seen it be as
high as the 100ms Michael quoted (maybe this was for jobs with tasks that
have much larger objects that take a long time to deserialize?).