Re: [Spark Streaming] is spark.streaming.concurrentJobs a per node or a cluster global value ?

thomas lavocat Tue, 05 Jun 2018 04:48:27 -0700


On 05/06/2018 13:44, Saisai Shao wrote:

You need to read the code; this is an undocumented configuration.

I'm on it right now, but Spark is a big piece of software.

Basically, this will break the ordering of streaming jobs. AFAIK you may get unexpected results if your streaming jobs are not independent.
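
The ordering point can be illustrated outside Spark. The sketch below is not Spark code. It assumes, as the Spark Streaming JobScheduler source appears to do, that spark.streaming.concurrentJobs sizes a driver-side thread pool that runs the jobs of successive batches. The names run_batches and durations, and the sleep-based job, are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_batches(concurrent_jobs, durations):
    """Return batch ids in the order their jobs finished."""
    finished = []
    lock = threading.Lock()

    def job(batch_id, seconds):
        time.sleep(seconds)  # stand-in for the batch's processing time
        with lock:
            finished.append(batch_id)

    # Assumption: a fixed-size pool plays the role of the driver-side
    # job executor whose size is given by the concurrentJobs setting.
    with ThreadPoolExecutor(max_workers=concurrent_jobs) as pool:
        for batch_id, seconds in enumerate(durations):
            pool.submit(job, batch_id, seconds)
    # Leaving the "with" block waits for every submitted job.
    return finished


if __name__ == "__main__":
    durations = [0.3, 0.05, 0.05]  # batch 0 is slow, batches 1 and 2 are fast
    print("pool size 1:", run_batches(1, durations))
    print("pool size 2:", run_batches(2, durations))
```

With a pool of 1 the batches finish in submission order. With a pool of 2 the slow first batch is typically overtaken by the later ones, which is harmless only if no batch depends on the outcome of an earlier one (for example, through state carried from batch to batch).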

What exactly do you mean by "not independent"?
Are several sources joined together considered dependent?


Thanks,
Thomas

thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tue, Jun 5, 2018 at 7:17 PM:


    Hello,

    Thanks for your answer.


    On 05/06/2018 11:24, Saisai Shao wrote:

    spark.streaming.concurrentJobs is a driver-side internal
    configuration; it controls how many streaming jobs can be
    submitted concurrently in one batch. Usually this should not be
    configured by the user unless you're familiar with Spark Streaming
    internals and know the implications of this configuration.


    Where can I find documentation about those implications?

    I've experimented with several values of this parameter and found
    that my overall throughput increases in correlation with this
    property.
    But I'm experiencing scalability issues: with more than 16
    receivers spread over 8 executors, my executors no longer receive
    work from the driver and fall idle.
    Is there an explanation?

    Thanks,
    Thomas
