Thank you very much for your answer.
Since I don't have dependent jobs, I will continue to use this functionality.
On 05/06/2018 13:52, Saisai Shao wrote:
"dependent" I mean this batch's job relies on the previous batch's
result. So this batch should wait for the finish of previous batch, if
you set "spark.streaming.concurrentJobs" larger than 1, then the
current batch could start without waiting for the previous batch (if
it is delayed), which will lead to unexpected results.
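For illustration, here is a minimal sketch of such a dependent job (app
name, checkpoint path, host and port are placeholders); updateStateByKey
makes batch N rely on the state produced by batch N-1, which is exactly
what breaks when jobs run concurrently:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("ConcurrentJobsSketch")
      // Risky: lets jobs from two batches run at the same time.
      .set("spark.streaming.concurrentJobs", "2")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/spark-checkpoint") // required for stateful ops

    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))

    // Each batch adds to the running count kept from the previous batch,
    // so correctness depends on batches executing in order.
    val running = counts.updateStateByKey[Int] { (values, state) =>
      Some(values.sum + state.getOrElse(0))
    }
    running.print()

    ssc.start()
    ssc.awaitTermination()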
thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tue,
Jun 5, 2018 at 7:48 PM:
On 05/06/2018 13:44, Saisai Shao wrote:
You need to read the code; this is an undocumented configuration.
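If it helps, the place to look is
org.apache.spark.streaming.scheduler.JobScheduler, which, roughly
(paraphrased from the Spark 2.x source, check your version), reads the
property to size the thread pool that runs each batch's jobs:

    // JobScheduler (paraphrased): N threads means up to N jobs running
    // at once, with no ordering guarantee between them.
    private val numConcurrentJobs =
      ssc.conf.getInt("spark.streaming.concurrentJobs", 1)
    private val jobExecutor = ThreadUtils.newDaemonFixedThreadPool(
      numConcurrentJobs, "streaming-job-executor")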
I'm on it right now, but Spark is a big piece of software.
Basically this will break the ordering of Streaming jobs; AFAIK you may
get unexpected results if your streaming jobs are not independent.
What do you mean exactly by not independent?
Are several sources joined together dependent?
Thanks,
Thomas
thomas lavocat <thomas.lavo...@univ-grenoble-alpes.fr> wrote on Tue,
Jun 5, 2018 at 7:17 PM:
Hello,
Thanks for your answer.
On 05/06/2018 11:24, Saisai Shao wrote:
spark.streaming.concurrentJobs is a driver-side internal configuration;
it controls how many streaming jobs can be submitted concurrently in
one batch. Usually it should not be configured by the user, unless
you're familiar with Spark Streaming internals and know the
implications of this configuration.
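It can be set like any other Spark property, for example on the command
line (the class and jar names here are placeholders):

    spark-submit --conf spark.streaming.concurrentJobs=2 \
      --class com.example.MyStreamingApp my-streaming-app.jar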
How can I find some documentation about those implications?
I've experimented with several settings of this parameter and found
that my overall throughput increases in correlation with it.
But I'm experiencing scalability issues. With more than 16
receivers spread over 8 executors, my executors no longer
receive work from the driver and fall idle.
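For reference, here is a minimal sketch of the kind of setup I mean
(app name, hostnames and port are placeholders for my real sources):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("MultiReceiverSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Each socketTextStream call starts one long-running receiver, and
    // every receiver permanently occupies one executor core, so 16
    // receivers on 8 executors leave that many fewer cores for the
    // batch-processing tasks themselves.
    val streams = (1 to 16).map(i => ssc.socketTextStream(s"source-$i", 9999))
    val unified = ssc.union(streams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()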
Is there an explanation?
Thanks,
Thomas