I have a Spark Streaming application whose structure is shown below:


    var ts: Long = 0L

    dstream1.foreachRDD { (x, time) =>
        ts = time.milliseconds   // "time" is a Time, so take .milliseconds for a Long
        x.do_something()...
    }

    ......

    process_data(dstream2, ts, ......)


I assume the foreachRDD call updates the "ts" variable, which is then
used in the Spark tasks generated by the "process_data" function.
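For what it's worth, here is a minimal plain-Scala sketch (no Spark, and the names `ClosureCapture` / `readTs` are just illustrative) of why this appears to work: the function passed to foreachRDD runs in the driver JVM, and a Scala closure reads a captured var by reference, so later driver-side reads see the update. The subtlety is that closures shipped to executors are serialized instead, which snapshots the value at that point.

```scala
// Plain-Scala model of a var shared between driver-side closures.
object ClosureCapture {
  var ts: Long = 0L                      // driver-side state, as in the question

  // Stands in for driver code that later reads ts (e.g. when process_data's
  // jobs are generated for a batch).
  val readTs: () => Long = () => ts

  def main(args: Array[String]): Unit = {
    ts = 42L                             // what foreachRDD's body would do
    println(readTs())                    // prints 42: the update is visible
  }
}
```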


From my test results on a standalone Spark cluster, it is working. But
should I be concerned if I switch to YARN?


And I have seen some articles recommending avoiding mutable state in Scala
programming. Without the state variable, how could that be done?
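One stateless alternative I can sketch (assumptions: `process_rdd` is a hypothetical helper standing in for the body of process_data): DStream.transform has an overload that receives the batch Time, so the timestamp can travel with the data as an argument instead of living in a shared var. Below, `processBatch` is a pure stand-in showing the same idea with plain collections.

```scala
// In Spark this would look roughly like:
//
//   val result = dstream2.transform { (rdd, time) =>
//     process_rdd(rdd, time.milliseconds)   // hypothetical helper
//   }
//
// Pure model of the same pattern: the batch time is an explicit parameter,
// not shared mutable state.
object StatelessTime {
  def processBatch(records: Seq[String], batchTimeMs: Long): Seq[(Long, String)] =
    records.map(r => (batchTimeMs, r))

  def main(args: Array[String]): Unit = {
    println(processBatch(Seq("a", "b"), 1000L))  // List((1000,a), (1000,b))
  }
}
```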


Any comments or suggestions are appreciated.


Thanks,

Haopu
