Can someone look at my questions? Thanks again!

 

________________________________

From: Haopu Wang 
Sent: June 12, 2016 16:40
To: u...@spark.apache.org
Subject: Should I avoid "state" in a Spark application?

 

I have a Spark application whose structure is below:

 

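    // Mutable state that lives on the driver
    var ts: Long = 0L

    dstream1.foreachRDD {
        (x, time) => {
            // Runs on the driver once per batch; Time exposes its value via .milliseconds
            ts = time.milliseconds
            x.do_something()...
        }
    }

    ......

    // ts is read here when process_data builds its Spark tasks
    process_data(dstream2, ts, ......)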

 

I assume the foreachRDD call can update the "ts" variable, which is then used 
in the Spark tasks of the "process_data" function.

 

In my tests on a standalone Spark cluster, this works. But should I be 
concerned if I switch to YARN?

 

I have also seen articles that recommend avoiding state in Scala programming. 
Without the state variable, how could this be done?
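
One way this might be done without the shared variable, as a minimal sketch: Spark Streaming's transform has an overload that passes the batch Time alongside the RDD, so the timestamp can be read where it is needed instead of being stored in a driver-side var. The sketch assumes that dstream2 belongs to the same StreamingContext as dstream1 (so both see the same batch time) and that its records are strings. BatchTimeSketch, tag and withBatchTime are hypothetical names, not part of the application above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object BatchTimeSketch {
  // Hypothetical pure helper: pairs one record with the batch timestamp.
  def tag(batchMs: Long, record: String): (Long, String) = (batchMs, record)

  // Hypothetical replacement for reading a shared `ts` variable:
  // the batch time arrives as an argument of the transform function.
  def withBatchTime(stream: DStream[String]): DStream[(Long, String)] =
    stream.transform { (rdd: RDD[String], time: Time) =>
      // This part runs on the driver, once per batch.
      val batchMs = time.milliseconds
      // batchMs is an immutable value captured by the task closure.
      rdd.map(record => tag(batchMs, record))
    }
}
```

With something like this, process_data could take BatchTimeSketch.withBatchTime(dstream2) and read the timestamp from each record, rather than receiving a separate ts argument.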

 

Any comments or suggestions are appreciated.

 

Thanks,

Haopu
