Hi! I have an application (skeleton below):
val sc = new SparkContext($SOME_CONF)
val input = sc.textFile(inputFile)

val result = input
  .map(record => {
    // build the per-record state; the RDD is keyed by the record's UUID
    val myState = new MyState()
    ($SOME_UUID_KEY, myState)
  })
  .filter($SOME_FILTER)
  .sortBy($SOME_SORT)
  .partitionBy(new HashPartitioner(100))
  .reduceByKey((first, current) => current.updateRecord(first))

val report = result.map { case (uuid, state) => state.generateReport(uuid) }.coalesce(1)
report.saveAsTextFile($SOME_OUTPUT)

// classes
case class AccumulatedState() { ... }

case class MyState() {
  var data: AccumulatedState = new AccumulatedState()
  ...
}
// EOC

I key the input by UUID, and I have a custom state (MyState, which holds an AccumulatedState). If the application runs in local mode, or on a real cluster with a single reducer, the behavior is correct; but when I run it with multiple reducers, the state gets reset somewhere during processing.

How is a shared state maintained over the lifecycle of the application? I know there are Accumulators and Broadcast variables, but those are meant for different use cases (counters, and global static data such as lookup tables). How can I keep such a shared state across the program until the end, so that I can generate my results?

Thanks!!!
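P.S. To make the question concrete, here is a simplified, self-contained sketch of the state classes and the merge I have in mind; the fields (count, sum) are placeholders, not my real data:

case class AccumulatedState(count: Long = 0L, sum: Double = 0.0)

case class MyState(var data: AccumulatedState = AccumulatedState()) {
  // Fold another partial state into this one and return this,
  // so it can serve as the merge function passed to reduceByKey.
  def updateRecord(other: MyState): MyState = {
    data = AccumulatedState(
      count = data.count + other.data.count,
      sum   = data.sum + other.data.sum)
    this
  }

  // Turn the accumulated state for one UUID into an output line.
  def generateReport(uuid: String): String =
    s"$uuid,${data.count},${data.sum}"
}

With something like this, each call inside reduceByKey merges two partial MyState values for the same UUID into one.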