Creating millions of temporary (immutable) objects hurts performance: each one is an allocation that the garbage collector later has to reclaim. This should be easy to confirm with a local micro-benchmark. -Xiangrui
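A rough version of the micro-benchmark suggested above might look like the following Scala sketch. `MutableStats`, `ImmutableStats` and `StatsBench` are hypothetical, heavily simplified stand-ins (count, sum and max only), not Spark's `StatCounter`, and the example folds over a plain array rather than an RDD. The mutable version updates its fields in place and returns `this`, as the question describes for `StatCounter.merge`; the immutable version allocates a new instance per element. Timing with `System.nanoTime` is crude, and the JIT may remove some allocations on its own, so for numbers you intend to rely on, a harness such as JMH is the safer choice.

```scala
// Hypothetical, simplified accumulators for illustration only (not Spark's StatCounter).

// Mutable style: merge updates fields in place and returns `this`.
final class MutableStats {
  var count: Long = 0L
  var sum: Double = 0.0
  var max: Double = Double.NegativeInfinity

  def merge(x: Double): MutableStats = {
    count += 1
    sum += x
    if (x > max) max = x
    this
  }
}

// Immutable ("pure") style: every merge allocates a new instance.
final case class ImmutableStats(
    count: Long = 0L,
    sum: Double = 0.0,
    max: Double = Double.NegativeInfinity) {
  def merge(x: Double): ImmutableStats =
    ImmutableStats(count + 1, sum + x, math.max(max, x))
}

object StatsBench {
  // Crude wall-clock timer; does not control for JIT warm-up or GC pauses.
  def time[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    val ms = (System.nanoTime() - start) / 1e6
    println(f"$label%-12s $ms%8.1f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate(5000000)(i => (i % 1000).toDouble)

    // Repeat a few rounds so later rounds run JIT-compiled code.
    for (round <- 1 to 3) {
      val m = time(s"mutable#$round")(data.foldLeft(new MutableStats)(_ merge _))
      val i = time(s"immutable#$round")(data.foldLeft(ImmutableStats())(_ merge _))
      // Both styles must produce the same statistics.
      assert(m.count == i.count && m.sum == i.sum && m.max == i.max)
    }
  }
}
```

The only difference being measured is allocation: the mutable fold creates one accumulator for the whole pass, while the immutable fold creates one per element.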
On Mon, Jun 22, 2015 at 7:25 PM, mzeltser <mzelt...@gmail.com> wrote:
> Using StatCounter as an example, I'd like to understand whether a "pure"
> functional implementation would be more or less beneficial for "accumulating"
> structures used inside RDD.map.
>
> StatCounter.merge updates mutable class variables and returns a reference
> to the same object. This is clearly a non-functional implementation: it
> mutates the existing state of the instance (unless I'm missing something).
>
> Would it be preferable to declare all the class variables as val and
> create a new instance to hold the merged values?
>
> The StatCounter would be used inside RDD.map to collect stats on the fly.
> Would mutable state present a bottleneck?
>
> Can anybody comment on why a non-functional implementation was chosen?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/mutable-vs-pure-functional-implementation-StatCounter-tp23441.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org