Thanks, but broadcast variables won't achieve what I'm looking to do. I'm not trying to just share a one-time set of data across the cluster. Rather, I'm trying to set up a small cache of info that's constantly being updated based on the records in the dataframe.
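One possible direction (not something suggested in the thread): Spark can't share mutable state across executors, but rdd.mapPartitions() does let local variables persist across all records of a single partition, much like instance variables in a Hadoop Mapper. Below is a minimal sketch of that pattern in plain Python, so it runs without a Spark cluster; the function name `process_partition` and the sample data are hypothetical.

```python
def process_partition(records):
    """Iterate over one partition's records, carrying state between them.

    Local variables declared here live for the whole partition, so each
    record can read values saved while processing earlier records --
    analogous to instance variables in a Hadoop Mapper.
    """
    previous = None                    # state carried from record to record
    for record in records:
        if previous is not None:
            yield (previous, record)   # use data saved from the prior record
        previous = record

# Simulate one partition's worth of records (in PySpark you would pass
# process_partition to rdd.mapPartitions() instead):
pairs = list(process_partition(iter([1, 2, 3, 4])))
print(pairs)   # [(1, 2), (2, 3), (3, 4)]
```

Note the caveat: this state is per-partition and per-task, not cluster-wide, and it is discarded when the task finishes. For a small cache that every executor can both read and update continuously, a pattern many people fall back on is an external store queried from inside the partition function, since Spark closures are serialized out to tasks and writes to closure variables never propagate back.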
DR

On Mon, Jan 22, 2018 at 10:41 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote:
> If I understand your requirement correct. Use broadcast variables to
> replicate across all nodes the small amount of data you wanted to reuse.
>
> On Mon, Jan 22, 2018 at 9:24 PM David Rosenstrauch <daro...@gmail.com> wrote:
>
>> This seems like an easy thing to do, but I've been banging my head
>> against the wall for hours trying to get it to work.
>>
>> I'm processing a spark dataframe (in python). What I want to do is, as
>> I'm processing it I want to hold some data from one record in some local
>> variables in memory, and then use those values later while I'm processing a
>> subsequent record. But I can't see any way to do this.
>>
>> I tried using:
>>
>> dataframe.select(a_custom_udf_function('some_column'))
>>
>> ... and then reading/writing to local variables in the udf function, but
>> I can't get this to work properly.
>>
>> My next guess would be to use dataframe.foreach(a_custom_function) and
>> try to save data to local variables in there, but I have a suspicion that
>> may not work either.
>>
>> What's the correct way to do something like this in Spark? In Hadoop I
>> would just go ahead and declare local variables, and read and write to them
>> in my map function as I like. (Although with the knowledge that a) the
>> same map function would get repeatedly called for records with many
>> different keys, and b) there would be many different instances of my code
>> spread across many machines, and so each map function running on an
>> instance would only see a subset of the records.) But in Spark it seems to
>> be extraordinarily difficult to create local variables that can be read
>> from / written to across different records in the dataframe.
>>
>> Perhaps there's something obvious I'm missing here? If so, any help
>> would be greatly appreciated!
>>
>> Thanks,
>>
>> DR