I'm currently developing a Spark Streaming application. I have a function that receives an RDD and an object instance as parameters, and returns an RDD:

    def doTheThing(a: RDD[A], b: B): RDD[C]

Within the function, I do some processing inside a map over the RDD, like this:

    def doTheThing(a: RDD[A], b: B): RDD[C] = {
      a.combineByKey(...).map(b.function(_))
    }

I combine the RDD by key, then map over the results, calling a function on instance b, and return the mapped RDD.

Here is where I ran into trouble. In a unit test running Spark in memory, I was able to convince myself that this worked well. But in our development environment, the returned RDD was empty and b.function(_) was never executed.

However, when I added an otherwise useless foreach:

    def doTheThing(a: RDD[A], b: B): RDD[C] = {
      val results = a.combineByKey(...).map(b.function(_))
      results.foreach(p => p)
      results
    }

then it works. So, basically, adding an extra foreach pass appears to cause b.function(_) to execute, and the function returns the results correctly.

I find this confusing. Can anyone shed some light on why this would be?

Thank you,
Jeff
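
One thing that may be worth checking here, offered as a possibility rather than a diagnosis: RDD transformations such as map are lazy, and Spark only runs them when an action (count, collect, foreach, ...) is invoked on the resulting RDD. The sketch below illustrates just that behavior. It assumes spark-core 2.x or later on the classpath (for longAccumulator) and uses made-up data; reduceByKey stands in for the elided combineByKey(...) call, and the object name LazyMapSketch and the accumulator named calls are hypothetical, not taken from the code above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-alone sketch; not the original doTheThing.
object LazyMapSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lazy-map-sketch")
    val sc = new SparkContext(conf)
    try {
      // Counts how many times the function passed to map actually runs.
      val calls = sc.longAccumulator("calls")

      // Made-up input; reduceByKey is a stand-in for combineByKey(...).
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
      val results = pairs
        .reduceByKey(_ + _)
        .map { case (k, v) => calls.add(1); (k, v * 10) }

      // Only transformations so far: no job has been submitted, so this prints 0.
      println(s"calls before any action: ${calls.value}")

      // count() is an action: it triggers the shuffle and the map.
      val n = results.count()
      println(s"records: $n, calls after count(): ${calls.value}")
    } finally {
      sc.stop()
    }
  }
}
```

If the same thing is happening in the streaming job, the caller of doTheThing would need to run an action or output operation on the returned RDD before b.function(_) executes. Note that the extra foreach also computes the lineage once on its own, so any later action recomputes it unless the RDD is cached.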