Hi Spark devs,

I'm writing a Spark job, and at a certain point in its execution I need to
send some data held in an RDD to an external system.

val myRdd = ....

myRdd.foreach { record =>
  sendToWhtv(record)
}

The thing is that foreach forces materialization of the RDD and it seems to
be executed on the driver program, which is not very beneficial in my case.
So I changed the logic to a map (mapPartitions, but it's the same).

val newRdd = myRdd.map { record =>
  sendToWhtv(record)
}
newRdd.count()
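
For reference, here is a minimal, self-contained sketch of the per-partition
variant mentioned above, assuming it refers to RDD.mapPartitions. The local
master, the sample records, the object name SendRecords and the body of
sendToWhtv are placeholders for illustration, not my real job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SendRecords {
  // Placeholder: stands in for the real call to the external system.
  def sendToWhtv(record: String): Unit = println(s"sending $record")

  def main(args: Array[String]): Unit = {
    // Assumption: local master with 2 threads, only so the sketch runs standalone.
    val conf = new SparkConf().setAppName("send-records").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Placeholder data instead of the real RDD.
      val myRdd = sc.parallelize(Seq("a", "b", "c"))

      // mapPartitions is a lazy transformation: nothing is sent yet.
      val newRdd = myRdd.mapPartitions { records =>
        records.map { record => sendToWhtv(record); record }
      }

      // count() is an action; it consumes each partition's iterator,
      // which is what actually triggers the sendToWhtv calls.
      newRdd.count()
    } finally {
      sc.stop()
    }
  }
}
```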

My understanding is that map is a transformation, so I have to force
materialization by invoking some action (such as count). Is this the
correct way to do this kind of distributed foreach, or is there another
function that achieves this without necessarily implying a data
transformation or a returned RDD?


Thanks,
Alex
