*"The thing is that foreach forces materialization of the RDD and it seems to be executed on the driver program"*

What makes you think that? foreach is an action, and the function you pass to it runs on the executors (distributed), not on the driver.
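For what it's worth, here is a minimal sketch of the two usual ways to do this. It is an illustration under assumptions, not your actual job: the `local[2]` master, the sample data, and the body of `sendToWhtv` are placeholders I made up; only `foreach` and `foreachPartition` are the real RDD methods.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistributedForeach {
  // Hypothetical stand-in for the poster's sendToWhtv; a real version
  // would call the external system instead of printing.
  def sendToWhtv(record: String): Unit =
    println(s"sending $record")

  def main(args: Array[String]): Unit = {
    // local[2] is assumed here only so the sketch can be tried on one machine;
    // on a cluster the closures below run on the executors.
    val conf = new SparkConf().setAppName("distributed-foreach").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Placeholder data standing in for myRdd.
      val myRdd = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

      // foreach is an action: it triggers the job and returns nothing,
      // so no extra map + count is needed to force evaluation.
      myRdd.foreach(record => sendToWhtv(record))

      // foreachPartition hands each partition's iterator to the function,
      // which is the place to do once-per-partition setup (for example,
      // opening one connection) before sending the records.
      myRdd.foreachPartition { records =>
        records.foreach(sendToWhtv)
      }
    } finally {
      sc.stop()
    }
  }
}
```

If you run this on a cluster, anything the function prints goes to the executors' stdout, not to the driver console, which may be why it can look as if nothing distributed happened.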
2015-07-02 18:32 GMT+02:00 Alexandre Rodrigues <alex.jose.rodrig...@gmail.com>:

> Hi Spark devs,
>
> I'm coding a Spark job and at a certain point in execution I need to send
> some data present in an RDD to an external system.
>
> val myRdd = ....
>
> myRdd.foreach { record =>
>   sendToWhtv(record)
> }
>
> The thing is that foreach forces materialization of the RDD and it seems
> to be executed on the driver program, which is not very beneficial in my
> case. So I changed the logic to a map (mapPartitions, but it's the
> same).
>
> val newRdd = myRdd.map { record =>
>   sendToWhtv(record)
> }
> newRdd.count()
>
> My understanding is that map is a transformation operation and then I have
> to force materialization by invoking some action (such as count). Is this
> the correct way to do this kind of distributed foreach or is there any
> other function to achieve this that doesn't necessarily imply a data
> transformation or a returned RDD?
>
> Thanks,
> Alex