*"The thing is that foreach forces materialization of the RDD and it seems
to be executed on the driver program"*
What makes you think that? foreach is an action, so it does trigger
evaluation of the RDD, but the function you pass to it runs on the
executors (distributed), not in the driver.
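
Here is a minimal sketch of what I mean, not a drop-in for your job. The object name, the local[2] master, the sample data and the body of sendToWhtv are all assumptions made for illustration; only foreach and foreachPartition are the actual RDD API. foreachPartition is worth a look if sendToWhtv needs some per-connection setup, since the function receives a whole partition as an iterator.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ForeachOnExecutors {
  // Hypothetical stand-in for the sendToWhtv call in the original question.
  def sendToWhtv(record: String): Unit =
    println(s"[${Thread.currentThread().getName}] sending $record")

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 worker threads, just for the demo.
    val conf = new SparkConf().setAppName("foreach-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: sample data standing in for the real RDD.
      val myRdd = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

      // foreach is an action: the closure is shipped to the executors and
      // applied to each record there. Nothing is returned to the driver.
      myRdd.foreach(record => sendToWhtv(record))

      // foreachPartition is also an action, called once per partition, so
      // any per-connection setup can be done once per partition.
      myRdd.foreachPartition { records =>
        records.foreach(sendToWhtv)
      }
    } finally {
      sc.stop()
    }
  }
}
```

Note that in local mode the executors live in the same JVM as the driver, so the println output shows up in the driver console. On a cluster the same output goes to the executor logs instead.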

2015-07-02 18:32 GMT+02:00 Alexandre Rodrigues <
alex.jose.rodrig...@gmail.com>:

> Hi Spark devs,
>
> I'm coding a Spark job and at a certain point in execution I need to send
> some data present in an RDD to an external system.
>
> val myRdd = ....
>
> myRdd.foreach { record =>
>   sendToWhtv(record)
> }
>
> The thing is that foreach forces materialization of the RDD and it seems
> to be executed on the driver program, which is not very beneficial in my
> case. So I changed the logic to a map (mapPartitions, but it's the
> same).
>
> val newRdd = myRdd.map { record =>
>   sendToWhtv(record)
> }
> newRdd.count()
>
> My understanding is that map is a transformation operation and then I have
> to force materialization by invoking some action (such as count). Is this
> the correct way to do this kind of distributed foreach or is there any
> other function to achieve this that doesn't necessarily imply a data
> transformation or a returned RDD?
>
>
> Thanks,
> Alex
>
>
