Hi,

I have an RDD and a function that should be called exactly once on every
item in this RDD (say, it updates an external database). So far, I have used
  rdd.map(myFunction).count()
or
  rdd.mapPartitions(iter => iter.map(myFunction))
but I am wondering whether myFunction is guaranteed to be called for every
element in both cases. In the first case in particular, the result of
count() is the same whether or not myFunction is called for each element,
so can I rely on count() evaluating the whole pipeline, including functions
that cannot change the count?
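
For reference, here is a minimal sketch of how the two variants could be
checked locally. It is not my real job: it assumes Spark 2.x or later (for
longAccumulator), a local master, and a hypothetical stand-in for myFunction
that bumps an accumulator instead of updating a database. The names
SideEffectCheck and countCalls are made up for this sketch. Note that
count() is appended to the mapPartitions variant as well, since a
transformation without an action does not run a job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SideEffectCheck {
  // Runs one of the two variants over 100 elements and returns
  // (result of count(), number of times the stand-in function was called).
  def countCalls(sc: SparkContext, useMapPartitions: Boolean): (Long, Long) = {
    // Hypothetical stand-in for the database update: count invocations.
    val calls = sc.longAccumulator("myFunction calls")
    val myFunction: Int => Int = x => { calls.add(1); x }

    val rdd = sc.parallelize(1 to 100, 4)
    val n =
      if (useMapPartitions)
        // count() added here so that the transformation is actually executed
        rdd.mapPartitions(iter => iter.map(myFunction)).count()
      else
        rdd.map(myFunction).count()
    (n, calls.value.longValue)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("side-effect-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"map + count:           ${countCalls(sc, useMapPartitions = false)}")
      println(s"mapPartitions + count: ${countCalls(sc, useMapPartitions = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

A check like this only shows what happens in one local run without task
retries, so it does not by itself tell me whether the behaviour is
guaranteed in general, which is what I am asking about.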

Thanks
Tobias
