Okay, I think I've got it now. Yes, take() does not need to be called more than once. I was under the impression that we wanted to bring elements to the driver node and then run our qualifying_function on the driver node.
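A minimal sketch of why a single take() can be enough, written in plain Python rather than Spark. It assumes that a filter followed by take(n) is evaluated lazily over each partition's iterator, which is my reading of the thread and not something verified here. The helper take_first_matches and the body of qualifying_function are hypothetical, for illustration only.

```python
from itertools import islice


def take_first_matches(partition, predicate, n=1):
    """Return up to n matching elements plus the number of predicate calls."""
    calls = 0

    def counted(x):
        # Wrap the predicate so we can count how often it is evaluated.
        nonlocal calls
        calls += 1
        return predicate(x)

    # filter() is lazy: it pulls from `partition` only when islice asks for
    # another element, and islice stops asking once it has n elements.
    matches = list(islice(filter(counted, partition), n))
    return matches, calls


def qualifying_function(x):
    # Hypothetical stand-in for the predicate discussed in this thread.
    return x % 7 == 0


# One simulated partition holding the elements 1..999.
matches, calls = take_first_matches(range(1, 1000), qualifying_function)
print(matches, calls)
```

In this sketch the predicate stops being evaluated as soon as the requested number of matches has been produced, so the remaining elements of the simulated partition are never examined.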
Now I am back to the question I started with: could there be an approach where qualifying_function() does not get called after an element has been found?

Regards,
Sandeep Giri
www.KnowBigData.com

On Wed, Aug 5, 2015 at 9:21 PM, Sean Owen <so...@cloudera.com> wrote:

> take only brings n elements to the driver, which is probably still a win
> if n is small. I'm not sure what you mean by only taking a count argument
> -- what else would be an arg to take?
>
> On Wed, Aug 5, 2015 at 4:49 PM, Sandeep Giri <sand...@knowbigdata.com>
> wrote:
>
>> Yes, but in the take() approach we will be bringing the data to the
>> driver, and it is no longer distributed.
>>
>> Also, take() takes only a count as its argument, which means that every
>> time we would be transferring the redundant elements.