Thank you Ewen. RDD.pipe is what I need and it works like a charm. On the other hand, RDD.mapPartitions also looks interesting, but I can't figure out how to make it work.
Jaonary

On Thu, Mar 20, 2014 at 4:54 PM, Ewen Cheslack-Postava <m...@ewencp.org> wrote:

> Take a look at RDD.pipe().
>
> You could also accomplish the same thing using RDD.mapPartitions, to which
> you pass a function that processes the iterator for each partition rather
> than processing each element individually. This lets you start up only as
> many processes as there are partitions, pipe the contents of each iterator
> to them, then collect the output. This might be useful if, e.g., your
> external process doesn't use line-oriented input/output.
>
> -Ewen
>
> Jaonary Rabarisoa <jaon...@gmail.com>
> March 20, 2014 at 1:04 AM
>
> Dear all,
>
> Does Spark have a kind of Hadoop streaming feature to run an external
> process that manipulates data from an RDD sent through stdin and stdout?
>
> Best,
>
> Jaonary
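For anyone finding this thread later: below is a minimal sketch of the mapPartitions approach Ewen describes, assuming an existing RDD[String] named `rdd` and using `/usr/bin/tr` as a stand-in external command (both are assumptions, not from the original thread). It is an outline of the technique, not a tested drop-in solution.

```scala
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

// One external process per partition (not per element): start the
// process, feed the partition's elements to its stdin, and return
// its stdout lines as the new partition contents.
val result = rdd.mapPartitions { iter =>
  val proc = new ProcessBuilder("/usr/bin/tr", "a-z", "A-Z").start()
  val toProc   = new PrintWriter(proc.getOutputStream)
  val fromProc = new BufferedReader(new InputStreamReader(proc.getInputStream))

  // Write stdin from a separate thread so that writing does not
  // deadlock against reading the process's stdout.
  new Thread {
    override def run(): Unit = {
      iter.foreach(toProc.println)
      toProc.close()
    }
  }.start()

  // Stream the process's stdout back lazily until EOF.
  Iterator.continually(fromProc.readLine()).takeWhile(_ != null)
}
```

This is essentially what RDD.pipe does for you in the line-oriented case; rolling your own with mapPartitions becomes worthwhile when the external process expects a different protocol (e.g. binary framing), in which case the reader/writer above would be replaced accordingly.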