Thank you, Ewen. RDD.pipe is what I need and it works like a charm. On the
other hand, RDD.mapPartitions seems interesting, but I can't figure out how
to make it work.
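For reference, the mapPartitions approach Ewen describes can be sketched as below. This is a minimal, hypothetical sketch in plain Python so it runs without a cluster: the function is the kind of iterator-to-iterator body you would hand to RDD.mapPartitions, and the external `tr` command (assumed to be on PATH) stands in for the real program.

```python
import subprocess

def pipe_partition(lines):
    """Per-partition body for RDD.mapPartitions: start ONE external
    process per partition, feed every line of the partition to its
    stdin, and yield the lines it writes to stdout."""
    proc = subprocess.Popen(
        ["tr", "a-z", "A-Z"],   # stand-in for the real external command
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Join the partition's elements into line-oriented input for the process.
    out, _ = proc.communicate("".join(line + "\n" for line in lines))
    for result in out.splitlines():
        yield result

# Standalone demonstration on one "partition":
print(list(pipe_partition(["spark", "hadoop"])))  # ['SPARK', 'HADOOP']
```

With Spark, the call would be `rdd.mapPartitions(pipe_partition)`, which starts only as many external processes as there are partitions. For a simple line-oriented tool like this one, `rdd.pipe("tr a-z A-Z")` achieves the same result more directly.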

Jaonary


On Thu, Mar 20, 2014 at 4:54 PM, Ewen Cheslack-Postava <m...@ewencp.org> wrote:

> Take a look at RDD.pipe().
>
> You could also accomplish the same thing using RDD.mapPartitions, which
> you pass a function that processes the iterator for each partition rather
> than processing each element individually. This lets you only start up as
> many processes as there are partitions, pipe the contents of each iterator
> to them, then collect the output. This might be useful if, e.g., your
> external process doesn't use line-oriented input/output.
>
> -Ewen
>
>   Jaonary Rabarisoa <jaon...@gmail.com>
>  March 20, 2014 at 1:04 AM
> Dear all,
>
>
> Does Spark have a Hadoop streaming-like feature for running an external
> process that manipulates RDD data sent through stdin and stdout?
>
> Best,
>
> Jaonary
>
>

