Thank you, Matei.

I found a solution using pipe() and the MATLAB engine (an executable that
calls MATLAB behind the scenes and communicates through stdin and stdout). I
just need to resolve two remaining issues:

- How can I handle my dependencies? My MATLAB script needs other MATLAB
files that must be on the MATLAB path of each worker. So I need a way
to push them to each worker and tell MATLAB where to find them with
"addpath". I know how to call "addpath", but I don't know what the path
should be.

- Does the pipe() operator work at the partition level, so that the
external process is launched once per partition rather than once per
element? Initializing my external process is expensive, so it should not be
started several times.
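
To make both questions concrete, here is a minimal Python sketch, not
tested against a real cluster. As far as I understand, pipe() starts the
command once per partition and writes each element to its stdin as one
line; pipe_partition() below only mimics that behaviour locally with
subprocess. The names addpath_command, pipe_partition and STAND_IN are
hypothetical helpers for illustration, and the child process is a small
Python stand-in, not the real MATLAB engine wrapper.

```python
import subprocess
import sys


def addpath_command(directory):
    """Build the MATLAB statement that adds `directory` to the search path."""
    # MATLAB escapes a single quote inside a quoted string by doubling it.
    escaped = directory.replace("'", "''")
    return "addpath('{}');".format(escaped)


def pipe_partition(command, elements):
    """Local imitation of pipe() on ONE partition (an assumption, not Spark code):
    start `command` once, send every element as a line on stdin,
    and return the lines the process wrote to stdout."""
    payload = "".join("{}\n".format(e) for e in elements)
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()


# Hypothetical stand-in for the MATLAB engine executable: it performs its
# (expensive) start-up exactly once, then upper-cases every input line.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys\n"
    "sys.stdout.write('init\\n')\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n",
]

if __name__ == "__main__":
    # One process handles the whole partition, so 'init' appears only once.
    print(pipe_partition(STAND_IN, ["a", "b", "c"]))
    # On a worker, the directory could come from SparkFiles.getRootDirectory()
    # after shipping the .m files with sc.addFile(); the path here is made up.
    print(addpath_command("/tmp/spark-files"))
```

In a real job the same idea would presumably be rdd.pipe(command), with
the dependency directory handed to the wrapper through the optional
environment argument of pipe() that Matei mentions.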



On Mon, Aug 25, 2014 at 9:03 PM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> Have you tried the pipe() operator? It should work if you can launch your
> script from the command line. Just watch out for any environment variables
> needed (you can pass them to pipe() as an optional argument if there are
> some).
>
> On August 25, 2014 at 12:41:29 AM, Jaonary Rabarisoa (jaon...@gmail.com)
> wrote:
>
> Hi all,
>
> Has anyone tried to pipe an RDD into a MATLAB script? I'm trying to
> do something similar, so any hints would be appreciated.
>
> Best regards,
>
> Jao
>
>
