Thank you, Matei. I found a solution using pipe() and a MATLAB engine executable (a program that calls MATLAB behind the scenes and uses stdin and stdout to communicate). I just need to fix two other issues:
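The stdin/stdout scheme above can be sketched without Spark. In Spark's implementation, pipe() starts the external command once per partition and writes each element of that partition to the process's stdin as one line, then turns each stdout line into one output element. The sketch below imitates that behaviour with Python's subprocess module. The child script is a hypothetical stand-in for the MATLAB engine wrapper (it just upper-cases lines), and `pipe_partition` / `pipe_rdd` are illustrative names, not Spark APIs.

```python
import subprocess
import sys

# Hypothetical stand-in for the MATLAB engine wrapper: reads one record per
# line on stdin and writes one result per line on stdout. Any expensive
# initialization would run once here, at process start.
WORKER_SCRIPT = r"""
import sys
for line in sys.stdin:
    sys.stdout.write(line.strip().upper() + "\n")
"""


def pipe_partition(records, command):
    """Send every record of ONE partition through a SINGLE external process."""
    payload = "".join(str(r) + "\n" for r in records)
    # communicate()-style I/O (via run) avoids stdin/stdout pipe deadlocks.
    result = subprocess.run(command, input=payload, capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()


def pipe_rdd(partitions, command):
    """One process launch per partition, not one per element."""
    return [pipe_partition(part, command) for part in partitions]


if __name__ == "__main__":
    cmd = [sys.executable, "-c", WORKER_SCRIPT]
    print(pipe_rdd([["a", "b", "c"], ["d", "e"]], cmd))
    # prints [['A', 'B', 'C'], ['D', 'E']]
```

Because the process lives for the whole partition, the costly start-up is paid once per partition; using fewer, larger partitions (for example via repartition()) reduces the number of launches further.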
- How can I handle my dependencies? My MATLAB script needs other MATLAB files that must be present on each worker's MATLAB path. So I need a way to push them to each worker and tell MATLAB where to find them with "addpath". I know how to call "addpath", but I don't know what the path should be.
- Does the pipe() operator work at the partition level, so that the external process runs once for all the data in a partition? Initializing my external process is expensive, so calling it several times is not an option.

On Mon, Aug 25, 2014 at 9:03 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Have you tried the pipe() operator? It should work if you can launch your
> script from the command line. Just watch out for any environment variables
> needed (you can pass them to pipe() as an optional argument if there are
> some).
>
> On August 25, 2014 at 12:41:29 AM, Jaonary Rabarisoa (jaon...@gmail.com)
> wrote:
>
> Hi all,
>
> Has anyone tried to pipe an RDD into a MATLAB script? I'm trying to
> do something similar, so any hints would be appreciated.
>
> Best regards,
>
> Jao
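For the dependency question, one possible approach, sketched under assumptions: ship each .m file from the driver with `SparkContext.addFile(...)`; on each worker, Spark places those files under the directory returned by `SparkFiles.getRootDirectory()`, and that directory is what MATLAB's `addpath` needs. The helper below only builds the `addpath(...)` statement to write to the engine's stdin before the data records; `matlab_addpath_command` and its `subdirs` parameter are hypothetical, and the root directory is passed in so the function runs without Spark.

```python
import os


def matlab_addpath_command(root_dir, subdirs=()):
    """Build a MATLAB addpath(...) statement for files shipped to a worker.

    root_dir: where the shipped files landed; on a PySpark worker this would
    presumably be SparkFiles.getRootDirectory() (assumption).
    subdirs: hypothetical extra folders under root_dir to add too.
    """
    paths = [root_dir] + [os.path.join(root_dir, d) for d in subdirs]
    # MATLAB escapes a single quote inside a char literal by doubling it.
    quoted = ["'" + p.replace("'", "''") + "'" for p in paths]
    # addpath accepts several folder arguments in one call.
    return "addpath(" + ", ".join(quoted) + ");"


if __name__ == "__main__":
    print(matlab_addpath_command("/tmp/sparkfiles"))
```

Sending this line first, inside the same per-partition process, means the path is set once per MATLAB start-up rather than once per record.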