Re: writing to local files on a worker

2018-11-15 Thread Steve Lewis
I looked at Java's mechanism for creating temporary local files. I believe they can be created, written to, and passed to other programs on the system. I wrote a proof of concept that sends some Strings out and uses the local program cat to concatenate them and write the result to a local file.
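A minimal sketch of that proof of concept, assuming a Unix-like worker where `cat` is on the PATH (the class and method names here are illustrative, not from the original post): each String goes to its own temp file via java.nio, and ProcessBuilder runs `cat` with stdout redirected to a result file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TempFileCat {
    // Write each input String to its own temporary file, run the local
    // `cat` program over those files, and return the concatenated lines.
    static List<String> catStrings(List<String> inputs)
            throws IOException, InterruptedException {
        List<String> fileNames = new ArrayList<>();
        for (String s : inputs) {
            Path p = Files.createTempFile("part-", ".txt");
            Files.write(p, List.of(s));      // one line per temp file
            p.toFile().deleteOnExit();
            fileNames.add(p.toString());
        }
        Path out = Files.createTempFile("result-", ".txt");
        out.toFile().deleteOnExit();

        List<String> cmd = new ArrayList<>();
        cmd.add("cat");                      // assumes `cat` is available
        cmd.addAll(fileNames);
        Process proc = new ProcessBuilder(cmd)
                .redirectOutput(out.toFile()) // cat's stdout -> result file
                .start();
        if (proc.waitFor() != 0) {
            throw new IOException("cat exited with a non-zero status");
        }
        return Files.readAllLines(out);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(catStrings(List.of("hello", "world")));
    }
}
```

The same pattern generalizes: replace `cat` with any local executable that reads file arguments and writes to stdout.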

Re: writing to local files on a worker

2018-11-12 Thread Steve Lewis
I have been looking at Spark-Blast, which calls BLAST (a well-known C++ program) in parallel. In my case I have tried to translate the C++ code to Java but am not getting the same results; it is convoluted. I have code that will call the program and read its results; the only real issue is the

Re: writing to local files on a worker

2018-11-11 Thread Jörn Franke
Can you use JNI to call the C++ functionality directly from Java? Or could you wrap this into a MapReduce (MR) step outside Spark and use Hadoop Streaming (it allows you to use shell scripts as mapper and reducer)? You could also write temporary files for each partition and execute the software within a map
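The third suggestion — temp files per partition, then running the tool — might look like the sketch below. The class and method names are illustrative, and plain `sort` stands in for the real C++ program, assumed to take an input file argument and write to stdout. In Spark this would be the body of a mapPartitions call; here it is shown as a standalone method over an Iterator.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;

public class PartitionExec {
    // Drain one partition's records to a temp input file, run an external
    // command over that file, and return the tool's output lines.
    static List<String> runOnPartition(Iterator<String> partition, String command)
            throws IOException, InterruptedException {
        Path in = Files.createTempFile("partition-", ".in");
        Path out = Files.createTempFile("partition-", ".out");
        in.toFile().deleteOnExit();
        out.toFile().deleteOnExit();

        try (BufferedWriter w = Files.newBufferedWriter(in)) {
            while (partition.hasNext()) {     // one record per line
                w.write(partition.next());
                w.newLine();
            }
        }
        Process proc = new ProcessBuilder(command, in.toString())
                .redirectOutput(out.toFile()) // tool's stdout -> output file
                .start();
        if (proc.waitFor() != 0) {
            throw new IOException(command + " exited with a non-zero status");
        }
        return Files.readAllLines(out);
    }

    public static void main(String[] args) throws Exception {
        Iterator<String> part = List.of("b", "c", "a").iterator();
        System.out.println(runOnPartition(part, "sort"));
    }
}
```

Launching the process once per partition, rather than once per record, keeps the process-startup cost proportional to the number of partitions.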

Re: writing to local files on a worker

2018-11-11 Thread Joe
Hello, you could try using the mapPartitions function if you can send partial data to your C++ program. mapPartitions(func): similar to map, but runs separately on each partition (block) of the RDD, so func must be of type Iterator<T> => Iterator<U> when running on an RDD of type T. That way you
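The Iterator-to-Iterator shape that mapPartitions requires can be sketched in plain Java, without a Spark cluster (the class name and the toy length-computing function are illustrative). The point is that the function sees the whole partition at once, so any one-time setup — such as launching the external program — happens once per partition rather than once per record.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class MapPartitionsShape {
    // The shape mapPartitions expects: the function receives one whole
    // partition as an Iterator<T> and returns an Iterator<U>.
    static Function<Iterator<String>, Iterator<Integer>> lengths = partition -> {
        // Per-partition setup (e.g. starting the C++ process) would go here.
        List<Integer> out = new ArrayList<>();
        while (partition.hasNext()) {
            out.add(partition.next().length());
        }
        // Per-partition teardown would go here.
        return out.iterator();
    };

    public static void main(String[] args) {
        Iterator<Integer> result =
                lengths.apply(List.of("spark", "rdd").iterator());
        result.forEachRemaining(System.out::println);
    }
}
```

In actual Spark Java code, the same function body would be passed to JavaRDD.mapPartitions as a FlatMapFunction over the partition iterator.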

writing to local files on a worker

2018-11-11 Thread Steve Lewis
I have a problem where a critical step needs to be performed by a third-party C++ application. I can send or install this program on the worker nodes. I can construct a function holding all the data this program needs to process. The problem is that the program is designed to read and write from