Hi all, I need to run a complex external process with a lot of dependencies from Spark. The "pipe" and "addFile" functions seem to be my friends, but there are a few issues I still need to solve.
More precisely, the process I want to run is a C++ executable that may depend on some shared libraries and additional parameter files. I bundle everything into one tar file, so I may have the following structure:

myalgo/
  -- run.exe
  -- libdepend_run.so
  -- parameter_file

For example, my algo may be a support vector machine together with its trained model file. Now I need a way to deploy my bundled algo on every node and pipe my RDD through the executable.

My question is: is it possible to deploy my tar file and extract it on every worker so that I can invoke my executable? I pasted below a rough sketch of what I have in mind.

Any ideas will be helpful.

Cheers,
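The sketch makes a few assumptions that are not settled yet: the HDFS paths and the "myalgo.tar" name are just placeholders, the workers need bash and tar on their PATH, and I rely on my understanding that a file shipped with addFile lands in the task's working directory on the executors, which is also the working directory the piped process inherits by default. Please correct me if any of that is wrong.

import org.apache.spark.{SparkConf, SparkContext}

object PipeExternalAlgo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipe-external-algo"))

    // Ship the bundle to every executor that runs a task of this job.
    // (Placeholder path: wherever the tar actually lives.)
    sc.addFile("hdfs:///tools/myalgo.tar")

    val input = sc.textFile("hdfs:///data/input.txt")

    // The piped command should run in the task's working directory, where
    // addFile has (I believe) already dropped myalgo.tar, so relative paths
    // work. Spark feeds each RDD element to the process's stdin, one per
    // line, and collects its stdout lines as the new RDD.
    val output = input.pipe(Seq(
      "bash", "-c",
      "tar -xf myalgo.tar && " +
        "chmod +x myalgo/run.exe && " +
        "LD_LIBRARY_PATH=$PWD/myalgo myalgo/run.exe myalgo/parameter_file"
    ))

    output.saveAsTextFile("hdfs:///data/output")
    sc.stop()
  }
}

One thing I am unsure about: several tasks on the same executor may race on the "tar -xf" since they share the working directory, so I would probably add a lock file or extract into a per-task directory in a real run.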