Hi, I want to use PySpark, but I can't understand how it works. The documentation doesn't provide enough information.
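To make the questions below concrete, here is a minimal sketch of the kind of job I have in mind (everything here is hypothetical: the local SparkContext, the "transform" function, and the made-up file name):

    import math
    from pyspark import SparkContext

    sc = SparkContext("local", "example")

    def transform(x):
        # uses the standard-library math module inside the mapped function
        return math.sqrt(x) + 1.0

    print(sc.parallelize([1, 4, 9, 16]).map(transform).collect())
    # [2.0, 3.0, 4.0, 5.0]

    # And, for question 3, something like:
    # sc.addPyFile("my_cpp_binary")  # a compiled C++ executable, not a .py file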
1) How is Python shipped to the cluster? Do the machines in the cluster need to have Python installed already?

2) What happens when I write some Python code in a "map" function, like "transform" in the sketch above? Is it shipped to the cluster and executed there? How does Spark work out all the dependencies my code needs and ship them over? If I use "math" inside my "map" function, does that mean I ship the "math" module myself, or is the Python "math" module already present on the cluster used?

3) I have compiled C++ code. Can I ship this executable with "addPyFile" and then just run it from Python with the "exec" function? Would that work?

--
Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex