Also take a look at this: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
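
On point 2 of Andrew's reply quoted below (libraries used inside a closure must already be installed on every node), here is a small illustration of why. It uses the standard-library pickle module only. PySpark serializes closures with its own bundled pickler rather than plain pickle, so treat this as a sketch of the by-reference principle for functions from importable modules, not as what PySpark literally runs.

```python
import math
import pickle

# Pickle a function that lives in an importable module.
payload = pickle.dumps(math.sqrt)

# The payload carries only the module and function names, not the code,
# so whoever unpickles it must be able to import "math" locally.
print(b"math" in payload, b"sqrt" in payload)

# Unpickling performs the import and looks the name up again.
restored = pickle.loads(payload)
print(restored(16.0))
```

So for Egor's "Math" question: if a library function is referenced by name like this, the copy of that module installed on each worker is what gets used.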
On Fri, Jul 11, 2014 at 10:29 AM, Andrew Or <and...@databricks.com> wrote:

> Hi Egor,
>
> Here are a few answers to your questions:
>
> 1) Python needs to be installed on all machines, but not pyspark. The way
> the executors get the pyspark code depends on which cluster manager you
> use. In standalone mode, your executors need to have the actual python
> files in their working directory. In yarn mode, python files are included
> in the assembly jar, which is then shipped to your executor containers
> through a distributed cache.
>
> 2) Pyspark is just a thin wrapper around Spark. When you write a closure in
> python, it is shipped to the executors within the task itself the same way
> scala closures are shipped. If you use a special library, then all of the
> nodes will need to have that library pre-installed.
>
> 3) Are you trying to run your c++ code inside the "map" function? If so,
> you need to make sure the compiled code is present in the working directory
> on all the executors before-hand for python to "exec" it. I haven't done
> this before, but maybe there are a few gotchas in doing this.
>
> Maybe others can add more information?
>
> Andrew
>
>
> 2014-07-11 5:50 GMT-07:00 Egor Pahomov <pahomov.e...@gmail.com>:
>
> > Hi, I want to use pySpark, but can't understand how it works. Documentation
> > doesn't provide enough information.
> >
> > 1) How python shipped to cluster? Should machines in cluster already have
> > python?
> > 2) What happens when I write some python code in "map" function - is it
> > shipped to cluster and just executed on it? How it understand all
> > dependencies, which my code need and ship it there? If I use Math in my
> > code in "map" does it mean, that I would ship Math class or some python
> > Math on cluster would be used?
> > 3) I have c++ compiled code. Can I ship this executable with "addPyFile"
> > and just use "exec" function from python? Would it work?
> >
> > --
> > Sincerely yours
> > Egor Pakhomov
> > Scala Developer, Yandex
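
On point 3 in the quoted thread: Python's built-in exec runs Python source, so a compiled binary would normally be launched through the subprocess module instead. Below is a rough sketch of a helper that could be called from inside "map". The helper name run_tool, the file name "mytool" and its path are hypothetical, the Spark part is left as comments because I have not run it on a cluster, and the code assumes Python 3.7+ for subprocess.run with capture_output. As far as I know, addPyFile is meant for Python dependencies, while SparkContext.addFile together with SparkFiles.get is the route for arbitrary files.

```python
import subprocess
import sys


def run_tool(argv, record):
    """Send one record to an external program on stdin and return its stdout."""
    result = subprocess.run(
        argv,
        input=record,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Local stand-in for the compiled binary: a Python one-liner that upper-cases stdin.
stand_in = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
print(run_tool(stand_in, "abc"))

# On a cluster the call might look like this (untested, "mytool" is hypothetical):
#   sc.addFile("/path/to/mytool")
#   rdd.map(lambda rec: run_tool([SparkFiles.get("mytool")], rec))
```

Starting one process per record is expensive, so batching records per partition would probably be the next thing to try if this works at all.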