Re: Understanding PySpark Internals

2016-03-30 Thread Josh Rosen
One clarification: there *are* Python interpreters running on executors so that Python UDFs and RDD API code can be executed. Some slightly-outdated but mostly-correct reference material for this can be found at https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals. See also: search
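As a rough illustration of the point above (user code running in a separate Python interpreter), here is a toy sketch using only the standard library. It is not Spark's actual protocol: the names `WORKER_SOURCE` and `run_in_worker` are hypothetical, a plain subprocess pipe stands in for the executor-to-worker channel, and stdlib `pickle` stands in for whatever serializer PySpark uses, so only functions that can be pickled by reference (such as builtins) will work here.

```python
import pickle
import subprocess
import sys

# Hypothetical stand-in for a Python worker: it reads one pickled
# (function, records) pair from stdin, applies the function to each
# record, and writes the pickled results to stdout.
WORKER_SOURCE = r"""
import pickle, sys
func, records = pickle.load(sys.stdin.buffer)
pickle.dump([func(r) for r in records], sys.stdout.buffer)
"""


def run_in_worker(func, records):
    """Ship func and records to a fresh Python interpreter and collect the results."""
    payload = pickle.dumps((func, records))
    proc = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=payload,
        stdout=subprocess.PIPE,
        check=True,
    )
    return pickle.loads(proc.stdout)


# The builtin abs is pickled by reference, so the worker can resolve it.
result = run_in_worker(abs, [-3, 1, -2])
print(result)
```

The only thing this is meant to show is the shape of the round trip: the function and its input are serialized, evaluated in a different Python process, and the results are serialized back.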

Understanding PySpark Internals

2016-03-29 Thread Adam Roberts
Hi, I'm interested in figuring out how the Python API for Spark works. I've come to the following conclusion and want to share it with the community; it could be of use in the PySpark docs here, specifically the "Execution and pipelining" part. Any sanity checking would be much appreciated,