The compilation happens in parallel on all of the machines, so it's not really clear that generating the bytecode on the driver and shipping it would be a win from a latency perspective. Honestly, though, I just took the easiest path, which didn't require more machinery for extracting and shipping bytecode.
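[For context, a minimal sketch of the mechanism being discussed: compiling generated Scala source at runtime with the reflection ToolBox, which is what each executor does locally. This is only an illustration under that assumption, not Spark's actual CodeGenerator code.]

```scala
// Minimal sketch: runtime compilation with the Scala reflection ToolBox.
// Illustrative only; not Spark's actual codegen path.
// Requires scala-reflect and scala-compiler on the classpath.
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

object ToolBoxExample {
  def main(args: Array[String]): Unit = {
    val toolBox = currentMirror.mkToolBox()

    // Source for a generated "projection": a function from an input value to
    // a transformed value. Spark generates similar source from an expression tree.
    val source = "(x: Int) => x * 2 + 1"

    // parse() builds an AST from the source; compile() returns a thunk that,
    // when invoked, yields the compiled value (here, the function itself).
    val tree = toolBox.parse(source)
    val compiled = toolBox.compile(tree)().asInstanceOf[Int => Int]

    println(compiled(20)) // prints 41
  }
}
```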
On Mon, Apr 6, 2015 at 3:07 PM, Akshat Aranya <aara...@gmail.com> wrote:

> Thanks for the info, Michael. Is there a reason to do so, as opposed to
> shipping out the bytecode and loading it via the classloader? Is it more
> complex? I can imagine caching to be effective for repeated queries, but not
> when the subsequent queries are different.
>
> On Mon, Apr 6, 2015 at 2:41 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> It is generated and cached on each of the executors.
>>
>> On Mon, Apr 6, 2015 at 2:32 PM, Akshat Aranya <aara...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm curious as to how Spark does code generation for SQL queries.
>>>
>>> Following through the code, I saw that an expression is parsed and
>>> compiled into a class using the Scala reflection toolbox. However, it's
>>> unclear to me whether the actual bytecode is generated on the master or on
>>> each of the executors. If it is generated on the master, how is the bytecode
>>> shipped out to the executors?
>>>
>>> Thanks,
>>> Akshat
>>>
>>> https://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html
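[To illustrate the "generated and cached on each of the executors" point above, here is a hedged sketch of a per-JVM cache keyed by the generated source. The object and method names (`GeneratedCodeCache`, `compileOnce`) are made up for illustration; this is not Spark's actual implementation.]

```scala
// Illustrative per-JVM cache of compiled generated code, keyed by source text.
// Hypothetical names; not Spark's actual CodeGenerator.
import scala.collection.concurrent.TrieMap
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

object GeneratedCodeCache {
  // One ToolBox and one cache per JVM: this object is a singleton within a
  // JVM, so each executor builds and caches its own compiled code.
  private lazy val toolBox = currentMirror.mkToolBox()
  private val cache = TrieMap.empty[String, Any]

  // Compile the generated source once per JVM; later calls with the same
  // source (e.g. the same query run again) reuse the cached result.
  def compileOnce(source: String): Any =
    cache.getOrElseUpdate(source, toolBox.compile(toolBox.parse(source))())
}
```

[Because the cache is JVM-local, each executor pays the compilation cost once per distinct generated source, which is why repeated queries benefit from caching while each new query shape still compiles fresh.]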