Re: Spark SQL code generation

2015-04-06 Thread Akshat Aranya
Thanks for the info, Michael. Is there a reason to do so, as opposed to shipping out the bytecode and loading it via the classloader? Is it more complex? I can imagine caching to be effective for repeated queries, but when the subsequent queries are different. On Mon, Apr 6, 2015 at 2:41 PM,

Spark SQL code generation

2015-04-06 Thread Akshat Aranya
Hi, I'm curious as to how Spark does code generation for SQL queries. Following through the code, I saw that an expression is parsed and compiled into a class using the Scala reflection toolbox. However, it's unclear to me whether the actual bytecode is generated on the master or on each of the

Re: Spark SQL code generation

2015-04-06 Thread Michael Armbrust
The compilation happens in parallel on all of the machines, so it's not really clear that, from a latency perspective, there is a win to generating it on the driver and shipping it. However, really I just took the easiest path that didn't require more bytecode extracting / shipping machinery. On

Re: Spark SQL code generation

2015-04-06 Thread Michael Armbrust
It is generated and cached on each of the executors. On Mon, Apr 6, 2015 at 2:32 PM, Akshat Aranya aara...@gmail.com wrote: Hi, I'm curious as to how Spark does code generation for SQL queries. Following through the code, I saw that an expression is parsed and compiled into a class using