Re: Spark SQL code generation

2015-04-06 Thread Akshat Aranya
Thanks for the info, Michael. Is there a reason to do so, as opposed to
shipping out the bytecode and loading it via the classloader? Is it more
complex? I can imagine caching to be effective for repeated queries, but
not when the subsequent queries are different.

On Mon, Apr 6, 2015 at 2:41 PM, Michael Armbrust mich...@databricks.com
wrote:

 It is generated and cached on each of the executors.

 On Mon, Apr 6, 2015 at 2:32 PM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 I'm curious as to how Spark does code generation for SQL queries.

 Following through the code, I saw that an expression is parsed and
 compiled into a class using the Scala reflection toolbox. However, it's
 unclear to me whether the actual bytecode is generated on the master or on
 each of the executors. If it is generated on the master, how is the bytecode
 shipped out to the executors?

 Thanks,
 Akshat


 https://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html

Spark SQL code generation

2015-04-06 Thread Akshat Aranya
Hi,

I'm curious as to how Spark does code generation for SQL queries.

Following through the code, I saw that an expression is parsed and compiled
into a class using the Scala reflection toolbox. However, it's unclear to me
whether the actual bytecode is generated on the master or on each of the
executors. If it is generated on the master, how is the bytecode shipped out
to the executors?

Thanks,
Akshat

https://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html
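
For readers unfamiliar with the idea in the question, here is a minimal sketch of
runtime code generation from an expression tree. It is written in Python purely
for illustration and is not Spark's implementation (Spark uses Scala). The tuple
based tree format and the helper names gen_source and compile_expression are
assumptions made up for this example.

```python
# Hypothetical expression tree: nested tuples such as
# ("add", ("col", "a"), ("lit", 1)). This format is invented for the sketch.

def gen_source(expr):
    """Turn an expression tree into Python source text."""
    kind = expr[0]
    if kind == "col":
        # Column reference: read the named field from the input row.
        return f"row[{expr[1]!r}]"
    if kind == "lit":
        # Literal value: embed it directly in the generated source.
        return repr(expr[1])
    if kind in ("add", "mul"):
        op = {"add": "+", "mul": "*"}[kind]
        return f"({gen_source(expr[1])} {op} {gen_source(expr[2])})"
    raise ValueError(f"unknown node: {kind}")


def compile_expression(expr):
    """Generate source for the tree, compile it, and return a callable."""
    source = f"lambda row: {gen_source(expr)}"
    code = compile(source, "<generated>", "eval")
    return eval(code)


expr = ("add", ("col", "a"), ("mul", ("col", "b"), ("lit", 2)))
evaluate = compile_expression(expr)
print(evaluate({"a": 1, "b": 3}))  # 1 + 3 * 2 = 7
```

The compiled callable avoids walking the tree for every row, which is the usual
motivation for this kind of code generation.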


Re: Spark SQL code generation

2015-04-06 Thread Michael Armbrust
The compilation happens in parallel on all of the machines, so it's not
clear that, from a latency perspective, there is a win to generating it on
the driver and shipping it. Honestly, though, I just took the easiest path,
one that didn't require extra machinery for extracting and shipping
bytecode.
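
A rough sketch of the pattern described above, in Python and not taken from
Spark: the driver ships only a serializable description of the expression, and
each executor compiles it locally and keeps the result in its own cache. The
Executor class, its cache dictionary, and the compilations counter are
hypothetical stand-ins for this illustration.

```python
import pickle


class Executor:
    """Stand-in for one executor process with its own compiled-code cache."""

    def __init__(self):
        self.cache = {}        # generated source text -> compiled callable
        self.compilations = 0  # how many times this executor had to compile

    def run(self, shipped_source, rows):
        # Only the source text travels over the wire, never compiled code.
        source = pickle.loads(shipped_source)
        fn = self.cache.get(source)
        if fn is None:
            # Cache miss: compile locally, then remember the result.
            self.compilations += 1
            fn = eval(compile(source, "<generated>", "eval"))
            self.cache[source] = fn
        return [fn(row) for row in rows]


executors = [Executor(), Executor()]
query = pickle.dumps("lambda row: row['a'] + 1")
other = pickle.dumps("lambda row: row['a'] * 2")

for ex in executors:
    ex.run(query, [{"a": 1}])  # first use: compiled on this executor
    ex.run(query, [{"a": 2}])  # same query again: served from the cache
    ex.run(other, [{"a": 3}])  # different query: compiled afresh

print([ex.compilations for ex in executors])  # [2, 2]
```

In this sketch each executor pays one compilation per distinct expression, so the
cache helps when a query repeats and does nothing for a query it has not seen.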

On Mon, Apr 6, 2015 at 3:07 PM, Akshat Aranya aara...@gmail.com wrote:

 Thanks for the info, Michael. Is there a reason to do so, as opposed to
 shipping out the bytecode and loading it via the classloader? Is it more
 complex? I can imagine caching to be effective for repeated queries, but
 not when the subsequent queries are different.


Re: Spark SQL code generation

2015-04-06 Thread Michael Armbrust
It is generated and cached on each of the executors.
