Re: Spark SQL code generation
Thanks for the info, Michael. Is there a reason to do it that way, as opposed to shipping out the bytecode and loading it via the classloader? Is that more complex? I can imagine caching being effective for repeated queries, but not when subsequent queries are different. On Mon, Apr 6, 2015 at 2:41 PM, Michael Armbrust mich...@databricks.com wrote: It is generated and cached on each of the executors.
Spark SQL code generation
Hi, I'm curious as to how Spark does code generation for SQL queries. Following through the code, I saw that an expression is parsed and compiled into a class using the Scala reflection toolbox. However, it's unclear to me whether the actual bytecode is generated on the master or on each of the executors. If it is generated on the master, how is the bytecode shipped out to the executors? Thanks, Akshat https://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html
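For readers unfamiliar with the mechanism being asked about, here is a minimal standalone sketch of runtime compilation with the Scala reflection toolbox (`scala.tools.reflect.ToolBox`). This is not Spark's actual generated code, just an illustration of how source for an expression can be parsed and compiled into executable code inside a running JVM; it assumes `scala-compiler` is on the classpath.

```scala
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

object ToolBoxSketch extends App {
  // Build a toolbox from the current runtime mirror. This is the
  // general mechanism the thread refers to: compile Scala source
  // at runtime inside the JVM, no separate compile step.
  val toolbox = currentMirror.mkToolBox()

  // Parse a snippet of source into a tree, then compile it.
  // compile() returns a thunk; invoking it yields the compiled value,
  // here a function we can cast and call like any other.
  val tree = toolbox.parse("(x: Int, y: Int) => x + y")
  val add  = toolbox.compile(tree)().asInstanceOf[(Int, Int) => Int]

  println(add(40, 2)) // prints 42
}
```

Because the toolbox compiles against the local classloader, whichever JVM runs this code ends up holding the resulting class, which is relevant to the master-vs-executor question below.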
Re: Spark SQL code generation
The compilation happens in parallel on all of the machines, so it's not really clear that there is a win, from a latency perspective, to generating it on the driver and shipping it out. Really, though, I just took the easiest path, which didn't require more bytecode extracting/shipping machinery. On Mon, Apr 6, 2015 at 3:07 PM, Akshat Aranya aara...@gmail.com wrote: Thanks for the info, Michael. Is there a reason to do it that way, as opposed to shipping out the bytecode and loading it via the classloader?
Re: Spark SQL code generation
It is generated and cached on each of the executors. On Mon, Apr 6, 2015 at 2:32 PM, Akshat Aranya aara...@gmail.com wrote: Hi, I'm curious as to how Spark does code generation for SQL queries.
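The "generated and cached on each of the executors" behavior can be sketched as a per-JVM cache keyed by the generated source, so an executor compiles a given expression once and reuses it across repeated queries. This is a hypothetical illustration (the object and method names are mine, and Spark's real implementation differs), not Spark's code:

```scala
import scala.collection.concurrent.TrieMap
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

// Hypothetical per-JVM cache: each executor process compiles the
// source for an expression at most once, then serves the compiled
// result from the cache on subsequent queries.
object CodeCache {
  private val toolbox = currentMirror.mkToolBox()
  private val cache   = TrieMap.empty[String, Any]

  def compile(source: String): Any =
    // getOrElseUpdate only runs the compile step on a cache miss.
    cache.getOrElseUpdate(source, toolbox.compile(toolbox.parse(source))())
}
```

With a cache like this, the first query that needs `"(x: Int) => x + 1"` pays the compilation cost on each executor; later queries using the same generated source hit the cache, which is why caching helps repeated queries but not ones whose generated code differs.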