Hi,

I am speaking at Spark Summit Europe on exploiting GPUs for columnar
DataFrame operations. I have been going through the various blogs, talks,
and JIRAs from all of you, trying to figure out where to make changes for
this proposal.

First of all, I must thank you for the recent progress in Project Tungsten,
which has made my job easier. The code generation changes make it possible
for me to generate OpenCL code for expressions, instead of the existing
Java/Scala code, and run that OpenCL code on GPUs through JavaCL, a Java
library for OpenCL.
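To make that concrete, here is the flavor of translation I have in mind, as
a minimal sketch. The Expr tree and the genCL/genKernel functions are
hypothetical stand-ins of my own, not Catalyst classes:

    // Hypothetical stand-in for a Catalyst expression tree (not Spark's API).
    sealed trait Expr
    case class Col(name: String)     extends Expr  // column reference
    case class Lit(value: Float)     extends Expr  // literal
    case class Add(l: Expr, r: Expr) extends Expr  // l + r
    case class Mul(l: Expr, r: Expr) extends Expr  // l * r

    // Emit an OpenCL C expression for one element, as a plain string.
    def genCL(e: Expr): String = e match {
      case Col(n)    => s"$n[i]"
      case Lit(v)    => s"${v}f"
      case Add(l, r) => s"(${genCL(l)} + ${genCL(r)})"
      case Mul(l, r) => s"(${genCL(l)} * ${genCL(r)})"
    }

    // Wrap the expression in a kernel that evaluates it over a whole batch.
    def genKernel(e: Expr, cols: Seq[String]): String = {
      val params = cols.map(c => s"__global const float* $c").mkString(", ")
      s"""__kernel void eval($params, __global float* out, int n) {
         |  int i = get_global_id(0);
         |  if (i < n) out[i] = ${genCL(e)};
         |}""".stripMargin
    }

For example, genKernel(Add(Col("a"), Mul(Col("b"), Lit(2.0f))), Seq("a", "b"))
yields a kernel computing out[i] = (a[i] + (b[i] * 2.0f)).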
However, before starting the work, I have a few questions/doubts:

* I found where code generation happens in the Spark code from these blogs:
  https://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html
  https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
  https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
  However, I could not find where the generated code is executed. A major
  part of my changes will be there, since that executor will now have to
  send column vectors to GPU RAM, invoke execution, and bring the results
  back to CPU RAM, so the existing executor will change significantly.
  (Sketch 1 after this list shows the host-side round trip I have in mind.)
* In the third section ("Code Generation") of the Project Tungsten blog
  post, you mention a plan to move code generation from record-at-a-time
  expression evaluation to vectorized expression evaluation. Has this been
  implemented? If not, how would I implement it? I will need access to the
  columnar ByteBuffer objects backing a DataFrame; row-by-row access to the
  data would defeat the purpose of this exercise. In particular, I need
  access to
  https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala
  from the executor of the generated code. (Sketch 2 below shows the kind
  of columnar access I mean.)
* One thing that confuses me is the change from 1.4 to 1.5, apparently due
  to JIRA https://issues.apache.org/jira/browse/SPARK-7956 and pull request
  https://github.com/apache/spark/pull/6479/files, which moved code
  generation from quasiquotes (the q interpolator) to the string s
  interpolator. This actually makes things simpler for me, since the OpenCL
  code I generate is string based (sketch 3 below contrasts the two). The
  question is: is this branch stable now? Should I make my changes on Spark
  1.4, Spark 1.5, or the master branch?
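Sketch 1: the executor-side round trip, as I currently picture it, written
against JavaCL. This is a minimal sketch based on the JavaCL getting-started
example as I remember it, so the exact method names may need adjusting:

    import com.nativelibs4java.opencl._
    import com.nativelibs4java.opencl.CLMem.Usage
    import org.bridj.Pointer

    val n = 1 << 20
    val src =                       // in reality: generated by Catalyst
      """__kernel void eval(__global const float* a, __global float* out, int n) {
        |  int i = get_global_id(0);
        |  if (i < n) out[i] = a[i] + 1.0f;
        |}""".stripMargin

    val context = JavaCL.createBestContext()
    val queue   = context.createDefaultQueue()

    // 1. Send the column vector from CPU RAM to GPU RAM.
    val hostIn = Pointer.allocateFloats(n)
    var i = 0
    while (i < n) { hostIn.set(i.toLong, i.toFloat); i += 1 }
    val devIn  = context.createBuffer(Usage.Input, hostIn)
    val devOut = context.createFloatBuffer(Usage.Output, n)

    // 2. Compile the generated kernel and invoke it over the whole batch.
    val kernel = context.createProgram(src).createKernel("eval")
    kernel.setArgs(devIn, devOut, Integer.valueOf(n))
    val done = kernel.enqueueNDRange(queue, Array(n))

    // 3. Bring the results back to CPU RAM (waits for the kernel).
    val hostOut = devOut.read(queue, done)
    println(hostOut.get(0L))        // 1.0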
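Sketch 2: what I mean by vectorized, columnar access. Given the ByteBuffer
holding one float column of a cached batch, I want one bulk copy into a
contiguous primitive vector that can be shipped to the GPU, rather than one
call per row. This is a pure JDK sketch, nothing Spark-specific:

    import java.nio.{ByteBuffer, ByteOrder}

    // Stand-in for the buffer behind one FLOAT column of a batch; in Spark
    // it would come from the in-memory columnar cache (ColumnType.scala).
    val batchSize = 10000
    val column = ByteBuffer.allocate(batchSize * 4).order(ByteOrder.nativeOrder())
    for (i <- 0 until batchSize) column.putFloat(i.toFloat)
    column.rewind()

    // Vectorized: one bulk copy, ready to hand to the GPU as one vector.
    val vec = new Array[Float](batchSize)
    column.asFloatBuffer().get(vec)

    // Row-at-a-time (what I want to avoid): one call per value.
    // for (i <- 0 until batchSize) consume(column.getFloat(i * 4))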
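Sketch 3: my reading of the quasiquotes-to-strings change, in miniature.
Quasiquotes build Scala ASTs, which can only ever describe Scala/Java code,
while the s interpolator builds plain text, which can just as well carry
OpenCL C. These are toy fragments of my own, not Spark's actual codegen:

    // Before (quasiquotes, needs scala-reflect): the result is a Scala AST.
    import scala.reflect.runtime.universe._
    val l, r = q"input.getFloat(0)"
    val scalaTree: Tree = q"$l + $r"

    // After (string interpolation): the result is just a string, so the
    // same mechanism can emit OpenCL C instead of Java.
    val lc = "a[i]"
    val rc = "b[i]"
    val openclCode: String = s"($lc + $rc)"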
Also, how do I tune the batch size (the number of rows in the ByteBuffer)?
Is it through the property spark.sql.inMemoryColumnarStorage.batchSize?
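If so, I assume setting it would look like this (assuming the SQLContext
setConf API is the right way; please correct me if there is a better knob):

    // Rows per in-memory columnar batch; the default is 10000, I believe.
    sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "20000")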
Thanks in anticipation,
Kiran

PS: Other things I found useful were:

Spark DataFrames: https://www.brighttalk.com/webcast/12891/166495
Apache Spark 1.5: https://www.brighttalk.com/webcast/12891/168177

The links to JavaCL/ScalaCL:
Library to execute OpenCL code through Java:
https://github.com/nativelibs4java/JavaCL
Library to convert Scala code to OpenCL and execute on GPUs:
https://github.com/nativelibs4java/ScalaCL


