Re: Using CUDA within Spark / boosting linear algebra

2016-02-04 Thread Max Grossman
Allen,

Currently it only supports OpenCL, because the code generator we’ve extended 
targeted OpenCL. There’s no technical reason CUDA couldn’t be supported if 
people were interested in it, but it would require rewriting part of the code 
generator, as well as some ifdefs in the runtime so that we could compile with 
either OpenCL or CUDA support. A few components actually already support both 
OpenCL and CUDA, because they’ve been reused in other projects that did use 
CUDA; just not all of them.
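To make the dual-backend idea concrete, here is a purely illustrative Scala 
sketch (none of these names come from SWAT's code) of one way a runtime could 
abstract over OpenCL and CUDA behind a common interface, as an alternative to 
compile-time ifdefs:

```scala
// Illustrative only: a backend abstraction selected at startup.
// The real SWAT runtime is C/C++ and chooses its backend at compile time.
sealed trait Backend {
  def name: String
  // Stand-in for "compile this generated kernel source for the device".
  def compile(kernelSource: String): String = s"[$name] compiled: $kernelSource"
}
case object OpenCLBackend extends Backend { val name = "OpenCL" }
case object CudaBackend   extends Backend { val name = "CUDA" }

def selectBackend(preferCuda: Boolean): Backend =
  if (preferCuda) CudaBackend else OpenCLBackend

val backend = selectBackend(preferCuda = false)
println(backend.compile("__kernel void k() {}"))  // [OpenCL] compiled: ...
```

The trade-off is the usual one: runtime selection keeps one binary but carries 
both backends; ifdefs keep the runtime lean at the cost of separate builds.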

Thanks,

Max

> On Feb 4, 2016, at 9:42 AM, Allen Zhang <allenzhang...@126.com> wrote:
> 
> Hi Max,
> 
> I will look at it tomorrow, but a quick question: does it support Nvidia's 
> CUDA, not only OpenCL?
> 
> Thanks,
> Allen
> 
> At 2016-02-04 23:13:05, "Max Grossman" <j...@rice.edu> wrote:
> Hi all,
> 
> I’m jumping on this thread to point out another Spark+GPU project for people 
> to take a look at: https://github.com/agrippa/spark-swat 
> 
> SWAT (Spark with Accelerated Tasks) is a third-party JAR sitting on top of 
> Spark that uses runtime code generation to convert user-written 
> transformations into OpenCL kernels. SWAT’s lightweight runtime supports 
> multi-GPU systems, managing each device and its memory automatically. You 
> write your own Spark programs, and the runtime takes care of offloading your 
> transformations to the GPUs in your system:
> 
> val rdd = CLWrapper.cl(sc.objectFile(inputPath))
> val next = rdd.map(i => 2 * i).collect
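Since running the snippet above needs a Spark cluster plus the SWAT jar, here 
is a minimal, purely illustrative mock of the wrapper pattern it relies on 
(`MockCLWrapper` and `MockCLRdd` are invented names, not SWAT's API): a wrapped 
collection whose `map` records that an offload would happen while actually 
computing on the CPU.

```scala
// Illustrative mock, not SWAT: shows the shape of the wrapper API only.
class MockCLRdd[T](data: Seq[T]) {
  var offloaded = false
  def map[U](f: T => U): MockCLRdd[U] = {
    offloaded = true            // real SWAT: code-generate f into an OpenCL kernel
    new MockCLRdd(data.map(f))  // here: plain CPU execution
  }
  def collect: Seq[T] = data
}
object MockCLWrapper {
  def cl[T](data: Seq[T]): MockCLRdd[T] = new MockCLRdd(data)
}

val rdd  = MockCLWrapper.cl(Seq(1, 2, 3))
val next = rdd.map(i => 2 * i).collect
println(next)  // List(2, 4, 6)
```

In actual SWAT the wrapped object is an RDD, and the function passed to `map` 
is translated to an OpenCL kernel at runtime rather than run on the host.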
> 
> SWAT primarily distinguishes itself in programmability: an explicit goal of 
> this project is to have as few user-visible API changes as possible from what 
> people have come to know and love in Spark. There are a number of 
> fixed-function GPU libraries out there now, so we wanted to look instead at 
> something that could be used to build new but still well-performing Spark 
> apps.
> 
> SWAT is currently more of a research project than a production-ready system, 
> so there’s a chance it won’t work out-of-the-box on some systems. With that 
> said, it does have fairly comprehensive functional and code generation 
> testing. If you’re interested in trying it out and having trouble setting up, 
> feel free to contact me directly. And of course, any questions or feedback 
> from the community are always welcome.
> 
> Thanks,
> 
> Max
> 
>> On Jan 22, 2016, at 3:42 AM, Kazuaki Ishizaki <ishiz...@jp.ibm.com> wrote:
>> 
>> Hi Alexander,
>> The goal of our columnar storage is to effectively drive GPUs in Spark. One 
>> of the important items is to effectively and easily enable highly-tuned GPU 
>> libraries such as BIDMach.
>> 
>> We will enable BIDMach with our columnar storage. On the other hand, scaling 
>> BIDMach with the current Spark is not an easy task. I expect that this talk 
>> would help us.
>> http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/detail/47565
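As a sketch of why a columnar layout matters for GPUs (illustrative names only, 
not the actual storage format under discussion): packing row-oriented records 
into per-field contiguous arrays turns many small host-to-device copies into a 
few large ones, which is what GPU transfer bandwidth rewards.

```scala
// Illustrative row-to-column packing. A GPU runtime could copy xs and ys
// to the device as two contiguous transfers instead of one per record.
case class Point(x: Double, y: Double)

def toColumns(rows: Seq[Point]): (Array[Double], Array[Double]) =
  (rows.map(_.x).toArray, rows.map(_.y).toArray)

val rows = Seq(Point(1.0, 2.0), Point(3.0, 4.0), Point(5.0, 6.0))
val (xs, ys) = toColumns(rows)
println(xs.mkString(","))  // 1.0,3.0,5.0
println(ys.mkString(","))  // 2.0,4.0,6.0
```

Columnar layout also gives coalesced memory access on the device, since 
adjacent GPU threads touch adjacent elements of the same field.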
>> 
>> We appreciate your great feedback.
>> 
>> Best Regards,
>> Kazuaki Ishizaki, Ph.D., Senior research staff member, IBM Research - Tokyo
>> 
>> 
>> 
>> From: "Ulanov, Alexander" <alexander.ula...@hpe.com>
>> To: Kazuaki Ishizaki/Japan/IBM@IBMJP, "dev@spark.apache.org" 
>> <dev@spark.apache.org>, Joseph Bradley <jos...@databricks.com>
>> Cc: John Canny <ca...@berkeley.edu>, "Evan R. Sparks" 
>> <evan.spa...@gmail.com>, Xiangrui Meng <men...@gmail.com>, Sam Halliday 
>> <sam.halli...@gmail.com>
>> Date: 2016/01/22 04:20
>> Subject: RE: Using CUDA within Spark / boosting linear algebra
>> 
>> 
>> 
>> Hi Kazuaki,
>>  
>> Indeed, moving data to/from the GPU is costly, and this benchmark summarizes 
>> those costs for different data sizes with regard to matrix multiplication. 
>> These costs are paid for the convenience of using the standard BLAS API that 
>> Nvidia's NVBLAS provides. The thing is that no code changes are required (in 
>> Spark); one just needs to reference the BLAS 
>> implementation with the system variable. Naturally, a hardware-specific 
>> implementation will always be faster than the default one. The benchmark 
>> results show that fact by comparing jCuda (by means of BIDMat) and NVBLAS. 
>> However, they also show that it is worth using NVBLAS for large matrices, 
>> because it can take advantage of several GPUs and will be faster despite the 
>> copying overhead. That is also a known result advertised by Nvidia.
>>  
>> By the way, I don’t think that the column/row friendly format is an issue, 
>> because one can use transposed matrices to fit the required format. I believe 
>> that is just a software preference.
>>  
>> My suggestion with regard to your prototype would be to compare it with 
>> Spark’s implementation of logistic regression (which does not take advantage 
>> of GPUs) and also with BIDMach’s (which does). That would give users a better 
>> understanding of your implementation’s performance. Currently you compare it 
>> with Spark’s example logistic regression implementation, which is meant as a 
>> reference for learning Spark rather than as a performance benchmark.
>>  
>> Best regards, Alexander
>>  
>> From: Kazuaki Ishizaki [mailto:ishiz...@jp.ibm.com] 
>> Sent: Thursday, January 21, 2016 3:34 AM
>> To: dev@spark.apache.org; Ulanov, Alexander; Joseph Bradley
>> Cc: John Canny; Evan R. Sparks; Xiangrui Meng; Sam Halliday
>> Subject: RE: Using CUDA within Spark / boosting linear algebra
>>  
>> Dear all,
>> 
>>  Hi Alexander,
>> 
>>  Using GPUs with Spark would be very exciting.  Small comment:
>>  Concerning your question earlier about keeping data stored on the
>>  GPU rather than having to move it between main memory and GPU
>>  memory on each iteration, I would guess this would be critical to
>>  getting good
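The trade-off discussed in this thread (copy overhead versus GPU speedup, and 
amortizing transfers by keeping data resident across iterations) can be 
sketched with a back-of-envelope cost model. All constants below are 
illustrative assumptions, not measurements from any of the systems mentioned.

```scala
// Back-of-envelope model: offloading an n x n matrix multiply pays off once
// compute savings outweigh host<->device transfer cost, and keeping data
// resident amortizes that transfer across iterations. Constants are invented.
val cpuFlops     = 1e11   // assumed CPU throughput, FLOP/s
val gpuFlops     = 1e12   // assumed GPU throughput, FLOP/s
val pcieBytesSec = 1e9    // assumed effective host<->device bandwidth, bytes/s

def cpuTime(n: Long): Double = 2.0 * n * n * n / cpuFlops

def gpuTime(n: Long, iterations: Int): Double = {
  val transfer = 3.0 * n * n * 8 / pcieBytesSec   // copy A, B, C once (doubles)
  val compute  = iterations * 2.0 * n * n * n / gpuFlops
  transfer + compute                              // data stays GPU-resident
}

// Small matrices lose to the copy; large ones win despite it.
println(gpuTime(256, 1)  < cpuTime(256))    // false: transfer dominates
println(gpuTime(4096, 1) < cpuTime(4096))   // true: compute dominates
// Keeping data resident across 10 iterations cuts the per-iteration cost.
println(gpuTime(1024, 10) / 10 < gpuTime(1024, 1))  // true
```

Under this model the crossover point shifts toward smaller matrices as either 
bandwidth or iteration count grows, which matches both Alexander's observation 
about NVBLAS on large matrices and the point about GPU-resident data.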