Re: SYSTEMML-447

2018-05-18 Thread Janardhan
Thanks for explaining this. I will proceed accordingly.

- Janardhan


Re: SYSTEMML-447

2018-05-10 Thread Matthias Boehm
This particular JIRA is only partially related. Niketan and Nakul
worked out the details - the only reason I show up as the reporter is
that, if I remember correctly, we split a larger-scoped JIRA for
low-level optimizations (GPU, codegen, compression) into individual
JIRAs and created the detailed tasks.

Overall, I believe that sparse GPU operations would be very valuable,
especially in the context of NLP, graphs, and structured data with
categorical features (which often become very sparse after dummy
coding), because in these ultra-sparse scenarios dense operations cause
unnecessary overheads of orders of magnitude (proportional to the
sparsity). However, creating efficient sparse GPU kernels is
challenging due to irregularities (e.g., sparsity skew). Compared to
CPU operations, there might still be a benefit depending on the data
location of inputs/outputs, as well as the GPU's higher memory bandwidth.
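
To make the overhead argument concrete, here is a minimal CUDA sketch of a
cell-wise square (X^2); the kernel names and signatures are illustrative,
not SystemML's actual GPU kernels. The dense variant touches all m*n cells,
whereas the CSR variant touches only the nnz stored values, so at a sparsity
of, say, 0.0001 the dense kernel does roughly 10,000x more work.

  // Dense: one thread per cell; touches all m*n entries, most of them zeros.
  __global__ void square_dense(const double* A, double* C, int mn) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < mn)
      C[i] = A[i] * A[i];
  }

  // Sparse (CSR): one thread per stored non-zero; row pointers and column
  // indices can be shared with the output because squaring does not change
  // the sparsity pattern.
  __global__ void square_sparse_csr(const double* Aval, double* Cval, int nnz) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nnz)
      Cval[i] = Aval[i] * Aval[i];
  }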

Even with the codegen framework extended to GPUs (which is still on the
roadmap for this year), we would still need dense/sparse kernels for the
individual operations because we want to apply codegen only where we can
benefit from fusion. Right now we call existing libraries such as cuBLAS
and cuDNN, and we have dense kernels for a subset of operations such as
unary and binary operations, and unary aggregates.
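
For reference, the dense kernels in this category are essentially of the
following shape; this is a hedged sketch with hypothetical names, not the
actual SystemML code.

  // Cell-wise binary operation (here: addition), one thread per cell.
  __global__ void add_dense(const double* A, const double* B, double* C, int mn) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < mn)
      C[i] = A[i] + B[i];
  }

  // Unary aggregate (here: full sum) via a grid-stride loop; assumes
  // compute capability >= 6.0 for double-precision atomicAdd, and that
  // *out is zero-initialized before launch.
  __global__ void sum_dense(const double* A, double* out, int mn) {
    double local = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < mn;
         i += gridDim.x * blockDim.x)
      local += A[i];
    atomicAdd(out, local);
  }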

Regarding ramping up on the GPU backend, maybe it's a good idea to
first start with missing dense operations. I'm thinking of statistical
functions (e.g., covariance, moment), parameterized builtin functions
(e.g., grouped aggregates), missing unary and binary operations (e.g.,
bitwise), missing reorg operations (e.g., reshape, sort - there should
be a library for the latter), missing unary, binary and ternary
aggregates, missing nary operations (e.g., nary cbind/rbind), etc. Adding
these remaining operations would also help a lot. However, if you're more
interested in contributing to the development of sparse kernels, maybe
you could implement one or two dense operations, get comfortable, and then
move on to sparse operations. Apart from the kernels, seamless support
for sparse operations would also require some integration work on how
we pass data, maintain nnz, preallocate sparse outputs, etc.
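
To give a rough idea of that integration work, here is a sketch of a
sparsity-preserving scalar multiply over a CSR input, where the output is
preallocated with the input's nnz and the index structures are simply
reused; the kernel and helper names are hypothetical, not the actual GPU
memory manager API, and pattern-changing operations would additionally
need an nnz estimate or a recompaction step.

  // Kernel: multiply the stored non-zero values by a (non-zero) scalar;
  // the sparsity pattern, and hence nnz, of the output equals the input's.
  __global__ void scalar_mult_csr_vals(const double* Aval, double* Cval,
                                       double alpha, int nnz) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nnz)
      Cval[i] = alpha * Aval[i];
  }

  // Host side: preallocate the sparse output with the same nnz, reuse the
  // row pointers and column indices, and launch only over the value array.
  void scalarMultSparse(const int* ArowPtr, const int* AcolInd, const double* Aval,
                        int rows, int nnz, double alpha,
                        int** CrowPtr, int** CcolInd, double** Cval) {
    cudaMalloc((void**)CrowPtr, (rows + 1) * sizeof(int));
    cudaMalloc((void**)CcolInd, nnz * sizeof(int));
    cudaMalloc((void**)Cval,    nnz * sizeof(double));
    cudaMemcpy(*CrowPtr, ArowPtr, (rows + 1) * sizeof(int), cudaMemcpyDeviceToDevice);
    cudaMemcpy(*CcolInd, AcolInd, nnz * sizeof(int), cudaMemcpyDeviceToDevice);
    int threads = 256, blocks = (nnz + threads - 1) / threads;
    scalar_mult_csr_vals<<<blocks, threads>>>(Aval, *Cval, alpha, nnz);
  }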

Regards,
Matthias


On Thu, May 10, 2018 at 8:47 PM, Janardhan wrote:
> Hi Matthias,
>
> Was this related to the long-term plan for GPU codegen?
>
> Thank you,
> Janardhan


SYSTEMML-447

2018-05-10 Thread Janardhan
Hi Matthias,

Was this related to the long-term plan for GPU codegen?

Thank you,
Janardhan