Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
On Thu, Sep 25, 2014 at 8:50 AM, Dmitriy Lyubimov  wrote:

> As for a pure Scala backend, it already exists and is called the Breeze
> project (something MLlib uses internally), supported by David Hall (among
> others). It also includes a lot more common non-distributed math than just
> algebra. By my estimate, it is one of the most well-rounded and
> comprehensive math libraries in existence today ...
>

for JVM.


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
To be absolutely frank, if I could easily divorce from Colt, I
would have divorced the entire Scala code from Mahout. Unfortunately,
that is currently not very realistic for me. More hopefully, we could
patch Colt for its major problems and add new backends there.

As for a pure Scala backend, it already exists and is called the Breeze
project (something MLlib uses internally), supported by David Hall (among
others). It also includes a lot more common non-distributed math than just
algebra. By my estimate, it is one of the most well-rounded and comprehensive
math libraries in existence today. It has, however, had significant
difficulties dealing with sparse/dense operation optimizations in the past,
as well as with modelling; I am not sure where it stands as of this very
moment. Colt at some point was marginally better in typing sparse in-memory
idioms.

On Thu, Sep 25, 2014 at 5:32 AM, Saikat Kanjilal 
wrote:

> From a big picture perspective do we intend to keep colt around or write
> scala implementations for functions like the aggregate, if so then I can
> add scala code to do the aggregation and call it from the DSL for the norm.
>
> Sent from my iPhone
>
> > On Sep 25, 2014, at 12:25 AM, Ted Dunning  wrote:
> >
> > On Wed, Sep 24, 2014 at 11:09 PM, Dmitriy Lyubimov 
> > wrote:
> >
> >> Aggregate is Colt's thing. Colt (aka Mahout-math) establish java-side
> >> concept of different function types which are unfortunately not
> compatible
> >> with Scala literals.
> >
> > Dmitriy,
> >
> > Is this because we have other methods that describe the characteristics
> of
> > the function?
> >
> > What would be the Scala friendly idiom?  Additional traits?
>


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
Scala function literals (or any function literal) derive from a particular
set of traits. It may be that Java classes are able to implement these traits
(nobody that I know of has attempted it), and then maybe they would be
accepted as Scala function types. But I think even that is a big if, since
the Scala compiler tinkers with bytecode a lot, and compatibility at the
bytecode level is not guaranteed between Scala major releases. Bottom line:
even if it is possible to write Scala functions in Java, it is definitely not
a publicly documented feature.

On the other hand, it is possible to use "function-like" Colt classes such
as DoubleDoubleFunction just like plain old reference-type objects from
either Scala or Java, which is exactly what happens in the example given
in the original question.
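
To make the distinction concrete, here is a minimal sketch in plain Scala. It
does not use Mahout: BinaryDoubleFunction is a hypothetical local stand-in for
a Java-style function class such as DoubleDoubleFunction, and asBinary and
fold are illustrative names, not part of any library. It shows one way a Scala
function literal could be wrapped so that an API expecting the function object
accepts it.

```scala
// Hypothetical stand-in for a Colt/Mahout-math style function class.
// Declared locally so the sketch compiles without Mahout on the classpath.
abstract class BinaryDoubleFunction {
  def apply(a: Double, b: Double): Double
}

object FunctionAdapters {
  // Wrap a Scala function literal in the Java-style function object.
  def asBinary(f: (Double, Double) => Double): BinaryDoubleFunction =
    new BinaryDoubleFunction {
      def apply(a: Double, b: Double): Double = f(a, b)
    }

  // A method that, like a Java-side API, only accepts the function object.
  def fold(xs: Seq[Double], zero: Double, op: BinaryDoubleFunction): Double =
    xs.foldLeft(zero)((acc, x) => op.apply(acc, x))
}
```

With this in place, FunctionAdapters.fold(Seq(1.0, 2.0, 3.0), 0.0,
FunctionAdapters.asBinary(_ + _)) passes a Scala literal through the adapter;
the function object itself is used like any other reference-type value.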

On Thu, Sep 25, 2014 at 12:24 AM, Ted Dunning  wrote:

> On Wed, Sep 24, 2014 at 11:09 PM, Dmitriy Lyubimov 
> wrote:
>
> > Aggregate is Colt's thing. Colt (aka Mahout-math) establish java-side
> > concept of different function types which are unfortunately not
> compatible
> > with Scala literals.
> >
>
> Dmitriy,
>
> Is this because we have other methods that describe the characteristics of
> the function?
>
> What would be the Scala friendly idiom?  Additional traits?
>


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Saikat Kanjilal
From a big-picture perspective, do we intend to keep Colt around, or to write
Scala implementations for functions like aggregate? If the latter, I can add
Scala code to do the aggregation and call it from the DSL for the norm.

> On Sep 25, 2014, at 12:25 AM, Ted Dunning  wrote:
> 
> On Wed, Sep 24, 2014 at 11:09 PM, Dmitriy Lyubimov 
> wrote:
> 
>> Aggregate is Colt's thing. Colt (aka Mahout-math) establish java-side
>> concept of different function types which are unfortunately not compatible
>> with Scala literals.
> 
> Dmitriy,
> 
> Is this because we have other methods that describe the characteristics of
> the function?
> 
> What would be the Scala friendly idiom?  Additional traits?


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Ted Dunning
On Wed, Sep 24, 2014 at 11:09 PM, Dmitriy Lyubimov 
wrote:

> Aggregate is Colt's thing. Colt (aka Mahout-math) establish java-side
> concept of different function types which are unfortunately not compatible
> with Scala literals.
>

Dmitriy,

Is this because we have other methods that describe the characteristics of
the function?

What would be the Scala friendly idiom?  Additional traits?


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-24 Thread Dmitriy Lyubimov
On Wed, Sep 24, 2014 at 9:15 PM, Saikat Kanjilal 
wrote:

> Shannon/Dmitry,Quick question, I'm wanting to calculate the scala
> equivalent of the frobenius norm per this API spec in python (
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html),
> I dug into the mahout-math-scala project and found the following API to
> calculate the norm:
>
>     def norm = sqrt(m.aggregate(Functions.PLUS, Functions.SQUARE))
>
> I believe the above is also calculating the frobenius norm, however I am
> curious why we are calling a Java API from scala, the type of m above is a
> java interface called Matrix, I'm guessing the implementation of aggregate
> is happening in the math-math-scala somewhere, is that assumption correct?
>

We are calling Colt (i.e. Java) for pretty much everything. As far as the
Scala bindings are concerned, they are but a DSL wrapper around Colt (unlike
the distributed algebra, which is much more).

Aggregate is Colt's thing. Colt (aka Mahout-math) establishes a Java-side
concept of different function types, which are unfortunately not compatible
with Scala literals.
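
As a rough illustration of what an aggregate-style call computes, here is a
plain-Scala sketch. It is not Mahout's or Colt's implementation: the matrix is
assumed to be a non-empty Array[Array[Double]], and aggregate and
frobeniusNorm are illustrative names. The idea is to map every element, then
combine the mapped values pairwise.

```scala
object FrobeniusSketch {
  // Stand-in for the aggregate idea: map every element, then combine results.
  // Assumes a non-empty matrix; reduce throws on an empty one.
  def aggregate(m: Array[Array[Double]],
                combiner: (Double, Double) => Double,
                mapper: Double => Double): Double =
    m.iterator.flatMap(_.iterator).map(mapper).reduce(combiner)

  // Frobenius norm: square root of the sum of squared entries.
  def frobeniusNorm(m: Array[Array[Double]]): Double =
    math.sqrt(aggregate(m, _ + _, x => x * x))
}
```

Here the PLUS and SQUARE roles are played by the Scala literals _ + _ and
x => x * x, which is only possible because this sketch declares its own
Scala function parameters instead of Java-side function types.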




> Thanks in advance.
> > From: sxk1...@hotmail.com
> > To: dev@mahout.apache.org
> > Subject: RE: Mahout-1539-computation of gaussian kernel between 2 arrays
> of shapes
> > Date: Thu, 18 Sep 2014 12:51:36 -0700
> >
> > Ok great I'll use the cartesian spark API call, so what I'd still like
> some thoughts on where the code that calls the cartesian should live in our
> directory structure.
> > > Date: Thu, 18 Sep 2014 15:33:59 -0400
> > > From: squ...@gatech.edu
> > > To: dev@mahout.apache.org
> > > Subject: Re: Mahout-1539-computation of gaussian kernel between 2
> arrays of shapes
> > >
> > > Saikat,
> > >
> > > Spark has the cartesian() method that will align all pairs of points;
> > > that's the nontrivial part of determining an RBF kernel. After that
> it's
> > > a simple matter of performing the equation that's given on the
> > > scikit-learn doc page.
> > >
> > > However, like you said it'll also have to be implemented using the
> > > Mahout DSL. I can envision that users would like to compute pairwise
> > > metrics for a lot more than just RBF kernels (pairwise Euclidean
> > > distance, etc), so my guess would be a DSL implementation of
> cartesian()
> > > is what you're looking for. You can build the other methods on top of
> that.
> > >
> > > Correct me if I'm wrong.
> > >
> > > Shannon
> > >
> > > On 9/18/14, 3:28 PM, Saikat Kanjilal wrote:
> > > >
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
> > > > I need to implement the above in the scala world and expose a DSL
> API to call the computation when computing the affinity matrix.
> > > >
> > > >> From: ted.dunn...@gmail.com
> > > >> Date: Thu, 18 Sep 2014 10:04:34 -0700
> > > >> Subject: Re: Mahout-1539-computation of gaussian kernel between 2
> arrays of shapes
> > > >> To: dev@mahout.apache.org
> > > >>
> > > >> There are number of non-traditional linear algebra operations like
> this
> > > >> that are important to implement.
> > > >>
> > > >> Can you describe what you intend to do so that we can discuss the
> shape of
> > > >> the API and computation?
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal <
> sxk1...@hotmail.com>
> > > >> wrote:
> > > >>
> > > >>> Dmitry et al,As part of the above JIRA I need to calculate the
> gaussian
> > > >>> kernel between 2 shapes, I looked through mahout-math-scala and
> didnt see
> > > >>> anything to do this, any objections to me adding some code under
> > > >>> scalabindings to do this?
> > > >>> Thanks in advance.
> > > >
> > >
> >
>
>


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-24 Thread Ted Dunning
Yes.  That code is computing the Frobenius norm.

I can't answer the context question about Scala calling Java, however.

On Wed, Sep 24, 2014 at 9:15 PM, Saikat Kanjilal 
wrote:

> Shannon/Dmitry,Quick question, I'm wanting to calculate the scala
> equivalent of the frobenius norm per this API spec in python (
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html),
> I dug into the mahout-math-scala project and found the following API to
> calculate the norm:
>
>     def norm = sqrt(m.aggregate(Functions.PLUS, Functions.SQUARE))
>
> I believe the above is also calculating the frobenius norm, however I am
> curious why we are calling a Java API from scala, the type of m above is a
> java interface called Matrix, I'm guessing the implementation of aggregate
> is happening in the math-math-scala somewhere, is that assumption correct?
> Thanks in advance.
> > From: sxk1...@hotmail.com
> > To: dev@mahout.apache.org
> > Subject: RE: Mahout-1539-computation of gaussian kernel between 2 arrays
> of shapes
> > Date: Thu, 18 Sep 2014 12:51:36 -0700
> >
> > Ok great I'll use the cartesian spark API call, so what I'd still like
> some thoughts on where the code that calls the cartesian should live in our
> directory structure.
> > > Date: Thu, 18 Sep 2014 15:33:59 -0400
> > > From: squ...@gatech.edu
> > > To: dev@mahout.apache.org
> > > Subject: Re: Mahout-1539-computation of gaussian kernel between 2
> arrays of shapes
> > >
> > > Saikat,
> > >
> > > Spark has the cartesian() method that will align all pairs of points;
> > > that's the nontrivial part of determining an RBF kernel. After that
> it's
> > > a simple matter of performing the equation that's given on the
> > > scikit-learn doc page.
> > >
> > > However, like you said it'll also have to be implemented using the
> > > Mahout DSL. I can envision that users would like to compute pairwise
> > > metrics for a lot more than just RBF kernels (pairwise Euclidean
> > > distance, etc), so my guess would be a DSL implementation of
> cartesian()
> > > is what you're looking for. You can build the other methods on top of
> that.
> > >
> > > Correct me if I'm wrong.
> > >
> > > Shannon
> > >
> > > On 9/18/14, 3:28 PM, Saikat Kanjilal wrote:
> > > >
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
> > > > I need to implement the above in the scala world and expose a DSL
> API to call the computation when computing the affinity matrix.
> > > >
> > > >> From: ted.dunn...@gmail.com
> > > >> Date: Thu, 18 Sep 2014 10:04:34 -0700
> > > >> Subject: Re: Mahout-1539-computation of gaussian kernel between 2
> arrays of shapes
> > > >> To: dev@mahout.apache.org
> > > >>
> > > >> There are number of non-traditional linear algebra operations like
> this
> > > >> that are important to implement.
> > > >>
> > > >> Can you describe what you intend to do so that we can discuss the
> shape of
> > > >> the API and computation?
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal <
> sxk1...@hotmail.com>
> > > >> wrote:
> > > >>
> > > >>> Dmitry et al,As part of the above JIRA I need to calculate the
> gaussian
> > > >>> kernel between 2 shapes, I looked through mahout-math-scala and
> didnt see
> > > >>> anything to do this, any objections to me adding some code under
> > > >>> scalabindings to do this?
> > > >>> Thanks in advance.
> > > >
> > >
> >
>


RE: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-24 Thread Saikat Kanjilal
Shannon/Dmitriy,

Quick question: I want to calculate the Scala equivalent of the Frobenius
norm, per this API spec in Python
(http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html).
I dug into the mahout-math-scala project and found the following API to
calculate the norm:

    def norm = sqrt(m.aggregate(Functions.PLUS, Functions.SQUARE))

I believe the above also calculates the Frobenius norm. However, I am curious
why we are calling a Java API from Scala: the type of m above is a Java
interface called Matrix. I'm guessing the implementation of aggregate lives
somewhere in mahout-math-scala; is that assumption correct?

Thanks in advance.
> From: sxk1...@hotmail.com
> To: dev@mahout.apache.org
> Subject: RE: Mahout-1539-computation of gaussian kernel between 2 arrays of 
> shapes
> Date: Thu, 18 Sep 2014 12:51:36 -0700
> 
> Ok great I'll use the cartesian spark API call, so what I'd still like some 
> thoughts on where the code that calls the cartesian should live in our 
> directory structure.
> > Date: Thu, 18 Sep 2014 15:33:59 -0400
> > From: squ...@gatech.edu
> > To: dev@mahout.apache.org
> > Subject: Re: Mahout-1539-computation of gaussian kernel between 2 arrays of 
> > shapes
> > 
> > Saikat,
> > 
> > Spark has the cartesian() method that will align all pairs of points; 
> > that's the nontrivial part of determining an RBF kernel. After that it's 
> > a simple matter of performing the equation that's given on the 
> > scikit-learn doc page.
> > 
> > However, like you said it'll also have to be implemented using the 
> > Mahout DSL. I can envision that users would like to compute pairwise 
> > metrics for a lot more than just RBF kernels (pairwise Euclidean 
> > distance, etc), so my guess would be a DSL implementation of cartesian() 
> > is what you're looking for. You can build the other methods on top of that.
> > 
> > Correct me if I'm wrong.
> > 
> > Shannon
> > 
> > On 9/18/14, 3:28 PM, Saikat Kanjilal wrote:
> > > http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
> > > I need to implement the above in the scala world and expose a DSL API to 
> > > call the computation when computing the affinity matrix.
> > >
> > >> From: ted.dunn...@gmail.com
> > >> Date: Thu, 18 Sep 2014 10:04:34 -0700
> > >> Subject: Re: Mahout-1539-computation of gaussian kernel between 2 arrays 
> > >> of shapes
> > >> To: dev@mahout.apache.org
> > >>
> > >> There are number of non-traditional linear algebra operations like this
> > >> that are important to implement.
> > >>
> > >> Can you describe what you intend to do so that we can discuss the shape 
> > >> of
> > >> the API and computation?
> > >>
> > >>
> > >>
> > >> On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal 
> > >> wrote:
> > >>
> > >>> Dmitry et al,As part of the above JIRA I need to calculate the gaussian
> > >>> kernel between 2 shapes, I looked through mahout-math-scala and didnt 
> > >>> see
> > >>> anything to do this, any objections to me adding some code under
> > >>> scalabindings to do this?
> > >>> Thanks in advance.
> > >   
> > 
> 
  

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Dmitriy Lyubimov
you want a REALLY-REALLY big matrix? as in distributed matrix?

On Thu, Sep 18, 2014 at 12:28 PM, Saikat Kanjilal 
wrote:

>
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
> I need to implement the above in the scala world and expose a DSL API to
> call the computation when computing the affinity matrix.
>
> > From: ted.dunn...@gmail.com
> > Date: Thu, 18 Sep 2014 10:04:34 -0700
> > Subject: Re: Mahout-1539-computation of gaussian kernel between 2 arrays
> of shapes
> > To: dev@mahout.apache.org
> >
> > There are number of non-traditional linear algebra operations like this
> > that are important to implement.
> >
> > Can you describe what you intend to do so that we can discuss the shape
> of
> > the API and computation?
> >
> >
> >
> > On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal 
> > wrote:
> >
> > > Dmitry et al,As part of the above JIRA I need to calculate the gaussian
> > > kernel between 2 shapes, I looked through mahout-math-scala and didnt
> see
> > > anything to do this, any objections to me adding some code under
> > > scalabindings to do this?
> > > Thanks in advance.
>
>


RE: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Saikat Kanjilal
OK, great, I'll use the Spark cartesian API call. I'd still like some
thoughts on where the code that calls cartesian should live in our
directory structure.
> Date: Thu, 18 Sep 2014 15:33:59 -0400
> From: squ...@gatech.edu
> To: dev@mahout.apache.org
> Subject: Re: Mahout-1539-computation of gaussian kernel between 2 arrays of 
> shapes
> 
> Saikat,
> 
> Spark has the cartesian() method that will align all pairs of points; 
> that's the nontrivial part of determining an RBF kernel. After that it's 
> a simple matter of performing the equation that's given on the 
> scikit-learn doc page.
> 
> However, like you said it'll also have to be implemented using the 
> Mahout DSL. I can envision that users would like to compute pairwise 
> metrics for a lot more than just RBF kernels (pairwise Euclidean 
> distance, etc), so my guess would be a DSL implementation of cartesian() 
> is what you're looking for. You can build the other methods on top of that.
> 
> Correct me if I'm wrong.
> 
> Shannon
> 
> On 9/18/14, 3:28 PM, Saikat Kanjilal wrote:
> > http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
> > I need to implement the above in the scala world and expose a DSL API to 
> > call the computation when computing the affinity matrix.
> >
> >> From: ted.dunn...@gmail.com
> >> Date: Thu, 18 Sep 2014 10:04:34 -0700
> >> Subject: Re: Mahout-1539-computation of gaussian kernel between 2 arrays 
> >> of shapes
> >> To: dev@mahout.apache.org
> >>
> >> There are number of non-traditional linear algebra operations like this
> >> that are important to implement.
> >>
> >> Can you describe what you intend to do so that we can discuss the shape of
> >> the API and computation?
> >>
> >>
> >>
> >> On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal 
> >> wrote:
> >>
> >>> Dmitry et al,As part of the above JIRA I need to calculate the gaussian
> >>> kernel between 2 shapes, I looked through mahout-math-scala and didnt see
> >>> anything to do this, any objections to me adding some code under
> >>> scalabindings to do this?
> >>> Thanks in advance.
> > 
> 
  

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Shannon Quinn

Saikat,

Spark has the cartesian() method that will align all pairs of points; 
that's the nontrivial part of determining an RBF kernel. After that it's 
a simple matter of performing the equation that's given on the 
scikit-learn doc page.


However, like you said it'll also have to be implemented using the 
Mahout DSL. I can envision that users would like to compute pairwise 
metrics for a lot more than just RBF kernels (pairwise Euclidean 
distance, etc), so my guess would be a DSL implementation of cartesian() 
is what you're looking for. You can build the other methods on top of that.
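
A small in-memory sketch of the idea, in plain Scala, assuming the kernel
K(x, y) = exp(-gamma * ||x - y||^2) as on the scikit-learn page. It uses
neither Spark's cartesian() nor the Mahout DSL; nested maps over two arrays
stand in for the all-pairs step, and RbfKernelSketch, squaredDistance and
rbfKernel are illustrative names only.

```scala
object RbfKernelSketch {
  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "points must have the same dimension")
    x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
  }

  // In-memory analogue of a cartesian product: every (i, j) pair of rows.
  // K(i)(j) = exp(-gamma * ||xs(i) - ys(j)||^2)
  def rbfKernel(xs: Array[Array[Double]],
                ys: Array[Array[Double]],
                gamma: Double): Array[Array[Double]] =
    xs.map(x => ys.map(y => math.exp(-gamma * squaredDistance(x, y))))
}
```

Swapping squaredDistance for another pairwise function would give other
pairwise metrics from the same all-pairs skeleton; a distributed version
would need a different pairing mechanism than nested maps.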


Correct me if I'm wrong.

Shannon

On 9/18/14, 3:28 PM, Saikat Kanjilal wrote:
> http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
> I need to implement the above in the scala world and expose a DSL API to call
> the computation when computing the affinity matrix.
>
>> From: ted.dunn...@gmail.com
>> Date: Thu, 18 Sep 2014 10:04:34 -0700
>> Subject: Re: Mahout-1539-computation of gaussian kernel between 2 arrays of
>> shapes
>> To: dev@mahout.apache.org
>>
>> There are number of non-traditional linear algebra operations like this
>> that are important to implement.
>>
>> Can you describe what you intend to do so that we can discuss the shape of
>> the API and computation?
>>
>> On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal
>> wrote:
>>
>>> Dmitry et al,As part of the above JIRA I need to calculate the gaussian
>>> kernel between 2 shapes, I looked through mahout-math-scala and didnt see
>>> anything to do this, any objections to me adding some code under
>>> scalabindings to do this?
>>> Thanks in advance.


RE: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Saikat Kanjilal
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
I need to implement the above in the scala world and expose a DSL API to call 
the computation when computing the affinity matrix.

> From: ted.dunn...@gmail.com
> Date: Thu, 18 Sep 2014 10:04:34 -0700
> Subject: Re: Mahout-1539-computation of gaussian kernel between 2 arrays of 
> shapes
> To: dev@mahout.apache.org
> 
> There are number of non-traditional linear algebra operations like this
> that are important to implement.
> 
> Can you describe what you intend to do so that we can discuss the shape of
> the API and computation?
> 
> 
> 
> On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal 
> wrote:
> 
> > Dmitry et al,As part of the above JIRA I need to calculate the gaussian
> > kernel between 2 shapes, I looked through mahout-math-scala and didnt see
> > anything to do this, any objections to me adding some code under
> > scalabindings to do this?
> > Thanks in advance.
  

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Ted Dunning
There are a number of non-traditional linear algebra operations like this
that are important to implement.

Can you describe what you intend to do so that we can discuss the shape of
the API and computation?



On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal 
wrote:

> Dmitry et al,As part of the above JIRA I need to calculate the gaussian
> kernel between 2 shapes, I looked through mahout-math-scala and didnt see
> anything to do this, any objections to me adding some code under
> scalabindings to do this?
> Thanks in advance.


Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-17 Thread Saikat Kanjilal
Dmitriy et al,

As part of the above JIRA I need to calculate the Gaussian kernel between 2
shapes. I looked through mahout-math-scala and didn't see anything to do
this. Any objections to me adding some code under scalabindings to do this?
Thanks in advance.