Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
On Wed, Sep 24, 2014 at 9:15 PM, Saikat Kanjilal sxk1...@hotmail.com
wrote:

 Shannon/Dmitriy, quick question: I want to compute the Scala equivalent of
 the Frobenius norm per this Python API spec (
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html).
 I dug into the mahout-math-scala project and found the following API to
 calculate the norm:

 def norm = sqrt(m.aggregate(Functions.PLUS, Functions.SQUARE))

 I believe the above also computes the Frobenius norm. However, I am
 curious why we are calling a Java API from Scala: the type of m above is a
 Java interface called Matrix. I'm guessing the implementation of aggregate
 lives somewhere in mahout-math-scala; is that assumption correct?


We are calling Colt (i.e. Java) for pretty much everything. As far as the
Scala bindings are concerned, they are but a DSL wrapper around Colt (unlike
the distributed algebra, which is much more).

Aggregate is Colt's thing. Colt (aka Mahout-math) establishes a Java-side
concept of different function types, which are unfortunately not compatible
with Scala literals.





Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Ted Dunning
On Wed, Sep 24, 2014 at 11:09 PM, Dmitriy Lyubimov dlie...@gmail.com
wrote:

 Aggregate is Colt's thing. Colt (aka Mahout-math) establishes a Java-side
 concept of different function types, which are unfortunately not compatible
 with Scala literals.


Dmitriy,

Is this because we have other methods that describe the characteristics of
the function?

What would be the Scala friendly idiom?  Additional traits?


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Saikat Kanjilal
From a big-picture perspective, do we intend to keep Colt around, or to write
Scala implementations of functions like aggregate? If the latter, I can add
Scala code to do the aggregation and call it from the DSL for the norm.



Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
Scala function literals (or any function literal) derive from a particular
set of traits. It may be that Java classes are able to implement these traits
(nobody that I know of has attempted it), and then maybe they would be
supported as Scala function types. But I think even that is a big if, since
the Scala compiler tinkers with bytecode a lot, and compatibility at the
bytecode level is not guaranteed between Scala major releases. Bottom line:
even if it is possible to write Scala functions in Java, it is definitely not
a publicly documented feature.

On the other hand, it is possible to use function-like Colt classes such
as DoubleDoubleFunction just like any plain old reference-type object from
either Scala or Java, which is exactly what happens in the example given
in the original question.
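
The second point can be illustrated with a minimal sketch. BinaryFn below is
a hypothetical stand-in for a Colt-style class such as DoubleDoubleFunction
(it is not Mahout's actual type), and zipWith and fromLiteral are
illustrative helper names, not existing API:

```scala
// Hypothetical stand-in for a Colt-style two-argument function class.
abstract class BinaryFn {
  def apply(a: Double, b: Double): Double
}

object FnObjects {
  // A function object created once and passed around like any other reference.
  val Plus: BinaryFn = new BinaryFn {
    def apply(a: Double, b: Double): Double = a + b
  }

  // Explicit adapter: wraps a Scala literal in the object type the API expects.
  def fromLiteral(f: (Double, Double) => Double): BinaryFn = new BinaryFn {
    def apply(a: Double, b: Double): Double = f(a, b)
  }

  // An API that accepts only the object type, never a bare Scala literal.
  def zipWith(xs: Seq[Double], ys: Seq[Double], f: BinaryFn): Seq[Double] =
    xs.zip(ys).map { case (a, b) => f(a, b) }
}
```

Here FnObjects.zipWith(xs, ys, FnObjects.Plus) passes the function object
directly, while FnObjects.fromLiteral(_ * _) shows the kind of explicit
wrapping a DSL would need in order to accept Scala literals.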


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
To be absolutely frank, if I could easily divorce from Colt, I would have
divorced the entire Scala code from Mahout. Unfortunately, that is currently
not a very realistic option for me. More hopefully, we could patch Colt for
major problems and add new backs there.

As for a pure Scala backend, it already exists: it is the Breeze project
(something MLlib uses internally), supported by David Hall (among others).
It also includes a lot more common non-distributed math than just algebra.
By my estimate, it is one of the most well-rounded and comprehensive math
libraries in existence today. It has, however, had significant difficulties
dealing with sparse/dense operation optimizations in the past, as well as
with modelling; I am not sure where that stands as of this very moment. Colt
at some point was marginally better in typing sparse in-memory idioms.


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
On Thu, Sep 25, 2014 at 8:50 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:

 As for pure scala backend, it already exists and it is called Breeze
 project (something MLlib uses internally), supported by David Hall (among
 others). It also includes a lot more common non-distributed math than just
 algebra. By my estimate, it is one of the most well-rounded and
 comprehensive math libraries in existence today ...


for the JVM.


RE: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-24 Thread Saikat Kanjilal
Shannon/Dmitriy, quick question: I want to compute the Scala equivalent of
the Frobenius norm per this Python API spec
(http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html).
I dug into the mahout-math-scala project and found the following API to
calculate the norm:

def norm = sqrt(m.aggregate(Functions.PLUS, Functions.SQUARE))

I believe the above also computes the Frobenius norm. However, I am curious
why we are calling a Java API from Scala: the type of m above is a Java
interface called Matrix. I'm guessing the implementation of aggregate lives
somewhere in mahout-math-scala; is that assumption correct?
Thanks in advance.
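
To make the reading of that one-liner concrete, here is a minimal sketch in
plain Scala. It assumes that aggregate maps every element and then folds the
mapped values with the combiner; the Dense type and this aggregate are local
stand-ins for illustration, not Mahout's Matrix API:

```scala
object FrobeniusSketch {
  // A dense matrix as rows of doubles; a stand-in for Mahout's Matrix.
  type Dense = Array[Array[Double]]

  // Map every element, then fold the mapped values with the combiner.
  // Like reduce, this throws on a matrix with no elements.
  def aggregate(m: Dense,
                combiner: (Double, Double) => Double,
                mapper: Double => Double): Double =
    m.flatten.map(mapper).reduce(combiner)

  // Frobenius norm: square root of the sum of squared entries.
  def norm(m: Dense): Double =
    math.sqrt(aggregate(m, _ + _, x => x * x))
}
```

With PLUS as the combiner and SQUARE as the mapper, the aggregate is the sum
of squared entries, so its square root matches what numpy.linalg.norm
returns for a matrix by default.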

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-24 Thread Ted Dunning
Yes.  That code is computing the Frobenius norm.

I can't answer the context question about Scala calling Java, however.


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Ted Dunning
There are a number of non-traditional linear algebra operations like this
that are important to implement.

Can you describe what you intend to do so that we can discuss the shape of
the API and computation?



On Wed, Sep 17, 2014 at 9:28 PM, Saikat Kanjilal sxk1...@hotmail.com
wrote:

 Dmitriy et al., as part of the above JIRA I need to calculate the Gaussian
 kernel between 2 shapes. I looked through mahout-math-scala and didn't see
 anything to do this; any objections to me adding some code under
 scalabindings to do this?
 Thanks in advance.


RE: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Saikat Kanjilal
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html
I need to implement the above in the Scala world and expose a DSL API to call
the computation when computing the affinity matrix.
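
For reference, the kernel on the linked scikit-learn page can be written as
below. The symbols x_i and y_j (rows of the two input arrays) are notation
introduced here, and the bandwidth sigma with gamma = 1/(2 sigma^2) is the
usual Gaussian-kernel parameterization, which is an assumption not stated in
this thread:

```latex
K(x_i, y_j) = \exp\left(-\gamma \,\lVert x_i - y_j \rVert^2\right),
\qquad \gamma = \frac{1}{2\sigma^2}
```

Applied to a single array (both inputs the same), the entries K(x_i, x_j)
would presumably form the affinity matrix mentioned above.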


Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Shannon Quinn

Saikat,

Spark has the cartesian() method that will align all pairs of points; 
that's the nontrivial part of determining an RBF kernel. After that it's 
a simple matter of performing the equation that's given on the 
scikit-learn doc page.


However, like you said it'll also have to be implemented using the 
Mahout DSL. I can envision that users would like to compute pairwise 
metrics for a lot more than just RBF kernels (pairwise Euclidean 
distance, etc), so my guess would be a DSL implementation of cartesian() 
is what you're looking for. You can build the other methods on top of that.


Correct me if I'm wrong.

Shannon
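
The two steps described above (pair up all points, then apply the RBF
equation) can be sketched on local collections as follows. This is plain
Scala, not the Spark cartesian() call and not a Mahout DSL operator; the
names cartesian, sqDist, rbfKernel, and the gamma parameter are illustrative
assumptions:

```scala
object RbfSketch {
  type Point = Array[Double]

  // All ordered index pairs (i, j): a local analogue of a cartesian product.
  def cartesian(n: Int, m: Int): Seq[(Int, Int)] =
    for (i <- 0 until n; j <- 0 until m) yield (i, j)

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(x: Point, y: Point): Double =
    x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum

  // k(i)(j) = exp(-gamma * ||xs(i) - ys(j)||^2)
  def rbfKernel(xs: Array[Point], ys: Array[Point], gamma: Double): Array[Array[Double]] = {
    val k = Array.ofDim[Double](xs.length, ys.length)
    for ((i, j) <- cartesian(xs.length, ys.length))
      k(i)(j) = math.exp(-gamma * sqDist(xs(i), ys(j)))
    k
  }
}
```

Swapping the body of the loop for another pairwise function (for example
the plain Euclidean distance) reuses the same pairing step, which is the
reason for building other metrics on top of a cartesian primitive.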


RE: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Saikat Kanjilal
OK, great, I'll use the Spark cartesian API call. I'd still like some
thoughts on where the code that calls cartesian should live in our
directory structure.

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Dmitriy Lyubimov
Do you want a REALLY, REALLY big matrix? As in a distributed matrix?
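
One way to see why size matters here: a dense pairwise kernel over n points
has n * n entries. The sketch below is back-of-the-envelope arithmetic only,
assuming 8-byte doubles and no sparsity; denseBytes is an illustrative name:

```scala
object KernelSize {
  // Bytes needed for a dense n x n matrix of 8-byte doubles.
  def denseBytes(n: Long): Long = n * n * 8L
}
```

For 10,000 points this is 800,000,000 bytes (about 0.8 GB), which still fits
in memory on one machine; for 1,000,000 points it is 8,000,000,000,000 bytes
(about 8 TB), which does not.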


Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-17 Thread Saikat Kanjilal
Dmitriy et al., as part of the above JIRA I need to calculate the Gaussian
kernel between 2 shapes. I looked through mahout-math-scala and didn't see
anything to do this; any objections to me adding some code under
scalabindings to do this?
Thanks in advance.