Re: Google Groups: You've been added to Spark Developers

2013-09-08 Thread Matei Zaharia
Yeah, unfortunately I think this way of doing mirroring won't work. Give me a 
few days and I'll start closing the Google groups and getting people here 
(starting with the dev list first).

Matei


On Sep 5, 2013, at 7:03 PM, Henry Saputra wrote:

> Matei, I saw your test emails to Spark developer google group list but they 
> did not show up in the ASF dev@ list.



Re: Needs a matrix library

2013-09-08 Thread Adam Estrada
Yep. JAMA looks good to me as well, although I am not all that familiar
with it. I think JBLAS would be good too, but it seems like it would add
quite a few unneeded dependencies.

Adam


On Sun, Sep 8, 2013 at 6:38 PM, Chris Mattmann wrote:

> Hi Martin,
>
> OK, so looking at the license for JAMA:
>
> http://wordhoard.northwestern.edu/userman/thirdparty/jama.html
>
> and
> http://muuki88.github.io/jama-osgi/license.html
>
>
> Looks like the latter is ALv2 licensed. So JAMA looks good to me,
> too.
>
> Cheers,
> Chris
>
> -----Original Message-----
>
> From: Martin Desruisseaux 
> Organization: Geomatys
> Reply-To: "d...@sis.apache.org" 
> Date: Sunday, September 8, 2013 1:23 PM
> To: "d...@sis.apache.org" 
> Cc: "dev@spark.incubator.apache.org" 
> Subject: Re: Needs a matrix library
>
> >Thanks all for the tips. So if I'm summarizing right:
> >
> >  * Hama and Spark are designed for distributed computing. Given that
> >our need is for small matrices (usually no more than 5x5),
> >distributed computing would probably be too much. However I keep
> >Hama and Spark in mind for the SIS "Grid Coverage" (or Raster)
> >processing part, to come later.
> >  * JBlas seems to be JNI wrappers around LAPACK and BLAS Fortran
> >libraries. For small matrices, the JNI cost may be larger than the
> >benefit. I will keep JBlas in mind for some computations that
> >require large matrices, but those computations are not expected to
> >occur in the "referencing" part of SIS.
> >  * Other libraries under compatible license include Apache Commons Math
> >[1] and JAMA [2].
> >
> >
> >Given that Apache Commons Math is a large library (the JAR file is 1.6 MB)
> >while JAMA is very small and focused on matrices only (a 36.5 kB file), I
> >would be tempted to propose JAMA. Are there any thoughts on that?
> >
> >
> > Martin
> >
> >
> >[1] http://commons.apache.org/proper/commons-math/
> >[2] http://math.nist.gov/javanumerics/jama/
> >
>
>
>


Re: Needs a matrix library

2013-09-08 Thread Chris Mattmann
Hi Martin,

OK, so looking at the license for JAMA:

http://wordhoard.northwestern.edu/userman/thirdparty/jama.html

and 
http://muuki88.github.io/jama-osgi/license.html


Looks like the latter is ALv2 licensed. So JAMA looks good to me,
too.

Cheers,
Chris

-----Original Message-----

From: Martin Desruisseaux 
Organization: Geomatys
Reply-To: "d...@sis.apache.org" 
Date: Sunday, September 8, 2013 1:23 PM
To: "d...@sis.apache.org" 
Cc: "dev@spark.incubator.apache.org" 
Subject: Re: Needs a matrix library

>Thanks all for the tips. So if I'm summarizing right:
>
>  * Hama and Spark are designed for distributed computing. Given that
>our need is for small matrices (usually no more than 5x5),
>distributed computing would probably be too much. However I keep
>Hama and Spark in mind for the SIS "Grid Coverage" (or Raster)
>processing part, to come later.
>  * JBlas seems to be JNI wrappers around LAPACK and BLAS Fortran
>libraries. For small matrices, the JNI cost may be larger than the
>benefit. I will keep JBlas in mind for some computations that
>require large matrices, but those computations are not expected to
>occur in the "referencing" part of SIS.
>  * Other libraries under compatible license include Apache Commons Math
>[1] and JAMA [2].
>
>
>Given that Apache Commons Math is a large library (the JAR file is 1.6 MB)
>while JAMA is very small and focused on matrices only (a 36.5 kB file), I
>would be tempted to propose JAMA. Are there any thoughts on that?
>
>
> Martin
>
>
>[1] http://commons.apache.org/proper/commons-math/
>[2] http://math.nist.gov/javanumerics/jama/
>




Re: Needs a matrix library

2013-09-08 Thread Mike
Martin Desruisseaux wrote:
>  * JBlas seems to be JNI wrappers around LAPACK and BLAS Fortran
>libraries.

According to http://mikiobraun.github.io/jblas/javadoc/ that's not true 
for operations that are faster to perform in the JVM.


Re: Needs a matrix library

2013-09-08 Thread Martin Desruisseaux

Thanks all for the tips. So if I'm summarizing right:

 * Hama and Spark are designed for distributed computing. Given that
   our need is for small matrices (usually no more than 5x5),
   distributed computing would probably be too much. However I keep
   Hama and Spark in mind for the SIS "Grid Coverage" (or Raster)
   processing part, to come later.
 * JBlas seems to be JNI wrappers around LAPACK and BLAS Fortran
   libraries. For small matrices, the JNI cost may be larger than the
   benefit. I will keep JBlas in mind for some computations that
   require large matrices, but those computations are not expected to
   occur in the "referencing" part of SIS.
 * Other libraries under compatible license include Apache Commons Math
   [1] and JAMA [2].


Given that Apache Commons Math is a large library (the JAR file is 1.6 MB)
while JAMA is very small and focused on matrices only (a 36.5 kB file), I
would be tempted to propose JAMA. Are there any thoughts on that?



Martin


[1] http://commons.apache.org/proper/commons-math/
[2] http://math.nist.gov/javanumerics/jama/



Re: Spark 0.8.0-incubating RC2

2013-09-08 Thread Patrick Wendell
Thanks Henry. The MLLib files have been fixed since you ran the tool.

On Sat, Sep 7, 2013 at 11:25 PM, Henry Saputra wrote:
> Hi Patrick,
>
> I ran the Apache RAT tool as shown at
> http://creadur.apache.org/rat/apache-rat/index.html:
>
> java -jar apache-rat-0.10.jar ~/Downloads/spark-0.8.0-src-incubating-RC2
>
> However, we should later add the Maven RAT plugin to Spark's pom.xml to
> support an integrated RAT check as part of CI.
>
> - Henry
>
> On Sat, Sep 7, 2013 at 11:24 AM, Patrick Wendell wrote:
>> Henry,
>>
>> Thanks a lot for your feedback.
>>
>> Could you let me know how you ran the Apache RAT tool so I can reproduce
>> this?
>>
>> My sense is that the best "next step" is to do a RC that is built
>> against the Apache Git and also includes both `src` and `bin` in
>> addition to cleaned up license files. Some inline responses below.
>>
>>> 1. I only see source artifacts in Patrick's p.a.o URL. I assume the
>>> pre-built ones will also be published with hash and signed?
>>
>> Yes, we'll do both src and binary releases. I'll hash and sign both.
>>
>>> 2. For every ASF release, we need a designated release engineer (RE)
>>> who will drive the release process, including determining the bugs to
>>> be included, making sure all files have the right ASF header (running
>>> the Maven RAT plugin check), creating the release branch, updating the
>>> version for next development, creating release artifacts and signing
>>> them correctly. I assume this would be Matei or Patrick?
>>
>> Yes, this might be me for this release because I've got the keys
>> correctly set-up. I'll chat with Matei when he's back.
>>
>>> 3. The signature and hash of the proposed 0.8.0-RC2 source artifacts
>>> look good. However, they were generated against the GitHub mesos:spark
>>> repo.
>>> Reminder that when we send the release proposal to
>>> general@incubator.a.o we need to generate RC builds using the ASF git
>>> repo with the right tagged branch.
>>
>> Next RC we will take care of this.
>>
>>> 4. I ran the RAT check on the source artifact and found that a lot of
>>> source files do not have the ASF license header.
>>>
>>>  For example, some files in the repl directory have this:
>>>
>>> /* NSC -- new Scala compiler
>>>  * Copyright 2005-2011 LAMP/EPFL
>>>  * @author Paul Phillips
>>>  */
>>>
>>> Not sure if we need to add the ASF header to it, since we technically
>>> put it under the apache package.
>>>
>>> Scala source files under mllib are missing ASF headers.
>>
>> See comment above.
>>
>>> 5. Add the public key of the RE to
>>> http://people.apache.org/keys/group/spark.asc (@Chris do we still need
>>> to create KEYS file in the Spark git repo?)
>>
>> This is now finished for me :)
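
The header check discussed in this thread can be approximated by a small script for quick local spot checks. The sketch below is not Apache RAT and does not reproduce its behavior: it only looks for one marker phrase from the standard ASF header near the top of each file. The function name `files_missing_asf_header`, the extension list, and the 20-line window are all assumptions made for illustration.

```python
import os

# Assumption: a file counts as "licensed" if this phrase appears near the top.
# Apache RAT itself recognizes many more license families and file types.
ASF_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def files_missing_asf_header(root, extensions=(".scala", ".java"), head_lines=20):
    """Return sorted paths (relative to root) whose first lines lack the marker."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                # Read at most head_lines lines from the start of the file.
                head = "".join(line for _, line in zip(range(head_lines), handle))
            if ASF_MARKER not in head:
                missing.append(os.path.relpath(path, root))
    return sorted(missing)
```

Calling `files_missing_asf_header("path/to/source/tree")` returns the list of files to inspect by hand. Files such as the Scala compiler sources quoted above would show up in that list, and whether they need the ASF header is a licensing question the script cannot answer.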


Adding support for implicit feedback to ALS

2013-09-08 Thread Nick Pentreath
Hi

I know everyone's pretty busy with getting 0.8.0 out, but as and when folks
have time it would be great to get your feedback on this PR adding support
for the 'implicit feedback' model variant to ALS:
https://github.com/apache/incubator-spark/pull/4

In particular, I'd welcome pointers to any potential efficiency improvements
or issues, and help testing it out locally, on a cluster, and on some
datasets!

Comments & feedback welcome.

Many thanks
Nick
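
For readers unfamiliar with the implicit feedback variant, here is a minimal sketch of the usual formulation (following Hu, Koren and Volinsky), in which a raw interaction count r becomes a binary preference p = 1 if r > 0 else 0 and a confidence weight c = 1 + alpha * r. Whether the PR uses exactly this form is an assumption; the function names, the default `alpha`, and the dictionary-based data layout are hypothetical and are not taken from the PR.

```python
def implicit_targets(raw, alpha=1.0):
    """Map raw counts {(user, item): r} to {(user, item): (preference, confidence)}.

    preference = 1.0 if r > 0 else 0.0
    confidence = 1.0 + alpha * r   (alpha is an illustrative tuning parameter)
    """
    return {
        key: (1.0 if r > 0 else 0.0, 1.0 + alpha * r)
        for key, r in raw.items()
    }


def weighted_squared_error(raw, user_factors, item_factors, alpha=1.0):
    """Confidence-weighted squared error over the entries in raw.

    Regularization is omitted for brevity. The caller must include zero-count
    pairs in raw if unobserved pairs should contribute to the error.
    """
    total = 0.0
    for (user, item), (pref, conf) in implicit_targets(raw, alpha).items():
        # Predicted preference is the dot product of the two factor vectors.
        prediction = sum(
            x * y for x, y in zip(user_factors[user], item_factors[item])
        )
        total += conf * (pref - prediction) ** 2
    return total
```

The point of the weighting is that a zero count is treated as a weak negative signal (confidence 1) rather than a missing value, while repeated interactions raise the confidence that the preference is real.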