Re: Google Groups: You've been added to Spark Developers
Yeah, unfortunately I think this way of doing mirroring won't work. Give me a few days and I'll start closing the Google groups and getting people here (starting with the dev list first).

Matei

On Sep 5, 2013, at 7:03 PM, Henry Saputra wrote:
> Matei, I saw your test emails to the Spark developer google group list but they
> did not show up in the ASF dev@ list.
Re: Needs a matrix library
Yep, JAMA looks good to me as well. I am not all that familiar with it, and I think JBLAS would be good too, but that seems like it would add quite a few unneeded dependencies.

Adam

On Sun, Sep 8, 2013 at 6:38 PM, Chris Mattmann wrote:
> Hi Martin,
>
> OK, so looking at the license for JAMA:
>
> http://wordhoard.northwestern.edu/userman/thirdparty/jama.html
> and
> http://muuki88.github.io/jama-osgi/license.html
>
> Looks like the latter is ALv2 licensed. So JAMA looks good to me, too.
>
> Cheers,
> Chris
Re: Needs a matrix library
Hi Martin,

OK, so looking at the license for JAMA:

http://wordhoard.northwestern.edu/userman/thirdparty/jama.html
and
http://muuki88.github.io/jama-osgi/license.html

Looks like the latter is ALv2 licensed. So JAMA looks good to me, too.

Cheers,
Chris

-----Original Message-----
From: Martin Desruisseaux
Organization: Geomatys
Reply-To: "d...@sis.apache.org"
Date: Sunday, September 8, 2013 1:23 PM
To: "d...@sis.apache.org"
Cc: "dev@spark.incubator.apache.org"
Subject: Re: Needs a matrix library
Re: Needs a matrix library
Martin Desruisseaux wrote:
> * JBlas seems to be JNI wrappers around LAPACK and BLAS Fortran
>   libraries.

According to http://mikiobraun.github.io/jblas/javadoc/, that's not true for operations that are faster to perform in the JVM.
Re: Needs a matrix library
Thanks all for the tips. So if I'm summarizing right:

* Hama and Spark are designed for distributed computing. Given that our need is for small matrices (usually no more than 5x5), distributed computing would probably be too much. However, I will keep Hama and Spark in mind for the SIS "Grid Coverage" (or Raster) processing part, to come later.
* JBlas seems to be JNI wrappers around the LAPACK and BLAS Fortran libraries. For small matrices, the JNI cost may be larger than the benefit. I will keep JBlas in mind for some computations that require large matrices, but those computations are not expected to occur in the "referencing" part of SIS.
* Other libraries under a compatible license include Apache Commons Math [1] and JAMA [2].

Given that Apache Commons Math is a large library (the JAR file is 1.6 MB) while JAMA is very small and focused on matrices only (a 36.5 kB file), I would be tempted to propose JAMA. Is there any thought on that?

Martin

[1] http://commons.apache.org/proper/commons-math/
[2] http://math.nist.gov/javanumerics/jama/
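To illustrate the scale being discussed: at 5x5 and below, a plain-Java multiply is trivial and carries no JNI or distribution overhead. This is an illustrative sketch only, not code from JAMA, JBlas, or any library mentioned above; the class and method names are made up:

```java
// Hypothetical demo: small fixed-size matrices need no external library.
public class SmallMatrixDemo {

    // Naive O(n^3) multiply; entirely adequate at the 5x5 scale.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int t = 0; t < k; t++)
                    c[i][j] += a[i][t] * b[t][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1]); // 19.0 22.0
        System.out.println(c[1][0] + " " + c[1][1]); // 43.0 50.0
    }
}
```

For anything this small, the library choice is mostly about licensing and dependency footprint rather than performance, which is consistent with the JAMA-versus-Commons-Math comparison above.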
Re: Spark 0.8.0-incubating RC2
Thanks Henry. The MLLib files have been fixed since you ran the tool.

On Sat, Sep 7, 2013 at 11:25 PM, Henry Saputra wrote:
> Hi Patrick,
>
> I ran the Apache RAT tool as shown at
> http://creadur.apache.org/rat/apache-rat/index.html:
>
> java -jar apache-rat-0.10.jar ~/Downloads/spark-0.8.0-src-incubating-RC2
>
> However, we should add the Maven RAT plugin to Spark's pom.xml to support
> an integrated RAT check as part of CI later.
>
> - Henry
>
> On Sat, Sep 7, 2013 at 11:24 AM, Patrick Wendell wrote:
>> Henry,
>>
>> Thanks a lot for your feedback.
>>
>> Could you let me know how you ran the Apache RAT tool so I can reproduce this?
>>
>> My sense is that the best "next step" is to do an RC that is built
>> against the Apache Git repo and also includes both `src` and `bin` artifacts,
>> in addition to cleaned-up license files. Some inline responses below.
>>
>>> 1. I only see source artifacts in Patrick's p.a.o URL. I assume the
>>> pre-built ones will also be published with hashes and signatures?
>>
>> Yes, we'll do both src and binary releases. I'll hash and sign both.
>>
>>> 2. For every ASF release, we need a designated release engineer (RE)
>>> who will drive the release process, including determining the bugs to be
>>> included, making sure all files have the right ASF header (running the
>>> Maven RAT plugin check), creating the release branch, updating the
>>> version for the next development cycle, and creating and correctly
>>> signing the release artifacts. I assume this would be Matei or Patrick?
>>
>> Yes, this might be me for this release because I've got the keys
>> correctly set up. I'll chat with Matei when he's back.
>>
>>> 3. The proposed source artifact 0.8.0-RC2's signature looks good and
>>> its hash looks good. However, it was generated against the github
>>> mesos:spark repo. Reminder that when we send the release proposal to
>>> general@incubator.a.o we need to generate RC builds using the ASF git
>>> repo with the right tagged branch.
>>
>> Next RC we will take care of this.
>>
>>> 4. I ran the RAT check for the source artifact and found a lot of
>>> sources that do not have the ASF license header.
>>>
>>> For example, some in the repl directory have this:
>>>
>>> /* NSC -- new Scala compiler
>>>  * Copyright 2005-2011 LAMP/EPFL
>>>  * @author Paul Phillips
>>>  */
>>>
>>> Not sure if we need to add the ASF header to it since we technically
>>> put it under the apache package.
>>>
>>> Scala source files under mllib are missing ASF headers.
>>
>> See comment above.
>>
>>> 5. Add the public key of the RE to
>>> http://people.apache.org/keys/group/spark.asc (@Chris do we still need
>>> to create a KEYS file in the Spark git repo?)
>>
>> This is now finished for me :)
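The integrated RAT check Henry suggests could look roughly like the fragment below. This is a sketch, not a tested configuration: the plugin version, bound phase, and exclude pattern are assumptions that would need to be adjusted for Spark's actual build.

```xml
<!-- Sketch: bind the Apache RAT license check into the Maven build.
     Version, phase, and excludes below are illustrative assumptions. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <version>0.10</version>
  <configuration>
    <excludes>
      <!-- files that legitimately carry no ASF header -->
      <exclude>**/*.md</exclude>
    </excludes>
  </configuration>
  <executions>
    <execution>
      <phase>verify</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn verify` would fail the build on missing license headers, giving CI the same signal as the manual `java -jar apache-rat-0.10.jar` run described above.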
Adding support for implicit feedback to ALS
Hi,

I know everyone's pretty busy getting 0.8.0 out, but as and when folks have time it would be great to get your feedback on this PR adding support for the 'implicit feedback' model variant to ALS:

https://github.com/apache/incubator-spark/pull/4

In particular, feedback on any potential efficiency improvements or issues, and on testing it out locally, on a cluster, and on some datasets, would be very welcome.

Many thanks,
Nick
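For readers unfamiliar with the variant: the implicit-feedback formulation of ALS (Hu, Koren & Volinsky, 2008) replaces raw observation counts with a binary preference plus a confidence weight. A minimal sketch of that mapping, assuming the standard formulation from the paper (the class name, `alpha` value, and sample data here are illustrative, not taken from the PR):

```java
// Sketch of the implicit-feedback preference/confidence mapping used by
// implicit ALS. Names and the alpha value are illustrative assumptions.
public class ImplicitPrefs {

    // p_ui = 1 if the user interacted with the item at all, else 0.
    static double preference(double r) {
        return r > 0 ? 1.0 : 0.0;
    }

    // c_ui = 1 + alpha * r_ui: more observations -> more confidence.
    static double confidence(double r, double alpha) {
        return 1.0 + alpha * r;
    }

    public static void main(String[] args) {
        double alpha = 40.0;                // value suggested in the paper
        double[] ratings = {0.0, 1.0, 5.0}; // raw interaction counts
        for (double r : ratings) {
            System.out.printf("r=%.1f p=%.1f c=%.1f%n",
                    r, preference(r), confidence(r, alpha));
        }
    }
}
```

ALS then minimizes the confidence-weighted squared error between the factor model and the binary preferences, rather than fitting the raw counts directly, which is what distinguishes this variant from the explicit-rating ALS already in the codebase.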