Spark Matrix Factorization

2014-01-02 Thread Debasish Das
Hi, I am not seeing any DSGD implementation of ALS in Spark. There are two ALS implementations. org.apache.spark.examples.SparkALS does not run on large matrices and seems more like demo code. org.apache.spark.mllib.recommendation.ALS looks like a more robust version and I am experimenting

Re: Spark Matrix Factorization

2014-01-02 Thread Ameet Talwalkar
Hi Deb, Thanks for your email. We currently do not have a DSGD implementation in MLlib. Also, just to clarify, DSGD is not a variant of ALS, but rather a different algorithm for solving the same bi-convex objective function. It would be a good thing to add, but to the best of my
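To make the distinction concrete, here is a toy sketch (plain Python with rank-1 factors, not Spark or MLlib code; all names are illustrative) of how ALS and (D)SGD minimize the same regularized squared-error objective over the observed ratings:

```python
# Toy sketch: ALS vs. (D)SGD updates on the same bi-convex objective
#   sum over observed (i,j) of (R[i,j] - u[i]*v[j])**2 + lam*(|u|^2 + |v|^2)
# with rank-1 factors for clarity. Illustrative only, not Spark's API.

R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 1.0}  # observed ratings
lam = 0.1  # regularization strength

def loss(u, v):
    data = sum((r - u[i] * v[j]) ** 2 for (i, j), r in R.items())
    return data + lam * (sum(x * x for x in u) + sum(x * x for x in v))

def als_step(u, v):
    # Fix v and solve each u[i] exactly (a 1-D ridge regression),
    # then fix u and solve each v[j]. Each solve cannot increase the loss.
    for i in range(len(u)):
        num = sum(r * v[j] for (ii, j), r in R.items() if ii == i)
        den = sum(v[j] ** 2 for (ii, j), _ in R.items() if ii == i) + lam
        u[i] = num / den
    for j in range(len(v)):
        num = sum(r * u[i] for (i, jj), r in R.items() if jj == j)
        den = sum(u[i] ** 2 for (i, jj), _ in R.items() if jj == j) + lam
        v[j] = num / den

def sgd_step(u, v, eta=0.05):
    # One stochastic pass: a gradient step per observed rating. DSGD
    # distributes these steps across non-conflicting blocks of R.
    for (i, j), r in R.items():
        e = r - u[i] * v[j]
        u[i] += eta * (e * v[j] - lam * u[i])
        v[j] += eta * (e * u[i] - lam * v[j])

u, v = [1.0, 1.0], [1.0, 1.0]
before = loss(u, v)
for _ in range(10):
    als_step(u, v)
after_als = loss(u, v)

u, v = [1.0, 1.0], [1.0, 1.0]
for _ in range(200):
    sgd_step(u, v)
after_sgd = loss(u, v)
```

Both paths drive the same loss down; the practical difference is that ALS alternates exact closed-form solves (easy to parallelize per user/item), while DSGD takes many cheap gradient steps scheduled so that concurrent blocks do not touch the same rows of u or columns of v.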

compiling against hadoop 2.2

2014-01-02 Thread Ted Yu
Hi, I used the following command to compile against hadoop 2.2: mvn clean package -DskipTests -Pnew-yarn But I got a lot of compilation errors. Did I use the wrong command ? Cheers

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
I think you also need to set yarn.version. Say something like: mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package. hadoop.version defaults to 2.2.0 while yarn.version does not when you choose the new-yarn profile. We probably need to fix it later for easier usage.

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
Sorry, that should be: mvn -Pnew-yarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package. The profile in the previous mail is not yet available. Best Regards, Raymond Liu

Re: compiling against hadoop 2.2

2014-01-02 Thread Ted Yu
Specification of yarn.version can be inserted following this line (#762 in pom.xml), right? <hadoop.version>2.2.0</hadoop.version>
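For illustration, the property Ted describes would sit next to the existing hadoop.version default in the pom.xml properties block (line number approximate, against the pre-merge pom; this is a sketch of the suggested change, not the committed fix):

```xml
<properties>
  <hadoop.version>2.2.0</hadoop.version>
  <!-- proposed default so -Dyarn.version need not be passed on every build -->
  <yarn.version>2.2.0</yarn.version>
</properties>
```

With such a default in place, the -Dyarn.version flag would only be needed to override the version, mirroring how hadoop.version already behaves.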

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
Yep, you are right. Though we will merge in new code on this part pretty soon (maybe today? I hope so), which might shift a few lines. Best Regards, Raymond Liu

RE: compiling against hadoop 2.2

2014-01-02 Thread Liu, Raymond
And I am not sure whether it is valuable to provide different settings for the hadoop/hdfs and yarn versions. When building with SBT, they will always be the same. Maybe in mvn we should do so too. Best Regards, Raymond Liu

Terminology: worker vs slave

2014-01-02 Thread Andrew Ash
The terms worker and slave seem to be used interchangeably. Are they the same? Worker is used more frequently in the codebase:

aash@aash-mbp ~/git/spark$ git grep -i worker | wc -l
981
aash@aash-mbp ~/git/spark$ git grep -i slave | wc -l
348

Does it make

Re: Terminology: worker vs slave

2014-01-02 Thread Reynold Xin
It is historic. I think we are converging towards: worker, the slave daemon in the standalone cluster manager; and executor, the JVM process launched by the worker that executes tasks.

Re: Terminology: worker vs slave

2014-01-02 Thread Patrick Wendell
Ya, we've been trying to standardize the terminology here (see the glossary): http://spark.incubator.apache.org/docs/latest/cluster-overview.html I think slave actually isn't mentioned there at all, but references to slave in the codebase are synonymous with worker. - Patrick