Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Debasish Das
Hi Xiangrui, With 4 ALS iterations it runs fine...If I run 10 it fails...I believe I have to cut the lineage chain and call checkpoint...Trying to follow the other email chain on checkpointing... Thanks. Deb On Sun, Apr 6, 2014 at 9:08 PM, Xiangrui Meng men...@gmail.com wrote: Hi Deb,

Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Nick Pentreath
On the partitioning / id keys: if we look at hash partitioning, how feasible would it be to just allow the user and item ids to be strings? A lot of the time these ids are strings anyway (UUIDs and so on), and it's really painful to translate between String and Int the whole time. Are there

Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Debasish Das
Nick, I already have this code which calls dictionary generation and then maps strings etc. to ints...I think the core algorithm should stay in ints...if you like I can add this code in MFUtils.scala...that's the convention I followed, similar to MLUtils.scala...actually these functions should be
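The dictionary-generation step described above can be sketched as follows. This is a minimal, standalone illustration in plain Python, not the code from MFUtils.scala (which is not shown in this thread); the function names `build_id_dictionary` and `encode_ratings` and the sample ids are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Rating = Tuple[str, str, float]


def build_id_dictionary(ids: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct string id a dense int index, in first-seen order."""
    mapping: Dict[str, int] = {}
    for raw in ids:
        if raw not in mapping:
            mapping[raw] = len(mapping)
    return mapping


def encode_ratings(
    ratings: List[Rating],
    user_ids: Dict[str, int],
    item_ids: Dict[str, int],
) -> List[Tuple[int, int, float]]:
    """Translate (user, item, rating) triples from string keys to int keys."""
    return [(user_ids[u], item_ids[i], r) for u, i, r in ratings]


# Hypothetical sample data: string ids as they might arrive from upstream.
ratings = [
    ("u-9f3a", "item-A", 5.0),
    ("u-11bc", "item-B", 3.0),
    ("u-9f3a", "item-B", 1.0),
]
user_ids = build_id_dictionary(u for u, _, _ in ratings)
item_ids = build_id_dictionary(i for _, i, _ in ratings)
encoded = encode_ratings(ratings, user_ids, item_ids)

# Inverse dictionary, used to map int results back to the original strings.
user_names = {index: raw for raw, index in user_ids.items()}
```

Keeping the translation outside the core algorithm, as suggested above, means the factorization itself only ever sees ints, while callers keep both the forward and inverse dictionaries.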

Re: tachyon dependency

2014-04-07 Thread Haoyuan Li
Tachyon is Java 6 compatible from version 0.4. Besides putting input/output data in Tachyon ( http://tachyon-project.org/Running-Spark-on-Tachyon.html ), Spark applications can also persist data into Tachyon ( https://github.com/apache/spark/blob/master/docs/scala-programming-guide.md ). On Mon,

Contributing to Spark

2014-04-07 Thread Mukesh G
Hi, How can I contribute to Spark and its associated projects? Appreciate the help... Thanks Mukesh

Re: Contributing to Spark

2014-04-07 Thread Sujeet Varakhedi
This is a good place to start: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Sujeet On Mon, Apr 7, 2014 at 9:20 AM, Mukesh G muk...@gmail.com wrote: Hi, How can I contribute to Spark and its associated projects? Appreciate the help... Thanks Mukesh

Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Debasish Das
I am using master... No negative indexes... If I run with 4 iterations it runs fine and I can generate factors... With 10 iterations the run fails with array index out of bound... 25m users and 3m products are within int limits. Would it help if I pointed you to the logs for both runs?

Flaky streaming tests

2014-04-07 Thread Kay Ousterhout
Hi all, The InputStreamsSuite seems to have some serious flakiness issues -- I've seen the file input stream fail many times and now I'm seeing some actor input stream test failures ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13846/consoleFull) on what I think is an

Re: Flaky streaming tests

2014-04-07 Thread Nan Zhu
I hit this issue when Jenkins seemed to be very busy On Monday, April 7, 2014, Kay Ousterhout k...@eecs.berkeley.edu wrote: Hi all, The InputStreamsSuite seems to have some serious flakiness issues -- I've seen the file input stream fail many times and now I'm seeing some actor input

Re: Flaky streaming tests

2014-04-07 Thread Patrick Wendell
TD - do you know what is going on here? I looked into this a bit and at least a few of these use Thread.sleep() and assume the sleep will be exact, which is wrong. We should disable all the tests that do, and they should probably be rewritten to virtualize time. - Patrick On Mon, Apr 7,
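The idea of virtualizing time instead of relying on exact sleeps can be sketched as below. This is a minimal illustration in plain Python, not the Spark Streaming test code; the `ManualClock` class here and its `call_at` / `advance` methods are hypothetical names chosen for the sketch.

```python
import heapq
from typing import Callable, List, Tuple


class ManualClock:
    """A clock that only moves when the test says so (no real sleeping)."""

    def __init__(self, start_ms: int = 0) -> None:
        self._now = start_ms
        self._seq = 0  # tie-breaker so callbacks are never compared
        self._timers: List[Tuple[int, int, Callable[[], None]]] = []

    def now(self) -> int:
        return self._now

    def call_at(self, due_ms: int, callback: Callable[[], None]) -> None:
        """Register a callback to run once the clock reaches due_ms."""
        heapq.heappush(self._timers, (due_ms, self._seq, callback))
        self._seq += 1

    def advance(self, delta_ms: int) -> None:
        """Move time forward, firing every timer that becomes due, in order."""
        target = self._now + delta_ms
        while self._timers and self._timers[0][0] <= target:
            due, _, callback = heapq.heappop(self._timers)
            self._now = max(self._now, due)
            callback()
        self._now = target


clock = ManualClock()
fired: List[Tuple[str, int]] = []
clock.call_at(1000, lambda: fired.append(("batch-1", clock.now())))
clock.call_at(2000, lambda: fired.append(("batch-2", clock.now())))

clock.advance(999)
after_999 = list(fired)    # nothing is due yet
clock.advance(1)
after_1000 = list(fired)   # the first timer fires exactly at t=1000
clock.advance(1500)
after_2500 = list(fired)   # the second timer fires at t=2000
```

Because the test drives the clock, the outcome does not depend on how busy the build machine is, which is the failure mode that inexact sleeps expose.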

Re: Flaky streaming tests

2014-04-07 Thread Michael Armbrust
There is a JIRA for one of the flaky tests here: https://issues.apache.org/jira/browse/SPARK-1409 On Mon, Apr 7, 2014 at 11:32 AM, Patrick Wendell pwend...@gmail.com wrote: TD - do you know what is going on here? I looked into this a bit and at least a few of these use Thread.sleep()

Re: Flaky streaming tests

2014-04-07 Thread Tathagata Das
Yes, I will take a look at those tests ASAP. TD On Mon, Apr 7, 2014 at 11:32 AM, Patrick Wendell pwend...@gmail.com wrote: TD - do you know what is going on here? I looked into this a bit and at least a few of these use Thread.sleep() and assume the sleep will be exact, which is

Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Xiangrui Meng
Hi Deb, It would be helpful if you could attach the logs. It is strange to see that you can make 4 iterations but not 10. Xiangrui On Mon, Apr 7, 2014 at 10:36 AM, Debasish Das debasish.da...@gmail.com wrote: I am using master... No negative indexes... If I run with 4 iterations it runs

Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Hi, From my testing of Spark Streaming with Flume, it seems that there's only one of the Spark worker nodes that runs a Flume Avro RPC server to receive messages at any given time, as opposed to every Spark worker running an Avro RPC server to receive messages. Is this the case? Our use-case

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Michael Ernest
You can configure your sinks to write to one or more Avro sources in a load-balanced configuration. https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors mfe On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp christo...@christophe.cc wrote: Hi, From my testing of Spark Streaming

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Michael Ernest
I don't see why not. If one were doing something similar with straight Flume, you'd start an agent on each node where you want to receive Avro/RPC events. In the absence of clearer insight into your use case, I'm puzzling just a little over why it's necessary for each Worker to be its own receiver, but there's

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Cool. I'll look at making the code change in FlumeUtils and generating a pull request. As for the use case, the volume of messages we have is currently about 30 MB per second, which may grow beyond what a 1 Gbit network adapter can handle. - Christophe On Apr 7, 2014 1:51 PM, Michael Ernest
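The bandwidth concern above can be checked with quick arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits) and ignores protocol overhead, so the real ceiling on a single adapter is somewhat lower than the figure it computes.

```python
# Assumed units: decimal megabytes and gigabits, no protocol overhead.
LINK_BITS_PER_SECOND = 1_000_000_000   # 1 Gbit/s adapter
BITS_PER_BYTE = 8
BYTES_PER_MB = 1_000_000

# Theoretical ceiling of one adapter, in MB per second.
link_mb_per_second = LINK_BITS_PER_SECOND / BITS_PER_BYTE / BYTES_PER_MB

current_mb_per_second = 30.0           # volume stated in the message above
utilization = current_mb_per_second / link_mb_per_second
growth_headroom = link_mb_per_second / current_mb_per_second

print(f"link ceiling: {link_mb_per_second:.0f} MB/s")
print(f"current utilization: {utilization:.0%}")
print(f"growth before saturation: {growth_headroom:.1f}x")
```

Under these assumptions a single receiver saturates at roughly four times the current volume, which is why spreading receivers across workers becomes relevant as traffic grows.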

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-07 Thread DB Tsai
Hi guys, The latest PR uses Breeze's L-BFGS implementation, which was introduced by Xiangrui's sparse input format work in SPARK-1212. https://github.com/apache/spark/pull/353 Now, it works with the new sparse framework! Any feedback would be greatly appreciated. Thanks. Sincerely, DB Tsai

Re: Contributing to Spark

2014-04-07 Thread Mukesh G
Hi Sujeet, Thanks. I went through the website and it looks great. Is there a list of items that I can choose from, for contribution? Thanks Mukesh On Mon, Apr 7, 2014 at 10:14 PM, Sujeet Varakhedi svarakh...@gopivotal.com wrote: This is a good place to start:

Re: Contributing to Spark

2014-04-07 Thread Matei Zaharia
I’d suggest looking for the issues labeled “Starter” on JIRA. You can find them here: https://issues.apache.org/jira/browse/SPARK-1438?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened) Matei On Apr 7, 2014, at 9:45 PM,