Re: all values for a key must fit in memory

2014-04-21 Thread Sandy Ryza
Thanks Matei and Mridul - was basically wondering whether we would be able to change the shuffle to accommodate this after 1.0, and from your answers it sounds like we can. On Mon, Apr 21, 2014 at 12:31 AM, Mridul Muralidharan mri...@gmail.com wrote: As Matei mentioned, the Values is now an

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Evan R. Sparks
While DBSCAN and others would be welcome contributions, I couldn't agree more with Sean. On Mon, Apr 21, 2014 at 8:58 AM, Sean Owen so...@cloudera.com wrote: Nobody asked me, and this is a comment on a broader question, not this one, but: In light of a number of recent items about adding

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Paul Brown
I agree that it will be good to see more algorithms added to the MLlib universe, although this does bring to mind a couple of comments: - MLlib as Mahout.next would be unfortunate. There are some gems in Mahout, but there are also lots of rocks. Setting a minimal bar of working, correctly

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Sean Owen
On Mon, Apr 21, 2014 at 6:03 PM, Paul Brown p...@mult.ifario.us wrote: - MLlib as Mahout.next would be unfortunate. There are some gems in Mahout, but there are also lots of rocks. Setting a minimal bar of working, correctly implemented, and documented requires a surprising amount of work.

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Xiangrui Meng
+1 on Sean's comment. MLlib covers the basic algorithms but we definitely need to spend more time on how to make the design scalable. For example, think about current ProblemWithAlgorithm naming scheme. That being said, new algorithms are welcomed. I wish they are well-established and

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Nick Pentreath
I'd say a section in the how to contribute page would be a good place to put this. In general I'd say that the criteria for inclusion of an algorithm is it should be high quality, widely known, used and accepted (citations and concrete use cases as examples of this), scalable and

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Xiangrui Meng
Cannot agree more with your words. Could you add one section about how and what to contribute to MLlib's guide? -Xiangrui On Mon, Apr 21, 2014 at 1:41 PM, Nick Pentreath nick.pentre...@gmail.com wrote: I'd say a section in the how to contribute page would be a good place to put this. In

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Sandy Ryza
How do I get permissions to edit the wiki? On Mon, Apr 21, 2014 at 3:19 PM, Xiangrui Meng men...@gmail.com wrote: Cannot agree more with your words. Could you add one section about how and what to contribute to MLlib's guide? -Xiangrui On Mon, Apr 21, 2014 at 1:41 PM, Nick Pentreath

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Xiangrui Meng
The markdown files are under spark/docs. You can submit a PR for changes. -Xiangrui On Mon, Apr 21, 2014 at 6:01 PM, Sandy Ryza sandy.r...@cloudera.com wrote: How do I get permissions to edit the wiki? On Mon, Apr 21, 2014 at 3:19 PM, Xiangrui Meng men...@gmail.com wrote: Cannot agree more

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Sandy Ryza
I thought this might be a good thing to add to the wiki's How to contribute page (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark), as it's not tied to a release. On Mon, Apr 21, 2014 at 6:09 PM, Xiangrui Meng men...@gmail.com wrote: The markdown files are under

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Nan Zhu
I thought those are files of spark.apache.org? -- Nan Zhu On Monday, April 21, 2014 at 9:09 PM, Xiangrui Meng wrote: The markdown files are under spark/docs. You can submit a PR for changes. -Xiangrui On Mon, Apr 21, 2014 at 6:01 PM, Sandy Ryza sandy.r...@cloudera.com

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Matei Zaharia
The wiki is actually maintained separately in https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage. We restricted editing of the wiki because bots would automatically add stuff. I’ve given you permissions now. Matei On Apr 21, 2014, at 6:22 PM, Nan Zhu zhunanmcg...@gmail.com wrote: