Thanks, Matei. I added a section to the "How to contribute" page.

On Mon, Apr 21, 2014 at 7:25 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> The wiki is actually maintained separately in
> https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage. We
> restricted editing of the wiki because bots would automatically add stuff.
> I've given you permissions now.
>
> Matei
>
> On Apr 21, 2014, at 6:22 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>
> > I thought those are files of spark.apache.org?
> >
> > --
> > Nan Zhu
> >
> > On Monday, April 21, 2014 at 9:09 PM, Xiangrui Meng wrote:
> >
> >> The markdown files are under spark/docs. You can submit a PR for
> >> changes. -Xiangrui
> >>
> >> On Mon, Apr 21, 2014 at 6:01 PM, Sandy Ryza
> >> <sandy.r...@cloudera.com> wrote:
> >>> How do I get permissions to edit the wiki?
> >>>
> >>> On Mon, Apr 21, 2014 at 3:19 PM, Xiangrui Meng <men...@gmail.com> wrote:
> >>>
> >>>> Cannot agree more with your words. Could you add one section about
> >>>> "how and what to contribute" to MLlib's guide? -Xiangrui
> >>>>
> >>>> On Mon, Apr 21, 2014 at 1:41 PM, Nick Pentreath
> >>>> <nick.pentre...@gmail.com> wrote:
> >>>>> I'd say a section in the "how to contribute" page would be a good place
> >>>>> to put this.
> >>>>>
> >>>>> In general I'd say that the criteria for inclusion of an algorithm is it
> >>>>> should be high quality, widely known, used and accepted (citations and
> >>>>> concrete use cases as examples of this), scalable and parallelizable, well
> >>>>> documented and with reasonable expectation of dev support
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On 21 Apr 2014, at 19:59, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >>>>>>
> >>>>>> If it's not done already, would it make sense to codify this philosophy
> >>>>>> somewhere? I imagine this won't be the first time this discussion comes
> >>>>>> up, and it would be nice to have a doc to point to. I'd be happy to take a
> >>>>>> stab at this.
> >>>>>>
> >>>>>>> On Mon, Apr 21, 2014 at 10:54 AM, Xiangrui Meng
> >>>>>>> <men...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> +1 on Sean's comment. MLlib covers the basic algorithms but we
> >>>>>>> definitely need to spend more time on how to make the design scalable.
> >>>>>>> For example, think about current "ProblemWithAlgorithm" naming scheme.
> >>>>>>> That being said, new algorithms are welcomed. I wish they are
> >>>>>>> well-established and well-understood by users. They shouldn't be
> >>>>>>> research algorithms tuned to work well with a particular dataset but
> >>>>>>> not tested widely. You see the change log from Mahout:
> >>>>>>>
> >>>>>>> ===
> >>>>>>> The following algorithms that were marked deprecated in 0.8 have been
> >>>>>>> removed in 0.9:
> >>>>>>>
> >>>>>>> From Clustering:
> >>>>>>> Switched LDA implementation from using Gibbs Sampling to Collapsed
> >>>>>>> Variational Bayes (CVB)
> >>>>>>> Meanshift
> >>>>>>> MinHash - removed due to poor performance, lack of support and lack of
> >>>>>>> usage
> >>>>>>>
> >>>>>>> From Classification (both are sequential implementations)
> >>>>>>> Winnow - lack of actual usage and support
> >>>>>>> Perceptron - lack of actual usage and support
> >>>>>>>
> >>>>>>> Collaborative Filtering
> >>>>>>> SlopeOne implementations in
> >>>>>>> org.apache.mahout.cf.taste.hadoop.slopeone and
> >>>>>>> org.apache.mahout.cf.taste.impl.recommender.slopeone
> >>>>>>> Distributed pseudo recommender in
> >>>>>>> org.apache.mahout.cf.taste.hadoop.pseudo
> >>>>>>> TreeClusteringRecommender in
> >>>>>>> org.apache.mahout.cf.taste.impl.recommender
> >>>>>>>
> >>>>>>> Mahout Math
> >>>>>>> Hadoop entropy stuff in org.apache.mahout.math.stats.entropy
> >>>>>>> ===
> >>>>>>>
> >>>>>>> In MLlib, we should include the algorithms users know how to use and
> >>>>>>> we can provide support rather than letting algorithms come and go.
> >>>>>>>
> >>>>>>> My $0.02,
> >>>>>>> Xiangrui
> >>>>>>>
> >>>>>>>> On Mon, Apr 21, 2014 at 10:23 AM, Sean Owen
> >>>>>>>> <so...@cloudera.com> wrote:
> >>>>>>>>> On Mon, Apr 21, 2014 at 6:03 PM, Paul Brown
> >>>>>>>>> <p...@mult.ifario.us> wrote:
> >>>>>>>>> - MLlib as Mahout.next would be a unfortunate. There are some gems in
> >>>>>>>>> Mahout, but there are also lots of rocks. Setting a minimal bar of
> >>>>>>>>> working, correctly implemented, and documented requires a surprising
> >>>>>>>>> amount of work.
> >>>>>>>>
> >>>>>>>> As someone with first-hand knowledge, this is correct. To Sang's
> >>>>>>>> question, I can't see value in 'porting' Mahout since it is based on a
> >>>>>>>> quite different paradigm. About the only part that translates is the
> >>>>>>>> algorithm concept itself.
> >>>>>>>>
> >>>>>>>> This is also the cautionary tale. The contents of the project have
> >>>>>>>> ended up being a number of "drive-by" contributions of implementations
> >>>>>>>> that, while individually perhaps brilliant (perhaps), didn't
> >>>>>>>> necessarily match any other implementation in structure, input/output,
> >>>>>>>> libraries used. The implementations were often a touch academic. The
> >>>>>>>> result was hard to document, maintain, evolve or use.
> >>>>>>>>
> >>>>>>>> Far more of the structure of the MLlib implementations are consistent
> >>>>>>>> by virtue of being built around Spark core already. That's great.
> >>>>>>>>
> >>>>>>>> One can't wait to completely build the foundation before building any
> >>>>>>>> implementations. To me, the existing implementations are almost
> >>>>>>>> exactly the basics I would choose. They cover the bases and will
> >>>>>>>> exercise the abstractions and structure. So that's also great IMHO.