Under Release Highlights, please also add:
a) Dan's Streaming k-means clustering.
b) The Mahout upgrade for Lucene 4.3.0 compatibility.
(Both of the above deserve special mention, along with lucene2seq and the vector/matrix performance improvements.)

________________________________
From: Grant Ingersoll <gsing...@apache.org>
To: dev@mahout.apache.org; s...@apache.org
Cc: u...@mahout.apache.org
Sent: Saturday, June 8, 2013 1:33 PM
Subject: Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

On Jun 8, 2013, at 1:26 PM, Sebastian Schelter <s...@apache.org> wrote:

> Hi Grant,
>
> Very good release announcement. I propose that we deprecate a lot more.
> I think we should be aggressive here to pave the way for a clean and
> slim 1.0 release.
>
> I propose to additionally deprecate the following algorithms, as, to the
> best of my knowledge, they are not actively used:
>
> Collaborative Filtering:
>
> - all recommenders in o.a.m.cf.taste.impl.recommender.knn
>
> - the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
>
> - the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
> o.a.m.cf.taste.impl.recommender.slopeone
>
> - the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Pseudo is useful, no? Don't know about the others.

>
> Classification:
>
> - the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

We have some parallel training stuff coming, so I'd say -1 here, as I think HMMs are pretty important, no?

>
> Clustering
>
> - Fuzzy k-Means in o.a.m.clustering.fuzzykmeans
> - Spectral k-Means in o.a.m.clustering.spectral

-1 on spectral being dropped, as that seems to receive decent traction. Not sure on Fuzzy, as I think it is a pretty trivial extension of k-Means.

>
> Math
>
> - the tooling in o.a.m.math.stats.entropy
>
> Furthermore, I think we should deprecate the Lanczos implementation in
> o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

No opinion. +1 on everything else.

>
> To all users and other committers, this is a biased first proposal;
> please shout if you see things differently and want to have things kept.
>
> Best,
> Sebastian
>
> On 08.06.2013 16:42, Grant Ingersoll wrote:
>> More tests are always welcome.
>>
>> On Jun 8, 2013, at 10:29 AM, Ravi Mummulla <ravi.mummu...@gmail.com> wrote:
>>
>>> Hi Grant,
>>> Regarding the 1.0 plans, do we also want to include a note on adding tests
>>> where they don't exist, or improving them where needed, or is that implicit?
>>>
>>> Thanks.
>>>
>>> On Sat, Jun 8, 2013 at 3:55 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>
>>>> Hi Mahouts,
>>>>
>>>> A full copy of the proposed draft release notes is up at
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8. Please
>>>> add/edit as appropriate.
>>>>
>>>> IN PARTICULAR, PLEASE PAY CLOSE ATTENTION TO THE SECTION LABELLED __FUTURE
>>>> PLANS__, which I have included below. This is purely my own opinion, but I
>>>> think it reflects conversations I've had w/ both Robin and Sebastian at
>>>> Berlin Buzzwords. I'm also interested in opinions on my proposed
>>>> deprecation plan (which I haven't discussed with anyone), which is put
>>>> forth in the 1.0 plans below.
>>>>
>>>> -------------------------- DRAFT -------------------------
>>>> FUTURE PLANS
>>>>
>>>> 0.9
>>>>
>>>> As the project moves towards a 1.0 release, the community is working to
>>>> clean up and/or remove parts of the code base that are under-supported or
>>>> that underperform, as well as to better focus energy and contributions
>>>> on key algorithms that are proven to scale in production and have seen
>>>> widespread adoption. To this end, the project is planning to remove
>>>> support for the following algorithms in the next release unless they
>>>> receive sustained support and improvement before then.
>>>>
>>>> The algorithms to be removed are:
>>>> - From Clustering:
>>>>     Dirichlet
>>>>     MeanShift
>>>>     MinHash
>>>> - From Classification (both are sequential implementations):
>>>>     Winnow
>>>>     Perceptron
>>>> - Frequent Pattern Mining
>>>> - Collaborative Filtering
>>>>     GSI: DO ANY GO HERE?
>>>> - Other
>>>>     GSI: ANYTHING?
>>>>
>>>> If you are interested in supporting one or more of these algorithms, please
>>>> make it known on dev@mahout.apache.org and via JIRA issues that fix
>>>> and/or improve them. Please also provide supporting evidence of their
>>>> effectiveness for you in production.
>>>>
>>>> 1.0 PLANS
>>>>
>>>> Our plan as a community is to focus 0.9 on bug cleanup and the
>>>> removal of the code mentioned above, and then to follow with a 1.0 release
>>>> soon thereafter. At that point, the community commits to supporting the
>>>> algorithms packaged in 1.0 for at least two minor versions after their
>>>> release. In the case of removal, we will deprecate the functionality
>>>> in the 1.(x+1) minor release and remove it in the 1.(x+2) release. For
>>>> instance, if feature X is to be removed after the 1.2 release, it will be
>>>> deprecated in 1.3 and removed in 1.4.
>>>>
>>>> ------------------- DRAFT ----------------------
>>>>
>>>> -Grant
>>>
>>> --
>>> Thanks.
>>
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>
--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com