Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion
Hi Grant,

Very good release announcement. I propose that we deprecate a lot more; I think we should be aggressive here to pave the way for a clean and slim 1.0 release. I propose to additionally deprecate the following algorithms, as, to my knowledge, they are not actively used:

Collaborative Filtering:
- all recommenders in o.a.m.cf.taste.impl.recommender.knn
- the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
- the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and o.a.m.cf.taste.impl.recommender.slopeone
- the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Classification:
- the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

Clustering:
- Fuzzy k-Means in o.a.m.clustering.fuzzykmeans
- Spectral k-Means in o.a.m.clustering.spectral

Math:
- the tooling in o.a.m.math.stats.entropy

Furthermore, I think we should deprecate the Lanczos implementation in o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

To all users and other committers: this is a biased first proposal, so please shout if you see things differently and want to have things kept.

Best,
Sebastian

On 08.06.2013 16:42, Grant Ingersoll wrote:
More tests are always welcome.

On Jun 8, 2013, at 10:29 AM, Ravi Mummulla ravi.mummu...@gmail.com wrote:
Hi Grant,
Regarding the 1.0 plans, do we also want to include a note on adding tests where they don't exist, or improving them where needed, or is that implicit?
Thanks.

On Sat, Jun 8, 2013 at 3:55 AM, Grant Ingersoll gsing...@apache.org wrote:
Hi Mahouts,

A full copy of the proposed draft release notes is up at https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8. Please add/edit as appropriate. IN PARTICULAR, PLEASE PAY CLOSE ATTENTION TO THE SECTION LABELLED __FUTURE PLANS__, which I have included below. This is purely my own opinion, but I think it reflects conversations I've had w/ both Robin and Sebastian at Berlin Buzzwords.
I'm also interested in opinions on my proposed deprecation plan (which I haven't discussed with anyone), which is put forth in the 1.0 plans below.

-- DRAFT -
FUTURE PLANS

0.9

As the project moves towards a 1.0 release, the community is working to clean up and/or remove parts of the code base that are under-supported or that underperform, as well as to better focus energy and contributions on key algorithms that are proven to scale in production and have seen widespread adoption. To this end, in the next release, the project is planning on removing support for the following algorithms unless there is sustained support and improvement of them before the next release. The algorithms to be removed are:

- From Clustering: Dirichlet, MeanShift, MinHash
- From Classification (both are sequential implementations): Winnow, Perceptron
- Frequent Pattern Mining
- Collaborative Filtering
  GSI: DO ANY GO HERE?
- Other
  GSI: ANYTHING?

If you are interested in supporting one or more of these algorithms, please make it known on d...@mahout.apache.org and via JIRA issues that fix and/or improve them. Please also provide supporting evidence as to their effectiveness for you in production.

1.0 PLANS

Our plans as a community are to focus 0.9 on bug cleanup and the removal of the code mentioned above, and then to follow with a 1.0 release soon thereafter, at which point the community is committing to support the algorithms packaged in 1.0 for at least two minor versions after their release. In the case of removal, we will deprecate the functionality in the 1.(x+1) minor release and remove it in the 1.(x+2) release. For instance, if feature X is to be removed after the 1.2 release, it will be deprecated in 1.3 and removed in 1.4.

--- DRAFT --

-Grant

--
Thanks.
Grant Ingersoll | @gsingers
http://www.lucidworks.com
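The removal rule in the 1.0 plans above is plain version arithmetic: a feature slated for removal after release 1.x is deprecated in 1.(x+1) and removed in 1.(x+2). The minimal Python sketch below encodes that rule, assuming release numbers of the form "1.x"; `removal_schedule` is a hypothetical helper for illustration only, not part of Mahout.

```python
def removal_schedule(last_supported: str) -> tuple[str, str]:
    """Return (deprecated_in, removed_in) for a feature dropped after `last_supported`.

    Hypothetical helper encoding the draft policy: deprecate in 1.(x+1),
    remove in 1.(x+2). Only "major.minor" strings for the 1.x line are accepted.
    """
    major, minor = (int(part) for part in last_supported.split("."))
    if major != 1:
        raise ValueError("the draft policy only covers 1.x releases")
    return f"1.{minor + 1}", f"1.{minor + 2}"


# The draft's own example: feature X is to be removed after the 1.2 release.
print(removal_schedule("1.2"))  # ('1.3', '1.4')
```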
Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion
On Jun 8, 2013, at 1:26 PM, Sebastian Schelter s...@apache.org wrote:
Hi Grant, Very good release announcement. I propose that we deprecate a lot more; I think we should be aggressive here to pave the way for a clean and slim 1.0 release. I propose to additionally deprecate the following algorithms, as, to my knowledge, they are not actively used:

Collaborative Filtering:
- all recommenders in o.a.m.cf.taste.impl.recommender.knn
- the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
- the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and o.a.m.cf.taste.impl.recommender.slopeone
- the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Pseudo is useful, no? Don't know about the others.

Classification:
- the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

We have some parallel training stuff coming, so I'd say -1 here, as I think HMMs are pretty important, no?

Clustering:
- Fuzzy k-Means in o.a.m.clustering.fuzzykmeans
- Spectral k-Means in o.a.m.clustering.spectral

-1 on dropping spectral, as that seems to receive decent traction. Not sure about Fuzzy, as I think it is a pretty trivial extension of k-Means.

Math:
- the tooling in o.a.m.math.stats.entropy

Furthermore, I think we should deprecate the Lanczos implementation in o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

No opinion.

+1 on everything else.

Grant Ingersoll | @gsingers
http://www.lucidworks.com
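Grant's remark that Fuzzy k-Means is a fairly trivial extension of k-Means can be made concrete by comparing the assignment steps. The Python sketch below is a generic textbook formulation, not Mahout's o.a.m.clustering.fuzzykmeans code: k-Means gives each point wholly to its nearest centroid, while Fuzzy k-Means gives it a membership u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)) in every cluster, where d_j is the distance to centroid j and m > 1 is a fuzziness exponent (the default m = 2 is an assumption). Only the assignment step is shown; in the usual formulation, the centroid update then weights each point by u_j^m instead of counting it once.

```python
import math


def hard_assignment(point, centroids):
    """k-Means: index of the nearest centroid (membership 1 there, 0 elsewhere)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy k-Means: soft membership of `point` in every cluster; values sum to 1.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), with fuzziness exponent m > 1.
    """
    if m <= 1.0:
        raise ValueError("fuzziness exponent m must be > 1")
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster
    # (this also avoids dividing by zero below).
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(dists))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]


point = (1.0, 0.0)
centroids = [(0.0, 0.0), (3.0, 0.0)]
print(hard_assignment(point, centroids))  # 0
print([round(u, 3) for u in fuzzy_memberships(point, centroids)])  # [0.8, 0.2]
```

With distances 1 and 2, k-Means puts the point entirely in the first cluster, while the fuzzy variant with m = 2 splits it 0.8 / 0.2; as m approaches 1 the memberships approach the hard k-Means assignment.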
Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion
I agree with deprecating all of that FWIW.

On Sat, Jun 8, 2013 at 6:33 PM, Grant Ingersoll gsing...@apache.org wrote:
Collaborative Filtering:
- all recommenders in o.a.m.cf.taste.impl.recommender.knn
- the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
- the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and o.a.m.cf.taste.impl.recommender.slopeone
- the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Pseudo is useful, no? Don't know about the others.
Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion
Under Release Highlights, please also add:
a) Dan's Streaming kmeans clustering.
b) Mahout upgrade for Lucene 4.3.0 compatibility.
(Both of the above deserve special mention, along with lucene2seq and the vector/matrix performance improvements.)

From: Grant Ingersoll gsing...@apache.org
To: d...@mahout.apache.org; s...@apache.org
Cc: user@mahout.apache.org
Sent: Saturday, June 8, 2013 1:33 PM
Subject: Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion