Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

2013-06-08 Thread Sebastian Schelter
Hi Grant,

Very good release announcement. I propose that we deprecate a lot more;
I think we should be aggressive here to pave the way for a clean and
slim 1.0 release.

I propose to additionally deprecate the following algorithms, as, to my
knowledge, they are not actively used:

Collaborative Filtering:

- all recommenders in o.a.m.cf.taste.impl.recommender.knn

- the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender

- the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
o.a.m.cf.taste.impl.recommender.slopeone

- the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Classification:

- the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

Clustering:

- Fuzzy k-Means in o.a.m.clustering.fuzzykmeans
- Spectral k-Means in o.a.m.clustering.spectral

Math:

- the tooling in o.a.m.math.stats.entropy

Furthermore, I think we should deprecate the Lanczos implementation in
o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

To all users and other committers: this is a biased first proposal.
Please shout if you see things differently and want to have things kept.

Best,
Sebastian


On 08.06.2013 16:42, Grant Ingersoll wrote:
 More tests are always welcome.
 
 On Jun 8, 2013, at 10:29 AM, Ravi Mummulla ravi.mummu...@gmail.com wrote:
 
 Hi Grant,
 Regarding 1.0 plans, do we also want to include a note on adding tests
 where they don't exist or improving them where needed or is that implicit?

 Thanks.


 On Sat, Jun 8, 2013 at 3:55 AM, Grant Ingersoll gsing...@apache.org wrote:

 Hi Mahouts,

 A full copy of the proposed draft release notes is up at
 https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8.  Please
 add/edit as appropriate.

 IN PARTICULAR, PLEASE PAY CLOSE ATTENTION TO THE SECTION LABELLED __FUTURE
 PLANS__, which I have included below.  This is purely my own opinion, but I
 think it reflects conversations I've had w/ both Robin and Sebastian at
 Berlin Buzzwords.   I'm also interested in opinions on my proposed
 deprecation plan (which I haven't discussed with anyone) which is put forth
 in the 1.0 plans below.

 --- DRAFT ---
 FUTURE PLANS

 0.9

 As the project moves towards a 1.0 release, the community is working to
 clean up and/or remove parts of the code base that are under-supported or
 that underperform as well as to better focus the energy and contributions
 on key algorithms that are proven to scale in production and have seen
 widespread adoption.  To this end, in the next release, the project is
 planning on removing support for the following algorithms unless there is
 sustained support and improvement of them before the next release.

 The algorithms to be removed are:
 - From Clustering:
   Dirichlet
   MeanShift
   MinHash
 - From Classification (both are sequential implementations)
   Winnow
   Perceptron
 - Frequent Pattern Mining
 - Collaborative Filtering
   GSI: DO ANY GO HERE?
 - Other
   GSI: ANYTHING?

 If you are interested in supporting one or more of these algorithms, please
 make it known on d...@mahout.apache.org and via JIRA issues that fix
 and/or improve them.  Please also provide supporting evidence as to their
 effectiveness for you in production.

 1.0 PLANS

 Our plans as a community are to focus 0.9 on cleanup of bugs and the
 removal of the code mentioned above and then to follow with a 1.0 release
 soon thereafter, at which point the community is committing to the support
 of the algorithms packaged in 1.0 for at least two minor versions after
 their release.  In the case of removal, we will deprecate the functionality
 in the 1.(x+1) minor release and remove it in the 1.(x+2) release.  For
 instance, if feature X is to be removed after the 1.2 release, it will be
 deprecated in 1.3 and removed in 1.4.

 --- DRAFT ---

 -Grant




 -- 
 Thanks.
 
 
 Grant Ingersoll | @gsingers
 http://www.lucidworks.com
 
 
 
 
 
 



Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

2013-06-08 Thread Grant Ingersoll

On Jun 8, 2013, at 1:26 PM, Sebastian Schelter s...@apache.org wrote:

 Hi Grant,
 
 Very good release announcement. I propose that we deprecate a lot more;
 I think we should be aggressive here to pave the way for a clean and
 slim 1.0 release.
 
 I propose to additionally deprecate the following algorithms, as, to my
 knowledge, they are not actively used:
 
 Collaborative Filtering:
 
 - all recommenders in o.a.m.cf.taste.impl.recommender.knn
 
 - the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender
 
 - the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
 o.a.m.cf.taste.impl.recommender.slopeone
 
 - the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

Pseudo is useful, no?  Don't know about the others.

 
 Classification:
 
 - the Hidden Markov Models in o.a.m.classifier.sequencelearning.hmm

We have some parallel training stuff coming, so I'd say -1 here, as I think 
HMMs are pretty important, no?

 
 Clustering:
 
 - Fuzzy k-Means in o.a.m.clustering.fuzzykmeans
 - Spectral k-Means in o.a.m.clustering.spectral

-1 on spectral being dropped as that seems to receive decent traction.

Not sure on Fuzzy, as I think it is a pretty trivial extension of K-Means.

 
 Math:
 
 - the tooling in o.a.m.math.stats.entropy
 
 Furthermore, I think we should deprecate the Lanczos implementation in
 o.a.m.math.hadoop.decomposer and port all code that uses it to SSVD.

No opinion.

+1 on everything else.

 
 To all users and other committers: this is a biased first proposal.
 Please shout if you see things differently and want to have things kept.
 
 Best,
 Sebastian
 
 


Grant Ingersoll | @gsingers
http://www.lucidworks.com







Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

2013-06-08 Thread Sean Owen
I agree with deprecating all of that FWIW.

On Sat, Jun 8, 2013 at 6:33 PM, Grant Ingersoll gsing...@apache.org wrote:
 Collaborative Filtering:

 - all recommenders in o.a.m.cf.taste.impl.recommender.knn

 - the TreeClusteringRecommender in o.a.m.cf.taste.impl.recommender

 - the SlopeOne implementations in o.a.m.cf.taste.hadoop.slopeone and
 o.a.m.cf.taste.impl.recommender.slopeone

 - the distributed pseudo recommender in o.a.m.cf.taste.hadoop.pseudo

 Pseudo is useful, no?  Don't know about the others.


Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion

2013-06-08 Thread Suneel Marthi
Under Release Highlights, please also add:

a) Dan's Streaming k-Means clustering.
b) Mahout upgrade to be compatible with Lucene 4.3.0


(both of the above deserve special mentions along with lucene2seq and 
vector/matrix performance improvements).




 From: Grant Ingersoll gsing...@apache.org
To: d...@mahout.apache.org; s...@apache.org 
Cc: user@mahout.apache.org 
Sent: Saturday, June 8, 2013 1:33 PM
Subject: Re: [DRAFT] 0.8 Release Announcement + Future Plans Discussion
 

