[jira] [Created] (MAHOUT-1515) Contact the original Frequent Pattern Mining author

2014-04-14 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1515:
--

 Summary: Contact the original Frequent Pattern Mining author
 Key: MAHOUT-1515
 URL: https://issues.apache.org/jira/browse/MAHOUT-1515
 Project: Mahout
  Issue Type: Task
Reporter: Sebastian Schelter
Priority: Critical
 Fix For: 1.0


We should contact the original FPM author to ask about maintenance of the 
algorithm. Otherwise this becomes a candidate for removal.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAHOUT-1514) Contact the original Random Forest author

2014-04-14 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1514:
--

 Summary: Contact the original Random Forest author
 Key: MAHOUT-1514
 URL: https://issues.apache.org/jira/browse/MAHOUT-1514
 Project: Mahout
  Issue Type: Task
Reporter: Sebastian Schelter
Priority: Critical
 Fix For: 1.0


We should contact the original Random Forest author to ask about maintenance of 
the implementation. Otherwise, this becomes a candidate for removal.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAHOUT-1513) Deprecate Canopy Clustering

2014-04-14 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1513:
--

 Summary: Deprecate Canopy Clustering
 Key: MAHOUT-1513
 URL: https://issues.apache.org/jira/browse/MAHOUT-1513
 Project: Mahout
  Issue Type: Task
Reporter: Sebastian Schelter
 Fix For: 1.0


citing [~smarthi] "I meant to deprecate first (and eventually remove) Canopy 
clustering. This is in line with the conversation I had with Ted and Frank at 
AMS about weaning users away from the old style Canopy->KMeans clustering to 
start using Streaming KMeans. No point in keeping Canopy once users switch to 
using Streaming KMeans."
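
For users making that switch, Streaming KMeans already has a command-line 
driver. A hedged sketch (paths and cluster counts are placeholders, and exact 
flags may vary by version; -k is the number of clusters, -km the estimated 
number of map-side clusters):

$ mahout streamingkmeans -i /path/to/input/vectors -o /path/to/output \
    -k 20 -km 200 -ow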



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAHOUT-1512) Hadoop 2 compatibility

2014-04-14 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1512:
--

 Summary: Hadoop 2 compatibility
 Key: MAHOUT-1512
 URL: https://issues.apache.org/jira/browse/MAHOUT-1512
 Project: Mahout
  Issue Type: Task
Reporter: Sebastian Schelter
Priority: Critical
 Fix For: 1.0


We must ensure that all our MR code also runs on Hadoop 2. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAHOUT-1511) Renaming core to mrlegacy

2014-04-14 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1511:
--

 Summary: Renaming core to mrlegacy
 Key: MAHOUT-1511
 URL: https://issues.apache.org/jira/browse/MAHOUT-1511
 Project: Mahout
  Issue Type: Task
Reporter: Sebastian Schelter
 Fix For: 1.0


Rename the core module to mrlegacy to reflect that we still maintain this code 
but do not add new MR algorithms. We should aim to gradually pull the items we 
really need out of this module.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAHOUT-1510) Goodbye MapReduce

2014-04-14 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1510:
--

 Summary: Goodbye MapReduce
 Key: MAHOUT-1510
 URL: https://issues.apache.org/jira/browse/MAHOUT-1510
 Project: Mahout
  Issue Type: Task
  Components: Documentation
Reporter: Sebastian Schelter
 Fix For: 1.0


We should prominently state on the website that we reject any future MR 
algorithm contributions (but still maintain and bugfix what we have so far).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Tackling the "legacy dilemma"

2014-04-14 Thread Sebastian Schelter

Hi,

From reading the thread, I have the impression that we agree on the 
following actions:


 * reject any future MR algorithm contributions, and prominently state this
on the website and in talks
 * make all existing algorithm code compatible with Hadoop 2; if no one is 
willing to make an existing algorithm compatible, remove the algorithm
 * deprecate Canopy clustering
 * email the original FPM and Random Forest authors to ask whether they 
will maintain the algorithms
 * rename core to "mr-legacy" (and gradually pull the items we really need 
out of it later)


I will create JIRA tickets for those action points. I think the biggest 
challenge here is the Hadoop 2 compatibility. Is someone volunteering to 
drive that? That would be awesome.


Best,
Sebastian


On 04/13/2014 07:19 PM, Andrew Musselman wrote:

This is a good summary of how I feel too.


On Apr 13, 2014, at 10:15 AM, Sebastian Schelter  wrote:

Unfortunately, it's not that easy to get enough volunteer work. I issued the 
third call for working on the documentation today, as there are still lots of 
open issues. That's why I'm trying to suggest a move that involves as little 
work as possible.

We should get the MR codebase into a state that we all can live with and then 
focus on new stuff like the scala DSL.

--sebastian





On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
The best thing would be to make a plan and see how much effort it needs. Then 
find volunteers to accomplish the task. I'm quite sure there are a lot of 
people out there who are willing to help out.

BR,
deneb.


2014-04-13 18:45 GMT+02:00 Sebastian Schelter :


Hi,

I took some days to let the latest discussion about the state and future
of Mahout go through my head. I think the most important thing to address
right now is the MapReduce "legacy" codebase. A lot of the MR algorithms
are currently unmaintained, documentation is outdated and the original
authors have abandoned Mahout. For some algorithms it is hard to even get 
questions answered on the mailing list (e.g. Random Forest). I agree with 
Sean's comments that letting the code linger around is not an option and will 
continue to harm Mahout.

In the previous discussion, I suggested making a radical move and aiming to 
delete this codebase, but there were serious objections from committers and 
users that convinced me that there is still usage of, and interest in, that 
codebase.

That puts us into a "legacy dilemma". We cannot delete the code without
harming our userbase. On the other hand, I don't see anyone willing to
rework the codebase. Furthermore, the code cannot keep lingering around as it 
does now, especially when we fail to answer questions or don't provide 
documentation.

*We have to make a move*!

I suggest the following actions with regard to the MR codebase. I hope that 
they find consensus. If there are objections, please give alternatives; 
*keeping everything as-is is not an option*:

  * reject any future MR algorithm contributions, and prominently state this on
the website and in talks
  * make all existing algorithm code compatible with Hadoop 2; if no one is
willing to make an existing algorithm compatible, remove the
algorithm
  * deprecate the existing MR algorithms, yet still take bug-fix
contributions
  * remove Random Forest, as we cannot even answer questions about the
implementation on the mailing list

There are two more actions that I would like to see, but I'd be willing to
give them up if there are objections:

  * move the MR algorithms into a separate maven module
  * remove Frequent Pattern Mining again (we already aimed for that in 0.9
but had one user who shouted but never returned to us)

Let me know what you think.

--sebastian






[jira] [Commented] (MAHOUT-1504) Enable/fix thetaSummer job in TrainNaiveBayesJob

2014-04-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969223#comment-13969223
 ] 

Hudson commented on MAHOUT-1504:


SUCCESS: Integrated in Mahout-Quality #2569 (See 
[https://builds.apache.org/job/Mahout-Quality/2569/])
MAHOUT-1504: Enable/fix thetaSummer job in TrainNaiveBayesJob (smarthi: rev 
1587393)
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/BayesUtils.java
MAHOUT-1504: Enable/fix thetaSummer job in TrainNaiveBayesJob (smarthi: rev 
1587390)
* /mahout/trunk/CHANGELOG
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/BayesUtils.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/ComplementaryNaiveBayesClassifier.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/NaiveBayesModel.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/StandardNaiveBayesClassifier.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/AbstractThetaTrainer.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/ComplementaryThetaTrainer.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/StandardThetaTrainer.java
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java
* 
/mahout/trunk/core/src/test/java/org/apache/mahout/classifier/naivebayes/NaiveBayesTestBase.java


> Enable/fix thetaSummer job in TrainNaiveBayesJob
> 
>
> Key: MAHOUT-1504
> URL: https://issues.apache.org/jira/browse/MAHOUT-1504
> Project: Mahout
>  Issue Type: Task
>  Components: Classification, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1504.patch
>
>
> A new implementation of Naive Bayes was introduced in 0.7. The weight (theta) 
> normalization job was at least partially carried over but not fully 
> implemented or enabled. Weight normalization does not affect simple NB or 
> CNB; however, enabling it will allow for all NB implementations in the Rennie 
> et al. paper. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1503) TestNaiveBayesDriver fails in sequential mode

2014-04-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969224#comment-13969224
 ] 

Hudson commented on MAHOUT-1503:


SUCCESS: Integrated in Mahout-Quality #2569 (See 
[https://builds.apache.org/job/Mahout-Quality/2569/])
MAHOUT-1503: TestNaiveBayesDriver fails in sequential mode (smarthi: rev 
1587387)
* /mahout/trunk/CHANGELOG
* 
/mahout/trunk/core/src/main/java/org/apache/mahout/classifier/naivebayes/test/TestNaiveBayesDriver.java


> TestNaiveBayesDriver fails in sequential mode
> -
>
> Key: MAHOUT-1503
> URL: https://issues.apache.org/jira/browse/MAHOUT-1503
> Project: Mahout
>  Issue Type: Bug
>  Components: Classification, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1503.patch
>
>
> As reported by Chandler Burgess, testnb fails in sequential mode with 
> exception:
> Exception in thread "main" java.io.FileNotFoundException: 
> /tmp/mahout-work-andy/20news-train-vectors (Is a directory)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.<init>(FileInputStream.java:120)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
> {...} at 
> org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:99)
> {...}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1509) Invalid URL in link from "quick start/basics" page

2014-04-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969222#comment-13969222
 ] 

Hudson commented on MAHOUT-1509:


SUCCESS: Integrated in Mahout-Quality #2569 (See 
[https://builds.apache.org/job/Mahout-Quality/2569/])
MAHOUT-1509:Invalid URL in link from "quick start/basics" page (smarthi: rev 
1587383)
* /mahout/trunk/CHANGELOG


> Invalid URL in link from "quick start/basics" page
> --
>
> Key: MAHOUT-1509
> URL: https://issues.apache.org/jira/browse/MAHOUT-1509
> Project: Mahout
>  Issue Type: Documentation
>  Components: Examples
> Environment: Website
>Reporter: Nick Martin
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: documentation
> Fix For: 1.0
>
>
> From https://mahout.apache.org/users/basics/quickstart.html the "Dos and 
> Don'ts" link under "Recommendations" goes to nowhere (URL typo - 
> "ecommender") 
> https://mahout.apache.org/users/recommender/ecommender-first-timer-faq.html 
> Can't remember who's running point on the URL updates or I'd [at] them...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAHOUT-1445) Create an intro for item based recommender

2014-04-14 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-1445:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Added to the website at 
https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html . 
Thanks for the contribution!

> Create an intro for item based recommender
> --
>
> Key: MAHOUT-1445
> URL: https://issues.apache.org/jira/browse/MAHOUT-1445
> Project: Mahout
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 1.0
>Reporter: Maciej Mazur
>  Labels: documentation, recommender
> Fix For: 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1477) Clean up website on Logistic Regression

2014-04-14 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969203#comment-13969203
 ] 

Sebastian Schelter commented on MAHOUT-1477:


[~nimartin] I think you can go ahead and take this one, as there hasn't been 
activity for three weeks from [~kanjilal]

> Clean up website on Logistic Regression
> ---
>
> Key: MAHOUT-1477
> URL: https://issues.apache.org/jira/browse/MAHOUT-1477
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Sebastian Schelter
> Fix For: 1.0
>
>
> The website on Logistic Regression needs a cleanup. We need to go through the 
> text, remove dead links and check whether the information is still consistent 
> with the current code. We should also link to the example created in 
> MAHOUT-1425 
> https://mahout.apache.org/users/classification/logistic-regression.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1439) Update talks on Mahout

2014-04-14 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969186#comment-13969186
 ] 

Ted Dunning commented on MAHOUT-1439:
-

@nimartin

That would be SOOO helpful.



> Update talks on Mahout
> --
>
> Key: MAHOUT-1439
> URL: https://issues.apache.org/jira/browse/MAHOUT-1439
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Sebastian Schelter
> Fix For: 1.0
>
>
> The talks listed on our homepage seem to end somewhere in 2012.
> I know that there have been tons of other talks on Mahout since then; I've 
> added mine already. It would be great if everybody who knows of additional 
> talks would paste them here, so I can add them to the website.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MAHOUT-1369) Why is theta normalization for naive bayes classification commented out?

2014-04-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1369:
-

Assignee: Suneel Marthi

> Why is theta normalization for naive bayes classification commented out?
> --
>
> Key: MAHOUT-1369
> URL: https://issues.apache.org/jira/browse/MAHOUT-1369
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
> Environment: mahout 0.8
>Reporter: utku yaman
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: features
> Fix For: 1.0
>
>
> TrainNaiveBayesJob lines 155-158 and BayesUtils lines 86-93 are commented 
> out; these lines perform the theta normalization for Bayes. What is the 
> problem with the code, and is there a plan to fix these methods?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAHOUT-1369) Why is theta normalization for naive bayes classification commented out?

2014-04-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1369.
---

Resolution: Fixed

Resolved by the fix for MAHOUT-1504

> Why is theta normalization for naive bayes classification commented out?
> --
>
> Key: MAHOUT-1369
> URL: https://issues.apache.org/jira/browse/MAHOUT-1369
> Project: Mahout
>  Issue Type: Question
>  Components: Classification
>Affects Versions: 0.7, 0.8, 0.9
> Environment: mahout 0.8
>Reporter: utku yaman
>Priority: Minor
>  Labels: features
> Fix For: 1.0
>
>
> TrainNaiveBayesJob lines 155-158 and BayesUtils lines 86-93 are commented 
> out; these lines perform the theta normalization for Bayes. What is the 
> problem with the code, and is there a plan to fix these methods?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAHOUT-1504) Enable/fix thetaSummer job in TrainNaiveBayesJob

2014-04-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1504:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk, thanks again.

> Enable/fix thetaSummer job in TrainNaiveBayesJob
> 
>
> Key: MAHOUT-1504
> URL: https://issues.apache.org/jira/browse/MAHOUT-1504
> Project: Mahout
>  Issue Type: Task
>  Components: Classification, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1504.patch
>
>
> A new implementation of Naive Bayes was introduced in 0.7. The weight (theta) 
> normalization job was at least partially carried over but not fully 
> implemented or enabled. Weight normalization does not affect simple NB or 
> CNB; however, enabling it will allow for all NB implementations in the Rennie 
> et al. paper. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MAHOUT-1504) Enable/fix thetaSummer job in TrainNaiveBayesJob

2014-04-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1504:
-

Assignee: Suneel Marthi

> Enable/fix thetaSummer job in TrainNaiveBayesJob
> 
>
> Key: MAHOUT-1504
> URL: https://issues.apache.org/jira/browse/MAHOUT-1504
> Project: Mahout
>  Issue Type: Task
>  Components: Classification, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1504.patch
>
>
> A new implementation of Naive Bayes was introduced in 0.7. The weight (theta) 
> normalization job was at least partially carried over but not fully 
> implemented or enabled. Weight normalization does not affect simple NB or 
> CNB; however, enabling it will allow for all NB implementations in the Rennie 
> et al. paper. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAHOUT-1503) TestNaiveBayesDriver fails in sequential mode

2014-04-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1503:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed the patch with minor changes. Need to add a unit test to ensure 
adequate test coverage.
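
For context, the sequential-mode failure came from opening the training-vectors 
directory as if it were a single file. The snippet below is only a hedged 
sketch of the general shape of such a fix (not the committed patch; the class 
name is made up): iterate the part files under the directory with Mahout's 
SequenceFileDirIterable instead of opening the directory directly.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.mahout.common.Pair;
import org.apache.mahout.common.iterator.sequencefile.PathFilters;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirIterable;
import org.apache.mahout.math.VectorWritable;

public class SequentialTestSketch {
  public static void main(String[] args) {
    // Iterate every part file under the directory instead of opening the
    // directory itself, which is what throws the FileNotFoundException.
    Path inputDir = new Path("/tmp/mahout-work-andy/20news-train-vectors");
    Configuration conf = new Configuration();
    for (Pair<Text, VectorWritable> record :
        new SequenceFileDirIterable<Text, VectorWritable>(
            inputDir, PathType.LIST, PathFilters.partFilter(), conf)) {
      // score record.getSecond().get() against the model here
    }
  }
}

PathFilters.partFilter() restricts the iteration to part-* files, skipping 
_SUCCESS and similar markers.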

> TestNaiveBayesDriver fails in sequential mode
> -
>
> Key: MAHOUT-1503
> URL: https://issues.apache.org/jira/browse/MAHOUT-1503
> Project: Mahout
>  Issue Type: Bug
>  Components: Classification, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1503.patch
>
>
> As reported by Chandler Burgess, testnb fails in sequential mode with 
> exception:
> Exception in thread "main" java.io.FileNotFoundException: 
> /tmp/mahout-work-andy/20news-train-vectors (Is a directory)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.<init>(FileInputStream.java:120)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
> {...} at 
> org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:99)
> {...}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1439) Update talks on Mahout

2014-04-14 Thread Nick Martin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969142#comment-13969142
 ] 

Nick Martin commented on MAHOUT-1439:
-

[~tdunning] I have some time this week to work on doc cleanup...if it would 
help I can scan your public slideshare and comment back a list of talks/topics 
and dates. Might save you some cycles on aggregating the talk timeline if 
you're tied up.

> Update talks on Mahout
> --
>
> Key: MAHOUT-1439
> URL: https://issues.apache.org/jira/browse/MAHOUT-1439
> Project: Mahout
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Sebastian Schelter
> Fix For: 1.0
>
>
> The talks listed on our homepage seem to end somewhere in 2012.
> I know that there have been tons of other talks on Mahout since then; I've 
> added mine already. It would be great if everybody who knows of additional 
> talks would paste them here, so I can add them to the website.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1477) Clean up website on Logistic Regression

2014-04-14 Thread Nick Martin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969137#comment-13969137
 ] 

Nick Martin commented on MAHOUT-1477:
-

[~kanjilal] hey there - have you had a chance to start this yet? If not, I have 
some time this week and can probably knock it out, but I don't want to step on 
your work if you've started something. Let me know ASAP so I know whether to 
start or not. Thx.

> Clean up website on Logistic Regression
> ---
>
> Key: MAHOUT-1477
> URL: https://issues.apache.org/jira/browse/MAHOUT-1477
> Project: Mahout
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Sebastian Schelter
> Fix For: 1.0
>
>
> The website on Logistic Regression needs a cleanup. We need to go through the 
> text, remove dead links and check whether the information is still consistent 
> with the current code. We should also link to the example created in 
> MAHOUT-1425 
> https://mahout.apache.org/users/classification/logistic-regression.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MAHOUT-1509) Invalid URL in link from "quick start/basics" page

2014-04-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned MAHOUT-1509:
-

Assignee: Suneel Marthi

> Invalid URL in link from "quick start/basics" page
> --
>
> Key: MAHOUT-1509
> URL: https://issues.apache.org/jira/browse/MAHOUT-1509
> Project: Mahout
>  Issue Type: Documentation
>  Components: Examples
> Environment: Website
>Reporter: Nick Martin
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: documentation
> Fix For: 1.0
>
>
> From https://mahout.apache.org/users/basics/quickstart.html the "Dos and 
> Don'ts" link under "Recommendations" goes to nowhere (URL typo - 
> "ecommender") 
> https://mahout.apache.org/users/recommender/ecommender-first-timer-faq.html 
> Can't remember who's running point on the URL updates or I'd [at] them...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAHOUT-1509) Invalid URL in link from "quick start/basics" page

2014-04-14 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1509.
---

   Resolution: Fixed
Fix Version/s: 1.0

Thanks for pointing this out; fixed the bad link.

> Invalid URL in link from "quick start/basics" page
> --
>
> Key: MAHOUT-1509
> URL: https://issues.apache.org/jira/browse/MAHOUT-1509
> Project: Mahout
>  Issue Type: Documentation
>  Components: Examples
> Environment: Website
>Reporter: Nick Martin
>Assignee: Suneel Marthi
>Priority: Minor
>  Labels: documentation
> Fix For: 1.0
>
>
> From https://mahout.apache.org/users/basics/quickstart.html the "Dos and 
> Don'ts" link under "Recommendations" goes to nowhere (URL typo - 
> "ecommender") 
> https://mahout.apache.org/users/recommender/ecommender-first-timer-faq.html 
> Can't remember who's running point on the URL updates or I'd [at] them...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAHOUT-1509) Invalid URL in link from "quick start/basics" page

2014-04-14 Thread Nick Martin (JIRA)
Nick Martin created MAHOUT-1509:
---

 Summary: Invalid URL in link from "quick start/basics" page
 Key: MAHOUT-1509
 URL: https://issues.apache.org/jira/browse/MAHOUT-1509
 Project: Mahout
  Issue Type: Documentation
  Components: Examples
 Environment: Website
Reporter: Nick Martin
Priority: Minor


From https://mahout.apache.org/users/basics/quickstart.html the "Dos and 
Don'ts" link under "Recommendations" goes to nowhere (URL typo - "ecommender") 
https://mahout.apache.org/users/recommender/ecommender-first-timer-faq.html 

Can't remember who's running point on the URL updates or I'd [at] them...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1445) Create an intro for item based recommender

2014-04-14 Thread Nick Martin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969112#comment-13969112
 ] 

Nick Martin commented on MAHOUT-1445:
-

Item Based Recommender
Introduction

Mahout’s item-based recommender is a flexible, easily implemented algorithm 
with a diverse range of applications. The minimal structure of the primary 
input file and the availability of ancillary filtering controls make sourcing 
the required data and shaping the desired output both efficient and 
straightforward.

Typical use cases include:
• Recommend products to customers via an eCommerce platform (think: Amazon, 
Netflix, Overstock)
• Identify organic sales opportunities
• Segment users/customers based on similar item preferences

Broadly speaking, Mahout's item-based recommendation algorithm takes as input 
customer preferences by item and generates an output recommending similar items 
with a score indicating the likelihood a customer will "like" the recommended 
item.

One of the strengths of the item-based recommender is its adaptability to your 
business conditions or research interests. For example, there are many 
available approaches for capturing product preference. One such method is to 
count the total orders for a given product for each customer (e.g. Acme 
Corp has ordered Widget-A 5,678 times), while others rely on user preference 
captured via the web (e.g. Jane Doe rated a movie as five stars, or gave a 
product two thumbs up).

Additionally, a variety of methodologies can be implemented to narrow the focus 
of Mahout's recommendations, such as:
• Exclude low-volume or low-profitability products from consideration
• Group customers by segment or market rather than using user/customer-level 
data
• Exclude zero-dollar transactions, returns or other order types
• Map product substitutions into the Mahout input (e.g. if WidgetA is a 
recommended item, replace it with WidgetX)

The item-based recommender output can be easily consumed by downstream 
applications (e.g. websites, ERP systems or sales-force automation tools) and 
is configurable, so users can determine the number of item recommendations 
generated by the algorithm.


Example

Testing the item-based recommender can be a simple and potentially quite 
rewarding endeavor. Whereas the typical sample use case for collaborative 
filtering focuses on utilization of, and integration with, eCommerce platforms, 
we can instead look at a potential use case applicable to most businesses (even 
those without a web presence). Let’s look at how a company might use Mahout’s 
item-based recommender to identify new sales opportunities for an existing 
customer base. First, you’ll need to get Mahout up and running; the 
instructions can be found at 
https://mahout.apache.org/users/basics/quickstart.html. After you've ensured 
Mahout is properly installed, we’re ready to run a quick example. 

Step 1: Gather some test data
Mahout’s item-based recommender relies on three key pieces of data: userID, 
itemID and preference. The “users” could be website visitors or simply 
customers who purchase products from your business. Similarly, items could be 
products, product groups or even pages on your website – really anything you 
would want to recommend to a group of users or customers. For our example, 
let’s use customer orders as a proxy for preference. A simple count of distinct 
orders by customer, by product, will work for this example. You’ll find, as you 
explore ways to manipulate the item-based recommender, that the preference 
value can be many things (page clicks, explicit ratings, order counts, etc.). 
Once your test data is gathered, put it in a .txt file with comma-separated 
values and no column headers. 
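
For illustration, a few rows of such an input file (userID,itemID,preference 
triples; all IDs and values here are made up) might look like:

1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.0
3,102,4.5
3,103,1.0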

Step 2: Pick a similarity measure
Choosing a similarity measure for use in a production environment is something 
that requires careful testing, evaluation and research. For our example 
purposes, we’ll just go with a Mahout similarity classname called 
“SIMILARITY_LOGLIKELIHOOD”.
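
For reference, other similarity classnames the distributed job accepts include 
SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION, 
SIMILARITY_EUCLIDEAN_DISTANCE, SIMILARITY_TANIMOTO_COEFFICIENT and 
SIMILARITY_CITY_BLOCK; running the job with --help should print the full list 
for your version.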

Step 3: Configure the Mahout command
Assuming your JAVA_HOME is appropriately set and Mahout was installed properly, 
we’re ready to configure our syntax. Enter the following command:

$ mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file 
-o /path/to/desired/output --numRecommendations 25 

Running the command will execute a series of jobs, the final product of which 
is an output file deposited in the directory specified in the command. The 
output file will contain two columns: the userID and an array of itemIDs and 
scores. 
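
Using the same made-up data as above, a single output line for user 3 might 
look roughly like this (the userID, a tab, then itemID:score pairs):

3  [101:4.8,104:3.9]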

Step 4: Making use of the output and doing more with Mahout
The output file generated in our simple example can be transformed using your 
tool of choice and consumed by downstream applications. A variety of 
configuration options exist for Mahout’s item-based recommender to accommodate 
custom bus

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969070#comment-13969070
 ] 

Pat Ferrel commented on MAHOUT-1464:


OK, maybe something is misconfigured in IDEA. Let's leave IDEA out of the loop.

If you could indulge me -- how do you run this from the CLI? 
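
For what it's worth, with Spark 0.9 standalone, one hedged way to launch a 
driver from the CLI (the master URL, jar location, and driver class below are 
placeholders, not the actual names in this patch) is:

$ ./bin/spark-class org.apache.spark.deploy.Client launch \
    spark://master:7077 hdfs://namenode/jars/cooccurrence-job.jar \
    com.example.CooccurrenceDriver

Alternatively, run the assembled jar as a plain Java application that sets the 
master URL itself.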

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAHOUT-1504) Enable/fix thetaSummer job in TrainNaiveBayesJob

2014-04-14 Thread Andrew Palumbo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Palumbo updated MAHOUT-1504:
---

Status: Patch Available  (was: Open)

> Enable/fix thetaSummer job in TrainNaiveBayesJob
> 
>
> Key: MAHOUT-1504
> URL: https://issues.apache.org/jira/browse/MAHOUT-1504
> Project: Mahout
>  Issue Type: Task
>  Components: Classification, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1504.patch
>
>
> A new implementation of Naive Bayes was introduced in 0.7. The weight (theta) 
> normalization job was at least partially carried over but not fully 
> implemented or enabled. Weight normalization does not affect simple NB or 
> CNB; however, enabling it will allow for all NB implementations in the Rennie 
> et al. paper. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAHOUT-1504) Enable/fix thetaSummer job in TrainNaiveBayesJob

2014-04-14 Thread Andrew Palumbo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Palumbo updated MAHOUT-1504:
---

Attachment: MAHOUT-1504.patch

This patch fixes the thetaSummer job bug. With this, CNB will run with weight 
normalization as per section 3.2 of the Rennie paper. I decided to keep it 
simple and just get the weight normalization working. This allows for the 
full algorithm as outlined in Table 4 of Rennie.
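
For reference, the weight normalization in that section of Rennie et al. (as I 
read it) rescales each class's log-weights by their L1 norm,

    w_ci <- w_ci / sum_k |w_ck|

so that classes with many heavily weighted features do not dominate the 
decision rule.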

Weight normalization is not needed for standard NB and the thetaSummer Job is 
just an added expense. Though the weight summations are all done, I've left the 
weight normalization step commented out in StandardNaiveBayesClassifier.

I am thinking maybe something like adding a -w option for weight normalization 
or only running the thetaSummer Job when the -c option is supplied might make 
sense (the former may unnecessarily complicate things).  Another (probably 
better) option would be to store the calculated weights in the model (during 
the training phase) so that they don't need to be recalculated when 
testing/classifying.  Probably questions for another JIRA.

Let me know if any changes are needed.

> Enable/fix thetaSummer job in TrainNaiveBayesJob
> 
>
> Key: MAHOUT-1504
> URL: https://issues.apache.org/jira/browse/MAHOUT-1504
> Project: Mahout
>  Issue Type: Task
>  Components: Classification, Examples
>Affects Versions: 0.9
>Reporter: Andrew Palumbo
>Priority: Minor
> Fix For: 1.0
>
> Attachments: MAHOUT-1504.patch
>
>
> A new implementation of Naive Bayes was introduced in 0.7. The weight (theta) 
> normalization job was at least partially carried over but not fully 
> implemented or enabled. Weight normalization does not affect simple NB or 
> CNB; however, enabling it will allow for all NB implementations in the Rennie 
> et al. paper. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968894#comment-13968894
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--

I have been dumping from Spark to HDFS, HBase, memory-mapped index
structures, you name it, for 2 years.

Pat, something is definitely going wrong there -- but not by design.





> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968885#comment-13968885
 ] 

Pat Ferrel commented on MAHOUT-1464:


Hmm, maybe I should ask if anyone has gotten this stuff to read/write to HDFS? 
I can get reads to work, but not writes.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Spark setup

2014-04-14 Thread Pat Ferrel
OK, thanks but this isn’t the question. All this is working fine with and 
without the cluster.


On Apr 14, 2014, at 11:26 AM, Saikat Kanjilal  wrote:

@Pat,
In regard to your question on JIRA, this is Dmitriy's email about running 
Mahout on Spark.

Sent from my iPhone

> On Apr 11, 2014, at 7:52 PM, "Andrew Musselman"  
> wrote:
> 
> We've used Mesos at a client to run both Hadoop and Spark jobs in the same
> setup.  It's been a good experience so far.
> 
> I haven't used YARN on any projects yet but it looks like you need to
> rebuild Spark to run on it currently:
> https://spark.apache.org/docs/0.9.0/running-on-yarn.html
> 
> Why not officially support Hadoop v2 and recommend YARN for that, as well
> as supporting Mesos?
> 
> Another question is how long we will support Hadoop v1.
> 
> 
>> On Fri, Apr 11, 2014 at 1:43 PM, Ted Dunning  wrote:
>> 
>> I am pretty sure that mesos supports both map reduce and spark.
>> 
>> In general, though, the biggest design consideration in which resource
>> manager to use is to comply with local standards and traditions.
>> 
>> For playing around, stand-alone spark is fine.
>> 
>> 
>> 
>> On Thu, Apr 10, 2014 at 4:29 PM, Dmitriy Lyubimov 
>> wrote:
>> 
 On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel 
>>> wrote:
>>> 
 Hmm, that leaves Spark and Hadoop to manage tasks independently. Not
>>> ideal
 if you are running both hadoop and spark jobs simultaneously.
>>> 
>>> I think the only resource manager that semi-officially supports both
>>> MapReduce and spark is Yarn. This sounds neat in theory, but in practice
>> i
>>> think one discovers too many hoops to jump thru. I am also inertly
>> dubious
>>> about quality and performance of Yarn compared to others.
>>> 
>>> 
 
 If you have a single user cluster or are running jobs in a pipeline I
 suppose you don't need Mesos.
 
 
 On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov 
>> wrote:
 
 On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel 
 wrote:
 
> What is the recommended Spark setup?
 
 Check out their docs. We don't have any special instructions for
>> mahout.
 
 The main point behind 0.9.0 release is that it now supports master HA
>>> thru
 zookeeper, so for that reason alone you probably don't want to use
>> mesos.
 
 You may want to use mesos to have pre-allocated workers per spark
>> session
 (so called "coarse grained" mode). if you shoot a lot of short-running
 queries (1sec or less), this is a significant win in QPS and response
>>> time.
 (fine grained mode will add about 3 seconds to start all the workers
>>> lazily
 to pipeline time).
 
 In our case we are dealing with stuff that runs over 3 seconds for most
 part, so assuming 0.9.0 HA is stable enough (which i haven't tried
>> yet),
 there's no reason for us to go mesos, multi-master standalone with
 zookeeper is good enough.
 
 
> 
> I imagine most of us will have HDFS configured (with either local
>> files
 or
> an actual cluster).
 
 Hadoop DFS API  is pretty much the only persistence api supported by
>>> Mahout
 Spark Bindings at this point. So yes, you would want to have hdfs-only
 cluster running 1.x or 2 doesn't matter. i use cdh 4 distros.
 
 
> Since most of Mahout is recommended to be run on Hadoop 1.x we should
>>> use
> Mesos? https://github.com/mesos/hadoop
> 
> This would mean we'd need to have at least Hadoop 1.2.1 (in mesos and
> current mahout pom). We'd use Mesos to manage hadoop and spark jobs
>> but
> HDFS would be controlled separately by hadoop itself.
 
 I think i addressed this. no we are not bound by the MR part of mahout
 since Spark runs on whatever. like i said, with 0.9.0 + Mahout combo i
 would forego mesos -- unless it turns out meaningfully faster or more
 stable.
 
 
 
> 
> Is this about right? Is there a setup doc I missed?
 
 
 i dont think one needed.
>> 



[jira] [Comment Edited] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968864#comment-13968864
 ] 

Pat Ferrel edited comment on MAHOUT-1464 at 4/14/14 9:28 PM:
-

I think IDEA forces some things to run local so it can keep track of threads or 
something. Seems to work correctly with Spark but not HDFS. There are ways to 
remote debug with it so it separates processes but I don't need you to help me 
with IDEA.

Seems easier to answer: How do I run this from the CLI? Let's get IDEA out of 
the picture. I bet it will just work.

We need a way to run these from the CLI via cron or scripts anyway, right?

Using spark-class I get no errors but no output either. It doesn't create the 
same Application name so I must be using it wrong. Will look later today.


was (Author: pferrel):
I think IDEA forces some things to run local so it can keep track of threads or 
something. Seems to work correctly with Spark but not HDFS. There are ways to 
remote debug with it so it separates processes but I don't need you to help me 
with IDEA.

Seems easier to answer: How do I run this from the CLI? Let's get IDEA out of 
the picture. I be it will just work.

We need a way to run these from the CLI via cron or scripts anyway, right?

Using spark-class I get no errors but no output either. It doesn't create the 
same Application name so I must be using it wrong. Will look later today.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968864#comment-13968864
 ] 

Pat Ferrel commented on MAHOUT-1464:


I think IDEA forces some things to run local so it can keep track of threads or 
something. Seems to work correctly with Spark but not HDFS. There are ways to 
remote debug with it so it separates processes but I don't need you to help me 
with IDEA.

Seems easier to answer: How do I run this from the CLI? Let's get IDEA out of 
the picture. I be it will just work.

We need a way to run these from the CLI via cron or scripts anyway, right?

Using spark-class I get no errors but no output either. It doesn't create the 
same Application name so I must be using it wrong. Will look later today.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968849#comment-13968849
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--



IDEA is the driver, but output is written by the Spark workers. Not the same
environment, and in most cases, not the same machine -- just like it happens
for MR reducers. Unless it is a "local" master URL, which I assume it was not.


This is strange. I can, was able to, and will be able to. Why wouldn't it be
able to? Unless there are network or security issues, there's nothing
fundamentally different between reading/writing HDFS from a worker process
or any other process.



No. The Spark Client is about shipping the driver and having it run somewhere
else. It is as if somebody were running a Mahout CLI command on one of the
worker nodes; that is it. It knows nothing about HDFS -- or even what the
driver program is going to do. One might use the Client code to print out
"Hello, World" and exit on some of the worker nodes; the Client wouldn't
know or care. Using a worker to run driver programs, that's all it does.
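
To make that concrete, here is a minimal generic sketch (Spark 0.9 Java API; 
the master URL and paths are placeholders, and this is independent of the 
Mahout bindings) in which the workers both read from and write to HDFS:

import org.apache.spark.api.java.JavaSparkContext;

public class HdfsRoundTrip {
  public static void main(String[] args) {
    // The driver can run anywhere; the workers do the actual HDFS I/O.
    JavaSparkContext sc = new JavaSparkContext(
        "spark://master:7077", "HdfsRoundTrip");
    sc.textFile("hdfs://namenode/user/pat/input")
      .saveAsTextFile("hdfs://namenode/user/pat/output");
    sc.stop();
  }
}

If a job like this succeeds while the real job fails only on the write, the 
difference is likely in configuration visible to the workers rather than in 
HDFS access itself.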




> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov
PS: Like I said, the "Client" feature only appeared in 0.9. Nobody missed it
before that, and it was never a prerequisite to run anything.


On Mon, Apr 14, 2014 at 2:14 PM, Dmitriy Lyubimov (JIRA) wrote:

>
> [
> https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968849#comment-13968849]
>
> Dmitriy Lyubimov commented on MAHOUT-1464:
> --
>
>
>
> IDEA is the driver, but output is written by the Spark workers. Not the same
> environment, and in most cases, not the same machine -- just like it happens
> for MR reducers. Unless it is a "local" master URL, which I assume it was
> not.
>
>
> This is strange. I can, was able to, and will be able to. Why wouldn't it be
> able to? Unless there are network or security issues, there's nothing
> fundamentally different between reading/writing HDFS from a worker process
> or any other process.
>
>
>
> No. The Spark Client is about shipping the driver and having it run somewhere
> else. It is as if somebody were running a Mahout CLI command on one of the
> worker nodes; that is it. It knows nothing about HDFS -- or even what the
> driver program is going to do. One might use the Client code to print out
> "Hello, World" and exit on some of the worker nodes; the Client wouldn't
> know or care. Using a worker to run driver programs, that's all it does.
>
>
>
>
> > Cooccurrence Analysis on Spark
> > --
> >
> > Key: MAHOUT-1464
> > URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> > Project: Mahout
> >  Issue Type: Improvement
> >  Components: Collaborative Filtering
> > Environment: hadoop, spark
> >Reporter: Pat Ferrel
> >Assignee: Sebastian Schelter
> > Fix For: 1.0
> >
> > Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> run-spark-xrsj.sh
> >
> >
> > Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR)
> that runs on Spark. This should be compatible with Mahout Spark DRM DSL so
> a DRM can be used as input.
> > Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence
> has several applications including cross-action recommendations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>


[jira] [Comment Edited] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968820#comment-13968820
 ] 

Pat Ferrel edited comment on MAHOUT-1464 at 4/14/14 9:02 PM:
-

Getting input from hdfs://occam4.local/user/pat/xrsj, the job seems able to 
complete up to the point where it tries to write the output. When running 
inside IDEA, I am unable to connect to the cluster HDFS master to write.

I've never been able to have code write to HDFS from inside IDEA. I just run it 
from a bash script where my dev machine is configured as an HDFS client.

Shouldn't using 'spark-class org.apache.spark.deploy.Client launch' give us 
this?

BTW all the computation is indeed running on the cluster.



was (Author: pferrel):
Getting input from: hdfs://occam4.local/user/pat/xrsj  the job seems able to 
complete up to the point where it tries to write the output. Then running 
inside IDEA I am unable to connect to the cluster HDFS master to write.

I've never been able to have code write to HDFS from inside IDEA. I just run it 
from a bash script where my dev machine is configured as an HDFS client.

Shouldn't using spark_client give us this?

BTW all the computation is indeed running on the cluster.


> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968820#comment-13968820
 ] 

Pat Ferrel edited comment on MAHOUT-1464 at 4/14/14 8:51 PM:
-

Getting input from: hdfs://occam4.local/user/pat/xrsj  the job seems able to 
complete up to the point where it tries to write the output. Then running 
inside IDEA I am unable to connect to the cluster HDFS master to write.

I've never been able to have code write to HDFS from inside IDEA. I just run it 
from a bash script where my dev machine is configured as an HDFS client.

Shouldn't using spark_client give us this?

BTW all the computation is indeed running on the cluster.



was (Author: pferrel):
Getting input from: hdfs://occam4.local/user/pat/xrsj  the job seems able to 
complete up to the point where it tries to write the output. Then running 
inside IDEA I am unable to connect to the cluster HDFS master to write.

I've never been able to have code write to HDFS from inside IDEA. I just run it 
from a bash script where my dev machine is configured as an HDFS client.

Shouldn't using spark_client give us this?



> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968820#comment-13968820
 ] 

Pat Ferrel commented on MAHOUT-1464:


Getting input from: hdfs://occam4.local/user/pat/xrsj  the job seems able to 
complete up to the point where it tries to write the output. Then running 
inside IDEA I am unable to connect to the cluster HDFS master to write.

I've never been able to have code write to HDFS from inside IDEA. I just run it 
from a bash script where my dev machine is configured as an HDFS client.

Shouldn't using spark_client give us this?



> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968783#comment-13968783
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--



That's odd. Honestly, I don't know and have never encountered that. Maybe it is
something the program itself does, not Spark? A stacktrace or a log with some
sort of complaint would be helpful.

I know that with Mesos supervision, SPARK_HOME must be the same on all
nodes (driver included). But I think this is only specific to the Mesos setup;
the standalone backend should be able to handle differing locations.



I think I gave an explanation for this already. Mostly, because it assumes
the jar is all it takes to run the program, but it takes the entire Mahout tree to
run a distribution. And because it still doesn't pass the master to the program.
IMO there's no real advantage to doing this vs. running a standalone
application (except perhaps when you are running from a remote, slowly
connected client and want to disconnect while the task is still running).





> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968781#comment-13968781
 ] 

Pat Ferrel commented on MAHOUT-1464:


Probably don't need that much here either, but there is 16g to 8g on all machines.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968774#comment-13968774
 ] 

Sebastian Schelter commented on MAHOUT-1464:


Do you run this with the movielens dataset? You shouldn't need that much memory 
for that. 

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968769#comment-13968769
 ] 

Pat Ferrel commented on MAHOUT-1464:


I'm running on the localhost spark://Maclaurin.local:7077 master now and 
getting out-of-heap errors. When I ran locally I just passed -Xms8000 to the 
JVM and that was fine.

I had to hack the mahoutSparkContext code; there doesn't seem to be a way to pass 
in or modify the conf? Notice the 4g:
{code}
conf.setAppName(appName).setMaster(masterUrl)
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .set("spark.executor.memory", "4g")
{code}
This worked fine.
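For reference, a sketch of the kind of hook that would avoid the hack -- a context 
builder that takes a caller-supplied tweak to the conf. The tweak parameter is 
hypothetical; nothing like it exists in the patch yet:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical variant of mahoutSparkContext: the caller passes a function
// that can override any SparkConf setting before the context is built.
def mahoutSparkContext(masterUrl: String, appName: String,
                       tweak: SparkConf => SparkConf = identity): SparkContext = {
  val conf = tweak(new SparkConf()
    .setAppName(appName)
    .setMaster(masterUrl)
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator",
         "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator"))
  new SparkContext(conf)
}

// The memory override then stays in user code:
// val sc = mahoutSparkContext(master, "xrsj", _.set("spark.executor.memory", "4g"))
{code}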

My dev machine is not part of the cluster and cannot participate because the 
path to scripts like start-slave.sh is different on the cluster and the dev machine 
(Mac vs Linux). If I try to launch on the dev machine but point to a cluster 
managed by another machine, it eventually tries to look in IDEA's 
WORKING_DIRECTORY/_temporary for something that is not there -- maybe on the 
Spark master?

I need a way to launch this outside IDEA on a cluster machine; why shouldn't 
the spark_client method work?

Anyway, I'll keep trying to work this out; so far local and 'pseudo-cluster' 
work.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968661#comment-13968661
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--

PS In modes other than "local" it will be looking for MAHOUT_HOME or 
-Dmahout.home= ... to point to the latest Mahout directory. This should have the 
latest binaries, including RSJ, to run in the backend; that's what it ships to the 
Spark app. (You don't need to recompile Mahout if all you changed was just the 
context hack.)
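A tiny sketch of that lookup (the precedence shown is a guess; only the two names 
come from the text above):
{code}
// Resolve the Mahout directory: the -Dmahout.home system property or the
// MAHOUT_HOME environment variable, as described above.
val mahoutHome: Option[String] =
  sys.props.get("mahout.home").orElse(sys.env.get("MAHOUT_HOME"))

require(mahoutHome.isDefined,
  "Set MAHOUT_HOME or pass -Dmahout.home= so backend binaries can be shipped")
{code}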

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Spark setup

2014-04-14 Thread Dmitriy Lyubimov
thanks, yes that's it


On Mon, Apr 14, 2014 at 11:26 AM, Saikat Kanjilal wrote:

> @Pat,
> In regards to your question on JIRA, this is Dmitry's email about running
> mahout on spark.
>
> Sent from my iPhone
>
> > On Apr 11, 2014, at 7:52 PM, "Andrew Musselman" <
> andrew.mussel...@gmail.com> wrote:
> >
> > We've used Mesos at a client to run both Hadoop and Spark jobs in the
> same
> > setup.  It's been a good experience so far.
> >
> > I haven't used YARN on any projects yet but it looks like you need to
> > rebuild Spark to run on it currently:
> > https://spark.apache.org/docs/0.9.0/running-on-yarn.html
> >
> > Why not officially support Hadoop v2 and recommend YARN for that, as well
> > as supporting Mesos?
> >
> > Another question is how long we will support Hadoop v1.
> >
> >
> >> On Fri, Apr 11, 2014 at 1:43 PM, Ted Dunning 
> wrote:
> >>
> >> I am pretty sure that mesos supports both map reduce and spark.
> >>
> >> In general, though, the biggest design consideration in which resource
> >> manager to use is to comply with local standards and traditions.
> >>
> >> For playing around, stand-alone spark is fine.
> >>
> >>
> >>
> >> On Thu, Apr 10, 2014 at 4:29 PM, Dmitriy Lyubimov 
> >> wrote:
> >>
>  On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel 
> >>> wrote:
> >>>
>  Hmm, that leaves Spark and Hadoop to manage tasks independently. Not
> >>> ideal
>  if you are running both hadoop and spark jobs simultaneously.
> >>>
> >>> I think the only resource manager that semi-officially supports both
> >>> MapReduce and spark is Yarn. This sounds neat in theory, but in
> practice
> >> i
> >>> think one discovers too many hoops to jump thru. I am also inertly
> >> dubious
> >>> about quality and performance of Yarn compared to others.
> >>>
> >>>
> 
>  If you have a single user cluster or are running jobs in a pipeline I
>  suppose you don't need Mesos.
> 
> 
>  On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov 
> >> wrote:
> 
>  On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel 
>  wrote:
> 
> > What is the recommended Spark setup?
> 
>  Check out their docs. We don't have any special instructions for
> >> mahout.
> 
>  The main point behind 0.9.0 release is that it now supports master HA
> >>> thru
>  zookeeper, so for that reason alone you probably don't want to use
> >> mesos.
> 
>  You may want to use mesos to have pre-allocated workers per spark
> >> session
>  (so called "coarse grained" mode). if you shoot a lot of short-running
>  queries (1sec or less), this is a significant win in QPS and response
> >>> time.
>  (fine grained mode will add about 3 seconds to start all the workers
> >>> lazily
>  to pipeline time).
> 
>  In our case we are dealing with stuff that runs over 3 seconds for
> most
>  part, so assuming 0.9.0 HA is stable enough (which i haven't tried
> >> yet),
>  there's no reason for us to go mesos, multi-master standalone with
>  zookeeper is good enough.
> 
> 
> >
> > I imagine most of us will have HDFS configured (with either local
> >> files
>  or
> > an actual cluster).
> 
>  Hadoop DFS API  is pretty much the only persistence api supported by
> >>> Mahout
>  Spark Bindings at this point. So yes, you would want to have hdfs-only
>  cluster running 1.x or 2 doesn't matter. i use cdh 4 distros.
> 
> 
> > Since most of Mahout is recommended to be run on Hadoop 1.x we should
> >>> use
> > Mesos? https://github.com/mesos/hadoop
> >
> > This would mean we'd need to have at least Hadoop 1.2.1 (in mesos and
> > current mahout pom). We'd use Mesos to manage hadoop and spark jobs
> >> but
> > HDFS would be controlled separately by hadoop itself.
> 
>  I think i addressed this. no we are not bound by the MR part of mahout
>  since Spark runs on whatever. like i said, with 0.9.0 + Mahout combo i
>  would forego mesos -- unless it turns out meaningfully faster or more
>  stable.
> 
> 
> 
> >
> > Is this about right? Is there a setup doc I missed?
> 
> 
>  i dont think one needed.
> >>
>


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968655#comment-13968655
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--

Yes, you should be able both to hack the context and to launch the driver 
successfully from IDEA, regardless of whether you are running the "local", 
"standalone/HA standalone" or "mesos/HA mesos" resource managers -- as long as 
the resource managers are up and running on your cluster.
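For reference, the master-url families meant here, as a sketch (hosts, ports and 
the ZooKeeper path are illustrative, not taken from this setup):
{code}
val local        = "local"                             // same JVM, single thread
val standalone   = "spark://occam4.local:7077"         // standalone master
val standaloneHA = "spark://master1:7077,master2:7077" // HA standalone, multiple masters
val mesos        = "mesos://occam4.local:5050"         // plain mesos
val mesosHA      = "mesos://zk://occam4.local:2181/mesos" // HA mesos via ZooKeeper
{code}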

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968649#comment-13968649
 ] 

Pat Ferrel commented on MAHOUT-1464:


OK, it runs fine in IDEA; now I need some pointers on how to launch on the cluster.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968649#comment-13968649
 ] 

Pat Ferrel edited comment on MAHOUT-1464 at 4/14/14 6:40 PM:
-

OK, it runs fine in IDEA; now I need some pointers on how to launch on the cluster.

Should I be able to do that from IDEA as well, by changing the context? 


was (Author: pferrel):
OK, it runs fine in IDEA; now I need some pointers on how to launch on the cluster.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov
inline


On Mon, Apr 14, 2014 at 11:21 AM, Pat Ferrel (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968613#comment-13968613]
>
> Pat Ferrel commented on MAHOUT-1464:
> 
>
> @Dmitriy, no clue what email you are talking about, you have written a lot
> lately. Where is it, on a Jira?
>
no, on @dev... Basically you want to run it as a standalone application
(just like the SparkPi example). The easiest way to do that is to import the whole
mahout tree into idea and launch Sebastian's driver program directly; that
much should work -- especially since you only care about local mode in fact
(just to be clear, "local" master means same JVM, single thread -- really
useful for debugging only).

>
> I did my setup and tried launching with Hadoop and Mahout running locally
> (MAHOUT_LOCAL=true),
>
this environment variable would have no bearing on a Spark program. The only
thing that matters is the master url, per above.


> One localhost instance of Spark, passing in the 'mvn package' mahout spark
> jar from the localfs and pointing at data on the localfs.  This is per
> instructions of the Spark site. There is no firewall issue since it is
> always localhost talking to localhost.
>

You need to be a bit more specific here.

Yes, you can run Spark as a single-node cluster (just like a Hadoop single-node
cluster), but that would still be a "standalone" master, not "local".
"local" is, as I indicated, totally same-JVM, single-thread; it does not
require starting any additional Spark processes.

As long as you want "standalone" (i.e. the real thing, albeit single-node) you
need not use Client -- it won't work. Launch the program directly, just like they
do with examples such as SparkPi. This Client thing will not work for our
Mahout programs without additional considerations.
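
To make "directly" concrete, a skeleton in the SparkPi style (the object name
and the sparkbindings import are my assumptions):

    // assumes: import org.apache.mahout.sparkbindings._ for mahoutSparkContext
    object RunCooccurrenceStandalone {
      def main(args: Array[String]) {
        // SparkPi convention: the master url arrives as the first argument
        val master = if (args.nonEmpty) args(0) else "local"
        implicit val sc = mahoutSparkContext(masterUrl = master,
          appName = "CooccurrenceAnalysis",
          customJars = Traversable.empty[String])
        // ... build the DRMs and run the cooccurrence analysis here ...
      }
    }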


>
> Anyway if I could find your "running mahout on spark" email it would
> probably explain what I'm doing wrong.
>
> You did see I was using Spark 0.9.1?
>
In all likelihood this should be fine if you also change the dependency and
recompile with it in the root pom.xml. Otherwise there's no way of reliably
telling whether different versions on the client and backend may trigger
incompatibilities, other than trying (e.g. if they changed the akka or netty
version between 0.9.0 and 0.9.1).



>
> > Cooccurrence Analysis on Spark
> > --
> >
> > Key: MAHOUT-1464
> > URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> > Project: Mahout
> >  Issue Type: Improvement
> >  Components: Collaborative Filtering
> > Environment: hadoop, spark
> >Reporter: Pat Ferrel
> >Assignee: Sebastian Schelter
> > Fix For: 1.0
> >
> > Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> run-spark-xrsj.sh
> >
> >
> > Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR)
> that runs on Spark. This should be compatible with Mahout Spark DRM DSL so
> a DRM can be used as input.
> > Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence
> has several applications including cross-action recommendations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968624#comment-13968624
 ] 

Pat Ferrel commented on MAHOUT-1464:


ok, no spark_client launch, got it.

A pointer to the email would help.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Spark setup

2014-04-14 Thread Saikat Kanjilal
@Pat,
In regards to your question on JIRA, this is Dmitry's email about running 
mahout on spark.

Sent from my iPhone

> On Apr 11, 2014, at 7:52 PM, "Andrew Musselman"  
> wrote:
> 
> We've used Mesos at a client to run both Hadoop and Spark jobs in the same
> setup.  It's been a good experience so far.
> 
> I haven't used YARN on any projects yet but it looks like you need to
> rebuild Spark to run on it currently:
> https://spark.apache.org/docs/0.9.0/running-on-yarn.html
> 
> Why not officially support Hadoop v2 and recommend YARN for that, as well
> as supporting Mesos?
> 
> Another question is how long we will support Hadoop v1.
> 
> 
>> On Fri, Apr 11, 2014 at 1:43 PM, Ted Dunning  wrote:
>> 
>> I am pretty sure that mesos supports both map reduce and spark.
>> 
>> In general, though, the biggest design consideration in which resource
>> manager to use is to comply with local standards and traditions.
>> 
>> For playing around, stand-alone spark is fine.
>> 
>> 
>> 
>> On Thu, Apr 10, 2014 at 4:29 PM, Dmitriy Lyubimov 
>> wrote:
>> 
 On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel 
>>> wrote:
>>> 
 Hmm, that leaves Spark and Hadoop to manage tasks independently. Not
>>> ideal
 if you are running both hadoop and spark jobs simultaneously.
>>> 
>>> I think the only resource manager that semi-officially supports both
>>> MapReduce and spark is Yarn. This sounds neat in theory, but in practice
>> i
>>> think one discovers too many hoops to jump thru. I am also inertly
>> dubious
>>> about quality and performance of Yarn compared to others.
>>> 
>>> 
 
 If you have a single user cluster or are running jobs in a pipeline I
 suppose you don't need Mesos.
 
 
 On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov 
>> wrote:
 
 On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel 
 wrote:
 
> What is the recommended Spark setup?
 
 Check out their docs. We don't have any special instructions for
>> mahout.
 
 The main point behind 0.9.0 release is that it now supports master HA
>>> thru
 zookeeper, so for that reason alone you probably don't want to use
>> mesos.
 
 You may want to use mesos to have pre-allocated workers per spark
>> session
 (so called "coarse grained" mode). if you shoot a lot of short-running
 queries (1sec or less), this is a significant win in QPS and response
>>> time.
 (fine grained mode will add about 3 seconds to start all the workers
>>> lazily
 to pipeline time).
 
 In our case we are dealing with stuff that runs over 3 seconds for most
 part, so assuming 0.9.0 HA is stable enough (which i haven't tried
>> yet),
 there's no reason for us to go mesos, multi-master standalone with
 zookeeper is good enough.
 
 
> 
> I imagine most of us will have HDFS configured (with either local
>> files
 or
> an actual cluster).
 
 Hadoop DFS API  is pretty much the only persistence api supported by
>>> Mahout
 Spark Bindings at this point. So yes, you would want to have hdfs-only
 cluster running 1.x or 2 doesn't matter. i use cdh 4 distros.
 
 
> Since most of Mahout is recommended to be run on Hadoop 1.x we should
>>> use
> Mesos? https://github.com/mesos/hadoop
> 
> This would mean we'd need to have at least Hadoop 1.2.1 (in mesos and
> current mahout pom). We'd use Mesos to manage hadoop and spark jobs
>> but
> HDFS would be controlled separately by hadoop itself.
 
 I think i addressed this. no we are not bound by the MR part of mahout
 since Spark runs on whatever. like i said, with 0.9.0 + Mahout combo i
 would forego mesos -- unless it turns out meaningfully faster or more
 stable.
 
 
 
> 
> Is this about right? Is there a setup doc I missed?
 
 
 i dont think one needed.
>> 


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968613#comment-13968613
 ] 

Pat Ferrel commented on MAHOUT-1464:


@Dmitriy, no clue what email you are talking about; you have written a lot 
lately. Where is it, on a Jira?

I did my setup and tried launching with Hadoop and Mahout running locally 
(MAHOUT_LOCAL=true): one localhost instance of Spark, passing in the 'mvn 
package' mahout spark jar from the localfs and pointing at data on the localfs. 
This is per the instructions on the Spark site. There is no firewall issue since 
it is always localhost talking to localhost.

Anyway, if I could find your "running mahout on spark" email it would probably 
explain what I'm doing wrong.

You did see I was using Spark 0.9.1?

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #814

2014-04-14 Thread Apache Jenkins Server
See 


Changes:

[akm] MAHOUT-1483: Organize links in web site navigation bar

--
[...truncated 2168 lines...]
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractLongObjectMap.java
[... 22 more near-identical "[INFO] Writing to .../math/target/generated-sources/mahout/org/apache/mahout/math/map/Abstract*Map.java" lines; log truncated mid-line ...]

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968603#comment-13968603
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--

[~pferrel] if you look inside Sebastian's patch, you will find it is 
hardcoded to use the "local" Spark master. The master you specify to Client only 
tells it which cluster to ship the code to, not which master the application will 
use, which is why I think this Client thing is a bit of a raw idea. Either 
way, it will not work with Sebastian's app. Instead, I'd suggest you run 
Sebastian's script directly from IDEA as a first step, after hacking the master url 
in this line
{code}
implicit val sc = mahoutSparkContext(masterUrl = "local", appName = "MahoutLocalContext",
  customJars = Traversable.empty[String])
{code}
or making the script accept it from the environment or app params. 

(The convention in Spark example programs is that they accept the master url as the 
first parameter.)
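A minimal sketch of that second option (the MASTER env-var fallback is my 
assumption, not something in the patch):
{code}
// Take the master url from the first program argument, then from the
// MASTER environment variable, and only fall back to "local" for debugging.
val masterUrl = args.headOption
  .orElse(sys.env.get("MASTER"))
  .getOrElse("local")

implicit val sc = mahoutSparkContext(masterUrl = masterUrl,
  appName = "MahoutLocalContext",
  customJars = Traversable.empty[String])
{code}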

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968589#comment-13968589
 ] 

Pat Ferrel commented on MAHOUT-1464:


Not sure what you are saying -- local as in local filesystem and only a localhost 
Spark instance? How are you launching RunCrossCooccurrenceAnalysisOnEpinions?

I should get local working first; I see the context setup in the code and 
will worry about that after local works.

Is there something wrong with the script?

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968587#comment-13968587
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--

Running using the Spark Client (inside the cluster) is a new thing in 0.9. Even 
assuming it is stable, it is not supported at this point, and going this way will 
hit multiple hurdles. 

For one, the mahout spark context requires MAHOUT_HOME to set up all mahout binaries 
properly. The assumption is that one needs Mahout's binaries only on the driver's 
side, but if the driver runs inside a remote cluster, this will fail. So our batches 
should really be started in one of the ways I described in an earlier email. 

Second, I don't think the driver can load classes reliably, because it includes 
Mahout dependencies such as mahout-math. That's another reason why using Client 
seems problematic to me -- it assumes one has his _entire_ application within 
that jar, which is not true here.

That said, your attempt doesn't exhibit any direct ClassNotFounds and looks 
more like akka communication issues, i.e. Spark setup issues. One thing about 
Spark is that it requires direct port connectivity not only between cluster nodes 
but also back to the client. In particular this means your client must not firewall 
incoming calls and must not be behind NAT (even port forwarding doesn't really 
solve the networking issues here). So my first bet would be on akka connectivity 
issues between the cluster and back to the client.
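If it is the callback path, one concrete thing to check is which address and port 
the driver announces to the workers; a sketch of pinning them (spark.driver.host 
and spark.driver.port exist in Spark 0.9; the values here are placeholders):
{code}
// Make the driver listen on an address and port the cluster can reach,
// so akka callbacks from the workers don't hit a dead interface.
val conf = new org.apache.spark.SparkConf()
  .set("spark.driver.host", "192.168.0.2") // address reachable from the workers
  .set("spark.driver.port", "51000")       // open this port toward the cluster
{code}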




> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968573#comment-13968573
 ] 

Pat Ferrel commented on MAHOUT-1464:


I am running it locally, if by that you mean standalone on localhost, and 
actually running RunCrossCooccurrenceAnalysisOnEpinions.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968564#comment-13968564
 ] 

Sebastian Schelter commented on MAHOUT-1464:


Currently, the RunCooccurrenceAnalysisOnMovielens1M script only sets up a local 
spark context and reads and writes from the local fs. Sorry for not mentioning 
this upfront. Do you want to try to change it yourself or should I update the 
patch?

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968537#comment-13968537
 ] 

Pat Ferrel edited comment on MAHOUT-1464 at 4/14/14 5:18 PM:
-

OK, I have a cluster set up but first tried locally on my laptop. I installed 
the latest Spark, 0.9.1 (not the 0.9.0 called for in the pom; assuming this is OK), 
which uses Scala 2.10. BTW, the object RunCrossCooccurrenceAnalysisOnEpinions 
has an incorrect usage println -- wrong object name. I never get 
the printlns; I assume that's because I'm not launching from the Spark shell??? 

  println("Usage: RunCooccurrenceAnalysisOnMovielens1M 
")

This leads me to believe that you launch from the Spark Scala shell?? Anyway, I 
tried the method called out in the Spark docs for CLI execution, shown below, and 
executed RunCrossCooccurrenceAnalysisOnEpinions via a bash script. Not sure 
where to look for output. The code says:

RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(0),
"/tmp/co-occurrence-on-epinions/indicators-item-item/")
RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(1),
"/tmp/co-occurrence-on-epinions/indicators-trust-item/")

Assume this is in localfs since the data came from there? I see the Spark pids 
there but no temp data.

Here's how I ran it.

Put data in localfs:
Maclaurin:mahout pat$ ls -al ~/hdfs-mirror/xrsj/
total 29320
drwxr-xr-x   4 pat  staff  136 Apr 14 09:01 .
drwxr-xr-x  10 pat  staff  340 Apr 14 09:00 ..
-rw-r--r--   1 pat  staff  8650128 Apr 14 09:01 ratings_data.txt
-rw-r--r--   1 pat  staff  6357397 Apr 14 09:01 trust_data.txt

Start up Spark on localhost; the webUI says all is well.

Run the xrsj on local data via the shell script attached.

The driver runs and creates a worker, which runs for quite a while, but the log 
says there was an ERROR.

Maclaurin:mahout pat$ cat 
/Users/pat/spark-0.9.1-bin-hadoop1/sbin/../logs/spark-pat-org.apache.spark.deploy.worker.Worker-1-
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out.2
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out.1  
spark-pat-org.apache.spark.deploy.worker.Worker-1-occam4.out
Maclaurin:mahout pat$ cat 
/Users/pat/spark-0.9.1-bin-hadoop1/sbin/../logs/spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out
Spark Command: 
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp 
:/Users/pat/spark-0.9.1-bin-hadoop1/conf:/Users/pat/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar
 -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m 
org.apache.spark.deploy.worker.Worker spark://Maclaurin.local:7077


log4j:WARN No appenders could be found for logger 
(akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
14/04/14 09:26:00 INFO Worker: Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
14/04/14 09:26:00 INFO Worker: Starting Spark worker 192.168.0.2:52068 with 8 
cores, 15.0 GB RAM
14/04/14 09:26:00 INFO Worker: Spark home: /Users/pat/spark-0.9.1-bin-hadoop1
14/04/14 09:26:00 INFO WorkerWebUI: Started Worker web UI at 
http://192.168.0.2:8081
14/04/14 09:26:00 INFO Worker: Connecting to master 
spark://Maclaurin.local:7077...
14/04/14 09:26:00 INFO Worker: Successfully registered with master 
spark://Maclaurin.local:7077
14/04/14 09:26:19 INFO Worker: Asked to launch driver driver-20140414092619-
2014-04-14 09:26:19.947 java[53509:9407] Unable to load realm info from 
SCDynamicStore
14/04/14 09:26:20 INFO DriverRunner: Copying user jar 
file:/Users/pat/mahout/spark/target/mahout-spark-1.0-SNAPSHOT.jar to 
/Users/pat/spark-0.9.1-bin-hadoop1/work/driver-20140414092619-/mahout-spark-1.0-SNAPSHOT.jar
14/04/14 09:26:20 INFO DriverRunner: Launch Command: 
"/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java" 
"-cp" 
":/Users/pat/spark-0.9.1-bin-hadoop1/work/driver-20140414092619-/mahout-spark-1.0-SNAPSHOT.jar:/Users/pat/spark-0.9.1-bin-hadoop1/conf:/Users/pat/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar:/usr/local/hadoop/conf"
 "-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper" 
"akka.tcp://sparkWorker@192.168.0.2:52068/user/Worker" 
"RunCrossCooccurrenceAnalysisOnEpinions" "file://Users/pat/hdfs-mirror/xrsj"
14/04/14 09:26:21 ERROR OneForOneStrategy: FAILED (of class 
scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
at 
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
at akka.actor.ActorCell.r

[jira] [Updated] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pat Ferrel updated MAHOUT-1464:
---

Attachment: run-spark-xrsj.sh

Script used to execute the cross-similarity code on localhost Spark and the local 
filesystem.

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968539#comment-13968539
 ] 

Pat Ferrel commented on MAHOUT-1464:


Wow, that really screwed up the shell script, so I've attached it. 

> Cooccurrence Analysis on Spark
> --
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
>  Issue Type: Improvement
>  Components: Collaborative Filtering
> Environment: hadoop, spark
>Reporter: Pat Ferrel
>Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-04-14 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968537#comment-13968537
 ] 

Pat Ferrel commented on MAHOUT-1464:


OK, I have a cluster set up but first tried locally on my laptop. I installed 
the latest Spark, 0.9.1 (not the 0.9.0 called for in the pom; assuming this is OK), 
which uses Scala 2.10. BTW, the object RunCrossCooccurrenceAnalysisOnEpinions 
has an incorrect usage println -- wrong object name. I never get 
the printlns; I assume that's because I'm not launching from the Spark shell??? 

  println("Usage: RunCooccurrenceAnalysisOnMovielens1M 
")

This leads me to believe that you launch from the Spark Scala shell?? Anyway, I 
tried the method called out in the Spark docs for CLI execution, shown below, and 
executed RunCrossCooccurrenceAnalysisOnEpinions via a bash script. Not sure 
where to look for output. The code says:

RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(0),
"/tmp/co-occurrence-on-epinions/indicators-item-item/")
RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(1),
"/tmp/co-occurrence-on-epinions/indicators-trust-item/")

Assume this is in localfs since the data came from there? I see the Spark pids 
there but no temp data.

Here's how I ran it.

Put data in localfs:
Maclaurin:mahout pat$ ls -al ~/hdfs-mirror/xrsj/
total 29320
drwxr-xr-x   4 pat  staff  136 Apr 14 09:01 .
drwxr-xr-x  10 pat  staff  340 Apr 14 09:00 ..
-rw-r--r--   1 pat  staff  8650128 Apr 14 09:01 ratings_data.txt
-rw-r--r--   1 pat  staff  6357397 Apr 14 09:01 trust_data.txt

Start up Spark on localhost; the webUI says all is well.

Run the xrsj on local data via the shell script:

#!/usr/bin/env bash
#./bin/spark-class org.apache.spark.deploy.Client launch \
#   [client-options] \
#   <cluster-url> <application-jar-url> <main-class> \
#   [application-options]

# cluster-url: The URL of the master node.
# application-jar-url: Path to a bundled jar including your application and all
#   dependencies. Currently, the URL must be globally visible inside of your
#   cluster, for instance, an `hdfs://` path or...
# main-class: The entry point for your application.

# Client Options:
#  --memory  (amount of memory, in MB, allocated for your driver program)
#  --cores  (number of cores allocated for your driver program)
#  --supervise (whether to automatically restart your driver on application or
#    node failure)
#  --verbose (prints increased logging output)

# RunCrossCooccurrenceAnalysisOnEpinions 
# Mahout Spark Jar from 'mvn package'
/Users/pat/spark-0.9.1-bin-hadoop1/bin/spark-class 
org.apache.spark.deploy.Client launch \
   spark://Maclaurin.local:7077 
file:///Users/pat/mahout/spark/target/mahout-spark-1.0-SNAPSHOT.jar 
RunCrossCooccurrenceAnalysisOnEpinions \
   file://Users/pat/hdfs-mirror/xrsj

The driver runs and creates a worker, which runs for quite a while, but the log 
says there was an ERROR.

Maclaurin:mahout pat$ cat 
/Users/pat/spark-0.9.1-bin-hadoop1/sbin/../logs/spark-pat-org.apache.spark.deploy.worker.Worker-1-
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out.2
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out.1  
spark-pat-org.apache.spark.deploy.worker.Worker-1-occam4.out
Maclaurin:mahout pat$ cat 
/Users/pat/spark-0.9.1-bin-hadoop1/sbin/../logs/spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out
Spark Command: 
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp 
:/Users/pat/spark-0.9.1-bin-hadoop1/conf:/Users/pat/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar
 -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m 
org.apache.spark.deploy.worker.Worker spark://Maclaurin.local:7077


log4j:WARN No appenders could be found for logger 
(akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
14/04/14 09:26:00 INFO Worker: Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
14/04/14 09:26:00 INFO Worker: Starting Spark worker 192.168.0.2:52068 with 8 
cores, 15.0 GB RAM
14/04/14 09:26:00 INFO Worker: Spark home: /Users/pat/spark-0.9.1-bin-hadoop1
14/04/14 09:26:00 INFO WorkerWebUI: Started Worker web UI at 
http://192.168.0.2:8081
14/04/14 09:26:00 INFO Worker: Connecting to master 
spark://Maclaurin.local:7077...
14/04/14 09:26:00 INFO Worker: Successfully registered with master 
spark://Maclaurin.local:7077
14/04/14 09:26:19 INFO Worker: Asked to launch driver driver-20140414092619-
2014-04-14 09:26:19.947 java[53509:9407] Unable to load realm info from 
SCDynamicStore
14/04/14 09:26:20 INFO DriverRunner: Copying user jar 
file:/Users/pat/mahout/

[jira] [Commented] (MAHOUT-1421) Adapter package for all mahout tools

2014-04-14 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968523#comment-13968523
 ] 

jay vyas commented on MAHOUT-1421:
--

Great idea.  I will sketch out some more lightweight JIRAs once the docs 
improve.

> Adapter package for all mahout tools
> 
>
> Key: MAHOUT-1421
> URL: https://issues.apache.org/jira/browse/MAHOUT-1421
> Project: Mahout
>  Issue Type: Improvement
>Reporter: jay vyas
> Fix For: 1.0
>
>
> Hi mahout.  I'd like to create an umbrella JIRA for allowing more runtime 
> flexibility in reading different types of input formats for all mahout 
> tasks. 
> Specifically, I'd like to start with the FreeTextRecommenderAdapter, which 
> typically requires:
> 1) Hashing text entries into numbers
> 2) Saving the large transformed file on disk
> 3) Feeding it into the classifier 
> Instead, we could build adapters into the classifier itself, so that the user
> 1) Specifies the input file to the recommender
> 2) Specifies a transformation class which converts each record of input to the 
> 3-column recommender format
> 3) Runs the internal mahout recommender directly against the data
> And thus the user could easily run mahout against existing data without 
> having to munge it too much.
> This package might be called something like "org.apache.mahout.adapters", and 
> would over time provide flexible adapters to the core mahout algorithm 
> implementations, so that folks wouldn't have to worry so much about 
> vectors/csv transformers/etc... 
> Any thoughts on this?  If there's positive feedback I can submit an initial 
> patch to get things started.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1445) Create an intro for item based recommender

2014-04-14 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968423#comment-13968423
 ] 

Sebastian Schelter commented on MAHOUT-1445:


[~pknarayan] I don't think we should talk about user similarity in an intro to 
item-based recommenders.

[~nimartin] I really like your text. Could you add a small example that shows 
how to use an item based recommender in Mahout? You could extract something 
from this page: 
https://mahout.apache.org/users/recommender/recommender-documentation.html

> Create an intro for item based recommender
> --
>
> Key: MAHOUT-1445
> URL: https://issues.apache.org/jira/browse/MAHOUT-1445
> Project: Mahout
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 1.0
>Reporter: Maciej Mazur
>  Labels: documentation, recommender
> Fix For: 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


RE: Documentation, Documentation, Documentation

2014-04-14 Thread Martin, Nick
Drafted a little intro to the item-based rec and dropped it in the comments for 
1445. I aimed to include some examples of the variety of things one can do with 
the algo, and hopefully enough info that someone hitting the page could get a 
feel for what they can potentially accomplish before diving directly into the 
'guts' of the workflow/config options, etc. 

Happy to take edits; I saw there was another submission a bit ahead of mine this 
morning, so I'm not sure how that gets resolved. 

Anyway, maybe this can get us closer on cleanup!

-Original Message-
From: Sebastian Schelter [mailto:s...@apache.org] 
Sent: Sunday, April 13, 2014 7:49 AM
To: u...@mahout.apache.org; dev@mahout.apache.org
Subject: Documentation, Documentation, Documentation

Hi,

this is another reminder that we still have to finish our documentation 
improvements! The website looks shiny now and there have been lots of 
discussions about new directions, but we still have some work to do in cleaning 
up webpages. We should especially make sure that the examples work.

Please help with that: anyone who is willing to sacrifice some time to go through 
a webpage and try out the steps described is of great help to the project. It 
would also be awesome to get some help in creating a few new pages, especially 
for the recommenders.

Here's the list of documentation related jira's for 1.0:

https://issues.apache.org/jira/browse/MAHOUT-1441?jql=project%20%3D%20MAHOUT%20AND%20component%20%3D%20Documentation%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

Best,
Sebastian


[jira] [Commented] (MAHOUT-1445) Create an intro for item based recommender

2014-04-14 Thread Nick Martin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968365#comment-13968365
 ] 

Nick Martin commented on MAHOUT-1445:
-

Worked on this a bit over the weekend. Feel free to use some/all/none of it if 
folks find it useful as an intro. I imagine the rest of the item-based rec 
workflow would be described in greater detail below this intro piece, but 
hopefully something along these lines helps potential users get a feel for 
what's possible "above the fold" before diving into data models and similarity 
metrics, etc. 

***Proposed text below***

Item Based Recommender
Introduction

Mahout's item-based recommender is a flexible and easily implemented algorithm 
with a diverse range of applications. The minimalism of the primary input 
file's structure and the availability of ancillary filtering controls make 
sourcing the required data and shaping a desired output both efficient and 
straightforward. 

Typical use cases include:
•   Recommend products to customers via an eCommerce platform (think: 
Amazon, Netflix, Overstock)
•   Identify organic sales opportunities
•   Segment users/customers based on similar item preferences

Broadly speaking, Mahout's item-based recommendation algorithm takes customer 
preferences by item as input and generates an output recommending similar 
items, with a score indicating the likelihood that a customer will "like" the 
recommended item. 

One of the strengths of the item-based recommender is its adaptability to your 
business conditions or research interests. For example, there are many 
available approaches for providing product preference. One such method is to 
calculate the total orders for a given product for each customer (e.g., Acme 
Corp has ordered Widget-A 5,678 times), while others rely on user preference 
captured via the web (e.g., Jane Doe rated a movie five stars, or gave a 
product two thumbs up). 

Additionally, a variety of methodologies can be implemented to narrow the 
focus of Mahout's recommendations, such as:
•   Exclude low-volume or low-profitability products from consideration
•   Group customers by segment or market rather than using user/customer 
level data
•   Exclude zero-dollar transactions, returns or other order types
•   Map product substitutions into the Mahout input (e.g., if WidgetA is a 
recommended item, replace it with WidgetX)

The item-based recommender output can be easily consumed by downstream 
applications (e.g., websites, ERP systems or salesforce automation tools) and 
is configurable, so users can determine the number of item recommendations 
generated by the algorithm.
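
For the small example requested above, a minimal sketch against the Taste API 
might look like the following (the file name prefs.csv, the user ID, and the 
choice of LogLikelihoodSimilarity are illustrative assumptions, not settled 
documentation):

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemBasedExample {
  public static void main(String[] args) throws Exception {
    // prefs.csv holds one "userID,itemID,preference" triple per line
    DataModel model = new FileDataModel(new File("prefs.csv"));
    // log-likelihood similarity also works for implicit data such as order counts
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);
    // top 5 recommendations for user 1, each with an estimated preference score
    for (RecommendedItem item : recommender.recommend(1L, 5)) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }
  }
}

Swapping in another ItemSimilarity (e.g. PearsonCorrelationSimilarity) is the 
usual way to adapt the recommender to explicit ratings.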


> Create an intro for item based recommender
> --
>
> Key: MAHOUT-1445
> URL: https://issues.apache.org/jira/browse/MAHOUT-1445
> Project: Mahout
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 1.0
>Reporter: Maciej Mazur
>  Labels: documentation, recommender
> Fix For: 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2014-04-14 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968259#comment-13968259
 ] 

Suneel Marthi commented on MAHOUT-1178:
---

I wasn't considering this to be a fix for incremental document management; 
fine with leaving this as is for now.

> GSOC 2013: Improve Lucene support in Mahout
> ---
>
> Key: MAHOUT-1178
> URL: https://issues.apache.org/jira/browse/MAHOUT-1178
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Dan Filimon
>Assignee: Gokhan Capan
>  Labels: gsoc2013, mentor
> Fix For: 1.0
>
> Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document each to a line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields that ought to work sensibly as well.
>  Two options include dumping multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.
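
For illustration only, a rough sketch of constraints (a) and (d) against the 
Lucene 4 APIs of that era (the field name "text", the index path argument and 
the class name are assumptions, and the field is assumed to have been indexed 
with term vectors enabled; deleted documents and term weighting are ignored):

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class LuceneIndexAsRows {
  public static void main(String[] args) throws Exception {
    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(args[0])));

    // term -> column dictionary over the whole field, so that every document
    // row lives in the same coordinate system
    Map<String, Integer> dictionary = new HashMap<String, Integer>();
    TermsEnum allTerms = MultiFields.getTerms(reader, "text").iterator(null);
    BytesRef term;
    while ((term = allTerms.next()) != null) {
      dictionary.put(term.utf8ToString(), dictionary.size());
    }

    // one sparse row per document; NamedVector carries the doc id so a row
    // can be traced back to its document (constraint d)
    for (int docId = 0; docId < reader.maxDoc(); docId++) {
      Terms termVector = reader.getTermVector(docId, "text");
      if (termVector == null) {
        continue; // no stored term vector for this field
      }
      Vector row = new RandomAccessSparseVector(dictionary.size());
      TermsEnum it = termVector.iterator(null);
      while ((term = it.next()) != null) {
        row.set(dictionary.get(term.utf8ToString()), it.totalTermFreq());
      }
      Vector named = new NamedVector(row, String.valueOf(docId));
      // hand 'named' to downstream Mahout code here
    }
    reader.close();
  }
}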



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2014-04-14 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968254#comment-13968254
 ] 

Gokhan Capan commented on MAHOUT-1178:
--

The thing is, it just 'loads' a Lucene index into memory as a matrix: you 
construct a matrix with the Lucene index directory location, and that's it. So 
it is not a fix for the incremental document management issue.

The alternative approach is querying the index whenever a row/column vector or 
cell is required. I, however, am not sure whether the SolrMatrix approach is 
fast enough for that.

I haven't been available lately, and now I'm reading through the changes in 
and proposals for Mahout's future, and trying to set up my perspective for 
Mahout 2. We can probably come up with a better way of document storage (still 
Lucene/Solr based). Let me leave this as is for now, and then we can discuss 
the input formats further.

Is that OK with you?

> GSOC 2013: Improve Lucene support in Mahout
> ---
>
> Key: MAHOUT-1178
> URL: https://issues.apache.org/jira/browse/MAHOUT-1178
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Dan Filimon
>Assignee: Gokhan Capan
>  Labels: gsoc2013, mentor
> Fix For: 1.0
>
> Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document each to a line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields that ought to work sensibly as well.
>  Two options include dumping multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2014-04-14 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968240#comment-13968240
 ] 

Suneel Marthi commented on MAHOUT-1178:
---

Sorry for responding late (just waking up in my part of the world). I still 
see value in having this, both for lucene2seq and if we consider moving 
entirely to Lucene as the document repository format (see the discussion in 
M-1252). 

Gokhan, please commit a patch if you think it's ready; otherwise we can close 
this as 'Won't Fix'.

> GSOC 2013: Improve Lucene support in Mahout
> ---
>
> Key: MAHOUT-1178
> URL: https://issues.apache.org/jira/browse/MAHOUT-1178
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Dan Filimon
>Assignee: Gokhan Capan
>  Labels: gsoc2013, mentor
> Fix For: 1.0
>
> Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document each to a line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields that ought to work sensibly as well.
>  Two options include dumping multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2014-04-14 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter resolved MAHOUT-1178.


Resolution: Won't Fix

Resolving as Won't Fix, as discussed.

> GSOC 2013: Improve Lucene support in Mahout
> ---
>
> Key: MAHOUT-1178
> URL: https://issues.apache.org/jira/browse/MAHOUT-1178
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Dan Filimon
>Assignee: Gokhan Capan
>  Labels: gsoc2013, mentor
> Fix For: 1.0
>
> Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document each to a line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields that ought to work sensibly as well.
>  Two options include dumping multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2014-04-14 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968221#comment-13968221
 ] 

Gokhan Capan commented on MAHOUT-1178:
--

I personally like the idea of integrating additional storage layers as matrix 
inputs, but not the implementation I did here.
After we agree on the new algorithm layers, we can later move on to the 
additional input formats. 

So my vote is also for "Won't Fix".

> GSOC 2013: Improve Lucene support in Mahout
> ---
>
> Key: MAHOUT-1178
> URL: https://issues.apache.org/jira/browse/MAHOUT-1178
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Dan Filimon
>Assignee: Gokhan Capan
>  Labels: gsoc2013, mentor
> Fix For: 1.0
>
> Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document each to a line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields that ought to work sensibly as well.
>  Two options include dumping multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2014-04-14 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968185#comment-13968185
 ] 

Sebastian Schelter commented on MAHOUT-1178:


I'd personally resolve this as Won't Fix, as we should concentrate on the 
Scala DSL in the future. Any objections?

> GSOC 2013: Improve Lucene support in Mahout
> ---
>
> Key: MAHOUT-1178
> URL: https://issues.apache.org/jira/browse/MAHOUT-1178
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Dan Filimon
>Assignee: Gokhan Capan
>  Labels: gsoc2013, mentor
> Fix For: 1.0
>
> Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document each to a line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields that ought to work sensibly as well.
>  Two options include dumping multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1468) Creating a new page for StreamingKMeans documentation on mahout website

2014-04-14 Thread Sebastian Schelter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968183#comment-13968183
 ] 

Sebastian Schelter commented on MAHOUT-1468:


Ideally, the page should give an introduction to the ideas behind streaming 
k-means, show an example of how to use it, and help people choose well-working 
parameters.
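
As a starting point for that introduction: the objective that streaming 
k-means approximates in a single pass (see the paper linked in the issue) is 
the usual k-means cost

  \min_{C,\,|C|=k} \; \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2

where X is the set of input vectors and C the set of centroids; the streaming 
step keeps a weighted sketch of substantially more than k centroids, which is 
then clustered in memory to produce the final k.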

> Creating a new page for StreamingKMeans documentation on mahout website
> ---
>
> Key: MAHOUT-1468
> URL: https://issues.apache.org/jira/browse/MAHOUT-1468
> Project: Mahout
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.0
>Reporter: Pavan Kumar N
>  Labels: Documentation
> Fix For: 1.0
>
>
> Separate page required on Streaming K Means algorithm description and 
> overview, explaining the various parameters that can be used in 
> streamingkmeans, the strategy for parallelization, and a link to this paper: 
> http://papers.nips.cc/paper/3812-streaming-k-means-approximation.pdf



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [jira] [Closed] (MAHOUT-1450) Cleaning up clustering documentation on mahout website

2014-04-14 Thread Sebastian Schelter
No need to close stuff; we will resolve it as fixed and close it only after 
the next release.


On 04/14/2014 11:15 AM, Pavan Kumar N (JIRA) wrote:


  [ 
https://issues.apache.org/jira/browse/MAHOUT-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Kumar N closed MAHOUT-1450.
-



Cleaning up clustering documentation on mahout website
---

 Key: MAHOUT-1450
 URL: https://issues.apache.org/jira/browse/MAHOUT-1450
 Project: Mahout
  Issue Type: Documentation
  Components: Documentation
 Environment: This affects all mahout versions
Reporter: Pavan Kumar N
  Labels: documentation, newbie
 Fix For: 1.0


In canopy clustering, the strategy for parallelization seems to have some dead 
links. We need to clean them up and replace them with new links (if there are 
any). Here is the page:
http://mahout.apache.org/users/clustering/canopy-clustering.html
Here are some details of the dead links on the k-means clustering page:
On the k-Means clustering - basics page, in the first line of the Quickstart 
part of the documentation, the hyperlink "Here" is dead:
http://mahout.apache.org/users/clustering/k-means-clustering%5Equickstart-kmeans.sh.html
In the Strategy for parallelization part of the documentation, the hyperlink 
"Cluster computing and MapReduce" in the first sentence, the hyperlink "here" 
in the second sentence, and the hyperlink 
"http://www2.chass.ncsu.edu/garson/PA765/cluster.htm" in the last sentence are 
dead:
http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html
http://code.google.com/edu/content/submissions/mapreduce-minilecture/lec4-clustering.ppt
http://www2.chass.ncsu.edu/garson/PA765/cluster.htm
Under the page 
http://mahout.apache.org/users/clustering/visualizing-sample-clusters.html
in the second sentence of the Pre-prep part, the hyperlink "setup mahout" is 
dead:
http://mahout.apache.org/users/clustering/users/basics/quickstart.html
The existing documentation is too ambiguous, and I recommend making the 
following changes so that new users can use it as a tutorial.
The Quickstart should be replaced with the following:
Get the data from:
wget 
http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz
Place it within the examples folder under the Mahout home directory:
mahout-0.7/examples/reuters
mkdir reuters
cd reuters
mkdir reuters-out
mv reuters21578.tar.gz reuters-out
cd reuters-out
tar -xzvf reuters21578.tar.gz
cd ..
Mahout-specific commands
#1 run the org.apache.lucene.benchmark.utils.ExtractReuters class
${MAHOUT_HOME}/bin/mahout
org.apache.lucene.benchmark.utils.ExtractReuters reuters-out
reuters-text
#2 copy the files to your HDFS
bin/hadoop fs -copyFromLocal
/home/bigdata/mahout-distribution-0.7/examples/reuters-text
hdfs://localhost:54310/user/bigdata/
#3 generate the sequence file
mahout seqdirectory -i hdfs://localhost:54310/user/bigdata/reuters-text
-o hdfs://localhost:54310/user/bigdata/reuters-seqfiles -c UTF-8 -chunk 5
-chunk → the chunk size in MB
-c UTF-8 → the character encoding of the input
#4 check the generated sequence file
mahout-0.7$ ./bin/mahout seqdumper -i
/your-hdfs-path-to/reuters-seqfiles/chunk-0 | less
#5 generate a vector file from the sequence file
mahout seq2sparse -i
hdfs://localhost:54310/user/bigdata/reuters-seqfiles -o
hdfs://localhost:54310/user/bigdata/reuters-vectors -ow
-ow → overwrite
#6 take a look at the output with this command; it should have these 7 items
bin/hadoop fs -ls reuters-vectors
reuters-vectors/df-count
reuters-vectors/dictionary.file-0
reuters-vectors/frequency.file-0
reuters-vectors/tf-vectors
reuters-vectors/tfidf-vectors
reuters-vectors/tokenized-documents
reuters-vectors/wordcount
#7 check the vector: reuters-vectors/tf-vectors/part-r-0
mahout-0.7$ hadoop fs -ls reuters-vectors/tf-vectors
#8 run canopy clustering to get good initial centroids for k-means
mahout canopy -i
hdfs://localhost:54310/user/bigdata/reuters-vectors/tf-vectors -o
hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -dm
org.apache.mahout.common.distance.CosineDistanceMeasure -t1 1500 -t2 2000
-dm → the distance measure to be used while clustering (here the cosine 
distance measure)
#9 run the k-means clustering algorithm
mahout kmeans -i
hdfs://localhost:54310/user/bigdata/reuters-vectors/tfidf-vectors -c
hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -o
hdfs://localhost:54310/user/bigdata/reuters-kmeans-clusters -cd 0.1 -ow
-x 20 -k 10
-i → input
-o → output
-c → initial centroids for k-means (not defining this parameter will
trigger k-means to generate random initial centroids)
-cd → convergence delta parameter
-ow → overwrite
-x → the number of k-means iterations
-k → the number of clusters
#10 export the k-means output using the Cluster Dumper tool
mahout clusterdump -dt sequence

[jira] [Closed] (MAHOUT-1450) Cleaning up clustering documentation on mahout website

2014-04-14 Thread Pavan Kumar N (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Kumar N closed MAHOUT-1450.
-


> Cleaning up clustering documentation on mahout website 
> ---
>
> Key: MAHOUT-1450
> URL: https://issues.apache.org/jira/browse/MAHOUT-1450
> Project: Mahout
>  Issue Type: Documentation
>  Components: Documentation
> Environment: This affects all mahout versions
>Reporter: Pavan Kumar N
>  Labels: documentation, newbie
> Fix For: 1.0
>
>
> In canopy clustering, the strategy for parallelization seems to have some 
> dead links. Need to clean them and replace with new links (if there are any). 
> Here is the link:
> http://mahout.apache.org/users/clustering/canopy-clustering.html
> Here are some details of the dead links on the k-means clustering page:
> On the k-Means clustering - basics page, in the first line of the Quickstart 
> part of the documentation, the hyperlink "Here" is dead:
> http://mahout.apache.org/users/clustering/k-means-clustering%5Equickstart-kmeans.sh.html
> In the Strategy for parallelization part of the documentation, the hyperlink 
> "Cluster computing and MapReduce" in the first sentence, the hyperlink "here" 
> in the second sentence, and the hyperlink 
> "http://www2.chass.ncsu.edu/garson/PA765/cluster.htm" in the last sentence 
> are dead.
> http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html
> http://code.google.com/edu/content/submissions/mapreduce-minilecture/lec4-clustering.ppt
> http://www2.chass.ncsu.edu/garson/PA765/cluster.htm
> Under the page: 
> http://mahout.apache.org/users/clustering/visualizing-sample-clusters.html
> in the second sentence of Pre-prep part of this page, the hyperlink "setup 
> mahout" is dead.
> http://mahout.apache.org/users/clustering/users/basics/quickstart.html
> The existing documentation is too ambiguous, and I recommend making the 
> following changes so that new users can use it as a tutorial.
> The Quickstart should be replaced with the following:
> Get the data from:
> wget 
> http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz
> Place it within the examples folder under the Mahout home directory:
> mahout-0.7/examples/reuters
> mkdir reuters
> cd reuters
> mkdir reuters-out
> mv reuters21578.tar.gz reuters-out
> cd reuters-out
> tar -xzvf reuters21578.tar.gz
> cd ..
> Mahout specific Commands
> #1 run the org.apache.lucene.benchmark.utils.ExtractReuters class
> ${MAHOUT_HOME}/bin/mahout
> org.apache.lucene.benchmark.utils.ExtractReuters reuters-out
> reuters-text
> #2 copy the file to your HDFS
> bin/hadoop fs -copyFromLocal
> /home/bigdata/mahout-distribution-0.7/examples/reuters-text
> hdfs://localhost:54310/user/bigdata/
> #3 generate sequence-file
> mahout seqdirectory -i hdfs://localhost:54310/user/bigdata/reuters-text
> -o hdfs://localhost:54310/user/bigdata/reuters-seqfiles -c UTF-8 -chunk 5
> -chunk → the chunk size in MB
> -c UTF-8 → the character encoding of the input
> #4 Check the generated sequence-file
> mahout-0.7$ ./bin/mahout seqdumper -i
> /your-hdfs-path-to/reuters-seqfiles/chunk-0 | less
> #5 From sequence-file generate vector file
> mahout seq2sparse -i
> hdfs://localhost:54310/user/bigdata/reuters-seqfiles -o
> hdfs://localhost:54310/user/bigdata/reuters-vectors -ow
> -ow → overwrite
> #6 take a look at the output with this command; it should have these 7 items
> bin/hadoop fs -ls reuters-vectors
> reuters-vectors/df-count
> reuters-vectors/dictionary.file-0
> reuters-vectors/frequency.file-0
> reuters-vectors/tf-vectors
> reuters-vectors/tfidf-vectors
> reuters-vectors/tokenized-documents
> reuters-vectors/wordcount
> #7 check the vector: reuters-vectors/tf-vectors/part-r-0
> mahout-0.7$ hadoop fs -ls reuters-vectors/tf-vectors
> #8 Run canopy clustering to get optimal initial centroids for k-means
> mahout canopy -i
> hdfs://localhost:54310/user/bigdata/reuters-vectors/tf-vectors -o
> hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -t1 1500 -t2 2000
> -dm → specifying the distance measure to be used while clustering (here it is 
> cosine distance measure)
> #9 Run k-means clustering algorithm
> mahout kmeans -i
> hdfs://localhost:54310/user/bigdata/reuters-vectors/tfidf-vectors -c
> hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -o
> hdfs://localhost:54310/user/bigdata/reuters-kmeans-clusters -cd 0.1 -ow
> -x 20 -k 10
> -i → input
> -o → output
> -c → initial centroids for k-means (not defining this parameter will
> trigger k-means to generate random initial centroids)
> -cd → convergence delta parameter
> -ow → overwrite
> -x → specifying number of k-means iterations
> -k → specifying number of clusters
> #10 Export k-means output using Cl

[jira] [Comment Edited] (MAHOUT-1450) Cleaning up clustering documentation on mahout website

2014-04-14 Thread Pavan Kumar N (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968181#comment-13968181
 ] 

Pavan Kumar N edited comment on MAHOUT-1450 at 4/14/14 9:13 AM:


[~ssc] Yes, I'd love to work on 1468. Let's take this discussion to 1468; give 
me an outline of topics the page should have. I am closing 1450.


was (Author: pknarayan):
[~ssc] Yes, I'd love to work on 1468. Let's take this discussion to 1468; give 
me an outline of topics the page should have.

> Cleaning up clustering documentation on mahout website 
> ---
>
> Key: MAHOUT-1450
> URL: https://issues.apache.org/jira/browse/MAHOUT-1450
> Project: Mahout
>  Issue Type: Documentation
>  Components: Documentation
> Environment: This affects all mahout versions
>Reporter: Pavan Kumar N
>  Labels: documentation, newbie
> Fix For: 1.0
>
>
> In canopy clustering, the strategy for parallelization seems to have some 
> dead links. Need to clean them and replace with new links (if there are any). 
> Here is the link:
> http://mahout.apache.org/users/clustering/canopy-clustering.html
> Here are some details of the dead links on the k-means clustering page:
> On the k-Means clustering - basics page, in the first line of the Quickstart 
> part of the documentation, the hyperlink "Here" is dead:
> http://mahout.apache.org/users/clustering/k-means-clustering%5Equickstart-kmeans.sh.html
> In the Strategy for parallelization part of the documentation, the hyperlink 
> "Cluster computing and MapReduce" in the first sentence, the hyperlink "here" 
> in the second sentence, and the hyperlink 
> "http://www2.chass.ncsu.edu/garson/PA765/cluster.htm" in the last sentence 
> are dead.
> http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html
> http://code.google.com/edu/content/submissions/mapreduce-minilecture/lec4-clustering.ppt
> http://www2.chass.ncsu.edu/garson/PA765/cluster.htm
> Under the page: 
> http://mahout.apache.org/users/clustering/visualizing-sample-clusters.html
> in the second sentence of Pre-prep part of this page, the hyperlink "setup 
> mahout" is dead.
> http://mahout.apache.org/users/clustering/users/basics/quickstart.html
> The existing documentation is too ambiguous, and I recommend making the 
> following changes so that new users can use it as a tutorial.
> The Quickstart should be replaced with the following:
> Get the data from:
> wget 
> http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz
> Place it within the examples folder under the Mahout home directory:
> mahout-0.7/examples/reuters
> mkdir reuters
> cd reuters
> mkdir reuters-out
> mv reuters21578.tar.gz reuters-out
> cd reuters-out
> tar -xzvf reuters21578.tar.gz
> cd ..
> Mahout specific Commands
> #1 run the org.apache.lucene.benchmark.utils.ExtractReuters class
> ${MAHOUT_HOME}/bin/mahout
> org.apache.lucene.benchmark.utils.ExtractReuters reuters-out
> reuters-text
> #2 copy the file to your HDFS
> bin/hadoop fs -copyFromLocal
> /home/bigdata/mahout-distribution-0.7/examples/reuters-text
> hdfs://localhost:54310/user/bigdata/
> #3 generate sequence-file
> mahout seqdirectory -i hdfs://localhost:54310/user/bigdata/reuters-text
> -o hdfs://localhost:54310/user/bigdata/reuters-seqfiles -c UTF-8 -chunk 5
> -chunk → the chunk size in MB
> -c UTF-8 → the character encoding of the input
> #4 Check the generated sequence-file
> mahout-0.7$ ./bin/mahout seqdumper -i
> /your-hdfs-path-to/reuters-seqfiles/chunk-0 | less
> #5 From sequence-file generate vector file
> mahout seq2sparse -i
> hdfs://localhost:54310/user/bigdata/reuters-seqfiles -o
> hdfs://localhost:54310/user/bigdata/reuters-vectors -ow
> -ow → overwrite
> #6 take a look at the output with this command; it should have these 7 items
> bin/hadoop fs -ls reuters-vectors
> reuters-vectors/df-count
> reuters-vectors/dictionary.file-0
> reuters-vectors/frequency.file-0
> reuters-vectors/tf-vectors
> reuters-vectors/tfidf-vectors
> reuters-vectors/tokenized-documents
> reuters-vectors/wordcount
> #7 check the vector: reuters-vectors/tf-vectors/part-r-0
> mahout-0.7$ hadoop fs -ls reuters-vectors/tf-vectors
> #8 Run canopy clustering to get optimal initial centroids for k-means
> mahout canopy -i
> hdfs://localhost:54310/user/bigdata/reuters-vectors/tf-vectors -o
> hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -t1 1500 -t2 2000
> -dm → specifying the distance measure to be used while clustering (here it is 
> cosine distance measure)
> #9 Run k-means clustering algorithm
> mahout kmeans -i
> hdfs://localhost:54310/user/bigdata/reuters-vectors/tfidf-vectors -c
> hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -o
> hdfs://loc

[jira] [Commented] (MAHOUT-1450) Cleaning up clustering documentation on mahout website

2014-04-14 Thread Pavan Kumar N (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968181#comment-13968181
 ] 

Pavan Kumar N commented on MAHOUT-1450:
---

[~ssc] Yes, I'd love to work on 1468. Let's take this discussion to 1468; give 
me an outline of topics the page should have.

> Cleaning up clustering documentation on mahout website 
> ---
>
> Key: MAHOUT-1450
> URL: https://issues.apache.org/jira/browse/MAHOUT-1450
> Project: Mahout
>  Issue Type: Documentation
>  Components: Documentation
> Environment: This affects all mahout versions
>Reporter: Pavan Kumar N
>  Labels: documentation, newbie
> Fix For: 1.0
>
>
> In canopy clustering, the strategy for parallelization seems to have some 
> dead links. Need to clean them and replace with new links (if there are any). 
> Here is the link:
> http://mahout.apache.org/users/clustering/canopy-clustering.html
> Here are some details of the dead links on the k-means clustering page:
> On the k-Means clustering - basics page, in the first line of the Quickstart 
> part of the documentation, the hyperlink "Here" is dead:
> http://mahout.apache.org/users/clustering/k-means-clustering%5Equickstart-kmeans.sh.html
> In the Strategy for parallelization part of the documentation, the hyperlink 
> "Cluster computing and MapReduce" in the first sentence, the hyperlink "here" 
> in the second sentence, and the hyperlink 
> "http://www2.chass.ncsu.edu/garson/PA765/cluster.htm" in the last sentence 
> are dead.
> http://code.google.com/edu/content/submissions/mapreduce-minilecture/listing.html
> http://code.google.com/edu/content/submissions/mapreduce-minilecture/lec4-clustering.ppt
> http://www2.chass.ncsu.edu/garson/PA765/cluster.htm
> Under the page: 
> http://mahout.apache.org/users/clustering/visualizing-sample-clusters.html
> in the second sentence of Pre-prep part of this page, the hyperlink "setup 
> mahout" is dead.
> http://mahout.apache.org/users/clustering/users/basics/quickstart.html
> The existing documentation is too ambiguous, and I recommend making the 
> following changes so that new users can use it as a tutorial.
> The Quickstart should be replaced with the following:
> Get the data from:
> wget 
> http://www.daviddlewis.com/resources/testcollections/reuters21578/reuters21578.tar.gz
> Place it within the examples folder under the Mahout home directory:
> mahout-0.7/examples/reuters
> mkdir reuters
> cd reuters
> mkdir reuters-out
> mv reuters21578.tar.gz reuters-out
> cd reuters-out
> tar -xzvf reuters21578.tar.gz
> cd ..
> Mahout specific Commands
> #1 run the org.apache.lucene.benchmark.utils.ExtractReuters class
> ${MAHOUT_HOME}/bin/mahout
> org.apache.lucene.benchmark.utils.ExtractReuters reuters-out
> reuters-text
> #2 copy the file to your HDFS
> bin/hadoop fs -copyFromLocal
> /home/bigdata/mahout-distribution-0.7/examples/reuters-text
> hdfs://localhost:54310/user/bigdata/
> #3 generate sequence-file
> mahout seqdirectory -i hdfs://localhost:54310/user/bigdata/reuters-text
> -o hdfs://localhost:54310/user/bigdata/reuters-seqfiles -c UTF-8 -chunk 5
> -chunk → the chunk size in MB
> -c UTF-8 → the character encoding of the input
> #4 Check the generated sequence-file
> mahout-0.7$ ./bin/mahout seqdumper -i
> /your-hdfs-path-to/reuters-seqfiles/chunk-0 | less
> #5 From sequence-file generate vector file
> mahout seq2sparse -i
> hdfs://localhost:54310/user/bigdata/reuters-seqfiles -o
> hdfs://localhost:54310/user/bigdata/reuters-vectors -ow
> -ow → overwrite
> #6 take a look at the output with this command; it should have these 7 items
> bin/hadoop fs -ls reuters-vectors
> reuters-vectors/df-count
> reuters-vectors/dictionary.file-0
> reuters-vectors/frequency.file-0
> reuters-vectors/tf-vectors
> reuters-vectors/tfidf-vectors
> reuters-vectors/tokenized-documents
> reuters-vectors/wordcount
> #7 check the vector: reuters-vectors/tf-vectors/part-r-0
> mahout-0.7$ hadoop fs -ls reuters-vectors/tf-vectors
> #8 Run canopy clustering to get optimal initial centroids for k-means
> mahout canopy -i
> hdfs://localhost:54310/user/bigdata/reuters-vectors/tf-vectors -o
> hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -t1 1500 -t2 2000
> -dm → specifying the distance measure to be used while clustering (here it is 
> cosine distance measure)
> #9 Run k-means clustering algorithm
> mahout kmeans -i
> hdfs://localhost:54310/user/bigdata/reuters-vectors/tfidf-vectors -c
> hdfs://localhost:54310/user/bigdata/reuters-canopy-centroids -o
> hdfs://localhost:54310/user/bigdata/reuters-kmeans-clusters -cd 0.1 -ow
> -x 20 -k 10
> -i → input
> -o → output
> -c → initial centroids for k-means (not defining this parameter will
> trigger k-means to generate random initial c

[jira] [Updated] (MAHOUT-1445) Create an intro for item based recommender

2014-04-14 Thread Pavan Kumar N (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Kumar N updated MAHOUT-1445:
--

   Labels: documentation recommender  (was: )
Affects Version/s: 1.0
   Status: Patch Available  (was: Open)

How does an item-based recommender work?
In item-based recommender engines, if a user likes an item A, then the same 
item can be recommended to other users who are similar to that user. 

How do we obtain similar users/the similarity between users?
Similar users can be obtained by using profile-based information about the 
user: for example, by clustering users based on their attributes, such as age, 
gender, geographic location, net worth, and so on. Alternatively, you can find 
similar users using a collaborative approach, by analyzing the users' actions, 
such as their historical ratings and reviews.

What is the high-level logic?
For every item i that user u has no preference for yet, and for every other 
user v that has a preference for item i: compute the similarity s between u 
and v, and incorporate v's preference for item i, weighted by s, into a 
running average.
Return the top items, ranked by weighted average.
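
Read charitably, that running average is the standard similarity-weighted 
estimate, which (as a hedged formalization of the pseudocode above, with 
sim(u,v) playing the role of the weight s) can be written as

  \hat{r}(u,i) = \frac{\sum_{v \in U_i} \mathrm{sim}(u,v) \, r(v,i)}
                      {\sum_{v \in U_i} |\mathrm{sim}(u,v)|}

where U_i is the set of users with a preference for item i and r(v,i) is v's 
preference for i.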

> Create an intro for item based recommender
> --
>
> Key: MAHOUT-1445
> URL: https://issues.apache.org/jira/browse/MAHOUT-1445
> Project: Mahout
>  Issue Type: New Feature
>  Components: Documentation
>Affects Versions: 1.0
>Reporter: Maciej Mazur
>  Labels: documentation, recommender
> Fix For: 1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1178) GSOC 2013: Improve Lucene support in Mahout

2014-04-14 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968148#comment-13968148
 ] 

Gokhan Capan commented on MAHOUT-1178:
--

Well, I can add this, but considering the current status of the project, I 
think it is no longer of interest to people.
What do you say, [~ssc]: should we 'won't fix' it or commit?

> GSOC 2013: Improve Lucene support in Mahout
> ---
>
> Key: MAHOUT-1178
> URL: https://issues.apache.org/jira/browse/MAHOUT-1178
> Project: Mahout
>  Issue Type: New Feature
>Reporter: Dan Filimon
>Assignee: Gokhan Capan
>  Labels: gsoc2013, mentor
> Fix For: 1.0
>
> Attachments: MAHOUT-1178-TEST.patch, MAHOUT-1178.patch
>
>
> [via Ted Dunning]
> It should be possible to view a Lucene index as a matrix.  This would
> require that we standardize on a way to convert documents to rows.  There
> are many choices, the discussion of which should be deferred to the actual
> work on the project, but there are a few obvious constraints:
> a) it should be possible to get the same result as dumping the term vectors
> for each document each to a line and converting that result using standard
> Mahout methods.
> b) numeric fields ought to work somehow.
> c) if there are multiple text fields that ought to work sensibly as well.
>  Two options include dumping multiple matrices or to convert the fields
> into a single row of a single matrix.
> d) it should be possible to refer back from a row of the matrix to find the
> correct document.  This might be because we remember the Lucene doc number
> or because a field is named as holding a unique id.
> e) named vectors and matrices should be used if plausible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Tackling the "legacy dilemma"

2014-04-14 Thread Dmitriy Lyubimov
I am ready to order a t-shirt with "Go, Andy! +100" across it if it makes
any pragmatic sense.
On Apr 13, 2014 11:11 PM, "Sebastian Schelter"  wrote:

> On 04/14/2014 08:00 AM, Dmitriy Lyubimov wrote:
>
>> Unfortunately, not all things map gracefully into algebra. But hopefully
>> some of the whole can still be.
>>
>
> Yes, that's why I was asking Andy if there are enough constructs. If not,
> we might have to add more.
>
>
>> I am even a little bit worried that we may develop almost too much (is
>> there such a thing?) ML before we have a chance to crystallize the data
>> frames and perhaps the dictionary discussions. These are more tools to keep
>> abstracted.
>>
>
> I think it's a very good thing to have early ML implementations on the
> DSL, because it allows us to validate whether we are on the right path. We
> should start with providing the things that are most popular in mahout,
> like the item-based recommender from MAHOUT-1464. Having a few
> implementations on the DSL also helps with designing new abstractions,
> because for every proposed feature we can look at the existing code and see
> how helpful the new feature would be.
>
>
>> I just don't want Mahout to be yet another MLlib. I shudder every time
>> somebody says "we want to create a Spark version of (an|the) algorithm". I
>> know it will create the wrong talking points for somebody anxious to draw
>> parallels.
>>
>>
>
> Totally agree here. Looks like history repeats itself, from "I want to 
> create a Hadoop implementation" to "I want to create a Spark implementation" :)
>
>
>>
>> On Sun, Apr 13, 2014 at 10:51 PM, Sebastian Schelter 
>> wrote:
>>
>>> Andy, that would be awesome. Have you had a look at our new scala DSL [1]?
>>> Does it offer enough constructs for you to rewrite your implementation
>>> with it?
>>>
>>> --sebastian
>>>
>>>
>>> [1] https://mahout.apache.org/users/sparkbindings/home.html
>>>
>>>
>>> On 04/14/2014 07:47 AM, Andy Twigg wrote:
>>>
>>>>> +1 to removing present Random Forests. Andy Twigg had provided a
>>>>> Spark-based Streaming Random Forests impl sometime last year. It's time
>>>>> to restart that conversation and integrate that into the codebase, if
>>>>> the contributor is still willing, i.e.
>>>>
>>>> I'm happy to contribute this, but as it stands it's written against
>>>> spark, even forgetting the 'streaming' aspect. Do you have any advice
>>>> on how to proceed?



>>>
>>
>