Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

2014-03-05 Thread Margusja
Hi, here are my actions and the problematic result again: [hduser@vm38 ~]$ git clone https://github.com/apache/mahout.git remote: Reusing existing pack: 76099, done. remote: Counting objects: 39, done. remote: Compressing objects: 100% (32/32), done. remote: Total 76138 (delta 2), reused 0

Re: Recommend items not rated by any user

2014-03-05 Thread Juan José Ramos
In case somebody runs into the same situation, the key seems to be in the CandidateItemsStrategy being passed to the constructor of GenericItemBasedRecommender. Looking into the code, if no CandidateItemsStrategy is specified in the constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
Hi Juan, that is a good catch. CandidateItemsStrategy is the right place to implement this. Maybe we should simply extend its interface to add a parameter that says whether to keep or remove the current user's items? We could even do this in the abstract base class then. --sebastian On
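The suggestion above (a parameter that decides whether the current user's items are kept, handled once in the abstract base class) could look roughly like the sketch below. It is only an illustration in plain Java collections: the names CandidateStrategySketch, AllItemsSketch, candidates, doGetCandidates and includeKnownItems are hypothetical and are not Mahout's API.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical base class: the keep/remove decision lives in one place,
// so concrete strategies only decide which items are candidates at all.
abstract class CandidateStrategySketch {

    // includeKnownItems is the assumed new parameter from the discussion:
    // true keeps the items the user already has, false removes them.
    final SortedSet<Long> candidates(Set<Long> preferredByUser, boolean includeKnownItems) {
        SortedSet<Long> result = new TreeSet<>(doGetCandidates(preferredByUser));
        if (!includeKnownItems) {
            result.removeAll(preferredByUser);
        }
        return result;
    }

    // Subclasses return their raw candidate set without filtering.
    protected abstract Set<Long> doGetCandidates(Set<Long> preferredByUser);
}

// Hypothetical concrete strategy: every item in the catalog is a candidate.
final class AllItemsSketch extends CandidateStrategySketch {
    private final Set<Long> catalog;

    AllItemsSketch(Set<Long> catalog) {
        this.catalog = catalog;
    }

    @Override
    protected Set<Long> doGetCandidates(Set<Long> preferredByUser) {
        return catalog;
    }
}
```

Whether a strategy such as PreferredItemsNeighborhoodCandidateItemsStrategy could honor such a flag is exactly the concern raised in the reply that follows, so this should be read as one possible shape rather than an agreed design.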

Rework our website

2014-03-05 Thread Sebastian Schelter
Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design, especially fonts and spacing make it super hard to read long articles. This also

Re: Recommend items not rated by any user

2014-03-05 Thread Juan José Ramos
Thanks for the reply, Sebastian. I am not sure that should be implemented in the abstract base class, though, because PreferredItemsNeighborhoodCandidateItemsStrategy, for instance, by definition returns the items not rated by the user and rated by somebody else. Back to my last post, I have

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
On 03/05/2014 01:23 PM, Juan José Ramos wrote: Thanks for the reply, Sebastian. I am not sure that should be implemented in the abstract base class, though, because PreferredItemsNeighborhoodCandidateItemsStrategy, for instance, by definition returns the items not rated by the user and rated

Re: Rework our website

2014-03-05 Thread Gokhan Capan
I liked both of them. Great work, Lucas! Gokhan On Wed, Mar 5, 2014 at 2:11 PM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest,

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
Sorry, there was a typo in the previous paragraph. If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user. On Wed, Mar

Re: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Kevin Moulart
Hi and thanks for your help! I had been told that the version of mahout used by Cloudera (CDH 4.6) was in fact 0.8 with a patch for mr2 support. ( http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=u...@mail.gmail.com%3E) But I

Re: Recommend items not rated by any user

2014-03-05 Thread Juan José Ramos
Hi Tevfik, Thanks for the response. I think what you say contradicts what Sebastian pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user, what would AllUnknownItemsCandidateItemsStrategy return? On Wed, Mar 5, 2014 at 1:40 PM,

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
Juan, you got me wrong: AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user. So, it does not simply return all items that have not been
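To make the distinction in this message concrete, here is a toy model of the two behaviours as described in the thread. It uses plain Java collections and hypothetical names (CandidateSketch, allUnknown, allSimilar); it is not Mahout's implementation, and the similarity table is invented for the example. A missing entry in the table stands in for a NaN similarity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class CandidateSketch {

    // Looks up a symmetric similarity; NaN means "no similarity known".
    static double similarity(Map<Long, Map<Long, Double>> sims, long a, long b) {
        Double s = sims.getOrDefault(a, Collections.<Long, Double>emptyMap()).get(b);
        if (s == null) {
            s = sims.getOrDefault(b, Collections.<Long, Double>emptyMap()).get(a);
        }
        return s == null ? Double.NaN : s;
    }

    // "All unknown items": everything the user has not rated.
    static SortedSet<Long> allUnknown(Set<Long> allItems, Set<Long> preferred) {
        SortedSet<Long> out = new TreeSet<>(allItems);
        out.removeAll(preferred);
        return out;
    }

    // "All similar items": unrated items with a non-NaN similarity
    // to at least one item the user preferred.
    static SortedSet<Long> allSimilar(Set<Long> allItems, Set<Long> preferred,
                                      Map<Long, Map<Long, Double>> sims) {
        SortedSet<Long> out = new TreeSet<>();
        for (long candidate : allUnknown(allItems, preferred)) {
            for (long p : preferred) {
                if (!Double.isNaN(similarity(sims, candidate, p))) {
                    out.add(candidate);
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Long> items = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        Set<Long> preferred = new HashSet<>(Arrays.asList(1L, 2L));
        Map<Long, Map<Long, Double>> sims = new HashMap<>();
        sims.put(3L, Collections.singletonMap(1L, 0.8)); // item 3 is similar to item 1
        sims.put(5L, Collections.singletonMap(2L, 0.1)); // item 5 is similar to item 2
        // Item 4 has no known similarity to anything the user preferred.
        System.out.println(allUnknown(items, preferred));
        System.out.println(allSimilar(items, preferred, sims));
    }
}
```

In this toy data, item 4 is a candidate under the "all unknown" reading but not under the "all similar" reading. If every unrated item has some non-NaN similarity to one of the user's items, the two sets coincide.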

Re: Rework our website

2014-03-05 Thread Ted Dunning
Both are nice. I think you are right that the second is calmer. On Wed, Mar 5, 2014 at 4:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm.

Re: Recommend items not rated by any user

2014-03-05 Thread Pat Ferrel
I am ignoring the rest of the thread because I suspect it may have gotten off track. Your data is new articles, right? You would like to recommend from known articles to any user based on an article they rate or even view. You have no collaborative filtering data because the lifetime of a news

Re: Rework our website

2014-03-05 Thread Pat Ferrel
What no centered text?? ;-) Love either. BTW users are no longer able to contribute content to the wiki. Most CMSs have a way to allow input that is moderated. Might this make getting documentation help easier? Allow anyone to contribute but committers can filter out the bad—sort of like

Re: Rework our website

2014-03-05 Thread Andrew Musselman
On Wed, Mar 5, 2014 at 7:47 AM, Pat Ferrel p...@occamsmachete.com wrote: What no centered text?? ;-) Love either. BTW users are no longer able to contribute content to the wiki. Most CMSs have a way to allow input that is moderated. Might this make getting documentation help easier?

Re: Rework our website

2014-03-05 Thread Scott C. Cote
I had recently taken the text tour of mahout, but I couldn't decipher a way to contribute updates to the tour (some of the file names have changed, etc). How would I start? (this was part of my offer to help with the documentation of Mahout). Scott On 3/5/14 9:47 AM, Pat Ferrel

Re: Recommend items not rated by any user

2014-03-05 Thread Juan José Ramos
@Pat. You described my situation very well. The only additional thing is that I am also interested in creating some sort of a profile from the user with all the information s/he has provided by interacting with the articles and not only recommending similar items (news) based on a specific input.

Re: Recommend items not rated by any user

2014-03-05 Thread Juan José Ramos
@Tevfik, running this recommender: GenericItemBasedRecommender itemRecommender = new GenericItemBasedRecommender(dataModel, itemSimilarity, new AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new AllSimilarItemsCandidateItemsStrategy(itemSimilarity)); With this dataModel: 1,1,1.0 1,2,2.0

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Suneel Marthi
Not sure if the CDH4 patches on top of 0.7 have fixes for M-1067 and M-1098 which address the issues u r seeing. The second part of the issue u r seeing with Mahout 0.9 distro seems to be related to how u set it up on CDH4. I apologize for not being helpful here as I am not a CDH4 user or

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Andrew Musselman
I'm not sure about this either but I think these are all the changes to Mahout in CDH 4.6.0: http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt MAHOUT-1291 MAHOUT-1033 MAHOUT-1142 On Wed, Mar 5, 2014 at 8:30 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Not sure if

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
If the similarity between item 5 and two of the items user 1 preferred are not NaN then it will return 1, that is what I'm saying. If the similarities were all NaN then it will not return it. But surely, you might wonder if all similarities between an item and user's items are NaN, then

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
So both strategies seem to be effectively the same; I don't know what the implementers had in mind when designing AllSimilarItemsCandidateItemsStrategy. It can take a long time to estimate preferences for all items a user doesn't know. Especially if you have a lot of items. Traditional

Re: Rework our website

2014-03-05 Thread Suneel Marthi
+1 for Option# 2. On Wednesday, March 5, 2014 7:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
Hi Sebastian, But in order not to select items that are not similar to at least one of the items the user interacted with, you have to compute the similarity with all user items (which is the main task for estimating the preference of an item in the item-based method). So, it seems to me that

Re: Recommend items not rated by any user

2014-03-05 Thread Pat Ferrel
I agree. IMHO using the Mahout recommenders is wrong for this. The recommenders are the CF/cooccurrence type that expect usage or rating data on fairly long lived items from a somewhat static catalog. Trying to make them work for content based recommendations is needlessly difficult especially

Re: Recommend items not rated by any user

2014-03-05 Thread Tevfik Aytekin
It can even make things worse in SVD-based algorithms for which preference estimation is very fast. On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote: Hi Sebastian, But in order not to select items that are not similar to at least one of the items the user interacted

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
CDH 4.5 and 4.6 are both 0.7 + patches. Neither contains 0.8, since it has (tiny) breaking changes vs 0.7 and this is a minor version update. CDH5 contains 0.8 + patches. I did not say CDH4 has 0.8 -- re-read the message of mine that was quoted.

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Dmitriy Lyubimov
Yeah, it would seem CDH releases of Mahout produce some sort of cut-down version of such. I suggest switching to the official release tarball (or writing to Cloudera support about it). On Wed, Mar 5, 2014 at 8:38 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I'm not sure about this either

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
I don't follow what here makes you say they are cut down releases? They are release plus patches not release minus patches. The question is not about how to use 0.7, but how to use 1.0-SNAPSHOT. Why would switching to the official 0.7 release help? I think the answer is you build Mahout for

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
For SVD-based algorithms, you should use the AllUnknownItems strategy then, that's correct. In the majority of industry use cases that I have seen, people use pre-computed item similarities (Mahout has lots of machinery for doing this, btw), so AllSimilarItems totally makes sense there.

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Suneel Marthi
I apologize, Sean, I wasn't aware of the complete history in this thread. I didn't know about Hadoop 2.x being involved here; if so, yes, you need to build Mahout against HEAD with the Hadoop 2 profile to get it working. On Wednesday, March 5, 2014 12:04 PM, Sean Owen sro...@gmail.com wrote: CDH 4.5

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Dmitriy Lyubimov
On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen sro...@gmail.com wrote: I don't follow what here makes you say they are cut down releases? Meaning it seems to be pretty much 2 releases behind the official. But I definitely don't follow CDH developments in this department, you seem in a better

Re: Rework our website

2014-03-05 Thread Frank Scholten
+1 for design 2 On Wed, Mar 5, 2014 at 6:00 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: +1 for Option# 2. On Wednesday, March 5, 2014 7:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation

Re: Rework our website

2014-03-05 Thread Matthew Parent
I also prefer design 2 On Wed, Mar 5, 2014 at 11:08 AM, Frank Scholten fr...@frankscholten.nl wrote: +1 for design 2 On Wed, Mar 5, 2014 at 6:00 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: +1 for Option# 2. On Wednesday, March 5, 2014 7:11 AM, Sebastian Schelter

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
I don't understand this -- CDH always bundles the latest release. You know that CDH4 was released in July 2012, right? So it included 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a month after it began beta 2. CDH follows semantic versioning and won't introduce changes that

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
You can always install whatever version of anything on your cluster that you want. It may or may not work, but often happens to, at least for whatever you need it to do. It's just the same as it is without a packaged distribution -- dump new tarballs and cross your fingers. Nothing is weird or

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Andrew Musselman
Yeah, for sure; balancing clients' risk aversion to technical features is why we often recommend vendor solutions. Having a little button to choose a newer version of a component in the Manager UI (even with a confirmation dialog that said Are you sure? Are you crazy?) would be more palatable to

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Andrew Musselman
I mean balance the risk aversion against the value of new features duh. On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Yeah, for sure; balancing clients' risk aversion to technical features is why we often recommend vendor solutions. Having a little

Re: Rework our website

2014-03-05 Thread Sebastian Schelter
At the moment, only committers can change the website unfortunately. If you have a text to add, I'm happy to work it in and add your name to our contributors list in the CHANGELOG. Best, Sebastian On 03/05/2014 04:58 PM, Scott C. Cote wrote: I had recently taken the text tour of mahout, but