Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Hi

Here are my actions and the problematic result again:

[hduser@vm38 ~]$ git clone https://github.com/apache/mahout.git
remote: Reusing existing pack: 76099, done.
remote: Counting objects: 39, done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 76138 (delta 2), reused 0 (delta 0)
Receiving objects: 100% (76138/76138), 49.04 MiB | 275 KiB/s, done.
Resolving deltas: 100% (34449/34449), done.
[hduser@vm38 ~]$ cd mahout
[hduser@vm38 ~]$ mvn clean package -DskipTests=true -Dhadoop2.version=2.2.0
... ... ...
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools SUCCESS [15.529s]
[INFO] Apache Mahout . SUCCESS [1.657s]
[INFO] Mahout Math ... SUCCESS [1:00.891s]
[INFO] Mahout Core ... SUCCESS [2:44.617s]
[INFO] Mahout Integration SUCCESS [38.195s]
[INFO] Mahout Examples ... SUCCESS [45.458s]
[INFO] Mahout Release Package SUCCESS [0.012s]
[INFO] Mahout Math/Scala wrappers SUCCESS [53.519s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 6:27.763s
[INFO] Finished at: Wed Mar 05 10:22:51 EET 2014
[INFO] Final Memory: 57M/442M
[INFO]
[hduser@vm38 mahout]$
[hduser@vm38 mahout]$ cd ../
[hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop jar mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p -t 100 -o nsl-forest
14/03/05 10:26:39 INFO mapreduce.BuildForest: Partial Mapred implementation
14/03/05 10:26:39 INFO mapreduce.BuildForest: Building the forest...
14/03/05 10:26:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/03/05 10:26:51 INFO input.FileInputFormat: Total input paths to process : 1
14/03/05 10:26:51 INFO mapreduce.JobSubmitter: number of splits:1
14/03/05 10:26:51 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/05 10:26:51 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/03/05 10:26:51 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/03/05 10:26:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393936067845_0018
14/03/05 10:26:52 INFO impl.YarnClientImpl: Submitted application application_1393936067845_0018 to ResourceManager at /0.0.0.0:8032
14/03/05 10:26:52 INFO mapreduce.Job: The url to track the job: http://vm38.dbweb.ee:8088/proxy/application_1393936067845_0018/
14/03/05 10:26:52 INFO mapreduce.Job: Running job: job_1393936067845_0018
14/03/05 10:27:05 INFO mapreduce.Job: Job job_1393936067845_0018 running in uber mode : false
14/03/05 10:27:05 INFO mapreduce.Job: map 0% reduce 0%
14/03/05 10:27:22 INFO mapreduce.Job: map 100% reduce 0%
14/03/05 10:27:48 INFO
Re: Recommend items not rated by any user
In case somebody runs into the same situation, the key seems to be the CandidateItemsStrategy passed to the constructor of GenericItemBasedRecommender. Looking into the code, if no CandidateItemsStrategy is specified in the constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is used and, as the documentation says, its doGetCandidateItems method "returns all items that have not been rated by the user and that were preferred by another user that has preferred at least one item that the current user has preferred too."

So a different CandidateItemsStrategy needs to be passed. For this problem, it seems to me that AllSimilarItemsCandidateItemsStrategy and AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody know where to find some documentation about the different CandidateItemsStrategy implementations? Based on the names I would say that:

1) AllSimilarItemsCandidateItemsStrategy returns all similar items regardless of whether they have already been rated by someone or not.
2) AllUnknownItemsCandidateItemsStrategy returns all similar items that have not been rated by anyone yet.

Does anybody know if it works like that?

Thanks.

On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos jjar...@gmail.com wrote:

First of all, I know this requirement would not make sense in a CF Recommender. In my case, I am trying to use Mahout to create something closer to a Content-Based Recommender. In particular, I am pre-computing a similarity matrix between all the documents (items) of my catalogue and using that matrix as the ItemSimilarity for my Item-Based Recommender.

So, when a user rates a document, how could I make the recommender output documents similar to the ones the user has already rated, even if no other user in the system has rated them yet? Is that even possible in the first place?

Thanks a lot.
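To make the documented default behaviour concrete, here is a minimal sketch in plain Java. It is a toy model only: ordinary maps and sets stand in for Mahout's data model, and the class and method names (DefaultCandidatesSketch, preferredItemsNeighborhood) are made up for illustration and are not part of Mahout. It only follows the sentence quoted from the documentation above.

```java
import java.util.*;

// Toy model only: plain collections stand in for Mahout's DataModel.
// None of the names below are Mahout API; they are hypothetical.
final class DefaultCandidatesSketch {

    // prefs maps a user ID to the set of item IDs that user has rated.
    static Set<Long> preferredItemsNeighborhood(Map<Long, Set<Long>> prefs, long userId) {
        Set<Long> own = prefs.getOrDefault(userId, Set.of());
        Set<Long> candidates = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> other : prefs.entrySet()) {
            if (other.getKey() == userId) {
                continue; // skip the current user
            }
            // A neighbour is any other user sharing at least one rated item.
            if (!Collections.disjoint(own, other.getValue())) {
                candidates.addAll(other.getValue());
            }
        }
        // Items the current user has already rated are never candidates.
        candidates.removeAll(own);
        return candidates;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> prefs = new HashMap<>();
        prefs.put(1L, Set.of(10L, 11L));
        prefs.put(2L, Set.of(11L, 12L));
        // Suppose item 13 exists in the catalogue but nobody has rated it:
        // it appears in no preference set, so it can never become a candidate.
        System.out.println(preferredItemsNeighborhood(prefs, 1L));
    }
}
```

Under this model an item that no user has rated is unreachable by construction, which would explain why the default strategy cannot serve the content-based use case described in the original question.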
Re: Recommend items not rated by any user
Hi Juan,

that is a good catch. CandidateItemsStrategy is the right place to implement this. Maybe we should simply extend its interface to add a parameter that says whether to keep or remove the current user's items? We could even do this in the abstract base class then.

--sebastian
Rework our website
Hi everyone,

In our latest discussion, I argued that the lack of documentation on our website (and the errors in it) is one of the main pain points of Mahout at the moment. To be honest, I'm also not very happy with the design; the fonts and spacing in particular make it super hard to read long articles. This also prevents me from wanting to add articles and documentation.

I think we should have a beautiful website, where it is fun to add new stuff.

My design skills are pretty limited, but fortunately my brother is an art director! I asked him to make our website a bit more beautiful without changing too much of the structure, so that a redesign wouldn't take too long.

I really like the results and would volunteer to dig out my CSS skills and do the redesign, if people agree.

Here are his drafts, I like the second one best:

https://people.apache.org/~ssc/mahout/mahout.jpg
https://people.apache.org/~ssc/mahout/mahout2.jpg

Let me know what you think!

Best, Sebastian
Re: Recommend items not rated by any user
Thanks for the reply, Sebastian.

I am not sure that should be implemented in the abstract base class though, because PreferredItemsNeighborhoodCandidateItemsStrategy, for instance, by definition returns the items not rated by the user and rated by somebody else.

Back to my last post: I have been playing around with AllSimilarItemsCandidateItemsStrategy and AllUnknownItemsCandidateItemsStrategy, and although they both do what I wanted (recommend items not previously rated by any user), I honestly can't tell the difference between the two strategies. In my tests the output was always the same.

If the eventual output of the recommender will not include items already rated by the user, as pointed out here ( http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E ), AllSimilarItemsCandidateItemsStrategy should be equivalent to AllUnknownItemsCandidateItemsStrategy, shouldn't it?

Thanks.
Re: Recommend items not rated by any user
On 03/05/2014 01:23 PM, Juan José Ramos wrote:

Thanks for the reply, Sebastian. I am not sure that should be implemented in the abstract base class though, because PreferredItemsNeighborhoodCandidateItemsStrategy, for instance, by definition returns the items not rated by the user and rated by somebody else.

Good point. So we seem to need special implementations.

Back to my last post: I have been playing around with AllSimilarItemsCandidateItemsStrategy and AllUnknownItemsCandidateItemsStrategy, and although they both do what I wanted (recommend items not previously rated by any user), I honestly can't tell the difference between the two strategies. In my tests the output was always the same. If the eventual output of the recommender will not include items already rated by the user, as pointed out here ( http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E ), AllSimilarItemsCandidateItemsStrategy should be equivalent to AllUnknownItemsCandidateItemsStrategy, shouldn't it?

AllSimilarItems returns all items that are similar to any item that the user already knows. AllUnknownItems simply returns all items that the user has not interacted with yet. These are two different things, although they might overlap in some scenarios.

Best, Sebastian
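The distinction Sebastian draws between AllSimilarItems and AllUnknownItems can be sketched with plain Java collections. This is a simplified model based only on the two sentences in his reply, not Mahout's implementation; the class name, method names, and the similarTo map (a stand-in for a precomputed similarity matrix) are assumptions made for illustration.

```java
import java.util.*;

// Toy model of the two descriptions in the thread; names are illustrative, not Mahout's API.
final class CandidateStrategiesSketch {

    // "All unknown": every catalogue item the user has not interacted with yet.
    static Set<Long> allUnknownItems(Set<Long> catalogue, Set<Long> ratedByUser) {
        Set<Long> candidates = new TreeSet<>(catalogue);
        candidates.removeAll(ratedByUser);
        return candidates;
    }

    // "All similar": every item similar to something the user already knows,
    // minus the items the user has rated.
    static Set<Long> allSimilarItems(Map<Long, Set<Long>> similarTo, Set<Long> ratedByUser) {
        Set<Long> candidates = new TreeSet<>();
        for (long itemId : ratedByUser) {
            candidates.addAll(similarTo.getOrDefault(itemId, Set.of()));
        }
        candidates.removeAll(ratedByUser);
        return candidates;
    }

    public static void main(String[] args) {
        Set<Long> catalogue = Set.of(1L, 2L, 3L, 4L);
        Set<Long> rated = Set.of(1L);
        // Hypothetical similarity data: item 1 is similar to items 2 and 3 only.
        Map<Long, Set<Long>> similarTo = Map.of(1L, Set.of(2L, 3L));
        System.out.println(allSimilarItems(similarTo, rated));  // items 2 and 3
        System.out.println(allUnknownItems(catalogue, rated));  // items 2, 3 and 4
    }
}
```

In this model the two sets differ only by items that have no similarity link to anything the user rated (item 4 here). With a fully populated similarity matrix they would coincide, which may be why the outputs reported earlier in the thread were identical.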
Re: Rework our website
I liked both of them. Great work, Lucas!

Gokhan
Re: Recommend items not rated by any user
Sorry, there was a typo in the previous paragraph. If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user.

On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:

Hi Juan, If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and the similarity metric returns a non-NaN similarity value that is with at least one of the items preferred by the user.

Tevfik
Re: PCA with ssvd leads to StackOverFlowError
Hi and thanks for your help!

I had been told that the version of Mahout used by Cloudera (CDH 4.6) was in fact 0.8 with a patch for MR2 support ( http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=u...@mail.gmail.com%3E ).

But I tried to install 0.9 on my own, by compiling it with mvn after I changed the pom.xml:

- Added the Cloudera repository:
  <repository>
    <id>cloudera-repo</id>
    <name>Cloudera Repository</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
- Changed the version of Hadoop to use:
  <hadoop.1.version>2.0.0-mr1-cdh4.6.0</hadoop.1.version>
- I tried adding this one too:
  <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>

But then I get a lot of errors when Maven begins to compile the core package: https://gist.github.com/kmoulart/9368193

Could you tell me what I did wrong?

2014-03-04 19:02 GMT+01:00 Suneel Marthi suneel_mar...@yahoo.com:

The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7, which had this issue (from your stack trace, it's apparent you are using Mahout 0.7). Please upgrade to the latest Mahout version.

On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart kevinmoul...@gmail.com wrote:

Hi,

I'm trying to apply a PCA to reduce the dimension of a matrix of 1603 columns and 100.000 to 30.000.000 lines using ssvd with the pca option, and I always get a StackOverflowError.

Here is my command line:

mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100 -pca true -U false -V false -t 3 -ow

I also tried to put -us true as mentioned in https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2 but the option is not available anymore.

The output of the previous command is:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments: {--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true], --computeU=[false], --computeV=[false], --endPhase=[2147483647], --input=[/user/myUser/Echant100k], --minSplitSize=[-1], --outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100], --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0], --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp], --uHalfSigma=[false], --vHalfSigma=[false]}
Exception in thread "main" java.lang.StackOverflowError
 at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 ...

I searched online and didn't find a solution to my problem. Can you help me?

Thanks in advance,
-- Kévin Moulart

-- Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45
Re: Recommend items not rated by any user
Hi Tevfik,

Thanks for the response. I think what you say contradicts what Sebastian pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user, what would AllUnknownItemsCandidateItemsStrategy return?
Re: Recommend items not rated by any user
Juan, you got me wrong. AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user. So, it does not simply return all items that have not been rated by the user. For example, if there is an item X which has not been rated by the user and if the similarity value between X and at least one of the items rated (preferred) by the user is not NaN, then X will not be returned by AllSimilarItemsCandidateItemsStrategy, but it will be returned by AllUnknownItemsCandidateItemsStrategy.
On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos jjar...@gmail.com wrote:
Hi Tevfik, Thanks for the response. I think what you say contradicts what Sebastian pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user, what would AllUnknownItemsCandidateItemsStrategy return?
On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:
Sorry there was a typo in the previous paragraph. If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user.
On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:
Hi Juan, If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and the similarity metric returns a non-NaN similarity value that is with at least one of the items preferred by the user. Tevfik
On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter s...@apache.org wrote:
On 03/05/2014 01:23 PM, Juan José Ramos wrote:
Thanks for the reply, Sebastian. I am not sure if that should be implemented in the abstract base class though, because for instance PreferredItemsNeighborhoodCandidateItemsStrategy, by definition, returns the items not rated by the user and rated by somebody else.
Good point. So we seem to need special implementations.
Back to my last post, I have been playing around with AllSimilarItemsCandidateItemsStrategy and AllUnknownItemsCandidateItemsStrategy, and although they both do what I wanted (recommend items not previously rated by any user), I honestly can't tell the difference between the two strategies. In my tests the output was always the same. If the eventual output of the recommender will not include items already rated by the user as pointed out here ( http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E ), AllSimilarItemsCandidateItemsStrategy should be equivalent to AllUnknownItemsCandidateItemsStrategy, shouldn't it?
AllSimilarItems returns all items that are similar to any item that the user already knows. AllUnknownItems simply returns all items that the user has not interacted with yet. These are two different things, although they might overlap in some scenarios. Best, Sebastian
Thanks.
On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter s...@apache.org wrote:
Hi Juan, that is a good catch. CandidateItemsStrategy is the right place to implement this. Maybe we should simply extend its interface to add a parameter that says whether to keep or remove the current user's items? We could even do this in the abstract base class then. --sebastian
On 03/05/2014 10:42 AM, Juan José Ramos wrote:
In case somebody runs into the same situation, the key seems to be in the CandidateItemsStrategy being passed to the constructor of GenericItemBasedRecommender. Looking into the code, if no CandidateItemsStrategy is specified in the constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is used and, as the documentation says, the doGetCandidateItems method returns all items that have not been rated by the user and that were preferred by another user that has preferred at least one item that the current user has preferred too. So, a different CandidateItemsStrategy needs to be passed. For this problem, it seems to me that AllSimilarItemsCandidateItemsStrategy and AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody know where to find some documentation about the different CandidateItemsStrategy implementations? Based on the name I would say that: 1) AllSimilarItemsCandidateItemsStrategy returns all similar items regardless of whether they have been already rated by someone or not. 2) AllUnknownItemsCandidateItemsStrategy returns all similar items that have not been rated by anyone yet. Does anybody know if it works like that? Thanks.
On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos jjar...@gmail.com wrote:
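To make the quoted description of the default strategy concrete, here is a minimal sketch in plain Java. It is a toy model written from the wording above ("not rated by the user, and preferred by another user who shares at least one preferred item"), not Mahout's code; the class name, the method name and the ratings are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of neighbourhood-based candidate selection (hypothetical, not Mahout's implementation).
class NeighborhoodCandidatesSketch {

    // ratings: userID -> set of itemIDs that user has rated
    static Set<Long> candidates(Map<Long, Set<Long>> ratings, long userID) {
        Set<Long> own = ratings.getOrDefault(userID, Collections.emptySet());
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> e : ratings.entrySet()) {
            if (e.getKey() == userID) {
                continue; // skip the current user
            }
            // a neighbour is any other user who shares at least one rated item
            if (!Collections.disjoint(own, e.getValue())) {
                result.addAll(e.getValue());
            }
        }
        result.removeAll(own); // drop what the current user already rated
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> ratings = new HashMap<>();
        ratings.put(1L, new HashSet<>(Arrays.asList(1L, 2L)));
        ratings.put(2L, new HashSet<>(Arrays.asList(2L, 3L)));
        ratings.put(3L, new HashSet<>(Arrays.asList(4L)));
        // Suppose the catalogue also contains item 5, which nobody has rated.
        System.out.println(candidates(ratings, 1L)); // prints [3]
    }
}
```

Under this model an item that no user has rated can never become a candidate, because candidates are only ever collected from other users' ratings. That is consistent with the observation above that a different strategy has to be passed in.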
Re: Rework our website
Both are nice. I think you are right that the second is calmer.
On Wed, Mar 5, 2014 at 4:11 AM, Sebastian Schelter s...@apache.org wrote:
Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design; the fonts and spacing in particular make it super hard to read long articles. This also prevents me from wanting to add articles and documentation. I think we should have a beautiful website, where it is fun to add new stuff. My design skills are pretty limited, but fortunately my brother is an art director! I asked him to make our website a bit more beautiful without changing too much of the structure, so that a redesign wouldn't take too long. I really like the results and would volunteer to dig out my CSS skills and do the redesign, if people agree. Here are his drafts, I like the second one best: https://people.apache.org/~ssc/mahout/mahout.jpg https://people.apache.org/~ssc/mahout/mahout2.jpg Let me know what you think! Best, Sebastian
Re: Recommend items not rated by any user
I am ignoring the rest of the thread because I suspect it may have gotten off track. Your data is news articles, right? You would like to recommend from known articles to any user based on an article they rate or even view. You have no collaborative filtering data because the lifetime of a news article is short and so there is not enough usage data to create a CF type recommender. Is this a correct problem statement? If so I don't believe you should be using a CF recommender from Mahout's collection. However you can use the Mahout text analysis pipeline to find all articles that are similar to each other. In this case when a user views any article in the training data you can show the most similar items precalculated with RowSimilarityJob and the rest of the text prep jobs. The pipeline is outlined here: https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line But this will only work for news articles already in the training data. Another approach is to not use Mahout at all. Simply index all docs as they come in with Solr. Then when a user rates or even views an article, even if it has not been indexed yet, you can use the viewed article as the query on the indexed articles and Solr will return articles ranked by similarity. This is a content-based recommender based solely on Solr. Does this describe your situation?
On Mar 4, 2014, at 1:16 AM, Juan José Ramos jjar...@gmail.com wrote:
First thing is that I know this requirement would not make sense in a CF Recommender. In my case, I am trying to use Mahout to create something closer to a Content-Based Recommender. In particular, I am pre-computing a similarity matrix between all the documents (items) of my catalogue and using that matrix as the ItemSimilarity for my Item-Based Recommender. So, when a user rates a document, how could I make the recommender output documents similar to the ones the user has already rated, even if no other user in the system has rated them yet? Is that even possible in the first place? Thanks a lot.
Re: Rework our website
What no centered text?? ;-) Love either. BTW users are no longer able to contribute content to the wiki. Most CMSs have a way to allow input that is moderated. Might this make getting documentation help easier? Allow anyone to contribute but committers can filter out the bad--sort of like submitting patches.
Re: Rework our website
On Wed, Mar 5, 2014 at 7:47 AM, Pat Ferrel p...@occamsmachete.com wrote:
What no centered text?? ;-) Love either. BTW users are no longer able to contribute content to the wiki. Most CMSs have a way to allow input that is moderated. Might this make getting documentation help easier? Allow anyone to contribute but committers can filter out the bad--sort of like submitting patches.
Yes, that's a good idea. They both look good, thanks Sebastian and Lucas.
Re: Rework our website
I had recently taken the text tour of Mahout, but I couldn't decipher a way to contribute updates to the tour (some of the file names have changed, etc). How would I start? (This was part of my offer to help with the documentation of Mahout.) Scott
Re: Recommend items not rated by any user
@Pat. You described my situation very well. The only additional thing is that I am also interested in creating some sort of profile of the user from all the information s/he has provided by interacting with the articles, and not only in recommending similar items (news) based on a specific input. That is why I thought using the output of RowSimilarityJob as the ItemSimilarity of an ItemBasedRecommender would behave as I want, since I use Mahout's DataModel to create that profile.
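The idea of scoring unrated documents from a precomputed similarity matrix can be sketched as a similarity-weighted average of the user's own ratings. This is a common textbook formulation of item-based scoring and is only an assumption about how such a recommender might combine the two inputs; it is not taken from Mahout's source. The class, the "a:b" key format and the numbers are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: estimate a rating for an unrated item from precomputed item similarities.
class WeightedEstimateSketch {

    // Similarities are stored under the key "a:b" with a < b; a missing pair counts as NaN.
    static double sim(Map<String, Double> sims, long a, long b) {
        String key = Math.min(a, b) + ":" + Math.max(a, b);
        return sims.getOrDefault(key, Double.NaN);
    }

    // Similarity-weighted average of the user's ratings; NaN when no rated item is similar.
    static double estimate(Map<Long, Double> userRatings, Map<String, Double> sims, long target) {
        double weighted = 0.0;
        double total = 0.0;
        for (Map.Entry<Long, Double> e : userRatings.entrySet()) {
            double s = sim(sims, target, e.getKey());
            if (!Double.isNaN(s)) {
                weighted += s * e.getValue();
                total += s;
            }
        }
        return total == 0.0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(10L, 4.0);
        ratings.put(11L, 2.0);
        Map<String, Double> sims = new HashMap<>();
        sims.put("10:12", 0.5); // item 12 has never been rated by anyone
        sims.put("11:12", 0.5);
        System.out.println(estimate(ratings, sims, 12L)); // prints 3.0
        System.out.println(estimate(ratings, sims, 13L)); // prints NaN
    }
}
```

Note that the estimate for item 12 needs no ratings from other users at all, only the user's own ratings and the content-derived similarities. Whether the item is ever scored still depends on it being selected as a candidate in the first place.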
Re: Recommend items not rated by any user
@Tevfik, running this recommender:
GenericItemBasedRecommender itemRecommender = new GenericItemBasedRecommender(dataModel, itemSimilarity, new AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
With this dataModel:
1,1,1.0
1,2,2.0
1,3,1.0
1,4,2.0
2,1,1.0
2,2,4.0
And these similarities:
1,2,0.1
1,3,0.2
1,4,0.3
2,3,0.5
3,4,0.5
5,1,0.2
5,2,1.0
Returns item 5 for User 1. So item 5 has not been preferred by user 1, and the similarities between item 5 and two of the items user 1 preferred are not NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item. So, I'm truly sorry to insist on this, but I still really do not get the difference.
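The two definitions given in the thread can be tried on the data above with a small plain-Java sketch. It is a toy model of those definitions, not Mahout's code. Two assumptions are made loudly here: the "catalogue" of all items is passed in explicitly, and a hypothetical item 6 with no similarity entries is added to it so that the two sets can differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy comparison of the two candidate definitions discussed in the thread (not Mahout's code).
class CandidateSetsSketch {

    // "All unknown": every catalogue item the user has not rated.
    static Set<Long> allUnknown(Set<Long> catalogue, Set<Long> rated) {
        Set<Long> out = new TreeSet<>(catalogue);
        out.removeAll(rated);
        return out;
    }

    // "All similar": unrated items with a known (non-NaN) similarity to at least one rated item.
    static Set<Long> allSimilar(Map<Long, Set<Long>> similarTo, Set<Long> rated) {
        Set<Long> out = new TreeSet<>();
        for (long item : rated) {
            out.addAll(similarTo.getOrDefault(item, Collections.emptySet()));
        }
        out.removeAll(rated);
        return out;
    }

    // Builds a symmetric "has a similarity entry" index from item pairs.
    static Map<Long, Set<Long>> index(long[][] pairs) {
        Map<Long, Set<Long>> idx = new HashMap<>();
        for (long[] p : pairs) {
            idx.computeIfAbsent(p[0], k -> new TreeSet<>()).add(p[1]);
            idx.computeIfAbsent(p[1], k -> new TreeSet<>()).add(p[0]);
        }
        return idx;
    }

    public static void main(String[] args) {
        // Item pairs from the similarity data in the message above.
        long[][] pairs = {{1, 2}, {1, 3}, {1, 4}, {2, 3}, {3, 4}, {5, 1}, {5, 2}};
        Set<Long> rated = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L)); // user 1
        // Item 6 is hypothetical: it is in the catalogue but has no similarity entries.
        Set<Long> catalogue = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L));
        System.out.println(allSimilar(index(pairs), rated)); // prints [5]
        System.out.println(allUnknown(catalogue, rated));    // prints [5, 6]
    }
}
```

With only the original data both sets contain just item 5, which matches the observation that the output looked the same. The sets diverge only for an item like the hypothetical item 6, which is unknown to the user but not similar to anything the user has rated.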
Re: Fwd: PCA with ssvd leads to StackOverFlowError
Not sure if the CDH4 patches on top of 0.7 have fixes for M-1067 and M-1098, which address the issues you are seeing. The second part of the issue you are seeing with the Mahout 0.9 distro seems to be related to how you set it up on CDH4. I apologize for not being helpful here as I am not a CDH4 user or expert. Sean?
On Wednesday, March 5, 2014 10:23 AM, Kevin Moulart kevinmoul...@gmail.com wrote:
Previous mail sent only to Suneel : (my bad sorry) According to my stacktrace it seems that I am running mahout 0.7 indeed. That's the version provided by Cloudera when I install mahout using yum. But according to Sean Owen, it really is a 0.8 inside... Anyway I tried with the compiled version and it didn't work :
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=
Exception in thread main java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
MAHOUT-JOB: /home/cacf/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
And now I changed the conf directory of mahout 0.9 to be linked to the one used by the existing working mahout and the trace changes :
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /home/myCompany/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
java.lang.ClassNotFoundException: org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
java.lang.ClassNotFoundException: org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.clustering.minhash.MinHashDriver
java.lang.ClassNotFoundException: org.apache.mahout.clustering.minhash.MinHashDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
Re: Fwd: PCA with ssvd leads to StackOverFlowError
I'm not sure about this either but I think these are all the changes to Mahout in CDH 4.6.0: http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt MAHOUT-1291 MAHOUT-1033 MAHOUT-1142
Re: Recommend items not rated by any user
If the similarities between item 5 and two of the items user 1 preferred are not NaN then it will return it, that is what I'm saying. If the similarities were all NaN then it will not return it. But surely, you might wonder if all similarities between an item and the user's items are NaN, then AllUnknownItemsCandidateItemsStrategy probably will not return it. So both strategies seem to be effectively the same, I don't know what the implementers had in mind when designing AllSimilarItemsCandidateItemsStrategy.
Re: Recommend items not rated by any user
So both strategies seem to be effectively the same. I don't know what the implementers had in mind when designing AllSimilarItemsCandidateItemsStrategy.

It can take a long time to estimate preferences for all items a user doesn't know, especially if you have a lot of items. Traditional item-based recommenders will not recommend any item that is not similar to at least one of the items the user interacted with, so AllSimilarItemsStrategy already selects the maximum set of items that could potentially be recommended to the user.

--sebastian

On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:

If the similarities between item 5 and two of the items user 1 preferred are not NaN then it will return 1, that is what I'm saying. If the similarities were all NaN then it will not return it. But surely, you might wonder if all similarities between an item and user's items are NaN, then AllUnknownItemsCandidateItemsStrategy probably will not return it.

On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos jjar...@gmail.com wrote:

@Tevfik, running this recommender:

    GenericItemBasedRecommender itemRecommender =
        new GenericItemBasedRecommender(dataModel, itemSimilarity,
            new AllSimilarItemsCandidateItemsStrategy(itemSimilarity),
            new AllSimilarItemsCandidateItemsStrategy(itemSimilarity));

With this dataModel:

    1,1,1.0
    1,2,2.0
    1,3,1.0
    1,4,2.0
    2,1,1.0
    2,2,4.0

And these similarities:

    1,2,0.1
    1,3,0.2
    1,4,0.3
    2,3,0.5
    3,4,0.5
    5,1,0.2
    5,2,1.0

Returns item 5 for User 1. So item 5 has not been preferred by user 1, and the similarities between item 5 and two of the items user 1 preferred are not NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item. So, I'm truly sorry to insist on this, but I still really do not get the difference.

On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:

Juan, you got me wrong. AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user. So, it does not simply return all items that have not been rated by the user. For example, if there is an item X which has not been rated by the user and if the similarity value between X and at least one of the items rated (preferred) by the user is not NaN, then X will not be returned by AllSimilarItemsCandidateItemsStrategy, but it will be returned by AllUnknownItemsCandidateItemsStrategy.

On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos jjar...@gmail.com wrote:

Hi Tevfik, thanks for the response. I think what you say contradicts what Sebastian pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user, what would AllUnknownItemsCandidateItemsStrategy return?

On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:

Sorry, there was a typo in the previous paragraph. If I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user.

On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin tevfik.ayte...@gmail.com wrote:

Hi Juan, if I remember correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and the similarity metric returns a non-NaN similarity value that is with at least one of the items preferred by the user.

Tevfik

On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter s...@apache.org wrote:

On 03/05/2014 01:23 PM, Juan José Ramos wrote:

Thanks for the reply, Sebastian. I am not sure if that should be implemented in the Abstract base class though, because for instance PreferredItemsNeighborhoodCandidateItemsStrategy, by definition, returns the items not rated by the user and rated by somebody else.

Good point. So we seem to need special implementations.

Back to my last post, I have been playing around with AllSimilarItemsCandidateItemsStrategy and AllUnknownItemsCandidateItemsStrategy, and although they both do what I wanted (recommend items not previously rated by any user), I honestly can't tell the difference between the two strategies. In my tests the output was always the same. If the eventual output of the recommender will not include items already rated by the user, as pointed out here ( http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E ), AllSimilarItemsCandidateItemsStrategy should be equivalent to AllUnknownItemsCandidateItemsStrategy, shouldn't it?

AllSimilarItems returns all items that are similar to any item that the user already knows. AllUnknownItems simply returns all items that the user has not interacted with yet. These are two different things, although they might overlap in some scenarios.

Best,
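To make the distinction Sebastian describes concrete, here is a minimal sketch in plain Java collections. It is not Mahout's implementation and uses none of its classes: the class and method names are made up for illustration, the similarity pairs are the ones from Juan's example, and the catalog (including item 6, which has no known similarity to anything) is an assumption added so that the two strategies can differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: plain Java collections, not Mahout's classes.
class CandidateSelectionSketch {

    // Item pairs with a known (non-NaN) similarity, taken from Juan's example.
    static final long[][] SIMILAR_PAIRS = {
        {1, 2}, {1, 3}, {1, 4}, {2, 3}, {3, 4}, {5, 1}, {5, 2}
    };

    // Small helper to build a sorted set of item IDs.
    static Set<Long> setOf(long... ids) {
        Set<Long> set = new TreeSet<>();
        for (long id : ids) {
            set.add(id);
        }
        return set;
    }

    // Builds symmetric neighbour lists: item -> items it has a known similarity with.
    static Map<Long, Set<Long>> neighboursFrom(long[][] pairs) {
        Map<Long, Set<Long>> neighbours = new HashMap<>();
        for (long[] pair : pairs) {
            neighbours.computeIfAbsent(pair[0], k -> new TreeSet<>()).add(pair[1]);
            neighbours.computeIfAbsent(pair[1], k -> new TreeSet<>()).add(pair[0]);
        }
        return neighbours;
    }

    // "All unknown items": every catalog item the user has not rated yet.
    static Set<Long> allUnknownItems(Set<Long> catalog, Set<Long> ratedByUser) {
        Set<Long> candidates = new TreeSet<>(catalog);
        candidates.removeAll(ratedByUser);
        return candidates;
    }

    // "All similar items": items with a known similarity to at least one
    // rated item, minus the items the user has already rated.
    static Set<Long> allSimilarItems(Map<Long, Set<Long>> neighbours, Set<Long> ratedByUser) {
        Set<Long> candidates = new TreeSet<>();
        for (Long rated : ratedByUser) {
            candidates.addAll(neighbours.getOrDefault(rated, Collections.<Long>emptySet()));
        }
        candidates.removeAll(ratedByUser);
        return candidates;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> neighbours = neighboursFrom(SIMILAR_PAIRS);
        Set<Long> catalog = setOf(1, 2, 3, 4, 5, 6); // item 6 is a hypothetical addition
        Set<Long> user1 = setOf(1, 2, 3, 4);         // items user 1 rated in Juan's data
        System.out.println("unknown: " + allUnknownItems(catalog, user1));
        System.out.println("similar: " + allSimilarItems(neighbours, user1));
    }
}
```

Under these assumptions user 1 gets items 5 and 6 as "unknown" candidates but only item 5 as a "similar" candidate. With a catalog in which every unrated item is similar to something the user rated, both sets coincide, which would be consistent with the identical output Juan reports.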
Re: Rework our website
+1 for Option# 2.

On Wednesday, March 5, 2014 7:11 AM, Sebastian Schelter s...@apache.org wrote:

Hi everyone,

In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design; especially fonts and spacing make it super hard to read long articles. This also prevents me from wanting to add articles and documentation.

I think we should have a beautiful website, where it is fun to add new stuff. My design skills are pretty limited, but fortunately my brother is an art director! I asked him to make our website a bit more beautiful without changing too much of the structure, so that a redesign wouldn't take too long. I really like the results and would volunteer to dig out my CSS skills and do the redesign, if people agree.

Here are his drafts, I like the second one best:

https://people.apache.org/~ssc/mahout/mahout.jpg
https://people.apache.org/~ssc/mahout/mahout2.jpg

Let me know what you think!

Best,
Sebastian
Re: Recommend items not rated by any user
Hi Sebastian,

But in order to exclude items that are not similar to at least one of the items the user interacted with, you have to compute the similarity with all of the user's items (which is the main task in estimating the preference for an item in the item-based method). So, it seems to me that AllSimilarItemsStrategy does not bring much advantage over AllUnknownItemsCandidateItemsStrategy.
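Tevfik's point, that looking up similarities to the user's items is already most of the work of an item-based estimate, can be seen in a small sketch. This assumes the textbook similarity-weighted average with non-negative similarities; it is not a copy of Mahout's estimator, and the class and helper names are hypothetical. The preferences and similarities are those from Juan's example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a similarity-weighted preference estimate.
class ItemBasedEstimateSketch {

    // Order-independent key for an item pair, since similarity is symmetric here.
    static String pairKey(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Similarities listed in Juan's example (item, item, similarity).
    static Map<String, Double> similaritiesFromThread() {
        Map<String, Double> sims = new HashMap<>();
        sims.put(pairKey(1, 2), 0.1);
        sims.put(pairKey(1, 3), 0.2);
        sims.put(pairKey(1, 4), 0.3);
        sims.put(pairKey(2, 3), 0.5);
        sims.put(pairKey(3, 4), 0.5);
        sims.put(pairKey(5, 1), 0.2);
        sims.put(pairKey(5, 2), 1.0);
        return sims;
    }

    // Preferences of user 1 in Juan's data model (item -> preference).
    static Map<Long, Double> user1Preferences() {
        Map<Long, Double> prefs = new HashMap<>();
        prefs.put(1L, 1.0);
        prefs.put(2L, 2.0);
        prefs.put(3L, 1.0);
        prefs.put(4L, 2.0);
        return prefs;
    }

    // Weighted average of the user's preferences, weighted by the similarity
    // of each rated item to the target. Returns NaN when the target has no
    // known similarity to any rated item, so it could never be recommended.
    static double estimate(Map<Long, Double> userPrefs, Map<String, Double> sims, long target) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> pref : userPrefs.entrySet()) {
            Double sim = sims.get(pairKey(pref.getKey(), target));
            if (sim == null || sim.isNaN()) {
                continue; // unknown similarity contributes nothing
            }
            weightedSum += sim * pref.getValue();
            totalSimilarity += sim;
        }
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    public static void main(String[] args) {
        Map<Long, Double> prefs = user1Preferences();
        Map<String, Double> sims = similaritiesFromThread();
        System.out.println("item 5: " + estimate(prefs, sims, 5));
        System.out.println("item 6: " + estimate(prefs, sims, 6)); // hypothetical item with no similarities
    }
}
```

For item 5 the estimate is (0.2 * 1.0 + 1.0 * 2.0) / (0.2 + 1.0), roughly 1.83. An item with no known similarity to anything user 1 rated yields NaN, which is why pre-filtering candidates by similarity does not change what an item-based recommender of this kind can return.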
Re: Recommend items not rated by any user
I agree. IMHO using the Mahout recommenders is wrong for this. The recommenders are the CF/cooccurrence type that expect usage or rating data on fairly long-lived items from a somewhat static catalog. Trying to make them work for content-based recommendations is needlessly difficult, especially since other tools are custom-made for this, like RowSimilarityJob and Solr. Each finds content-based similarity with no rating or CF data needed.

Profile creation is another subject and still does not use a Mahout recommender. You can keep the text of articles the user has rated, read, whatever. These will form the basis of your user profile. For each of them (if there are not too many) you could use them as the query to Solr, returning similar docs for each in the profile. You could also lump them all together and use this as the query.

You can also experiment with various ways to process profile data. If there are enough articles in the profile you might categorize them with clustering, then use the centroids of the clusters as the Solr queries. The same thing can be done in batch mode with Mahout’s RowSimilarityJob. Take the user's cluster centroids as synthetic items, add them to the item DRM of news articles you get out of the text pipeline and run RSJ on that. For each synthetic item (cluster centroid) you’ll get a list of articles that are most similar. Not sure clustering the user profile is the best idea though, since it would require quite a few articles for the user in question.

If you have some method of labeling your articles (categories, tags, or the like) you can build classifiers for each. Then see what categories your user reads from the most by classifying the articles in their profile based on the labeled training data. As new articles come in and are classified you can funnel them to the right users. You can do this with clustering too, but generally clustering is not as good as classifying since it is unsupervised learning.

However, clustering all news will probably give better results than clustering the user’s profile articles. So you would cluster your news corpus, which will include the articles your user has read, then recommend other articles that the user’s profile articles were clustered with (from the same cluster). This is only slightly different from using the profile articles as Solr queries but may produce better results. However, the Solr queries will work even if the query (profile news article) is not in the index and will return results in realtime, requiring no batch RSJ.

BTW I did just this as an experiment. I used my own browsing history as the profile, clustered the pages I read, then took the top terms from the centroids and did Google searches with them. Since the sources are so varied in Google I had to create a custom search engine to include only specific sites. It worked pretty well for discovering related pages.
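The "profile centroid as query" idea from this message can be sketched without Solr or RowSimilarityJob. The following is only an illustration under stated assumptions: articles are represented as hypothetical term-weight maps, the profile is reduced to a single centroid rather than one centroid per cluster, and cosine similarity stands in for whatever scoring Solr or RSJ would actually apply. All class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: rank articles by cosine similarity to a profile centroid.
class ProfileCentroidSketch {

    // Builds a term-weight vector from alternating term, weight arguments.
    static Map<String, Double> vec(Object... termWeightPairs) {
        Map<String, Double> vector = new HashMap<>();
        for (int i = 0; i + 1 < termWeightPairs.length; i += 2) {
            vector.put((String) termWeightPairs[i], ((Number) termWeightPairs[i + 1]).doubleValue());
        }
        return vector;
    }

    // Mean of the profile articles' term vectors (empty map for an empty profile).
    static Map<String, Double> centroid(List<Map<String, Double>> docs) {
        Map<String, Double> mean = new HashMap<>();
        if (docs.isEmpty()) {
            return mean;
        }
        for (Map<String, Double> doc : docs) {
            for (Map.Entry<String, Double> term : doc.entrySet()) {
                mean.merge(term.getKey(), term.getValue(), Double::sum);
            }
        }
        for (Map.Entry<String, Double> term : mean.entrySet()) {
            term.setValue(term.getValue() / docs.size());
        }
        return mean;
    }

    // Cosine similarity between two sparse vectors; 0.0 if either is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> term : a.entrySet()) {
            normA += term.getValue() * term.getValue();
            Double other = b.get(term.getKey());
            if (other != null) {
                dot += term.getValue() * other;
            }
        }
        for (double weight : b.values()) {
            normB += weight * weight;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Article IDs ordered from most to least similar to the query vector.
    static List<String> rank(Map<String, Double> query, Map<String, Map<String, Double>> articles) {
        List<String> ids = new ArrayList<>(articles.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> cosine(query, articles.get(id))).reversed());
        return ids;
    }
}
```

A caller would build the centroid from the articles in the user's profile and pass it to rank together with the candidate articles. In the batch variant described above, the centroid would instead be added as a synthetic row and the similarity job would produce the ranking.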
Re: Recommend items not rated by any user
It can even make things worse in SVD-based algorithms for which preference estimation is very fast.
Re: Fwd: PCA with ssvd leads to StackOverFlowError
CDH 4.5 and 4.6 are both 0.7 + patches. Neither contains 0.8, since it has (tiny) breaking changes vs 0.7 and this is a minor version update. CDH5 contains 0.8 + patches. I did not say CDH4 has 0.8 -- re-read the message of mine that was quoted. http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.5.0.CHANGES.txt http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt Those two patches are not in CDH 4.x, no. The non-upstream changes are basically all internal packaging stuff, and that can include modifying dependency versions to harmonize with the rest of the platform. That's the sense in which it works with Hadoop 2. I don't think the change you cite is sufficient to work with Hadoop 2. You also, for example, must build against the Hadoop 2 profile in Mahout in Maven. For that you do not need the CDH repo even, just point to the Hadoop 2.x release if you like. I know there has been a patch in even just the past few weeks that makes it work even better with 2.x. So I suppose I would build from HEAD if possible to take advantage. On Wed, Mar 5, 2014 at 4:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Not sure if the CDH4 patches on top of 0.7 has fixes for M-1067 and M-1098 which address the issues u r seeing. The second part of the issue u r seeing with Mahout 0.9 distro seems to be related to how u set it up on CDH4. I apologize for not being helpful here as I am not a CDH4 user or expert. Sean?
Re: Fwd: PCA with ssvd leads to StackOverFlowError
Yeah, it would seem CDH releases of Mahout produce some sort of cut-down version of such. I suggest switching to the official release tarball (or writing to Cloudera support about it).

On Wed, Mar 5, 2014 at 8:38 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:

I'm not sure about this either but I think these are all the changes to Mahout in CDH 4.6.0: http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt

MAHOUT-1291
MAHOUT-1033
MAHOUT-1142

On Wed, Mar 5, 2014 at 8:30 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:

Not sure if the CDH4 patches on top of 0.7 has fixes for M-1067 and M-1098 which address the issues u r seeing. The second part of the issue u r seeing with Mahout 0.9 distro seems to be related to how u set it up on CDH4. I apologize for not being helpful here as I am not a CDH4 user or expert. Sean?

On Wednesday, March 5, 2014 10:23 AM, Kevin Moulart kevinmoul...@gmail.com wrote:

Previous mail sent only to Suneel (my bad, sorry):

According to my stacktrace it seems that I am running Mahout 0.7 indeed. That's the version provided by Cloudera when I install Mahout using yum. But according to Sean Owen, it really is a 0.8 inside...
Anyway, I tried with the compiled version and it didn't work:

Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
MAHOUT-JOB: /home/cacf/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar

And now I changed the conf directory of Mahout 0.9 to be linked to the one used by the existing working Mahout, and the trace changes:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /home/myCompany/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
java.lang.ClassNotFoundException: org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
java.lang.ClassNotFoundException: org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
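The descriptor in the NoSuchMethodError above, driver([Ljava/lang/String;)V, reads as "a method named driver taking a String[] and returning void". The error means the ProgramDriver class found at runtime has no method with exactly that descriptor, which usually points at a jar compiled against a different library version than the one on the classpath. A small reflection probe can show what the runtime classpath really offers. This is a sketch with hypothetical names; it only uses standard java.lang.reflect calls and makes no claim about what any particular Hadoop version contains.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative diagnostic: list the public signatures of a method by name.
class SignatureProbe {

    // Returns entries such as "int length()" for every public method with
    // the given name on the given class, sorted for stable output.
    static List<String> signatures(String className, String methodName) {
        Class<?> type;
        try {
            type = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not on classpath: " + className, e);
        }
        List<String> found = new ArrayList<>();
        for (Method method : type.getMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            StringBuilder signature = new StringBuilder();
            signature.append(method.getReturnType().getSimpleName())
                     .append(' ').append(methodName).append('(');
            Class<?>[] params = method.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    signature.append(", ");
                }
                signature.append(params[i].getSimpleName());
            }
            signature.append(')');
            found.add(signature.toString());
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        // Defaults are a harmless JDK example; pass a class and method name to probe others.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        String methodName = args.length > 1 ? args[1] : "length";
        for (String signature : signatures(className, methodName)) {
            System.out.println(signature);
        }
    }
}
```

Run with the cluster's Hadoop jars on the classpath and the arguments org.apache.hadoop.util.ProgramDriver driver, it would print whichever driver signatures that installation provides. If none of them returns void, the Mahout jar was built against a different Hadoop line, which matches the advice in this thread to build Mahout with the Hadoop 2 profile.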
Re: Fwd: PCA with ssvd leads to StackOverFlowError
I don't follow what here makes you say they are cut-down releases. They are release plus patches, not release minus patches.

The question is not about how to use 0.7, but how to use 1.0-SNAPSHOT. Why would switching to the official 0.7 release help? I think the answer is you build Mahout for Hadoop 2, right? This has always been the case. Mahout has always been Hadoop 1, with 2 support on the side.

On Wed, Mar 5, 2014 at 5:04 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:

Yeah, it would seem CDH releases of Mahout produce some sort of cut-down version of such. I suggest switching to the official release tarball (or writing to Cloudera support about it).
Re: Recommend items not rated by any user
For SVD-based algorithms, you should use the AllUnknownItems strategy then, that's correct.

In the majority of industry use cases that I have seen, people use pre-computed item similarities (Mahout has lots of machinery for doing this, btw), so AllSimilarItems totally makes sense there.

--sebastian
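Why preference estimation is "very fast" for SVD-style models can be shown in a few lines: once user and item factor vectors exist, a score is a single dot product, so scoring every unknown item is cheap and pre-filtering by similarity buys little. The sketch below assumes a plain latent-factor model with made-up factor values; it does not use or describe any specific Mahout factorizer.

```java
// Illustrative sketch: latent-factor preference estimate as a dot product.
class FactorScoreSketch {

    // Estimated preference = dot product of the user's and the item's factors.
    static double estimate(double[] userFactors, double[] itemFactors) {
        if (userFactors.length != itemFactors.length) {
            throw new IllegalArgumentException("Factor vectors must have the same length");
        }
        double dot = 0.0;
        for (int i = 0; i < userFactors.length; i++) {
            dot += userFactors[i] * itemFactors[i];
        }
        return dot;
    }

    public static void main(String[] args) {
        double[] user = {1.0, 0.5}; // hypothetical factors
        double[] item = {2.0, 4.0}; // hypothetical factors
        System.out.println(estimate(user, item));
    }
}
```

The cost per item is proportional to the number of factors and independent of how many items the user has rated, in contrast to the item-based estimate, which needs one similarity lookup per rated item.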
Re: Fwd: PCA with ssvd leads to StackOverFlowError
I apologize, Sean, I wasn't aware of the complete history in this thread. I didn't know Hadoop 2.x was involved here. If so, yes, you need to build Mahout from HEAD with the Hadoop 2 profile to get it working.
Re: Fwd: PCA with ssvd leads to StackOverFlowError
On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen sro...@gmail.com wrote:

I don't follow what here makes you say they are cut down releases?

Meaning it seems to be pretty much 2 releases behind the official. But I definitely don't follow CDH developments in this department. You seem to be in a better position to explain the existing patch level there, so I defer to you to explain why this patch level is not there.

-d
Re: Rework our website
+1 for design 2
Re: Rework our website
I also prefer design 2
On Wed, Mar 5, 2014 at 11:08 AM, Frank Scholten fr...@frankscholten.nl wrote: +1 for design 2
On Wed, Mar 5, 2014 at 6:00 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: +1 for Option #2.
On Wednesday, March 5, 2014 7:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack of (and errors in) documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design; especially the fonts and spacing make it super hard to read long articles. This also prevents me from wanting to add articles and documentation. I think we should have a beautiful website, where it is fun to add new stuff. My design skills are pretty limited, but fortunately my brother is an art director! I asked him to make our website a bit more beautiful without changing too much of the structure, so that a redesign wouldn't take too long. I really like the results and would volunteer to dig out my CSS skills and do the redesign, if people agree. Here are his drafts; I like the second one best:
https://people.apache.org/~ssc/mahout/mahout.jpg
https://people.apache.org/~ssc/mahout/mahout2.jpg
Let me know what you think! Best, Sebastian
Re: Fwd: PCA with ssvd leads to StackOverFlowError
I don't understand this -- CDH always bundles the latest release. You know that CDH4 was released in July 2012, right? So it included 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a month after it began beta 2. CDH follows semantic versioning and won't introduce changes that are not backwards-compatible in a minor version update. 0.x releases of Mahout act like major version changes -- not backwards compatible. So 4.x will always be 0.7 and 5.x will always be 0.8. On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen sro...@gmail.com wrote: I don't follow what here makes you say they are cut down releases? meaning it seems to be pretty much 2 releases behind the official. But i definitely don't follow CDH developments in this department, you seem in a better position to explain the existing patchlevel there so I defer to you to explain why this patchlevel is not there. -d
Re: Fwd: PCA with ssvd leads to StackOverFlowError
You can always install whatever version of anything on your cluster that you want. It may or may not work, but often happens to, at least for whatever you need it to do. It's just the same as it is without a packaged distribution -- dump new tarballs and cross your fingers. Nothing is weird or different about the setup or layout. That is the "here be dragons" solution already. You go with support from a packaged distribution when you want a "here be no dragons" solution. Everything else is by definition already something you can and should do yourself outside of a packaged distribution. And really -- you freely can, and it's not hard, if you know what you are doing.
On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Feels like just yesterday :) Consider this a feature request to have more flexible component versioning, even with a caveat/"here be dragons" warning. I know that complicates things, but people do use your releases a long time. I personally wished I could upgrade Pig on CDH 4 for new features, but there was no simple way on a managed cluster.
On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen sro...@gmail.com wrote: I don't understand this -- CDH always bundles the latest release. You know that CDH4 was released in July 2012, right? So it included 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a month after it began beta 2. CDH follows semantic versioning and won't introduce changes that are not backwards-compatible in a minor version update. 0.x releases of Mahout act like major version changes -- not backwards compatible. So 4.x will always be 0.7 and 5.x will always be 0.8.
On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen sro...@gmail.com wrote: I don't follow what here makes you say they are cut down releases? meaning it seems to be pretty much 2 releases behind the official.
But i definitely don't follow CDH developments in this department, you seem in a better position to explain the existing patchlevel there so I defer to you to explain why this patchlevel is not there. -d
Re: Fwd: PCA with ssvd leads to StackOverFlowError
Yeah, for sure; balancing clients' risk aversion to technical features is why we often recommend vendor solutions. Having a little button to choose a newer version of a component in the Manager UI (even with a confirmation dialog that said "Are you sure? Are you crazy?") would be more palatable to some teams than installing tarballs, is what I'm getting at.
On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen sro...@gmail.com wrote: You can always install whatever version of anything on your cluster that you want. It may or may not work, but often happens to, at least for whatever you need it to do. It's just the same as it is without a packaged distribution -- dump new tarballs and cross your fingers. Nothing is weird or different about the setup or layout. That is the "here be dragons" solution already. You go with support from a packaged distribution when you want a "here be no dragons" solution. Everything else is by definition already something you can and should do yourself outside of a packaged distribution. And really -- you freely can, and it's not hard, if you know what you are doing.
On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Feels like just yesterday :) Consider this a feature request to have more flexible component versioning, even with a caveat/"here be dragons" warning. I know that complicates things, but people do use your releases a long time. I personally wished I could upgrade Pig on CDH 4 for new features, but there was no simple way on a managed cluster.
On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen sro...@gmail.com wrote: I don't understand this -- CDH always bundles the latest release. You know that CDH4 was released in July 2012, right? So it included 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a month after it began beta 2. CDH follows semantic versioning and won't introduce changes that are not backwards-compatible in a minor version update.
0.x releases of Mahout act like major version changes -- not backwards compatible. So 4.x will always be 0.7 and 5.x will always be 0.8. On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen sro...@gmail.com wrote: I don't follow what here makes you say they are cut down releases? meaning it seems to be pretty much 2 releases behind the official. But i definitely don't follow CDH developments in this department, you seem in a better position to explain the existing patchlevel there so I defer to you to explain why this patchlevel is not there. -d
Re: Fwd: PCA with ssvd leads to StackOverFlowError
I mean balance the risk aversion against the value of new features, duh.
On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Yeah, for sure; balancing clients' risk aversion to technical features is why we often recommend vendor solutions. Having a little button to choose a newer version of a component in the Manager UI (even with a confirmation dialog that said "Are you sure? Are you crazy?") would be more palatable to some teams than installing tarballs, is what I'm getting at.
On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen sro...@gmail.com wrote: You can always install whatever version of anything on your cluster that you want. It may or may not work, but often happens to, at least for whatever you need it to do. It's just the same as it is without a packaged distribution -- dump new tarballs and cross your fingers. Nothing is weird or different about the setup or layout. That is the "here be dragons" solution already. You go with support from a packaged distribution when you want a "here be no dragons" solution. Everything else is by definition already something you can and should do yourself outside of a packaged distribution. And really -- you freely can, and it's not hard, if you know what you are doing.
On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Feels like just yesterday :) Consider this a feature request to have more flexible component versioning, even with a caveat/"here be dragons" warning. I know that complicates things, but people do use your releases a long time. I personally wished I could upgrade Pig on CDH 4 for new features, but there was no simple way on a managed cluster.
On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen sro...@gmail.com wrote: I don't understand this -- CDH always bundles the latest release. You know that CDH4 was released in July 2012, right? So it included 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a month after it began beta 2.
CDH follows semantic versioning and won't introduce changes that are not backwards-compatible in a minor version update. 0.x releases of Mahout act like major version changes -- not backwards compatible. So 4.x will always be 0.7 and 5.x will always be 0.8. On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen sro...@gmail.com wrote: I don't follow what here makes you say they are cut down releases? meaning it seems to be pretty much 2 releases behind the official. But i definitely don't follow CDH developments in this department, you seem in a better position to explain the existing patchlevel there so I defer to you to explain why this patchlevel is not there. -d
Re: Rework our website
At the moment, only committers can change the website, unfortunately. If you have a text to add, I'm happy to work it in and add your name to our contributors list in the CHANGELOG. Best, Sebastian
On 03/05/2014 04:58 PM, Scott C. Cote wrote: I had recently taken the text tour of Mahout, but I couldn't decipher a way to contribute updates to the tour (some of the file names have changed, etc). How would I start? (This was part of my offer to help with the documentation of Mahout.) Scott
On 3/5/14 9:47 AM, Pat Ferrel p...@occamsmachete.com wrote: What, no centered text?? ;-) Love either. BTW users are no longer able to contribute content to the wiki. Most CMSs have a way to allow input that is moderated. Might this make getting documentation help easier? Allow anyone to contribute, but committers can filter out the bad -- sort of like submitting patches.
On Mar 5, 2014, at 4:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack of (and errors in) documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design; especially the fonts and spacing make it super hard to read long articles. This also prevents me from wanting to add articles and documentation. I think we should have a beautiful website, where it is fun to add new stuff. My design skills are pretty limited, but fortunately my brother is an art director! I asked him to make our website a bit more beautiful without changing too much of the structure, so that a redesign wouldn't take too long. I really like the results and would volunteer to dig out my CSS skills and do the redesign, if people agree. Here are his drafts; I like the second one best:
https://people.apache.org/~ssc/mahout/mahout.jpg
https://people.apache.org/~ssc/mahout/mahout2.jpg
Let me know what you think! Best, Sebastian