Re: [ANNOUNCE] Andrew Musselman, New Mahout PMC Chair

2018-07-19 Thread Sebastian Schelter
Congrats! 2018-07-19 9:31 GMT+02:00 Peng Zhang: > Congrats Andrew! > > On Thu, Jul 19, 2018 at 04:01 Andrew Musselman wrote: > > > Thanks Andy, looking forward to it! Thank you too for your support and dedication the past two years; here's to continued progress! Best, Andrew

Re: Does mahout 0.5 fit hadoop-0.20.2?

2014-06-25 Thread Sebastian Schelter
Please use a recent version of Mahout. 0.4 and 0.5 are totally outdated. -s On 06/25/2014 09:05 AM, seabiscuit08 wrote: Hi everyone, I am new to Mahout. Our Hadoop cluster is hadoop-0.20.2. I tried out the LDA function of mahout-distribution-0.4, and it works well. But it can't infer new document

Re: divide a vector (sum) by a double, error

2014-06-16 Thread Sebastian Schelter
It's also not a good idea to put the vectors into a HashSet; I don't think we have equals and hashCode correctly implemented for that. On 16.06.2014 18:21, Ted Dunning ted.dunn...@gmail.com wrote: Patrice, This sounds like a classpath problem more than a code error. Are you sure that you can
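To illustrate the HashSet pitfall, here is a minimal Java sketch. IdentityVector is a hypothetical stand-in for a vector class that does not override equals() and hashCode(); it is not Mahout's Vector, whose actual behaviour the reply itself is unsure about. Two vectors with identical contents stay two separate set entries, while keying the set by a content-based List<Double> deduplicates them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical vector class that does NOT override equals()/hashCode(),
// so it inherits identity-based semantics from java.lang.Object.
final class IdentityVector {
    final double[] values;

    IdentityVector(double... values) {
        this.values = values;
    }
}

final class HashSetPitfall {
    // Two vectors with equal contents are still two distinct set entries.
    static int distinctByIdentity() {
        Set<IdentityVector> set = new HashSet<>();
        set.add(new IdentityVector(1.0, 2.0));
        set.add(new IdentityVector(1.0, 2.0));
        return set.size();
    }

    // Content-based key: java.util.List implements value equality.
    static List<Double> contentKey(IdentityVector v) {
        List<Double> key = new ArrayList<>(v.values.length);
        for (double d : v.values) {
            key.add(d);
        }
        return key;
    }

    // Keying the set by contents deduplicates equal vectors.
    static int distinctByContent() {
        Set<List<Double>> set = new HashSet<>();
        set.add(contentKey(new IdentityVector(1.0, 2.0)));
        set.add(contentKey(new IdentityVector(1.0, 2.0)));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctByIdentity()); // 2
        System.out.println(distinctByContent());  // 1
    }
}
```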

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
You should not use Hadoop for such a tiny dataset. Use the GenericItemBasedRecommender on a single machine in Java. --sebastian On 06/06/2014 11:10 AM, Warunika Ranaweera wrote: Hi, I am using Mahout's recommenditembased algorithm on a data set with nearly 10,000 (implicit) user ratings.

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
performance level. What is the average running time for the Mahout distributed recommendation job on 1 million ratings? Does it usually take more than 1 minute? Thanks in advance, Warunika On Fri, Jun 6, 2014 at 2:42 PM, Sebastian Schelter s...@apache.org wrote: You should not use Hadoop

Re: Performance issues in Mahout recommendations

2014-06-06 Thread Sebastian Schelter
thought of moving to Mahout. However, it seems like, for now, it's better to go with the single machine implementation. Thanks for your suggestions, Warunika On Fri, Jun 6, 2014 at 3:36 PM, Sebastian Schelter s...@apache.org wrote: 1M ratings take up something like 20 megabytes
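The "something like 20 megabytes" figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes each rating is stored compactly as a long user ID, a long item ID and a float preference (8 + 8 + 4 = 20 bytes); real Java data structures add object and index overhead, so treat the result as a lower bound rather than a measurement of Mahout's memory use.

```java
final class RatingMemoryEstimate {
    // Assumed compact layout per rating: long userID + long itemID + float value.
    static final int BYTES_PER_RATING = Long.BYTES + Long.BYTES + Float.BYTES;

    // Raw payload size in bytes, ignoring any JVM object or index overhead.
    static long estimateBytes(long numRatings) {
        return numRatings * BYTES_PER_RATING;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_000_000L);
        // 1,000,000 ratings * 20 bytes = 20,000,000 bytes, i.e. about 20 MB.
        System.out.println(bytes / 1_000_000.0 + " MB");
    }
}
```

Even with several-fold overhead on top of this, the data fits comfortably in a single JVM heap, which is why a single-machine recommender is the suggested route.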

Re: Indicator Matrix and Mahout + Solr recommender

2014-05-27 Thread Sebastian Schelter
I added the threshold merely as a way to increase the performance of RowSimilarityJob. If a threshold is given, some item pairs don't need to be looked at. A simple example: if you use the cooccurrence count as similarity measure and set a threshold of n cooccurrences, then any pair
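The pruning argument can be made concrete. With the raw cooccurrence count as similarity and a threshold of n, an item that occurs in fewer than n users' histories can never take part in a pair with n cooccurrences, so it can be dropped before any pairs are generated. This Java sketch works on hypothetical in-memory user histories (a List<Set<Integer>>) and only illustrates the idea; it is not RowSimilarityJob's MapReduce implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CooccurrenceThreshold {
    // Returns "itemA,itemB" -> cooccurrence count for pairs reaching the threshold.
    static Map<String, Integer> cooccurrences(List<Set<Integer>> userHistories, int threshold) {
        // 1. Count in how many user histories each item appears.
        Map<Integer, Integer> itemCounts = new HashMap<>();
        for (Set<Integer> items : userHistories) {
            for (int item : items) {
                itemCounts.merge(item, 1, Integer::sum);
            }
        }

        // 2. Generate pairs only from items that could still reach the threshold:
        //    a pair can never co-occur more often than its rarer item occurs.
        Map<String, Integer> pairCounts = new HashMap<>();
        for (Set<Integer> items : userHistories) {
            List<Integer> kept = new ArrayList<>();
            for (int item : items) {
                if (itemCounts.get(item) >= threshold) {
                    kept.add(item);
                }
            }
            Collections.sort(kept); // canonical order so (a,b) and (b,a) share one key
            for (int i = 0; i < kept.size(); i++) {
                for (int j = i + 1; j < kept.size(); j++) {
                    pairCounts.merge(kept.get(i) + "," + kept.get(j), 1, Integer::sum);
                }
            }
        }

        // 3. Drop pairs whose final count is still below the threshold.
        pairCounts.values().removeIf(count -> count < threshold);
        return pairCounts;
    }
}
```

For histories {1,2,3,5}, {1,2}, {1,2,4}, {3,4} and a threshold of 2, item 5 is pruned up front, and only the pair (1,2) survives with three cooccurrences.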

Re: Theory behind LogisticRegression in Mahout

2014-05-23 Thread Sebastian Schelter
We should add these links to the LR page on the website. --s On 05/23/2014 03:20 PM, Ted Dunning wrote: Ahh... my error then. Happily, Dmitriy and others have provided the requisite links. On Thu, May 22, 2014 at 11:50 PM, namit maheshwari namitmaheshwa...@gmail.com wrote: No, I didn't find

Re: Setting mahout heapsize for rowsimilarity job

2014-05-23 Thread Sebastian Schelter
I don't think you should use RowSimilarityJob for that case if you only have 6 columns. Can you tell us a little bit about the data and what problem you are trying to solve? --sebastian On 05/23/2014 09:03 PM, Suneel Marthi wrote: I had seen this issue too with RSJ until 0.8. Switch to

Re: Mahout recommendation in implicit feedback situation

2014-05-05 Thread Sebastian Schelter
, Sebastian Schelter wrote: You should try the org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender which has been built to handle such data. Best, Sebastian On 05/03/2014 04:34 PM, Alessandro Suglia wrote: I have described it in the SO's post: When I execute
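For implicit (boolean) data there are no preference values to compare, so similarities are typically computed from the overlap of the item sets themselves. A common choice is the Tanimoto (Jaccard) coefficient, which Mahout also offers as TanimotoCoefficientSimilarity; the plain-Java sketch below only illustrates the formula and is not that class's implementation.

```java
import java.util.HashSet;
import java.util.Set;

final class TanimotoSketch {
    // Tanimoto/Jaccard coefficient: size of intersection / size of union
    // of two users' item sets. Returns NaN when both sets are empty.
    static double tanimoto(Set<Long> itemsA, Set<Long> itemsB) {
        if (itemsA.isEmpty() && itemsB.isEmpty()) {
            return Double.NaN;
        }
        Set<Long> intersection = new HashSet<>(itemsA);
        intersection.retainAll(itemsB);
        int union = itemsA.size() + itemsB.size() - intersection.size();
        return (double) intersection.size() / union;
    }
}
```

For example, users with items {1,2,3} and {2,3,4} share 2 of 4 distinct items, giving a similarity of 0.5.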

Re: Fwd: Mahout Naive Bayes CSV Classification

2014-05-04 Thread Sebastian Schelter
Hi Jossef, You have to vectorize and normalize your data. The input for Naive Bayes is a SequenceFile containing a Text object as key (your label) and a VectorWritable as value that holds a vector with the data. Instructions to run NaiveBayes can be found here:
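Before writing the SequenceFile, each CSV row has to become a (label, vector) pair. The sketch below shows only that in-memory step in plain Java, under two assumptions not stated in the thread: the label sits in the last column, and "normalize" means scaling each feature vector to unit L2 length. Wrapping the label in a Text and the array in a VectorWritable and writing them out is left out, because it needs the Hadoop and Mahout jars.

```java
final class LabeledRow {
    final String label;
    final double[] features;

    LabeledRow(String label, double[] features) {
        this.label = label;
        this.features = features;
    }

    // Parses "f1,f2,...,fn,label"; the label-in-last-column layout is an assumption.
    static LabeledRow parse(String csvLine) {
        String[] cols = csvLine.split(",");
        double[] raw = new double[cols.length - 1];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = Double.parseDouble(cols[i].trim());
        }
        return new LabeledRow(cols[cols.length - 1].trim(), l2Normalize(raw));
    }

    // Scales the vector to unit Euclidean length; an all-zero vector is returned unchanged.
    static double[] l2Normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        double[] out = v.clone();
        if (norm > 0.0) {
            for (int i = 0; i < out.length; i++) {
                out[i] /= norm;
            }
        }
        return out;
    }
}
```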

Re: Mahout recommendation in implicit feedback situation

2014-05-03 Thread Sebastian Schelter
Hi Alessandro, what result do you expect and what do you get? Can you give a concrete example? --sebastian On 05/03/2014 12:11 PM, Alessandro Suglia wrote: Good morning, I've tried to create a recommender system using Mahout in an implicit feedback situation. What I'm trying to do is

Re: Mahout recommendation in implicit feedback situation

2014-05-03 Thread Sebastian Schelter
appropriately. On 05/03/14 16:25, Sebastian Schelter wrote: Hi Alessandro, what result do you expect and what do you get? Can you give a concrete example? --sebastian On 05/03/2014 12:11 PM, Alessandro Suglia wrote: Good morning, I've tried to create a recommender system using Mahout in an implicit

Re: Future of Frequent Pattern Mining

2014-05-01 Thread Sebastian Schelter
promised to maintain it and has not been heard from. On Mon, Apr 28, 2014 at 2:19 AM, Sebastian Schelter s...@apache.org wrote: Hi, I'm resending this mail to also include the users list. To wrap up: We currently have a discussion whether our frequent pattern mining package should stay

Future of Frequent Pattern Mining

2014-04-28 Thread Sebastian Schelter
Hi, I'm resending this mail to also include the users list. To wrap up: We currently have a discussion whether our frequent pattern mining package should stay in the codebase. The original author suggested removing the original implementation and maybe retaining the FPGrowth2 implementation.

Re: Future of Frequent Pattern Mining

2014-04-28 Thread Sebastian Schelter
Wechner wrote: What is the alternative, and if one would still want to use the frequent pattern mining code in the future, how would this be possible otherwise? Thanks Michael On 28.04.14 08:19, Sebastian Schelter wrote: Hi, I'm resending this mail to also include the users list. To wrap up

Re: Reading the wiki

2014-04-28 Thread Sebastian Schelter
to appease the browsers during the https handshake. I am not sure what would be associated with that; I am not sure whether MathJax is solely static content or an actual server doing something. On Sun, Apr 27, 2014 at 12:41 AM, Sebastian Schelter s...@apache.org wrote: What if we store a copy of the js

Re: Future of Frequent Pattern Mining

2014-04-28 Thread Sebastian Schelter
. On Mon, Apr 28, 2014 at 2:19 AM, Sebastian Schelter s...@apache.org wrote: Hi, I'm resending this mail to also include the users list. To wrap up: We currently have a discussion whether our frequent pattern mining package should stay in the codebase. The original author suggested removing

Re: Reading the wiki

2014-04-27 Thread Sebastian Schelter
What if we store a copy of the js file on our site and also serve it via https? On 04/27/2014 05:34 AM, Pat Ferrel wrote: Often CMSs have a way to configure https access to be used only for password or other secure areas of the site. No idea if the Apache CMS does this but worth asking. If

Welcome Pat Ferrel as new committer on Mahout

2014-04-24 Thread Sebastian Schelter
Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Pat Ferrel to become committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA it

Re: Spark Mahout with a CLI?

2014-04-20 Thread Sebastian Schelter
I'll create a jira ticket for this, as I have a little time to work on it. On 04/16/2014 08:15 PM, Pat Ferrel wrote: bug in the pseudo code, should use columnIds: val hashedCrossIndicatorMatrix = new HashedSparseMatrix(indicatorMatrices(1), hashedDrms(0).columnIds(),

Re: org.apache.mahout.math.IndexException

2014-04-20 Thread Sebastian Schelter
not applied a patch before. I found the following page: http://mahout.apache.org/developers/patch-check-list.html Is that description enough for applying a patch? On Sat, Apr 19, 2014 at 2:23 AM, Sebastian Schelter s...@apache.org wrote: Mario, could you check whether the patch from https

Re: simple idea for improving mahout docs over the next month?

2014-04-18 Thread Sebastian Schelter
the information from a mailing list search, but I think a rolling FAQ would (1) be much more likely to evolve into real documentation and (2) be more easily refined. Is that a little convincing? If not, I guess we can table the idea... just a thought. On Thu, Apr 17, 2014 at 1:38 AM, Sebastian

Re: Performance Issue using item-based approach!

2014-04-18 Thread Sebastian Schelter
12:41, Sebastian Schelter s...@apache.org wrote: Hi Najum, I think I found the problem. Remember: Two items are similar whenever at least one user interacted with both of them (the items co-occur). In the movielens dataset this is true for almost all pairs of items, unfortunately. From 3076

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
Which version do you use? It shouldn't be a problem with Oracle Java. --sebastian On 04/18/2014 09:39 PM, Christopher Eugene wrote: Hello, I want to install Mahout on Ubuntu 14.04. I had previously tried in vain to install it on 13.10. Could the version of Java be the problem? I am compiling

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
That is wrong, but you could use a server such as PredictionIO (which uses Mahout internally) with PHP. --sebastian On 04/18/2014 09:49 PM, Christopher Eugene wrote: @sebastian I have version 1.7. @Andrew I plan on using Mahout with PHP since I heard that there is a new API, or am I wrong?

Re: Installation on Ubuntu

2014-04-18 Thread Sebastian Schelter
You can, but I'm not sure how much we can help you. Give it a try :) On 04/18/2014 10:11 PM, Christopher Eugene wrote: Sorry, I thought I replied to it :). Can I ask PredictionIO-related questions on the list too? On Fri, Apr 18, 2014 at 11:06 PM, Sebastian Schelter s...@apache.org wrote

Re: org.apache.mahout.math.IndexException

2014-04-18 Thread Sebastian Schelter
Hi Mario, this is indeed a bug. The problem is that the CF code (taste) uses long ids, while our math library internally uses int keys. I'll open a jira and post a patch that will hopefully help you. --sebastian On 04/18/2014 11:03 PM, Mario Levitin wrote: In my dataset IDs are strings so I
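The long/int mismatch can be seen in plain Java: casting a long ID to int keeps only the low 32 bits, so the large longs produced by an ID migrator can turn negative or collide. One generic remedy is to map each long ID to a dense int index and back. LongIdIndex below is a hypothetical illustration of that idea, not necessarily what the patch does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: assigns consecutive int indices to arbitrary long IDs.
final class LongIdIndex {
    private final Map<Long, Integer> indexById = new HashMap<>();
    private long[] idByIndex = new long[16];

    // Returns the existing index for this ID, or assigns the next free one.
    int indexOf(long id) {
        Integer existing = indexById.get(id);
        if (existing != null) {
            return existing;
        }
        int next = indexById.size();
        if (next == idByIndex.length) {
            idByIndex = Arrays.copyOf(idByIndex, next * 2);
        }
        idByIndex[next] = id;
        indexById.put(id, next);
        return next;
    }

    // Maps an index back to the original long ID.
    long idOf(int index) {
        if (index < 0 || index >= indexById.size()) {
            throw new IndexOutOfBoundsException("unknown index " + index);
        }
        return idByIndex[index];
    }

    // What a naive narrowing cast does: only the low 32 bits survive.
    static int naiveCast(long id) {
        return (int) id;
    }
}
```

For example, naiveCast(3_000_000_000L) yields a negative int, and naiveCast(4_294_967_296L) yields 0, while the index maps both IDs to small, distinct, non-negative ints.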

Re: org.apache.mahout.math.IndexException

2014-04-18 Thread Sebastian Schelter
Mario, could you check whether the patch from https://issues.apache.org/jira/browse/MAHOUT-1517 fixes your problem? Best, Sebastian On 04/18/2014 11:03 PM, Mario Levitin wrote: In my dataset IDs are strings, so I use MemoryIDMigrator. This migrator produces large longs. I'm not doing any

Re: simple idea for improving mahout docs over the next month?

2014-04-17 Thread Sebastian Schelter
Hi Najum, please write a new mail to ask a question and don't reply to an unrelated thread -- https://people.apache.org/~hossman/#threadhijack If you write a new mail, I'm sure we can help you with your recommender problem. Can you give us a few more details, such as the similarity that you

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
-based using precomputed similarity. The last log is the user-based recommender using Pearson. Look at the huge time difference! On 17.04.2014 at 11:23, Sebastian Schelter s...@apache.org wrote: Najum, this is really strange, feeding an ItemBased Recommender

Re: Performance Issue using item-based approach!

2014-04-17 Thread Sebastian Schelter
Yes, just to make sure the problem is in the Mahout code and not in the surrounding environment. On 04/17/2014 11:43 AM, Najum Ali wrote: @Sebastian What do you mean by a standalone recommender? A simple offline Java main program? On 17.04.2014 at 11:41, Sebastian Schelter s

Re: Is there any website documentation repository or tool for Apache Mahout?

2014-04-17 Thread Sebastian Schelter
The templates for the individual pages are in svn under site/ in markdown format. You can use an online markdown editor to get an approximate idea of how they look. We don't have a better solution yet, unfortunately. --sebastian On 17.04.2014 20:09, Andrew Musselman

Re: simple idea for improving mahout docs over the next month?

2014-04-16 Thread Sebastian Schelter
Hi Jay, I'm not sure what the benefit of this approach is; people can already post their questions to the mailing list and get answers here, so why would a Google Doc be helpful? --sebastian On 04/16/2014 09:31 PM, Jay Vyas wrote: hi mahout... i finally thought of a really easy way of ad-hoc

Documentation, Documentation, Documentation

2014-04-13 Thread Sebastian Schelter
Hi, this is another reminder that we still have to finish our documentation improvements! The website looks shiny now and there have been lots of discussions about new directions, but we still have some work to do in cleaning up webpages. We should especially make sure that the examples work.

Re: PreferenceArray userID uniqueness?

2014-04-11 Thread Sebastian Schelter
Yes, it's a unique identifier for a user. --sebastian On 04/11/2014 04:41 PM, Mike Summers wrote: Does the userId of a PreferenceArray need to be unique across all entries in a FastByIDMap? I'm comparing two types of objects that contain the same set of traits, however it's possible that the

Re: Best practice for partial cartesian product

2014-04-08 Thread Sebastian Schelter
I don't know a good name for that. The problem is that a quadratic number of pairs needs to be emitted here. In our collaborative filtering code, we solve this through downsampling. --sebastian On 04/08/2014 10:08 AM, Reinis Vicups wrote: Hi, this is not a Mahout question directly, but I
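A user with n interactions produces n(n-1)/2 item pairs, so a few very active users can dominate the cost. Downsampling caps that per-user contribution. The sketch below assumes uniform random sampling down to a hypothetical maxPerUser limit; it illustrates the idea and does not reproduce Mahout's exact sampling scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class PairDownsampling {
    // Unordered item pairs emitted for a user with n interactions: n(n-1)/2.
    static long pairsEmitted(long n) {
        return n * (n - 1) / 2;
    }

    // Keeps at most maxPerUser items, chosen uniformly at random (an assumption).
    static List<Integer> downsample(List<Integer> items, int maxPerUser, Random rnd) {
        List<Integer> copy = new ArrayList<>(items);
        if (copy.size() <= maxPerUser) {
            return copy;
        }
        Collections.shuffle(copy, rnd);
        return new ArrayList<>(copy.subList(0, maxPerUser));
    }

    public static void main(String[] args) {
        List<Integer> heavyUser = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            heavyUser.add(i);
        }
        List<Integer> sampled = downsample(heavyUser, 500, new Random(42));
        System.out.println(pairsEmitted(heavyUser.size()) + " -> " + pairsEmitted(sampled.size()));
    }
}
```

Capping this user at 500 interactions cuts the emitted pairs from 49,995,000 to 124,750.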

Re: Best practice for partial cartesian product

2014-04-08 Thread Sebastian Schelter
response. Could you or anyone point me to the Mahout classes where this is being solved? thank you guys reinis On 08.04.2014 10:27, Sebastian Schelter wrote: I don't know a good name for that. The problem is that a quadratic number of pairs needs to be emitted here. In our collaborative

Re: Can any one help

2014-04-08 Thread Sebastian Schelter
It seems there is a problem with your hdfs, how did you configure that? --sebastian On 04/08/2014 07:23 PM, Neetha wrote: Hi, I am trying to run Mahout -kmeans clustering on hadoop, but I am getting this error, hduser3@ubuntu:/usr/local/hadoop-1.0.1/mahout3$ bin/mahout seqdirectory \-i

Re: Solr+Mahout Recommender Demo Site

2014-04-06 Thread Sebastian Schelter
The top 3 recommendations based on videos you liked are very good! Nice job. On 04/06/2014 07:26 PM, Pat Ferrel wrote: After having integrated several versions of the Mahout and Myrrix recommenders at fairly large scale, I was interested in solving three problems that these did not directly

Re: Number of features for ALS

2014-03-30 Thread Sebastian Schelter
the argument that if you take too many features, it doesn't much hurt so you should always take as many as you can compute. On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter s...@apache.org wrote: Hi, does anyone know of a principled approach of choosing the number of features for ALS

Re: (help!) Can someone scan this

2014-03-29 Thread Sebastian Schelter
Jay, which version of Mahout are you using? Have you tried to explicitly set the temp path? --sebastian On 03/29/2014 01:52 AM, Jay Vyas wrote: Hi again mahout: I'm wrapping a distributed recommender like this:

Re: The 3 distributed recommenders

2014-03-28 Thread Sebastian Schelter
Hi Jay, there's not much documentation unfortunately. We're in the process of creating that however. We removed the pseudo-distributed recommender, mainly because nobody ever used it. There are two research papers that could help you with understanding the other two distributed recommenders:

Number of features for ALS

2014-03-27 Thread Sebastian Schelter
Hi, does anyone know of a principled approach of choosing the number of features for ALS (other than cross-validation?) --sebastian

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
Hi Bhargav, you are right, the content on the page is outdated and contains some errors. I've created a jira ticket to fix this [1]. Thank you for reporting the problem! [1] https://issues.apache.org/jira/browse/MAHOUT-1485 On 03/24/2014 04:41 AM, Bhargav Golla wrote: Hi I was wondering

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
/in/bhargavgolla | Website http://www.bhargavgolla.com/ On Mon, Mar 24, 2014 at 2:12 AM, Sebastian Schelter s...@apache.org wrote: Hi Bhargav, you are right, the content on the page is outdated and contains some errors. I've created a jira ticket to fix this [1]. Thank you for reporting the problem

Re: Does Recommender System Overview Demo work?

2014-03-24 Thread Sebastian Schelter
: It was removed in 0.9 and am not sure if it was there in 0.8. I vaguely remember removing it in 0.9 based on a conversation with Manuel on user@. Manuel, if u could chime in here. On Monday, March 24, 2014 9:44 AM, Sebastian Schelter s...@apache.org wrote: The webapp in Mahout does not offer much

Re: Problem with K-Means clustering on Amazon EMR

2014-03-23 Thread Sebastian Schelter
when creating filesystem instances by using the two-argument get(...). It's time to update it to the filesystem 2.0 APIs. Can you file a Jira for this? If not, I will :) On Mar 16, 2014, at 12:37 PM, Sebastian Schelter s...@apache.org wrote: I've also encountered a similar error once. It's really just

Documentation, documentation, documentation

2014-03-22 Thread Sebastian Schelter
Hi, It's great to see a lot of work being spent on cleaning up the website. I think we have already done a great job here, but there are still a few more pages that need work. I created a jira issue for every single page that needs some work, would be awesome if we could find enough

Re: Documentation, documentation, documentation

2014-03-22 Thread Sebastian Schelter
, Sebastian Schelter s...@apache.org wrote: Hi, It's great to see a lot of work being spent on cleaning up the website. I think we have already done a great job here, but there are still a few more pages that need work. I created a jira issue for every single page that needs some work, would

Re: Problem with K-Means clustering on Amazon EMR

2014-03-16 Thread Sebastian Schelter
I've also encountered a similar error once. It's really just the FileSystem.get call that needs to be modified. I think it's a good idea to walk through the codebase and refactor this where necessary. --sebastian On 03/16/2014 05:16 PM, Andrew Musselman wrote: Another wild guess, I've had

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter
Sebastian Schelter ssc.o...@googlemail.com: Those are autogenerated. On 03/13/2014 09:05 AM, Kevin Moulart wrote: Ok it does compile with maven in eclipse as well, but still, many imports are not recognized in the sources : - import org.apache.mahout.math.function.IntObjectProcedure; - import

Re: Compiling Mahout with maven in Eclipse

2014-03-13 Thread Sebastian Schelter
Are you executing Maven in the topmost directory? On 03/13/2014 10:09 AM, Kevin Moulart wrote: I did, but then it fails because of these missing files: https://gist.github.com/kmoulart/9524828 Kévin Moulart 2014-03-13 9:57 GMT+01:00 Sebastian Schelter s...@apache.org: Maven should generate

Re: verbose output

2014-03-13 Thread Sebastian Schelter
To my knowledge, there is no such flag for mahout. You can check hadoop's logs for further information however. On 03/13/2014 10:21 AM, Mahmood Naderan wrote: Hi, Is there any verbosity flag for hadoop and mahout commands? I can not find such thing in the command line. Regards, Mahmood

Re: Website, urgent help needed

2014-03-13 Thread Sebastian Schelter
to appear to be stupid asking how to update the documentation (my bad - not anyone else). Now I know that it was not possible unless I was a committer. Who should I send my scripts to, or how should I proceed with the current form of the page? Scott On 3/12/14, 5:02 AM, Sebastian Schelter s

Re: Problem with FileSystem in Kmeans

2014-03-12 Thread Sebastian Schelter
Hi Bikash, Have you tried adding hdfs:// to your input path? Maybe that helps. --sebastian On 03/11/2014 11:22 AM, Bikash Gupta wrote: Hi, I am running Kmeans in cluster where I am setting the configuration of fs.hdfs.impl and fs.file.impl before hand as mentioned below

Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I've thrown out a lot of stuff and have been startled by the amount of outdated and incorrect information on our website, as well as links pointing to nowhere. I think our

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
to keep it in the new web, I can help pointing them out again. Thanks a lot for your effort. On Wed, Mar 12, 2014 at 7:03 AM, Sebastian Schelter s...@apache.org wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I've thrown

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
documentation. Do send me old documentation and I will check and remove errors. Or better, let me know how to proceed. Pavan On Mar 12, 2014 12:35 PM, Sebastian Schelter s...@apache.org wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
. Manoj On Wed, Mar 12, 2014 at 12:33 PM, Sebastian Schelter s...@apache.org wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I've thrown out a lot of stuff and have been startled by the amount of outdated and incorrect

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
me know how to proceed. Pavan On Mar 12, 2014 12:35 PM, Sebastian Schelter s...@apache.org wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last days to kickstart cleaning up our website. I've thrown out a lot of stuff and have been startled by the amount of outdated

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
. It will in turn help me understand things better. Do we already have a Jira ticket for organizing the cleanup of the documentation? Just want to be sure that I am not stepping on pages someone else has already updated. Thanks Regards, Pramit On Wed, Mar 12, 2014 at 3:07 AM, Sebastian Schelter s

Re: Few questions about SVM configuration in Mahout

2014-03-10 Thread Sebastian Schelter
Hi Quentin, Mahout does not have SVMs. Best, Sebastian On 03/10/2014 10:38 AM, Quentin-Gabriel Thurier wrote: Hi all, Just few questions about the configuration of an SVM in Mahout : - Is it possible to do a multi-class classification ? - Which kernels are already available (linear,

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-09 Thread Sebastian Schelter
Hi Koji, I've added a link to your article to our website: https://mahout.apache.org/general/books-tutorials-and-talks.html On 03/07/2014 03:29 AM, Koji Sekiguchi wrote: Hello, I just posted an article on Comparing Document Classification Functions of Lucene and Mahout.

Re: Heap space

2014-03-09 Thread Sebastian Schelter
I usually do trial and error. Start with some very large value and do a binary search :) --sebastian On 03/09/2014 01:30 PM, Mahmood Naderan wrote: Excuse me, I added the -Xmx option and restarted the hadoop services using sbin/stop-all.sh and sbin/start-all.sh; however, I still get heap size
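The binary-search suggestion works if job success is monotonic in heap size: if a job fits in N MB, it also fits in any larger heap. In the sketch below, the predicate fits is hypothetical and stands for actually rerunning the job with a given -Xmx value; the search needs only roughly log2(hi - lo) such runs.

```java
import java.util.function.LongPredicate;

final class HeapSizeSearch {
    // Smallest heap size (MB) in [lo, hi] for which fits.test(...) is true.
    // Assumes monotonicity and that the job does fit at hi.
    static long smallestFitting(long lo, long hi, LongPredicate fits) {
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (fits.test(mid)) {
                hi = mid;        // mid works, so the answer is mid or smaller
            } else {
                lo = mid + 1;    // mid fails, so the answer is larger
            }
        }
        return lo;
    }
}
```

Starting from a generous upper bound such as 16384 MB, a job that needs 3000 MB is pinned down in about 14 runs.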

Re: Welcome Andrew Musselman as new committer

2014-03-08 Thread Sebastian Schelter
:56, Frank Scholten fr...@frankscholten.nl wrote: Congratulations Andrew! On Fri, Mar 7, 2014 at 6:12 PM, Sebastian Schelter s...@apache.org wrote: Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Andrew Musselman to become committer and we

Welcome Andrew Musselman as new committer

2014-03-07 Thread Sebastian Schelter
Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Andrew Musselman to become committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA

Re: Rework our website

2014-03-06 Thread Sebastian Schelter
just the ones I found in 2 minutes on the quickstart page. Best Regards, Kevin 2014-03-05 23:43 GMT+01:00 Sebastian Schelter s...@apache.org: At the moment, only committers can change the website unfortunately. If you have a text to add, I'm happy to work it in and add your name to our

Re: Rework our website

2014-03-06 Thread Sebastian Schelter
? On Thursday, March 6, 2014 9:07 AM, Sebastian Schelter s...@apache.org wrote: Thank you very much! Could you create a jira ticket and post the links there? That would be awesome, then we can track that this stuff gets fixed. Best, Sebastian On 03/06/2014 02:58 PM, Kevin Moulart wrote: Hi I

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
Hi Juan, that is a good catch. CandidateItemsStrategy is the right place to implement this. Maybe we should simply extend its interface to add a parameter that says whether to keep or remove the current user's items? We could even do this in the abstract base class then. --sebastian On

Rework our website

2014-03-05 Thread Sebastian Schelter
Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design, especially fonts and spacing make it super hard to read long articles. This also

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
simply returns all items that the user has not interacted with yet. These are two different things, although they might overlap in some scenarios. Best, Sebastian Thanks. On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter s...@apache.org wrote: Hi Juan, that is a good catch

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
correctly, AllSimilarItemsCandidateItemsStrategy returns all items that have not been rated by the user and for which the similarity metric returns a non-NaN similarity value with at least one of the items preferred by the user. Tevfik On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter s

Re: Recommend items not rated by any user

2014-03-05 Thread Sebastian Schelter
AllUnknownItemsCandidateItemsStrategy. On Wed, Mar 5, 2014 at 6:46 PM, Sebastian Schelter s...@apache.org wrote: So both strategies seem to be effectively the same; I don't know what the implementers had in mind when designing AllSimilarItemsCandidateItemsStrategy. It can take a long time to estimate
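Based on the descriptions in this thread, the two strategies differ only in whether an unseen item needs a defined (non-NaN) similarity to at least one of the user's items. The sketch below restates both as plain set operations; the method names and the ToDoubleBiFunction similarity parameter are assumptions for illustration, not Mahout's CandidateItemsStrategy interface.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleBiFunction;

final class CandidateStrategiesSketch {
    // "All unknown items": every item the user has not interacted with yet.
    static Set<Long> allUnknownItems(Set<Long> allItems, Set<Long> userItems) {
        Set<Long> candidates = new TreeSet<>(allItems);
        candidates.removeAll(userItems);
        return candidates;
    }

    // "All similar items": unknown items with a non-NaN similarity to at least
    // one item the user has interacted with.
    static Set<Long> allSimilarItems(Set<Long> allItems, Set<Long> userItems,
                                     ToDoubleBiFunction<Long, Long> similarity) {
        Set<Long> candidates = new TreeSet<>();
        for (long candidate : allItems) {
            if (userItems.contains(candidate)) {
                continue;
            }
            for (long preferred : userItems) {
                if (!Double.isNaN(similarity.applyAsDouble(candidate, preferred))) {
                    candidates.add(candidate);
                    break;
                }
            }
        }
        return candidates;
    }
}
```

When the similarity is defined for every pair, both methods return the same set, which matches the observation that the strategies are effectively the same; they only diverge when some similarities are NaN.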

Re: Rework our website

2014-03-05 Thread Sebastian Schelter
, at 4:11 AM, Sebastian Schelter s...@apache.org wrote: Hi everyone, In our latest discussion, I argued that the lack (and errors) of documentation on our website is one of the main pain points of Mahout atm. To be honest, I'm also not very happy with the design, especially fonts and spacing make

Re: Mahout-232-0.8.patch using

2014-03-04 Thread Sebastian Schelter
I think you should rather choose a different library that already offers an SVM than trying to revive a 4 year old patch. --sebastian On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am new user of Mahout and want to run sample SVM algorithm with Mahout. Can you please list me steps to use

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already known items should be recommended or not. What do you think? Best, Sebastian On 03/04/2014 05:32 PM, Pat Ferrel wrote: I’d suggest a command line option if you want
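The proposed parameter boils down to one filtering decision before ranking. In the sketch below, includeKnownItems is a hypothetical flag (it is not part of Mahout's Recommender interface), and scores stands in for whatever estimates the recommender has already computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KnownItemFilter {
    // Returns up to howMany item IDs ordered by descending score. Items the user
    // already knows are skipped unless includeKnownItems (hypothetical flag) is true.
    static List<Long> topN(Map<Long, Double> scores, Set<Long> knownItems,
                           int howMany, boolean includeKnownItems) {
        List<Map.Entry<Long, Double>> eligible = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : scores.entrySet()) {
            if (includeKnownItems || !knownItems.contains(entry.getKey())) {
                eligible.add(entry);
            }
        }
        eligible.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(howMany, eligible.size()); i++) {
            result.add(eligible.get(i).getKey());
        }
        return result;
    }
}
```

With scores {10: 0.9, 20: 0.8, 30: 0.7} and item 10 already consumed, the top two are [20, 30] by default and [10, 20] when known items are allowed.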

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
a Jira issue already. I only use the non-hadoop part of Mahout recommender algorithms. May be I can create a patch for that part. However, I have not done it before, and don't know how to proceed. On Wed, Mar 5, 2014 at 1:01 AM, Sebastian Schelter s...@apache.org wrote: Would you be willing

Re: Issue updating a FileDataModel

2014-03-03 Thread Sebastian Schelter
Hi Juan, IIRC, FileDataModel has a parameter that determines how much time must have passed since the last modification of the underlying file. You can also directly append new data to the original file. If you want to have a DataModel that can be concurrently updated, I suggest
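The reload rule described here can be sketched as a small policy object: the file is only re-read when its modification time has moved past the last loaded version by more than a minimum interval. ReloadPolicy and its method names are hypothetical; it illustrates the idea rather than FileDataModel's actual code. In practice the modification time would come from java.io.File.lastModified().

```java
// Hypothetical sketch of an "only reload if the file changed enough" rule.
final class ReloadPolicy {
    private final long minReloadIntervalMs;
    private long lastLoadedModifiedMs; // file modification time seen at the last load

    ReloadPolicy(long minReloadIntervalMs, long initialModifiedMs) {
        this.minReloadIntervalMs = minReloadIntervalMs;
        this.lastLoadedModifiedMs = initialModifiedMs;
    }

    // True if the file was modified more than minReloadIntervalMs after the loaded version.
    boolean shouldReload(long currentModifiedMs) {
        return currentModifiedMs > lastLoadedModifiedMs + minReloadIntervalMs;
    }

    // Records that the file has been re-read at the given modification time.
    void markLoaded(long currentModifiedMs) {
        lastLoadedModifiedMs = currentModifiedMs;
    }
}
```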

Re: classification in standalone application in Apache Mahout 0.9

2014-03-03 Thread Sebastian Schelter
If you don't want to call a shell, I assume you don't want to use a Hadoop cluster, right? In that case, you should rather try Mahout's logistic regression classifier, which is tuned for usage on a single machine. --sebastian On 03/03/2014 03:07 PM, Hollow Quincy wrote: I am looking for

Re: classification in standalone application in Apache Mahout 0.9

2014-03-03 Thread Sebastian Schelter
Sebastian Schelter s...@apache.org: If you don't want to call a shell, I assume you don't want to use a Hadoop cluster, right? In that case, you should rather try Mahout's logistic regression classifier, which is tuned for usage on a single machine. --sebastian On 03/03/2014 03:07 PM, Hollow

Re: Issue updating a FileDataModel

2014-03-03 Thread Sebastian Schelter
in time. Would the fact of adding the new preferences to new files or appending to the existing one make any difference or does everything depends on the time elapsed between two calls to recommender.refresh(null)? Many thanks. On Mon, Mar 3, 2014 at 1:18 PM, Sebastian Schelter s...@apache.org

Re: Mahout-232-0.8.patch using

2014-03-03 Thread Sebastian Schelter
Hi Amol, SVMs are not integrated in Mahout. I'd suggest you try our logistic regression classifier instead. Best, Sebastian On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am new user of Mahout and want to run sample SVM algorithm with Mahout. Can you please list me steps to use

Re: parallelALS and RMSE TEST

2014-03-01 Thread Sebastian Schelter
The output of parallelALS consists of two matrices U and M whose product is an approximation of your input matrix. The matrices are output as sequence files with an IntWritable as key (the index of the row in the matrix) and a VectorWritable as value, which holds the contents of the row vector.
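Once U and M are loaded into memory (reading the sequence files is omitted here), a predicted rating is the dot product of a user's row in U with an item's row in M, and RMSE compares those predictions against held-out ratings. In this plain-Java sketch, the arrays and the (user index, item index) test pairs with their ratings are hypothetical stand-ins for the job's output and your test set.

```java
final class AlsRmse {
    // Predicted rating: dot product of the user's and the item's factor vectors.
    static double predict(double[] userFactors, double[] itemFactors) {
        double sum = 0.0;
        for (int f = 0; f < userFactors.length; f++) {
            sum += userFactors[f] * itemFactors[f];
        }
        return sum;
    }

    // Root mean squared error over test pairs {userIndex, itemIndex} with known ratings.
    static double rmse(double[][] u, double[][] m, int[][] userItemPairs, double[] ratings) {
        double squaredErrorSum = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            double error = predict(u[userItemPairs[i][0]], m[userItemPairs[i][1]]) - ratings[i];
            squaredErrorSum += error * error;
        }
        return Math.sqrt(squaredErrorSum / ratings.length);
    }
}
```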

Re: Load output of rowsimilarity to memory

2014-02-25 Thread Sebastian Schelter
difference would be that I will write the output to a file that can be later used to create a FileItemSimilarity. I think that would be a very nice feature to have in the API. Thanks again. On Mon, Feb 24, 2014 at 9:27 PM, Sebastian Schelter s...@apache.org wrote: I overlooked that you're

Re: Load output of rowsimilarity to memory

2014-02-25 Thread Sebastian Schelter
, 2014 at 9:27 PM, Sebastian Schelter s...@apache.org wrote: I overlooked that you're interested in document similarities. Sorry again :) Another way would be to read the output of RowSimilarityJob with an o.a.m.common.iterator.sequencefile.SequenceFileDirIterable. You create a list of instances

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Sebastian Schelter
NaiveBayes expects a SequenceFile as input. The key is the class label as Text; the value holds the features as a VectorWritable. --sebastian On 02/24/2014 11:51 AM, Kevin Moulart wrote: Hi again, I finally set my mind on going through Java to make a sequence file for the naive bayes, but I still

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
The output of RowSimilarityJob can be loaded by the FileItemSimilarity. --sebastian On 02/24/2014 08:31 PM, Juan José Ramos wrote: Is there a way to reproduce this process: https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line inside Java

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
that the output of RowSimilarityJob can be loaded by the FileItemSimilarity after doing the appropriate parsing. Is that correct, or is there actually a way to load the raw output of RowSimilarityJob into FileItemSimilarity? Thanks. On Mon, Feb 24, 2014 at 7:41 PM, Sebastian Schelter s...@apache.org wrote
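A sketch of what "the appropriate parsing" could produce: comma-separated lines of the form itemID1,itemID2,similarity, which is the file format FileItemSimilarity reads. It assumes the rows have already been read into a map and that row indices can be used directly as item IDs; if the matrix was built with an index mapping (as seq2sparse does for documents), translate the indices back first. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical formatter turning similarity rows into FileItemSimilarity-style CSV lines. */
final class FileItemSimilarityCsv {
    /** One line per similarity: "itemID1,itemID2,similarity". */
    static String csvLine(long itemA, long itemB, double similarity) {
        return itemA + "," + itemB + "," + similarity;
    }

    static List<String> toCsvLines(Map<Integer, Map<Integer, Double>> rows) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            int rowIndex = row.getKey();
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                int columnIndex = cell.getKey();
                if (rowIndex != columnIndex) { // a row's similarity to itself adds nothing
                    lines.add(csvLine(rowIndex, columnIndex, cell.getValue()));
                }
            }
        }
        return lines;
    }
}
```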

Re: Load output of rowsimilarity to memory

2014-02-24 Thread Sebastian Schelter
. In order to use ItemSimilarityJob for this purpose, what should be the input I need to provide? Would it be the output of seq2sparse? Thanks again. On Mon, Feb 24, 2014 at 8:54 PM, Sebastian Schelter s...@apache.org wrote: You're right, my bad. If you don't use RowSimilarityJob directly

Re: Mahout on Spark?

2014-02-19 Thread Sebastian Schelter
. And be very careful with concepts. Something that I so far don't see happening with MLlib. MLlib seems to be in an old-style, Mahout-like rush to become a collection of basic algorithms rather than a coherent foundation. Admittedly, I haven't looked very closely. On Tue, Feb 18, 2014 at 11:41 PM, Sebastian

Re: Mahout on Spark?

2014-02-18 Thread Sebastian Schelter
I'm also convinced that Spark is a superior platform for executing distributed ML algorithms. We've had a discussion about a change from Hadoop to another platform some time ago, but at that point in time it was not clear which of the upcoming dataflow processing systems (Spark, Hyracks,

Re: get similar items

2014-02-12 Thread Sebastian Schelter
Hi, Mahout's recommenders are based on analyzing interactions between users and items/movies, e.g. ratings or counts of how often a movie was watched. On 02/12/2014 11:34 AM, N! wrote: Hi all: Does anyone have any suggestions for the questions below? thanks a lot. --
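To make "similar items from interactions" concrete, a minimal sketch of one common measure: cosine similarity between two items, where each item is represented by the users who interacted with it and how strongly (a rating or a watch count). This is an illustration of the idea, not Mahout's implementation; the map representation and names are assumptions.

```java
import java.util.Map;

/** Hypothetical cosine similarity between two items' user-interaction vectors. */
final class ItemCosineSketch {
    /** Each item maps userID -> interaction strength (e.g. rating or watch count). */
    static double cosine(Map<Long, Double> itemA, Map<Long, Double> itemB) {
        double dot = 0.0;
        for (Map.Entry<Long, Double> e : itemA.entrySet()) {
            Double other = itemB.get(e.getKey());
            if (other != null) { // only users who interacted with both items contribute
                dot += e.getValue() * other;
            }
        }
        double normA = norm(itemA);
        double normB = norm(itemB);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<Long, Double> vector) {
        double sum = 0.0;
        for (double x : vector.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }
}
```

Items with no interactions in common get a similarity of 0, which is why purely content-based similarity (e.g. from movie descriptions) needs a different approach.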

Re: Mahout algorithms

2014-02-05 Thread Sebastian Schelter
That is outdated unfortunately. I will send a list of current algorithms shortly. --sebastian On 02/05/2014 11:13 AM, Chameera Wijebandara wrote: Hi Sergey, This will help. https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms Thanks, Chameera On Wed, Feb 5, 2014 at 3:30

Re: Mahout algorithms

2014-02-05 Thread Sebastian Schelter
Hi Sergey, here is the list of algorithms. We're currently in the process of reworking our wiki, that's why the documentation is unfortunately incorrect at the moment. I've added a ticket for this: https://issues.apache.org/jira/browse/MAHOUT-1413 Here's the current list of algorithms in

Re: SGD classifier demo app

2014-02-04 Thread Sebastian Schelter
Would be great to add this as an example to Mahout's codebase. On 02/04/2014 10:27 AM, Ted Dunning wrote: Frank, I just munched on your code and sent a pull request. In doing this, I made a bunch of changes. Hope you liked them. These include massive simplification of the reading and

Re: Mahout 0.9 Release

2014-02-02 Thread Sebastian Schelter
, Shannon Quinn squ...@gatech.edu wrote: LGTM On 1/29/14, 4:27 PM, peng wrote: +1, can't see a bad side. On Wed 29 Jan 2014 11:33:02 AM EST, Suneel Marthi wrote: +1 from me On Wednesday, January 29, 2014 8:58 AM, Sebastian Schelter s...@apache.org wrote: +1 On 01/29/2014 05:25 AM

Re: generic latent variable recommender question

2014-01-24 Thread Sebastian Schelter
Case 1 is fine as is. For Case 2 I would suggest simply experimenting: try different similarity measures, such as Euclidean distance or cosine, and see which gives the best results. --sebastian On 01/25/2014 04:08 AM, Koobas wrote: A generic latent variable recommender question. I passed the
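Why experimenting matters: the two measures can disagree sharply on latent factor vectors. The sketch below compares cosine similarity with a Euclidean-distance-based similarity; mapping distance into (0, 1] via 1 / (1 + d) is one common convention chosen here as an assumption, not necessarily the formula any particular Mahout class uses.

```java
/** Hypothetical comparison of two similarity measures on latent factor vectors. */
final class FactorSimilaritySketch {
    /** Cosine similarity: depends only on the angle between the vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Euclidean distance turned into a similarity via 1 / (1 + d); also sensitive to magnitude. */
    static double euclideanSimilarity(double[] a, double[] b) {
        double squared = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            squared += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(squared));
    }
}
```

For the factor vectors {1, 0} and {2, 0}, cosine reports a perfect match (1.0) while the Euclidean-based score is only 0.5, because the second vector has twice the magnitude. Which behavior fits better depends on what the factor magnitudes mean in the model, hence the advice to compare results empirically.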

Re: Pig local mode issue

2014-01-22 Thread Sebastian Schelter
I think this question is better suited for the mailing list of the Pig project. On 01/23/2014 01:24 AM, Sameer Tilak wrote: Hi All, My script runs fine in MapReduce mode, but I get the following error when I run it in the local mode. I have made sure that the i/p file exists. I am not sure

Re: Problem with ItemSimilarityJob, empty part-r-00000

2014-01-21 Thread Sebastian Schelter
Hi Quentin, Have you checked the log to ensure that you don't get any exceptions during the computation? Could you test the job with a tiny example where you can calculate the result by hand? Can you share an input file on which this job fails? --sebastian On 01/21/2014 11:22 AM,
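One way to build such a hand-checkable test: with boolean (click/view) data, the Tanimoto coefficient of two items is the number of users they share divided by the number of users who touched either one. The sketch below computes it directly so the expected values for a tiny input can be compared against the job's output when it is run with a Tanimoto similarity measure; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical reference computation of the Tanimoto (Jaccard) coefficient for boolean data. */
final class TanimotoSketch {
    /** |users of A that also used B| / |users of A or B|. */
    static double tanimoto(Set<Long> usersA, Set<Long> usersB) {
        Set<Long> intersection = new HashSet<>(usersA);
        intersection.retainAll(usersB);
        int union = usersA.size() + usersB.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }
}
```

For example, if item 10 was used by users 1 and 2, and item 20 by users 2 and 3, they share one user out of three in total, so the expected similarity is 1/3. An empty part-r-00000 on such an input would point to a problem with the input format or the job configuration rather than the data itself.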
