Re: Welcome Trevor Grant as a new Mahout Committer
Congratulations Trevor, well deserved, welcome to the team! On Tue, May 24, 2016 at 12:32 PM, Suneel Marthi wrote: Welcome Trevor !!! Kokanee Cheers !! On Mon, May 23, 2016 at 8:39 PM, Andrew Palumbo wrote: In recognition of Trevor Grant's contributions to the Mahout project, notably his Zeppelin integration work, the PMC has invited and is pleased to announce that he has accepted our invitation to join the Mahout project as a committer. As is customary, I will leave it to Trevor to provide a little bit of background about himself. Congratulations and Welcome! -Andrew Palumbo On Behalf of the Mahout PMC
Re: [VOTE] Apache Mahout 0.10.2 Release Candidate
+1 (binding) Verified all signatures and hashes; all tests pass on a build from the distribution source tarball. On Mon, Aug 3, 2015 at 2:22 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: -1 unless this is operator error on my part. $ gpg --verify Downloads/apache-mahout-distribution-0.10.2-src.zip.asc gpg: no signed data gpg: can't hash datafile: file open error On Sun, Aug 2, 2015 at 11:58 AM, Suneel Marthi smar...@apache.org wrote: If u folks have not read the email from last Friday that talks about both the 0.10.2 and 0.11.0 releases this week, I would suggest that you please do. The plan is to release both 0.10.2 and 0.11.0 this week. Seems like we have some bandwidth in the PMC (at least per the last 2 emails on this thread) to push thru another release today (I definitely don't have the time). If someone else wants to push thru 0.11.0, please do so. On Sun, Aug 2, 2015 at 1:27 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Is there any reason not to release 11 too? On Sunday, August 2, 2015, Pat Ferrel p...@occamsmachete.com wrote: +1 (binding). Do we have to say binding? Why do we continue on Spark 1.2 when all distros updated to Spark 1.3.1 long ago, and Spark has released 1.4 with 1.5 in the works? This is rather incomprehensible to me since we have master (0.11.0) running on 1.4 and ready to release. Can we please, please also release 0.11.0? On Aug 1, 2015, at 9:35 PM, Andrew Palumbo ap@outlook.com wrote: Verified source tar and zip, all tests pass. Ran through all options of the classification and clustering examples in the binary tar.gz distribution in pseudo-cluster mode for MR and Spark without incident. Ran through one option each in the .zip classification and clustering examples in both pseudo-cluster and MAHOUT_LOCAL mode without incident. Verified the spark-document-classifier.mscala example from the spark-shell in both .zip and .tar.gz binaries.
+1 (binding) On 08/01/2015 12:44 AM, Suneel Marthi wrote: Verified {src} * {bin, tar} and all tests pass. +1 (binding) On Fri, Jul 31, 2015 at 11:56 PM, Suneel Marthi smar...@apache.org wrote: This is a call for votes for the Mahout 0.10.2 release candidate available at https://repository.apache.org/content/repositories/orgapachemahout-1011 Need at least 3 PMC +1 votes for the RC to pass. Voting runs until Sunday Aug 2, 2015. Please verify the following: 1. Sigs and hashes of release artifacts (Ted/Drew/Grant/Stevo) 2. AWS testing of {src, bin} * {tar, zip} (Andrew ?) 3. Integration testing of {src, bin} * {tar, zip} (Suneel/AP/) 4. Run thru examples and scripts
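Several of the votes above mention verifying the hashes of release artifacts. Below is a minimal Java sketch of that check, assuming the published checksum file holds a hex digest optionally followed by a file name. The class and method names are hypothetical and not part of Mahout or any Apache release tooling. It covers hashes only; signatures are still checked with gpg, which, when given only the .asc file, expects the signed artifact at the same path minus the .asc suffix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Mahout: compares an artifact's digest
// with the hex string published alongside it (e.g. in a .md5 or .sha1 file).
final class ArtifactHashCheck {
    private ArtifactHashCheck() {}

    // Hex-encode raw digest bytes as lowercase, two characters per byte.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Digest of an in-memory byte array with a JDK algorithm name ("MD5", "SHA-1", ...).
    static String hexDigest(String algorithm, byte[] data) {
        try {
            return toHex(MessageDigest.getInstance(algorithm).digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    // Streams a file through the digest so large tarballs need not fit in memory.
    static String hexDigest(String algorithm, Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Checksum files often contain "<hex>  <filename>"; compare only the first token.
    static boolean matches(String checksumFileContents, String actualHex) {
        String expected = checksumFileContents.trim().split("\\s+")[0];
        return expected.equalsIgnoreCase(actualHex);
    }
}
```

For example, the result of hexDigest("SHA-1", path) for a downloaded source zip would be passed to matches() together with the text of the corresponding checksum file. Which digest files a given release actually publishes is an assumption to check against the release directory.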
Re: [VOTE] Mahout 0.10.1 Release Candidate
+1 (binding) Verified hashes and signatures; distribution sources tarball and zip unpack well, build passes from unpacked sources. On Sun, May 31, 2015 at 8:34 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: +1 (binding) Verified tests pass for src tarball and zip; I'm comfortable skipping EMR smoke testing for a point release given team opinion that it's not required. On Sun, May 31, 2015 at 9:43 AM, Andrew Palumbo ap@outlook.com wrote: +1 (binding) Ran (on Hadoop 2.4.1 + Spark 1.2.1) all examples with all options in the .tar.gz binary archive in pseudo-cluster mode and one with MAHOUT_LOCAL=true, with only the previously noted minor data issue, which I agree can wait for the next release. Ran a mix and match of the .zip binary archive examples with MAHOUT_LOCAL=true and in pseudo-cluster mode without issue. Tested the shell from both archives for qr and matrix display fixes. On 05/31/2015 12:09 PM, Pat Ferrel wrote: +1 (binding) Verified on Spark 1.3 pseudo-clustered HDFS 2.4. There are some example data cleanup issues that can wait for the next release. On May 30, 2015, at 8:16 PM, Suneel Marthi smar...@apache.org wrote: Verified build and tests locally for {source} * {zip, tar}. No issues found. +1 (binding) On Sat, May 30, 2015 at 11:14 PM, Suneel Marthi smar...@apache.org wrote: Andrew Palumbo / Dmitriy: Please also verify the various scenarios as described in M-1693 On Sat, May 30, 2015 at 10:32 PM, Suneel Marthi smar...@apache.org wrote: Here's the new 0.10.1 Release Candidate https://repository.apache.org/content/repositories/orgapachemahout-1009/org/apache/mahout/apache-mahout-distribution/0.10.1/ The voting ends on Sunday, May 31 2015. Need a +1 from the PMC for each of the line items below for the release to pass. 1. Ted/Grant: Verify hashes and checksums - {binary, source} x {zip, tar} + pom 2. AKM: Verify examples on EMR - {binary, source} * {zip, tar} 3. Andrew Palumbo: Verify examples locally - {binary} * {zip, tar} 4. Suneel: Verify build and tests - {source} * {zip, tar} 5. Pat: Verify examples locally - {source} * {zip, tar} The LICENSE and NOTICE files have not been updated this time and will be addressed in future releases. On Sat, May 30, 2015 at 8:32 PM, Suneel Marthi suneel.mar...@gmail.com wrote: Please hold ur votes, will be refreshing staging with another build in the next hour On Sat, May 30, 2015 at 8:31 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Likewise source zip and tarballs build and pass tests. On Sat, May 30, 2015 at 3:23 PM, Suneel Marthi smar...@apache.org wrote: Verified {source} * {zip, tar} and all tests pass. +1 (binding) On Sat, May 30, 2015 at 5:28 PM, Suneel Marthi smar...@apache.org wrote: This is a call for VOTE to pass the Mahout 0.10.1 release candidate that's available at https://repository.apache.org/content/repositories/orgapachemahout-1008/org/apache/mahout/mahout-distribution/0.10.1/ Need at least 3 PMC +1 (binding) votes to cut the release. Below is the task breakdown for the PMC and committers: Andy Palumbo, Pat Ferrel: verify the binary artifacts and run tests; Suneel, AKM: verify the src artifacts; Ted/Grant/Drew: verify the hashes and sigs. The LICENSE.txt and NOTICE.txt still need to be updated and will not be addressed as part of the 0.10.1 release.
Re: Welcome Andrew Musselman as new comitter
Congratulations and welcome to the team Andrew! On Fri, Mar 7, 2014 at 7:28 PM, Saikat Kanjilal sxk1...@hotmail.com wrote: Congrats Andrew! I've taken the Coursera course; it was interesting, but I was hoping it would cover more in the area of deep learning. Date: Fri, 7 Mar 2014 12:19:52 -0600 Subject: Re: Welcome Andrew Musselman as new comitter From: scottcc...@gmail.com To: user@mahout.apache.org I personally am looking forward to the advice from the newest "recommended" committer to Hadoop. Congratulations to the Mahout team for increasing and growing :) Now back to my using… (and hopefully creating something meaningful for you guys) Scott PS: I am bootstrapping my machine learning knowledge by taking the Coursera course offered by Andrew Ng, to correct my shaky knowledge of classifiers. Is anyone else on this list taking or has anyone taken this course? (Obviously committers are probably not, but…) On 3/7/14, 11:36 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Thank you for the welcome! Looking forward to it. I have a math background and got started with recommenders by building the first album recommender for Rhapsody ( http://rhapsody.com ) while I was doing web development and web services work for the service. Since then I learned to love/hate Pig and Hadoop for a living, and now I do data engineering and analytics at Accenture. We've used Mahout on a few production projects, and we're looking forward to more. See you on the lists! Best Andrew On Fri, Mar 7, 2014 at 9:12 AM, Sebastian Schelter s...@apache.org wrote: Hi, this is to announce that the Project Management Committee (PMC) for Apache Mahout has asked Andrew Musselman to become a committer and we are pleased to announce that he has accepted. Being a committer enables easier contribution to the project since in addition to posting patches on JIRA it also gives write access to the code repository.
That also means that now we have yet another person who can commit patches submitted by others to our repo *wink* Andrew, we look forward to working with you in the future. Welcome! It would be great if you could introduce yourself with a few words :) Sebastian
Re: Build Failure in Eclipse
copy-dependencies turned out to be necessary for the examples module, so I've reverted the change there. I'm using Eclipse myself and I do not have the build issues you reported. First, please make sure that you're using the latest Eclipse, m2e, and m2e connector for build-helper-maven-plugin. For the mahout-math-scala module you will also need Scala IDE for Eclipse and Maven Integration for Scala IDE installed. Before running the build, update the Maven project configuration from pom.xml for all modules (select all modules > right-click any of them > Maven > Update Project... > OK). When building, run the clean install goals, with or without tests. You can configure all this in an Eclipse Maven build run configuration. Kind regards, Stevo Slavic. On Tue, Dec 3, 2013 at 5:56 PM, Stevo Slavić ssla...@gmail.com wrote: copy-dependencies use seems to be no longer necessary. I've just removed it from the POMs. Please pull changes and try again. Kind regards, Stevo Slavic. On Tue, Dec 3, 2013 at 6:03 AM, Tharindu Rusira tharindurus...@gmail.com wrote: I recently updated my Mahout-0.9 snapshot version code and rebuilt from the terminal. The process was successful with no build errors. But when I try to build Mahout from Eclipse (Run As > Maven build) I get the following build error while Mahout-Integration is being built. Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.8:copy-dependencies (copy-dependencies) on project mahout-integration: Artifact has not been packaged yet. When used on reactor artifact, copy should be executed after packaging: see MDEP-187. -> [Help 1] org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.8:copy-dependencies (copy-dependencies) on project mahout-integration: Artifact has not been packaged yet. When used on reactor artifact, copy should be executed after packaging: see MDEP-187. (I've attached the full error message in build-error.txt).
I even checked http://jira.codehaus.org/browse/MDEP-187 and this Maven dependency issue seems to still be unresolved. Could anyone give a clue what's happening here? Thanks,
Re: ALS-WR Predictions
Hello Stuart, Have a look at the description of https://issues.apache.org/jira/browse/MAHOUT-872 In a comment there Sebastian also references the wiki entry: https://cwiki.apache.org/confluence/display/MAHOUT/Collaborative+Filtering+with+AlS-WR PredictionJob/predictFromFactorization was replaced with org.apache.mahout.cf.taste.hadoop.als.RecommenderJob (available through the mahout driver as recommendfactorized: "Compute recommendations using the factorization of a rating matrix"). See the RecommenderJob class javadoc for parameters: https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/als/RecommenderJob.html Kind regards, Stevo Slavic. On Thu, Sep 12, 2013 at 11:49 AM, Stuart Horsman stuart.hors...@gmail.com wrote: So I'm using predictFromFactorization in Mahout 0.5, but this code was removed in 0.6. Is there any special reason for this? Thanks Stuart On 12 September 2013 08:28, Stuart Horsman stuart.hors...@gmail.com wrote: Hi All, I'm new to Mahout so thanks up front for the help. I'm running ALS-WR on a sparse matrix of movies and items and getting some recommendations out. However, I want to predict user preference. I noticed there was a predictFromFactorization in previous versions, but it seems to have been removed. What's the best approach I should take using 0.8? Thanks Stuart
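Stuart's underlying question is how to get a predicted preference rather than a list of recommendations. In a factorization A ≈ U M^T such as the one ALS-WR produces, the predicted preference of user u for item i is the dot product of row u of U with row i of M. Below is a minimal Java sketch of that arithmetic, assuming the feature vectors have already been read into plain double arrays. The class is hypothetical and uses none of Mahout's own vector or job classes.

```java
// Hypothetical sketch: predicted preference = dot product of the user's and
// the item's latent feature vectors from an ALS factorization A ~ U * M^T.
final class FactorizedPrediction {
    private FactorizedPrediction() {}

    static double predict(double[] userFeatures, double[] itemFeatures) {
        if (userFeatures.length != itemFeatures.length) {
            throw new IllegalArgumentException(
                "Feature vectors must have the same rank: "
                    + userFeatures.length + " vs " + itemFeatures.length);
        }
        double sum = 0.0;
        for (int k = 0; k < userFeatures.length; k++) {
            sum += userFeatures[k] * itemFeatures[k];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] user = {0.5, 1.0, -0.25};
        double[] item = {2.0, 1.5, 4.0};
        // 0.5*2.0 + 1.0*1.5 + (-0.25)*4.0 = 1.0 + 1.5 - 1.0 = 1.5
        System.out.println(predict(user, item));
    }
}
```

Raw dot products are not clamped to the rating scale; whether to clip them (for example to a 1 to 5 range) is an application choice.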
Re: Mahout Javadoc Unavailable (404)
Because we don't (yet) publish the Mahout site in a way that keeps the last successful build's report online after failures, the javadoc report is accessible through that Jenkins link only when the Mahout-Quality build job is successful. It turns out that the last execution of the Mahout-Quality build job failed, hence the javadoc link is broken. One can see the build job status at https://builds.apache.org/job/Mahout-Quality/ As master Yoda would say, be patient we must, until the build is successful again, or until we start publishing the site report. As another alternative, one can generate javadoc locally with: mvn clean package -DskipTests=true javadoc:javadoc and then open mahout/core/target/site/apidocs/index.html from the local filesystem. Darius, as with the broken wiki link you reported previously, the valid URL is: https://cwiki.apache.org/confluence/display/MAHOUT/Mahout+Wiki#MahoutWiki-Examples I'm not sure yet what (maybe some Confluence upgrade, or something else related) caused the duplication of URLs in the past. Anyway, take a look at the two broken URLs and the two corresponding valid ones; I'm sure you'll notice the pattern. To find a valid one, you can also just google the Mahout wiki page name (e.g. mahout wiki examples); the search hits will have valid URLs. Kind regards, Stevo Slavic. On Tue, Aug 27, 2013 at 1:22 PM, Darius Miliauskas dariui.miliaus...@gmail.com wrote: Dear All, and the links from the wiki do not work either: https://cwiki.apache.org/MAHOUT/mahout-wiki.html#MahoutWiki-Examples Best Wishes, Darius 2013/8/27 Darius Miliauskas dariui.miliaus...@gmail.com Dear All, that's absolutely true, Rafal! I almost wanted to write the same. Moreover, some links on http://mahout.apache.org/ are not working either. Guys, I do not know who is responsible for the site, but the site is the first face of the project. So it would be nice if you spent some hours just making the site function properly, and in some places more understandable.
It is more important than new functionality in the new release, because everybody judges you by the first impression. I already wrote that the instructions on how to install and run are very messy. I managed to install, but I spent almost a day understanding what was written there. The README.txt file is not very helpful either. What a waste, when good instructions would take only half an hour and would make Mahout more attractive! Just write a page instead of giving the links, and make the project look more attractive and substantial. You should revise the instructions and make them simpler for the ordinary user. Best Wishes, Darius 2013/8/27 Rafal Lukawiecki ra...@projectbotticelli.com Dear All, The Javadoc for Mahout usually available on http://builds.apache.org/job/Mahout-Quality/javadoc/ and linked to from Jenkins at http://builds.apache.org/job/Mahout-Quality/ is showing error 404 at the moment. Rafal -- Rafal Lukawiecki Strategic Consultant and Director Project Botticelli Ltd
Re: Cannot build source version mahout-distribution-0.8
Hello Michael, Seems like a temporary issue with the Maven Central repo mirror(s). I've just tried several times to open http://repo1.maven.org/maven2/org/apache/maven/plugins/ in a browser; sometimes it responds well, and a few times it returns an empty page. So, please try again. Kind regards, Stevo Slavic. On Tue, Aug 27, 2013 at 3:59 PM, Michael Wechner michael.wech...@wyona.com wrote: Hi I have downloaded http://mirror.switch.ch/mirror/apache/dist/mahout/0.8/mahout-distribution-0.8-src.zip and tried to build it with mvn -DskipTests clean install on Mac OS X 10.6.8 with Java 1.6.0_45 and Maven 3.0.4 but received the following errors:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools ................................ SUCCESS [13.168s]
[INFO] Apache Mahout ..................................... SUCCESS [2.823s]
[INFO] Mahout Math ....................................... SUCCESS [1:02.822s]
[INFO] Mahout Core ....................................... SUCCESS [1:26.430s]
[INFO] Mahout Integration ................................ FAILURE [1:45.435s]
[INFO] Mahout Examples ................................... SKIPPED
[INFO] Mahout Release Package ............................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4:31.448s
[INFO] Finished at: Tue Aug 27 15:19:31 CEST 2013
[INFO] Final Memory: 27M/123M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project mahout-integration: Could not resolve dependencies for project org.apache.mahout:mahout-integration:jar:0.8: Could not transfer artifact com.ibm.icu:icu4j:jar:49.1 from/to central (http://repo.maven.apache.org/maven2): GET request of: com/ibm/icu/icu4j/49.1/icu4j-49.1.jar from central failed: Premature end of Content-Length delimited message body (expected: 7407144; received: 4098921 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :mahout-integration
Does anybody else experience the same problem? Thanks Michael
Re: RowSimilarityJob, sampleDown method problem
FindBugs was reporting it the whole time (see the Warnings tab on https://builds.apache.org/job/Mahout-Quality/2194/findbugsResult/ and the ICAST_IDIV_CAST_TO_DOUBLE bug). We should get FindBugs warnings down to 0. On Tue, Aug 13, 2013 at 9:13 PM, sam wu swu5...@gmail.com wrote: Sorry for the phrasing. I'll file a JIRA Sam On Tue, Aug 13, 2013 at 12:10 PM, Ted Dunning ted.dunn...@gmail.com wrote: Ouch. Sorry... your original posting made it sound like you *wanted* it to be 0.0 or 1.0. This is a bug. Can you file a JIRA? On Tue, Aug 13, 2013 at 12:04 PM, sam wu swu5...@gmail.com wrote: Say column a has 1000 entries and maxPref=700. With rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow; we get rowSampleRate = 0.0 (not 0.7). Do we totally skip this column, or sample column entries with 0.7 probability (roughly getting 700 entries)? On Tue, Aug 13, 2013 at 11:58 AM, Ted Dunning ted.dunn...@gmail.com wrote: Why do you think this? On Tue, Aug 13, 2013 at 11:56 AM, sam wu swu5...@gmail.com wrote: Mahout 0.9 snapshot RowSimilarityJob.java, sampleDown method, line 291 or 300: double rowSampleRate = Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow; returns either 0.0 or 1.0, not a fraction. It needs a (double) cast. BR Sam
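The bug Sam describes is ordinary Java integer division: when both operands are ints, the quotient is truncated before it is widened to double, which is the pattern FindBugs reports as ICAST_IDIV_CAST_TO_DOUBLE. Below is a minimal sketch reproducing his numbers, assuming both variables are ints as the thread implies. The class is hypothetical and only mirrors the expression, not RowSimilarityJob itself.

```java
// Hypothetical reproduction of the sampleDown() expression discussed above.
final class SampleRate {
    private SampleRate() {}

    // Buggy form: int / int truncates toward zero, so any fraction becomes 0.
    static double buggyRowSampleRate(int maxObservationsPerRow, int observationsPerRow) {
        return Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
    }

    // Fixed form: casting one operand to double forces floating-point division.
    static double fixedRowSampleRate(int maxObservationsPerRow, int observationsPerRow) {
        return (double) Math.min(maxObservationsPerRow, observationsPerRow) / observationsPerRow;
    }

    public static void main(String[] args) {
        // Sam's example: 1000 entries in the column, maxPref = 700
        System.out.println(buggyRowSampleRate(700, 1000)); // 0.0
        System.out.println(fixedRowSampleRate(700, 1000)); // 0.7
    }
}
```

With the cast, a rate of 0.7 used as a per-entry keep probability would retain roughly 700 of 1,000 entries on average, which is the behaviour Sam expected; how sampleDown actually applies the rate is not shown in the thread.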
Re: mahout kmeans not generating clusteredPoint dir?
The current Mahout-Examples-Cluster-Reuters build has the same issue: https://builds.apache.org/user/sslavic/my-views/view/Mahout/job/Mahout-Examples-Cluster-Reuters/395/console Kind regards, Stevo Slavic. On Wed, Jul 17, 2013 at 11:42 AM, Fuhrmann Alpert, Galit galp...@ebay.com wrote: Thanks Suneel. I tried to add this flag (though I think the clusteredPoints directory was supposed to be created by default?). Either way, for some reason whenever I add '-cl' (I tried to run it on several data sets), I get the following error: There is no queue named default (even though I do specify a queue with -Dmapred.job.queue.name=...). I don't get this error otherwise. Has anyone ever encountered this error? Is there some sort of configuration I'm missing? Thanks, Galit. -----Original Message----- From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] Sent: Wednesday, July 10, 2013 5:30 PM To: user@mahout.apache.org Subject: Re: mahout kmeans not generating clusteredPoint dir? Been a while since I last worked with this, I believe u r missing the clustering option '-cl'. Give that a try. From: Fuhrmann Alpert, Galit galp...@ebay.com To: user@mahout.apache.org Sent: Wednesday, July 10, 2013 5:17 AM Subject: mahout kmeans not generating clusteredPoint dir? Hello, I ran mahout kmeans (using random seeds) on a Hadoop cluster. It ran successfully and created a directory containing clusters-*, including the last, which was clusters-3-final. However, it did not create clusteredPoints, or at least I cannot find it under the same dir (or anywhere else). My call was: mahout kmeans -k 4000 -i inputSeq.dat -o outputPath --maxIter 3 --clusters outputSeeds Was there an extra argument I needed to specify in order for it to generate clusteredPoints? (BTW I also can't see the outputSeeds. Was it created for seeds and then deleted?) According to Mahout in Action: The k-means clustering implementation creates two types of directories in the output folder.
The clusters-* directories are formed at the end of each iteration: the clusters-0 directory is generated after the first iteration, clusters-1 after the second iteration, and so on. These directories contain information about the clusters: centroid, standard deviation, and so on. The clusteredPoints directory, on the other hand, contains the final mapping from cluster ID to document ID. This data is generated from the output of the last MapReduce operation. The directory listing of the output folder looks something like this:
$ ls -l reuters-kmeans-clusters
drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-0
drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-1
drwxr-xr-x 4 user 5000 136 Feb 1 18:56 clusters-2
...
drwxr-xr-x 4 user 5000 136 Feb 1 18:59 clusteredPoint
Again, my call did not generate the clusteredPoint directory. I would appreciate your help. Thanks a lot, Galit.
Re: Setting up a recommender
I see Ted created a JIRA ticket for this already: https://issues.apache.org/jira/browse/MAHOUT-1288 We should consider changing the issue type (currently Bug). One might find the Berlin Buzzwords 2013 recording (http://www.youtube.com/watch?v=fWR1T2pY08Y) and slides (http://www.slideshare.net/tdunning/buzz-wordsdunningmultimodalrecommendation) of Ted's talk on the subject helpful for understanding the terms used and the idea. I guess we could start with a single kind of interaction/behavior, and consider adding more later. Shall we make it a separate subproject (so on the level of mahout and site, but still under mahout svn), or make a new mahout submodule, or change mahout examples from a single module to a multimodule structure and add the recommender demo as a submodule there? I'm fine with Maven tasks, and to some extent Solr too (not the most recent versions, but I see it as a nice opportunity to update). Kind regards, Stevo Slavic. On Sun, Jul 21, 2013 at 12:15 AM, Ted Dunning ted.dunn...@gmail.com wrote: To kick this off, I have created a design document that is open for comments. Much detail is needed here. I will create a JIRA as well, but the Google doc is much easier for collating lots of input into a coherent document. The directory that the document is stored in is accessible at http://bit.ly/18vbbaT Once we get going, we can talk about how to coordinate tasks between hangouts. One option is a public Trello project: https://trello.com/ or we can use JIRA sub-tasks. On Sat, Jul 20, 2013 at 11:25 AM, Andrew Psaltis andrew.psal...@webtrends.com wrote: I am very interested in collaborating on the off-line to Solr part. Just let me know how we want to get going. Thanks, Andrew On 7/19/13 4:45 PM, Ted Dunning ted.dunn...@gmail.com wrote: OK. I think the crux here is the off-line to Solr part so let's see who else pops up. Having a Solr maven could be very helpful.
On Fri, Jul 19, 2013 at 3:39 PM, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote: I'm currently working for a portal that has a similar use case and I was thinking of implementing this in a similar way. I'm generating recommendations using Python scripts based on similarity measures (content-based recommendation), using only Euclidean distance and some weights for each attribute. I want to use Mahout's GenericItemBasedRecommender to generate these same recommendations without user data (no tracking right now of user-to-item relationships). I was thinking of pushing the generated recommendations to Solr using atomic updates, since my fields are all stored right now. Since this is very similar to what I'm trying to accomplish, I would sign up to collaborate in any way I can, since I'm fairly familiar with Solr and I'm starting to learn my way around Mahout. On Fri, Jul 19, 2013 at 5:12 PM, Sebastian Schelter s...@apache.org wrote: I would also be willing to provide guidance and advice for anyone taking this on; I can especially help with the offline analysis part. --sebastian 2013/7/19 Ted Dunning ted.dunn...@gmail.com I would be happy to supervise a project to implement a demo of this if anybody is willing to do the grunt work of gluing things together. Sooo, if you would like to work on this, here is a suggested project. This project would entail:
a) build a synthetic data source
b) write scripts to do the off-line analysis
c) write scripts to export to Solr
d) write a very quick web facade over Solr to make it look like a recommendation engine. This would include
d.1) a most popular page that does combined popularity rise and recommendation
d.2) a personal recommendation page that does just recommendation with dithering
d.3) item pages with related items at the bottom
e) work with others to provide high quality system walk-through and install directions
If you want to bite on this, we should arrange a weekly video hangout.
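Item d.2 mentions dithering, i.e. deliberately shuffling a ranked recommendation list a little so that users see some items from deeper in the list. Below is a minimal Java sketch of one commonly described formulation, taken here as an assumption rather than as the design Ted has in mind: add Gaussian noise to the logarithm of each item's rank and re-sort. The noise standard deviation is a tuning assumption, and the class and method names are hypothetical, not Mahout APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of rank dithering; not a Mahout API.
final class Dithering {
    private Dithering() {}

    // Re-orders an already ranked list (best first) by sorting on
    // log(rank) plus Gaussian noise. Because log(rank) grows slowly, items
    // deep in the list get shuffled more, while the top items mostly stay put.
    static <T> List<T> dither(List<T> ranked, double noiseStdDev, Random rng) {
        int n = ranked.size();
        double[] score = new double[n];
        List<Integer> order = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // ranks are 1-based, so the first item gets log(1) = 0 before noise
            score[i] = Math.log(i + 1) + noiseStdDev * rng.nextGaussian();
            order.add(i);
        }
        // lower dithered score = shown earlier; List.sort is stable
        order.sort((a, b) -> Double.compare(score[a], score[b]));
        List<T> result = new ArrayList<>(n);
        for (int idx : order) {
            result.add(ranked.get(idx));
        }
        return result;
    }
}
```

With a noise level of 0 the original order is returned unchanged; passing a Random with a fixed seed makes a given page view reproducible, while a fresh seed per visit gives users a slightly different list each time.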
I am willing to commit to guiding and providing detailed technical approaches. You should be willing to commit to actually doing stuff. The goal would be to provide a fully worked out scaffolding of a practical recommendation system that presumably would become an example module in Mahout. On Fri, Jul 19, 2013 at 1:08 PM, B Lyon bradfl...@gmail.com wrote: +1 as well. Sounds fun. On Fri, Jul 19, 2013 at 4:06 PM, Dominik Hübner cont...@dhuebner.com wrote: +1 for getting something like that in a future release of Mahout On Jul 19, 2013, at 10:02 PM, Sebastian Schelter s...@apache.org wrote: It would be awesome if we could get a nice, easily deployable
Re: FP Growth
Hello Mahout community, Multiple algos got axed for various reasons, e.g. a more scalable solution exists and got implemented, or the algo is not really used much in the real world for whatever reason. I'd like Mahout developers to consider not removing those implementations, but instead moving them to a separate submodule, or a project, and minimizing maintenance effort for that part. Even if those algorithms do not meet the project's main goals and are not the best for running in production, I believe it would be useful to have them in the project for educational purposes: for machine learning in general, but also for learning Mahout (APIs, command line) on presumably simpler, more widely known algorithms. Kind regards, Stevo Slavic. On Sun, Jun 2, 2013 at 4:07 AM, DaiQinghao rogerda...@gmail.com wrote: Hello sir, may I ask what dev support you expect? -- Best Regards, Qinghao From: gsing...@apache.org Subject: FP Growth Date: Sat, 1 Jun 2013 23:15:56 +0200 To: user@mahout.apache.org FP Growth seems to not have a lot of dev support. Are there users out there using it? Should it live on or get the axe prior to 1.0? -Grant
Re: mahout colt collections
maven-javadoc-plugin generates javadoc OK for the Mahout Math module: https://builds.apache.org/job/Mahout-Quality/ws/trunk/math/target/site/apidocs/index.html The Jenkins javadoc plugin seems to have some issues with showing javadoc for generated sources: https://builds.apache.org/job/Mahout-Quality/javadoc/ Maybe it's a bug in Jenkins or the Jenkins javadoc plugin. On Mon, May 20, 2013 at 12:27 AM, Ted Dunning ted.dunn...@gmail.com wrote: Sophie, Can you say a bit more about what you want to do? On Sun, May 19, 2013 at 2:22 PM, Sophie Sperner sophie.sper...@gmail.com wrote: Dear, I'm experiencing difficulties with the hppc library (http://labs.carrotsearch.com/hppc.html) that I'm using. My algorithms work perfectly fine for small inputs, but when I go to an Amazon machine and want to compute larger inputs, my code hangs forever as a result of some hidden bugs in that library (probably it was not tested on big data). I'm aware of the CERN Colt library for Java primitive computing. It's like Trove, hppc, fastutil or whatever. But Colt is dead. So what I know is that Mahout uses old Colt code, and what I want now is to find any documents or articles on how to use similar structures such as OpenIntSet, OpenIntIntHashMap, OpenIntObjectHashMap and so on, in order to convert my present code to the Mahout Colt library. If there is no such document, *could you please point me at where this library is? (I need the jar file with those collections).* I hope the API will be quite similar to the one I use so that the converting phase will be relatively easy. Thank you and best wishes. -- Yours, Sophie
Re: mahout colt collections
Hello Sophie, The Mahout 0.7 Math module is available on the Maven Central repository: http://repo1.maven.org/maven2/org/apache/mahout/mahout-math/0.7/ Besides the jar with binaries there are also javadoc and sources jars. I've just counted: since the 0.7 release there have been 60 commits that touched the math module, so you might consider using the latest code. Binaries of the latest 0.8 snapshots are available on the Apache snapshots repository: https://repository.apache.org/content/groups/snapshots/org/apache/mahout/mahout-math/0.8-SNAPSHOT/ A mirror of the latest sources is available on GitHub: https://github.com/apache/mahout Javadoc for the latest sources is at https://builds.apache.org/hudson/job/Mahout-Quality/javadoc; look there for org.apache.mahout.math and nested packages. Kind regards, Stevo Slavic. On Sun, May 19, 2013 at 11:22 PM, Sophie Sperner sophie.sper...@gmail.com wrote: Dear, I'm experiencing difficulties with the hppc library (http://labs.carrotsearch.com/hppc.html) that I'm using. My algorithms work perfectly fine for small inputs, but when I go to an Amazon machine and want to compute larger inputs, my code hangs forever as a result of some hidden bugs in that library (probably it was not tested on big data). I'm aware of the CERN Colt library for Java primitive computing. It's like Trove, hppc, fastutil or whatever. But Colt is dead. So what I know is that Mahout uses old Colt code, and what I want now is to find any documents or articles on how to use similar structures such as OpenIntSet, OpenIntIntHashMap, OpenIntObjectHashMap and so on, in order to convert my present code to the Mahout Colt library. If there is no such document, *could you please point me at where this library is? (I need the jar file with those collections).* I hope the API will be quite similar to the one I use so that the converting phase will be relatively easy. Thank you and best wishes. -- Yours, Sophie
Re: mahout colt collections
They do, but it seems javadoc generation is not configured well: it doesn't generate reports for generated sources. The classes are org.apache.mahout.math.set.OpenIntHashSet and org.apache.mahout.math.map.OpenIntIntHashMap. On Mon, May 20, 2013 at 12:15 AM, Sophie Sperner sophie.sper...@gmail.com wrote: Dear Stevo, At this link https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/math/package-summary.html there are no OpenIntHashSet or OpenIntIntHashMap classes, or any with similar names. Do they exist there? Thank you for your reply, Best wishes -- Yours, Sophie
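For readers wondering what the "Open*" collections are: Colt's documentation describes its OpenIntIntHashMap-style maps as primitive hash maps implemented with open addressing (double hashing), so keys and values live in plain int arrays instead of boxed objects. The sketch below is a hypothetical, minimal illustration of that idea in plain Java; the class name `TinyOpenIntIntMap` and its methods are invented for this example, it swaps in simpler linear probing, and it is not Mahout's or Colt's implementation or API.

```java
/**
 * Hypothetical, minimal int -> int hash map using open addressing with
 * linear probing. It only illustrates the idea behind primitive collections;
 * it is NOT Mahout's (or Colt's) OpenIntIntHashMap. Removal is omitted.
 */
public class TinyOpenIntIntMap {
    private static final byte FULL = 1;  // byte arrays start zeroed, so 0 means "free"

    private int[] keys = new int[16];    // capacity is always a power of two
    private int[] values = new int[16];
    private byte[] states = new byte[16];
    private int size = 0;

    /** Scrambles the key so that nearby ints do not cluster in nearby slots. */
    private static int mix(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        return h ^ (h >>> 13);
    }

    /** Returns the slot holding key, or the first free slot on its probe path. */
    private static int slot(int key, int[] ks, byte[] st) {
        int mask = ks.length - 1;
        int i = mix(key) & mask;
        while (st[i] == FULL && ks[i] != key) {
            i = (i + 1) & mask;          // linear probing: try the next slot
        }
        return i;
    }

    public void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) {
            grow();                      // keep the load factor at or below 0.5
        }
        int i = slot(key, keys, states);
        if (states[i] != FULL) {         // new key: claim the free slot
            states[i] = FULL;
            keys[i] = key;
            size++;
        }
        values[i] = value;               // insert or overwrite
    }

    public boolean containsKey(int key) {
        return states[slot(key, keys, states)] == FULL;
    }

    /** Returns the value mapped to key, or defaultValue if the key is absent. */
    public int get(int key, int defaultValue) {
        int i = slot(key, keys, states);
        return states[i] == FULL ? values[i] : defaultValue;
    }

    public int size() {
        return size;
    }

    /** Doubles the capacity and re-inserts every occupied slot. */
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        byte[] oldStates = states;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        states = new byte[oldKeys.length * 2];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldStates[j] == FULL) {
                int i = slot(oldKeys[j], keys, states);
                states[i] = FULL;
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }

    public static void main(String[] args) {
        TinyOpenIntIntMap counts = new TinyOpenIntIntMap();
        for (int x : new int[] {3, 7, 3, 100000, 3}) {
            counts.put(x, counts.get(x, 0) + 1);   // count occurrences without boxing
        }
        System.out.println(counts.get(3, 0) + " occurrences of 3, " + counts.size() + " distinct keys");
    }
}
```

Keeping the load factor at or below one half guarantees a free slot always exists, so every probe loop terminates; the real Colt/Mahout classes use their own resizing thresholds and also support removal.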
Re: Finding best NearestNUserNeighborhood size
When evaluating a recommender, call RandomUtils.useTestSeed() before running the evaluator so that the data set is split consistently between runs; don't use it in production, only for evaluation. This is all explained more thoroughly in the Mahout in Action book. Kind regards, Stevo Slavic. On Wed, Jan 23, 2013 at 2:01 PM, Zia mel ziad.kame...@gmail.com wrote: Hi, I used NearestNUserNeighborhood with RMSE in a user-based recommender that uses PearsonCorrelationSimilarity, and I found that changing the neighborhood size has no clear pattern or effect: sometimes the RMSE increases, sometimes it decreases. With precision as the metric, varying the neighborhood size shows a clearer pattern. Any reason? Another point is that the RMSE changes on every run, since a different sample is chosen each time. Would running the code 10 or 20 times and taking the average be a good idea, or is there a better thing to do?
//-- RUN 1
2, 0.5523623146152608
3, 0.5425283201773704
4, 0.669846658662311
5, 0.5956616542334392
6, 0.6033911039809353
7, 0.6135206544496685
8, 0.5740444208649034
9, 0.642798288443049
10, 0.626653651472
//-- RUN 2
2, 0.5415411343523825
3, 0.6784589323396696
4, 0.6347069968141124
5, 0.6968820296725008
6, 0.5953849874479478
7, 0.6791828191904128
8, 0.6072462830257853
9, 0.6461346217476011
10, 0.6043919119341171
Thanks!
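Stevo's advice amounts to making the random train/test split deterministic. In Mahout that is done with RandomUtils.useTestSeed(); the sketch below only illustrates the underlying idea using plain `java.util.Random` with a fixed seed, and also shows Zia's alternative of averaging RMSE over several differently seeded runs. The class `SplitAndAverage`, its methods, the 0.7 training fraction, and the seed values are hypothetical choices for this illustration, not Mahout APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical evaluation helpers (plain JDK, NOT Mahout API): a seeded
 * train/test split, RMSE, and averaging of per-run scores.
 */
public class SplitAndAverage {

    /** Shuffles a copy of data with a seeded Random and returns the first trainFraction of it. */
    public static <T> List<T> trainingSplit(List<T> data, double trainFraction, long seed) {
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, new Random(seed));   // same seed, same permutation
        int cut = (int) Math.round(copy.size() * trainFraction);
        return new ArrayList<>(copy.subList(0, cut));
    }

    /** Root-mean-square error between predicted and actual ratings. */
    public static double rmse(double[] predicted, double[] actual) {
        double sumSq = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = predicted[i] - actual[i];
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / actual.length);
    }

    /** Arithmetic mean of per-run scores, e.g. the RMSE from several differently seeded runs. */
    public static double mean(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    public static void main(String[] args) {
        List<Integer> ratings = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ratings.add(i);
        }
        // With a fixed seed, two evaluations see exactly the same training set.
        boolean same = trainingSplit(ratings, 0.7, 42L).equals(trainingSplit(ratings, 0.7, 42L));
        System.out.println("reproducible split: " + same);
        // Neighborhood size 2: RMSE from RUN 1 and RUN 2 in the thread, averaged.
        System.out.println("mean RMSE: " + mean(new double[] {0.5523623146152608, 0.5415411343523825}));
    }
}
```

The two approaches answer different questions: a fixed seed makes comparisons between neighborhood sizes fair because every setting is scored on the same split, while averaging over several seeds estimates the expected RMSE and smooths out split-to-split noise.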
Re: m2e and mahout problem
Nice, glad it works for you all. Hopefully it will be accepted and merged soon. On Wed, Jan 9, 2013 at 5:03 PM, Ying Liao yliao...@gmail.com wrote: It works! Thanks. Ying On Mon, Jan 7, 2013 at 8:33 PM, Marty Kube martyk...@beavercreekconsulting.com wrote: This patch worked for me: https://issues.apache.org/jira/browse/MAHOUT-1136 On 01/07/2013 05:27 PM, Ying Liao wrote: Hi Marty, Yes, I see some email threads from you. I used mvn eclipse:eclipse to build the project and imported it into Eclipse. Thanks! I was wondering if there is a way to modify the pom file to work around the m2e issue. Thanks, Ying On Fri, Jan 4, 2013 at 8:26 PM, Marty Kube martyk...@beavercreekconsulting.com wrote: Hi Ying, I had the same problem and checked in on the m2e users list: http://dev.eclipse.org/mhonarc/lists/m2e-users/msg03611.html I've been a loyal Eclipse user for about 10 years. This year I started working on large Maven projects. I struggled to get m2e working and ended up switching to NetBeans. That hurt. On 01/04/2013 04:16 PM, Ying Liao wrote: Hi, I checked out the latest Mahout project from trunk and successfully built it with Maven, but I am not able to import the project into Eclipse. I got this error message: Cannot parse lifecycle mapping metadata for maven project MavenProject: org.apache.mahout:mahour:0.8-SNAPSHOT @ ... Cause: Unrecognized tag: 'version' (position: START_TAG seen ... /artifactid\n version... @8:18 ) I am using Eclipse Juno and m2e 1.2. Any help is appreciated. Thanks, Ying
Re: Updated MIA samples
https://github.com/tdunning/MiA/tree/mahout-0.7 On Thu, Jan 3, 2013 at 10:33 AM, Robin Chesterman robinchester...@gmail.com wrote: Does anyone know if there are up-to-date code samples from Mahout in Action anywhere that work with 0.7 (I think the book was aimed at 0.5)? I'm having a little trouble getting started with the clustering examples. Thanks!
Re: Contributors link throws NullPointerException
The Contributors link (JIRA) at the bottom of the page throws an NPE for me too. It points to: http://issues.apache.org/jira/secure/ConfigureReport.jspa?versionId=-1&issueStatus=all&selectedProjectId=12310751&reportKey=com.sourcelabs.jira.plugin.report.contributions%3Acontributionreport&Next=Next Kind regards, Stevo Slavic. On Mon, Oct 8, 2012 at 1:53 PM, Sean Owen sro...@gmail.com wrote: It works OK for me -- I assume this was some transient glitch in Confluence. On Mon, Oct 8, 2012 at 12:51 PM, Ahmet Arslan iori...@yahoo.com wrote: Hello, The Contributors link at https://cwiki.apache.org/confluence/display/MAHOUT/Who+We+Are throws the following: java.lang.NullPointerException at com.atlassian.jira.web.action.browser.ConfigureReport.getReportModule(ConfigureReport.java:217) at com.atlassian.jira.web.action.browser.ConfigureReport.doExecute(ConfigureReport.java:116) at webwork.action.ActionSupport.execute(ActionSupport.java:165) Ahmet
Re: Contributors link throws NullPointerException
I guess this issue was created for the Hadoop project, got moved to the infrastructure JIRA project, and was closed there with the explanation that it's a JIRA bug: https://issues.apache.org/jira/browse/INFRA-5014 Kind regards, Stevo Slavic.
Re: Import Mahout's source code to eclipse
Do not combine the maven-eclipse-plugin (eclipse:eclipse) with the m2e plugin for Eclipse. To explain what happens when you import with m2e: for Maven plugins configured in build scripts to execute in specific build lifecycle phases, m2e needs metadata telling it what to do with them (execute, ignore, ...) when a file changes in an Eclipse project. m2e can consume this metadata from the build scripts themselves, from m2e connectors (m2e plugins), or (as of m2e 1.1) from the Maven plugins. For more info on this, read http://wiki.eclipse.org/M2E_plugin_execution_not_covered For the two Maven plugins used in the Mahout build scripts, there is no such source of lifecycle mapping metadata. maven-antrun-plugin is used basically to copy a resource in the compile phase (see details in the pom: https://svn.apache.org/repos/asf/mahout/trunk/core/pom.xml ). I don't yet understand why this resource isn't named and placed under src/main/resources, so that maven-antrun-plugin could be removed. As a temporary workaround, you can configure lifecycle metadata in the Mahout parent pom to simply ignore executions of the maven-antrun-plugin run goal. Another pom, https://svn.apache.org/repos/asf/mahout/trunk/math/pom.xml , uses mahout-collections-codegen-plugin to generate sources. This plugin should include lifecycle mapping metadata, but it does not. So, as another temporary workaround, you can configure lifecycle metadata in the Mahout parent pom to execute the mahout-collections-codegen-plugin generate goal. Kind regards, Stevo Slavic. On Mon, Jul 9, 2012 at 6:03 PM, chenghao liu twins...@gmail.com wrote: Try mvn eclipse:eclipse 2012/7/9 huangjia cucumbergua...@gmail.com Hi all, I'm reading Mahout in Action and I'm new to Mahout. Before I can run the code in section 2.2.2, Creating a recommender, I think I need to import Mahout into Eclipse first. However, I encountered a problem when trying to import Mahout's source code into Eclipse. My steps are as follows.
1. Started Eclipse, clicked Help > Install New Software, and entered m2e - http://download.eclipse.org/technology/m2e/releases; 2. Downloaded the Mahout source zip file, unzipped it, and put it under the Eclipse workspace; 3. Started Eclipse, clicked File > Import > Maven > Existing Maven Projects, then chose the Mahout source directory as the Root Directory. However, when I first tried it, it showed some errors, as in this post: http://stackoverflow.com/questions/11282737/errors-when-importing-mahouts-maven-resource-to-eclipse But when I tried it today, the Next button was grayed out, so I cannot proceed. Does anyone have an idea of what's going on? Info: I'm using Eclipse Helios and the Mahout 0.7 distribution. Is it because Helios does not support Mahout? I don't think that should be the case, since this person did it successfully: http://shuyo.wordpress.com/2011/02/01/mahout-development-environment-with-maven-and-eclipse-1/ Or should I download Maven separately instead of installing it in Eclipse? Thanks very much! Jia
Re: Import Mahout's source code to eclipse
See https://issues.apache.org/jira/browse/MAHOUT-1043 for more info, and feel free to vote for it if you're interested in having the Apache Mahout sources importable and buildable in a modern Eclipse/Maven environment. Kind regards, Stevo Slavic. On Mon, Jul 9, 2012 at 6:59 PM, huangjia cucumbergua...@gmail.com wrote: Hi Stevo, Sorry, but I couldn't quite understand your answer. Are you suggesting that I change the pom.xml? Is there a permanent solution to my problem? Hi Chenghao, Where should I execute the mvn eclipse:eclipse command? In Cygwin? Thank you both! Jia -- Jia Huang PhD student College of Information Science and Technology Drexel University
Re: Starting out with Mahout
Configure the MAVEN_OPTS environment variable to give Maven more memory. Regards, Stevo. On May 7, 2011 11:59 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: The JVM seems a little dated (Hadoop, and by extension Mahout, is not recommended for use on 18, but 19 and later should be fine), but otherwise I must admit I don't see anything wrong with the setup. On Sat, May 7, 2011 at 2:52 PM, Brent Downs brentcdo...@hotmail.com wrote: For Java, I haveja...