[jira] [Commented] (MAHOUT-1413) Rework algorithms page
[ https://issues.apache.org/jira/browse/MAHOUT-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893018#comment-13893018 ]

Tharindu Rusira commented on MAHOUT-1413:
-----------------------------------------

[~chameerawijebandara], the Mahout wiki currently runs on the CMS, so Suneel is right: only committers are allowed to update the wiki. As a non-committer, you can still submit patches to the mailing list. Please see this CMS reference [1].

[1] http://www.apache.org/dev/cmsref.html#non-committer

> Rework algorithms page
> ----------------------
>
>          Key: MAHOUT-1413
>          URL: https://issues.apache.org/jira/browse/MAHOUT-1413
>      Project: Mahout
>   Issue Type: Improvement
>     Reporter: Sebastian Schelter
>     Priority: Critical
>
> It's crucial that we update our algorithms page to reflect the current state of algorithms in Mahout 0.9!!!
> https://mahout.apache.org/users/basics/algorithms.html

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
Error while running ./cluster-reuters.sh with option lda clustering
Hi all,

I'm running the Mahout examples from the latest Mahout 0.9 release candidate. I got this error while running ./cluster-reuters.sh with option 3 (lda clustering).

According to the error log, this does not seem to be a Mahout issue; rather, Hadoop (1.2.1) fails to write to /tmp/mahout-work-tkumara/reuters-lda. This is strange, however, because /tmp/mahout-work-tkumara/ does not have a reuters-lda directory, yet the exception stack trace complains that the said directory already exists.

14/01/31 15:20:39 ERROR security.UserGroupInformation: PriviledgedActionException as:tkumara cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /tmp/mahout-work-tkumara/reuters-lda already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /tmp/mahout-work-tkumara/reuters-lda already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:973)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:394)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.writeTopicModel(CVB0Driver.java:441)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:336)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:198)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.main(CVB0Driver.java:534)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

I also checked the relevant section in ./cluster-reuters.sh but could not find anything wrong there.

elif [ x$clustertype == xlda ]; then
  $MAHOUT seq2sparse \
    -i ${WORK_DIR}/reuters-out-seqdir/ \
    -o ${WORK_DIR}/reuters-out-seqdir-sparse-lda -ow --maxDFPercent 85 --namedVector \
  && \
  $MAHOUT rowid \
    -i ${WORK_DIR}/reuters-out-seqdir-sparse-lda/tfidf-vectors \
    -o ${WORK_DIR}/reuters-out-matrix \
  && \
  rm -rf ${WORK_DIR}/reuters-lda ${WORK_DIR}/reuters-lda-topics ${WORK_DIR}/reuters-lda-model \
  && \
  $MAHOUT cvb \
    -i ${WORK_DIR}/reuters-out-matrix/matrix \
    -o ${WORK_DIR}/reuters-lda -k 20 -ow -x 20 \
    -dict ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
    -dt ${WORK_DIR}/reuters-lda-topics \
    -mt ${WORK_DIR}/reuters-lda-model \
  && \
  $MAHOUT vectordump \
    -i ${WORK_DIR}/reuters-lda-topics/part-m-0 \
    -o ${WORK_DIR}/reuters-lda/vectordump \
    -vs 10 -p true \
    -d ${WORK_DIR}/reuters-out-seqdir-sparse-lda/dictionary.file-* \
    -dt sequencefile -sort ${WORK_DIR}/reuters-lda-topics/part-m-0 \
  && \
  cat ${WORK_DIR}/reuters-lda/vectordump

So what could be the reason for this exception?

Thanks,

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
Re: Error while running ./cluster-reuters.sh with option lda clustering
I managed to overcome the issue by using the famous Hadoop trick: formatting the namenode and restarting Hadoop. I still have no clue what went wrong the first time, but the problem was obviously with Hadoop.

$HADOOP_HOME/bin/stop-all.sh
$HADOOP_HOME/bin/hadoop namenode -format
$HADOOP_HOME/bin/start-all.sh

Regards,

On Fri, Jan 31, 2014 at 3:28 PM, Tharindu Rusira tharindurus...@gmail.com wrote:

> Hi all,
> I'm running the Mahout examples from the latest Mahout 0.9 release candidate. I got this error while running ./cluster-reuters.sh with option 3 (lda clustering).
> [...]
> So what could be the reason for this exception?

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
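A note on the exception itself: the top frame of the stack trace, FileOutputFormat.checkOutputSpecs, rejects a job whose output directory already exists on the file system the job is configured to use (typically HDFS). That is not necessarily the local disk that the script's rm -rf, or a local directory listing, operates on. The thread does not establish whether this was the cause here. The sketch below only illustrates the rule (fail if the output directory exists, clean it up before rerunning). It is an assumption-laden stand-in: it uses java.nio on the local file system instead of the Hadoop FileSystem API, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not Hadoop or Mahout code.
public class OutputDirCheck {

  /** Fails if the output directory already exists, mirroring the rule behind the exception. */
  static void checkOutputSpecs(Path outputDir) {
    if (Files.exists(outputDir)) {
      throw new IllegalStateException("Output directory " + outputDir + " already exists");
    }
  }

  /** Deletes a directory tree, children before parents; does nothing if it is absent. */
  static void deleteRecursively(Path dir) {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse lexicographic order visits nested entries before their parents.
      List<Path> deepestFirst =
          walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      for (Path p : deepestFirst) {
        Files.delete(p);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path work = Files.createTempDirectory("mahout-work-demo");
    Path out = Files.createDirectories(work.resolve("reuters-lda"));

    try {
      checkOutputSpecs(out);                 // directory exists, so this is rejected
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }

    deleteRecursively(out);                  // clean up on the same file system the check uses
    checkOutputSpecs(out);                   // passes now
    System.out.println("accepted after cleanup");

    deleteRecursively(work);
  }
}
```

The point of the sketch is that the cleanup and the existence check must look at the same file system; on a real cluster the equivalent cleanup would go through Hadoop's own file system tools rather than a local rm. Note also that formatting the namenode, as done above, erases everything stored in HDFS, so it is a much broader remedy than removing one directory.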
Re: Mahout 0.9 Release Candidate - VOTE
On Wed, Jan 15, 2014 at 6:48 PM, Chameera Wijebandara chameerawijeband...@gmail.com wrote:

> Hi Tharindu,

Chameera, sorry for the late reply. I'm having issues with my personal computer these days :)

> I still could not download the artifacts. Could you please help me to test the release?

Have you figured out a way to proceed? I think the given URL is down, as Suneel mentioned. Once the fixed release candidate is posted, you can download the source tar and check it like any other Mahout release.

Regards,

> Thanks,
> Chameera
>
> On Wed, Jan 15, 2014 at 12:21 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
>
>> Thanks Tharindu.
>>
>> On Tuesday, January 14, 2014 11:30 PM, Tharindu Rusira tharindurus...@gmail.com wrote:
>>
>>> Hi Suneel,
>>> I tested the installation process with unit tests and everything went well (Ubuntu 12.10 32-bit, Java 1.7.0_40).
>>> Please note that I did not clean my local Maven repository before the installation, so I assumed all Maven dependencies were available.
>>>
>>> On Tue, Jan 14, 2014 at 7:03 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
>>>
>>>> Here's the link to the release artifacts for Mahout 0.9:
>>>> https://repository.apache.org/content/repositories/orgapachemahout-1000/
>>>>
>>>> For those volunteering to test this, some of the stuff to look out for:
>>>> a) Verify you can unpack the release tar.
>>>
>>> Verified
>>>
>>>> b) Verify you are able to compile the distribution.
>>>
>>> Verified
>>>
>>> [INFO] Reactor Summary:
>>> [INFO]
>>> [INFO] Mahout Build Tools            SUCCESS [4.380s]
>>> [INFO] Apache Mahout                 SUCCESS [0.965s]
>>> [INFO] Mahout Math                   SUCCESS [2:07.687s]
>>> [INFO] Mahout Core                   SUCCESS [10:34.651s]
>>> [INFO] Mahout Integration            SUCCESS [1:03.250s]
>>> [INFO] Mahout Examples               SUCCESS [16.607s]
>>> [INFO] Mahout Release Package        SUCCESS [0.469s]
>>> [INFO] Mahout Math/Scala wrappers    SUCCESS [35.562s]
>>> [INFO]
>>> [INFO] BUILD SUCCESS
>>> [INFO]
>>> [INFO] Total time: 14:44.158s
>>> [INFO] Finished at: Wed Jan 15 09:06:26 IST 2014
>>> [INFO] Final Memory: 41M/252M
>>> [INFO]
>>>
>>>> c) Run through the unit tests: mvn clean test
>>>
>>> Verified.
>>>
>>>> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>>>
>>> I have yet to test the example scripts and will give an update soon.
>>>
>>> Regards,
>>>
>>>> See http://incubator.apache.org/guides/releasemanagement.html#check-list for more details.
>>>>
>>>> On Tuesday, January 14, 2014 8:26 AM, spa...@gmail.com spa...@gmail.com wrote:
>>>>
>>>>> I want to volunteer to test this release. What is the procedure/steps to get started and what pre-reqs do I need to have?
>>>>> Cheers
>>>>> .S
>>>>>
>>>>> On Tue, Jan 14, 2014 at 6:52 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
>>>>>
>>>>>> Calling for volunteers to test this Release.
>>>>>>
>>>>>> On Friday, January 10, 2014 7:39 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
>>>>>>
>>>>>>> Pushed the Mahout 0.9 Release candidate.
>>>>>>> See https://repository.apache.org/content/repositories/orgapachemahout-1000/
>>>>>>> This is a call for Vote.
>>>>>
>>>>> --
>>>>> http://spawgi.wordpress.com
>>>>> We can do it and do it better.

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
Re: Mahout 0.9 Release Candidate - VOTE
On Thu, Jan 16, 2014 at 9:31 AM, Chameera Wijebandara chameerawijeband...@gmail.com wrote:

> Tharindu,
> There was no .tar file at the given link (before it gave the 404 error).

Well, what about a .zip file? As far as I can remember, it was a .zip file.

> Thanks,
> Chameera

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu
Edit CMS in anonymous mode
Hi all,

I'm new to the Mahout CMS and I managed to access it as an anonymous user. I edited trunk » templates » standard.html, where I do not get a preview as I do for the content markup pages. So how am I supposed to view the effect of the changes I made? Basically, I'm asking whether it is possible to view my working copy as a separate instance of the Mahout website.

Thanks,

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
Re: CMS Redesign for better usability
On Mon, Jan 6, 2014 at 8:21 PM, Ted Dunning ted.dunn...@gmail.com wrote:

> Sotiris,
> It appears that the link to the Mahout twitter handle has been lost in recent changes. Would be nice to repair that.

Hi Ted, Isabel and all,

I had a look at trunk » templates » standard.html and added a new Twitter widget. I've attached a patch to this mail. Please check it and let me know whether the widget is functioning. (Unfortunately, I could not find a way to stage and verify the changes before sending this patch, as I logged in anonymously. I tried the embedded code on my blog and it's working as expected.)

If the patch works, I will remove the old code and send a cleaner patch :)

Regards,

> On Mon, Jan 6, 2014 at 6:19 AM, Isabel Drost-Fromm isa...@apache.org wrote:
>
>> On Mon, Jan 06, 2014 at 11:50:14AM +0200, Sotiris Salloumis wrote:
>>
>>> Glad for the +1s. Will do the above steps and try to upload them via JIRA by this Thursday EOB CET.
>>
>> Awesome. Thanks. If you run into any issues or questions, please post on this list - we switched only recently, so this might also be a good learning for others.
>>
>> Isabel

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
Re: Edit CMS in anonymous mode
On Fri, Jan 10, 2014 at 2:08 PM, Sotiris Salloumis i...@eprice.gr wrote:

> Hi,
> I had similar issues. I made changes last week and committed them as anonymous, but the changes were never uploaded.

Yes Sotiris, only committers are allowed to push changes to the staging or production sites. As anonymous editors, all we can do is submit patches with a diff summary. This is what I have understood so far.

> The tutorial at http://www.youtube.com/watch?v=xcDZN3Lu6HA is for a non-anonymous user.

Thanks,

> I'm currently trying the path via JIRA and svn, and will update the list with a tutorial when I've finished successfully.
>
> Regards
> Sotiris
>
> -----Original Message-----
> From: Tharindu Rusira [mailto:tharindurus...@gmail.com]
> Sent: Friday, January 10, 2014 10:05 AM
> To: dev@mahout.apache.org
> Subject: Edit CMS in anonymous mode
>
> Hi all,
> I'm new to Mahout CMS and I managed to access as an anonymous user. I edited trunk templates standard.html where I do not get a preview as in content markup pages. So how am I supposed to view the effect of changes I made? Basically I'm asking whether it is possible to view my working copy as a separate instance of Mahout website?
> Thanks,

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
[jira] [Commented] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867518#comment-13867518 ]

Tharindu Rusira commented on MAHOUT-1305:
-----------------------------------------

The link to the Wikipedia dataset [1] is broken (404 - Not Found) on the Naive Bayes Wikipedia example page [2]. Does anybody know where we could find this dataset?

[1] http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2.html
[2] http://mahout.apache.org/users/classification/wikipedia-bayes-example.html

> Rework the wiki
> ---------------
>
>          Key: MAHOUT-1305
>          URL: https://issues.apache.org/jira/browse/MAHOUT-1305
>      Project: Mahout
>   Issue Type: Bug
>   Components: Documentation
>     Reporter: Sebastian Schelter
>     Priority: Blocker
>      Fix For: 0.9
>  Attachments: MAHOUT-221213-1315-15716.pdf
>
> We should think about completely redoing our wiki. At the moment, we're listing lots of algorithms that we either never implemented or already removed. I also have the impression that a lot of stuff is outdated.
> It would be awesome if we had an up-to-date documentation of the code with instructions on how to get into using Mahout quickly.
> We should also have examples for all our 3 C's.
Re: CMS Redesign for better usability
+1 for the new design.

On Sat, Jan 4, 2014 at 5:36 AM, Ted Dunning ted.dunn...@gmail.com wrote:

> Much better.
>
> On Fri, Jan 3, 2014 at 3:05 PM, Sotiris Salloumis i...@eprice.gr wrote:
>
>> Sorry for that, please check here:
>> http://www.eprice.gr/mahout_cms.png
>>
>> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
>> Sent: Saturday, January 04, 2014 12:22 AM
>> To: Mahout Dev List; i...@eprice.gr
>> Subject: Re: CMS Redesign for better usability
>>
>>> Attachments are stripped by the mailing list software. Can you post your thumbnails somewhere (pastebin, dropbox, anywhere)?
>>>
>>> On Fri, Jan 3, 2014 at 2:11 PM, Sotiris Salloumis i...@eprice.gr wrote:
>>>
>>>> Hi all,
>>>> Going through the pages, I believe removing the yellow banner, which takes up 1/3 of the screen for no obvious reason, will increase usability.
>>>> I'm attaching a design screenshot to show the difference (the message has been moved into the white space near the search form).
>>>> Please vote and let me know if I should include it in the update I'm working on.
>>>> Regards
>>>> Sotiris

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
Re: Getting Involved
Hi,

On Thu, Jan 2, 2014 at 7:10 PM, Chameera Wijebandara chameerawijeband...@gmail.com wrote:

> Tharindu,
> Thank you for your quick response. I still cannot identify which JARs to import into my project.

Chameera, this actually depends on what you're going to do with Mahout. If you're trying to write some code using the Mahout Java API, mahout-core-0.9-SNAPSHOT.jar and mahout-math-0.9-SNAPSHOT.jar will contain most of the classes you need. (Sometimes you may need to import third-party dependencies (e.g. Guava) from mahout-examples/target/dependency.)

> I have attached the folder structure with this mail; please help me to identify the relevant JAR files.
>
> Thanks,
> Chameera
>
> On Thu, Jan 2, 2014 at 8:23 AM, Tharindu Rusira tharindurus...@gmail.com wrote:
>
>> Hi Chameera,
>>
>> On Wed, Jan 1, 2014 at 10:21 PM, Chameera Wijebandara chameerawijeband...@gmail.com wrote:
>>
>>> Hi,
>>> I have successfully built Mahout from source, but I still couldn't find the build files to import into my project (NetBeans).
>>
>> If your build process was successful, you should be able to find the Mahout JAR files in the corresponding target directories. (Is this what you are asking?) Otherwise, if you are having trouble importing Mahout into NetBeans, please refer to the NetBeans docs as Isabel suggested.
>>
>>> Please help me to make my first step with Mahout.
>>
>> Hope this helps and good luck!!!
>>
>>> Thanks,
>>> Chameera
>>>
>>> On Fri, Dec 27, 2013 at 9:13 PM, Chameera Wijebandara chameerawijeband...@gmail.com wrote:
>>>
>>>> Thanks, I'll work on it.
>>>>
>>>> On Fri, Dec 27, 2013 at 2:05 AM, Sebastian Schelter ssc.o...@googlemail.com wrote:
>>>>
>>>>> Hi Chameera,
>>>>> A very good way to get involved would be to play with some of Mahout's functionality (e.g. the recommenders) and write an easy-to-follow tutorial about that, which we can then put on our wiki.
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> On 27.12.2013 11:02, Isabel Drost-Fromm wrote:
>>>>>
>>>>>> On Fri, Dec 27, 2013 at 01:33:10PM +0530, Chameera Wijebandara wrote:
>>>>>>
>>>>>>> I am currently an undergraduate at the University of Moratuwa. I am highly interested in machine learning, big data and related algorithms. I am happy to contribute to Mahout. Can you please show me the path I should follow?
>>>>>>
>>>>>> Welcome. The best way to get started is to check out the code and get it running on some of the machine learning problems that you are interested in.
>>>>>> There's a more detailed explanation available in our docs:
>>>>>> http://mahout.apache.org/developers/how-to-contribute.html
>>>>>> For more specific tasks, also check out the following mail I wrote several months ago:
>>>>>> http://markmail.org/message/jhdjlrom2jvcjx5v
>>>>>>
>>>>>> Isabel
>>>
>>> --
>>> Chameera Wijebandara, Undergraduate
>>> Department of Computer Science and Engineering,
>>> University of Moratuwa, Sri Lanka.

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
Re: Happy Holidays!
Happy Holidays everyone!!! :)

On Wed, Dec 25, 2013 at 8:09 AM, Andrew Musselman andrew.mussel...@gmail.com wrote:

> Merry Christmas and a Happy New Year!
>
> On Dec 24, 2013, at 3:36 PM, Stevo Slavić ssla...@gmail.com wrote:
>
>> Happy Holidays Everyone!
>>
>> On Tue, Dec 24, 2013 at 12:28 PM, Frank Scholten fr...@frankscholten.nl wrote:
>>
>>> Best wishes!
>>>
>>> On Tue, Dec 24, 2013 at 11:11 AM, Sebastian Schelter s...@apache.org wrote:
>>>
>>>> ditto!
>>>>
>>>> On 24.12.2013 11:09, Isabel Drost-Fromm wrote:
>>>>
>>>>> I'd like to take some time and wish everyone a Happy Holiday! Enjoy the time with your family and friends.
>>>>> Thank you all for your contributions and work on Mahout. Looking forward to an exciting 2014.
>>>>>
>>>>> Isabel

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
[jira] [Updated] (MAHOUT-1242) No key redistribution function for associative maps
[ https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tharindu Rusira updated MAHOUT-1242:
------------------------------------

    Attachment: MAHOUT-1242.patch

Hi [~dawidweiss], I'm currently working on this issue and, for the time being, I'm attaching a simple implementation of the final step of murmurhash3 as you suggested. Your feedback is highly appreciated. Thanks.

P.S. not tested

> No key redistribution function for associative maps
> ---------------------------------------------------
>
>          Key: MAHOUT-1242
>          URL: https://issues.apache.org/jira/browse/MAHOUT-1242
>      Project: Mahout
>   Issue Type: Improvement
>   Components: collections, Math
>     Reporter: Dawid Weiss
>  Attachments: MAHOUT-1242.patch
>
> All integer-based maps currently use HashFunctions.hash(int) which just returns the key value:
> {code}
>   /**
>    * Returns a hashcode for the specified value.
>    *
>    * @return a hash code value for the specified value.
>    */
>   public static int hash(int value) {
>     return value;
>     //return value * 0x278DDE6D; // see org.apache.mahout.math.jet.random.engine.DRand
>     /*
>     value &= 0x7FFF; // make it >=0
>     int hashCode = 0;
>     do hashCode = 31*hashCode + value%10;
>     while ((value /= 10) > 0);
>     return 28629151*hashCode; // spread even further; h*31^5
>     */
>   }
> {code}
> This easily leads to very degenerate behavior on keys that have constant lower bits (long collision chains). A simple (and strong) hash function like the final step of murmurhash3 goes a long way at ensuring the keys distribution is more uniform regardless of the input distribution.

--
This message was sent by Atlassian JIRA (v6.1#6144)
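For readers unfamiliar with the "final step of murmurhash3" mentioned in the issue, here is a minimal sketch of what such a key redistribution function could look like. It is not the attached MAHOUT-1242.patch, whose contents are not shown in this thread. The mixing constants are those of the public MurmurHash3 32-bit finalizer; the class name, method names, and the power-of-two table used to count occupied slots are hypothetical and only serve to illustrate the degenerate case the issue describes (keys whose lower bits are constant).

```java
import java.util.BitSet;

// Hypothetical illustration; not the code from the attached patch.
public class IntHashMix {

  /** Final avalanche step of 32-bit MurmurHash3, applied to a single int key. */
  static int mix(int key) {
    int h = key;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /**
   * Counts the distinct slots hit by the keys 0, step, 2*step, ... in a table whose
   * size is a power of two (slot = hash & (tableSize - 1)).
   */
  static int slotsUsed(boolean mixed, int tableSize, int step, int count) {
    BitSet used = new BitSet(tableSize);
    for (int i = 0; i < count; i++) {
      int key = i * step;
      int hash = mixed ? mix(key) : key;   // unmixed = identity hash, as in hash(int) above
      used.set(hash & (tableSize - 1));
    }
    return used.cardinality();
  }

  public static void main(String[] args) {
    // Keys that are multiples of 1024 share their ten lowest bits.
    System.out.println("identity hash, slots used: " + slotsUsed(false, 1024, 1024, 1024));
    System.out.println("mixed hash, slots used:    " + slotsUsed(true, 1024, 1024, 1024));
  }
}
```

With the identity hash every one of these keys lands in the same slot, which is the long collision chain described in the issue; the mixed hash spreads them over many slots. Each step of the mixer is invertible (xor-shift, multiplication by an odd constant), so distinct keys never collapse to the same 32-bit value.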
[jira] [Comment Edited] (MAHOUT-1242) No key redistribution function for associative maps
[ https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837560#comment-13837560 ]

Tharindu Rusira edited comment on MAHOUT-1242 at 12/3/13 10:46 AM:
--------------------------------------------------------------------

Hi [~dweiss], I'm currently working on this issue and, for the time being, I'm attaching a simple implementation of the final step of murmurhash3 as you suggested. Your feedback is highly appreciated. Thanks.

P.S. not tested

was (Author: tharindu_rusira):
Hi [~dawidweiss], I'm currently working on this issue and for the time being I attach a simple implementation of the final step of murmurhash3 as you suggested. Your feedback is highly appreciated. Thanks.

P.S. not tested

> No key redistribution function for associative maps
> ---------------------------------------------------
>
>          Key: MAHOUT-1242
>          URL: https://issues.apache.org/jira/browse/MAHOUT-1242
>      Project: Mahout
>   Issue Type: Improvement
>   Components: collections, Math
>     Reporter: Dawid Weiss
>  Attachments: MAHOUT-1242.patch
[jira] [Updated] (MAHOUT-1242) No key redistribution function for associative maps
[ https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tharindu Rusira updated MAHOUT-1242:
------------------------------------

    Attachment: MAHOUT-1242.patch

Thanks [~dweiss] for the quick feedback. I'm attaching the reworked patch. By the way, can you think of a better hash mechanism that suits this context?

> No key redistribution function for associative maps
> ---------------------------------------------------
>
>          Key: MAHOUT-1242
>          URL: https://issues.apache.org/jira/browse/MAHOUT-1242
>      Project: Mahout
>   Issue Type: Improvement
>   Components: collections, Math
>     Reporter: Dawid Weiss
>  Attachments: MAHOUT-1242.patch, MAHOUT-1242.patch
[jira] [Updated] (MAHOUT-1242) No key redistribution function for associative maps
[ https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tharindu Rusira updated MAHOUT-1242:
------------------------------------

    Attachment: (was: MAHOUT-1242.patch)

> No key redistribution function for associative maps
> ---------------------------------------------------
>
>          Key: MAHOUT-1242
>          URL: https://issues.apache.org/jira/browse/MAHOUT-1242
>      Project: Mahout
>   Issue Type: Improvement
>   Components: collections, Math
>     Reporter: Dawid Weiss
>  Attachments: MAHOUT-1242.patch
[jira] [Commented] (MAHOUT-1285) Arff loader can misparse string data as double
[ https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13835177#comment-13835177 ]

Tharindu Rusira commented on MAHOUT-1285:
-
[~neilwalkinshaw], could you please show me how to re-generate this exception? Thanks

Arff loader can misparse string data as double
--
Key: MAHOUT-1285
URL: https://issues.apache.org/jira/browse/MAHOUT-1285
Project: Mahout
Issue Type: Bug
Affects Versions: 0.9
Environment: Linux Ubuntu 12.4
Reporter: Neil Walkinshaw
Fix For: Backlog
Attachments: tempArff

Have successfully loaded numerous ARFF files with Mahout (originally generated via WEKA). The files contain randomly generated data. For a specific random seed, the following exception is thrown:

java.lang.NumberFormatException: For input string: "b1shkt70694difsmmmdv0ikmoh"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
    at java.lang.Double.parseDouble(Double.java:540)
    at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.processNumeric(MapBackedARFFModel.java:146)
    at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:97)
    at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:77)
    at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:44)
    at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:251)
    at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:145)
    at libInterfaces.MahoutTraceBuilder.generateMahoutFile(MahoutTraceBuilder.java:38)
    at libInterfaces.MahoutTraceBuilder.generateMahoutReader(MahoutTraceBuilder.java:42)
    at tests.InputTester.testMahoutMeansShift(InputTester.java:111)
[jira] [Issue Comment Deleted] (MAHOUT-1285) Arff loader can misparse string data as double
[ https://issues.apache.org/jira/browse/MAHOUT-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tharindu Rusira updated MAHOUT-1285:
    Comment: was deleted (was: [~neilwalkinshaw], could you please show me how to re-generate this exception? Thanks)

Arff loader can misparse string data as double
--
Key: MAHOUT-1285
URL: https://issues.apache.org/jira/browse/MAHOUT-1285
Project: Mahout
Issue Type: Bug
Affects Versions: 0.9
Environment: Linux Ubuntu 12.4
Reporter: Neil Walkinshaw
Fix For: Backlog
Attachments: tempArff
(MAHOUT-1285) Re-generate the exception programmatically
Hi,

Pardon me if I'm missing something trivial; I'm new to Mahout. Is there a way to reproduce this exception from code (within a debugger)? I could only find this page [1], which explains how to load ARFF files from the command line.

[1] https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka's+ARFF+Format#

Thanks.
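As a starting point for the question above, the innermost frames of the reported trace can be reproduced without Mahout at all: the failure is Double.parseDouble being handed a non-numeric token. The sketch below is plain Java and does not exercise the ARFF loader; the class name ParseFailureSketch and the helper tryParse are hypothetical names chosen for illustration.

```java
// Illustrative only: reproduces the exception type from the trace,
// not the Mahout code path that leads to it.
final class ParseFailureSketch {

  /** Returns the exception message if the token is not numeric, or null if it parses. */
  static String tryParse(String token) {
    try {
      Double.parseDouble(token);
      return null;
    } catch (NumberFormatException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // The string value from the reported trace, treated as a numeric field.
    System.out.println(tryParse("b1shkt70694difsmmmdv0ikmoh"));
    // A well-formed numeric token parses without an exception.
    System.out.println(tryParse("3.14"));
  }
}
```

To step through the real code path, the trace suggests org.apache.mahout.utils.vectors.arff.Driver.main can be called from ordinary Java code (it is invoked from libInterfaces.MahoutTraceBuilder there), so setting an exception breakpoint on NumberFormatException and invoking it from a test with the attached tempArff file is one plausible approach. The exact arguments are not shown in the report, so they are left out here.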
[jira] [Commented] (MAHOUT-1305) Rework the wiki
[ https://issues.apache.org/jira/browse/MAHOUT-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832390#comment-13832390 ]

Tharindu Rusira commented on MAHOUT-1305:
-
I found that the following link on the Matrix and Vector Needs page [1] is broken (404 Not Found):

Background: http://mail-archives.apache.org/mod_mbox/lucene-mahout-dev/200802.mbox/browser

[1] https://cwiki.apache.org/confluence/display/MAHOUT/Matrix+and+Vector+Needs#

Please take this into consideration while updating the wiki.

Rework the wiki
---
Key: MAHOUT-1305
URL: https://issues.apache.org/jira/browse/MAHOUT-1305
Project: Mahout
Issue Type: Bug
Components: Website
Reporter: Sebastian Schelter
Priority: Blocker
Fix For: 0.9

We should think about completely redoing our wiki. At the moment, we're listing lots of algorithms that we either never implemented or already removed. I also have the impression that a lot of stuff is outdated. It would be awesome if we had an up-to-date documentation of the code with instructions on how to get into using mahout quickly. We should also have examples for all our 3 C's.
Wiki - Broken link
I found that the following link on the Matrix and Vector Needs page [1] is broken (404 Not Found):

Background: http://mail-archives.apache.org/mod_mbox/lucene-mahout-dev/200802.mbox/browser

I don't think an issue has been raised in JIRA for this particular problem, even though similar issues have been reported. Should I create a separate issue for it?

[1] https://cwiki.apache.org/confluence/display/MAHOUT/Matrix+and+Vector+Needs#

Thanks,
[jira] [Commented] (MAHOUT-1307) Distinguish implemented algorithms from algorithms which may be implemented in the future in algorithms page
[ https://issues.apache.org/jira/browse/MAHOUT-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830974#comment-13830974 ]

Tharindu Rusira commented on MAHOUT-1307:
-
Hi [~yamakatu], would separate lists for integrated, open, and in-development algorithms help? In my opinion, though, the existing page structure is more helpful, because an algorithm's current level of development can be found under the category to which it belongs (classification, clustering, etc.).

Distinguish implemented algorithms from algorithms which may be implemented in the future in algorithms page

Key: MAHOUT-1307
URL: https://issues.apache.org/jira/browse/MAHOUT-1307
Project: Mahout
Issue Type: Documentation
Components: Website
Affects Versions: 0.8
Reporter: yamakatu
Priority: Minor
Fix For: 0.9

On the Mahout algorithms web page (https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms), the algorithms that may be implemented in the future are easily confused with the already implemented ones, and I think it is difficult to tell the two apart intuitively. I think the two kinds of algorithms should be distinguished more clearly.
[MAHOUT-953]ArffVectorIterable does not gracefully handle duplicate attribute name
Hi,

I am currently looking at the issue [MAHOUT-953] [1], and I'm working on the 0.9-SNAPSHOT version of the code. Please let me know whether this issue still exists in the development version, because I cannot find the said ARFFVectorIterable.hasNext method in mahout/integration/src/main/java/org/apache/mahout/utils/vectors/arff/ARFFVectorIterable.java. Am I looking at the wrong class?

[1] https://issues.apache.org/jira/browse/MAHOUT-953

Regards,
--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com
Re: [MAHOUT-953]ArffVectorIterable does not gracefully handle duplicate attribute name
I made a mistake in the previous mail. It should be ARFFVectorIterable.computeNext.

On Mon, Nov 25, 2013 at 9:48 AM, Tharindu Rusira <tharindurus...@gmail.com> wrote:

> Hi,
>
> I am currently looking at the issue [MAHOUT-953] [1] and I'm working on the 0.9-SNAPSHOT version of the code. Please let me know if this issue still exists in the development version because I cannot find the said ARFFVectorIterable.hasNext method in mahout/integration/src/main/java/org/apache/mahout/utils/vectors/arff/ARFFVectorIterable.java. Am I looking at the wrong class ?
>
> [1] https://issues.apache.org/jira/browse/MAHOUT-953
>
> Regards,

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa, Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com