[jira] [Updated] (MAHOUT-1270) Broken link on Developer Resources page
[ https://issues.apache.org/jira/browse/MAHOUT-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1270:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Broken link on Developer Resources page
    Key: MAHOUT-1270
    URL: https://issues.apache.org/jira/browse/MAHOUT-1270
    Project: Mahout
    Issue Type: Bug
    Components: Website
    Affects Versions: 0.7
    Reporter: Erhan Bagdemir
    Assignee: Robin Anil
    Priority: Minor
    Fix For: 0.8

The "How to contribute" link on the page https://cwiki.apache.org/confluence/display/MAHOUT/Developer+Resources is broken :-|
https://cwiki.apache.org/MAHOUT/how-to-contribute.html returns 404.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAHOUT-1202) Speed up Vector operations
[ https://issues.apache.org/jira/browse/MAHOUT-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1202:
    Fix Version/s: 0.8

Speed up Vector operations
    Key: MAHOUT-1202
    URL: https://issues.apache.org/jira/browse/MAHOUT-1202
    Project: Mahout
    Issue Type: Improvement
    Components: Math
    Affects Versions: 0.8
    Reporter: Dan Filimon
    Fix For: 0.8

Vector assign() and aggregate() can be significantly improved under some conditions by taking into account the different properties of the vectors we're working with.
This issue relates to the design document at https://docs.google.com/document/d/1g1PjUuvjyh2LBdq2_rKLIcUiDbeOORA1sCJiSsz-JVU/edit#heading=h.koi571fvwha3jj and the patch at https://reviews.apache.org/r/10669
The benchmarks are at https://docs.google.com/spreadsheet/ccc?key=0AochdzPoBmWodG9RTms1UG40YlNQd3ByUFpQY0FLWmc&pli=1#gid=10 and while there are a few regressions (regarding RandomAccessSparseVectors, which will be fixed later), the patch improves a lot of benchmarks and cleans up the code significantly.
Part 1, the new function interfaces, is merged. [Committed revision 1478853.]
[jira] [Updated] (MAHOUT-1216) Add locality sensitive hashing and a LocalitySensitiveHash searcher
[ https://issues.apache.org/jira/browse/MAHOUT-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1216:
    Fix Version/s: 0.8

Add locality sensitive hashing and a LocalitySensitiveHash searcher
    Key: MAHOUT-1216
    URL: https://issues.apache.org/jira/browse/MAHOUT-1216
    Project: Mahout
    Issue Type: New Feature
    Components: Math
    Affects Versions: 0.8
    Reporter: Dan Filimon
    Fix For: 0.8

This issue tackles the LocalitySensitiveHashSearch, which was initially supposed to be part of MAHOUT-1156.
It adds HashedVector, the class that adds the LSH to vectors, and a new searcher (although a better implementation is possible), and adds support in the existing tests and the new StreamingKMeans infrastructure.
[jira] [Updated] (MAHOUT-1205) ParallelALSFactorizationJob should leverage the distributed cache
[ https://issues.apache.org/jira/browse/MAHOUT-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1205:
    Fix Version/s: 0.8

ParallelALSFactorizationJob should leverage the distributed cache
    Key: MAHOUT-1205
    URL: https://issues.apache.org/jira/browse/MAHOUT-1205
    Project: Mahout
    Issue Type: Improvement
    Components: Collaborative Filtering
    Affects Versions: 0.8
    Reporter: Sebastian Schelter
    Assignee: Sebastian Schelter
    Fix For: 0.8

ParallelALSFactorizationJob should use the DistributedCache to broadcast the feature matrices only once per re-computation.
[jira] [Updated] (MAHOUT-1198) Allow Latex in javadox
[ https://issues.apache.org/jira/browse/MAHOUT-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1198:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Allow Latex in javadox
    Key: MAHOUT-1198
    URL: https://issues.apache.org/jira/browse/MAHOUT-1198
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.7
    Reporter: Ted Dunning
    Fix For: 0.8

We are headed into a release (hopefully) and now would be a nice time to add the capability to generate javadocs with embedded latex.
Following a hint from commons math, I tested a way to inject mathjax into the header of the resulting web-site and got good results (see http://tdunning.github.io/bandit-ranking/ especially docs for GammaNormalDistribution and BetaBinomialDistribution).
The basic idea is that we need to add the following config to the javadocs plugin:
{quote}
<configuration>
  <additionalparam>-header &apos;&lt;script type=&quot;text/javascript&quot; src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML&quot;&gt; &lt;/script&gt;&apos;</additionalparam>
</configuration>
{quote}
Having done this, \[ \] and \( \) can be used to embed latex equations in the javadocs.
[jira] [Updated] (MAHOUT-1182) remove useless append
[ https://issues.apache.org/jira/browse/MAHOUT-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1182:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

remove useless append
    Key: MAHOUT-1182
    URL: https://issues.apache.org/jira/browse/MAHOUT-1182
    Project: Mahout
    Issue Type: Improvement
    Components: Integration
    Affects Versions: 0.7
    Reporter: Dave Brosius
    Priority: Trivial
    Fix For: 0.8
    Attachments: uselessappend.txt

.append() removed
[jira] [Updated] (MAHOUT-1119) code bug in org.apache.mahout.text.SequenceFilesFromDirectory
[ https://issues.apache.org/jira/browse/MAHOUT-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1119:
    Fix Version/s: 0.8

code bug in org.apache.mahout.text.SequenceFilesFromDirectory
    Key: MAHOUT-1119
    URL: https://issues.apache.org/jira/browse/MAHOUT-1119
    Project: Mahout
    Issue Type: Bug
    Components: Integration
    Affects Versions: 0.7
    Environment: Linux, JDK 1.6
    Reporter: 徐家
    Assignee: Sebastian Schelter
    Labels: SequenceFilesFromDirectory
    Fix For: 0.8
    Original Estimate: 1h
    Remaining Estimate: 1h

In org.apache.mahout.text.SequenceFilesFromDirectory, from line 89 to 96, the code is
    pathFilterClass.getConstructor(Configuration.class, String.class, Map.class, ChunkedWriter.class, Charset.class, FileSystem.class);
    pathFilter = constructor.newInstance(conf, keyPrefix, options, writer, charset,fs);
Obviously, the call to constructor.newInstance lacks the charset parameter. If I implement a subclass of SequenceFilesFromDirectoryFilter, there will be a runtime error.
[jira] [Updated] (MAHOUT-1148) QR Decomposition is too slow
[ https://issues.apache.org/jira/browse/MAHOUT-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1148:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

QR Decomposition is too slow
    Key: MAHOUT-1148
    URL: https://issues.apache.org/jira/browse/MAHOUT-1148
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.7
    Reporter: Ted Dunning
    Fix For: 0.8

A user reported that QR decomposition is too slow. I coded up a replacement that can be 10x faster in certain cases, and the new version is also tested.
[jira] [Updated] (MAHOUT-1127) OnlineLogisticRegression test is flaky (and wrong)
[ https://issues.apache.org/jira/browse/MAHOUT-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1127:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

OnlineLogisticRegression test is flaky (and wrong)
    Key: MAHOUT-1127
    URL: https://issues.apache.org/jira/browse/MAHOUT-1127
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.7
    Reporter: Ted Dunning
    Fix For: 0.8
[jira] [Updated] (MAHOUT-1174) Lanczos code and javadocs should refer users to the SSVD stuff
[ https://issues.apache.org/jira/browse/MAHOUT-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1174:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Lanczos code and javadocs should refer users to the SSVD stuff
    Key: MAHOUT-1174
    URL: https://issues.apache.org/jira/browse/MAHOUT-1174
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.7
    Reporter: Ted Dunning
    Assignee: Ted Dunning
    Fix For: 0.8
[jira] [Updated] (MAHOUT-1166) Multithreaded version of distributed ALS
[ https://issues.apache.org/jira/browse/MAHOUT-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1166:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Multithreaded version of distributed ALS
    Key: MAHOUT-1166
    URL: https://issues.apache.org/jira/browse/MAHOUT-1166
    Project: Mahout
    Issue Type: Improvement
    Components: Collaborative Filtering
    Affects Versions: 0.7
    Reporter: Sebastian Schelter
    Assignee: Sebastian Schelter
    Fix For: 0.8
    Attachments: MAHOUT-1166.patch

Our implementation of ALS broadcasts the feature matrices in each iteration. Therefore, it makes sense to run the mappers in multithreaded mode, so that we do not have to load one copy of the feature matrix per core but can share a single read-only in-memory copy.
[jira] [Updated] (MAHOUT-1173) Reactivate checkstyle
[ https://issues.apache.org/jira/browse/MAHOUT-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1173:
    Fix Version/s: 0.8

Reactivate checkstyle
    Key: MAHOUT-1173
    URL: https://issues.apache.org/jira/browse/MAHOUT-1173
    Project: Mahout
    Issue Type: Improvement
    Affects Versions: 0.8
    Reporter: Sebastian Schelter
    Assignee: Sebastian Schelter
    Fix For: 0.8
    Attachments: mahout-checkstyle.xml

I would like to reactivate checkstyle in our build. IMHO we should not make it fail on checkstyle errors at the moment (anyone disagree on this?).
[jira] [Updated] (MAHOUT-1143) DecisionForest classifier should output label string instead of code
[ https://issues.apache.org/jira/browse/MAHOUT-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1143:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

DecisionForest classifier should output label string instead of code
    Key: MAHOUT-1143
    URL: https://issues.apache.org/jira/browse/MAHOUT-1143
    Project: Mahout
    Issue Type: Improvement
    Components: Classification
    Affects Versions: 0.7
    Reporter: Deneche A. Hakim
    Assignee: Deneche A. Hakim
    Priority: Critical
    Fix For: 0.8
    Attachments: MAHOUT-1143.patch

When calling TestForest with a classification problem, the output labels are numerical values corresponding to the label's internal code. TestForest should output the string label instead of the code.
[jira] [Updated] (MAHOUT-1157) AbstractCluster.formatVector iteration bug.
[ https://issues.apache.org/jira/browse/MAHOUT-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1157:
    Fix Version/s: 0.8

AbstractCluster.formatVector iteration bug.
    Key: MAHOUT-1157
    URL: https://issues.apache.org/jira/browse/MAHOUT-1157
    Project: Mahout
    Issue Type: Bug
    Components: Clustering
    Affects Versions: 0.7
    Reporter: Adam Bozanich
    Fix For: 0.8
    Attachments: mahout.patch

AbstractCluster.formatVector's use of the size field of the given vector causes problems when the vector is sparse.
I clustered a handful of vectors which had been initialized with a cardinality of Integer.MAX_VALUE. Running seqdump on the resulting clusteredPoints took over four minutes. This is because formatVector() was iterating over the entire integer space for every vector.
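To illustrate the problem described in MAHOUT-1157, here is a minimal sketch in plain Java. It does not use Mahout's Vector API; the map-based "sparse vector" and the name formatNonZero are assumptions made only for this example. The point is that formatting should visit the stored entries, not every index up to the declared cardinality.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sparse vector: only non-zero entries are stored.
class SparseFormatSketch {

    // Formats only the stored entries, so the cost depends on the number of
    // entries rather than on the declared cardinality of the vector.
    static String formatNonZero(Map<Integer, Double> entries) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
            first = false;
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Conceptually the cardinality is Integer.MAX_VALUE, but only two
        // entries exist, so the loop above runs twice.
        Map<Integer, Double> entries = new TreeMap<>();
        entries.put(3, 1.5);
        entries.put(2_000_000_000, 2.0);
        System.out.println(formatNonZero(entries));
    }
}
```

A loop of the form "for i from 0 to cardinality" would take about two billion steps for the same two entries, which matches the slowdown reported above.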
[jira] [Updated] (MAHOUT-1151) Object reuse in distributed ALS
[ https://issues.apache.org/jira/browse/MAHOUT-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1151:
    Fix Version/s: 0.8

Object reuse in distributed ALS
    Key: MAHOUT-1151
    URL: https://issues.apache.org/jira/browse/MAHOUT-1151
    Project: Mahout
    Issue Type: Improvement
    Components: Collaborative Filtering
    Affects Versions: 0.8
    Reporter: Sebastian Schelter
    Assignee: Sebastian Schelter
    Fix For: 0.8
    Attachments: MAHOUT-1151-2.patch, MAHOUT-1151.patch

In order to improve the performance of our distributed ALS code, we should try to avoid object instantiation as much as possible, especially when it is done per input tuple.
[jira] [Updated] (MAHOUT-1019) VectorDistanceSimilarityJob
[ https://issues.apache.org/jira/browse/MAHOUT-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1019:
    Fix Version/s: 0.8

VectorDistanceSimilarityJob
    Key: MAHOUT-1019
    URL: https://issues.apache.org/jira/browse/MAHOUT-1019
    Project: Mahout
    Issue Type: Improvement
    Components: Math
    Affects Versions: 0.8
    Environment: all
    Reporter: Timothy Potter
    Priority: Minor
    Labels: VectorDistanceSimilarityJob, distance, vector
    Fix For: 0.8
    Attachments: MAHOUT-1019.patch
    Original Estimate: 12h
    Remaining Estimate: 12h

The VectorDistanceSimilarityJob is a fantastic tool, but poses the risk of creating terabytes of output of dubious value. For example, I have ~10K seed vectors and millions of vectors to compute the similarity between, so I would like to add an optional parameter to this job to specify a maximum distance threshold that prevents any distances above this value from being written to the output.
The default would be 1.0d so no filtering is applied, which ensures backwards compatibility, but if supplied, only rows where the distance is less than the threshold would be output from the mapper. This can help reduce the storage requirements of the output immensely.
Probably name the parameter something like: noOutputIfDistanceGreaterThan
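The thresholding idea in MAHOUT-1019 can be sketched in a few lines of plain Java. This is not the job's mapper code; the class name, the method filterDistances, and the use of Euclidean distance are assumptions chosen only to show the rule "emit a distance only if it is below the threshold".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a distance threshold; not the actual Mahout mapper.
class DistanceThresholdSketch {

    // Plain Euclidean distance between two vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the distances from the seed to each vector, keeping only those
    // strictly below maxDistance. Everything else is never "written".
    static List<Double> filterDistances(double[] seed, List<double[]> vectors, double maxDistance) {
        List<Double> kept = new ArrayList<>();
        for (double[] v : vectors) {
            double distance = euclidean(seed, v);
            if (distance < maxDistance) {
                kept.add(distance);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] seed = {0.0, 0.0};
        List<double[]> vectors = new ArrayList<>();
        vectors.add(new double[] {3.0, 4.0}); // distance 5.0, dropped
        vectors.add(new double[] {0.0, 1.0}); // distance 1.0, kept
        vectors.add(new double[] {0.0, 0.5}); // distance 0.5, kept
        System.out.println(filterDistances(seed, vectors, 2.0));
    }
}
```

In this sketch, passing Double.POSITIVE_INFINITY as maxDistance disables filtering; the issue itself proposes a default of 1.0d for that purpose.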
[jira] [Updated] (MAHOUT-1082) driver seqdirectory fails with param -filter set
[ https://issues.apache.org/jira/browse/MAHOUT-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1082:
    Fix Version/s: 0.8

driver seqdirectory fails with param -filter set
    Key: MAHOUT-1082
    URL: https://issues.apache.org/jira/browse/MAHOUT-1082
    Project: Mahout
    Issue Type: Bug
    Components: Integration
    Affects Versions: 0.7
    Reporter: Johannes Rauber
    Priority: Minor
    Fix For: 0.8

The following error is thrown when an own implementation of PrefixAdditionFilter is specified with parameter -filter for seqdirectory.
Exception in thread "main" java.lang.IllegalArgumentException: wrong number of arguments
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:96)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:53)
In class org.apache.mahout.text.SequenceFilesFromDirectory line 96 the following additional parameter should be inserted into the reflection call of the ctor: charset
Raises Error:
    pathFilter = constructor.newInstance(conf, keyPrefix, options, writer, fs);
Fix:
    pathFilter = constructor.newInstance(conf, keyPrefix, options, writer, charset, fs);
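The failure mode reported in MAHOUT-1082 can be reproduced outside Mahout with a small, self-contained sketch. The Filter class and the tryCreate helper below are hypothetical; only the java.lang.reflect calls are real. The constructor is looked up with three parameter types, so passing two arguments to newInstance is rejected at runtime rather than at compile time.

```java
import java.lang.reflect.Constructor;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the Filter class stands in for a user-supplied path filter.
class ReflectiveCtorSketch {

    static class Filter {
        final String prefix;
        final Charset charset;
        final Integer depth;

        public Filter(String prefix, Charset charset, Integer depth) {
            this.prefix = prefix;
            this.charset = charset;
            this.depth = depth;
        }
    }

    // Looks up the three-argument constructor and invokes it with whatever
    // arguments the caller supplies. Returns "ok" or the exception's simple name.
    static String tryCreate(Object... args) {
        try {
            Constructor<Filter> ctor =
                Filter.class.getConstructor(String.class, Charset.class, Integer.class);
            ctor.newInstance(args);
            return "ok";
        } catch (IllegalArgumentException e) {
            // Thrown when the number or types of the arguments do not match.
            return "IllegalArgumentException";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Charset argument omitted, as in the reported call.
        System.out.println(tryCreate("prefix", 1));
        // All three arguments supplied, as in the proposed fix.
        System.out.println(tryCreate("prefix", StandardCharsets.UTF_8, 1));
    }
}
```

Because the compiler cannot check a reflective call, a unit test that instantiates the filter through the same reflective path is one way to catch this kind of mismatch early.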
[jira] [Updated] (MAHOUT-1075) ClusterDumper output file should be optional
[ https://issues.apache.org/jira/browse/MAHOUT-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1075:
    Fix Version/s: 0.8

ClusterDumper output file should be optional
    Key: MAHOUT-1075
    URL: https://issues.apache.org/jira/browse/MAHOUT-1075
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.8
    Reporter: Dave Byrne
    Fix For: 0.8
    Attachments: clusterdumper_out.patch

The ClusterDumper output option should be optional and default to System.out if -o is not specified.
[jira] [Updated] (MAHOUT-1093) CrossFoldLearner trains in all folds if tracking key is negative
[ https://issues.apache.org/jira/browse/MAHOUT-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1093:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

CrossFoldLearner trains in all folds if tracking key is negative
    Key: MAHOUT-1093
    URL: https://issues.apache.org/jira/browse/MAHOUT-1093
    Project: Mahout
    Issue Type: Bug
    Components: Classification
    Affects Versions: 0.7
    Reporter: Eric Springer
    Assignee: Sebastian Schelter
    Fix For: 0.8

See: https://github.com/apache/mahout/pull/7
[jira] [Updated] (MAHOUT-1171) PMD regression
[ https://issues.apache.org/jira/browse/MAHOUT-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1171:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

PMD regression
    Key: MAHOUT-1171
    URL: https://issues.apache.org/jira/browse/MAHOUT-1171
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.7
    Reporter: Ted Dunning
    Assignee: Ted Dunning
    Fix For: 0.8
[jira] [Updated] (MAHOUT-1167) Parallel item similarity precomputation on a single machine
[ https://issues.apache.org/jira/browse/MAHOUT-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1167:
    Fix Version/s: 0.8

Parallel item similarity precomputation on a single machine
    Key: MAHOUT-1167
    URL: https://issues.apache.org/jira/browse/MAHOUT-1167
    Project: Mahout
    Issue Type: New Feature
    Components: Collaborative Filtering
    Affects Versions: 0.8
    Reporter: Sebastian Schelter
    Assignee: Sebastian Schelter
    Fix For: 0.8
    Attachments: MAHOUT-1167.patch

We need some code for item-based CF use cases with an intermediate data size (e.g., a few million interactions). In such cases, the data might be too big to allow online computation of similarities and recommendations, but at the same time, going to Hadoop might still not be necessary or desired.
In such a case, it makes sense to precompute item similarities on a single machine.
[jira] [Updated] (MAHOUT-1150) ARFF Integration does not support quoted identifiers
[ https://issues.apache.org/jira/browse/MAHOUT-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1150:
    Fix Version/s: 0.8

ARFF Integration does not support quoted identifiers
    Key: MAHOUT-1150
    URL: https://issues.apache.org/jira/browse/MAHOUT-1150
    Project: Mahout
    Issue Type: Bug
    Components: Integration
    Affects Versions: 0.7
    Environment: All
    Reporter: Marty Kube
    Fix For: 0.8
    Attachments: MAHOUT-1150.patch

I ran the NSL-KDD data set (http://nsl.cs.unb.ca/NSL-KDD/) through the ARFF integration. The process failed to parse the arff formatted file. The file has quoted identifiers:
    @relation 'KDDTrain-20Percent'
    @attribute 'duration' real
    @attribute 'protocol_type' {'tcp','udp', 'icmp'}
The quotes caused the problem. The official arff BNF shows that quotes should be supported:
https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/2008-January/012153.html
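As a rough illustration of what "supporting quoted identifiers" means for the header lines quoted in MAHOUT-1150, the following sketch extracts an attribute name whether or not it is quoted. It is not Mahout's ARFF parser and does not implement the full ARFF grammar (no escape sequences, no case-insensitive keyword); the class and method names are assumptions for this example, and the input is assumed to start with "@attribute".

```java
// Hypothetical sketch; handles only the simple quoting seen in the report.
class ArffNameSketch {

    // Extracts the attribute name from a line such as:
    //   @attribute 'duration' real
    //   @attribute duration real
    static String attributeName(String line) {
        String rest = line.trim().substring("@attribute".length()).trim();
        if (!rest.isEmpty() && (rest.charAt(0) == '\'' || rest.charAt(0) == '"')) {
            // Quoted name: take everything up to the matching closing quote.
            int end = rest.indexOf(rest.charAt(0), 1);
            if (end > 0) {
                return rest.substring(1, end);
            }
        }
        // Unquoted name: take everything up to the first space.
        int space = rest.indexOf(' ');
        return space < 0 ? rest : rest.substring(0, space);
    }

    public static void main(String[] args) {
        System.out.println(attributeName("@attribute 'duration' real"));
        System.out.println(attributeName("@attribute 'protocol_type' {'tcp','udp', 'icmp'}"));
        System.out.println(attributeName("@attribute protocol_type {'tcp','udp', 'icmp'}"));
    }
}
```

With this approach the quoted and unquoted spellings of protocol_type yield the same name, which is the behaviour the report asks for.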
[jira] [Updated] (MAHOUT-1114) Some delegating vectors have subtle clone bug
[ https://issues.apache.org/jira/browse/MAHOUT-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1114:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Some delegating vectors have subtle clone bug
    Key: MAHOUT-1114
    URL: https://issues.apache.org/jira/browse/MAHOUT-1114
    Project: Mahout
    Issue Type: Improvement
    Affects Versions: 0.7
    Reporter: Ted Dunning
    Fix For: 0.8

Cloning a Centroid returns a WeightedVector.
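The report does not say how the clone bug arises, so the following is only one plausible way to get this symptom, sketched with hypothetical classes that are unrelated to Mahout's real Centroid and WeightedVector. If a base class builds its clone with its own constructor and a subclass does not override clone(), the subclass's clone silently has the base type.

```java
// Hypothetical sketch of a clone() that loses the subclass type.
class CloneSketch {

    static class Weighted {
        final double weight;

        Weighted(double weight) {
            this.weight = weight;
        }

        // Copy-constructor style clone: always builds a Weighted.
        @Override
        public Weighted clone() {
            return new Weighted(weight);
        }
    }

    // Inherits clone() unchanged, so its clone is a plain Weighted.
    static class InheritingCentroid extends Weighted {
        InheritingCentroid(double weight) {
            super(weight);
        }
    }

    // Overrides clone() with a covariant return type, preserving the subclass.
    static class FixedCentroid extends Weighted {
        FixedCentroid(double weight) {
            super(weight);
        }

        @Override
        public FixedCentroid clone() {
            return new FixedCentroid(weight);
        }
    }

    public static void main(String[] args) {
        System.out.println(new InheritingCentroid(1.0).clone().getClass().getSimpleName());
        System.out.println(new FixedCentroid(1.0).clone().getClass().getSimpleName());
    }
}
```

A test asserting that clone().getClass() equals the original's getClass() would flag the first case.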
[jira] [Updated] (MAHOUT-1031) Drop empty vectors in encoding pipeline
[ https://issues.apache.org/jira/browse/MAHOUT-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1031:
    Fix Version/s: 0.8

Drop empty vectors in encoding pipeline
    Key: MAHOUT-1031
    URL: https://issues.apache.org/jira/browse/MAHOUT-1031
    Project: Mahout
    Issue Type: Bug
    Affects Versions: 0.7
    Reporter: Robin Anil
    Assignee: Robin Anil
    Fix For: 0.8
    Attachments: MAHOUT-1031.patch, MAHOUT-1031.patch
[jira] [Updated] (MAHOUT-1113) Need test case to demonstrate simple use of SGD
[ https://issues.apache.org/jira/browse/MAHOUT-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1113:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Need test case to demonstrate simple use of SGD
    Key: MAHOUT-1113
    URL: https://issues.apache.org/jira/browse/MAHOUT-1113
    Project: Mahout
    Issue Type: Improvement
    Affects Versions: 0.7
    Reporter: Ted Dunning
    Priority: Minor
    Fix For: 0.8

Need a test case that shows how to use SGD on a well-known data set like the Iris data.
[jira] [Updated] (MAHOUT-707) Setup Jenkins Jobs to validate our Examples/bin Scripts
[ https://issues.apache.org/jira/browse/MAHOUT-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-707:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Setup Jenkins Jobs to validate our Examples/bin Scripts
    Key: MAHOUT-707
    URL: https://issues.apache.org/jira/browse/MAHOUT-707
    Project: Mahout
    Issue Type: Task
    Affects Versions: 0.7
    Reporter: Grant Ingersoll
    Fix For: 0.8

We should set up Jenkins to run our example scripts on a regular basis (see MAHOUT-694) and check for breakage.
[jira] [Updated] (MAHOUT-1061) mapreduce split causes ClassNotFound exception
[ https://issues.apache.org/jira/browse/MAHOUT-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1061:
    Fix Version/s: 0.8

mapreduce split causes ClassNotFound exception
    Key: MAHOUT-1061
    URL: https://issues.apache.org/jira/browse/MAHOUT-1061
    Project: Mahout
    Issue Type: Bug
    Components: Integration
    Affects Versions: 0.7
    Reporter: David Engel
    Assignee: Sebastian Schelter
    Labels: patch
    Fix For: 0.8

Running the split program in mapreduce mode, e.g. mahout split -xm mapreduce ... results in a ClassNotFound exception because the job jar is not set. The following patch fixes the problem for me.

diff -ur mahout-distribution-0.7.orig/integration/src/main/java/org/apache/mahout/utils/SplitInputJob.java mahout-distribution-0.7/integration/src/main/java/org/apache/mahout/utils/SplitInputJob.java
--- mahout-distribution-0.7.orig/integration/src/main/java/org/apache/mahout/utils/SplitInputJob.java 2012-06-12 03:30:39.0 -0500
+++ mahout-distribution-0.7/integration/src/main/java/org/apache/mahout/utils/SplitInputJob.java 2012-08-20 17:28:18.0 -0500
@@ -114,6 +114,6 @@
 // Setup job with new API
 Job job = new Job(oldApiJob);
+job.setJarByClass(SplitInputJob.class);
 FileInputFormat.addInputPath(job, inputPath);
 FileOutputFormat.setOutputPath(job, outputPath);
 job.setNumReduceTasks(1);
[jira] [Updated] (MAHOUT-917) Build takes too long
[ https://issues.apache.org/jira/browse/MAHOUT-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-917:
    Affects Version/s: 0.6
                       0.7
    Fix Version/s: 0.8

Build takes too long
    Key: MAHOUT-917
    URL: https://issues.apache.org/jira/browse/MAHOUT-917
    Project: Mahout
    Issue Type: Improvement
    Components: build
    Affects Versions: 0.6, 0.7
    Reporter: Frank Scholten
    Assignee: Sebastian Schelter
    Fix For: 0.8

On my machine a full mvn clean install takes 55 minutes. As an experiment I put all MapReduce job tests for all clustering algorithms on ignore. This reduces the build to 45 minutes. There are a lot of these long-running tests in the project.
What about creating a separate maven profile for the nightly build that runs all MapReduce job tests? For this we have to move these MapReduce tests to separate classes with a naming convention such as *JobTest or *IntegrationTest and add some maven configuration.
[jira] [Updated] (MAHOUT-1104) Improve Javadoc for AbstractVectorClassifier
[ https://issues.apache.org/jira/browse/MAHOUT-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1104:
    Affects Version/s: 0.7
    Fix Version/s: 0.8

Improve Javadoc for AbstractVectorClassifier
    Key: MAHOUT-1104
    URL: https://issues.apache.org/jira/browse/MAHOUT-1104
    Project: Mahout
    Issue Type: Improvement
    Components: Classification
    Affects Versions: 0.7
    Reporter: Timothy Mann
    Priority: Minor
    Labels: classification, documentation, patch
    Fix For: 0.8
    Attachments: classifier_jdoc.patch
    Original Estimate: 1h
    Remaining Estimate: 1h

Modify javadocs for AbstractVectorClassifier to clarify what classify and classifyFull methods do. Override javadoc for classify and classifyScalar methods in AbstractNaiveBayesClassifier to reflect the fact that they are not supported.
[jira] [Updated] (MAHOUT-1141) Driver for cvb0_local does not warn about missing maxIterations command line parameter
[ https://issues.apache.org/jira/browse/MAHOUT-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1141:

Fix Version/s: 0.8

Driver for cvb0_local does not warn about missing maxIterations command line parameter

Key: MAHOUT-1141
URL: https://issues.apache.org/jira/browse/MAHOUT-1141
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.7, 0.8
Environment: MacOS 10.8, Java 7
Reporter: Samar Lotia
Priority: Minor
Fix For: 0.8

The driver for cvb0_local does not seem to verify whether the caller has specified the required maxIterations command line parameter. This results in an exception much further down, which pretty much requires reading the source code to discover the cause of the error.

Exception in thread "main" java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
    at org.apache.mahout.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0.main2(InMemoryCollapsedVariationalBayes0.java:374)
    at org.apache.mahout.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0.run(InMemoryCollapsedVariationalBayes0.java:521)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0.main(InMemoryCollapsedVariationalBayes0.java:525)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
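One way to surface a problem like the one reported in MAHOUT-1141 early is to validate required options before they are used. The following is only a minimal sketch and not the actual cvb0_local driver code: the class RequiredOptionCheck, the helper requireInt and the plain option map are hypothetical names introduced for illustration. Only the option name maxIterations comes from the report.

```java
import java.util.HashMap;
import java.util.Map;

public class RequiredOptionCheck {

    // Hypothetical helper: fetch a required integer option or fail with a clear message.
    public static int requireInt(Map<String, Object> options, String name) {
        Object value = options.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing required option --" + name);
        }
        try {
            // String.valueOf avoids a blind (String) cast, which fails when the stored value is an Integer.
            return Integer.parseInt(String.valueOf(value).trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Option --" + name + " must be an integer, got: " + value, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> options = new HashMap<>();
        options.put("numTopics", "20"); // maxIterations deliberately left out
        try {
            requireInt(options, "maxIterations");
        } catch (IllegalArgumentException e) {
            // The caller sees which option is missing instead of a ClassCastException.
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at option-parsing time keeps the error message next to its cause, which is the behaviour the reporter asks for.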
[jira] [Updated] (MAHOUT-760) org.apache.mahout.fpm.pfpgrowth.PFPGrowthTest test fails during install
[ https://issues.apache.org/jira/browse/MAHOUT-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-760:

Fix Version/s: 0.7

org.apache.mahout.fpm.pfpgrowth.PFPGrowthTest test fails during install

Key: MAHOUT-760
URL: https://issues.apache.org/jira/browse/MAHOUT-760
Project: Mahout
Issue Type: Bug
Components: Frequent Itemset/Association Rule Mining
Affects Versions: 0.5
Environment:
    # uname -a
    Linux hostname 2.6.27.54-0.2-default #1 SMP 2010-10-19 18:40:07 +0200 x86_64 x86_64 x86_64 GNU/Linux
    # java -version
    java version "1.6.0"
    Java(TM) SE Runtime Environment (build pxa6460sr9ifix-20110211_02(SR9+IZ94423))
    IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr9-20101124_69295 (JIT enabled, AOT enabled)
    J9VM - 20101124_069295
    JIT - r9_20101028_17488ifx2
    GC - 20101027_AA)
    JCL - 20110211_02
Reporter: Chintamani
Assignee: Sean Owen
Priority: Minor
Labels: hadoop, ibm-jdk
Fix For: 0.7

mvn install core fails because of a single failed test, org.apache.mahout.fpm.pfpgrowth.PFPGrowthTest, with the following error (extracted from target/surefire-reports/org.apache.mahout.fpm.pfpgrowth.PFPGrowthTest.txt):

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.615 sec <<< FAILURE!
testStartParallelFPGrowth(org.apache.mahout.fpm.pfpgrowth.PFPGrowthTest) Time elapsed: 6.587 sec <<< FAILURE!
org.junit.ComparisonFailure: expected:<{[D=0, E=1, A=0, B=0, C]=1}> but was:<{[A=0, B=0, C=1, D=0, E]=1}>
    at org.junit.Assert.assertEquals(Assert.java:123)
    at org.junit.Assert.assertEquals(Assert.java:145)
    at org.apache.mahout.fpm.pfpgrowth.PFPGrowthTest.testStartParallelFPGrowth(PFPGrowthTest.java:95)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:119)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
    at $Proxy0.invoke(Unknown Source)
    at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
    at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)

Every other test in all the components succeeds.
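The expected and actual values in the MAHOUT-760 failure contain the same entries in a different order, which would be consistent with a test that compares the string form of a map whose iteration order depends on the JVM. That diagnosis is an assumption, not something the report confirms. The sketch below (MapOrderComparison and its helper are hypothetical names, unrelated to the real PFPGrowthTest) shows why comparing maps directly is more robust than comparing their toString() output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapOrderComparison {

    // Build a map whose iteration order follows the given key order.
    // Values mirror the report: C and E map to 1, everything else to 0.
    public static Map<String, Integer> ordered(String... keys) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String key : keys) {
            map.put(key, key.equals("C") || key.equals("E") ? 1 : 0);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> expected = ordered("D", "E", "A", "B", "C");
        Map<String, Integer> actual = ordered("A", "B", "C", "D", "E");

        // String comparison depends on iteration order and fails here.
        System.out.println("toString equal: " + expected.toString().equals(actual.toString()));

        // Map.equals compares entries only, regardless of order.
        System.out.println("maps equal:     " + expected.equals(actual));
    }
}
```

LinkedHashMap is used here only to make the two orders deterministic for the demonstration.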
[jira] [Updated] (MAHOUT-1118) SLF4J Log4j bindings are messed up causing examples to fail
[ https://issues.apache.org/jira/browse/MAHOUT-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1118:

Affects Version/s: 0.7
Fix Version/s: 0.8

SLF4J Log4j bindings are messed up causing examples to fail

Key: MAHOUT-1118
URL: https://issues.apache.org/jira/browse/MAHOUT-1118
Project: Mahout
Issue Type: Bug
Affects Versions: 0.7
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Fix For: 0.8
Attachments: MAHOUT-1118.patch

We are routinely seeing the following failures when running the examples on Jenkins and they are due to old SLF4J bindings on Cassandra and HBase:

{code}
Training on /tmp/mahout-work-jenkins/20news-bydate/20news-bydate-train/
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/x1/jenkins/jenkins-slave/workspace/Mahout-Examples-Classify-20News/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/x1/jenkins/jenkins-slave/workspace/Mahout-Examples-Classify-20News/trunk/examples/target/dependency/slf4j-jcl-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/x1/jenkins/jenkins-slave/workspace/Mahout-Examples-Classify-20News/trunk/examples/target/dependency/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: slf4j-api 1.6.x (or later) is incompatible with this binding.
SLF4J: Your binding is version 1.5.5 or earlier.
SLF4J: Upgrade your binding to version 1.6.x.
Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
    at org.slf4j.LoggerFactory.bind(LoggerFactory.java:128)
    at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:107)
    at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:295)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:269)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:281)
    at org.apache.mahout.driver.MahoutDriver.<clinit>(MahoutDriver.java:89)
Could not find the main class: org.apache.mahout.driver.MahoutDriver. Program will exit.
Build step 'Execute shell' marked build as failure
Sending e-mails to: dev@mahout.apache.org ssc.o...@googlemail.com p
{code}
[jira] [Updated] (MAHOUT-804) Each page in Mahout's Confluence Wiki has 2 URLs, with differing page styles and search behaviours
[ https://issues.apache.org/jira/browse/MAHOUT-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-804:

Affects Version/s: 0.6, 0.7
Fix Version/s: 0.8

Each page in Mahout's Confluence Wiki has 2 URLs, with differing page styles and search behaviours

Key: MAHOUT-804
URL: https://issues.apache.org/jira/browse/MAHOUT-804
Project: Mahout
Issue Type: Improvement
Components: Website
Affects Versions: 0.6, 0.7
Reporter: Dan Brickley
Labels: atlassian, confluence, wiki
Fix For: 0.8

There are two styles of URL in circulation for URLs into Mahout's Wiki (presumably an Apache-wide configuration issue):

https://cwiki.apache.org/MAHOUT/svd-singular-value-decomposition.html
vs
https://cwiki.apache.org/confluence/display/MAHOUT/SVD+-+Singular+Value+Decomposition

They appear to be the self-same Confluence 3.4.9 installation (or its raw filetree). Each has a different search box at the top of the page. The version with 'confluence/' in the path does a Confluence search, and returns similar URLs as results. The one with '.html' suffixes does a domain-constrained Google search.

Despite markup canonicalising the Confluence variant, i.e.

<link rel="canonical" href="https://cwiki.apache.org/confluence/display/MAHOUT/SVD+-+Singular+Value+Decomposition">

appearing in the Confluence pages, it seems the Google search results typically throw people into the other version of the Wiki site.

This is all mildly confusing, mildly annoying but overall mostly harmless. It could be having some negative impact on Google rank and suchlike, since incoming links will be split between the two styles. Maybe this could be passed along to the Wiki admins? Which version does the Mahout team consider canonical URLs (for external links etc.)?
[jira] [Updated] (MAHOUT-1136) Cannot import project into eclipse with m2e 1.2
[ https://issues.apache.org/jira/browse/MAHOUT-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1136:

Fix Version/s: 0.8

Cannot import project into eclipse with m2e 1.2

Key: MAHOUT-1136
URL: https://issues.apache.org/jira/browse/MAHOUT-1136
Project: Mahout
Issue Type: Bug
Components: build
Affects Versions: 0.7
Reporter: Stevo Slavic
Labels: m2e
Fix For: 0.8
Attachments: MAHOUT-1136.patch

It seems the fix for MAHOUT-1043 wasn't quite right: in pluginExecutionFilter, versionRange should be used instead of version. Related SO entry: http://stackoverflow.com/a/6701595/381140
[jira] [Updated] (MAHOUT-1088) biased item-based recommender
[ https://issues.apache.org/jira/browse/MAHOUT-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1088:

Affects Version/s: 0.7
Fix Version/s: 0.8

biased item-based recommender

Key: MAHOUT-1088
URL: https://issues.apache.org/jira/browse/MAHOUT-1088
Project: Mahout
Issue Type: Improvement
Components: Collaborative Filtering
Affects Versions: 0.7
Reporter: Sebastian Schelter
Assignee: Sean Owen
Fix For: 0.8
Attachments: MAHOUT-1088.patch

User-item baseline estimation offers a simple yet very effective way to improve the rating prediction of recommenders (see http://dl.acm.org/citation.cfm?id=1644874 for details). We should offer an item-based recommender that incorporates this technique.
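To make the idea of a user-item baseline concrete, here is a minimal sketch. It assumes the common formulation b_ui = mu + b_u + b_i, with the biases computed as plain (unregularized) averages. Neither that choice nor the class name BaselineEstimateSketch comes from the issue or the attached patch, so treat it as an illustration rather than the proposed implementation.

```java
public class BaselineEstimateSketch {

    public final double mu;          // global mean rating
    public final double[] userBias;  // b_u, one entry per user
    public final double[] itemBias;  // b_i, one entry per item

    // ratings: rows of {userIndex, itemIndex, rating}
    public BaselineEstimateSketch(double[][] ratings, int numUsers, int numItems) {
        double sum = 0.0;
        for (double[] r : ratings) {
            sum += r[2];
        }
        mu = sum / ratings.length;

        // b_i: average deviation of an item's ratings from the global mean.
        itemBias = new double[numItems];
        int[] itemCount = new int[numItems];
        for (double[] r : ratings) {
            itemBias[(int) r[1]] += r[2] - mu;
            itemCount[(int) r[1]]++;
        }
        for (int i = 0; i < numItems; i++) {
            if (itemCount[i] > 0) {
                itemBias[i] /= itemCount[i];
            }
        }

        // b_u: average of what remains after removing mu and b_i.
        userBias = new double[numUsers];
        int[] userCount = new int[numUsers];
        for (double[] r : ratings) {
            userBias[(int) r[0]] += r[2] - mu - itemBias[(int) r[1]];
            userCount[(int) r[0]]++;
        }
        for (int u = 0; u < numUsers; u++) {
            if (userCount[u] > 0) {
                userBias[u] /= userCount[u];
            }
        }
    }

    // Baseline estimate for user u and item i.
    public double estimate(int u, int i) {
        return mu + userBias[u] + itemBias[i];
    }

    public static void main(String[] args) {
        double[][] ratings = {{0, 0, 4}, {0, 1, 2}, {1, 0, 5}, {1, 1, 3}};
        BaselineEstimateSketch model = new BaselineEstimateSketch(ratings, 2, 2);
        System.out.println("baseline(0,0) = " + model.estimate(0, 0));
    }
}
```

An item-based recommender could then apply its similarity-weighted average to the residuals (rating minus baseline) instead of the raw ratings and add the baseline back when predicting.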
[jira] [Updated] (MAHOUT-1131) Can't execute alternative FPG implementation from command line
[ https://issues.apache.org/jira/browse/MAHOUT-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1131:

Affects Version/s: 0.7
Fix Version/s: 0.8

Can't execute alternative FPG implementation from command line

Key: MAHOUT-1131
URL: https://issues.apache.org/jira/browse/MAHOUT-1131
Project: Mahout
Issue Type: Bug
Affects Versions: 0.7
Reporter: Kirill A. Korinskiy
Fix For: 0.8
Attachments: MAHOUT-1131.patch

When I execute

./bin/mahout fpg -i input -o output -2

the -2 option (execute the alternative FPG implementation) doesn't work. The attached patch fixes it.
[jira] [Updated] (MAHOUT-1089) SGD matrix factorization for rating prediction with user and item biases
[ https://issues.apache.org/jira/browse/MAHOUT-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1089:

Affects Version/s: 0.7
Fix Version/s: 0.8

SGD matrix factorization for rating prediction with user and item biases

Key: MAHOUT-1089
URL: https://issues.apache.org/jira/browse/MAHOUT-1089
Project: Mahout
Issue Type: New Feature
Components: Collaborative Filtering
Affects Versions: 0.7
Reporter: Zeno Gantner
Assignee: Sebastian Schelter
Fix For: 0.8
Attachments: MAHOUT-1089.patch, RatingSGDFactorizer.java, RatingSGDFactorizer.java

A matrix factorization that is trained with standard SGD on all features at the same time, in contrast to ExpectationMaximizationFactorizer, which learns feature by feature. In addition to the free features, it models a rating bias for each user and each item.
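The training scheme described in MAHOUT-1089 can be sketched roughly as follows. This is not the attached RatingSGDFactorizer: the class name BiasedSgdSketch, the learning rate, the regularization constant and the initialisation are all assumptions chosen for illustration. The prediction uses the usual form mu + b_u + b_i + p_u . q_i, and each SGD step updates both biases and every factor of the user and item involved.

```java
import java.util.Random;

public class BiasedSgdSketch {

    static final double LEARNING_RATE = 0.01;   // assumed value
    static final double REGULARIZATION = 0.02;  // assumed value

    final double mu;               // global mean rating
    final double[] userBias;       // b_u
    final double[] itemBias;       // b_i
    final double[][] userFactors;  // p_u
    final double[][] itemFactors;  // q_i

    public BiasedSgdSketch(int numUsers, int numItems, int numFactors, double mu, long seed) {
        this.mu = mu;
        userBias = new double[numUsers];
        itemBias = new double[numItems];
        userFactors = new double[numUsers][numFactors];
        itemFactors = new double[numItems][numFactors];
        Random random = new Random(seed);
        for (double[] row : userFactors) {
            for (int f = 0; f < numFactors; f++) {
                row[f] = 0.01 * random.nextGaussian();
            }
        }
        for (double[] row : itemFactors) {
            for (int f = 0; f < numFactors; f++) {
                row[f] = 0.01 * random.nextGaussian();
            }
        }
    }

    public double predict(int u, int i) {
        double dot = 0.0;
        for (int f = 0; f < userFactors[u].length; f++) {
            dot += userFactors[u][f] * itemFactors[i][f];
        }
        return mu + userBias[u] + itemBias[i] + dot;
    }

    // One SGD step on a single rating: biases and all factors move together.
    public void update(int u, int i, double rating) {
        double err = rating - predict(u, i);
        userBias[u] += LEARNING_RATE * (err - REGULARIZATION * userBias[u]);
        itemBias[i] += LEARNING_RATE * (err - REGULARIZATION * itemBias[i]);
        for (int f = 0; f < userFactors[u].length; f++) {
            double pu = userFactors[u][f];  // keep old values so both updates use them
            double qi = itemFactors[i][f];
            userFactors[u][f] += LEARNING_RATE * (err * qi - REGULARIZATION * pu);
            itemFactors[i][f] += LEARNING_RATE * (err * pu - REGULARIZATION * qi);
        }
    }

    // ratings: rows of {userIndex, itemIndex, rating}
    public double rmse(double[][] ratings) {
        double sumSquared = 0.0;
        for (double[] r : ratings) {
            double err = r[2] - predict((int) r[0], (int) r[1]);
            sumSquared += err * err;
        }
        return Math.sqrt(sumSquared / ratings.length);
    }

    public static double mean(double[][] ratings) {
        double sum = 0.0;
        for (double[] r : ratings) {
            sum += r[2];
        }
        return sum / ratings.length;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1}, {2, 1, 2}, {2, 2, 1}};
        BiasedSgdSketch model = new BiasedSgdSketch(3, 3, 2, mean(data), 42L);
        System.out.println("RMSE before training: " + model.rmse(data));
        for (int epoch = 0; epoch < 500; epoch++) {
            for (double[] r : data) {
                model.update((int) r[0], (int) r[1], r[2]);
            }
        }
        System.out.println("RMSE after training:  " + model.rmse(data));
    }
}
```

Updating all factors in the same step is what distinguishes this scheme from a feature-by-feature approach, where one latent feature is trained to convergence before the next one starts.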
[jira] [Updated] (MAHOUT-1106) SVD++
[ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1106:

Affects Version/s: 0.7
Fix Version/s: 0.8

SVD++

Key: MAHOUT-1106
URL: https://issues.apache.org/jira/browse/MAHOUT-1106
Project: Mahout
Issue Type: New Feature
Components: Collaborative Filtering
Affects Versions: 0.7
Reporter: Zeno Gantner
Assignee: Sebastian Schelter
Fix For: 0.8
Attachments: SVDPlusPlusFactorizer.java

Initial shot at SVD++. Relies on the RatingSGDFactorizer class introduced in MAHOUT-1089. One could also think about several enhancements, e.g. having separate regularization constants for user and item factors.

I am also the author of the SVDPlusPlus class in MyMediaLite, so if there are any similarities, no need to worry: I am okay with relicensing this to the Apache 2.0 license.
https://github.com/zenogantner/MyMediaLite/blob/master/src/MyMediaLite/RatingPrediction/SVDPlusPlus.cs
[jira] [Updated] (MAHOUT-852) Upgrade Lucene dependency to 3.4
[ https://issues.apache.org/jira/browse/MAHOUT-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-852:

Affects Version/s: 0.5
Fix Version/s: 0.6

Upgrade Lucene dependency to 3.4

Key: MAHOUT-852
URL: https://issues.apache.org/jira/browse/MAHOUT-852
Project: Mahout
Issue Type: Improvement
Affects Versions: 0.5
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Trivial
Fix For: 0.6

As the title says, commit coming shortly once the tests are done running.
[jira] [Updated] (MAHOUT-1194) Allow to change java target version during the build
[ https://issues.apache.org/jira/browse/MAHOUT-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1194:

Affects Version/s: 0.7
Fix Version/s: 0.8

Allow to change java target version during the build

Key: MAHOUT-1194
URL: https://issues.apache.org/jira/browse/MAHOUT-1194
Project: Mahout
Issue Type: Task
Affects Versions: 0.7
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Priority: Minor
Fix For: 0.8
Attachments: bugMAHOUT-1194.patch

It seems that the current build has a hard-coded Java target for JDK 6. I think it would be useful to parametrise that, so that it can be easily overridden on the command line.
[jira] [Updated] (MAHOUT-1111) Logging bindings not working in current trunk as of github 2012-November-9 18:41
[ https://issues.apache.org/jira/browse/MAHOUT-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1111:

Affects Version/s: 0.7
Fix Version/s: 0.8

Logging bindings not working in current trunk as of github 2012-November-9 18:41

Key: MAHOUT-1111
URL: https://issues.apache.org/jira/browse/MAHOUT-1111
Project: Mahout
Issue Type: Bug
Components: build, Examples
Affects Versions: 0.7
Environment:
    == Most Recent Commit
    commit 1743c1521679daab600a982be6e53751730e
    Author: Paritosh Ranjan pran...@apache.org
    Date: Thu Nov 1 13:02:03 2012 +
    MAHOUT-1109, Creatinng parent directories if not present while creating file
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1404572 13f79535-4
    github runs behind svn, apologies if this is fixed. I can't find an online svn commit log in the apache SVN server.
Reporter: Lance Norskog
Assignee: Sebastian Schelter
Priority: Blocker
Fix For: 0.8
Attachments: multiple-slf4j.patch

Current commit is 1743c1521679daab600a982be6e53751730e

On trunk, running examples/bin/classify-20newsgroups.sh gives this error:

{noformat}
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: slf4j-api 1.6.x (or later) is incompatible with this binding.
SLF4J: Your binding is version 1.5.5 or earlier.
SLF4J: Upgrade your binding to version 1.6.x.
Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
    at org.slf4j.LoggerFactory.bind(LoggerFactory.java:128)
    at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:107)
    at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:295)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:269)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:281)
    at org.apache.mahout.driver.MahoutDriver.<clinit>(MahoutDriver.java:89)
{noformat}

Marked Blocker since script just plain does not run.
Here is the complete trace from running the script under shell's -x option:

{noformat}
@mac bin [trunk] $ sh -x classify-20newsgroups.sh
+ '[' '' = --help ']'
+ '[' '' = '--?' ']'
+ SCRIPT_PATH=classify-20newsgroups.sh
+ '[' classify-20newsgroups.sh '!=' classify-20newsgroups.sh ']'
++ pwd
+ START_PATH=/Users/lancenorskog/Documents/open/mahout/examples/bin
+ WORK_DIR=/tmp/mahout-work-lancenorskog
+ algorithm=(cnaivebayes naivebayes sgd clean)
+ '[' -n '' ']'
+ echo 'Please select a number to choose the corresponding task to run'
Please select a number to choose the corresponding task to run
+ echo '1. cnaivebayes'
1. cnaivebayes
+ echo '2. naivebayes'
2. naivebayes
+ echo '3. sgd'
3. sgd
+ echo '4. clean -- cleans up the work area in /tmp/mahout-work-lancenorskog'
4. clean -- cleans up the work area in /tmp/mahout-work-lancenorskog
+ read -p 'Enter your choice : ' choice
Enter your choice : 1
+ echo 'ok. You chose 1 and we'\''ll use cnaivebayes'
ok. You chose 1 and we'll use cnaivebayes
+ alg=cnaivebayes
+ echo 'creating work directory at /tmp/mahout-work-lancenorskog'
creating work directory at /tmp/mahout-work-lancenorskog
+ mkdir -p /tmp/mahout-work-lancenorskog
+ '[' '!' -e /tmp/mahout-work-lancenorskog/20news-bayesinput ']'
+ '[' '!' -e /tmp/mahout-work-lancenorskog/20news-bydate ']'
+ cd /Users/lancenorskog/Documents/open/mahout/examples/bin
+ cd ../..
+ set -e
+ '[' xcnaivebayes == xnaivebayes -o xcnaivebayes == xcnaivebayes ']'
+ c=
+ '[' xcnaivebayes == xcnaivebayes ']'
+ c=' -c'
+ set -x
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-lancenorskog/20news-all
+ mkdir /tmp/mahout-work-lancenorskog/20news-all
+ cp -R /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/alt.atheism /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/comp.graphics /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/comp.windows.x /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/misc.forsale /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/rec.autos /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/rec.motorcycles /tmp/mahout-work-lancenorskog/20news-bydate/20news-bydate-test/rec.sport.baseball
[jira] [Updated] (MAHOUT-1087) ExpectationMaximizationSVDFactorizer doesn't do expectation maximization
[ https://issues.apache.org/jira/browse/MAHOUT-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1087:

Affects Version/s: 0.7
Fix Version/s: 0.8

ExpectationMaximizationSVDFactorizer doesn't do expectation maximization

Key: MAHOUT-1087
URL: https://issues.apache.org/jira/browse/MAHOUT-1087
Project: Mahout
Issue Type: Improvement
Components: Collaborative Filtering
Affects Versions: 0.7
Reporter: Sebastian Schelter
Assignee: Sean Owen
Fix For: 0.8

This factorizer simply learns the user and item features via SGD as described in Simon Funk's famous blogpost, which is not expectation maximization, so I suggest we rename it to FunkSVD.
Updating JIRA issues
IIRC when you bulk update issues, you can choose to *not* send e-mail. Might be good if affecting many at once like this!
[jira] [Updated] (MAHOUT-1083) CIReducer in kmeans doesn't work well
[ https://issues.apache.org/jira/browse/MAHOUT-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1083:

Affects Version/s: 0.7
Fix Version/s: 0.8

CIReducer in kmeans doesn't work well

Key: MAHOUT-1083
URL: https://issues.apache.org/jira/browse/MAHOUT-1083
Project: Mahout
Issue Type: Bug
Affects Versions: 0.7
Environment:
    hadoop-2.0.0-alpha: pseudo cluster and single node cluster
    hadoop-1.0.3: pseudo cluster
    hadoop-0.20.2: pseudo cluster
    mahout: mahout-0.7
    os: ubuntu 11.04
    jdk: jdk1.6.0_27
Reporter: liutengfei
Fix For: 0.8
Attachments: MAHOUT-1083.patch

The reduce function in CIReducer.java (mahout-0.7, kmeans) does not behave the way it reads:

    protected void reduce(IntWritable key, Iterable<ClusterWritable> values, Context context)
        throws IOException, InterruptedException {
      Iterator<ClusterWritable> iter = values.iterator();
      ClusterWritable first = null;
      while (iter.hasNext()) {
        ClusterWritable cw = iter.next();
        if (first == null) {
          first = cw;
        } else {
          first.getValue().observe(cw.getValue());
        }
      }
      List<Cluster> models = new ArrayList<Cluster>();
      models.add(first.getValue());
      classifier = new ClusterClassifier(models, policy);
      classifier.close();
      context.write(key, first);
    }

Apparently, the variable first is meant to collect all of the map output. In practice, however, the value of first changes after the statement ClusterWritable cw = iter.next(); executes, and ends up the same as the new variable cw! I don't know why, but the results show that the code behaves as if it were written ClusterWritable cw = first = iter.next();. Is cw a reference into iter? Does iter.next() just change the value held by iter itself to the next element?
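The behaviour described in MAHOUT-1083 matches how Hadoop's reduce-side value iterator works: it reuses a single Writable instance and refills it on every call to next(), so a reference kept from an earlier iteration silently changes. The sketch below reproduces the effect without any Hadoop dependency. ReusedValueDemo, Holder and both helper methods are hypothetical names for illustration, and the copy-based variant is one possible remedy, not necessarily what the attached patch does.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ReusedValueDemo {

    // Stand-in for a mutable Writable.
    static final class Holder {
        int value;
    }

    // Iterator that refills and returns one shared Holder on every call to next(),
    // similar in spirit to Hadoop's reducer value iterator.
    public static Iterator<Holder> reusingIterator(final int... values) {
        final Holder shared = new Holder();
        return new Iterator<Holder>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < values.length;
            }

            @Override
            public Holder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = values[pos++];
                return shared;
            }
        };
    }

    // Keeps a reference to the first element, as the reported reduce() does.
    public static int firstByReference(Iterator<Holder> it) {
        Holder first = null;
        while (it.hasNext()) {
            Holder current = it.next();
            if (first == null) {
                first = current; // same object as every later 'current'
            }
        }
        return first.value;
    }

    // Copies the first element, so later calls to next() cannot overwrite it.
    public static int firstByCopy(Iterator<Holder> it) {
        Holder first = null;
        while (it.hasNext()) {
            Holder current = it.next();
            if (first == null) {
                first = new Holder();
                first.value = current.value;
            }
        }
        return first.value;
    }

    public static void main(String[] args) {
        System.out.println("by reference: " + firstByReference(reusingIterator(1, 2, 3)));
        System.out.println("by copy:      " + firstByCopy(reusingIterator(1, 2, 3)));
    }
}
```

In the reference-based variant, first and current are the same object, which is why the reporter sees the code behave like cw = first = iter.next().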
Re: Updating JIRA issues
Sorry about that... will keep in mind the next time.

From: Sean Owen sro...@gmail.com
To: Mahout Dev List dev@mahout.apache.org
Sent: Thursday, July 25, 2013 8:46 AM
Subject: Updating JIRA issues

IIRC when you bulk update issues, you can choose to *not* send e-mail. Might be good if affecting many at once like this!
Re: 0.8
With Isabel's help, updated the 0.8 Release notes on the Wiki and below is the text version of the Release notes. Check out the Wiki version at https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8

---

The Apache Mahout PMC is pleased to announce the release of Mahout 0.8. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known as the 3Cs), as well as the necessary infrastructure to support those implementations including, but not limited to, math packages for statistics, linear algebra and others as well as Java primitive collections, local and distributed vector and matrix classes and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more. The 0.8 release is mainly a clean up release in preparation for an upcoming 1.0 release, but there are several significant new features, which are highlighted below.

To get started with Apache Mahout 0.8, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout. The examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory. Most examples do not need a Hadoop cluster in order to run. Please pay attention to the section labelled FUTURE PLANS below for more information about upcoming releases of Mahout.

As with any release, we wish to thank all of the users and contributors to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.8 release include, but are not limited to the list below. For further information, see the included CHANGELOG file.

- Numerous performance improvements to Vector and Matrix implementations, APIs and their iterators (see also MAHOUT-1192, MAHOUT-1202)
- Numerous performance improvements to the recommender implementations (see also MAHOUT-1272, MAHOUT-1035, MAHOUT-1042, MAHOUT-1151, MAHOUT-1166, MAHOUT-1167, MAHOUT-1169, MAHOUT-1205, MAHOUT-1264)
- MAHOUT-1088: Support for biased item-based recommender
- MAHOUT-1089: SGD matrix factorization for rating prediction with user and item biases
- MAHOUT-1106: Support for SVD++
- MAHOUT-944: Support for converting one or more Lucene storage indexes to SequenceFiles as well as an upgrade of the supported Lucene version to Lucene 4.3.1.
- MAHOUT-1154 and friends: New streaming k-means implementation that offers on-line (and fast) clustering
- MAHOUT-833: Make conversion to SequenceFiles Map-Reduce, 'seqdirectory' can now be run as a MapReduce job.
- MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values).
- MAHOUT-884: Matrix Concat utility, presently only concatenates two matrices.
- MAHOUT-1244: Upgraded to use Lucene 4.3
- MAHOUT-1187: Upgraded to CommonsLang3
- MAHOUT-916: Speedup the Mahout build by making tests run in parallel.
- The usual bug fixes. See JIRA [2] for more information on the 0.8 release.

A total of 218 separate JIRA issues are addressed in this release.

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our how-to-contribute page at https://cwiki.apache.org/MAHOUT/how-to-contribute.html on the Mahout wiki or contact us via email at dev@mahout.apache.org.

FUTURE PLANS

0.9

As the project moves towards a 1.0 release, the community is working to clean up and/or remove parts of the code base that are under-supported or that underperform as well as to better focus the energy and contributions on key algorithms that are proven to scale in production and have seen wide-spread adoption. To this end, in the next release, the project is planning on removing support for the following algorithms unless there is sustained support and improvement of them before the next release.

The algorithms to be removed are:

- From Clustering:
    Dirichlet
    MeanShift
    MinHash
    Eigencuts
- From Classification (both are sequential implementations)
    Winnow
    Perceptron
- Frequent Pattern Mining
- Collaborative Filtering
    All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
    SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender
- Mahout Math
    Lanczos in favour of SSVD
    Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

If you are interested in supporting 1 or more of these algorithms, please make it known on dev@mahout.apache.org and via JIRA issues that fix and/or improve them. Please also provide supporting
Re: 0.8
Awesome, I will send out the announcement as soon as I check the mirrors.

On Jul 25, 2013, at 2:44 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:

With Isabel's help, updated the 0.8 Release notes on the Wiki and below is the text version of the Release notes. Checkout the Wiki version at https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8

[...]
Apache Mahout 0.8 Released
The Apache Mahout PMC is pleased to announce the release of Mahout 0.8. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known collectively as the 3Cs), as well as the necessary infrastructure to support those implementations including, but not limited to, math packages for statistics, linear algebra and others as well as Java primitive collections, local and distributed vector and matrix classes and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more. The 0.8 release is mainly a cleanup release in preparation for an upcoming 1.0 release, but there are several significant new features, which are highlighted below.

To get started with Apache Mahout 0.8, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central Maven repository. In addition to the release highlights and artifacts, please pay attention to the section labelled FUTURE PLANS below for more information about upcoming releases of Mahout.

As with any release, we wish to thank all of the users and contributors to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory and will prompt you for more information to help you try things out. Most examples do not need a Hadoop cluster in order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.8 release include, but are not limited to, the list below. For further information, see the included CHANGELOG file.

- Numerous performance improvements to Vector and Matrix implementations, APIs and their iterators (see also MAHOUT-1192, MAHOUT-1202)
- Numerous performance improvements to the recommender implementations (see also MAHOUT-1272, MAHOUT-1035, MAHOUT-1042, MAHOUT-1151, MAHOUT-1166, MAHOUT-1167, MAHOUT-1169, MAHOUT-1205, MAHOUT-1264)
- MAHOUT-1088: Support for biased item-based recommender
- MAHOUT-1089: SGD matrix factorization for rating prediction with user and item biases
- MAHOUT-1106: Support for SVD++
- MAHOUT-944: Support for converting one or more Lucene storage indexes to SequenceFiles as well as an upgrade of the supported Lucene version to Lucene 4.3.1.
- MAHOUT-1154 and friends: New streaming k-means implementation that offers on-line (and fast) clustering
- MAHOUT-833: Make conversion to SequenceFiles Map-Reduce, 'seqdirectory' can now be run as a MapReduce job.
- MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values).
- MAHOUT-884: Matrix Concat utility, presently only concatenates two matrices.
- MAHOUT-1244: Upgraded to use Lucene 4.3
- MAHOUT-1187: Upgraded to CommonsLang3
- MAHOUT-916: Speed up the Mahout build by making tests run in parallel.
- The usual bug fixes. See JIRA [2] for more information on the 0.8 release.

A total of 218 separate JIRA issues are addressed in this release.

CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our contribution page, https://cwiki.apache.org/MAHOUT/how-to-contribute.html, on the Mahout wiki or contact us via email at dev@mahout.apache.org.

FUTURE PLANS

0.9

As the project moves towards a 1.0 release, the community is working to clean up and/or remove parts of the code base that are under-supported or that underperform as well as to better focus the energy and contributions on key algorithms that are proven to scale in production and have seen widespread adoption. To this end, in the next release, the project is planning on removing support for the following algorithms unless there is sustained support and improvement of them before the next release. The algorithms to be removed are:

- From Clustering:
  - Dirichlet
  - MeanShift
  - MinHash
  - Eigencuts
- From Classification (both are sequential implementations):
  - Winnow
  - Perceptron
- Frequent Pattern Mining
- Collaborative Filtering:
  - All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
  - SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and org.apache.mahout.cf.taste.impl.recommender.slopeone
  - Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
  - TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender
- Mahout Math:
  - Lanczos in favour of SSVD
  - Hadoop entropy stuff in org.apache.mahout.math.stats.entropy

If you are interested in supporting 1 or more of these algorithms, please make it known on dev@mahout.apache.org and via JIRA issues that fix and/or improve them. Please also provide supporting evidence as to their
[jira] [Updated] (MAHOUT-1291) MahoutDriver yields cosmetically suboptimal exception when bin/mahout runs without args, on some Hadoop versions
[ https://issues.apache.org/jira/browse/MAHOUT-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-1291: -- Attachment: MAHOUT-1291.patch MahoutDriver yields cosmetically suboptimal exception when bin/mahout runs without args, on some Hadoop versions Key: MAHOUT-1291 URL: https://issues.apache.org/jira/browse/MAHOUT-1291 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Sean Owen Priority: Trivial Fix For: 0.9 Attachments: MAHOUT-1291.patch If you run bin/mahout without arguments, an error is correctly displayed about lack of an argument. The part that displays the error is actually within Hadoop code. In some versions of Hadoop, in the error case, it will quit the JVM with System.exit(). In others, it does not. In the calling code in MahoutDriver, in this error case, the main() method does not actually return. So, for versions where Hadoop code doesn't immediately exit the JVM, execution continues. This yields another exception. It's pretty harmless but ugly. Attached is a one-line fix, to return from main() in the error case, which is more correct to begin with. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAHOUT-1291) MahoutDriver yields cosmetically suboptimal exception when bin/mahout runs without args, on some Hadoop versions
Sean Owen created MAHOUT-1291: - Summary: MahoutDriver yields cosmetically suboptimal exception when bin/mahout runs without args, on some Hadoop versions Key: MAHOUT-1291 URL: https://issues.apache.org/jira/browse/MAHOUT-1291 Project: Mahout Issue Type: Improvement Affects Versions: 0.8 Reporter: Sean Owen Priority: Trivial Fix For: 0.9 Attachments: MAHOUT-1291.patch If you run bin/mahout without arguments, an error is correctly displayed about lack of an argument. The part that displays the error is actually within Hadoop code. In some versions of Hadoop, in the error case, it will quit the JVM with System.exit(). In others, it does not. In the calling code in MahoutDriver, in this error case, the main() method does not actually return. So, for versions where Hadoop code doesn't immediately exit the JVM, execution continues. This yields another exception. It's pretty harmless but ugly. Attached is a one-line fix, to return from main() in the error case, which is more correct to begin with.
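The control flow described in MAHOUT-1291 can be sketched as follows. This is a minimal illustration and not the attached patch: the class name DriverReturnSketch, the run() method and the event strings are all hypothetical, and the Hadoop error reporting is replaced by a plain list entry. It only shows why returning in the error case matters when the called code may or may not exit the JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the control flow described in MAHOUT-1291.
// None of these names come from the real MahoutDriver or Hadoop code.
final class DriverReturnSketch {

    // Returns the events that occurred, so the control flow is easy to inspect.
    static List<String> run(String[] args) {
        List<String> events = new ArrayList<>();
        if (args.length < 1) {
            // Stand-in for the Hadoop code that reports the missing argument.
            // Depending on the Hadoop version it may or may not exit the JVM,
            // so the caller must not rely on it to stop execution.
            events.add("error: no program name given");
            return events; // the fix: return in the error case
        }
        // Without the return above, this would also run with no arguments
        // and fail on args[0].
        events.add("running " + args[0]);
        return events;
    }

    public static void main(String[] args) {
        run(args).forEach(System.out::println);
    }
}
```

With no arguments only the error event is recorded; with an argument only the "running" event is recorded.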
[jira] [Updated] (MAHOUT-1290) Issue when running Mahout Recommender Demo
[ https://issues.apache.org/jira/browse/MAHOUT-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Helder Garay Martins updated MAHOUT-1290: - Labels: newdev patch (was: ) Status: Patch Available (was: Open) - Added Jetty dependency to the examples module, and removed it from integration module - Added Jetty conf file to the examples module, and removed it from integration module Issue when running Mahout Recommender Demo -- Key: MAHOUT-1290 URL: https://issues.apache.org/jira/browse/MAHOUT-1290 Project: Mahout Issue Type: Bug Components: Examples Affects Versions: 0.8 Reporter: Suneel Marthi Labels: patch, newdev Fix For: 0.9 When running jetty:run under *mahout-integration*, seeing a ClassNotFoundException: org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender. The problem is happening because the webapp folder wasn't moved to the examples dir and the Jetty dependency wasn't added as a Maven plugin when the GroupLens example moved to the examples submodule.
[jira] [Updated] (MAHOUT-1290) Issue when running Mahout Recommender Demo
[ https://issues.apache.org/jira/browse/MAHOUT-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Helder Garay Martins updated MAHOUT-1290: - Attachment: MAHOUT-1290.patch I'm not sure if the patch was sent before, I'm sending it here again just to be sure. Issue when running Mahout Recommender Demo -- Key: MAHOUT-1290 URL: https://issues.apache.org/jira/browse/MAHOUT-1290 Project: Mahout Issue Type: Bug Components: Examples Affects Versions: 0.8 Reporter: Suneel Marthi Labels: newdev, patch Fix For: 0.9 Attachments: MAHOUT-1290.patch When running jetty:run under *mahout-integration*, seeing a ClassNotFoundException: org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender. The problem is happening because the webapp folder wasn't moved to the examples dir and the Jetty dependency wasn't added as a Maven plugin when the GroupLens example moved to the examples submodule.
[jira] [Created] (MAHOUT-1292) lucene2seq creates single document from index
Liz Merkhofer created MAHOUT-1292: - Summary: lucene2seq creates single document from index Key: MAHOUT-1292 URL: https://issues.apache.org/jira/browse/MAHOUT-1292 Project: Mahout Issue Type: Bug Components: Integration Affects Versions: 0.8 Reporter: Liz Merkhofer Lucene2seq creates only one sequencefile, rather than a file for each document in the index. Running lucene2seq on my Solr (4.3) index produces a file with a header and, it seems, the field I specified from the index, concatenated for all the documents. After running this through seq2sparse and rowid (to prepare for cvb), the resulting matrix has only one row, though it should create one row per document. This issue prevents, at least, data from a lucene index from being easily used as input for cvb. Lucene.vector is also currently inadequate: the keys to its sequence files are LongWritable, and rowid will convert only Text to IntWritable, as is necessary for the keys in cvb.
[jira] [Updated] (MAHOUT-1292) lucene2seq creates single document from index
[ https://issues.apache.org/jira/browse/MAHOUT-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1292: -- Fix Version/s: 0.9 lucene2seq creates single document from index - Key: MAHOUT-1292 URL: https://issues.apache.org/jira/browse/MAHOUT-1292 Project: Mahout Issue Type: Bug Components: Integration Affects Versions: 0.8 Reporter: Liz Merkhofer Assignee: Suneel Marthi Labels: cvb, lucene, solr Fix For: 0.9 Lucene2seq creates only one sequencefile, rather than a file for each document in the index. Running lucene2seq on my Solr (4.3) index produces a file with a header and, it seems, the field I specified from the index, concatenated for all the documents. After running this through seq2sparse and rowid (to prepare for cvb), the resulting matrix has only one row, though it should create one row per document. This issue prevents, at least, data from a lucene index from being easily used as input for cvb. Lucene.vector is also currently inadequate: the keys to its sequence files are LongWritable, and rowid will convert only Text to IntWritable, as is necessary for the keys in cvb.
[jira] [Assigned] (MAHOUT-1292) lucene2seq creates single document from index
[ https://issues.apache.org/jira/browse/MAHOUT-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned MAHOUT-1292: - Assignee: Suneel Marthi lucene2seq creates single document from index - Key: MAHOUT-1292 URL: https://issues.apache.org/jira/browse/MAHOUT-1292 Project: Mahout Issue Type: Bug Components: Integration Affects Versions: 0.8 Reporter: Liz Merkhofer Assignee: Suneel Marthi Labels: cvb, lucene, solr Lucene2seq creates only one sequencefile, rather than a file for each document in the index. Running lucene2seq on my Solr (4.3) index produces a file with a header and, it seems, the field I specified from the index, concatenated for all the documents. After running this through seq2sparse and rowid (to prepare for cvb), the resulting matrix has only one row, though it should create one row per document. This issue prevents, at least, data from a lucene index from being easily used as input for cvb. Lucene.vector is also currently inadequate: the keys to its sequence files are LongWritable, and rowid will convert only Text to IntWritable, as is necessary for the keys in cvb.
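To illustrate why a single concatenated record ends up as a one-row matrix, here is a rough sketch of the row numbering step the report refers to. It assumes, as the report implies, that the step assigns one consecutive integer row id per text-keyed record. The class RowIdSketch and the method assignRowIds are hypothetical, and plain Java collections stand in for Hadoop SequenceFiles and Writable key types; this is not Mahout's rowid implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Mahout's rowid implementation.
final class RowIdSketch {

    // Assigns consecutive int row ids to text-keyed records, in input order,
    // and returns the mapping from row id back to the original key.
    static Map<Integer, String> assignRowIds(List<String> textKeys) {
        Map<Integer, String> docIndex = new LinkedHashMap<>();
        int nextRow = 0;
        for (String key : textKeys) {
            docIndex.put(nextRow++, key);
        }
        return docIndex;
    }
}
```

Under this assumption, three records keyed by document id yield three rows, while one record holding the concatenated text of all documents yields a single row, which matches the symptom in the report.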
Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #553
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters-II/553/
--
[...truncated 2174 lines...]
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/buffer/ByteBufferConsumer.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/buffer/CharBufferConsumer.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/buffer/IntBufferConsumer.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/buffer/ShortBufferConsumer.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/buffer/LongBufferConsumer.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/buffer/FloatBufferConsumer.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/buffer/DoubleBufferConsumer.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/ByteArrayList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/CharArrayList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/IntArrayList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/ShortArrayList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/LongArrayList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/FloatArrayList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/DoubleArrayList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/AbstractByteList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/AbstractCharList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/AbstractIntList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/AbstractShortList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/AbstractLongList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/AbstractFloatList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/list/AbstractDoubleList.java
[INFO] Writing to /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/function/ByteByteProcedure.java
[INFO] Writing to
Re: 0.8
On Thu, Jul 25, 2013 at 6:44 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:
[...] The algorithms to be removed are: [...] - Mahout Math
What does it mean -- remove Mahout Math?
Lanczos in favour of SSVD Hadoop entropy stuff in org.apache.mahout.math.stats.entropy [...]
Re: 0.8
It means to aim to remove the following things *from Mahout Math*:
- Lanczos (use SSVD instead)
- Hadoop entropy stuff in org.apache.mahout.math.stats.entropy
2013/7/25 Dmitriy Lyubimov dlie...@gmail.com On Thu, Jul 25, 2013 at 6:44 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: [...]
Re: 0.8
oh. of course. On Thu, Jul 25, 2013 at 3:37 PM, Sebastian Schelter s...@apache.org wrote: It means to aim to remove the following things *from Mahout Math*: - Lanczos (use SSVD instead) - Hadoop entropy stuff in org.apache.mahout.math.stats.entropy 2013/7/25 Dmitriy Lyubimov dlie...@gmail.com On Thu, Jul 25, 2013 at 6:44 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: With Isabel's help, updated the 0.8 Release notes on the Wiki and below is the text version of the Release notes. Checkout the Wiki version at https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8 --- The Apache Mahout PMC is pleased to announce the release of Mahout 0.8. Mahout's goal is to build scalable machine learning libraries focused primarily in the areas of collaborative filtering (recommenders), clustering and classification (known as the 3Cs), as well as the necessary infrastructure to support those implementations including, but not limited to, math packages for statistics, linear algebra and others as well as Java primitive collections, local and distributed vector and matrix classes and a variety of integrative code to work with popular packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache Cassandra and much more. The 0.8 release is mainly a clean up release in preparation for an upcoming 1.0 release, but there are several significant new features, which are highlighted below. To get started with Apache Mahout 0.8, download the release artifacts and signatures at http://www.apache.org/dyn/closer.cgi/mahout. The examples directory contains several working examples of the core functionality available in Mahout. These can be run via scripts in the examples/bin directory. Most examples do not need a Hadoop cluster in order to run. Please pay attention to the section labelled FUTURE PLANS below for more information about upcoming releases of Mahout. As with any release, we wish to thank all of the users and contributors to Mahout. 
Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual credits, as there are too many to list here. RELEASE HIGHLIGHTS The highlights of the Apache Mahout 0.8 release include, but are not limited to the list below. For further information, see the included CHANGELOG file. - Numerous performance improvements to Vector and Matrix implementations, API's and their iterators (see also MAHOUT-1192, MAHOUT-1202) - Numerous performance improvements to the recommender implementations (see also MAHOUT-1272, MAHOUT-1035, MAHOUT-1042, MAHOUT-1151, MAHOUT-1166, MAHOUT-1167, MAHOUT-1169, MAHOUT-1205, MAHOUT-1264) - MAHOUT-1088: Support for biased item-based recommender - MAHOUT-1089: SGD matrix factorization for rating prediction with user and item biases - MAHOUT-1106: Support for SVD++ - MAHOUT-944: Support for converting one or more Lucene storage indexes to SequenceFiles as well as an upgrade of the supported Lucene version to Lucene 4.3.1. - MAHOUT-1154 and friends: New streaming k-means implementation that offers on-line (and fast) clustering - MAHOUT-833: Make conversion to SequenceFiles Map-Reduce, 'seqdirectory' can now be run as a MapReduce job. - MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values). - MAHOUT-884: Matrix Concat utility, presently only concatenates two matrices. - MAHOUT-1244: Upgraded to use Lucene 4.3 - MAHOUT-1187: Upgraded to CommonsLang3 - MAHOUT-916: Speedup the Mahout build by making tests run in parallel. - The usual bug fixes. See JIRA [2] for more information on the 0.8 release. A total of 218 separate JIRA issues are addressed in this release. CONTRIBUTING Mahout is always looking for contributions focused on the 3Cs. If you are interested in contributing, please see our https://cwiki.apache.org/MAHOUT/how-to-contribute.html on the Mahout wiki or contact us via email at dev@mahout.apache.org. 
FUTURE PLANS

0.9

As the project moves towards a 1.0 release, the community is working to clean up and/or remove parts of the code base that are under-supported or that underperform, as well as to better focus the energy and contributions on key algorithms that are proven to scale in production and have seen widespread adoption. To this end, in the next release, the project is planning on removing support for the following algorithms unless there is sustained support and improvement of them before the next release.

The algorithms to be removed are:

- From Clustering:
    Dirichlet
    MeanShift
    MinHash
    Eigencuts
- From Classification (both are sequential implementations)
    Winnow
    Perceptron
- Frequent Pattern Mining
- Collaborative
Re: 0.8
On Jul 25, 2013, at 11:08 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:

What does it mean -- remove Mahout Math?

It is a high-level bullet; see the items underneath. Unfortunately, they don't translate to text format very well.
Build failed in Jenkins: Mahout-Quality #2156
See https://builds.apache.org/job/Mahout-Quality/2156/

--
[...truncated 197709 lines...]
Running org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec - in org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.311 sec - in org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Running org.apache.mahout.cf.taste.common.CommonTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 sec - in org.apache.mahout.cf.taste.common.CommonTest
Running org.apache.mahout.clustering.meanshift.TestMeanShift
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.961 sec - in org.apache.mahout.clustering.meanshift.TestMeanShift
Running org.apache.mahout.clustering.classify.ClusterClassificationDriverTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.954 sec - in org.apache.mahout.clustering.classify.ClusterClassificationDriverTest
Running org.apache.mahout.clustering.dirichlet.TestMapReduce
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.627 sec - in org.apache.mahout.clustering.dirichlet.TestMapReduce
Running org.apache.mahout.clustering.dirichlet.TestDistributions
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.143 sec - in org.apache.mahout.clustering.dirichlet.TestDistributions
Running org.apache.mahout.clustering.dirichlet.TestDirichletClustering
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.14 sec - in org.apache.mahout.clustering.dirichlet.TestDirichletClustering
Running org.apache.mahout.clustering.TestGaussianAccumulators
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.441 sec - in org.apache.mahout.clustering.TestGaussianAccumulators
Running org.apache.mahout.clustering.lda.cvb.TestCVBModelTrainer
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 71.283 sec - in org.apache.mahout.clustering.lda.cvb.TestCVBModelTrainer
Running org.apache.mahout.clustering.canopy.TestCanopyCreation
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.645 sec - in org.apache.mahout.clustering.canopy.TestCanopyCreation
Running org.apache.mahout.clustering.kmeans.TestEigenSeedGenerator
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.506 sec - in org.apache.mahout.clustering.kmeans.TestEigenSeedGenerator
Running org.apache.mahout.clustering.kmeans.TestRandomSeedGenerator
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.507 sec - in org.apache.mahout.clustering.kmeans.TestRandomSeedGenerator
Running org.apache.mahout.clustering.kmeans.TestKmeansClustering
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.592 sec - in org.apache.mahout.clustering.kmeans.TestKmeansClustering
Running org.apache.mahout.clustering.TestClusterInterface
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.073 sec - in org.apache.mahout.clustering.TestClusterInterface
Running org.apache.mahout.clustering.minhash.TestMinHashClustering
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.359 sec - in org.apache.mahout.clustering.minhash.TestMinHashClustering
Running org.apache.mahout.clustering.topdown.PathDirectoryTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 sec - in org.apache.mahout.clustering.topdown.PathDirectoryTest
Running org.apache.mahout.clustering.topdown.postprocessor.ClusterOutputPostProcessorTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.731 sec - in org.apache.mahout.clustering.topdown.postprocessor.ClusterOutputPostProcessorTest
Running org.apache.mahout.clustering.topdown.postprocessor.ClusterCountReaderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.729 sec - in org.apache.mahout.clustering.topdown.postprocessor.ClusterCountReaderTest
Running org.apache.mahout.clustering.streaming.cluster.StreamingKMeansTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 49.173 sec - in org.apache.mahout.clustering.streaming.cluster.StreamingKMeansTest
Running org.apache.mahout.clustering.streaming.cluster.BallKMeansTest
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 33.151 sec FAILURE! - in org.apache.mahout.clustering.streaming.cluster.BallKMeansTest
testClustering(org.apache.mahout.clustering.streaming.cluster.BallKMeansTest) Time elapsed: 0.877 sec FAILURE!
java.lang.AssertionError: expected:625.0 but was:787.0
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:494)
    at org.junit.Assert.assertEquals(Assert.java:592)
    at
Build failed in Jenkins: mahout-nightly #1302
See https://builds.apache.org/job/mahout-nightly/1302/changes

Changes:

[dlyubimov] MAHOUT-1280: moving UpperTriangularMatrix to mahout-math as well as adding Symmetric matrix as a first class citizen.

Squashed commit of the following:

commit 50d97093eff5416b7b644efaae159ea35d7e7279
Author: Dmitriy Lyubimov dlyubi...@apache.org
Date: Wed Jul 17 23:35:49 2013 -0700

    Illegal like()

commit 7ce78c1dfc7b2c15fef787380e617b873df5890d
Author: Dmitriy Lyubimov dlyubi...@apache.org
Date: Wed Jul 10 12:54:46 2013 -0700

    Bug fixes in constructor-by-vector

commit ef11cfa02727fb29b2533c0848734809f77f8a3e
Author: Dmitriy Lyubimov dlyubi...@apache.org
Date: Wed Jul 10 11:22:06 2013 -0700

    Switching SSVD uses to UpperTriangular.

commit 3e73a8cd7ba32cb8696d76b93ec287540c710f68
Author: Dmitriy Lyubimov dlyubi...@apache.org
Date: Wed Jul 10 10:55:11 2013 -0700

    Adding test for dense symmetric matrix asserting Eigen decomposition equivalent to that over a dense matrix.

commit 6fc530b75215c5ad1c0b5561ff3af724c6e48c6b
Author: Dmitriy Lyubimov dlyubi...@apache.org
Date: Tue Jul 9 18:30:59 2013 -0700

    Moving UpperTriangular matrix to mahout.math; adding DenseSymmetric matrix.

--
[...truncated 1546 lines...]
Running org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec - in org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.271 sec - in org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Running org.apache.mahout.cf.taste.impl.common.RunningAverageTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 sec - in org.apache.mahout.cf.taste.impl.common.RunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.RefreshHelperTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.073 sec - in org.apache.mahout.cf.taste.impl.common.RefreshHelperTest
Running org.apache.mahout.cf.taste.impl.common.FastIDSetTest
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.243 sec - in org.apache.mahout.cf.taste.impl.common.FastIDSetTest
Running org.apache.mahout.cf.taste.impl.common.RunningAverageAndStdDevTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.062 sec - in org.apache.mahout.cf.taste.impl.common.RunningAverageAndStdDevTest
Running org.apache.mahout.cf.taste.impl.common.CacheTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.885 sec - in org.apache.mahout.cf.taste.impl.common.CacheTest
Running org.apache.mahout.cf.taste.impl.common.BitSetTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec - in org.apache.mahout.cf.taste.impl.common.BitSetTest
Running org.apache.mahout.cf.taste.impl.common.LongPrimitiveArrayIteratorTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec - in org.apache.mahout.cf.taste.impl.common.LongPrimitiveArrayIteratorTest
Running org.apache.mahout.cf.taste.impl.common.WeightedRunningAverageTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec - in org.apache.mahout.cf.taste.impl.common.WeightedRunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.FastMapTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.396 sec - in org.apache.mahout.cf.taste.impl.common.FastMapTest
Running org.apache.mahout.cf.taste.impl.common.SamplingLongPrimitiveIteratorTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.924 sec - in org.apache.mahout.cf.taste.impl.common.SamplingLongPrimitiveIteratorTest
Running org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarityTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.086 sec - in org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarityTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.097 sec - in org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarityTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.098 sec - in org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.AveragingPreferenceInferrerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.085 sec - in org.apache.mahout.cf.taste.impl.similarity.AveragingPreferenceInferrerTest
Running org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarityTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.082 sec - in org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarityTest
Running