[jira] [Commented] (MAHOUT-1052) Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
[ https://issues.apache.org/jira/browse/MAHOUT-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674098#comment-13674098 ]

Hudson commented on MAHOUT-1052:

Integrated in Mahout-Quality #2036 (See [https://builds.apache.org/job/Mahout-Quality/2036/])
MAHOUT-1052: Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values) (Revision 1489281)

Result = SUCCESS
smarthi :
Files :
* /mahout/trunk/CHANGELOG
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinHashDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinHashMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/minhash/MinhashOptionCreator.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/minhash/TestMinHashClustering.java

> Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
>
> Key: MAHOUT-1052
> URL: https://issues.apache.org/jira/browse/MAHOUT-1052
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.6
> Reporter: Elena Smirnova
> Assignee: Suneel Marthi
> Priority: Minor
> Labels: minhash
> Fix For: 0.8
> Attachments: MAHOUT-1052.patch, MAHOUT-1052.patch
>
> Add a parameter to MinHash clustering that specifies the dimension of vector to hash (indexes or values). Current version of MinHash clustering only hashed values of vectors. Based on discussion on dev-mahout list, both of the use-cases are possible and frequently met in practice.
> Preserve backward compatibility with default dimension set to values. Add new unit tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1052) Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
[ https://issues.apache.org/jira/browse/MAHOUT-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674001#comment-13674001 ]

Suneel Marthi commented on MAHOUT-1052:
---

Patch committed to trunk

> Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
>
> Key: MAHOUT-1052
> URL: https://issues.apache.org/jira/browse/MAHOUT-1052
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.6
> Reporter: Elena Smirnova
> Assignee: Suneel Marthi
> Priority: Minor
> Labels: minhash
> Fix For: Backlog
> Attachments: MAHOUT-1052.patch, MAHOUT-1052.patch
>
> Add a parameter to MinHash clustering that specifies the dimension of vector to hash (indexes or values). Current version of MinHash clustering only hashed values of vectors. Based on discussion on dev-mahout list, both of the use-cases are possible and frequently met in practice.
> Preserve backward compatibility with default dimension set to values. Add new unit tests.
[jira] [Commented] (MAHOUT-1052) Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
[ https://issues.apache.org/jira/browse/MAHOUT-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673392#comment-13673392 ]

Suneel Marthi commented on MAHOUT-1052:
---

Cleaned up the patch to be compatible with present codebase. Uploading new patch.

> Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
>
> Key: MAHOUT-1052
> URL: https://issues.apache.org/jira/browse/MAHOUT-1052
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.6
> Reporter: Elena Smirnova
> Assignee: Suneel Marthi
> Priority: Minor
> Labels: minhash
> Fix For: Backlog
> Attachments: MAHOUT-1052.patch
>
> Add a parameter to MinHash clustering that specifies the dimension of vector to hash (indexes or values). Current version of MinHash clustering only hashed values of vectors. Based on discussion on dev-mahout list, both of the use-cases are possible and frequently met in practice.
> Preserve backward compatibility with default dimension set to values. Add new unit tests.
[jira] [Commented] (MAHOUT-1052) Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
[ https://issues.apache.org/jira/browse/MAHOUT-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672834#comment-13672834 ]

Suneel Marthi commented on MAHOUT-1052:
---

This patch can be committed to trunk (as part of 0.8 release). Cleaned up the patch to be in sync with present codebase.
[jira] [Commented] (MAHOUT-1052) Add an option to MinHashDriver that specifies the dimension of vector to hash (indexes or values)
[ https://issues.apache.org/jira/browse/MAHOUT-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672183#comment-13672183 ]

Suneel Marthi commented on MAHOUT-1052:
---

I can get this patch in for the 0.8 release, but the quality of clusters is still questionable. Nevertheless this patch is still needed, I can open another JIRA for Minhash clustering itself (based on Broder's paper). Thoughts?
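The option requested in this issue (hash either the indexes or the values of a vector) can be illustrated with a minimal, self-contained sketch. It is not the code from MAHOUT-1052.patch and does not use Mahout classes: the class name, the `Dimension` enum, the linear hash family, and the treatment of a value as an integral item id are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not the code from MAHOUT-1052.patch.
final class MinHashDimensionSketch {

  /** Which part of each non-zero vector element is hashed (name is an assumption). */
  enum Dimension { INDEX, VALUE }

  private static final long PRIME = 2147483647L; // 2^31 - 1

  /** Computes one minimum per hash function h_i(x) = (a[i] * x + b[i]) mod PRIME. */
  static long[] signature(Map<Integer, Double> vector, Dimension dim, long[] a, long[] b) {
    long[] mins = new long[a.length];
    Arrays.fill(mins, Long.MAX_VALUE);
    for (Map.Entry<Integer, Double> e : vector.entrySet()) {
      // Assumption: in VALUE mode the element value is an integral item id.
      long item = (dim == Dimension.INDEX) ? e.getKey() : e.getValue().longValue();
      for (int i = 0; i < a.length; i++) {
        mins[i] = Math.min(mins[i], Math.floorMod(a[i] * item + b[i], PRIME));
      }
    }
    return mins;
  }

  public static void main(String[] args) {
    // Two sparse vectors with the same non-zero indexes but different values.
    Map<Integer, Double> v1 = new TreeMap<>();
    v1.put(1, 5.0);
    v1.put(3, 7.0);
    Map<Integer, Double> v2 = new TreeMap<>();
    v2.put(1, 9.0);
    v2.put(3, 2.0);
    long[] a = {3L, 11L};
    long[] b = {7L, 13L};
    System.out.println("index v1: " + Arrays.toString(signature(v1, Dimension.INDEX, a, b)));
    System.out.println("index v2: " + Arrays.toString(signature(v2, Dimension.INDEX, a, b)));
    System.out.println("value v1: " + Arrays.toString(signature(v1, Dimension.VALUE, a, b)));
    System.out.println("value v2: " + Arrays.toString(signature(v2, Dimension.VALUE, a, b)));
  }
}
```

In this sketch the two vectors share the same non-zero indexes, so their signatures match when indexes are hashed and differ when values are hashed. That is the distinction the requested option would expose, with values remaining the default for backward compatibility as the issue description asks.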