[ https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392179#comment-14392179 ]
Saikat Kanjilal edited comment on MAHOUT-1539 at 4/2/15 5:38 AM:
-----------------------------------------------------------------

Enough with high-level concepts already :), so I took the next logical step. I'm not ready to include my code in the Mahout master repo yet, so I created my own repo and started a sample implementation there. You will see a first cut of LocalitySensitiveHashing implemented using Euclidean distance only; as a first step, the code at least compiles: https://github.com/skanjila/AffinityMatrix

TBD:
1) Implement unit and potentially integration tests to measure the performance of this
2) Once LSH is fully tested, implement the affinity matrix piece on top of it
3) Add more unit tests for the affinity matrix
4) Add CosineDistance and ManhattanDistance as configurable parameters
5) Incorporate the Spark API, specifically invoking the SparkContext and using the broadcast mechanism on the Spark cluster as appropriate
6) Merge this into my Mahout checkout branch

Some early feedback on the code would be greatly appreciated; watch for frequent changes coming to my repo.

> Implement affinity matrix computation in Mahout DSL
> ---------------------------------------------------
>
>                 Key: MAHOUT-1539
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1539
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.9
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>              Labels: DSL, scala, spark
>             Fix For: 0.10.1
>
>         Attachments: ComputeAffinities.scala
>
>
> This has the same goal as MAHOUT-1506, but rather than code the pairwise
> computations in MapReduce, this will be done in the Mahout DSL.
> An orthogonal issue is the format of the raw input (vectors, text, images,
> SequenceFiles), and how the user specifies the distance equation and any
> associated parameters.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
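For readers unfamiliar with LSH over Euclidean distance, a minimal sketch of one standard scheme (p-stable hashing via random Gaussian projections) is below. This is not the code from the AffinityMatrix repo; the names EuclideanLSH, numHashes, and bucketWidth are hypothetical, chosen only to illustrate the idea of bucketing nearby vectors together.

```scala
import scala.util.Random

// Hypothetical sketch of p-stable LSH for Euclidean distance.
// Each hash function uses a random Gaussian projection vector `a`
// and a random offset `b` drawn uniformly from [0, bucketWidth):
//   h(v) = floor((a . v + b) / bucketWidth)
// Vectors that are close in Euclidean distance tend to land in the
// same bucket with high probability.
class EuclideanLSH(dim: Int, numHashes: Int, bucketWidth: Double, seed: Long = 42L) {
  private val rnd = new Random(seed)
  private val projections = Array.fill(numHashes, dim)(rnd.nextGaussian())
  private val offsets = Array.fill(numHashes)(rnd.nextDouble() * bucketWidth)

  // Compute the signature (one bucket index per hash function) for a vector.
  def hash(v: Array[Double]): Array[Int] =
    projections.zip(offsets).map { case (a, b) =>
      val dot = a.zip(v).map { case (ai, vi) => ai * vi }.sum
      math.floor((dot + b) / bucketWidth).toInt
    }
}
```

Because the projections are fixed at construction time, the same instance always produces the same signature for the same vector, which is what makes the buckets usable as candidate groups for pairwise comparison.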
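Points 2 and 4 of the TBD list (an affinity matrix built on top of configurable distance measures) could look roughly like the following sketch. This is an assumption about the eventual design, not the repo's code: the Distance type alias and the Gaussian-kernel affinity A(i,j) = exp(-d(i,j)^2 / (2 * sigma^2)) are illustrative choices, and sigma is a hypothetical parameter.

```scala
// Hypothetical sketch: distance measures as configurable parameters,
// with an affinity matrix computed from whichever measure is plugged in.
object Distances {
  type Distance = (Array[Double], Array[Double]) => Double

  val euclidean: Distance = (x, y) =>
    math.sqrt(x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum)

  val manhattan: Distance = (x, y) =>
    x.zip(y).map { case (a, b) => math.abs(a - b) }.sum

  // Cosine distance = 1 - cosine similarity (assumes non-zero vectors).
  val cosine: Distance = (x, y) => {
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val nx = math.sqrt(x.map(a => a * a).sum)
    val ny = math.sqrt(y.map(a => a * a).sum)
    1.0 - dot / (nx * ny)
  }

  // Dense pairwise affinity via a Gaussian kernel on the chosen distance.
  def affinityMatrix(rows: Array[Array[Double]],
                     d: Distance,
                     sigma: Double = 1.0): Array[Array[Double]] =
    rows.map(x => rows.map(y => math.exp(-math.pow(d(x, y), 2) / (2 * sigma * sigma))))
}
```

In a real Spark version, the brute-force all-pairs loop would be replaced by comparisons only within LSH buckets, with shared state such as the hash projections distributed to executors via SparkContext broadcast variables, as the comment's point 5 suggests.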