[ https://issues.apache.org/jira/browse/MAHOUT-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387984#comment-14387984 ]
Saikat Kanjilal edited comment on MAHOUT-1539 at 3/31/15 4:47 AM:
------------------------------------------------------------------

So I did some more research and have some questions:
1) Are we going to deal with images or text data to start?
2) What do we really mean by a data point? In my mind it's represented by an (x,y) pair.
3) I think the similarity measure used for locality sensitive hashing should be configurable; namely, we should be able to plug in Jaccard, Euclidean, or cosine similarity as the function to be computed.

I have a sample locality sensitive hashing scheme coded up in Scala, but I want further clarification on the above before I proceed.

Thanks for your help


was (Author: kanjilal):
So I did some more research and have some questions; I have added the questions to JIRA as well:
1) Are we going to deal with images or text data to start?
2) What do we really mean by a data point? In my mind it's represented by an (x,y) pair.
3) I think the similarity measure used for locality sensitive hashing should be configurable; namely, we should be able to plug in Jaccard, Euclidean, or cosine similarity as the function to be computed.

I have a sample locality sensitive hashing scheme coded up in Scala, but I want further clarification on the above before I proceed.

Thanks for your help

> Implement affinity matrix computation in Mahout DSL
> ---------------------------------------------------
>
>                 Key: MAHOUT-1539
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1539
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.9
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>              Labels: DSL, scala, spark
>             Fix For: 0.10.1
>
>         Attachments: ComputeAffinities.scala
>
>
> This has the same goal as MAHOUT-1506, but rather than code the pairwise
> computations in MapReduce, this will be done in the Mahout DSL.
> An orthogonal issue is the format of the raw input (vectors, text, images,
> SequenceFiles), and how the user specifies the distance equation and any
> associated parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
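
To make question 3 in the comment concrete, here is one way a pluggable similarity measure might look in plain Scala. This is only a sketch under assumptions: it is not the attached ComputeAffinities.scala, it does not use the Mahout DSL, and it does not implement locality sensitive hashing (it computes every pair directly). The names `SimilaritySketch`, `Vec`, `Similarity`, and `affinityMatrix` are hypothetical, a data point is assumed to be a dense array of doubles, and the 1 / (1 + d) mapping from Euclidean distance to a similarity is one arbitrary choice among several.

```scala
// Hypothetical sketch: none of these names are Mahout API.
object SimilaritySketch {
  // Assumption: a data point is a dense vector of doubles.
  type Vec = Array[Double]
  // A similarity measure is any function from two points to a score.
  type Similarity = (Vec, Vec) => Double

  private def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  private def norm(a: Vec): Double = math.sqrt(dot(a, a))

  // Cosine similarity; defined here as 0 when either vector is all zeros.
  val cosine: Similarity = (a, b) => {
    val denom = norm(a) * norm(b)
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }

  // Euclidean distance d mapped to a similarity in (0, 1] via 1 / (1 + d).
  val euclidean: Similarity = (a, b) => {
    val d = math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
    1.0 / (1.0 + d)
  }

  // Jaccard similarity over the sets of indices that hold non-zero values.
  val jaccard: Similarity = (a, b) => {
    val sa = a.indices.filter(i => a(i) != 0.0).toSet
    val sb = b.indices.filter(i => b(i) != 0.0).toSet
    val unionSize = (sa union sb).size
    if (unionSize == 0) 0.0 else (sa intersect sb).size.toDouble / unionSize
  }

  // Dense pairwise affinity matrix: entry (i, j) is sim(points(i), points(j)).
  def affinityMatrix(points: Seq[Vec], sim: Similarity): Array[Array[Double]] =
    Array.tabulate(points.length, points.length)((i, j) => sim(points(i), points(j)))
}
```

With this shape, the caller picks the measure by passing a function, for example `SimilaritySketch.affinityMatrix(points, SimilaritySketch.jaccard)`, and any parameters a measure needs can be captured in a closure rather than added to the signature.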