[ 
https://issues.apache.org/jira/browse/MAHOUT-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974008#comment-13974008
 ] 

Shannon Quinn commented on MAHOUT-1506:
---------------------------------------

That's fine. This still needs to get done, but I'll open another ticket 
that specifies the Scala DSL instead.

> Creation of affinity matrix for spectral clustering
> ---------------------------------------------------
>
>                 Key: MAHOUT-1506
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1506
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 1.0
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>             Fix For: 1.0
>
>
> I wanted to get this discussion going, since I think this is a critical 
> blocker for any kind of documentation update on spectral clustering (I can't 
> update the documentation until the algorithm is useful, and it won't be 
> useful until there's a built-in method for converting raw data to an affinity 
> matrix).
> Namely, I'm wondering what kind of "raw" data this algorithm should 
> expect (anything that k-means expects, basically?), and what data 
> structures are associated with it. I've created a proof-of-concept for how 
> pairwise affinity generation could work.
> https://github.com/magsol/Hadoop-Affinity
> It's a two-step job, but if the input data format provides 1) the total 
> number of data points, and 2) each data point's index in the overall set, 
> then the first job can be scrapped entirely and affinity generation will 
> consist of a single MapReduce task.
> (Discussions on Spark / H2O pending, of course.)
> Mainly this is an engineering problem at this point. Let me know your 
> thoughts and I'll get this done (I'm out of town the next 10 days for my 
> wedding/honeymoon, will get to this on my return).
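
To make the single-job idea in the description concrete, here is a rough, 
in-memory sketch of the map and reduce steps. It is not the code from the 
Hadoop-Affinity repository. The function names (map_point, reduce_pair, 
affinity_matrix), the Gaussian kernel, the sigma parameter, and the zero 
diagonal are all assumptions made for illustration; the ticket does not say 
which affinity function to use.

```python
import math
from collections import defaultdict


def map_point(index, vector, n_points):
    """Map step (hypothetical): emit the point once per pair it belongs to.

    Needs only the point's own index and the total point count, which is
    what would let the first job be skipped.
    """
    for other in range(n_points):
        if other != index:
            key = (min(index, other), max(index, other))
            yield key, (index, vector)


def reduce_pair(key, values, sigma=1.0):
    """Reduce step (hypothetical): both points of a pair arrive together.

    Assumption: Gaussian (RBF) affinity on squared Euclidean distance.
    """
    (_, a), (_, b) = sorted(values, key=lambda item: item[0])
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return key, math.exp(-sq_dist / (2.0 * sigma ** 2))


def affinity_matrix(points, sigma=1.0):
    """Simulate the shuffle in memory and assemble a symmetric matrix."""
    n = len(points)
    grouped = defaultdict(list)
    for i, vec in enumerate(points):
        for key, value in map_point(i, vec, n):
            grouped[key].append(value)

    # Assumption: the diagonal (self-affinity) is left at zero.
    matrix = [[0.0] * n for _ in range(n)]
    for key, values in grouped.items():
        (i, j), affinity = reduce_pair(key, values, sigma)
        matrix[i][j] = matrix[j][i] = affinity
    return matrix


if __name__ == "__main__":
    for row in affinity_matrix([(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)], sigma=5.0):
        print(row)
```

Note that every point is emitted N-1 times, so the shuffle volume grows 
quadratically with the number of points. That cost is inherent to a full 
pairwise computation rather than specific to this sketch.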



--
This message was sent by Atlassian JIRA
(v6.2#6252)
