[ https://issues.apache.org/jira/browse/MAHOUT-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter updated MAHOUT-1506:
---------------------------------------
    Fix Version/s: 1.0

> Creation of affinity matrix for spectral clustering
> ---------------------------------------------------
>
>                 Key: MAHOUT-1506
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1506
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 1.0
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>             Fix For: 1.0
>
>
> I wanted to get this discussion going, since I think this is a critical
> blocker for any kind of documentation update on spectral clustering (I can't
> update the documentation until the algorithm is useful, and it won't be
> useful until there's a built-in method for converting raw data to an affinity
> matrix).
> Namely, I'm wondering what kind of "raw" data this algorithm should be
> expecting (anything that k-means expects, basically?), and what data
> structures are associated with it. I've created a proof-of-concept for how
> pairwise affinity generation could work.
> https://github.com/magsol/Hadoop-Affinity
> It's a two-step job, but if the input data format provides 1) the total
> number of data points, and 2) each data point's index in the overall set,
> then the first job can be scrapped entirely and affinity generation will
> consist of 1 MR task.
> (discussions on Spark / H2O pending, of course)
> Mainly this is an engineering problem at this point. Let me know your
> thoughts and I'll get this done (I'm out of town the next 10 days for my
> wedding/honeymoon, will get to this on my return).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
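
For readers unfamiliar with the term, the pairwise affinity generation discussed in the issue above can be sketched on a single machine as follows. This is only an illustration and not the code in the linked Hadoop-Affinity repository: the issue does not say which affinity function is intended, so the Gaussian (RBF) kernel, the `sigma` bandwidth parameter, the zero diagonal, and the function name `pairwise_affinity` are all assumptions made here.

```python
import math


def pairwise_affinity(points, sigma=1.0):
    """Build a dense n x n affinity matrix from equal-length numeric vectors.

    Assumed (hypothetical) choices, not taken from the issue:
      - affinity(i, j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
      - the diagonal is left at 0.0, a common convention in spectral clustering
    """
    n = len(points)  # total number of data points, known up front
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):  # i is the point's index in the overall set
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            value = math.exp(-sq_dist / (2.0 * sigma ** 2))
            # The matrix is symmetric, so fill both entries at once.
            affinity[i][j] = value
            affinity[j][i] = value
    return affinity


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
A = pairwise_affinity(points, sigma=1.0)
```

The sketch gets `n` and each point's index `i` for free because the data is an in-memory list. Those are the same two pieces of information the issue says the input format would need to supply for the distributed version to run as a single MR task.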