[jira] [Updated] (MAHOUT-996) Support NamedVectors in arff.vector job by convention

Suneel Marthi (JIRA) Sun, 26 Jan 2014 18:00:50 -0800

     [ https://issues.apache.org/jira/browse/MAHOUT-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Suneel Marthi updated MAHOUT-996:
---------------------------------

    Fix Version/s:     (was: Backlog)
                   0.9
         Assignee: Suneel Marthi  (was: Sebastian Schelter)

The recent fix for MAHOUT-1410 addresses this issue, hence marking this as 
'Resolved'.

> Support NamedVectors in arff.vector job by convention
> -----------------------------------------------------
>
>                 Key: MAHOUT-996
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-996
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Integration
>    Affects Versions: 0.7
>         Environment: OS X
>            Reporter: Andrew Harbick
>            Assignee: Suneel Marthi
>            Priority: Minor
>             Fix For: 0.9
>
>         Attachments: forillustration.patch
>
>
> If you do something like:
>
>   MAHOUT_LOCAL=1 $MAHOUT_HOME/bin/mahout arff.vector --input $PWD/file.arff \
>     --dictOut file.bindings --output $PWD
>   MAHOUT_LOCAL=1 $MAHOUT_HOME/bin/mahout kmeans --input $PWD/file.arff.mvc \
>     --clusters $PWD/output/file.clusters --output $PWD/output --numClusters 3 \
>     --maxIter 1000 --clustering
>   MAHOUT_LOCAL=1 $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir \
>     $PWD/output/clusters-*-final --pointsDir $PWD/output/clusteredPoints \
>     --output $PWD/output/clusteranalyze.txt
>
> Currently you don't get any information out of clusterdump that helps you 
> identify which element from your source data is in which cluster.
> I wrote a patch to illustrate using an attribute (by convention) from 
> the ARFF file as the name for a NamedVector.  The result of clusterdump is 
> much easier to use:
> VL-18589{n=6165 c=[1.376, 879.144, 3.947, 10.691, 0.874, 1.266, 16.644, 
> 9.689, 2.207, 1.855] r=[0.484, 160.571, 1.959, 6.176, 0.551, 0.442, 34.125, 
> 7.953, 1.988, 0.352]}
>         Weight : [props - optional]:  Point:
>         1.0: 4ee342afd04516354c000140 = [1.000, 597.000, 7.000, 7.000, 1.000, 
> 1.000, 11.000, 12.000, 6.000, 2.000]
>         1.0: 4ee49257eb8b3e28c60025a2 = [1.000, 597.000, 1.000, 7.000, 1.000, 
> 1.000, 8.000, 17.000, 6.000, 2.000]
>         1.0: 4ee60430ab2c714006000937 = [1.000, 597.000, 2.000, 9.000, 1.000, 
> 1.000, 21.000, 21.000, 2.000, 2.000]
>         1.0: 4ef2d580ab2c71231b0019ae = [0:1.000, 1:598.000, 2:5.000, 
> 3:3.000, 5:1.000, 6:4.000, 9:1.000]
>         1.0: 4eda14a30b5d3e655b0043e9 = [1.000, 599.000, 7.000, 8.000, 2.000, 
> 1.000, 15.000, 7.000, 3.000, 2.000]
>         1.0: 4edba62deb8b3e27e6000614 = [0:1.000, 1:599.000, 2:1.000, 
> 3:12.000, 4:1.000, 5:1.000, 6:3.000, 8:3.000, 9:2.000]
>         1.0: 4ede1ea6eb8b3e1f330050f4 = [0:1.000, 1:599.000, 2:3.000, 
> 3:9.000, 4:1.000, 5:1.000, 6:14.000, 7:20.000, 9:2.000]
> ...
> I haven't done serious Java in 15 years, so the attached patch is just 
> meant to illustrate the idea...
> Thanks,
> Andy



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
