[ https://issues.apache.org/jira/browse/MAHOUT-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964856#comment-13964856 ]
Andrew Musselman edited comment on MAHOUT-1505 at 4/10/14 12:52 AM:
--------------------------------------------------------------------

I just realized top_terms is a list for a reason: it keeps the terms ranked in descending order of weight.

"top_terms":[{"weight":1.9162907600402832,"term":"red"},
{"weight":1.678889846801758,"term":"over"},
{"weight":1.678889846801758,"term":"lazy"},
{"weight":1.678889846801758,"term":"brown"},
{"weight":1.678889846801758,"term":"quick"},
{"weight":1.678889846801758,"term":"jumped"},
{"weight":1.5330326080322265,"term":"dogs"},
{"weight":1.0437751770019532,"term":"cat"},
{"weight":1.0437751770019532,"term":"fox"},
{"weight":0.46435117721557617,"term":"cap"}]

We can leave it that way or restructure it as two parallel arrays, where each term is paired with the weight at the same index:

"top_terms": {
    "terms": [ "red", "over", "lazy", "brown", "quick", "jumped", "dogs", "cat", "fox", "cap" ],
    "weights": [ 1.9162907600402832, 1.678889846801758, 1.678889846801758, 1.678889846801758, 1.678889846801758, 1.678889846801758, 1.5330326080322265, 1.0437751770019532, 1.0437751770019532, 0.46435117721557617 ]
}

> structure of clusterdump's JSON output
> --------------------------------------
>
>                 Key: MAHOUT-1505
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1505
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.9
>            Reporter: Terry Blankers
>            Assignee: Andrew Musselman
>              Labels: json
>
> Hi all, I'm working on some automated analysis of the clusterdump output
> using '-of = JSON'. While digging into the structure of the representation of
> the data I've noticed something that seems a little odd to me.
> In order to access the data for a particular cluster, the 'cluster', 'n', 'c'
> & 'r' values are all in one continuous string. For example:
> {noformat}
> {"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223,
> administration:0.011 r=[action:0.446, adherence:1.501,
> administration:0.306]}"}
> {noformat}
> This is also the case for the "point":
> {noformat}
> {"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904,
> harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138","weight":"1.0"}
> {noformat}
> This leads me to believe that the only way I can get to the individual data
> in these items is by string parsing.
> For JSON deserialization I would have expected to see something along the
> lines of:
> {noformat}
> {
>     "cluster":"VL-10515",
>     "n":5924,
>     "c":
>     [
>         {"action":0.023},
>         {"adherence":0.223},
>         {"administration":0.011}
>     ],
>     "r":
>     [
>         {"action":0.446},
>         {"adherence":1.501},
>         {"administration":0.306}
>     ]
> }
> {noformat}
> and:
> {noformat}
> {
>     "point": {
>         "body": 6.904,
>         "harm": 10.101
>     },
>     "vector_name": "013FFD34580BA31AECE5D75DE65478B3D691D138",
>     "weight": 1.0
> }
> {noformat}
> Andrew Musselman replied:
> {quote}
> Looks like a bug to me as well; I would have expected something similar to
> what you were expecting except maybe something like this which puts the "c"
> and "r" values in objects rather than arrays of single-element objects:
> {noformat}
> {
>     "cluster":"VL-10515",
>     "n":5924,
>     "c":
>     {
>         "action":0.023,
>         "adherence":0.223,
>         "administration":0.011
>     },
>     "r":
>     {
>         "action":0.446,
>         "adherence":1.501,
>         "administration":0.306
>     }
> }
> {noformat}
> {quote}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
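
To illustrate the string parsing that the issue description says a consumer is currently forced to do, here is a minimal Python sketch that turns the single-string "cluster" value into the object layout suggested in the reply. It is not Mahout code. The names CLUSTER_RE, parse_pairs and parse_cluster are hypothetical, and the regular expression is an assumption based only on the one sample quoted in the report, so the real clusterdump output may need a different pattern.

```python
import json
import re

# Hypothetical pattern inferred from the single sample in the report; the
# real clusterdump format may differ. The "]" that closes the c list is
# optional because the quoted sample omits it.
CLUSTER_RE = re.compile(
    r"^(?P<id>[^{]+)\{n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>[^\[\]]*?)\]?\s*"
    r"r=\[(?P<r>[^\[\]]*)\]\}$"
)


def parse_pairs(text):
    """Turn 'term:weight, term:weight' into a {term: weight} dict."""
    pairs = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so a term containing ':' stays intact.
        term, _, value = item.rpartition(":")
        pairs[term] = float(value)
    return pairs


def parse_cluster(raw):
    """Split the one-string "cluster" value into separate fields."""
    # Collapse line wraps and repeated spaces before matching.
    match = CLUSTER_RE.match(" ".join(raw.split()))
    if match is None:
        raise ValueError("unrecognized cluster string: %r" % raw)
    return {
        "cluster": match.group("id"),
        "n": int(match.group("n")),
        "c": parse_pairs(match.group("c")),
        "r": parse_pairs(match.group("r")),
    }


# The sample string from the issue description, copied as reported.
raw = ("VL-10515{n=5924 c=[action:0.023, adherence:0.223, "
       "administration:0.011 r=[action:0.446, adherence:1.501, "
       "administration:0.306]}")
print(json.dumps(parse_cluster(raw), indent=2))
```

Every consumer would have to carry a pattern like this and keep it in sync with the dump format, which is the fragility the report points at. Emitting "n", "c" and "r" as separate JSON fields would make this code unnecessary.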