[
https://issues.apache.org/jira/browse/MAHOUT-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964856#comment-13964856
]
Andrew Musselman edited comment on MAHOUT-1505 at 4/10/14 12:52 AM:
--------------------------------------------------------------------
I just realized top_terms is a list for a reason, which is to have the terms
ranked in descending order of weight.
"top_terms":[{"weight":1.9162907600402832,"term":"red"},
{"weight":1.678889846801758,"term":"over"},
{"weight":1.678889846801758,"term":"lazy"},
{"weight":1.678889846801758,"term":"brown"},
{"weight":1.678889846801758,"term":"quick"},
{"weight":1.678889846801758,"term":"jumped"},
{"weight":1.5330326080322265,"term":"dogs"},
{"weight":1.0437751770019532,"term":"cat"},
{"weight":1.0437751770019532,"term":"fox"},
{"weight":0.46435117721557617,"term":"cap"}]
We can leave it that way or restructure it like this:
"top_terms": { "terms": [ "red", "over", "lazy", "brown", "quick", "jumped",
"dogs", "cat", "fox", "cap" ], "weights": [ 1.9162907600402832, 1.678889846801758,
1.678889846801758, 1.678889846801758, 1.678889846801758, 1.678889846801758,
1.5330326080322265, 1.0437751770019532, 1.0437751770019532, 0.46435117721557617
] }
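For anyone consuming this output, here is a minimal Python sketch of moving between the two layouts. It is illustrative only and not Mahout code: the helper names `to_parallel` and `to_ranked` are made up for this example, and the data is cut down to three of the terms above. Both layouts preserve rank because JSON arrays are ordered, but the parallel-array form only works if the two arrays stay the same length, so the sketch checks that.

```python
import json

# Ranked list form (abbreviated to three of the terms shown above).
ranked = json.loads(
    '[{"weight": 1.9162907600402832, "term": "red"},'
    ' {"weight": 1.678889846801758, "term": "over"},'
    ' {"weight": 0.46435117721557617, "term": "cap"}]'
)


def to_parallel(top_terms):
    """Convert a list of {"term", "weight"} objects to parallel arrays."""
    return {
        "terms": [t["term"] for t in top_terms],
        "weights": [t["weight"] for t in top_terms],
    }


def to_ranked(parallel):
    """Rebuild the ranked list; rejects arrays of different lengths."""
    terms, weights = parallel["terms"], parallel["weights"]
    if len(terms) != len(weights):
        raise ValueError("terms and weights must have the same length")
    return [{"weight": w, "term": t} for t, w in zip(terms, weights)]


parallel = to_parallel(ranked)
```

The list-of-objects form keeps each term next to its weight, so it cannot drift out of alignment; the parallel form is more compact but pushes that check onto the reader.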
> structure of clusterdump's JSON output
> --------------------------------------
>
> Key: MAHOUT-1505
> URL: https://issues.apache.org/jira/browse/MAHOUT-1505
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.9
> Reporter: Terry Blankers
> Assignee: Andrew Musselman
> Labels: json
>
> Hi all, I'm working on some automated analysis of the clusterdump output
> using '-of = JSON'. While digging into the structure of the representation of
> the data I've noticed something that seems a little odd to me.
> In order to access the data for a particular cluster, the 'cluster', 'n', 'c'
> & 'r' values are all in one continuous string. For example:
> {noformat}
> {"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223,
> administration:0.011] r=[action:0.446, adherence:1.501,
> administration:0.306]}"}
> {noformat}
> This is also the case for the "point":
> {noformat}
> {"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904,
> harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138","weight":"1.0"}
> {noformat}
> This leads me to believe that the only way I can get to the individual data
> in these items is by string parsing. For JSON deserialization I would have
> expected to see something along the lines of:
> {noformat}
> {
> "cluster":"VL-10515",
> "n":5924,
> "c":
> [
> {"action":0.023},
> {"adherence":0.223},
> {"administration":0.011}
> ],
> "r":
> [
> {"action":0.446},
> {"adherence":1.501},
> {"administration":0.306}
> ]
> }
> {noformat}
> and:
> {noformat}
> {
> "point": {
> "body": 6.904,
> "harm": 10.101
> },
> "vector_name": "013FFD34580BA31AECE5D75DE65478B3D691D138",
> "weight": 1.0
> }
> {noformat}
> Andrew Musselman replied:
> {quote}
> Looks like a bug to me as well; I would have expected something similar to
> what you were expecting except maybe something like this which puts the "c"
> and "r" values in objects rather than arrays of single-element objects:
> {noformat}
> {
> "cluster":"VL-10515",
> "n":5924,
> "c":
> {
> "action":0.023,
> "adherence":0.223,
> "administration":0.011
> },
> "r":
> {
> "action":0.446,
> "adherence":1.501,
> "administration":0.306
> }
> }
> {noformat}
> {quote}
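Until the output changes, a consumer has to do the string parsing described in the issue. Below is a rough Python sketch of that workaround, which turns the current "cluster" string into the object layout proposed in the reply. It is not part of Mahout: `parse_cluster`, `parse_pairs` and the regular expression are hypothetical, and the pattern assumes the layout `ID{n=<count> c=[term:weight, ...] r=[term:weight, ...]}` with each of "c" and "r" closed by its own bracket.

```python
import re

# Assumed layout: ID{n=<count> c=[term:weight, ...] r=[term:weight, ...]}
CLUSTER_RE = re.compile(
    r"^(?P<id>[^{]+)\{n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\}$"
)


def parse_pairs(body):
    """Turn 'action:0.023, adherence:0.223' into {'action': 0.023, ...}."""
    pairs = {}
    for item in body.split(","):
        item = item.strip()
        if item:
            # Split on the last colon in case a term itself contains one.
            term, _, value = item.rpartition(":")
            pairs[term] = float(value)
    return pairs


def parse_cluster(text):
    """Parse the single-string cluster value into a nested dict."""
    m = CLUSTER_RE.match(text)
    if m is None:
        raise ValueError("unrecognized cluster string: %r" % text)
    return {
        "cluster": m.group("id"),
        "n": int(m.group("n")),
        "c": parse_pairs(m.group("c")),
        "r": parse_pairs(m.group("r")),
    }


legacy = (
    "VL-10515{n=5924 c=[action:0.023, adherence:0.223, administration:0.011] "
    "r=[action:0.446, adherence:1.501, administration:0.306]}"
)
cluster = parse_cluster(legacy)
```

With "c" and "r" as objects, a single weight is a direct lookup such as `cluster["c"]["adherence"]`; with arrays of single-element objects the reader would have to scan the array for the matching key.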