[ 
https://issues.apache.org/jira/browse/MAHOUT-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845434#action_12845434
 ] 

Jake Mannix commented on MAHOUT-337:
------------------------------------

So a question about this: do we really want to do this?  The original 
discussion on the mailing list was about being able to debug by string 
comparison of the JSON form, effectively, and obviously in general we should be 
encouraging users to do things like "if (myVector.equivalent(otherVector))" as 
a way of checking this. 

The reason I ask is that in certain M/R jobs, if lengthSquared is ever 
calculated and the vector is treated as immutable from then on, then letting 
the serialized form keep lengthSquared means the value is carried along and 
not recomputed in subsequent Reduce or further M/R steps after it's computed in 
some initial Map.
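
As a rough illustration of that trade-off, here is a minimal sketch. It is 
not Mahout code: CachingVector is a hypothetical stand-in class, and plain 
Java serialization stands in for whatever serialized form is passed between 
M/R steps. Because the cache field is not marked transient, a value computed 
before serialization is still available after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a vector with a cached length squared (not a Mahout class).
class CachingVector implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double[] values;

    // Deliberately NOT transient, so it is part of the serialized state.
    // null means "not computed yet".
    private Double lengthSquared;

    CachingVector(double... values) {
        this.values = values.clone();
    }

    // Computes the sum of squares on first use and caches it afterwards.
    double getLengthSquared() {
        if (lengthSquared == null) {
            double sum = 0.0;
            for (double v : values) {
                sum += v * v;
            }
            lengthSquared = sum;
        }
        return lengthSquared;
    }

    boolean isLengthSquaredCached() {
        return lengthSquared != null;
    }

    // Serializes and deserializes the vector in memory, as an assumed
    // stand-in for handing it from one M/R step to the next.
    static CachingVector roundTrip(CachingVector vector) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(vector);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CachingVector) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

If the field were marked transient instead, the copy returned by roundTrip 
would come back with the cache empty and would have to recompute the value 
on first use.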

Is there another way to "fix" the issue, so that two vectors which are 
otherwise "equal", one with the value cached and the other without, are 
actually compared properly (either by eye in JSON form, or otherwise)?  Maybe 
documentation can take care of this?
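
One possible shape for such a fix, sketched under assumptions: keep the 
cache wherever it is useful, but leave it out of equality and out of the 
form used for debugging by eye. ComparableVector and toDebugString below are 
hypothetical names for illustration, not part of Mahout's API.

```java
import java.util.Arrays;

// Hypothetical vector whose comparisons ignore the cached length squared.
class ComparableVector {
    private final double[] values;

    // Cache only; a negative value means "not computed yet".
    private double lengthSquared = -1.0;

    ComparableVector(double... values) {
        this.values = values.clone();
    }

    // Computes the sum of squares on first use and caches it afterwards.
    double getLengthSquared() {
        if (lengthSquared < 0.0) {
            double sum = 0.0;
            for (double v : values) {
                sum += v * v;
            }
            lengthSquared = sum;
        }
        return lengthSquared;
    }

    // Equality depends on the element values only, never on the cache.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ComparableVector)) {
            return false;
        }
        return Arrays.equals(values, ((ComparableVector) other).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }

    // Debug form for string comparison; it omits the cache on purpose.
    String toDebugString() {
        return Arrays.toString(values);
    }
}
```

With this arrangement, a vector that has had getLengthSquared() called and 
one that has not still compare as equal and print identically, so whether 
the cache is serialized becomes a separate decision.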

> Don't serialize cached length squared in JSON vector representation
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-337
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-337
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.4
>
>
> The cached length-squared field in vectors should be marked transient so that 
> it is not part of the JSON serialized state. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
