Hi Sean,
I have used equivalent() everywhere, and it worked perfectly
fine for me. Yes, as Jake said, this probably needs documentation
rather than a bug fix.
Thanks
Pallavi
On 03/16/2010 02:17 AM, Jake Mannix (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845519#action_12845519 ]
Jake Mannix commented on MAHOUT-337:
------------------------------------
Depending on how Pallavi was seeing this come up (surely not by calling
String.equalsIgnoreCase() on the JSON strings? For one thing, the order of the
key-value pairs in the sparse outputs is not guaranteed), there are a couple of solutions.
The "right" way to tell whether two vectors are the same is to call
AbstractVector.equivalent(Vector). We should certainly make that clearer in the
documentation.
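To make the ordering problem concrete, here is a minimal sketch of why string comparison of serialized sparse vectors is fragile while element-wise comparison is not. Everything in it is hypothetical: the JSON shape, `JsonOrderDemo`, `toJson` and `sameElements` are illustrative stand-ins, not Mahout's format or API, and a `LinkedHashMap` plays the role of any serializer whose key order depends on how the vector was built.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; these helpers are NOT Mahout APIs.
class JsonOrderDemo {
    public static void main(String[] args) {
        // The same logical vector of size 5, with entries inserted in different orders.
        Map<Integer, Double> a = new LinkedHashMap<>();
        a.put(0, 1.0);
        a.put(3, 2.5);
        Map<Integer, Double> b = new LinkedHashMap<>();
        b.put(3, 2.5);
        b.put(0, 1.0);

        System.out.println(toJson(5, a)); // {"size":5,"values":{"0":1.0,"3":2.5}}
        System.out.println(toJson(5, b)); // {"size":5,"values":{"3":2.5,"0":1.0}}
        System.out.println(toJson(5, a).equalsIgnoreCase(toJson(5, b))); // false
        System.out.println(sameElements(5, a, b));                        // true
    }

    // Serializes the entries in the map's iteration order (hypothetical format).
    static String toJson(int size, Map<Integer, Double> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"size\":").append(size).append(",\"values\":{");
        boolean first = true;
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            if (!first) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append("}}").toString();
    }

    // Compares element by element over the whole index range, so storage or
    // serialization order is irrelevant. Missing entries count as 0.0.
    static boolean sameElements(int size, Map<Integer, Double> x, Map<Integer, Double> y) {
        for (int i = 0; i < size; i++) {
            double xi = x.getOrDefault(i, 0.0); // unbox before comparing values
            double yi = y.getOrDefault(i, 0.0);
            if (xi != yi) return false;
        }
        return true;
    }
}
```

Unboxing to `double` before the `!=` matters: comparing two `Double` objects with `!=` would compare references, not values.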
Actually, now that I read the implementation of equals() in the vector classes,
they're *horribly* inefficient (they don't take sparsity into account at all), and I
really hope equals() isn't being used anywhere performance matters. For the
record, though: both equals() and equivalent() correctly ignore the cached value
when comparing vectors.
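A sparsity-aware comparison only has to visit the stored entries. The sketch below assumes a hypothetical `NnzVec` (not Mahout's class) that keeps only non-zero entries in a `HashMap`; under that invariant, equal sizes, equal non-zero counts, and a match for every entry of one vector imply the vectors are identical, so the cost is O(non-zeros) rather than O(size). Like the behavior Jake describes, it deliberately ignores the cached length-squared field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; NnzVec is NOT Mahout's vector class.
class SparseEqualityDemo {
    public static void main(String[] args) {
        NnzVec a = new NnzVec(1_000_000);
        a.set(7, 1.5);
        a.set(42, -2.0);
        NnzVec b = new NnzVec(1_000_000);
        b.set(42, -2.0);
        b.set(7, 1.5);
        a.lengthSquared(); // populates a's cache only; b's stays at -1

        System.out.println(NnzVec.sameNonZeros(a, b)); // true
        b.set(42, 0.0);
        System.out.println(NnzVec.sameNonZeros(a, b)); // false
    }
}

class NnzVec {
    final int size;
    // Invariant maintained by set(): only non-zero entries are stored.
    final Map<Integer, Double> nonZeros = new HashMap<>();
    // Cached |v|^2; -1 means "not computed". Ignored by sameNonZeros().
    double cachedLengthSquared = -1.0;

    NnzVec(int size) { this.size = size; }

    void set(int index, double value) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        if (value == 0.0) nonZeros.remove(index); else nonZeros.put(index, value);
        cachedLengthSquared = -1.0; // any mutation invalidates the cache
    }

    double lengthSquared() {
        if (cachedLengthSquared < 0) {
            double sum = 0.0;
            for (double x : nonZeros.values()) sum += x * x;
            cachedLengthSquared = sum;
        }
        return cachedLengthSquared;
    }

    // Visits only the stored entries of x: O(non-zeros) instead of O(size).
    static boolean sameNonZeros(NnzVec x, NnzVec y) {
        if (x.size != y.size || x.nonZeros.size() != y.nonZeros.size()) return false;
        for (Map.Entry<Integer, Double> e : x.nonZeros.entrySet()) {
            Double other = y.nonZeros.get(e.getKey());
            if (other == null || other.doubleValue() != e.getValue().doubleValue()) return false;
        }
        return true;
    }
}
```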
Should the string form of the vector include the cached value? It's actually useful
for debugging, as I think I mentioned in the list thread: if you expect this value
not to change, or if you need a quick "mental checksum" that this is the vector
you're expecting to see, that one double value is a good signature of the vector.
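The "mental checksum" idea can be sketched as a lazily cached length squared that is invalidated on every mutation and shown in a debug-only string. This is a hypothetical illustration: `CachedVec`, `getLengthSquared` and `debugString` are assumed names, not Mahout's API. A `TreeMap` keeps the debug output in index order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; CachedVec is NOT Mahout code.
class CachedLengthDemo {
    public static void main(String[] args) {
        CachedVec v = new CachedVec();
        v.set(1, 3.0);
        v.set(2, 4.0);
        System.out.println(v.getLengthSquared()); // 25.0
        System.out.println(v.debugString());      // {1:3.0,2:4.0} |v|^2=25.0
        v.set(2, 0.0);
        System.out.println(v.getLengthSquared()); // 9.0
    }
}

class CachedVec {
    // Only non-zero entries, kept sorted by index.
    final TreeMap<Integer, Double> nonZeros = new TreeMap<>();
    // -1 means "not computed yet"; any mutation resets it.
    private double lengthSquared = -1.0;

    void set(int index, double value) {
        if (value == 0.0) nonZeros.remove(index); else nonZeros.put(index, value);
        lengthSquared = -1.0; // invalidate the cache
    }

    double getLengthSquared() {
        if (lengthSquared < 0) {
            double sum = 0.0;
            for (double x : nonZeros.values()) sum += x * x;
            lengthSquared = sum;
        }
        return lengthSquared;
    }

    // Debug form: appends |v|^2 as a quick signature of the vector's contents.
    String debugString() {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<Integer, Double> e : nonZeros.entrySet()) {
            if (!first) sb.append(',');
            sb.append(e.getKey()).append(':').append(e.getValue());
            first = false;
        }
        return sb.append("} |v|^2=").append(getLengthSquared()).toString();
    }
}
```

Keeping the checksum in a debug string rather than in the serialized form is one way to get the debugging benefit without making it part of the persisted state.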
I guess I'm in favor of "not-a-bug" and just documenting it. But I'm open to
further thinking on how to make it easy to keep track of this kind of thing.
Don't serialize cached length squared in JSON vector representation
-------------------------------------------------------------------
Key: MAHOUT-337
URL: https://issues.apache.org/jira/browse/MAHOUT-337
Project: Mahout
Issue Type: Bug
Components: Math
Affects Versions: 0.3
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor
Fix For: 0.4
The cached length-squared field in vectors should be marked transient so that
it is not part of the JSON serialized state.
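A minimal sketch of the proposed fix, under stated assumptions: `DenseVec`, `TransientDemo`, `serializedFieldNames` and `roundTrip` are hypothetical names, not Mahout's classes. `serializedFieldNames` mimics a reflection-based serializer that skips `static` and `transient` fields, which is Gson's default exclusion rule. The round trip uses standard Java serialization, which also restores transient fields to their JVM defaults. Because a restored `double` cache comes back as 0.0, the sketch tracks validity with a separate transient `boolean` whose default (`false`) means "stale".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Mahout's implementation.
class TransientDemo {
    public static void main(String[] args) throws Exception {
        DenseVec v = new DenseVec(new double[] {3.0, 4.0});
        System.out.println(v.lengthSquared());                      // 25.0 (now cached)
        System.out.println(serializedFieldNames(DenseVec.class));   // [values]
        System.out.println(roundTrip(v).lengthSquared());           // 25.0 (recomputed)
    }

    // Names of the fields a serializer would write if it skips static and
    // transient fields, as Gson does by default.
    static List<String> serializedFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isTransient(mods) || Modifier.isStatic(mods)) continue;
            names.add(f.getName());
        }
        return names;
    }

    // Java serialization round trip; transient fields come back as defaults.
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }
}

class DenseVec implements Serializable {
    private static final long serialVersionUID = 1L;
    private final double[] values;
    // Cached state, excluded from the serialized form.
    private transient double lengthSquared;
    private transient boolean lengthSquaredValid; // false after deserialization

    DenseVec(double[] values) { this.values = values.clone(); }

    double lengthSquared() {
        if (!lengthSquaredValid) {
            double sum = 0.0;
            for (double x : values) sum += x * x;
            lengthSquared = sum;
            lengthSquaredValid = true;
        }
        return lengthSquared;
    }
}
```

Using a sentinel such as -1 on the `double` alone would be unsafe here: after deserialization it would read 0.0 and be mistaken for a valid cached value.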