Hi Sean,

I have used equivalent() everywhere and it has worked perfectly fine for me. Yes, as Jake said, this probably calls for documentation rather than a bug fix.

Thanks
Pallavi
On 03/16/2010 02:17 AM, Jake Mannix (JIRA) wrote:
     [ 
https://issues.apache.org/jira/browse/MAHOUT-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845519#action_12845519
 ]

Jake Mannix commented on MAHOUT-337:
------------------------------------

Depending on how Pallavi was seeing this come up (surely not by doing 
String.equalsIgnoreCase() on the JSON strings?  For one thing, ordering is not guaranteed on the 
key-value pairs of the sparse outputs), there are a couple of solutions.  
The "right" way to tell if two vectors are the same is to do 
AbstractVector.equivalent(Vector).  We should make that more clear in the documentation, 
certainly.
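
To illustrate the point about ordering, here is a minimal, self-contained sketch. It does not use Mahout classes: a plain Map stands in for a sparse vector, and render() and sameContents() are hypothetical helpers invented for this example, not AbstractVector.equivalent() itself. It only shows why comparing string forms can fail where an element-wise comparison succeeds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorCompareSketch {

    /** Renders index:value pairs in iteration order, as a naive string form would. */
    public static String render(Map<Integer, Double> entries) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.append('}').toString();
    }

    /** Element-wise comparison; an index missing from a map counts as 0.0. */
    public static boolean sameContents(Map<Integer, Double> a, Map<Integer, Double> b) {
        return coveredBy(a, b) && coveredBy(b, a);
    }

    // True if every entry of x has the same value in y (missing means 0.0).
    private static boolean coveredBy(Map<Integer, Double> x, Map<Integer, Double> y) {
        for (Map.Entry<Integer, Double> e : x.entrySet()) {
            double left = e.getValue();
            double right = y.getOrDefault(e.getKey(), 0.0);
            if (left != right) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same non-zero entries, inserted in a different order.
        Map<Integer, Double> a = new LinkedHashMap<>();
        a.put(3, 1.5);
        a.put(7, 2.0);
        Map<Integer, Double> b = new LinkedHashMap<>();
        b.put(7, 2.0);
        b.put(3, 1.5);

        System.out.println(render(a).equals(render(b))); // false
        System.out.println(sameContents(a, b));          // true
    }
}
```

The string forms differ only because the entries were written out in a different order, which is the situation described above for the sparse outputs.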

Actually, now that I have read the implementations of equals() in the vector classes, 
they're *horribly* inefficient (they don't take sparsity into account at all), and I 
really hope they're not being used anywhere that performance matters. For the 
record, though: equals() and equivalent() correctly ignore the cached value when comparing 
vectors.

Should the string form of the vector have the cached value?  It's actually useful 
knowledge to debug with, as I think I mentioned in the list thread - if you expect that 
this value should not be changing, or if you need a quick "mental checksum" 
that this is the vector you're expecting to see, keeping track of that one double value 
is a good signature of what the vector is.

I guess I'm in favor of "not-a-bug" and just documenting it.  But I'm open to 
further thinking on how to make it easy to keep track of this kind of thing.

Don't serialize cached length squared in JSON vector representation
-------------------------------------------------------------------

                 Key: MAHOUT-337
                 URL: https://issues.apache.org/jira/browse/MAHOUT-337
             Project: Mahout
          Issue Type: Bug
          Components: Math
    Affects Versions: 0.3
            Reporter: Sean Owen
            Assignee: Sean Owen
            Priority: Minor
             Fix For: 0.4


The cached length-squared field in vectors should be marked transient so that 
it is not part of the JSON serialized state.
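
As a rough sketch of what the proposal means in practice: the example below uses a hypothetical TinyVector class and a hand-rolled, reflection-based toJsonLike() method, both invented for illustration and neither taken from Mahout. It assumes a serializer that skips fields marked transient, which is the default behaviour of some JSON libraries such as Gson.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class TransientFieldSketch {

    /** Hypothetical vector-like class; all names here are illustrative. */
    public static class TinyVector {
        private final double[] values;
        // Derived cache; transient so that serializers honoring the modifier skip it.
        private transient double lengthSquared = -1.0;

        public TinyVector(double... values) {
            this.values = values;
        }

        public double getLengthSquared() {
            if (lengthSquared < 0.0) {
                double sum = 0.0;
                for (double v : values) {
                    sum += v * v;
                }
                lengthSquared = sum;
            }
            return lengthSquared;
        }
    }

    /** Toy serializer: writes declared fields, skipping transient and static ones. */
    public static String toJsonLike(Object o) {
        StringBuilder sb = new StringBuilder("{");
        for (Field f : o.getClass().getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isTransient(mods) || Modifier.isStatic(mods) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(o);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            String rendered = (value instanceof double[])
                    ? Arrays.toString((double[]) value)
                    : String.valueOf(value);
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(f.getName()).append("\":").append(rendered);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        TinyVector v = new TinyVector(3.0, 4.0);
        String before = toJsonLike(v);
        v.getLengthSquared();            // populates the cache
        String after = toJsonLike(v);
        System.out.println(after);       // {"values":[3.0, 4.0]}
        System.out.println(before.equals(after)); // true
    }
}
```

With the cache excluded, the serialized form is the same whether or not the length has been computed yet, so two vectors with identical contents cannot differ in their output because of the cache alone.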
