It took a couple of hours to figure out exactly what was causing the issue. :) I have also run into problems comparing Cluster objects. The comparison should use the final centroid or center, whichever is available, but that doesn't seem to be what happens. I haven't drilled down much into this issue, though, so in the end I had to switch to asFormatString() and use it for comparison.

Thanks
Pallavi

On 03/17/2010 07:09 PM, Jake Mannix wrote:
On Wed, Mar 17, 2010 at 6:14 AM, Jeff Eastman <j...@windwardsolutions.com> wrote:

Pallavi Palleti wrote:

Hi,

Could someone kindly explain the significance of the instance variable
"name" in AbstractVector? It causes problems when "name" is null and I write
a vector to a file, read it back, and compare it with the original vector.
While writing to the file, "name" is set to an empty string if it is null.
So when we read the vector back from the file, "name" has a different value
(not null), asFormatString produces two different strings for these vectors,
and the comparison concludes that they are different.
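To make the failure concrete, here is a minimal sketch of the round trip described above. NamedVec and LossyCodec are hypothetical stand-ins, not Mahout's AbstractVector or its Writable; they only assume the two behaviors reported in this thread: the name takes part in equality, and a null name is written out as "".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a vector with a nullable name (not Mahout's AbstractVector).
final class NamedVec {
  final String name;      // may be null
  final double[] values;

  NamedVec(String name, double[] values) {
    this.name = name;
    this.values = values;
  }

  // As described in the thread, the name takes part in equality.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof NamedVec)) return false;
    NamedVec other = (NamedVec) o;
    return Objects.equals(name, other.name) && Arrays.equals(values, other.values);
  }

  @Override
  public int hashCode() {
    return 31 * Objects.hashCode(name) + Arrays.hashCode(values);
  }
}

// Hypothetical codec mimicking the reported behavior: a null name is written as "".
final class LossyCodec {
  static byte[] write(NamedVec v) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeUTF(v.name == null ? "" : v.name); // null is silently turned into ""
      out.writeInt(v.values.length);
      for (double d : v.values) out.writeDouble(d);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static NamedVec read(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      String name = in.readUTF();                  // never null after the round trip
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
      return new NamedVec(name, values);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With `original = new NamedVec(null, new double[] {1.0, 2.0})`, the copy returned by `LossyCodec.read(LossyCodec.write(original))` has name "" instead of null, so `original.equals(copy)` is false even though the numeric contents are identical.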

Thanks
Pallavi

The "name" instance variable was added in MAHOUT-65 along with the
"labelBindings" feature so that, e.g., a term vector can retain its term in
its state. I guess the problem you are seeing is an interaction between the
vector Writable implementation (which incorrectly handles null) and the
JSON produced by asFormatString. <rant> I've said this before and, not to
belabor the point, using a JSON encoding to compare vectors for equality has
a host of related problems, most recently with lazy lengthSquared. If Vector
implemented Printable instead, then asFormatString(bindings) could probably
be crafted to eliminate these problems and be usable for such comparisons.
</rant>
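Jeff's underlying point is that equality should not go through a string encoding at all. As a minimal sketch, assuming the vectors' entries can be viewed as plain double[] arrays, the comparison can be done directly on the numeric values, so that neither the name nor any lazily cached field (such as a length) affects the result. VectorCompare and sameValues are hypothetical names, not Mahout API.

```java
// Hypothetical helper: compares vector contents directly instead of via asFormatString().
final class VectorCompare {
  /**
   * True when both arrays have the same length and each pair of entries is either
   * exactly equal or differs by at most eps. NaN entries never compare equal.
   */
  static boolean sameValues(double[] a, double[] b, double eps) {
    if (a.length != b.length) return false;
    for (int i = 0; i < a.length; i++) {
      if (a[i] == b[i]) continue;                     // exact match, including equal infinities
      if (!(Math.abs(a[i] - b[i]) <= eps)) return false; // also rejects NaN differences
    }
    return true;
  }
}
```

Whether the name should be part of equality at all is the open question in this thread; the sketch simply leaves it out and compares values only.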

I think Pallavi is running into a worse version of this, Jeff. Since
Writable "rehydrates" a null String as "" as you say, and equals() and even
equivalent() in Vector actually do string compares on the name, a
VectorWritable does not compare equal with *itself* before and after
serialization. That is a nasty *bug* and needs to get fixed, regardless of
the Printable / asFormatString work.
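One possible fix, sketched here under the assumption that the name is written with java.io.DataOutput as in a typical Writable, is to write a presence flag before the string so that null survives the round trip as null instead of becoming "". NullableStrings and its methods are hypothetical names for illustration, not existing Mahout or Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helpers: encode a nullable String with a presence flag so null survives.
final class NullableStrings {
  static void writeNullable(DataOutput out, String s) throws IOException {
    out.writeBoolean(s != null);    // presence flag
    if (s != null) {
      out.writeUTF(s);              // payload only when present
    }
  }

  static String readNullable(DataInput in) throws IOException {
    return in.readBoolean() ? in.readUTF() : null;
  }

  // Serializes and deserializes one value in memory, e.g. for a unit test.
  static String roundTrip(String s) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      writeNullable(out, s);
      out.flush();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      return readNullable(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

This costs one extra byte per record and changes the on-disk format. An alternative would be to have equals() treat null and "" as the same name; the thread doesn't settle which is preferable.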

   -jake
