[ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856812#action_12856812 ]
Sean Owen commented on MAHOUT-379:
----------------------------------

Yeah let's take some time to get this right.

At the moment I see four notions of equivalence in Vector (which is down from five!), and this seems like one too many:

==: of course
equals(): compares values, names, not implementation
equivalent(): compares values only
strictEquivalence(): compares values, names, implementation

equals() ought to be strict-ish. Its current implementation is fine, though conventional wisdom is that equals() should only consider instances of the same class equal, in order to avoid transitivity problems. I think that's a valid concern here.

Therefore I submit that equals() should be replaced with what strictEquivalence() does.

(And then, of course, fix the underlying issue that was raised too!)

> SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-379
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-379
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Danny Leshem
>            Priority: Minor
>             Fix For: 0.3
>
>
> When a SequentialAccessSparseVector is serialized and deserialized using VectorWritable, the result vector and the original vector are equivalent, yet equals returns false.
> The following unit-test reproduces the problem:
> {code}
>   @Test
>   public void testSequentialAccessSparseVectorEquals() throws Exception {
>     final Vector v = new SequentialAccessSparseVector(1);
>     final VectorWritable vectorWritable = new VectorWritable(v);
>     final VectorWritable vectorWritable2 = new VectorWritable();
>     writeAndRead(vectorWritable, vectorWritable2);
>     final Vector v2 = vectorWritable2.get();
>     assertTrue(AbstractVector.equivalent(v, v2));
>     assertEquals(v, v2); // This line fails!
>   }
>
>   private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
>     final ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     final DataOutputStream dos = new DataOutputStream(baos);
>     toWrite.write(dos);
>     final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>     final DataInputStream dis = new DataInputStream(bais);
>     toRead.readFields(dis);
>   }
> {code}
> The problem seems to be that the original vector name is null, while the new vector's name is an empty string. The same issue probably also happens with RandomAccessSparseVector.
> SequentialAccessSparseVectorWritable (line 40):
> {code}
> dataOutput.writeUTF(getName() == null ? "" : getName());
> {code}
> RandomAccessSparseVectorWritable (line 42):
> {code}
> dataOutput.writeUTF(this.getName() == null ? "" : this.getName());
> {code}
> The simplest fix is probably to change the default Vector's name from null to the empty string.
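
As a rough illustration of the null versus empty-string mismatch described in the report, and of an equals() that only accepts instances of the same class, here is a minimal self-contained sketch. SketchVector, roundTrip() and the serialization layout are hypothetical stand-ins, not Mahout's Vector or VectorWritable; only the writeUTF null-to-"" substitution is taken from the quoted code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a named vector; not Mahout's actual Vector API.
final class SketchVector {
  private final String name;     // may be null, like the default name in the report
  private final double[] values;

  SketchVector(String name, double... values) {
    this.name = name;
    this.values = values.clone();
  }

  String getName() {
    return name;
  }

  // Mirrors the quoted writeUTF line: a null name is written as "".
  void write(DataOutputStream out) throws IOException {
    out.writeUTF(name == null ? "" : name);
    out.writeInt(values.length);
    for (double v : values) {
      out.writeDouble(v);
    }
  }

  // Reads back exactly what write() produced, so a null name comes back as "".
  static SketchVector read(DataInputStream in) throws IOException {
    String name = in.readUTF();
    double[] values = new double[in.readInt()];
    for (int i = 0; i < values.length; i++) {
      values[i] = in.readDouble();
    }
    return new SketchVector(name, values);
  }

  // Serializes to a byte array and deserializes again (assumed helper).
  static SketchVector roundTrip(SketchVector v) {
    try {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      try (DataOutputStream dos = new DataOutputStream(baos)) {
        v.write(dos);
      }
      try (DataInputStream dis =
          new DataInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
        return read(dis);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Values only, in the spirit of equivalent().
  static boolean equivalent(SketchVector a, SketchVector b) {
    return Arrays.equals(a.values, b.values);
  }

  // Strict comparison: same class, same name, same values.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    SketchVector other = (SketchVector) o;
    return Objects.equals(name, other.name) && Arrays.equals(values, other.values);
  }

  @Override
  public int hashCode() {
    return 31 * Objects.hashCode(name) + Arrays.hashCode(values);
  }

  public static void main(String[] args) {
    SketchVector nullNamed = new SketchVector(null, 1.0, 2.0);
    SketchVector copy = roundTrip(nullNamed);
    System.out.println(equivalent(nullNamed, copy)); // values survive the round trip
    System.out.println(nullNamed.equals(copy));      // name changed from null to ""

    SketchVector emptyNamed = new SketchVector("", 1.0, 2.0);
    System.out.println(emptyNamed.equals(roundTrip(emptyNamed))); // "" is preserved
  }
}
```

Run as written, the three checks print true, false, true: the null-named vector stays equivalent but no longer equal after the round trip, while a vector whose name defaults to the empty string compares equal to its deserialized copy.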