[ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858807#action_12858807 ]
Sean Owen commented on MAHOUT-379:
----------------------------------

I'd like to commit this patch as it addresses a couple issues, but it's a big one. Deserves some looking if you have a moment. We can retroactively undo some elements later, but best to have a glance now.

> SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-379
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-379
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Danny Leshem
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.3
>
>         Attachments: MAHOUT-379.patch, MAHOUT-379.patch, MAHOUT-379.patch
>
>
> When a SequentialAccessSparseVector is serialized and deserialized using VectorWritable, the result vector and the original vector are equivalent, yet equals returns false.
> The following unit-test reproduces the problem:
> {code}
> @Test
> public void testSequentialAccessSparseVectorEquals() throws Exception {
>   final Vector v = new SequentialAccessSparseVector(1);
>   final VectorWritable vectorWritable = new VectorWritable(v);
>   final VectorWritable vectorWritable2 = new VectorWritable();
>   writeAndRead(vectorWritable, vectorWritable2);
>   final Vector v2 = vectorWritable2.get();
>   assertTrue(AbstractVector.equivalent(v, v2));
>   assertEquals(v, v2); // This line fails!
> }
>
> private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
>   final ByteArrayOutputStream baos = new ByteArrayOutputStream();
>   final DataOutputStream dos = new DataOutputStream(baos);
>   toWrite.write(dos);
>   final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>   final DataInputStream dis = new DataInputStream(bais);
>   toRead.readFields(dis);
> }
> {code}
> The problem seems to be that the original vector name is null, while the new vector's name is an empty string.
> The same issue probably also happens with RandomAccessSparseVector.
> SequentialAccessSparseVectorWritable (line 40):
> {code}
> dataOutput.writeUTF(getName() == null ? "" : getName());
> {code}
> RandomAccessSparseVectorWritable (line 42):
> {code}
> dataOutput.writeUTF(this.getName() == null ? "" : this.getName());
> {code}
> The simplest fix is probably to change the default Vector's name from null to the empty string.
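
The null-versus-empty-string mismatch described in the report can be reproduced without Mahout. The following is a minimal sketch, not the patch attached to this issue: the class `NullNameRoundTrip` and the helpers `roundTrip` and `sameName` are hypothetical names introduced only for illustration. It assumes the name is written with the same `writeUTF(name == null ? "" : name)` pattern quoted above and read back with `DataInputStream.readUTF`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

public class NullNameRoundTrip {

    // Hypothetical helper: writes a name using the null-to-"" pattern quoted
    // in the report, then reads it back from the same bytes.
    public static String roundTrip(String name) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (DataOutputStream dos = new DataOutputStream(baos)) {
                dos.writeUTF(name == null ? "" : name);
            }
            try (DataInputStream dis =
                     new DataInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
                return dis.readUTF();
            }
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked if they do.
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical comparison that treats null and "" as the same name,
    // one possible way to make equals agree after a round trip.
    public static boolean sameName(String a, String b) {
        return Objects.equals(a == null ? "" : a, b == null ? "" : b);
    }

    public static void main(String[] args) {
        String original = null;                  // default name before serialization
        String restored = roundTrip(original);   // "" after the round trip

        System.out.println(Objects.equals(original, restored)); // prints false
        System.out.println(sameName(original, restored));       // prints true
    }
}
```

Normalizing inside the comparison is only one option. Changing the default name from null to the empty string, as the report suggests, would remove the mismatch at the source instead.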