[ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856812#action_12856812 ]
Sean Owen commented on MAHOUT-379:
----------------------------------
Yeah, let's take some time to get this right. At the moment I see four notions
of equivalence in Vector (which is down from five!), and this seems like one
too many:
==: of course
equals(): compares values, names, not implementation
equivalent(): compares values only
strictEquivalence(): compares values, names, implementation
equals() ought to be strict-ish. Its current implementation is fine, though
conventional wisdom is that equals() should only treat instances of the same
class as equal, in order to avoid transitivity problems. I think that's a valid
concern here.
Therefore I submit that equals() should be replaced with what
strictEquivalence() does.
(And then, of course, fix the underlying issue that was raised too!)
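To make the same-class point concrete, here is a rough sketch of what a strict equals() next to a values-only equivalent() might look like. SketchVector and its fields are made up for illustration; this is not Mahout's actual Vector code, just the general pattern under the assumption that a vector is a nullable name plus an array of doubles.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a Vector implementation; not a Mahout class.
final class SketchVector {
    private final String name;     // may be null
    private final double[] values;

    SketchVector(String name, double... values) {
        this.name = name;
        this.values = values.clone();
    }

    // Loose comparison: values only, ignoring name and implementation.
    static boolean equivalent(SketchVector a, SketchVector b) {
        return Arrays.equals(a.values, b.values);
    }

    // Strict comparison: same class, same name, same values.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        // Comparing getClass() rather than using instanceof keeps equals()
        // from ever matching across different implementations.
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        SketchVector other = (SketchVector) o;
        return Objects.equals(name, other.name)
            && Arrays.equals(values, other.values);
    }

    // Must stay consistent with equals(): same fields, same treatment of null.
    @Override
    public int hashCode() {
        return 31 * Objects.hashCode(name) + Arrays.hashCode(values);
    }
}
```

Note that with this shape a null name and an empty name still compare as different, so the underlying issue needs its own fix either way.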
> SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
> ---------------------------------------------------------------------------------
>
> Key: MAHOUT-379
> URL: https://issues.apache.org/jira/browse/MAHOUT-379
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Affects Versions: 0.4
> Reporter: Danny Leshem
> Priority: Minor
> Fix For: 0.3
>
>
> When a SequentialAccessSparseVector is serialized and deserialized using
> VectorWritable, the result vector and the original vector are equivalent, yet
> equals returns false.
> The following unit-test reproduces the problem:
> {code}
> @Test
> public void testSequentialAccessSparseVectorEquals() throws Exception {
>   final Vector v = new SequentialAccessSparseVector(1);
>   final VectorWritable vectorWritable = new VectorWritable(v);
>   final VectorWritable vectorWritable2 = new VectorWritable();
>   writeAndRead(vectorWritable, vectorWritable2);
>   final Vector v2 = vectorWritable2.get();
>   assertTrue(AbstractVector.equivalent(v, v2));
>   assertEquals(v, v2); // This line fails!
> }
>
> private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
>   final ByteArrayOutputStream baos = new ByteArrayOutputStream();
>   final DataOutputStream dos = new DataOutputStream(baos);
>   toWrite.write(dos);
>   final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>   final DataInputStream dis = new DataInputStream(bais);
>   toRead.readFields(dis);
> }
> {code}
> The problem seems to be that the original vector name is null, while the new
> vector's name is an empty string. The same issue probably also happens with
> RandomAccessSparseVector.
> SequentialAccessSparseVectorWritable (line 40):
> {code}
> dataOutput.writeUTF(getName() == null ? "" : getName());
> {code}
> RandomAccessSparseVectorWritable (line 42):
> {code}
> dataOutput.writeUTF(this.getName() == null ? "" : this.getName());
> {code}
> The simplest fix is probably to change the default Vector's name from null to
> the empty string.
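
The null-versus-empty-string mismatch in the report can be reproduced without Mahout. The following is a minimal sketch that assumes only the java.io streams; NameRoundTrip and roundTrip are hypothetical names, and the null-to-"" substitution is copied from the writeUTF lines quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical helper; it only imitates how the name is written and read back.
final class NameRoundTrip {

    // Writes the name with the same null -> "" substitution, then reads it back.
    static String roundTrip(String name) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(baos)) {
                out.writeUTF(name == null ? "" : name);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(baos.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked if they do.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = null;
        String restored = roundTrip(original);
        // A null name comes back as "", so a name-sensitive equals() fails.
        System.out.println(Objects.equals(original, restored)); // false
        // An empty default name survives the round trip unchanged.
        System.out.println(Objects.equals("", roundTrip("")));  // true
    }
}
```

This is why defaulting the name to the empty string would make the round trip lossless: writeUTF cannot represent null, so null is the only value that changes on the way through.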