[ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858153#action_12858153 ]

Robin Anil commented on MAHOUT-379:
-----------------------------------

If the id is removed from the vector, I believe it will affect all clustering 
algorithms, because their final stage generates the (vector_id, cluster_id) 
pairs. I will have to verify that this change doesn't break that step.

> SequentialAccessSparseVector.equals does not agree with 
> AbstractVector.equivalent
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-379
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-379
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Danny Leshem
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.3
>
>         Attachments: MAHOUT-379.patch, MAHOUT-379.patch
>
>
> When a SequentialAccessSparseVector is serialized and deserialized using 
> VectorWritable, the resulting vector and the original vector are equivalent, 
> yet equals returns false.
> The following unit-test reproduces the problem:
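> {code}
> import static org.junit.Assert.assertEquals;
> import static org.junit.Assert.assertTrue;
>
> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
> import java.io.DataInputStream;
> import java.io.DataOutputStream;
> import java.io.IOException;
>
> import org.apache.hadoop.io.Writable;
> import org.apache.mahout.math.AbstractVector;
> import org.apache.mahout.math.SequentialAccessSparseVector;
> import org.apache.mahout.math.Vector;
> import org.apache.mahout.math.VectorWritable;
> import org.junit.Test;
>
> // Test class name is illustrative; only the two methods come from the report.
> public class VectorWritableEqualsTest {
> @Test
> public void testSequentialAccessSparseVectorEquals() throws Exception {
>     final Vector v = new SequentialAccessSparseVector(1);
>     final VectorWritable vectorWritable = new VectorWritable(v);
>     final VectorWritable vectorWritable2 = new VectorWritable();
>     writeAndRead(vectorWritable, vectorWritable2);
>     final Vector v2 = vectorWritable2.get();
>     assertTrue(AbstractVector.equivalent(v, v2));
>     assertEquals(v, v2); // This line fails!
> }
> // Serializes toWrite into a byte buffer, then deserializes that buffer into toRead.
> private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
>     final ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     final DataOutputStream dos = new DataOutputStream(baos);
>     toWrite.write(dos);
>     final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>     final DataInputStream dis = new DataInputStream(bais);
>     toRead.readFields(dis);
> }
> }
> {code}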
> The problem seems to be that the original vector's name is null, while the 
> deserialized vector's name is an empty string. The same issue probably also 
> happens with RandomAccessSparseVector.
> SequentialAccessSparseVectorWritable (line 40):
> {code}
> dataOutput.writeUTF(getName() == null ? "" : getName());
> {code}
> RandomAccessSparseVectorWritable (line 42):
> {code}
> dataOutput.writeUTF(this.getName() == null ? "" : this.getName());
> {code}
> The simplest fix is probably to change the default Vector's name from null to 
> the empty string.
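
A minimal, self-contained sketch of the mismatch described in the report, and of why an empty-string default would fix it. It does not use any Mahout classes: `NameRoundTrip` and `roundTrip` are hypothetical names, and the helper only mirrors the quoted `writeUTF(getName() == null ? "" : getName())` pattern followed by a `readUTF`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

public class NameRoundTrip {

  // Hypothetical helper: writes a name the way the quoted *Writable code does
  // (null becomes ""), then reads it back with readUTF.
  static String roundTrip(String name) {
    try {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(baos);
      out.writeUTF(name == null ? "" : name);
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(baos.toByteArray()));
      return in.readUTF();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Default name of null: the deserialized name is "", so a name-aware equals fails.
    String original = null;
    System.out.println(Objects.equals(original, roundTrip(original))); // false

    // Default name of "" (the proposed fix): the round trip preserves the name.
    String fixed = "";
    System.out.println(Objects.equals(fixed, roundTrip(fixed))); // true
  }
}
```

The underlying constraint is that `writeUTF` has no representation for null, so either the in-memory default has to match what actually gets written (the empty-string change suggested above), or the serialized format would need its own null marker.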
