[
https://issues.apache.org/jira/browse/MAHOUT-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925152#action_12925152
]
Alexander Hans commented on MAHOUT-531:
---------------------------------------
I hadn't realized that vectors have both an iterator() and an iterateNonZero().
I just checked: each behaves the same for dense as well as sparse vectors.
So I think the best solution would be to
- implement iterator() for matrices like the one for vectors: iterate over
every element, including values that might not be stored in memory (zeros in
sparse representations)
- implement iterateNonZero() for matrices like the one for vectors: do the same
as iterator(), but skip values = 0; this would speed up the iteration for
sparse representations and change almost nothing for dense ones
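To make the two proposed behaviors concrete, here is a minimal sketch. SparseMatrixSketch, Element, and the row-major key layout are hypothetical stand-ins for illustration, not Mahout's Matrix API; only the method names iterator() and iterateNonZero() come from the discussion above.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical stand-in for a sparse matrix; this is NOT Mahout's Matrix class.
final class SparseMatrixSketch {
    /** One (row, column, value) cell handed out by the iterators. */
    static final class Element {
        final int row;
        final int col;
        final double value;

        Element(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    private final int rows;
    private final int cols;
    // Only non-zero cells are stored, keyed by row * cols + col (assumed layout).
    private final TreeMap<Long, Double> cells = new TreeMap<>();

    SparseMatrixSketch(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    void set(int row, int col, double value) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException(row + "," + col);
        }
        long key = (long) row * cols + col;
        if (value == 0.0) {
            cells.remove(key); // zeros are never kept in memory
        } else {
            cells.put(key, value);
        }
    }

    double get(int row, int col) {
        Double v = cells.get((long) row * cols + col);
        return v == null ? 0.0 : v;
    }

    /** Visits every cell in row-major order, including zeros that are not stored. */
    Iterator<Element> iterator() {
        final long total = (long) rows * cols;
        return new Iterator<Element>() {
            private long cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < total;
            }

            @Override
            public Element next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int r = (int) (cursor / cols);
                int c = (int) (cursor % cols);
                cursor++;
                return new Element(r, c, get(r, c));
            }
        };
    }

    /** Visits only the stored (non-zero) cells, so the cost scales with their count. */
    Iterator<Element> iterateNonZero() {
        final Iterator<Map.Entry<Long, Double>> stored = cells.entrySet().iterator();
        return new Iterator<Element>() {
            @Override
            public boolean hasNext() {
                return stored.hasNext();
            }

            @Override
            public Element next() {
                Map.Entry<Long, Double> e = stored.next();
                long key = e.getKey();
                return new Element((int) (key / cols), (int) (key % cols), e.getValue());
            }
        };
    }
}
```

For a 2 x 3 matrix with two non-zero cells, iterator() in this sketch yields six elements while iterateNonZero() yields two. A dense implementation would run the same full loop for both and merely skip zeros in the second.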
Now it remains open what to do with getNumNondefaultElements(). For sparse
vectors, it returns the number of non-zero elements, i.e., it gives the number
of elements that iterateNonZero() will iterate over. For dense vectors, it just
returns the size of the vector, no matter what the actual values are.
Matrix.getNumNondefaultElements() currently returns an int[2], where the first
value is the number of rows containing non-zero elements, the second value is
the number of columns with non-zero elements. I don't see any way of deriving
the actual number of non-zero elements from that; only an upper bound of
int[0] * int[1] is possible. However, to write matrices similarly to how
vectors are written, that number is needed. I see two options:
- 1. Change Matrix.getNumNondefaultElements() to behave like
Vector.getNumNondefaultElements(), i.e., return just an int. This would break
backward compatibility and we'd lose the size() analogy.
- 2. Introduce Matrix.getNumNonZeroElements(). For consistency, we might also
want a Vector.getNumNonZeroElements(). For sparse representations those would
contain code that is identical (vector) or similar (matrix) to
getNumNondefaultElements(); for the dense versions there would be no way
around a (costly) iteration over all elements. That could be noted in the JavaDoc,
though. I wouldn't need those for reading/writing anyway.
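A rough sketch of what option 2 could look like, and of why the int[2] result only bounds the count. All names here (NonZeroCountSketch and its methods) are hypothetical and are not Mahout methods; the dense matrix is assumed to be a rectangular double[][], and the sparse store is assumed never to hold explicit zeros.

```java
import java.util.Map;

// Hypothetical helpers for illustration; none of these exist in Mahout.
final class NonZeroCountSketch {
    private NonZeroCountSketch() {}

    /** Dense case: every cell has to be inspected, so the cost is O(rows * cols). */
    static int countNonZeroDense(double[][] values) {
        int count = 0;
        for (double[] row : values) {
            for (double v : row) {
                if (v != 0.0) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Sparse case, assuming zeros are never stored: the count is the number of entries. */
    static int countNonZeroSparse(Map<Long, Double> storedCells) {
        return storedCells.size();
    }

    /**
     * Mimics the int[2] described above: {rows containing a non-zero value,
     * columns containing a non-zero value}. Assumes a rectangular array.
     */
    static int[] nonZeroRowsAndCols(double[][] values) {
        int cols = values.length == 0 ? 0 : values[0].length;
        boolean[] colUsed = new boolean[cols];
        int rowCount = 0;
        for (double[] row : values) {
            boolean rowUsed = false;
            for (int c = 0; c < row.length; c++) {
                if (row[c] != 0.0) {
                    rowUsed = true;
                    colUsed[c] = true;
                }
            }
            if (rowUsed) {
                rowCount++;
            }
        }
        int colCount = 0;
        for (boolean used : colUsed) {
            if (used) {
                colCount++;
            }
        }
        return new int[] {rowCount, colCount};
    }
}
```

For the 2 x 2 identity matrix, nonZeroRowsAndCols gives {2, 2}, so the bound int[0] * int[1] is 4, while the actual number of non-zero elements is 2. The pair alone cannot distinguish this from a fully populated 2 x 2 matrix.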
So, what do you think?
> MatrixWritable doesn't actually write/read anything
> ---------------------------------------------------
>
> Key: MAHOUT-531
> URL: https://issues.apache.org/jira/browse/MAHOUT-531
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Reporter: Alexander Hans
> Attachments: MAHOUT-531.patch, MAHOUT-531.patch
>
>
> The write() and readFields() methods of MatrixWritable write/read only the
> classname, they don't write/read actual data.