[ 
https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576037#action_12576037
 ] 

jeastman edited comment on MAHOUT-6 at 3/7/08 8:20 AM:
-----------------------------------------------------------

Boy, I am sure not wedded to the HashMap implementation. From my Smalltalk 
experience, one hash lookup like that costs about as much as 40 iterations down 
a fixed array. Unless the vector cardinality is very large and the vector is 
mostly not sparse (size ~ cardinality), your implementation will likely 
outperform mine. I was coding stream-of-consciousness and picked the most 
obvious, simplest-to-code approach for everything. Please feel free to propose 
a patch with your alternative. If you could include a unit test that measures 
the tradeoff point between iterating and hashing in the current JDK, it would 
be quite informative and even more compelling.
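
A minimal sketch of what such a measurement might look like, not the code from 
any of the attached patches. The class and method names (LookupTradeoff, 
scanGet, hashGet, toMap) are hypothetical, and the timing loop is deliberately 
naive: it does no JIT warm-up, so the numbers are only a rough indication. A 
harness such as JMH would give more trustworthy results.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compares a linear scan over the non-zero indices of a
// sparse vector with a HashMap lookup, for a growing number of non-zeros.
class LookupTradeoff {

    /** Linear scan over the stored indices; returns 0.0 for an absent index. */
    public static double scanGet(int[] indices, double[] values, int index) {
        for (int i = 0; i < indices.length; i++) {
            if (indices[i] == index) {
                return values[i];
            }
        }
        return 0.0;
    }

    /** Hash lookup; returns 0.0 for an absent index. */
    public static double hashGet(Map<Integer, Double> map, int index) {
        Double v = map.get(index);
        return v == null ? 0.0 : v;
    }

    /** Builds the HashMap representation from parallel index/value arrays. */
    public static Map<Integer, Double> toMap(int[] indices, double[] values) {
        Map<Integer, Double> map = new HashMap<Integer, Double>();
        for (int i = 0; i < indices.length; i++) {
            map.put(indices[i], values[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        int cardinality = 1000000; // assumed vector cardinality
        int reps = 200000;         // lookups per measurement
        for (int size = 1; size <= 256; size *= 2) {
            int[] indices = new int[size];
            double[] values = new double[size];
            for (int i = 0; i < size; i++) {
                indices[i] = i * (cardinality / size); // spread the non-zeros out
                values[i] = i + 1;
            }
            Map<Integer, Double> map = toMap(indices, values);

            double sink = 0.0; // accumulated so the JIT cannot drop the loops
            long t0 = System.nanoTime();
            for (int r = 0; r < reps; r++) {
                sink += scanGet(indices, values, indices[r % size]);
            }
            long scanNanos = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int r = 0; r < reps; r++) {
                sink += hashGet(map, indices[r % size]);
            }
            long hashNanos = System.nanoTime() - t0;

            System.out.printf("size=%d scan=%dns hash=%dns (sink=%.1f)%n",
                    size, scanNanos, hashNanos, sink);
        }
    }
}
```

The size at which the hash column starts to beat the scan column would be the 
tradeoff point on that particular JDK and machine.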

I'm OK with either introducing another sparse implementation or changing the 
current one. I do think we ought to make these sorts of changes based on 
empirical data and agreed-upon user stories.

I do think we ought to get something into trunk soon so that the patch merging 
hassle is behind us.

      was (Author: jeastman):
    Boy, I am sure not wedded to the HashMap implementation. From my Smalltalk 
experience, one hash lookup like that is equivalent to about 40 iterations down 
a fixed array. Unless the vector cardinality is very large and mostly not 
sparse (size ~ cardinality), your implementation will likely outperform mine. I 
was coding stream-of-consciousness and picked only the most obvious and 
simplest to code approach for everything. Please feel free to propose a patch 
with your alternative. If you could include a unit test that measures the 
tradeoff point between iterating and hashing in the current jdk, it would be 
quite informative and even more compelling.

Unless others feel strongly about the HashMap approach, and can document its 
superiority with some compelling unit tests, I think we ought to change it.

I do think we ought to get something into trunk soon so that the patch merging 
hassle is behind us.
  
> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff, 
> MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff, 
> MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different 
> reducers
> g) a reasonable set of matrix operations should be supported, these should 
> eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra 
> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + 
> beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations
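
For illustration only, the two generalized BLAS-style primitives named in item 
g) could be sketched over plain dense arrays as below. This is not the API 
proposed in any of the attached patches; the class and method names 
(BlasSketch, gemm, gemv) and the use of double[][] are assumptions made for the 
example.

```java
// Hypothetical dense sketch of the primitives from requirement g):
//   gemm: alpha A B + beta C
//   gemv: A u + beta v
final class BlasSketch {

    /** Returns alpha * A * B + beta * C as a new matrix. */
    public static double[][] gemm(double alpha, double[][] a, double[][] b,
                                  double beta, double[][] c) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        if (a[0].length != k || c.length != n || c[0].length != m) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    sum += a[i][p] * b[p][j];
                }
                out[i][j] = alpha * sum + beta * c[i][j];
            }
        }
        return out;
    }

    /** Returns A * u + beta * v as a new vector. */
    public static double[] gemv(double[][] a, double[] u, double beta, double[] v) {
        int n = a.length;
        if (a[0].length != u.length || v.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int p = 0; p < u.length; p++) {
                sum += a[i][p] * u[p];
            }
            out[i] = sum + beta * v[i];
        }
        return out;
    }
}
```

A sparse implementation would iterate only over the stored non-zeros in the 
inner loops, which is where the iteration constructs of item h) come in.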

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
