I have been batting that question back and forth in my own head recently. It IS absolutely a huge help to have labels. R has the data.frame to do this and it helps enormously. I have done it in some applications and it saved endless hassle.
On the other hand, there is a real danger that the label functionality would get sucked into a single implementation. Labels really are an orthogonal concern that is (should be) independent of how the matrix is implemented. So should there really be something like a LabeledMatrix wrapper that provides this labeling service to any matrix?

On 3/16/08 2:23 PM, "Grant Ingersoll (JIRA)" <[EMAIL PROTECTED]> wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579261#action_12579261 ]
>
> Grant Ingersoll commented on MAHOUT-6:
> --------------------------------------
>
> Does it make sense to be able to assign labels to the rows and columns and
> maybe even have it accessible as a map? For instance, I think I could use
> these for the bayesian classifier implementation I am working on and it would
> make sense to be able to label the features and the labels. Naturally, I can
> store the information elsewhere as well, but didn't know whether it made sense
> to keep the info w/ the matrix.
>
>> Need a matrix implementation
>> ----------------------------
>>
>> Key: MAHOUT-6
>> URL: https://issues.apache.org/jira/browse/MAHOUT-6
>> Project: Mahout
>> Issue Type: New Feature
>> Reporter: Ted Dunning
>> Assignee: Grant Ingersoll
>> Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff,
>> MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff,
>> MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff, MAHOUT-6k.diff,
>> MAHOUT-6l.patch
>>
>>
>> We need matrices for Mahout.
>>
>> An initial set of basic requirements includes:
>>
>> a) sparse and dense support are required
>>
>> b) row and column labels are important
>>
>> c) serialization for hadoop use is required
>>
>> d) reasonable floating point performance is required, but awesome FP is not
>>
>> e) the API should be simple enough to understand
>>
>> f) it should be easy to carve out sub-matrices for sending to different
>> reducers
>>
>> g) a reasonable set of matrix operations should be supported, these should
>> eventually include:
>>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra
>> operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>>     row and column sums
>>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u +
>> beta v
>>
>> h) easy and efficient iteration constructs, especially for sparse matrices
>>
>> i) easy to extend with new implementations
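
To make the LabeledMatrix wrapper idea from the reply above concrete, here is a minimal sketch. It assumes a hypothetical Matrix interface with index-based get and set; Matrix, DenseMatrix, and every method signature below are invented for illustration and are not taken from the MAHOUT-6 patches. The point is only that the labels live in the wrapper, so any implementation (sparse or dense) could be labeled without knowing about it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal matrix contract (an assumption, not Mahout's API).
interface Matrix {
    int rows();
    int columns();
    double get(int row, int column);
    void set(int row, int column, double value);
}

// Trivial dense implementation, present only to exercise the wrapper.
final class DenseMatrix implements Matrix {
    private final int rows;
    private final int columns;
    private final double[] values; // row-major storage

    DenseMatrix(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.values = new double[rows * columns];
    }

    public int rows() { return rows; }
    public int columns() { return columns; }
    public double get(int row, int column) { return values[row * columns + column]; }
    public void set(int row, int column, double value) { values[row * columns + column] = value; }
}

// Wrapper that adds row and column labels to any Matrix implementation.
// It is itself a Matrix, so code that only needs indices keeps working.
final class LabeledMatrix implements Matrix {
    private final Matrix delegate;
    private final Map<String, Integer> rowIndex = new HashMap<>();
    private final Map<String, Integer> columnIndex = new HashMap<>();

    LabeledMatrix(Matrix delegate, List<String> rowLabels, List<String> columnLabels) {
        if (rowLabels.size() != delegate.rows() || columnLabels.size() != delegate.columns()) {
            throw new IllegalArgumentException("label count must match matrix shape");
        }
        this.delegate = delegate;
        index(rowLabels, rowIndex);
        index(columnLabels, columnIndex);
    }

    // Build the label -> position map, rejecting duplicate labels.
    private static void index(List<String> labels, Map<String, Integer> target) {
        for (int i = 0; i < labels.size(); i++) {
            if (target.put(labels.get(i), i) != null) {
                throw new IllegalArgumentException("duplicate label: " + labels.get(i));
            }
        }
    }

    private static int lookup(Map<String, Integer> index, String label) {
        Integer position = index.get(label);
        if (position == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        return position;
    }

    // Index-based access is passed straight through to the wrapped matrix.
    public int rows() { return delegate.rows(); }
    public int columns() { return delegate.columns(); }
    public double get(int row, int column) { return delegate.get(row, column); }
    public void set(int row, int column, double value) { delegate.set(row, column, value); }

    // Label-based access translates labels to indices, then delegates.
    public double get(String rowLabel, String columnLabel) {
        return delegate.get(lookup(rowIndex, rowLabel), lookup(columnIndex, columnLabel));
    }

    public void set(String rowLabel, String columnLabel, double value) {
        delegate.set(lookup(rowIndex, rowLabel), lookup(columnIndex, columnLabel), value);
    }
}
```

With something like this, a classifier could label rows with class names and columns with feature names, while the underlying storage stays free to be swapped for a sparse implementation later.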