[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720752#action_12720752 ]
Grant Ingersoll commented on MAHOUT-65:
---------------------------------------

bq. For really big vectors I'd expect serialization and deserialization to be String-intensive.

Yeah, it is. It's the majority of the work, in fact, in the profiling (85%+ of time spent) I've done so far on this. Not sure if it is premature optimization, but it seems likely that a compact binary format may be a nice option to have for internal pieces of the puzzle. In other words, the input can still be String based as we have now, but the internal mappers and reducers can use something more compact.

I've seen this happen a lot with Solr and other XML based apps in that if you control both ends of the pipe, String based approaches, while nice from a readability standpoint, are showstoppers for performance. Having a converter from binary to String-based can then be employed when readability/debugging is required.

> Add Element Labels to Vectors and Matrices
> ------------------------------------------
>
>                 Key: MAHOUT-65
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-65
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Matrix
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-65-name.patch, MAHOUT-65-name.patch, MAHOUT-65-name.patch, MAHOUT-65.patch, MAHOUT-65b.patch, MAHOUT-65c.patch, MAHOUT-65d.patch
>
>
> Many applications can benefit by accessing elements in vectors and matrices using String labels in addition to numeric indices. Investigate adding such a capability.
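
To make the trade-off in the comment above concrete, here is a minimal sketch of what a compact binary encoding next to a String encoding might look like for a dense vector of doubles. It is only an illustration under stated assumptions: the class name VectorCodecSketch and all of its methods are hypothetical and are not Mahout's actual API or the format used in any of the attached patches. It uses java.nio.ByteBuffer from the JDK, writing a 4-byte length followed by 8 bytes per element, and includes a binary-to-String converter of the kind suggested for debugging.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, NOT part of Mahout: contrasts a String encoding of a
// dense vector with a fixed-width binary one.
final class VectorCodecSketch {

    // String form: comma-separated decimal values. Readable, but every
    // element goes through number formatting and parsing.
    static String toText(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    static double[] fromText(String text) {
        if (text.isEmpty()) {
            return new double[0];
        }
        String[] parts = text.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    // Binary form: 4-byte element count, then 8 bytes per double.
    // No formatting or parsing, and the size is known up front.
    static byte[] toBinary(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 * values.length);
        buf.putInt(values.length);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    static double[] fromBinary(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        double[] values = new double[buf.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getDouble();
        }
        return values;
    }

    // Converter from the binary form to the String form, for use when
    // readability or debugging is needed.
    static String binaryToText(byte[] data) {
        return toText(fromBinary(data));
    }
}
```

In this sketch the String form stays available at the edges (input files, debugging output), while intermediate data passed between mappers and reducers could use the binary form. Whether this actually removes the measured overhead would need to be confirmed by profiling.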