[ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720752#action_12720752
 ] 

Grant Ingersoll commented on MAHOUT-65:
---------------------------------------

bq. For really big vectors I'd expect serialization and deserialization to be 
String-intensive.

Yeah, it is.  In the profiling I've done on this so far, it's the majority of 
the work (85%+ of time spent).  It may be premature optimization, but a compact 
binary format seems like a nice option to have for the internal pieces of the 
puzzle.  In other words, the input can still be String-based as we have now, 
but the internal mappers and reducers can use something more compact.

I've seen this happen a lot with Solr and other XML-based apps: if you control 
both ends of the pipe, String-based approaches, while nice from a readability 
standpoint, are showstoppers for performance.  A converter from binary to 
String-based can then be employed when readability/debugging is required.
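
To make the idea concrete, here is a rough sketch of the two encodings side by 
side.  It is not Mahout code: the class name VectorCodecSketch, its method 
names, and the layout (a 4-byte length followed by 8 bytes per value) are all 
assumptions made for illustration, and it only uses the plain java.io streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only; not part of Mahout.
public class VectorCodecSketch {

    // Text form: comma-separated decimal values, e.g. "1.5,0.0,-2.25".
    public static String toText(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    // Parsing the text form costs a split plus one parseDouble per element.
    public static double[] fromText(String text) {
        if (text.isEmpty()) {
            return new double[0];
        }
        String[] parts = text.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    // Binary form (assumed layout): int length, then one 8-byte double per value.
    public static byte[] toBinary(double[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reading the binary form involves no String parsing at all.
    public static double[] fromBinary(byte[] data) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The converter mentioned above would then just be toText(fromBinary(bytes)) for 
debugging.  In a Hadoop job the same writeInt/writeDouble calls would 
presumably live in a Writable's write(DataOutput) and readFields(DataInput) 
methods rather than in a standalone helper like this one.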

> Add Element Labels to Vectors and Matrices
> ------------------------------------------
>
>                 Key: MAHOUT-65
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-65
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Matrix
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-65-name.patch, MAHOUT-65-name.patch, 
> MAHOUT-65-name.patch, MAHOUT-65.patch, MAHOUT-65b.patch, MAHOUT-65c.patch, 
> MAHOUT-65d.patch
>
>
> Many applications can benefit by accessing elements in vectors and matrices 
> using String labels in addition to numeric indices. Investigate adding such a 
> capability.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
