[ https://issues.apache.org/jira/browse/MAHOUT-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864308#action_12864308 ]
Sean Owen commented on MAHOUT-391:
----------------------------------

Oh I get it. My other outstanding patch for MAHOUT-302 adds a second MahoutTestCase, so that there is one for the math and core module separately. The core one extends the math one of course. This is implicitly importing the one in math, which is in the same package. But you won't have it yet.

Well that goes away soon anyway when all is committed. Importing the other temporarily is a fine patch. Oops, I hadn't even imagined these could overlap.

> Make vector more space efficient with variable-length encoding, et al
> ---------------------------------------------------------------------
>
>                 Key: MAHOUT-391
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-391
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.3
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-391.patch
>
>
> There are a few things we can do to make Vector representations smaller on disk:
> - Use variable-length encoding for integer values like size and element indices in sparse representations
> - Further, delta-encode indices in sequential representations
> - Let caller specify that precision isn't crucial in values, allowing it to store values as floats
> Since indices are usually small-ish, I'd guess this saves 2 bytes or so on average, out of 12 bytes per element now.
> Using floats where applicable saves another 4. Not bad.
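
For readers who want to see the three ideas in the issue description side by side, here is a minimal sketch. It is not the code in MAHOUT-391.patch and not Mahout's Vector API: the class name VarintDeltaSketch, its method names, and the laxPrecision flag are all hypothetical, and the varint layout (7 payload bits per byte, high bit set when more bytes follow) is one common choice that the issue does not confirm. The sketch assumes indices arrive sorted ascending, as in a sequential representation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only; names and wire layout are assumptions.
public class VarintDeltaSketch {

  // Variable-length encoding: 7 bits per byte, low group first.
  // The high bit of each byte says whether another byte follows.
  static void putVarInt(ByteBuffer buf, int value) {
    while ((value & ~0x7F) != 0) {
      buf.put((byte) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    buf.put((byte) value);
  }

  static int getVarInt(ByteBuffer buf) {
    int result = 0;
    int shift = 0;
    int b;
    do {
      b = buf.get() & 0xFF;
      result |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return result;
  }

  // Writes the element count, then (index delta, value) pairs.
  // With laxPrecision set, values are narrowed to 4-byte floats.
  static byte[] encode(int[] sortedIndices, double[] values, boolean laxPrecision) {
    // Worst case: 5 bytes per varint, 8 bytes per value.
    ByteBuffer buf = ByteBuffer.allocate(5 + sortedIndices.length * (5 + 8));
    putVarInt(buf, sortedIndices.length);
    int previous = 0;
    for (int i = 0; i < sortedIndices.length; i++) {
      putVarInt(buf, sortedIndices[i] - previous); // delta from the previous index
      previous = sortedIndices[i];
      if (laxPrecision) {
        buf.putFloat((float) values[i]);
      } else {
        buf.putDouble(values[i]);
      }
    }
    return Arrays.copyOf(buf.array(), buf.position());
  }

  // Reads the indices back, skipping over the stored values.
  static int[] decodeIndices(byte[] data, boolean laxPrecision) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    int[] indices = new int[getVarInt(buf)];
    int previous = 0;
    for (int i = 0; i < indices.length; i++) {
      previous += getVarInt(buf); // undo the delta encoding
      indices[i] = previous;
      buf.position(buf.position() + (laxPrecision ? 4 : 8));
    }
    return indices;
  }

  public static void main(String[] args) {
    int[] indices = {3, 130, 131, 20000};
    double[] values = {1.0, 2.5, -4.0, 0.125};
    int fixedWidth = 4 + indices.length * (4 + 8); // int count, int index, double value
    System.out.println("fixed width:        " + fixedWidth);
    System.out.println("varint + delta:     " + encode(indices, values, false).length);
    System.out.println("plus float values:  " + encode(indices, values, true).length);
  }
}
```

With these four elements the fixed-width layout takes 52 bytes, the varint and delta layout takes 39, and storing floats brings it to 23. Real savings depend on how the indices are distributed; the "2 bytes or so" figure in the description is the reporter's own estimate, not something this toy input measures.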