Make vector more space efficient with variable-length encoding, et al
---------------------------------------------------------------------
Key: MAHOUT-391
URL: https://issues.apache.org/jira/browse/MAHOUT-391
Project: Mahout
Issue Type: Improvement
Affects Versions: 0.3
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor
Fix For: 0.4
There are a few things we can do to make Vector representations smaller on disk:
- Use variable-length encoding for integer values such as the size and the
element indices in sparse representations
- Further, delta-encode indices in sequential representations, i.e. store the
gap from the previous index rather than the index itself
- Let the caller specify that full precision isn't crucial for the values, so
that they can be stored as floats instead of doubles
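
A rough sketch of the first two ideas, to make the byte counts concrete. This
is not the proposed patch: the class and method names are made up for
illustration, and it assumes indices are non-negative and strictly increasing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration only; not Mahout's actual API. */
final class VarintSketch {

  /** Writes an int 7 bits at a time; a set high bit means "more bytes follow". */
  static void writeUnsignedVarInt(int value, ByteArrayOutputStream out) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  /** Reads back an int written by writeUnsignedVarInt. */
  static int readUnsignedVarInt(ByteBuffer in) {
    int value = 0;
    int shift = 0;
    int b = in.get() & 0xFF;
    while ((b & 0x80) != 0) {
      value |= (b & 0x7F) << shift;
      shift += 7;
      b = in.get() & 0xFF;
    }
    return value | (b << shift);
  }

  /** Encodes increasing indices as a count followed by the gaps between them. */
  static byte[] encodeSortedIndices(int[] indices) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeUnsignedVarInt(indices.length, out);
    int previous = 0;
    for (int index : indices) {
      writeUnsignedVarInt(index - previous, out); // delta from previous index
      previous = index;
    }
    return out.toByteArray();
  }

  /** Inverse of encodeSortedIndices: sums the gaps to recover the indices. */
  static int[] decodeSortedIndices(byte[] encoded) {
    ByteBuffer in = ByteBuffer.wrap(encoded);
    int[] indices = new int[readUnsignedVarInt(in)];
    int previous = 0;
    for (int i = 0; i < indices.length; i++) {
      previous += readUnsignedVarInt(in);
      indices[i] = previous;
    }
    return indices;
  }
}
```

With this sketch, the indices {3, 130, 131, 100000} plus their count encode to
7 bytes, versus 20 bytes as five fixed-width 4-byte ints. The gaps 127 and 1
each fit in one byte, which is where delta encoding helps.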
Since indices are usually small-ish, I'd guess variable-length encoding saves
2 bytes or so on average, out of the 12 bytes per element used now (a 4-byte
int index plus an 8-byte double value).
Using floats where applicable saves another 4 bytes. Not bad.
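
For the float option, a minimal sketch of the trade-off. The "lossy" flag and
the class name are hypothetical stand-ins for however the caller would signal
that precision isn't crucial.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration only; not Mahout's actual API. */
final class FloatValueSketch {

  /** Stores each value as a 4-byte float if lossy, otherwise as an 8-byte double. */
  static byte[] encodeValues(double[] values, boolean lossy) {
    int width = lossy ? Float.BYTES : Double.BYTES;
    ByteBuffer buf = ByteBuffer.allocate(values.length * width);
    for (double v : values) {
      if (lossy) {
        buf.putFloat((float) v); // narrowing cast: precision is lost here
      } else {
        buf.putDouble(v);
      }
    }
    return buf.array();
  }
}
```

The values read back from the lossy form are only float-accurate, so this
should stay opt-in for callers who know their data tolerates it.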