On Tue, Feb 18, 2014 at 10:14 PM, Mike Drob <[email protected]> wrote:
> The column visibility is stored as bytes on disk, derived from the entire
> visibility expression. In theory this may seem like a lot of space, but in
> practice it turns out to be fine for a couple of reasons.
>
> First, RFiles employ relative key encoding, so if the visibility is the
> same in two consecutive keys, then the second one is simply omitted.
>

In 1.5, common prefixes in consecutive key fields may be compressed away.
In 1.4, the entire field had to match.

https://issues.apache.org/jira/browse/ACCUMULO-790

> Also, RFiles use gz compression by default. If you have a few similar
> (repeated) text strings to represent your visibilities, then they will
> compress very well.
>
> However, if you have lots of different visibilities, then you may not end
> up gaining much from the storage tricks we employ.
>
> Mike
>
>
> On Mon, Feb 17, 2014 at 10:30 PM, Sitaraman Vilayannur <
> [email protected]> wrote:
>
> > Hi,
> > How are the column visibility elements stored in Accumulo? Is there a
> > kind of compression that is used to save space, or are all the elements
> > for each key-value pair stored as is?
> > A pointer to the region of the code that I should look at for the
> > implementation would also be helpful.
> > Thanks
> > Sitaraman
> >
>
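To make the difference between the two behaviours concrete, here is a rough Python sketch of the idea. It is not the RFile on-disk format and none of these function names exist in Accumulo; they are made up for illustration. encode_exact mimics the 1.4-style rule (drop the field only when it matches the previous key's field exactly), encode_prefix mimics the 1.5-style rule (store only the part after the shared prefix), and zlib's DEFLATE stands in for the gz codec.

```python
import zlib


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_exact(visibilities):
    """1.4-style idea: omit the field only if it equals the previous one."""
    out, prev = [], None
    for vis in visibilities:
        out.append(("same", b"") if vis == prev else ("full", vis))
        prev = vis
    return out


def encode_prefix(visibilities):
    """1.5-style idea: keep (shared prefix length, remaining suffix)."""
    out, prev = [], b""
    for vis in visibilities:
        n = common_prefix_len(prev, vis)
        out.append((n, vis[n:]))
        prev = vis
    return out


def decode_prefix(encoded):
    """Rebuild the original fields from (prefix length, suffix) pairs."""
    out, prev = [], b""
    for n, suffix in encoded:
        vis = prev[:n] + suffix
        out.append(vis)
        prev = vis
    return out


def stored_bytes(encoded):
    """Payload bytes kept; ignores the small per-entry marker/length cost."""
    return sum(len(payload) for _, payload in encoded)


if __name__ == "__main__":
    # Hypothetical visibility expressions on four consecutive keys.
    vis = [b"admin&audit", b"admin&audit", b"admin&ops", b"public"]
    print("raw bytes:     ", sum(len(v) for v in vis))
    print("exact match:   ", stored_bytes(encode_exact(vis)))
    print("prefix sharing:", stored_bytes(encode_prefix(vis)))

    # Repeated visibility strings shrink a lot under DEFLATE.
    block = b"admin&audit" * 1000
    print("deflated block:", len(zlib.compress(block)), "of", len(block))
```

With many distinct visibilities, neither consecutive keys nor the compressor find much to share, which is the case Mike warns about above.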
