[ https://issues.apache.org/jira/browse/HBASE-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798188#comment-13798188 ]
Lars Hofhansl commented on HBASE-9794:
--------------------------------------

This is one of my pet peeves :) and the reason why scanning with block encoding is so much slower and more GC-intensive than without.

> KeyValues / cells backed by buffer fragments
> --------------------------------------------
>
> Key: HBASE-9794
> URL: https://issues.apache.org/jira/browse/HBASE-9794
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Andrew Purtell
>
> There are various places in the code where we see comments to the effect
> "would be great if we had a scatter gather API for KV", appearing at places
> where we rewrite KVs on the server, for example in HRegion where we process
> appends and increments.
>
> KeyValues are stored in buffers of fixed length. This approach has
> performance advantages for the common case where KVs are not manipulated on
> their way from disk to RPC. The disadvantage of this approach is that any
> manipulation of tags requires the creation of a new buffer to hold the
> result, and a copy of the KV data into the new buffer. Appends and increments
> are typically a small percentage of overall workload, so this has been fine
> up to now.
>
> KeyValues can now carry metadata known as tags. Tags are stored contiguously
> with the rest of the KeyValue. Applications wishing to use tags (like per-cell
> security) change the equation by wanting to rewrite KVs significantly
> more often.
>
> We should consider backing KeyValue with an alternative structure that can
> better support rewriting portions of its data, appends to existing buffers,
> scatter-gather copies, possibly even copy-on-write.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
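To make the trade-off in the description concrete, here is a minimal sketch of a cell backed by buffer fragments. `FragmentedCell` is hypothetical and is not an HBase class or API; it assumes a cell is split into exactly three fragments in the order key, value, tags. Rewriting the tags allocates only the new tags fragment while the key and value fragments are shared, not copied. The bytes are gathered into one contiguous array only when something (e.g. an RPC response) needs them that way.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch, not an HBase API: a cell whose bytes live in
 * separate fragments, assumed to be ordered [key, value, tags].
 */
final class FragmentedCell {
    private final List<ByteBuffer> fragments;

    private FragmentedCell(List<ByteBuffer> fragments) {
        // Store read-only views: the cell never writes to a fragment, so
        // fragments can be shared between cells (copy-on-write style).
        List<ByteBuffer> views = new ArrayList<>(fragments.size());
        for (ByteBuffer b : fragments) {
            views.add(b.asReadOnlyBuffer());
        }
        this.fragments = Collections.unmodifiableList(views);
    }

    /** Wraps the arrays without copying them. */
    static FragmentedCell of(byte[] key, byte[] value, byte[] tags) {
        return new FragmentedCell(List.of(
                ByteBuffer.wrap(key), ByteBuffer.wrap(value), ByteBuffer.wrap(tags)));
    }

    /**
     * Returns a new cell with replaced tags. Key and value fragments are
     * shared with this cell; only the new tags array is wrapped.
     */
    FragmentedCell withTags(byte[] newTags) {
        List<ByteBuffer> next = new ArrayList<>(fragments);
        next.set(next.size() - 1, ByteBuffer.wrap(newTags)); // tags assumed last
        return new FragmentedCell(next);
    }

    /** Total serialized length across all fragments. */
    int length() {
        int total = 0;
        for (ByteBuffer b : fragments) {
            total += b.remaining();
        }
        return total;
    }

    /** Gathers all fragments into one contiguous array (the only full copy). */
    byte[] gather() {
        byte[] out = new byte[length()];
        int pos = 0;
        for (ByteBuffer b : fragments) {
            ByteBuffer d = b.duplicate(); // independent position; stored buffer untouched
            int n = d.remaining();
            d.get(out, pos, n);
            pos += n;
        }
        return out;
    }
}
```

With a contiguous buffer, `withTags` would have to allocate `key + value + newTags` bytes and copy all of them; here the copy is deferred to `gather()`, and a writer that supports scatter-gather I/O could skip even that by writing the fragments in sequence.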