I have been following comments on HBASE-8496. I think introducing cell tagging through HFile v3 is acceptable.
Looking forward to seeing your implementation.

Cheers

On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan wrote:

> For the past couple of months, we have been working through various
> prototypes for supporting inline storage of tags in cells as persisted on
> disk. Our goals are to support optional use of tags with minimal changes to
> core code while also avoiding performance impacts to users who do not use
> tags.
>
> For background, refer to the comments in
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
>
> and
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
> We have iterated on a couple of prototypes that implement tag awareness,
> first in DataBlockEncoders and later as a new type of Codec for Cells. This
> point is discussed in the above comments in HBASE-8496.
>
> We think that tag awareness in Cell Codecs is the right way. However, some
> shortcomings in the current interfaces internal to HFile need to be
> addressed to avoid any performance impact for those who do not want to use
> inline tags, and addressing them may require a drastic amount of code
> change.
>
> By introducing inline tags in a new V3 version of HFile, we can avoid
> several problems with HFile V2 internals and backwards compatibility
> concerns. This allows working tag support with no performance impact and
> low risk for all HBase users who do not want tags, while still making
> inline tag capabilities available in a shipping version of HBase.
>
> The new V3 version of HFile differs from earlier versions by supporting
> inline tag storage. This version does not change the HFileBlock format;
> it only serializes and deserializes the Tag information that is
> persisted in the HFile.
> Having HFile V3 would also help keep tags optional, so that existing
> cases with no tags are totally unaffected. We also keep changes outside
> the V3 reader and writer minimal. Compatibility would not be a problem
> with future versions when we move to Cell Codecs: the codecs used to
> write a file will be recorded in the HFile header. For files that are
> either V2 or V3, we will instantiate two default codecs that know how to
> handle serialization with and without tags.
>
> There have been earlier thoughts on an HFile V3, e.g.:
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
> We have been working on this and will have a clean, well-tested patch in
> time for 0.96.
>
> Although our focus is on performance-neutral persistence of inline cell
> tags in 0.96 to enable a couple of security coprocessor users, introducing
> HFile V3 also gives design freedom for other features and problems that
> can be developed through the 0.96 cycle into 0.98.
>
> Please voice your opinions so that we can settle this and perhaps define
> the scope of the patch. Also feel free to comment on HBASE-8496 with your
> thoughts and ideas.
>
> Regards
>
> Ram
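The proposal selects one of two default codecs from the HFile version. One codec serializes cells without tags, the other with tags. Here is a minimal Java sketch of that idea. Everything in it is hypothetical and for illustration only: the names `SimpleCell`, `CellCodec`, `NoTagsCodec`, `TagsCodec`, `CodecSelector` and `CodecDemo`, and the byte layout (an int key length and an int value length, then the key and value, followed in V3 by a 2-byte tags length and the tag bytes). None of it is HBase's actual API or on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical cell: key, value and an optional tags blob (empty when absent).
final class SimpleCell {
    final byte[] key, value, tags;
    SimpleCell(byte[] key, byte[] value, byte[] tags) {
        this.key = key; this.value = value; this.tags = tags;
    }
}

// Hypothetical codec interface, not HBase's Codec API.
interface CellCodec {
    void write(DataOutputStream out, SimpleCell c) throws IOException;
    SimpleCell read(DataInputStream in) throws IOException;
}

// "V2-style" layout: keyLen, valueLen, key, value. Tags are never written,
// so a tag-free file pays no extra bytes per cell.
class NoTagsCodec implements CellCodec {
    public void write(DataOutputStream out, SimpleCell c) throws IOException {
        out.writeInt(c.key.length);
        out.writeInt(c.value.length);
        out.write(c.key);
        out.write(c.value);
    }
    public SimpleCell read(DataInputStream in) throws IOException {
        byte[] k = new byte[in.readInt()];
        byte[] v = new byte[in.readInt()];
        in.readFully(k);
        in.readFully(v);
        return new SimpleCell(k, v, new byte[0]);
    }
}

// "V3-style" layout: the V2 fields followed by a 2-byte tags length and the tags.
class TagsCodec extends NoTagsCodec {
    @Override public void write(DataOutputStream out, SimpleCell c) throws IOException {
        if (c.tags.length > 0xFFFF) throw new IOException("tags too long for 2-byte length");
        super.write(out, c);
        out.writeShort(c.tags.length);
        out.write(c.tags);
    }
    @Override public SimpleCell read(DataInputStream in) throws IOException {
        SimpleCell base = super.read(in);
        byte[] t = new byte[in.readUnsignedShort()];
        in.readFully(t);
        return new SimpleCell(base.key, base.value, t);
    }
}

final class CodecSelector {
    // Choose the default codec from the major version recorded in the file.
    static CellCodec forVersion(int majorVersion) {
        switch (majorVersion) {
            case 2: return new NoTagsCodec();
            case 3: return new TagsCodec();
            default: throw new IllegalArgumentException("unsupported HFile version " + majorVersion);
        }
    }
}

final class CodecDemo {
    // Write one cell with the given codec and read it back.
    static SimpleCell roundTrip(CellCodec codec, SimpleCell c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            codec.write(new DataOutputStream(bytes), c);
            return codec.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, only files declared as V3 pay for tags. The V3 layout spends two bytes per cell even when a cell has no tags. V2 readers and writers are left untouched, which mirrors the goal of keeping users who do not need tags unaffected.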
