bq. V3 would now serialize the tags also after the Value part before the memstoreTS
Was there any particular consideration behind serializing the tags before the memstoreTS instead of after it?

bq. The BufferedDataBlockEncoder, being the base class for all encoders other than PrefixTree, would now be tag aware.

When would PrefixTree be able to handle tags?

When a new HFile is opened, would the user be able to specify that there is no tagging involved?
Put another way, after this feature goes in, would HFile V3 always be written?

Thanks

On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> The changes/differences that we would be introducing in the V3 format
> are as follows (I will put them down in words under subcategories).
>
> To reduce code duplication we would subclass ReaderV3 and WriterV3 from
> ReaderV2 and WriterV2 respectively.
>
> *HFileBlockFormat*
> *=============*
> No change in V2 and V3.
>
> *KV serialization*
> *============*
> V2 no change
> V3 would now serialize the tags also after the Value part before the
> memstoreTS
>
> *FixedFileTrailer*
> *===========*
> Introduces a new piece of information into the trailer, which can be used
> in V3 to make tags optional. Take the case where the user selects V3 but
> one CF has no tags. We would write the tag bytes while flushing, but
> during compaction we would use this info to avoid writing tags in the
> compacted files. This would mean no impact on read performance after the
> compaction has completed.
> The V2 code also tries to get this trailer info, but since it is null
> there is no impact on any of the existing code.
>
> *WriterV3 and ReaderV3*
> *=================*
> These handle the tags based on the metadata from the trailer info. All
> the APIs like seekTo, next(), getKeyValue() are now able to handle tags
> based on the flag passed during the construction of the Readers and
> Writers. We can be sure that for any instance of V2 the includeTags flag
> would always be false.
>
> *DataBlockEncoders*
> *==============*
> Additional arguments are added to the APIs in the interfaces related to
> HFileDataBlockEncoders, BufferedDataBlockEncoders,
> HFileDataBlockEncodingContext etc. Again, for V2 the new APIs would still
> behave the same way and there would be no impact on V2 based use cases.
> The BufferedDataBlockEncoder, being the base class for all encoders other
> than PrefixTree, would now be tag aware.
>
> *PrefixTreeEncoders*
> *==============*
> Trying to keep changes minimal here, but we would ensure that there are no
> behavioral changes while using PrefixTree with V2.
>
> *KeyValue class*
> *===========*
> Will include changes to have a Tag class inside this. APIs to identify
> tags in a KV would be needed. Util method changes would also be there.
>
> For the V2 based read/write flow the existing code path applies with
> no/minimal changes.
>
> Many test cases have to be changed to accommodate the API changes
> happening to the internal interfaces.
> I have listed the changes at a high level; maybe once you see a patch
> that would give more clarity. Let me know if further information would be
> needed.
>
> Regards
> Ram
>
>
> On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <jxi...@cloudera.com> wrote:
>
> > Can you share some more details about it? A graph/chart/table showing
> > the specific difference will be helpful.
> >
> > Thanks,
> > Jimmy
> >
> >
> > On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > I have been following comments on HBASE-8496.
> > >
> > > I think introducing cell tagging through HFile v3 is acceptable.
> > >
> > > Looking forward to seeing your implementation.
> > >
> > > Cheers
> > >
> > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > For the past couple of months, we have been working through various
> > > > prototypes for supporting inline storage of tags in cells as
> > > > persisted on disk.
> > > > Our goals are to support optional use of tags with minimal changes
> > > > to core code while also avoiding performance impacts to users who
> > > > do not use tags.
> > > >
> > > > For background, refer to the comments in
> > > >
> > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> > > >
> > > > and
> > > >
> > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > >
> > > > We have iterated on a couple of prototypes that implement tag
> > > > awareness first in DataBlockEncoders and later as a new type of
> > > > Codec for Cells. This point is discussed in the above comments in
> > > > HBASE-8496.
> > > >
> > > > We think that tag awareness in Cell Codecs is the right way, but
> > > > there are some shortcomings with the current interfaces internal to
> > > > HFile that need to be addressed in order to avoid any performance
> > > > impacts for those who do not want to use inline tags, and that may
> > > > involve a drastic amount of code change.
> > > >
> > > > We can avoid several problems with HFile V2 internals, and backwards
> > > > compatibility concerns, and allow for working tags support with no
> > > > performance impact and low risk to all HBase users who do not want
> > > > tag support, while still allowing for inline tags capabilities in a
> > > > shipping version of HBase, by introducing this in a new V3 version
> > > > for HFile.
> > > >
> > > > The new V3 version for HFile differs from earlier versions by
> > > > supporting inline tag storage. This version does not change the
> > > > HFileBlock format; it only serializes and deserializes the Tag
> > > > information that would be persisted in the HFile.
> > > > Having HFile V3 would also help to keep Tags optional, such that
> > > > the existing cases where there are no tags are totally unaffected.
> > > > We also ensure that we keep the changes outside of the V3 reader
> > > > and writer minimal. Compatibility would not be a problem with
> > > > future versions when we go with Cell Codecs. Which Codec was used
> > > > for writing the file will be persisted in the HFile header. Now for
> > > > files that are either V2 or V3 we will instantiate two default
> > > > codecs that know how to deal with serializations with and without
> > > > tags.
> > > >
> > > > There have been thoughts on an HFile V3 prior, e.g.:
> > > >
> > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > >
> > > > We have been working on this and will have a clean patch with a
> > > > good amount of testing in time for 0.96.
> > > >
> > > > Although our focus is on performance-neutral persistence of inline
> > > > cell tags in 0.96 to enable a couple of security coprocessor users,
> > > > introducing an HFile V3 provides design freedom for some other
> > > > features and problems too that can be developed through the 0.96
> > > > cycle into 0.98.
> > > >
> > > > Please voice your opinion on this so that we can make this clear
> > > > and maybe define the scope of the patch. Also feel free to comment
> > > > on HBASE-8496 with your thoughts and ideas.
> > > >
> > > > Regards
> > > >
> > > > Ram
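
For anyone following the thread who wants the proposed V3 cell layout spelled out, here is a minimal Java sketch of "tags after the Value part, before the memstoreTS", gated by an includeTags flag as described in the quoted mail. It is only an illustration under stated assumptions and not the HBase code: the class TaggedCellSketch, its Cell holder, the 2-byte tags length prefix, and the fixed 8-byte memstoreTS are all hypothetical choices made to keep the example short.

```java
import java.nio.ByteBuffer;

/** Illustrative only: NOT the HBase KeyValue or HFile writer code. */
final class TaggedCellSketch {

    /** Hypothetical holder for one cell; tags is empty when none were read. */
    static final class Cell {
        final byte[] key;
        final byte[] value;
        final byte[] tags;
        final long memstoreTs;

        Cell(byte[] key, byte[] value, byte[] tags, long memstoreTs) {
            this.key = key;
            this.value = value;
            this.tags = tags;
            this.memstoreTs = memstoreTs;
        }
    }

    /**
     * Assumed layout: keyLen(4) valueLen(4) key value [tagsLen(2) tags] memstoreTS(8).
     * With includeTags == false the bracketed part is skipped (the V2-style layout).
     */
    static byte[] serialize(Cell cell, boolean includeTags) {
        if (cell.tags.length > 0xFFFF) {
            throw new IllegalArgumentException("tags do not fit a 2-byte length");
        }
        int size = 4 + 4 + cell.key.length + cell.value.length + 8;
        if (includeTags) {
            size += 2 + cell.tags.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(cell.key.length);
        buf.putInt(cell.value.length);
        buf.put(cell.key);
        buf.put(cell.value);
        if (includeTags) {
            // Tags go after the value and before the memstoreTS.
            buf.putShort((short) cell.tags.length);
            buf.put(cell.tags);
        }
        buf.putLong(cell.memstoreTs);
        return buf.array();
    }

    /** Reads back what serialize() wrote; the same flag must be passed. */
    static Cell deserialize(byte[] bytes, boolean includeTags) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] key = new byte[buf.getInt()];
        byte[] value = new byte[buf.getInt()];
        buf.get(key);
        buf.get(value);
        byte[] tags = new byte[0];
        if (includeTags) {
            tags = new byte[buf.getShort() & 0xFFFF]; // unsigned 2-byte length
            buf.get(tags);
        }
        return new Cell(key, value, tags, buf.getLong());
    }
}
```

In this sketch, a reader or writer constructed with includeTags set to false never touches tag bytes, which mirrors the point that V2 instances always run with the flag off. Rewriting cells with the flag off, as a compaction might for a CF that has no tags, yields the shorter layout.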