bq. By default code will go with V2. Good.
Looking forward to the patch.

On Thu, Jul 18, 2013 at 9:57 PM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:

> >>Any consideration that the tags are serialized before the memstoreTS instead of after ?
> The argument is simple: memstoreTS is optional and appears only in the HFile, not in the KV. In the current design the tags come after the Value in the KV structure, so it is better to apply the same ordering in HFiles as well.
>
> >>When would PrefixTree be able to handle tags ?
> Maybe my statement confused you. Please see the point on PrefixTreeEncoders in the previous mail. I meant that, as per the current design, PrefixKey, DiffKey and FastDiff extend BufferedDataEncoders, and hence BufferedDataEncoders are made tag aware.
>
> The PrefixTree codec has been handled separately to make it work with tags.
>
> >>Put in another way, after this feature goes in, would HFile V3 always be written ?
> By default the code will go with V2. When a user needs V3, he would need to set hfile.format.version to 3. This ensures that the system uses V3.
>
> Thanks Ted.
>
> Regards
> Ram
>
> On Fri, Jul 19, 2013 at 10:10 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > bq. V3 would now serialize the tags also after the Value part before the memstoreTS
> >
> > Any consideration that the tags are serialized before the memstoreTS instead of after ?
> >
> > bq. The BufferedDataEncoder, being the base class for all encoders other than PrefixTree would now be tag aware.
> >
> > When would PrefixTree be able to handle tags ?
> >
> > When a new HFile is opened, would user be able to specify that there is no tagging involved ? Put in another way, after this feature goes in, would HFile V3 always be written ?
> >
> > Thanks
> >
> > On Thu, Jul 18, 2013 at 9:29 PM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> >
> > > The changes/differences that we would be introducing in the V3 format are listed below, by subcategory.
> > >
> > > To reduce code duplication we would subclass ReaderV3 and WriterV3 from ReaderV2 and WriterV2 respectively.
> > >
> > > *HFileBlockFormat*
> > > *=============*
> > > No change in V2 and V3.
> > >
> > > *KV serialization*
> > > *============*
> > > V2: no change.
> > > V3 would now serialize the tags also, after the Value part and before the memstoreTS.
> > >
> > > *FixedFileTrailer*
> > > *===========*
> > > Introduces new information into the trailer which can be used in V3 to make tags optional. Take the case where the user selects V3 but one CF has no tags. We would then write the tag bytes while flushing, but during compaction we would use this trailer info to avoid writing tags in the compacted files. This would mean no impact on read performance after the compaction has completed.
> > > The V2 code also tries to get this trailer info, but since it is null there is no impact on any of the existing code.
> > >
> > > *WriterV3 and ReaderV3*
> > > *=================*
> > > Tries to handle the tags based on the metadata from the trailer info. All the APIs like seekTo, next(), getKeyValue() are now able to handle tags based on the flag passed during the construction of the Readers and Writers. We can be sure that for any instance of V2 the includeTags flag would always be false.
> > >
> > > *DataBlockEncoders*
> > > *==============*
> > > Additional arguments are added to the APIs in the interfaces related to HFileDataBlockEncoders, BufferedDataBlockEncoders, HFileDataBlockEncodingContext etc.
> > > Again, for V2 the new APIs would still behave the same way and there would be no impact on V2-based use cases.
> > > The BufferedDataEncoder, being the base class for all encoders other than PrefixTree, would now be tag aware.
> > >
> > > *PrefixTreeEncoders*
> > > *==============*
> > > Trying to keep changes minimal here, but would ensure that there are no behavioural changes while using PrefixTree with V2.
> > >
> > > *KeyValue class*
> > > *===========*
> > > Will include changes to have a Tag class inside this. APIs to identify tags in a KV would be needed. Util method changes would also be there.
> > >
> > > For the V2-based read/write flow the existing code path applies with no/minimal changes.
> > >
> > > Many test cases have to be changed to accommodate the API changes happening to the internal interfaces.
> > > I have listed the changes at a high level; maybe once you see a patch, that would give more clarity. Let me know if further information is needed.
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Jul 18, 2013 at 11:25 PM, Jimmy Xiang <jxi...@cloudera.com> wrote:
> > >
> > > > Can you share some more details about it? A graph/chart/table showing the specific difference will be helpful.
> > > >
> > > > Thanks,
> > > > Jimmy
> > > >
> > > > On Thu, Jul 18, 2013 at 10:23 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > I have been following comments on HBASE-8496.
> > > > >
> > > > > I think introducing cell tagging through HFile v3 is acceptable.
> > > > >
> > > > > Looking forward to seeing your implementation.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> > > > >
> > > > > > For the past couple of months, we have been working through various prototypes for supporting inline storage of tags in cells as persisted on disk. Our goals are to support optional use of tags with minimal changes to core code while also avoiding performance impacts to users who do not use tags.
> > > > > >
> > > > > > For background, refer to the comments in
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
> > > > > >
> > > > > > and
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > > > >
> > > > > > We have iterated on a couple of prototypes that implement tag awareness in DataBlockEncoders, and later as a new type of Codec for Cells. This point is discussed in the above comments in HBASE-8496.
> > > > > >
> > > > > > We think that tag awareness in Cell Codecs is the right way, but there are some shortcomings with the current interfaces internal to HFile that need to be addressed in order to avoid any performance impacts for those who do not want to use inline tags, and that may involve a drastic amount of code change.
> > > > > >
> > > > > > We can avoid several problems with HFile V2 internals, and backwards compatibility concerns, and allow for working tags support with no performance impact and low risk to all HBase users who do not want tag support, while still allowing for inline tags capabilities in a shipping version of HBase, by introducing this in a new V3 version for HFile.
> > > > > >
> > > > > > The new V3 version for HFile differs from earlier versions by supporting inline tag storage. This version does not change the HFileBlock format; it just serializes and deserializes the Tag information that would be persisted in the HFile. Having HFile V3 would also help to keep Tags optional, such that the existing cases where there are no tags are totally unaffected. We also ensure that the changes outside of the V3 reader and writer are kept minimal. Compatibility would not be a problem with future versions when we go with Cell Codecs. Which Codecs were used for writing the file will be persisted in the HFile header. For now, for files that are either V2 or V3 we will instantiate two default codecs that know how to deal with serializations with and without tags.
> > > > > >
> > > > > > There have been thoughts on an HFile V3 prior, e.g.:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
> > > > > >
> > > > > > We have been working on this and will have a clean patch with a good amount of testing in time for 0.96.
> > > > > >
> > > > > > Although our focus is on performance-neutral persistence of inline cell tags in 0.96 to enable a couple of security coprocessor users, introducing an HFile V3 also provides design freedom for some other features and problems that can be developed through the 0.96 cycle into 0.98.
> > > > > >
> > > > > > Please voice your opinion on this so that we can make this clear and maybe define the scope of the patch. Also feel free to comment on HBASE-8496 with your thoughts and ideas.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Ram
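
The V3 cell layout discussed in this thread (tags serialized after the Value part and before the memstoreTS, controlled by an includeTags flag that is always false for V2) can be sketched roughly as below. This is only an illustration under stated assumptions, not the HBase implementation: the class TaggedCellSketch, its method names, and the field widths (4-byte key/value lengths, 2-byte tags length, fixed 8-byte memstoreTS) are hypothetical and chosen for readability.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not the HBase KeyValue or HFile writer code. */
class TaggedCellSketch {

    /**
     * Serializes a cell as: keyLen, valueLen, key, value, [tagsLen, tags], memstoreTS.
     * The bracketed part is written only when includeTags is true (the V3-style layout).
     * All field widths here are assumptions made for this sketch.
     */
    public static byte[] serialize(byte[] key, byte[] value, byte[] tags,
                                   long memstoreTs, boolean includeTags) {
        int size = 4 + 4 + key.length + value.length + 8;
        if (includeTags) {
            size += 2 + tags.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(key.length);
        buf.putInt(value.length);
        buf.put(key);
        buf.put(value);
        if (includeTags) {
            // Tags follow the value, mirroring their position in the KV structure.
            buf.putShort((short) tags.length);
            buf.put(tags);
        }
        // memstoreTS stays last, since it exists only in the file and not in the KV.
        buf.putLong(memstoreTs);
        return buf.array();
    }

    /** Reads the tags back; returns an empty array for the V2-style layout. */
    public static byte[] readTags(byte[] cell, boolean includeTags) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int keyLen = buf.getInt();
        int valueLen = buf.getInt();
        buf.position(buf.position() + keyLen + valueLen);
        if (!includeTags) {
            return new byte[0];
        }
        byte[] tags = new byte[buf.getShort()];
        buf.get(tags);
        return tags;
    }

    public static void main(String[] args) {
        byte[] key = "row1/cf:q".getBytes(StandardCharsets.UTF_8);
        byte[] value = "v".getBytes(StandardCharsets.UTF_8);
        byte[] tags = "acl".getBytes(StandardCharsets.UTF_8);

        byte[] v2Style = serialize(key, value, tags, 42L, false);
        byte[] v3Style = serialize(key, value, tags, 42L, true);

        System.out.println("V2-style bytes: " + v2Style.length);
        System.out.println("V3-style bytes: " + v3Style.length);
        System.out.println("tags read back: "
                + new String(readTags(v3Style, true), StandardCharsets.UTF_8));
    }
}
```

With includeTags set to false the output is byte-for-byte the tag-free layout, which is the property the thread relies on to leave V2 users unaffected.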