[ https://issues.apache.org/jira/browse/HBASE-18201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527252#comment-16527252 ]
Kuan-Po Tseng commented on HBASE-18201:
---------------------------------------

Thanks for reviewing, Reid Chan.

{quote}
I think adjusting the call order as follows should work. There is no need to add another if branch, which is kind of confusing.
{code:java}
this.dataBlockEncoder.endBlockEncoding(encodingCtx, out, baosBytes);
baos.flush();
baosBytes = baos.toByteArray();
{code}
{quote}
The problem is that ROW_INDEX_V1 extends a different class: its #endBlockEncoding writes (int) onDiskDataSize to the OutputStream, while the other encoders write it into the byte array that underlies the OutputStream. If we call #endBlockEncoding first and then #flush, the ROW_INDEX_V1 encoder works well, but the byte array produced with the other encoders is {(int) onDiskDataSize, byte, byte, ..., byte}, since they write (int) onDiskDataSize into the byte array first and then flush all the data. The right order is {byte, byte, ..., (int) onDiskDataSize}; (int) onDiskDataSize should come last.

Could we add useTag = currentKV.getTagsLength() > 0 in the while loop above? Once it is set to true, the rest need not be checked.
{quote}
{code:java}
HStoreFile hsf = new HStoreFile(fs, path, conf, cacheConf, BloomType.NONE, true);
StoreFileReader reader = hsf.getReader();
boolean useTag = reader.getHFileReader().getFileContext().isIncludesTags();
{code}
It is kind of heavy to create an HStoreFile instance just to use its isIncludesTags method.
{quote}
Sorry, I didn't explain this carefully. An HStoreFile instance is already created in #testCodecs, which runs before #checkStatistics, so we could check whether useTag is true in #testCodecs instead of creating a new instance.
{quote}
{code:java}
DataBlockEncodingTool#checkStatistics
rawKVs = uncompressedOutputStream.toByteArray();
{code}
I doubt it is a real rawKVs, since I see nothing about writing tags (if a KV has any).
{quote}
Pardon me. Do you mean rawKVs may not be a real rawKVs because #checkStatistics doesn't write tags to rawKVs?
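
To illustrate the ordering issue discussed above, here is a minimal sketch that uses only java.io and java.nio. It is not the HBase code: the class SnapshotOrderSketch, the two toy "end" methods, the value 42, and the byte offsets are all hypothetical and do not reflect the real block layout. It only shows that toByteArray() returns a copy, so an int written to the stream after the snapshot is missing from the array, while an int patched into the array is lost if the snapshot is taken again.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SnapshotOrderSketch {

  /** Toy encoder A (hypothetical): finishes by appending an int to the stream. */
  static void endViaStream(DataOutputStream out, int size) throws IOException {
    out.writeInt(size);
  }

  /** Toy encoder B (hypothetical): finishes by patching an int into the snapshot array. */
  static void endViaArray(byte[] snapshot, int size) {
    // Overwrite the reserved 4-byte slot at offset 0 (offset chosen arbitrarily).
    ByteBuffer.wrap(snapshot).putInt(0, size);
  }

  /** Snapshot first, then end the block; optionally snapshot again afterwards. */
  static byte[] encode(boolean streamEncoder, boolean resnapshot) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(baos);
    out.writeInt(0);                 // reserved slot, used only by toy encoder B
    out.write(new byte[] {1, 2, 3}); // stand-in payload
    out.flush();
    byte[] bytes = baos.toByteArray(); // a copy of everything written so far

    if (streamEncoder) {
      endViaStream(out, 42); // lands in baos, not in the earlier copy
    } else {
      endViaArray(bytes, 42); // lands in the copy, not in baos
    }
    if (resnapshot) {
      out.flush();
      bytes = baos.toByteArray(); // picks up stream writes, drops array patches
    }
    return bytes;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(Arrays.toString(encode(false, false))); // array patch kept
    System.out.println(Arrays.toString(encode(true, false)));  // stream write missing
    System.out.println(Arrays.toString(encode(true, true)));   // stream write captured
    System.out.println(Arrays.toString(encode(false, true)));  // array patch discarded
  }
}
```

In this toy model a second snapshot helps only the stream-writing encoder and hurts the array-patching one, which is the kind of asymmetry that motivates guarding the second toByteArray() call with an if branch.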
> add UT and docs for DataBlockEncodingTool
> -----------------------------------------
>
> Key: HBASE-18201
> URL: https://issues.apache.org/jira/browse/HBASE-18201
> Project: HBase
> Issue Type: Sub-task
> Components: tooling
> Reporter: Chia-Ping Tsai
> Assignee: Kuan-Po Tseng
> Priority: Minor
> Labels: beginner
> Attachments: HBASE-18201.master.001.patch, HBASE-18201.master.002.patch, HBASE-18201.master.002.patch, HBASE-18201.master.003.patch
>
>
> There are no examples, documents, or tests for DataBlockEncodingTool. We should
> make it user-friendly if any use case exists. Otherwise, we should just get rid of
> it, because DataBlockEncodingTool presumes that the implementation of the cell
> returned from DataBlockEncoder is KeyValue. This presumption may obstruct the
> cleanup of KeyValue references in the code base of the read/write path.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)