[ 
https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395844#comment-14395844
 ] 

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

It turns out that the optimization in HBASE-8151 and HBASE-9751 still works, 
but only after 6 days, when compactions allow setting mvcc readpoints to 0.

I think we can get the HBASE-8166 optimization back and still keep 
HBASE-12600 working correctly, if we replace this:
{code}
- final boolean needMvcc = fd.maxMVCCReadpoint >= smallestReadPoint;
+
  final Compression.Algorithm compression = store.getFamily().getCompactionCompression();
  StripeMultiFileWriter.WriterFactory factory = new StripeMultiFileWriter.WriterFactory() {
    @Override
    public Writer createWriter() throws IOException {
      return store.createWriterInTmp(
-       fd.maxKeyCount, compression, true, needMvcc, fd.maxTagsLength > 0);
+       fd.maxKeyCount, compression, true, true, fd.maxTagsLength > 0);
    }
{code}

With this:

{code}
- final boolean needMvcc = fd.maxMVCCReadpoint >= smallestReadPoint;
+ final boolean needMvcc = fd.maxMVCCReadpoint > 0;
  final Compression.Algorithm compression = store.getFamily().getCompactionCompression();
  StripeMultiFileWriter.WriterFactory factory = new StripeMultiFileWriter.WriterFactory() {
    @Override
    public Writer createWriter() throws IOException {
      return store.createWriterInTmp(
        fd.maxKeyCount, compression, true, needMvcc, fd.maxTagsLength > 0);
    }
{code}

So when all mvcc readpoints are 0, the next compaction can still apply the 
HBASE-8166 optimization and not write the mvcc information at all. It just 
happens later... Before, we already did that when no scanner was open with a 
readpoint older than any of the readpoints in the HFile; now we have to wait 
until compactions set them all to 0.
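As a sketch only (plain Java, not actual HBase code; the helper name 
{{needMvcc}} and the sample readpoint values are illustrative), the decision 
the proposed condition encodes is:

```java
public class NeedMvccSketch {
    // Decide whether a compacted file must carry per-cell mvcc readpoints.
    // With the proposed condition, a file whose cells all have readpoint 0
    // (maxMVCCReadpoint == 0, as happens after enough compactions) can skip
    // writing the mvcc information entirely.
    static boolean needMvcc(long maxMVCCReadpoint) {
        return maxMVCCReadpoint > 0;
    }

    public static void main(String[] args) {
        // Fresh data still carries readpoints > 0: mvcc must be written.
        System.out.println(needMvcc(42));
        // All readpoints aged to 0: the HBASE-8166 skip applies.
        System.out.println(needMvcc(0));
    }
}
```

Note this is exactly why the always-true variant (comparing against 0 with 
{{>=}}) would defeat the optimization: it would force writing mvcc even for 
fully aged files.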

It's not all that bad. [~stack], if the data is older than 6 days I'd expect 
this to no longer show up in the profiler.

Maybe we need to write some unit tests for this, although I assume that won't 
be easy.


> [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
> -------------------------------------------------------------
>
>                 Key: HBASE-13389
>                 URL: https://issues.apache.org/jira/browse/HBASE-13389
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>
> HBASE-12600 moved the edit sequenceid from tags to instead exploit the 
> mvcc/sequenceid slot in a key. Now Cells near-always have an associated 
> mvcc/sequenceid, where previously it was rare or the mvcc was kept up at the 
> file level. This is sort of how it should be, many of us would argue, but as 
> a side-effect, read-time optimizations that helped speed scans were undone 
> by this change.
> In this issue, lets see if we can get the optimizations back -- or just 
> remove the optimizations altogether.
> The parse of mvcc/sequenceid is expensive. It was noticed over in HBASE-13291.
> The optimizations undone by this change are (to quote the optimizer himself, 
> Mr [~lhofhansl]):
> {quote}
> Looks like this undoes all of HBASE-9751, HBASE-8151, and HBASE-8166.
> We're always storing the mvcc readpoints, and we never compare them against 
> the actual smallestReadpoint, and hence we're always performing all the 
> checks, tests, and comparisons that these jiras removed in addition to 
> actually storing the data - which with up to 8 bytes per Cell is not trivial.
> {quote}
> This is the 'breaking' change: 
> https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
