[ https://issues.apache.org/jira/browse/HBASE-29039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eungsop Yoo updated HBASE-29039:
--------------------------------
    Attachment: screenshot-2.png

> Optimize read performance for accumulated delete markers on the same row or cell
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-29039
>                 URL: https://issues.apache.org/jira/browse/HBASE-29039
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.1, 2.5.10
>            Reporter: Eungsop Yoo
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: screenshot-2.png
>
>
> I ran into a problem where some Get operations take several seconds.
> !screenshot-1.png!
> The cause turned out to be that users repeatedly Put and Delete on the same rows. As
> delete markers accumulate on the same row or cell, Get operations slow
> down. This can be reproduced with the following HBase shell commands.
> {code}
> create 'test', 'c'
> java_import org.apache.hadoop.hbase.client.Delete
> java_import org.apache.hadoop.hbase.TableName
> java_import java.lang.System
> con = @hbase.instance_variable_get(:@connection)
> table = con.getTable(TableName.valueOf('test'))
> 1000.times do |i|
>   # batch 10000 deletes with different timestamps every 10 seconds
>   now = System.currentTimeMillis()
>   dels = 10000.times.map do |j|
>     del = Delete.new(Bytes.toBytes('row'))
>     del.addFamily(Bytes.toBytes('c'), now + j)
>   end
>   table.delete(dels)
>   sleep(10)
>   puts "i - #{i}"
>   get 'test', 'row'
> end
> {code}
> {code}
> i - 0
> COLUMN    CELL
> 0 row(s)
> Took 0.0251 seconds
> ...
> i - 10
> COLUMN    CELL
> 0 row(s)
> Took 0.0412 seconds
> ...
> i - 20
> COLUMN    CELL
> 0 row(s)
> Took 0.0760 seconds
> ...
> i - 30
> COLUMN    CELL
> 0 row(s)
> Took 0.1014 seconds
> ...
> i - 40
> COLUMN    CELL
> 0 row(s)
> Took 0.1616 seconds
> ...
> {code}
> However, the performance of Get operations can be improved by using SEEK_NEXT_COL.
> {code}
> i - 1
> COLUMN    CELL
> 0 row(s)
> Took 0.0087 seconds
> ...
> i - 11
> COLUMN    CELL
> 0 row(s)
> Took 0.0077 seconds
> ...
> i - 21
> COLUMN    CELL
> 0 row(s)
> Took 0.0087 seconds
> ...
> {code}
> Please review the PR.
> https://github.com/apache/hbase/pull/6557
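
To illustrate why seeking can help here, the following is a toy model in plain Ruby, not HBase code and not the change in the PR. It assumes a sorted in-memory array standing in for a store file, and it assumes that the newest delete marker on a column covers every older cell of that column, so the reader may jump straight to the next column. The names `Cell`, `build_row`, `scan_with_skip`, and `scan_with_seek` are hypothetical and exist only for this sketch. It counts how many cells each strategy touches before reaching the first live cell.

```ruby
# Toy model only: a sorted Ruby array stands in for a store file.
Cell = Struct.new(:qualifier, :timestamp, :type)

# Build one row: `markers` delete markers on qualifier "a" (newest first),
# followed by a single put on qualifier "b".
def build_row(markers)
  cells = Array.new(markers) { |i| Cell.new("a", markers - i, :delete) }
  cells << Cell.new("b", 1, :put)
  cells
end

# Strategy 1: step over every delete marker one cell at a time.
def scan_with_skip(cells)
  visited = 0
  cells.each do |cell|
    visited += 1
    return [cell, visited] if cell.type == :put
  end
  [nil, visited]
end

# Strategy 2: on a delete marker, jump to the first cell of the next
# qualifier. The jump is a binary search; each probe counts as one visit.
def scan_with_seek(cells)
  visited = 0
  idx = 0
  while idx < cells.size
    cell = cells[idx]
    visited += 1
    return [cell, visited] if cell.type == :put

    probes = 0
    nxt = (idx + 1...cells.size).bsearch do |j|
      probes += 1
      cells[j].qualifier > cell.qualifier
    end
    visited += probes
    idx = nxt || cells.size # nil means no later qualifier exists
  end
  [nil, visited]
end

row = build_row(10_000)
_, skip_visits = scan_with_skip(row)
_, seek_visits = scan_with_seek(row)
puts "skip: #{skip_visits} cells visited"
puts "seek: #{seek_visits} cells visited"
```

In this model the skip strategy touches every marker, so its cost grows linearly with the number of accumulated markers, while the seek strategy grows only with the logarithm of that number. Real seek costs in HBase depend on block layout, caching, and the number of store files, so the model shows the shape of the difference rather than actual latencies.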