[ https://issues.apache.org/jira/browse/HBASE-29039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eungsop Yoo updated HBASE-29039:
--------------------------------
    Description: 

I am confronted with a problem where some Get operations take several seconds.

!screenshot-1.png!

The cause was found to be that users Put and Delete the same rows repeatedly. As delete markers accumulate on the same row or cell, Get operations slow down. It can be reproduced with the following HBase shell commands.

{code}
create 'test', 'c'

java_import org.apache.hadoop.hbase.client.Delete
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.util.Bytes
java_import java.lang.System

con = @hbase.instance_variable_get(:@connection)
table = con.getTable(TableName.valueOf('test'))

1000.times do |i|
  # batch 10000 family delete markers with different timestamps every 10 seconds
  now = System.currentTimeMillis()
  dels = 10000.times.map do |j|
    del = Delete.new(Bytes.toBytes('row'))
    del.addFamily(Bytes.toBytes('c'), now + j)  # addFamily returns the Delete, so map collects Delete objects
  end
  table.delete(dels)
  sleep(10)
  puts "i - #{i}"
  get 'test', 'row'  # shell command; the shell prints the elapsed time
end
{code}

Get latency grows as the delete markers accumulate:

{code}
i - 0
COLUMN                                   CELL
0 row(s)
Took 0.0251 seconds
...
i - 10
COLUMN                                   CELL
0 row(s)
Took 0.0412 seconds
...
i - 20
COLUMN                                   CELL
0 row(s)
Took 0.0760 seconds
...
i - 30
COLUMN                                   CELL
0 row(s)
Took 0.1014 seconds
...
i - 40
COLUMN                                   CELL
0 row(s)
Took 0.1616 seconds
...
{code}

But the performance of Get operations can be optimized by using SEEK_NEXT_COL (a simplified sketch of the seek-versus-skip idea follows below):

{code}
i - 1
COLUMN                                   CELL
0 row(s)
Took 0.0087 seconds
...
i - 11
COLUMN                                   CELL
0 row(s)
Took 0.0077 seconds
...
i - 21
COLUMN                                   CELL
0 row(s)
Took 0.0087 seconds
...
{code}

Please review the PR.
https://github.com/apache/hbase/pull/6557
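To illustrate why SEEK_NEXT_COL helps here, below is a minimal standalone sketch (plain Ruby, no HBase classes, and not the code in the PR). It models a store holding thousands of delete markers for one column and compares a SKIP-style scan, which examines every marker, with a SEEK_NEXT_COL-style jump past the column. The Cell struct and the index-based "seek" are simplified assumptions for illustration only.

{code}
# Simplified model only: real HBase cells, matcher codes, and store file seeks differ.
Cell = Struct.new(:row, :column, :ts, :type)

# 10,000 delete markers on one column, newest first, then one ordinary cell in the next column.
cells = 10_000.times.map { |k| Cell.new('row', 'c:a', 10_000 - k, :delete) }
cells << Cell.new('row', 'c:b', 1, :put)

# SKIP-style: examine every delete marker for the column, one by one.
skip_visited = 0
idx = 0
while cells[idx].column == 'c:a'
  skip_visited += 1
  idx += 1
end

# SEEK_NEXT_COL-style: the first marker already tells us the column is deleted,
# so jump straight to the first cell of the next column (stands in for an index seek).
seek_visited = 1
idx = cells.index { |c| c.column != 'c:a' }

puts "SKIP examined #{skip_visited} cells; SEEK_NEXT_COL examined #{seek_visited}"
{code}

In the real scanner the decision is made by the query matcher and the seek is served by the store file index, but the effect is the same: the cost of a Get no longer grows with the number of delete markers accumulated on the column.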
> Optimize read performance for accumulated delete markers on the same row or cell
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-29039
>                 URL: https://issues.apache.org/jira/browse/HBASE-29039
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.1, 2.5.10
>            Reporter: Eungsop Yoo
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: screenshot-1.png
>


--
This message was sent by Atlassian Jira
(v8.20.10#820010)