[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835033#comment-16835033 ]

HBase QA commented on HBASE-21418:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 7s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 1s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-1.2 Compile Tests ||
| 0 | mvndep | 1m 39s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 24s | branch-1.2 passed |
| +1 | compile | 1m 10s | branch-1.2 passed with JDK v1.8.0_212 |
| +1 | compile | 0m 55s | branch-1.2 passed with JDK v1.7.0_222 |
| +1 | checkstyle | 1m 51s | branch-1.2 passed |
| +1 | shadedjars | 2m 40s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 53s | branch-1.2 passed with JDK v1.8.0_212 |
| +1 | javadoc | 0m 53s | branch-1.2 passed with JDK v1.7.0_222 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 26s | the patch passed |
| +1 | compile | 0m 44s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 44s | the patch passed |
| +1 | compile | 0m 50s | the patch passed with JDK v1.7.0_222 |
| +1 | javac | 0m 50s | the patch passed |
| -1 | checkstyle | 1m 16s | hbase-server: The patch generated 16 new + 377 unchanged - 4 fixed = 393 total (was 381) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 23s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 30s | Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.5 2.7.4. |
| +1 | javadoc | 0m 39s | the patch passed with JDK v1.8.0_212 |
| +1 | javadoc | 0m 57s | the patch passed with JDK v1.7.0_222 |
|| || || || Other Tests ||
| +1 | unit | 2m 2s | hbase-client in the patch passed. |
| +1 | unit | 120m 8s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 159m 32s | |

||
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684460#comment-16684460 ]

Lars Hofhansl commented on HBASE-21418:

I'll look this evening (a bit busy throughout the day).

> Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
>
>                 Key: HBASE-21418
>                 URL: https://issues.apache.org/jira/browse/HBASE-21418
>             Project: HBase
>          Issue Type: Improvement
>          Components: scan, Scanners
>    Affects Versions: 1.2.5
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Minor
>              Labels: performance
>         Attachments: HBASE-21418.branch-1.2.001.patch, HBASE-21418.branch-1.2.001.patch, HBASE-21418.branch-1.2.002.patch
>
> We observed “responseTooSlow” logs for Get requests in our production clusters; some Get requests were even answered only after 10 seconds. The affected Get requests specified a time range, and the target rows have many columns, each with several versions.
>
> We reproduced the issue and found that it happens only when scanning the memstore. After flushing the HStore, the slow responses disappeared and the same Get requests were answered very quickly.
>
> We investigated and found that the performance difference between the memstore scanner and the HFile scanner is caused by the number of reseek operations executed while scanning. When a store scanner needs to reseek to the next column, the HFile scanner decides whether it really has to reseek by checking whether the seek point is in the current block, whereas the memstore scanner always reseeks without making any such decision. In our case, almost all columns in the memstore have timestamps older than the time range of the scan (Get), so the number of reseek operations is roughly equal to the number of columns. This sporadically increases the response time of Get requests.
>
> To improve the reseek operation of the memstore scanner, I think it is better to skip than to seek when a reseek is requested and the seek point is quite close to the cell the scanner is currently pointing at. (In fact, I changed MatchCode.SEEK_NEXT_COL to MatchCode.SKIP in our case, and the response time of Get was 6x faster than before.) But we can’t decide whether the seek point is close to the current cell, because the memstore scanner has no information such as the next block index.
>
> Before HBASE-13109, Scan.HINT_LOOKAHEAD was introduced to handle cases like this, and it may be deprecated someday. But I think that hint is still useful for letting the memstore scanner try to skip first, before reseeking, and with this option we can make the reseek operations of the memstore scanner smarter.
>
> I tested this patch in our case and got the same result as when I changed the match code (mentioned above).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
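The skip-before-reseek idea in the description above can be sketched outside HBase. This is a minimal illustration, not the attached patch: a `TreeSet<String>` stands in for the memstore's sorted cell set, and the class name, the `lookAhead` parameter, and the `fullSeeks` counter are hypothetical names chosen for the example.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Illustrative only: tries a few cheap forward steps before paying for a full seek. */
final class LookAheadScannerSketch {
  private final NavigableSet<String> cells; // stand-in for the sorted memstore cells
  private final int lookAhead;              // hypothetical knob: max cells to skip first
  private Iterator<String> it;
  private String current;                   // cell the scanner points at, or null if exhausted
  int fullSeeks = 0;                        // counts O(log n) seeks, for demonstration only

  LookAheadScannerSketch(NavigableSet<String> cells, int lookAhead) {
    this.cells = cells;
    this.lookAhead = lookAhead;
    this.it = cells.iterator();
    advance();
  }

  String peek() {
    return current;
  }

  private void advance() {
    current = it.hasNext() ? it.next() : null;
  }

  /** Positions the scanner at the first cell >= key; returns false if there is none. */
  boolean reseek(String key) {
    // Cheap path: step forward at most lookAhead cells.
    for (int i = 0; i < lookAhead && current != null; i++) {
      if (current.compareTo(key) >= 0) {
        return true;
      }
      advance();
    }
    if (current == null) {
      return false; // ran off the end while skipping: nothing >= key remains
    }
    if (current.compareTo(key) >= 0) {
      return true;
    }
    // Expensive path: a real seek into the sorted structure.
    fullSeeks++;
    it = cells.tailSet(key, true).iterator();
    advance();
    return current != null;
  }

  public static void main(String[] args) {
    NavigableSet<String> cells = new TreeSet<>();
    for (char c = 'a'; c <= 'j'; c++) {
      cells.add(String.valueOf(c));
    }
    LookAheadScannerSketch s = new LookAheadScannerSketch(cells, 3);
    s.reseek("c"); // close to the current cell: reached by skipping
    s.reseek("i"); // farther away: falls back to a full seek
    System.out.println("positioned at " + s.peek() + " after " + s.fullSeeks + " full seek(s)");
  }
}
```

With a small `lookAhead`, nearby targets cost a few comparisons instead of a search through the whole structure, while distant targets pay at most `lookAhead` extra comparisons before the usual seek.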
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681314#comment-16681314 ]

Hadoop QA commented on HBASE-21418:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 18m 56s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-1.2 Compile Tests ||
| 0 | mvndep | 1m 51s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 44s | branch-1.2 passed |
| +1 | compile | 0m 49s | branch-1.2 passed with JDK v1.8.0_192 |
| +1 | compile | 0m 52s | branch-1.2 passed with JDK v1.7.0_201 |
| +1 | checkstyle | 1m 44s | branch-1.2 passed |
| +1 | shadedjars | 2m 21s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 47s | branch-1.2 passed with JDK v1.8.0_192 |
| +1 | javadoc | 0m 56s | branch-1.2 passed with JDK v1.7.0_201 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 25s | the patch passed |
| +1 | compile | 0m 49s | the patch passed with JDK v1.8.0_192 |
| +1 | javac | 0m 49s | the patch passed |
| +1 | compile | 0m 50s | the patch passed with JDK v1.7.0_201 |
| +1 | javac | 0m 50s | the patch passed |
| -1 | checkstyle | 1m 14s | hbase-server: The patch generated 16 new + 377 unchanged - 4 fixed = 393 total (was 381) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 15s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 44s | Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.5 2.7.4. |
| +1 | javadoc | 0m 44s | the patch passed with JDK v1.8.0_192 |
| +1 | javadoc | 0m 54s | the patch passed with JDK v1.7.0_201 |
|| || || || Other Tests ||
| +1 | unit | 1m 59s | hbase-client in the patch passed. |
| -1 | unit | 114m 39s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 172m 11s | |

|| Reaso
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681082#comment-16681082 ]

Jeongdae Kim commented on HBASE-21418:

In the second patch, I tried to collect column stats at a small cost (stats are collected only when flushing or compacting, because there are already a lot of comparisons there) and to use them to decide whether to reseek or skip when reseeking to the next row. When reseeking to the next column, the patch uses maxVersions as a heuristic. Please take another look, [~lhofhansl]. Thanks.
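The decision described in this comment could look roughly like the following. This is a sketch under stated assumptions, not code from the 002 patch: the class name, the `avgCellsPerRow` statistic, and the `skipThreshold` cut-off are hypothetical, and the comment does not say which statistics the patch actually records.

```java
/** Illustrative only: choose between SKIP and SEEK from coarse statistics. */
final class SkipOrSeekHeuristicSketch {
  enum Action { SKIP, SEEK }

  private final double avgCellsPerRow; // hypothetical stat gathered at flush/compaction time
  private final int maxVersions;       // max versions configured for the column family
  private final int skipThreshold;     // hypothetical cut-off: skipping is cheaper at or below this

  SkipOrSeekHeuristicSketch(double avgCellsPerRow, int maxVersions, int skipThreshold) {
    this.avgCellsPerRow = avgCellsPerRow;
    this.maxVersions = maxVersions;
    this.skipThreshold = skipThreshold;
  }

  /** Next row: with few cells per row on average, the next row is probably only a few cells away. */
  Action forNextRow() {
    return avgCellsPerRow <= skipThreshold ? Action.SKIP : Action.SEEK;
  }

  /** Next column: after compaction at most maxVersions cells separate us from the next column. */
  Action forNextColumn() {
    return maxVersions <= skipThreshold ? Action.SKIP : Action.SEEK;
  }
}
```

For example, `new SkipOrSeekHeuristicSketch(4.0, 3, 10)` would skip in both cases, while a table with hundreds of cells per row would seek to the next row but could still skip to the next column.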
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672938#comment-16672938 ]

Jeongdae Kim commented on HBASE-21418:

Thanks for your comments. I’ll incorporate them in the next patch.

{quote}
Generally I am not a fan of adding more HBase and/or scan options that one has to know about. (which is why I had removed the LOOK_AHEAD hint that I myself had added a bit earlier).
{quote}
I 100% agree with you, and I would like to do this without options too, but I couldn’t find a good solution that has no extra cost.

{quote}
Why max versions here? The SEEKing can also be an issue with many columns, right? If we can, let's find a heuristic to do this automatically (like I did with HFiles), so that a user won't have to hint.
{quote}
Right. I used max versions as a heuristic for the case where users pass no hint. I had no idea what a proper heuristic would be. If we can bear a small extra cost when putting cells into a memstore, what about maintaining some stats for columns and using them to decide whether to seek or not? Let me try to make a patch for this.
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672936#comment-16672936 ]

Jeongdae Kim commented on HBASE-21418:

Thanks for the comment [~yuzhih...@gmail.com].

{quote}
What is TestLookAheadBeforeReseek supposed to show without the fix ?
{quote}
It's for showing the performance difference. I'll remove the test from my patch and provide an external link to it instead.
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672052#comment-16672052 ]

Hadoop QA commented on HBASE-21418:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 36s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 5s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-1.2 Compile Tests ||
| 0 | mvndep | 2m 38s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 28s | branch-1.2 passed |
| +1 | compile | 1m 16s | branch-1.2 passed with JDK v1.8.0_181 |
| +1 | compile | 0m 52s | branch-1.2 passed with JDK v1.7.0_191 |
| +1 | checkstyle | 1m 44s | branch-1.2 passed |
| +1 | shadedjars | 2m 31s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 50s | branch-1.2 passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 56s | branch-1.2 passed with JDK v1.7.0_191 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 30s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK v1.8.0_181 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | compile | 0m 51s | the patch passed with JDK v1.7.0_191 |
| +1 | javac | 0m 51s | the patch passed |
| -1 | checkstyle | 1m 16s | hbase-server: The patch generated 13 new + 164 unchanged - 1 fixed = 177 total (was 165) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 21s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 29s | Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.5 2.7.4. |
| +1 | javadoc | 0m 39s | the patch passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 53s | the patch passed with JDK v1.7.0_191 |
|| || || || Other Tests ||
| +1 | unit | 2m 2s | hbase-client in the patch passed. |
| -1 | unit | 140m 51s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 181m 41s | |

|| Reaso
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671944#comment-16671944 ]

Lars Hofhansl commented on HBASE-21418:

Thanks for looking into this [~Jeongdae Kim]!

Generally I am not a fan of adding more HBase and/or scan options that one has to know about (which is why I had removed the LOOK_AHEAD hint that I myself had added a bit earlier). Is there no way to guess whether we should SKIP or SEEK?

Patch looks good in general. A few comments:

{code:java}
- /**
-  * @deprecated without replacement
-  * This is now a no-op, SEEKs and SKIPs are optimizated automatically.
-  * Will be removed in 2.0+
-  */
- @Deprecated
+ public static final String HINT_LOOKAHEAD = "_look_ahead_";
{code}
We might want to put the older (EXPERT_ONLY) comment back (or any comment explaining what this is doing).

{code:java}
+MemStoreScanner(long readPoint, ScanQueryMatcher scanQueryMatcher) {
+  this(readPoint);
+
+  if (scanQueryMatcher != null) {
+    int lookAheadRows = conf.getInt("hbase.hregion.memstore.scanner.lookahead.rows", 0);
+    if (scanQueryMatcher.getLookAheadBeforeReseek() <= lookAheadRows) {
+      this.lookAheadRows = scanQueryMatcher.getLookAheadBeforeReseek();
+    }
+  }
+}
+
{code}
Note that conf.getXXX can be slow itself. We might want to cache this value somewhere.

{code:java}
+
+  if (lookAheadRows > 0 && peek() != null) {
+    for (int i = lookAheadRows; i > 0; --i) {
+      next();
+      if (peek() == null) {
+        return false;
+      }
+      if (comparator.compare(peek(), key) >= 0) {
+        return true;
+      }
+    }
+  }
+
{code}
This is a nit, but since the value of {{i}} is not actually used inside the loop, I'd prefer {{for (int i=0; i< lookAheadRows; i++)...}}

{code:java}
+
+    byte[] attr = scan.getAttribute(Scan.HINT_LOOKAHEAD);
+    this.lookAheadBeforeSeek = Math.min(attr == null ? 0 : Bytes.toInt(attr), scanInfo.getMaxVersions());
+
{code}
Why max versions here? The SEEKing can also be an issue with many columns, right?

If we can, let's find a heuristic to do this automatically (like I did with HFiles), so that a user won't have to hint.
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671928#comment-16671928 ]

Lars Hofhansl commented on HBASE-21418:

(Sorry about the re-assign, I hit some keyboard combination that did that.)

In the HFile case we only reseek if the seek would land us in a different HFile block. For the memstore we do not have a similar metric.

I'm surprised that reseek is so expensive in the memstore case. Taking a look now.

> Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
>
> Key: HBASE-21418
> URL: https://issues.apache.org/jira/browse/HBASE-21418
> Project: HBase
> Issue Type: Improvement
> Components: scan, Scanners
> Affects Versions: 1.2.5
> Reporter: Jeongdae Kim
> Assignee: Jeongdae Kim
> Priority: Minor
> Labels: performance
> Attachments: HBASE-21418.branch-1.2.001.patch, HBASE-21418.branch-1.2.001.patch
>
> We observed “responseTooSlow” logs for Get requests in our production clusters. Some Get requests were even answered only after 10 seconds.
> The affected Get requests used a time range, and the target rows have many columns, each with some versions.
> We reproduced this issue and found that it happens only when scanning the memstore. After flushing the HStore, the slow responses disappeared and the same Get requests were answered very quickly.
>
> We investigated and found that the performance difference between the memstore scanner and the HFile scanner is caused by the number of reseek operations executed while scanning. When a store scanner needs to reseek to the next column, the HFile scanner decides whether it really has to reseek by checking whether the seek point is in the current block, whereas the memstore scanner always reseeks. In our case, almost all columns in the memstore have timestamps older than the Get’s time range, so roughly as many reseek operations occur as there are columns. This sporadically increases the response time of Get requests.
>
> To improve the reseek operation of the memstore scanner, I think skipping is better than seeking when a reseek is requested and the seek point is quite close to the cell the scanner is currently pointing at. (In fact, I changed MatchCode.SEEK_NEXT_COL to MatchCode.SKIP in our case, and the response time of Get was 6x faster than before.) But we can’t decide whether the seek point is close to the current cell, because the memstore scanner has no information such as the next block index.
> Before HBASE-13109, Scan.HINT_LOOKAHEAD was introduced to handle cases like this, and it may be deprecated someday. But I think that hint is still useful for the memstore scanner to try to skip first, before reseeking, and with this option we can make the reseek operations of the memstore scanner smarter.
>
> I tested this patch in our case and got the same result as when I changed the match code (mentioned above).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
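The contrast drawn in this message, where the HFile scanner reseeks only when the target lies outside the current block while the memstore scanner has no block boundary to consult, can be sketched as a small decision function. This is a simplified illustration, not the HBase implementation: {{SeekOrSkipSketch}}, {{decide}} and the string keys are hypothetical, and a {{null}} boundary is assumed to model the memstore case.

```java
class SeekOrSkipSketch {

    enum Action { SKIP, SEEK }

    /**
     * Decides how to reach seekKey from the current position.
     *
     * @param seekKey           the key the scanner has been asked to move to
     * @param nextBlockFirstKey first key of the following block, or null when no
     *                          such boundary is known (the memstore case)
     */
    static Action decide(String seekKey, String nextBlockFirstKey) {
        // The target sorts before the next block's first key, so it is still in
        // the current block and can be reached by stepping forward.
        if (nextBlockFirstKey != null && seekKey.compareTo(nextBlockFirstKey) < 0) {
            return Action.SKIP;
        }
        // Either the target is in another block or there is no boundary to compare against.
        return Action.SEEK;
    }

    public static void main(String[] args) {
        System.out.println(decide("row1/col05", "row1/col50")); // target inside the current block
        System.out.println(decide("row1/col70", "row1/col50")); // target beyond the current block
        System.out.println(decide("row1/col05", null));         // no boundary known
    }
}
```

Without a boundary the function can only answer SEEK, which is the gap the look-ahead hint in the patch tries to cover.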
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671750#comment-16671750 ]

Ted Yu commented on HBASE-21418:

For the new test, I ran it without the rest of the patch:
{code}
Running org.apache.hadoop.hbase.client.TestLookAheadBeforeReseek
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.647 sec - in org.apache.hadoop.hbase.client.TestLookAheadBeforeReseek
{code}
What is TestLookAheadBeforeReseek supposed to show without the fix?
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671453#comment-16671453 ]

Hadoop QA commented on HBASE-21418:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 17m 44s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || branch-1.2 Compile Tests ||
| 0 | mvndep | 1m 40s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 51s | branch-1.2 passed |
| +1 | compile | 0m 42s | branch-1.2 passed with JDK v1.8.0_181 |
| +1 | compile | 0m 48s | branch-1.2 passed with JDK v1.7.0_191 |
| +1 | checkstyle | 1m 37s | branch-1.2 passed |
| +1 | shadedjars | 2m 13s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 38s | branch-1.2 passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 51s | branch-1.2 passed with JDK v1.7.0_191 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 29s | the patch passed |
| +1 | compile | 0m 44s | the patch passed with JDK v1.8.0_181 |
| +1 | javac | 0m 44s | the patch passed |
| +1 | compile | 0m 50s | the patch passed with JDK v1.7.0_191 |
| +1 | javac | 0m 50s | the patch passed |
| -1 | checkstyle | 1m 9s | hbase-server: The patch generated 13 new + 164 unchanged - 1 fixed = 177 total (was 165) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 2m 13s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 8m 15s | Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.5 2.7.4. |
| +1 | javadoc | 0m 36s | the patch passed with JDK v1.8.0_181 |
| +1 | javadoc | 0m 50s | the patch passed with JDK v1.7.0_191 |
|| || || || Other Tests ||
| +1 | unit | 1m 55s | hbase-client in the patch passed. |
| -1 | unit | 99m 50s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 153m 32s | |

|| Reaso
[jira] [Commented] (HBASE-21418) Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
[ https://issues.apache.org/jira/browse/HBASE-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671239#comment-16671239 ]

Jeongdae Kim commented on HBASE-21418:

[~lhofhansl] Could you take a look?