[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: parse_call_trace.pl

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: (was: parse_call_trace.pl)

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: (was: 
W-7665966-Instrument-low-level-scan-details-branch-1.patch)

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: (was: 
W-7665966-Instrument-low-level-scan-details-branch-2.2.patch)

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: W-7665966-Instrument-low-level-scan-details-branch-2.2.patch
W-7665966-Instrument-low-level-scan-details-branch-1.patch

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-25 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: parse_call_trace.pl

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-25 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: W-7665966-FAST_DIFF-FILTER_ALL.pdf

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-25 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: W-7665966-Instrument-low-level-scan-details-branch-1.patch

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-25 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Attachment: W-7665966-Instrument-low-level-scan-details-branch-2.2.patch

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24637) Filter SKIP hinting regression

2020-06-25 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24637:

Affects Version/s: 2.2.5

> Filter SKIP hinting regression
> --
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)