[ https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159481#comment-17159481 ]
Andrew Kyle Purtell edited comment on HBASE-24637 at 7/16/20, 8:51 PM:
-----------------------------------------------------------------------

I agree the difference in hint codes is not the regression per se; the reseeking is the regression. There is a serious and proportional cost spent in reseeking in branch-2 that is absent in branch-1 under identical test conditions and with the same store files in HDFS. The metrics for this are store_reseek and store_reseek_ms. It is suspicious that both the hint codes and the reseek metrics show such deviation in branch-2 as opposed to branch-1.


was (Author: apurtell):
I agree the difference in hint codes is not the regression per se, the reseeking is the regression. There is a serious and proportional cost spent in reseeking in branch-2 that is absent in branch-1 under identical test conditions and same store files in hdfs.

> Filter SKIP hinting regression
> ------------------------------
>
>                 Key: HBASE-24637
>                 URL: https://issues.apache.org/jira/browse/HBASE-24637
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters, Performance, Scanners
>    Affects Versions: 2.2.5
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf,
> W-7665966-Instrument-low-level-scan-details-branch-1.patch,
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch,
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate
> significantly better microbenchmarks in a number of cases, and usually shows
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call
> metrics that leverage the fact that it puts a reference to the current Call into a
> thread local and that all activity for a given RPC is processed by a single
> thread context.
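
The per-call metrics idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class PerCallMetrics and its methods begin, add, and end are hypothetical names, and the sketch keeps its own thread local instead of using the Call reference that RpcServer maintains. It relies on the same property the description names, that one thread handles all activity for a given RPC, so counters such as store_reseek can be updated from deep in the scan path without passing a context object around.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-call counters; NOT the RpcServer API from the attached patches. */
final class PerCallMetrics {
  // One counter map per handler thread. This is only safe because a single
  // thread processes all activity for a given RPC.
  private static final ThreadLocal<Map<String, Long>> CURRENT = new ThreadLocal<>();

  private PerCallMetrics() {}

  /** Called when a handler thread starts processing a call. */
  static void begin() {
    CURRENT.set(new HashMap<>());
  }

  /** Adds delta to a named counter; does nothing when no call is in progress. */
  static void add(String name, long delta) {
    Map<String, Long> counters = CURRENT.get();
    if (counters != null) {
      counters.merge(name, delta, Long::sum);
    }
  }

  /** Called when the call completes; returns the counters and clears the thread local. */
  static Map<String, Long> end() {
    Map<String, Long> counters = CURRENT.get();
    CURRENT.remove();
    return counters == null ? new HashMap<>() : counters;
  }
}
```

In this sketch an instrumented reseek path would call PerCallMetrics.add("store_reseek", 1) and PerCallMetrics.add("store_reseek_ms", elapsedMs), where elapsedMs is a locally measured duration, and the RPC handler would log or aggregate the map returned by end().
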
> I then instrumented ScanQueryMatcher (in branch-1) and its
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock,
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per
> row were created, snapshotted, dropped, and cloned from the snapshot. Both 1.6
> and 2.2 versions under test operated on identical data files in HDFS. For
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached.
> It appears a refactor to ScanQueryMatcher and friends has disabled the
> ability of filters to provide meaningful SKIP hints, which disables an
> optimization that avoids reseeking, leading to a serious and proportional
> regression in reseek activity and time spent in that code path. So for
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was
> almost identical, as measured by counts of the hint types returned, whether
> or not column or version trackers are called, and counts of store seeks or
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and
> results generally fell within this range, except for the filter-all case of
> course.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)