[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807637#comment-13807637 ] Vladimir Rodionov commented on HBASE-9769: -- Yes, it's 1 HFile, and all data was cached in the block cache.
> Improve performance of a Scanner with explicit column list when rows are small/medium size
> --
>
> Key: HBASE-9769
> URL: https://issues.apache.org/jira/browse/HBASE-9769
> Project: HBase
> Issue Type: Improvement
> Components: Scanners
> Affects Versions: 0.98.0, 0.94.12, 0.96.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Attachments: 9769-0.94-sample1.txt, 9769-0.94-sample2.txt, 9769-0.94-sample.txt, 9769-94.txt, 9769-94-v2.txt, 9769-trunk-v1.txt, 9769-trunk-v2.txt, 9769-trunk-v3.txt, 9769-trunk-v4.txt
>
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807632#comment-13807632 ] Chao Shi commented on HBASE-9769: - Hi folks, I'm running into a similar problem (HBASE-9811) and got some interesting test figures (in that ticket). Briefly: 1) I get a similar improvement when replacing SEEK_NEXT_COL with SKIP, and 2) performance drops sharply as the number of HFiles grows. So did your test case use only 1 HFile, [~vrodionov]?
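One plausible way to picture the second observation (a sketch, not a diagnosis): when a row is spread over several sorted files that are merged through a heap, stepping to the next cell (what SKIP amounts to) moves only the file scanner holding the smallest key, while a forward reseek has to reposition every file. The class below is a hypothetical toy model, not HBase's KeyValueHeap; it uses int keys and ignores lazy seeks, Bloom filters and the block cache, and only counts per-file seeks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical toy model, NOT HBase's KeyValueHeap: N sorted "files" merged via a heap.
// Ignores lazy seeks, Bloom filters and caching; it only counts per-file seeks.
class MergeHeapModel {
    static final class FileScanner {
        final int[] keys;
        int pos = 0;
        FileScanner(int[] keys) { this.keys = keys; }
        boolean hasNext() { return pos < keys.length; }
        int peek() { return keys[pos]; }
    }

    private final List<FileScanner> files = new ArrayList<>();
    private final PriorityQueue<FileScanner> heap =
            new PriorityQueue<>(Comparator.comparingInt(FileScanner::peek));
    int fileSeeks = 0;  // how many times a single file had to be repositioned

    MergeHeapModel(int[]... sortedFiles) {
        for (int[] f : sortedFiles) {
            FileScanner s = new FileScanner(f);
            files.add(s);
            if (s.hasNext()) heap.add(s);
        }
    }

    // Like a SKIP: only the scanner holding the smallest key moves forward.
    Integer next() {
        FileScanner top = heap.poll();
        if (top == null) return null;
        int key = top.peek();
        top.pos++;
        if (top.hasNext()) heap.add(top);  // re-insert with its new smallest key
        return key;
    }

    // Like a forward-only reseek: every file is repositioned, so work grows with N.
    void reseek(int key) {
        heap.clear();
        for (FileScanner s : files) {
            fileSeeks++;
            while (s.hasNext() && s.peek() < key) s.pos++;
            if (s.hasNext()) heap.add(s);
        }
    }

    public static void main(String[] args) {
        MergeHeapModel oneFile = new MergeHeapModel(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9});
        MergeHeapModel threeFiles = new MergeHeapModel(
                new int[] {1, 4, 7}, new int[] {2, 5, 8}, new int[] {3, 6, 9});
        oneFile.reseek(5);
        threeFiles.reseek(5);
        // Same data, same reseek: the per-file seek count scales with the number of files.
        System.out.println("1 file: " + oneFile.fileSeeks + " seek(s), 3 files: "
                + threeFiles.fileSeeks + " seek(s)");
    }
}
```

In this model the same reseek costs 1 per-file seek over one file and 3 over three files, while `next()` never seeks at all; real HBase adds per-file block index work on top of that.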
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804780#comment-13804780 ] Vladimir Rodionov commented on HBASE-9769: -- Yes, I can confirm that scan operations use only reseeks, and HBASE-5987 works in 0.94 onwards. Maybe there is not much point in this optimization for seekTo, since seekTo is used for the initial scan setup and we need to go through the block index anyway?
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804375#comment-13804375 ] Lars Hofhansl commented on HBASE-9769: -- Let's do a new issue. HBASE-5987 is for reseek only, as we know we only scan forward in that case. So it looks like HBASE-5987 is working as expected for reseeks.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803317#comment-13803317 ] stack commented on HBASE-9769: -- HBASE-5987 was forward-ported as HBASE-6032. I reopened it.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803319#comment-13803319 ] stack commented on HBASE-9769: -- Or, rather than reopening it, we should open a new issue.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803214#comment-13803214 ] Vladimir Rodionov commented on HBASE-9769: -- I think that HBASE-5987 does not work in 0.94 and trunk. This is why we are doing all these scanner hints and optimizations.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803209#comment-13803209 ] Vladimir Rodionov commented on HBASE-9769: -- It looks like HBASE-5987 and related JIRAs need to be reopened. Can somebody go through HFileScanner's hierarchy and confirm my findings?
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803206#comment-13803206 ] Vladimir Rodionov commented on HBASE-9769: -- Interesting. We check whether the sought key is inside the current block only in *AbstractScannerV2.reseekTo*. There are other public methods exposed by HFileScanner and implemented in AbstractScannerV2, seekTo and seekBefore, which do not check the current block and always go to the index.
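To make the seekTo/reseekTo asymmetry concrete, here is a toy model of a block-indexed sorted file. It is a sketch under stated assumptions, not HBase code: the class name is hypothetical, keys are ints instead of byte[], and positioning is simplified to "first key >= target, or the block's last key". seekTo always consults the block index; reseekTo is forward-only and stays in the current block while the target is strictly smaller than the next block's first key or the current block is the last one, which is the rule quoted from HBASE-5987 in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT HBase code: a sorted file split into blocks plus a
// block index (the first key of every block). Int keys stand in for byte[] keys.
class BlockSeekModel {
    private final List<int[]> blocks;  // each block holds sorted keys
    private final int[] firstKeys;     // block index: first key of each block
    private int blockIdx = -1;         // current block; -1 means "not seeked yet"
    private int pos = -1;              // position inside the current block
    int indexLookups = 0;              // how often the block index was consulted

    BlockSeekModel(List<int[]> blocks) {
        this.blocks = blocks;
        this.firstKeys = new int[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            firstKeys[i] = blocks.get(i)[0];
        }
    }

    // Like seekTo: always goes through the block index, even if the key is in the current block.
    void seekTo(int key) {
        indexLookups++;
        int b = Arrays.binarySearch(firstKeys, key);
        if (b < 0) {
            b = Math.max(0, -b - 2);  // last block whose first key is <= key
        }
        blockIdx = b;
        scanForward(key, 0);
    }

    // Like reseekTo: forward-only; stays in the current block when the target is strictly
    // smaller than the next block's first key, or when the current block is the last one.
    void reseekTo(int key) {
        if (blockIdx >= 0) {
            if (key <= current()) {
                return;  // target is not ahead of the current key: do nothing
            }
            boolean lastBlock = blockIdx == blocks.size() - 1;
            if (lastBlock || key < firstKeys[blockIdx + 1]) {
                scanForward(key, pos);
                return;
            }
        }
        seekTo(key);
    }

    int current() {
        return blocks.get(blockIdx)[pos];
    }

    // Linear scan inside the current block: stop at the first key >= target,
    // or at the block's last key.
    private void scanForward(int key, int from) {
        int[] blk = blocks.get(blockIdx);
        int i = from;
        while (i < blk.length - 1 && blk[i] < key) {
            i++;
        }
        pos = i;
    }

    public static void main(String[] args) {
        BlockSeekModel m = new BlockSeekModel(List.of(
                new int[] {1, 2, 3}, new int[] {10, 11, 12}, new int[] {20, 21, 22}));
        for (int k : new int[] {1, 2, 3, 11, 12}) {
            m.reseekTo(k);
        }
        System.out.println("index lookups after reseeks: " + m.indexLookups);
        m.seekTo(12);
        System.out.println("index lookups after seekTo: " + m.indexLookups);
    }
}
```

Five reseeks cost only two index lookups here (the first positioning and the crossing into the second block), while a seekTo to a key in the current block still costs one more.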
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803203#comment-13803203 ] stack commented on HBASE-9769: -- Do we see the benefit [~zjushch] talks of?
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803178#comment-13803178 ] Vladimir Rodionov commented on HBASE-9769: -- I just checked trunk AbstractScannerV2. The HBASE-5987 code is there:
{code}
@Override
public int reseekTo(byte[] key, int offset, int length) throws IOException {
  int compared;
  if (isSeeked()) {
    ByteBuffer bb = getKey();
    compared = reader.getComparator().compare(key, offset, length,
        bb.array(), bb.arrayOffset(), bb.limit());
    if (compared < 1) {
      // If the required key is less than or equal to current key, then
      // don't do anything.
      return compared;
    } else {
      if (this.nextIndexedKey != null &&
          (this.nextIndexedKey == HConstants.NO_NEXT_INDEXED_KEY ||
           reader.getComparator().compare(key, offset, length,
               nextIndexedKey, 0, nextIndexedKey.length) < 0)) {
        // The reader shall continue to scan the current data block instead of querying the
        // block index as long as it knows the target key is strictly smaller than
        // the next indexed key or the current data block is the last data block.
        return loadBlockAndSeekToKey(this.block, this.nextIndexedKey,
            false, key, offset, length, false);
      }
    }
  }
  // Don't rewind on a reseek operation, because reseek implies that we are
  // always going forward in the file.
  return seekTo(key, offset, length, false);
}
{code}
But it seems that *nextIndexedKey* is never initialized properly. I could not find the place where the first key of the next block is assigned to this variable.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803186#comment-13803186 ] Vladimir Rodionov commented on HBASE-9769: -- Never mind, I was wrong. The code is correct.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803184#comment-13803184 ] stack commented on HBASE-9769: -- Perhaps this was incorrectly forward-ported?
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803066#comment-13803066 ] stack commented on HBASE-9769: -- Isn't HBASE-5987 committed on 0.94? If so, I wonder why we do not see the benefit [~zjushch] talks of.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802634#comment-13802634 ] chunhui shen commented on HBASE-9769: - HBASE-5987 would greatly improve the performance for this case, I think; 'reseek' is optimized in HBASE-5987:
{noformat}
+// The reader shall continue to scan the current data block instead of querying the
+// block index as long as it knows the target key is strictly smaller than
+// the next indexed key or the current data block is the last data block.
{noformat}
In addition, the performance difference between an explicit column list and scanning wildcard columns is small in my test environment. I think that is the effect of HBASE-5987, since we applied it.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802127#comment-13802127 ] stack commented on HBASE-9769: -- Patch looks good, caveat the questions above. It was better having the filter in the regionserver package, as you originally had it: then its access could have been restricted to where it is used. I like what [~lhofhansl] says about "...if the column tracker code is not efficient we should fix that rather than circumventing it completely with a filter." The doc of this public static could better explain when a user would set the attribute:
+ /** Scan Hints */
+ static public final String HINT_NARROW_ROWS = "_hint_narrow_rows_";
Good stuff, Vladimir.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802059#comment-13802059 ] Vladimir Rodionov commented on HBASE-9769: -- *Only with a coprocessor would it be possible to exercise checkVersion and avoid the network IO.* This is exactly what I am interested in: optimizing scan operations in coprocessors (think Phoenix). I will try your patch when I have time this week.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801529#comment-13801529 ] Lars Hofhansl commented on HBASE-9769: -- We should test end-to-end, not a microbenchmark of StoreScanner. Note that you cannot exercise the seeking code in checkVersion without returning data to the client, in which case network IO will dominate. If you filter KVs out with a filter first, checkVersion is never called; if the filter returns INCLUDE, it will call checkVersion and incur a seek. Only with a coprocessor would it be possible to exercise checkVersion and avoid the network IO. Also note that in your filter case you would still get the SEEK_NEXT_ROW/SEEK_NEXT_COL in ScanWildcardColumnTracker.checkVersion for each column that you included. When you get a chance, could you check out the last patch on HBASE-9778? Maybe you could run it through your micro StoreScanner test; I'd be curious how it compares. Generally, if the column tracker code is not efficient, we should fix that rather than circumventing it completely with a filter.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801510#comment-13801510 ] Vladimir Rodionov commented on HBASE-9769: -- Performance-wise, I think this filter is going to be faster than ExplicitColumnTracker with a hint. To make it comparable to the ExplicitScanReplacementFilter, you will have to optimize the ExplicitColumnTracker's code.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801471#comment-13801471 ] Lars Hofhansl commented on HBASE-9769: -- Lastly (and sorry for being difficult), why is this faster than passing the small-row hint to ExplicitColumnTracker and replacing SEEK_NEXT_COL with SKIP? (This would be HBASE-9778, but with your explicit hint.) It seems the ExplicitColumnTracker then does close to the same work as ScanWildcardColumnTracker.
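The SKIP-versus-SEEK_NEXT_COL trade-off can be pictured with a toy model of one row scanned with an explicit column list. This is a sketch, not the real ExplicitColumnTracker: the class, the `check` method and the `narrowRowsHint` flag are hypothetical, maxVersions is assumed to be 1, and a "seek" here is just a jump to the next column's first cell, whereas in HBase a (re)seek is far more expensive than stepping to the adjacent cell.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT HBase's ExplicitColumnTracker: one row of cells sorted
// by column, newest version first, scanned with an explicit column list, maxVersions = 1.
class SkipVsSeekModel {
    enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

    // With the (hypothetical) narrow-rows hint, an unwanted cell is stepped over (SKIP)
    // instead of triggering a seek past the rest of its column.
    static MatchCode check(String col, int version, Set<String> wanted, boolean narrowRowsHint) {
        if (wanted.contains(col) && version == 0) {
            return MatchCode.INCLUDE;
        }
        return narrowRowsHint ? MatchCode.SKIP : MatchCode.SEEK_NEXT_COL;
    }

    // Returns {cellsVisited, seeks, cellsIncluded} for one row.
    static int[] scanRow(List<String> cols, int versions, Set<String> wanted, boolean hint) {
        List<String> cellCol = new ArrayList<>();
        List<Integer> cellVer = new ArrayList<>();
        for (String c : cols) {
            for (int v = 0; v < versions; v++) {
                cellCol.add(c);
                cellVer.add(v);
            }
        }
        int visited = 0, seeks = 0, included = 0, i = 0;
        while (i < cellCol.size()) {
            visited++;
            String col = cellCol.get(i);
            switch (check(col, cellVer.get(i), wanted, hint)) {
                case INCLUDE:
                    included++;
                    i++;
                    break;
                case SKIP:
                    i++;  // cheap: move to the adjacent cell
                    break;
                case SEEK_NEXT_COL:
                    seeks++;  // expensive in HBase; here just a jump to the next column
                    while (i < cellCol.size() && cellCol.get(i).equals(col)) {
                        i++;
                    }
                    break;
            }
        }
        return new int[] {visited, seeks, included};
    }

    public static void main(String[] args) {
        List<String> cols = List.of("a", "b", "c", "d", "e");
        Set<String> wanted = Set.of("a", "c");
        for (int versions : new int[] {1, 3}) {
            System.out.println(versions + " version(s): seek="
                    + Arrays.toString(scanRow(cols, versions, wanted, false))
                    + " skip=" + Arrays.toString(scanRow(cols, versions, wanted, true)));
        }
    }
}
```

With one version per column both strategies visit the same 5 cells, but only the seek variant issues seeks (3), which is where a narrow-row hint should pay off. With 3 versions per column the SKIP variant walks all 15 cells versus 7 for the seek variant, matching the "many versions of few columns" case where Lars expects this approach to be slower.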
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801214#comment-13801214 ] Lars Hofhansl commented on HBASE-9769: -- Some comments:
* shouldUseExplicitColumnFilter is misnamed. It has the side effect of adding the filter.
* I'm curious how much slower simply using a HashMap was compared with having your own bucket array.
* I think this would be cleaner if the Filter were a proper filter that can be serialized (i.e. via protobuf in trunk and readFields/write in 0.94).
(FYI: I am debating the same in HBASE-9272. The parallel scanner could just be a sample scanner to use, or it could be triggered automatically, but it is still 100% client-side in either case.)
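For the HashMap question, a minimal sketch of an explicit-column membership test built on java.util.HashSet. The class is hypothetical; the patch's own bucket-array structure is not shown in this thread. Because byte[] uses identity equals/hashCode, qualifiers are wrapped in java.nio.ByteBuffer, whose equals and hashCode depend only on the remaining bytes, so a qualifier stored as a slice of a larger buffer can be looked up without copying it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical explicit-column matcher backed by java.util.HashSet. byte[] has identity
// equals/hashCode, so qualifiers are wrapped in ByteBuffer, which compares remaining bytes.
class QualifierSet {
    private final Set<ByteBuffer> qualifiers = new HashSet<>();

    QualifierSet(byte[]... quals) {
        for (byte[] q : quals) {
            qualifiers.add(ByteBuffer.wrap(q.clone()));  // defensive copy: keys must not change
        }
    }

    // Tests a qualifier stored as a slice of a larger array (e.g. inside a cell buffer)
    // without copying it; only a small wrapper object is allocated per call.
    boolean contains(byte[] buf, int offset, int length) {
        return qualifiers.contains(ByteBuffer.wrap(buf, offset, length));
    }

    public static void main(String[] args) {
        QualifierSet set = new QualifierSet(
                "c1".getBytes(StandardCharsets.UTF_8), "c3".getBytes(StandardCharsets.UTF_8));
        byte[] cell = "rowc3val".getBytes(StandardCharsets.UTF_8);
        System.out.println(set.contains(cell, 3, 2));  // bytes 3..4 are "c3"
        System.out.println(set.contains(cell, 0, 3));  // bytes 0..2 are "row"
    }
}
```

The per-call ByteBuffer wrapper is an allocation that a hand-rolled bucket array can avoid; whether that difference is measurable for 5-column rows would need benchmarking.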
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799670#comment-13799670 ] Ted Yu commented on HBASE-9769: --- Since HStore is modified, this feature is not totally client side.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799666#comment-13799666 ] Vladimir Rodionov commented on HBASE-9769: -- I prefer keeping the existing version, of course. The reason is the new Scan hinting system. This is the first performance-oriented HINT for the Scan operation; others are coming.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799657#comment-13799657 ] Vladimir Rodionov commented on HBASE-9769: -- Sure, it will be slower for rows above 1-2K in size. I have not tested the maximum row size, but rows of 5 columns totaling 150 bytes are much faster with the filter. The filter is not client-side.
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799649#comment-13799649 ] Lars Hofhansl commented on HBASE-9769: -- I would prefer if we just included the Filter and documented its use. Generally, this approach will be slower with many columns *or* many versions of few columns. What do you think about that ([~vrodionov], [~stack], [~yuzhih...@gmail.com])?
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799607#comment-13799607 ] Ted Yu commented on HBASE-9769: --- +1 from me. Mind adding a release note?
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799235#comment-13799235 ] Vladimir Rodionov commented on HBASE-9769: -- This patch has nothing to do with the failed ZK test.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798874#comment-13798874 ] Hadoop QA commented on HBASE-9769: --

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12609099/9769-trunk-v4.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.zookeeper.lock.TestZKInterProcessReadWriteLock

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7587//console

This message is automatically generated.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798819#comment-13798819 ] Hadoop QA commented on HBASE-9769: --

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12609092/9769-trunk-v3.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7586//console

This message is automatically generated.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798748#comment-13798748 ] Ted Yu commented on HBASE-9769: ---
For ExplicitScanReplacementFilter:
{code}
+ * Copyright 2013 The Apache Software Foundation
{code}
The year is not needed.
{code}
+package org.apache.hadoop.hbase.regionserver;
{code}
I thought you were going to move this class to the filter package.
{code}
- private abstract static class SinkWriter {
+ static class SinkWriter {
{code}
Is the above change needed for this JIRA?
{code}
+ private static byte[][] CQ = new byte[][]{
{code}
Nit: since CQ holds qualifiers, consider naming the variable CQs.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798740#comment-13798740 ] Hadoop QA commented on HBASE-9769: --

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12609076/9769-trunk-v2.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7584//console

This message is automatically generated.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798643#comment-13798643 ] Ted Yu commented on HBASE-9769: ---
Minor comment:
{code}
+for (int i = 0; i < length; i++) {
+  h = 31 * h + buffer[off++];
{code}
Both i and off are incremented in each iteration. It looks like 'off++' can be replaced with 'offset+i'.
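A minimal sketch of the loop with Ted's suggestion applied, so only one counter advances. The class name RangeHash is hypothetical, and seeding h with 1 is an assumption (it matches java.util.Arrays.hashCode(byte[]), which makes the result easy to check); the seed used in the actual patch is not shown in the quoted diff.

```java
/** Hypothetical helper mirroring the quoted loop; name and seed are assumptions. */
final class RangeHash {
  private RangeHash() {}

  /**
   * 31-based polynomial hash over buffer[offset, offset + length).
   * The element index is derived from the loop counter (offset + i),
   * so no second cursor has to be advanced alongside i.
   */
  static int hash(byte[] buffer, int offset, int length) {
    int h = 1; // assumed seed, same as java.util.Arrays.hashCode(byte[])
    for (int i = 0; i < length; i++) {
      h = 31 * h + buffer[offset + i];
    }
    return h;
  }
}
```

Both forms compute the same value as long as off starts at offset. The offset + i form simply removes the mutable second cursor.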
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797680#comment-13797680 ] Lars Hofhansl commented on HBASE-9769: --
Did some profiling on why reseek() is so much slower than next(), even when reseek just has to seek to the next key. The reason is all the compares we're doing. For each reseek:
* 2 KV compares in KeyValueHeap.generalizedSeek to find the top scanner
* 2 key compares in HFileReaderV2.ScannerV2.reseekTo (one to check for reseek, one to check against the index key)
* 2 key compares in HFileReaderV2.ScannerV2.blockSeek to find the right key

After all that, we finally read the KV we found, whereas next() just reads the next KV from the current HFile block.
Nothing jumps out here as to how we could simplify this.
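To make the next() versus reseek() asymmetry concrete, here is a toy model, not HBase code: next() only advances a cursor, while reseek() binary-searches forward for the first key >= target and counts its comparisons. The binary search is an assumed stand-in for the index and block seeks Lars lists, and the model leaves out the KeyValueHeap compares entirely, so if anything it understates the gap.

```java
/**
 * Toy sorted-key scanner (not HBase's StoreFileScanner). next() advances the
 * cursor with zero comparisons; reseek() pays for a forward binary search
 * even when the target is the very next key.
 */
final class ToyScanner {
  private final String[] keys; // sorted ascending
  private int pos = 0;
  private long comparisons = 0;

  ToyScanner(String[] sortedKeys) {
    this.keys = sortedKeys.clone();
  }

  String current() {
    return pos < keys.length ? keys[pos] : null;
  }

  long comparisons() {
    return comparisons;
  }

  /** Move to the next key without comparing anything. */
  String next() {
    if (pos < keys.length) {
      pos++;
    }
    return current();
  }

  /** Move to the first key >= target, searching only at or after the current position. */
  String reseek(String target) {
    int lo = pos;
    int hi = keys.length; // search window is [lo, hi)
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      comparisons++;
      if (keys[mid].compareTo(target) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    pos = lo;
    return current();
  }
}
```

With eight keys "a".."h" and the cursor on "a", both next() and reseek("b") land on "b", but only reseek pays for comparisons (four in this model).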
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797574#comment-13797574 ] Vladimir Rodionov commented on HBASE-9769: --
1. It's not a client-side filter (it keeps only column qualifiers, no column families). I decided to put it into regionserver, but I can move it to hbase-client.
2. OK, shouldUse is better.
3. You're right, this check is not needed.

I will add tests, of course.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797559#comment-13797559 ] Ted Yu commented on HBASE-9769: ---
Please add annotations for audience and stability:
{code}
+public class ExplicitScanReplacementFilter extends FilterBase {
{code}
Should it be in the org.apache.hadoop.hbase.filter package?
{code}
+package org.apache.hadoop.hbase.regionserver;
{code}
{code}
+ private boolean doesUseExplicitColumnFilter(Scan scan) {
{code}
Name the method shouldUseExplicitColumnFilter()?
{code}
+ if (cols != null && (cols.size() > 1 || cols.first() != null)) {
{code}
Why is the cols.size() > 1 check needed?

Can you add a test for the new class?
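A toy sketch of the idea behind ExplicitScanReplacementFilter as Vladimir describes it: keep only the requested qualifiers (no column families) and answer SKIP for everything else instead of asking for a seek. ToyReturnCode and ToyQualifierFilter are hypothetical names and deliberately do not use HBase's Filter API. The shouldUseExplicitColumnFilter here is an assumed simplification of the check under review: because this set's comparator cannot hold null, a plain non-empty test replaces the size()/first() combination.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical stand-in for filter return codes; not HBase's Filter.ReturnCode. */
enum ToyReturnCode { INCLUDE, SKIP }

/** Toy qualifier-only filter: INCLUDE requested qualifiers, SKIP to the next cell otherwise. */
final class ToyQualifierFilter {
  private final NavigableSet<byte[]> qualifiers = newQualifierSet();

  ToyQualifierFilter(byte[]... wanted) {
    for (byte[] q : wanted) {
      qualifiers.add(q.clone()); // defensive copy; lookups compare by content
    }
  }

  /** Unsigned lexicographic byte[] order (the order HBase uses for qualifiers). */
  private static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** A content-ordered set; its comparator throws on null, so null never gets in. */
  static NavigableSet<byte[]> newQualifierSet() {
    return new TreeSet<byte[]>(ToyQualifierFilter::compareUnsigned);
  }

  ToyReturnCode filterQualifier(byte[] qualifier) {
    return qualifiers.contains(qualifier) ? ToyReturnCode.INCLUDE : ToyReturnCode.SKIP;
  }

  /** Hypothetical form of the check: any explicit qualifier present. */
  static boolean shouldUseExplicitColumnFilter(NavigableSet<byte[]> cols) {
    return cols != null && !cols.isEmpty();
  }
}
```

Answering SKIP rather than a column seek is the trade-off discussed in this issue: cheap when rows are narrow, potentially slower when rows have many columns or many versions.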
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797524#comment-13797524 ] Hadoop QA commented on HBASE-9769: --

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12608840/9769-trunk-v1.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7574//console

This message is automatically generated.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797234#comment-13797234 ] Ted Yu commented on HBASE-9769: --- This is what I meant by 'trunk': http://svn.apache.org/repos/asf/hbase/trunk
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797230#comment-13797230 ] Vladimir Rodionov commented on HBASE-9769: -- I thought I had created the patch for 0.94-trunk (I created it from branch 0.94).
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797224#comment-13797224 ] Ted Yu commented on HBASE-9769: --- QA only applies patches against HBase trunk. Can you attach a patch for trunk?
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797195#comment-13797195 ] Vladimir Rodionov commented on HBASE-9769: -- What does *The patch command could not apply the patch.* mean? I used git diff --no-prefix > patch.txt to create the patch on 94-trunk.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797167#comment-13797167 ] Hadoop QA commented on HBASE-9769: --

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12608773/9769-94-v2.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7570//console

This message is automatically generated.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796461#comment-13796461 ] Vladimir Rodionov commented on HBASE-9769: --
Some additional performance numbers on the patch:
Table: 1 CF + 5 CQ, value ~ 10-15 bytes. Rows = 50. All data in the block cache. Tested on *StoreScanner* directly.

||Scan||Default (rows/sec)||Patch (rows/sec)||
|Raw|1.28M|1.28M|
|1 CQ in Scan|0.7M|1.27M|
|2 CQ in Scan|0.5M|1.2M|
|3 CQ in Scan|0.4M|1.1M|
|4 CQ in Scan|0.32M|1.05M|
|5 CQ in Scan|0.33M|1M|
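The relative gain implied by those figures can be computed directly. This sketch only restates the throughput numbers from the comment (in millions of rows per second) and divides patch by default; it adds no new measurements. The class name SpeedupTable is hypothetical.

```java
/** Patch-over-default speedup from the throughput figures above (M rows/sec). */
final class SpeedupTable {
  // Index 0 = raw scan; index k = k explicit qualifiers in the Scan.
  static final double[] DEFAULT = {1.28, 0.7, 0.5, 0.4, 0.32, 0.33};
  static final double[] PATCH = {1.28, 1.27, 1.2, 1.1, 1.05, 1.0};

  static double[] speedups() {
    double[] s = new double[DEFAULT.length];
    for (int i = 0; i < s.length; i++) {
      s[i] = PATCH[i] / DEFAULT[i];
    }
    return s;
  }

  public static void main(String[] args) {
    double[] s = speedups();
    for (int i = 0; i < s.length; i++) {
      String label = i == 0 ? "raw" : i + " CQ";
      System.out.printf("%-5s %.2fx%n", label, s[i]);
    }
  }
}
```

By this arithmetic the raw scan is unchanged, and the gain grows from roughly 1.8x with one explicit qualifier to about 3x with four or five.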
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796436#comment-13796436 ] Lars Hofhansl commented on HBASE-9769: -- Nit: Use the Eclipse formatter from HBASE-5961. We use 2 spaces instead of a tab for indentation.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796435#comment-13796435 ] Lars Hofhansl commented on HBASE-9769: -- I created HBASE-9778 for my patch idea.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796434#comment-13796434 ] Lars Hofhansl commented on HBASE-9769: --
MAX_VERSIONS=1 (or a low number) can only be used to eliminate the NEXT_COL seek (as that is used to seek past versions of the same column). It does not indicate anything about the number of columns in a row, and hence we know nothing about whether SEEK_NEXT_ROW or a series of SKIPs is better. We need both, I think.
(MAX_VERSIONS is a hint in the sense that there can temporarily be more versions in the memstore and/or distributed over various HFiles; only after a major compaction will the number of versions actually be <= MAX_VERSIONS.)
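Lars's parenthetical has a practical consequence for a SKIP-only strategy: even with MAX_VERSIONS=1, a row may still hold several versions of a column before a major compaction, so a linear pass has to count versions per column rather than assume one cell per column. The toy below (ToyCell and SkipOnlySelector are hypothetical, not HBase classes) walks one row's cells, sorted by qualifier ascending and then timestamp descending, using only "include" or "skip to the next cell".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy cell: qualifier plus timestamp (not HBase's KeyValue). */
final class ToyCell {
  final String qualifier;
  final long ts;

  ToyCell(String qualifier, long ts) {
    this.qualifier = qualifier;
    this.ts = ts;
  }
}

final class SkipOnlySelector {
  /**
   * SKIP-only pass over one row (cells sorted by qualifier asc, timestamp desc).
   * Keeps at most maxVersions cells per wanted qualifier. Surplus versions are
   * still visited one by one, which is the cost a NEXT_COL seek would avoid.
   */
  static List<ToyCell> select(List<ToyCell> row, Set<String> wanted, int maxVersions) {
    List<ToyCell> out = new ArrayList<>();
    String currentQualifier = null;
    int versionsSeen = 0;
    for (ToyCell c : row) {
      if (!c.qualifier.equals(currentQualifier)) {
        currentQualifier = c.qualifier; // new column: reset the version counter
        versionsSeen = 0;
      }
      versionsSeen++;
      if (wanted.contains(c.qualifier) && versionsSeen <= maxVersions) {
        out.add(c);
      }
      // otherwise: SKIP, i.e. just move on to the next cell
    }
    return out;
  }
}
```

For a row a@3, a@2, b@5, c@9, c@8, c@7 with wanted columns {a, c} and maxVersions = 1, the pass returns a@3 and c@9 but still touches all six cells.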
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796433#comment-13796433 ] Vladimir Rodionov commented on HBASE-9769: -- Lars, our patches are independent. I think they need to be merged into one, or you had better create a new JIRA for the *do-not-seek-next-col* change.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796431#comment-13796431 ] Vladimir Rodionov commented on HBASE-9769: --
To activate this feature (hint):
{code}
Scan scan = ...
scan.setAttribute(Scan.SCAN_NARROW_ROWS, "true".getBytes());
{code}
OK. I think I will replace SCAN_NARROW_ROWS with HINT_NARROW_ROWS.
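How the server side interprets such a hint is not shown in this thread. A minimal sketch, assuming the attribute is a byte[] value that spells out a boolean: the attribute map stands in for the Scan's attributes, and both the key string and the parsing rule are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical reading of a boolean scan hint stored as a byte[] attribute. */
final class NarrowRowsHint {
  static final String ATTR = "SCAN_NARROW_ROWS"; // assumed key; the real constant value is not shown

  static boolean isEnabled(Map<String, byte[]> attributes) {
    byte[] value = attributes.get(ATTR);
    // Missing attribute means "no hint"; otherwise decode as UTF-8 and parse
    // case-insensitively, so only "true" turns the hint on.
    return value != null && Boolean.parseBoolean(new String(value, StandardCharsets.UTF_8));
  }
}
```

On the client side, Bytes.toBytes("true") encodes the value as UTF-8 regardless of the platform default charset, whereas "true".getBytes() uses the default charset (harmless for ASCII text in practice, but less explicit).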
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796430#comment-13796430 ] Vladimir Rodionov commented on HBASE-9769: -- It contains the MAX_VERSIONS = 1 check suggested by Lars (not sure whether it is really a hint?). Lars's version gives improvements as well, but it relies on MAX_VERSIONS as an implicit hint and is, I think, slower. I completely eliminated ExplicitColumnTracker from the code path. I think (again) that the more columns a scan has, the larger the performance difference will be.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796426#comment-13796426 ] Vladimir Rodionov commented on HBASE-9769: -- It contains the MAX_VERSIONS = 1 check suggested by Lars (not sure whether it is really a hint?).
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796417#comment-13796417 ] Lars Hofhansl commented on HBASE-9769: --
bq. Lars, HTable can have small number of versions and large number of column qualifiers or large values (say 100K).
That is true. Seeking to the next column is not a good idea, though, if we know there are not going to be many versions to skip. So the suggested patch here will not be slower than before, and it will improve performance in many cases. As the size of a KV approaches the HFile block size (64k by default), SKIP and SEEK_NEXT_COL should become equivalent in performance, because in both cases we'll need to find the KV in the next block. As I said, this does not eliminate the NEXT_ROW seeking. I fear the filter approach will lead to issues when filters are already configured on the scan. You'd have to convert this to a FilterList while keeping all the semantics and performance characteristics. I think it might be best to ship your Filter and document its use. I'll file a separate issue for my patch.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796411#comment-13796411 ] Lars Hofhansl commented on HBASE-9769: -- Note that Vladimir's small-row hint can still be used to eliminate the NEXT_ROW seek. Maybe, again, it is prudent to split this into two issues.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796408#comment-13796408 ] Lars Hofhansl commented on HBASE-9769: -- Interestingly, it depends on which column(s) are selected. Some numbers: 4M rows, 5 columns each, 1 column family, 10-byte values, VERSIONS=1. All times are in seconds.
Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|
With patch sample1:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|
Variation here was +/- 0.2s. So with this patch scanning is up to 2x faster than without it, and never slower. No special hint is needed beyond declaring VERSIONS correctly.
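This small sketch makes the "up to 2x faster, never slower" summary concrete. It divides each unpatched time by the corresponding patched time from the two tables above (speedup = old / new). The class name `ScanSpeedup` is illustrative only.

```java
// Illustrative helper (hypothetical name): speedups from the benchmark tables above.
final class ScanSpeedup {
    static final String[] COLUMNS = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    // Seconds to scan 4M rows (5 columns, 10-byte values, VERSIONS=1), copied from the tables.
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3};
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};

    // speedup = unpatched time / patched time; values above 1.0 mean the patch is faster.
    static double[] speedups() {
        double[] s = new double[COLUMNS.length];
        for (int i = 0; i < s.length; i++) {
            s[i] = WITHOUT_PATCH[i] / WITH_PATCH[i];
        }
        return s;
    }

    // Largest speedup across the measured column selections.
    static double maxSpeedup() {
        double max = 0;
        for (double x : speedups()) {
            max = Math.max(max, x);
        }
        return max;
    }
}
```

The ratios come out to roughly:

- 1.00x (Wildcard)
- 1.01x (Col 1)
- 1.61x (Col 2)
- 1.47x (Col 4)
- 1.73x (Col 5)
- 2.03x (Col 2+4)

Given the stated +/- 0.2s variation, the Col 1 difference is within noise.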
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796348#comment-13796348 ] Vladimir Rodionov commented on HBASE-9769: -- Lars, an HTable can have a small number of versions and a large number of column qualifiers, or large values (say 100K).
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796345#comment-13796345 ] Lars Hofhansl commented on HBASE-9769: -- Another idea I had was to make use of the column family's VERSIONS setting. If it is "small", use INCLUDE and SKIP in the ExplicitColumnTracker; otherwise use INCLUDE_AND_SEEK_NEXT_COL and SEEK_NEXT_COL. In my tests this yields a nice improvement, bringing ExplicitColumnTracker on par with ScanWildcardColumnTracker. For now I defined "small" as 10, but that needs more testing.
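Below is a minimal sketch of the VERSIONS-based idea, using simplified stand-ins:

- `ToyMatchCode` only mirrors the names of the match codes mentioned in the comment.
- `VersionAwareColumnPolicy` is a hypothetical class, not the actual ExplicitColumnTracker logic.
- The threshold of 10 comes from the comment; treating it as inclusive (`<=`) is an assumption.

```java
// Hypothetical stand-in named after the match codes in the comment (not HBase's own enum).
enum ToyMatchCode { INCLUDE, SKIP, INCLUDE_AND_SEEK_NEXT_COL, SEEK_NEXT_COL }

// Hypothetical policy: use cheap SKIPs when the family keeps few versions, seeks otherwise.
final class VersionAwareColumnPolicy {
    // "small" threshold from the comment; the inclusive comparison below is an assumption.
    static final int SMALL_VERSIONS = 10;

    private final boolean fewVersions;

    VersionAwareColumnPolicy(int familyMaxVersions) {
        this.fewVersions = familyMaxVersions <= SMALL_VERSIONS;
    }

    // The cell belongs to a requested column and is the last version we want from it.
    ToyMatchCode lastWantedVersion() {
        // Few versions: just step to the next cell; many: seek past the remaining versions.
        return fewVersions ? ToyMatchCode.INCLUDE : ToyMatchCode.INCLUDE_AND_SEEK_NEXT_COL;
    }

    // The cell belongs to a column that was not requested.
    ToyMatchCode unwantedColumn() {
        return fewVersions ? ToyMatchCode.SKIP : ToyMatchCode.SEEK_NEXT_COL;
    }
}
```

The design choice being illustrated: with few stored versions per column, stepping cell by cell is cheaper than a reseek. With many versions, a seek can jump over them.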
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796331#comment-13796331 ] Vladimir Rodionov commented on HBASE-9769: -- It's a server-side filter and is not meant to be exposed to the HBase client. The reason: it has only a list of qualifiers, not full columns. It is instantiated in StoreScanner. If the Scan already has a filter, a new FilterList is created with the MUST_PASS_ALL operator: ExplicitColumnsFilter goes first, then the existing filter.
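Here is a dependency-free sketch of the filter composition described above. It assumes a much simpler filter interface than HBase's `Filter`: a plain accept/reject per cell, whereas the real API returns `ReturnCode`s and has row-level hooks.

`ToyFilter`, `QualifierSetFilter` and `MustPassAll` are hypothetical names. `MustPassAll` models the AND semantics of a `FilterList` with `MUST_PASS_ALL`. It evaluates the column filter first and stops at the first rejection.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified per-cell filter (not HBase's Filter API).
interface ToyFilter {
    boolean accept(String qualifier, String value);
}

// Hypothetical analogue of ExplicitColumnsFilter: it only knows qualifiers, not families.
final class QualifierSetFilter implements ToyFilter {
    private final Set<String> qualifiers;

    QualifierSetFilter(Set<String> qualifiers) {
        this.qualifiers = new HashSet<>(qualifiers);
    }

    @Override
    public boolean accept(String qualifier, String value) {
        return qualifiers.contains(qualifier);
    }
}

// Models FilterList with MUST_PASS_ALL: filters run in order and the first rejection wins.
final class MustPassAll implements ToyFilter {
    private final List<ToyFilter> filters;

    MustPassAll(ToyFilter... filters) {
        this.filters = Arrays.asList(filters);
    }

    @Override
    public boolean accept(String qualifier, String value) {
        for (ToyFilter f : filters) {
            if (!f.accept(qualifier, value)) {
                return false; // short-circuit: later filters never see this cell
            }
        }
        return true;
    }

    // Column filter alone if the scan has no filter; otherwise column filter first, then the user's.
    static ToyFilter withColumnFilter(Set<String> qualifiers, ToyFilter userFilter) {
        ToyFilter columns = new QualifierSetFilter(qualifiers);
        return userFilter == null ? columns : new MustPassAll(columns, userFilter);
    }
}
```

Because evaluation short-circuits, the user's filter only sees cells that already passed the column filter. Whether that matches the semantics of the existing code path is the kind of question raised about scans that already carry filters.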
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796311#comment-13796311 ] chunhui shen commented on HBASE-9769: - Would it also be OK to move the logic of the above patch into the Scan class? That would mean adding the ExplicitColumnsFilter in Scan.java when the attribute "SCAN-SMALL-ROWS" is set. In addition, I'm worried about data correctness if the Scan already has complex filters.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796242#comment-13796242 ] Vladimir Rodionov commented on HBASE-9769: -- In 0.94.12 the difference is not as dramatic as in 0.94.6, but it still exists:
default: 500K rows per sec
filter-based: 1.2M rows per sec
It seems there is a performance regression in scan filters in 0.94.12: the code that gives me almost 1.5M rows per sec in 0.94.6 runs at only 1.2M rows per sec in 0.94.12.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795694#comment-13795694 ] Lars Hofhansl commented on HBASE-9769: -- Also try with 0.94.12. The specific issue you're seeing might be fixed there.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795559#comment-13795559 ] Vladimir Rodionov commented on HBASE-9769: -- Ted, is this a suggestion to change the attribute name? Row smallness is not easy to estimate automatically, which is why I suggest using an explicit hint in the Scan instance.
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795547#comment-13795547 ] Ted Yu commented on HBASE-9769: ---
bq. Generally reseeks are better if they can skip many KVs.
There is already a feature for small scans. If "small" in "SCAN-SMALL-ROWS" were replaced with "narrow" (or something similar), would that help clarify its purpose?
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795519#comment-13795519 ] Vladimir Rodionov commented on HBASE-9769: -- The main idea is to provide a new Scanner hint for the RS (via a new attribute), something like SCAN-SMALL-ROWS. In Store.getScanner we check for this attribute and, if it is present, call the StoreScanner constructor with null as the column set:
{code}
public KeyValueScanner getScanner(Scan scan, final NavigableSet<byte[]> targetCols)
    throws IOException {
  lock.readLock().lock();
  try {
    // Client-supplied hint: rows are narrow, so avoid ExplicitColumnTracker seeks.
    boolean smallRowsScan = scan.getAttribute("SCAN-SMALL-ROWS") != null;
    if (smallRowsScan) {
      Filter ecFilter = new ExplicitColumnsFilter(targetCols);
      // update filter in Scan with ecFilter
      // remove columnFamilyMap from Scan
    }
    // With the hint, pass null columns so StoreScanner uses wildcard column tracking.
    NavigableSet<byte[]> columns = smallRowsScan ? null : targetCols;
    KeyValueScanner scanner = null;
    if (getHRegion().getCoprocessorHost() != null) {
      scanner = getHRegion().getCoprocessorHost().preStoreScannerOpen(this, scan, columns);
    }
    if (scanner == null) {
      scanner = new StoreScanner(this, getScanInfo(), scan, columns);
    }
    return scanner;
  } finally {
    lock.readLock().unlock();
  }
}
{code}
If the attribute is absent, the default path is taken.
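This dependency-free sketch isolates the two parts of the `getScanner` proposal that carry the idea. It assumes qualifiers are compared in unsigned lexicographic byte order. HBase provides that order as `Bytes.BYTES_COMPARATOR`; it is reimplemented here so the sketch has no HBase dependency.

`NarrowRowsSketch` and its methods are hypothetical names:

- `includeQualifier` is the per-cell set lookup an ExplicitColumnsFilter-style filter would perform.
- `columnsForScanner` mirrors the `smallRowsScan ? null : targetCols` choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

final class NarrowRowsSketch {
    // Unsigned lexicographic byte[] order (the same ordering as HBase's Bytes.BYTES_COMPARATOR).
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    };

    // Build the requested-qualifier set; TreeSet needs the comparator because byte[] has no equals().
    static NavigableSet<byte[]> qualifiers(String... names) {
        NavigableSet<byte[]> set = new TreeSet<>(UNSIGNED_LEX);
        for (String name : names) {
            set.add(name.getBytes(StandardCharsets.UTF_8));
        }
        return set;
    }

    // Per-cell decision of the filter: a set lookup instead of tracker-driven seeks.
    static boolean includeQualifier(NavigableSet<byte[]> targetCols, byte[] qualifier) {
        return targetCols.contains(qualifier);
    }

    // Column set handed to the store scanner: null (wildcard tracking) when the hint is present.
    static NavigableSet<byte[]> columnsForScanner(boolean narrowRowsHint, NavigableSet<byte[]> targetCols) {
        return narrowRowsHint ? null : targetCols;
    }
}
```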
[jira] [Commented] (HBASE-9769) Improve performance of a Scanner with explicit column list when rows are small/medium size
[ https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795509#comment-13795509 ] Vladimir Rodionov commented on HBASE-9769: -- Lars, I am (finally) going to create the patch. The code currently runs inside my test bed, which differs slightly from any HBase release and from trunk.