[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898138#comment-13898138 ]

Liyin Tang commented on HBASE-10502:

In addition, the API of HBASE-10502 seems to be more flexible (to me). If a single scan request spans multiple region boundaries, the HBase client can always split it into multiple region-local scan requests and then submit them to HBASE-10502 for parallel execution.

> [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
>
>                 Key: HBASE-10502
>                 URL: https://issues.apache.org/jira/browse/HBASE-10502
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Liyin Tang
>             Fix For: 0.89-fb
>
> ParallelScanner is a utility class for the HBase client to perform multiple
> scan requests in parallel. For simplicity, it requires all the scan requests
> to have the same caching size.
>
> This class provides 3 very basic functionalities:
> * The initialize function initializes all the ResultScanners by calling
> {@link HTable#getScanner(Scan)} in parallel for each scan request.
> * The next function calls the corresponding {@link ResultScanner#next(int
> numRows)} for each scan request in parallel, and then returns all the results
> together as a list. If the result list is empty, there is no data left in
> any of the scanners, and the user can call {@link #close()} afterwards.
> * The close function closes all the scanners and shuts down the thread pool.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
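For readers following the thread, the three operations in the quoted description (initialize, next, close) could be sketched roughly as below. This is not the HBASE-10502 patch: `RowScanner`, `ScannerOpener`, `InMemoryScanner`, and `ParallelScannerSketch` are hypothetical names, and the two interfaces only stand in for `ResultScanner` and `HTable#getScanner(Scan)` so that the example compiles without HBase on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for ResultScanner: returns up to numRows rows, empty when exhausted.
interface RowScanner<R> {
    List<R> next(int numRows) throws Exception;
    void close();
}

// Hypothetical stand-in for HTable#getScanner(Scan): opens the scanner for one scan request.
interface ScannerOpener<R> {
    RowScanner<R> open() throws Exception;
}

// In-memory scanner used only to exercise the sketch.
final class InMemoryScanner<R> implements RowScanner<R> {
    private final List<R> rows;
    private int pos = 0;

    InMemoryScanner(List<R> rows) { this.rows = rows; }

    public List<R> next(int numRows) {
        int end = Math.min(pos + numRows, rows.size());
        List<R> batch = new ArrayList<>(rows.subList(pos, end));
        pos = end;
        return batch;
    }

    public void close() { }
}

final class ParallelScannerSketch<R> {
    private final ExecutorService pool;
    private final int caching; // one caching size shared by every scan request
    private final List<RowScanner<R>> scanners = new ArrayList<>();

    ParallelScannerSketch(int threads, int caching) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.caching = caching;
    }

    // Opens one scanner per request in parallel.
    // Note: scanners that did open are not cleaned up here if another open fails.
    void initialize(List<ScannerOpener<R>> requests)
            throws InterruptedException, ExecutionException {
        List<Callable<RowScanner<R>>> tasks = new ArrayList<>();
        for (ScannerOpener<R> request : requests) {
            tasks.add(request::open);
        }
        for (Future<RowScanner<R>> opened : pool.invokeAll(tasks)) {
            scanners.add(opened.get());
        }
    }

    // Fetches one batch from every scanner in parallel and returns all rows as one list.
    // An empty list means every scanner is exhausted.
    List<R> next() throws InterruptedException, ExecutionException {
        List<Callable<List<R>>> tasks = new ArrayList<>();
        for (RowScanner<R> scanner : scanners) {
            tasks.add(() -> scanner.next(caching));
        }
        List<R> merged = new ArrayList<>();
        for (Future<List<R>> batch : pool.invokeAll(tasks)) {
            merged.addAll(batch.get());
        }
        return merged;
    }

    // Closes every scanner and shuts down the thread pool.
    void close() {
        for (RowScanner<R> scanner : scanners) {
            scanner.close();
        }
        pool.shutdown();
    }
}
```

A caller would loop on `next()` until it returns an empty list and then call `close()`. Because `invokeAll` returns futures in task order, rows come back grouped by scan request in the order the requests were supplied; whether the real utility preserves that ordering is not stated in the issue.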
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898132#comment-13898132 ]

Liyin Tang commented on HBASE-10502:

Actually, HBASE-9272 + HBASE-10502 is quite effective for optimizing join queries. Assume a join query in which Table A joins Table B on the row key or some prefix of it. HBASE-9272 is then useful for issuing the initial scan in parallel to retrieve all the join keys; based on those join keys, multiple scan queries for Table B can be constructed and submitted in parallel through HBASE-10502.
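The second half of that join flow, turning join keys into one scan range per key, might look something like the sketch below. It assumes the join keys are row-key prefixes in Table B and only computes the [startRow, stopRow) byte ranges; `PrefixRange` and `JoinScanPlanner` are hypothetical names, and building the actual `Scan` objects and submitting them is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one scan range over Table B.
final class PrefixRange {
    final byte[] startRow; // inclusive
    final byte[] stopRow;  // exclusive; an empty array means "until the end of the table"

    PrefixRange(byte[] startRow, byte[] stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }
}

final class JoinScanPlanner {
    // Smallest row key that sorts after every key starting with the prefix:
    // drop trailing 0xFF bytes, then increment the last remaining byte.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int last = prefix.length - 1;
        while (last >= 0 && prefix[last] == (byte) 0xFF) {
            last--;
        }
        if (last < 0) {
            return new byte[0]; // prefix is empty or all 0xFF: no upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, last + 1);
        stop[last]++;
        return stop;
    }

    // One range per join key; each range would back one scan request on Table B.
    static List<PrefixRange> planScans(List<String> joinKeys) {
        List<PrefixRange> ranges = new ArrayList<>();
        for (String key : joinKeys) {
            byte[] start = key.getBytes(StandardCharsets.UTF_8);
            ranges.add(new PrefixRange(start, stopRowForPrefix(start)));
        }
        return ranges;
    }
}
```

Each resulting range corresponds to one independent scan request, which is the shape of input the ParallelScanner described in the issue is meant to take.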
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898128#comment-13898128 ]

Liyin Tang commented on HBASE-10502:

From skimming through HBASE-9272, the semantics seem to be a little different. In this case, the client actually wants to construct multiple scan requests, whereas HBASE-9272 performs a single scan request in parallel.
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898111#comment-13898111 ]

Lars Hofhansl commented on HBASE-10502:

See also HBASE-9272.