[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.

2014-02-11 Thread Liyin Tang (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898138#comment-13898138 ]

Liyin Tang commented on HBASE-10502:


In addition, the API of HBASE-10502 seems more flexible to me: if a single scan 
request spans multiple region boundaries, the HBase client can always split it 
into multiple region-local scan requests and then submit them to HBASE-10502 
for parallel execution.
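A minimal sketch of that splitting step, under stated assumptions: row keys are modeled as Strings (real HBase row keys are byte[] compared as unsigned bytes, which matches String ordering only for ASCII keys), region boundaries are given as a sorted list of region start keys beginning with "", and KeyRange and RegionSplitter are hypothetical names, not HBase client classes. As in HBase, an empty stop key means the range has no upper bound.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical row range [start, stop); an empty stop means "no upper bound". */
final class KeyRange {
    final String start;
    final String stop;

    KeyRange(String start, String stop) {
        this.start = start;
        this.stop = stop;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + (stop.isEmpty() ? "END" : stop) + ")";
    }
}

final class RegionSplitter {
    /**
     * Splits one scan range into region-local ranges. regionStartKeys must be
     * sorted ascending and start with "" (the first region's start key).
     */
    static List<KeyRange> split(KeyRange scan, List<String> regionStartKeys) {
        List<KeyRange> out = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String regionStart = regionStartKeys.get(i);
            // A region ends where the next one starts; the last region is unbounded.
            String regionEnd = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            // Intersect [scan.start, scan.stop) with [regionStart, regionEnd).
            String lo = max(scan.start, regionStart);
            String hi = minStop(scan.stop, regionEnd);
            if (hi.isEmpty() || lo.compareTo(hi) < 0) {
                out.add(new KeyRange(lo, hi));
            }
        }
        return out;
    }

    private static String max(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Treats "" as +infinity, because an empty stop key means "no upper bound".
    private static String minStop(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }
}
```

For example, splitting the scan [h, r) over regions starting at "", "g" and "p" yields [[h, p), [p, r)]; each piece could then be submitted as its own region-local scan.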


> [89-fb] ParallelScanner: a client utility to perform multiple scan requests 
> in parallel.
> 
>
> Key: HBASE-10502
> URL: https://issues.apache.org/jira/browse/HBASE-10502
> Project: HBase
> Issue Type: New Feature
> Reporter: Liyin Tang
> Fix For: 0.89-fb
>
>
> ParallelScanner is a utility class for the HBase client to perform multiple 
> scan requests in parallel. For simplicity, it requires all the scan requests 
> to have the same caching size.
>  
> This class provides three basic functions: 
> * The initialize function initializes all the ResultScanners by calling 
> {@link HTable#getScanner(Scan)} in parallel, once for each scan request.
> * The next function calls the corresponding 
> {@link ResultScanner#next(int numRows)} for each scan request in parallel, and 
> then returns all the results together as a list. If the returned list is 
> empty, none of the scanners has data left, and the user can call 
> {@link #close()} afterwards.
> * The close function closes all the scanners and shuts down the thread pool.
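The three operations above can be sketched with a plain ExecutorService. This is not the HBASE-10502 patch: RowScanner stands in for ResultScanner, a Function stands in for HTable#getScanner(Scan), and ListRowScanner is an in-memory scanner used only for demonstration; all three are hypothetical. Error handling (for example, closing already-opened scanners when one open fails) is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical stand-in for ResultScanner: returns at most numRows rows, empty once exhausted. */
interface RowScanner<R> {
    List<R> next(int numRows) throws Exception;
    void close();
}

/** In-memory scanner over a fixed list, for demonstration only. */
final class ListRowScanner<R> implements RowScanner<R> {
    private final List<R> rows;
    private int pos = 0;

    ListRowScanner(List<R> rows) { this.rows = rows; }

    @Override
    public List<R> next(int numRows) {
        int end = Math.min(pos + numRows, rows.size());
        List<R> batch = new ArrayList<>(rows.subList(pos, end));
        pos = end;
        return batch;
    }

    @Override
    public void close() { }
}

/** Sketch of the initialize / next / close pattern; S is the scan-request type, R the row type. */
final class ParallelScannerSketch<S, R> {
    private final List<S> scans;
    private final Function<S, RowScanner<R>> opener; // plays the role of HTable#getScanner(Scan)
    private final int caching;                        // one caching size shared by all scans
    private final ExecutorService pool;
    private final List<RowScanner<R>> scanners = new ArrayList<>();

    ParallelScannerSketch(List<S> scans, Function<S, RowScanner<R>> opener, int caching, int threads) {
        this.scans = scans;
        this.opener = opener;
        this.caching = caching;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Opens one scanner per scan request, in parallel, keeping request order. */
    void initialize() throws InterruptedException, ExecutionException {
        List<Future<RowScanner<R>>> futures = new ArrayList<>();
        for (S scan : scans) {
            Callable<RowScanner<R>> open = () -> opener.apply(scan);
            futures.add(pool.submit(open));
        }
        for (Future<RowScanner<R>> f : futures) {
            scanners.add(f.get());
        }
    }

    /** Calls next(caching) on every scanner in parallel and concatenates the batches. */
    List<R> next() throws InterruptedException, ExecutionException {
        List<Future<List<R>>> futures = new ArrayList<>();
        for (RowScanner<R> s : scanners) {
            Callable<List<R>> fetch = () -> s.next(caching);
            futures.add(pool.submit(fetch));
        }
        List<R> all = new ArrayList<>();
        for (Future<List<R>> f : futures) {
            all.addAll(f.get());
        }
        return all; // an empty list means every scanner is exhausted
    }

    /** Closes every scanner, then shuts down the thread pool. */
    void close() {
        for (RowScanner<R> s : scanners) {
            s.close();
        }
        pool.shutdown();
    }
}
```

With two in-memory scanners over [a1, a2, a3] and [b1] and a caching size of 2, successive next() calls would return [a1, a2, b1], then [a3], then an empty list, at which point close() is appropriate. Because next() waits for every scanner, each round is as slow as the slowest scanner; that is the cost of the simple, shared-caching design described above.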



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)



2014-02-11 Thread Liyin Tang (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898132#comment-13898132 ]

Liyin Tang commented on HBASE-10502:


Actually, HBASE-9272 combined with HBASE-10502 is quite effective for optimizing 
join queries. Consider a query where Table A joins Table B on the row key or a 
row-key prefix: HBASE-9272 can issue the initial scan in parallel to retrieve all 
the join keys, and then, based on those join keys, multiple scan requests for 
Table B can be constructed and submitted in parallel through HBASE-10502.
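As an illustration of the second step, the sketch below turns join keys retrieved from Table A into row-prefix ranges for Table B, one per distinct key; these could then be converted into Scan objects and handed to a ParallelScanner-style utility. JoinScanPlanner, ByteRange and their methods are hypothetical names, and the conversion to actual Scan objects is left out. The exclusive stop row is computed by dropping trailing 0xFF bytes from the prefix and incrementing the last remaining byte, assuming unsigned lexicographic row-key ordering as in HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class JoinScanPlanner {
    /** Hypothetical [start, stop) row range; an empty stop means "no upper bound". */
    static final class ByteRange {
        final byte[] start;
        final byte[] stop;

        ByteRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /**
     * Exclusive stop row that bounds every row key starting with prefix:
     * drop trailing 0xFF bytes, then increment the last remaining byte.
     */
    static byte[] prefixStopRow(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return new byte[0]; // empty or all-0xFF prefix: scan to the end of the table
    }

    /** One prefix range per distinct join key, in first-seen order. */
    static List<ByteRange> rangesForJoinKeys(List<String> joinKeys) {
        List<ByteRange> ranges = new ArrayList<>();
        for (String key : new LinkedHashSet<>(joinKeys)) {
            byte[] prefix = key.getBytes(StandardCharsets.UTF_8);
            ranges.add(new ByteRange(prefix, prefixStopRow(prefix)));
        }
        return ranges;
    }
}
```

If one join key is a prefix of another (say k1 and k10), their ranges overlap and the same Table B rows would be fetched twice; merging such ranges before submitting them is left to the caller in this sketch.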




2014-02-11 Thread Liyin Tang (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898128#comment-13898128 ]

Liyin Tang commented on HBASE-10502:


From skimming through HBASE-9272, the semantics seem a little different. Here, 
the client wants to construct multiple scan requests, whereas HBASE-9272 
performs a single scan request in parallel.





2014-02-11 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898111#comment-13898111 ]

Lars Hofhansl commented on HBASE-10502:
---

See also HBASE-9272.
