[ 
https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384102#comment-14384102
 ] 

Michael Segel  commented on HBASE-12853:
----------------------------------------

Ok, 
So when a scanner object is passed from the client to the server, will the 
client ask the HMaster for all of the region(s) that satisfy the scan, or just 
for the first region? 

This would imply that when running an m/r job, the m/r program will ask the 
HMaster for the regions, create a split for each region in the list, and then 
each mapper task will initiate its own scan over a specific region? 

Ok... on one level for m/r that makes sense, because you wouldn't want 1000 
mappers trying to coordinate queries with the HMaster at the same time; 
it could become a bottleneck. 
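
As a rough illustration of the "one split per region" idea above, here is a minimal Python sketch. It is not HBase code: `splits_for_scan` and the list of region start keys are hypothetical, and the sketch assumes the job already holds a sorted list of region start keys (wherever that list comes from) with "" marking the start of the table.

```python
def splits_for_scan(region_start_keys, scan_start, scan_stop):
    """Return one (start, stop) sub-range per region overlapping the scan.

    region_start_keys: sorted start keys, the first being "" (table start).
    The scan covers the half-open range [scan_start, scan_stop).
    Hypothetical helper for illustration only, not an HBase API.
    """
    splits = []
    for i, region_start in enumerate(region_start_keys):
        # A region ends where the next one starts; the last one is unbounded.
        if i + 1 < len(region_start_keys):
            region_end = region_start_keys[i + 1]
        else:
            region_end = None
        if region_end is not None and region_end <= scan_start:
            continue  # region lies entirely before the scan range
        if region_start >= scan_stop:
            break  # this and all later regions lie after the scan range
        lo = max(region_start, scan_start)
        hi = scan_stop if region_end is None else min(region_end, scan_stop)
        splits.append((lo, hi))
    return splits


if __name__ == "__main__":
    # Four regions: ["", "g"), ["g", "n"), ["n", "t"), ["t", end of table)
    for split in splits_for_scan(["", "g", "n", "t"], "e", "p"):
        print(split)
```

Each tuple would correspond to one mapper's scan range, so the region list is fetched once up front rather than by every mapper.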

On the other side, if you're using HBase as a database outside of Map/Reduce, 
you'd want to have a query engine that would abstract the underlying workings 
of a scan from the client. 



> distributed write pattern to replace ad hoc 'salting'
> -----------------------------------------------------
>
>                 Key: HBASE-12853
>                 URL: https://issues.apache.org/jira/browse/HBASE-12853
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Michael Segel 
>            Priority: Minor
>
> In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is 
> that while 'salting' alleviated regional hot spotting, it increased the 
> complexity required to utilize the data. 
> Through the use of coprocessors, it should be possible to offer a method 
> that distributes the data across the cluster on write and then manages 
> reading the data, returning a sort-ordered result set and abstracting the 
> underlying process. 
> On table creation, a flag is set to indicate that this is a parallel table. 
> On insert into the table, if the flag is set to true, a prefix is added 
> to the key, e.g. <region server #>- or <region server #>||, where the region 
> server # is an integer between 1 and the number of region servers defined. 
> On read (scan), a separate scan is created for each region server defined, 
> adding the prefix. Since each scan will be in sort order, it's possible to 
> strip the prefix and return the lowest-value key from each of the subsets. 
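
The write and read paths described above can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not the proposed coprocessor: `ParallelTable`, `bucket_for`, and the choice of a CRC32 hash to pick the prefix are all hypothetical (the description does not say how the prefix number is chosen), and the "-" separator is one of the two forms mentioned.

```python
import heapq
import zlib


def bucket_for(row_key, num_buckets):
    """Pick a prefix between 1 and num_buckets (assumed: hash of the key)."""
    return zlib.crc32(row_key.encode("utf-8")) % num_buckets + 1


class ParallelTable:
    """Toy stand-in for a table created with the 'parallel' flag set."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.rows = {}  # salted key -> value

    def put(self, row_key, value):
        # On insert, add the "<bucket>-" prefix to the key.
        bucket = bucket_for(row_key, self.num_buckets)
        self.rows[f"{bucket}-{row_key}"] = value

    def _scan_bucket(self, bucket, start, stop):
        # One scan per prefix; keys within a prefix come back in sort order.
        prefix = f"{bucket}-"
        for salted in sorted(self.rows):
            if salted.startswith(prefix):
                row_key = salted[len(prefix):]  # strip the prefix
                if start <= row_key < stop:
                    yield row_key, self.rows[salted]

    def scan(self, start, stop):
        # Merge the per-prefix scans by repeatedly taking the lowest key.
        per_bucket = [
            self._scan_bucket(b, start, stop)
            for b in range(1, self.num_buckets + 1)
        ]
        return heapq.merge(*per_bucket, key=lambda kv: kv[0])


if __name__ == "__main__":
    table = ParallelTable(num_buckets=3)
    for k in ["row5", "row1", "row3", "row2", "row4"]:
        table.put(k, k.upper())
    for row_key, value in table.scan("row1", "row5"):
        print(row_key, value)
```

The caller sees a single sorted result set and never handles the prefix, which is the abstraction the description asks for; the cost is one scan per prefix for every read.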



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
