[ https://issues.apache.org/jira/browse/HBASE-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864614#comment-13864614 ]

Andrew Purtell commented on HBASE-9291:
---------------------------------------

bq. Any idea on the answer to my above question about tacking on the info to 
cache on the first Put: If this data is added to the first Put for each region 
server, is there any guarantee that one of the other regions isn't processed 
first (since these are sent in parallel from the client)?

Not to my knowledge. You can guarantee that a Put is the "first" one by
dispatching Puts to appropriate keys to get a once-per-RS op done first, then
proceeding with the rest of the work in another RPC. There's a race there, but
you mentioned you already handle the case where part of the keyspace has
relocated in between. Pushing this down into the client library won't prevent
the same kind of race, so you might as well handle it in your application,
since you may have special knowledge not available to a generalized API.
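A minimal sketch of the two-phase dispatch described above, written in plain Java so it runs without HBase. `SimplePut`, `locateServer`, `plan`, the sorted region map, and the `INDEX_METADATA` attribute name are all hypothetical stand-ins, not HBase client APIs. The idea shown is only the grouping: one "seed" Put per region server carries the large payload, and every other Put goes out in a later RPC.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

public class TwoPhaseDispatch {

    // Hypothetical stand-in for a Put: a row key plus its attributes (not the HBase class).
    record SimplePut(String row, Map<String, byte[]> attrs) {}

    // Hypothetical region lookup: the region whose start key is the greatest key <= row
    // owns the row. Assumes the map contains the empty start key "".
    static String locateServer(NavigableMap<String, String> regionStarts, String row) {
        return regionStarts.floorEntry(row).getValue();
    }

    // Phase 1 gets exactly one "seed" Put per server, carrying the payload attribute.
    // Phase 2 gets every other Put, without the payload.
    static List<List<SimplePut>> plan(NavigableMap<String, String> regionStarts,
                                      List<String> rows, String attrName, byte[] payload) {
        Set<String> seededServers = new HashSet<>();
        List<SimplePut> phase1 = new ArrayList<>();
        List<SimplePut> phase2 = new ArrayList<>();
        for (String row : rows) {
            String server = locateServer(regionStarts, row);
            if (seededServers.add(server)) {
                // First row seen for this server: it carries the payload.
                phase1.add(new SimplePut(row, Map.of(attrName, payload)));
            } else {
                phase2.add(new SimplePut(row, Map.of()));
            }
        }
        return List.of(phase1, phase2);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> regions = new TreeMap<>();
        regions.put("", "rs1");   // region ["", "m") hosted on rs1
        regions.put("m", "rs2");  // region ["m", end) hosted on rs2
        // "INDEX_METADATA" is a hypothetical attribute name.
        List<List<SimplePut>> phases =
            plan(regions, List.of("a", "b", "n", "z"), "INDEX_METADATA", new byte[] {1, 2, 3});
        // Send phases.get(0) and wait for it to complete, then send phases.get(1).
        System.out.println(phases.get(0).size() + " seed puts, "
            + phases.get(1).size() + " follow-up puts"); // prints "2 seed puts, 2 follow-up puts"
    }
}
```

Because the region map is only a snapshot, a row can move to another server between the two phases; as noted above, the application still has to handle that case.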

> Enable client to setAttribute that is sent once to each region server
> ---------------------------------------------------------------------
>
>                 Key: HBASE-9291
>                 URL: https://issues.apache.org/jira/browse/HBASE-9291
>             Project: HBase
>          Issue Type: New Feature
>          Components: IPC/RPC
>            Reporter: James Taylor
>
> Currently a Scan and Mutation allow the client to set its own attributes that 
> get passed through the RPC layer and are accessible from a coprocessor. This 
> is very handy, but breaks down if the amount of information is large, since 
> this information ends up being sent again and again to every region. Clients 
> can work around this with an endpoint "pre" and "post" coprocessor invocation 
> that:
> 1) sends the information and caches it on the region server in the "pre" 
> invocation
> 2) invokes the Scan or sends the batch of Mutations, and then
> 3) removes it in the "post" invocation.
> In this case, the client is forced to identify all region servers (ideally, 
> all region servers that will be involved in the Scan/Mutation), make extra 
> RPC calls, manage the caching of the information on the region server, 
> age out the information (in case the client dies before step (3), which clears
> the cached information), and deal with the possibility of a split
> occurring while this operation is in progress.
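The server-side bookkeeping that steps (1) through (3) push onto the client can be sketched as a small TTL cache. This is plain Java, not a coprocessor API: `ServerSideCache`, its methods, and the injected clock are hypothetical names chosen for illustration, and wiring it into an endpoint or region observer is left out.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical per-region-server cache that a "pre" endpoint call would populate.
public class ServerSideCache {
    private record Cached(byte[] data, long expiresAtMillis) {}

    private final Map<String, Cached> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so age-out can be tested deterministically

    public ServerSideCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Step (1), "pre": cache the payload under a client-chosen id.
    public void put(String id, byte[] data) {
        entries.put(id, new Cached(data, clock.getAsLong() + ttlMillis));
    }

    // Step (2): the Scan/Mutation handling looks the payload up by id.
    // Returns null if the id is unknown or the entry has aged out.
    public byte[] get(String id) {
        Cached c = entries.get(id);
        if (c == null) {
            return null;
        }
        if (c.expiresAtMillis() <= clock.getAsLong()) {
            entries.remove(id, c); // drop only this expired entry, not a newer replacement
            return null;
        }
        return c.data();
    }

    // Step (3), "post": explicit removal by a well-behaved client.
    public void remove(String id) {
        entries.remove(id);
    }

    // Age-out for clients that died before step (3); meant to be called periodically.
    // Returns the number of entries evicted.
    public int evictExpired() {
        long now = clock.getAsLong();
        int removed = 0;
        Iterator<Cached> it = entries.values().iterator();
        while (it.hasNext()) {
            if (it.next().expiresAtMillis() <= now) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        long[] now = {0};
        ServerSideCache cache = new ServerSideCache(60_000, () -> now[0]);
        cache.put("join-42", new byte[] {7});              // step (1)
        System.out.println(cache.get("join-42") != null);  // step (2): prints true
        now[0] = 61_000;                                    // client died; step (3) never ran
        System.out.println(cache.evictExpired());           // prints 1
    }
}
```

The TTL has to be comfortably longer than the slowest expected Scan or batch, otherwise a payload could be evicted while an operation still needs it; this is one of the tuning burdens the proposal would move out of client code.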
> Instead, it'd be much better if an attribute could be identified as a "region 
> server" attribute in OperationWithAttributes and the HBase RPC layer would 
> take care of doing the above.
> The use cases in Phoenix where the above is necessary include:
> 1) Hash joins, where the results of the smaller side of a join scan are 
> packaged up and sent to each region server, and
> 2) Secondary indexing, where metadata describing a) which column 
> family/column qualifier pairs and b) which parts of the row key contribute to 
> which indexes is sent to each region server that will process a batched put.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
