[ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632727#action_12632727 ]
Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------
Here is how I plan to implement the "many rows to many regions" logic.
In HRS, add a new version of batchUpdate that takes an array of RowUpdate
(HBASE-880). This version will simply iterate over the array and call the
current batchUpdate for each element. A bit of logic will be added so that if a
WRE (WrongRegionException) gets thrown, we return the index of the last
inserted row.
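To make the server-side part concrete, here is a minimal sketch of what I have in mind. It is not the real HRS code: RegionSketch, its key-range check, and the use of plain String row keys in place of RowUpdate are all stand-ins for illustration. The convention of returning rows.length - 1 when everything was applied (and -1 when nothing was) is also just an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a region hosted by a region server.
class RegionSketch {
    // Hypothetical stand-in for HBase's WrongRegionException (WRE).
    static class WrongRegionException extends Exception {
        WrongRegionException(String row) {
            super("row not in region: " + row);
        }
    }

    final String startRow; // inclusive
    final String endRow;   // exclusive; "" means no upper bound
    final List<String> stored = new ArrayList<>();

    RegionSketch(String startRow, String endRow) {
        this.startRow = startRow;
        this.endRow = endRow;
    }

    // The "current" single-row batchUpdate: rejects rows outside the region.
    void batchUpdate(String row) throws WrongRegionException {
        boolean inRange = row.compareTo(startRow) >= 0
                && (endRow.isEmpty() || row.compareTo(endRow) < 0);
        if (!inRange) {
            throw new WrongRegionException(row);
        }
        stored.add(row);
    }

    // The new array version: iterates and delegates to the single-row call.
    // Returns the index of the last inserted row (assumed convention:
    // rows.length - 1 if all rows were applied, -1 if none were).
    int batchUpdate(String[] rows) {
        int i = 0;
        try {
            for (; i < rows.length; i++) {
                batchUpdate(rows[i]);
            }
        } catch (WrongRegionException e) {
            return i - 1; // row i was rejected, so i - 1 was the last one in
        }
        return rows.length - 1;
    }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch("a", "m");
        int last = region.batchUpdate(new String[] {"b", "c", "x", "d"});
        System.out.println(last + " " + region.stored); // prints "1 [b, c]"
    }
}
```

Note that the array version stops at the first rejected row, so "d" is not applied in the example even though it belongs to the region. The caller is expected to resume from the returned index.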
In HTable, when a flush is triggered, it calls a method that takes an
ArrayList of unsorted RowOperation (HBASE-880). The following pseudocode does
the rest:
{code}
sort the row operations (called ops)
create a temporary empty list of ops
retrieve the cached region of the first op and mark it as "current"
for i = 0; i < number of ops; i++
  current op is at index i of the array of ops
  add the op to the temporary list
  retrieve the cached region of the following op (if any)
  if current region not equals retrieved region or current op is the last one
    do the operation on region server of current region
    if a WRE is thrown
      retrieve the real region of the op at the index in WRE (becomes the retrieved region)
      reset i to the index of the returned row - 1 in WRE
    the retrieved region is now the current region
    clear the temporary list
{code}
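As a rough illustration of the loop above, here is a small self-contained Java simulation. Everything in it is hypothetical: the two TreeMaps stand in for the client's region cache and for .META., send() fakes the region server RPC by returning the index of the last inserted row instead of throwing a WRE, and row keys are plain Strings rather than RowOperation. It assumes the first region's start key is the empty string, so every row maps to some region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side simulation; none of these names are HBase APIs.
class MultiRegionFlushSketch {
    // Region start key -> region name. floorEntry(row) finds the owner.
    final TreeMap<String, String> cache = new TreeMap<>(); // may be stale
    final TreeMap<String, String> meta = new TreeMap<>();  // the truth
    final List<String> rpcLog = new ArrayList<>();
    int metaLookups = 0;

    String cachedRegion(String row) {
        return cache.floorEntry(row).getValue();
    }

    // Stand-in for a .META. query; also refreshes the cache entry.
    String relocate(String row) {
        metaLookups++;
        Map.Entry<String, String> e = meta.floorEntry(row);
        cache.put(e.getKey(), e.getValue());
        return e.getValue();
    }

    // Fake RPC: applies rows in order until one is not in the region.
    // Returns the index of the last inserted row within the batch.
    int send(String region, List<String> batch) {
        int applied = 0;
        for (String row : batch) {
            if (!meta.floorEntry(row).getValue().equals(region)) break;
            applied++;
        }
        rpcLog.add(region + ":" + batch.subList(0, applied));
        return applied - 1;
    }

    void flush(String[] ops) {
        if (ops.length == 0) return;
        Arrays.sort(ops); // sorts the caller's array in place
        List<String> pending = new ArrayList<>();
        int batchStart = 0; // index in ops of the first pending row
        String current = cachedRegion(ops[0]);
        for (int i = 0; i < ops.length; i++) {
            pending.add(ops[i]);
            boolean last = (i == ops.length - 1);
            String next = last ? null : cachedRegion(ops[i + 1]);
            if (last || !current.equals(next)) {
                int lastInserted = send(current, pending);
                if (lastInserted < pending.size() - 1) { // the "WRE" case
                    int failed = batchStart + lastInserted + 1;
                    next = relocate(ops[failed]);
                    i = failed - 1; // the loop's i++ resumes at the failed op
                }
                current = next;
                pending.clear();
                batchStart = i + 1;
            }
        }
    }

    public static void main(String[] args) {
        MultiRegionFlushSketch client = new MultiRegionFlushSketch();
        client.meta.put("", "R1");
        client.meta.put("m", "R2"); // R1 has split; the cache does not know
        client.cache.put("", "R1");
        client.flush(new String[] {"p", "b", "n", "c"});
        System.out.println(client.rpcLog); // prints "[R1:[b, c], R2:[n, p]]"
    }
}
```

In the example the stale cache makes the client send all four rows to R1, which is the "moving big chunks of rows back and forth" cost. Only one .META. lookup is needed to recover.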
The big trade-off in this algo is that I try to limit the number of queries to
.META. by using the cache, at the expense of moving potentially big chunks of
rows back and forth if the cache is stale. This impact could be diminished if we
fetched more .META. rows at each locateRegionInMeta using HBASE-887 instead of
using getClosestRowBefore (just a thought). That's what Bigtable does.
Any comments?
> Add an efficient way to batch update many rows
> ----------------------------------------------
>
> Key: HBASE-748
> URL: https://issues.apache.org/jira/browse/HBASE-748
> Project: Hadoop HBase
> Issue Type: New Feature
> Components: client
> Affects Versions: 0.1.3, 0.2.0
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.19.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this
> issue is to have an enhanced version that will send many rows in a single RPC
> to each region server. To do this, the client code will have to figure out which
> rows go to which server, group them accordingly and then send them.