[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

stack (JIRA) Wed, 24 Sep 2008 17:04:39 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634348#action_12634348
 ]


stack commented on HBASE-748:
-----------------------------

On the patch:

I see you've made BatchUpdate comparable already.  Scratch my suggestion above. 
 I like the fact that you can ask a BatchUpdate its size.  Should size include 
row and column sizes too to cover the case where they are really large?

bq. The region gets split after 10 rows... (From 'Jean-Daniel Cryans - 
24/Sep/08 03:59 PM - edited')

I'd been thinking that doing batch updates, we'd take out the 
splitsAndClosesLock so splits wouldn't happen.  Would this be easier if you 
moved the array of batch update handlings down into HRegion rather than run it 
in HRS?

Minor nitpick: Put declaration and this line -- '+    this.writeBuffer = new 
ArrayList<BatchUpdate>();' -- together and then add final keywork (not 
important... but if you are revising anyways).  Similarily for this line -- +   
 this.currentWriteBufferSize = 0;... why not declare and do initial assign in 
the one line rather than have it split with assignment down in constructor?

More nitpicks:

{code}
+      if (bu.getRow() == null)
+        throw new IllegalArgumentException("update has null row");
{code}

In hadoop coding, folks supply the parens as in:

{code}
+      if (bu.getRow() == null) {
+        throw new IllegalArgumentException("update has null row");
+     }
{code}

Just FYI.

Here's a style comment:  Rather than do this in fflushCommits:

{code}
+    if (!writeBuffer.isEmpty()) {
{code}

... instead do

{code}
+    if (writeBuffer.isEmpty()) {
+      return;
+   }
{code}

Then you don't have whole method body indented.  Gives you more room on a line. 
 Deals with the isEmpty immediately rather than let it last the length of the 
method body.

bq. 1- The way retries are handled client-side is ugly, cannot be reused for 
other operations (so HBASE-880 wouldn't fit in).

I don't follow the above J-D?  Are you saying that we can't do batching with 
880?

bq. 3- WRE and NSRE maybe should share a common super class.

Thats fine by me.

bq. 4- If any exception is thrown server-side, like values are too long or 
columns in the wrong format, the state of the whole transaction will be unknown 
to the user.

Could we iterate the BatchUpdate array first before committing to check its 
wholesomeness so if problem, user knows that nothing was committed?

bq. ...but it's not atomic if a row fails, for example, if a value was too long.

What you mean by the above?  That if we have none-WRE or none-NSRE-like 
exception, then batch is dropped?

Good stuff J-D.










> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this 
> issue is to have an enhanced version that will send many rows in a single RPC 
> to each region server. To do this, the client code will have to figure which 
> rows goes to which server, group them accordingly and then send them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-748) Add an efficient way to batch update many rows

Reply via email to