[jira] Commented: (HBASE-2353) HBASE-2283 removed bulk sync optimization for multi-row puts

ryan rawson (JIRA) Mon, 05 Apr 2010 11:52:50 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853488#action_12853488
 ]

ryan rawson commented on HBASE-2353:
------------------------------------

I think Todd was going to try implementing his algorithm above. Let's see how
that looks.

On Apr 5, 2010 11:22 AM, "Jean-Daniel Cryans (JIRA)" <[email protected]>
wrote:

[
https://issues.apache.org/jira/browse/HBASE-2353?page=com.atlassian.jira.plugin.system.issue.
..
Jean-Daniel Cryans updated HBASE-2353:
--------------------------------------

        Priority: Blocker  (was: Major)
   Fix Version/s: 0.20.4

Marking as blocker.

Testing trunk with PE seqWrite, it's now a bit more than 5x slower (what
took 55 secs now takes 320). Deferred log flushing would help here, but it
would still be slower than the bulk sync optimization we had. This is a huge
performance regression: even if users gain durability, some of them will
see MR jobs that took maybe an hour now take more than 5. For a variety of
reasons this is enough to make this issue a blocker.

I think shipping with configs is good, but it won't solve this problem.

This mini-batching solution sounds awesome, but I'm unsure how soon we can
get it.

Like Ryan was initially saying, bringing deferred log flush into 0.20 would be
an easy task since it's a few lines to fix. The issue then is to decide
whether we want to ship with this turned on or off by default (we already
had a vote on this issue for trunk in November, and we decided to enable it by
default for all tables). Also, if we turn this on, how big would the window
be (it is currently 1 second in trunk)?

I would like to point out that the MySQL binary log isn't flushed for every
edit by default. See http://dev.mysql.com/doc/refman/5.0/en/binary-log.html,
grep for "sync_binlog". We can't rely on HDFS to flush the HLog, so we do it
with the polling timeout. Also, we already force flush catalog edits, and
tables with deferred log flush disabled flush others' edits as well. We could
set a very small window, say 100ms, and everyone is free to change it for
their own tables.
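
To make the window idea concrete, here is a minimal sketch of how a deferred
flush with a polling timeout could behave. It is a toy model, not the HLog
code: the class DeferredSyncLog, its methods, and the explicit nowMillis
argument (used instead of a real timer thread so the behavior is easy to
follow) are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a write-ahead log with a deferred-flush window; not HBase's HLog. */
class DeferredSyncLog {
    private final long windowMillis;                 // flush window, e.g. 100 or 1000
    private final List<String> unsynced = new ArrayList<>();
    private long lastSyncMillis = 0;
    private int syncCount = 0;
    private int durableEdits = 0;

    DeferredSyncLog(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /**
     * Append an edit. forceSync models a catalog edit or a table with
     * deferred log flush disabled; it syncs everything pending, including
     * edits that other tables appended earlier.
     */
    void append(String edit, boolean forceSync, long nowMillis) {
        unsynced.add(edit);
        if (forceSync) {
            sync(nowMillis);
        }
    }

    /** Called by a periodic poller: sync only if edits are pending and the window has elapsed. */
    void poll(long nowMillis) {
        if (!unsynced.isEmpty() && nowMillis - lastSyncMillis >= windowMillis) {
            sync(nowMillis);
        }
    }

    private void sync(long nowMillis) {
        durableEdits += unsynced.size();
        unsynced.clear();
        lastSyncMillis = nowMillis;
        syncCount++;
    }

    int syncCount() { return syncCount; }
    int durableEdits() { return durableEdits; }
    int atRiskEdits() { return unsynced.size(); }   // edits that would be lost on a crash right now
}
```

In this model the window bounds how long an edit can sit in atRiskEdits()
before the poller syncs it, which is the trade-off being discussed: a smaller
window means less exposure but more syncs.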

> HBASE-2283 removed bulk sync optimization for multi-row puts
> ------------------------------------------------------------
>
>                 Key: HBASE-2353
>                 URL: https://issues.apache.org/jira/browse/HBASE-2353
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: HBASE-2353-deferred.txt
>
>
> Prior to HBASE-2283 we called flush/sync once per put(Put[]) call 
> (i.e. per batch of commits).  Now we do it for every row.  
> This makes bulk uploads slower if you are using the WAL.  Is there an acceptable 
> solution to achieve both safety and performance by bulk-sync'ing puts?  Or 
> would this not work in the face of atomic guarantees?
> Discuss!
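
The difference described in the issue can be sketched roughly as follows.
This is only an illustration of sync-per-row versus sync-per-batch, assuming
a toy log that counts calls; SyncCountingWal and both put methods are
hypothetical names and do not correspond to the actual region server code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting per-row and per-batch WAL syncs; not the HBase code path. */
class SyncCountingWal {
    private int appended = 0;
    private int syncs = 0;

    void append(String row) { appended++; }
    void sync() { syncs++; }

    int appended() { return appended; }
    int syncs() { return syncs; }

    /** One sync for every row in the batch: each row is durable as soon as it is written. */
    static void putSyncPerRow(SyncCountingWal wal, List<String> rows) {
        for (String row : rows) {
            wal.append(row);
            wal.sync();
        }
    }

    /** One sync per put(Put[])-style batch: no row is durable until the final sync returns. */
    static void putSyncPerBatch(SyncCountingWal wal, List<String> rows) {
        for (String row : rows) {
            wal.append(row);
        }
        if (!rows.isEmpty()) {
            wal.sync();
        }
    }

    /** Helper that builds a batch of n made-up row keys. */
    static List<String> rows(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add("row-" + i);
        }
        return out;
    }
}
```

For a batch of 1000 rows the first method issues 1000 syncs and the second
issues one, which is where the bulk upload cost comes from.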

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
