[ https://issues.apache.org/jira/browse/HBASE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287856#comment-14287856 ]

Hadoop QA commented on HBASE-10499:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12693910/10499-v6.txt
  against master branch at commit 833feefbf977a8208f8f71f5f3bd9b027d54961f.
  ATTACHMENT ID: 12693910

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests.

     {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):
        at org.apache.hadoop.hbase.io.encoding.TestDataBlockEncoders.testSeekingOnSample(TestDataBlockEncoders.java:205)
        at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testNotCachingDataBlocksDuringCompactionInternals(TestCacheOnWrite.java:458)
        at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testNotCachingDataBlocksDuringCompaction(TestCacheOnWrite.java:410)
        at org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL.testScannerSelection(TestScannerSelectionUsingTTL.java:118)
        at org.apache.hadoop.hbase.TestAcidGuarantees.testScanAtomicity(TestAcidGuarantees.java:354)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/12548//console

This message is automatically generated.

> In write heavy scenario one of the regions does not get flushed causing 
> RegionTooBusyException
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10499
>                 URL: https://issues.apache.org/jira/browse/HBASE-10499
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 2.0.0, 1.1.0
>
>         Attachments: 10499-v2.txt, 10499-v3.txt, 10499-v4.txt, 10499-v6.txt, 
> HBASE-10499.patch, HBASE-10499_v5.patch, compaction-queue.png, 
> hbase-root-master-ip-10-157-0-229.zip, 
> hbase-root-regionserver-ip-10-93-128-92.zip, master_4e39.log, 
> master_576f.log, rs_4e39.log, rs_576f.log, t1.dump, t2.dump, 
> workloada_0.98.dat
>
>
> I got this while testing the 0.98 RC, but I am not sure it is specific to this 
> version; it doesn't seem so to me.
> It is similar to HBASE-5312 and HBASE-5568.
> Using 10 threads, I do writes to 4 region servers using YCSB (a sketch of such 
> a write loop follows the client stack trace below). The table created has 200 
> regions.  In one of the runs with a 0.98 server and a 0.98 client I hit this 
> problem: the number of hlogs kept growing and the system requested flushes for 
> the corresponding regions.
> One by one every region was flushed except one, which remained unflushed.  The 
> ripple effect of this on the client side:
> {code}
> com.yahoo.ycsb.DBException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
>         at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:245)
>         at com.yahoo.ycsb.DBWrapper.cleanup(DBWrapper.java:73)
>         at com.yahoo.ycsb.ClientThread.run(Client.java:307)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 54 actions: RegionTooBusyException: 54 times,
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:187)
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:171)
>         at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:897)
>         at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:961)
>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1225)
>         at com.yahoo.ycsb.db.HBaseClient.cleanup(HBaseClient.java:232)
>         ... 2 more
> {code}
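> For reference, here is a minimal sketch of the kind of write-heavy client loop 
> YCSB drives in this run (the table name comes from the logs; the column family, 
> qualifier, value size, and row-key scheme are assumptions), using the 0.98-era 
> HTable API:
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class WriteHeavyRepro {
>   public static void main(String[] args) throws Exception {
>     final Configuration conf = HBaseConfiguration.create();
>     final byte[] family = Bytes.toBytes("family");   // assumed column family name
>     final byte[] qualifier = Bytes.toBytes("field0");
>     Thread[] writers = new Thread[10];               // 10 writer threads, as in the report
>     for (int t = 0; t < writers.length; t++) {
>       final int id = t;
>       writers[t] = new Thread(new Runnable() {
>         public void run() {
>           try {
>             HTable table = new HTable(conf, "usertable");
>             table.setAutoFlush(false);               // buffer puts client-side, as YCSB does
>             for (long i = 0; i < 1000000; i++) {
>               Put p = new Put(Bytes.toBytes("user" + id + "-" + i));
>               p.add(family, qualifier, new byte[1024]);  // 1 KB values, an assumption
>               table.put(p);
>             }
>             table.flushCommits();                    // the call that surfaces the exception above
>             table.close();
>           } catch (Exception e) {
>             e.printStackTrace();
>           }
>         }
>       });
>       writers[t].start();
>     }
>     for (Thread w : writers) {
>       w.join();
>     }
>   }
> }
> {code}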
> On one of the region servers:
> {code}
> 2014-02-11 08:45:58,714 INFO  [regionserver60020.logRoller] wal.FSHLog: Too 
> many hlogs: logs=38, maxlogs=32; forcing flush of 23 regions(s): 
> 97d8ae2f78910cc5ded5fbb1ddad8492, d396b8a1da05c871edcb68a15608fdf2, 
> 01a68742a1be3a9705d574ad68fec1d7, 1250381046301e7465b6cf398759378e, 
> 127c133f47d0419bd5ab66675aff76d4, 9f01c5d25ddc6675f750968873721253, 
> 29c055b5690839c2fa357cd8e871741e, ca4e33e3eb0d5f8314ff9a870fc43463, 
> acfc6ae756e193b58d956cb71ccf0aa3, 187ea304069bc2a3c825bc10a59c7e84, 
> 0ea411edc32d5c924d04bf126fa52d1e, e2f9331fc7208b1b230a24045f3c869e, 
> d9309ca864055eddf766a330352efc7a, 1a71bdf457288d449050141b5ff00c69, 
> 0ba9089db28e977f86a27f90bbab9717, fdbb3242d3b673bbe4790a47bc30576f, 
> bbadaa1f0e62d8a8650080b824187850, b1a5de30d8603bd5d9022e09c574501b, 
> cc6a9fabe44347ed65e7c325faa72030, 313b17dbff2497f5041b57fe13fa651e, 
> 6b788c498503ddd3e1433a4cd3fb4e39, 3d71274fe4f815882e9626e1cfa050d1, 
> acc43e4b42c1a041078774f4f20a3ff5
> ......................................................
> 2014-02-11 08:47:49,580 INFO  [regionserver60020.logRoller] wal.FSHLog: Too 
> many hlogs: logs=53, maxlogs=32; forcing flush of 2 regions(s): 
> fdbb3242d3b673bbe4790a47bc30576f, 6b788c498503ddd3e1433a4cd3fb4e39
> {code}
> {code}
> 2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] 
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting 
> flush for region 
> usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a 
> delay of 16689
> 2014-02-11 09:42:44,237 INFO  [regionserver60020.periodicFlusher] 
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting 
> flush for region 
> usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a 
> delay of 15868
> 2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] 
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting 
> flush for region 
> usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a 
> delay of 20847
> 2014-02-11 09:42:54,238 INFO  [regionserver60020.periodicFlusher] 
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting 
> flush for region 
> usertable,user6264,1392107806983.6b788c498503ddd3e1433a4cd3fb4e39. after a 
> delay of 20099
> 2014-02-11 09:43:04,238 INFO  [regionserver60020.periodicFlusher] 
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting 
> flush for region 
> usertable,user3654,1392107806977.fdbb3242d3b673bbe4790a47bc30576f. after a 
> delay of 8677
> {code}
> {code}
> 2014-02-11 10:31:21,020 INFO  [regionserver60020.logRoller] wal.FSHLog: Too 
> many hlogs: logs=54, maxlogs=32; forcing flush of 1 regions(s): 
> fdbb3242d3b673bbe4790a47bc30576f
> {code}
> I restarted another RS and other regions moved around, but this region stayed 
> with the RS that has this issue.  One important observation is that we need to 
> add a debug log in HRegion.internalFlushcache() here:
> {code}
>     // If nothing to flush, return and avoid logging start/stop flush.
>     if (this.memstoreSize.get() <= 0) {
>       return false;
>     }
> {code}
> Because we can see that the region is requested for a flush but the flush does 
> not happen, and no flush-related messages appear in the logs, I assume that for 
> some reason this memstoreSize has become 0.  The earlier bugs were also due to 
> a similar reason.
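> A minimal sketch of the debug log suggested above (LOG and memstoreSize are 
> existing HRegion fields; the message text and the use of the region's encoded 
> name are my assumptions):
> {code}
>     // If nothing to flush, return and avoid logging start/stop flush.
>     if (this.memstoreSize.get() <= 0) {
>       // Hypothetical debug line so a skipped flush still leaves a trace in the logs.
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("Skipping flush for region " + getRegionInfo().getEncodedName()
>             + " because memstoreSize=" + this.memstoreSize.get());
>       }
>       return false;
>     }
> {code}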



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
