[ 
https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861925#comment-13861925
 ] 

Jean-Marc Spaggiari commented on HBASE-8912:
--------------------------------------------

After the first restart, 36 regions are stuck in transition :( But not any 
server crashed.

What I did:
- Restored default balancer to make sure as much regions as possible will move.
- Stop/start HBase
- Run balancer from shell.

Every thing is back up after a 2nd restart.

I get many errors like this one:
{code}
2014-01-03 16:03:03,958 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received FAILED_OPEN for region b75cb9067c3c4456d6198c9237c143b3 from server 
node4.domain.com,60020,1388782921790 but region was in  the state 
page,rf.idua.www\x1Fhttp\x1F-1\x1F/fr/brand/fr/audi_fleet_solutions/contact/contact_transport_personnes.html\x1Fnull,1379103792232.b75cb9067c3c4456d6198c9237c143b3.
 state=CLOSED, ts=1388782983373, server=node4.domain.com,60020,1388782921790 
and not in OFFLINE, PENDING_OPEN or OPENING
{code}

After investigations, I figured that snappy was missing on a server. I fixed 
that, restart: All seems to be fine. So I restored my customized balancer, 
restart, balanced.

Still some warning in the logs:
{code}
2014-01-03 16:21:52,864 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region db8e67acde26bf340da481d3c1b934cd from server 
node4.domain.com,60020,1388784051197 but region was in  the state 
page,moc.tenretnigruoboc.www\x1Fhttp\x1F-1\x1F/cobourg-and-the-web\x1Fnull,1379103844627.db8e67acde26bf340da481d3c1b934cd.
 state=OPEN, ts=1388784100392, server=node4.distparser.com,60020,1388784051197 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
{code}

But this time all the regions are assigned correctly.

I did that one more time (change balancer, stop, start, balance. Change 
balancer, stop, start, balance). I turned loglevel to warn.

{code}
2014-01-03 16:28:51,142 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region 17bee313797fc1ce982c0e31fdb6620c from server 
node8.domain.com,60020,1388784498327 but region was in  the state 
page,rf.ofniecnarf.www\x1Fhttp\x1F-1\x1F/vote/comment/27996/1/vote/zero_vote/c99b0992e5a9cd6bf3a4cfc91769ceeb\x1Fnull,1379104524006.17bee313797fc1ce982c0e31fdb6620c.
 state=OPEN, ts=1388784531048, server=node8.distparser.com,60020,1388784498327 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,135 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region 6dc6290df1855b319f60bf89faa3da41 from server 
node8.domain.com,60020,1388784498327 but region was in  the state 
page_crc,\x00\x00\x00\x00\xD7\xD9\x97\x8Bvideo.k-wreview.ca,1378042601904.6dc6290df1855b319f60bf89faa3da41.
 state=OPEN, ts=1388784531793, server=node8.distparser.com,60020,1388784498327 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,712 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region ec4f96b6cedd935aeba279b15d5337af from server 
node8.domain.com,60020,1388784498327 but region was in  the state 
work_proposed,\x98\xBF\xAF\x90\x00\x00\x00\x00http://feedproxy.google.com/~r/WheatWeeds/~3/Of24fZKcpco/the-eighth-day-of-christmas.html,1378975430143.ec4f96b6cedd935aeba279b15d5337af.
 state=OPEN, ts=1388784532540, server=node8.distparser.com,60020,1388784498327 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,747 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region 4f823b5de664556a89cbd86aa41cd0b0 from server 
node8.distparser.com,60020,1388784498327 but region was in  the state 
work_proposed,\x8D4K\xEA\x00\x00\x00\x00http://twitter.com/home?status=CartoonStock%3A++http%3A%2F%2Fwww%2Ecartoonstock%2Ecom%2Fdirectory%2Fc%2Fcream%5Ftea%5Fgifts%2Easp,1378681682935.4f823b5de664556a89cbd86aa41cd0b0.
 state=OPEN, ts=1388784532552, server=node8.distparser.com,60020,1388784498327 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:53,244 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region da0bd0a6b7187f731fb34d4ac14ca279 from server 
node8.domain.com,60020,1388784498327 but region was in  the state 
work_proposed,\xB2\xE6\xB6\xBB\x00\x00\x00\x00http://www.canpages.ca/page/QC/notre-dame-des-prairies/concept-beton-design/4550984.html,1378737981443.da0bd0a6b7187f731fb34d4ac14ca279.
 state=OPEN, ts=1388784533203, server=node8.distparser.com,60020,1388784498327 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
{code}

But everything finally got assigned without any restart required, any pretty 
quickly.

Logs from the last run:
{code}
2014-01-03 16:32:20,252 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":10969,"call":"balance(), rpc version=1, 
client version=29, 
methodsFingerPrint=1886733559","client":"192.168.23.7:54614","starttimems":1388784729247,"queuetimems":0,"class":"HMaster","responsesize":0,"method":"balance"}
2014-01-03 16:32:21,278 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region 043d45cada6185d86e743754957e579a from server 
node1.distparser.com,60020,1388784692832 but region was in  the state 
page,moc.yubreffotseb.www\x1Fhttp\x1F-1\x1F/camera-bags-cases-straps-camera-bags-cases-c-282_888_580.html\x1Fzenid=ji3nr2ps8rnbaa7joc0lv4qln2,1388782516646.043d45cada6185d86e743754957e579a.
 state=OPEN, ts=1388784735731, server=node1.distparser.com,60020,1388784692832 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:21,713 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region e097f829b6eafb70b30c254fd4af662c from server 
node1.distparser.com,60020,1388784692832 but region was in  the state 
page,ac.usneeuq.ssenisub\x1Fhttp\x1F-1\x1F/grad_studies/PHD/about_us/queens_leaders_forum/about_us/about_us/grad_studies/PHD/student_career_services/queens_leaders_forum/recruiting/news/recruiting/about_us/accreditations.php\x1Fnull,1383168138496.e097f829b6eafb70b30c254fd4af662c.
 state=OPEN, ts=1388784736528, server=node1.distparser.com,60020,1388784692832 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:26,862 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region c94705f4a7c23a0a05a01bfe9d7755bc from server 
node1.distparser.com,60020,1388784692832 but region was in  the state 
entry,christian_labelle,1377000858428.c94705f4a7c23a0a05a01bfe9d7755bc. 
state=OPEN, ts=1388784740724, server=node1.distparser.com,60020,1388784692832 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:34,516 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region a8d60db86bd03cfbfba0bae0bb3cb564 from server 
node1.distparser.com,60020,1388784692832 but region was in  the state 
work_proposed,W{8\xB2\x00\x00\x00\x00http://www.prairiesouth.ca/williamgrayson/calendar-mainmenu-26/day.listevents/2013/10/29/23.html,1383415634227.a8d60db86bd03cfbfba0bae0bb3cb564.
 state=OPEN, ts=1388784754264, server=node1.distparser.com,60020,1388784692832 
and not in expected OFFLINE, PENDING_OPEN or OPENING states
{code}

So overall, it's WAY more stable! I have not been able to get anything stuck or 
crashed with the 2 patchs applied. I will keep them ;) Big +1 from me. Thanks 
for fixing that. I think it might be easy for fix the last few remaining 
warnings...

> [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to 
> OFFLINE
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8912
>                 URL: https://issues.apache.org/jira/browse/HBASE-8912
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 0.94.16
>
>         Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, 
> HBASE-8912.patch, HBase-0.94 #1036 test - testRetrying [Jenkins].html, 
> log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
>
>
> AM throws this exception which subsequently causes the master to abort: 
> {code}
> java.lang.IllegalStateException: Unexpected state : 
> testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. 
> state=PENDING_OPEN, ts=1372891751912, 
> server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
>       at 
> org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
>       at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
>       at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
>       at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
>       at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
>       at 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
>       at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>       at java.lang.Thread.run(Thread.java:662)
> {code}
> This exception trace is from the failing test TestMetaReaderEditor which is 
> failing pretty frequently, but looking at the test code, I think this is not 
> a test-only issue, but affects the main code path. 
> https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to