[ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574205#comment-14574205 ]

Samir Ahmic commented on HBASE-13337:
-------------------------------------

Thanks for the review, [~stack].
As far as I can see, we have two options for fixing this issue and handling the connection correctly:

1. Make the default connection non-final and, instead of creating a new connection object, recreate the existing connection in ServerManager#getRsAdmin():
{code}
-  private final ClusterConnection connection;
+  private ClusterConnection connection;
{code}

{code}
+      Configuration conf = master.getConfiguration();
+      this.connection = (ClusterConnection) ConnectionFactory.createConnection(conf);
{code}
This connection will be closed when the master shuts down.

2. Implement additional logic in ServerManager that creates a new connection when a RegionServer is restarted and closes/removes it when it becomes stale.
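
Option 1 can be illustrated with a small, self-contained model. This is a sketch, not the patch: `FakeConnection`, `ServerManagerSketch`, and `getRsAdminConnection()` are hypothetical stand-ins for `ClusterConnection`, `ServerManager`, and `ServerManager#getRsAdmin()`, and the `Supplier` plays the role of `ConnectionFactory.createConnection(conf)`. Closing the previous connection before replacing it is an assumption that the snippet above does not show.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for ClusterConnection; not the HBase API.
final class FakeConnection implements AutoCloseable {
  private boolean closed;

  boolean isClosed() { return closed; }

  @Override
  public void close() { closed = true; }
}

// Hypothetical model of option 1: the field is no longer final, so
// getRsAdminConnection() can swap in a fresh connection on every call.
final class ServerManagerSketch implements AutoCloseable {
  // Plays the role of ConnectionFactory.createConnection(conf).
  private final Supplier<FakeConnection> factory;
  // Was "private final ClusterConnection connection" before the change.
  private FakeConnection connection;

  ServerManagerSketch(Supplier<FakeConnection> factory) {
    this.factory = factory;
    this.connection = factory.get();
  }

  // Stand-in for getRsAdmin(): recreate the connection so nothing cached for a
  // restarted RegionServer is reused. Closing the old one is an assumption.
  synchronized FakeConnection getRsAdminConnection() {
    FakeConnection old = connection;
    connection = factory.get();
    old.close();
    return connection;
  }

  // Mirrors "this connection will be closed when the master shuts down".
  @Override
  public synchronized void close() {
    connection.close();
  }
}

final class Option1Demo {
  public static void main(String[] args) {
    try (ServerManagerSketch sm = new ServerManagerSketch(FakeConnection::new)) {
      FakeConnection first = sm.getRsAdminConnection();
      FakeConnection second = sm.getRsAdminConnection();
      // prints "true false": the earlier connection was closed when replaced
      System.out.println(first.isClosed() + " " + second.isClosed());
    }
  }
}
```

One trade-off this model makes visible: opening a fresh connection on every call is comparatively expensive, which is one argument for the more targeted option 2.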

I have tested the first option, and it fixes the issue. Which approach do we prefer?
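
For comparison, option 2 might look roughly like the sketch below. It assumes servers are identified by strings in the `host,port,startcode` form seen in the logs (e.g. `VM1,16040,1427362531818`), treats a new start code for the same host and port as a restart, and uses hypothetical `RsConnection`/`RsConnectionRegistry` classes rather than any real HBase type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-RegionServer connection; not the HBase API.
final class RsConnection implements AutoCloseable {
  final long startCode;
  private boolean closed;

  RsConnection(long startCode) { this.startCode = startCode; }

  boolean isClosed() { return closed; }

  @Override
  public void close() { closed = true; }
}

// Hypothetical model of option 2: one connection per "host,port", replaced when
// the same host and port come back with a new start code (a restart).
final class RsConnectionRegistry {
  private final Map<String, RsConnection> byHostPort = new HashMap<>();

  // Called when a RegionServer reports in, e.g. "VM1,16040,1427362531818".
  synchronized RsConnection onServerOnline(String serverName) {
    String key = hostPort(serverName);
    long startCode = startCode(serverName);
    RsConnection current = byHostPort.get(key);
    if (current != null && current.startCode == startCode) {
      return current; // same incarnation: reuse
    }
    if (current != null) {
      current.close(); // previous incarnation: its connection is stale
    }
    RsConnection fresh = new RsConnection(startCode);
    byHostPort.put(key, fresh);
    return fresh;
  }

  // Called when the master expires a server; only the matching incarnation is removed.
  synchronized void onServerExpired(String serverName) {
    String key = hostPort(serverName);
    RsConnection current = byHostPort.get(key);
    if (current != null && current.startCode == startCode(serverName)) {
      byHostPort.remove(key);
      current.close();
    }
  }

  synchronized int size() { return byHostPort.size(); }

  private static String hostPort(String serverName) {
    String[] parts = serverName.split(",");
    return parts[0] + "," + parts[1];
  }

  private static long startCode(String serverName) {
    return Long.parseLong(serverName.split(",")[2]);
  }
}

final class Option2Demo {
  public static void main(String[] args) {
    RsConnectionRegistry registry = new RsConnectionRegistry();
    // Illustrative start codes, not taken from the logs.
    RsConnection before = registry.onServerOnline("VM1,16040,1000");
    RsConnection after = registry.onServerOnline("VM1,16040,2000"); // restart
    // prints "true false 1"
    System.out.println(before.isClosed() + " " + after.isClosed() + " " + registry.size());
  }
}
```

Expiry only removes the entry when the start code matches, so a late expiry of the old incarnation cannot close the connection to the restarted server.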


> Table regions are not assigning back, after restarting all regionservers at 
> once.
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-13337
>                 URL: https://issues.apache.org/jira/browse/HBASE-13337
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.0.0
>            Reporter: Y. SREENIVASULU REDDY
>            Priority: Blocker
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13337-v2.patch, HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> Region | State | RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd | t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 | 113929
> caf59209ae65ea80fca6bdc6996a7d68 | t1,dddddddd,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691 | 113929
> db52a74988f71e5cf257bbabf31f26f3 | t1,44444444,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691 | 113920
> 43f3a65b9f9ff283f598c5450feab1f8 | t1,88888888,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 | 113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one RegionServer.
> 2. Create a table with pre-split regions (say, 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all RegionServer processes across the cluster at once, leaving the HMaster process running.
> 5. After the restart, the RegionServers successfully connect to the HMaster.
> *Bug:*
> However, no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=5 of 10
> 2015-03-26 15:05:36,250 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=6 of 10
> 2015-03-26 15:05:36,250 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=7 of 10
> 2015-03-26 15:05:36,250 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=8 of 10
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=9 of 10
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=10 of 10
> 2015-03-26 15:05:36,251 WARN  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Failed to open/close 8f62e819b356736053e06240f7f7c6fd on VM1,16040,1427362531818, set to FAILED_CLOSE
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=FAILED_CLOSE, ts=1427362536251, server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,251 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,55555555,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=FAILED_CLOSE
> {noformat}
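
The log above shows ten attempts within roughly 3 ms (15:05:36,248 to 15:05:36,251), each ending in `java.nio.channels.ClosedChannelException`, after which the region is set to FAILED_CLOSE. The simulation below assumes every retry goes through the same stale cached channel and contrasts that with refreshing the channel after a failure; `RsChannel`, `RetrySketch`, and `unassign()` are hypothetical and do not reproduce the actual `AssignmentManager` logic.

```java
import java.nio.channels.ClosedChannelException;

// Hypothetical model of a cached RPC channel to one RegionServer.
final class RsChannel {
  private final boolean open;

  RsChannel(boolean open) { this.open = open; }

  void sendCloseRegion() throws ClosedChannelException {
    if (!open) {
      throw new ClosedChannelException();
    }
  }
}

final class RetrySketch {
  static final int MAX_TRIES = 10; // matches "try=N of 10" in the log

  // Hypothetical retry loop. With refresh == false every attempt reuses the
  // same stale channel; with refresh == true a new channel replaces it after a failure.
  static String unassign(RsChannel initial, boolean refresh) {
    RsChannel channel = initial;
    for (int attempt = 1; attempt <= MAX_TRIES; attempt++) {
      try {
        channel.sendCloseRegion();
        return "CLOSED after try=" + attempt;
      } catch (ClosedChannelException e) {
        if (refresh) {
          channel = new RsChannel(true);
        }
      }
    }
    return "FAILED_CLOSE";
  }

  public static void main(String[] args) {
    System.out.println(unassign(new RsChannel(false), false)); // prints "FAILED_CLOSE"
    System.out.println(unassign(new RsChannel(false), true));  // prints "CLOSED after try=2"
  }
}
```

Under this assumption, retrying without replacing the channel cannot succeed no matter how many tries are allowed, which is consistent with both proposed options focusing on replacing the connection rather than tuning the retry count.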



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
