[ https://issues.apache.org/jira/browse/HBASE-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023352#comment-13023352 ]
Prakash Khemani commented on HBASE-3815:
----------------------------------------

Log snippets showing assignment-manager continuously choosing server-132 for region assignment even though it constantly fails. There ought to be a global exclude list in addition to a per region exclude list?

2011-04-17 07:14:06,312 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:14:06,314 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.openRegion(Unknown Source)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
2011-04-17 07:14:06,314 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. so generated a random one; hri=realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87., src=, dest=pumahbase156.snc5.facebook.com,60020,1302847439345; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers

2011-04-17 07:19:06,097 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:19:06,098 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.openRegion(Unknown Source)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
2011-04-17 07:19:06,098 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. so generated a random one; hri=realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac., src=, dest=pumahbase150.snc5.facebook.com,60020,1302847439118; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers

2011-04-17 07:19:08,018 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a.; plan=hri=realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a., src=pumahbase156.snc5.facebook.com,60020,1302847439345, dest=pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:19:08,018 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a. to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:19:08,020 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a. to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.openRegion(Unknown Source)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
2011-04-17 07:19:08,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a. so generated a random one; hri=realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a., src=, dest=pumahbase193.snc5.facebook.com,60020,1302847439839; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers

... and this continues till late in the night

2011-04-17 23:40:28,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1.; plan=hri=realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1., src=pumahbase133.snc5.facebook.com,60020,1302847439080, dest=pumahbase132.snc5.facebook.com,60020,1303107053536
2011-04-17 23:40:28,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1. to pumahbase132.snc5.facebook.com,60020,1303107053536
2011-04-17 23:40:28,562 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1. to serverName=pumahbase132.snc5.facebook.com,60020,1303107053536, load=(requests=7214, regions=1, usedHeap=243, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.openRegion(Unknown Source)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
2011-04-17 23:40:28,562 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1. so generated a random one; hri=realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1., src=, dest=pumahbase170.snc5.facebook.com,60020,1302847439039; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303107053536, load=(requests=7214, regions=1, usedHeap=243, maxHeap=31987)) available servers

> lb should ignore bad region servers
> -----------------------------------
>
>                 Key: HBASE-3815
>                 URL: https://issues.apache.org/jira/browse/HBASE-3815
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Prakash Khemani
>
> the loadbalancer should remember which region server is constantly having
> trouble opening regions and it should take that rs out of the equation ...
> otherwise the lb goes into an unproductive loop ...
> I don't have logs handy for this one.
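
For illustration only, here is a rough sketch of what the global exclude list suggested in the comment might look like. Nothing here is existing HBase code: the class GlobalExcludeList, its methods, the failure threshold and the cooldown are all hypothetical names and parameters chosen for the example. The idea is that every failed open counts against the server regardless of which region was being assigned, and a server that reaches the threshold is skipped by all assignments until a cooldown expires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of HBase): a cluster-wide list of servers that keep failing region opens. */
class GlobalExcludeList {
  /** Failure bookkeeping for one server. */
  private static final class Failures {
    int count;
    long lastFailureMs;
  }

  private final int maxFailures;   // assumed threshold before a server is excluded
  private final long cooldownMs;   // assumed time after which an excluded server is retried
  private final Map<String, Failures> failures = new HashMap<>();

  GlobalExcludeList(int maxFailures, long cooldownMs) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  /** Record a failed open (for example a ConnectException) against a server, whatever the region was. */
  synchronized void recordFailure(String server, long nowMs) {
    Failures f = failures.computeIfAbsent(server, k -> new Failures());
    f.count++;
    f.lastFailureMs = nowMs;
  }

  /** A successful open clears the server's failure history. */
  synchronized void recordSuccess(String server) {
    failures.remove(server);
  }

  /** True if the server reached the threshold and its cooldown has not yet elapsed. */
  synchronized boolean isExcluded(String server, long nowMs) {
    Failures f = failures.get(server);
    if (f == null || f.count < maxFailures) {
      return false;
    }
    if (nowMs - f.lastFailureMs >= cooldownMs) {
      failures.remove(server); // cooldown elapsed: give the server another chance
      return false;
    }
    return true;
  }

  /** Drop globally excluded servers from a candidate list before a destination is picked. */
  synchronized List<String> filter(List<String> candidates, long nowMs) {
    List<String> usable = new ArrayList<>();
    for (String server : candidates) {
      if (!isExcluded(server, nowMs)) {
        usable.add(server);
      }
    }
    return usable;
  }
}
```

In the logs above the failing server shows up with two different start codes (1303046136711 and 1303107053536), so a real implementation would have to decide whether to key on host and port or on the full server name; the sketch leaves that choice to the caller by accepting any string.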