[ https://issues.apache.org/jira/browse/HBASE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010901#comment-13010901 ]
Jonathan Gray commented on HBASE-3654:
--------------------------------------

Still seemed to make sense to synchronize around other sections, but I guess it's okay. I felt like I had a reason to keep the other stuff synchronized but don't see it right now.

> Weird blocking between getOnlineRegion and createRegionLoad
> -----------------------------------------------------------
>
>                 Key: HBASE-3654
>                 URL: https://issues.apache.org/jira/browse/HBASE-3654
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Subbu M Iyer
>            Priority: Blocker
>             Fix For: 0.90.2
>
>         Attachments: ConcurrentHM, ConcurrentSKLM, CopyOnWrite,
>                      HBASE-3654-ConcurrentHashMap-RemoveGetSync.patch,
>                      HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL.patch,
>                      HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_COWAL1.patch,
>                      HBASE-3654_Weird_blocking_getOnlineRegions_and_createServerLoad_-_ConcurrentHM.patch,
>                      TestOnlineRegions.java, hashmap
>
>
> Saw this when debugging something else:
> {code}
> "regionserver60020" prio=10 tid=0x00007f538c1c0000 nid=0x4c7 runnable [0x00007f53931da000]
>    java.lang.Thread.State: RUNNABLE
>         at org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1380)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:916)
>         - locked <0x0000000672aa0a00> (a java.util.concurrent.ConcurrentSkipListMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:767)
>         - locked <0x0000000656f62710> (a java.util.HashMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:722)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
>         at java.lang.Thread.run(Thread.java:662)
> "IPC Reader 9 on port 60020" prio=10 tid=0x00007f538c1be000 nid=0x4c6 waiting for monitor entry [0x00007f53932db000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
>         - waiting to lock <0x0000000656f62710> (a java.util.HashMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
>         - locked <0x0000000656e60068> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...
> "IPC Reader 0 on port 60020" prio=10 tid=0x00007f538c08b000 nid=0x4bd waiting for monitor entry [0x00007f5393be4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
>         - waiting to lock <0x0000000656f62710> (a java.util.HashMap)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
>         - locked <0x0000000656e635c8> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {code}
> All the readers are blocked! I have the feeling something much better could be done.
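
For readers following the thread dump: the IPC readers are stuck because the report thread holds the monitor on the HashMap of online regions for the whole time it builds the server load. Below is a rough sketch of the general idea the ConcurrentHM attachments point at, namely backing the map with a ConcurrentHashMap so that lookups take no monitor. It is not the attached patch. The class OnlineRegionsSketch, its String-valued map, and every method in it are hypothetical stand-ins for illustration only, and the lambdas assume Java 8 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the region server's bookkeeping; not HBase code.
public class OnlineRegionsSketch {
    // Assumption: regions are keyed by an encoded name; values are plain strings here.
    private final Map<String, String> onlineRegions = new ConcurrentHashMap<>();

    public void addRegion(String encodedName, String region) {
        onlineRegions.put(encodedName, region);
    }

    // Lookup path (the role the IPC readers play in the dump): no monitor is taken.
    public String getFromOnlineRegions(String encodedName) {
        return onlineRegions.get(encodedName);
    }

    // Report path: iterates a weakly consistent view without locking the map.
    public int buildLoadReport(Runnable perRegionWork) {
        int count = 0;
        for (String ignored : onlineRegions.values()) {
            perRegionWork.run(); // stands in for slow per-region work
            count++;
        }
        return count;
    }

    // Returns true if a lookup succeeds while the report is still mid-iteration.
    public static boolean lookupCompletesDuringReport() throws InterruptedException {
        OnlineRegionsSketch server = new OnlineRegionsSketch();
        server.addRegion("r1", "region-1");

        CountDownLatch reportStarted = new CountDownLatch(1);
        CountDownLatch lookupDone = new CountDownLatch(1);

        Thread reporter = new Thread(() -> server.buildLoadReport(() -> {
            reportStarted.countDown();
            try {
                // Hold the iteration open until the lookup has finished (or 5 s pass).
                lookupDone.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        reporter.start();

        reportStarted.await();
        // With a HashMap guarded by synchronized blocks, this call would wait
        // for the reporter to release the monitor.
        String found = server.getFromOnlineRegions("r1");
        lookupDone.countDown();
        reporter.join();
        return "region-1".equals(found);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(lookupCompletesDuringReport()); // prints "true"
    }
}
```

The trade-off is that the report no longer sees an atomic snapshot: regions added or removed during the iteration may or may not be included. Whether that is acceptable for the load report is the kind of question the comment above about keeping other sections synchronized is getting at.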