[ https://issues.apache.org/jira/browse/HBASE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl closed HBASE-6240.
--------------------------------

> Race in HCM.getMaster stalls clients
> ------------------------------------
>
>                 Key: HBASE-6240
>                 URL: https://issues.apache.org/jira/browse/HBASE-6240
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.94.1
>
>         Attachments: HBASE-6240_1_0.94.patch, HBASE-6240.patch
>
>
> I found this issue while trying to run YCSB on 0.94; I don't think it exists
> on any other branch. I believe that it was introduced in HBASE-5058 "Allow
> HBaseAdmin to use an existing connection".
> The issue is that HCM.getMaster follows this recipe:
> # Check if the master is not null and is running (if so, return it)
> # Grab a lock on masterLock
> # Nullify this.master
> # Try to get a new master
> The issue happens at step 3: it should re-run step 1 first, since while you're
> waiting on the lock someone else could have already fixed it for you. What
> happens right now is that the threads are all able to set the master to null
> before others are able to get out of getMaster, and it's a complete mess.
> Figuring it out took me some time because it doesn't manifest itself right
> away; silent retries are done in the background.
> Basically the first clue was this:
> {noformat}
> Error doing get: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
> Tue Jun 19 23:40:46 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:47 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:48 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:49 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:51 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:53 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:57 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:41:01 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:41:09 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:41:25 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> {noformat}
> This was caused by the little dance up in HBaseAdmin where it deletes "stale"
> connections... which are not stale at all.
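
For readers following along, the re-check that the description asks for in getMaster can be sketched as below. This is only an illustration of the general double-checked pattern, not the attached patch and not the real HConnectionManager code: MasterStub, MasterHolder, connect() and the connects counter are hypothetical names made up for this example, and only masterLock and getMaster are borrowed from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the master proxy; not the real HBase interface. */
interface MasterStub {
    boolean isRunning();
}

/** Simplified, hypothetical holder that mimics the shape of HCM.getMaster. */
class MasterHolder {
    private final Object masterLock = new Object();
    private volatile MasterStub master;                  // shared by all client threads
    final AtomicInteger connects = new AtomicInteger();  // counts reconnects, for illustration only

    MasterStub getMaster() {
        // Step 1: fast path, no lock taken.
        MasterStub m = master;
        if (m != null && m.isRunning()) {
            return m;
        }
        // Step 2: serialize the threads that saw a missing or stopped master.
        synchronized (masterLock) {
            // Re-run step 1 under the lock: another thread may already have
            // reconnected while this one was waiting.
            m = master;
            if (m != null && m.isRunning()) {
                return m;
            }
            // Steps 3 and 4: only now drop the old reference and reconnect.
            master = null;
            m = connect();
            master = m;
            return m;
        }
    }

    /** Placeholder for the real master lookup (assumption: always succeeds). */
    private MasterStub connect() {
        connects.incrementAndGet();
        return () -> true;
    }
}
```

With the second check in place, threads that queue up on masterLock return the master installed by the first thread instead of nulling it again, so in this sketch a burst of concurrent callers should trigger a single connect().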