[ https://issues.apache.org/jira/browse/HBASE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733357#comment-14733357 ]
Hudson commented on HBASE-14317:
--------------------------------

FAILURE: Integrated in HBase-1.2 #154 (See [https://builds.apache.org/job/HBase-1.2/154/])
HBASE-14317 Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL (stack: rev 990e3698a7ca7e95894150a2905ba4271eb371e9)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/DamagedWALException.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConcurrencyControl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALKey.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestFailedAppendAndSync.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiVersionConcurrencyControlBasic.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestWALLockup.java


> Stuck FSHLog: bad disk (HDFS-8960) and can't roll WAL
> -----------------------------------------------------
>
>                 Key: HBASE-14317
>                 URL: https://issues.apache.org/jira/browse/HBASE-14317
>             Project: HBase
>          Issue Type: Bug
>   Affects Versions: 1.2.0, 1.1.1
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: 14317.branch-1.txt, 14317.branch-1.txt,
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt,
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.branch-1.v2.txt,
> 14317.branch-1.v2.txt, 14317.branch-1.v2.txt, 14317.test.txt, 14317v10.txt,
> 14317v11.txt, 14317v12.txt, 14317v13.txt, 14317v14.txt, 14317v15.txt,
> 14317v5.branch-1.2.txt, 14317v5.txt, 14317v9.txt, HBASE-14317-v1.patch,
> HBASE-14317-v2.patch, HBASE-14317-v3.patch, HBASE-14317-v4.patch,
> HBASE-14317.patch, [Java] RS stuck on WAL sync to a dead DN -
> Pastebin.com.html, append-only-test.patch, raw.php, repro.txt, san_dump.txt,
> subset.of.rs.log, timeouts.branch-1.txt
>
>
> hbase-1.1.1 and hadoop-2.7.1
> We try to roll logs because can't append (See HDFS-8960) but we get stuck.
> See attached thread dump and associated log. What is interesting is that
> syncers are waiting to take syncs to run and at same time we want to flush so
> we are waiting on a safe point but there seems to be nothing in our ring
> buffer; did we go to roll log and not add safe point sync to clear out
> ringbuffer?
> Needs a bit of study. Try to reproduce.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)