So we think this is critical to HBase?
Stack


On Dec 12, 2009, at 12:43 PM, Andrew Purtell <[email protected]> wrote:

All HBase committers should jump on that issue and +1. We should make that kind of statement for the record.




________________________________
From: stack (JIRA) <[email protected]>
To: [email protected]
Sent: Sat, December 12, 2009 12:39:18 PM
Subject: [jira] Resolved: (HBASE-1972) Failed split results in closed region and non-registration of daughters; fix the order in which things are run


[ https://issues.apache.org/jira/browse/HBASE-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1972.
--------------------------

   Resolution: Won't Fix

Marking as invalid; addressed by HDFS-630. Thanks for looking at this, Cosmin. Want to open an issue on getting 630 into 0.21? There will be pushback, I'd imagine, since it's not "critical", but it might make 0.21.1.

Failed split results in closed region and non-registration of daughters; fix the order in which things are run
--------------------------------------------------------------------------------------------------------------

               Key: HBASE-1972
               URL: https://issues.apache.org/jira/browse/HBASE-1972
           Project: Hadoop HBase
        Issue Type: Bug
          Reporter: stack
          Priority: Blocker
           Fix For: 0.21.0


As part of a split, we go to close the region. The close fails because the flush failed -- a DN was down and HDFS refuses to move past it -- so we jump up out of the close with an IOE. But the region has been closed, yet it's still in the .META. as online.
Here is where the hole is:
1. CompactSplitThread calls split.
2. This calls HRegion splitRegion.
3. splitRegion calls close(false).
4. Down at the end of the close, we get as far as the LOG.info("Closed " + this)..... but a DFSClient thread throws an exception because it can't allocate a block for the flush made as part of the close (ain't sure how... we should add more try/catch in here):
{code}
2009-11-12 00:47:17,865 [regionserver/208.76.44.142:60020.compactor] DEBUG org.apache.hadoop.hbase.regionserver.Store: Added hdfs://aa0-000-12.u.powerset.com:9002/hbase/TestTable/868626151/info/5071349140567656566, entries=46975, sequenceid=2350017, memsize=52.0m, filesize=46.5m to TestTable,,1257986664542
2009-11-12 00:47:17,866 [regionserver/208.76.44.142:60020.compactor] DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~52.0m for region TestTable,,1257986664542 in 7985ms, sequence id=2350017, compaction requested=false
2009-11-12 00:47:17,866 [regionserver/208.76.44.142:60020.compactor] DEBUG org.apache.hadoop.hbase.regionserver.Store: closed info
2009-11-12 00:47:17,866 [regionserver/208.76.44.142:60020.compactor] INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed TestTable,,1257986664542
2009-11-12 00:47:17,906 [Thread-315] INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
2009-11-12 00:47:17,906 [Thread-315] INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1351692500502810095_1391
2009-11-12 00:47:23,918 [Thread-315] INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
2009-11-12 00:47:23,918 [Thread-315] INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3310646336307339512_1391
2009-11-12 00:47:29,982 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
2009-11-12 00:47:29,982 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_3070440586900692765_1393
2009-11-12 00:47:35,997 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
2009-11-12 00:47:35,997 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-5656011219762164043_1393
2009-11-12 00:47:42,007 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
2009-11-12 00:47:42,007 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2359634393837722978_1393
2009-11-12 00:47:48,017 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
2009-11-12 00:47:48,017 [Thread-318] INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-1626727145091780831_1393
2009-11-12 00:47:54,022 [Thread-318] WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3100)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681)
2009-11-12 00:47:54,022 [Thread-318] WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/TestTable/868626151/splits/1211221550/info/5071349140567656566.868626151" - Aborting...
2009-11-12 00:47:54,029 [regionserver/208.76.44.142:60020.compactor] ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split failed for region TestTable,,1257986664542
java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.createBlockOutputStream(DFSClient.java:3160)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3080)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681)
{code}
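The missing try/catch can be sketched as follows. This is only an illustrative sketch, not the actual HBase code: FakeRegion, splitRegion, and reopen are hypothetical stand-ins. The point is the ordering fix named in the title: if close() throws mid-flush, the split should re-open the region so .META. never lists a closed region as online.

```java
import java.io.IOException;

// Sketch of guarding the close() inside a split (all names hypothetical).
public class SplitSketch {

    /** Stand-in for a region whose close() can fail during its flush. */
    static class FakeRegion {
        boolean online = true;
        final boolean flushFails;

        FakeRegion(boolean flushFails) { this.flushFails = flushFails; }

        void close() throws IOException {
            online = false;  // region goes offline before the flush completes
            if (flushFails) {
                // simulates the DFSClient "Bad connect ack" during the flush
                throw new IOException("Bad connect ack");
            }
        }

        void reopen() { online = true; }  // undo the partial close
    }

    /**
     * Guarded split: if close() throws, re-open the region and report the
     * split as failed, instead of leaving a closed region registered as
     * online with no daughters.
     */
    static boolean splitRegion(FakeRegion region) {
        try {
            region.close();
            // ...would now create and register the daughter regions...
            return true;
        } catch (IOException e) {
            region.reopen();  // roll back so the region stays serveable
            return false;
        }
    }

    public static void main(String[] args) {
        FakeRegion r = new FakeRegion(true);
        boolean split = splitRegion(r);
        // Failed split: region stays online, no half-registered daughters.
        System.out.println("split=" + split + " online=" + r.online);
    }
}
```

In the unguarded version the IOException propagates out of splitRegion with online already false, which is exactly the hole the log above shows.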
Marking this as blocker.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

