[
https://issues.apache.org/jira/browse/HBASE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776855#action_12776855
]
stack commented on HBASE-1973:
------------------------------
Weird thing is that it seems like the create eventually succeeds.. after we've
given up on it if I look at the NN:
{code}
[st...@aa0-000-12 logs]$ grep
'/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617'
hadoop-stack-namenode-aa0-000-12.u.powerset.com.log
2009-11-12 05:09:38,618 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR*
NameNode.create: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250 at 208.76.44.141
2009-11-12 05:09:38,618 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.startFile:
src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617,
holder=DFSClient_276778250, clientMachine=208.76.44.141, createParent=true,
replication=3, overwrite=true, append=false
2009-11-12 05:09:38,619 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
is added to the file system
2009-11-12 05:09:38,619 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.startFile: add
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
to namespace for DFSClient_276778250
2009-11-12 05:09:38,620 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=stack,powerset,engineering ip=/208.76.44.141 cmd=create
src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
dst=null perm=stack:supergroup:rw-r--r--
2009-11-12 05:09:38,630 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.addBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:38,630 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.getAdditionalBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:38,631 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_1786802275426334496_1076 block is added to the in-memory file system
2009-11-12 05:09:38,631 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617.
blk_1786802275426334496_1076{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW],
ReplicaUnderConstruction[208.76.44.139:51010|RBW],
ReplicaUnderConstruction[208.76.44.140:51010|RBW]]}
2009-11-12 05:09:38,646 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.abandonBlock: blk_1786802275426334496_1076 of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:38,646 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.abandonBlock: blk_1786802275426334496_1076of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:38,646 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_1786802275426334496_1076 block is added to the file system
2009-11-12 05:09:44,655 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.addBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:44,655 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.getAdditionalBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:44,656 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_5838454320666652488_1076 block is added to the in-memory file system
2009-11-12 05:09:44,656 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617.
blk_5838454320666652488_1076{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW],
ReplicaUnderConstruction[208.76.44.140:51010|RBW],
ReplicaUnderConstruction[208.76.44.139:51010|RBW]]}
2009-11-12 05:09:44,662 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.abandonBlock: blk_5838454320666652488_1076 of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:44,662 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.abandonBlock: blk_5838454320666652488_1076of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:44,662 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_5838454320666652488_1076 block is added to the file system
2009-11-12 05:09:50,675 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.addBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:50,675 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.getAdditionalBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:50,676 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_8615264034947520229_1076 block is added to the in-memory file system
2009-11-12 05:09:50,676 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617.
blk_8615264034947520229_1076{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW],
ReplicaUnderConstruction[208.76.44.139:51010|RBW],
ReplicaUnderConstruction[208.76.44.140:51010|RBW]]}
2009-11-12 05:09:50,683 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.abandonBlock: blk_8615264034947520229_1076 of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:50,683 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.abandonBlock: blk_8615264034947520229_1076of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:50,683 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_8615264034947520229_1076 block is added to the file system
2009-11-12 05:09:56,685 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.addBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:56,685 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.getAdditionalBlock: file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
for DFSClient_276778250
2009-11-12 05:09:56,686 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_-5632723632070749366_1077 block is added to the in-memory file system
2009-11-12 05:09:56,686 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617.
blk_-5632723632070749366_1077{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[208.76.44.141:51010|RBW],
ReplicaUnderConstruction[208.76.44.140:51010|RBW],
ReplicaUnderConstruction[208.76.44.139:51010|RBW]]}
2009-11-12 05:09:56,691 DEBUG org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameNode.abandonBlock: blk_-5632723632070749366_1077 of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:56,691 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.abandonBlock: blk_-5632723632070749366_1077of file
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:09:56,691 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.addFile:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
with blk_-5632723632070749366_1077 block is added to the file system
Its failing above opening the file.
Below is open by the master doing its split....
2009-11-12 05:10:10,924 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=stack,powerset,engineering ip=/208.76.44.139 cmd=open
src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
dst=null perm=null
2009-11-12 05:10:10,925 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR*
Namenode.delete:
src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617,
recursive=true
2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.delete:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.delete:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
FSDirectory.unprotectedDelete:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
is removed
2009-11-12 05:10:10,926 DEBUG
org.apache.hadoop.hdfs.server.namenode.LeaseManager: LeaseManager.findLease:
prefix=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
2009-11-12 05:10:10,926 DEBUG
org.apache.hadoop.hdfs.server.namenode.LeaseManager:
LeaseManager.removeLeaseWithPrefixPath:
entry=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617=[Lease.
Holder: DFSClient_276778250, pendingcreates: 1]
2009-11-12 05:10:10,926 DEBUG org.apache.hadoop.hdfs.StateChange: DIR*
Namesystem.delete:
/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
is removed
2009-11-12 05:10:10,927 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=stack,powerset,engineering ip=/208.76.44.139 cmd=delete
src=/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617
dst=null perm=null
{code}
The above open on master actually throws an EOFE. File length is seen as zero.
We don't do the available on the backing stream to get the length hdfs sees.
> Second RS goes down because "Bad connect ack with firstBadLink as..."
> ---------------------------------------------------------------------
>
> Key: HBASE-1973
> URL: https://issues.apache.org/jira/browse/HBASE-1973
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
>
> Placeholder for issue where a second RS goes down when I kill another RS and
> its DN. I asked over on hdfs-user why we're stuck on 140. Why can't it move
> on to another DN creating a block.
> {code}
> 2009-11-12 05:09:38,624 [regionserver/208.76.44.141:60020.logRoller] INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog: Roll
> /hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002405346,
> entries=52747, calcsize=63757973, filesize=58588977. New hlog
> /hbase/.logs/aa0-000-14.u.powerset.com,60020,12
> 58002404935/hlog.dat.1258002578617
> 2009-11-12 05:09:38,628 [regionserver/208.76.44.141:60020.logRoller] DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog: Found 0 hlogs to remove out
> of total 1; oldest outstanding seqnum is 1 from region .META.,,1
> 2009-11-12 05:09:38,628 [IPC Server handler 16 on 60020] INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog: edit=0,
> write=TestTable/TestTable,,1258002466308/612228
> 2009-11-12 05:09:38,630 [regionserver/208.76.44.141:60020.logSyncer] INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog: sync
> 2009-11-12 05:09:38,646 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream java.io.IOException: Bad connect ack
> with firstBadLink as 208.76.44.140:51010
> 2009-11-12 05:09:38,646 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_1786802275426334496_1076
> 2009-11-12 05:09:44,662 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream java.io.IOException: Bad connect ack
> with firstBadLink as 208.76.44.140:51010
> 2009-11-12 05:09:44,662 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_5838454320666652488_1076
> 2009-11-12 05:09:46,226 [pool-1-thread-1] DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=5.5656967MB (5836056), Free=672.77185MB (705452392), Max=678.3375MB
> (711288448), Counts: Blocks=0, Access=0, Hit=0, Miss=0, Evictions=0,
> Evicted=0, Ratios: Hit Ratio=NaN%, Miss Ratio=NaN%, Evicted
> /Run=NaN
> 2009-11-12 05:09:50,683 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream java.io.IOException: Bad connect ack
> with firstBadLink as 208.76.44.140:51010
> 2009-11-12 05:09:50,684 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_8615264034947520229_1076
> 2009-11-12 05:09:56,691 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream java.io.IOException: Bad connect ack
> with firstBadLink as 208.76.44.140:51010
> 2009-11-12 05:09:56,692 [Thread-61] INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_-5632723632070749366_1077
> 2009-11-12 05:10:02,696 [Thread-61] WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception: java.io.IOException: Unable to create new block.
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3100)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681)
> 2009-11-12 05:10:02,697 [Thread-61] WARN org.apache.hadoop.hdfs.DFSClient:
> Could not get block locations. Source file
> "/hbase/.logs/aa0-000-14.u.powerset.com,60020,1258002404935/hlog.dat.1258002578617"
> - Aborting...
> 2009-11-12 05:10:02,699 [regionserver/208.76.44.141:60020.logSyncer] FATAL
> org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting
> close of hlog
> java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.createBlockOutputStream(DFSClient.java:3160)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3080)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681)
> 2009-11-12 05:10:02,701 [regionserver/208.76.44.141:60020.logSyncer] ERROR
> org.apache.hadoop.hbase.regionserver.wal.HLog: Error while syncing,
> requesting close of hlog
> java.io.IOException: Bad connect ack with firstBadLink as 208.76.44.140:51010
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.createBlockOutputStream(DFSClient.java:3160)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSClient.java:3080)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2681)
> 2009-11-12 05:10:02,703 [IPC Server handler 20 on 60020] FATAL
> org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting
> close of hlog
> java.io.IOException: IOException flush:java.io.IOException: Bad connect ack
> with firstBadLink as 208.76.44.140:51010
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.hflush(DFSClient.java:3527)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3473)
> at
> org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:829)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:751)
> 2009-11-12 05:10:02,703 [regionserver/208.76.44.141:60020.logSyncer] INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> regionserver/208.76.44.141:60020.logSyncer exiting
> 2009-11-12 05:10:02,706 [IPC Server handler 10 on 60020] FATAL
> org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append. Requesting
> close of hlog
> java.io.IOException: IOException flush:java.io.IOException: Bad connect ack
> with firstBadLink as 208.76.44.140:51010
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.hflush(DFSClient.java:3527)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3473)
> at
> org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:829)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:751)
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.