[ https://issues.apache.org/jira/browse/HDFS-16829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rushabh Shah updated HDFS-16829:
--------------------------------
    Description: 
We encountered this data loss issue in one of our production clusters, which runs the hbase service. We received a missing block alert in this cluster, and the error below was logged on the datanode holding the block.

{noformat}
2022-10-27 18:37:51,341 ERROR [17546151_2244173222]] datanode.DataNode - nodeA:51010:DataXceiver error processing READ_BLOCK operation src: /nodeA:31722 dst: <destination>
java.io.IOException: Offset 64410559 and length 4096 don't match block BP-958889176-1567030695029:blk_3317546151_2244173222 ( blockLen 59158528 )
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:384)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:603)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:145)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:100)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
        at java.lang.Thread.run(Thread.java:750)
{noformat}

The node +nodeA+ has this block blk_3317546151_2244173222 with file length 59158528, but the length of this block according to the namenode is 64414655 (per fsck).
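(For reference, the namenode-side block length can be checked with, e.g., {{hdfs fsck <file> -files -blocks -locations}}.)

For context on the failure mode: the IOException above is thrown by the range check in the BlockSender constructor, which rejects any read whose requested range runs past the replica length the datanode actually has on disk. A minimal sketch of that check follows (names are simplified and illustrative, not the exact Hadoop source); here the client asked for offset 64410559 + length 4096 = 64414655, matching the namenode's recorded length, while nodeA only has 59158528 bytes.

{code:java}
// Simplified sketch of BlockSender's range validation (illustrative names,
// not the exact Hadoop source). visibleLength is the replica length the
// datanode has on disk: 59158528 in this incident.
long requestedEnd = startOffset + length;  // 64410559 + 4096 = 64414655
if (startOffset < 0 || length < 0 || requestedEnd > visibleLength) {
  // This is the exception seen in the stack trace above.
  throw new IOException("Offset " + startOffset + " and length " + length
      + " don't match block " + block + " ( blockLen " + visibleLength + " )");
}
{code}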
This is the sequence of events for this block:

1. The namenode created a file with 3 replicas, with block id blk_3317546151 and genstamp 2244173147.
2. The first datanode in the pipeline was restarting at the same time. (This physical host was also running the region server process, which was the hdfs client.) Unfortunately this node was sick, and it logged nothing in either the datanode process or the regionserver process during the time of block creation.
3. The namenode updated the pipeline with just the first node.
4. The namenode logged updatePipeline success with just the 1st node, nodeA, with block size 64414655 and new generation stamp 2244173222.
5. The namenode asked nodeB and nodeC to delete the block, since their replicas had the old generation stamp.
6. All reads (client reads and data transfer reads) from nodeA are failing with the stack trace above.

See the logs below from the namenode, nodeB, and nodeC.

{noformat}
---------------- Logs from namenode -----------------
2022-10-23 12:36:34,449 INFO [on default port 8020] hdfs.StateChange - BLOCK* allocate blk_3317546151_2244173147, replicas=nodeA:51010, nodeB:51010, nodeC:51010 for <file>
2022-10-23 12:36:34,978 INFO [on default port 8020] namenode.FSNamesystem - updatePipeline(blk_3317546151_2244173147 => blk_3317546151_2244173222) success
2022-10-23 12:36:34,978 INFO [on default port 8020] namenode.FSNamesystem - updatePipeline(blk_3317546151_2244173147, newGS=2244173222, newLength=64414655, newNodes=[nodeA:51010], client=DFSClient_NONMAPREDUCE_1038417265_1)
2022-10-23 12:36:35,004 INFO [on default port 8020] hdfs.StateChange - DIR* completeFile: <file> is closed by DFSClient_NONMAPREDUCE_1038417265_1
{noformat}

{noformat}
------------- Logs from nodeB -------------
2022-10-23 12:36:35,084 INFO [0.180.160.231:51010]] datanode.DataNode - Received BP-958889176-1567030695029:blk_3317546151_2244173147 size 64414655 from nodeA:30302
2022-10-23 12:36:35,084 INFO [0.180.160.231:51010]] datanode.DataNode - PacketResponder: BP-958889176-1567030695029:blk_3317546151_2244173147, type=HAS_DOWNSTREAM_IN_PIPELINE, downstreams=1:[nodeC:51010] terminating
2022-10-23 12:36:39,738 INFO [/data-2/hdfs/current] impl.FsDatasetAsyncDiskService - Deleted BP-958889176-1567030695029 blk_3317546151_2244173147 file /data-2/hdfs/current/BP-958889176-1567030695029/current/finalized/subdir189/subdir188/blk_3317546151
{noformat}

{noformat}
------------- Logs from nodeC -------------
2022-10-23 12:36:34,985 INFO [ype=LAST_IN_PIPELINE] datanode.DataNode - Received BP-958889176-1567030695029:blk_3317546151_2244173147 size 64414655 from nodeB:56486
2022-10-23 12:36:34,985 INFO [ype=LAST_IN_PIPELINE] datanode.DataNode - PacketResponder: BP-958889176-1567030695029:blk_3317546151_2244173147, type=LAST_IN_PIPELINE terminating
2022-10-23 12:36:38,891 INFO [/data-1/hdfs/current] impl.FsDatasetAsyncDiskService - Deleted BP-958889176-1567030695029 blk_3317546151_2244173147 file /data-1/hdfs/current/BP-958889176-1567030695029/current/finalized/subdir189/subdir188/blk_3317546151
{noformat}

What follows is my observation, but I can't confirm anything, since I don't have logs from the first node in the pipeline (nodeA) or from the hdfs client: due to disk write errors or a bug in BlockReceiver, nodeA didn't flush/sync the last ~5 MB (64414655 - 59158528 = 5256127 bytes) of data to disk. I assume that data was buffered in memory, since the namenode got an acknowledgement from the client that updatePipeline succeeded.

*Improvement:*
One suggested improvement is to delay deleting replicas with an old generation stamp until the block is replicated sufficiently to satisfy the replication factor. A sketch of this idea follows.
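Below is a minimal sketch of how such a safeguard could look on the namenode side. Every class and method name in it is hypothetical and only illustrates the idea; this is not the actual BlockManager API or a proposed patch.

{code:java}
/**
 * Hypothetical sketch of the suggested safeguard. All names here are
 * illustrative; they are not the real namenode/BlockManager API.
 */
class StaleReplicaInvalidator {

  /** Assumed view over the namenode's block state. */
  interface BlockView {
    int liveReplicas(long blockId);         // replicas with the current genstamp
    int expectedReplication(long blockId);  // the file's replication factor
    void invalidate(long blockId, String datanodeUuid); // schedule deletion
    void postpone(long blockId, String datanodeUuid);   // re-check on a later scan
  }

  private final BlockView view;

  StaleReplicaInvalidator(BlockView view) {
    this.view = view;
  }

  /**
   * Delete a replica with an old generation stamp only once the block is
   * sufficiently replicated with the current generation stamp.
   */
  void maybeInvalidateStaleReplica(long blockId, String datanodeUuid) {
    if (view.liveReplicas(blockId) >= view.expectedReplication(blockId)) {
      // Enough up-to-date copies exist; deleting the stale one is safe.
      view.invalidate(blockId, datanodeUuid);
    } else {
      // Defer deletion. In this incident, nodeB and nodeC would have kept
      // their full-length (64414655-byte, old-genstamp) copies until the
      // new-genstamp block was re-replicated, instead of leaving nodeA's
      // truncated 59158528-byte replica as the only copy.
      view.postpone(blockId, datanodeUuid);
    }
  }
}
{code}

The trade-off is some extra disk held by stale replicas until re-replication catches up, which seems acceptable compared to losing the last complete copy.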
> Delay deleting blocks with older generation stamp until the block is fully replicated.
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-16829
>                 URL: https://issues.apache.org/jira/browse/HDFS-16829
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.10.1
>            Reporter: Rushabh Shah
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)