[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2024-02-11 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15175:
--
 Component/s: namnode
Target Version/s: 3.2.4, 3.3.2, 3.2.3, 3.4.0

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namnode
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Fix For: 3.4.0, 3.2.3, 3.3.2, 3.2.4
>
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by CloseOp 
> twice is the same instance, which causes the first CloseOp has wrong block 
> size. When SNN rolling Editlog, TruncateOp does not make the file to the 
> UnderConstruction state. Then, when the second CloseOp is executed, the file 
> is not in the UnderConstruction state, and SNN crashes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-07-29 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15175:
---
Fix Version/s: 3.3.2

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Fix For: 3.4.0, 3.2.3, 3.3.2, 3.2.4
>
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by CloseOp 
> twice is the same instance, which causes the first CloseOp has wrong block 
> size. When SNN rolling Editlog, TruncateOp does not make the file to the 
> UnderConstruction state. Then, when the second CloseOp is executed, the file 
> is not in the UnderConstruction state, and SNN crashes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-07-28 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15175:
---
Fix Version/s: 3.2.4
   3.2.3

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Fix For: 3.4.0, 3.2.3, 3.2.4
>
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by CloseOp 
> twice is the same instance, which causes the first CloseOp has wrong block 
> size. When SNN rolling Editlog, TruncateOp does not make the file to the 
> UnderConstruction state. Then, when the second CloseOp is executed, the file 
> is not in the UnderConstruction state, and SNN crashes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-07-28 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15175:
---
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks [~caiyicong] report and thanks [~wanchang] for your 
contribution! Thanks [~sodonnell] for your reviews!

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Fix For: 3.4.0
>
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by CloseOp 
> twice is the same instance, which causes the first CloseOp has wrong block 
> size. When SNN rolling Editlog, TruncateOp does not make the file to the 
> UnderConstruction state. Then, when the second CloseOp is executed, the file 
> is not in the UnderConstruction state, and SNN crashes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-06-15 Thread Wan Chang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Chang updated HDFS-15175:
-
Attachment: HDFS-15175-trunk.1.patch
Labels: NameNode  (was: )
Status: Patch Available  (was: Open)

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by CloseOp 
> twice is the same instance, which causes the first CloseOp has wrong block 
> size. When SNN rolling Editlog, TruncateOp does not make the file to the 
> UnderConstruction state. Then, when the second CloseOp is executed, the file 
> is not in the UnderConstruction state, and SNN crashes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2020-02-17 Thread Yicong Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai updated HDFS-15175:
--
Description: 
 
{panel:title=Crash exception}
2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
[length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
txid=32625024993]
 java.io.IOException: File is not under construction: ..
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
 at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:360)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
 at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
 at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
{panel}
 
{panel:title=Editlog}

 OP_REASSIGN_LEASE
 
 32625021150
 DFSClient_NONMAPREDUCE_-969060727_197760
 ..
 DFSClient_NONMAPREDUCE_1000868229_201260
 
 

..


 OP_CLOSE
 
 32625023743
 0
 0
 ..
 3
 1581816135883
 1581814760398
 536870912
 
 
 false
 
 5568434562
 185818644
 4495417845
 
 
 da_music
 hdfs
 416
 
 
 

..


 OP_TRUNCATE
 
 32625024049
 ..
 DFSClient_NONMAPREDUCE_1000868229_201260
 ..
 185818644
 1581816136336
 
 5568434562
 185818648
 4495417845
 
 
 

..


 OP_CLOSE
 
 32625024993
 0
 0
 ..
 3
 1581816138774
 1581814760398
 536870912
 
 
 false
 
 5568434562
 185818644
 4495417845
 
 
 da_music
 hdfs
 416
 
 
 
{panel}
 

 

The block size should be 185818648 in the first CloseOp. When truncate is used, 
the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
synchronized to the JournalNode in the same batch. The block used by CloseOp 
twice is the same instance, which causes the first CloseOp has wrong block 
size. When SNN rolling Editlog, TruncateOp does not make the file to the 
UnderConstruction state. Then, when the second CloseOp is executed, the file is 
not in the UnderConstruction state, and SNN crashes.

  was:
 
{panel:title=Crash exception}
2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
[length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
txid=32625024993]
java.io.IOException: File is not under construction: ..
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
at