[ https://issues.apache.org/jira/browse/HDFS-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096986#comment-17096986 ]
Ayush Saxena commented on HDFS-15315:
-------------------------------------

Could this be an issue with the DataNode IBRs (incremental block reports)? It seems the DataNode either isn't sending IBRs or is sending them slowly; you can check this in the DataNode logs. I got the same logs when I paused IBRs for 2 DataNodes.

Exception:
{code:java}
java.io.IOException: Unable to close file because the last block BP-1842875897-127.0.1.1-1588280195530:blk_-9223372036854775792_1001 does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:971)
	at org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1265)
{code}

Logs at the NN:
{noformat}
2020-05-01 02:26:41,867 [IPC Server handler 2 on default port 42203] INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(3176)) - BLOCK* blk_-9223372036854775792_1001 is COMMITTED but not COMPLETE(numNodes= 0 < minimum = 2) in file /dir/file
2020-05-01 02:26:42,281 [IPC Server handler 8 on default port 42203] INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(3176)) - BLOCK* blk_-9223372036854775792_1001 is COMMITTED but not COMPLETE(numNodes= 1 < minimum = 2) in file /dir/file
2020-05-01 02:26:43,085 [IPC Server handler 1 on default port 42203] INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(3176)) - BLOCK* blk_-9223372036854775792_1001 is COMMITTED but not COMPLETE(numNodes= 1 < minimum = 2) in file /dir/file
2020-05-01 02:26:44,688 [IPC Server handler 6 on default port 42203] INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(3176)) - BLOCK* blk_-9223372036854775792_1001 is COMMITTED but not COMPLETE(numNodes= 1 < minimum = 2) in file /dir/file
2020-05-01 02:26:47,891 [IPC Server handler 0 on default port 42203] INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(3176)) - BLOCK* blk_-9223372036854775792_1001 is COMMITTED but not COMPLETE(numNodes= 1 < minimum = 2) in file /dir/file
{noformat}

Test code:
{code:java}
@Test
public void testECIssue() throws IOException {
  HdfsConfiguration conf = new HdfsConfiguration();
  try (MiniDFSCluster cluster =
      new MiniDFSCluster.Builder(conf).numDataNodes(3).build()) {
    cluster.waitActive();
    final DistributedFileSystem dfs = cluster.getFileSystem();
    Path dir = new Path("/dir");
    dfs.mkdirs(dir);
    dfs.enableErasureCodingPolicy("XOR-2-1-1024k");
    dfs.setErasureCodingPolicy(dir, "XOR-2-1-1024k");
    FSDataOutputStream str = dfs.create(new Path("/dir/file"));
    // Write 4 MiB; write(int) writes only the low-order byte of i.
    for (int i = 0; i < 1024 * 1024 * 4; i++) {
      str.write(i);
    }
    // Pause IBRs on 2 of the 3 DataNodes, so at most 1 location gets reported.
    DataNodeTestUtils.pauseIBR(cluster.getDataNodes().get(0));
    DataNodeTestUtils.pauseIBR(cluster.getDataNodes().get(1));
    // Fails with "Unable to close file because the last block ... does not
    // have enough number of replicas."
    str.close();
  }
}
{code}

For normal replicated files, {{dfs.namenode.file.close.num-committed-allowed}} can usually be used to counter this, by allowing a file to be closed while its last block(s) are still COMMITTED. IIRC we have it configured in our production deployments as well. But for EC files this doesn't seem to take effect:
{code:java}
if (b.isStriped() || i < blocks.length - numCommittedAllowed) {
  return b + " is " + state + " but not COMPLETE";
}
{code}
It only applies to replicated files.

> IOException on close() when using Erasure Coding
> ------------------------------------------------
>
>                 Key: HDFS-15315
>                 URL: https://issues.apache.org/jira/browse/HDFS-15315
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: 3.1.1, ec, hdfs
>    Affects Versions: 3.1.1
>         Environment: XOR-2-1-1024k policy on hadoop 3.1.1 with 3 datanodes
>            Reporter: Anshuman Singh
>            Priority: Major
>
> When using Erasure Coding policy on a directory, the replication factor is
> set to 1. Solr fails in indexing documents with error - _java.io.IOException:
> Unable to close file because the last block does not have enough number of
> replicas._ It works fine without EC (with replication factor as 3.) It seems
> to be identical to this issue.
> [https://issues.apache.org/jira/browse/HDFS-11486|https://issues.apache.org/jira/browse/HDFS-11486]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
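To make the EC vs. replicated difference in the close-time check concrete, here is a minimal, simplified Java model. It is not the actual FSNamesystem/INodeFile code: the class {{CloseCheckSketch}}, the {{Block}}/{{State}} types and the method signature are hypothetical, and only the {{b.isStriped() || i < blocks.length - numCommittedAllowed}} condition comes from the snippet quoted in the comment above. Replica counting and IBR handling are left out; a block is simply assumed to be COMMITTED because its IBRs have not arrived yet.

```java
import java.util.List;

// Hypothetical, simplified model of the NameNode's close-time block check.
// Only the striped/numCommittedAllowed condition mirrors the quoted snippet;
// everything else is an illustrative assumption.
public class CloseCheckSketch {

  enum State { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

  // Stand-in for a block: its state and whether it is an EC (striped) block group.
  static final class Block {
    final String name;
    final State state;
    final boolean striped;

    Block(String name, State state, boolean striped) {
      this.name = name;
      this.state = state;
      this.striped = striped;
    }
  }

  // Returns null if the file may be closed, otherwise the reason it may not.
  static String checkBlocksComplete(List<Block> blocks, int numCommittedAllowed) {
    for (int i = 0; i < blocks.size(); i++) {
      Block b = blocks.get(i);
      if (b.state == State.COMPLETE) {
        continue; // assumed: enough locations already reported for this block
      }
      // Striped blocks never get the COMMITTED allowance; for replicated
      // files only the last numCommittedAllowed blocks may be COMMITTED.
      if (b.striped || i < blocks.size() - numCommittedAllowed) {
        return b.name + " is " + b.state + " but not COMPLETE";
      }
      if (b.state != State.COMMITTED) {
        return b.name + " is " + b.state + " but neither COMPLETE nor COMMITTED";
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // In both cases the last block is only COMMITTED, e.g. because IBRs are paused.
    Block replicated = new Block("replicatedBlk", State.COMMITTED, false);
    Block striped = new Block("blk_-9223372036854775792_1001", State.COMMITTED, true);

    // With numCommittedAllowed = 1 the replicated file may be closed: prints "null"
    System.out.println(checkBlocksComplete(List.of(replicated), 1));
    // The striped block is rejected regardless of the setting:
    // prints "blk_-9223372036854775792_1001 is COMMITTED but not COMPLETE"
    System.out.println(checkBlocksComplete(List.of(striped), 1));
  }
}
```

In this model, setting {{numCommittedAllowed}} to 0 makes the replicated case fail too, which corresponds to the default behaviour the config property is meant to relax.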