[ https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276167#comment-17276167 ]
Hongbing Wang commented on HDFS-15779:
--------------------------------------

[~ferhui] Thanks for the review!

Structurally, guarding with *if (targetsStatus[i])* would be the cleanest option, but I was worried it could cause problems. targetsStatus[i] may be changed in _StripedWriter#transferData2Targets_, after which targetsStatus[i] and writers[i] no longer correspond one to one. Note that they do correspond before that point.

{code:java}
// StripedWriter#transferData2Targets
int transferData2Targets() {
  int nSuccess = 0;
  for (int i = 0; i < targets.length; i++) {
    if (targetsStatus[i]) {
      boolean success = false;
      try {
        writers[i].transferData2Target(packetBuf);
        nSuccess++;
        success = true;
      } catch (IOException e) {
        LOG.warn(e.getMessage());
      }
      targetsStatus[i] = success; // may be false here
    }
  }
  return nSuccess;
}
{code}

If _transferData2Target()_ throws an IOException, I think _writers[i]_ may still need _clearBuffers()_ to be called. Is that right?

Thanks again.

> EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
> -------------------------------------------------------------------------
>
>                 Key: HDFS-15779
>                 URL: https://issues.apache.org/jira/browse/HDFS-15779
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Hongbing Wang
>            Assignee: Hongbing Wang
>            Priority: Major
>         Attachments: HDFS-15779.001.patch
>
>
> The DN log shows the following NullPointerException:
> {code:java}
> 2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
> //...
> 2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Connection timed out
> 2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block: BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2020-12-28 15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50010
> {code}
> The NPE occurs at `writer.getTargetBuffer()` in the following code:
> {code:java}
> // StripedWriter#clearBuffers
> void clearBuffers() {
>   for (StripedBlockWriter writer : writers) {
>     ByteBuffer targetBuffer = writer.getTargetBuffer();
>     if (targetBuffer != null) {
>       targetBuffer.clear();
>     }
>   }
> }
> {code}
> So, why is the writer null?
> Let's track when the writer is initialized and when reconstruct() is called:
> {code:java}
> // StripedBlockReconstructor#run
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     getStripedReader().init();
>     stripedWriter.init(); //①
>     reconstruct(); //②
>     stripedWriter.endTargetBlocks();
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     // ...
> {code}
> They are called at ① and ② above, respectively. `stripedWriter.init()` calls `initTargetStreams()`:
> {code:java}
> // StripedWriter#initTargetStreams
> int initTargetStreams() {
>   int nSuccess = 0;
>   for (short i = 0; i < targets.length; i++) {
>     try {
>       writers[i] = createWriter(i);
>       nSuccess++;
>       targetsStatus[i] = true;
>     } catch (Throwable e) {
>       LOG.warn(e.getMessage());
>     }
>   }
>   return nSuccess;
> }
> {code}
> The NPE occurs when createWriter() throws for some but not all targets (0 < nSuccess < targets.length): writers[i] stays null for each failed target, and clearBuffers() later dereferences it.
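
To make the concern in the comment concrete, here is a minimal, self-contained sketch of the two candidate guards for clearBuffers(). It is not the attached patch and does not use the real Hadoop classes: `ClearBuffersSketch`, its nested `Writer`, `scenario()`, and the two `clearBuffersBy...` methods are hypothetical stand-ins, and the failures are simulated with boolean flags. It only illustrates that a status-based guard skips a writer whose transfer failed, while a null check skips only the writer that was never created.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for StripedWriter; not the real Hadoop class.
final class ClearBuffersSketch {

  // Hypothetical stand-in for StripedBlockWriter: owns one target buffer.
  static final class Writer {
    private final ByteBuffer targetBuffer = ByteBuffer.allocate(8);
    private final boolean failOnTransfer;

    Writer(boolean failOnTransfer) {
      this.failOnTransfer = failOnTransfer;
    }

    ByteBuffer getTargetBuffer() {
      return targetBuffer;
    }

    void transferData2Target() throws IOException {
      targetBuffer.put((byte) 1); // leave data behind so clear() is observable
      if (failOnTransfer) {
        throw new IOException("simulated transfer failure");
      }
    }
  }

  final Writer[] writers;
  final boolean[] targetsStatus;

  ClearBuffersSketch(int n) {
    writers = new Writer[n];
    targetsStatus = new boolean[n];
  }

  // Same shape as initTargetStreams: a failed creation leaves writers[i] null.
  int initTargetStreams(boolean[] createFails, boolean[] transferFails) {
    int nSuccess = 0;
    for (int i = 0; i < writers.length; i++) {
      if (createFails[i]) {
        continue; // simulated createWriter() exception
      }
      writers[i] = new Writer(transferFails[i]);
      targetsStatus[i] = true;
      nSuccess++;
    }
    return nSuccess;
  }

  // Same shape as transferData2Targets: a failure flips targetsStatus[i].
  int transferData2Targets() {
    int nSuccess = 0;
    for (int i = 0; i < writers.length; i++) {
      if (targetsStatus[i]) {
        boolean success = false;
        try {
          writers[i].transferData2Target();
          nSuccess++;
          success = true;
        } catch (IOException e) {
          // the real code logs a warning here
        }
        targetsStatus[i] = success;
      }
    }
    return nSuccess;
  }

  // Guard on the status flag; returns how many buffers were cleared.
  int clearBuffersByStatus() {
    int cleared = 0;
    for (int i = 0; i < writers.length; i++) {
      if (targetsStatus[i]) {
        writers[i].getTargetBuffer().clear();
        cleared++;
      }
    }
    return cleared;
  }

  // Guard on the writer itself; returns how many buffers were cleared.
  int clearBuffersByNullCheck() {
    int cleared = 0;
    for (Writer writer : writers) {
      if (writer != null && writer.getTargetBuffer() != null) {
        writer.getTargetBuffer().clear();
        cleared++;
      }
    }
    return cleared;
  }

  // Target 0 fails at creation, target 1 fails at transfer, target 2 is healthy.
  static ClearBuffersSketch scenario() {
    ClearBuffersSketch s = new ClearBuffersSketch(3);
    s.initTargetStreams(new boolean[] {true, false, false},
                        new boolean[] {false, true, false});
    s.transferData2Targets();
    return s;
  }

  public static void main(String[] args) {
    ClearBuffersSketch s = scenario();
    System.out.println("cleared by status guard: " + s.clearBuffersByStatus());
    System.out.println("cleared by null check:   " + s.clearBuffersByNullCheck());
  }
}
```

In this scenario the status guard clears only the healthy target's buffer, while the null check also clears the buffer of the writer whose transfer failed. Whether that second buffer really needs clearing in the real StripedWriter is exactly the open question raised in the comment.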