[ 
https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-15779:
---------------------------------
    Description: 
The DataNode (DN) log shows the following NullPointerException:
{code:java}
2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
//...
2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Connection timed out
2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block: BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299)
        at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139)
        at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115)
        at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2020-12-28 15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50010
{code}
The NPE occurs at `writer.getTargetBuffer()` in the following code:
{code:java}
// StripedWriter#clearBuffers
void clearBuffers() {
  for (StripedBlockWriter writer : writers) {
    ByteBuffer targetBuffer = writer.getTargetBuffer();
    if (targetBuffer != null) {
      targetBuffer.clear();
    }
  }
}
{code}
So why is the writer null? Let's trace where the writers are initialized and
where reconstruct() is called:
{code:java}
// StripedBlockReconstructor#run
public void run() {
  try {
    initDecoderIfNecessary();

    getStripedReader().init();

    stripedWriter.init();  //①

    reconstruct();  //②

    stripedWriter.endTargetBlocks();
  } catch (Throwable e) {
    LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
    // ...
{code}
The writers are initialized at ① and reconstruct() is called at ②.
`stripedWriter.init()` calls `initTargetStreams()`:
{code:java}
// StripedWriter#initTargetStreams
int initTargetStreams() {
  int nSuccess = 0;
  for (short i = 0; i < targets.length; i++) {
    try {
      writers[i] = createWriter(i);
      nSuccess++;
      targetsStatus[i] = true;
    } catch (Throwable e) {
      LOG.warn(e.getMessage());
    }
  }
  return nSuccess;
}
{code}
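The NPE occurs when createWriter(i) throws for some but not all targets, i.e.
0 < nSuccess < targets.length. The exception is only logged, so writers[i]
stays null for each failed target, and the later clearBuffers() call
dereferences that null entry.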
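
A minimal, self-contained sketch of this failure mode and of one possible guard is shown here. The class ClearBuffersSketch, its nested Writer type, and the methods initWriters, clearBuffersUnsafe and clearBuffersSafe are hypothetical stand-ins, not Hadoop code. The null check is an assumption about how clearBuffers() could be hardened, not necessarily the committed patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical stand-ins for StripedWriter/StripedBlockWriter; not Hadoop code. */
class ClearBuffersSketch {

  /** Minimal stand-in for StripedBlockWriter: it only owns a target buffer. */
  static final class Writer {
    private final ByteBuffer targetBuffer = ByteBuffer.allocate(8);

    ByteBuffer getTargetBuffer() {
      return targetBuffer;
    }
  }

  /**
   * Mirrors the shape of initTargetStreams(): a failed slot is logged and
   * skipped, so writers[i] keeps its default value of null.
   */
  static int initWriters(Writer[] writers, boolean[] reachable) {
    int nSuccess = 0;
    for (int i = 0; i < writers.length; i++) {
      try {
        if (!reachable[i]) {
          throw new IOException("Connection timed out");
        }
        writers[i] = new Writer();
        nSuccess++;
      } catch (IOException e) {
        System.out.println("WARN " + e.getMessage());
      }
    }
    return nSuccess;
  }

  /** Same loop shape as the reported clearBuffers(): fails on a null slot. */
  static void clearBuffersUnsafe(Writer[] writers) {
    for (Writer writer : writers) {
      ByteBuffer targetBuffer = writer.getTargetBuffer();
      if (targetBuffer != null) {
        targetBuffer.clear();
      }
    }
  }

  /** Guarded variant: skips slots whose writer was never created. */
  static int clearBuffersSafe(Writer[] writers) {
    int cleared = 0;
    for (Writer writer : writers) {
      ByteBuffer targetBuffer = writer != null ? writer.getTargetBuffer() : null;
      if (targetBuffer != null) {
        targetBuffer.clear();
        cleared++;
      }
    }
    return cleared;
  }

  public static void main(String[] args) {
    Writer[] writers = new Writer[3];
    // The second target is unreachable, so 0 < nSuccess < writers.length.
    int nSuccess = initWriters(writers, new boolean[] {true, false, true});
    System.out.println("nSuccess = " + nSuccess);
    try {
      clearBuffersUnsafe(writers);
    } catch (NullPointerException e) {
      System.out.println("unguarded loop threw NullPointerException");
    }
    System.out.println("guarded loop cleared " + clearBuffersSafe(writers) + " buffers");
  }
}
```

The guarded loop only avoids the crash in clearBuffers(). Whether reconstruction should continue with a partial set of writers is a separate design decision that the sketch does not address.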

> EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
> -------------------------------------------------------------------------
>
>                 Key: HDFS-15779
>                 URL: https://issues.apache.org/jira/browse/HDFS-15779
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Hongbing Wang
>            Assignee: Hongbing Wang
>            Priority: Major


