[jira] [Commented] (HBASE-24713) RS startup with FSHLog throws NPE after HBASE-21751

Anoop Sam John (Jira) Sat, 25 Jul 2020 00:10:20 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164794#comment-17164794
 ]


Anoop Sam John commented on HBASE-24713:
----------------------------------------

[~ram_krish], is it caused by HBASE-21751? Looking at the patch there, yes: in
branch-2.1 that patch only moved the rollWriter() call from the constructor to
init, so the call now happens after the disruptor is started.
But looking at the branch-2.2+ patches there, it seems they just added a catch
around the FSHLog creation and abort the RS in case of an exception. So at
least there, this move of rollWriter() happened as part of some other jira. So
the actual reason for the NPE is the move of rollWriter(), correct?
rollWriter() now happens as part of init. At that point the API is called only
to create the initial writer. As part of rollWriter()'s replaceWriter() call,
we try to attain a safe point, and that includes a sync call. Previously this
sync call was not happening because the roll was done in the constructor, and
at that time the ringBufferEventHandler object in FSHLog was still null.
Because of that, no waitSafePoint call was needed and so there was no sync
call. Now the roll is delayed: it has moved to init, which is called after
ringBufferEventHandler has been initialized.
The null check is fine. Or else we could add something like a writer check
when trying to attain the safe point.
The code now:
{code}
    SafePointZigZagLatch zigzagLatch = null;
    long sequence = -1L;
    if (this.ringBufferEventHandler != null) {
      sequence = getSequenceOnRingBuffer();
      zigzagLatch = this.ringBufferEventHandler.attainSafePoint();
    }
    afterCreatingZigZagLatch();
    try {
      try {
        if (zigzagLatch != null) {
          assert sequence > 0L : "Failed to get sequence from ring buffer";
          TraceUtil.addTimelineAnnotation("awaiting safepoint");
          syncFuture = zigzagLatch.waitSafePoint(publishSyncOnRingBuffer(sequence, false));
        }
{code}
publishSyncOnRingBuffer() is the only thing causing this sync call, and so a
run by the SyncRunner thread.
We can add
 if (this.writer != null && this.ringBufferEventHandler != null) {

This will be a good addition, I believe. This null check is just fine.
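
To make the difference between the two guards concrete, here is a minimal
sketch. It is not the real FSHLog: the class SafePointGuardSketch and both
method names are hypothetical, and the two fields are plain Objects standing
in for the writer and the ring buffer handler. It only models the state at the
first roll from init, where the handler exists but no writer has been created
yet.

```java
// Hypothetical stand-in for the relevant FSHLog state; not the real HBase class.
public class SafePointGuardSketch {
    Object writer;                 // null until the first roll installs a writer
    Object ringBufferEventHandler; // null until the disruptor handler is set up

    /** Guard as in the quoted snippet: checks the handler only. */
    boolean guardHandlerOnly() {
        return ringBufferEventHandler != null;
    }

    /** Proposed guard: also requires that a writer already exists. */
    boolean guardWriterAndHandler() {
        return writer != null && ringBufferEventHandler != null;
    }

    public static void main(String[] args) {
        SafePointGuardSketch wal = new SafePointGuardSketch();

        // State at the first roll from init: handler ready, no writer yet.
        wal.ringBufferEventHandler = new Object();
        System.out.println("handler-only guard:   " + wal.guardHandlerOnly());
        System.out.println("writer+handler guard: " + wal.guardWriterAndHandler());

        // State at any later roll: a writer is in place.
        wal.writer = new Object();
        System.out.println("writer+handler guard: " + wal.guardWriterAndHandler());
    }
}
```

With the handler-only guard the first roll would go on to attain a safe point
and publish a sync while the writer is still null; with the combined guard it
skips that step until a writer exists.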

> RS startup with FSHLog throws NPE after HBASE-21751
> ---------------------------------------------------
>
>                 Key: HBASE-24713
>                 URL: https://issues.apache.org/jira/browse/HBASE-24713
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.1.6
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Gaurav Kanade
>            Priority: Minor
>
> Every RS startup creates this NPE
> {code}
> [sync.1] wal.FSHLog: UNEXPECTED
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:582)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-07-07 10:51:23,208 WARN  [regionserver/xxxxx:16020] wal.FSHLog: Failed 
> sync-before-close but no outstanding appends; closing 
> WALjava.lang.NullPointerException
> {code}
> The reason is that the Disruptor framework starts the SyncRunner thread, but 
> the init of the writer happens after that. A simple null check in the 
> SyncRunner will help here.
> No major damage happens, though, since we handle Throwable. It would be 
> good to solve this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
