[ https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152610#comment-17152610 ]

chenglei edited comment on HBASE-24625 at 7/7/20, 10:40 AM:
------------------------------------------------------------

[~ndimiduk] [~stack] [~zhangduo],
I think the {{TestWALEntryStream}} failure on branch-2.x is caused by the
following {{AsyncFSWAL.doShutdown}} method on branch-2.x, which differs from
master:
{code:java}
  @Override
  protected void doShutdown() throws IOException {
    waitForSafePoint();
    closeWriter();
    closeExecutor.shutdown();
{code} 

After we invoke {{closeWriter()}}, we do not set the {{writer}} to null, so
{{AbstractFSWAL.getLogFileSizeIfBeingWritten}} keeps invoking
{{AsyncProtobufLogWriter.getSyncedLength}} at line 1017 below even after the
{{AsyncProtobufLogWriter}} is closed.
{code:java}
1010  @Override
1011  public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
1012    rollWriterLock.lock();
1013    try {
1014      Path currentPath = getOldPath();
1015      if (path.equals(currentPath)) {
1016        W writer = this.writer;
1017        return writer != null ? OptionalLong.of(writer.getSyncedLength()) : OptionalLong.empty();
1018      } else {
1019        return OptionalLong.empty();
1020      }
1021    } finally {
1022      rollWriterLock.unlock();
1023    }
1024  }
{code}
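
As an aside, note the copy of {{this.writer}} into a local at line 1016: it guarantees the null check and the {{getSyncedLength()}} call at line 1017 operate on the same object, even if another thread nulls the field in between. A minimal illustration of that pattern (hypothetical names, not HBase code):
{code:java}
import java.util.OptionalLong;

// Hypothetical illustration (not HBase code) of the copy-to-local pattern at
// line 1016 above: reading the volatile field exactly once means the null
// check and the method call are guaranteed to see the same object, even if a
// concurrent shutdown nulls the field in between.
class LengthProvider {
  interface Writer {
    long getSyncedLength();
  }

  private volatile Writer writer; // may be nulled by a concurrent shutdown

  OptionalLong lengthIfOpen() {
    Writer w = this.writer; // single read of the volatile field
    return w != null ? OptionalLong.of(w.getSyncedLength()) : OptionalLong.empty();
  }
}
{code}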

Why does {{TestWALEntryStream}} pass on master? Because the
{{AsyncFSWAL.doShutdown}} code on master sets the {{writer}} to null after
{{closeWriter()}}, at line 780 below:
{code:java}
776  @Override
777  protected void doShutdown() throws IOException {
778    waitForSafePoint();
779    closeWriter(this.writer);
780    this.writer = null;
781    closeExecutor.shutdown();
{code}

The difference in {{AsyncFSWAL.doShutdown}} comes from HBASE-20456, a
synchronous replication patch that was not applied to branch-2.x.

BTW, once we set the {{writer}} to null after {{closeWriter()}}, the end of
{{TestWALEntryStream.testReplicationSourceWALReaderRecovered}} should be
modified to match master, as HBASE-20456 did: with the writer nulled on
shutdown, {{getLogFileSizeIfBeingWritten}} returns {{OptionalLong.empty()}},
so the reader can treat the last file as closed and mark the final batch as
end-of-file.
{code:java}
@@ -413,9 +413,7 @@ public class TestWALEntryStream {
     batch = reader.take();
     assertEquals(walPath, batch.getLastWalPath());
     assertEquals(5, batch.getNbEntries());
-    // Actually this should be true but we haven't handled this yet since for a normal queue the
-    // last one is always open... Not a big deal for now.
-    assertFalse(batch.isEndOfFile());
+    assertTrue(batch.isEndOfFile());
 
     assertSame(WALEntryBatch.NO_MORE_DATA, reader.take());
   }
{code}
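
For intuition, a simplified illustration (hypothetical names, not the real {{WALEntryStream}} logic) of why nulling the writer flips this assertion:
{code:java}
import java.util.OptionalLong;

// Simplified illustration with hypothetical names, not the real
// WALEntryStream logic: a file the provider still reports as being written
// can never be end-of-file, because its tail may still grow; once the
// provider returns empty (e.g. the writer was nulled on shutdown), reaching
// the physical end of the file is a true end-of-file.
class EndOfFileCheck {
  static boolean isEndOfFile(OptionalLong lengthIfBeingWritten, boolean reachedPhysicalEnd) {
    return !lengthIfBeingWritten.isPresent() && reachedPhysicalEnd;
  }
}
{code}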

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced 
> file length.
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-24625
>                 URL: https://issues.apache.org/jira/browse/HBASE-24625
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 2.1.0, 2.0.0, 2.2.0, 2.3.0
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Critical
>             Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> By HBASE-14004, we introduced the {{WALFileLengthProvider}} interface to keep 
> track of the length of the currently written WAL file ourselves; 
> {{WALEntryStream}}, used by {{ReplicationSourceWALReader}}, may only read WAL 
> file bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the 
> WAL file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via 
> {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>   public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
>     rollWriterLock.lock();
>     try {
>       Path currentPath = getOldPath();
>       if (path.equals(currentPath)) {
>         W writer = this.writer;
>         return writer != null ? OptionalLong.of(writer.getLength()) : OptionalLong.empty();
>       } else {
>         return OptionalLong.empty();
>       }
>     } finally {
>       rollWriterLock.unlock();
>     }
>   }
> {code}
> For {{AsyncFSWAL}}, the {{AsyncFSWAL.writer}} above is an 
> {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as 
> follows:
> {code:java}
>   public long getLength() {
>     return length.get();
>   }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append may increase 
> {{AsyncProtobufLogWriter.length}} before the data is synced; in particular, 
> the following {{AsyncProtobufLogWriter.append}} method merely appends the 
> {{WALEntry}} to {{FanOutOneBlockAsyncDFSOutput.buf}}:
> {code:java}
>   public void append(Entry entry) {
>     int buffered = output.buffered();
>     try {
>       entry.getKey().getBuilder(compressor).setFollowingKvCount(entry.getEdit().size())
>         .build().writeDelimitedTo(asyncOutputWrapper);
>     } catch (IOException e) {
>       throw new AssertionError("should not happen", e);
>     }
>     try {
>       for (Cell cell : entry.getEdit().getCells()) {
>         cellEncoder.write(cell);
>       }
>     } catch (IOException e) {
>       throw new AssertionError("should not happen", e);
>     }
>     length.addAndGet(output.buffered() - buffered);
>   }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} may not reflect 
> the file length that has actually been synced to the underlying HDFS, which 
> is not as expected.
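> A rough sketch of one possible fix direction (not the actual patch): track 
> the length acknowledged by sync completions separately from the buffered 
> length. This assumes the underlying {{output.flush()}} completes with the 
> flushed file size, and uses {{java.util.concurrent.atomic.AtomicLong}} and 
> {{java.util.concurrent.CompletableFuture}}:
> {code:java}
> // Sketch only, not the actual HBASE-24625 patch: a separate counter that is
> // advanced only when a sync completes, so that readers are handed a length
> // that is guaranteed to be durable on HDFS.
> private final AtomicLong syncedLength = new AtomicLong(0);
>
> public CompletableFuture<Long> sync(boolean forceSync) {
>   CompletableFuture<Long> future = output.flush(forceSync);
>   // advance monotonically, in case completions are observed out of order
>   future.thenAccept(len -> syncedLength.accumulateAndGet(len, Math::max));
>   return future;
> }
>
> public long getSyncedLength() {
>   return syncedLength.get();
> }
> {code}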


