[ https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152250#comment-17152250 ]
Michael Stack commented on HBASE-24625:
---------------------------------------

I'd have reopened it because it is failing branch-2. See [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html] See the bottom half of the screen where replication.regionserver.TestWALEntryStream fails since #6494.

Here is what happens when I try the test locally:
{code:java}
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream.null
[ERROR]   Run 1: TestWALEntryStream.testReplicationSourceWALReaderRecovered:442 » TestTimedOut ...
[ERROR]   Run 2: TestWALEntryStream » Appears to be stuck in thread AsyncFSWAL-1-1
[INFO]
[ERROR] TestWALEntryStream.testReplicationSourceWALReaderRecovered:442 » Interrupted
[INFO]
[ERROR] Tests run: 4, Failures: 0, Errors: 2, Skipped: 0
{code}

Will try and take a look later...

> AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced
> file length.
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-24625
>                 URL: https://issues.apache.org/jira/browse/HBASE-24625
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 2.1.0, 2.0.0, 2.2.0, 2.3.0
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Critical
>             Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
>
> HBASE-14004 introduced the {{WALFileLengthProvider}} interface so that we track the length of the WAL file currently being written ourselves. The {{WALEntryStream}} used by {{ReplicationSourceWALReader}} may only read WAL file bytes up to {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the WAL file is currently being written on the same RegionServer.
> {{AsyncFSWAL}} implements {{WALFileLengthProvider}} via {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, as follows:
> {code:java}
>   public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
>     rollWriterLock.lock();
>     try {
>       Path currentPath = getOldPath();
>       if (path.equals(currentPath)) {
>         W writer = this.writer;
>         return writer != null ? OptionalLong.of(writer.getLength()) : OptionalLong.empty();
>       } else {
>         return OptionalLong.empty();
>       }
>     } finally {
>       rollWriterLock.unlock();
>     }
>   }
> {code}
> For {{AsyncFSWAL}}, the {{AsyncFSWAL.writer}} above is an {{AsyncProtobufLogWriter}}, and {{AsyncProtobufLogWriter.getLength}} is as follows:
> {code:java}
>   public long getLength() {
>     return length.get();
>   }
> {code}
> But for {{AsyncProtobufLogWriter}}, any append method may increase the above {{AsyncProtobufLogWriter.length}}. In particular, the following {{AsyncProtobufLogWriter.append}} method increments it after merely appending the {{WALEntry}} to {{FanOutOneBlockAsyncDFSOutput.buf}}:
> {code:java}
>   public void append(Entry entry) {
>     int buffered = output.buffered();
>     try {
>       entry.getKey().getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build()
>         .writeDelimitedTo(asyncOutputWrapper);
>     } catch (IOException e) {
>       throw new AssertionError("should not happen", e);
>     }
>     try {
>       for (Cell cell : entry.getEdit().getCells()) {
>         cellEncoder.write(cell);
>       }
>     } catch (IOException e) {
>       throw new AssertionError("should not happen", e);
>     }
>     length.addAndGet(output.buffered() - buffered);
>   }
> {code}
> That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} does not reflect the file length that has been successfully synced to the underlying HDFS, which is not as expected.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
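The core of the bug described above can be illustrated with a minimal sketch. The class and method names below are hypothetical (not the actual HBase implementation): a writer whose `length` counter advances on every append reports bytes that are merely buffered, while a replication reader tailing the file should only be told about bytes that have been synced. Tracking a separate highest-synced offset, advanced only after a successful sync, is the direction the fix takes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the buffered-vs-synced length discrepancy.
// SketchWriter is a made-up stand-in, not HBase's AsyncProtobufLogWriter.
public class WalLengthSketch {

  static class SketchWriter {
    // Advances on every append, i.e. as soon as bytes are buffered.
    private final AtomicLong length = new AtomicLong();
    // Advances only after a successful sync to the filesystem.
    private final AtomicLong syncedLength = new AtomicLong();

    void append(int bytes) {
      length.addAndGet(bytes); // bytes buffered, not yet durable
    }

    void sync() {
      syncedLength.set(length.get()); // everything buffered is now durable
    }

    // Buggy behavior: reports the buffered, possibly un-synced, length.
    long getLengthBuggy() {
      return length.get();
    }

    // Fixed behavior: reports only what has actually been synced.
    long getLengthFixed() {
      return syncedLength.get();
    }
  }

  public static void main(String[] args) {
    SketchWriter w = new SketchWriter();
    w.append(100);
    w.sync();     // 100 bytes durable on the filesystem
    w.append(50); // 50 more bytes buffered, not yet synced

    // A reader capped at the buggy length would try to read 150 bytes,
    // even though only 100 are actually on HDFS.
    System.out.println("buggy=" + w.getLengthBuggy());
    System.out.println("fixed=" + w.getLengthFixed());
  }
}
```

Here the reader would be told it may read 150 bytes while only 100 are durable, which is exactly the mismatch between `AsyncProtobufLogWriter.length` and the synced HDFS file length that the issue reports.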