[ https://issues.apache.org/jira/browse/HBASE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-24625: ----------------------------- Release Note: We add a method {{getSyncedLength}} in {{WALProvider.WriterBase}} interface for {{WALFileLengthProvider}} used for replication, considering the case if we use {{AsyncFSWAL}},we write to 3 DNs concurrently,according to the visibility guarantee of HDFS, the data will be available immediately when arriving at DN since all the DNs will be considered as the last one in pipeline.This means replication may read uncommitted data and replicate it to the remote cluster and cause data inconsistency.The method {{WriterBase#getLength}} may return length which just in hdfs client buffer and not successfully synced to HDFS, so we use this method {{WriterBase#getSyncedLength}} to return the length successfully synced to HDFS and replication thread could only read writing WAL file limited by this length. see also HBASE-14004 and this document for more details: https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit# Before this patch, replication may read uncommitted data and replicate it to the slave cluster and cause data inconsistency between master and slave cluster, we could use {{FSHLog}} instead of {{AsyncFSWAL}} to reduce probability of inconsistency without this patch. Status: Patch Available (was: Open) > AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced > file length. > ---------------------------------------------------------------------------------------- > > Key: HBASE-24625 > URL: https://issues.apache.org/jira/browse/HBASE-24625 > Project: HBase > Issue Type: Bug > Components: Replication, wal > Affects Versions: 2.2.0, 2.0.0, 2.1.0, 2.3.0 > Reporter: chenglei > Priority: Critical > Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6 > > > By HBASE-14004, we introduce {{WALFileLengthProvider}} interface to keep the > current writing wal file length by ourselves, {{WALEntryStream}} used by > {{ReplicationSourceWALReader}} could only read WAL file byte size <= > {{WALFileLengthProvider.getLogFileSizeIfBeingWritten}} if the WAL file is > current been writing on the same RegionServer . > {{AsyncFSWAL}} implements {{WALFileLengthProvider}} by > {{AbstractFSWAL.getLogFileSizeIfBeingWritten}}, just as folllows : > {code:java} > public OptionalLong getLogFileSizeIfBeingWritten(Path path) { > rollWriterLock.lock(); > try { > Path currentPath = getOldPath(); > if (path.equals(currentPath)) { > W writer = this.writer; > return writer != null ? OptionalLong.of(writer.getLength()) : > OptionalLong.empty(); > } else { > return OptionalLong.empty(); > } > } finally { > rollWriterLock.unlock(); > } > } > {code} > For {{AsyncFSWAL}}, above {{AsyncFSWAL.writer}} is > {{AsyncProtobufLogWriter}} ,and {{AsyncProtobufLogWriter.getLength}} is as > follows: > {code:java} > public long getLength() { > return length.get(); > } > {code} > But for {{AsyncProtobufLogWriter}}, any append method may increase the above > {{AsyncProtobufLogWriter.length}}, especially for following > {{AsyncFSWAL.append}} > method just appending the {{WALEntry}} to > {{FanOutOneBlockAsyncDFSOutput.buf}}: > {code:java} > public void append(Entry entry) { > int buffered = output.buffered(); > try { > entry.getKey(). > > getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build() > .writeDelimitedTo(asyncOutputWrapper); > } catch (IOException e) { > throw new AssertionError("should not happen", e); > } > > try { > for (Cell cell : entry.getEdit().getCells()) { > cellEncoder.write(cell); > } > } catch (IOException e) { > throw new AssertionError("should not happen", e); > } > length.addAndGet(output.buffered() - buffered); > } > {code} > That is to say, {{AsyncFSWAL.getLogFileSizeIfBeingWritten}} could not reflect > the file length which successfully synced to underlying HDFS, which is not > as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005)