oneby-wang commented on PR #4607:
URL: https://github.com/apache/bookkeeper/pull/4607#issuecomment-3228330508
Hi guys, I think I figured it out. The key point is the following code in
this pull request, although setting the logfileHeader to zeros is also very
important; I'll explain that later.
```java
if (header.ledgersCount == 0) {
    throw new IOException("No ledgers map found in entryLogId " + entryLogId
            + ", do scan to double confirm");
}
```
I read the journal replay code and found that BookKeeper's replay creates a
new entryLog file after a bookie crash instead of reusing the old one. So the
old entryLog file is left with an incorrect logfileHeader (never updated by the
BufferedLogChannel.appendLedgersMap() method).
In DefaultEntryLogger, if the getEntryLogMetadata() method returns an empty
entryLogMeta, the GC thread deletes the entryLog file, even though it may still
contain ledgers that must not be deleted.
```java
protected void extractMetaFromEntryLogs() throws EntryLogMetadataMapException {
    // ...
    // Read through the entry log file and extract the entry log meta
    EntryLogMetadata entryLogMeta = entryLogger.getEntryLogMetadata(entryLogId, throttler);
    removeIfLedgerNotExists(entryLogMeta);
    if (entryLogMeta.isEmpty()) {
        // This means the entry log is not associated with any active ledgers anymore.
        // We can remove this entry log file now.
        LOG.info("Deleting entryLogId {} as it has no active ledgers!", entryLogId);
        if (removeEntryLog(entryLogId)) {
            gcStats.getReclaimedSpaceViaDeletes().addCount(entryLogMeta.getTotalSize());
        } else {
            gcStats.getReclaimFailedToDelete().inc();
        }
    }
    // ...
}
```
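To make the stale-header claim concrete, here is a tiny hypothetical helper (not part of BookKeeper) that dumps the three header fields of a `*.log` file, assuming the usual layout of the 1024-byte entry log header: the "BKLO" magic (4 bytes), the version (int), ledgersMapOffset (long) and ledgersCount (int). On a file left behind by a crash you can compare what's on disk with what appendLedgersMap() would have written:
```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of BookKeeper: dump the entry log header fields so you
// can check whether ledgersMapOffset / ledgersCount were ever flushed for this file.
public class DumpEntryLogHeader {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            byte[] magic = new byte[4];
            in.readFully(magic);                   // "BKLO"
            int version = in.readInt();            // header version (1 = has ledgers map index)
            long ledgersMapOffset = in.readLong(); // 0 means the index was never flushed
            int ledgersCount = in.readInt();       // number of ledgers recorded in the index
            System.out.printf("magic=%s version=%d ledgersMapOffset=%d ledgersCount=%d%n",
                    new String(magic, StandardCharsets.UTF_8), version, ledgersMapOffset, ledgersCount);
        }
    }
}
```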
How does this happen? Here is the old, unfixed code. If
logfileHeader.ledgersMapOffset != 0 (a random value, since logfileHeader is
uninitialized), logfileHeader.ledgersCount == 0 (also a random value from the
uninitialized logfileHeader), and the ledgersMapSize read inside the while loop
is < 0 (some lucky value at a random location in the entryLog file, interesting,
huh), then the while loop breaks out and an empty entryLogMeta is returned to
the caller. I think we should throw an IOException when ledgersMapSize < 0
instead of breaking out of the while loop.
```java
EntryLogMetadata extractEntryLogMetadataFromIndex(long entryLogId) throws IOException {
    Header header = getHeaderForLogId(entryLogId);

    if (header.version < HEADER_V1) {
        throw new IOException("Old log file header without ledgers map on entryLogId " + entryLogId);
    }

    if (header.ledgersMapOffset == 0L) {
        // The index was not stored in the log file (possibly because the bookie crashed before flushing it)
        throw new IOException("No ledgers map index found on entryLogId " + entryLogId);
    }

    if (LOG.isDebugEnabled()) {
        LOG.debug("Recovering ledgers maps for log {} at offset: {}", entryLogId, header.ledgersMapOffset);
    }

    BufferedReadChannel bc = getChannelForLogId(entryLogId);

    // There can be multiple entries containing the various components of the serialized ledgers map
    long offset = header.ledgersMapOffset;
    EntryLogMetadata meta = new EntryLogMetadata(entryLogId);

    final int maxMapSize = LEDGERS_MAP_HEADER_SIZE + LEDGERS_MAP_ENTRY_SIZE * LEDGERS_MAP_MAX_BATCH_SIZE;
    ByteBuf ledgersMap = allocator.directBuffer(maxMapSize);

    try {
        while (offset < bc.size()) {
            // Read ledgers map size
            sizeBuffer.get().clear();
            bc.read(sizeBuffer.get(), offset);

            int ledgersMapSize = sizeBuffer.get().readInt();
            if (ledgersMapSize <= 0) {
                break;
            }
            // Read the index into a buffer
            ledgersMap.clear();
            bc.read(ledgersMap, offset + 4, ledgersMapSize);

            // Discard ledgerId and entryId
            long lid = ledgersMap.readLong();
            if (lid != INVALID_LID) {
                throw new IOException("Cannot deserialize ledgers map from ledger " + lid);
            }

            long entryId = ledgersMap.readLong();
            if (entryId != LEDGERS_MAP_ENTRY_ID) {
                throw new IOException("Cannot deserialize ledgers map from entryId " + entryId);
            }

            // Read the number of ledgers in the current entry batch
            int ledgersCount = ledgersMap.readInt();

            // Extract all (ledger, size) tuples from buffer
            for (int i = 0; i < ledgersCount; i++) {
                long ledgerId = ledgersMap.readLong();
                long size = ledgersMap.readLong();

                if (LOG.isDebugEnabled()) {
                    LOG.debug("Recovering ledgers maps for log {} -- Found ledger: {} with size: {}",
                            entryLogId, ledgerId, size);
                }
                meta.addLedgerSize(ledgerId, size);
            }
            if (ledgersMap.isReadable()) {
                throw new IOException("Invalid entry size when reading ledgers map");
            }

            // Move to next entry, if any
            offset += ledgersMapSize + 4;
        }
    } catch (IndexOutOfBoundsException e) {
        throw new IOException(e);
    } finally {
        ReferenceCountUtil.release(ledgersMap);
    }

    if (meta.getLedgersMap().size() != header.ledgersCount) {
        throw new IOException("Not all ledgers were found in ledgers map index. expected: " + header.ledgersCount
                + " -- found: " + meta.getLedgersMap().size() + " -- entryLogId: " + entryLogId);
    }

    return meta;
}
```
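Just to make my suggestion explicit, this is roughly what I mean inside that while loop (only a sketch, the exception message is illustrative):
```java
int ledgersMapSize = sizeBuffer.get().readInt();
if (ledgersMapSize <= 0) {
    // A garbage ledgersMapOffset can point anywhere in the file, so a non-positive size
    // is not a reliable "end of index" signal. Throwing here makes getEntryLogMetadata()
    // fall back to extractEntryLogMetadataByScanning() instead of returning an empty meta.
    throw new IOException("Invalid ledgers map size " + ledgersMapSize
            + " on entryLogId " + entryLogId + ", do scan to double confirm");
}
```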
If the extractEntryLogMetadataFromIndex() method throws a Throwable, the
getEntryLogMetadata() method falls back to the extractEntryLogMetadataByScanning()
method, which skips the logfileHeader and scans the rest of the entryLog file.
```java
public EntryLogMetadata getEntryLogMetadata(long entryLogId, AbstractLogCompactor.Throttler throttler)
        throws IOException {
    // First try to extract the EntryLogMetadata from the index, if there's no index then fallback to
    // scanning the entry log
    try {
        return extractEntryLogMetadataFromIndex(entryLogId);
    } catch (FileNotFoundException fne) {
        LOG.warn("Cannot find entry log file {}.log : {}", Long.toHexString(entryLogId), fne.getMessage());
        throw fne;
    } catch (Throwable e) {
        LOG.info("Failed to get ledgers map index from: {}.log : {}", entryLogId, e.getMessage());
        // Fall-back to scanning
        return extractEntryLogMetadataByScanning(entryLogId, throttler);
    }
}
```
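For context, extractEntryLogMetadataByScanning() doesn't trust the header index at all. Conceptually it walks the length-prefixed entries themselves and rebuilds the per-ledger sizes. Below is only a simplified sketch of that idea, not the real implementation (which goes through scanEntryLog() with an EntryLogScanner and the throttler); the method name is illustrative and it assumes the same class context as the snippets above (LOGFILE_HEADER_SIZE, BufferedReadChannel, netty's ByteBuf):
```java
// Simplified sketch of the scanning idea: skip the (possibly garbage) 1024-byte header and
// rebuild metadata from the entries, each stored as [size:int][ledgerId:long][entryId:long][data].
EntryLogMetadata scanForMetadata(long entryLogId, BufferedReadChannel bc) throws IOException {
    EntryLogMetadata meta = new EntryLogMetadata(entryLogId);
    long offset = LOGFILE_HEADER_SIZE;
    ByteBuf buf = Unpooled.buffer(4 + 8 + 8);
    while (offset + 4 + 8 + 8 <= bc.size()) {
        buf.clear();
        bc.read(buf, offset, 4 + 8 + 8);
        int entrySize = buf.readInt();       // size of ledgerId + entryId + payload
        if (entrySize <= 0) {
            break;                           // padding / pre-allocated tail, no more entries
        }
        long ledgerId = buf.readLong();
        buf.readLong();                      // entryId, not needed for the metadata
        if (ledgerId >= 0) {                 // negative ids are internal records (e.g. the ledgers map)
            meta.addLedgerSize(ledgerId, entrySize + 4);
        }
        offset += 4 + entrySize;             // jump to the next length-prefixed entry
    }
    return meta;
}
```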
I think the underlying problem is that the entryLog file lacks metadata
validation, e.g. CRC validation, so we can't tell whether the header was
completely flushed or corrupted by a crash.
And I think the header.ledgersCount == 0 validation code should be written
after the header.ledgersMapOffset validation (just for readability):
```java
if (header.ledgersMapOffset == 0L) {
    // The index was not stored in the log file (possibly because the bookie crashed before flushing it)
    throw new IOException("No ledgers map index found on entryLogId " + entryLogId);
}
if (header.ledgersCount == 0) {
    throw new IOException("No ledgers map found in entryLogId " + entryLogId
            + ", do scan to double confirm");
}
```
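On the CRC point, purely to illustrate the idea (this is not an existing BookKeeper feature and would need a header format change): if the writer stored a checksum over the ledgers-map fields when appendLedgersMap() updates the header, the reader could recompute it and refuse to trust a header that was never, or only partially, flushed. Something like:
```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

// Illustration only: a CRC32C over the ledgers-map header fields would make a never-written
// or half-written header detectable instead of it looking like plausible data.
static int ledgersMapHeaderChecksum(int version, long ledgersMapOffset, int ledgersCount) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4);
    buf.putInt(version).putLong(ledgersMapOffset).putInt(ledgersCount).flip();
    CRC32C crc = new CRC32C();
    crc.update(buf);
    return (int) crc.getValue();
}
```
The reader side would then recompute this in getHeaderForLogId() and treat a mismatch like the ledgersMapOffset == 0 case, i.e. fall back to scanning.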
If the bookie can't be shut down gracefully, e.g. the bookie instance crashes,
the old entryLog file may be deleted. So, what can we do on an old-version
bookie after a crash? First, don't start the bookie instance directly: back up
your data, then use a hex editor to set ledgersMapOffset in the entryLog header
to zero, and then start the bookie instance. The GC thread will fall back to
the extractEntryLogMetadataByScanning() method, which will not delete the
entryLog file.
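If you don't want to drive a hex editor by hand, something like the following does the same thing. It is a hypothetical helper, assuming the header layout described above ("BKLO" magic + version int, so ledgersMapOffset sits at byte offset 8); please verify that against the DefaultEntryLogger of your BookKeeper version and only ever run it on a backed-up copy:
```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of BookKeeper: zero the ledgersMapOffset field in an entry
// log header so that GC falls back to extractEntryLogMetadataByScanning() instead of
// trusting a stale index. Only run this against files you have already backed up.
public class ZeroLedgersMapOffset {
    // "BKLO" magic (4 bytes) + header version (4 bytes) come first, assuming the usual layout
    private static final long LEDGERS_MAP_OFFSET_POSITION = 4 + 4;

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile log = new RandomAccessFile(args[0], "rw")) {
            log.seek(LEDGERS_MAP_OFFSET_POSITION);
            log.writeLong(0L); // ledgersMapOffset = 0 -> "No ledgers map index" -> scan fallback
        }
    }
}
```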