nsnycde commented on PR #4607:
URL: https://github.com/apache/bookkeeper/pull/4607#issuecomment-3233468998

   > > Hi, I'm wondering what situation will cause data loss(caused by wrong 
file header)? I read the source code, it seems the file header is updated by 
BufferedLogChannel.appendLedgersMap() method.
   > 
   > @oneby-wang @nsnycde I guess that the file might be read by GC before that 
happens. This PR addresses the issue that there's garbage in the initial 
header. The original assumption has been that the header would be initialized 
with zeros. However, that assumption didn't hold with Netty's optimized array 
allocation as explained in [#4607 
(comment)](https://github.com/apache/bookkeeper/pull/4607#issuecomment-2918429558)
 . If there's an invalid header in the entry log file, it could get deleted in 
GC instead of skipped. There's also the chance that the header just happens to 
contain data the gets interpreted as valid data in GC. At least this is how I 
understood the problem and the fix.
   
   When bk is running normally, gc should not read unflushed logs. You can see 
that the loop in the code below is entryLogger.getFlushedLogIds()
   ```
   
//org.apache.bookkeeper.bookie.GarbageCollectorThread#extractMetaFromEntryLogs
   
   protected void extractMetaFromEntryLogs() throws 
EntryLogMetadataMapException {
           for (long entryLogId : entryLogger.getFlushedLogIds()) {
               // Comb the current entry log file if it has not already been 
extracted.
               if (entryLogMetaMap.containsKey(entryLogId)) {
                   continue;
               }
   
               // check whether log file exists or not
               // if it doesn't exist, this log file might have been garbage 
collected.
               if (!entryLogger.logExists(entryLogId)) {
                   continue;
               }
   
               LOG.info("Extracting entry log meta from entryLogId: {}", 
entryLogId);
   
               try {
                   // Read through the entry log file and extract the entry log 
meta
                   EntryLogMetadata entryLogMeta = 
entryLogger.getEntryLogMetadata(entryLogId, throttler);
                   removeIfLedgerNotExists(entryLogMeta);
                   if (entryLogMeta.isEmpty()) {
                       // This means the entry log is not associated with any 
active
                       // ledgers anymore.
                       // We can remove this entry log file now.
                       LOG.info("Deleting entryLogId {} as it has no active 
ledgers!", entryLogId);
                       removeEntryLog(entryLogId);
                       
gcStats.getReclaimedSpaceViaDeletes().addCount(entryLogMeta.getTotalSize());
                   } else {
                       entryLogMetaMap.put(entryLogId, entryLogMeta);
                   }
               } catch (IOException | RuntimeException e) {
                   LOG.warn("Premature exception when processing " + entryLogId
                            + " recovery will take care of the problem", e);
               } catch (OutOfMemoryError oome) {
                   LOG.warn("OutOfMemoryError when processing {} - skipping the 
entry log", entryLogId, oome);
               }
           }
       }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to