chenhongSZ opened a new issue, #4036: URL: https://github.com/apache/bookkeeper/issues/4036
**BUG REPORT** ***Describe the bug*** Bookeeper version : 4.14.7 The following exception occurred in our production environment,and Bookie process shutdown. I searched for the issue and did not find any similar errors <img width="1411" alt="企业微信截图_01d6ed89-c87d-42dc-b1b9-28c822e14d53" src="https://github.com/apache/bookkeeper/assets/18002867/a3b1d3c3-ad94-44d8-8645-e9362e2e4cf2"> We added some logs and found that when replicate a certain ledger, this error is reported the ledger metadata here below <img width="1217" alt="企业微信截图_200ebf21-35ba-40bb-92ce-fcae2d7a9dfe" src="https://github.com/apache/bookkeeper/assets/18002867/0cb19592-cdbe-4b39-aec7-da872e211de5"> we do some dig into codes but I'm not sure if there's a problem here ` org.apache.bookkeeper.client.LedgerFragmentReplicator#splitIntoSubFragments /** * Split the full fragment into batched entry fragments by keeping * rereplicationEntryBatchSize of entries in each one and can treat them as * sub fragments. */ static Set<LedgerFragment> splitIntoSubFragments(LedgerHandle lh, LedgerFragment ledgerFragment, long rereplicationEntryBatchSize) { Set<LedgerFragment> fragments = new HashSet<LedgerFragment>(); if (rereplicationEntryBatchSize <= 0) { // rereplicationEntryBatchSize can not be 0 or less than 0, // returning with the current fragment fragments.add(ledgerFragment); return fragments; } long firstEntryId = ledgerFragment.getFirstStoredEntryId(); long lastEntryId = ledgerFragment.getLastStoredEntryId(); /* * if firstEntryId is INVALID_ENTRY_ID then lastEntryId should be * INVALID_ENTRY_ID and viceversa. */ if (firstEntryId == INVALID_ENTRY_ID ^ lastEntryId == INVALID_ENTRY_ID) { LOG.error("For LedgerFragment: {}, seeing inconsistent firstStoredEntryId: {} and lastStoredEntryId: {}", ledgerFragment, firstEntryId, lastEntryId); assert false; } long numberOfEntriesToReplicate = (lastEntryId - firstEntryId) + 1; long splitsWithFullEntries = numberOfEntriesToReplicate / rereplicationEntryBatchSize; if (splitsWithFullEntries == 0) {// only one fragment fragments.add(ledgerFragment); return fragments; } long fragmentSplitLastEntry = 0; for (int i = 0; i < splitsWithFullEntries; i++) { fragmentSplitLastEntry = (firstEntryId + rereplicationEntryBatchSize) - 1; fragments.add(new LedgerFragment(lh, firstEntryId, fragmentSplitLastEntry, ledgerFragment.getBookiesIndexes())); firstEntryId = fragmentSplitLastEntry + 1; } long lastSplitWithPartialEntries = numberOfEntriesToReplicate % rereplicationEntryBatchSize; if (lastSplitWithPartialEntries > 0) { fragments.add(new LedgerFragment(lh, firstEntryId, firstEntryId + lastSplitWithPartialEntries - 1, ledgerFragment .getBookiesIndexes())); } return fragments; } ` the `numberOfEntriesToReplicate` variable can be wrong(assigned 1) when `firstEntryId == INVALID_ENTRY_ID && lastEntryId == INVALID_ENTRY_ID`, It should be assigned 0. Then it leads to the above error message when the control reaches to `new LedgerFragment` line. Is it a bug? I’v reviewed the master branch and it has not been fixed yet. ***To Reproduce*** Is hard to reproduce in test env. ***Expected behavior*** the ledger replicate can be completed and replicate next. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
