Do you have the full stack trace of the exception? Also, do you still have the damaged journal? It would be good to have it to test with.

Justin

On Mon, Jul 1, 2019 at 4:57 AM yw yw <[email protected]> wrote:

> Hi, All
>
> Yesterday our cluster experienced a sudden loss of power. When we started
> the broker after power was restored, an exception occurred:
>
> The exception showed that the userRecordType loaded was illegal. The
> operations team deleted the data journals and the broker then started
> successfully.
>
> Unfortunately we didn't back up the problematic journal files. We checked
> the dmesg output and found no disk errors. SMART tests also showed that the
> disk was not broken. We then dug into the code
> (JournalImpl::readJournalFile) to look for a cause. We have two doubts
> about the code.
>
> First doubt:
>
> The comment says "I - We scan for any valid record on the file. If a hole
> happened on the middle of the file we keep looking until all the
> possibilities are gone".
>
> Since we only append to the journal file and fileId is strictly increasing,
> we could skip the rest of the file as soon as a record's fileId is not
> equal to the file's id. IMO the remaining records in the file are the same,
> so there is no need to read them. If we keep looking through all the
> possibilities, is there a possibility (a very low one) that we assemble a
> record whose fileId, recordType and checkSize all qualify but which does
> not actually exist?
>
> Second doubt:
>
> In the case of a power outage where only part of a record is written to
> disk, e.g. recordType and fileId are successfully written, might we read
> the old record even though its fileId is the latest?
>
> Can anyone shed some light on this please? Thanks.
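
To make the two doubts concrete, here is a minimal Java sketch. It assumes a simplified, hypothetical record layout ([recordType:1][fileId:4][bodyLength:4][body][checkSize:4]) and a reused file that still holds an older record. This is not the actual Artemis on-disk format or the real JournalImpl code; the class name, the constant and all method names are made up for illustration. It only shows how a byte-by-byte scan and a torn header write could behave under those assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified journal record: [type:1][fileId:4][bodyLen:4][body][checkSize:4]
final class JournalScanSketch {
    static final byte ADD_RECORD = 11;      // made-up record type value
    static final int HEADER = 1 + 4 + 4;    // type + fileId + bodyLen
    static final int TRAILER = 4;           // checkSize

    // Writes a complete record at pos.
    static void writeRecord(ByteBuffer buf, int pos, int fileId, byte[] body) {
        ByteBuffer b = buf.duplicate();
        b.position(pos);
        b.put(ADD_RECORD).putInt(fileId).putInt(body.length)
         .put(body).putInt(HEADER + body.length + TRAILER);
    }

    // Simulates a power loss after only recordType and fileId reached the disk.
    static void tornWrite(ByteBuffer buf, int pos, int fileId) {
        ByteBuffer b = buf.duplicate();
        b.position(pos);
        b.put(ADD_RECORD).putInt(fileId);
    }

    // Returns the body if the record at pos passes every check, otherwise null.
    static byte[] readRecord(ByteBuffer buf, int pos, int expectedFileId) {
        if (pos + HEADER > buf.limit()) return null;
        if (buf.get(pos) != ADD_RECORD) return null;              // type check
        if (buf.getInt(pos + 1) != expectedFileId) return null;   // fileId check
        int bodyLen = buf.getInt(pos + 5);
        if (bodyLen < 0 || bodyLen > buf.limit() - pos - HEADER - TRAILER) return null;
        int checkSize = buf.getInt(pos + HEADER + bodyLen);
        if (checkSize != HEADER + bodyLen + TRAILER) return null; // size check
        byte[] body = new byte[bodyLen];
        ByteBuffer b = buf.duplicate();
        b.position(pos + HEADER);
        b.get(body);
        return body;
    }

    // "Keep looking" strategy: advance one byte at a time until a record validates.
    static int findNextValid(ByteBuffer buf, int from, int expectedFileId) {
        for (int pos = from; pos + HEADER + TRAILER <= buf.limit(); pos++) {
            if (readRecord(buf, pos, expectedFileId) != null) return pos;
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(64);
        // Old record left over from when the file had id 7.
        writeRecord(file, 0, 7, "old-data".getBytes(StandardCharsets.US_ASCII));
        // With the file now expected to have id 8, the stale record is rejected.
        System.out.println(readRecord(file, 0, 8) == null);            // prints true
        // Only the new type and fileId land on top of the old record.
        tornWrite(file, 0, 8);
        byte[] body = readRecord(file, 0, 8);
        System.out.println(new String(body, StandardCharsets.US_ASCII)); // prints old-data
    }
}
```

In this sketch the stale body is accepted after the torn write only because the old record's length and checkSize are still intact and consistent with each other. Whether the real journal can hit the same case depends on the actual record format and on how files are initialized before reuse, which the sketch does not model.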
