Do you have the full stack trace of the exception? Also, do you still have
the damaged journal? It would be useful to have it for testing.


Justin

On Mon, Jul 1, 2019 at 4:57 AM yw yw <[email protected]> wrote:

> Hi, All
>
>
> Yesterday our cluster experienced a sudden loss of power. When we started
> the broker after power was restored, the following exception occurred:
>
>
> The exception showed that the userRecordType it loaded was illegal. The
> operations team deleted the data journals and the broker then started
> successfully.
>
>
> Unfortunately, we didn't back up the problematic journal files. We checked
> the dmesg output and found no disk errors, and SMART tests also showed the
> disk was not broken. We then dug into the code (JournalImpl::readJournalFile)
> to look for a possible cause, and we have two questions about it.
>
>
> First question:
>
> The comment says "I - We scan for any valid record on the file. If a hole
> happened on the middle of the file we keep looking until all the
> possibilities are gone".
>
> Since we only append to the journal file and fileId is strictly increasing,
> it seems we could skip the rest of the file as soon as we read a record
> whose fileId is not equal to the file's id. In my opinion the remaining
> records in the file would be stale in the same way, so there is no need to
> read them. Do we really need to keep looking through all the possibilities?
> Is there a possibility (a very low one) that we assemble a record whose
> fileId, recordType and checkSize all qualify but that does not actually
> exist?
>
> Second question:
>
> In a power outage where only part of a record is written to disk (e.g. the
> recordType and fileId are written successfully but the rest is not), could
> we end up reading the old record even though its fileId is the latest one?
>
> Can anyone shed some light on this, please? Thanks.
>
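For illustration only, here is a minimal Java sketch of the scenario in the
second question. It assumes a hypothetical, simplified record layout
([recordType][fileId][bodyLength][body][checkSize], where checkSize equals the
total record length), not the real Artemis journal format, and it assumes the
journal file is reused rather than zero-filled before reuse. The class and
method names (TornRecordDemo, writeRecord, validRecordLength) are made up. It
shows how, under those assumptions, the bytes left over from an old record can
still pass the same three checks (recordType, fileId, checkSize) mentioned in
the first question once only a new recordType and fileId reach the disk.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TornRecordDemo {
    // Hypothetical simplified layout, NOT the actual Artemis journal format:
    // [recordType:1][fileId:4][bodyLength:4][body][checkSize:4],
    // where checkSize must equal the total record length in bytes.
    static final int HEADER = 1 + 4 + 4;
    static final int TRAILER = 4;

    // Writes one complete record at pos and returns its total length.
    static int writeRecord(ByteBuffer buf, int pos, byte type, int fileId, byte[] body) {
        int total = HEADER + body.length + TRAILER;
        buf.put(pos, type);
        buf.putInt(pos + 1, fileId);
        buf.putInt(pos + 5, body.length);
        for (int i = 0; i < body.length; i++) {
            buf.put(pos + HEADER + i, body[i]);
        }
        buf.putInt(pos + total - TRAILER, total);
        return total;
    }

    // Returns the record length if the bytes at pos pass all checks, or -1.
    static int validRecordLength(ByteBuffer buf, int pos, int expectedFileId) {
        int limit = buf.limit();
        if (pos < 0 || pos > limit - HEADER - TRAILER) return -1;
        byte type = buf.get(pos);
        if (type < 1 || type > 10) return -1; // hypothetical range of legal record types
        if (buf.getInt(pos + 1) != expectedFileId) return -1;
        int bodyLength = buf.getInt(pos + 5);
        // Reject lengths that would run past the end of the file (also avoids int overflow).
        if (bodyLength < 0 || bodyLength > limit - pos - HEADER - TRAILER) return -1;
        int total = HEADER + bodyLength + TRAILER;
        return buf.getInt(pos + total - TRAILER) == total ? total : -1;
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(64);
        byte[] oldBody = "OLD-DATA".getBytes(StandardCharsets.US_ASCII);

        // 1. The file is first used as journal file 1 and holds one complete record.
        int len = writeRecord(file, 0, (byte) 2, 1, oldBody);
        System.out.println("old record valid for file 1: " + validRecordLength(file, 0, 1));

        // 2. The file is reused as journal file 2. Power fails after only the
        //    recordType and fileId of the new record have reached the disk.
        file.put(0, (byte) 3);
        file.putInt(1, 2);

        // 3. On restart, the stale bodyLength, body and checkSize still pass every check.
        int result = validRecordLength(file, 0, 2);
        System.out.println("torn record accepted for file 2: " + (result == len));
    }
}
```

Running it prints 21 for the original record and then true, i.e. under these
assumptions the torn record is accepted as if it were a complete record of
file 2. Whether the real journal can reach this state depends on its actual
record layout and file-reuse behaviour, which this sketch does not model.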
