[ https://issues.apache.org/jira/browse/OAK-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775105#comment-16775105 ]
Andrei Dulceanu commented on OAK-8063: -------------------------------------- [~frm], in [^OAK-8063.patch] I propose a new, simplified implementation for {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}}. It follows the classical topological order algorithm with a twist: by eagerly marking a node visited, it breaks possible loops if the graph contains a cycle. BTW, could really be the case that the segment graph to be transferred from primary contains a cycle? My understanding is that segments only refer previous written segments, without any forward references, thus avoiding cycles. I also kept the optimisation about segments already transferred (i.e. local), but removed the one for the "diamond problem" which could not be possible in the current implementation. Could you please take a look at the patch? > The cold standby client doesn't correctly handle backward references > -------------------------------------------------------------------- > > Key: OAK-8063 > URL: https://issues.apache.org/jira/browse/OAK-8063 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar, tarmk-standby > Affects Versions: 1.6.0 > Reporter: Andrei Dulceanu > Assignee: Andrei Dulceanu > Priority: Major > Labels: cold-standby > Fix For: 1.12, 1.10.1, 1.8.12 > > Attachments: OAK-8063.patch > > > The logic from {{StandbyClientSyncExecution#copySegmentHierarchyFromPrimary}} > has a flaw when it comes to "backward references". Suppose we have the > following data segment graph to be transferred from primary: S1, which > references \{S2, S3} and S3 which references S2. Then, the correct transfer > order should be S2, S3 and S1. > Going through the current logic employed by the method, here's what happens: > {noformat} > Step 0: batch={S1} > Step 1: visited={S1}, data={S1}, batch={S2, S3}, queued={S2, S3} > Step 2: visited={S1, S2}, data={S2, S1}, batch={S3}, queued={S2, S3} > Step 3: visited={S1, S2, S3}, data={S3, S2, S1}, batch={}, queued={S2, > S3}.{noformat} > Therefore, at the end of the loop, the order of the segments to be > transferred will be S3, S2, S1, which might trigger a > {{SegmentNotFoundException}} when S3 is further processed, because S2 is > missing on standby (see OAK-8006). > /cc [~frm] -- This message was sent by Atlassian JIRA (v7.6.3#76005)