[ https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladislav Pyatkov updated IGNITE-12935:
---------------------------------------
    Description:
# Mention in the log only partitions for which no node can serve as a historical supplier.
For these partitions, print the minimal counter (from which historical rebalancing should start) with the corresponding node, and the maximum reserved counter (from which the cluster can perform historical rebalancing) with the corresponding node.
This will let us know:
## Whether history was reserved at all
## How much reserved history we lack to perform a historical rebalancing
## I expect the resulting output to look like this:
Historical rebalancing wasn't scheduled for some partitions:
History wasn't reserved for: [list of partitions and groups]
History was reserved, but minimum present counter is less than maximum reserved: [[grp=GRP, part=ID, minCntr=cntr, minNodeId=ID, maxReserved=cntr, maxReservedNodeId=ID], ...]
## We can also aggregate the previous message by (minNodeId) to easily find the exact node (or nodes) that caused the full rebalance.
# Log the results of {{reserveHistoryForExchange()}}. They can be compactly represented as mappings: {{(grpId -> checkpoint (id, timestamp))}}. For every group, also log a message explaining why the previous checkpoint wasn't successfully reserved.
There can be three reasons:
## The previous checkpoint simply isn't present in the history (the oldest one is reserved)
## WAL reservation failure (the call below returned false)
{code:java}
chpEntry = entry(cpTs);

boolean reserved = cctx.wal().reserve(chpEntry.checkpointMark());

// If checkpoint WAL history can't be reserved, stop searching.
if (!reserved)
    break;
{code}
## The checkpoint was marked as inapplicable for historical rebalancing
{code:java}
for (Integer grpId : new HashSet<>(groupsAndPartitions.keySet()))
    if (!isCheckpointApplicableForGroup(grpId, chpEntry))
        groupsAndPartitions.remove(grpId);
{code}

> Disadvantages in log of historical rebalance
> --------------------------------------------
>
>                 Key: IGNITE-12935
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12935
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Priority: Major
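
The proposed log output and the aggregation by (minNodeId) could be sketched roughly as follows. This is only an illustration: {{HistoricalRebalanceLogSketch}}, {{PartitionGap}}, {{format()}}, {{byMinNode()}} and {{missingHistory()}} are hypothetical names that do not exist in Ignite, and treating the missing history as {{maxReserved - minCntr}} is an assumption based on the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the proposed log message; not Ignite code. */
final class HistoricalRebalanceLogSketch {
    /** One partition for which history was reserved but no historical supplier fits. */
    static final class PartitionGap {
        final String grp;
        final int part;
        final long minCntr;
        final String minNodeId;
        final long maxReserved;
        final String maxReservedNodeId;

        PartitionGap(String grp, int part, long minCntr, String minNodeId,
            long maxReserved, String maxReservedNodeId) {
            this.grp = grp;
            this.part = part;
            this.minCntr = minCntr;
            this.minNodeId = minNodeId;
            this.maxReserved = maxReserved;
            this.maxReservedNodeId = maxReservedNodeId;
        }

        /** Assumed measure of how much reserved history is lacking. */
        long missingHistory() {
            return maxReserved - minCntr;
        }

        @Override public String toString() {
            return "[grp=" + grp + ", part=" + part + ", minCntr=" + minCntr +
                ", minNodeId=" + minNodeId + ", maxReserved=" + maxReserved +
                ", maxReservedNodeId=" + maxReservedNodeId + ']';
        }
    }

    /** Builds the message in the shape proposed in the description. */
    static String format(List<String> notReserved, List<PartitionGap> gaps) {
        StringBuilder sb = new StringBuilder("Historical rebalancing wasn't scheduled for some partitions:");

        if (!notReserved.isEmpty())
            sb.append("\nHistory wasn't reserved for: ").append(notReserved);

        if (!gaps.isEmpty())
            sb.append("\nHistory was reserved, but minimum present counter is less than maximum reserved: ")
                .append(gaps);

        return sb.toString();
    }

    /** Groups the gaps by the node that holds the minimal counter. */
    static Map<String, List<PartitionGap>> byMinNode(List<PartitionGap> gaps) {
        Map<String, List<PartitionGap>> res = new TreeMap<>();

        for (PartitionGap gap : gaps)
            res.computeIfAbsent(gap.minNodeId, k -> new ArrayList<>()).add(gap);

        return res;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<PartitionGap> gaps = Arrays.asList(
            new PartitionGap("cache1", 3, 100L, "nodeA", 150L, "nodeB"),
            new PartitionGap("cache1", 7, 40L, "nodeA", 90L, "nodeC"),
            new PartitionGap("cache2", 1, 10L, "nodeD", 20L, "nodeB"));

        System.out.println(format(Arrays.asList("grp=cache3, part=5"), gaps));

        for (Map.Entry<String, List<PartitionGap>> e : byMinNode(gaps).entrySet())
            System.out.println("minNodeId=" + e.getKey() + ", partitions=" + e.getValue().size());
    }
}
```

Grouping by {{minNodeId}} puts the nodes with the most lagging partitions together, which is the quick way to spot the node that forced a full rebalance.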