[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-04-25 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092132#comment-17092132
 ] 

Ignite TC Bot commented on IGNITE-12935:


{panel:title=Branch: [pull/7722/head] Base: [master] : Possible Blockers 
(2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform .NET (NuGet)*{color} [[tests 0 Exit Code , Compilation 
Error |https://ci.ignite.apache.org/viewLog.html?buildId=5256336]]

{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric 
|https://ci.ignite.apache.org/viewLog.html?buildId=5256340]]

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=5254864&buildTypeId=IgniteTests24Java8_RunAll]

> Disadvantages in log of historical rebalance
> 
>
> Key: IGNITE-12935
> URL: https://issues.apache.org/jira/browse/IGNITE-12935
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> # Mention in the log only the partitions for which no node is suitable as a 
> historical supplier.
>  For these partitions, print the minimal counter (from which we should perform 
> historical rebalancing) with the corresponding node, and the maximum reserved 
> counter (from which the cluster can perform historical rebalancing) with the 
> corresponding node.
>  This will let us know:
>  ## Whether history was reserved at all
>  ## How much reserved history we lack to perform a historical rebalancing
>  ## I see resulting output like this:
> {noformat}
>  Historical rebalancing wasn't scheduled for some partitions:
>  History wasn't reserved for: [list of partitions and groups]
>  History was reserved, but minimum present counter is less than maximum 
> reserved: [[grp=GRP, part=ID, minCntr=cntr, minNodeId=ID, maxReserved=cntr, 
> maxReservedNodeId=ID], ...]{noformat}
>  ## We can also aggregate the previous message by (minNodeId) to easily find the 
> exact node (or nodes) that caused the full rebalance.
>  # Log the results of {{reserveHistoryForExchange()}}. They can be compactly 
> represented as mappings: {{(grpId -> checkpoint (id, timestamp))}}. For every 
> group, also log a message explaining why the previous checkpoint wasn't 
> successfully reserved.
>  There can be three reasons:
>  ## Previous checkpoint simply isn't present in the history (the oldest is 
> reserved)
>  ## WAL reservation failure (call below returned false)
> {code:java}
> chpEntry = entry(cpTs);
> boolean reserved = cctx.wal().reserve(chpEntry.checkpointMark());
> // If checkpoint WAL history can't be reserved, stop searching.
> if (!reserved)
>   break;{code}
> ## Checkpoint was marked as inapplicable for historical rebalancing
> {code:java}
> for (Integer grpId : new HashSet<>(groupsAndPartitions.keySet()))
>    if (!isCheckpointApplicableForGroup(grpId, chpEntry))
>      groupsAndPartitions.remove(grpId);{code}
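
The aggregation by (minNodeId) proposed in the description could look roughly like the sketch below. It is only an illustration: the class names {{PartitionHistoryGap}} and {{HistoryGapReport}} are hypothetical and do not come from the patch; the field names mirror the proposed log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical holder for one entry of the proposed log message. */
final class PartitionHistoryGap {
    final String grp;
    final int part;
    final long minCntr;
    final String minNodeId;
    final long maxReserved;
    final String maxReservedNodeId;

    PartitionHistoryGap(String grp, int part, long minCntr, String minNodeId,
        long maxReserved, String maxReservedNodeId) {
        this.grp = grp;
        this.part = part;
        this.minCntr = minCntr;
        this.minNodeId = minNodeId;
        this.maxReserved = maxReserved;
        this.maxReservedNodeId = maxReservedNodeId;
    }

    /** Renders the entry in the format shown in the issue description. */
    @Override public String toString() {
        return "[grp=" + grp + ", part=" + part + ", minCntr=" + minCntr +
            ", minNodeId=" + minNodeId + ", maxReserved=" + maxReserved +
            ", maxReservedNodeId=" + maxReservedNodeId + ']';
    }
}

/** Hypothetical helper that groups entries by the node holding the minimal counter. */
final class HistoryGapReport {
    static Map<String, List<PartitionHistoryGap>> byMinNode(List<PartitionHistoryGap> gaps) {
        // TreeMap keeps node IDs sorted, so the log output is stable between runs.
        Map<String, List<PartitionHistoryGap>> res = new TreeMap<>();

        for (PartitionHistoryGap gap : gaps)
            res.computeIfAbsent(gap.minNodeId, k -> new ArrayList<>()).add(gap);

        return res;
    }
}
```

Grouping this way puts all partitions that share the same lagging node next to each other, which is what makes the node responsible for a full rebalance easy to spot.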



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-04-25 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092135#comment-17092135
 ] 

Vladislav Pyatkov commented on IGNITE-12935:


These two suites keep failing in master.



[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-04-28 Thread Alexey Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094615#comment-17094615
 ] 

Alexey Scherbakov commented on IGNITE-12935:


[~v.pyatkov] [~irakov]

My thoughts on the patch:

1. The string constants describing the reservation stop reason should be 
replaced with an enum, like:

public enum ReservationResult {
 ...
}

2. The reasons are subject to change:

NOT_RESERVED_WAL_REASON - not needed, it's just a particular case of a 
reservation error when the start pointer is invalid.

WAL_SEG_CORRUPTED_REASON - should be renamed to WAL_RESERVATION_ERROR; it 
describes any unexpected error during processing of the previous checkpoint.

NO_MORE_HISTORY_REASON - OK

CHECKPOINT_NOT_APPLICABLE_REASON - OK

FULL_HISTORY_REASON - let's rename it to NO_MORE_DATA_EARLIER_IN_HISTORY.

NO_PARTITIONS_OWNED_REASON - this is actually the same as 
NO_MORE_DATA_EARLIER_IN_HISTORY, but with no history reserved at all (cp will be 
empty in the log).

The _REASON suffix is not needed.

3. It's currently impossible to tell from the logging whether a specific 
partition has history for rebalancing, because each partition's history is 
reserved independently and we currently log only the "best" partition (the one 
with the deepest history).
Is this intentional?

4. Avoid using hard-coded strings in tests. This makes changing the log format a 
PITA. All log messages should be stored in a single place.

5. We should log (under "Following caches were not reserved") the groups that 
have partitions excluded from reservation by the partition size heuristic, with 
the corresponding reason.

6. Fix the typos in the word "supplay".
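
A minimal sketch of how items 1, 2 and 4 might fit together, assuming the renames proposed above are accepted. Only the constant names come from this comment; NOT_RESERVED_WAL and NO_PARTITIONS_OWNED are left out because their fate is still under discussion, and the helper class {{ReservationLogFormat}} and its output format are hypothetical.

```java
/** Sketch of the proposed enum; the constant set is an assumption based on the renames in item 2. */
enum ReservationResult {
    WAL_RESERVATION_ERROR,
    NO_MORE_HISTORY,
    CHECKPOINT_NOT_APPLICABLE,
    NO_MORE_DATA_EARLIER_IN_HISTORY
}

/** Hypothetical single place where the log fragment is built, so tests never hard-code it. */
final class ReservationLogFormat {
    static String format(int grpId, ReservationResult res) {
        return "[grpId=" + grpId + ", reason=" + res + ']';
    }
}
```

A test can then build its expected string with {{ReservationLogFormat.format(grpId, ReservationResult.NO_MORE_HISTORY)}} instead of a literal, so changing the log format touches one method only.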



[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-04-28 Thread Alexey Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094664#comment-17094664
 ] 

Alexey Scherbakov commented on IGNITE-12935:


7. Given the large amount of text, it is probably worth moving partition 
reservation logging to debug level, preferably using a separate logger.
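
As a rough illustration of item 7, the sketch below routes reservation details to a dedicated logger and emits them only when debug-level output is enabled. It uses the JDK's {{java.util.logging}} to stay self-contained, which is an assumption: Ignite has its own logging abstraction, and the category name here is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ReservationLogging {
    /** Hypothetical category name for the dedicated logger. */
    static final String CATEGORY = "example.rebalance.reservation";

    private static final Logger LOG = Logger.getLogger(CATEGORY);

    /**
     * Logs reservation details at debug (FINE) level.
     *
     * @return true if the message was passed to the logger, false if debug output is disabled.
     */
    static boolean logReservation(String details) {
        // Check the level first so that large messages are not built or emitted needlessly.
        if (!LOG.isLoggable(Level.FINE))
            return false;

        LOG.fine(details);

        return true;
    }
}
```

With a separate category, an operator can enable these messages for a post-mortem without raising the verbosity of everything else.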



[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-04-29 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095690#comment-17095690
 ] 

Ivan Rakov commented on IGNITE-12935:

My comments:
1, 2 - agree
regarding NOT_RESERVED_WAL_REASON - I'd keep it to track cases when 
CheckpointHistory provides entries that aren't present in WAL anymore (shouldn't 
happen, but just in case)
3 - we log the best partition only if we didn't succeed in finding any 
partition suitable for WAL rebalance. Logging the other partitions is redundant: 
their history is even shallower.
4 - agree
5 - totally agree, this is crucial. I forgot about it during my review
6 - agree
7 - arguable. The main purpose of these logs is investigating the reasons for a 
full rebalance post factum. Anyway, lots of messages about lots of partitions 
will occur only on PME when some partitions need to be rebalanced, which 
shouldn't happen very often.



[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-06-26 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146658#comment-17146658
 ] 

Ignite TC Bot commented on IGNITE-12935:


{panel:title=Branch: [pull/7722/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/7722/head] Base: [master] : New Tests 
(11)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}PDS 2{color} [tests 3]
* {color:#013220}IgnitePdsTestSuite2: 
IgniteWalRebalanceLoggingTest.testFullRebalanceLogMsgs - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: 
IgniteWalRebalanceLoggingTest.testHistoricalRebalanceLogMsg - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: 
IgniteWalRebalanceLoggingTest.testFullRebalanceWithShortCpHistoryLogMsgs - 
PASSED{color}

{color:#8b}Service Grid{color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=96c5501f271-66e9bc30-1170-484b-9193-d8f9cfd301fd, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=a304cc77-4e87-4ce2-b494-a989bf3ebf48, topVer=0, nodeId8=a304cc77, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593181559909]], 
val2=AffinityTopologyVersion [topVer=8179224952790605856, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=9b6e82d8-5428-41aa-a108-47cc665e6094, topVer=0, 
nodeId8=19cbc984, msg=, type=NODE_JOINED, tstamp=1593181559909], 
val2=AffinityTopologyVersion [topVer=-1440090742220744401, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=9b6e82d8-5428-41aa-a108-47cc665e6094, topVer=0, 
nodeId8=19cbc984, msg=, type=NODE_JOINED, tstamp=1593181559909], 
val2=AffinityTopologyVersion [topVer=-1440090742220744401, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=96c5501f271-66e9bc30-1170-484b-9193-d8f9cfd301fd, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=a304cc77-4e87-4ce2-b494-a989bf3ebf48, topVer=0, nodeId8=a304cc77, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593181559909]], 
val2=AffinityTopologyVersion [topVer=8179224952790605856, minorTopVer=0]]] - 
PASSED{color}

{color:#8b}Service Grid (legacy mode){color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=abd5e161-4ec6-4fdc-b350-86058cb1800c, topVer=0, 
nodeId8=40dcc9b2, msg=, type=NODE_JOINED, tstamp=1593181679755], 
val2=AffinityTopologyVersion [topVer=2068679071159579793, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=abd5e161-4ec6-4fdc-b350-86058cb1800c, topVer=0, 
nodeId8=40dcc9b2, msg=, type=NODE_JOINED, tstamp=1593181679755], 
val2=AffinityTopologyVersion [topVer=2068679071159579793, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=84d3111f271-9687f3aa-7b2b-4782-8da2-1f301caf60e5, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=719469e6-3ad6-4407-be76-fee0a142a03c, topVer=0, nodeId8=719469e6, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593181679755]], 
val2=AffinityTopologyVersion [topVer=-4572596404447967496, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=84d3111f271-9687f3aa-7b2b-4782-8da2-1f301caf60e5, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=719469e6-3ad6-4407-be76-fee0a142a03c, topVer=0, nodeId8=719469e6, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593181679755]], 
val2=AffinityTopologyVersion [topVer=-4572596404447967496, minorTopVer=0]]] - 
PASSED{color}

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=5419518&buildTypeId=IgniteTests24Java8_RunAll]


[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-06-29 Thread Alexey Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147822#comment-17147822
 ] 

Alexey Scherbakov commented on IGNITE-12935:


[~v.pyatkov]

I've left several minor comments.



[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-07-01 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149688#comment-17149688
 ] 

Ignite TC Bot commented on IGNITE-12935:


{panel:title=Branch: [pull/7722/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/7722/head] Base: [master] : New Tests 
(11)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}PDS 2{color} [tests 3]
* {color:#013220}IgnitePdsTestSuite2: 
IgniteWalRebalanceLoggingTest.testFullRebalanceLogMsgs - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: 
IgniteWalRebalanceLoggingTest.testHistoricalRebalanceLogMsg - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: 
IgniteWalRebalanceLoggingTest.testFullRebalanceWithShortCpHistoryLogMsgs - 
PASSED{color}

{color:#8b}Service Grid{color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=71f38fbe-fa60-4411-83d6-b5acdd2bc0e8, topVer=0, 
nodeId8=fdfe0196, msg=, type=NODE_JOINED, tstamp=1593445379902], 
val2=AffinityTopologyVersion [topVer=-2492070593086437179, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=71f38fbe-fa60-4411-83d6-b5acdd2bc0e8, topVer=0, 
nodeId8=fdfe0196, msg=, type=NODE_JOINED, tstamp=1593445379902], 
val2=AffinityTopologyVersion [topVer=-2492070593086437179, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=685e8c00371-15808eff-eb74-466a-8e55-0fb81d2b00fe, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=c6ae959b-6292-4f7e-94d4-5ee2a741c4aa, topVer=0, nodeId8=c6ae959b, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593445379902]], 
val2=AffinityTopologyVersion [topVer=-282040637918382668, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=685e8c00371-15808eff-eb74-466a-8e55-0fb81d2b00fe, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=c6ae959b-6292-4f7e-94d4-5ee2a741c4aa, topVer=0, nodeId8=c6ae959b, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593445379902]], 
val2=AffinityTopologyVersion [topVer=-282040637918382668, minorTopVer=0]]] - 
PASSED{color}

{color:#8b}Service Grid (legacy mode){color} [tests 4]
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=822c9c00371-a9abbef0-887c-4d96-978e-1e3c0384af3c, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=a6a60a67-e379-4bf6-98f1-23ae7274c8a2, topVer=0, nodeId8=a6a60a67, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593445458885]], 
val2=AffinityTopologyVersion [topVer=-6687727961648831972, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryCustomEvent [customMsg=ServiceChangeBatchRequest 
[id=822c9c00371-a9abbef0-887c-4d96-978e-1e3c0384af3c, reqs=SingletonList 
[ServiceUndeploymentRequest []]], affTopVer=null, super=DiscoveryEvent 
[evtNode=a6a60a67-e379-4bf6-98f1-23ae7274c8a2, topVer=0, nodeId8=a6a60a67, 
msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1593445458885]], 
val2=AffinityTopologyVersion [topVer=-6687727961648831972, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.topologyVersion[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=28008880-0458-4c7d-98c3-c62112aa1c38, topVer=0, 
nodeId8=71936a1f, msg=, type=NODE_JOINED, tstamp=1593445458885], 
val2=AffinityTopologyVersion [topVer=-3852995104049898338, minorTopVer=0]]] - 
PASSED{color}
* {color:#013220}IgniteServiceGridTestSuite: 
ServiceDeploymentProcessIdSelfTest.requestId[Test event=IgniteBiTuple 
[val1=DiscoveryEvent [evtNode=28008880-0458-4c7d-98c3-c62112aa1c38, topVer=0, 
nodeId8=71936a1f, msg=, type=NODE_JOINED, tstamp=1593445458885], 
val2=AffinityTopologyVersion [topVer=-3852995104049898338, minorTopVer=0]]] - 
PASSED{color}

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=5425485&buildTypeId=IgniteTests24Java8_RunAll]


[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-07-02 Thread Vladislav Pyatkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150064#comment-17150064
 ] 

Vladislav Pyatkov commented on IGNITE-12935:


[~ascherbakov] Could you please take another look?



[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-07-03 Thread Alexey Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150828#comment-17150828
 ] 

Alexey Scherbakov commented on IGNITE-12935:


[~v.pyatkov]

LGTM.

> Disadvantages in log of historical rebalance
> 
>
> Key: IGNITE-12935
> URL: https://issues.apache.org/jira/browse/IGNITE-12935
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h





[jira] [Commented] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-07-06 Thread Alexey Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152518#comment-17152518
 ] 

Alexey Scherbakov commented on IGNITE-12935:


Merged to master #f55901d3de2148227b102e5f5260bc637617f261

> Disadvantages in log of historical rebalance
> 
>
> Key: IGNITE-12935
> URL: https://issues.apache.org/jira/browse/IGNITE-12935
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladislav Pyatkov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h


