[jira] [Resolved] (IGNITE-12502) Document ignite-spring-data_2.2 module
[ https://issues.apache.org/jira/browse/IGNITE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amelchev Nikita resolved IGNITE-12502. -- Resolution: Won't Fix The module was documented: https://ignite.apache.org/docs/latest/extensions-and-integrations/spring/spring-data > Document ignite-spring-data_2.2 module > -- > > Key: IGNITE-12502 > URL: https://issues.apache.org/jira/browse/IGNITE-12502 > Project: Ignite > Issue Type: Improvement > Components: documentation >Reporter: Ilya Kasnacheev >Priority: Major > > After IGNITE-12259 > I think there are no API changes, but we should mention that we have such > module and what its dependencies are. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IGNITE-16919) H2 Index cost function must take into account only corresponding columns.
[ https://issues.apache.org/jira/browse/IGNITE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532434#comment-17532434 ] Konstantin Orlov commented on IGNITE-16919: --- I've taken another look, and now I see it's definitely a bug. It's hard to comprehend without a test or reproducer... For the record: without the fix, we sometimes double the cost of an index just because we mistakenly assume there are columns that aren't covered by the current index, so a read from the scan index is required. BTW, as far as I know, we always read the data row from the page, and do it only once, regardless of whether all columns are covered by the index. Perhaps we should revisit this place. > H2 Index cost function must take into account only corresponding columns. > - > > Key: IGNITE-16919 > URL: https://issues.apache.org/jira/browse/IGNITE-16919 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.13 >Reporter: Evgeny Stanilovsky >Assignee: Evgeny Stanilovsky >Priority: Major > Attachments: image-2022-04-30-19-13-59-997.png > > Time Spent: 10m > Remaining Estimate: 0h > > H2IndexCostedBase#getCostRangeIndex is called with allColumnsSet where > consists columns from all operating tables, check: > org.h2.table.Plan#calculateCost : > {code:java} > final HashSet allColumnsSet = ExpressionVisitor > .allColumnsForTableFilters(allFilters); > {code} > thus allColumnsSet consist columns from all operating tables > !image-2022-04-30-19-13-59-997.png! > and erroneous iteration here: > H2IndexCostedBase#getCostRangeIndex > ... > {code:java} > if (!isScanIndex && allColumnsSet != null && !skipColumnsIntersection && > !allColumnsSet.isEmpty()) { > boolean foundAllColumnsWeNeed = true; > for (Column c : allColumnsSet) { // <-- all columns > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
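The problem discussed in IGNITE-16919 above can be illustrated with a small, self-contained sketch. It is not the H2 or Ignite code: the `Col` type and both method names are hypothetical, and the "covered" check is reduced to a name lookup. It only shows why iterating over the columns of all tables in the query makes an index look as if it does not cover the needed columns, and how restricting the check to the index's own table changes the result.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IndexCoverSketch {
    /** Hypothetical column descriptor: owning table plus column name (not the H2 Column class). */
    static final class Col {
        final String table;
        final String name;

        Col(String table, String name) {
            this.table = table;
            this.name = name;
        }
    }

    /** Checks every column of every table in the query, as in the iteration reported as erroneous. */
    static boolean coversNaive(Set<String> indexColumns, Collection<Col> allColumns) {
        for (Col c : allColumns) {
            if (!indexColumns.contains(c.name))
                return false; // a column of another table also lands here
        }
        return true;
    }

    /** Assumed shape of the fix: only columns of the index's own table are taken into account. */
    static boolean coversFiltered(String indexTable, Set<String> indexColumns, Collection<Col> allColumns) {
        for (Col c : allColumns) {
            if (!c.table.equals(indexTable))
                continue; // skip columns that belong to other tables
            if (!indexColumns.contains(c.name))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A join that touches PERSON.ID and CITY.NAME; the index is on PERSON(ID).
        List<Col> queryColumns = Arrays.asList(new Col("PERSON", "ID"), new Col("CITY", "NAME"));
        Set<String> personIdIndex = new HashSet<>(Arrays.asList("ID"));

        System.out.println(coversNaive(personIdIndex, queryColumns));              // false
        System.out.println(coversFiltered("PERSON", personIdIndex, queryColumns)); // true
    }
}
```

In the naive variant the unrelated CITY.NAME column makes the index on PERSON(ID) look insufficient, which corresponds to the doubled cost mentioned in the comment.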
[jira] [Updated] (IGNITE-15316) Read Repair may see inconsistent entry when it is consistent but updated right before the check
[ https://issues.apache.org/jira/browse/IGNITE-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-15316: -- Summary: Read Repair may see inconsistent entry when it is consistent but updated right before the check (was: Read Repair may see inconsistent entry at tx cache when it is consistent but updated right before the check) > Read Repair may see inconsistent entry when it is consistent but updated > right before the check > --- > > Key: IGNITE-15316 > URL: https://issues.apache.org/jira/browse/IGNITE-15316 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Labels: iep-31 > > Even at FULL_SYNC mode stale reads are possible from backups after the lock > is obtained by "Read Repair" tx. > This is possible because (at previous tx) entry becomes unlocked (committed) > on primary before tx committed on backups. > This is not a problem for Ignite (since backups keep locks until updated) but > produces false positive "inconsistency state found" events and repairs. > As to Atomic caches, there is even no chance to lock entry before the check, > so, the inconsistency window is wider than in the tx case. > This problem does not allow to use ReadRepair with concurrent modifications, > since repair may happen because of an inconsistent read (while another > operation is in progress), not because of real inconsistency. > A possible solution is to implement fake updates, which will guarantee that > the previous update is fully finished -> consistent read. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-15316) Read Repair may see inconsistent entry at tx cache when it is consistent but updated right before the check
[ https://issues.apache.org/jira/browse/IGNITE-15316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-15316: -- Description: Even at FULL_SYNC mode stale reads are possible from backups after the lock is obtained by "Read Repair" tx. This is possible because (at previous tx) entry becomes unlocked (committed) on primary before tx committed on backups. This is not a problem for Ignite (since backups keep locks until updated) but produces false positive "inconsistency state found" events and repairs. As to Atomic caches, there is even no chance to lock entry before the check, so, the inconsistency window is wider than in the tx case. This problem does not allow to use ReadRepair with concurrent modifications, since repair may happen because of an inconsistent read (while another operation is in progress), not because of real inconsistency. A possible solution is to implement fake updates, which will guarantee that the previous update is fully finished -> consistent read. was: Even at FULL_SYNC mode stale reads are possible from backups after the lock is obtained by "Read Repair" tx. This is possible because (at previous tx) entry becomes unlocked (committed) on primary before tx committed on backups. This is not a problem for Ignite (since backups keep locks until updated) but produces false positive "inconsistency state found" events and repairs. Unlock relocation does not seems to be a proper fix, since it will cause a performance drop. So, we should recheck values several times if an inconsistency is found, even when the lock is already obtained by "Read Repair". 
> Read Repair may see inconsistent entry at tx cache when it is consistent but > updated right before the check > --- > > Key: IGNITE-15316 > URL: https://issues.apache.org/jira/browse/IGNITE-15316 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Labels: iep-31 > > Even at FULL_SYNC mode stale reads are possible from backups after the lock > is obtained by "Read Repair" tx. > This is possible because (at previous tx) entry becomes unlocked (committed) > on primary before tx committed on backups. > This is not a problem for Ignite (since backups keep locks until updated) but > produces false positive "inconsistency state found" events and repairs. > As to Atomic caches, there is even no chance to lock entry before the check, > so, the inconsistency window is wider than in the tx case. > This problem does not allow to use ReadRepair with concurrent modifications, > since repair may happen because of an inconsistent read (while another > operation is in progress), not because of real inconsistency. > A possible solution is to implement fake updates, which will guarantee that > the previous update is fully finished -> consistent read. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] (IGNITE-15316) Read Repair may see inconsistent entry at tx cache when it is consistent but updated right before the check
[ https://issues.apache.org/jira/browse/IGNITE-15316 ] Anton Vinogradov deleted comment on IGNITE-15316: --- was (Author: av): It's a good idea to consider a replacement (as a part of this issue) {noformat} for (KeyCacheObject key : keys) { List nodes = ctx.affinity().nodesByKey(key, topVer); // affinity primaryNodes.put(key, nodes.get(0)); ... {noformat} to {noformat} for (KeyCacheObject key : keys) { List nodes = ctx.topology().nodes(key.partition(), topVer); // topology primaryNodes.put(key, nodes.get(0)); ... {noformat} at {{org.apache.ignite.internal.processors.cache.distributed.near.consistency.GridNearReadRepairAbstractFuture#map}}. This may help to reduce remaps count at unstable topology, but require being thoughtfully researched. Looks like affinity mapping instead of topology may cause unchecked copies on unstable topology. > Read Repair may see inconsistent entry at tx cache when it is consistent but > updated right before the check > --- > > Key: IGNITE-15316 > URL: https://issues.apache.org/jira/browse/IGNITE-15316 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Labels: iep-31 > > Even at FULL_SYNC mode stale reads are possible from backups after the lock > is obtained by "Read Repair" tx. > This is possible because (at previous tx) entry becomes unlocked (committed) > on primary before tx committed on backups. > This is not a problem for Ignite (since backups keep locks until updated) but > produces false positive "inconsistency state found" events and repairs. > Unlock relocation does not seems to be a proper fix, since it will cause a > performance drop. > So, we should recheck values several times if an inconsistency is found, even > when the lock is already obtained by "Read Repair". -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IGNITE-16931) Read Repair should support unstable topology
Anton Vinogradov created IGNITE-16931: - Summary: Read Repair should support unstable topology Key: IGNITE-16931 URL: https://issues.apache.org/jira/browse/IGNITE-16931 Project: Ignite Issue Type: Improvement Reporter: Anton Vinogradov Assignee: Anton Vinogradov

Currently RR does not support unstable topology (when not all owners are located by affinity) and this can be fixed. As a starting point, it's a good idea to consider a replacement
{noformat}
for (KeyCacheObject key : keys) {
    List nodes = ctx.affinity().nodesByKey(key, topVer); // affinity
    primaryNodes.put(key, nodes.get(0));
    ...
{noformat}
to
{noformat}
for (KeyCacheObject key : keys) {
    List nodes = ctx.topology().nodes(key.partition(), topVer); // topology
    primaryNodes.put(key, nodes.get(0));
    ...
{noformat}
at {{org.apache.ignite.internal.processors.cache.distributed.near.consistency.GridNearReadRepairAbstractFuture#map}}.

-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16931) Read Repair should support unstable topology
[ https://issues.apache.org/jira/browse/IGNITE-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-16931: -- Parent: IGNITE-15167 Issue Type: Sub-task (was: Improvement) > Read Repair should support unstable topology > > > Key: IGNITE-16931 > URL: https://issues.apache.org/jira/browse/IGNITE-16931 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Labels: iep-31 > > Currently RR does not support unstable topology (when not all owners are > located by affinity) and this can be fixed. > As a start point, it's a good idea to consider a replacement > {noformat} > for (KeyCacheObject key : keys) { > List nodes = ctx.affinity().nodesByKey(key, > topVer); // affinity > primaryNodes.put(key, nodes.get(0)); > ... > {noformat} > to > {noformat} > for (KeyCacheObject key : keys) { > List nodes = > ctx.topology().nodes(key.partition(), topVer); // topology > primaryNodes.put(key, nodes.get(0)); > ... > {noformat} > at > {{{}org.apache.ignite.internal.processors.cache.distributed.near.consistency.GridNearReadRepairAbstractFuture#map{}}}. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16239) [Extensions] Document the zookeeper-ip-finder-ext extension.
[ https://issues.apache.org/jira/browse/IGNITE-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amelchev Nikita updated IGNITE-16239: - Fix Version/s: 2.13 > [Extensions] Document the zookeeper-ip-finder-ext extension. > > > Key: IGNITE-16239 > URL: https://issues.apache.org/jira/browse/IGNITE-16239 > Project: Ignite > Issue Type: Task >Reporter: Amelchev Nikita >Priority: Minor > Fix For: 2.13 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IGNITE-16919) H2 Index cost function must take into account only corresponding columns.
[ https://issues.apache.org/jira/browse/IGNITE-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532345#comment-17532345 ] Konstantin Orlov commented on IGNITE-16919: --- Hi, [~zstan]! The patch looks good to me. Could you please change the type of this ticket to "Improvement"? The old behaviour seems legit too. > H2 Index cost function must take into account only corresponding columns. > - > > Key: IGNITE-16919 > URL: https://issues.apache.org/jira/browse/IGNITE-16919 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.13 >Reporter: Evgeny Stanilovsky >Assignee: Evgeny Stanilovsky >Priority: Major > Attachments: image-2022-04-30-19-13-59-997.png > > Time Spent: 10m > Remaining Estimate: 0h > > H2IndexCostedBase#getCostRangeIndex is called with allColumnsSet where > consists columns from all operating tables, check: > org.h2.table.Plan#calculateCost : > {code:java} > final HashSet allColumnsSet = ExpressionVisitor > .allColumnsForTableFilters(allFilters); > {code} > thus allColumnsSet consist columns from all operating tables > !image-2022-04-30-19-13-59-997.png! > and erroneous iteration here: > H2IndexCostedBase#getCostRangeIndex > ... > {code:java} > if (!isScanIndex && allColumnsSet != null && !skipColumnsIntersection && > !allColumnsSet.isEmpty()) { > boolean foundAllColumnsWeNeed = true; > for (Column c : allColumnsSet) { // <-- all columns > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16668) Design in-memory raft group reconfiguration on node failure
[ https://issues.apache.org/jira/browse/IGNITE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-16668: - Description:

If a node storing a partition of an in-memory table fails and leaves the cluster, all data it had is lost. From the point of view of the partition, it looks as if the node has left forever. Although the Raft protocol tolerates the loss of some of the nodes composing a Raft group (partition), for in-memory caches we cannot restore the replica factor because of the in-memory nature of the table. It means that we need to detect failures of each node owning a partition and recalculate assignments for the table without keeping the replica factor.

h4. Upd 1:

h4. Problem

By design, raft has several persisted segments, e.g. raft meta (term/committedIndex) and the stable raft log. So, by converting common raft to an in-memory one, it’s possible to break some of its invariants. For example, Node C could vote for Candidate A before self-restart and then vote for Candidate B after it. As a result, two leaders may be elected, which is illegal.

!Screenshot from 2022-04-19 11-11-05.png!

h4. Solution

In order to solve the problem mentioned above, it’s possible to remove the restarting node from the peers of the corresponding raft group and then return it back. The peer-removal process should be finished before the corresponding raft server node is restarted.

!Screenshot from 2022-04-19 11-12-55.png!

The process of removing and then returning the restarting node is, however, itself tricky. To explain why it’s a non-trivial action, it’s necessary to reveal the main ideas of the rebalance protocol.

Reconfiguration of the raft group is a process driven by changes to the assignments. Each partition has three corresponding sets of assignments stored in the metastore:
# assignments.stable - current distribution
# assignments.pending - partition distribution for an ongoing rebalance, if any
# assignments.planned - in some cases it’s not possible to cancel a pending rebalance or merge it with a new one. In that case the newly calculated assignments will be stored explicitly under the corresponding assignments.planned key. It's worth noting that it doesn't make sense to keep more than one planned rebalance. Any newly scheduled one will overwrite the existing one.

However, this idea of overwriting the assignments.planned key won’t work within the context of an in-memory raft restart, because it’s not valid to overwrite a reduction of assignments. Let's illustrate this problem with the following example.
# In-memory partition p1 is hosted on nodes A, B and C, meaning that p1.assignments.stable=[A,B,C]
# Let's say that the baseline was changed, resulting in a rebalance on assignments.pending=[A,B,C,*D*]
# During the non-cancelable phase of [A,B,C]->[A,B,C,D], node C fails and returns back, meaning that we should plan the [A,B,D] and [A,B,C,D] assignments. Both must be recorded in the only assignments.planned key, meaning that [A,B,C,D] will overwrite the reduction [A,B,D], so no actual raft reconfiguration will take place, which is not acceptable.

In order to overcome this issue, let’s introduce two new keys: _assignments.switch.reduce_, which will hold nodes that should be removed, and _assignments.switch.append_, which will hold nodes that should be returned back, and run the following actions:

h5. On in-memory partition restart (or on partition start with cleaned-up PDS)

Within a retry loop, add the current node to the assignments.switch.reduce set:
{code:java}
do {
    retrievedAssignmentsSwitchReduce = metastorage.read(assignments.switch.reduce);
    calculatedAssignmentsSwitchReduce = union(retrievedAssignmentsSwitchReduce.value, currentNode);
    if (retrievedAssignmentsSwitchReduce.isEmpty()) {
        invokeRes = metastoreInvoke:
            if empty(assignments.switch.reduce)
                assignments.switch.reduce = calculatedAssignmentsSwitchReduce
    } else {
        invokeRes = metastoreInvoke:
            eq(revision(assignments.switch.reduce), retrievedAssignmentsSwitchReduce.revision)
                assignments.switch.reduce = calculatedAssignmentsSwitchReduce
    }
} while (!invokeRes);
{code}

h5. On assignments.switch.reduce change on corresponding partition leader

Within the watch listener on the assignments.switch.reduce key on the corresponding partition leader, we trigger a new rebalance if there is no pending one.
{code:java}
calculatedAssignments = subtract(calcPartAssignments(), assignments.switch.reduce);
metastoreInvoke:
    if empty(partition.assignments.change.trigger.revision) || partition.assignments.change.trigger.revision < event.revision
        if empty(assignments.pending)
            assignments.pending = calculatedAssignments
            partition.assignments.change.trigger.revision = event.revision
{code}

h5. On rebalance done changePeers() calles
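
The retry loop in the IGNITE-16668 description above is an optimistic compare-and-set against the metastore. Below is a minimal runnable sketch of that pattern in Java. `FakeMetastore`, `Entry`, `putIfEmpty`, `putIfRevision` and `addToSwitchReduce` are all hypothetical names introduced for illustration; the single-key in-memory store stands in for the real metastore and is not the Ignite API.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

final class SwitchReduceSketch {
    /** Hypothetical metastore entry: the stored node set plus the revision it was written at. */
    static final class Entry {
        final Set<String> value;
        final long revision;

        Entry(Set<String> value, long revision) {
            this.value = Collections.unmodifiableSet(new TreeSet<>(value));
            this.revision = revision;
        }
    }

    /** Hypothetical single-key stand-in for the assignments.switch.reduce key. */
    static final class FakeMetastore {
        private final AtomicReference<Entry> ref = new AtomicReference<>();

        /** Returns the current entry, or null if the key is empty. */
        Entry read() {
            return ref.get();
        }

        /** Conditional write: succeeds only if the key is still empty. */
        boolean putIfEmpty(Set<String> value) {
            return ref.compareAndSet(null, new Entry(value, 1));
        }

        /** Conditional write: succeeds only if nobody has updated the entry since it was read. */
        boolean putIfRevision(Entry expected, Set<String> value) {
            return ref.compareAndSet(expected, new Entry(value, expected.revision + 1));
        }
    }

    /** Adds currentNode to the set, retrying until the conditional write succeeds. */
    static void addToSwitchReduce(FakeMetastore metastore, String currentNode) {
        boolean invokeRes;
        do {
            Entry retrieved = metastore.read();
            Set<String> calculated = new TreeSet<>();
            if (retrieved != null)
                calculated.addAll(retrieved.value);
            calculated.add(currentNode);

            invokeRes = retrieved == null
                ? metastore.putIfEmpty(calculated)
                : metastore.putIfRevision(retrieved, calculated);
        } while (!invokeRes);
    }

    public static void main(String[] args) {
        FakeMetastore metastore = new FakeMetastore();
        addToSwitchReduce(metastore, "C");
        addToSwitchReduce(metastore, "B");
        System.out.println(metastore.read().value); // [B, C]
    }
}
```

Because each write is conditional on the state that was read, two nodes restarting concurrently cannot overwrite each other's entry: the loser of the race re-reads the set and retries with the union.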
[jira] [Comment Edited] (IGNITE-16895) Update documentation with GitHub Actions
[ https://issues.apache.org/jira/browse/IGNITE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532282#comment-17532282 ] Amelchev Nikita edited comment on IGNITE-16895 at 5/5/22 2:48 PM: -- Merged into the master. [~mmuzaf], thank you for the review! was (Author: nsamelchev): Merged into the master. > Update documentation with GitHub Actions > > > Key: IGNITE-16895 > URL: https://issues.apache.org/jira/browse/IGNITE-16895 > Project: Ignite > Issue Type: Task >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > GitHub Actions can be used to update documentation for released Ignite > versions. > For now, this is a complex manual work that requires understanding all the > intermediate steps: > [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. > I propose to automatize this and give an ability to update documentation on a > push event to a released branch. > ASF GitHub Actions Policy allows automated services to push changes related > to documentation: > [policy|https://infra.apache.org/github-actions-policy.html]. > Write access is > [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] > to run the update. So, only committers can run workflows manually. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16895) Update documentation with GitHub Actions
[ https://issues.apache.org/jira/browse/IGNITE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amelchev Nikita updated IGNITE-16895: - Description: GitHub Actions can be used to update documentation for released Ignite versions. For now, this is a complex manual work that requires understanding all the intermediate steps: [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. I propose to automatize this and give an ability to update documentation on a push event to a released branch. ASF GitHub Actions Policy allows automated services to push changes related to documentation: [policy|https://infra.apache.org/github-actions-policy.html]. Write access is [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] to run the update. So, only committers can run workflows manually. was: GitHub Actions can be used to update documentation for released Ignite versions. For now, this is a complex manual work that requires understanding all the intermediate steps: [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. I propose to automatize this and give an ability to update documentation by a click (or on a push to a released branch event): ASF GitHub Actions Policy allows automated services to push changes related to documentation: [policy|https://infra.apache.org/github-actions-policy.html]. Write access is [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] to run the update. So, only committers can run workflows manually. 
> Update documentation with GitHub Actions > > > Key: IGNITE-16895 > URL: https://issues.apache.org/jira/browse/IGNITE-16895 > Project: Ignite > Issue Type: Task >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > GitHub Actions can be used to update documentation for released Ignite > versions. > For now, this is a complex manual work that requires understanding all the > intermediate steps: > [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. > I propose to automatize this and give an ability to update documentation on a > push event to a released branch. > ASF GitHub Actions Policy allows automated services to push changes related > to documentation: > [policy|https://infra.apache.org/github-actions-policy.html]. > Write access is > [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] > to run the update. So, only committers can run workflows manually. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16895) Update documentation with GitHub Actions
[ https://issues.apache.org/jira/browse/IGNITE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amelchev Nikita updated IGNITE-16895: - Attachment: (was: image-2022-04-23-16-19-41-327.png) > Update documentation with GitHub Actions > > > Key: IGNITE-16895 > URL: https://issues.apache.org/jira/browse/IGNITE-16895 > Project: Ignite > Issue Type: Task >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > GitHub Actions can be used to update documentation for released Ignite > versions. > For now, this is a complex manual work that requires understanding all the > intermediate steps: > [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. > I propose to automatize this and give an ability to update documentation on a > push event to a released branch. > ASF GitHub Actions Policy allows automated services to push changes related > to documentation: > [policy|https://infra.apache.org/github-actions-policy.html]. > Write access is > [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] > to run the update. So, only committers can run workflows manually. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16895) Update documentation with GitHub Actions
[ https://issues.apache.org/jira/browse/IGNITE-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amelchev Nikita updated IGNITE-16895: - Description: GitHub Actions can be used to update documentation for released Ignite versions. For now, this is a complex manual work that requires understanding all the intermediate steps: [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. I propose to automatize this and give an ability to update documentation by a click (or on a push to a released branch event): ASF GitHub Actions Policy allows automated services to push changes related to documentation: [policy|https://infra.apache.org/github-actions-policy.html]. Write access is [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] to run the update. So, only committers can run workflows manually. was: GitHub Actions can be used to update documentation for released Ignite versions. For now, this is a complex manual work that requires understanding all the intermediate steps: [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. I propose to automatize this and give an ability to update documentation by a click (or on a push to a released branch event): !image-2022-04-23-16-19-41-327.png|width=260,height=216! ASF GitHub Actions Policy allows automated services to push changes related to documentation: [policy|https://infra.apache.org/github-actions-policy.html]. Write access is [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] to run the update. So, only committers can run workflows manually. 
> Update documentation with GitHub Actions > > > Key: IGNITE-16895 > URL: https://issues.apache.org/jira/browse/IGNITE-16895 > Project: Ignite > Issue Type: Task >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Attachments: image-2022-04-23-16-19-41-327.png > > Time Spent: 10m > Remaining Estimate: 0h > > GitHub Actions can be used to update documentation for released Ignite > versions. > For now, this is a complex manual work that requires understanding all the > intermediate steps: > [wiki|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=85461527#HowtoDocument-UpdatingPublishedDocs]. > I propose to automatize this and give an ability to update documentation by a > click (or on a push to a released branch event): > ASF GitHub Actions Policy allows automated services to push changes related > to documentation: > [policy|https://infra.apache.org/github-actions-policy.html]. > Write access is > [required|https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow] > to run the update. So, only committers can run workflows manually. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IGNITE-16908) Move ignite-hibernate modules to the Ignite Extension
[ https://issues.apache.org/jira/browse/IGNITE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532266#comment-17532266 ] Maxim Muzafarov commented on IGNITE-16908: -- In addition to the pull request, the following branches have been prepared: https://github.com/apache/ignite-extensions/tree/release/ignite-hibernate-ext-5.1.0/modules/hibernate-ext https://github.com/apache/ignite-extensions/tree/release/ignite-hibernate-ext-4.2.0/modules/hibernate-ext > Move ignite-hibernate modules to the Ignite Extension > - > > Key: IGNITE-16908 > URL: https://issues.apache.org/jira/browse/IGNITE-16908 > Project: Ignite > Issue Type: Task > Components: extensions >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Major > Fix For: 2.14 > > Time Spent: 10m > Remaining Estimate: 0h > > The following list of modules should be moved to the Extensions. > - ignite-hibernate_4.2 > - ignite-hibernate_5.1 > - ignite-hibernate_5.3 > - ignite-hibernate-core (a common part for all hibernate modules) > In detail: > - remove all these modules from the Ignite project. > - create ignite-hibernate extension. > - move ignite-hibernate-core + ignite-hibernate_4.2 to > release/ignite-hibernate-4.2.0 branch (the version of ignite-hibernate > extension will be 4.2.0) and release it on demand; > - move ignite-hibernate-core + ignite-hibernate_5.1 to > release/ignite-hibernate-5.1.0 branch (the version of ignite-hibernate > extension will be 5.1.0) and release it on demand; > - move ignite-hibernate-core + ignite-hibernate_5.3 to the master > branch and to the release/ignite-hibernate-5.3.0 branch (the version > of ignite-hibernate extension will be 5.3.0) and release it > immediately; -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IGNITE-16930) .NET: Thin 3.0: Implement Compute.ExecuteColocated
Pavel Tupitsyn created IGNITE-16930: --- Summary: .NET: Thin 3.0: Implement Compute.ExecuteColocated Key: IGNITE-16930 URL: https://issues.apache.org/jira/browse/IGNITE-16930 Project: Ignite Issue Type: Improvement Components: platforms, thin client Reporter: Pavel Tupitsyn Assignee: Pavel Tupitsyn Fix For: 3.0.0-alpha5 Implement executeColocated without partition awareness (send the request using the default connection, let the server route it to the correct node). See IGNITE-16786 for a reference implementation. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16786) Thin 3.0: Implement ClientCompute#executeColocated()
[ https://issues.apache.org/jira/browse/IGNITE-16786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-16786: Component/s: thin client (was: clients) > Thin 3.0: Implement ClientCompute#executeColocated() > > > Key: IGNITE-16786 > URL: https://issues.apache.org/jira/browse/IGNITE-16786 > Project: Ignite > Issue Type: Improvement > Components: thin client >Reporter: Roman Puchkovskiy >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-alpha5 > > > Implement executeColocated without partition awareness (send the request > using the default connection, let the server route it to the correct node). -- This message was sent by Atlassian Jira (v8.20.7#820007)
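Note on IGNITE-16786: the routing described there (the client sends the request over its default connection and the server forwards it to the node that owns the key) can be sketched roughly as below. This is a toy illustration under stated assumptions, not the Ignite client code: `Node`, `Client`, `cluster`, and the hash-modulo `ownerOf` mapping are all hypothetical names introduced here, and the real key-to-node mapping is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Sketch only: every type here is hypothetical and is not part of the Ignite API. */
final class ColocatedRoutingSketch {
    /** Hypothetical server node that can run a job locally or forward it. */
    static final class Node {
        final String name;
        private final List<Node> cluster;

        Node(String name, List<Node> cluster) {
            this.name = name;
            this.cluster = cluster;
        }

        /** Toy key-to-node mapping (an assumption): hash modulo cluster size. */
        Node ownerOf(Object key) {
            return cluster.get(Math.floorMod(key.hashCode(), cluster.size()));
        }

        /** Runs the job here if this node owns the key, otherwise forwards it to the owner. */
        String executeColocated(Object key, Function<String, String> job) {
            Node owner = ownerOf(key);

            return owner == this ? job.apply(name) : owner.executeColocated(key, job);
        }
    }

    /** Hypothetical thin client without partition awareness: it only knows one connection. */
    static final class Client {
        private final Node dfltConn;

        Client(Node dfltConn) {
            this.dfltConn = dfltConn;
        }

        /** Always sends the request to the default connection; the server does the routing. */
        String executeColocated(Object key, Function<String, String> job) {
            return dfltConn.executeColocated(key, job);
        }
    }

    /** Builds a cluster of the given size; nodes are named node-0, node-1, and so on. */
    static List<Node> cluster(int size) {
        List<Node> nodes = new ArrayList<>();

        for (int i = 0; i < size; i++)
            nodes.add(new Node("node-" + i, nodes));

        return nodes;
    }
}
```

The client never computes the owner itself, which is the trade-off the ticket accepts: one extra hop on the server side in exchange for a simpler client.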
[jira] [Assigned] (IGNITE-16926) Interrupted compute job may fail a node
[ https://issues.apache.org/jira/browse/IGNITE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-16926: -- Assignee: Ivan Bessonov > Interrupted compute job may fail a node > --- > > Key: IGNITE-16926 > URL: https://issues.apache.org/jira/browse/IGNITE-16926 > Project: Ignite > Issue Type: Bug > Components: persistence >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > Critical system error detected. Will be handled accordingly to configured > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, > super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet > [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], > failureCtx=FailureContext [type=CRITICAL_ERROR, err=class > o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is > corrupted [groupId=1234619879, pageIds=[7290201467513], > cacheId=645096946, cacheName=*, indexName=*, msg=Runtime failure on row: > Row@79570772[ key: 1168930235, val: Data hidden due to > IGNITE_SENSITIVE_DATA_LOGGING flag. 
][ data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden > ","logger_name":"ROOT","thread_name":"pub-#1278%x%","level":"ERROR","level_value":4,"stack_trace":"org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: > B+Tree is corrupted [groupId=1234619879, pageIds=[7290201467513], > cacheId=645096946, cacheName=*, indexName=*, msg=Runtime failure on row: > Row@79570772[ key: 1168930235, val: Data hidden due to > IGNITE_SENSITIVE_DATA_LOGGING flag. 
][ data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, > data hidden ]] at > org.apache.ignite.internal.processors.query.h2.database.H2Tree.corruptedTreeException(H2Tree.java:1003) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2492) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2432) > at > org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:500) > at >
[jira] [Created] (IGNITE-16929) .NET: Thin 3.0: Implement sessions for .NET thin client
Igor Sapego created IGNITE-16929: Summary: .NET: Thin 3.0: Implement sessions for .NET thin client Key: IGNITE-16929 URL: https://issues.apache.org/jira/browse/IGNITE-16929 Project: Ignite Issue Type: New Feature Components: platforms, thin client Affects Versions: 3.0.0-alpha4 Reporter: Igor Sapego Fix For: 3.0.0-alpha5 Let's implement session support for the .NET client. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IGNITE-16928) Thin 3.0: Implement sessions for Java client
Igor Sapego created IGNITE-16928: Summary: Thin 3.0: Implement sessions for Java client Key: IGNITE-16928 URL: https://issues.apache.org/jira/browse/IGNITE-16928 Project: Ignite Issue Type: New Feature Components: platforms, thin client Affects Versions: 3.0.0-alpha4 Reporter: Igor Sapego Fix For: 3.0.0-alpha5 Let's implement local sessions for the Java client. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IGNITE-5956) Ignite Continuous Query (Queries 3): IgniteCacheDistributedJoinPartitionedAndReplicatedTest fails
[ https://issues.apache.org/jira/browse/IGNITE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532180#comment-17532180 ] Evgeny Stanilovsky commented on IGNITE-5956: [~jooger] it seems this is still an issue and needs to be fixed. > Ignite Continuous Query (Queries 3): > IgniteCacheDistributedJoinPartitionedAndReplicatedTest fails > - > > Key: IGNITE-5956 > URL: https://issues.apache.org/jira/browse/IGNITE-5956 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.1, 2.13 >Reporter: Sergey Chugunov >Priority: Major > Labels: MakeTeamcityGreenAgain, test-failure > > Reproducible locally. > May be broken by commit *70eed75422ea*. > Fails with exception: > {noformat} > javax.cache.CacheException: Failed to execute query: for distributed join all > REPLICATED caches must be at the end of the joined tables list. > at > org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:704) > at > org.apache.ignite.internal.processors.query.h2.sql.GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239) > at > org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1309) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1804) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1802) > at > org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2282) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1809) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:788) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:758) > at > 
org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.queryPlan(GridCommonAbstractTest.java:1650) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.checkQuery(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:389) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.checkQueries(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:364) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.join(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:283) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.testJoin2(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:197) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at junit.framework.TestCase.runTest(TestCase.java:176) > at > org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1980) > at > org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:131) > at > org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1895) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-5956) Ignite Continuous Query (Queries 3): IgniteCacheDistributedJoinPartitionedAndReplicatedTest fails
[ https://issues.apache.org/jira/browse/IGNITE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evgeny Stanilovsky updated IGNITE-5956: --- Component/s: sql > Ignite Continuous Query (Queries 3): > IgniteCacheDistributedJoinPartitionedAndReplicatedTest fails > - > > Key: IGNITE-5956 > URL: https://issues.apache.org/jira/browse/IGNITE-5956 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.1, 2.13 >Reporter: Sergey Chugunov >Priority: Major > Labels: MakeTeamcityGreenAgain, test-failure > > Reproducible locally. > May be broken by commit *70eed75422ea*. > Fails with exception: > {noformat} > javax.cache.CacheException: Failed to execute query: for distributed join all > REPLICATED caches must be at the end of the joined tables list. > at > org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:704) > at > org.apache.ignite.internal.processors.query.h2.sql.GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239) > at > org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1309) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1804) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1802) > at > org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2282) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1809) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:788) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:758) > at > 
org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.queryPlan(GridCommonAbstractTest.java:1650) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.checkQuery(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:389) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.checkQueries(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:364) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.join(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:283) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.testJoin2(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:197) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at junit.framework.TestCase.runTest(TestCase.java:176) > at > org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1980) > at > org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:131) > at > org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1895) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-5956) Ignite Continuous Query (Queries 3): IgniteCacheDistributedJoinPartitionedAndReplicatedTest fails
[ https://issues.apache.org/jira/browse/IGNITE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evgeny Stanilovsky updated IGNITE-5956: --- Affects Version/s: 2.13 > Ignite Continuous Query (Queries 3): > IgniteCacheDistributedJoinPartitionedAndReplicatedTest fails > - > > Key: IGNITE-5956 > URL: https://issues.apache.org/jira/browse/IGNITE-5956 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.1, 2.13 >Reporter: Sergey Chugunov >Priority: Major > Labels: MakeTeamcityGreenAgain, test-failure > > Reproducible locally. > May be broken by commit *70eed75422ea*. > Fails with exception: > {noformat} > javax.cache.CacheException: Failed to execute query: for distributed join all > REPLICATED caches must be at the end of the joined tables list. > at > org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:704) > at > org.apache.ignite.internal.processors.query.h2.sql.GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239) > at > org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1309) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1804) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1802) > at > org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2282) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1809) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:788) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:758) > at > 
org.apache.ignite.testframework.junits.common.GridCommonAbstractTest.queryPlan(GridCommonAbstractTest.java:1650) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.checkQuery(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:389) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.checkQueries(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:364) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.join(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:283) > at > org.apache.ignite.internal.processors.cache.IgniteCacheDistributedJoinPartitionedAndReplicatedTest.testJoin2(IgniteCacheDistributedJoinPartitionedAndReplicatedTest.java:197) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at junit.framework.TestCase.runTest(TestCase.java:176) > at > org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:1980) > at > org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:131) > at > org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1895) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IGNITE-16900) Add checkstyle LeftCurly rule
[ https://issues.apache.org/jira/browse/IGNITE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532171#comment-17532171 ] Nikolay Izhikov commented on IGNITE-16900: -- The failures are unrelated. The tests were broken by https://github.com/apache/ignite/commit/7357847369079925289114f650a506408812fe4c > Add checkstyle LeftCurly rule > - > > Key: IGNITE-16900 > URL: https://issues.apache.org/jira/browse/IGNITE-16900 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > The Ignite code style specifies: > > { starts on the same line as the opening block statement. For example: > To enforce this, checkstyle has a LeftCurly rule. -- This message was sent by Atlassian Jira (v8.20.7#820007)
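Note on IGNITE-16900: the example that the quoted code style text refers to did not survive in this message. As a stand-in, here is a small sketch of the brace placement the rule is about, with every `{` at the end of the line that opens the block. The class and method names are made up for illustration, and the snippet only demonstrates brace placement, not the rest of the Ignite code style.

```java
/** Sketch only: illustrates brace placement; this is not an Ignite class. */
final class LeftCurlySketch {
    /** Sums the even values and counts them; returns {sum, count}. */
    static int[] sumEvens(int[] vals) {
        int sum = 0;
        int cnt = 0;

        // Each opening brace sits on the same line as the statement that opens the block.
        for (int v : vals) {
            if (v % 2 == 0) {
                sum += v;
                cnt++;
            }
        }

        return new int[] {sum, cnt};
    }
}
```

Placing any of these opening braces on a line of its own is the kind of layout the check is meant to flag.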
[jira] [Commented] (IGNITE-16900) Add checkstyle LeftCurly rule
[ https://issues.apache.org/jira/browse/IGNITE-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532170#comment-17532170 ] Ignite TC Bot commented on IGNITE-16900: {panel:title=Branch: [pull//head] Base: [master] : Possible Blockers (2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1} {color:#d04437}Control Utility{color} [[tests 2|https://ci2.ignite.apache.org/viewLog.html?buildId=6422754]] * IgniteControlUtilityTestSuite: KillCommandsCommandShTest.testCancelConsistencyTask - Test has low fail rate in base branch 3,8% and is not flaky * IgniteControlUtilityTestSuite: KillCommandsCommandShTest.testCancelComputeTask - Test has low fail rate in base branch 3,8% and is not flaky {panel} {panel:title=Branch: [pull//head] Base: [master] : No new tests found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel} [TeamCity *-- Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=6422624buildTypeId=IgniteTests24Java8_RunAll] > Add checkstyle LeftCurly rule > - > > Key: IGNITE-16900 > URL: https://issues.apache.org/jira/browse/IGNITE-16900 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > The Ignite code style specifies: > > { starts on the same line as the opening block statement. For example: > To enforce this, checkstyle has a LeftCurly rule. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (IGNITE-16916) Make nodes more resilient in case of a job cancellation
[ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532164#comment-17532164 ] Anton Vinogradov edited comment on IGNITE-16916 at 5/5/22 10:14 AM: Reopening because of failing tests was (Author: av): Reopening bacause of failing tests > Make nodes more resilient in case of a job cancellation > --- > > Key: IGNITE-16916 > URL: https://issues.apache.org/jira/browse/IGNITE-16916 > Project: Ignite > Issue Type: Task > Components: compute >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.14 > > Attachments: image-2022-05-05-12-46-26-543.png, screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > In case of a job being cancelled we currently have a really questionable > approach. > We set the interruption flag even before we give the user a chance > to stop the job gracefully. > Proposal for the implementation: > * Add a distributed property in the metastore that will set a timeout for > interrupting *GridJobWorker* that did not gracefully complete after calling > *GridJobWorker#cancel*; > * On a call to *GridJobWorker#cancel*, do not *Thread#interrupt* the > thread, but add a *GridTimeoutObject*. -- This message was sent by Atlassian Jira (v8.20.7#820007)
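Note on IGNITE-16916: the proposal above (signal cancellation first, interrupt only after a timeout) can be sketched with plain JDK classes. This is an illustration under assumptions, not the Ignite change itself: `JobWorker` is a hypothetical stand-in for *GridJobWorker*, a `ScheduledExecutorService` task plays the role of the *GridTimeoutObject*, and the 200 ms timeout is an arbitrary value rather than the proposed metastore property.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch only: JobWorker is a hypothetical stand-in, not Ignite's GridJobWorker. */
final class DelayedInterruptSketch {
    static final class JobWorker {
        private final Thread thread;
        private volatile boolean cancelled;
        private volatile boolean interrupted;

        /** @param cooperative Whether the job body checks the cancellation flag. */
        JobWorker(boolean cooperative) {
            thread = new Thread(() -> {
                try {
                    // A cooperative job stops once it sees the flag; a stubborn one ignores it.
                    while (!(cooperative && cancelled))
                        Thread.sleep(5);
                }
                catch (InterruptedException e) {
                    interrupted = true;
                }
            });
        }

        void cancel(ScheduledExecutorService timer, long timeoutMs) {
            // Step 1: ask the job to stop; no interrupt yet.
            cancelled = true;

            // Step 2: interrupt only if the job is still running after the timeout.
            timer.schedule(() -> {
                if (thread.isAlive())
                    thread.interrupt();
            }, timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Runs one job, cancels it, and reports whether a forced interrupt was needed. */
    static boolean neededInterrupt(boolean cooperative) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        try {
            JobWorker w = new JobWorker(cooperative);

            w.thread.start();
            w.cancel(timer, 200);
            w.thread.join();

            return w.interrupted;
        }
        catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        finally {
            timer.shutdownNow();
        }
    }
}
```

A job that honors the flag finishes before the timer fires and is never interrupted; only a job that ignores the flag gets the interrupt as a fallback.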
[jira] [Updated] (IGNITE-14341) Reduce contention in the PendingEntriesTree when cleaning up expired entries.
[ https://issues.apache.org/jira/browse/IGNITE-14341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-14341: -- Summary: Reduce contention in the PendingEntriesTree when cleaning up expired entries. (was: Significant performance drop when entries expiring concurrently) > Reduce contention in the PendingEntriesTree when cleaning up expired entries. > - > > Key: IGNITE-14341 > URL: https://issues.apache.org/jira/browse/IGNITE-14341 > Project: Ignite > Issue Type: Improvement >Reporter: Aleksey Plekhanov >Assignee: Pavel Pereslegin >Priority: Major > Labels: ise > Attachments: JmhCacheExpireBenchmark.java > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, there is a significant performance drop when expired entries > are concurrently evicted by threads that perform other cache operations (see the > attached reproducer): > {noformat} > Benchmark Mode Cnt Score Error Units > JmhCacheExpireBenchmark.putWithExpire thrpt 3 100,132 ± 21,025 ops/ms > JmhCacheExpireBenchmark.putWithoutExpire thrpt 3 2133,122 ± 559,694 ops/ms{noformat} > Root cause: the pending entries tree (offheap BPlusTree) is used to track expired > entries. After each cache operation (and in the timeout thread) there is an > attempt to evict some number of expired entries. These entries are looked up from > the start of the pending entries tree, so there is contention on the first > leaf page of that tree. 
> All threads waiting for the same page lock: > {noformat} > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > org.apache.ignite.internal.util.OffheapReadWriteLock.waitAcquireWriteLock(OffheapReadWriteLock.java:503) > at > org.apache.ignite.internal.util.OffheapReadWriteLock.writeLock(OffheapReadWriteLock.java:244) > at > org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.writeLock(PageMemoryNoStoreImpl.java:528) > at > org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeLock(PageHandler.java:422) > at > org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:350) > at > org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:325) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$13200(BPlusTree.java:100) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.doRemoveFromLeaf(BPlusTree.java:4588) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.removeFromLeaf(BPlusTree.java:4567) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.tryRemoveFromLeaf(BPlusTree.java:5196) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.access$6800(BPlusTree.java:4209) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2189) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2165) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2165) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2076) > at > org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removex(BPlusTree.java:1905) > at > 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expireInternal(IgniteCacheOffheapManagerImpl.java:1426) > at > org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1375) > at > org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:246) > at > org.apache.ignite.internal.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:882){noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
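Note on IGNITE-14341: the root cause described above can be modelled in a few lines. This is a toy model under assumptions, not the Ignite data structure: a `TreeMap` keyed by expire time stands in for the pending entries tree (so two entries with the same expire time would overwrite each other here), and a single monitor stands in for the lock on the first leaf page. All names are hypothetical.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: a toy model of the pending entries tree, not Ignite's PendingEntriesTree. */
final class PendingEntriesSketch {
    /** Entries ordered by expire time, so the oldest ones sit at the head. */
    private final TreeMap<Long, String> pending = new TreeMap<>();

    /** Registers a key that expires at the given time. */
    synchronized void track(long expireTime, String key) {
        pending.put(expireTime, key);
    }

    /**
     * Removes up to {@code batch} entries whose expire time is not after {@code now}.
     * Every caller starts from the head of the map, so all callers compete for the same lock.
     */
    synchronized int expire(long now, int batch) {
        int removed = 0;

        Iterator<Map.Entry<Long, String>> it = pending.entrySet().iterator();

        while (removed < batch && it.hasNext()) {
            if (it.next().getKey() > now)
                break;

            it.remove();
            removed++;
        }

        return removed;
    }

    /** Returns the number of entries still tracked. */
    synchronized int size() {
        return pending.size();
    }
}
```

If every cache operation calls `expire` as a side effect, all operating threads serialize on the head of the structure, which matches the page-lock waits in the stack trace above.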
[jira] [Comment Edited] (IGNITE-16916) Make nodes more resilient in case of a job cancellation
[ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532160#comment-17532160 ] Anton Vinogradov edited comment on IGNITE-16916 at 5/5/22 9:54 AM: --- [~ktkale...@gridgain.com], [~sergeychugunov] Looks like {{KillCommandsCommandShTest.testCancelComputeTask}} is [broken|https://ci.ignite.apache.org/test/-8103382042071142009?currentProjectId=IgniteTests24Java8=%3Cdefault%3E=true] now !screenshot-1.png! As far as I can see, you never checked these changes :( !image-2022-05-05-12-46-26-543.png! was (Author: av): [~ktkale...@gridgain.com] Looks like {{KillCommandsCommandShTest.testCancelComputeTask}} is [broken|https://ci.ignite.apache.org/test/-8103382042071142009?currentProjectId=IgniteTests24Java8=%3Cdefault%3E=true] now !screenshot-1.png! As far as I can see, you never checked these changes :( !image-2022-05-05-12-46-26-543.png! > Make nodes more resilient in case of a job cancellation > --- > > Key: IGNITE-16916 > URL: https://issues.apache.org/jira/browse/IGNITE-16916 > Project: Ignite > Issue Type: Task > Components: compute >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.14 > > Attachments: image-2022-05-05-12-46-26-543.png, screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > In case of a job being cancelled we currently have a really questionable > approach. > We set the interruption flag even before we give the user a chance > to stop the job gracefully. > Proposal for the implementation: > * Add a distributed property in the metastore that will set a timeout for > interrupting *GridJobWorker* that did not gracefully complete after calling > *GridJobWorker#cancel*; > * On a call to *GridJobWorker#cancel*, do not *Thread#interrupt* the > thread, but add a *GridTimeoutObject*. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IGNITE-16927) [Extensions] Fix scope of spring-data-commons dependency.
Mikhail Petrov created IGNITE-16927: --- Summary: [Extensions] Fix scope of spring-data-commons dependency. Key: IGNITE-16927 URL: https://issues.apache.org/jira/browse/IGNITE-16927 Project: Ignite Issue Type: Bug Reporter: Mikhail Petrov Currently, the scope of the spring-data-commons dependency for extensions is `compile`, which means that extensions depend on a hardcoded version of spring-data-commons. We should change it to `provided` to avoid releasing spring-data-ext for each spring-data-commons version. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Reopened] (IGNITE-16916) Make nodes more resilient in case of a job cancellation
[ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov reopened IGNITE-16916: --- Reopening because of failing tests > Make nodes more resilient in case of a job cancellation > --- > > Key: IGNITE-16916 > URL: https://issues.apache.org/jira/browse/IGNITE-16916 > Project: Ignite > Issue Type: Task > Components: compute >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.14 > > Attachments: image-2022-05-05-12-46-26-543.png, screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > In case of a job being cancelled we currently have a really questionable > approach. > We set the interruption flag even before we give the user a chance > to stop the job gracefully. > Proposal for the implementation: > * Add a distributed property in the metastore that will set a timeout for > interrupting *GridJobWorker* that did not gracefully complete after calling > *GridJobWorker#cancel*; > * On a call to *GridJobWorker#cancel*, do not *Thread#interrupt* the > thread, but add a *GridTimeoutObject*. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (IGNITE-16916) Make nodes more resilient in case of a job cancellation
[ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532160#comment-17532160 ] Anton Vinogradov edited comment on IGNITE-16916 at 5/5/22 9:46 AM: --- [~ktkale...@gridgain.com] Looks like {{KillCommandsCommandShTest.testCancelComputeTask}} is [broken|https://ci.ignite.apache.org/test/-8103382042071142009?currentProjectId=IgniteTests24Java8=%3Cdefault%3E=true] now !screenshot-1.png! As far as I can see, you never checked these changes :( !image-2022-05-05-12-46-26-543.png! was (Author: av): [~ktkale...@gridgain.com] Looks like {{KillCommandsCommandShTest.testCancelComputeTask}} is [broken|https://ci.ignite.apache.org/test/-8103382042071142009?currentProjectId=IgniteTests24Java8=%3Cdefault%3E=true] now !screenshot-1.png! > Make nodes more resilient in case of a job cancellation > --- > > Key: IGNITE-16916 > URL: https://issues.apache.org/jira/browse/IGNITE-16916 > Project: Ignite > Issue Type: Task > Components: compute >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.14 > > Attachments: image-2022-05-05-12-46-26-543.png, screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > In case of a job being cancelled we currently have a really questionable > approach. > We are now setting the interruption flag even before we give a user a chance > to stop the job gracefully. > Proposal for the implementation: > * Adding a distributed property in the metastore that will set a timeout for > interrupting *GridJobWorker* that did not gracefully complete after calling > *GridJobWorker#cancel*; > * On the call of the *GridJobWorker#cancel*, do not *Thread#interrupt* the > thread, but add *GridTimeoutObject*. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (IGNITE-16916) Make nodes more resilient in case of a job cancellation
[ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532160#comment-17532160 ] Anton Vinogradov edited comment on IGNITE-16916 at 5/5/22 9:44 AM: --- [~ktkale...@gridgain.com] Looks like {{KillCommandsCommandShTest.testCancelComputeTask}} is [broken|https://ci.ignite.apache.org/test/-8103382042071142009?currentProjectId=IgniteTests24Java8=%3Cdefault%3E=true] now !screenshot-1.png! was (Author: av): [~ktkale...@gridgain.com] Looks like {{KillCommandsCommandShTest.testCancelComputeTask}} is broken now !screenshot-1.png! > Make nodes more resilient in case of a job cancellation > --- > > Key: IGNITE-16916 > URL: https://issues.apache.org/jira/browse/IGNITE-16916 > Project: Ignite > Issue Type: Task > Components: compute >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.14 > > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > In case of a job being cancelled we currently have a really questionable > approach. > We are now setting the interruption flag even before we give a user a chance > to stop the job gracefully. > Proposal for the implementation: > * Adding a distributed property in the metastore that will set a timeout for > interrupting *GridJobWorker* that did not gracefully complete after calling > *GridJobWorker#cancel*; > * On the call of the *GridJobWorker#cancel*, do not *Thread#interrupt* the > thread, but add *GridTimeoutObject*. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (IGNITE-16916) Make nodes more resilient in case of a job cancellation
[ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532160#comment-17532160 ] Anton Vinogradov commented on IGNITE-16916: --- [~ktkale...@gridgain.com] Looks like {{KillCommandsCommandShTest.testCancelComputeTask}} is broken now !screenshot-1.png! > Make nodes more resilient in case of a job cancellation > --- > > Key: IGNITE-16916 > URL: https://issues.apache.org/jira/browse/IGNITE-16916 > Project: Ignite > Issue Type: Task > Components: compute >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.14 > > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > In case of a job being cancelled we currently have a really questionable > approach. > We are now setting the interruption flag even before we give a user a chance > to stop the job gracefully. > Proposal for the implementation: > * Adding a distributed property in the metastore that will set a timeout for > interrupting *GridJobWorker* that did not gracefully complete after calling > *GridJobWorker#cancel*; > * On the call of the *GridJobWorker#cancel*, do not *Thread#interrupt* the > thread, but add *GridTimeoutObject*. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16916) Make nodes more resilient in case of a job cancellation
[ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-16916: -- Attachment: screenshot-1.png > Make nodes more resilient in case of a job cancellation > --- > > Key: IGNITE-16916 > URL: https://issues.apache.org/jira/browse/IGNITE-16916 > Project: Ignite > Issue Type: Task > Components: compute >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Fix For: 2.14 > > Attachments: screenshot-1.png > > Time Spent: 20m > Remaining Estimate: 0h > > In case of a job being cancelled we currently have a really questionable > approach. > We are now setting the interruption flag even before we give a user a chance > to stop the job gracefully. > Proposal for the implementation: > * Adding a distributed property in the metastore that will set a timeout for > interrupting *GridJobWorker* that did not gracefully complete after calling > *GridJobWorker#cancel*; > * On the call of the *GridJobWorker#cancel*, do not *Thread#interrupt* the > thread, but add *GridTimeoutObject*. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (IGNITE-16926) Interrupted compute job may fail a node
Ivan Bessonov created IGNITE-16926: -- Summary: Interrupted compute job may fail a node Key: IGNITE-16926 URL: https://issues.apache.org/jira/browse/IGNITE-16926 Project: Ignite Issue Type: Bug Components: persistence Reporter: Ivan Bessonov {code:java} Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [groupId=1234619879, pageIds=[7290201467513], cacheId=645096946, cacheName=*, indexName=*, msg=Runtime failure on row: Row@79570772[ key: 1168930235, val: Data hidden due to IGNITE_SENSITIVE_DATA_LOGGING flag. ][ data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, 
data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden ","logger_name":"ROOT","thread_name":"pub-#1278%x%","level":"ERROR","level_value":4,"stack_trace":"org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [groupId=1234619879, pageIds=[7290201467513], cacheId=645096946, cacheName=*, indexName=*, msg=Runtime failure on row: Row@79570772[ key: 1168930235, val: Data hidden due to IGNITE_SENSITIVE_DATA_LOGGING flag. ][ data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden, data hidden ]] at 
org.apache.ignite.internal.processors.query.h2.database.H2Tree.corruptedTreeException(H2Tree.java:1003) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2492) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2432) at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:500) at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:880) at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:794) at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:411) at org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:2546) at
[jira] [Updated] (IGNITE-16801) Implement error handling for rebalance
[ https://issues.apache.org/jira/browse/IGNITE-16801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vyacheslav Koptilin updated IGNITE-16801: - Epic Link: IGNITE-14209 Ignite Flags: (was: Docs Required,Release Notes Required) > Implement error handling for rebalance > --- > > Key: IGNITE-16801 > URL: https://issues.apache.org/jira/browse/IGNITE-16801 > Project: Ignite > Issue Type: Task >Reporter: Kirill Gusakov >Priority: Major > Labels: ignite-3 > > We have the listener `onReconfigurationError` for handling errors during the > rebalance, but no implementation yet. > At the moment, it looks like we can receive only one kind of error: > `RaftError.ECATCHUP` -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (IGNITE-16668) Design in-memory raft group reconfiguration on node failure
[ https://issues.apache.org/jira/browse/IGNITE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-16668: - Description:

If a node storing a partition of an in-memory table fails and leaves the cluster, all data it had is lost. From the point of view of the partition, it looks as if the node has left forever. Although the Raft protocol tolerates the loss of some number of the nodes composing a Raft group (partition), for in-memory caches we cannot restore the replica factor because of the in-memory nature of the table. It means that we need to detect failures of each node owning a partition and recalculate assignments for the table without keeping the replica factor.

h4. Upd 1:

h4. Problem

By design, Raft has several persisted segments, e.g. raft meta (term/committedIndex) and the stable raft log. So, by converting common Raft to an in-memory one, it’s possible to break some of its invariants. For example, Node C could vote for Candidate A before a self-restart and then vote for Candidate B after it. As a result, two leaders will be elected, which is illegal.

!Screenshot from 2022-04-19 11-11-05.png!

h4. Solution

In order to solve the problem mentioned above, it’s possible to remove the restarting node from the peers of the corresponding raft group and then return it back. The peer-removal process should be finished before the corresponding raft server node is restarted.

!Screenshot from 2022-04-19 11-12-55.png!

The process of removing and then returning the restarting node is, however, itself tricky. To explain why it’s a non-trivial action, it’s necessary to reveal the main ideas of the rebalance protocol.

Reconfiguration of the raft group is a process driven by changes to the assignments. Each partition has three corresponding sets of assignments stored in the metastore:
# assignments.stable - current distribution
# assignments.pending - partition distribution for an ongoing rebalance, if any
# assignments.planned - in some cases it’s not possible to cancel a pending rebalance or merge it with a new one. In that case, the newly calculated assignments will be stored explicitly under the corresponding assignments.planned key. It's worth noting that it doesn't make sense to keep more than one planned rebalance: any newly scheduled one will overwrite the existing one.

However, this idea of overwriting the assignments.planned key won't work within the context of an in-memory raft restart, because it’s not valid to overwrite a reduction of assignments. Let's illustrate this problem with the following example.
# In-memory partition p1 is hosted on nodes A, B and C, meaning that p1.assignments.stable=[A,B,C]
# Let's say that the baseline was changed, resulting in a rebalance on assignments.pending=[A,B,C,*D*]
# During the non-cancelable phase of [A,B,C]->[A,B,C,D], node C fails and returns back, meaning that we should plan the [A,B,D] and [A,B,C,D] assignments. Both must be recorded in the single assignments.planned key, meaning that [A,B,C,D] will overwrite the reduction [A,B,D], so no actual raft reconfiguration will take place, which is not acceptable.

In order to overcome this issue, let’s introduce two new keys: _assignments.switch.reduce_, which will hold the nodes that should be removed, and _assignments.switch.append_, which will hold the nodes that should be returned back. Then run the following actions:

h5. On in-memory partition restart (or on partition start with cleaned-up PDS)

Within a retry loop, add the current node to the assignments.switch.reduce set:
{code:java}
do {
    retrievedAssignmentsSwitchReduce = metastorage.read(assignments.switch.reduce);
    calculatedAssignmentsSwitchReduce = union(retrievedAssignmentsSwitchReduce.value, currentNode);

    if (retrievedAssignmentsSwitchReduce.isEmpty()) {
        invokeRes = metastoreInvoke:
            if empty(assignments.switch.reduce)
                assignments.switch.reduce = calculatedAssignmentsSwitchReduce
    } else {
        invokeRes = metastoreInvoke:
            if eq(revision(assignments.switch.reduce), retrievedAssignmentsSwitchReduce.revision)
                assignments.switch.reduce = calculatedAssignmentsSwitchReduce
    }
} while (!invokeRes);
{code}

h5. On assignments.switch.reduce change on corresponding partition leader

Within the watch listener on the assignments.switch.reduce key on the corresponding partition leader, we trigger a new rebalance if there is no pending one.
{code:java}
calculatedAssignments = subtract(calcPartAssignments(), assignments.switch.reduce);

metastoreInvoke:
    if empty(partition.assignments.change.trigger.revision) || partition.assignments.change.trigger.revision < event.revision
        if empty(assignments.pending)
            assignments.pending = calculatedAssignments
            partition.assignments.change.trigger.revision = event.revision
{code}

h5. On rebalance done changePeers() calles
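The retry loop from the "On in-memory partition restart" step, and the subtraction used by the leader-side listener, can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Ignite metastorage API: `Metastore`, `Entry`, `invoke`, `addToSwitchReduce` and `subtract` are hypothetical names, the store is a synchronized in-memory map, and revision 0 is used here to mean "key is absent".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public final class SwitchReduceSketch {
    /** A stored set of node names plus the revision at which it was written. */
    public static final class Entry {
        public final Set<String> value;
        public final long revision;

        Entry(Set<String> value, long revision) {
            this.value = Collections.unmodifiableSet(new TreeSet<>(value));
            this.revision = revision;
        }
    }

    /** Hypothetical in-memory stand-in for the metastore with a conditional write. */
    public static final class Metastore {
        private final Map<String, Entry> data = new HashMap<>();
        private long revision;

        /** Returns null when the key is absent. */
        public synchronized Entry read(String key) {
            return data.get(key);
        }

        /** Writes only if the key is still at expectedRevision (0 means "absent"). */
        public synchronized boolean invoke(String key, long expectedRevision, Set<String> newValue) {
            Entry cur = data.get(key);
            long curRevision = cur == null ? 0 : cur.revision;

            if (curRevision != expectedRevision)
                return false; // Someone else changed the key; the caller retries.

            data.put(key, new Entry(newValue, ++revision));
            return true;
        }
    }

    /** Adds currentNode to the set under key, retrying until the conditional write succeeds. */
    public static void addToSwitchReduce(Metastore metastore, String key, String currentNode) {
        boolean invokeRes;

        do {
            Entry retrieved = metastore.read(key);

            Set<String> calculated = new TreeSet<>();
            if (retrieved != null)
                calculated.addAll(retrieved.value);
            calculated.add(currentNode);

            invokeRes = metastore.invoke(key, retrieved == null ? 0 : retrieved.revision, calculated);
        } while (!invokeRes);
    }

    /** Leader-side helper: calculated assignments minus the nodes scheduled for removal. */
    public static Set<String> subtract(Set<String> assignments, Set<String> switchReduce) {
        Set<String> res = new TreeSet<>(assignments);
        res.removeAll(switchReduce);
        return res;
    }
}
```

With the example from the description, subtracting a reduce set of [C] from [A,B,C,D] yields [A,B,D], which is the reduction that a single assignments.planned key would otherwise have lost. The conditional write that guards assignments.pending with the trigger revision is left out of this sketch.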
[jira] [Updated] (IGNITE-16668) Design in-memory raft group reconfiguration on node failure
[ https://issues.apache.org/jira/browse/IGNITE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-16668: - Description: If a node storing a partition of an in-memory table fails and leaves the cluster all data it had is lost. From the point of view of the partition it looks like as the node is left forever. Although Raft protocol tolerates leaving some amount of nodes composing Raft group (partition); for in-memory caches we cannot restore replica factor because of in-memory nature of the table. It means that we need to detect failures of each node owning a partition and recalculate assignments for the table without keeping replica factor. h4. Upd 1: h4. Problem By design raft has several persisted segments, e.g. raft meta (term/committedIndex) and stable raft log. So, by converting common raft to in-memory one it’s possible to break some of it’s invariants. For example Node C could vote for Candidate A before self-restart and vote then for Candidate B after one. As a result two leaders will be elected which is illegal. !Screenshot from 2022-04-19 11-11-05.png! h4. Solution In order to solve the problem mentioned above it’s possible to remove and then return back the restarting node from the peers of the corresponding raft group. The peer-removal process should be finished before the restarting of the corresponding raft server node. !Screenshot from 2022-04-19 11-12-55.png! The process of removing and then returning back the restarting node is however itself tricky. And to answer why it’s non-trivial action, it’s necessary to reveal the main ideas of the rebalance protocol. Reconfiguration of the raft group - is a process driven by the fact of changing the assignments. 
Each partition has three corresponding sets of assignments stored in the metastore: # assignments.stable - current distribution # assignments.pending - partition distribution for an ongoing rebalance if any # assignments.planned - in some cases it’s not possible to cancel or merge pending rebalance with new one. In that case newly calculated assignments will be stored explicitly with corresponding assignments.planned key. It's worth noting that it doesn't make sense to keep more than one planned rebalance. Any new scheduled one will overwrite already existing. However such idea of overwriting the assignments.planned key wont work within the context of an in-memory raft restart, because it’s not valid to overwrite the reduction of assignments. Let's illustrate this problem with the following example. # In-memory partition p1 is hosted on nodes A, B and C, meaning that p1.assignments.stable=[A,B,C] # Let's say that the baseline was changed, resulting in a rebalance on assignments.pending=[A,B,C,*D*] # During the non-cancelable phase of [A,B,C]->[A,B,C,D], node C fails and returns back, meaning that we should plan [A,B,D] and [A,B,C,D] assignments. Both must be recorded in the only assignments.planned key meaning that [A,B,C,D] will overwrite reduction [A,B,D], so no actual raft reconfiguration will take place, which is not acceptable. In order to overcome given issue, let’s introduce two new keys _assignments.switch.reduce_ that will hold nodes that should be removed and _assignments.switch.append_ that will hold nodes that should be returned back and run following actions: h5. 
On in-memory partition restart (or on partition start with cleaned-up PDS) within retry loop add current node to assignments.switch.reduce set: {code:java} do { retrievedAssignmentsSwitchReduce = metastorage.read(assignments.switch.reduce); calculatedAssignmetnsSwitchReduce = union(retrievedAssignmentsSwitchReduce.value, currentNode); if (retrievedAssignmentsSwitchReduce.isEmpty()) { invokeRes = metastoreInvoke: if empty(assignments.switch.reduce) assignments.switch.reduce = calculatedAssignmentsSwitchReduce } else { invokeRes = metastoreInvoke: eq(revision(assignments.switch.reduce), retrievedAssignmentsSwitchReduce.revision) assignments.switch.reduce = calculatedAssignmentsSwitchReduce } } while (!invokeRes);{code} h5. On assignments.switch.reduce change on corresponding partition leader Within watch listener on assignments.switch.reduce key on corresponding partition leader we trigger new rebalance if there are no pending one. {code:java} calculatedAssignments = substract(calcPartAssighments(), assignments.switch.reduce); metastoreInvoke: if empty(partition.assignments.change.trigger.revision) || partition.assignments.change.trigger.revision < event.revision if empty(assignments.pending) assignments.pending = calculatedAssignments partition.assignments.change.trigger.revision = event.revision {code} h5. On rebalance done changePeers() calles
[jira] [Updated] (IGNITE-16668) Design in-memory raft group reconfiguration on node failure
[ https://issues.apache.org/jira/browse/IGNITE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-16668: - Description: If a node storing a partition of an in-memory table fails and leaves the cluster all data it had is lost. From the point of view of the partition it looks like as the node is left forever. Although Raft protocol tolerates leaving some amount of nodes composing Raft group (partition); for in-memory caches we cannot restore replica factor because of in-memory nature of the table. It means that we need to detect failures of each node owning a partition and recalculate assignments for the table without keeping replica factor. h4. Upd 1: h4. Problem By design raft has several persisted segments, e.g. raft meta (term/committedIndex) and stable raft log. So, by converting common raft to in-memory one it’s possible to break some of it’s invariants. For example Node C could vote for Candidate A before self-restart and vote then for Candidate B after one. As a result two leaders will be elected which is illegal. !Screenshot from 2022-04-19 11-11-05.png! h4. Solution In order to solve the problem mentioned above it’s possible to remove and then return back the restarting node from the peers of the corresponding raft group. The peer-removal process should be finished before the restarting of the corresponding raft server node. !Screenshot from 2022-04-19 11-12-55.png! The process of removing and then returning back the restarting node is however itself tricky. And to answer why it’s non-trivial action, it’s necessary to reveal the main ideas of the rebalance protocol. Reconfiguration of the raft group - is a process driven by the fact of changing the assignments. 
Each partition has three corresponding sets of assignments stored in the metastore: # assignments.stable - current distribution # assignments.pending - partition distribution for an ongoing rebalance if any # assignments.planned - in some cases it’s not possible to cancel or merge pending rebalance with new one. In that case newly calculated assignments will be stored explicitly with corresponding assignments.planned key. It's worth noting that it doesn't make sense to keep more than one planned rebalance. Any new scheduled one will overwrite already existing. However such idea of overwriting the assignments.planned key wont work within the context of an in-memory raft restart, because it’s not valid to overwrite the reduction of assignments. Let's illustrate this problem with the following example. # In-memory partition p1 is hosted on nodes A, B and C, meaning that p1.assignments.stable=[A,B,C] # Let's say that the baseline was changed, resulting in a rebalance on assignments.pending=[A,B,C,*D*] # During the non-cancelable phase of [A,B,C]->[A,B,C,D], node C fails and returns back, meaning that we should plan [A,B,D] and [A,B,C,D] assignments. Both must be recorded in the only assignments.planned key meaning that [A,B,C,D] will overwrite reduction [A,B,D], so no actual raft reconfiguration will take place, which is not acceptable. In order to overcome given issue, let’s introduce two new keys _assignments.switch.reduce_ that will hold nodes that should be removed and _assignments.switch.append_ that will hold nodes that should be returned back and run following actions: h5. 
On in-memory partition restart (or on partition start with cleaned-up PDS) within retry loop add current node to assignments.switch.reduce set: {code:java} do { retrievedAssignmentsSwitchReduce = metastorage.read(assignments.switch.reduce); calculatedAssignmetnsSwitchReduce = union(retrievedAssignmentsSwitchReduce.value, currentNode); if (retrievedAssignmentsSwitchReduce.isEmpty()) { invokeRes = metastoreInvoke: if empty(assignments.switch.reduce) assignments.switch.reduce = calculatedAssignmentsSwitchReduce } else { invokeRes = metastoreInvoke: eq(revision(assignments.switch.reduce), retrievedAssignmentsSwitchReduce.revision) assignments.switch.reduce = calculatedAssignmentsSwitchReduce } } while (!invokeRes);{code} h5. On assignments.switch.reduce change on corresponding partition leader Within watch listener on assignments.switch.reduce key on corresponding partition leader we trigger new rebalance if there are no pending one. {code:java} calculatedAssignments = substract(calcPartAssighments(), assignments.switch.reduce); metastoreInvoke: if empty(partition.assignments.change.trigger.revision) || partition.assignments.change.trigger.revision < event.revision if empty(assignments.pending) assignments.pending = calculatedAssignments partition.assignments.change.trigger.revision = event.revision {code} h5. On rebalance done changePeers() calles
[jira] [Updated] (IGNITE-16668) Design in-memory raft group reconfiguration on node failure
[ https://issues.apache.org/jira/browse/IGNITE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-16668: - Description: If a node storing a partition of an in-memory table fails and leaves the cluster all data it had is lost. From the point of view of the partition it looks like as the node is left forever. Although Raft protocol tolerates leaving some amount of nodes composing Raft group (partition); for in-memory caches we cannot restore replica factor because of in-memory nature of the table. It means that we need to detect failures of each node owning a partition and recalculate assignments for the table without keeping replica factor. h4. Upd 1: h4. Problem By design raft has several persisted segments, e.g. raft meta (term/committedIndex) and stable raft log. So, by converting common raft to in-memory one it’s possible to break some of it’s invariants. For example Node C could vote for Candidate A before self-restart and vote then for Candidate B after one. As a result two leaders will be elected which is illegal. !Screenshot from 2022-04-19 11-11-05.png! h4. Solution In order to solve the problem mentioned above it’s possible to remove and then return back the restarting node from the peers of the corresponding raft group. The peer-removal process should be finished before the restarting of the corresponding raft server node. !Screenshot from 2022-04-19 11-12-55.png! The process of removing and then returning back the restarting node is however itself tricky. And to answer why it’s non-trivial action, it’s necessary to reveal the main ideas of the rebalance protocol. Reconfiguration of the raft group - is a process driven by the fact of changing the assignments. 
Each partition has three corresponding sets of assignments stored in the metastore: # assignments.stable - current distribution # assignments.pending - partition distribution for an ongoing rebalance if any # assignments.planned - in some cases it’s not possible to cancel or merge pending rebalance with new one. In that case newly calculated assignments will be stored explicitly with corresponding assignments.planned key. It's worth noting that it doesn't make sense to keep more than one planned rebalance. Any new scheduled one will overwrite already existing. However such idea of overwriting the assignments.planned key wont work within the context of an in-memory raft restart, because it’s not valid to overwrite the reduction of assignments. Let's illustrate this problem with the following example. # In-memory partition p1 is hosted on nodes A, B and C, meaning that p1.assignments.stable=[A,B,C] # Let's say that the baseline was changed, resulting in a rebalance on assignments.pending=[A,B,C,*D*] # During the non-cancelable phase of [A,B,C]->[A,B,C,D], node C fails and returns back, meaning that we should plan [A,B,D] and [A,B,C,D] assignments. Both must be recorded in the only assignments.planned key meaning that [A,B,C,D] will overwrite reduction [A,B,D], so no actual raft reconfiguration will take place, which is not acceptable. In order to overcome given issue, let’s introduce two new keys _assignments.switch.reduce_ that will hold nodes that should be removed and _assignments.switch.append_ that will hold nodes that should be returned back and run following actions: h5. 
On in-memory partition restart (or on partition start with cleaned-up PDS) within retry loop add current node to assignments.switch.reduce set: {code:java} do { retrievedAssignmentsSwitchReduce = metastorage.read(assignments.switch.reduce); calculatedAssignmetnsSwitchReduce = union(retrievedAssignmentsSwitchReduce.value, currentNode); if (retrievedAssignmentsSwitchReduce.isEmpty()) { invokeRes = metastoreInvoke: if empty(assignments.switch.reduce) assignments.switch.reduce = calculatedAssignmentsSwitchReduce } else { invokeRes = metastoreInvoke: eq(revision(assignments.switch.reduce), retrievedAssignmentsSwitchReduce.revision) assignments.switch.reduce = calculatedAssignmentsSwitchReduce } } while (!invokeRes);{code} h5. On assignments.switch.reduce change on corresponding partition leader Within watch listener on assignments.switch.reduce key on corresponding partition leader we trigger new rebalance if there are no pending one. {code:java} calculatedAssignments = substract(calcPartAssighments(), assignments.switch.reduce); metastoreInvoke: if empty(partition.assignments.change.trigger.revision) || partition.assignments.change.trigger.revision < event.revision if empty(assignments.pending) assignments.pending = calculatedAssignments partition.assignments.change.trigger.revision = event.revision {code} h5. On rebalance done changePeers() calles
[jira] [Updated] (IGNITE-16668) Design in-memory raft group reconfiguration on node failure
[ https://issues.apache.org/jira/browse/IGNITE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-16668: - Description: If a node storing a partition of an in-memory table fails and leaves the cluster all data it had is lost. From the point of view of the partition it looks like as the node is left forever. Although Raft protocol tolerates leaving some amount of nodes composing Raft group (partition); for in-memory caches we cannot restore replica factor because of in-memory nature of the table. It means that we need to detect failures of each node owning a partition and recalculate assignments for the table without keeping replica factor. h4. Upd 1: h4. Problem By design raft has several persisted segments, e.g. raft meta (term/committedIndex) and stable raft log. So, by converting common raft to in-memory one it’s possible to break some of it’s invariants. For example Node C could vote for Candidate A before self-restart and vote then for Candidate B after one. As a result two leaders will be elected which is illegal. !Screenshot from 2022-04-19 11-11-05.png! h4. Solution In order to solve the problem mentioned above it’s possible to remove and then return back the restarting node from the peers of the corresponding raft group. The peer-removal process should be finished before the restarting of the corresponding raft server node. !Screenshot from 2022-04-19 11-12-55.png! The process of removing and then returning back the restarting node is however itself tricky. And to answer why it’s non-trivial action, it’s necessary to reveal the main ideas of the rebalance protocol. Reconfiguration of the raft group - is a process driven by the fact of changing the assignments. 
Each partition has three corresponding sets of assignments stored in the metastore:
# assignments.stable - current distribution
# assignments.pending - partition distribution for an ongoing rebalance, if any
# assignments.planned - in some cases it’s not possible to cancel or merge a pending rebalance with a new one. In that case newly calculated assignments will be stored explicitly under the corresponding assignments.planned key. It's worth noting that it doesn't make sense to keep more than one planned rebalance. Any newly scheduled one will overwrite the existing one.

However, this idea of overwriting the assignments.planned key won't work within the context of an in-memory raft restart, because it’s not valid to overwrite a reduction of assignments. Let's illustrate this problem with the following example.
# In-memory partition p1 is hosted on nodes A, B and C, meaning that p1.assignments.stable=[A,B,C]
# Let's say that the baseline was changed, resulting in a rebalance on assignments.pending=[A,B,C,*D*]
# During the non-cancelable phase of [A,B,C]->[A,B,C,D], node C fails and returns, meaning that we should plan the [A,B,D] and [A,B,C,D] assignments. Both must be recorded in the single assignments.planned key, meaning that [A,B,C,D] will overwrite the reduction [A,B,D], so no actual raft reconfiguration will take place, which is not acceptable.

To overcome this issue, let’s introduce two new keys: _assignments.switch.reduce_, which will hold nodes that should be removed, and _assignments.switch.append_, which will hold nodes that should be returned, and run the following actions:
h5. 
On in-memory partition restart (or on partition start with cleaned-up PDS)
Within a retry loop, add the current node to the assignments.switch.reduce set:
{code:java}
do {
    retrievedAssignmentsSwitchReduce = metastorage.read(assignments.switch.reduce);
    calculatedAssignmentsSwitchReduce = union(retrievedAssignmentsSwitchReduce.value, currentNode);
    if (retrievedAssignmentsSwitchReduce.isEmpty()) {
        // Key not written yet: create it only if it is still absent.
        invokeRes = metastoreInvoke:
            if empty(assignments.switch.reduce)
                assignments.switch.reduce = calculatedAssignmentsSwitchReduce
    } else {
        // Key exists: overwrite it only if its revision has not changed since the read.
        invokeRes = metastoreInvoke:
            if eq(revision(assignments.switch.reduce), retrievedAssignmentsSwitchReduce.revision)
                assignments.switch.reduce = calculatedAssignmentsSwitchReduce
    }
} while (!invokeRes);
{code}
h5. On assignments.switch.reduce change on corresponding partition leader
Within the watch listener on the assignments.switch.reduce key, on the corresponding partition leader, we trigger a new rebalance if there is no pending one.
{code:java}
calculatedAssignments = subtract(calcPartAssignments(), assignments.switch.reduce);
metastoreInvoke:
    if empty(partition.assignments.change.trigger.revision) || partition.assignments.change.trigger.revision < event.revision
        if empty(assignments.pending)
            assignments.pending = calculatedAssignments
            partition.assignments.change.trigger.revision = event.revision
{code}
h5. On rebalance done changePeers() calles
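
The retry loop from the "On in-memory partition restart" step above can be sketched in plain Java as a read, union, conditional-write cycle. This is a minimal sketch under stated assumptions, not the Ignite metastorage API: {{MetaStore}}, {{Entry}}, {{putIfRevision}} and the key name are hypothetical stand-ins. It also assumes that an absent key can be modelled as revision 0, which folds the "empty" and "revision equals" branches of the pseudocode into one conditional write.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SwitchReduceSketch {
    /** Value plus the revision at which it was written; revision 0 means "absent". */
    public static final class Entry {
        public final Set<String> value;
        public final long revision;

        Entry(Set<String> value, long revision) {
            this.value = value;
            this.revision = revision;
        }

        public boolean isEmpty() {
            return revision == 0;
        }
    }

    /** Hypothetical in-memory stand-in for the metastore (not the Ignite API). */
    public static final class MetaStore {
        private final Map<String, Entry> data = new HashMap<>();
        private long revision;

        public synchronized Entry read(String key) {
            Entry e = data.get(key);
            return e != null ? e : new Entry(Collections.<String>emptySet(), 0);
        }

        /** Writes newValue only if the key is still at expectedRevision. */
        public synchronized boolean putIfRevision(String key, long expectedRevision, Set<String> newValue) {
            if (read(key).revision != expectedRevision) {
                return false; // someone else updated the key after our read
            }
            data.put(key, new Entry(Collections.unmodifiableSet(new HashSet<>(newValue)), ++revision));
            return true;
        }
    }

    /** Adds currentNode to the set stored under key, retrying until the conditional write succeeds. */
    public static void addToSwitchReduce(MetaStore store, String key, String currentNode) {
        boolean applied;
        do {
            Entry retrieved = store.read(key);
            Set<String> calculated = new HashSet<>(retrieved.value);
            calculated.add(currentNode); // union(retrieved.value, currentNode)
            applied = store.putIfRevision(key, retrieved.revision, calculated);
        } while (!applied);
    }
}
```

Because every write is conditional on the revision observed by the preceding read, two nodes restarting concurrently cannot overwrite each other's additions: the slower writer fails the check, re-reads the set and retries.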