[jira] [Commented] (IGNITE-18875) Sql. Drop AbstractPlannerTest.TestTable.
[ https://issues.apache.org/jira/browse/IGNITE-18875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709542#comment-17709542 ] Gael Yimen Yimga commented on IGNITE-18875: --- [~zstan] Could you please take a look at the PR [1] again? I made a little progress. [1] [https://github.com/apache/ignite-3/pull/1873] > Sql. Drop AbstractPlannerTest.TestTable. > > > Key: IGNITE-18875 > URL: https://issues.apache.org/jira/browse/IGNITE-18875 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Andrey Mashenkov >Assignee: Gael Yimen Yimga >Priority: Major > Labels: ignite-3, newbie, tech-debt-test > Fix For: 3.0.0-beta2 > > Attachments: Screen Shot 2023-04-03 at 1.04.39 AM.png > > Time Spent: 50m > Remaining Estimate: 0h > > Use the test framework for schema configuration in tests. > Replace > {code:java} > org.apache.ignite.internal.sql.engine.planner.AbstractPlannerTest.TestTable > {code} > with > {code:java} > org.apache.ignite.internal.sql.engine.framework.TestTable > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19252) The incremental snapshot restore operation fails if there is a node not from the baseline.
[ https://issues.apache.org/jira/browse/IGNITE-19252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev updated IGNITE-19252: - Ignite Flags: (was: Release Notes Required) > The incremental snapshot restore operation fails if there is a node not from > the baseline. > -- > > Key: IGNITE-19252 > URL: https://issues.apache.org/jira/browse/IGNITE-19252 > Project: Ignite > Issue Type: Bug >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > Labels: ise > Fix For: 2.15 > > Time Spent: 10m > Remaining Estimate: 0h > > The incremental snapshot restore operation fails if there is a node not from > the baseline: > {noformat} > 21:20:40.324 [disco-notifier-worker-#147%server-1%] ERROR > org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess > - Failed to restore snapshot cache groups > [reqId=55eead09-4da7-4232-8e98-976dba117d91]. > org.apache.ignite.IgniteCheckedException: Snapshot metafile cannot be read > due to it doesn't exist: > /work/snapshots/snp1/increments/0001/server_3.smf > at > org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.readFromFile(IgniteSnapshotManager.java:2001) > ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] > at > org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.readIncrementalSnapshotMetadata(IgniteSnapshotManager.java:1098) > ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] > at > org.apache.ignite.internal.processors.cache.persistence.snapshot.IncrementalSnapshotProcessor.process(IncrementalSnapshotProcessor.java:94) > ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] > at > org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.restoreIncrementalSnapshot(SnapshotRestoreProcess.java:1466) > ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] > at > 
org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.lambda$incrementalSnapshotRestore$35(SnapshotRestoreProcess.java:1417) > ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] > at > org.apache.ignite.internal.processors.security.thread.SecurityAwareRunnable.run(SecurityAwareRunnable.java:51) > ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_201] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_201] > at > org.apache.ignite.internal.processors.security.thread.SecurityAwareRunnable.run(SecurityAwareRunnable.java:51) > ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[?:1.8.0_201] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > ~[?:1.8.0_201] > at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19252) The incremental snapshot restore operation fails if there is a node not from the baseline.
Nikita Amelchev created IGNITE-19252: Summary: The incremental snapshot restore operation fails if there is a node not from the baseline. Key: IGNITE-19252 URL: https://issues.apache.org/jira/browse/IGNITE-19252 Project: Ignite Issue Type: Bug Reporter: Nikita Amelchev Assignee: Nikita Amelchev Fix For: 2.15 The incremental snapshot restore operation fails if there is a node not from the baseline: {noformat} 21:20:40.324 [disco-notifier-worker-#147%server-1%] ERROR org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess - Failed to restore snapshot cache groups [reqId=55eead09-4da7-4232-8e98-976dba117d91]. org.apache.ignite.IgniteCheckedException: Snapshot metafile cannot be read due to it doesn't exist: /work/snapshots/snp1/increments/0001/server_3.smf at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.readFromFile(IgniteSnapshotManager.java:2001) ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.readIncrementalSnapshotMetadata(IgniteSnapshotManager.java:1098) ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] at org.apache.ignite.internal.processors.cache.persistence.snapshot.IncrementalSnapshotProcessor.process(IncrementalSnapshotProcessor.java:94) ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] at org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.restoreIncrementalSnapshot(SnapshotRestoreProcess.java:1466) ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] at org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.lambda$incrementalSnapshotRestore$35(SnapshotRestoreProcess.java:1417) ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] at org.apache.ignite.internal.processors.security.thread.SecurityAwareRunnable.run(SecurityAwareRunnable.java:51) ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_201] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_201] at org.apache.ignite.internal.processors.security.thread.SecurityAwareRunnable.run(SecurityAwareRunnable.java:51) ~[ignite-core-15.0.0-SNAPSHOT.jar:15.0.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_201] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_201] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201] {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19238) ItDataTypesTest and ItCreateTableDdlTest are flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Summary: ItDataTypesTest and ItCreateTableDdlTest are flaky (was: ItDataTypesTest and is flaky) > ItDataTypesTest and ItCreateTableDdlTest are flaky > -- > > Key: IGNITE-19238 > URL: https://issues.apache.org/jira/browse/IGNITE-19238 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Lapin >Assignee: Alexander Lapin >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Attachments: Снимок экрана от 2023-04-06 10-39-32.png > > Time Spent: 20m > Remaining Estimate: 0h > > h3. Description & Root cause > 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests > failed to stop replicas on node stop: > !Снимок экрана от 2023-04-06 10-39-32.png! > {code:java} > java.lang.AssertionError: There are replicas alive > [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] > at > org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) > at > org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) > at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) > at > org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) > at > org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} > 2. The reason why we failed to stop replicas is the race between > tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. > On TableManager stop, we stop and cleanup all table resources like replicas > and raft nodes > {code:java} > public void stop() { > ... 
> Map tables = tablesByIdVv.latest(); // 1* > cleanUpTablesResources(tables); > cleanUpTablesResources(tablesToStopInCaseOfError); > ... > }{code} > where tablesToStopInCaseOfError is a sort of pending-tables list, which is > cleared on configuration storage revision update. > tablesByIdVv *listens to the same storage revision update event* in order to publish > tables related to the given revision, or, in other words, to make such tables > accessible from tablesByIdVv.latest(); the one that is used to > retrieve tables for cleanup on component stop (see // 1* above) > {code:java} > public TableManager( > ... > tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); > registry.accept(token -> { > tablesToStopInCaseOfError.clear(); > > return completedFuture(null); > }); > {code} > However, inside IncrementalVersionedValue the storageRevision update is > processed asynchronously > {code:java} > updaterFuture = updaterFuture.whenComplete((v, t) -> > versionedValue.complete(causalityToken, localUpdaterFuture)); {code} > As a result, it's possible that we will clear tablesToStopInCaseOfError before > publishing the same revision's tables to tablesByIdVv, so we will miss those > cleared tables in tablesByIdVv.latest(), which is used in TableManager#stop. > h3. Implementation Notes > 1. First of all, I've renamed tablesToStopInCaseOfError to pendingTables, > because they aren't only ...InCaseOfError. > 2. I've also reworked the tablesToStopInCaseOfError cleanup by substituting > tablesToStopInCaseOfError.clear on revision change with > {code:java} > tablesByIdVv.get(causalityToken).thenAccept(ignored -> inBusyLock(busyLock, > ()-> { > pendingTables.remove(tblId); > })); {code} > meaning that we > 2.1. remove a specific table by id instead of clearing the whole map. > 2.2. do that removal when the corresponding table is published within tablesByIdVv. > 3. 
That means that, at some point right after the publishing but before the > removal, it's possible to have the same table both within tablesByIdVv and > pendingTables. Thus, in order not to stop the same table twice (which is safe, by > the way, because of idempotence), I've substituted > {code:java} > cleanUpTablesResources(tables); > cleanUpTablesResources(tablesToStopInCaseOfError); {code} > with > {code:java} > Map tablesToStop = > Stream.concat(tablesByIdVv.latest().entrySet().stream(), > pendingTables.entrySet().stream()). > collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (v1, > v2) -> v1)); > cleanUpTablesResources(tablesToStop); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
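The merge step in the fix above can be illustrated with a small, self-contained sketch. The class name, map contents, and table ids are hypothetical stand-ins for tablesByIdVv.latest() and pendingTables; only the Stream.concat/Collectors.toMap pattern with the (v1, v2) -> v1 merge function mirrors the snippet from the ticket:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeStopSetSketch {
    /** Merge published and pending tables; on a key collision keep the first value, so each table is stopped once. */
    static Map<Integer, String> tablesToStop(Map<Integer, String> published, Map<Integer, String> pending) {
        return Stream.concat(published.entrySet().stream(), pending.entrySet().stream())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (v1, v2) -> v1));
    }

    public static void main(String[] args) {
        // Stand-in for tablesByIdVv.latest(): tables already published for the revision.
        Map<Integer, String> published = new HashMap<>();
        published.put(1, "tableA");
        published.put(2, "tableB");

        // Stand-in for pendingTables: during the race window the same table
        // can appear in both maps right after publishing but before removal.
        Map<Integer, String> pending = new HashMap<>();
        pending.put(2, "tableB");
        pending.put(3, "tableC");

        // Each table id appears exactly once in the merged map.
        System.out.println(tablesToStop(published, pending).size()); // 3
    }
}
```

The merge function makes duplicate keys harmless, which is why the double cleanup described in the ticket becomes a single cleanUpTablesResources call.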
[jira] [Updated] (IGNITE-19238) ItDataTypesTest and is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Summary: ItDataTypesTest and is flaky (was: ItDataTypesTest is flaky) > ItDataTypesTest and is flaky > > > Key: IGNITE-19238 > URL: https://issues.apache.org/jira/browse/IGNITE-19238 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Lapin >Assignee: Alexander Lapin >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Attachments: Снимок экрана от 2023-04-06 10-39-32.png > > Time Spent: 20m > Remaining Estimate: 0h > > h3. Description & Root cause > 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests > failed to stop replicas on node stop: > !Снимок экрана от 2023-04-06 10-39-32.png! > {code:java} > java.lang.AssertionError: There are replicas alive > [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] > at > org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) > at > org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) > at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) > at > org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) > at > org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} > 2. The reason why we failed to stop replicas is the race between > tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. > On TableManager stop, we stop and cleanup all table resources like replicas > and raft nodes > {code:java} > public void stop() { > ... 
> Map tables = tablesByIdVv.latest(); // 1* > cleanUpTablesResources(tables); > cleanUpTablesResources(tablesToStopInCaseOfError); > ... > }{code} > where tablesToStopInCaseOfError is a sort of pending-tables list, which is > cleared on configuration storage revision update. > tablesByIdVv *listens to the same storage revision update event* in order to publish > tables related to the given revision, or, in other words, to make such tables > accessible from tablesByIdVv.latest(); the one that is used to > retrieve tables for cleanup on component stop (see // 1* above) > {code:java} > public TableManager( > ... > tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); > registry.accept(token -> { > tablesToStopInCaseOfError.clear(); > > return completedFuture(null); > }); > {code} > However, inside IncrementalVersionedValue the storageRevision update is > processed asynchronously > {code:java} > updaterFuture = updaterFuture.whenComplete((v, t) -> > versionedValue.complete(causalityToken, localUpdaterFuture)); {code} > As a result, it's possible that we will clear tablesToStopInCaseOfError before > publishing the same revision's tables to tablesByIdVv, so we will miss those > cleared tables in tablesByIdVv.latest(), which is used in TableManager#stop. > h3. Implementation Notes > 1. First of all, I've renamed tablesToStopInCaseOfError to pendingTables, > because they aren't only ...InCaseOfError. > 2. I've also reworked the tablesToStopInCaseOfError cleanup by substituting > tablesToStopInCaseOfError.clear on revision change with > {code:java} > tablesByIdVv.get(causalityToken).thenAccept(ignored -> inBusyLock(busyLock, > ()-> { > pendingTables.remove(tblId); > })); {code} > meaning that we > 2.1. remove a specific table by id instead of clearing the whole map. > 2.2. do that removal when the corresponding table is published within tablesByIdVv. > 3. 
That means that, at some point right after the publishing but before the > removal, it's possible to have the same table both within tablesByIdVv and > pendingTables. Thus, in order not to stop the same table twice (which is safe, by > the way, because of idempotence), I've substituted > {code:java} > cleanUpTablesResources(tables); > cleanUpTablesResources(tablesToStopInCaseOfError); {code} > with > {code:java} > Map tablesToStop = > Stream.concat(tablesByIdVv.latest().entrySet().stream(), > pendingTables.entrySet().stream()). > collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (v1, > v2) -> v1)); > cleanUpTablesResources(tablesToStop); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19238) ItDataTypesTest is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Description: h3. Description & Root cause 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! {code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is the race between tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. On TableManager stop, we stop and clean up all table resources like replicas and raft nodes {code:java} public void stop() { ... Map tables = tablesByIdVv.latest(); // 1* cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); ... }{code} where tablesToStopInCaseOfError is a sort of pending-tables list, which is cleared on configuration storage revision update. tablesByIdVv *listens to the same storage revision update event* in order to publish tables related to the given revision, or, in other words, to make such tables accessible from tablesByIdVv.latest(); the one that is used to retrieve tables for cleanup on component stop (see // 1* above) {code:java} public TableManager( ... 
tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); registry.accept(token -> { tablesToStopInCaseOfError.clear(); return completedFuture(null); }); {code} However, inside IncrementalVersionedValue the storageRevision update is processed asynchronously {code:java} updaterFuture = updaterFuture.whenComplete((v, t) -> versionedValue.complete(causalityToken, localUpdaterFuture)); {code} As a result, it's possible that we will clear tablesToStopInCaseOfError before publishing the same revision's tables to tablesByIdVv, so we will miss those cleared tables in tablesByIdVv.latest(), which is used in TableManager#stop. h3. Implementation Notes 1. First of all, I've renamed tablesToStopInCaseOfError to pendingTables, because they aren't only ...InCaseOfError. 2. I've also reworked the tablesToStopInCaseOfError cleanup by substituting tablesToStopInCaseOfError.clear on revision change with {code:java} tablesByIdVv.get(causalityToken).thenAccept(ignored -> inBusyLock(busyLock, ()-> { pendingTables.remove(tblId); })); {code} meaning that we 2.1. remove a specific table by id instead of clearing the whole map. 2.2. do that removal when the corresponding table is published within tablesByIdVv. 3. That means that, at some point right after the publishing but before the removal, it's possible to have the same table both within tablesByIdVv and pendingTables. Thus, in order not to stop the same table twice (which is safe, by the way, because of idempotence), I've substituted {code:java} cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); {code} with {code:java} Map tablesToStop = Stream.concat(tablesByIdVv.latest().entrySet().stream(), pendingTables.entrySet().stream()). collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (v1, v2) -> v1)); cleanUpTablesResources(tablesToStop); {code} was: h3. Description & Root cause 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! 
{code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why
[jira] [Updated] (IGNITE-19231) Change thread pool for metastore raft group
[ https://issues.apache.org/jira/browse/IGNITE-19231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-19231: - Reviewer: Roman Puchkovskiy > Change thread pool for metastore raft group > --- > > Key: IGNITE-19231 > URL: https://issues.apache.org/jira/browse/IGNITE-19231 > Project: Ignite > Issue Type: Bug >Reporter: Kirill Tkalenko >Assignee: Kirill Tkalenko >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 10m > Remaining Estimate: 0h > > It was discovered that a common thread pool is used for the raft groups of both > the metastorage and the partitions, which can lead to deadlocks. The metastorage > needs its own thread pool. -- This message was sent by Atlassian Jira (v8.20.10#820010)
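Why a shared pool can deadlock: if a task running on the pool blocks on work submitted to the same saturated pool, the dependent work can never start. A minimal, self-contained sketch (class and pool names are illustrative stand-ins, not Ignite internals) shows how giving the dependent work its own pool, as the ticket does for the metastorage raft group, breaks the cycle:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolIsolationSketch {
    /**
     * "Partition" work that blocks on a "metastorage" operation. If both tasks
     * shared one single-thread pool, the inner task could never start and the
     * outer get() would hang forever: a deadlock. Separate pools break the cycle.
     */
    static String runWithSeparatePools() throws Exception {
        ExecutorService metastorePool = Executors.newSingleThreadExecutor();
        ExecutorService partitionPool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = partitionPool.submit(() -> {
                // Submitted to a DIFFERENT pool, so a worker is free to run it.
                Future<String> meta = metastorePool.submit(() -> "meta-ok");
                return meta.get(5, TimeUnit.SECONDS);
            });
            return result.get(5, TimeUnit.SECONDS);
        } finally {
            metastorePool.shutdownNow();
            partitionPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithSeparatePools()); // meta-ok
    }
}
```

Replacing metastorePool with partitionPool in the inner submit reproduces the hang the ticket describes: the only worker thread is occupied waiting for a task that can never be scheduled.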
[jira] [Updated] (IGNITE-19238) ItDataTypesTest is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Description: h3. Description & Root cause 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! {code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is the race between tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. On TableManager stop, we stop and clean up all table resources like replicas and raft nodes {code:java} public void stop() { ... Map tables = tablesByIdVv.latest(); // 1* cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); ... }{code} where tablesToStopInCaseOfError is a sort of pending-tables list, which is cleared on configuration storage revision update. tablesByIdVv *listens to the same storage revision update event* in order to publish tables related to the given revision, or, in other words, to make such tables accessible from tablesByIdVv.latest(); the one that is used to retrieve tables for cleanup on component stop (see // 1* above) {code:java} public TableManager( ... 
tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); registry.accept(token -> { tablesToStopInCaseOfError.clear(); return completedFuture(null); }); {code} However, inside IncrementalVersionedValue the storageRevision update is processed asynchronously {code:java} updaterFuture = updaterFuture.whenComplete((v, t) -> versionedValue.complete(causalityToken, localUpdaterFuture)); {code} As a result, it's possible that we will clear tablesToStopInCaseOfError before publishing the same revision's tables to tablesByIdVv, so we will miss those cleared tables in tablesByIdVv.latest(), which is used in TableManager#stop. h3. Implementation Notes 1. First of all, I've renamed tablesToStopInCaseOfError to pendingTables, because they aren't only ...InCaseOfError. 2. I've also reworked the tablesToStopInCaseOfError cleanup by substituting tablesToStopInCaseOfError.clear on revision change with {code:java} tablesByIdVv.get(causalityToken).thenAccept(ignored -> inBusyLock(busyLock, ()-> { pendingTables.remove(tblId); })); {code} meaning that we 2.1. remove a specific table by id instead of clearing the whole map. 2.2. do that removal when the corresponding table is published within tablesByIdVv. 3. That means that, at some point right after the publishing but before the removal, it's possible to have the same table both within tablesByIdVv and pendingTables. Thus, in order not to stop the same table twice (which is safe, by the way, because of idempotence), I've substituted {code:java} cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); {code} with {code:java} Stream tablesToStop = Stream.concat(tablesByIdVv.latest().entrySet().stream(), pendingTables.entrySet().stream()).distinct(). map(Map.Entry::getValue); cleanUpTablesResources(tablesToStop); {code} was: h3. Description & Root cause 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! 
{code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is
[jira] [Updated] (IGNITE-19116) Sql. UPDATE statement fails with NPE when table does not exist
[ https://issues.apache.org/jira/browse/IGNITE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-19116: -- Fix Version/s: 3.0.0-beta2 > Sql. UPDATE statement fails with NPE when table does not exist > -- > > Key: IGNITE-19116 > URL: https://issues.apache.org/jira/browse/IGNITE-19116 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 3.0.0-beta2 >Reporter: Maksim Zhuravkov >Assignee: Pavel Pereslegin >Priority: Minor > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > UPDATE statement fails with NPE when the table does not exist. > {code:java} > @Test > public void test() { >sql("UPDATE unknown SET j = j + 1"); > } > {code} > Error: > {code:java} > java.lang.NullPointerException > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.createSourceSelectForUpdate(IgniteSqlValidator.java:175) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1476) > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.performUnconditionalRewrites(IgniteSqlValidator.java:383) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1046) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:759) > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.validate(IgniteSqlValidator.java:135) > at > org.apache.ignite.internal.sql.engine.prepare.IgnitePlanner.validate(IgnitePlanner.java:189) > {code} > *Expected behavior* > It should throw an objectNotFound error: > {code:java} > Object 'UNKNOWN' not found > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
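The expected behavior amounts to an explicit existence check that turns a missing catalog entry into a descriptive error instead of letting a later null dereference throw an NPE. A minimal sketch (the requireTable helper and the catalog map are hypothetical illustrations, not the actual IgniteSqlValidator fix):

```java
import java.util.Map;

public class TableLookupSketch {
    /**
     * Hypothetical catalog lookup: fail fast with a descriptive message when
     * the table is absent, mirroring the "Object 'UNKNOWN' not found" behavior
     * the ticket expects, rather than returning null to the caller.
     */
    static String requireTable(Map<String, String> catalog, String name) {
        String table = catalog.get(name.toUpperCase());
        if (table == null) {
            throw new IllegalArgumentException("Object '" + name.toUpperCase() + "' not found");
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> catalog = Map.of("EMP", "empTable");
        try {
            requireTable(catalog, "unknown");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Object 'UNKNOWN' not found
        }
    }
}
```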
[jira] [Updated] (IGNITE-19116) Sql. UPDATE statement fails with NPE when table does not exist
[ https://issues.apache.org/jira/browse/IGNITE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-19116: -- Ignite Flags: (was: Docs Required,Release Notes Required) > Sql. UPDATE statement fails with NPE when table does not exist > -- > > Key: IGNITE-19116 > URL: https://issues.apache.org/jira/browse/IGNITE-19116 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 3.0.0-beta2 >Reporter: Maksim Zhuravkov >Assignee: Pavel Pereslegin >Priority: Minor > Labels: ignite-3 > > UPDATE statement fails with NPE when the table does not exist. > {code:java} > @Test > public void test() { >sql("UPDATE unknown SET j = j + 1"); > } > {code} > Error: > {code:java} > java.lang.NullPointerException > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.createSourceSelectForUpdate(IgniteSqlValidator.java:175) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1476) > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.performUnconditionalRewrites(IgniteSqlValidator.java:383) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1046) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:759) > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.validate(IgniteSqlValidator.java:135) > at > org.apache.ignite.internal.sql.engine.prepare.IgnitePlanner.validate(IgnitePlanner.java:189) > {code} > *Expected behavior* > It should throw an objectNotFound error: > {code:java} > Object 'UNKNOWN' not found > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-18454) Explain threading model in corresponding README.md file for TableManager
[ https://issues.apache.org/jira/browse/IGNITE-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vyacheslav Koptilin reassigned IGNITE-18454: Assignee: Denis Chudov > Explain threading model in corresponding README.md file for TableManager > --- > > Key: IGNITE-18454 > URL: https://issues.apache.org/jira/browse/IGNITE-18454 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Assignee: Denis Chudov >Priority: Major > Labels: ignite-3 > > Use ignite-3/modules/raft/README.md as a reference for the threading-model explanation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-18461) Add fish-like suggestions to CLI
[ https://issues.apache.org/jira/browse/IGNITE-18461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709426#comment-17709426 ] Aleksandr commented on IGNITE-18461: merged into main: c46a971ecbc4493fdcc30f1f95f349831d1da5be > Add fish-like suggestions to CLI > > > Key: IGNITE-18461 > URL: https://issues.apache.org/jira/browse/IGNITE-18461 > Project: Ignite > Issue Type: Task > Components: cli >Reporter: Aleksandr >Assignee: Aleksandr >Priority: Major > Labels: ignite-3, ignite-3-cli-tool > Time Spent: 20m > Remaining Estimate: 0h > > We can add fish-like autosuggestions of typed text > https://github.com/jline/jline3/wiki/Autosuggestions > The user should be able to switch off such behavior. I suggest doing it via > CLI profile but maybe there is a better way. -- This message was sent by Atlassian Jira (v8.20.10#820010)
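The essence of the fish-like autosuggestion above (which JLine3's Autosuggestions feature renders greyed-out in a real terminal) is proposing the most recent history entry that extends the typed prefix. A stand-alone sketch of that matching rule (the suggest helper and sample CLI commands are hypothetical, not the ignite CLI implementation):

```java
import java.util.List;

public class AutosuggestSketch {
    /**
     * Return the most recent history entry that starts with the typed prefix
     * and would extend it, or null when nothing matches. This is the core of
     * fish-style autosuggestions: history is scanned newest-first.
     */
    static String suggest(List<String> history, String typed) {
        for (int i = history.size() - 1; i >= 0; i--) {
            String entry = history.get(i);
            if (entry.startsWith(typed) && !entry.equals(typed)) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> history = List.of("cluster init", "cluster status", "node start");
        System.out.println(suggest(history, "cluster ")); // cluster status
    }
}
```

Switching the behavior off, as the ticket suggests via a CLI profile setting, would simply bypass this lookup.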
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Description: There may be error messages about checkpoint read lock acquisition timeouts and critical thread blocking during the snapshot restore process (just after caches start): {quote} [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]\-#23%node%\-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour \[workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, {color:red}blockedFor=100s{color}] {quote} There is also an active exchange process, which finishes with timings like the following (the timing will be approximately equal to the blocking time of the threads): {quote} [2023-04-06T10:55:52,211][INFO ]\[exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" ({color:red}100163 ms{color}), ..., stage="Total time" ({color:red}100334 ms{color})] {quote} As I understand, such errors do not affect restoring, but they can be confusing. How to reproduce: # Set the checkpoint frequency to less than the failure detection timeout. # Ensure that restoring of cache group partition states takes longer than the failure detection timeout, i.e. this is relevant for sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] was: There may be possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start): {quote} [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]\-#23%node%\-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour \[workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, {color:red}blockedFor=100s{color}] {quote} Also there are active exchange process, which finishes with such timings (timing will be approximatelly equal to blocking time of threads): {quote} [2023-04-06T10:55:52,211][INFO ]\[exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" ({color:red}100163 ms{color}), ..., stage="Total time" ({color:red}100334 ms{color})] {quote} Is I understand, such errors do not affect restoring, but such error messages can confuse. How to reproduce: # Set checkpoint frequency less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > There may be possible error messages about checkpoint read lock acquisition > timeouts and critical threads blocking during snapshot restore process (just > after caches start): > {quote} > [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] > Checkpoint read lock acquisition has been timed out. > {quote} > {quote} > [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]\-#23%node%\-#446%node%][G] > Blocked system-critical thread has been detected. This can lead to > cluster-wide undefined behaviour \[workerName=db-checkpoint-thread, > threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, > {color:red}blockedFor=100s{color}] > {quote} > Also there are active exchange process, which finishes with such timings > (timing will be approximatelly equal to blocking time of threads): > {quote} > [2023-04-06T10:55:52,211][INFO >
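The reproduction steps above hinge on a single relationship: the checkpoint frequency must be shorter than the failure detection timeout, so that a checkpoint falls inside the long "Restore partition states" stage. In Ignite 2 configuration terms that could look like the following sketch; the concrete values are illustrative only:

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Checkpoint every 3 s while the failure detection timeout is 10 s, so a
// partition-states restore lasting longer than 10 s can trigger the
// blocked-thread and read-lock-timeout messages described in the ticket.
IgniteConfiguration cfg = new IgniteConfiguration()
    .setFailureDetectionTimeout(10_000L)
    .setDataStorageConfiguration(new DataStorageConfiguration()
        .setCheckpointFrequency(3_000L));
```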
[jira] [Updated] (IGNITE-19211) ODBC 3.0: Align metainfo provided by driver with SQL engine in 3.0
[ https://issues.apache.org/jira/browse/IGNITE-19211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19211: - Epic Link: IGNITE-19250 (was: IGNITE-19131) > ODBC 3.0: Align metainfo provided by driver with SQL engine in 3.0 > -- > > Key: IGNITE-19211 > URL: https://issues.apache.org/jira/browse/IGNITE-19211 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Scope: > - Make sure we return proper metainformation on SQL types. Check > ignite/odbc/meta, ignite/odbc/type_traits.h, etc; > - Port tests that are applicable; > - Add new tests where needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19208) ODBC 3.0: Port msi builder scripts properly
[ https://issues.apache.org/jira/browse/IGNITE-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19208: - Epic Link: IGNITE-19251 (was: IGNITE-19131) > ODBC 3.0: Port msi builder scripts properly > --- > > Key: IGNITE-19208 > URL: https://issues.apache.org/jira/browse/IGNITE-19208 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > To do: > Make sure CMake flag ENABLE_ODBC_MSI works properly; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19210) ODBC 3.0: Make sure DSN-managing UI works properly in Windows
[ https://issues.apache.org/jira/browse/IGNITE-19210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19210: - Epic Link: IGNITE-19251 (was: IGNITE-19131) > ODBC 3.0: Make sure DSN-managing UI works properly in Windows > - > > Key: IGNITE-19210 > URL: https://issues.apache.org/jira/browse/IGNITE-19210 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Scope: > - Properly port content of ignite/odbc/system; > - Probably, come up with some kind of automatic tests for this functionality, > as it's always hard to make sure that UI is not broken. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19215) ODBC 3.0: Implement DML data batching
[ https://issues.apache.org/jira/browse/IGNITE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19215: - Epic Link: IGNITE-19251 (was: IGNITE-19131) > ODBC 3.0: Implement DML data batching > - > > Key: IGNITE-19215 > URL: https://issues.apache.org/jira/browse/IGNITE-19215 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Scope: > - Implement server side request handling; > - Port client side functionality; > - Port applicable tests; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19251) ODBC 3.0 Enhancements
Igor Sapego created IGNITE-19251: Summary: ODBC 3.0 Enhancements Key: IGNITE-19251 URL: https://issues.apache.org/jira/browse/IGNITE-19251 Project: Ignite Issue Type: Epic Components: odbc Reporter: Igor Sapego Assignee: Igor Sapego Enhancements for the Ignite 3 ODBC driver -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19131) ODBC 3.0 Basic functionality
[ https://issues.apache.org/jira/browse/IGNITE-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19131: - Description: We need to implement basic ODBC driver for Ignite 3. (was: We need to implement ODBC driver for Ignite 3.) > ODBC 3.0 Basic functionality > > > Key: IGNITE-19131 > URL: https://issues.apache.org/jira/browse/IGNITE-19131 > Project: Ignite > Issue Type: Epic > Components: odbc >Reporter: Igor Sapego >Assignee: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > We need to implement basic ODBC driver for Ignite 3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19218) ODBC 3.0: Implement special columns query
[ https://issues.apache.org/jira/browse/IGNITE-19218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19218: - Epic Link: IGNITE-19250 (was: IGNITE-19131) > ODBC 3.0: Implement special columns query > - > > Key: IGNITE-19218 > URL: https://issues.apache.org/jira/browse/IGNITE-19218 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Probably should just port dummy functionality and tests from Ignite 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19214) ODBC 3.0: Implement table metadata fetching
[ https://issues.apache.org/jira/browse/IGNITE-19214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19214: - Epic Link: IGNITE-19250 (was: IGNITE-19131) > ODBC 3.0: Implement table metadata fetching > --- > > Key: IGNITE-19214 > URL: https://issues.apache.org/jira/browse/IGNITE-19214 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Scope: > - Implement server side request handling; > - Implement client side metadata handling; > - Port applicable tests; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19217) ODBC 3.0: Implement foreign keys query
[ https://issues.apache.org/jira/browse/IGNITE-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19217: - Epic Link: IGNITE-19250 (was: IGNITE-19131) > ODBC 3.0: Implement foreign keys query > -- > > Key: IGNITE-19217 > URL: https://issues.apache.org/jira/browse/IGNITE-19217 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > As we do not support them natively, probably should just port dummy > functionality and tests from Ignite 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19131) ODBC 3.0 Basic functionality
[ https://issues.apache.org/jira/browse/IGNITE-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19131: - Epic Name: ODBC 3.0 Basic functionality (was: ODBC 3.0) > ODBC 3.0 Basic functionality > > > Key: IGNITE-19131 > URL: https://issues.apache.org/jira/browse/IGNITE-19131 > Project: Ignite > Issue Type: Epic > Components: odbc >Reporter: Igor Sapego >Assignee: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > We need to implement ODBC driver for Ignite 3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19216) ODBC 3.0: implement type info fetching
[ https://issues.apache.org/jira/browse/IGNITE-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19216: - Epic Link: IGNITE-19250 (was: IGNITE-19131) > ODBC 3.0: implement type info fetching > -- > > Key: IGNITE-19216 > URL: https://issues.apache.org/jira/browse/IGNITE-19216 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Scope: > - Decide whether we need to implement type info fetching from server or > whether we can implement it locally; > - Implement chosen solution; > - Port/Add new tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19219) ODBC 3.0: Implement primary keys query
[ https://issues.apache.org/jira/browse/IGNITE-19219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19219: - Epic Link: IGNITE-19250 (was: IGNITE-19131) > ODBC 3.0: Implement primary keys query > -- > > Key: IGNITE-19219 > URL: https://issues.apache.org/jira/browse/IGNITE-19219 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > This functionality was not implemented properly in Ignite 2, so we probably > will need to re-implement it. > Also port and add tests as needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19250) ODBC 3.0 Metainformation
Igor Sapego created IGNITE-19250: Summary: ODBC 3.0 Metainformation Key: IGNITE-19250 URL: https://issues.apache.org/jira/browse/IGNITE-19250 Project: Ignite Issue Type: Epic Components: odbc Reporter: Igor Sapego ODBC features related to providing and handling metadata. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19131) ODBC 3.0 Basic functionality
[ https://issues.apache.org/jira/browse/IGNITE-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego updated IGNITE-19131: - Summary: ODBC 3.0 Basic functionality (was: ODBC 3.0) > ODBC 3.0 Basic functionality > > > Key: IGNITE-19131 > URL: https://issues.apache.org/jira/browse/IGNITE-19131 > Project: Ignite > Issue Type: Epic > Components: odbc >Reporter: Igor Sapego >Assignee: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > We need to implement ODBC driver for Ignite 3. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19116) Sql. UPDATE statement fails with NPE when table does not exist
[ https://issues.apache.org/jira/browse/IGNITE-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin reassigned IGNITE-19116: - Assignee: Pavel Pereslegin > Sql. UPDATE statement fails with NPE when table does not exist > -- > > Key: IGNITE-19116 > URL: https://issues.apache.org/jira/browse/IGNITE-19116 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 3.0.0-beta2 >Reporter: Maksim Zhuravkov >Assignee: Pavel Pereslegin >Priority: Minor > Labels: ignite-3 > > UPDATE statement fails with NPE when table does not exist. > {code:java} > @Test > public void test() { >sql("UPDATE unknown SET j = j + 1"); > } > {code} > Error: > {code:java} > java.lang.NullPointerException > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.createSourceSelectForUpdate(IgniteSqlValidator.java:175) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1476) > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.performUnconditionalRewrites(IgniteSqlValidator.java:383) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1046) > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:759) > at > org.apache.ignite.internal.sql.engine.prepare.IgniteSqlValidator.validate(IgniteSqlValidator.java:135) > at > org.apache.ignite.internal.sql.engine.prepare.IgnitePlanner.validate(IgnitePlanner.java:189) > {code} > *Expected behavior* > It should throw an objectNotFound error: > {code:java} > Object 'UNKNOWN' not found > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Description: There may be possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start): {quote} [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]\-#23%node%\-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour \[workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, {color:red}blockedFor=100s{color}] {quote} Also there are active exchange process, which finishes with such timings (timing will be approximatelly equal to blocking time of threads): {quote} [2023-04-06T10:55:52,211][INFO ]\[exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" ({color:red}100163 ms{color}), ..., stage="Total time" ({color:red}100334 ms{color})] {quote} How to reproduce: # Set checkpoint frequency less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] was: There may be possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start): {quote} [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour \[workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, {color:red}blockedFor=100s{color}] {quote} Also there are active exchange process, which finishes with such timings (timing will be approximatelly equal to blocking time of threads): {quote} [2023-04-06T10:55:52,211][INFO ]\[exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" ({color:red}100163 ms{color}), ..., stage="Total time" ({color:red}100334 ms{color})] {quote} How to reproduce: # Set checkpoint frequency less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > Error messages about checkpoint read lock acquisition timeouts and blocked > system-critical threads may appear during the snapshot restore process (just > after caches start): > {quote} > [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] > Checkpoint read lock acquisition has been timed out. > {quote} > {quote} > [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]\-#23%node%\-#446%node%][G] > Blocked system-critical thread has been detected. This can lead to > cluster-wide undefined behaviour \[workerName=db-checkpoint-thread, > threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, > {color:red}blockedFor=100s{color}] > {quote} > There is also an active exchange process, which finishes with timings like these > (the timing is approximately equal to the blocking time of the threads): > {quote} > [2023-04-06T10:55:52,211][INFO > ]\[exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange > timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], > resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5],
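The two reproduction conditions above (checkpoint frequency below the failure detection timeout, plus a long partition-state restore) could be expressed as an Ignite 2.x configuration sketch. This is illustrative only, not the attached reproducer patch; the concrete timeout values are assumptions.

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Sketch of the reproduction setup: checkpoints fire more often than the
// failure detection timeout, so a long "Restore partition states" stage can
// block the checkpoint thread past the timeout.
public class ReproConfigSketch {
    public static IgniteConfiguration config() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setCheckpointFrequency(3_000); // checkpoint every 3 s (illustrative value)

        // Persistence is required for snapshot restore scenarios.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        return new IgniteConfiguration()
            .setFailureDetectionTimeout(10_000) // 10 s, greater than the checkpoint frequency
            .setDataStorageConfiguration(storageCfg);
    }
}
```

With sufficiently large caches, the "Restore partition states" exchange stage then outlasts the failure detection timeout, triggering the blocked-thread warnings quoted above.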
[jira] [Assigned] (IGNITE-19249) Prohibit disabling a test without mentioning a ticket
[ https://issues.apache.org/jira/browse/IGNITE-19249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yury Gerzhedovich reassigned IGNITE-19249: -- Assignee: Yury Gerzhedovich > Prohibit disabling a test without mentioning a ticket > - > > Key: IGNITE-19249 > URL: https://issues.apache.org/jira/browse/IGNITE-19249 > Project: Ignite > Issue Type: Improvement >Reporter: Yury Gerzhedovich >Assignee: Yury Gerzhedovich >Priority: Major > Labels: ignite-3 > > Let's add a test to check that the code doesn't have any muted test without a > ticket mentioned. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19249) Prohibit disabling a test without mentioning a ticket
Yury Gerzhedovich created IGNITE-19249: -- Summary: Prohibit disabling a test without mentioning a ticket Key: IGNITE-19249 URL: https://issues.apache.org/jira/browse/IGNITE-19249 Project: Ignite Issue Type: Improvement Reporter: Yury Gerzhedovich Let's add a test to check that the code doesn't have any muted test without a ticket mentioned. -- This message was sent by Atlassian Jira (v8.20.10#820010)
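A minimal sketch of such a check, assuming mutes are expressed as JUnit 5 `@Disabled("reason")` annotations and the reason must reference an `IGNITE-XXXXX` ticket. The regex-scan approach and the class name are hypothetical, not the actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch: flag @Disabled annotations whose reason string does not
// mention an IGNITE ticket. Real code would walk the source tree or use
// reflection; here we just scan a source snippet.
public class MutedTestCheck {
    private static final Pattern DISABLED = Pattern.compile("@Disabled\\(\"([^\"]*)\"\\)");
    private static final Pattern TICKET = Pattern.compile("IGNITE-\\d+");

    static boolean hasUntrackedMute(String source) {
        Matcher m = DISABLED.matcher(source);
        while (m.find()) {
            // The mute is untracked if its reason carries no ticket reference.
            if (!TICKET.matcher(m.group(1)).find())
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasUntrackedMute("@Disabled(\"flaky\") void t() {}"));        // true
        System.out.println(hasUntrackedMute("@Disabled(\"IGNITE-19249\") void t() {}")); // false
    }
}
```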
[jira] [Updated] (IGNITE-19021) Support the directory deployment
[ https://issues.apache.org/jira/browse/IGNITE-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev updated IGNITE-19021: -- Summary: Support the directory deployment (was: Support the dirrectory deployment) > Support the directory deployment > > > Key: IGNITE-19021 > URL: https://issues.apache.org/jira/browse/IGNITE-19021 > Project: Ignite > Issue Type: Improvement > Components: cli, rest >Reporter: Aleksandr >Priority: Major > Labels: ignite-3 > > Currently it is impossible to deploy a directory. The use case: deploying several jars > or just a directory with class files. > The solution might be: > - zip the dir on the client side > - deploy the zip > - unzip on the server side -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19021) Support the directory deployment
[ https://issues.apache.org/jira/browse/IGNITE-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev reassigned IGNITE-19021: - Assignee: Vadim Pakhnushev > Support the directory deployment > > > Key: IGNITE-19021 > URL: https://issues.apache.org/jira/browse/IGNITE-19021 > Project: Ignite > Issue Type: Improvement > Components: cli, rest >Reporter: Aleksandr >Assignee: Vadim Pakhnushev >Priority: Major > Labels: ignite-3 > > Currently it is impossible to deploy a directory. The use case: deploying several jars > or just a directory with class files. > The solution might be: > - zip the dir on the client side > - deploy the zip > - unzip on the server side -- This message was sent by Atlassian Jira (v8.20.10#820010)
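The client-side "zip the dir" step of the proposed solution can be sketched with the JDK's `java.util.zip`. This is illustrative only; the actual upload through the deployment REST API is out of scope, and the class and file names are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Sketch of the client-side step: pack a deployment directory into a zip,
// preserving paths relative to the directory root.
public class DirZipper {
    static void zipDirectory(Path dir, Path zipFile) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile));
             Stream<Path> walk = Files.walk(dir)) {
            List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            for (Path p : files) {
                // Entry names are relative, so the server can unzip in place.
                zos.putNextEntry(new ZipEntry(dir.relativize(p).toString()));
                zos.write(Files.readAllBytes(p));
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("deploy");
        Files.writeString(dir.resolve("unit.txt"), "payload");
        Path zip = Files.createTempFile("deploy", ".zip");
        zipDirectory(dir, zip);
        System.out.println(Files.size(zip) > 0);
    }
}
```

The server side would do the reverse with `ZipInputStream`, writing each entry under the deployment unit's directory.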
[jira] [Updated] (IGNITE-19248) Fix snapshot restore hanging if the prepare stage fails.
[ https://issues.apache.org/jira/browse/IGNITE-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikita Amelchev updated IGNITE-19248: - Labels: ise (was: ) > Fix snapshot restore hanging if the prepare stage fails. > > > Key: IGNITE-19248 > URL: https://issues.apache.org/jira/browse/IGNITE-19248 > Project: Ignite > Issue Type: Bug >Reporter: Nikita Amelchev >Assignee: Nikita Amelchev >Priority: Major > Labels: ise > Time Spent: 10m > Remaining Estimate: 0h > > Snapshot restore hangs if the prepare stage fails. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19248) Fix snapshot restore hanging if the prepare stage fails.
Nikita Amelchev created IGNITE-19248: Summary: Fix snapshot restore hanging if the prepare stage fails. Key: IGNITE-19248 URL: https://issues.apache.org/jira/browse/IGNITE-19248 Project: Ignite Issue Type: Bug Reporter: Nikita Amelchev Assignee: Nikita Amelchev Snapshot restore hangs if the prepare stage fails. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19164) Improve message about requested partitions during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19164: --- Description: Currently, during snapshot restore, a message is logged before requesting partitions from remote nodes: {quote} [2023-03-24T18:06:59,910][INFO ]\[disco-notifier-worker-#792%node%|#792%node%][SnapshotRestoreProcess] Trying to request partitions from remote nodes [reqId=ff682204-9554-4fbb-804c-38a79c0b286a, snapshot=snapshot_name, map={76e22ef5-3c76-4987-bebd-9a6222a0={-903566235=[0,2,4,6,11,12,18,98,100,170,190,194,1015], 1544803905=[1,11,17,18,22,25,27,35,37,42,45,51,62,64,67,68,73,76,1017]}}] {quote} It is necessary to make this output "human readable": # Print messages per node instead of one message for all nodes. # Print node consistent id and address. # Print cache / group name. was: Currently, during snapshot restore, a message is logged before requesting partitions from remote nodes: {quote} [2023-03-24T18:06:59,910][INFO ][disco-notifier-worker-#792%node%|#792%node%][SnapshotRestoreProcess] Trying to request partitions from remote nodes [reqId=ff682204-9554-4fbb-804c-38a79c0b286a, snapshot=snapshot_name, map={76e22ef5-3c76-4987-bebd-9a6222a0={-903566235=[0,2,4,6,11,12,18,98,100,170,190,194,1015], 1544803905=[1,11,17,18,22,25,27,35,37,42,45,51,62,64,67,68,73,76,1017]}}] {quote} It is necessary to make this output "human readable": # Print messages per node instead of one message for all nodes. # Print node consistent id and address. # Print cache / group name. 
> Improve message about requested partitions during snapshot restore > -- > > Key: IGNITE-19164 > URL: https://issues.apache.org/jira/browse/IGNITE-19164 > Project: Ignite > Issue Type: Task >Reporter: Ilya Shishkov >Assignee: Julia Bakulina >Priority: Minor > Labels: iep-43, ise > > Currently, during snapshot restore, a message is logged before requesting > partitions from remote nodes: > {quote} > [2023-03-24T18:06:59,910][INFO > ]\[disco-notifier-worker-#792%node%|#792%node%][SnapshotRestoreProcess] > Trying to request partitions from remote nodes > [reqId=ff682204-9554-4fbb-804c-38a79c0b286a, snapshot=snapshot_name, > map={76e22ef5-3c76-4987-bebd-9a6222a0={-903566235=[0,2,4,6,11,12,18,98,100,170,190,194,1015], > 1544803905=[1,11,17,18,22,25,27,35,37,42,45,51,62,64,67,68,73,76,1017]}}] > {quote} > It is necessary to make this output "human readable": > # Print messages per node instead of one message for all nodes. > # Print node consistent id and address. > # Print cache / group name. -- This message was sent by Atlassian Jira (v8.20.10#820010)
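A hedged sketch of what the proposed per-node, human-readable message could look like. The class, method, and the node/group/partition values are made up for illustration; the real change would live in SnapshotRestoreProcess.

```java
import java.util.List;
import java.util.Map;

// Illustrative formatting for the three requested improvements: one message
// per node, with consistent id and address, and with cache group names.
public class PartitionRequestLog {
    static String formatPerNode(String consistentId, String addr, Map<String, List<Integer>> partsByGroup) {
        StringBuilder sb = new StringBuilder("Requesting partitions from node [consistentId=")
            .append(consistentId).append(", addr=").append(addr).append("]:");
        partsByGroup.forEach((grp, parts) ->
            sb.append("\n  group=").append(grp).append(", partitions=").append(parts));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatPerNode("node-1", "10.0.0.5",
            Map.of("cache-grp-A", List.of(0, 2, 4))));
    }
}
```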
[jira] [Assigned] (IGNITE-19153) Fix docker compose
[ https://issues.apache.org/jira/browse/IGNITE-19153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev reassigned IGNITE-19153: - Assignee: Vadim Pakhnushev > Fix docker compose > -- > > Key: IGNITE-19153 > URL: https://issues.apache.org/jira/browse/IGNITE-19153 > Project: Ignite > Issue Type: Task > Components: build >Reporter: Vadim Pakhnushev >Assignee: Vadim Pakhnushev >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > After IGNITE-18581, the ignite node entry point doesn't accept the {{--join}} option, > so we need to create a corresponding config file for the example compose file. > Also there is leftover code in the {{IgniteRunner}} class for converting > {{NetworkAddress}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-18320) [IEP-94] Reimplement cache scan command to control.sh
[ https://issues.apache.org/jira/browse/IGNITE-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Izhikov reassigned IGNITE-18320: Assignee: Aleksey Plekhanov (was: Nikolay Izhikov) > [IEP-94] Reimplement cache scan command to control.sh > - > > Key: IGNITE-18320 > URL: https://issues.apache.org/jira/browse/IGNITE-18320 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Aleksey Plekhanov >Priority: Blocker > Labels: IEP-94 > Fix For: 2.15 > > Time Spent: 10m > Remaining Estimate: 0h > > To decommission ignitevisorcmd.sh we need to move all useful commands to the > control script. > > The cache scan command is used by users to view cache content, so we must provide > it via control.sh -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-18320) [IEP-94] Reimplement cache scan command to control.sh
[ https://issues.apache.org/jira/browse/IGNITE-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikolay Izhikov reassigned IGNITE-18320: Assignee: Nikolay Izhikov (was: Aleksey Plekhanov) > [IEP-94] Reimplement cache scan command to control.sh > - > > Key: IGNITE-18320 > URL: https://issues.apache.org/jira/browse/IGNITE-18320 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Blocker > Labels: IEP-94 > Fix For: 2.15 > > Time Spent: 10m > Remaining Estimate: 0h > > To decommission ignitevisorcmd.sh we need to move all useful commands to the > control script. > > The cache scan command is used by users to view cache content, so we must provide > it via control.sh -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19237) Dependency copying should happen on package phase instead of test-compile
[ https://issues.apache.org/jira/browse/IGNITE-19237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-19237: -- Description: According to the [plugin usage examples|https://maven.apache.org/plugins/maven-dependency-plugin/usage.html] the phase should be `package`. An earlier phase may (and will) cause a situation where artifacts are not yet generated in multi-level projects. And you may get the following: {noformat} Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies (copy-libs) on project ignite-XXX-plugin: Artifact has not been packaged yet. When used on reactor artifact, copy should be executed after packaging: see MDEP-187. {noformat} was: Otherwise there is nothing to copy :( and you may get the following: {noformat} Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies (copy-libs) on project ignite-XXX-plugin: Artifact has not been packaged yet. When used on reactor artifact, copy should be executed after packaging: see MDEP-187. {noformat} According to the [plugin usage examples|https://maven.apache.org/plugins/maven-dependency-plugin/usage.html] the phase should be `package`. An earlier phase may (and will) cause a situation where artifacts are not yet generated in multi-level projects. > Dependency copying should happen on package phase instead of test-compile > - > > Key: IGNITE-19237 > URL: https://issues.apache.org/jira/browse/IGNITE-19237 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > According to the [plugin usage > examples|https://maven.apache.org/plugins/maven-dependency-plugin/usage.html] > the phase should be `package`. > An earlier phase may (and will) cause a situation where artifacts are not yet generated in > multi-level projects. 
> And you may get the following: > {noformat} > Failed to execute goal > org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies > (copy-libs) on project ignite-XXX-plugin: Artifact has not been packaged yet. > When used on reactor artifact, copy should be executed after packaging: see > MDEP-187. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
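Following the linked plugin usage examples, the fix amounts to binding the goal to the package phase. A sketch of the corresponding pom fragment (the `copy-libs` execution id and plugin version are taken from the error message above; the rest is illustrative):

```xml
<!-- Sketch: bind copy-dependencies to the package phase so reactor
     artifacts already exist when dependencies are copied. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>3.1.1</version>
  <executions>
    <execution>
      <id>copy-libs</id>
      <!-- not test-compile: reactor artifacts are packaged only in this phase -->
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```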
[jira] [Updated] (IGNITE-19237) Dependency copying should happen on package phase instead of test-compile
[ https://issues.apache.org/jira/browse/IGNITE-19237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-19237: -- Description: Otherwise there is nothing to copy :( and you may get the following: {noformat} Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies (copy-libs) on project ignite-XXX-plugin: Artifact has not been packaged yet. When used on reactor artifact, copy should be executed after packaging: see MDEP-187. {noformat} According to the [plugin usage examples|https://maven.apache.org/plugins/maven-dependency-plugin/usage.html] the phase should be `package`. An earlier phase may (and will) cause a situation where artifacts are not yet generated in multi-level projects. was: Otherwise there is nothing to copy :( and you may get the following: {noformat} Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies (copy-libs) on project ignite-XXX-plugin: Artifact has not been packaged yet. When used on reactor artifact, copy should be executed after packaging: see MDEP-187. {noformat} > Dependency copying should happen on package phase instead of test-compile > - > > Key: IGNITE-19237 > URL: https://issues.apache.org/jira/browse/IGNITE-19237 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Otherwise there is nothing to copy :( and you may get the following: > {noformat} > Failed to execute goal > org.apache.maven.plugins:maven-dependency-plugin:3.1.1:copy-dependencies > (copy-libs) on project ignite-XXX-plugin: Artifact has not been packaged yet. > When used on reactor artifact, copy should be executed after packaging: see > MDEP-187. > {noformat} > According to the [plugin usage > examples|https://maven.apache.org/plugins/maven-dependency-plugin/usage.html] > the phase should be `package`. 
> An earlier phase may (and will) cause a situation where artifacts are not yet generated in > multi-level projects. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19247) Replication is timed out
[ https://issues.apache.org/jira/browse/IGNITE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Belyak updated IGNITE-19247: -- Description: This is a very basic acceptance test. The code below just creates tables with columns (an int key and varchar cols) and inserts rows into each table (with a SLEEP ms interval between operations and up to RETRY attempts per batch).
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TimeoutExceptionReproducer {
    private static final String DB_URL = "jdbc:ignite:thin://172.24.1.2:10800";
    private static final int COLUMNS = 10;
    private static final String TABLE_NAME = "K";
    private static final int ROWS = 1000;
    private static final int TABLES = 10;
    private static final int BATCH_SIZE = 10;
    private static final int SLEEP = 30;
    private static final int RETRY = 10;

    private static String getCreateSql(String tableName) {
        StringBuilder sql = new StringBuilder("create table ").append(tableName).append(" (id int primary key");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", col").append(i).append(" varchar NOT NULL");
        }
        sql.append(")");
        return sql.toString();
    }

    private static final void s() {
        if (SLEEP > 0) {
            try {
                Thread.sleep(SLEEP);
            } catch (InterruptedException e) {
                // NoOp
            }
        }
    }

    private static void createTables(Connection connection, String tableName) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            System.out.println("Creating " + tableName);
            stmt.executeUpdate("drop table if exists " + tableName);
            s();
            stmt.executeUpdate(getCreateSql(tableName));
            s();
        }
    }

    private static String getInsertSql(String tableName) {
        StringBuilder sql = new StringBuilder("insert into ").append(tableName).append(" values(?");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", ?");
        }
        sql.append(")");
        return sql.toString();
    }

    private static void insertBatch(PreparedStatement ps) {
        int retryCounter = 0;
        while (retryCounter <= RETRY) {
            try {
                ps.executeBatch();
                return;
            } catch (SQLException e) {
                System.err.println(retryCounter + " error while executing " + ps + ":" + e);
                retryCounter++;
            }
        }
    }

    private static void insertData(Connection connection, String tableName) throws SQLException {
        long ts = System.currentTimeMillis();
        try (PreparedStatement ps = connection.prepareStatement(getInsertSql(tableName))) {
            int batch = 0;
            for (int i = 0; i < ROWS; i++) {
                ps.setInt(1, i);
                for (int j = 2; j < COLUMNS + 2; j++) {
                    ps.setString(j, "value" + i + "_" + j);
                }
                ps.addBatch();
                batch++;
                if (batch == BATCH_SIZE) {
                    batch = 0;
                    insertBatch(ps);
                    ps.clearBatch();
                    System.out.println("Batch " + BATCH_SIZE + " took " + (System.currentTimeMillis() - ts) + " to get " + i + " rows");
                    s();
                    ts = System.currentTimeMillis();
                }
            }
            if (batch > 0) {
                insertBatch(ps);
                ps.clearBatch();
                s();
            }
        }
    }

    private static int testData(Connection connection, String tableName) throws SQLException {
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery("select count(*) from " + tableName)) {
            rs.next();
            int count = rs.getInt(1);
            int result = ROWS - count;
            if (result == 0) {
                System.out.println("Found " + count + " rows in " + tableName);
            } else {
                System.err.println("Found " + count + " rows in " + tableName + " instead of " + ROWS);
            }
            s();
            return result;
        }
    }

    public static void main(String[] args) throws SQLException {
        int lostRows = 0;
        try (Connection connection = DriverManager.getConnection(DB_URL)) {
            for (int i = 0; i < TABLES; i++) {
                String tableName = TABLE_NAME + i;
                createTables(connection, tableName);
                insertData(connection, tableName);
                lostRows += testData(connection, tableName);
            }
        }
        System.exit(lostRows);
    }
}
{code}
[jira] [Updated] (IGNITE-19238) ItDataTypesTest is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Description: h3. Description & Root cause 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! {code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is the race between tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. On TableManager stop, we stop and clean up all table resources like replicas and raft nodes: {code:java} public void stop() { ... Map tables = tablesByIdVv.latest(); // 1* cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); ... }{code} where tablesToStopInCaseOfError is a sort of pending-tables list which is cleared on a cfg storage revision update. tablesByIdVv *listens to the same storage revision update event* in order to publish tables related to the given revision, or in other words to make such tables accessible from tablesByIdVv.latest(), the one that is used to retrieve tables for cleanup on component stop (see // 1* above): {code:java} public TableManager( ... 
tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); registry.accept(token -> { tablesToStopInCaseOfError.clear(); return completedFuture(null); }); {code} However inside IncrementalVersionedValue we have async storageRevision update processing {code:java} updaterFuture = updaterFuture.whenComplete((v, t) -> versionedValue.complete(causalityToken, localUpdaterFuture)); {code} As a result it's possible that we will clear tablesToStopInCaseOfError before publishing same revision tables to tablesByIdVv, so that we will miss that cleared tables in tablesByIdVv.latest() which is used in TableManager#stop. h3. Implementation Notes 1. First of all I've renamed tablesToStopInCaseOfError to pending tables, because they aren't only ...InCaseOfError. 2. I've also reworked tablesToStopInCaseOfError cleanup by substituting tablesToStopInCaseOfError.clear on revision change with {code:java} tablesByIdVv.get(causalityToken).thenAccept(ignored -> inBusyLock(busyLock, ()-> { pendingTables.remove(tblId); })); {code} meaning that we 2.1. remove specific table by id instead of ready. 2.2. do that removal on corresponding table publishing wihtin tablesByIdVv. 3. That means that at some point right after the publishing but before removal it's possible to have same table both within tablesByIdVv and pendingTables thus in order not to stop same table twice (which is safe by the way because of idempotentce) I've substituted {code:java} cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); {code} with {code:java} Stream tablesToStop = Stream.concat(tablesByIdVv.latest().entrySet().stream(), pendingTables.entrySet().stream()). map(Map.Entry::getValue); cleanUpTablesResources(tablesToStop); {code} was: h3. Description & Root cause 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! 
{code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is the race
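The fix described in the Implementation Notes can be illustrated with a standalone sketch. All class and field names below are hypothetical stand-ins for the real ignite-3 types (IncrementalVersionedValue, TableManager): a table stays in the pending map until its publication future completes, and stop() unions both collections, so a table caught mid-publication is never missed.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

// Illustrative sketch (not the real ignite-3 classes): pending tables are removed
// per table, and only after the table becomes visible in the published map.
public class PendingTablesSketch {
    // Stand-ins for tablesByIdVv.latest() and pendingTables.
    private final Map<Integer, String> published = new ConcurrentHashMap<>();
    private final Map<Integer, String> pendingTables = new ConcurrentHashMap<>();

    // Register a table whose publication completes asynchronously.
    CompletableFuture<Void> createTable(int tblId, String table, CompletableFuture<Void> publishFuture) {
        pendingTables.put(tblId, table);
        // Remove from pending only AFTER the table is reachable via the published map,
        // mirroring tablesByIdVv.get(causalityToken).thenAccept(... pendingTables.remove(tblId)).
        return publishFuture.thenRun(() -> {
            published.put(tblId, table);
            pendingTables.remove(tblId);
        });
    }

    // stop() unions both collections, so a table mid-publication is not missed;
    // distinct() covers the brief window where a table sits in both maps.
    Stream<String> tablesToStop() {
        return Stream.concat(published.values().stream(), pendingTables.values().stream()).distinct();
    }
}
```

Whether publication has completed or not, the table is visible to the stop path in exactly one pass, which is the property the original race violated.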
[jira] [Updated] (IGNITE-19247) Replication is timed out
[ https://issues.apache.org/jira/browse/IGNITE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Belyak updated IGNITE-19247: -- Description: The code below just creates TABLES tables with COLUMNS + 1 columns and inserts ROWS rows into each table (with a SLEEP ms pause between operations).
Simple example:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class TimeoutExceptionReproducer {
    private static final String DB_URL = "jdbc:ignite:thin://172.24.1.2:10800";
    private static final int COLUMNS = 10;
    private static final String TABLE_NAME = "K";
    private static final int ROWS = 10;
    private static final int TABLES = 10;
    private static final int BATCH_SIZE = 10;
    private static final int SLEEP = 30;

    private static String getCreateSql(String tableName) {
        StringBuilder sql = new StringBuilder("create table ").append(tableName).append(" (id int primary key");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", col").append(i).append(" varchar NOT NULL");
        }
        sql.append(")");
        return sql.toString();
    }

    private static void s() {
        if (SLEEP > 0) {
            try {
                Thread.sleep(SLEEP);
            } catch (InterruptedException e) {
                // NoOp
            }
        }
    }

    private static void createTables(Connection connection, String tableName) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            System.out.println("Creating " + tableName);
            stmt.executeUpdate("drop table if exists " + tableName);
            s();
            stmt.executeUpdate(getCreateSql(tableName));
            s();
        }
    }

    private static String getInsertSql(String tableName) {
        StringBuilder sql = new StringBuilder("insert into ").append(tableName).append(" values(?");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", ?");
        }
        sql.append(")");
        return sql.toString();
    }

    private static void insertData(Connection connection, String tableName) throws SQLException {
        long ts = System.currentTimeMillis();
        try (PreparedStatement ps = connection.prepareStatement(getInsertSql(tableName))) {
            int batch = 0;
            for (int i = 0; i < ROWS; i++) {
                ps.setInt(1, i);
                for (int j = 2; j < COLUMNS + 2; j++) {
                    ps.setString(j, "value" + i + "_" + j);
                }
                ps.addBatch();
                batch++;
                if (batch == BATCH_SIZE) {
                    batch = 0;
                    ps.executeBatch();
                    ps.clearBatch();
                    System.out.println("Batch " + BATCH_SIZE + " took " + (System.currentTimeMillis() - ts) + " to get " + i + " rows");
                    s();
                    ts = System.currentTimeMillis();
                }
            }
            if (batch > 0) {
                batch = 0;
                ps.executeBatch();
                ps.clearBatch();
                s();
            }
        }
    }

    private static int testData(Connection connection, String tableName) throws SQLException {
        try (Statement stmt = connection.createStatement();
                ResultSet rs = stmt.executeQuery("select count(*) from " + tableName)) {
            rs.next();
            int count = rs.getInt(1);
            int result = ROWS - count;
            if (result == 0) {
                System.out.println("Found " + count + " rows in " + tableName);
            } else {
                System.err.println("Found " + count + " rows in " + tableName + " instead of " + ROWS);
            }
            s();
            return result;
        }
    }

    public static void main(String[] args) throws SQLException {
        int lostRows = 0;
        try (Connection connection = DriverManager.getConnection(DB_URL)) {
            for (int i = 0; i < TABLES; i++) {
                String tableName = TABLE_NAME + i;
                createTables(connection, tableName);
                insertData(connection, tableName);
                lostRows += testData(connection, tableName);
            }
        }
        System.exit(lostRows);
    }
}
{code}
This leads to a timeout exception:
{code:java}
Batch 100 took 4228 to get 2899 rows
Batch 100 took 5669 to get 2999 rows
Batch 100 took 3902 to get 3099 rows
Exception in thread "main" java.sql.BatchUpdateException: IGN-REP-3 TraceId:b2c2c9e5-b917-482e-91df-2e0576c443c7 Replication is timed out [replicaGrpId=76c2b69a-a2bc-4d16-838d-5aff014c6004_part_11]
    at org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeBatch(JdbcPreparedStatement.java:124)
    at TimeoutExceptionReproducer.insertData(TimeoutExceptionReproducer.java:64)
    at TimeoutExceptionReproducer.main(TimeoutExceptionReproducer.java:84){code}
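As a client-side mitigation sketch (not part of the ticket; all names below are illustrative), a transient BatchUpdateException like the one the reproducer hits can be retried with backoff. Note that re-running executeBatch requires re-adding every row of the batch first, and may duplicate rows if the server had already applied some of them, so this is a workaround sketch, not a fix for the underlying replication timeout.

```java
import java.util.concurrent.Callable;

// Illustrative helper (not from the ticket): retry an action such as
// ps.executeBatch() a few times, with linear backoff between attempts.
public class Retry {
    static <T> T retry(Callable<T> action, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // all attempts exhausted; propagate the last failure
                }
                Thread.sleep(100L * attempt); // back off before the next attempt
            }
        }
    }
}
```

In the reproducer this would wrap the `ps.executeBatch()` call inside `insertData`, with the surrounding loop re-adding the batch rows before each retry.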
[jira] [Updated] (IGNITE-19247) Replication is timed out
[ https://issues.apache.org/jira/browse/IGNITE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Belyak updated IGNITE-19247: -- Environment: (was: the reproducer code, since moved to the Description field)
> Replication is timed out
>
> Key: IGNITE-19247
> URL: https://issues.apache.org/jira/browse/IGNITE-19247
> Project: Ignite
> Issue Type: Bug
> Components: general
> Affects Versions: 3.0
> Reporter: Alexander Belyak
> Priority: Critical
> Labels: ignite-3
> Fix For: 3.0
>
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19247) Replication is timed out
[ https://issues.apache.org/jira/browse/IGNITE-19247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Belyak updated IGNITE-19247: -- Description: Simple example:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class TimeoutExceptionReproducer {
    private static final String DB_URL = "jdbc:ignite:thin://172.24.1.2:10800";
    private static final int COLUMNS = 100;
    private static final String TABLE_NAME = "t1";
    private static final int ROWS = 1000;
    private static final int TABLES = 1000;
    private static final int BATCH_SIZE = 100;

    private static String getCreateSql(String tableName) {
        StringBuilder sql = new StringBuilder("create table ").append(tableName).append(" (id int primary key");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", col").append(i).append(" varchar NOT NULL");
        }
        sql.append(")");
        return sql.toString();
    }

    private static void createTables(Connection connection, String tableName) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            System.out.println("Creating " + tableName);
            stmt.executeUpdate("drop table if exists " + tableName);
            stmt.executeUpdate(getCreateSql(tableName));
        }
    }

    private static String getInsertSql(String tableName) {
        StringBuilder sql = new StringBuilder("insert into ").append(tableName).append(" values(?");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", ?");
        }
        sql.append(")");
        return sql.toString();
    }

    private static void insertData(Connection connection, String tableName) throws SQLException {
        long ts = System.currentTimeMillis();
        try (PreparedStatement ps = connection.prepareStatement(getInsertSql(tableName))) {
            int batch = 0;
            for (int i = 0; i < ROWS; i++) {
                ps.setInt(1, i);
                for (int j = 2; j < COLUMNS + 2; j++) {
                    ps.setString(j, "value" + i + "_" + j);
                }
                ps.addBatch();
                batch++;
                if (batch == BATCH_SIZE) {
                    batch = 0;
                    ps.executeBatch();
                    ps.clearBatch();
                    long nextTs = System.currentTimeMillis();
                    System.out.println("Batch " + BATCH_SIZE + " took " + (nextTs - ts) + " to get " + i + " rows");
                    ts = nextTs;
                }
            }
            if (batch > 0) {
                batch = 0;
                ps.executeBatch();
                ps.clearBatch();
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection connection = DriverManager.getConnection(DB_URL)) {
            for (int i = 0; i < TABLES; i++) {
                String tableName = TABLE_NAME + i;
                createTables(connection, tableName);
                insertData(connection, tableName);
            }
        }
    }
}
{code}
leads to a timeout exception:
{code:java}
Batch 100 took 4228 to get 2899 rows
Batch 100 took 5669 to get 2999 rows
Batch 100 took 3902 to get 3099 rows
Exception in thread "main" java.sql.BatchUpdateException: IGN-REP-3 TraceId:b2c2c9e5-b917-482e-91df-2e0576c443c7 Replication is timed out [replicaGrpId=76c2b69a-a2bc-4d16-838d-5aff014c6004_part_11]
    at org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeBatch(JdbcPreparedStatement.java:124)
    at TimeoutExceptionReproducer.insertData(TimeoutExceptionReproducer.java:64)
    at TimeoutExceptionReproducer.main(TimeoutExceptionReproducer.java:84)
{code}
> Replication is timed out
> 
> 
> Key: IGNITE-19247
> URL: https://issues.apache.org/jira/browse/IGNITE-19247
> Project: Ignite
> Issue Type: Bug
> Components: general
>Affects Versions: 3.0
>Reporter: Alexander Belyak
>Priority: Critical
> Labels: ignite-3
> Fix For: 3.0
> 
> 
> Simple example:
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.PreparedStatement;
> import java.sql.SQLException;
> import java.sql.Statement;
> public class TimeoutExceptionReproducer {
> private static final String DB_URL = "jdbc:ignite:thin://172.24.1.2:10800";
> private static final int COLUMNS = 100;
> private static final String TABLE_NAME = "t1";
> private static final int ROWS = 1000;
> private static final int TABLES = 1000;
> private static final int BATCH_SIZE = 100;
> private static String getCreateSql(String tableName) {
> StringBuilder sql = new StringBuilder("create table >
[jira] [Created] (IGNITE-19247) Replication is timed out
Alexander Belyak created IGNITE-19247: - Summary: Replication is timed out
Key: IGNITE-19247
URL: https://issues.apache.org/jira/browse/IGNITE-19247
Project: Ignite
Issue Type: Bug
Components: general
Affects Versions: 3.0
Environment: Simple example:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class TimeoutExceptionReproducer {
    private static final String DB_URL = "jdbc:ignite:thin://172.24.1.2:10800";
    private static final int COLUMNS = 100;
    private static final String TABLE_NAME = "t1";
    private static final int ROWS = 10;
    private static final int BATCH_SIZE = 100;

    private static String getCreateSql() {
        StringBuilder sql = new StringBuilder("create table ").append(TABLE_NAME).append(" (id int primary key");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", col").append(i).append(" varchar NOT NULL");
        }
        sql.append(")");
        return sql.toString();
    }

    private static void createTable(Connection connection) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.executeUpdate("drop table if exists " + TABLE_NAME);
            stmt.executeUpdate(getCreateSql());
        }
    }

    private static String getInsertSql() {
        StringBuilder sql = new StringBuilder("insert into t1 values(?");
        for (int i = 0; i < COLUMNS; i++) {
            sql.append(", ?");
        }
        sql.append(")");
        return sql.toString();
    }

    private static void insertData(Connection connection) throws SQLException {
        long ts = System.currentTimeMillis();
        try (PreparedStatement ps = connection.prepareStatement(getInsertSql())) {
            int batch = 0;
            for (int i = 0; i < ROWS; i++) {
                ps.setInt(1, i);
                for (int j = 2; j < COLUMNS + 2; j++) {
                    ps.setString(j, "value" + i + "_" + j);
                }
                ps.addBatch();
                batch++;
                if (batch == BATCH_SIZE) {
                    batch = 0;
                    ps.executeBatch();
                    ps.clearBatch();
                    long nextTs = System.currentTimeMillis();
                    System.out.println("Batch " + BATCH_SIZE + " took " + (nextTs - ts) + " to get " + i + " rows");
                    ts = nextTs;
                }
            }
            if (batch > 0) {
                batch = 0;
                ps.executeBatch();
                ps.clearBatch();
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection connection = DriverManager.getConnection(DB_URL)) {
            createTable(connection);
            insertData(connection);
        }
    }
}
{code}
leads to a timeout exception:
{code:java}
Batch 100 took 4228 to get 2899 rows
Batch 100 took 5669 to get 2999 rows
Batch 100 took 3902 to get 3099 rows
Exception in thread "main" java.sql.BatchUpdateException: IGN-REP-3 TraceId:b2c2c9e5-b917-482e-91df-2e0576c443c7 Replication is timed out [replicaGrpId=76c2b69a-a2bc-4d16-838d-5aff014c6004_part_11]
    at org.apache.ignite.internal.jdbc.JdbcPreparedStatement.executeBatch(JdbcPreparedStatement.java:124)
    at TimeoutExceptionReproducer.insertData(TimeoutExceptionReproducer.java:64)
    at TimeoutExceptionReproducer.main(TimeoutExceptionReproducer.java:84)
{code}
Reporter: Alexander Belyak
Fix For: 3.0
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IGNITE-18170) Deadlock in TableManager#updateAssignmentInternal()
[ https://issues.apache.org/jira/browse/IGNITE-18170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Puchkovskiy resolved IGNITE-18170. Resolution: Fixed Fixed in IGNITE-18203 > Deadlock in TableManager#updateAssignmentInternal() > --- > > Key: IGNITE-18170 > URL: https://issues.apache.org/jira/browse/IGNITE-18170 > Project: Ignite > Issue Type: Bug >Reporter: Roman Puchkovskiy >Assignee: Roman Puchkovskiy >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Attachments: threads_report.txt > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, {{TableManager#updateAssignmentsInternal}} is fully synchronous. > The scenario is as follows: > # {{updateAssignmentsInternal}} starts a RAFT group for a partition > # {{FSMCallerImpl}} finds out that its applied index is below the group > committed index, so it starts to apply the missing log entries in its > {{init()}} method (this is still done synchronously) > # While doing so, it invokes {{PartitionListener}}, which tries to > execute an insert > # To make an insert, a PK is needed, so the insertion code tries to > obtain a PK from its future like this: {{pkFuture.join()}} > # That future is completed from {{IndexManager#createIndexLocally()}}, > which is invoked by {{ConfigurationNotifier}} later than > {{updateAssignmentsInternal}} in the same thread > # As a result, the PK future cannot be completed before the sync > {{updateAssignmentsInternal}} finishes its job and returns, and it cannot > finish its job before the PK future is completed > We should make {{updateAssignmentsInternal}} async. -- This message was sent by Atlassian Jira (v8.20.10#820010)
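The cycle described above boils down to a synchronous listener joining a future that can only be completed by a later listener on the same notification thread. A minimal self-contained sketch of that pattern (class and method names are illustrative, not the actual Ignite internals; a bounded wait is used instead of {{join()}} so the sketch terminates):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SameThreadDeadlockSketch {
    /** Stands in for the PK future completed by a later listener on the same thread. */
    static final CompletableFuture<String> pkFuture = new CompletableFuture<>();

    /** Runs listeners one by one on the calling thread, like a synchronous notifier. */
    static void notifyListeners(Runnable... listeners) {
        for (Runnable l : listeners) {
            l.run();
        }
    }

    public static void main(String[] args) {
        notifyListeners(
            // Listener 1: synchronous "updateAssignmentsInternal" analogue.
            // It blocks on pkFuture, but listener 2 (which completes it)
            // cannot run until listener 1 returns, so the wait times out.
            () -> {
                try {
                    pkFuture.get(100, TimeUnit.MILLISECONDS);
                    System.out.println("no deadlock");
                } catch (TimeoutException e) {
                    System.out.println("deadlocked: PK future never completes on this thread");
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            },
            // Listener 2: "createIndexLocally" analogue that completes the future.
            () -> pkFuture.complete("pk")
        );
    }
}
```

With a real {{join()}} instead of the bounded {{get()}}, listener 1 would block forever, which is exactly the hang the ticket describes; making the first stage async breaks the cycle.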
[jira] [Updated] (IGNITE-19238) ItDataTypesTest is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Description:
1. ItDataTypesTest is flaky because the previous ItCreateTableDdlTest tests failed to stop replicas on node stop:
!Снимок экрана от 2023-04-06 10-39-32.png!
{code:java}
java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]]
    at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341)
    at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131)
    at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115)
{code}
2. The reason the replicas fail to stop is a race between the tablesToStopInCaseOfError cleanup and the publication of tables to tablesByIdVv.
On TableManager stop, we stop and clean up all table resources, such as replicas and raft nodes:
{code:java}
public void stop() {
    ...
    Map tables = tablesByIdVv.latest(); // 1*
    cleanUpTablesResources(tables);
    cleanUpTablesResources(tablesToStopInCaseOfError);
    ...
}
{code}
where tablesToStopInCaseOfError is a list of pending tables which is cleared on a cfg storage revision update. tablesByIdVv *listens to the same storage revision update event* in order to publish the tables related to the given revision, in other words to make such tables accessible from tablesByIdVv.latest(), the call used to retrieve tables for cleanup on component stop (see // 1* above):
{code:java}
public TableManager(
    ...
    tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new);

    registry.accept(token -> {
        tablesToStopInCaseOfError.clear();

        return completedFuture(null);
    });
{code}
However, inside IncrementalVersionedValue the storage revision update is processed asynchronously:
{code:java}
updaterFuture = updaterFuture.whenComplete((v, t) -> versionedValue.complete(causalityToken, localUpdaterFuture));
{code}
As a result, it is possible to clear tablesToStopInCaseOfError before publishing the same-revision tables to tablesByIdVv, so the cleared tables are missing from tablesByIdVv.latest(), which is used in TableManager#stop.

was:
1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop:
!Снимок экрана от 2023-04-06 10-39-32.png!
{code:java}
java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]]
    at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341)
    at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131)
    at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115)
{code}
2. The reason why we failed to stop replicas is the race between tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv.
On TableManager stop, we stop and cleanup all table resources like replicas and raft nodes
{code:java}
public void stop() {
    ...
    Map tables = tablesByIdVv.latest(); // 1*
    cleanUpTablesResources(tables);
    cleanUpTablesResources(tablesToStopInCaseOfError);
    ...
}
{code}
where tablesToStopInCaseOfError is a sort of pending tables list which one is cleared on cfg storage revision update. tablesByIdVv *listens same storage revision update event* in order to publish tables related to the given revision or in other words make such tables accessible from tablesByIdVv.latest(); that one that is used in order to retrieve tables for cleanup on components stop (see // 1* above)
{code:java}
public TableManager(
    ...
    tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new);

    registry.accept(token -> {
        tablesToStopInCaseOfError.clear();

        return completedFuture(null);
    });
{code}
However inside IncrementalVersionedValue we have async storageRevision update processing
{code:java}
updaterFuture = updaterFuture.whenComplete((v, t) -> versionedValue.complete(causalityToken,
[jira] [Updated] (IGNITE-19238) ItDataTypesTest is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Fix Version/s: 3.0.0-beta2 > ItDataTypesTest is flaky > > > Key: IGNITE-19238 > URL: https://issues.apache.org/jira/browse/IGNITE-19238 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Lapin >Assignee: Alexander Lapin >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Attachments: Снимок экрана от 2023-04-06 10-39-32.png > > > 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests > failed to stop replicas on node stop: > !Снимок экрана от 2023-04-06 10-39-32.png! > > {code:java} > java.lang.AssertionError: There are replicas alive > [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] > at > org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) > at > org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) > at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) > at > org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) > at > org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} > 2. The reason why we failed to stop replicas is the race between > tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. > On TableManager stop, we stop and cleanup all table resources like replicas > and raft nodes > {code:java} > public void stop() { > ... > Map tables = tablesByIdVv.latest(); // 1* > cleanUpTablesResources(tables); > cleanUpTablesResources(tablesToStopInCaseOfError); > ... 
> }{code}
> where tablesToStopInCaseOfError is a list of pending tables which is
> cleared on a cfg storage revision update.
> tablesByIdVv *listens to the same storage revision update event* in order to publish
> tables related to the given revision, or in other words make such tables
> accessible from tablesByIdVv.latest(), the call that is used in order to
> retrieve tables for cleanup on components stop (see // 1* above)
> {code:java}
> public TableManager(
> ...
> tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new);
> registry.accept(token -> {
> tablesToStopInCaseOfError.clear();
> 
> return completedFuture(null);
> });
> {code}
> However, inside IncrementalVersionedValue the storage revision update
> is processed asynchronously
> 
> {code:java}
> updaterFuture = updaterFuture.whenComplete((v, t) ->
> versionedValue.complete(causalityToken, localUpdaterFuture)); {code}
> As a result, it is possible to clear tablesToStopInCaseOfError before
> publishing the same-revision tables to tablesByIdVv, so that the cleared
> tables are missing from tablesByIdVv.latest(), which is used in TableManager#stop.
> 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
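The window described above can be made concrete with a deterministic sketch: publication of the revision's tables is chained onto a not-yet-completed future, while the listener that clears the pending list runs synchronously first (names are simplified stand-ins, not the actual TableManager types):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class StopRaceSketch {
    // Pending tables, cleared synchronously on each revision update.
    static final List<String> tablesToStop = new ArrayList<>();
    // Published tables per revision; publication is chained onto an async future.
    static final Map<Long, List<String>> published = new HashMap<>();

    public static void main(String[] args) {
        long revision = 1L;
        tablesToStop.add("table_1");

        // Publication is deferred onto a future that is not complete yet,
        // mirroring IncrementalVersionedValue's whenComplete(...) chaining.
        CompletableFuture<Void> updater = new CompletableFuture<>();
        updater.thenRun(() -> published.put(revision, List.of("table_1")));

        // The revision-update listener runs synchronously and clears the
        // pending list BEFORE the async publication above has fired.
        tablesToStop.clear();

        // stop() at this instant sees table_1 in neither collection,
        // so its replicas and raft nodes would never be cleaned up.
        List<String> latest = published.getOrDefault(revision, List.of());
        System.out.println("pending=" + tablesToStop + " latest=" + latest);

        // Publication happens only after the cleanup window has passed.
        updater.complete(null);
    }
}
```

In the sketch the bad interleaving is forced for clarity; in the real code it depends on thread timing, which is why the test is flaky rather than failing every run.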
[jira] [Assigned] (IGNITE-19168) Command for testing that snapshot partitions will be redistributed during restore
[ https://issues.apache.org/jira/browse/IGNITE-19168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julia Bakulina reassigned IGNITE-19168: --- Assignee: Julia Bakulina > Command for testing that snapshot partitions will be redistributed during > restore > - > > Key: IGNITE-19168 > URL: https://issues.apache.org/jira/browse/IGNITE-19168 > Project: Ignite > Issue Type: New Feature >Reporter: Ilya Shishkov >Assignee: Julia Bakulina >Priority: Minor > Labels: iep-43, ise > > When data is restored from a snapshot taken on a different baseline topology (e.g. > with different consistent identifiers or a different cluster size), there will be > two stages that can last long enough: > # Partitions redistribution according to the affinity function. > # Index rebuilding. > It would be nice to have a command for checking whether such redistribution > will occur, e.g.: > {noformat} > control.sh --snapshot distribution snapshot_name > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19163) Add logging of snapshot check
[ https://issues.apache.org/jira/browse/IGNITE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julia Bakulina reassigned IGNITE-19163: --- Assignee: Julia Bakulina > Add logging of snapshot check > - > > Key: IGNITE-19163 > URL: https://issues.apache.org/jira/browse/IGNITE-19163 > Project: Ignite > Issue Type: Task >Reporter: Ilya Shishkov >Assignee: Julia Bakulina >Priority: Minor > Labels: iep-43, ise > > Server nodes do not log the state of the snapshot check process, but for further > analysis it is necessary to print messages in the log when the snapshot check procedure > starts and finishes. Currently, the snapshot check is invoked at least by the restore > and check commands. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19160) Improve message about sent partition file during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julia Bakulina reassigned IGNITE-19160: --- Assignee: Julia Bakulina > Improve message about sent partition file during snapshot restore > - > > Key: IGNITE-19160 > URL: https://issues.apache.org/jira/browse/IGNITE-19160 > Project: Ignite > Issue Type: Task >Reporter: Ilya Shishkov >Assignee: Julia Bakulina >Priority: Minor > Labels: iep-43, ise > > Currently, the message about a partition is as below: > {quote} > [2023-03-29T18:31:44,773][INFO ][snapshot-runner-#863%node0%][SnapshotSender] > Partition file has been sent [part=part-645.bin, > pair={color:red}GroupPartitionId [grpId=1544803905, partId=645]{color}, > length=45056] > {quote} > It does not tell us: > # The receiver node id / address / consistent id. > # The cache or cache group name which the partition belongs to. A numerical group id is > not a convenient way to determine the cache or cache group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19158) Improve message about received partition file during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julia Bakulina reassigned IGNITE-19158: --- Assignee: Julia Bakulina > Improve message about received partition file during snapshot restore > - > > Key: IGNITE-19158 > URL: https://issues.apache.org/jira/browse/IGNITE-19158 > Project: Ignite > Issue Type: Task >Reporter: Ilya Shishkov >Assignee: Julia Bakulina >Priority: Minor > Labels: iep-43, ise > > Currently, GridIoManager prints only the name of a file and a node id: > {quote} > [2023-03-24T18:07:00,747][INFO ]pub-#871%node1%[GridIoManager] File has been > received [name={color:red}part-233.bin{color}, transferred=53248, time=0.0 > sec, {color:red}rmtId=76e22ef5-3c76-4987-bebd-9a6222a0{color}] > {quote} > This meager information makes it hard to determine which > file was received and from which node. > For example, such a message would be more informative: > {quote} > [2023-03-29T17:09:42,230][INFO ][pub-#869%node0%][GridIoManager] File has > been received > [{color:red}path=/ignite/db/node0/_tmp_snp_restore_cache-default/part-647.bin{color}, > transferred=45056, time=0.0 sec, rmtId=de43d2e8-a1ab-4d7c-9cea-72615371, > {color:red}rmdAddr=/127.0.0.1:51773{color}] > {quote} > _Other ways might be investigated_ in order to improve the logging of received > partition files during snapshot restore. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19075) CLI should ask for SSL settings
[ https://issues.apache.org/jira/browse/IGNITE-19075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev updated IGNITE-19075: -- Summary: CLI should ask for SSL settings (was: CLI should ask user for SSL settings, if needed ) > CLI should ask for SSL settings > --- > > Key: IGNITE-19075 > URL: https://issues.apache.org/jira/browse/IGNITE-19075 > Project: Ignite > Issue Type: Improvement > Components: cli >Reporter: Ivan Gagarkin >Assignee: Vadim Pakhnushev >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > Currently, a user just gets an error when trying to connect to a node via HTTPS > without SSL settings. > The CLI should ask the user to set SSL settings if it gets an error on a call: > # Set trust store path > # Set trust store password > # Set key store path > # Set key store password > Save the provided values to the config and repeat the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19246) CLI should ask for auth settings
[ https://issues.apache.org/jira/browse/IGNITE-19246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev updated IGNITE-19246: -- Description: Currently, a user just gets an error when trying to connect to a node which has authentication configured. The CLI should ask the user to set auth settings if it gets an error on a call, save the provided values to the config and repeat the call. was: Currently, a user just gets an error when trying to connect to a node via HTTPS without SSL settings. The CLI should ask the user to set SSL settings if it gets an error on a call: # Set trust store path # Set trust store password # Set key store path # Set key store password Save the provided values to the config and repeat the call. > CLI should ask for auth settings > > > Key: IGNITE-19246 > URL: https://issues.apache.org/jira/browse/IGNITE-19246 > Project: Ignite > Issue Type: Improvement > Components: cli >Reporter: Vadim Pakhnushev >Assignee: Vadim Pakhnushev >Priority: Major > Labels: ignite-3 > > Currently, a user just gets an error when trying to connect to a node which has > authentication configured. > The CLI should ask the user to set auth settings if it gets an error on a call, > save the provided values to the config and repeat the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19246) CLI should ask for auth settings
[ https://issues.apache.org/jira/browse/IGNITE-19246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev updated IGNITE-19246: -- Ignite Flags: (was: Docs Required,Release Notes Required) > CLI should ask for auth settings > > > Key: IGNITE-19246 > URL: https://issues.apache.org/jira/browse/IGNITE-19246 > Project: Ignite > Issue Type: Improvement > Components: cli >Reporter: Vadim Pakhnushev >Assignee: Vadim Pakhnushev >Priority: Major > Labels: ignite-3 > > Currently, a user just gets an error when trying to connect to a node which has > authentication configured. > The CLI should ask the user to set auth settings if it gets an error on a call, > save the provided values to the config and repeat the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19246) CLI should ask for auth settings
Vadim Pakhnushev created IGNITE-19246: - Summary: CLI should ask for auth settings Key: IGNITE-19246 URL: https://issues.apache.org/jira/browse/IGNITE-19246 Project: Ignite Issue Type: Improvement Components: cli Reporter: Vadim Pakhnushev Assignee: Vadim Pakhnushev Currently, a user just gets an error when trying to connect to a node via HTTPS without SSL settings. The CLI should ask the user to set SSL settings if it gets an error on a call: # Set trust store path # Set trust store password # Set key store path # Set key store password Save the provided values to the config and repeat the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
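The intended flow above, namely fail, prompt for the four settings, persist them, and repeat the call, can be sketched roughly like this (the config map, exception type, and prompt callback are hypothetical stand-ins, not the actual ignite-3 CLI classes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

public class SslPromptRetrySketch {
    /** Hypothetical stand-in for the persisted CLI config. */
    static final Map<String, String> config = new HashMap<>();

    /** Marker for an SSL failure reported by the REST client. */
    static class SslException extends RuntimeException {}

    /**
     * Runs the call; on an SSL error asks the user for the four SSL settings,
     * saves them to the config and repeats the call once.
     */
    static String callWithSslPrompt(Supplier<String> call, Function<String, String> ask) {
        try {
            return call.get();
        } catch (SslException e) {
            for (String key : new String[] {
                    "ssl.trust-store.path", "ssl.trust-store.password",
                    "ssl.key-store.path", "ssl.key-store.password"}) {
                config.put(key, ask.apply(key)); // prompt and persist
            }
            return call.get(); // repeat the call with the saved settings
        }
    }

    public static void main(String[] args) {
        // Simulated node: fails until a trust store path is configured.
        Supplier<String> call = () -> {
            if (!config.containsKey("ssl.trust-store.path")) {
                throw new SslException();
            }
            return "connected";
        };
        // A scripted prompt stands in for interactive questions.
        String result = callWithSslPrompt(call, key -> "value-for-" + key);
        System.out.println(result);
    }
}
```

The auth-settings variant in IGNITE-19246 would follow the same shape, with username/password keys instead of the key store and trust store entries.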
[jira] [Updated] (IGNITE-19075) CLI should ask user for SSL settings, if needed
[ https://issues.apache.org/jira/browse/IGNITE-19075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev updated IGNITE-19075: -- Summary: CLI should ask user for SSL settings, if needed (was: CLI should ask user set SSL and authentication settings, if needed ) > CLI should ask user for SSL settings, if needed > > > Key: IGNITE-19075 > URL: https://issues.apache.org/jira/browse/IGNITE-19075 > Project: Ignite > Issue Type: Improvement > Components: cli >Reporter: Ivan Gagarkin >Assignee: Vadim Pakhnushev >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > Currently, a user just gets an error when trying to connect to a node via HTTPS > without SSL settings. > The CLI should ask the user to set SSL settings if it gets an error on a call: > # Set trust store path > # Set trust store password > # Set key store path > # Set key store password > Save the provided values to the config and repeat the call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19245) Handle SSL errors
Vadim Pakhnushev created IGNITE-19245: - Summary: Handle SSL errors Key: IGNITE-19245 URL: https://issues.apache.org/jira/browse/IGNITE-19245 Project: Ignite Issue Type: Improvement Components: cli Reporter: Vadim Pakhnushev When the SSL configuration is incorrect, a generic {{Unknown error Couldn't build REST client}} message is displayed. More information can be extracted from the underlying exceptions, such as the key store file not being found or the password being incorrect. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-19236) orphaned_tests.txt location calculation simplification
[ https://issues.apache.org/jira/browse/IGNITE-19236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709314#comment-17709314 ] Anton Vinogradov commented on IGNITE-19236: --- Merged to the master. [~timonin.maksim], thanks for the review! > orphaned_tests.txt location calculation simplification > -- > > Key: IGNITE-19236 > URL: https://issues.apache.org/jira/browse/IGNITE-19236 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Fix For: 2.15 > > Time Spent: 20m > Remaining Estimate: 0h > > Let's simplify the code :) > The current hard-coding of the 'modules' folder does not allow checking projects based on > Ignite (where the 'modules' folder is missing). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-16778) Support timestamp through jdbc
[ https://issues.apache.org/jira/browse/IGNITE-16778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evgeny Stanilovsky updated IGNITE-16778: Ignite Flags: (was: Docs Required,Release Notes Required) > Support timestamp through jdbc > -- > > Key: IGNITE-16778 > URL: https://issues.apache.org/jira/browse/IGNITE-16778 > Project: Ignite > Issue Type: Bug > Components: jdbc, sql >Reporter: Alexander Belyak >Priority: Major > Labels: ignite-3 > Attachments: RunnerForTestNode.java > > > The timestamp data type can be used through the KV view via LocalDateTime, but not > through the JDBC setTimestamp method. Example in attachment -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19244) Add file path completion to SSL config questions
Vadim Pakhnushev created IGNITE-19244: - Summary: Add file path completion to SSL config questions Key: IGNITE-19244 URL: https://issues.apache.org/jira/browse/IGNITE-19244 Project: Ignite Issue Type: Improvement Components: cli Reporter: Vadim Pakhnushev IGNITE-19075 introduced SSL configuration in REPL mode; file name completion should be added to the questions that ask for the key store/trust store location. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-16778) Support timestamp through jdbc
[ https://issues.apache.org/jira/browse/IGNITE-16778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evgeny Stanilovsky updated IGNITE-16778: Component/s: sql > Support timestamp through jdbc > -- > > Key: IGNITE-16778 > URL: https://issues.apache.org/jira/browse/IGNITE-16778 > Project: Ignite > Issue Type: Bug > Components: jdbc, sql >Reporter: Alexander Belyak >Priority: Major > Labels: ignite-3 > Attachments: RunnerForTestNode.java > > > The timestamp data type can be used through the KV view via LocalDateTime, but not > through the JDBC setTimestamp method. Example in attachment -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Description: Error messages about checkpoint read lock acquisition timeouts and blocked system-critical threads may appear during the snapshot restore process (just after the caches start): {quote} [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, {color:red}blockedFor=100s{color}] {quote} There is also an active exchange process, which finishes with such timings (the timing will be approximately equal to the blocking time of the threads): {quote} [2023-04-06T10:55:52,211][INFO ]\[exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" ({color:red}100163 ms{color}), ..., stage="Total time" ({color:red}100334 ms{color})] {quote} How to reproduce: # Set the checkpoint frequency to less than the failure detection timeout. # Ensure that restoring cache group partition states lasts longer than the failure detection timeout, i.e. this applies to sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] was: Error messages about checkpoint read lock acquisition timeouts and blocked system-critical threads may appear during the snapshot restore process (just after the caches start): {quote} [2023-04-06T10:55:46,561][ERROR][ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR][tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, *{color:red}blockedFor=100s{color}*] {quote} There is also an active exchange process, which finishes with such timings (the timing will be approximately equal to the blocking time of the threads): {quote} [2023-04-06T10:55:52,211][INFO ][exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" (*{color:red}100163 ms{color}*), ..., stage="Total time" (*{color:red}100334 ms{color}*)] {quote} How to reproduce: # Set the checkpoint frequency to less than the failure detection timeout. # Ensure that restoring cache group partition states lasts longer than the failure detection timeout, i.e. this applies to sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > There may be possible error messages about checkpoint read lock acquisition > timeouts and critical threads blocking during snapshot restore process (just > after caches start): > {quote} > [2023-04-06T10:55:46,561][ERROR]\[ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] > Checkpoint read lock acquisition has been timed out. > {quote} > {quote} > [2023-04-06T10:55:47,487][ERROR]\[tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] > Blocked system-critical thread has been detected. This can lead to > cluster-wide undefined behaviour [workerName=db-checkpoint-thread, > threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, > {color:red}blockedFor=100s{color}] > {quote} > Also there are active exchange process, which finishes with such timings > (timing will be approximatelly equal to blocking time of threads): > {quote} > [2023-04-06T10:55:52,211][INFO > ]\[exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange > timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], > resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5],
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Description: There may be possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start): {quote} [2023-04-06T10:55:46,561][ERROR][ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR][tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, *{color:red}blockedFor=100s{color}*] {quote} Also there are active exchange process, which finishes with such timings (timing will be approximatelly equal to blocking time of threads): {quote} [2023-04-06T10:55:52,211][INFO ][exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" (*{color:red}100163 ms{color}*), ..., stage="Total time" (*{color:red}100334 ms{color}*)] {quote} How to reproduce: # Set checkpoint frequency less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] was: There may be possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start): {quote} [2023-04-06T10:55:46,561][ERROR][ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR][tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, *{color:red}blockedFor=100s{color}*] {quote} Also there are activ exchange process and after finish Exchange future will print such timing: [2023-04-06T10:55:52,211][INFO ][exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" (*{color:red}100163 ms{color}*), ..., stage="Total time" (*{color:red}100334 ms{color}*)] How to reproduce: # Set checkpoint frequency less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. 
Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > There may be possible error messages about checkpoint read lock acquisition > timeouts and critical threads blocking during snapshot restore process (just > after caches start): > {quote} > [2023-04-06T10:55:46,561][ERROR][ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] > Checkpoint read lock acquisition has been timed out. > {quote} > {quote} > [2023-04-06T10:55:47,487][ERROR][tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] > Blocked system-critical thread has been detected. This can lead to > cluster-wide undefined behaviour [workerName=db-checkpoint-thread, > threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, > *{color:red}blockedFor=100s{color}*] > {quote} > Also there are active exchange process, which finishes with such timings > (timing will be approximatelly equal to blocking time of threads): > {quote} > [2023-04-06T10:55:52,211][INFO > ][exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange > timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], > resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in > exchange queue" (0 ms), ..., stage="Restore
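The reproduction steps above boil down to one configuration relationship: checkpoints must fire more often than the failure detection timeout allows a critical thread to block. A minimal Ignite 2.x configuration sketch of that precondition — the concrete values are illustrative, not taken from the attached reproducer:

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Sketch of the reproduction precondition for IGNITE-19239: the checkpoint
// frequency is set well below the failure detection timeout, so a long
// "Restore partition states" stage can hold the checkpoint thread past the
// blocked-thread threshold.
public class ReproConfigSketch {
    public static IgniteConfiguration configure() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        // Checkpoints every 3 s — less than the failure detection timeout below.
        storageCfg.setCheckpointFrequency(3_000L);
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);
        cfg.setFailureDetectionTimeout(10_000L); // 10 s
        return cfg;
    }
}
```

With this relationship, a restore whose partition-state recovery outlasts the failure detection timeout should surface the blocked-thread messages quoted in the description.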
[jira] [Updated] (IGNITE-19240) Use HTTPS port for dynamic completers when connected to SSL enabled node
[ https://issues.apache.org/jira/browse/IGNITE-19240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Pakhnushev updated IGNITE-19240: -- Description: Currently {{NodeNameRegistryImpl.urlFromClusterNode}} uses the HTTP port when constructing URLs for completion. The HTTPS port should be used if the node is configured with SSL enabled. Even if we construct a proper URL, it might be incorrect because {{NodeMetadata.getRestHost}} may return an IP address that is not verifiable with the provided trust store. was:Currently {{NodeNameRegistryImpl.urlFromClusterNode}} uses an HTTP port when constructing URLs for completion. HTTPS port should be used if the node is configured with SSL enabled. > Use HTTPS port for dynamic completers when connected to SSL enabled node > > > Key: IGNITE-19240 > URL: https://issues.apache.org/jira/browse/IGNITE-19240 > Project: Ignite > Issue Type: Bug > Components: cli >Reporter: Vadim Pakhnushev >Assignee: Vadim Pakhnushev >Priority: Major > Labels: ignite-3 > > Currently {{NodeNameRegistryImpl.urlFromClusterNode}} uses the HTTP port when > constructing URLs for completion. The HTTPS port should be used if the node is > configured with SSL enabled. > Even if we construct a proper URL, it might be incorrect because > {{NodeMetadata.getRestHost}} may return an IP address that is not verifiable > with the provided trust store. -- This message was sent by Atlassian Jira (v8.20.10#820010)
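The fix described above could take roughly this shape — select the scheme and port from the node's SSL flag when building the completion URL. This is a hypothetical sketch; the method and parameter names are illustrative, not the actual {{NodeNameRegistryImpl}} API:

```java
// Hypothetical sketch of the URL construction for dynamic completers:
// use the HTTPS scheme and port when the node reports SSL enabled,
// otherwise fall back to plain HTTP.
public class CompleterUrlSketch {
    public static String nodeUrl(String host, int httpPort, int httpsPort, boolean sslEnabled) {
        String scheme = sslEnabled ? "https" : "http";
        int port = sslEnabled ? httpsPort : httpPort;
        return scheme + "://" + host + ":" + port;
    }

    public static void main(String[] args) {
        // SSL-enabled node: completion requests go to the HTTPS endpoint.
        System.out.println(nodeUrl("node1.example.com", 10300, 10400, true));
    }
}
```

Note that, per the description, even a correctly built URL may fail trust-store verification when the REST host resolves to a bare IP address; the sketch does not address that part.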
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Description: Error messages about checkpoint read lock acquisition timeouts and blocked system-critical threads may appear during the snapshot restore process (just after the caches start): {quote} [2023-04-06T10:55:46,561][ERROR][ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out. {quote} {quote} [2023-04-06T10:55:47,487][ERROR][tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, *{color:red}blockedFor=100s{color}*] {quote} There is also an active exchange process, and after it finishes the exchange future will print timings like: [2023-04-06T10:55:52,211][INFO ][exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in exchange queue" (0 ms), ..., stage="Restore partition states" (*{color:red}100163 ms{color}*), ..., stage="Total time" (*{color:red}100334 ms{color}*)] How to reproduce: # Set the checkpoint frequency to less than the failure detection timeout. # Ensure that restoring cache group partition states takes longer than the failure detection timeout, which is the case for sufficiently large caches. Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] was: There are possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start). How to reproduce: # Set checkpoint frequency less than failure detection timeout. 
# Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > There may be possible error messages about checkpoint read lock acquisition > timeouts and critical threads blocking during snapshot restore process (just > after caches start): > {quote} > [2023-04-06T10:55:46,561][ERROR][ttl-cleanup-worker-#475%node%][CheckpointTimeoutLock] > Checkpoint read lock acquisition has been timed out. > {quote} > {quote} > [2023-04-06T10:55:47,487][ERROR][tcp-disco-msg-worker-[crd]-#23%node%-#446%node%][G] > Blocked system-critical thread has been detected. This can lead to > cluster-wide undefined behaviour [workerName=db-checkpoint-thread, > threadName=db-checkpoint-thread-#457%snapshot.BlockingThreadsOnSnapshotRestoreReproducerTest0%, > *{color:red}blockedFor=100s{color}*] > {quote} > Also there are activ exchange process and after finish Exchange future will > print such timing: > [2023-04-06T10:55:52,211][INFO > ][exchange-worker-#450%node%][GridDhtPartitionsExchangeFuture] Exchange > timings [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], > resVer=AffinityTopologyVersion [topVer=1, minorTopVer=5], stage="Waiting in > exchange queue" (0 ms), ..., stage="Restore partition states" > (*{color:red}100163 ms{color}*), ..., stage="Total time" (*{color:red}100334 > ms{color}*)] > How to reproduce: > # Set checkpoint frequency less than failure detection timeout. > # Ensure, that cache groups partitions states restoring lasts more than > failure detection timeout, i.e. 
it is actual to sufficiently large caches. > Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19187) Sql. Handle StorageRebalanceException during rowsCount estimation
[ https://issues.apache.org/jira/browse/IGNITE-19187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin reassigned IGNITE-19187: - Assignee: Pavel Pereslegin > Sql. Handle StorageRebalanceException during rowsCount estimation > - > > Key: IGNITE-19187 > URL: https://issues.apache.org/jira/browse/IGNITE-19187 > Project: Ignite > Issue Type: Bug > Components: sql >Reporter: Konstantin Orlov >Assignee: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > > We need to handle StorageRebalanceException which may be thrown from > {{org.apache.ignite.internal.storage.MvPartitionStorage#rowsCount}} during > row count estimation > ({{org.apache.ignite.internal.sql.engine.schema.IgniteTableImpl.StatisticsImpl#getRowCount}}). > {code:java} > Caused by: org.apache.ignite.internal.storage.StorageRebalanceException: > IGN-STORAGE-4 TraceId:a943b5f5-8018-4c4b-9e66-cc5060796848 Storage in the > process of rebalancing: [table=TEST, partitionId=0] > at > app//org.apache.ignite.internal.storage.util.StorageUtils.throwExceptionDependingOnStorageState(StorageUtils.java:129) > at > app//org.apache.ignite.internal.storage.util.StorageUtils.throwExceptionIfStorageNotInRunnableState(StorageUtils.java:51) > at > app//org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.throwExceptionIfStorageNotInRunnableState(AbstractPageMemoryMvPartitionStorage.java:894) > at > app//org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.lambda$rowsCount$24(AbstractPageMemoryMvPartitionStorage.java:707) > at > app//org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:785) > at > app//org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.rowsCount(AbstractPageMemoryMvPartitionStorage.java:706) > at > 
app//org.apache.ignite.internal.sql.engine.schema.IgniteTableImpl$StatisticsImpl.getRowCount(IgniteTableImpl.java:551) > at > app//org.apache.calcite.prepare.RelOptTableImpl.getRowCount(RelOptTableImpl.java:238) > at > app//org.apache.ignite.internal.sql.engine.rel.ProjectableFilterableTableScan.computeSelfCost(ProjectableFilterableTableScan.java:156) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
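One possible shape of the handler requested above is to catch the rebalance exception inside the statistics callback and fall back to a default row-count estimate instead of failing planning. This is only a sketch: {{StandInRebalanceException}} stands in for Ignite's {{StorageRebalanceException}}, and the fallback constant and interface are illustrative:

```java
// Sketch of handling a mid-rebalance storage during row count estimation.
// StandInRebalanceException is a stand-in for
// org.apache.ignite.internal.storage.StorageRebalanceException.
public class RowCountEstimator {
    public static class StandInRebalanceException extends RuntimeException { }

    // Illustrative default used while the partition is unavailable.
    public static final double FALLBACK_ROW_COUNT = 10_000.0;

    public interface PartitionStorage {
        long rowsCount(); // may throw while the partition is rebalancing
    }

    public static double estimate(PartitionStorage storage) {
        try {
            return (double) storage.rowsCount();
        } catch (StandInRebalanceException e) {
            // Storage is being rebalanced: return a default estimate rather
            // than propagating the exception into the planner.
            return FALLBACK_ROW_COUNT;
        }
    }
}
```

The design choice mirrors what statistics code usually does: an inexact estimate only skews plan costs, whereas a propagated exception aborts the whole query.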
[jira] [Updated] (IGNITE-19243) C++ 3.0: propagate table schema updates to client on write-only operations
[ https://issues.apache.org/jira/browse/IGNITE-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-19243: Language: C++ (was: Java) > C++ 3.0: propagate table schema updates to client on write-only operations > -- > > Key: IGNITE-19243 > URL: https://issues.apache.org/jira/browse/IGNITE-19243 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Currently, C++ client receives table schema updates when write-read requests > are performed. For example, client performs TUPLE_GET request, sends key > tuple using old schema version, receives result tuple with the latest schema > version, and retrieves the latest schema. > However, some requests are "write-only": client sends a tuple, but does not > receive one back, like TUPLE_UPSERT. No schema updates are performed in this > case. > To fix this, include the latest schema version into all write-only operation > responses: > * TUPLE_UPSERT > * TUPLE_UPSERT_ALL > * TUPLE_INSERT > * TUPLE_INSERT_ALL > * TUPLE_REPLACE > * TUPLE_REPLACE_EXACT > * TUPLE_DELETE > * TUPLE_DELETE_ALL > * TUPLE_DELETE_EXACT > * TUPLE_DELETE_ALL_EXACT > * TUPLE_CONTAINS_KEY > Client will compare this version to the known one and perform a background > update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19242) .NET: Thin 3.0: propagate table schema updates to client on write-only operations
[ https://issues.apache.org/jira/browse/IGNITE-19242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-19242: Language: C# (was: Java) > .NET: Thin 3.0: propagate table schema updates to client on write-only > operations > - > > Key: IGNITE-19242 > URL: https://issues.apache.org/jira/browse/IGNITE-19242 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Currently, .NET client receives table schema updates when write-read requests > are performed. For example, client performs TUPLE_GET request, sends key > tuple using old schema version, receives result tuple with the latest schema > version, and retrieves the latest schema. > However, some requests are "write-only": client sends a tuple, but does not > receive one back, like TUPLE_UPSERT. No schema updates are performed in this > case. > To fix this, include the latest schema version into all write-only operation > responses: > * TUPLE_UPSERT > * TUPLE_UPSERT_ALL > * TUPLE_INSERT > * TUPLE_INSERT_ALL > * TUPLE_REPLACE > * TUPLE_REPLACE_EXACT > * TUPLE_DELETE > * TUPLE_DELETE_ALL > * TUPLE_DELETE_EXACT > * TUPLE_DELETE_ALL_EXACT > * TUPLE_CONTAINS_KEY > Client will compare this version to the known one and perform a background > update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-19243) C++ 3.0:: propagate table schema updates to client on write-only operations
Pavel Tupitsyn created IGNITE-19243: --- Summary: C++ 3.0:: propagate table schema updates to client on write-only operations Key: IGNITE-19243 URL: https://issues.apache.org/jira/browse/IGNITE-19243 Project: Ignite Issue Type: Improvement Components: thin client Affects Versions: 3.0.0-beta1 Reporter: Pavel Tupitsyn Assignee: Pavel Tupitsyn Fix For: 3.0.0-beta2 Currently, Java client receives table schema updates when write-read requests are performed. For example, client performs TUPLE_GET request, sends key tuple using old schema version, receives result tuple with the latest schema version, and retrieves the latest schema. However, some requests are "write-only": client sends a tuple, but does not receive one back, like TUPLE_UPSERT. No schema updates are performed in this case. To fix this, include the latest schema version into all write-only operation responses: * TUPLE_UPSERT * TUPLE_UPSERT_ALL * TUPLE_INSERT * TUPLE_INSERT_ALL * TUPLE_REPLACE * TUPLE_REPLACE_EXACT * TUPLE_DELETE * TUPLE_DELETE_ALL * TUPLE_DELETE_EXACT * TUPLE_DELETE_ALL_EXACT * TUPLE_CONTAINS_KEY Client will compare this version to the known one and perform a background update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19243) C++ 3.0: propagate table schema updates to client on write-only operations
[ https://issues.apache.org/jira/browse/IGNITE-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-19243: Summary: C++ 3.0: propagate table schema updates to client on write-only operations (was: C++ 3.0:: propagate table schema updates to client on write-only operations) > C++ 3.0: propagate table schema updates to client on write-only operations > -- > > Key: IGNITE-19243 > URL: https://issues.apache.org/jira/browse/IGNITE-19243 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Currently, Java client receives table schema updates when write-read requests > are performed. For example, client performs TUPLE_GET request, sends key > tuple using old schema version, receives result tuple with the latest schema > version, and retrieves the latest schema. > However, some requests are "write-only": client sends a tuple, but does not > receive one back, like TUPLE_UPSERT. No schema updates are performed in this > case. > To fix this, include the latest schema version into all write-only operation > responses: > * TUPLE_UPSERT > * TUPLE_UPSERT_ALL > * TUPLE_INSERT > * TUPLE_INSERT_ALL > * TUPLE_REPLACE > * TUPLE_REPLACE_EXACT > * TUPLE_DELETE > * TUPLE_DELETE_ALL > * TUPLE_DELETE_EXACT > * TUPLE_DELETE_ALL_EXACT > * TUPLE_CONTAINS_KEY > Client will compare this version to the known one and perform a background > update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19243) C++ 3.0: propagate table schema updates to client on write-only operations
[ https://issues.apache.org/jira/browse/IGNITE-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-19243: Description: Currently, C++ client receives table schema updates when write-read requests are performed. For example, client performs TUPLE_GET request, sends key tuple using old schema version, receives result tuple with the latest schema version, and retrieves the latest schema. However, some requests are "write-only": client sends a tuple, but does not receive one back, like TUPLE_UPSERT. No schema updates are performed in this case. To fix this, include the latest schema version into all write-only operation responses: * TUPLE_UPSERT * TUPLE_UPSERT_ALL * TUPLE_INSERT * TUPLE_INSERT_ALL * TUPLE_REPLACE * TUPLE_REPLACE_EXACT * TUPLE_DELETE * TUPLE_DELETE_ALL * TUPLE_DELETE_EXACT * TUPLE_DELETE_ALL_EXACT * TUPLE_CONTAINS_KEY Client will compare this version to the known one and perform a background update, if necessary. was: Currently, Java client receives table schema updates when write-read requests are performed. For example, client performs TUPLE_GET request, sends key tuple using old schema version, receives result tuple with the latest schema version, and retrieves the latest schema. However, some requests are "write-only": client sends a tuple, but does not receive one back, like TUPLE_UPSERT. No schema updates are performed in this case. To fix this, include the latest schema version into all write-only operation responses: * TUPLE_UPSERT * TUPLE_UPSERT_ALL * TUPLE_INSERT * TUPLE_INSERT_ALL * TUPLE_REPLACE * TUPLE_REPLACE_EXACT * TUPLE_DELETE * TUPLE_DELETE_ALL * TUPLE_DELETE_EXACT * TUPLE_DELETE_ALL_EXACT * TUPLE_CONTAINS_KEY Client will compare this version to the known one and perform a background update, if necessary. 
> C++ 3.0: propagate table schema updates to client on write-only operations > -- > > Key: IGNITE-19243 > URL: https://issues.apache.org/jira/browse/IGNITE-19243 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Currently, C++ client receives table schema updates when write-read requests > are performed. For example, client performs TUPLE_GET request, sends key > tuple using old schema version, receives result tuple with the latest schema > version, and retrieves the latest schema. > However, some requests are "write-only": client sends a tuple, but does not > receive one back, like TUPLE_UPSERT. No schema updates are performed in this > case. > To fix this, include the latest schema version into all write-only operation > responses: > * TUPLE_UPSERT > * TUPLE_UPSERT_ALL > * TUPLE_INSERT > * TUPLE_INSERT_ALL > * TUPLE_REPLACE > * TUPLE_REPLACE_EXACT > * TUPLE_DELETE > * TUPLE_DELETE_ALL > * TUPLE_DELETE_EXACT > * TUPLE_DELETE_ALL_EXACT > * TUPLE_CONTAINS_KEY > Client will compare this version to the known one and perform a background > update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
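The client-side half of the change described above — "compare this version to the known one and perform a background update, if necessary" — can be sketched as a small tracker. Class and method names here are illustrative, not the actual client API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the schema-version check a client would run on every
// write-only response once the server starts returning the latest
// schema version.
public class SchemaVersionTracker {
    private final AtomicInteger knownVersion;

    public SchemaVersionTracker(int initialVersion) {
        this.knownVersion = new AtomicInteger(initialVersion);
    }

    /** Returns true when a background schema reload should be scheduled. */
    public boolean onResponse(int responseSchemaVersion) {
        int known = knownVersion.get();
        if (responseSchemaVersion > known) {
            // Server has a newer schema: remember it and refresh in background.
            knownVersion.set(responseSchemaVersion);
            return true;
        }
        return false;
    }

    public int knownVersion() {
        return knownVersion.get();
    }
}
```

With such a check in place, a TUPLE_UPSERT against a stale schema triggers at most one background refresh instead of leaving the client permanently behind.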
[jira] [Created] (IGNITE-19242) .NET: Thin 3.0: propagate table schema updates to client on write-only operations
Pavel Tupitsyn created IGNITE-19242: --- Summary: .NET: Thin 3.0: propagate table schema updates to client on write-only operations Key: IGNITE-19242 URL: https://issues.apache.org/jira/browse/IGNITE-19242 Project: Ignite Issue Type: Improvement Components: thin client Affects Versions: 3.0.0-beta1 Reporter: Pavel Tupitsyn Assignee: Pavel Tupitsyn Fix For: 3.0.0-beta2 Currently, Java client receives table schema updates when write-read requests are performed. For example, client performs TUPLE_GET request, sends key tuple using old schema version, receives result tuple with the latest schema version, and retrieves the latest schema. However, some requests are "write-only": client sends a tuple, but does not receive one back, like TUPLE_UPSERT. No schema updates are performed in this case. To fix this, include the latest schema version into all write-only operation responses: * TUPLE_UPSERT * TUPLE_UPSERT_ALL * TUPLE_INSERT * TUPLE_INSERT_ALL * TUPLE_REPLACE * TUPLE_REPLACE_EXACT * TUPLE_DELETE * TUPLE_DELETE_ALL * TUPLE_DELETE_EXACT * TUPLE_DELETE_ALL_EXACT * TUPLE_CONTAINS_KEY Client will compare this version to the known one and perform a background update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19241) Java thin 3.0: propagate table schema updates to client on write-only operations
[ https://issues.apache.org/jira/browse/IGNITE-19241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-19241: Summary: Java thin 3.0: propagate table schema updates to client on write-only operations (was: Java thin: propagate table schema updates to client on write-only operations) > Java thin 3.0: propagate table schema updates to client on write-only > operations > > > Key: IGNITE-19241 > URL: https://issues.apache.org/jira/browse/IGNITE-19241 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Currently, Java client receives table schema updates when write-read requests > are performed. For example, client performs TUPLE_GET request, sends key > tuple using old schema version, receives result tuple with the latest schema > version, and retrieves the latest schema. > However, some requests are "write-only": client sends a tuple, but does not > receive one back, like TUPLE_UPSERT. No schema updates are performed in this > case. > To fix this, include the latest schema version into all write-only operation > responses: > * TUPLE_UPSERT > * TUPLE_UPSERT_ALL > * TUPLE_INSERT > * TUPLE_INSERT_ALL > * TUPLE_REPLACE > * TUPLE_REPLACE_EXACT > * TUPLE_DELETE > * TUPLE_DELETE_ALL > * TUPLE_DELETE_EXACT > * TUPLE_DELETE_ALL_EXACT > * TUPLE_CONTAINS_KEY > Client will compare this version to the known one and perform a background > update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19242) .NET: Thin 3.0: propagate table schema updates to client on write-only operations
[ https://issues.apache.org/jira/browse/IGNITE-19242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-19242: Description: Currently, .NET client receives table schema updates when write-read requests are performed. For example, client performs TUPLE_GET request, sends key tuple using old schema version, receives result tuple with the latest schema version, and retrieves the latest schema. However, some requests are "write-only": client sends a tuple, but does not receive one back, like TUPLE_UPSERT. No schema updates are performed in this case. To fix this, include the latest schema version into all write-only operation responses: * TUPLE_UPSERT * TUPLE_UPSERT_ALL * TUPLE_INSERT * TUPLE_INSERT_ALL * TUPLE_REPLACE * TUPLE_REPLACE_EXACT * TUPLE_DELETE * TUPLE_DELETE_ALL * TUPLE_DELETE_EXACT * TUPLE_DELETE_ALL_EXACT * TUPLE_CONTAINS_KEY Client will compare this version to the known one and perform a background update, if necessary. was: Currently, Java client receives table schema updates when write-read requests are performed. For example, client performs TUPLE_GET request, sends key tuple using old schema version, receives result tuple with the latest schema version, and retrieves the latest schema. However, some requests are "write-only": client sends a tuple, but does not receive one back, like TUPLE_UPSERT. No schema updates are performed in this case. To fix this, include the latest schema version into all write-only operation responses: * TUPLE_UPSERT * TUPLE_UPSERT_ALL * TUPLE_INSERT * TUPLE_INSERT_ALL * TUPLE_REPLACE * TUPLE_REPLACE_EXACT * TUPLE_DELETE * TUPLE_DELETE_ALL * TUPLE_DELETE_EXACT * TUPLE_DELETE_ALL_EXACT * TUPLE_CONTAINS_KEY Client will compare this version to the known one and perform a background update, if necessary. 
> .NET: Thin 3.0: propagate table schema updates to client on write-only > operations > - > > Key: IGNITE-19242 > URL: https://issues.apache.org/jira/browse/IGNITE-19242 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Currently, .NET client receives table schema updates when write-read requests > are performed. For example, client performs TUPLE_GET request, sends key > tuple using old schema version, receives result tuple with the latest schema > version, and retrieves the latest schema. > However, some requests are "write-only": client sends a tuple, but does not > receive one back, like TUPLE_UPSERT. No schema updates are performed in this > case. > To fix this, include the latest schema version into all write-only operation > responses: > * TUPLE_UPSERT > * TUPLE_UPSERT_ALL > * TUPLE_INSERT > * TUPLE_INSERT_ALL > * TUPLE_REPLACE > * TUPLE_REPLACE_EXACT > * TUPLE_DELETE > * TUPLE_DELETE_ALL > * TUPLE_DELETE_EXACT > * TUPLE_DELETE_ALL_EXACT > * TUPLE_CONTAINS_KEY > Client will compare this version to the known one and perform a background > update, if necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19241) Java thin: propagate table schema updates to client on write-only operations
[ https://issues.apache.org/jira/browse/IGNITE-19241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Tupitsyn updated IGNITE-19241: Description: Currently, Java client receives table schema updates when write-read requests are performed. For example, client performs TUPLE_GET request, sends key tuple using old schema version, receives result tuple with the latest schema version, and retrieves the latest schema. However, some requests are "write-only": client sends a tuple, but does not receive one back, like TUPLE_UPSERT. No schema updates are performed in this case. To fix this, include the latest schema version into all write-only operation responses: * TUPLE_UPSERT * TUPLE_UPSERT_ALL * TUPLE_INSERT * TUPLE_INSERT_ALL * TUPLE_REPLACE * TUPLE_REPLACE_EXACT * TUPLE_DELETE * TUPLE_DELETE_ALL * TUPLE_DELETE_EXACT * TUPLE_DELETE_ALL_EXACT * TUPLE_CONTAINS_KEY Client will compare this version to the known one and perform a background update, if necessary. > Java thin: propagate table schema updates to client on write-only operations > > > Key: IGNITE-19241 > URL: https://issues.apache.org/jira/browse/IGNITE-19241 > Project: Ignite > Issue Type: Improvement > Components: thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Currently, Java client receives table schema updates when write-read requests > are performed. For example, client performs TUPLE_GET request, sends key > tuple using old schema version, receives result tuple with the latest schema > version, and retrieves the latest schema. > However, some requests are "write-only": client sends a tuple, but does not > receive one back, like TUPLE_UPSERT. No schema updates are performed in this > case. 
> To fix this, include the latest schema version into all write-only operation > responses: > * TUPLE_UPSERT > * TUPLE_UPSERT_ALL > * TUPLE_INSERT > * TUPLE_INSERT_ALL > * TUPLE_REPLACE > * TUPLE_REPLACE_EXACT > * TUPLE_DELETE > * TUPLE_DELETE_ALL > * TUPLE_DELETE_EXACT > * TUPLE_DELETE_ALL_EXACT > * TUPLE_CONTAINS_KEY > Client will compare this version to the known one and perform a background > update, if necessary.
[jira] [Created] (IGNITE-19241) Java thin: propagate table schema updates to client on write-only operations
Pavel Tupitsyn created IGNITE-19241: --- Summary: Java thin: propagate table schema updates to client on write-only operations Key: IGNITE-19241 URL: https://issues.apache.org/jira/browse/IGNITE-19241 Project: Ignite Issue Type: Improvement Components: thin client Affects Versions: 3.0.0-beta1 Reporter: Pavel Tupitsyn Assignee: Pavel Tupitsyn Fix For: 3.0.0-beta2
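The version check proposed in IGNITE-19241/19242 can be sketched as follows. This is an illustrative stand-in, not actual Ignite client code: the class name {{ClientSchemaTracker}}, its methods, and the "reload" (which here merely records the new version) are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the client-side logic proposed in the ticket:
// every write-only response now carries the latest schema version; the
// client compares it with the version it knows and, if the server is
// ahead, triggers a background schema reload instead of blocking the call.
public class ClientSchemaTracker {
    private final AtomicInteger knownVersion = new AtomicInteger(1);
    private volatile CompletableFuture<Integer> pendingReload =
            CompletableFuture.completedFuture(1);

    /** Called for TUPLE_UPSERT, TUPLE_DELETE, etc. responses. */
    public void onResponseSchemaVersion(int serverVersion) {
        if (serverVersion > knownVersion.get()) {
            // Background update: the write operation itself already succeeded.
            pendingReload = CompletableFuture.supplyAsync(() -> {
                knownVersion.set(serverVersion); // stands in for fetching the new schema
                return serverVersion;
            });
        }
    }

    public int knownVersion() {
        return knownVersion.get();
    }

    /** Deterministic demo used to exercise the sketch. */
    public static int demo() throws Exception {
        ClientSchemaTracker tracker = new ClientSchemaTracker();
        tracker.onResponseSchemaVersion(1); // same version: no reload
        tracker.onResponseSchemaVersion(3); // server is ahead: reload in background
        tracker.pendingReload.get();        // wait for the async refresh
        return tracker.knownVersion();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("known schema version after responses: " + demo());
    }
}
```

The key design point from the ticket is preserved: the refresh never delays the write-only operation itself, it only runs after the response is observed.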
[jira] [Commented] (IGNITE-16778) Support timestamp through jdbc
[ https://issues.apache.org/jira/browse/IGNITE-16778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709303#comment-17709303 ] Ivan Artukhov commented on IGNITE-16778: [~jooger] FYI ^^^ > Support timestamp through jdbc > -- > > Key: IGNITE-16778 > URL: https://issues.apache.org/jira/browse/IGNITE-16778 > Project: Ignite > Issue Type: Bug > Components: jdbc >Reporter: Alexander Belyak >Priority: Major > Labels: ignite-3 > Attachments: RunnerForTestNode.java > > > Able to use the timestamp data type through the KV view via LocalDateTime, but not > through the jdbc setTimestamp method. Example in attachment
[jira] [Created] (IGNITE-19240) Use HTTPS port for dynamic completers when connected to SSL enabled node
Vadim Pakhnushev created IGNITE-19240: - Summary: Use HTTPS port for dynamic completers when connected to SSL enabled node Key: IGNITE-19240 URL: https://issues.apache.org/jira/browse/IGNITE-19240 Project: Ignite Issue Type: Bug Components: cli Reporter: Vadim Pakhnushev Assignee: Vadim Pakhnushev Currently {{NodeNameRegistryImpl.urlFromClusterNode}} uses the HTTP port when constructing URLs for completion. The HTTPS port should be used if the node is configured with SSL enabled.
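The shape of the fix can be sketched as below. This is not the actual {{NodeNameRegistryImpl}} code; the method signature, field names, and port numbers are illustrative assumptions.

```java
import java.net.URI;

// Hypothetical sketch of the fix: pick the scheme and port from the node's
// SSL flag instead of always using the plain HTTP port.
public class CompleterUrlFactory {
    public static URI urlFromClusterNode(String host, int httpPort, int httpsPort,
                                         boolean sslEnabled) {
        String scheme = sslEnabled ? "https" : "http";
        int port = sslEnabled ? httpsPort : httpPort;
        return URI.create(scheme + "://" + host + ":" + port);
    }

    public static void main(String[] args) {
        // With SSL enabled, completion URLs switch to the HTTPS port.
        System.out.println(urlFromClusterNode("node1", 10300, 10400, true));
        System.out.println(urlFromClusterNode("node1", 10300, 10400, false));
    }
}
```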
[jira] [Updated] (IGNITE-19238) ItDataTypesTest is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Description: 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! {code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is the race between tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. On TableManager stop, we stop and cleanup all table resources like replicas and raft nodes {code:java} public void stop() { ... Map tables = tablesByIdVv.latest(); // 1* cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); ... }{code} where tablesToStopInCaseOfError is a sort of pending tables list which one is cleared on cfg storage revision update. tablesByIdVv *listens same storage revision update event* in order to publish tables related to the given revision or in other words make such tables accessible from tablesByIdVv.latest(); that one that is used in order to retrieve tables for cleanup on components stop (see // 1* above) {code:java} public TableManager( ... 
tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); registry.accept(token -> { tablesToStopInCaseOfError.clear(); return completedFuture(null); }); {code} However inside IncrementalVersionedValue we have async storageRevision update processing {code:java} updaterFuture = updaterFuture.whenComplete((v, t) -> versionedValue.complete(causalityToken, localUpdaterFuture)); {code} As a result it's possible that we will clear tablesToStopInCaseOfError before publishing same revision tables to tablesByIdVv, so that we will miss that cleared tables in tablesByIdVv.latest() which is used in TableManager#stop. was: 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! {code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is the race between tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. 2.1 On TableManager stop, we stop and cleanup all table resources like replicas and raft nodes {code:java} public void stop() { ... Map tables = tablesByIdVv.latest(); // 1* cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); ... 
}{code} where tablesToStopInCaseOfError is a list of pending tables that is cleared on a configuration storage revision update. *!* tablesByIdVv listens to the same storage revision update event in order to publish the tables related to the given revision, in other words to make such tables accessible from tablesByIdVv.latest(), which is used to retrieve the tables for cleanup on component stop (see // 1* above) {code:java} public TableManager( ... tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); registry.accept(token -> { tablesToStopInCaseOfError.clear(); return completedFuture(null); }); {code} However, inside IncrementalVersionedValue the storageRevision update is processed asynchronously. 2.2 As a result, the following flow touches tablesToStopInCaseOfError and tablesByIdVv: onCreateTable
[jira] [Updated] (IGNITE-19238) ItDataTypesTest is flaky
[ https://issues.apache.org/jira/browse/IGNITE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-19238: - Description: 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests failed to stop replicas on node stop: !Снимок экрана от 2023-04-06 10-39-32.png! {code:java} java.lang.AssertionError: There are replicas alive [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] at org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) at org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) at org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) at org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} 2. The reason why we failed to stop replicas is the race between tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. 2.1 On TableManager stop, we stop and cleanup all table resources like replicas and raft nodes {code:java} public void stop() { ... Map tables = tablesByIdVv.latest(); // 1* cleanUpTablesResources(tables); cleanUpTablesResources(tablesToStopInCaseOfError); ... }{code} where tablesToStopInCaseOfError is a sort of pending tables list which one is cleared on cfg storage revision update. *!* tablesByIdVv listens same storage revision update event in order to publish tables related to the given revision or in other words make such tables accessible from tablesByIdVv.latest(); that one that is used in order to retrieve tables for cleanup on components stop (see // 1* above) {code:java} public TableManager( ... 
tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); registry.accept(token -> { tablesToStopInCaseOfError.clear(); return completedFuture(null); }); {code} However inside IncrementalVersionedValue we have async storageRevision update processing 2.2 So that, we have following flow that touches tablesToStopInCaseOfError, tablesByIdVv onCreateTable was:It > ItDataTypesTest is flaky > > > Key: IGNITE-19238 > URL: https://issues.apache.org/jira/browse/IGNITE-19238 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Lapin >Assignee: Alexander Lapin >Priority: Major > Labels: ignite-3 > Attachments: Снимок экрана от 2023-04-06 10-39-32.png > > > 1. ItDataTypesTest is flaky because previous ItCreateTableDdlTest tests > failed to stop replicas on node stop: > !Снимок экрана от 2023-04-06 10-39-32.png! > > {code:java} > java.lang.AssertionError: There are replicas alive > [replicas=[b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_21, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_6, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_13, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_8, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_9, > b86c60a8-4ea3-4592-abef-6438cfc4cdb2_part_11]] > at > org.apache.ignite.internal.replicator.ReplicaManager.stop(ReplicaManager.java:341) > at > org.apache.ignite.internal.app.LifecycleManager.lambda$stopAllComponents$1(LifecycleManager.java:133) > at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) > at > org.apache.ignite.internal.app.LifecycleManager.stopAllComponents(LifecycleManager.java:131) > at > org.apache.ignite.internal.app.LifecycleManager.stopNode(LifecycleManager.java:115){code} > > 2. The reason why we failed to stop replicas is the race between > tablesToStopInCaseOfError cleanup and adding tables to tablesByIdVv. > 2.1 On TableManager stop, we stop and cleanup all table resources like > replicas and raft nodes > {code:java} > public void stop() { > ... 
> Map tables = tablesByIdVv.latest(); // 1* > cleanUpTablesResources(tables); > cleanUpTablesResources(tablesToStopInCaseOfError); > ... > }{code} > where tablesToStopInCaseOfError is a sort of pending tables list which one is > cleared on cfg storage revision update. > *!* tablesByIdVv listens same storage revision update event in order to > publish tables related to the given revision or in other words make such > tables accessible from tablesByIdVv.latest(); that one that is used in order > to retrieve tables for cleanup on components stop (see // 1* above) > {code:java} > public TableManager( > ... > tablesByIdVv = new IncrementalVersionedValue<>(registry, HashMap::new); > registry.accept(token -> { >
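The ordering problem described in IGNITE-19238 can be shown deterministically with plain futures. This is a stand-alone illustration, not Ignite code: the collections stand in for {{tablesToStopInCaseOfError}} and {{tablesByIdVv}}, and the deferred future models the asynchronous {{whenComplete}} publication inside {{IncrementalVersionedValue}}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Deterministic sketch of the race: the revision listener clears the
// "pending" list right away, while the versioned value publishes the same
// revision's tables asynchronously. If stop() runs in between, the table
// is visible in neither collection and its replicas are never cleaned up.
public class VersionedValueRaceSketch {
    static final List<String> pending = new ArrayList<>();            // tablesToStopInCaseOfError
    static final Map<Long, List<String>> published = new HashMap<>(); // tablesByIdVv analogue

    static List<String> latest() {
        return published.getOrDefault(1L, List.of());
    }

    public static int demo() {
        pending.add("table_1"); // table created, replicas started

        // Async publication of revision 1, as in:
        // updaterFuture.whenComplete((v, t) -> versionedValue.complete(causalityToken, ...))
        CompletableFuture<Void> publish = new CompletableFuture<>();
        publish.thenRun(() -> published.put(1L, List.of("table_1")));

        // The revision listener fires first and clears the pending list.
        pending.clear();

        // stop() happens before the async publication completes:
        int visibleToStop = latest().size() + pending.size(); // 0 -> leaked replicas

        publish.complete(null); // publication arrives too late for stop()
        return visibleToStop;
    }

    public static void main(String[] args) {
        System.out.println("tables visible to stop(): " + demo());
    }
}
```

With this ordering, `stop()` sees zero tables even though one was created, which matches the "There are replicas alive" assertion in the report.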
[jira] [Assigned] (IGNITE-19152) Named list support in local file configuration is broken.
[ https://issues.apache.org/jira/browse/IGNITE-19152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr reassigned IGNITE-19152: -- Assignee: Aleksandr > Named list support in local file configuration is broken. > -- > > Key: IGNITE-19152 > URL: https://issues.apache.org/jira/browse/IGNITE-19152 > Project: Ignite > Issue Type: Bug >Reporter: Mirza Aliev >Assignee: Aleksandr >Priority: Major > Labels: ignite-3 > > After IGNITE-18581 we have started to store the local configuration in a local > config file, instead of in the vault. > The current flow of saving the configuration to a file has a bug. In the > method {{LocalFileConfigurationStorage#write}} we call > {{LocalFileConfigurationStorage#saveValues}} to save configuration fields to > a file, where we call > {{{}LocalFileConfigurationStorage#renderHoconString{}}}. A named list value has an > internal id which is a {{{}UUID{}}}, but {{com.typesafe}} does not support > {{{}UUID{}}}, so the whole process of saving the configuration to a file fails > with > {noformat} > Caused by: com.typesafe.config.ConfigException$BugOrBroken: bug in method > caller: not valid to create ConfigValue from: > 489e16e8-3123-44a3-b27d-6e410863eb24 > at > app//com.typesafe.config.impl.ConfigImpl.fromAnyRef(ConfigImpl.java:282) > at > app//com.typesafe.config.impl.PropertiesParser.fromPathMap(PropertiesParser.java:165) > at > app//com.typesafe.config.impl.PropertiesParser.fromPathMap(PropertiesParser.java:95) > at > app//com.typesafe.config.impl.ConfigImpl.fromAnyRef(ConfigImpl.java:265) > at > app//com.typesafe.config.impl.ConfigImpl.fromPathMap(ConfigImpl.java:201) > at > app//com.typesafe.config.ConfigFactory.parseMap(ConfigFactory.java:1225) > at > app//com.typesafe.config.ConfigFactory.parseMap(ConfigFactory.java:1236) > at > app//org.apache.ignite.internal.configuration.storage.LocalFileConfigurationStorage.renderHoconString(LocalFileConfigurationStorage.java:208) > at >
app//org.apache.ignite.internal.configuration.storage.LocalFileConfigurationStorage.saveValues(LocalFileConfigurationStorage.java:185) > at > app//org.apache.ignite.internal.configuration.storage.LocalFileConfigurationStorage.write(LocalFileConfigurationStorage.java:138) > at > app//org.apache.ignite.internal.configuration.ConfigurationChanger.changeInternally0(ConfigurationChanger.java:606) > at > app//org.apache.ignite.internal.configuration.ConfigurationChanger.lambda$changeInternally$1(ConfigurationChanger.java:541) > {noformat} > h3. More details > The problem is trickier than it may seem. > Configuration storages receive data in "flat" data format, meaning that the > entire tree is converted into a list of pairs: > {code:java} > [{ "dot-separated key string", "serializable value" }]{code} > LocalFileConfigurationStorage interprets keys as literal paths in HOCON > representation, which is simply not correct. These keys and values also have > meta-information, associated with them, such as: > * order of elements in named list configuration > * internal ids for named list elements > To see, what's exactly in there, you may refer to the > {{{}org.apache.ignite.internal.configuration.tree.NamedListNodeTest{}}}. It > has everything laid out explicitly. > h3. Proposed fix > Well, the ideal approach would be rendering the configuration more or less > the same way, as we do it for REST. > It means calling {{ConfigurationUtil#fillFromPrefixMap}} for every local root. > Local roots can be retrieved using {{{}ConfigurationModule{}}}, by reading > them all from the class path. > Resulting nodes are converted to maps using {{{}ConverterToMapVisitor{}}}. > Then maps are converted to HOCON using its own API. > There are several hidden problems here. > * {-}we must check, that HOCON preserves order of keys{-}, and that we use > linked hash maps in {{fillFromPrefixMap}} > EDIT: HOCON sorts keys alphabetically. 
Ok > * {{ConverterToMapVisitor}} does not expect null nodes, because it always > works with "full" trees. Fixing it would require some fine-tuning, otherwise > one may end up with a bunch of empty nodes in the config file, which is bad > * {{ConverterToMapVisitor}} uses array syntax for named lists. You can see > it in action in {{{}HoconConverterTest{}}}. > Yes, there are two ways of representing named lists in the system. We should > make rendering mode configurable, because local configuration, at the moment, > only needs basic tree representation (for node attributes) > We should also add tests for most of these improvements. First of all, to > {{{}HoconConverterTest{}}}. > h3. Misc > Another extremely uncertain thing is the way we handle default values. This > may be a topic for another
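The "flat keys to tree" step that {{ConfigurationUtil#fillFromPrefixMap}} performs (and the linked-hash-map requirement mentioned above) can be illustrated with a minimal stand-in. This sketch is not the Ignite implementation; it only shows the rebuilding of nested structure from dot-separated keys while preserving insertion order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for the fillFromPrefixMap idea: configuration
// storages see the tree as flat pairs of dot-separated keys and values,
// and rendering HOCON requires rebuilding the nested structure first.
// LinkedHashMap preserves key order during the rebuild (HOCON itself
// sorts keys alphabetically, as noted in the ticket).
public class PrefixMapSketch {
    @SuppressWarnings("unchecked")
    public static Map<String, Object> unflatten(Map<String, Object> flat) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : flat.entrySet()) {
            String[] path = e.getKey().split("\\.");
            Map<String, Object> node = root;
            for (int i = 0; i < path.length - 1; i++) {
                node = (Map<String, Object>) node.computeIfAbsent(
                        path[i], k -> new LinkedHashMap<String, Object>());
            }
            node.put(path[path.length - 1], e.getValue());
        }
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("network.port", 3344);
        flat.put("network.ssl.enabled", true);
        System.out.println(unflatten(flat)); // nested tree, insertion order kept
    }
}
```

A real fix would additionally strip or convert meta-values such as named-list internal ids (the UUIDs that break `ConfigFactory.parseMap`) before rendering.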
[jira] [Commented] (IGNITE-16778) Support timestamp through jdbc
[ https://issues.apache.org/jira/browse/IGNITE-16778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17709271#comment-17709271 ] Alexander Belyak commented on IGNITE-16778: --- Now Instant, LocalTime, LocalDate and LocalDateTime are supported, but not java.sql.Timestamp. Is there some particular reason for that? > Support timestamp through jdbc > -- > > Key: IGNITE-16778 > URL: https://issues.apache.org/jira/browse/IGNITE-16778 > Project: Ignite > Issue Type: Bug > Components: jdbc >Reporter: Alexander Belyak >Priority: Major > Labels: ignite-3 > Attachments: RunnerForTestNode.java > > > Able to use the timestamp data type through the KV view via LocalDateTime, but not > through the jdbc setTimestamp method. Example in attachment
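Until the driver accepts {{java.sql.Timestamp}} directly, an application-side workaround is to bridge through {{LocalDateTime}}, which the comment above says is supported. The conversion itself is lossless (nanosecond precision is preserved); the JDBC calls appear only as comments because they need a live connection, and the class name here is made up.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

// Application-side workaround sketch: java.sql.Timestamp converts
// losslessly to and from LocalDateTime, which can then be bound with
// setObject instead of setTimestamp.
public class TimestampBridge {
    public static LocalDateTime toDriverValue(Timestamp ts) {
        // usage with a PreparedStatement:
        // stmt.setObject(1, toDriverValue(timestamp));
        return ts.toLocalDateTime();
    }

    public static Timestamp fromDriverValue(LocalDateTime ldt) {
        return Timestamp.valueOf(ldt);
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2023-04-06 10:39:32.123456789");
        LocalDateTime ldt = toDriverValue(ts);
        System.out.println(ldt + " -> " + fromDriverValue(ldt));
    }
}
```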
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Description: There are possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start). How to reproduce: # Set checkpoint frequency less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] was: There are possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start). How to reproduce: # Set checkpoint frequency is less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > There are possible error messages about checkpoint read lock acquisition > timeouts and critical threads blocking during snapshot restore process (just > after caches start). > How to reproduce: > # Set checkpoint frequency less than failure detection timeout. > # Ensure, that cache groups partitions states restoring lasts more than > failure detection timeout, i.e. it is actual to sufficiently large caches. > Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Description: There are possible error messages about checkpoint read lock acquisition timeouts and critical threads blocking during snapshot restore process (just after caches start). How to reproduce: # Set checkpoint frequency is less than failure detection timeout. # Ensure, that cache groups partitions states restoring lasts more than failure detection timeout, i.e. it is actual to sufficiently large caches. Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] was: When cache groups restore lasts more than failure detection timout > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > There are possible error messages about checkpoint read lock acquisition > timeouts and critical threads blocking during snapshot restore process (just > after caches start). > How to reproduce: > # Set checkpoint frequency is less than failure detection timeout. > # Ensure, that cache groups partitions states restoring lasts more than > failure detection timeout, i.e. it is actual to sufficiently large caches. > Reproducer: [^BlockingThreadsOnSnapshotRestoreReproducerTest.patch] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19239) Checkpoint read lock acquisition timeouts during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-19239: --- Attachment: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > Checkpoint read lock acquisition timeouts during snapshot restore > - > > Key: IGNITE-19239 > URL: https://issues.apache.org/jira/browse/IGNITE-19239 > Project: Ignite > Issue Type: Bug >Reporter: Ilya Shishkov >Priority: Minor > Labels: iep-43, ise > Attachments: BlockingThreadsOnSnapshotRestoreReproducerTest.patch > > > When cache groups restore lasts more than failure detection timeout
[jira] [Resolved] (IGNITE-16779) Support decimal through jdbc
[ https://issues.apache.org/jira/browse/IGNITE-16779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Belyak resolved IGNITE-16779. --- Resolution: Cannot Reproduce > Support decimal through jdbc > > > Key: IGNITE-16779 > URL: https://issues.apache.org/jira/browse/IGNITE-16779 > Project: Ignite > Issue Type: Bug > Components: jdbc >Reporter: Alexander Belyak >Priority: Major > Labels: ignite-3 > > Unable to insert values into decimal type (like decimal(12, 2)) columns > through jdbc.
[jira] [Assigned] (IGNITE-19164) Improve message about requested partitions during snapshot restore
[ https://issues.apache.org/jira/browse/IGNITE-19164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julia Bakulina reassigned IGNITE-19164: --- Assignee: Julia Bakulina > Improve message about requested partitions during snapshot restore > -- > > Key: IGNITE-19164 > URL: https://issues.apache.org/jira/browse/IGNITE-19164 > Project: Ignite > Issue Type: Task >Reporter: Ilya Shishkov >Assignee: Julia Bakulina >Priority: Minor > Labels: iep-43, ise > > Currently, during snapshot restore message is logged before requesting > partitions from remote nodes: > {quote} > [2023-03-24T18:06:59,910][INFO > ][disco-notifier-worker-#792%node%|#792%node%][SnapshotRestoreProcess] Trying > to request partitions from remote nodes > [reqId=ff682204-9554-4fbb-804c-38a79c0b286a, snapshot=snapshot_name, > map={*{color:#FF}76e22ef5-3c76-4987-bebd-9a6222a0{color}*={*{color:#FF}-903566235{color}*=[0,2,4,6,11,12,18,98,100,170,190,194,1015], > > *{color:#FF}1544803905{color}*=[1,11,17,18,22,25,27,35,37,42,45,51,62,64,67,68,73,76,1017]}}] > {quote} > It is necessary to make this output "human readable": > # Print messages per node instead of one message for all nodes. > # Print node consistent id and address. > # Print cache / group name. -- This message was sent by Atlassian Jira (v8.20.10#820010)
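The "one line per node" format requested in IGNITE-19164 could look like the sketch below. This is not the SnapshotRestoreProcess code; the class name, method, and message layout are illustrative assumptions about what a human-readable version might print.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of per-node output: instead of one map dump for all
// nodes, emit one line per node carrying its consistent id and the cache
// group name next to each partition set.
public class RestoreRequestLogFormatter {
    public static List<String> perNodeLines(String nodeConsistentId,
                                            Map<String, List<Integer>> partsByGroup) {
        return partsByGroup.entrySet().stream()
            .map(e -> "Requesting partitions from node [consistentId=" + nodeConsistentId
                + ", group=" + e.getKey() + ", parts=" + e.getValue() + "]")
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> byGroup = new LinkedHashMap<>();
        byGroup.put("cache-group-1", List.of(0, 2, 4, 6, 11, 12));
        perNodeLines("node-1", byGroup).forEach(System.out::println);
    }
}
```

Node address resolution (item 2 of the ticket) would need the discovery view of the cluster and is omitted here.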