[ https://issues.apache.org/jira/browse/IGNITE-21860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Artiukhov updated IGNITE-21860: ------------------------------------ Description: ignite3, rev. ae643bd31f329ac2ce877cc51eaa2c6a1f1a8a21 h1. Setup: * 2 server nodes h1. Test: A set of DDL operations which creates test tables with different field types, creates an index over one or more fields and then drops the index. See the attached [^drop_index_test_ddl.txt] h1. Expected result: The test passes h1. Actual result: The test hangs after executing the following set of operations: {noformat} drop table if exists dropIndexedColumn create zone if not exists "AIMEM" engine aimem create table dropIndexedColumn(k1 SMALLINT not null, v0 TINYINT not null, v1 SMALLINT not null, v2 INTEGER not null, v3 BIGINT not null, v4 VARCHAR not null, v5 TIMESTAMP not null, primary key (k1)) with PRIMARY_ZONE='AIMEM' create index dropIndexedColumn_v1idx on dropIndexedColumn using TREE (v1) {noformat} The following error is seen in the log of the first node at this time: {noformat} 2024-03-28 02:42:38:675 +0000 [WARNING][%DropColumnsTest_cluster_0%JRaft-Common-Executor-8][FSMCallerImpl] FSMCaller already in error status, ignore new error Error [type=ERROR_TYPE_LOG, status=Status[EIO<1014>: Corrupted entry at index=1534, not found]] at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:595) at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.getEntry(LogManagerImpl.java:798) at org.apache.ignite.raft.jraft.core.Replicator.prepareEntry(Replicator.java:820) at org.apache.ignite.raft.jraft.core.Replicator.sendEntries(Replicator.java:1645) at org.apache.ignite.raft.jraft.core.Replicator.sendEntries(Replicator.java:1601) at org.apache.ignite.raft.jraft.core.Replicator.continueSending(Replicator.java:983) at org.apache.ignite.raft.jraft.core.Replicator.lambda$waitMoreEntries$9(Replicator.java:1583) at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.runOnNewLog(LogManagerImpl.java:1205) at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.lambda$notifyOnNewLog$9(LogManagerImpl.java:1159) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) 2024-03-28 02:42:38:674 +0000 [INFO][%DropColumnsTest_cluster_0%build-index-1][IndexBuildTask] Index build completed: [tableId=83, partitionId=0, indexId=85] 2024-03-28 02:42:38:720 +0000 [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-4][NodeImpl] Node <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, status=Status[EIO<1014>: Corrupted LogStorage]. 2024-03-28 02:42:38:720 +0000 [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-9][NodeImpl] Node <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, status=Status[EIO<1014>: Corrupted LogStorage]. 2024-03-28 02:42:38:720 +0000 [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-3][NodeImpl] Node <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, status=Status[EIO<1014>: Corrupted LogStorage]. 2024-03-28 02:42:38:720 +0000 [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-5][NodeImpl] Node <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, status=Status[EIO<1014>: Corrupted LogStorage]. 2024-03-28 02:42:38:720 +0000 [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-0][NodeImpl] Node <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, status=Status[EIO<1014>: Corrupted LogStorage]. 2024-03-28 02:42:38:720 +0000 [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-1][NodeImpl] Node <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, status=Status[EIO<1014>: Corrupted LogStorage]. {noformat} On the second node: {noformat} 2024-03-28 02:42:49:143 +0000 [ERROR][%DropColumnsTest_cluster_1%Raft-Group-Client-8][IndexAvailabilityController] Error processing the operation to delete the partition index building key: [indexId=85, partitionId=11] java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.util.concurrent.TimeoutException ... 8 more {noformat} After this there is no way to execute the following DDL (it hangs): {{drop table if exists dropNoMoreIndexedColumn}} h1. Notes: * Reproduced only once and only with {{aimem}} storage engine. was: ignite3, rev. ae643bd31f329ac2ce877cc51eaa2c6a1f1a8a21 h1. Setup: * 2 server nodes h1. Test: A set of DDL operations which creates test tables with different field types, creates an index over one or more fields and then drops the index. See the attached [^drop_index_test_ddl.txt] h1. Expected result: The test passes h1. Actual result: The test hangs after executing the following set of operations: {noformat} drop table if exists dropIndexedColumn create zone if not exists "AIMEM" engine aimem create table dropIndexedColumn(k1 SMALLINT not null, v0 TINYINT not null, v1 SMALLINT not null, v2 INTEGER not null, v3 BIGINT not null, v4 VARCHAR not null, v5 TIMESTAMP not null, primary key (k1)) with PRIMARY_ZONE='AIMEM' create index dropIndexedColumn_v1idx on dropIndexedColumn using TREE (v1) {noformat} The following error is seen in the log of the first node at this time: {noformat} 2024-03-28 02:42:38:675 +0000 [WARNING][%DropColumnsTest_cluster_0%JRaft-Common-Executor-8][FSMCallerImpl] FSMCaller already in error status, ignore new error Error [type=ERROR_TYPE_LOG, status=Status[EIO<1014>: Corrupted entry at index=1534, not found]] at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:595) at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.getEntry(LogManagerImpl.java:798) at org.apache.ignite.raft.jraft.core.Replicator.prepareEntry(Replicator.java:820) at org.apache.ignite.raft.jraft.core.Replicator.sendEntries(Replicator.java:1645) at org.apache.ignite.raft.jraft.core.Replicator.sendEntries(Replicator.java:1601) at org.apache.ignite.raft.jraft.core.Replicator.continueSending(Replicator.java:983) at org.apache.ignite.raft.jraft.core.Replicator.lambda$waitMoreEntries$9(Replicator.java:1583) at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.runOnNewLog(LogManagerImpl.java:1205) at org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.lambda$notifyOnNewLog$9(LogManagerImpl.java:1159) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) {noformat} On the second node: {noformat} 2024-03-28 02:42:49:143 +0000 [ERROR][%DropColumnsTest_cluster_1%Raft-Group-Client-8][IndexAvailabilityController] Error processing the operation to delete the partition index building key: [indexId=85, partitionId=11] java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.util.concurrent.TimeoutException ... 8 more {noformat} After this there is no way to execute the following DDL (it hangs): {{drop table if exists dropNoMoreIndexedColumn}} h1. Notes: * Reproduced only once and only with {{aimem}} storage engine. > DDL operations hang after creating an index > ------------------------------------------- > > Key: IGNITE-21860 > URL: https://issues.apache.org/jira/browse/IGNITE-21860 > Project: Ignite > Issue Type: Bug > Components: sql > Affects Versions: 3.0.0-beta1 > Reporter: Ivan Artiukhov > Priority: Major > Labels: ignite-3 > Attachments: drop_index_test_ddl.txt, node1.log, node2.log > > > ignite3, rev. ae643bd31f329ac2ce877cc51eaa2c6a1f1a8a21 > h1. Setup: > * 2 server nodes > h1. Test: > A set of DDL operations which creates test tables with different field types, > creates an index over one or more fields and then drops the index. See the > attached [^drop_index_test_ddl.txt] > h1. Expected result: > The test passes > h1. Actual result: > The test hangs after executing the following set of operations: > > {noformat} > drop table if exists dropIndexedColumn > create zone if not exists "AIMEM" engine aimem > create table dropIndexedColumn(k1 SMALLINT not null, v0 TINYINT not null, v1 > SMALLINT not null, v2 INTEGER not null, v3 BIGINT not null, v4 VARCHAR not > null, v5 TIMESTAMP not null, primary key (k1)) with PRIMARY_ZONE='AIMEM' > create index dropIndexedColumn_v1idx on dropIndexedColumn using TREE (v1) > {noformat} > The following error is seen in the log of the first node at this time: > > {noformat} > 2024-03-28 02:42:38:675 +0000 > [WARNING][%DropColumnsTest_cluster_0%JRaft-Common-Executor-8][FSMCallerImpl] > FSMCaller already in error status, ignore new error > Error [type=ERROR_TYPE_LOG, status=Status[EIO<1014>: Corrupted entry at > index=1534, not found]] > at > org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.reportError(LogManagerImpl.java:595) > at > org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.getEntry(LogManagerImpl.java:798) > at > org.apache.ignite.raft.jraft.core.Replicator.prepareEntry(Replicator.java:820) > at > org.apache.ignite.raft.jraft.core.Replicator.sendEntries(Replicator.java:1645) > at > org.apache.ignite.raft.jraft.core.Replicator.sendEntries(Replicator.java:1601) > at > org.apache.ignite.raft.jraft.core.Replicator.continueSending(Replicator.java:983) > at > org.apache.ignite.raft.jraft.core.Replicator.lambda$waitMoreEntries$9(Replicator.java:1583) > at > org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.runOnNewLog(LogManagerImpl.java:1205) > at > org.apache.ignite.raft.jraft.storage.impl.LogManagerImpl.lambda$notifyOnNewLog$9(LogManagerImpl.java:1159) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > 2024-03-28 02:42:38:674 +0000 > [INFO][%DropColumnsTest_cluster_0%build-index-1][IndexBuildTask] Index build > completed: [tableId=83, partitionId=0, indexId=85] > 2024-03-28 02:42:38:720 +0000 > [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-4][NodeImpl] Node > <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, > status=Status[EIO<1014>: Corrupted LogStorage]. > 2024-03-28 02:42:38:720 +0000 > [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-9][NodeImpl] Node > <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, > status=Status[EIO<1014>: Corrupted LogStorage]. > 2024-03-28 02:42:38:720 +0000 > [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-3][NodeImpl] Node > <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, > status=Status[EIO<1014>: Corrupted LogStorage]. > 2024-03-28 02:42:38:720 +0000 > [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-5][NodeImpl] Node > <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, > status=Status[EIO<1014>: Corrupted LogStorage]. > 2024-03-28 02:42:38:720 +0000 > [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-0][NodeImpl] Node > <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, > status=Status[EIO<1014>: Corrupted LogStorage]. > 2024-03-28 02:42:38:720 +0000 > [ERROR][%DropColumnsTest_cluster_0%JRaft-Common-Executor-1][NodeImpl] Node > <metastorage_group/DropColumnsTest_cluster_0> append [0, 0] failed, > status=Status[EIO<1014>: Corrupted LogStorage]. > {noformat} > On the second node: > {noformat} > 2024-03-28 02:42:49:143 +0000 > [ERROR][%DropColumnsTest_cluster_1%Raft-Group-Client-8][IndexAvailabilityController] > Error processing the operation to delete the partition index building key: > [indexId=85, partitionId=11] > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) > at > java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: java.util.concurrent.TimeoutException > ... 8 more > {noformat} > After this there is no way to execute the following DDL (it hangs): {{drop > table if exists dropNoMoreIndexedColumn}} > h1. Notes: > * Reproduced only once and only with {{aimem}} storage engine. -- This message was sent by Atlassian Jira (v8.20.10#820010)