[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency
[ https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-22819:
-----------------------------------
Description:

The following situation might happen:

{code:java}
[2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240].
...
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}

Note that {{removedEntriesCount}} is 0 on the leader and 3 on the followers because of the difference in their clocks: {{evictIdempotentCommandsCache}} behaves differently on different nodes for the same raft commands. The real problem is that it may (or may not) call {{storage.removeAll(commandIdStorageKeys, safeTime)}}, which increases the local revision. The revision is always local, it is never replicated. A revision mismatch leads to different evaluation of conditions in conditional updates and invokes. A simple example of such an issue is a skipped configuration update on one or several nodes in the cluster.

What we can do about it (a sketch of the second option follows this list):
* make an alternative for {{removeAll}} that doesn't increase the local revision
* call {{removeAll}} even if the list is empty
* never invalidate the cache locally, but rather replicate cache invalidation with a special command
* there's a TODO that says "clear this during compaction"; that's a bad option, it would lead to either frequent compactions or huge memory overheads
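A minimal sketch of the second option, assuming the eviction routine looks roughly like the description implies; {{collectExpiredCommandKeys}} and the {{storage}} field are illustrative names, not the actual Ignite 3 sources:

{code:java}
// Option 2 sketch: call removeAll() unconditionally, so that every replica
// applies the same number of revision increments for the same raft command,
// no matter how many cached entries its local clock considers expired.
void evictIdempotentCommandsCache(HybridTimestamp evictionTimestamp, HybridTimestamp safeTime) {
    // Hypothetical helper that collects the storage keys of expired command ids.
    List<byte[]> commandIdStorageKeys = collectExpiredCommandKeys(evictionTimestamp);

    // An empty batch still bumps the local revision, keeping revisions
    // aligned between the leader and the followers.
    storage.removeAll(commandIdStorageKeys, safeTime);
}
{code}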
[jira] [Created] (IGNITE-22819) Metastorage revisions inconsistency
Ivan Bessonov created IGNITE-22819:
--------------------------------------

             Summary: Metastorage revisions inconsistency
                 Key: IGNITE-22819
                 URL: https://issues.apache.org/jira/browse/IGNITE-22819
             Project: Ignite
          Issue Type: Bug
            Reporter: Ivan Bessonov

The following situation might happen:

{code:java}
[2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240].
...
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}

Note that {{removedEntriesCount}} is 0 on the leader and 3 on the followers because of the difference in their clocks: {{evictIdempotentCommandsCache}} behaves differently on different nodes for the same raft commands. The real problem is that it may (or may not) call {{storage.removeAll(commandIdStorageKeys, safeTime)}}, which increases the local revision. The revision is always local, it is never replicated. A revision mismatch leads to different evaluation of conditions in conditional updates and invokes. A simple example of such an issue is a skipped configuration update on one or several nodes in the cluster.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from
[ https://issues.apache.org/jira/browse/IGNITE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov reassigned IGNITE-22736:
--------------------------------------
    Fix Version/s: 3.0.0-beta2
         Assignee: Ivan Bessonov
           Labels: ignite-3  (was: )

> PartitionCommandsMarshallerImpl corrupts the buffer it reads from
> ------------------------------------------------------------------
>
>                 Key: IGNITE-22736
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22736
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
> {{PartitionCommandsMarshallerImpl#unmarshall}} receives a buffer that's requested from the log manager, for example.
> The byte buffer instance it receives might be acquired from the on-heap cache of log entries. Modifying it would be
> # not thread-safe, because multiple threads may start modifying it concurrently
> # illegal, because it stays in the cache for some time, and we basically corrupt it by modifying it
> We shouldn't do that.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from
Ivan Bessonov created IGNITE-22736:
--------------------------------------

             Summary: PartitionCommandsMarshallerImpl corrupts the buffer it reads from
                 Key: IGNITE-22736
                 URL: https://issues.apache.org/jira/browse/IGNITE-22736
             Project: Ignite
          Issue Type: Bug
            Reporter: Ivan Bessonov

{{PartitionCommandsMarshallerImpl#unmarshall}} receives a buffer that's requested from the log manager, for example.

The byte buffer instance it receives might be acquired from the on-heap cache of log entries. Modifying it would be
# not thread-safe, because multiple threads may start modifying it concurrently
# illegal, because it stays in the cache for some time, and we basically corrupt it by modifying it

We shouldn't do that.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
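A minimal sketch of a non-mutating read, assuming the unmarshaller can consume any {{ByteBuffer}} view; {{logEntryCache}} and {{marshaller}} are illustrative names:

{code:java}
// Read through a duplicate so the cached buffer's position and limit are never
// mutated: duplicate() shares the backing bytes but has an independent
// position, limit and mark. Byte order is not inherited, so copy it explicitly.
ByteBuffer cached = logEntryCache.get(index); // shared instance, must stay intact
ByteBuffer readView = cached.duplicate().order(cached.order());

Command command = marshaller.unmarshall(readView); // 'cached' is untouched
{code}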
[jira] [Created] (IGNITE-22657) Investigate why ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions fails without sleep
Ivan Bessonov created IGNITE-22657:
--------------------------------------

             Summary: Investigate why ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions fails without sleep
                 Key: IGNITE-22657
                 URL: https://issues.apache.org/jira/browse/IGNITE-22657
             Project: Ignite
          Issue Type: Bug
            Reporter: Ivan Bessonov

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-21303) Exclude nodes in "error" state from manual group reconfiguration
[ https://issues.apache.org/jira/browse/IGNITE-21303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov reassigned IGNITE-21303:
--------------------------------------
    Assignee: Ivan Bessonov

> Exclude nodes in "error" state from manual group reconfiguration
> -----------------------------------------------------------------
>
>                 Key: IGNITE-21303
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21303
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> Instead of simply using the existing set of nodes as a baseline for new assignments, we should either exclude peers in ERROR state from it, or force data cleanup on such nodes. A third option is to forbid such reconfiguration, forcing the user to clear ERROR peers in advance.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
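A hedged sketch of the first option; {{PeerState}} and {{localStates}} are illustrative names, not the actual disaster-recovery types:

{code:java}
// Build the baseline for new assignments from the current peers, skipping
// peers whose local partition state is reported as ERROR.
Set<String> baseline = currentPeers.stream()
        .filter(peer -> localStates.get(peer) != PeerState.ERROR)
        .collect(Collectors.toSet());
{code}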
[jira] [Resolved] (IGNITE-22500) Remove unnecessary waits when creating an index
[ https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov resolved IGNITE-22500.
------------------------------------
    Resolution: Won't Fix

Regarding eliminating the BUILDING status from the catalog: we can't simply change a few lines, this task involves more changes. To my understanding, the following nuances are important:
* ChangeIndexStatusTask should be changed. If we remove the REGISTERED->BUILDING transition, we wouldn't have to update the catalog, which will lead to a small refactoring.
* We would have to fire a {{CatalogEvent.INDEX_BUILDING}} event instead of updating the catalog.
* This event will have nothing to do with the catalog at this point, so it should be renamed.
* It will *not* be fired in the context of meta-storage watch execution, which might be a problem if listener implementations rely on that. Spoiler: they do.
* Local recovery and other such parts will change slightly; this shouldn't be that hard.

Overall, I don't think we should do such an optimization in this issue specifically. It's not about "removing a wait that we don't need", it's about changing the internal protocol of index creation. I will file another Jira for that soon.

> Remove unnecessary waits when creating an index
> ------------------------------------------------
>
>                 Key: IGNITE-22500
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22500
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> When creating an index with current defaults (DelayDuration=1sec, MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 seconds on my machine (without concurrent transactions, on an empty table that was just created).
> According to the design, we need to first wait for the REGISTERED state to activate on all nodes, including the ones that are currently down; this is to make sure that all transactions started on schema versions before the index creation have finished before we start to build the index (this makes us wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we switch the index to the AVAILABLE state. This requires another wait of DelayDuration+MaxClockSkew.
> Because of IGNITE-20378, in the second case we actually wait longer (for an additional IdleSafeTimePropagationPeriod+MaxClockSkew).
> The total of waits is thus 1.5+3=4.5sec. But index creation actually takes 6-6.5 seconds. It looks like there are some additional delays (like submitting to the Metastorage and executing its watches).

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-22500) Remove unnecessary waits when creating an index
[ https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859937#comment-17859937 ]

Ivan Bessonov commented on IGNITE-22500:
----------------------------------------

My thoughts on the topic:
* _We have an additional switch from REGISTERED to BUILDING, which can in theory be eliminated from the catalog; it'll save us an additional second (DD is 500ms now)._
* We can't lower DD for a specific status change, because it would violate the schema synchronization protocol. After waiting for "msSafeTime - DD - skew" (I don't remember the precise rules about clock skew) we rely on the fact that the catalog is up-to-date; breaking that invariant would lead to unforeseen consequences.
* What we really need is:
** The ability to create indexes in the same DDL as the table itself. We do this implicitly for the PK. For other indexes it's only a question of API.
** For SQL scripts we could batch consecutive DDLs and create indexes at the same time as the table implicitly, which seems like the optimal choice. This way we don't need any special syntax.
** Some DDL queries can be executed in parallel, why not. Again, this seems more like a SQL issue to me.

> Remove unnecessary waits when creating an index
> ------------------------------------------------
>
>                 Key: IGNITE-22500
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22500
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> When creating an index with current defaults (DelayDuration=1sec, MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 seconds on my machine (without concurrent transactions, on an empty table that was just created).
> According to the design, we need to first wait for the REGISTERED state to activate on all nodes, including the ones that are currently down; this is to make sure that all transactions started on schema versions before the index creation have finished before we start to build the index (this makes us wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we switch the index to the AVAILABLE state. This requires another wait of DelayDuration+MaxClockSkew.
> Because of IGNITE-20378, in the second case we actually wait longer (for an additional IdleSafeTimePropagationPeriod+MaxClockSkew).
> The total of waits is thus 1.5+3=4.5sec. But index creation actually takes 6-6.5 seconds. It looks like there are some additional delays (like submitting to the Metastorage and executing its watches).

-- This message was sent by Atlassian Jira (v8.20.10#820010)
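For reference, a worked breakdown of the designed waits from the defaults quoted above (this is just the description's 1.5+3=4.5 sec arithmetic spelled out):

{code:java}
// DelayDuration (DD) = 1 s, MaxClockSkew = 0.5 s, IdleSafeTimePropagationPeriod = 1 s
//
// REGISTERED activation:  DD + MaxClockSkew                              = 1.5 s
// AVAILABLE switch:       DD + MaxClockSkew
//                         + IdleSafeTimePropagationPeriod + MaxClockSkew = 3.0 s
// Designed total:         1.5 s + 3.0 s                                  = 4.5 s
//
// Observed: 6-6.5 s, so roughly 1.5-2 s is unaccounted for (e.g. metastorage
// submission and watch execution).
{code}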
[jira] [Created] (IGNITE-22561) Get rid of ByteString in messages
Ivan Bessonov created IGNITE-22561:
--------------------------------------

             Summary: Get rid of ByteString in messages
                 Key: IGNITE-22561
                 URL: https://issues.apache.org/jira/browse/IGNITE-22561
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov

Here I would include two types of improvements (see the sketch below):
* {{@Marshallable ByteString}} - this pattern became obsolete a long time ago. The {{ByteBuffer}} type is natively supported by the protocol, and using it should eliminate unnecessary data copying, potentially making the system faster.
* Pretty much the same thing, but for {{byte[]}}. It's used in classes like {{org.apache.ignite.internal.metastorage.dsl.Operation}}. If we migrate these properties to {{ByteBuffer}}, deserialization will become significantly faster, but in order to utilize that we would have to change the internal metastorage implementation a little bit (like optimizing memory usage in {{RocksDbKeyValueStorage#addDataToBatch}}). If it requires too many changes, I propose doing it in a separate JIRA. My assumption is that it will not require too many changes, but we'll see.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
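A hedged before/after sketch of such a migration; the interface and property names are illustrative, not the actual {{Operation}} sources:

{code:java}
// Hypothetical message interface before the migration:
public interface OperationMessage extends NetworkMessage {
    @Marshallable
    ByteString key();   // extra marshalling step and a copy on both ends
}

// The same property after the migration: ByteBuffer is natively supported by
// the protocol, so deserialization can wrap the received bytes without copying.
public interface OperationMessageV2 extends NetworkMessage {
    ByteBuffer key();
}
{code}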
[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859613#comment-17859613 ]

Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 10:00 AM:
------------------------------------------------------------------

According to {{org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark}}.

Before:
{code:java}
Benchmark                                          (payloadSize)  (updateAll)   Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal               128        false  thrpt    5  2361.249 ±   66.884  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal               128         true  thrpt    5    52.377 ±    3.769  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              2048        false  thrpt    5  1713.443 ±  331.795  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              2048         true  thrpt    5    14.916 ±    2.230  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              8192        false  thrpt    5   833.372 ±  227.738  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              8192         true  thrpt    5     3.281 ±    0.906  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal             128        false  thrpt    5  2090.845 ±  792.226  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal             128         true  thrpt    5    51.393 ±   16.872  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            2048        false  thrpt    5  2188.459 ±   69.423  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            2048         true  thrpt    5    52.705 ±    2.771  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            8192        false  thrpt    5  2174.810 ±   61.331  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            8192         true  thrpt    5    53.805 ±    1.000  ops/ms
{code}

After:
{code:java}
Benchmark                                          (payloadSize)  (updateAll)   Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal               128        false  thrpt    5  4389.765 ±   66.332  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal               128         true  thrpt    5    79.684 ±    0.965  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              2048        false  thrpt    5  2754.506 ±   58.151  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              2048         true  thrpt    5    17.435 ±    0.267  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              8192        false  thrpt    5  1066.381 ±   10.254  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              8192         true  thrpt    5     3.389 ±    0.688  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal             128        false  thrpt    5  2782.648 ±  173.791  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal             128         true  thrpt    5    69.952 ±    9.109  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            2048        false  thrpt    5  2752.568 ±   50.796  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            2048         true  thrpt    5    63.721 ±    2.902  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            8192        false  thrpt    5  2676.343 ± 1209.184  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            8192         true  thrpt    5    62.139 ±   17.144  ops/ms
{code}

Short summary:
* Depending on the number of byte arrays inside the message (which can't be optimized), marshaling became 0% to 85% faster according to the created benchmark, due to a combination of different optimizations, such as:
** avoiding the creation of serializers
** a simpler and slightly faster byte buffers pool
** a better binary UUID format
** low-level stuff in the direct stream
** better {{writeVarInt}} / {{writeVarLong}}
* If we take a look at the flamegraph, we can see that serialization itself is about 1.5-2.0 times slower than the following {{Arrays.copyOf}}, which is pretty good in my opinion.
* Reading speed wasn't checked as thoroughly in this issue; I created another one: https://issues.apache.org/jira/browse/IGNITE-22559 Overall, reading speed doesn't depend on the size of individual byte buffers, because we simply wrap the original array.
* Other than that, the current optimizations show a 15%-35% increase in deserialization speed, due to:
** {{...StreamImplV1}} optimizations
** faster {{readInt}} / {{readLong}}
** a better binary UUID format
* Further optimizations for reads are required. Here I mostly focused on writing speed. Reading speed turned out to be worse than writing speed for small commands; I don't like it.
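As a reference for the {{writeVarLong}} bullet, a minimal sketch of protobuf-style unsigned varint encoding (7 payload bits per byte, continuation bit set on all but the last byte); the actual direct-stream code differs:

{code:java}
// Writes 'value' as an unsigned varint and returns the new position.
static int writeVarLong(byte[] out, int pos, long value) {
    while ((value & ~0x7FL) != 0) {
        out[pos++] = (byte) ((value & 0x7F) | 0x80); // 7 bits + continuation bit
        value >>>= 7;
    }
    out[pos++] = (byte) value; // last byte, high bit clear
    return pos;
}
{code}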
[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-22544:
-----------------------------------
    Reviewer: Philipp Shergalis

> Commands marshalling appears to be slow
> ----------------------------------------
>
>                 Key: IGNITE-22544
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22544
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3, ignite3_performance
>         Attachments: IGNITE-22544.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should benchmark the way we marshal commands using the optimized marshaller and make it faster. Some obvious places:
> * byte buffers pool - we can replace the queue with a manual implementation of a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations
> * new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of the corresponding factories
> * we can create a serialization factory per group, not per message; this way we remove unnecessary indirection. A group factory can use {{switch}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (a virtual call), which should be noticeably faster
> * the profiler might show other simple places; we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks
> EDIT: quick draft attached, it addresses points 1 and 2.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22559) Optimize raft command deserialization
[ https://issues.apache.org/jira/browse/IGNITE-22559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-22559:
-----------------------------------
Description:

# We should benchmark readInt / readLong against protobuf, since it uses the same binary format.
# We should create a much faster way of creating deserializers for messages. For example, we could generate "switch" statements like in Ignite 2, both for creating a message deserializer (compile-time generation) and for the message group deserialization factory (runtime generation, because we don't know the list of factories). See the sketch below.
# We should get rid of serializers and deserializers as separate classes and move the generated code into the message implementation. This way we save on allocations and don't create a builder, which is also expensive; we should write directly into the fields of the target object, like in Ignite 2.

> Optimize raft command deserialization
> --------------------------------------
>
>                 Key: IGNITE-22559
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22559
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> # We should benchmark readInt / readLong against protobuf, since it uses the same binary format.
> # We should create a much faster way of creating deserializers for messages. For example, we could generate "switch" statements like in Ignite 2, both for creating a message deserializer (compile-time generation) and for the message group deserialization factory (runtime generation, because we don't know the list of factories).
> # We should get rid of serializers and deserializers as separate classes and move the generated code into the message implementation. This way we save on allocations and don't create a builder, which is also expensive; we should write directly into the fields of the target object, like in Ignite 2.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
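A hedged sketch of the "switch" generation from point 2; the type ids and deserializer classes are hypothetical, the point is static dispatch instead of a lookup through per-message factories:

{code:java}
// Generated per-group factory: a switch over the message type compiles to a
// jump table, so constructing a deserializer needs no virtual calls.
static MessageDeserializer<?> createDeserializer(short messageType) {
    switch (messageType) {
        case 40: return new UpdateCommandDeserializer();     // hypothetical ids
        case 41: return new UpdateAllCommandDeserializer();  // and classes
        default: throw new IllegalArgumentException("Unknown message type: " + messageType);
    }
}
{code}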
[jira] [Created] (IGNITE-22559) Optimize raft command deserialization
Ivan Bessonov created IGNITE-22559:
--------------------------------------

             Summary: Optimize raft command deserialization
                 Key: IGNITE-22559
                 URL: https://issues.apache.org/jira/browse/IGNITE-22559
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov reassigned IGNITE-22544:
--------------------------------------
    Assignee: Ivan Bessonov

> Commands marshalling appears to be slow
> ----------------------------------------
>
>                 Key: IGNITE-22544
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22544
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>         Attachments: IGNITE-22544.patch
>
> We should benchmark the way we marshal commands using the optimized marshaller and make it faster. Some obvious places:
> * byte buffers pool - we can replace the queue with a manual implementation of a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations
> * new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of the corresponding factories
> * we can create a serialization factory per group, not per message; this way we remove unnecessary indirection. A group factory can use {{switch}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (a virtual call), which should be noticeably faster
> * the profiler might show other simple places; we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks
> EDIT: quick draft attached, it addresses points 1 and 2.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-22544:
-----------------------------------
Description:

We should benchmark the way we marshal commands using the optimized marshaller and make it faster. Some obvious places:
* byte buffers pool - we can replace the queue with a manual implementation of a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations
* new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of the corresponding factories
* we can create a serialization factory per group, not per message; this way we remove unnecessary indirection. A group factory can use {{switch}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (a virtual call), which should be noticeably faster
* the profiler might show other simple places; we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks

EDIT: quick draft attached, it addresses points 1 and 2.

> Commands marshalling appears to be slow
> ----------------------------------------
>
>                 Key: IGNITE-22544
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22544
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>         Attachments: IGNITE-22544.patch
>
> We should benchmark the way we marshal commands using the optimized marshaller and make it faster. Some obvious places:
> * byte buffers pool - we can replace the queue with a manual implementation of a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations
> * new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of the corresponding factories
> * we can create a serialization factory per group, not per message; this way we remove unnecessary indirection. A group factory can use {{switch}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (a virtual call), which should be noticeably faster
> * the profiler might show other simple places; we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks
> EDIT: quick draft attached, it addresses points 1 and 2.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-22544:
-----------------------------------
    Attachment: IGNITE-22544.patch

> Commands marshalling appears to be slow
> ----------------------------------------
>
>                 Key: IGNITE-22544
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22544
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>         Attachments: IGNITE-22544.patch
>
> We should benchmark the way we marshal commands using the optimized marshaller and make it faster. Some obvious places:
> * byte buffers pool - we can replace the queue with a manual implementation of a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations
> * new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of the corresponding factories
> * we can create a serialization factory per group, not per message; this way we remove unnecessary indirection. A group factory can use {{switch}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (a virtual call), which should be noticeably faster
> * the profiler might show other simple places; we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node
[ https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-22542:
-----------------------------------
Description:

{{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we detect that we send a message to the local node, we handle it immediately in the same thread, which could be very bad for the throughput of the system (see the sketch below).

"send"/"invoke" themselves appear to be slow as well; we should benchmark them. We should remove the instantiation of InetSocketAddress if possible, since resolving it takes time. Maybe we should create it unresolved, or just cache it like in Ignite 2.

> Synchronous message handling on local node
> -------------------------------------------
>
>                 Key: IGNITE-22542
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22542
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we detect that we send a message to the local node, we handle it immediately in the same thread, which could be very bad for the throughput of the system.
> "send"/"invoke" themselves appear to be slow as well; we should benchmark them. We should remove the instantiation of InetSocketAddress if possible, since resolving it takes time. Maybe we should create it unresolved, or just cache it like in Ignite 2.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
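A hedged sketch of a possible fix: hand loopback messages to the same inbound pool that serves remote messages instead of running handlers inline. {{inboundExecutor}}, {{handleMessage}} and {{localSender}} are illustrative names, not the actual {{DefaultMessagingService}} members:

{code:java}
// Loopback send: dispatch asynchronously instead of invoking the handler in
// the calling thread, so a slow handler cannot stall the sender.
if (isSelf(recipient)) {
    inboundExecutor.execute(() -> handleMessage(message, localSender));

    return CompletableFuture.completedFuture(null);
}
{code}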
[jira] [Created] (IGNITE-22544) Commands marshalling appears to be slow
Ivan Bessonov created IGNITE-22544:
--------------------------------------

             Summary: Commands marshalling appears to be slow
                 Key: IGNITE-22544
                 URL: https://issues.apache.org/jira/browse/IGNITE-22544
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov

We should benchmark the way we marshal commands using the optimized marshaller and make it faster. Some obvious places (a Treiber stack sketch follows below):
* byte buffers pool - we can replace the queue with a manual implementation of a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations
* new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of the corresponding factories
* we can create a serialization factory per group, not per message; this way we remove unnecessary indirection. A group factory can use {{switch}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (a virtual call), which should be noticeably faster
* the profiler might show other simple places; we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks

-- This message was sent by Atlassian Jira (v8.20.10#820010)
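A minimal self-contained sketch of the first bullet, a Treiber stack used as a buffer pool; the buffer size and allocation policy are illustrative:

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

// Lock-free LIFO pool: one CAS per push/pop, no locks and fewer volatile
// operations than a general-purpose concurrent queue.
final class ByteBufferPool {
    private static final class Node {
        final ByteBuffer buf;
        Node next;

        Node(ByteBuffer buf) {
            this.buf = buf;
        }
    }

    private final AtomicReference<Node> top = new AtomicReference<>();

    ByteBuffer acquire() {
        while (true) {
            Node head = top.get();
            if (head == null) {
                return ByteBuffer.allocate(4096); // Pool is empty: allocate a new buffer.
            }
            if (top.compareAndSet(head, head.next)) {
                return head.buf;
            }
        }
    }

    void release(ByteBuffer buf) {
        Node node = new Node(buf);
        do {
            node.next = top.get();
        } while (!top.compareAndSet(node.next, node));
    }
}
{code}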
[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node
[ https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22542: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Synchronous message handling on local node > -- > > Key: IGNITE-22542 > URL: https://issues.apache.org/jira/browse/IGNITE-22542 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we > detect that we send a message to the local node, we handle it immediately in > the same thread, which could be very bad for the throughput of the system. > "send"/"invoke" themselves appear to be slow as well, we should benchmark > them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22542) Synchronous message handling on local node
Ivan Bessonov created IGNITE-22542: -- Summary: Synchronous message handling on local node Key: IGNITE-22542 URL: https://issues.apache.org/jira/browse/IGNITE-22542 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we detect that we send a message to the local node, we handle it immediately in the same thread, which could be very bad for the throughput of the system. "send"/"invoke" themselves appear to be slow as well, we should benchmark them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22500) Remove unnecessary waits when creating an index
[ https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22500: -- Assignee: Ivan Bessonov > Remove unnecessary waits when creating an index > --- > > Key: IGNITE-22500 > URL: https://issues.apache.org/jira/browse/IGNITE-22500 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > When creating an index with current defaults (DelayDuration=1sec, > MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 > seconds on my machine (without concurrent transactions, on an empty table > that was just created). > According to the design, we need to first wait for the REGISTERED state to > activate on all nodes, including the ones that are currently down; this is to > make sure that all transactions started on schema versions before the index > creation have finished before we start to build the index (this makes us > wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we > switch the index to the AVAILABLE state. This requires another wait of > DelayDuration+MaxClockSkew. > Because of IGNITE-20378, in the second case we actually wait longer (for an > additional IdleSafeTimePropagationPeriod+MaxClockSkew). > The total wait is thus 1.5+3=4.5sec. But index creation actually takes > 6-6.5 seconds. It looks like there are some additional delays (like > submitting to the Metastorage and executing its watches). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance
[ https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21661: --- Reviewer: Kirill Tkalenko > Test scenario where all stable nodes are lost during a partially completed > rebalance > > > Key: IGNITE-21661 > URL: https://issues.apache.org/jira/browse/IGNITE-21661 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > Following case is possible: > * Nodes A, B and C for a partition > * B and C go offline > * new distribution is A, D and E > * EDIT: rebalance can only be started with one more "resetPartitions" > * full state transfer from A to D is completed > * full state transfer from A to E is not > * A goes offline > * we perform "resetPartitions" > Ideally, we should use D as a new leader somehow, but the bare minimum should > be a partition that is functional, maybe an empty one. We should test the case > > This might be a good place to add more tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22502) Change default DelayDuration to 500ms
[ https://issues.apache.org/jira/browse/IGNITE-22502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22502: -- Assignee: Ivan Bessonov > Change default DelayDuration to 500ms > - > > Key: IGNITE-22502 > URL: https://issues.apache.org/jira/browse/IGNITE-22502 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > When executing a DDL, we must wait for DelayDuration+MaxClockSkew. > DelayDuration for small clusters (which will probably be the usual mode of > operation) does not need to be long, so it makes sense to lower the default > from 1 second to 0.5 second. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22509) Deadlock during the node stop
Ivan Bessonov created IGNITE-22509: -- Summary: Deadlock during the node stop Key: IGNITE-22509 URL: https://issues.apache.org/jira/browse/IGNITE-22509 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {code:java} "%itcskvt_n_1%Raft-Group-Client-1@51623" prio=5 tid=0x4a6e nid=NA waiting for monitor entry java.lang.Thread.State: BLOCKED waiting for main@1 to release lock on <0xca23> (a org.apache.ignite.internal.app.LifecycleManager) at org.apache.ignite.internal.app.LifecycleManager.lambda$allComponentsStartFuture$1(LifecycleManager.java:130) at org.apache.ignite.internal.app.LifecycleManager$$Lambda$2852.843214322.accept(Unknown Source:-1) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605) at org.apache.ignite.internal.raft.RaftGroupServiceImpl$$Lambda$5439.1444714785.run(Unknown Source:-1) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) at java.util.concurrent.FutureTask.run(FutureTask.java:-1) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.lang.Thread.run(Thread.java:829) {code} Holds busy lock in {{{}RaftGroupServiceImpl.sendWithRetry{}}}. 
{code:java} "main@1" prio=5 tid=0x1 nid=NA sleeping java.lang.Thread.State: TIMED_WAITING blocks %itcskvt_n_1%Raft-Group-Client-1@51623 at java.lang.Thread.sleep(Thread.java:-1) at org.apache.ignite.internal.util.IgniteSpinReadWriteLock.writeLock(IgniteSpinReadWriteLock.java:255) at org.apache.ignite.internal.util.IgniteSpinBusyLock.block(IgniteSpinBusyLock.java:68) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.shutdown(RaftGroupServiceImpl.java:491) at org.apache.ignite.internal.metastorage.impl.MetaStorageServiceContext.close(MetaStorageServiceContext.java:75) at org.apache.ignite.internal.metastorage.impl.MetaStorageServiceImpl.close(MetaStorageServiceImpl.java:272) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5148.891107.accept(Unknown Source:-1) at org.apache.ignite.internal.util.IgniteUtils.cancelOrConsume(IgniteUtils.java:967) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.lambda$stopAsync$13(MetaStorageManagerImpl.java:452) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5141.633101377.close(Unknown Source:-1) at org.apache.ignite.internal.util.IgniteUtils.lambda$closeAllManually$1(IgniteUtils.java:611) at org.apache.ignite.internal.util.IgniteUtils$$Lambda$4822.1427077270.accept(Unknown Source:-1) at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) at org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:609) at org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:643) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.stopAsync(MetaStorageManagerImpl.java:449) at org.apache.ignite.internal.util.IgniteUtils.lambda$stopAsync$6(IgniteUtils.java:1213) at org.apache.ignite.internal.util.IgniteUtils$$Lambda$5013.753691797.apply(Unknown Source:-1) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) at
[jira] [Updated] (IGNITE-22443) Sporadic fails of ConfigurationTreeGeneratorTest
[ https://issues.apache.org/jira/browse/IGNITE-22443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22443: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Sporadic fails of ConfigurationTreeGeneratorTest > > > Key: IGNITE-22443 > URL: https://issues.apache.org/jira/browse/IGNITE-22443 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 40m > Remaining Estimate: 0h > > Configuration changer start doesn't wait for the internal defaults update future; > as a result, we have rare data races in certain test methods -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22443) Sporadic fails of ConfigurationTreeGeneratorTest
Ivan Bessonov created IGNITE-22443: -- Summary: Sporadic fails of ConfigurationTreeGeneratorTest Key: IGNITE-22443 URL: https://issues.apache.org/jira/browse/IGNITE-22443 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Configuration changer start doesn't wait for the internal defaults update future; as a result, we have rare data races in certain test methods -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands
[ https://issues.apache.org/jira/browse/IGNITE-22386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22386: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Many usages of wrong revision serialization in metastorage commands > --- > > Key: IGNITE-22386 > URL: https://issues.apache.org/jira/browse/IGNITE-22386 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > {code:java} > byte[] revisionBytes = ByteUtils.longToBytes(revision); > Iif iif = iif( > notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)), > {code} > Code above has a bug - "longToBytes" is not a suitable serialization format > for preserving natural comparison order used in "lt". We must fix it, because > it leads to occasional false-positive and false-negative condition evaluation > It also leads to flaky tests, obviously -- This message was sent by Atlassian Jira (v8.20.10#820010)
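To see why a plain long-to-bytes dump can break a lexicographic {{lt}}: unsigned byte-wise comparison matches signed long order only for big-endian bytes with the sign bit flipped first. A sketch of an order-preserving encoding (what ByteUtils.longToBytes actually emits, and whether revisions can be negative, are assumptions here):
{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Sketch: big-endian bytes with the sign bit flipped compare (unsigned,
 *  lexicographically) in the same order as the original signed longs. */
final class OrderPreservingLong {
    static byte[] encode(long v) {
        // XOR with Long.MIN_VALUE flips the sign bit, mapping the signed range
        // onto an unsigned one; big-endian keeps the most significant byte first.
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    public static void main(String[] args) {
        assert Arrays.compareUnsigned(encode(4L), encode(5L)) < 0;
        assert Arrays.compareUnsigned(encode(-1L), encode(5L)) < 0;   // signed order preserved
        assert Arrays.compareUnsigned(encode(255L), encode(256L)) < 0; // a little-endian dump would get this wrong
    }
}
{code}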
[jira] [Assigned] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands
[ https://issues.apache.org/jira/browse/IGNITE-22386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22386: -- Assignee: Ivan Bessonov > Many usages of wrong revision serialization in metastorage commands > --- > > Key: IGNITE-22386 > URL: https://issues.apache.org/jira/browse/IGNITE-22386 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > > {code:java} > byte[] revisionBytes = ByteUtils.longToBytes(revision); > Iif iif = iif( > notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)), > {code} > Code above has a bug - "longToBytes" is not a suitable serialization format > for preserving natural comparison order used in "lt". We must fix it, because > it leads to occasional false-positive and false-negative condition evaluation > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands
[ https://issues.apache.org/jira/browse/IGNITE-22386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22386: --- Description: {code:java} byte[] revisionBytes = ByteUtils.longToBytes(revision); Iif iif = iif( notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)), {code} Code above has a bug - "longToBytes" is not a suitable serialization format for preserving natural comparison order used in "lt". We must fix it, because it leads to occasional false-positive and false-negative condition evaluation It also leads to flaky tests, obviously was: {code:java} byte[] revisionBytes = ByteUtils.longToBytes(revision); Iif iif = iif( notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)), {code} Code above has a bug - "longToBytes" is not a suitable serialization format for preserving natural comparison order used in "lt". We must fix it, because it leads to occasional false-positive and false-negative condition evaluation > Many usages of wrong revision serialization in metastorage commands > --- > > Key: IGNITE-22386 > URL: https://issues.apache.org/jira/browse/IGNITE-22386 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > {code:java} > byte[] revisionBytes = ByteUtils.longToBytes(revision); > Iif iif = iif( > notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)), > {code} > Code above has a bug - "longToBytes" is not a suitable serialization format > for preserving natural comparison order used in "lt". We must fix it, because > it leads to occasional false-positive and false-negative condition evaluation > It also leads to flaky tests, obviously -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands
Ivan Bessonov created IGNITE-22386: -- Summary: Many usages of wrong revision serialization in metastorage commands Key: IGNITE-22386 URL: https://issues.apache.org/jira/browse/IGNITE-22386 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {code:java} byte[] revisionBytes = ByteUtils.longToBytes(revision); Iif iif = iif( notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)), {code} Code above has a bug - "longToBytes" is not a suitable serialization format for preserving natural comparison order used in "lt". We must fix it, because it leads to occasional false-positive and false-negative condition evaluation -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance
[ https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21661: --- Description: Following case is possible: * Nodes A, B and C for a partition * B and C go offline * new distribution is A, D and E * EDIT: rebalance can only be started with one more "resetPartitions" * full state transfer from A to D is completed * full state transfer from A to E is not * A goes offline * we perform "resetPartitions" Ideally, we should use D as a new leader somehow, but the bare minimum should be a partition that is functional, maybe an empty one. We should test the case This might be a good place to add more tests. was: Following case is possible: * Nodes A, B and C for a partition * B and C go offline * new distribution is A, D and E * full state transfer from A to D is completed * full state transfer from A to E is not * A goes offline * we perform "resetPartitions" Ideally, we should use D as a new leader somehow, but the bare minimum should be a partition that is functional, maybe an empty one. We should test the case This might be a good place to add more tests. > Test scenario where all stable nodes are lost during a partially completed > rebalance > > > Key: IGNITE-21661 > URL: https://issues.apache.org/jira/browse/IGNITE-21661 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Following case is possible: > * Nodes A, B and C for a partition > * B and C go offline > * new distribution is A, D and E > * EDIT: rebalance can only be started with one more "resetPartitions" > * full state transfer from A to D is completed > * full state transfer from A to E is not > * A goes offline > * we perform "resetPartitions" > Ideally, we should use D as a new leader somehow, but the bare minimum should > be a partition that is functional, maybe an empty one. We should test the case > > This might be a good place to add more tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance
[ https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-21661: -- Assignee: Ivan Bessonov > Test scenario where all stable nodes are lost during a partially completed > rebalance > > > Key: IGNITE-21661 > URL: https://issues.apache.org/jira/browse/IGNITE-21661 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Following case is possible: > * Nodes A, B and C for a partition > * B and C go offline > * new distribution is A, D and E > * full state transfer from A to D is completed > * full state transfer from A to E is not > * A goes offline > * we perform "resetPartitions" > Ideally, we should use D as a new leader somehow, but the bare minimum should > be a partition that is functional, maybe an empty one. We should test the case > > This might be a good place to add more tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance
[ https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21661: --- Description: Following case is possible: * Nodes A, B and C for a partition * B and C go offline * new distribution is A, D and E * full state transfer from A to D is completed * full state transfer from A to E is not * A goes offline * we perform "resetPartitions" Ideally, we should use D as a new leader somehow, but the bare minimum should be a partition that is functional, maybe an empty one. We should test the case This might be a good place to add more tests. was: Following case is possible: * Nodes A, B and C for a partition * B and C go offline * new distribution is A, D and E * full state transfer from A to D is completed * full state transfer from A to E is not * A goes offline * we perform "resetPartitions" Ideally, we should use D as a new leader somehow, but the bare minimum should be a partition that is functional, maybe an empty one. We should test the case > Test scenario where all stable nodes are lost during a partially completed > rebalance > > > Key: IGNITE-21661 > URL: https://issues.apache.org/jira/browse/IGNITE-21661 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Following case is possible: > * Nodes A, B and C for a partition > * B and C go offline > * new distribution is A, D and E > * full state transfer from A to D is completed > * full state transfer from A to E is not > * A goes offline > * we perform "resetPartitions" > Ideally, we should use D as a new leader somehow, but the bare minimum should > be a partition that is functional, maybe an empty one. We should test the case > > This might be a good place to add more tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22107) Properly encapsulate partition meta
[ https://issues.apache.org/jira/browse/IGNITE-22107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22107: --- Description: {{PartitionMeta}} and {{PartitionMetaIo}} leak specific implementation details, specifically - all fields except for {{{}pageCount{}}}. This breaks encapsulation and makes {{page-memory}} module code non-reusable. I propose splitting meta into 2 parts - abstract meta, that would only hold page count, and specific meta that will be located in a different module, close to the implementation. In this case, we would have to pass meta IO as parameters into methods like {{{}PartitionMetaManager#readOrCreateMeta{}}}, and create a getter for IO in {{AbstractPartitionMeta}} class itself, but that's a necessary sacrifice. Some other places will be affected as well, mostly tests. was: `PartitionMeta` and `PartitionMetaIo` leak specific implementation details, specifically - all fields except for `pageCount`. This breaks encapsulation and makes `page-memory` module code non-reusable. I propose splitting meta into 2 parts - abstract meta, that would only hold page count, and specific meta that will be located in a different module, close to the implementation. In this case, we would have to pass meta IO as parameters into methods like `PartitionMetaManager#readOrCreateMeta`, and create a getter for IO in `AbstractPartitionMeta` class itself, but that's a necessary sacrifice. > Properly encapsulate partition meta > --- > > Key: IGNITE-22107 > URL: https://issues.apache.org/jira/browse/IGNITE-22107 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > {{PartitionMeta}} and {{PartitionMetaIo}} leak specific implementation > details, specifically - all fields except for {{{}pageCount{}}}. This breaks > encapsulation and makes {{page-memory}} module code non-reusable. > I propose splitting meta into 2 parts - abstract meta, that would only hold > page count, and specific meta that will be located in a different module, > close to the implementation. > In this case, we would have to pass meta IO as parameters into methods like > {{{}PartitionMetaManager#readOrCreateMeta{}}}, and create a getter for IO in > {{AbstractPartitionMeta}} class itself, but that's a necessary sacrifice. > Some other places will be affected as well, mostly tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
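A rough sketch of the proposed split, with hypothetical names (the real PartitionMeta/PartitionMetaIo classes carry more state and live in Ignite's page-memory modules):
{code:java}
/** Base meta in the page-memory module: knows only the page count. */
abstract class AbstractPartitionMeta {
    private int pageCount; // concurrency control elided in this sketch

    int pageCount() {
        return pageCount;
    }

    void incrementPageCount() {
        pageCount++;
    }

    /** Getter for the IO, so generic code like readOrCreateMeta can (de)serialize
     *  the concrete meta without knowing its fields. */
    abstract PartitionMetaIoSketch metaIo();
}

/** Placeholder for the IO abstraction referenced above. */
interface PartitionMetaIoSketch { }

/** Engine-specific meta, located next to the storage engine implementation. */
class MvPartitionMetaSketch extends AbstractPartitionMeta {
    long lastAppliedIndex; // example of a field that no longer leaks into page-memory

    @Override
    PartitionMetaIoSketch metaIo() {
        return new PartitionMetaIoSketch() { };
    }
}
{code}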
[jira] [Created] (IGNITE-22107) Properly encapsulate partition meta
Ivan Bessonov created IGNITE-22107: -- Summary: Properly encapsulate partition meta Key: IGNITE-22107 URL: https://issues.apache.org/jira/browse/IGNITE-22107 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Fix For: 3.0.0-beta2 `PartitionMeta` and `PartitionMetaIo` leak specific implementation details, specifically - all fields except for `pageCount`. This breaks encapsulation and makes `page-memory` module code non-reusable. I propose splitting meta into 2 parts - abstract meta, that would only hold page count, and specific meta that will be located in a different module, close to the implementation. In this case, we would have to pass meta IO as parameters into methods like `PartitionMetaManager#readOrCreateMeta`, and create a getter for IO in `AbstractPartitionMeta` class itself, but that's a necessary sacrifice. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IGNITE-21434) Fail user write requests for non-available partitions
[ https://issues.apache.org/jira/browse/IGNITE-21434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-21434. Resolution: Won't Fix This insert doesn't hang indefinitely anymore; it now fails while awaiting the primary replica. I'm closing the issue as "Won't Fix" > Fail user write requests for non-available partitions > - > > Key: IGNITE-21434 > URL: https://issues.apache.org/jira/browse/IGNITE-21434 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Currently, {{INSERT INTO test VALUES(%d, %d);}} just hangs indefinitely, > which is not what you would expect. We should either fail the request > immediately if there's no majority, or return a replication timeout > exception, for example. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22075) GC doesn't wait for RO transactions
Ivan Bessonov created IGNITE-22075: -- Summary: GC doesn't wait for RO transactions Key: IGNITE-22075 URL: https://issues.apache.org/jira/browse/IGNITE-22075 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Fix For: 3.0.0-beta2 In https://issues.apache.org/jira/browse/IGNITE-21773 we started handling the LWM update concurrently in both the TX manager and GC, which means that GC might start collecting garbage before transactions have finished. This doesn't even depend on the listener order, because both operations are asynchronous. We must fix it -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22041) Secondary indexes inline size calculation is wrong
[ https://issues.apache.org/jira/browse/IGNITE-22041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22041: --- Description: * "short" size is used as 16 bytes instead of 2 bytes * decimal header is not included in estimation > Secondary indexes inline size calculation is wrong > -- > > Key: IGNITE-22041 > URL: https://issues.apache.org/jira/browse/IGNITE-22041 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > * "short" size is used as 16 bytes instead of 2 bytes > * decimal header is not included in estimation -- This message was sent by Atlassian Jira (v8.20.10#820010)
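A sketch of the two corrections, with hypothetical type names and header sizes (the actual constants in Ignite's inline-size estimator may differ):
{code:java}
/** Illustrative inline-size estimation with both fixes applied. */
final class InlineSizeSketch {
    enum ColumnType { INT16, DECIMAL }

    /** Assumed sizes, for illustration only. */
    static final int DECIMAL_HEADER_SIZE = 2; // e.g. a scale/length prefix
    static final int MAX_UNSCALED_BYTES = 16;

    static int inlineSize(ColumnType type) {
        switch (type) {
            case INT16:
                return Short.BYTES; // 2 bytes; the bug used 16 here
            case DECIMAL:
                // Count the value header in the estimate, not just the unscaled digits.
                return DECIMAL_HEADER_SIZE + MAX_UNSCALED_BYTES;
            default:
                throw new IllegalArgumentException("Unhandled type: " + type);
        }
    }
}
{code}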
[jira] [Created] (IGNITE-22063) aimem partition deletion doesn't delete GC queue
Ivan Bessonov created IGNITE-22063: -- Summary: aimem partition deletion doesn't delete GC queue Key: IGNITE-22063 URL: https://issues.apache.org/jira/browse/IGNITE-22063 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {{org.apache.ignite.internal.storage.pagememory.mv.VolatilePageMemoryMvPartitionStorage#destroyStructures}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22050) Data structures don't clear partId of reused page
[ https://issues.apache.org/jira/browse/IGNITE-22050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22050: --- Description: In the current implementation we use a single reuse list for all partitions in the aimem storage engine. That works fine in Ignite 2, but in Ignite 3 we implemented a "partitionless link" format that eliminates the 2 bytes indicating the partition number from the data in pages. This means that if the allocator provided the structure with a page from partition X, but the structure itself represents partition Y, we will lose the "X" in the process and will next try to access the page by a pageId that has Y encoded in it. This leads to a pageId mismatch. We have several options here. * ignore mismatched partitions * get rid of partitionless pageIds * fix the allocator, so that it changes the partition id upon allocation Ideally, we should go with the 3rd option. It requires some slight changes in the internal data structure API, so that we pass the required partitionId directly into the allocator (reuse list). This is a little excessive at first sight, but seems more appropriate in the long run. Ignite 2 pageIds are all messed up inside structures; we can fix that. > Data structures don't clear partId of reused page > - > > Key: IGNITE-22050 > URL: https://issues.apache.org/jira/browse/IGNITE-22050 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > In the current implementation we use a single reuse list for all partitions in > the aimem storage engine. > That works fine in Ignite 2, but in Ignite 3 we implemented a > "partitionless link" format that eliminates the 2 bytes indicating the partition > number from the data in pages. This means that if the allocator provided the > structure with a page from partition X, but the structure itself represents > partition Y, we will lose the "X" in the process and will next try to > access the page by a pageId that has Y encoded in it. This leads to a > pageId mismatch. > We have several options here. > * ignore mismatched partitions > * get rid of partitionless pageIds > * fix the allocator, so that it changes the partition id upon allocation > Ideally, we should go with the 3rd option. It requires some slight changes in > the internal data structure API, so that we pass the required partitionId > directly into the allocator (reuse list). This is a little excessive at first > sight, but seems more appropriate in the long run. Ignite 2 pageIds are > all messed up inside structures; we can fix that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
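A sketch of the third option: when the reuse list hands out a page allocated for another partition, rewrite the partition bits of its pageId before returning it. The bit layout below is an assumption for illustration; the real page id layout in Ignite differs in detail:
{code:java}
/** Sketch: rewrite the partition id inside a page id on reuse.
 *  Assumed layout: [8b flags][16b partId][40b page index] packed into a long. */
final class PageIdSketch {
    private static final int PART_ID_SHIFT = 40;
    private static final long PART_ID_MASK = 0xFFFFL << PART_ID_SHIFT;

    static long changePartitionId(long pageId, int newPartId) {
        return (pageId & ~PART_ID_MASK) | ((long) (newPartId & 0xFFFF) << PART_ID_SHIFT);
    }

    public static void main(String[] args) {
        long orig = 0x0A_1234_00000042L;
        long reused = changePartitionId(orig, 7);
        assert ((reused & PART_ID_MASK) >>> PART_ID_SHIFT) == 7;       // partition bits rewritten
        assert (reused & ~PART_ID_MASK) == (orig & ~PART_ID_MASK);     // flags and page index survive
    }
}
{code}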
[jira] [Resolved] (IGNITE-22055) Shut destruction executor down before closing volatile regions
[ https://issues.apache.org/jira/browse/IGNITE-22055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-22055. Reviewer: Ivan Bessonov Resolution: Fixed > Shut destruction executor down before closing volatile regions > -- > > Key: IGNITE-22055 > URL: https://issues.apache.org/jira/browse/IGNITE-22055 > Project: Ignite > Issue Type: Bug >Reporter: Roman Puchkovskiy >Assignee: Roman Puchkovskiy >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22058) Use paranoid leak detection in tests
Ivan Bessonov created IGNITE-22058: -- Summary: Use paranoid leak detection in tests Key: IGNITE-22058 URL: https://issues.apache.org/jira/browse/IGNITE-22058 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Fix For: 3.0.0-beta2 We should set `io.netty.leakDetection.level=paranoid` in integration tests and network tests, in order to detect possible leaks -- This message was sent by Atlassian Jira (v8.20.10#820010)
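For reference, the switch can be applied either programmatically before the first Netty buffer is allocated or as a JVM argument for the test tasks; the property name and value come from Netty's ResourceLeakDetector, only the placement is a suggestion:
{code:java}
// Programmatically, before any Netty buffers are allocated:
System.setProperty("io.netty.leakDetection.level", "paranoid");

// Or as a JVM argument for the test run:
// -Dio.netty.leakDetection.level=paranoid
{code}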
[jira] [Created] (IGNITE-22050) Data structures don't clear partId of reused page
Ivan Bessonov created IGNITE-22050: -- Summary: Data structures don't clear partId of reused page Key: IGNITE-22050 URL: https://issues.apache.org/jira/browse/IGNITE-22050 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 3.0.0-beta2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22041) Secondary indexes inline size calculation is wrong
Ivan Bessonov created IGNITE-22041: -- Summary: Secondary indexes inline size calculation is wrong Key: IGNITE-22041 URL: https://issues.apache.org/jira/browse/IGNITE-22041 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Assignee: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-21999) Merge partition free-lists into one
[ https://issues.apache.org/jira/browse/IGNITE-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-21999: -- Assignee: Philipp Shergalis (was: Ivan Bessonov) > Merge partition free-lists into one > --- > > Key: IGNITE-21999 > URL: https://issues.apache.org/jira/browse/IGNITE-21999 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Philipp Shergalis >Priority: Major > Labels: ignite-3 > > Current implementation has 2 free-lists: > * version chains > * index tuples > These lists have separate buckets for different types of data pages. There's > an issue with this approach: > * overhead on pages - we have to allocate more pages to store buckets > * overhead on checkpoints - we have to save twice as many free-lists on > every checkpoint > The reason, to my understanding, is the fact that FreeList class is > parameterized with the specific type of data that it stores. It makes no > sense to me, to be completely honest, because the algorithm is always the > same, and we always use the code from abstract free-list implementation. > What I propose: > * get rid of abstract implementation and only have the concrete > implementation of free lists > * same for data pages > * serialization code will be fully moved to implementations of Storeable > We're losing some guarantees if we do this change - we can no longer check > that type of the page is correct. My response to this issue is that every > Storeable could add a 1-byte header to the data, in order to validate it when > being read, that should be enough. If we could find a way to store less than > 1 byte then that's nice, I didn't look too much into the question. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-21999) Merge partition free-lists into one
[ https://issues.apache.org/jira/browse/IGNITE-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-21999: -- Assignee: Ivan Bessonov > Merge partition free-lists into one > --- > > Key: IGNITE-21999 > URL: https://issues.apache.org/jira/browse/IGNITE-21999 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Current implementation has 2 free-lists: > * version chains > * index tuples > These lists have separate buckets for different types of data pages. There's > an issue with this approach: > * overhead on pages - we have to allocate more pages to store buckets > * overhead on checkpoints - we have to save twice as many free-lists on > every checkpoint > The reason, to my understanding, is the fact that FreeList class is > parameterized with the specific type of data that it stores. It makes no > sense to me, to be completely honest, because the algorithm is always the > same, and we always use the code from abstract free-list implementation. > What I propose: > * get rid of abstract implementation and only have the concrete > implementation of free lists > * same for data pages > * serialization code will be fully moved to implementations of Storeable > We're losing some guarantees if we do this change - we can no longer check > that type of the page is correct. My response to this issue is that every > Storeable could add a 1-byte header to the data, in order to validate it when > being read, that should be enough. If we could find a way to store less than > 1 byte then that's nice, I didn't look too much into the question. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21999) Merge partition free-lists into one
Ivan Bessonov created IGNITE-21999: -- Summary: Merge partition free-lists into one Key: IGNITE-21999 URL: https://issues.apache.org/jira/browse/IGNITE-21999 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Current implementation has 2 free-lists: * version chains * index tuples These lists have separate buckets for different types of data pages. There's an issue with this approach: * overhead on pages - we have to allocate more pages to store buckets * overhead on checkpoints - we have to save twice as many free-lists on every checkpoint The reason, to my understanding, is the fact that FreeList class is parameterized with the specific type of data that it stores. It makes no sense to me, to be completely honest, because the algorithm is always the same, and we always use the code from abstract free-list implementation. What I propose: * get rid of abstract implementation and only have the concrete implementation of free lists * same for data pages * serialization code will be fully moved to implementations of Storeable We're losing some guarantees if we do this change - we can no longer check that type of the page is correct. My response to this issue is that every Storeable could add a 1-byte header to the data, in order to validate it when being read, that should be enough. If we could find a way to store less than 1 byte then that's nice, I didn't look too much into the question. -- This message was sent by Atlassian Jira (v8.20.10#820010)
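A sketch of the 1-byte validation header from the last paragraph, with a simplified Storeable stand-in (the real interface works with pages and offsets, not a bare ByteBuffer):
{code:java}
import java.nio.ByteBuffer;

/** Simplified sketch of a Storeable that prefixes its payload with a type tag. */
interface StoreableSketch {
    /** 1-byte discriminator written before the payload. */
    byte typeTag();

    void writePayload(ByteBuffer buf);

    /** Writes tag + payload; the tag is checked on read to catch page-type mix-ups. */
    default void store(ByteBuffer buf) {
        buf.put(typeTag());
        writePayload(buf);
    }

    static void checkTag(ByteBuffer buf, byte expected) {
        byte actual = buf.get();
        if (actual != expected)
            throw new IllegalStateException("Corrupted row: tag " + actual + " != " + expected);
    }
}
{code}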
[jira] [Assigned] (IGNITE-21257) Public Java API to get global partition states
[ https://issues.apache.org/jira/browse/IGNITE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-21257: -- Assignee: Ivan Bessonov > Public Java API to get global partition states > -- > > Key: IGNITE-21257 > URL: https://issues.apache.org/jira/browse/IGNITE-21257 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the > list. > We should use local partition states, implemented in IGNITE-21256, and > combine them in cluster-wide compute call, before returning to the user. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes
[ https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21987: --- Description: This issue applies to aimem/aipersist primarily. Optimization for rocksdb might be done separately. * add new method to SortedIndexStorage, like "readOnlyScan", that returns a simple cursor * in the implementation we should use alternative cursor implementation for RO scans - it should delegate calls to B+Tree cursor * reuse existing tests where possible * call new method where necessary (PartitionReplicaListener#scanSortedIndex) IMPORTANT: we should throw an exception if somebody scans an index and IndexStorage#getNextRowIdToBuild is not null. It should be a new error, like "IndexNotBuiltException" was: This issue applies to aimem/aipersist primarily. Optimization for rocksdb might be done separately. * add new method to SortedIndexStorage, like "readOnlyScan", that returns a simple cursor * in the implementation we should use alternative cursor implementation for RO scans - it should delegate calls to B+Tree cursor * reuse existing tests where possible * call new method where necessary (PartitionReplicaListener#scanSortedIndex) IMPORTANT: we should throw an exception if somebody scans an index and IndexStorage#getNextRowIdToBuild is not null. > Optimize RO scan in sorted indexes > -- > > Key: IGNITE-21987 > URL: https://issues.apache.org/jira/browse/IGNITE-21987 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > This issue applies to aimem/aipersist primarily. Optimization for rocksdb > might be done separately. > * add new method to SortedIndexStorage, like "readOnlyScan", that returns a > simple cursor > * in the implementation we should use alternative cursor implementation for > RO scans - it should delegate calls to B+Tree cursor > * reuse existing tests where possible > * call new method where necessary (PartitionReplicaListener#scanSortedIndex) > IMPORTANT: we should throw an exception if somebody scans an index and > IndexStorage#getNextRowIdToBuild is not null. It should be a new error, like > "IndexNotBuiltException" -- This message was sent by Atlassian Jira (v8.20.10#820010)
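A sketch of the proposed storage addition; the placeholder types below stand in for the real Ignite classes, and IndexNotBuiltException is the proposed new error, not an existing one:
{code:java}
import java.util.Iterator;

/** Placeholders for the real Ignite types. */
interface Cursor<T> extends Iterator<T>, AutoCloseable { }
interface IndexRowSketch { }
interface BinaryTuplePrefixSketch { }
interface RowIdSketch { }

interface SortedIndexStorageSketch {
    /** Plain forward cursor that delegates straight to the B+Tree cursor,
     *  without the peek/refresh machinery that RW scans need. */
    Cursor<IndexRowSketch> readOnlyScan(BinaryTuplePrefixSketch lower, BinaryTuplePrefixSketch upper, int flags);

    /** Non-null while the index is still being built. */
    RowIdSketch getNextRowIdToBuild();
}

/* Caller side, e.g. in PartitionReplicaListener#scanSortedIndex:
 *
 *   if (storage.getNextRowIdToBuild() != null)
 *       throw new IndexNotBuiltException(...); // proposed new error
 *   return storage.readOnlyScan(lower, upper, flags);
 */
{code}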
[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes
[ https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21987: --- Description: This issue applies to aimem/aipersist primarily. Optimization for rocksdb might be done separately. * add new method to SortedIndexStorage, like "readOnlyScan", that returns a simple cursor * in the implementation we should use alternative cursor implementation for RO scans - it should delegate calls to B+Tree cursor * reuse existing tests where possible * call new method where necessary (PartitionReplicaListener#scanSortedIndex) IMPORTANT: we should throw an exception if somebody scans an index and IndexStorage#getNextRowIdToBuild is not null. was: This issue applies to aimem/aipersist primarily. Optimization for rocksdb might be done separately. * add new method to SortedIndexStorage, like "readOnlyScan", that returns a simple cursor * in the implementation we should use alternative cursor implementation for RO scans - it should delegate calls to B+Tree cursor * reuse existing tests where possible * call new method where necessary (PartitionReplicaListener#scanSortedIndex) > Optimize RO scan in sorted indexes > -- > > Key: IGNITE-21987 > URL: https://issues.apache.org/jira/browse/IGNITE-21987 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > This issue applies to aimem/aipersist primarily. Optimization for rocksdb > might be done separately. > * add new method to SortedIndexStorage, like "readOnlyScan", that returns a > simple cursor > * in the implementation we should use alternative cursor implementation for > RO scans - it should delegate calls to B+Tree cursor > * reuse existing tests where possible > * call new method where necessary (PartitionReplicaListener#scanSortedIndex) > IMPORTANT: we should throw an exception if somebody scans an index and > IndexStorage#getNextRowIdToBuild is not null. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes
[ https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21987: --- Description: This issue applies to aimem/aipersist primarily. Optimization for rocksdb might be done separately. * add new method to SortedIndexStorage, like "readOnlyScan", that returns a simple cursor * in the implementation we should use alternative cursor implementation for RO scans - it should delegate calls to B+Tree cursor * reuse existing tests where possible * call new method where necessary (PartitionReplicaListener#scanSortedIndex) was: This issue applies to aimem/aipersist primarily. Optimization for rocksdb might be done separately. * add new flag RO_SCAN to SortedIndexStorage * in the implementation we should use alternative cursor implementation for RO scans - it should delegate calls to B+Tree cursor, and "peek" should throw an "UnsupportedOperationException" * for "rocksdb" it shouldn't refresh the iterator all the time. "peek" should also throw exceptions * reuse existing tests * pass new RO_SCAN flag into a method where it's necessary > Optimize RO scan in sorted indexes > -- > > Key: IGNITE-21987 > URL: https://issues.apache.org/jira/browse/IGNITE-21987 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > This issue applies to aimem/aipersist primarily. Optimization for rocksdb > might be done separately. > * add new method to SortedIndexStorage, like "readOnlyScan", that returns a > simple cursor > * in the implementation we should use alternative cursor implementation for > RO scans - it should delegate calls to B+Tree cursor > * reuse existing tests where possible > * call new method where necessary (PartitionReplicaListener#scanSortedIndex) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21987) Optimize RO scan in sorted indexes
Ivan Bessonov created IGNITE-21987: -- Summary: Optimize RO scan in sorted indexes Key: IGNITE-21987 URL: https://issues.apache.org/jira/browse/IGNITE-21987 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov This issue applies to aimem/aipersist primarily. Optimization for rocksdb might be done separately. * add new flag RO_SCAN to SortedIndexStorage * in the implementation we should use alternative cursor implementation for RO scans - it should delegate calls to B+Tree cursor, and "peek" should throw an "UnsupportedOperationException" * for "rocksdb" it shouldn't refresh the iterator all the time. "peek" should also throw exceptions * reuse existing tests * pass new RO_SCAN flag into a method where it's necessary -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21906) Consider disabling inline in PK index by default
Ivan Bessonov created IGNITE-21906: -- Summary: Consider disabling inline in PK index by default Key: IGNITE-21906 URL: https://issues.apache.org/jira/browse/IGNITE-21906 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov In aipersist/aimem we attempt to inline binary tuples into pages for hash indexes by default. This, in theory, saves us from the necessity of accessing binary tuples from data pages for comparison, which is slower than comparing inlined data. But, assuming a good hash distribution, we would only have to do the real comparison for the matched tuple. At the same time, inlined data might be substantially larger than hash+link, meaning that a B+Tree with inlined data has a bigger height, which correlates with slower search speed. So, we have both pros and cons for inlining, and the only real way to reconcile them is to compare them with some benchmarks. This is exactly what I propose. TL;DR: force inline size to be 0 for hash indices and benchmark put/get operations, with a large enough amount of data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21902) Add an option to configure log storage path
Ivan Bessonov created IGNITE-21902: -- Summary: Add an option to configure log storage path Key: IGNITE-21902 URL: https://issues.apache.org/jira/browse/IGNITE-21902 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Fix For: 3.0.0-beta2 The option to store the log and data on separate devices can substantially improve performance in the long run for many users; we should implement it. There is such an option in Ignite 2, and people use it all the time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IGNITE-21898) Remove reactive methods from AntiHijackingIgniteSql
[ https://issues.apache.org/jira/browse/IGNITE-21898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-21898. Reviewer: Ivan Bessonov Resolution: Fixed > Remove reactive methods from AntiHijackingIgniteSql > --- > > Key: IGNITE-21898 > URL: https://issues.apache.org/jira/browse/IGNITE-21898 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Roman Puchkovskiy >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > They were removed from IgniteSql interface. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance
Ivan Bessonov created IGNITE-21661: -- Summary: Test scenario where all stable nodes are lost during a partially completed rebalance Key: IGNITE-21661 URL: https://issues.apache.org/jira/browse/IGNITE-21661 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Following case is possible: * Nodes A, B and C for a partition * B and C go offline * new distribution is A, D and E * full state transfer from A to D is completed * full state transfer from A to E is not * A goes offline * we perform "resetPartitions" Ideally, we should use D as a new leader somehow, but the bare minimum should be a partition that is functional, maybe an empty one. We should test the case -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21284) Internal API for manual raft group configuration update
[ https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21284: --- Description: We need an API (with implementation) that's analogous to "reset-lost-partitions", but with the ability to reuse a living minority of nodes. This API should gather the states of partitions, identify healthy peers, and use them as a new raft group configuration (through the update of assignments). We have to make sure that the node with the latest log index becomes the leader, so we will have to propagate the desired minimum log index in assignments and use it during voting. h2. What's implemented The "resetPartitions" operation in the distributed zone manager. It identifies partitions where only a minority of nodes is online (thus they won't be able to execute "changePeersAsync"), and writes "forced pending assignments" for them. A forced assignment excludes stable nodes that are not present in the pending assignment from the new raft group configuration. It also performs a "resetPeers" operation on the alive nodes from the stable assignment. Handling the complete loss of all nodes from the stable assignments is not yet implemented; at least one node is required to be elected as a leader. was: We need an API (with implementation) that's analogous to "reset-lost-partitions", but with the ability to reuse a living minority of nodes. This API should gather the states of partitions, identify healthy peers, and use them as a new raft group configuration (through the update of assignments). We have to make sure that the node with the latest log index becomes the leader, so we will have to propagate the desired minimum log index in assignments and use it during voting. > Internal API for manual raft group configuration update > --- > > Key: IGNITE-21284 > URL: https://issues.apache.org/jira/browse/IGNITE-21284 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > We need an API (with implementation) that's analogous to > "reset-lost-partitions", but with the ability to reuse a living minority of > nodes. > This API should gather the states of partitions, identify healthy peers, and > use them as a new raft group configuration (through the update of > assignments). > We have to make sure that the node with the latest log index becomes the leader, so > we will have to propagate the desired minimum log index in assignments and > use it during voting. > h2. What's implemented > The "resetPartitions" operation in the distributed zone manager. It identifies > partitions where only a minority of nodes is online (thus they won't be able > to execute "changePeersAsync"), and writes "forced pending assignments" for > them. > A forced assignment excludes stable nodes that are not present in the pending > assignment from the new raft group configuration. It also performs a > "resetPeers" operation on the alive nodes from the stable assignment. > Handling the complete loss of all nodes from the stable assignments is not yet implemented; at > least one node is required to be elected as a leader. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21588) CMG commands idempotency is broken
[ https://issues.apache.org/jira/browse/IGNITE-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21588: --- Description: When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we do the following: * Read local state with {{{}readLogicalTopology(){}}}. * Modify state according to the command. * {*}Increase version{*}. * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}. The problem lies in the reading and writing of the state - it's local, and the version value is not replicated. What happens when we restart the node: * It starts without a local storage snapshot, with appliedIndex == 0, which is a {*}state in the past{*}. * We apply commands that were already applied before the restart. * We apply these commands to the locally saved topology snapshot. * This logical topology snapshot has a *state in the future* when compared to appliedIndex == 0. * As a result, when we re-apply some commands, we *increase the version* one more time, thus breaking data consistency between nodes. This would have been fine if we only used this version locally. But distribution zones rely on the consistency of the version between all nodes in the cluster. This might break DZ data nodes handling if any of the cluster nodes restarts. How to fix: * Either drop the storage if there's no storage snapshot; this will restore consistency * or never start the CMG group from a snapshot, but rather start it from the latest storage data. was: When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we do the following: * Read local state with {{{}readLogicalTopology(){}}}. * Modify state according to the command. * {*}Increase version{*}. * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}. The problem lies in reading and writing of the state - it's local, and version value is not replicated. What happens when we restart the node: * It starts with local storage snapshot, which is a {*}state in the past{*}, generally speaking. * We apply commands that were not applied in the snapshot. * We apply these commands to locally saved topology snapshot. * This logical topology snapshot has a *state in the future* when compared to storage snapshot. * As a result, when we re-apply some commands, we *increase the version* one more time, thus breaking data consistency between nodes. This would have been fine if we only used this version locally. But distribution zones rely on the consistency of the version between all nodes in cluster. This might break DZ data nodes handling if any of the cluster nodes restarts. > CMG commands idempotency is broken > -- > > Key: IGNITE-21588 > URL: https://issues.apache.org/jira/browse/IGNITE-21588 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we > do the following: > * Read local state with {{{}readLogicalTopology(){}}}. > * Modify state according to the command. > * {*}Increase version{*}. > * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}. > The problem lies in the reading and writing of the state - it's local, and the version > value is not replicated. > What happens when we restart the node: > * It starts without a local storage snapshot, with appliedIndex == 0, which is > a {*}state in the past{*}. > * We apply commands that were already applied before the restart. > * We apply these commands to the locally saved topology snapshot.
> * This logical topology snapshot has a *state in the future* when compared > to appliedIndex == 0. > * As a result, when we re-apply some commands, we *increase the version* one > more time, thus breaking data consistency between nodes. > This would have been fine if we only used this version locally. But > distribution zones rely on the consistency of the version between all nodes > in the cluster. This might break DZ data nodes handling if any of the cluster > nodes restarts. > How to fix: > * Either drop the storage if there's no storage snapshot; this will restore > consistency > * or never start the CMG group from a snapshot, but rather start it from the > latest storage data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
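A sketch of the first fix option (drop the local CMG state when raft has no snapshot, so full-log replay starts from the same zero state on every node; all names below are hypothetical, not the actual Ignite API):
{code:java}
/** Placeholder for the local CMG state storage. */
interface ClusterStateStorageSketch {
    /** Drops the locally saved logical topology and its version. */
    void clear();
}

final class CmgStartupSketch {
    /** Called before the CMG raft group starts replaying its log. */
    static void beforeRaftStart(boolean raftSnapshotExists, ClusterStateStorageSketch storage) {
        if (!raftSnapshotExists) {
            // appliedIndex will start from 0 and every command gets replayed,
            // so the locally saved topology (a "state in the future") must be
            // dropped too; otherwise re-applied commands bump the version twice.
            storage.clear();
        }
    }
}
{code}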
[jira] [Created] (IGNITE-21588) CMG commands idempotency is broken
Ivan Bessonov created IGNITE-21588: -- Summary: CMG commands idempotency is broken Key: IGNITE-21588 URL: https://issues.apache.org/jira/browse/IGNITE-21588 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we do the following: * Read local state with {{{}readLogicalTopology(){}}}. * Modify state according to the command. * {*}Increase version{*}. * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}. The problem lies in reading and writing of the state - it's local, and the version value is not replicated. What happens when we restart the node: * It starts with a local storage snapshot, which is a {*}state in the past{*}, generally speaking. * We apply commands that were not applied in the snapshot. * We apply these commands to the locally saved topology snapshot. * This logical topology snapshot has a *state in the future* when compared to the storage snapshot. * As a result, when we re-apply some commands, we *increase the version* one more time, thus breaking data consistency between nodes. This would have been fine if we only used this version locally. But distribution zones rely on the consistency of the version between all nodes in the cluster. This might break DZ data nodes handling if any of the cluster nodes restarts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21548) Encapsulate Set
Ivan Bessonov created IGNITE-21548: -- Summary: Encapsulate Set Key: IGNITE-21548 URL: https://issues.apache.org/jira/browse/IGNITE-21548 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Assignments may have some associated metadata, like a "force" flag, for example. We should prepare the code for introducing such metadata in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-18366) Simplify the configuration asm generator, phase 2
[ https://issues.apache.org/jira/browse/IGNITE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-18366: --- Description: After the split, it makes sense to start simplifying every individual generator. This is partially a research issue. Exactly what to do is not clear yet. Some context: classes in package {{org.apache.ignite.internal.configuration.asm}} are pretty big and complicated. {{InnerNodeAsmGenerator}} is almost 2000 lines long. How can we make it simpler? Better naming, more comments. Inner node generation can be split into multiple files, because it also handles polymorphic implementations. In some cases I would change the generation itself. For example, generated methods in polymorphic instances have the same implementation as in the original inner node instead of simply delegating the execution to inner nodes. It affects both performance and the code of the generators in a negative way. was:After the split, it makes sense to start simplifying every individual generator. This is partially a research issue. Exactly what to do is not clear yet. > Simplify the configuration asm generator, phase 2 > - > > Key: IGNITE-18366 > URL: https://issues.apache.org/jira/browse/IGNITE-18366 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: iep-55, ignite-3, technical-debt > Fix For: 3.0.0-beta2 > > > After the split, it makes sense to start simplifying every individual > generator. This is partially a research issue. Exactly what to do is not > clear yet. > Some context: classes in package > {{org.apache.ignite.internal.configuration.asm}} are pretty big and > complicated. {{InnerNodeAsmGenerator}} is almost 2000 lines long. > How can we make it simpler? Better naming, more comments. Inner node > generation can be split into multiple files, because it also handles > polymorphic implementations. > In some cases I would change the generation itself. For example, generated > methods in polymorphic instances have the same implementation as in the original > inner node instead of simply delegating the execution to inner nodes. It > affects both performance and the code of the generators in a negative way. -- This message was sent by Atlassian Jira (v8.20.10#820010)
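As an illustration of the delegation point above, a hedged sketch follows; the interfaces and class names are hypothetical, not the actual generated bytecode. It contrasts duplicating the method body in every polymorphic class with delegating to the wrapped inner node:

{code:java}
// Hypothetical shape of the generated classes; names are illustrative only.
interface InnerNodeSketch {
    Object traverseChild(String key);
}

// What the generator effectively emits today: a copy of the full body
// in every polymorphic instance class.
class PolymorphicViewDuplicated implements InnerNodeSketch {
    @Override
    public Object traverseChild(String key) {
        // ...the same non-trivial logic as in the original inner node...
        return null;
    }
}

// The simpler alternative: plain delegation, which is cheaper to generate
// and keeps the logic in one place.
class PolymorphicViewDelegating implements InnerNodeSketch {
    private final InnerNodeSketch delegate;

    PolymorphicViewDelegating(InnerNodeSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object traverseChild(String key) {
        return delegate.traverseChild(key);
    }
}
{code}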
[jira] [Resolved] (IGNITE-21302) Prohibit automatic group reconfiguration when there's no majority
[ https://issues.apache.org/jira/browse/IGNITE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-21302. Resolution: Won't Fix This fix is not required; data loss won't happen, for other reasons. > Prohibit automatic group reconfiguration when there's no majority > - > > Key: IGNITE-21302 > URL: https://issues.apache.org/jira/browse/IGNITE-21302 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > The scaleDown timer should not lead to a situation where the user loses data. > The default "changePeers" behavior also won't work, because there's no majority > and thus no leader. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21501) Create index storages for new partitions on rebalance
[ https://issues.apache.org/jira/browse/IGNITE-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21501: --- Epic Link: IGNITE-20782 > Create index storages for new partitions on rebalance > - > > Key: IGNITE-21501 > URL: https://issues.apache.org/jira/browse/IGNITE-21501 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > It appears that we only create index storages during the "table creation", > not during the "partition creation" if it's performed in isolation. > Even if we did, > {{org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler#waitIndexes}} > is still badly designed, because it waits for indexes of the initial > partitions distribution and cannot provide any guarantees when assignments > are changed. > This leads to NPEs or bizarre assertions, related to the aforementioned method. > What we need to do is: > * Get rid of the faulty index awaiting mechanism. > * Create index storages before starting the raft group. > * [optional] There might be naturally occurring "races" between catalog > updates (index creation) and rebalance. Right now they are resolved by the > fact that these processes are linearized in watch processing, but that's not > the best approach. If we could provide something more robust, that would > be nice. Let's think about it at least. -- This message was sent by Atlassian Jira (v8.20.10#820010)
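The ordering fix from the list above can be sketched as follows; every method and type here is a hypothetical placeholder, not the actual Ignite internals. The point is simply that index storages exist before the raft group starts, so an incoming rebalance snapshot never hits a missing storage:

{code:java}
import java.util.List;

// Ordering sketch only; all names are illustrative placeholders.
class PartitionStarterSketch {
    interface IndexDescriptor { int id(); }

    void startPartition(int partitionId, List<IndexDescriptor> catalogIndexes) {
        createMvPartition(partitionId);

        // Create a storage for every index known to the catalog *before*
        // the raft group starts, instead of awaiting them asynchronously.
        for (IndexDescriptor index : catalogIndexes) {
            createIndexStorage(partitionId, index.id());
        }

        // Only now is it safe to start the raft group for this partition.
        startRaftGroup(partitionId);
    }

    private void createMvPartition(int partitionId) { /* storage engine call */ }

    private void createIndexStorage(int partitionId, int indexId) { /* storage engine call */ }

    private void startRaftGroup(int partitionId) { /* raft service call */ }
}
{code}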
[jira] [Created] (IGNITE-21501) Create index storages for new partitions on rebalance
Ivan Bessonov created IGNITE-21501: -- Summary: Create index storages for new partitions on rebalance Key: IGNITE-21501 URL: https://issues.apache.org/jira/browse/IGNITE-21501 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov It appears that we only create index storages during the "table creation", not during the "partition creation" if it's performed in isolation. Even if we did, {{org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler#waitIndexes}} is still badly designed, because it waits for indexes of the initial partitions distribution and cannot provide any guarantees when assignments are changed. This leads to NPEs or bizarre assertions, related to the aforementioned method. What we need to do is: * Get rid of the faulty index awaiting mechanism. * Create index storages before starting the raft group. * [optional] There might be naturally occurring "races" between catalog updates (index creation) and rebalance. Right now they are resolved by the fact that these processes are linearized in watch processing, but that's not the best approach. If we could provide something more robust, that would be nice. Let's think about it at least. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IGNITE-21488) Disable thread assertions by default
[ https://issues.apache.org/jira/browse/IGNITE-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-21488. Reviewer: Ivan Bessonov Resolution: Fixed > Disable thread assertions by default > > > Key: IGNITE-21488 > URL: https://issues.apache.org/jira/browse/IGNITE-21488 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Roman Puchkovskiy >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21469: --- Epic Link: IGNITE-21444 > AssertionError in checkpoint > > > Key: IGNITE-21469 > URL: https://issues.apache.org/jira/browse/IGNITE-21469 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > > {code:java} > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) > ~[?:?] at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) > ~[?:?] at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > ~[?:?] at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > ~[?:?] at > org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] ... 1 more Caused by: java.lang.AssertionError: FullPageId > [pageId=000100020378, effectivePageId=00020378, groupId=886] > at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185) > ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345) > ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] ... 1 more{code} > [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true] > > The reason for the assertion is a bug/race in listener unregistration for > partition free lists. We should implement the unregistration properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21469: --- Ignite Flags: (was: Docs Required,Release Notes Required) > AssertionError in checkpoint > > > Key: IGNITE-21469 > URL: https://issues.apache.org/jira/browse/IGNITE-21469 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > > {code:java} > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) > ~[?:?] at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) > ~[?:?] at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > ~[?:?] at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > ~[?:?] at > org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] ... 1 more Caused by: java.lang.AssertionError: FullPageId > [pageId=000100020378, effectivePageId=00020378, groupId=886] > at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185) > ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345) > ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] ... 1 more{code} > [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true] > > The reason for the assertion is a bug/race in listener unregistration for > partition free lists.
We should implement the unregistration properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21469: --- Labels: ignite-3 (was: ) > AssertionError in checkpoint > > > Key: IGNITE-21469 > URL: https://issues.apache.org/jira/browse/IGNITE-21469 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > > {code:java} > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) > ~[?:?] at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) > ~[?:?] at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > ~[?:?] at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > ~[?:?] at > org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] ... 1 more Caused by: java.lang.AssertionError: FullPageId > [pageId=000100020378, effectivePageId=00020378, groupId=886] > at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185) > ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345) > ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at > org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59) > ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] ... 1 more{code} > [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true] > > The reason for the assertion is a bug/race in listener unregistration for > partition free lists. We should implement the unregistration properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21469) AssertionError in checkpoint
Ivan Bessonov created IGNITE-21469: -- Summary: AssertionError in checkpoint Key: IGNITE-21469 URL: https://issues.apache.org/jira/browse/IGNITE-21469 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov {code:java} at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870) ~[?:?] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?] at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] ... 1 more Caused by: java.lang.AssertionError: FullPageId [pageId=000100020378, effectivePageId=00020378, groupId=886] at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] ... 1 more{code} [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true] The reason for the assertion is a bug/race in listener unregistration for partition free lists. We should implement the unregistration properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
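A minimal sketch of the fix direction for the listener race described above; all names are hypothetical and the real checkpoint machinery is far more involved. The idea is that a partition must drop its metadata callback before its pages are released, and a checkpoint that starts afterwards must not see it:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of per-partition checkpoint callbacks.
class CheckpointListenersSketch {
    private final Map<Integer, Runnable> listeners = new ConcurrentHashMap<>();

    void register(int partitionId, Runnable saveMetadata) {
        listeners.put(partitionId, saveMetadata);
    }

    // Must be invoked before the partition's pages are released; otherwise
    // a concurrent checkpoint may still call saveMetadata() on a free list
    // whose pages are gone, which is the assumed cause of the assertion.
    void unregister(int partitionId) {
        listeners.remove(partitionId);
    }

    void onCheckpoint() {
        // ConcurrentHashMap iteration is weakly consistent: a listener
        // removed before the checkpoint began is guaranteed not to run.
        listeners.values().forEach(Runnable::run);
    }
}
{code}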
[jira] [Resolved] (IGNITE-21044) Investigate long table creation
[ https://issues.apache.org/jira/browse/IGNITE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-21044. Resolution: Done > Investigate long table creation > --- > > Key: IGNITE-21044 > URL: https://issues.apache.org/jira/browse/IGNITE-21044 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > If we run a test in which we create a lot of tables (more than 200, > for example), we soon start seeing a degradation in table creation time. > In particular, handling of the corresponding Catalog update might take literally > seconds. > One of the reasons is described here: > https://issues.apache.org/jira/browse/IGNITE-19913 > It explains why table creation might be slow, but it does not explain why it > degrades when we create more tables. So there are basically two issues: > * watch processing waits for unnecessary operations to complete > * those operations are too slow for some reason > We need to investigate and fix both issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-21466) Add metrics for partition states
[ https://issues.apache.org/jira/browse/IGNITE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-21466: -- Assignee: Ivan Bessonov > Add metrics for partition states > > > Key: IGNITE-21466 > URL: https://issues.apache.org/jira/browse/IGNITE-21466 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21466) Add metrics for partition states
Ivan Bessonov created IGNITE-21466: -- Summary: Add metrics for partition states Key: IGNITE-21466 URL: https://issues.apache.org/jira/browse/IGNITE-21466 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21465) Add system views for partition states
Ivan Bessonov created IGNITE-21465: -- Summary: Add system views for partition states Key: IGNITE-21465 URL: https://issues.apache.org/jira/browse/IGNITE-21465 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations
[ https://issues.apache.org/jira/browse/IGNITE-21446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21446: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Import JVM args from build.gradle for JUnit run configurations > -- > > Key: IGNITE-21446 > URL: https://issues.apache.org/jira/browse/IGNITE-21446 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > This should help running tests locally with IDEA runner on Java 17 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations
[ https://issues.apache.org/jira/browse/IGNITE-21446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21446: --- Reviewer: Kirill Tkalenko > Import JVM args from build.gradle for JUnit run configurations > -- > > Key: IGNITE-21446 > URL: https://issues.apache.org/jira/browse/IGNITE-21446 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 10m > Remaining Estimate: 0h > > This should help running tests locally with IDEA runner on Java 17 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations
Ivan Bessonov created IGNITE-21446: -- Summary: Import JVM args from build.gradle for JUnit run configurations Key: IGNITE-21446 URL: https://issues.apache.org/jira/browse/IGNITE-21446 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 3.0.0-beta2 This should help running tests locally with IDEA runner on Java 17 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21434) Fail user write requests for non-available partitions
Ivan Bessonov created IGNITE-21434: -- Summary: Fail user write requests for non-available partitions Key: IGNITE-21434 URL: https://issues.apache.org/jira/browse/IGNITE-21434 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Currently, {{INSERT INTO test VALUES(%d, %d);}} just hangs indefinitely, which is not what you would expect. We should either fail the request immediately if there's no majority, or return a replication timeout exception, for example. -- This message was sent by Atlassian Jira (v8.20.10#820010)
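The "replication timeout" option could look roughly like the wrapper below; {{withReplicationTimeout}} is a hypothetical helper, not an existing Ignite API, and it only sketches bounding the wait on the replication future:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper around the replication future of a write request.
class WriteTimeoutSketch {
    static <T> CompletableFuture<T> withReplicationTimeout(
            CompletableFuture<T> replicationFuture, long timeoutMillis) {
        return replicationFuture
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> {
                    // orTimeout() fails the future with TimeoutException when
                    // the majority never acknowledges the write in time.
                    if (ex instanceof TimeoutException
                            || ex.getCause() instanceof TimeoutException) {
                        throw new IllegalStateException(
                                "Replication timed out; the partition may have no majority", ex);
                    }
                    throw new RuntimeException(ex); // propagate other failures
                });
    }
}
{code}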
[jira] [Resolved] (IGNITE-20067) Optimize "StorageUpdateHandler#handleUpdateAll"
[ https://issues.apache.org/jira/browse/IGNITE-20067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-20067. Fix Version/s: 3.0.0-beta2 Reviewer: Ivan Bessonov Resolution: Fixed > Optimize "StorageUpdateHandler#handleUpdateAll" > --- > > Key: IGNITE-20067 > URL: https://issues.apache.org/jira/browse/IGNITE-20067 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Philipp Shergalis >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > In the current implementation, the size of a single batch inside > "runConsistently" is unpredictable, because the collection of rows is > received from the message. > Generally speaking, it's a good idea to make the scope of a single > "runConsistently" smaller - it would lead to faster work in all storage > engines: > * for rocksdb, write batches would become smaller; > * for page memory, spikes on checkpoint would become smaller. > There are two criteria that we could use: > * number of rows stored; > * cumulative number of inserted bytes. > Raft does the same approximation when batching log records, for example. This > should not affect the data consistency, because updateAll itself is > idempotent by nature -- This message was sent by Atlassian Jira (v8.20.10#820010)
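The two batching criteria from the list above can be sketched like this; the helper is hypothetical and rows are shown as raw byte arrays for simplicity. A batch is cut when either the row count or the cumulative byte size would exceed its threshold:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch splitter; each resulting batch would then be applied
// in its own runConsistently closure, which is safe because updateAll is
// idempotent.
class UpdateAllBatcherSketch {
    static List<List<byte[]>> split(List<byte[]> rows, int maxRows, int maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;

        for (byte[] row : rows) {
            // Cut the batch on either criterion; an oversized single row
            // still goes alone into its own batch.
            if (!current.isEmpty()
                    && (current.size() >= maxRows || currentBytes + row.length > maxBytes)) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += row.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
{code}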
[jira] [Created] (IGNITE-21359) There are 2 RebalanceUtil classes
Ivan Bessonov created IGNITE-21359: -- Summary: There are 2 RebalanceUtil classes Key: IGNITE-21359 URL: https://issues.apache.org/jira/browse/IGNITE-21359 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 3.0.0-beta2 and they duplicate constants and methods. The least that we could do is remove code duplication and maybe rename one of these classes -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21347) Fix license header extra whitespaces in ErrorCodeGroup annotation processor
[ https://issues.apache.org/jira/browse/IGNITE-21347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21347: --- Labels: ignite-3 (was: ) > Fix license header extra whitespaces in ErrorCodeGroup annotation processor > > > Key: IGNITE-21347 > URL: https://issues.apache.org/jira/browse/IGNITE-21347 > Project: Ignite > Issue Type: Improvement >Reporter: Dmitrii Zabotlin >Assignee: Dmitrii Zabotlin >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > There are extra whitespaces in the license headers in the generated error > codes files. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-21284) Internal API for manual raft group configuration update
[ https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-21284: -- Assignee: Ivan Bessonov > Internal API for manual raft group configuration update > --- > > Key: IGNITE-21284 > URL: https://issues.apache.org/jira/browse/IGNITE-21284 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > We need an API (with implementation) that's analogous to > "reset-lost-partitions", but with the ability to reuse the living minority of > nodes. > This API should gather the states of partitions, identify healthy peers, and > use them as a new raft group configuration (through the update of > assignments). > We have to make sure that the node with the latest log index will become the leader, so > we will have to propagate the desired minimum log index in assignments and > use it during voting. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21309) DirectMessageWriter keeps holding used buffers
[ https://issues.apache.org/jira/browse/IGNITE-21309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21309: --- Reviewer: Kirill Tkalenko > DirectMessageWriter keeps holding used buffers > -- > > Key: IGNITE-21309 > URL: https://issues.apache.org/jira/browse/IGNITE-21309 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 10m > Remaining Estimate: 0h > > Thread-local optimized marshallers store links to write buffers in their > internal stacks, which could lead to occasional OOMs. We should release > buffers after writing nested messages in DirectMessageWriter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21309) DirectMessageWriter keeps holding used buffers
Ivan Bessonov created IGNITE-21309: -- Summary: DirectMessageWriter keeps holding used buffers Key: IGNITE-21309 URL: https://issues.apache.org/jira/browse/IGNITE-21309 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 3.0.0-beta2 Thread-local optimized marshallers store links to write buffers in their internal stacks, which could lead to occasional OOMs. We should release buffers after writing nested messages in DirectMessageWriter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
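A minimal sketch of the fix direction (hypothetical writer state, not the real DirectMessageWriter internals): the per-thread stack used while writing nested messages must actually drop its buffer references on pop, otherwise the largest buffer ever used stays reachable from the thread-local for the thread's whole lifetime:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-thread state for nested message writing.
class NestedWriteStateSketch {
    private final Deque<ByteBuffer> stack = new ArrayDeque<>();

    void beginNested(int capacity) {
        stack.push(ByteBuffer.allocate(capacity));
    }

    ByteBuffer endNested() {
        // pop() removes the element and clears the internal slot, so the
        // writer no longer holds the buffer once the caller is done with it.
        return stack.pop();
    }
}
{code}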
[jira] [Created] (IGNITE-21305) Internal API for truncating log suffix
Ivan Bessonov created IGNITE-21305: -- Summary: Internal API for truncating log suffix Key: IGNITE-21305 URL: https://issues.apache.org/jira/browse/IGNITE-21305 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov An API and implementation are needed to truncate the log suffix of peers in the ERROR state that cannot proceed with applying commands -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21305) Internal API for truncating log suffix
[ https://issues.apache.org/jira/browse/IGNITE-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21305: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Internal API for truncating log suffix > -- > > Key: IGNITE-21305 > URL: https://issues.apache.org/jira/browse/IGNITE-21305 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > > An API and implementation are needed to truncate the log suffix of peers in the ERROR state > that cannot proceed with applying commands -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-21256) Internal API for local partition states
[ https://issues.apache.org/jira/browse/IGNITE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21256: --- Description: Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the list. We need an API (with implementation) to access the list of local partitions and their states. The way to determine them: * compare current assignments with replica states * check the state machine; it might be broken or installing a snapshot was: Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the list. We need an API to access the list of local partitions and their states. The way to determine them: * comparing current assignments with replica states * check the state machine, it might be broken or installing snapshot > Internal API for local partition states > --- > > Key: IGNITE-21256 > URL: https://issues.apache.org/jira/browse/IGNITE-21256 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the > list. We need an API (with implementation) to access the list of local > partitions and their states. The way to determine them: > * compare current assignments with replica states > * check the state machine; it might be broken or installing a snapshot -- This message was sent by Atlassian Jira (v8.20.10#820010)
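One way the determination rules above could translate into code, as a hedged sketch with a hypothetical enum and inputs (the real API and state names will differ):

{code:java}
import java.util.Set;

// Hypothetical derivation of a local partition state.
class PartitionStateResolverSketch {
    enum LocalPartitionState { HEALTHY, INSTALLING_SNAPSHOT, BROKEN, ABSENT }

    LocalPartitionState resolve(
            String localNode,
            Set<String> assignedNodes,
            boolean replicaStarted,
            boolean stateMachineBroken,
            boolean installingSnapshot) {
        // Assignments vs. replica state: not assigned or not running => absent.
        if (!assignedNodes.contains(localNode) || !replicaStarted) {
            return LocalPartitionState.ABSENT;
        }
        // State machine inspection, as the issue suggests.
        if (stateMachineBroken) {
            return LocalPartitionState.BROKEN;
        }
        if (installingSnapshot) {
            return LocalPartitionState.INSTALLING_SNAPSHOT;
        }
        return LocalPartitionState.HEALTHY;
    }
}
{code}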
[jira] [Updated] (IGNITE-21284) Internal API for manual raft group configuration update
[ https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21284: --- Description: We need an API (with implementation) that's analogous to "reset-lost-partitions", but with the ability to reuse the living minority of nodes. This API should gather the states of partitions, identify healthy peers, and use them as a new raft group configuration (through the update of assignments). We have to make sure that the node with the latest log index will become the leader, so we will have to propagate the desired minimum log index in assignments and use it during voting. was: We need an API that's analogous to "reset-lost-partitions", but with the ability to reuse living minority of nodes. This API should gather the states of partitions, identify healthy peers, and use them as a new raft group configuration (through the update of assignments). We have to make sure that node with latest log index will become a leader, so we will have to propagate desired minimum for log index in assignments and use it during the voting. > Internal API for manual raft group configuration update > --- > > Key: IGNITE-21284 > URL: https://issues.apache.org/jira/browse/IGNITE-21284 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > We need an API (with implementation) that's analogous to > "reset-lost-partitions", but with the ability to reuse the living minority of > nodes. > This API should gather the states of partitions, identify healthy peers, and > use them as a new raft group configuration (through the update of > assignments). > We have to make sure that the node with the latest log index will become the leader, so > we will have to propagate the desired minimum log index in assignments and > use it during voting. -- This message was sent by Atlassian Jira (v8.20.10#820010)
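A sketch of the planning step described above, with hypothetical types standing in for assignments and raft peer descriptors: keep only the healthy peers as the new configuration and carry the highest surviving log index so that voting can prefer the most up-to-date node:

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical planning step for a manual group configuration reset.
class GroupResetSketch {
    record PeerState(String node, boolean healthy, long logIndex) {}

    record ResetPlan(List<String> newAssignment, long minimumLogIndex) {}

    static ResetPlan plan(List<PeerState> peers) {
        List<PeerState> healthy = peers.stream()
                .filter(PeerState::healthy)
                .collect(Collectors.toList());

        // The highest log index among survivors: the new configuration
        // should ensure the elected leader has at least this index.
        long minimumLogIndex = healthy.stream()
                .mapToLong(PeerState::logIndex)
                .max()
                .orElse(0L);

        return new ResetPlan(
                healthy.stream().map(PeerState::node).collect(Collectors.toList()),
                minimumLogIndex);
    }
}
{code}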
[jira] [Created] (IGNITE-21304) Internal API for restarting partitions
Ivan Bessonov created IGNITE-21304: -- Summary: Internal API for restarting partitions Key: IGNITE-21304 URL: https://issues.apache.org/jira/browse/IGNITE-21304 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov API and implementation should be provided for restarting peers in raft groups. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21303) Exclude nodes in "error" state from manual group reconfiguration
Ivan Bessonov created IGNITE-21303: -- Summary: Exclude nodes in "error" state from manual group reconfiguration Key: IGNITE-21303 URL: https://issues.apache.org/jira/browse/IGNITE-21303 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Instead of simply using the existing set of nodes as a baseline for new assignments, we should either exclude peers in the ERROR state from it, or force data cleanup on such nodes. A third option is to forbid such reconfiguration, forcing the user to clean up ERROR peers in advance -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21302) Prohibit automatic group reconfiguration when there's no majority
Ivan Bessonov created IGNITE-21302: -- Summary: Prohibit automatic group reconfiguration when there's no majority Key: IGNITE-21302 URL: https://issues.apache.org/jira/browse/IGNITE-21302 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov The scaleDown timer should not lead to a situation where the user loses data. The default "changePeers" behavior also won't work, because there's no majority and thus no leader. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21301) Sync raft log before flush in all storage engines
Ivan Bessonov created IGNITE-21301: -- Summary: Sync raft log before flush in all storage engines Key: IGNITE-21301 URL: https://issues.apache.org/jira/browse/IGNITE-21301 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Checkpoints and RocksDB's flush actions should sync the log before they finish writing data to disk, if "fsync" is disabled -- This message was sent by Atlassian Jira (v8.20.10#820010)
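The required ordering can be sketched as follows, with hypothetical interfaces standing in for the raft log and the storage engine flush; the point is that, with raft "fsync" disabled, flushed state must never outrun the durable log that produced it:

{code:java}
// Hypothetical ordering sketch; interfaces are illustrative placeholders.
class FlushWithLogSyncSketch {
    interface LogSyncer { void sync(); }       // forces the raft log to disk
    interface StorageFlusher { void flush(); } // checkpoint / RocksDB flush

    private final LogSyncer log;
    private final StorageFlusher storage;

    FlushWithLogSyncSketch(LogSyncer log, StorageFlusher storage) {
        this.log = log;
        this.storage = storage;
    }

    void flushConsistently() {
        log.sync();      // 1. Make the log durable up to the applied index.
        storage.flush(); // 2. Only then persist the state derived from it;
                         //    a crash between the two steps is then recoverable
                         //    by replaying the log.
    }
}
{code}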
[jira] [Created] (IGNITE-21300) Implement disaster recovery for secondary indexes
Ivan Bessonov created IGNITE-21300: -- Summary: Implement disaster recovery for secondary indexes Key: IGNITE-21300 URL: https://issues.apache.org/jira/browse/IGNITE-21300 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov It is possible that, if we lose part of the log, some available indexes might become "locally" unavailable. We will have to finish the build process a second time in such a case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21299) Rest API for disaster recovery commands
Ivan Bessonov created IGNITE-21299: -- Summary: Rest API for disaster recovery commands Key: IGNITE-21299 URL: https://issues.apache.org/jira/browse/IGNITE-21299 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Please refer to https://issues.apache.org/jira/browse/IGNITE-21298 for a list -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21298) CLI for disaster recovery commands
Ivan Bessonov created IGNITE-21298: -- Summary: CLI for disaster recovery commands Key: IGNITE-21298 URL: https://issues.apache.org/jira/browse/IGNITE-21298 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Names might change. * ignite restart-partitions --nodes <nodes> [--zones <zones>] [--partitions <partitions>] [--purge] * ignite reset-lost-partitions [--zones <zones>] [--partitions <partitions>] * ignite truncate-log-suffix --zone <zone> --partition <partition> --index <index> * ignite partition-states [--local [--nodes <nodes>] | --global] [--zones <zones>] [--partitions <partitions>] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-21295) Public Java API for manual raft group configuration update
Ivan Bessonov created IGNITE-21295: -- Summary: Public Java API for manual raft group configuration update Key: IGNITE-21295 URL: https://issues.apache.org/jira/browse/IGNITE-21295 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Implement public API for IGNITE-21284 -- This message was sent by Atlassian Jira (v8.20.10#820010)