[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency

2024-07-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22819:
---
Description: 
The following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on the leader and 3 on the followers,
because their clocks differ. As a result, {{evictIdempotentCommandsCache}}
works differently on different nodes for the same raft commands.
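
As a reading aid for the log above: the {{composite}} values are consistent
with packing the physical millisecond timestamp into the upper bits and the
16-bit logical counter into the lower bits (a sketch, not the exact
{{HybridTimestamp}} code):
{code:java}
// physical = 2024-07-24 09:29:17.220 +0300 = 1721802557220 ms since the epoch
long physical = 1_721_802_557_220L;
long logical = 1;

long composite = (physical << 16) | logical; // 112840052389969921, as logged
{code}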

The real problem is that it may or may not call
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which increases the
local revision.
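
A minimal sketch of the problematic shape of the eviction path
({{collectExpiredCommandIds}} and the overall structure are assumptions for
illustration; only {{storage.removeAll}} appears in the actual code):
{code:java}
// Each node computes the set of expired entries from its own clock, so the
// set may be empty on one node and non-empty on another for the same command.
void evictIdempotentCommandsCache(HybridTimestamp cleanupTs, HybridTimestamp safeTime) {
    List<byte[]> commandIdStorageKeys = collectExpiredCommandIds(cleanupTs);

    if (!commandIdStorageKeys.isEmpty()) {
        // Only executed on nodes where something expired. This write bumps the
        // local revision, so a node that evicted 0 entries and one that evicted
        // 3 end up with different revisions for the same replicated command.
        storage.removeAll(commandIdStorageKeys, safeTime);
    }
}
{code}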
 
Revision is always local; it is never replicated. A revision mismatch leads to
different evaluation of conditions in conditional updates and invokes. A simple
example of such an issue would be a skipped configuration update on one or
several nodes in the cluster.
 
 What can we do about it:
 * make an alternative for {{removeAll}} that doesn't increase the local
revision
 * call {{removeAll}} even if the list is empty
 * never invalidate the cache locally, but rather replicate cache invalidation
with a special command
 * there's a TODO that says "clear this during compaction". That's a bad
option: it would lead to either frequent compactions or huge memory overheads

[jira] [Created] (IGNITE-22819) Metastorage revisions inconsistency

2024-07-24 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22819:
--

 Summary: Metastorage revisions inconsistency
 Key: IGNITE-22819
 URL: https://issues.apache.org/jira/browse/IGNITE-22819
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


 




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from

2024-07-15 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22736:
--

Fix Version/s: 3.0.0-beta2
 Assignee: Ivan Bessonov
   Labels: ignite-3  (was: )




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from

2024-07-15 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22736:
--

 Summary: PartitionCommandsMarshallerImpl corrupts the buffer it 
reads from
 Key: IGNITE-22736
 URL: https://issues.apache.org/jira/browse/IGNITE-22736
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{{PartitionCommandsMarshallerImpl#unmarshall}} receives a buffer that is
requested from the log manager, for example.

The byte buffer instance that it receives might be acquired from the on-heap
cache of log entries. Modifying it would be
 # not thread-safe, because multiple threads may start modifying it concurrently
 # illegal, because it stays in the cache for some time, and modifying it
basically corrupts it

We shouldn't do that.
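
A minimal sketch of the safe pattern, assuming the goal is simply to avoid
mutating the shared buffer (an illustration, not the actual fix):
{code:java}
import java.nio.ByteBuffer;
import java.util.function.Function;

class SafeUnmarshal {
    // duplicate() shares the underlying bytes but has an independent position,
    // limit and mark, so readers of the cached buffer are never affected.
    static <T> T unmarshal(ByteBuffer cached, Function<ByteBuffer, T> reader) {
        return reader.apply(cached.duplicate());
    }
}
{code}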



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22657) Investigate why ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions fails without sleep

2024-07-03 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22657:
--

 Summary: Investigate why 
ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions
 fails without sleep
 Key: IGNITE-22657
 URL: https://issues.apache.org/jira/browse/IGNITE-22657
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21303) Exclude nodes in "error" state from manual group reconfiguration

2024-06-27 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21303:
--

Assignee: Ivan Bessonov

> Exclude nodes in "error" state from manual group reconfiguration
> 
>
> Key: IGNITE-21303
> URL: https://issues.apache.org/jira/browse/IGNITE-21303
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Instead of simply using the existing set of nodes as a baseline for new 
> assignments, we should either exclude peers in the ERROR state from it, or 
> force data cleanup on such nodes. A third option is to forbid such 
> reconfiguration, forcing the user to clear ERROR peers in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-22500) Remove unnecessary waits when creating an index

2024-06-26 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-22500.

Resolution: Won't Fix

Regarding eliminating the BUILDING status from the catalog: we can't simply
change a few lines, as this task involves more changes. To my understanding,
the following nuances are important:
 * ChangeIndexStatusTask should be changed. If we remove the
REGISTERED->BUILDING transition, we no longer have to update the catalog,
which leads to a small refactoring.
 * We would have to fire a {{CatalogEvent.INDEX_BUILDING}} event instead of
updating the catalog.
 * This event will have nothing to do with the catalog at this point, so it
should be renamed.
 * It will *not* be fired in the context of a meta-storage watch execution,
which might be a problem if listener implementations rely on that. Spoiler:
they do.
 * Local recovery and similar code will change slightly; this part shouldn't
be that hard.

Overall, I don't think we should do such an optimization in this issue
specifically. It's not about "removing a wait that we don't need", it's about
changing the internal protocol of index creation. I will file another Jira for
that soon.

> Remove unnecessary waits when creating an index
> ---
>
> Key: IGNITE-22500
> URL: https://issues.apache.org/jira/browse/IGNITE-22500
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When creating an index with current defaults (DelayDuration=1sec, 
> MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 
> seconds on my machine (without concurrent transactions, on an empty table 
> that was just created).
> According to the design, we need to first wait for the REGISTERED state to 
> activate on all nodes, including the ones that are currently down; this is to 
> make sure that all transactions started on schema versions before the index 
> creation have finished before we start to build the index (this makes us 
> wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we 
> switch the index to the AVAILABLE state. This requires another wait of 
> DelayDuration+MaxClockSkew.
> Because of IGNITE-20378, in the second case we actually wait longer (for 
> additional IdleSafeTimePropagationPeriod+MaxClockSkew).
> The total of waits is thus 1.5+3=4.5sec. But index creation actually takes 
> 6-6.5 seconds. It looks like there are some additional delays (like 
> submitting to the Metastorage and executing its watches).
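
For reference, a quick back-of-the-envelope check of the waits described above
(values taken from the quoted description; units in milliseconds):
{code:java}
long delayDuration = 1_000;            // DelayDuration
long maxClockSkew = 500;               // MaxClockSkew
long idleSafeTimePropagation = 1_000;  // IdleSafeTimePropagationPeriod

// First wait: REGISTERED state activation on all nodes.
long firstWait = delayDuration + maxClockSkew;                    // 1500 ms
// Second wait: switch to AVAILABLE, extended because of IGNITE-20378.
long secondWait = delayDuration + maxClockSkew
        + idleSafeTimePropagation + maxClockSkew;                 // 3000 ms

long total = firstWait + secondWait;   // 4500 ms, vs. the observed 6-6.5 s
{code}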



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22500) Remove unnecessary waits when creating an index

2024-06-25 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859937#comment-17859937
 ] 

Ivan Bessonov commented on IGNITE-22500:


My thoughts on the topic:
 * _We have an additional switch from REGISTERED to BUILDING, which can in
theory be eliminated from the catalog; it would save us an additional second
(DD is 500ms now)._
 * We can't lower DD for a specific status change, because it would violate
the schema synchronization protocol. After waiting for "msSafeTime - DD - skew"
(I don't remember the precise rules about clock skew) we rely on the fact that
the catalog is up-to-date; breaking that invariant would lead to unforeseen
consequences.
 * What we really need is:
 ** The ability to create indexes in the same DDL as the table itself. We do
this implicitly for the PK. For other indexes it's only a question of API.
 ** For SQL scripts we could batch consecutive DDLs and create indexes
implicitly at the same time as the table, which seems like an optimal choice.
This way we don't need any special syntax.
 ** Some DDL queries could be executed in parallel. Again, this seems more
like a SQL issue to me.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22561) Get rid of ByteString in messages

2024-06-24 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22561:
--

 Summary: Get rid of ByteString in messages
 Key: IGNITE-22561
 URL: https://issues.apache.org/jira/browse/IGNITE-22561
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Here I would include two types of improvements (see the sketch below):
 * {{@Marshallable ByteString}} - this pattern became obsolete a long time ago.
The {{ByteBuffer}} type is natively supported by the protocol, and using it
should eliminate unnecessary data copying, potentially making the system faster.
 * Pretty much the same thing, but for {{{}byte[]{}}}. It's used in classes
like {{{}org.apache.ignite.internal.metastorage.dsl.Operation{}}}. If we
migrate these properties to {{ByteBuffer}}, deserialization will become
significantly faster, but in order to benefit from it we would have to change
the internal metastorage implementation a little (for example, by optimizing
memory usage in {{{}RocksDbKeyValueStorage#addDataToBatch{}}}).
If that requires too many changes, I propose doing it in a separate JIRA. My
assumption is that it will not require too many changes, but we'll see.
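
A minimal sketch of the intended shape of such a migration (the message
interface and property name here are invented for illustration):
{code:java}
import java.nio.ByteBuffer;

// Before (assumed shape): reflective marshalling of a ByteString forces a copy.
//   @Marshallable
//   ByteString value();

// After: the protocol handles ByteBuffer natively, so the payload can wrap
// the original array without copying.
interface OperationMessage {
    ByteBuffer value();
}
{code}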



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17859613#comment-17859613
 ] 

Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 10:00 AM:
--

According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   
Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  
thrpt    5  2361.249 ±   66.884  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  
thrpt    5    52.377 ±    3.769  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  
thrpt    5  1713.443 ±  331.795  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  
thrpt    5    14.916 ±    2.230  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  
thrpt    5   833.372 ±  227.738  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  
thrpt    5     3.281 ±    0.906  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  
thrpt    5  2090.845 ±  792.226  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  
thrpt    5    51.393 ±   16.872  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  
thrpt    5  2188.459 ±   69.423  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  
thrpt    5    52.705 ±    2.771  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  
thrpt    5  2174.810 ±   61.331  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  
thrpt    5    53.805 ±    1.000  ops/ms {code}
After:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   
Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  
thrpt    5  4389.765 ±   66.332  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  
thrpt    5    79.684 ±    0.965  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  
thrpt    5  2754.506 ±   58.151  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  
thrpt    5    17.435 ±    0.267  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  
thrpt    5  1066.381 ±   10.254  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  
thrpt    5     3.389 ±    0.688  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  
thrpt    5  2782.648 ±  173.791  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  
thrpt    5    69.952 ±    9.109  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  
thrpt    5  2752.568 ±   50.796  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  
thrpt    5    63.721 ±    2.902  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  
thrpt    5  2676.343 ± 1209.184  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  
thrpt    5    62.139 ±   17.144  ops/ms {code}
Short summary:
 * Depending on the number of byte arrays inside the message (which can't be
optimized), marshaling became 0% to 85% faster according to the created
benchmark, due to a combination of many different optimizations, such as
 ** avoiding the creation of serializers
 ** a simpler and slightly faster byte buffers pool
 ** a better binary UUID format
 ** low-level stuff in the direct stream
 ** better {{writeVarInt}} / {{writeVarLong}} (see the sketch below)
 * If we take a look at the flamegraph, we can see that serialization itself
is about 1.5-2.0 times slower than the following {{{}Arrays.copyOf{}}}, which
is pretty good in my opinion.
 * Reading speed wasn't checked as thoroughly in this issue; I created another
one: https://issues.apache.org/jira/browse/IGNITE-22559
Overall, reading speed doesn't depend on the size of individual byte buffers,
because we simply wrap the original array. Other than that, the current
optimizations show a 15%-35% increase in deserialization speed, due to
 ** {{...StreamImplV1}} optimizations
 ** faster {{readInt}} / {{readLong}}
 ** a better binary UUID format
 * Further optimizations for reads are required. Here I mostly focused on
writing speed. Reading speed turned out to be worse than writing speed for
small commands, and I don't like that.
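
A hedged sketch of the general varint technique referenced above (LEB128-style
encoding; this is not the actual Ignite implementation):
{code:java}
// Writes a long in base-128 varint form: 7 payload bits per byte, high bit
// set on every byte except the last. Small values take a single byte.
static int writeVarLong(byte[] out, int pos, long v) {
    while ((v & ~0x7FL) != 0) {
        out[pos++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits + continuation bit
        v >>>= 7;
    }
    out[pos++] = (byte) v; // final byte, continuation bit clear
    return pos; // new write position
}
{code}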



[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22544:
---
Reviewer: Philipp Shergalis




--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (IGNITE-22559) Optimize raft command deserialization

2024-06-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22559:
---
Description: 
# We should benchmark readInt / readLong against protobuf, since it uses the 
same binary format
 # We should create much faster way of creating deserializers for messages. For 
example, we could generate "switch" statements like in Ignite 2. Both for 
creating message deserializer (compile time generation) and for message group 
deserialization factory  (runtime generation, because we don't know the list of 
factories)
 # We should get rid of serializers and deserializers as separate classes and 
move generated code into message implementation. This way we save on 
allocations and we don't create builder, which is also expensive, we should 
write directly into fields of target object like in Ignite 2.
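
A minimal sketch of the switch-based dispatch idea from item 2, with invented
type names and message ids (real generated code would differ):
{code:java}
import java.nio.ByteBuffer;

// Generated per message group (invented names): a switch gives static dispatch
// of deserializer constructors instead of a virtual call through a factory.
final class GroupDeserializer {
    static Object deserialize(short messageType, ByteBuffer in) {
        switch (messageType) {
            case 1: return readUpdateCommand(in);
            case 2: return readUpdateAllCommand(in);
            default: throw new IllegalArgumentException("Unknown type: " + messageType);
        }
    }

    // Hypothetical per-message readers; the real ones would be generated at
    // compile time and write directly into the target object's fields.
    private static Object readUpdateCommand(ByteBuffer in) { return new Object(); }
    private static Object readUpdateAllCommand(ByteBuffer in) { return new Object(); }
}
{code}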




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22559) Optimize raft command deserialization

2024-06-24 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22559:
--

 Summary: Optimize raft command deserialization
 Key: IGNITE-22559
 URL: https://issues.apache.org/jira/browse/IGNITE-22559
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22544:
--

Assignee: Ivan Bessonov




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22544:
---
Description: 
We should benchmark the way we marshal commands using the optimized marshaller
and make it faster. Some obvious places (a sketch of the first one follows
below):
 * byte buffers pool - we can replace the queue with a manual implementation
of a Treiber stack; it's trivial and doesn't use as many CAS/volatile
operations
 * new serializers are allocated every time, but they can be put into static
final constants instead, or cached in fields of the corresponding factories
 * we can create a serialization factory per group, not per message; this way
we remove unnecessary indirection. A group factory can use {{{}switch{}}},
like in Ignite 2, which would basically lead to static dispatch of deserializer
constructors and static access to serializers, instead of dynamic dispatch
(a virtual call), which should be noticeably faster
 * a profiler might show other simple places; we must also compare
{{OptimizedMarshaller}} against other serialization algorithms in benchmarks

EDIT: quick draft attached, it addresses points 1 and 2.
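
A minimal Treiber-stack sketch for the buffer pool idea (assumed names and
buffer size; the attached patch is the authoritative version):
{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

// Lock-free LIFO pool: one CAS per push/pop, no locks and fewer volatile
// operations than a general-purpose concurrent queue. ABA is ignored here
// for brevity, since this is only a sketch.
final class BufferPool {
    private static final class Node {
        final ByteBuffer buf;
        Node next;
        Node(ByteBuffer buf) { this.buf = buf; }
    }

    private final AtomicReference<Node> top = new AtomicReference<>();

    void release(ByteBuffer buf) {
        Node n = new Node(buf);
        do {
            n.next = top.get();
        } while (!top.compareAndSet(n.next, n));
    }

    ByteBuffer acquire() {
        Node n;
        do {
            n = top.get();
            if (n == null) {
                return ByteBuffer.allocate(4096); // pool is empty: allocate
            }
        } while (!top.compareAndSet(n, n.next));
        return n.buf;
    }
}
{code}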




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22544:
---
Attachment: IGNITE-22544.patch




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22542:
---
Description: 
{{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
detect that we send a message to the local node, we handle it immediately in 
the same thread, which could be very bad for the throughput of the system.

"send"/"invoke" themselves appear to be slow as well, we should benchmark them. 
We should remove the instantiation of InetSocketAddress if possible, since 
resolving it takes time. Maybe we should create it unresolved or just cache it 
like in Ignite 2.
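
For the address part, a hedged illustration (the helper below is hypothetical, 
not a real Ignite class): InetSocketAddress.createUnresolved skips the DNS 
lookup that the ordinary constructor performs, and the result can be cached per 
remote node.
{code:java}
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: avoids DNS resolution on the hot send path.
final class AddressCache {
    private final Map<String, InetSocketAddress> cache = new ConcurrentHashMap<>();

    InetSocketAddress addressOf(String host, int port) {
        // createUnresolved() skips the lookup that new InetSocketAddress(host, port) does.
        return cache.computeIfAbsent(host + ':' + port,
                k -> InetSocketAddress.createUnresolved(host, port));
    }
}
{code}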

  was:
{{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
detect that we send a message to the local node, we handle it immediately in 
the same thread, which could be very bad for the throughput of the system.

"send"/"invoke" themselves appear to be slow as well, we should benchmark them.


> Synchronous message handling on local node
> --
>
> Key: IGNITE-22542
> URL: https://issues.apache.org/jira/browse/IGNITE-22542
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
> detect that we send a message to the local node, we handle it immediately in 
> the same thread, which could be very bad for the throughput of the system.
> "send"/"invoke" themselves appear to be slow as well, we should benchmark 
> them. We should remove instantiation of InetSocketAddress for sure, if it's 
> possible, it takes time to resolve it. Maybe we should create it unresolved 
> or just cache it like in Ignite 2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-20 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22544:
--

 Summary: Commands marshalling appears to be slow
 Key: IGNITE-22544
 URL: https://issues.apache.org/jira/browse/IGNITE-22544
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


We should benchmark the way we marshal commands using optimized marshaller and 
make it faster. Some obvious places:
 * byte buffers pool - we can replace queue with a manual implementation of 
Treiber stack, it's trivial and doesn't use as many CAS/volatile operations
 * new serializers are allocated every time, but they can be put into static 
final constants instead, or cached in fields of corresponding factories
 * we can create a serialization factory per group, not per message, this way 
we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
like in Ignite 2, which would basically lead to static dispatch of deserializer 
constructors and static access to serializers, instead of dynamic dispatch 
(virtual call), which should be noticeably faster
 * profiler might show other simple places, we must also compare 
{{OptimizedMarshaller}} against other serialization algorithms in benchmarks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22542:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Synchronous message handling on local node
> --
>
> Key: IGNITE-22542
> URL: https://issues.apache.org/jira/browse/IGNITE-22542
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
> detect that we send a message to the local node, we handle it immediately in 
> the same thread, which could be very bad for the throughput of the system.
> "send"/"invoke" themselves appear to be slow as well, we should benchmark 
> them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22542) Synchronous message handling on local node

2024-06-20 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22542:
--

 Summary: Synchronous message handling on local node
 Key: IGNITE-22542
 URL: https://issues.apache.org/jira/browse/IGNITE-22542
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
detect that we send a message to the local node, we handle it immediately in 
the same thread, which could be very bad for the throughput of the system.

"send"/"invoke" themselves appear to be slow as well, we should benchmark them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22500) Remove unnecessary waits when creating an index

2024-06-19 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22500:
--

Assignee: Ivan Bessonov

> Remove unnecessary waits when creating an index
> ---
>
> Key: IGNITE-22500
> URL: https://issues.apache.org/jira/browse/IGNITE-22500
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When creating an index with current defaults (DelayDuration=1sec, 
> MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 
> seconds on my machine (without concurrent transactions, on an empty table 
> that was just created).
> According to the design, we need to first wait for the REGISTERED state to 
> activate on all nodes, including the ones that are currently down; this is to 
> make sure that all transactions started on schema versions before the index 
> creation have finished before we start to build the index (this makes us 
> wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we 
> switch the index to the AVAILABLE state. This requires another wait of 
> DelayDuration+MaxClockSkew.
> Because of IGNITE-20378, in the second case we actually wait longer (for 
> additional IdleSafeTimePropagationPeriod+MaxClockSkew).
> The total of waits is thus 1.5+3=4.5sec. But index creation actually takes 
> 6-6.5 seconds. It looks like there are some additional delays (like 
> submitting to the Metastorage and executing its watches).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-06-19 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21661:
---
Reviewer: Kirill Tkalenko

> Test scenario where all stable nodes are lost during a partially completed 
> rebalance
> 
>
> Key: IGNITE-21661
> URL: https://issues.apache.org/jira/browse/IGNITE-21661
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following case is possible:
>  * Nodes A, B and C for a partition
>  * B and C go offline
>  * new distribution is A, D and E
>  * EDIT: rebalance can only be started with one more "resetPartitions"
>  * full state transfer from A to D is completed
>  * full state transfer from A to E is not
>  * A goes offline
>  * we perform "resetPartitions"
> Ideally, we should use D as a new leader somehow, but the bare minimum should 
> be a partition that is functional, maybe an empty one. We should test the case
>  
> This might be a good place to add more tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22502) Change default DelayDuration to 500ms

2024-06-19 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22502:
--

Assignee: Ivan Bessonov

> Change default DelayDuration to 500ms
> -
>
> Key: IGNITE-22502
> URL: https://issues.apache.org/jira/browse/IGNITE-22502
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When executing a DDL, we must wait for DelayDuration+MaxClockSkew. 
> DelayDuration for small clusters (which will probably be the usual mode of 
> operation) does not need to be long, so it makes sense to lower the default 
> from 1 second to 0.5 second.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22509) Deadlock during the node stop

2024-06-14 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22509:
--

 Summary: Deadlock during the node stop
 Key: IGNITE-22509
 URL: https://issues.apache.org/jira/browse/IGNITE-22509
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{code:java}
"%itcskvt_n_1%Raft-Group-Client-1@51623" prio=5 tid=0x4a6e nid=NA waiting for 
monitor entry
  java.lang.Thread.State: BLOCKED
     waiting for main@1 to release lock on <0xca23> (a 
org.apache.ignite.internal.app.LifecycleManager)
      at 
org.apache.ignite.internal.app.LifecycleManager.lambda$allComponentsStartFuture$1(LifecycleManager.java:130)
      at 
org.apache.ignite.internal.app.LifecycleManager$$Lambda$2852.843214322.accept(Unknown
 Source:-1)
      at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
      at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
      at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
      at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl$$Lambda$5439.1444714785.run(Unknown
 Source:-1)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
      at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
      at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
      at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.lang.Thread.run(Thread.java:829)
 {code}
Holds busy lock in {{{}RaftGroupServiceImpl.sendWithRetry{}}}.
{code:java}
"main@1" prio=5 tid=0x1 nid=NA sleeping
  java.lang.Thread.State: TIMED_WAITING
     blocks %itcskvt_n_1%Raft-Group-Client-1@51623
      at java.lang.Thread.sleep(Thread.java:-1)
      at 
org.apache.ignite.internal.util.IgniteSpinReadWriteLock.writeLock(IgniteSpinReadWriteLock.java:255)
      at 
org.apache.ignite.internal.util.IgniteSpinBusyLock.block(IgniteSpinBusyLock.java:68)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.shutdown(RaftGroupServiceImpl.java:491)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageServiceContext.close(MetaStorageServiceContext.java:75)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageServiceImpl.close(MetaStorageServiceImpl.java:272)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5148.891107.accept(Unknown
 Source:-1)
      at 
org.apache.ignite.internal.util.IgniteUtils.cancelOrConsume(IgniteUtils.java:967)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.lambda$stopAsync$13(MetaStorageManagerImpl.java:452)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5141.633101377.close(Unknown
 Source:-1)
      at 
org.apache.ignite.internal.util.IgniteUtils.lambda$closeAllManually$1(IgniteUtils.java:611)
      at 
org.apache.ignite.internal.util.IgniteUtils$$Lambda$4822.1427077270.accept(Unknown
 Source:-1)
      at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
      at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
      at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
      at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
      at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
      at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
      at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
      at 
org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:609)
      at 
org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:643)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.stopAsync(MetaStorageManagerImpl.java:449)
      at 
org.apache.ignite.internal.util.IgniteUtils.lambda$stopAsync$6(IgniteUtils.java:1213)
      at 
org.apache.ignite.internal.util.IgniteUtils$$Lambda$5013.753691797.apply(Unknown
 Source:-1)
      at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
      at 

[jira] [Updated] (IGNITE-22443) Sporadic fails of ConfigurationTreeGeneratorTest

2024-06-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22443:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Sporadic fails of ConfigurationTreeGeneratorTest
> 
>
> Key: IGNITE-22443
> URL: https://issues.apache.org/jira/browse/IGNITE-22443
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Configuration changer start doesn't wait for the internal defaults update 
> future; as a result, we have rare data races in certain test methods



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22443) Sporadic fails of ConfigurationTreeGeneratorTest

2024-06-07 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22443:
--

 Summary: Sporadic fails of ConfigurationTreeGeneratorTest
 Key: IGNITE-22443
 URL: https://issues.apache.org/jira/browse/IGNITE-22443
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


Configuration changer start doesn't wait for the internal defaults update 
future; as a result, we have rare data races in certain test methods



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands

2024-05-31 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22386:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Many usages of wrong revision serialization in metastorage commands
> ---
>
> Key: IGNITE-22386
> URL: https://issues.apache.org/jira/browse/IGNITE-22386
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {code:java}
> byte[] revisionBytes = ByteUtils.longToBytes(revision);
> Iif iif = iif( 
> notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)),
> {code}
> Code above has a bug - "longToBytes" is not a suitable serialization format 
> for preserving natural comparison order used in "lt". We must fix it, because 
> it leads to occasional false-positive and false-negative condition evaluation
> It also leads to flaky tests, obviously



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands

2024-05-31 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22386:
--

Assignee: Ivan Bessonov

> Many usages of wrong revision serialization in metastorage commands
> ---
>
> Key: IGNITE-22386
> URL: https://issues.apache.org/jira/browse/IGNITE-22386
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
>  
> {code:java}
> byte[] revisionBytes = ByteUtils.longToBytes(revision);
> Iif iif = iif( 
> notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)),
> {code}
> Code above has a bug - "longToBytes" is not a suitable serialization format 
> for preserving natural comparison order used in "lt". We must fix it, because 
> it leads to occasional false-positive and false-negative condition evaluation
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands

2024-05-31 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22386:
---
Description: 
{code:java}
byte[] revisionBytes = ByteUtils.longToBytes(revision);
Iif iif = iif( 
notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)),
{code}
Code above has a bug - "longToBytes" is not a suitable serialization format for 
preserving natural comparison order used in "lt". We must fix it, because it 
leads to occasional false-positive and false-negative condition evaluation

It also leads to flaky tests, obviously
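
To make the failure mode concrete, here is a self-contained demonstration. It 
assumes plain big-endian encoding as a stand-in for "longToBytes" and a signed 
lexicographic byte comparison such as java.util.Arrays.compare; whether "lt" 
compares exactly like this is an assumption.
{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

public class RevisionOrderDemo {
    // Stand-in for ByteUtils.longToBytes: plain big-endian encoding (assumption).
    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        byte[] a = longToBytes(127); // ends with 0x7F
        byte[] b = longToBytes(128); // ends with 0x80

        // Signed byte-wise comparison misorders the revisions: 0x80 is -128 as
        // a signed byte, so revision 128 compares as "less than" revision 127.
        System.out.println(Arrays.compare(a, b) > 0);          // true, i.e. wrong order

        // Unsigned byte-wise comparison preserves the order of non-negative longs.
        System.out.println(Arrays.compareUnsigned(a, b) < 0);  // true, correct order
    }
}
{code}
Whatever serialization we pick must be order-preserving under the comparison 
that the storage actually performs.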

  was:
 
{code:java}
byte[] revisionBytes = ByteUtils.longToBytes(revision);
Iif iif = iif( 
notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)),
{code}
Code above has a bug - "longToBytes" is not a suitable serialization format for 
preserving natural comparison order used in "lt". We must fix it, because it 
leads to occasional false-positive and false-negative condition evaluation

 


> Many usages of wrong revision serialization in metastorage commands
> ---
>
> Key: IGNITE-22386
> URL: https://issues.apache.org/jira/browse/IGNITE-22386
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> {code:java}
> byte[] revisionBytes = ByteUtils.longToBytes(revision);
> Iif iif = iif( 
> notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)),
> {code}
> Code above has a bug - "longToBytes" is not a suitable serialization format 
> for preserving natural comparison order used in "lt". We must fix it, because 
> it leads to occasional false-positive and false-negative condition evaluation
> It also leads to flaky tests, obviously



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22386) Many usages of wrong revision serialization in metastorage commands

2024-05-31 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22386:
--

 Summary: Many usages of wrong revision serialization in 
metastorage commands
 Key: IGNITE-22386
 URL: https://issues.apache.org/jira/browse/IGNITE-22386
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


 
{code:java}
byte[] revisionBytes = ByteUtils.longToBytes(revision);
Iif iif = iif( 
notExists(partChangeTriggerKey).or(value(partChangeTriggerKey).lt(revisionBytes)),
{code}
Code above has a bug - "longToBytes" is not a suitable serialization format for 
preserving natural comparison order used in "lt". We must fix it, because it 
leads to occasional false-positive and false-negative condition evaluation

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-05-02 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21661:
---
Description: 
Following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * EDIT: rebalance can only be started with one more "resetPartitions"
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test the case

 

This might be a good place to add more tests.

  was:
Following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test the case

 

This might be a good place to add more tests.


> Test scenario where all stable nodes are lost during a partially completed 
> rebalance
> 
>
> Key: IGNITE-21661
> URL: https://issues.apache.org/jira/browse/IGNITE-21661
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Following case is possible:
>  * Nodes A, B and C for a partition
>  * B and C go offline
>  * new distribution is A, D and E
>  * EDIT: rebalance can only be started with one more "resetPartitions"
>  * full state transfer from A to D is completed
>  * full state transfer from A to E is not
>  * A goes offline
>  * we perform "resetPartitions"
> Ideally, we should use D as a new leader somehow, but the bare minimum should 
> be a partition that is functional, maybe an empty one. We should test the case
>  
> This might be a good place to add more tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21661:
--

Assignee: Ivan Bessonov

> Test scenario where all stable nodes are lost during a partially completed 
> rebalance
> 
>
> Key: IGNITE-21661
> URL: https://issues.apache.org/jira/browse/IGNITE-21661
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Following case is possible:
>  * Nodes A, B and C for a partition
>  * B and C go offline
>  * new distribution is A, D and E
>  * full state transfer from A to D is completed
>  * full state transfer from A to E is not
>  * A goes offline
>  * we perform "resetPartitions"
> Ideally, we should use D as a new leader somehow, but the bare minimum should 
> be a partition that is functional, maybe an empty one. We should test the case
>  
> This might be a good place to add more tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21661:
---
Description: 
Following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test the case

 

This might be a good place to add more tests.

  was:
Following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test the case


> Test scenario where all stable nodes are lost during a partially completed 
> rebalance
> 
>
> Key: IGNITE-21661
> URL: https://issues.apache.org/jira/browse/IGNITE-21661
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Following case is possible:
>  * Nodes A, B and C for a partition
>  * B and C go offline
>  * new distribution is A, D and E
>  * full state transfer from A to D is completed
>  * full state transfer from A to E is not
>  * A goes offline
>  * we perform "resetPartitions"
> Ideally, we should use D as a new leader somehow, but the bare minimum should 
> be a partition that is functional, maybe an empty one. We should test the case
>  
> This might be a good place to add more tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22107) Properly encapsulate partition meta

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22107:
---
Description: 
{{PartitionMeta}} and {{PartitionMetaIo}} leak specific implementation details, 
specifically - all fields except for {{{}pageCount{}}}. This breaks 
encapsulation and makes {{page-memory}} module code non-reusable.

I propose splitting meta into 2 parts - abstract meta, that would only hold 
page count, and specific meta that will be located in a different module, close 
to the implementation.

In this case, we would have to pass meta IO as parameters into methods like 
{{{}PartitionMetaManager#readOrCreateMeta{}}}, and create a getter for IO in 
{{AbstractPartitionMeta}} class itself, but that's a necessary sacrifice. Some 
other places will be affected as well, mostly tests.
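
A rough sketch of the proposed shape (class and method names are illustrative, 
not a final API):
{code:java}
// Generic part, lives in the page-memory module: only page count is known here.
abstract class AbstractPartitionMeta {
    private volatile int pageCount;

    public int pageCount() {
        return pageCount;
    }

    public void pageCount(int pageCount) {
        this.pageCount = pageCount;
    }

    // Each concrete meta exposes its own IO, so generic code like
    // PartitionMetaManager#readOrCreateMeta can work without knowing the fields.
    public abstract AbstractPartitionMetaIo metaIo();
}

// Engine-specific part, lives next to the storage engine implementation.
abstract class AbstractPartitionMetaIo {
    public abstract void writeMeta(long pageAddr, AbstractPartitionMeta meta);
}
{code}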

  was:
`PartitionMeta` and `PartitionMetaIo` leak specific implementation details, 
specifically - all fields except for `pageCount`. This breaks encapsulation and 
makes `page-memory` module code non-reusable.

I propose splitting meta into 2 parts - abstract meta, that would only hold 
page count, and specific meta that will be located in a different module, close 
to the implementation.

In this case, we would have to pass meta IO as parameters into methods like 
`PartitionMetaManager#readOrCreateMeta`, and create a getter for IO in 
`AbstractPartitionMeta` class itself, but that's a necessary sacrifice.


> Properly encapsulate partition meta
> ---
>
> Key: IGNITE-22107
> URL: https://issues.apache.org/jira/browse/IGNITE-22107
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> {{PartitionMeta}} and {{PartitionMetaIo}} leak specific implementation 
> details, specifically - all fields except for {{{}pageCount{}}}. This breaks 
> encapsulation and makes {{page-memory}} module code non-reusable.
> I propose splitting meta into 2 parts - abstract meta, that would only hold 
> page count, and specific meta that will be located in a different module, 
> close to the implementation.
> In this case, we would have to pass meta IO as parameters into methods like 
> {{{}PartitionMetaManager#readOrCreateMeta{}}}, and create a getter for IO in 
> {{AbstractPartitionMeta}} class itself, but that's a necessary sacrifice. 
> Some other places will be affected as well, mostly tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22107) Properly encapsulate partition meta

2024-04-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22107:
--

 Summary: Properly encapsulate partition meta
 Key: IGNITE-22107
 URL: https://issues.apache.org/jira/browse/IGNITE-22107
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


`PartitionMeta` and `PartitionMetaIo` leak specific implementation details, 
specifically - all fields except for `pageCount`. This breaks encapsulation and 
makes `page-memory` module code non-reusable.

I propose splitting meta into 2 parts - abstract meta, that would only hold 
page count, and specific meta that will be located in a different module, close 
to the implementation.

In this case, we would have to pass meta IO as parameters into methods like 
`PartitionMetaManager#readOrCreateMeta`, and create a getter for IO in 
`AbstractPartitionMeta` class itself, but that's a necessary sacrifice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21434) Fail user write requests for non-available partitions

2024-04-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21434.

Resolution: Won't Fix

This insert doesn't hang indefinitely anymore; it now fails while awaiting the 
primary replica. I'm closing the issue as "Won't Fix".

> Fail user write requests for non-available partitions
> -
>
> Key: IGNITE-21434
> URL: https://issues.apache.org/jira/browse/IGNITE-21434
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Currently, {{INSERT INTO test VALUES(%d, %d);}} just hangs indefinitely, 
> which is not what you would expect. We should either fail the request 
> immediately if there's no majority, or return a replication timeout 
> exception, for example.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22075) GC doesn't wait for RO transactions

2024-04-19 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22075:
--

 Summary: GC doesn't wait for RO transactions
 Key: IGNITE-22075
 URL: https://issues.apache.org/jira/browse/IGNITE-22075
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


In https://issues.apache.org/jira/browse/IGNITE-21773 we started handling the 
LWM update concurrently in both the TX manager and the GC, which means that GC 
might start collecting garbage before transactions are finished. This doesn't 
even depend on the listener order, because both operations are asynchronous.

We must fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22041) Secondary indexes inline size calculation is wrong

2024-04-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22041:
---
Description: 
* "short" size is used as 16 bytes instead of 2 bytes
 * decimal header is not included in estimation
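
A minimal sketch of the corrected estimation; the column model and the decimal 
header size below are assumptions for illustration:
{code:java}
// Hypothetical inline size estimation; only the two buggy cases are shown.
final class InlineSizes {
    enum ColumnType { SHORT, INT, DECIMAL }

    private static final int DECIMAL_HEADER = 3; // Assumed header size, for illustration only.

    static int inlineSize(ColumnType type, int payloadBytes) {
        switch (type) {
            case SHORT:
                return Short.BYTES; // 2 bytes; the bug counted 16 here.
            case INT:
                return Integer.BYTES;
            case DECIMAL:
                return DECIMAL_HEADER + payloadBytes; // The header must be included in the estimate.
            default:
                throw new IllegalArgumentException(type.toString());
        }
    }
}
{code}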

> Secondary indexes inline size calculation is wrong
> --
>
> Key: IGNITE-22041
> URL: https://issues.apache.org/jira/browse/IGNITE-22041
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * "short" size is used as 16 bytes instead of 2 bytes
>  * decimal header is not included in estimation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22063) aimem partition deletion doesn't delete GC queue

2024-04-17 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22063:
--

 Summary: aimem partition deletion doesn't delete GC queue
 Key: IGNITE-22063
 URL: https://issues.apache.org/jira/browse/IGNITE-22063
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{{org.apache.ignite.internal.storage.pagememory.mv.VolatilePageMemoryMvPartitionStorage#destroyStructures}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22050) Data structures don't clear partId of reused page

2024-04-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22050:
---
Description: 
In the current implementation we use a single reuse list for all partitions in 
the aimem storage engine.

That works fine in Ignite 2, but here in Ignite 3 we implemented a 
"partitionless link" format that eliminates the 2 bytes indicating the 
partition number from the data stored in pages. This means that if the 
allocator provided the structure with a page from partition X, but the 
structure itself represents partition Y, we will lose the "X" in the process 
and will later try to access the page by a pageId that has Y encoded in it. 
This would lead to a pageId mismatch.

We have several options here.
 * ignore mismatched partitions
 * get rid of partitionless pageIds
 * fix the allocator, so that it would change partition Id upon allocation

Ideally, we should go with the 3rd option. It requires some slight changes in 
the internal data structure API, so that we would pass the required partitionId 
directly into the allocator (reuse list). This is a little bit excessive at 
first sight, but seems more appropriate in the long run. Ignite 2 pageIds are 
all messed up inside the structures; we can fix that.
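
A hedged sketch of option 3; the bit layout below is illustrative, not Ignite's 
actual pageId format:
{code:java}
// Illustrative pageId layout: [ partId : 32 bits | page index : 32 bits ].
final class PageIds {
    private static final int PART_SHIFT = 32;
    private static final long IDX_MASK = 0xFFFFFFFFL;

    static long pageId(int partId, long pageIdx) {
        return ((long) partId << PART_SHIFT) | (pageIdx & IDX_MASK);
    }

    static int partId(long id) {
        return (int) (id >>> PART_SHIFT);
    }

    // What the allocator (reuse list) would do when handing a reused page to a
    // structure of another partition: keep the index, re-stamp the partition id.
    static long changePartId(long id, int newPartId) {
        return pageId(newPartId, id & IDX_MASK);
    }
}
{code}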

> Data structures don't clear partId of reused page
> -
>
> Key: IGNITE-22050
> URL: https://issues.apache.org/jira/browse/IGNITE-22050
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the current implementation we use a single reuse list for all partitions 
> in the aimem storage engine.
> That works fine in Ignite 2, but here in Ignite 3 we implemented a 
> "partitionless link" format that eliminates the 2 bytes indicating the 
> partition number from the data stored in pages. This means that if the 
> allocator provided the structure with a page from partition X, but the 
> structure itself represents partition Y, we will lose the "X" in the process 
> and will later try to access the page by a pageId that has Y encoded in it. 
> This would lead to a pageId mismatch.
> We have several options here.
>  * ignore mismatched partitions
>  * get rid of partitionless pageIds
>  * fix the allocator, so that it would change partition Id upon allocation
> Ideally, we should go with the 3rd option. It requires some slight changes in 
> the internal data structure API, so that we would pass the required 
> partitionId directly into the allocator (reuse list). This is a little bit 
> excessive at first sight, but seems more appropriate in the long run. Ignite 
> 2 pageIds are all messed up inside the structures; we can fix that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-22055) Shut destruction executor down before closing volatile regions

2024-04-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-22055.

  Reviewer: Ivan Bessonov
Resolution: Fixed

> Shut destruction executor down before closing volatile regions
> --
>
> Key: IGNITE-22055
> URL: https://issues.apache.org/jira/browse/IGNITE-22055
> Project: Ignite
>  Issue Type: Bug
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22058) Use paranoid leak detection in tests

2024-04-17 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22058:
--

 Summary: Use paranoid leak detection in tests
 Key: IGNITE-22058
 URL: https://issues.apache.org/jira/browse/IGNITE-22058
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


We should set `io.netty.leakDetection.level=paranoid` in integration tests and 
network tests, in order to detect possible leaks
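
For instance, the flag could be applied from test setup code as well as via the 
JVM command line (a sketch, assuming JUnit 5):
{code:java}
import io.netty.util.ResourceLeakDetector;
import org.junit.jupiter.api.BeforeAll;

class LeakDetectionSetup {
    @BeforeAll
    static void enableParanoidLeakDetection() {
        // Same effect as -Dio.netty.leakDetection.level=paranoid, as long as
        // this runs before the first Netty buffer is allocated.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
    }
}
{code}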



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22050) Data structures don't clear partId of reused page

2024-04-16 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22050:
--

 Summary: Data structures don't clear partId of reused page
 Key: IGNITE-22050
 URL: https://issues.apache.org/jira/browse/IGNITE-22050
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22041) Secondary indexes inline size calculation is wrong

2024-04-15 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22041:
--

 Summary: Secondary indexes inline size calculation is wrong
 Key: IGNITE-22041
 URL: https://issues.apache.org/jira/browse/IGNITE-22041
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21999) Merge partition free-lists into one

2024-04-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21999:
--

Assignee: Philipp Shergalis  (was: Ivan Bessonov)

> Merge partition free-lists into one
> ---
>
> Key: IGNITE-21999
> URL: https://issues.apache.org/jira/browse/IGNITE-21999
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
>
> Current implementation has 2 free-lists:
>  * version chains
>  * index tuples
> These lists have separate buckets for different types of data pages. There's 
> an issue with this approach:
>  * overhead on pages - we have to allocate more pages to store buckets
>  * overhead on checkpoints - we have to save twice as many free-lists on 
> every checkpoint
> The reason, to my understanding, is the fact that FreeList class is 
> parameterized with the specific type of data that it stores. It makes no 
> sense to me, to be completely honest, because the algorithm is always the 
> same, and we always use the code from abstract free-list implementation.
> What I propose:
>  * get rid of abstract implementation and only have the concrete 
> implementation of free lists
>  * same for data pages
>  * serialization code will be fully moved to implementations of Storeable
> We're losing some guarantees if we do this change - we can no longer check 
> that type of the page is correct. My response to this issue is that every 
> Storeable could add a 1-byte header to the data, in order to validate it when 
> being read, that should be enough. If we could find a way to store less than 
> 1 byte then that's nice, I didn't look too much into the question.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21999) Merge partition free-lists into one

2024-04-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21999:
--

Assignee: Ivan Bessonov

> Merge partition free-lists into one
> ---
>
> Key: IGNITE-21999
> URL: https://issues.apache.org/jira/browse/IGNITE-21999
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Current implementation has 2 free-lists:
>  * version chains
>  * index tuples
> These lists have separate buckets for different types of data pages. There's 
> an issue with this approach:
>  * overhead on pages - we have to allocate more pages to store buckets
>  * overhead on checkpoints - we have to save twice as many free-lists on 
> every checkpoint
> The reason, to my understanding, is the fact that FreeList class is 
> parameterized with the specific type of data that it stores. It makes no 
> sense to me, to be completely honest, because the algorithm is always the 
> same, and we always use the code from abstract free-list implementation.
> What I propose:
>  * get rid of abstract implementation and only have the concrete 
> implementation of free lists
>  * same for data pages
>  * serialization code will be fully moved to implementations of Storeable
> We're losing some guarantees if we do this change - we can no longer check 
> that type of the page is correct. My response to this issue is that every 
> Storeable could add a 1-byte header to the data, in order to validate it when 
> being read, that should be enough. If we could find a way to store less than 
> 1 byte then that's nice, I didn't look too much into the question.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21999) Merge partition free-lists into one

2024-04-08 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21999:
--

 Summary: Merge partition free-lists into one
 Key: IGNITE-21999
 URL: https://issues.apache.org/jira/browse/IGNITE-21999
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Current implementation has 2 free-lists:
 * version chains
 * index tuples

These lists have separate buckets for different types of data pages. There's an 
issue with this approach:
 * overhead on pages - we have to allocate more pages to store buckets
 * overhead on checkpoints - we have to save twice as many free-lists on every 
checkpoint

The reason, to my understanding, is the fact that FreeList class is 
parameterized with the specific type of data that it stores. It makes no sense 
to me, to be completely honest, because the algorithm is always the same, and 
we always use the code from abstract free-list implementation.

What I propose:
 * get rid of abstract implementation and only have the concrete implementation 
of free lists
 * same for data pages
 * serialization code will be fully moved to implementations of Storeable

We're losing some guarantees if we do this change - we can no longer check that 
type of the page is correct. My response to this issue is that every Storeable 
could add a 1-byte header to the data, in order to validate it when being read, 
that should be enough. If we could find a way to store less than 1 byte then 
that's nice, I didn't look too much into the question.
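
A minimal sketch of the 1-byte validation header idea; the type constants and 
buffer handling are assumptions:
{code:java}
import java.nio.ByteBuffer;

// Each Storeable prepends one type byte to its payload and validates it on read.
final class RowTypeHeader {
    static final byte VERSION_CHAIN = 1;
    static final byte INDEX_TUPLE = 2;

    static void write(ByteBuffer pageBuf, byte type) {
        pageBuf.put(type);
    }

    static void check(ByteBuffer pageBuf, byte expected) {
        byte actual = pageBuf.get();

        if (actual != expected) {
            throw new IllegalStateException(
                    "Corrupted row: expected type " + expected + ", got " + actual);
        }
    }
}
{code}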



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21257) Public Java API to get global partition states

2024-04-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21257:
--

Assignee: Ivan Bessonov

> Public Java API to get global partition states
> --
>
> Key: IGNITE-21257
> URL: https://issues.apache.org/jira/browse/IGNITE-21257
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
> list.
> We should use local partition states, implemented in IGNITE-21256, and 
> combine them in cluster-wide compute call, before returning to the user.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21987:
---
Description: 
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

IMPORTANT: we should throw an exception if somebody scans an index and 
IndexStorage#getNextRowIdToBuild is not null. It should be a new error, like 
"IndexNotBuiltException"

  was:
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

IMPORTANT: we should throw an exception if somebody scans an index and 
IndexStorage#getNextRowIdToBuild is not null.


> Optimize RO scan in sorted indexes
> --
>
> Key: IGNITE-21987
> URL: https://issues.apache.org/jira/browse/IGNITE-21987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> This issue applies to aimem/aipersist primarily. Optimization for rocksdb 
> might be done separately.
>  * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
> simple cursor
>  * in the implementation we should use alternative cursor implementation for 
> RO scans - it should delegate calls to B+Tree cursor
>  * reuse existing tests where possible
>  * call new method where necessary (PartitionReplicaListener#scanSortedIndex)
> IMPORTANT: we should throw an exception if somebody scans an index and 
> IndexStorage#getNextRowIdToBuild is not null. It should be a new error, like 
> "IndexNotBuiltException"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21987:
---
Description: 
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

IMPORTANT: we should throw an exception if somebody scans an index and 
IndexStorage#getNextRowIdToBuild is not null.

  was:
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)


> Optimize RO scan in sorted indexes
> --
>
> Key: IGNITE-21987
> URL: https://issues.apache.org/jira/browse/IGNITE-21987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> This issue applies to aimem/aipersist primarily. Optimization for rocksdb 
> might be done separately.
>  * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
> simple cursor
>  * in the implementation we should use alternative cursor implementation for 
> RO scans - it should delegate calls to B+Tree cursor
>  * reuse existing tests where possible
>  * call new method where necessary (PartitionReplicaListener#scanSortedIndex)
> IMPORTANT: we should throw an exception if somebody scans an index and 
> IndexStorage#getNextRowIdToBuild is not null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21987:
---
Description: 
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
simple cursor
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor
 * reuse existing tests where possible
 * call new method where necessary (PartitionReplicaListener#scanSortedIndex)

  was:
This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new flag RO_SCAN to SortedIndexStorage
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor, and "peek" should throw an 
"UnsupportedOperationException"
 * for "rocksdb" it shouldn't refresh the iterator all the time. "peek" should 
also throw exceptions
 * reuse existing tests
 * pass new RO_SCAN flag into a method where it's necessary


> Optimize RO scan in sorted indexes
> --
>
> Key: IGNITE-21987
> URL: https://issues.apache.org/jira/browse/IGNITE-21987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> This issue applies to aimem/aipersist primarily. Optimization for rocksdb 
> might be done separately.
>  * add new method to SortedIndexStorage, like "readOnlyScan", that returns a 
> simple cursor
>  * in the implementation we should use alternative cursor implementation for 
> RO scans - it should delegate calls to B+Tree cursor
>  * reuse existing tests where possible
>  * call new method where necessary (PartitionReplicaListener#scanSortedIndex)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21987) Optimize RO scan in sorted indexes

2024-04-04 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21987:
--

 Summary: Optimize RO scan in sorted indexes
 Key: IGNITE-21987
 URL: https://issues.apache.org/jira/browse/IGNITE-21987
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


This issue applies to aimem/aipersist primarily. Optimization for rocksdb might 
be done separately.
 * add new flag RO_SCAN to SortedIndexStorage
 * in the implementation we should use alternative cursor implementation for RO 
scans - it should delegate calls to B+Tree cursor, and "peek" should throw an 
"UnsupportedOperationException"
 * for "rocksdb" it shouldn't refresh the iterator all the time. "peek" should 
also throw exceptions
 * reuse existing tests
 * pass new RO_SCAN flag into a method where it's necessary



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21906) Consider disabling inline in PK index by default

2024-04-02 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21906:
--

 Summary: Consider disabling inline in PK index by default
 Key: IGNITE-21906
 URL: https://issues.apache.org/jira/browse/IGNITE-21906
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


In aipersist/aimem we attempt to inline binary tuples into pages for hash 
indexes by default. This, in theory, saves us from the necessity of accessing 
binary tuples from data pages for comparison, which is slower than comparing 
inlined data.

But, assuming a good hash distribution, we would only have to do the real 
comparison for the matched tuple. At the same time, inlined data might be 
substantially larger than hash+link, meaning that B+Tree with inlined data has 
bigger height, which correlates with slower search speed.

So, we have both pros and cons for inlining, and the only real way to reconcile 
them is to compare them with some benchmarks. This is exactly what I propose.

TL;DR: force inline size to be 0 for hash indices and benchmark for put/get 
operations, with large enough amount of data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21902) Add an option to configure log storage path

2024-04-02 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21902:
--

 Summary: Add an option to configure log storage path
 Key: IGNITE-21902
 URL: https://issues.apache.org/jira/browse/IGNITE-21902
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
 Fix For: 3.0.0-beta2


The option to store the log and data on separate devices can substantially 
improve performance in the long run for many users, so we should implement it.

There is such an option in Ignite 2, and people use it all the time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21898) Remove reactive methods from AntiHijackingIgniteSql

2024-04-01 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21898.

  Reviewer: Ivan Bessonov
Resolution: Fixed

> Remove reactive methods from AntiHijackingIgniteSql
> ---
>
> Key: IGNITE-21898
> URL: https://issues.apache.org/jira/browse/IGNITE-21898
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> They were removed from IgniteSql interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-03-04 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21661:
--

 Summary: Test scenario where all stable nodes are lost during a 
partially completed rebalance
 Key: IGNITE-21661
 URL: https://issues.apache.org/jira/browse/IGNITE-21661
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Following case is possible:
 * Nodes A, B and C for a partition
 * B and C go offline
 * new distribution is A, D and E
 * full state transfer from A to D is completed
 * full state transfer from A to E is not
 * A goes offline
 * we perform "resetPartitions"

Ideally, we should use D as a new leader somehow, but the bare minimum should 
be a partition that is functional, maybe an empty one. We should test the case



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21284) Internal API for manual raft group configuration update

2024-02-23 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21284:
---
Description: 
We need an API (with implementation) that's analogous to 
"reset-lost-partitions", but with the ability to reuse a living minority of 
nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as the new raft group configuration (through an update of assignments).

We have to make sure that the node with the latest log index becomes the 
leader, so we will have to propagate the desired minimum log index in 
assignments and use it during voting.
h2. What's implemented

"resetPartitions" operation in distributed zone manager. It identifies 
partitions where only a minority of nodes is online (thus they won't be able to 
execute "changePeersAsync"), and writes a "forced pending assignments" for them.

Forced assignment excludes stable nodes, that are not present in pending 
assignment, from a new raft group configuration. It also performs a 
"resetPeers" operation on alive nodes from the stable assignment.

Complete loss of all nodes from stable assignments is not yet implemented, at 
least one node is required to be elected as a leader.
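
For illustration only, a hypothetical sketch of the set arithmetic described 
above; the names are illustrative, not the real API:
{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not real API: the forced assignment is built from the
// pending assignment, keeping only alive nodes; stable nodes that are absent
// from the pending assignment are thereby excluded from the new raft group
// configuration.
public class ForcedAssignmentSketch {
    static Set<String> forcedAssignment(Set<String> pending, Set<String> alive) {
        Set<String> group = new HashSet<>(pending);
        group.retainAll(alive);
        return group;
    }

    public static void main(String[] args) {
        Set<String> stable = Set.of("A", "B", "C");  // B and C are offline
        Set<String> pending = Set.of("A", "D", "E"); // E is offline
        Set<String> alive = Set.of("A", "D");

        // B and C (stable, not pending) are excluded; prints a set of A and D.
        System.out.println(forcedAssignment(pending, alive));
    }
}
{code}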

  was:
We need an API (with implementation) that's analogous to 
"reset-lost-partitions", but with the ability to reuse living minority of nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as a new raft group configuration (through the update of assignments).

We have to make sure that node with latest log index will become a leader, so 
we will have to propagate desired minimum for log index in assignments and use 
it during the voting.


> Internal API for manual raft group configuration update
> ---
>
> Key: IGNITE-21284
> URL: https://issues.apache.org/jira/browse/IGNITE-21284
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need an API (with implementation) that's analogous to 
> "reset-lost-partitions", but with the ability to reuse living minority of 
> nodes.
> This API should gather the states of partitions, identify healthy peers, and 
> use them as a new raft group configuration (through the update of 
> assignments).
> We have to make sure that node with latest log index will become a leader, so 
> we will have to propagate desired minimum for log index in assignments and 
> use it during the voting.
> h2. What's implemented
> "resetPartitions" operation in distributed zone manager. It identifies 
> partitions where only a minority of nodes is online (thus they won't be able 
> to execute "changePeersAsync"), and writes a "forced pending assignments" for 
> them.
> Forced assignment excludes stable nodes, that are not present in pending 
> assignment, from a new raft group configuration. It also performs a 
> "resetPeers" operation on alive nodes from the stable assignment.
> Complete loss of all nodes from stable assignments is not yet implemented, at 
> least one node is required to be elected as a leader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21588) CMG commands idempotency is broken

2024-02-22 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21588:
---
Description: 
When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
do the following:
 * Read local state with {{{}readLogicalTopology(){}}}.
 * Modify state according to the command.
 * {*}Increase version{*}.
 * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.

The problem lies in reading and writing of the state - it's local, and the 
version value is not replicated.

What happens when we restart the node:
 * It starts without local storage snapshot, with appliedIndex == 0, which is a 
{*}state in the past{*}.
 * We apply commands that were already applied before restart.
 * We apply these commands to locally saved topology snapshot.
 * This logical topology snapshot has a *state in the future* when compared to 
appliedIndex == 0.
 * As a result, when we re-apply some commands, we *increase the version* one 
more time, thus breaking data consistency between nodes.

This would have been fine if we only used this version locally. But 
distribution zones rely on the consistency of the version between all nodes in 
the cluster. This might break DZ data nodes handling if any of the cluster 
nodes restarts.

How to fix:
 * Either drop the storage if there's no storage snapshot; this will restore 
consistency,
 * or never start the CMG group from a snapshot, but rather start it from the 
latest storage data.
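
A minimal sketch of one possible guard, assuming the persisted state keeps the 
applied index next to the version; this is not the actual CMG code:
{code:java}
// Minimal sketch (not CMG code): ignore commands whose log index is not
// beyond the index already reflected in the locally stored state, so replays
// after restart cannot bump the version a second time.
public class VersionedTopology {
    long appliedIndex; // index of the last command reflected in this state
    long version;

    void onCommand(long commandIndex) {
        if (commandIndex <= appliedIndex) {
            return; // replayed command, already applied before restart
        }
        version++; // apply the command
        appliedIndex = commandIndex;
    }

    public static void main(String[] args) {
        VersionedTopology t = new VersionedTopology();
        t.onCommand(1);
        t.onCommand(2);
        t.onCommand(1); // replay after restart, ignored
        t.onCommand(2); // replay after restart, ignored
        System.out.println(t.version); // 2, not 4
    }
}
{code}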

  was:
When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
do the following:
 * Read local state with {{{}readLogicalTopology(){}}}.
 * Modify state according to the command.
 * {*}Increase version{*}.
 * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.

The problem lies in reading and writing of the state - it's local, and version 
value is not replicated.

What happens when we restart the node:
 * It starts with local storage snapshot, which is a {*}state in the past{*}, 
generally speaking.
 * We apply commands that were not applied in the snapshot.
 * We apply these commands to locally saved topology snapshot.
 * This logical topology snapshot has a *state in the future* when compared to 
storage snapshot.
 * As a result, when we re-apply some commands, we *increase the version* one 
more time, thus breaking data consistency between nodes.

This would have been fine if we only used this version locally. But 
distribution zones rely on the consistency of the version between all nodes in 
cluster. This might break DZ data nodes handling if any of the cluster nodes 
restarts.


> CMG commands idempotency is broken
> --
>
> Key: IGNITE-21588
> URL: https://issues.apache.org/jira/browse/IGNITE-21588
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
> do the following:
>  * Read local state with {{{}readLogicalTopology(){}}}.
>  * Modify state according to the command.
>  * {*}Increase version{*}.
>  * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.
> The problem lies in reading and writing of the state - it's local, and version 
> value is not replicated.
> What happens when we restart the node:
>  * It starts without local storage snapshot, with appliedIndex == 0, which is 
> a {*}state in the past{*}.
>  * We apply commands that were already applied before restart.
>  * We apply these commands to locally saved topology snapshot.
>  * This logical topology snapshot has a *state in the future* when compared 
> to appliedIndex == 0.
>  * As a result, when we re-apply some commands, we *increase the version* one 
> more time, thus breaking data consistency between nodes.
> This would have been fine if we only used this version locally. But 
> distribution zones rely on the consistency of the version between all nodes 
> in cluster. This might break DZ data nodes handling if any of the cluster 
> nodes restarts.
> How to fix:
>  * Either drop the storage if there's no storage snapshot, this will restore 
> consistency
>  * or never start CMG group from a snapshot, but rather start it from the 
> latest storage data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21588) CMG commands idempotency is broken

2024-02-22 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21588:
--

 Summary: CMG commands idempotency is broken
 Key: IGNITE-21588
 URL: https://issues.apache.org/jira/browse/IGNITE-21588
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


When handling commands like {{JoinReadyCommand}} and {{NodesLeaveCommand}} we 
do the following:
 * Read local state with {{{}readLogicalTopology(){}}}.
 * Modify state according to the command.
 * {*}Increase version{*}.
 * Write new state with {{{}saveSnapshotToStorage(snapshot){}}}.

The problem lies in reading and writing of the state - it's local, and the 
version value is not replicated.

What happens when we restart the node:
 * It starts with local storage snapshot, which is a {*}state in the past{*}, 
generally speaking.
 * We apply commands that were not applied in the snapshot.
 * We apply these commands to locally saved topology snapshot.
 * This logical topology snapshot has a *state in the future* when compared to 
storage snapshot.
 * As a result, when we re-apply some commands, we *increase the version* one 
more time, thus breaking data consistency between nodes.

This would have been fine if we only used this version locally. But 
distribution zones rely on the consistency of the version between all nodes in 
the cluster. This might break DZ data nodes handling if any of the cluster 
nodes restarts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21548) Encapsulate Set

2024-02-16 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21548:
--

 Summary: Encapsulate Set
 Key: IGNITE-21548
 URL: https://issues.apache.org/jira/browse/IGNITE-21548
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov


Assignments may have some associated metadata, like a "force" flag, for 
example. We should prepare the code for introducing such metadata in the 
future.
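
A hypothetical sketch of what such encapsulation could look like; illustrative 
only, not the real class:
{code:java}
import java.util.Set;

// Hypothetical sketch: a holder type instead of a bare Set, leaving room for
// metadata such as a "force" flag. Names are illustrative, not the real code.
public final class Assignments {
    private final Set<String> nodes;
    private final boolean force; // example of future metadata

    public Assignments(Set<String> nodes, boolean force) {
        this.nodes = Set.copyOf(nodes);
        this.force = force;
    }

    public Set<String> nodes() {
        return nodes;
    }

    public boolean force() {
        return force;
    }
}
{code}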



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-18366) Simplify the configuration asm generator, phase 2

2024-02-16 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-18366:
---
Description: 
After the split, it makes sense to start simplifying every individual 
generator. This is partially a research issue. Exactly what to do is not clear 
yet.

Some context: classes in package 
{{org.apache.ignite.internal.configuration.asm}} are pretty big and 
complicated.  {{InnerNodeAsmGenerator}} is almost 2000 lines long.

How can we make it simpler? Better naming, more comments. Inner node generation 
can be split into multiple files, because it also handles polymorphic 
implementations.

In some cases I would change the generation itself. For example, generated 
methods in polymorphic instances have the same implementation as in the 
original inner node instead of simply delegating the execution to the inner 
node. This affects both the performance and the code of the generators in a 
negative way.
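
An illustrative sketch of the delegation idea, heavily simplified and not the 
actual generated code:
{code:java}
// Simplified illustration, not generated code: a polymorphic view that
// delegates to the inner node instead of carrying a duplicated method body.
interface Node {
    Object traverse(String key);
}

class PolymorphicView implements Node {
    private final Node innerNode;

    PolymorphicView(Node innerNode) {
        this.innerNode = innerNode;
    }

    @Override
    public Object traverse(String key) {
        return innerNode.traverse(key); // delegate instead of duplicating
    }
}
{code}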

  was:After the split, it makes sense to start simplifying every individual 
generator. This is partially a research issue. Exactly what to do is not clear 
yet.


> Simplify the configuration asm generator, phase 2
> -
>
> Key: IGNITE-18366
> URL: https://issues.apache.org/jira/browse/IGNITE-18366
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: iep-55, ignite-3, technical-debt
> Fix For: 3.0.0-beta2
>
>
> After the split, it makes sense to start simplifying every individual 
> generator. This is partially a research issue. Exactly what to do is not 
> clear yet.
> Some context: classes in package 
> {{org.apache.ignite.internal.configuration.asm}} are pretty big and 
> complicated.  {{InnerNodeAsmGenerator}} is almost 2000 lines long.
> How can we make it simpler? Better naming, more comments. Inner node 
> generation can be split into multiple files, because it also handles 
> polymorphic implementations.
> In some cases I would change the generation itself. For example, generated 
> methods in polymorphic instances have the same implementation as in original 
> inner node instead of simply delegating the execution to inner nodes. It 
> affect both performance and the code of the generators in negative way.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21302) Prohibit automatic group reconfiguration when there's no majority

2024-02-14 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21302.

Resolution: Won't Fix

This fix is not required: data loss won't happen, for other reasons.

> Prohibit automatic group reconfiguration when there's no majority
> -
>
> Key: IGNITE-21302
> URL: https://issues.apache.org/jira/browse/IGNITE-21302
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> scaleDown timer should not lead to a situation where user loses the data.
> Default "changePeers" behavior also won't work, because there's no majority 
> and thus no leader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21501) Create index storages for new partitions on rebalance

2024-02-09 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21501:
---
Epic Link: IGNITE-20782

> Create index storages for new partitions on rebalance
> -
>
> Key: IGNITE-21501
> URL: https://issues.apache.org/jira/browse/IGNITE-21501
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> It appears that we only create index storages during the "table creation", 
> not during the "partition creation" if it's performed in isolation.
> Even if we did, 
> {{org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler#waitIndexes}}
>  is still badly designed, because it waits for indexes of the initial 
> partitions distribution and cannot provide any guarantees when assignments 
> are changed.
> This leads to NPEs or bizarre assertions, related to aforementioned method.
> What we need to do is:
>  * Get rid of the faulty index awaiting mechanism.
>  * Create index storages before starting raft group.
>  * [optional] There might be naturally occurring "races" between catalog 
> updates (index creation) and rebalance. Right now they are resolved by the 
> fact that these processes are linearized in watch processing, but that's not 
> the best approach. If we could provide something more robust, that would have 
> been nice. Let's think about it at least.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21501) Create index storages for new partitions on rebalance

2024-02-09 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21501:
--

 Summary: Create index storages for new partitions on rebalance
 Key: IGNITE-21501
 URL: https://issues.apache.org/jira/browse/IGNITE-21501
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


It appears that we only create index storages during the "table creation", not 
during the "partition creation" if it's performed in isolation.

Even if we did, 
{{org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler#waitIndexes}}
 is still badly designed, because it waits for indexes of the initial 
partitions distribution and cannot provide any guarantees when assignments are 
changed.

This leads to NPEs or bizarre assertions related to the aforementioned method.

What we need to do is:
 * Get rid of the faulty index awaiting mechanism.
 * Create index storages before starting the raft group.
 * [optional] There might be naturally occurring "races" between catalog 
updates (index creation) and rebalance. Right now they are resolved by the 
fact that these processes are linearized in watch processing, but that's not 
the best approach. If we could provide something more robust, that would be 
nice. Let's think about it at least.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21488) Disable thread assertions by default

2024-02-07 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21488.

  Reviewer: Ivan Bessonov
Resolution: Fixed

> Disable thread assertions by default
> 
>
> Key: IGNITE-21488
> URL: https://issues.apache.org/jira/browse/IGNITE-21488
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21469:
---
Epic Link: IGNITE-21444

> AssertionError in checkpoint
> 
>
> Key: IGNITE-21469
> URL: https://issues.apache.org/jira/browse/IGNITE-21469
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
>  
> {code:java}
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more Caused by: java.lang.AssertionError: FullPageId 
> [pageId=000100020378, effectivePageId=00020378, groupId=886]  
>  at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more{code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]
>  
> The reason of the assertion is a bug/race in listeners unregistration for 
> partitions freelists. We should do it properly



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21469:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> AssertionError in checkpoint
> 
>
> Key: IGNITE-21469
> URL: https://issues.apache.org/jira/browse/IGNITE-21469
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
>  
> {code:java}
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more Caused by: java.lang.AssertionError: FullPageId 
> [pageId=000100020378, effectivePageId=00020378, groupId=886]  
>  at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more{code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]
>  
> The reason of the assertion is a bug/race in listeners unregistration for 
> partitions freelists. We should do it properly



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21469:
---
Labels: ignite-3  (was: )

> AssertionError in checkpoint
> 
>
> Key: IGNITE-21469
> URL: https://issues.apache.org/jira/browse/IGNITE-21469
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
>  
> {code:java}
>   at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>  ~[?:?]   at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>  ~[?:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more Caused by: java.lang.AssertionError: FullPageId 
> [pageId=000100020378, effectivePageId=00020378, groupId=886]  
>  at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345)
>  ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59)
>  ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  ~[?:?]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  ~[?:?]   ... 1 more{code}
> [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]
>  
> The reason of the assertion is a bug/race in listeners unregistration for 
> partitions freelists. We should do it properly



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21469) AssertionError in checkpoint

2024-02-06 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21469:
--

 Summary: AssertionError in checkpoint
 Key: IGNITE-21469
 URL: https://issues.apache.org/jira/browse/IGNITE-21469
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


 
{code:java}
  at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:870)
 ~[?:?]   at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
 ~[?:?]   at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) 
~[?:?]   at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
 ~[?:?]   at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:63)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]   ... 1 more Caused by: java.lang.AssertionError: FullPageId 
[pageId=000100020378, effectivePageId=00020378, groupId=886]   
at 
org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:758)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:641)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:613)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:280)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:296)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:387)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:332)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.storage.pagememory.mv.RowVersionFreeList.saveMetadata(RowVersionFreeList.java:185)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$13(PersistentPageMemoryMvPartitionStorage.java:345)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:59)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]   ... 1 more{code}
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/7820824?expandBuildDeploymentsSection=false=false=false=true=true+Inspection=true]

 

The reason for the assertion is a bug/race in listener unregistration for 
partition freelists. We should do it properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-21044) Investigate long table creation

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-21044.

Resolution: Done

> Investigate long table creation
> ---
>
> Key: IGNITE-21044
> URL: https://issues.apache.org/jira/browse/IGNITE-21044
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> If we run a test in which we create a lot of tables (more than 200, for 
> example), we soon start seeing a degradation in table creation time.
> In particular, handling of corresponding Catalog update might take literal 
> seconds.
> One of the reasons is described here: 
> https://issues.apache.org/jira/browse/IGNITE-19913
> It explains why table creation might be slow, but it does not explain why it 
> degrades when we create more tables. So there are basically two issues:
>  * watch processing waits for unnecessary operations to complete
>  * those operations are too slow for some reason
> We need to investigate and fix both issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21466) Add metrics for partition states

2024-02-06 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21466:
--

Assignee: Ivan Bessonov

> Add metrics for partition states
> 
>
> Key: IGNITE-21466
> URL: https://issues.apache.org/jira/browse/IGNITE-21466
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21466) Add metrics for partition states

2024-02-06 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21466:
--

 Summary: Add metrics for partition states
 Key: IGNITE-21466
 URL: https://issues.apache.org/jira/browse/IGNITE-21466
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21465) Add system views for partition states

2024-02-06 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21465:
--

 Summary: Add system views for partition states
 Key: IGNITE-21465
 URL: https://issues.apache.org/jira/browse/IGNITE-21465
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations

2024-02-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21446:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Import JVM args from build.gradle for JUnit run configurations
> --
>
> Key: IGNITE-21446
> URL: https://issues.apache.org/jira/browse/IGNITE-21446
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This should help running tests locally with IDEA runner on Java 17



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations

2024-02-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21446:
---
Reviewer: Kirill Tkalenko

> Import JVM args from build.gradle for JUnit run configurations
> --
>
> Key: IGNITE-21446
> URL: https://issues.apache.org/jira/browse/IGNITE-21446
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This should help running tests locally with IDEA runner on Java 17



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21446) Import JVM args from build.gradle for JUnit run configurations

2024-02-05 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21446:
--

 Summary: Import JVM args from build.gradle for JUnit run 
configurations
 Key: IGNITE-21446
 URL: https://issues.apache.org/jira/browse/IGNITE-21446
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2


This should help running tests locally with IDEA runner on Java 17



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21434) Fail user write requests for non-available partitions

2024-02-02 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21434:
--

 Summary: Fail user write requests for non-available partitions
 Key: IGNITE-21434
 URL: https://issues.apache.org/jira/browse/IGNITE-21434
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Currently, {{INSERT INTO test VALUES(%d, %d);}} just hangs indefinitely, which 
is not what you would expect. We should either fail the request immediately if 
there's no majority, or return a replication timeout exception, for example.
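
A minimal sketch of the "replication timeout" option; {{replicate()}} here is 
a stand-in that never completes, simulating a partition without a majority:
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: cap the replication future with a timeout instead of letting
// the caller wait forever. replicate() is a stand-in, not the real API.
public class WriteTimeoutSketch {
    static CompletableFuture<Void> replicate() {
        return new CompletableFuture<>(); // never completes: no majority
    }

    public static void main(String[] args) {
        replicate()
                .orTimeout(1, TimeUnit.SECONDS) // fail instead of hanging
                .whenComplete((res, err) -> {
                    if (err instanceof TimeoutException) {
                        System.out.println("replication timed out, no majority?");
                    }
                })
                .exceptionally(err -> null) // recover so join() doesn't throw
                .join();
    }
}
{code}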



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-20067) Optimize "StorageUpdateHandler#handleUpdateAll"

2024-01-30 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-20067.

Fix Version/s: 3.0.0-beta2
 Reviewer: Ivan Bessonov
   Resolution: Fixed

> Optimize "StorageUpdateHandler#handleUpdateAll"
> ---
>
> Key: IGNITE-20067
> URL: https://issues.apache.org/jira/browse/IGNITE-20067
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In current implementation, the size of a single batch inside of the 
> "runConsistently" is unpredictable, because the collection of rows is 
> received from the message.
> Generally speaking, it's a good idea to make the scope of single 
> "runConsistently" smaller - it would lead to faster work in all storage 
> engines:
>  * for rocksdb, write batches would become smaller;
>  * for page memory, spikes on checkpoint would become smaller.
> There are two criteria that we could use:
>  * number of rows stored;
>  * cumulative number of inserted bytes.
> Raft does the same approximation when batching log records, for example. This 
> should not affect the data consistency, because updateAll itself is 
> idempotent by its nature
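
For illustration, a minimal self-contained sketch of the two batching criteria 
mentioned above; the thresholds and the byte[] row type are illustrative, not 
the actual storage API:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch of the two batching criteria: bound each batch by row count and by
// cumulative bytes, so every runConsistently scope stays small.
public class UpdateAllBatching {
    static List<List<byte[]>> split(List<byte[]> rows, int maxRows, long maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;

        for (byte[] row : rows) {
            boolean full = current.size() >= maxRows || currentBytes + row.length > maxBytes;
            if (!current.isEmpty() && full) {
                batches.add(current); // flush the batch before it grows too big
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += row.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> rows = List.of(new byte[100], new byte[200], new byte[900], new byte[50]);
        System.out.println(split(rows, 2, 1024).size()); // 2 batches
    }
}
{code}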



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21359) There are 2 RebalanceUtil classes

2024-01-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21359:
--

 Summary: There are 2 RebalanceUtil classes
 Key: IGNITE-21359
 URL: https://issues.apache.org/jira/browse/IGNITE-21359
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2


and they duplicate constants and methods. The least we could do is remove the 
code duplication and maybe rename one of these classes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21347) Fix license header extra whitespaces in ErrorCodeGroup annotation processor

2024-01-25 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21347:
---
Labels: ignite-3  (was: )

> Fix license header extra whitespaces in ErrorCodeGroup annotation processor 
> 
>
> Key: IGNITE-21347
> URL: https://issues.apache.org/jira/browse/IGNITE-21347
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitrii Zabotlin
>Assignee: Dmitrii Zabotlin
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are extra whitespaces in the license headers in the generated error 
> codes files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21284) Internal API for manual raft group configuration update

2024-01-22 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21284:
--

Assignee: Ivan Bessonov

> Internal API for manual raft group configuration update
> ---
>
> Key: IGNITE-21284
> URL: https://issues.apache.org/jira/browse/IGNITE-21284
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> We need an API (with implementation) that's analogous to 
> "reset-lost-partitions", but with the ability to reuse living minority of 
> nodes.
> This API should gather the states of partitions, identify healthy peers, and 
> use them as a new raft group configuration (through the update of 
> assignments).
> We have to make sure that node with latest log index will become a leader, so 
> we will have to propagate desired minimum for log index in assignments and 
> use it during the voting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21309) DirectMessageWriter keeps holding used buffers

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21309:
---
Reviewer: Kirill Tkalenko

> DirectMessageWriter keeps holding used buffers
> --
>
> Key: IGNITE-21309
> URL: https://issues.apache.org/jira/browse/IGNITE-21309
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Thread-local optimized marshallers store links to write buffers in their 
> internal stacks, which could lead to occasional OOMs. We should release 
> buffers after writing nested messages in DirectMessageWriter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21309) DirectMessageWriter keeps holding used buffers

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21309:
--

 Summary: DirectMessageWriter keeps holding used buffers
 Key: IGNITE-21309
 URL: https://issues.apache.org/jira/browse/IGNITE-21309
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0.0-beta2


Thread-local optimized marshallers store references to write buffers in their 
internal stacks, which could lead to occasional OOMs. We should release 
buffers after writing nested messages in DirectMessageWriter.
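
A minimal sketch of the intended buffer lifecycle, with illustrative names 
rather than the real writer internals:
{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative lifecycle, not the real writer internals: the buffer for a
// nested message is dropped from the stack once the message is written, so a
// thread-local writer does not pin large buffers forever.
public class NestedBufferStack {
    private final Deque<ByteBuffer> stack = new ArrayDeque<>();

    ByteBuffer beginNested(int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        stack.push(buf);
        return buf;
    }

    void endNested() {
        stack.pop(); // release the reference instead of caching it
    }
}
{code}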



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21305) Internal API for truncating log suffix

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21305:
--

 Summary: Internal API for truncating log suffix
 Key: IGNITE-21305
 URL: https://issues.apache.org/jira/browse/IGNITE-21305
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


An API and implementation are needed to truncate the log suffix of peers in 
the ERROR state that cannot proceed with applying commands.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21305) Internal API for truncating log suffix

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21305:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Internal API for truncating log suffix
> --
>
> Key: IGNITE-21305
> URL: https://issues.apache.org/jira/browse/IGNITE-21305
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>
> API and implementation is needed to truncate suffix of peers in ERROR state 
> that cannot proceed applying commands



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21256) Internal API for local partition states

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21256:
---
Description: 
Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
list. We need an API (with implementation) to access the list of local 
partitions and their states. The way to determine them:
 * compare current assignments with replica states
 * check the state machine; it might be broken or installing a snapshot

  was:
Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
list. We need an API to access the list of local partitions and their states. 
The way to determine them:
 * comparing current assignments with replica states
 * check the state machine, it might be broken or installing snapshot


> Internal API for local partition states
> ---
>
> Key: IGNITE-21256
> URL: https://issues.apache.org/jira/browse/IGNITE-21256
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Please refer to https://issues.apache.org/jira/browse/IGNITE-21140 for the 
> list. We need an API (with implementation) to access the list of local 
> partitions and their states. The way to determine them:
>  * comparing current assignments with replica states
>  * check the state machine, it might be broken or installing snapshot



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21284) Internal API for manual raft group configuration update

2024-01-18 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21284:
---
Description: 
We need an API (with implementation) that's analogous to 
"reset-lost-partitions", but with the ability to reuse a living minority of 
nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as the new raft group configuration (through an update of assignments).

We have to make sure that the node with the latest log index becomes the 
leader, so we will have to propagate the desired minimum log index in 
assignments and use it during voting.

  was:
We need an API that's analogous to "reset-lost-partitions", but with the 
ability to reuse living minority of nodes.

This API should gather the states of partitions, identify healthy peers, and 
use them as a new raft group configuration (through the update of assignments).

We have to make sure that node with latest log index will become a leader, so 
we will have to propagate desired minimum for log index in assignments and use 
it during the voting.


> Internal API for manual raft group configuration update
> ---
>
> Key: IGNITE-21284
> URL: https://issues.apache.org/jira/browse/IGNITE-21284
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> We need an API (with implementation) that's analogous to 
> "reset-lost-partitions", but with the ability to reuse living minority of 
> nodes.
> This API should gather the states of partitions, identify healthy peers, and 
> use them as a new raft group configuration (through the update of 
> assignments).
> We have to make sure that node with latest log index will become a leader, so 
> we will have to propagate desired minimum for log index in assignments and 
> use it during the voting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21304) Internal API for restarting partitions

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21304:
--

 Summary: Internal API for restarting partitions
 Key: IGNITE-21304
 URL: https://issues.apache.org/jira/browse/IGNITE-21304
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


API and implementation should be provided for restarting peers in raft groups.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21303) Exclude nodes in "error" state from manual group reconfiguration

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21303:
--

 Summary: Exclude nodes in "error" state from manual group 
reconfiguration
 Key: IGNITE-21303
 URL: https://issues.apache.org/jira/browse/IGNITE-21303
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Instead of simply using the existing set of nodes as a baseline for new 
assignments, we should either exclude peers in the ERROR state from it, or 
force data cleanup on such nodes. A third option is to forbid such 
reconfiguration, forcing the user to clear ERROR peers in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21302) Prohibit automatic group reconfiguration when there's no majority

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21302:
--

 Summary: Prohibit automatic group reconfiguration when there's no 
majority
 Key: IGNITE-21302
 URL: https://issues.apache.org/jira/browse/IGNITE-21302
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


The scaleDown timer should not lead to a situation where the user loses data.

Default "changePeers" behavior also won't work, because there's no majority 
and thus no leader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21301) Sync raft log before flush in all storage engines

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21301:
--

 Summary: Sync raft log before flush in all storage engines
 Key: IGNITE-21301
 URL: https://issues.apache.org/jira/browse/IGNITE-21301
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Checkpoints and RocksDB's flush actions should sync the log before completing 
writing data to disk, if "fsync" is disabled.
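
A minimal sketch of the required ordering, with stubbed components; the helper 
names are assumptions, not real Ignite APIs:
{code:java}
// Minimal sketch of the ordering, with stubbed components: when fsync is
// disabled, sync the raft log up to the last applied index before the storage
// flush completes, so on-disk data never gets ahead of the on-disk log.
public class FlushOrderingSketch {
    interface RaftLog {
        void syncUpTo(long index);
    }

    interface Storage {
        void flush();
    }

    static void checkpoint(RaftLog log, Storage storage, long lastAppliedIndex) {
        log.syncUpTo(lastAppliedIndex); // 1. make the log durable first
        storage.flush();                // 2. then persist the data
    }

    public static void main(String[] args) {
        checkpoint(i -> System.out.println("log synced to " + i),
                () -> System.out.println("storage flushed"),
                42L);
    }
}
{code}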



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21300) Implement disaster recovery for secondary indexes

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21300:
--

 Summary: Implement disaster recovery for secondary indexes
 Key: IGNITE-21300
 URL: https://issues.apache.org/jira/browse/IGNITE-21300
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


It is possible that, if we lose part of the log, some available indexes might 
become "locally" unavailable. We will have to finish the build process a 
second time in such a case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21299) Rest API for disaster recovery commands

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21299:
--

 Summary: Rest API for disaster recovery commands
 Key: IGNITE-21299
 URL: https://issues.apache.org/jira/browse/IGNITE-21299
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Please refer to https://issues.apache.org/jira/browse/IGNITE-21298 for a list



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21298) CLI for disaster recovery commands

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21298:
--

 Summary: CLI for disaster recovery commands
 Key: IGNITE-21298
 URL: https://issues.apache.org/jira/browse/IGNITE-21298
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Names might change.
 * ignite restart-partitions --nodes  [--zones ] [--partitions ] [--purge]
 * ignite reset-lost-partitions [--zones ] [--partitions ]
 * ignite truncate-log-suffix --zone  --partition  --index 
 * ignite partition-states [--local [--nodes ] | --global] [--zones ] [--partitions ]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-21295) Public Java API for manual raft group configuration update

2024-01-18 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-21295:
--

 Summary: Public Java API for manual raft group configuration update
 Key: IGNITE-21295
 URL: https://issues.apache.org/jira/browse/IGNITE-21295
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Implement public API for IGNITE-21284



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


  1   2   3   4   5   6   7   8   9   10   >