[jira] [Commented] (IGNITE-23413) Catalog compaction. Component to determine minimum catalog version required by rebalance.

2024-11-05 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895644#comment-17895644
 ] 

Ivan Bessonov commented on IGNITE-23413:


What I think needs to be done:
 * We should fix (pin) the current local meta-storage revision and use it for 
further reads.
 * We should read all pending and planned assignments; each of them stores a 
timestamp, and we calculate the minimal one.
 * If there are no pending or planned assignments, we might return the 
timestamp that's associated with the meta-storage revision.

This approach almost works, but there are nuanced situations: the cases where 
we're in between operations. Let's examine them:
 * A zone is created/altered, but assignments are not yet saved. This is the 
case when the assignments timestamp is below the latest zone's timestamp.
 ** For ALTER, returning an older timestamp from assignments is not a problem; 
it'll eventually become more recent.
 ** For CREATE, we should probably determine that assignments are not yet 
saved, and use the timestamp from the catalog.
 * The list of data nodes is updated, but assignments are not yet re-calculated 
because of a timeout.
 ** Current code uses the "latest" catalog state when it transforms data nodes 
into assignments, so it is safe to use the timestamp of the latest catalog 
version. There's a Jira that aims to fix that: 
https://issues.apache.org/jira/browse/IGNITE-22723. This means that the current 
approach might not work in the future.

Anyway, considering everything above, there's one situation that we must keep 
in mind:
 * DZ is updated at catalog version 15.
 * Assignments are calculated for the same exact catalog version.
 * Nothing changes for a long time. Several days, for example.
 * During that time, catalog version increases and becomes 75, for example.

If nothing changes, we should be able to remove versions 15-74, because the DZ 
settings from versions 15 and 75 are identical.
It seems like the proposed algorithm works exactly as we need; a minimal sketch 
is given below.
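For illustration, here is a rough sketch of that computation against the 
interface from the description below; all helper names are hypothetical 
placeholders, not real Ignite APIs:
{code:java}
import java.util.List;

// Sketch only: the helpers below are hypothetical placeholders, not Ignite internals.
class RebalanceMinimumRequiredTimeSketch {
    long minimumRequiredTime() {
        // Pin the current local meta-storage revision and use it for all reads below.
        long revision = fixLocalMetaStorageRevision();

        List<Long> assignmentTimestamps = readPendingAndPlannedAssignmentTimestamps(revision);

        // Minimum over pending/planned assignments, or the timestamp associated
        // with the pinned meta-storage revision if there are none.
        return assignmentTimestamps.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(metaStorageRevisionTimestamp(revision));
    }

    long fixLocalMetaStorageRevision() { return 0L; } // placeholder
    List<Long> readPendingAndPlannedAssignmentTimestamps(long revision) { return List.of(); } // placeholder
    long metaStorageRevisionTimestamp(long revision) { return 0L; } // placeholder
}
{code}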

> Catalog compaction. Component to determine minimum catalog version required 
> by rebalance.
> -
>
> Key: IGNITE-23413
> URL: https://issues.apache.org/jira/browse/IGNITE-23413
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Pereslegin
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Each rebalance procedure uses a specific catalog version; it "holds" the 
> timestamp corresponding to the latest (at the moment of the rebalance start) 
> version of the catalog.
> To be able to safely perform catalog compaction, we need to design and 
> implement a component that can determine the minimum version required for 
> active rebalances (to avoid deleting this version during compaction).
> {code:java}
> interface RebalanceMinimumRequiredTimeProvider {
> /**
>  * Returns the minimum time required for rebalance,
>  * or current timestamp if there are no active 
>  * rebalances and there is a guarantee that all rebalances
>  * launched in the future will use catalog version 
>  * corresponding to the current time or greater.
>  */
> long minimumRequiredTime();
> }
> {code}
> The component can be either global or local (whichever is easier to 
> implement). This means that the compaction procedure can call the component 
> on all nodes in the cluster and calculate the minimum.
> The component must be able to track rebalances that may be triggered during 
> "replay" of the metastorage raft log.
> The component should return only monotonically increasing values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23582) Xmx and GC options are ignored on startup

2024-11-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23582:
---
Labels: ignite-3  (was: )

> Xmx and GC options are ignored on startup
> -
>
> Key: IGNITE-23582
> URL: https://issues.apache.org/jira/browse/IGNITE-23582
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Zlenko
>Assignee: Ivan Zlenko
>Priority: Critical
>  Labels: ignite-3
> Fix For: 3.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
Even though we can set values for Xmx and GC options in vars.env, they are not 
applied to the service.
We need to fix this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23588) B+Tree corruption during concurrent removes

2024-11-01 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23588:
---
Description: 
{{ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true}} sometimes fails 
on TC.

Can be reproduced locally. To make it faster, the data region can be reduced to 
32Mb, the number of threads to 16, and the number of keys to about 8000 (I 
used {{{}threads*50{}}}).

It's not clear precisely how it happens, but during a remove the 
{{needReplaceInner}} logic sometimes does not work as it should, leading to an 
inner node that holds an obsolete key.

Must be fixed in both Ignite 2 and Ignite 3.
{code:java}
[org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 
683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 683
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2918)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2844)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2787)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.compare(BplusTree.java:5748)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.findInsertionPoint(BplusTree.java:5652)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run0(BplusTree.java:398)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6422)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run(BplusTree.java:370)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6398)
at 
app//org.apache.ignite.internal.pagememory.util.PageHandler.readPage(PageHandler.java:157)
at 
app//org.apache.ignite.internal.pagememory.datastructure.DataStructure.read(DataStructure.java:391)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.read(BplusTree.java:6639)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2300)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.doRemove(BplusTree.java:2238)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.remove(BplusTree.java:2067)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest.lambda$doTestMassiveRemove$0(AbstractBplusTreePageMemoryTest.java:894)
at 
app//org.apache.ignite.internal.testframework.IgniteTestUtils.lambda$runMultiThreaded$2(IgniteTestUtils.java:569)
at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) {code}

  was:
ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true fails on TC 
sometimes.

Can be reproduced locally. In order to make it faster, data region can be 
reduced to 32Mb, number of threads to 16 and number of keys to about 8000 (I 
used {{{}threads*50{}}}).

It's not clear how precisely it happens, but during a remove the 
{{needReplaceInner}} logic does not work as it should sometimes, leading to an 
inner node that holds an obsolete key.

Must be fixed in both Ignite 2 and Ignite 3.
{code:java}
[org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 
683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 683
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupR

[jira] [Updated] (IGNITE-23588) B+Tree corruption during concurrent removes

2024-11-01 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23588:
---
Description: 
ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true sometimes fails on 
TC.

Can be reproduced locally. To make it faster, the data region can be reduced to 
32Mb, the number of threads to 16, and the number of keys to about 8000 (I 
used {{{}threads*50{}}}).

It's not clear precisely how it happens, but during a remove the 
{{needReplaceInner}} logic sometimes does not work as it should, leading to an 
inner node that holds an obsolete key.

Must be fixed in both Ignite 2 and Ignite 3.
{code:java}
[org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 
683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 683
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2918)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2844)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2787)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.compare(BplusTree.java:5748)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.findInsertionPoint(BplusTree.java:5652)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run0(BplusTree.java:398)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6422)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run(BplusTree.java:370)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6398)
at 
app//org.apache.ignite.internal.pagememory.util.PageHandler.readPage(PageHandler.java:157)
at 
app//org.apache.ignite.internal.pagememory.datastructure.DataStructure.read(DataStructure.java:391)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.read(BplusTree.java:6639)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2300)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.doRemove(BplusTree.java:2238)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.remove(BplusTree.java:2067)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest.lambda$doTestMassiveRemove$0(AbstractBplusTreePageMemoryTest.java:894)
at 
app//org.apache.ignite.internal.testframework.IgniteTestUtils.lambda$runMultiThreaded$2(IgniteTestUtils.java:569)
at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) {code}

  was:
ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true fails on TC 
sometimes.

Can be reproduced locally. In order to make it faster, data region can be 
reduced to 32Mb, number of threads to 16 and number of keys to about 8000 (I 
used {{{}threads*50{}}}).

It's not clear how precisely it happens, but during a remove the 
{{needReplaceInner}} logic does not work as it should sometimes, leading to an 
inner node that holds an obsolete key.
Must be fixed in both Ignite 2 and Ignite 3.
{code:java}
[org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 
683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 683
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(Ab

[jira] [Created] (IGNITE-23588) B+Tree corruption during concurrent removes

2024-11-01 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23588:
--

 Summary: B+Tree corruption during concurrent removes
 Key: IGNITE-23588
 URL: https://issues.apache.org/jira/browse/IGNITE-23588
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true sometimes fails on 
TC.

Can be reproduced locally. To make it faster, the data region can be reduced to 
32Mb, the number of threads to 16, and the number of keys to about 8000 (I 
used {{{}threads*50{}}}).

It's not clear precisely how it happens, but during a remove the 
{{needReplaceInner}} logic sometimes does not work as it should, leading to an 
inner node that holds an obsolete key.
Must be fixed in both Ignite 2 and Ignite 3.
{code:java}
[org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 
683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()]
 org.opentest4j.AssertionFailedError: Removed row: 683
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2918)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2844)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2787)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.compare(BplusTree.java:5748)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.findInsertionPoint(BplusTree.java:5652)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run0(BplusTree.java:398)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6422)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run(BplusTree.java:370)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6398)
at 
app//org.apache.ignite.internal.pagememory.util.PageHandler.readPage(PageHandler.java:157)
at 
app//org.apache.ignite.internal.pagememory.datastructure.DataStructure.read(DataStructure.java:391)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.read(BplusTree.java:6639)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2300)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.doRemove(BplusTree.java:2238)
at 
app//org.apache.ignite.internal.pagememory.tree.BplusTree.remove(BplusTree.java:2067)
at 
app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest.lambda$doTestMassiveRemove$0(AbstractBplusTreePageMemoryTest.java:894)
at 
app//org.apache.ignite.internal.testframework.IgniteTestUtils.lambda$runMultiThreaded$2(IgniteTestUtils.java:569)
at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23549) MetaStorageListener doesn't flush on snapshot creation

2024-10-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23549:
--

 Summary: MetaStorageListener doesn't flush on snapshot creation
 Key: IGNITE-23549
 URL: https://issues.apache.org/jira/browse/IGNITE-23549
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


Just like for partitions, we need to force the flushing of metastorage data to 
the storage, so that the log won't be truncated before the data is persisted.
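Roughly, the intent looks like this (a sketch with hypothetical interfaces, not 
the actual MetaStorageListener code): the snapshot callback must wait for the 
storage flush to complete before reporting success, so the log is never 
truncated ahead of the persisted data.
{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Sketch only: hypothetical shapes, not the real MetaStorageListener / storage interfaces.
class MetaStorageSnapshotFlushSketch {
    interface FlushableStorage {
        CompletableFuture<Void> flush(); // persists the current in-memory state to disk
    }

    /** Called when raft takes a snapshot; must not complete before the data is persisted. */
    static void onSnapshotSave(FlushableStorage storage, Consumer<Throwable> doneClosure) {
        storage.flush().whenComplete((ignored, err) -> doneClosure.accept(err));
    }
}
{code}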



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23547) Limit meta-storage log storage size

2024-10-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23547:
--

 Summary: Limit meta-storage log storage size
 Key: IGNITE-23547
 URL: https://issues.apache.org/jira/browse/IGNITE-23547
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Technically, this issue affects all raft logs, but let's start with the 
meta-storage. Given the constant background updates of meta-storage data, it 
keeps growing. The resulting log size can easily exceed several gigabytes after 
the cluster has worked for a few hours, which is too much for service data.

We should make the raft snapshot frequency configurable and provide a higher 
default value (more frequent snapshots). Also, it would be nice to come up with 
size limits, as we do for the WAL in Ignite 2.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23550) Test and optimize metastorage snapshot transfer and recovery speed for new nodes

2024-10-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23550:
--

 Summary: Test and optimize metastorage snapshot transfer and 
recovery speed for new nodes
 Key: IGNITE-23550
 URL: https://issues.apache.org/jira/browse/IGNITE-23550
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Test and optimize metastorage snapshot transfer and recovery speed for new 
nodes.

Let's assume that we have a 100Mb+ meta-storage snapshot and 100k+ entries in 
the raft log that have to be replicated as log entries.

How long would it take for a new node to join the cluster under these 
conditions? Will something break? What can we do to make it work?

The goal: the joining process should work for long-running clusters. It should 
also be reasonably fast, less than 10 seconds for sure, depending, of course, 
on the network capabilities. No timeout errors should occur even if it does 
take more than 10 seconds.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23548) Investigate the project for missing logs of unexpected "Throwable"

2024-10-25 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23548:
--

 Summary: Investigate the project for missing logs of unexpected 
"Throwable"
 Key: IGNITE-23548
 URL: https://issues.apache.org/jira/browse/IGNITE-23548
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


Primary candidates:
 * {{{}netty{}}}: there's no default handler
 * some thread pools might not have default handlers either
 ** this includes disruptors

A minimal illustration of such a safety net is sketched below.
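The sketch uses only standard JDK APIs; the wrapping helper and the log output 
are illustrative, not the actual Ignite failure handler:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class UncaughtThrowableLoggingSketch {
    public static void main(String[] args) {
        // Catches Throwables that escape threads which have no handler of their own.
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) ->
                System.err.println("Unexpected throwable in " + thread.getName() + ": " + throwable));

        // Thread pools may hide exceptions (e.g. submit() stores them in the Future),
        // so tasks can additionally be wrapped to log anything unexpected.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(logUnexpected(() -> { throw new AssertionError("boom"); }));
        pool.shutdown();
    }

    static Runnable logUnexpected(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                System.err.println("Unexpected throwable: " + t);
                throw t;
            }
        };
    }
}
{code}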



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-23413) Catalog compaction. Component to determine minimum catalog version required by rebalance.

2024-10-22 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-23413:
--

Assignee: Ivan Bessonov

> Catalog compaction. Component to determine minimum catalog version required 
> by rebalance.
> -
>
> Key: IGNITE-23413
> URL: https://issues.apache.org/jira/browse/IGNITE-23413
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Pereslegin
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Each rebalance procedure uses a specific catalog version; it "holds" the 
> timestamp corresponding to the latest (at the moment of the rebalance start) 
> version of the catalog.
> To be able to safely perform catalog compaction, we need to design and 
> implement a component that can determine the minimum version required for 
> active rebalances (to avoid deleting this version during compaction).
> {code:java}
> interface RebalanceMinimumRequiredTimeProvider {
> /**
>  * Returns the minimum time required for rebalance,
>  * or current timestamp if there are no active 
>  * rebalances and there is a guarantee that all rebalances
>  * launched in the future will use catalog version 
>  * corresponding to the current time or greater.
>  */
> long minimumRequiredTime();
> }
> {code}
> The component can be either global or local (whichever is easier to 
> implement). This means that the compaction procedure can call the component 
> on all nodes in the cluster and calculate the minimum.
> The component must be able to track rebalances that may be triggered during 
> "replay" of the metastorage raft log.
> The component should return only monotonically increasing values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-23128) NullPointerException if non-existent profile passed into zone and table

2024-10-21 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-23128:
--

Assignee: Roman Puchkovskiy

> NullPointerException if non-existent profile passed into zone and table 
> 
>
> Key: IGNITE-23128
> URL: https://issues.apache.org/jira/browse/IGNITE-23128
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Zlenko
>Assignee: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
>
> If we try to create a table with a zone containing a profile that is not 
> described in the node configuration, we will get a NullPointerException when 
> trying to create such a table.
> To reproduce, you need to do the following:
> 1. Execute the command
> {code:sql}
> create zone test with storage_profiles='IAmAPhantomProfile'
> {code}
> where IAmAPhantomProfile is not described in the storage.profiles section of 
> the node configuration.
> 2. Execute command
> {code:sql}
> create table test (I int) with storage_profile='IAmAPhantomProfile'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-23318) Unable to restart cluster multiple times.

2024-10-14 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889220#comment-17889220
 ] 

Ivan Bessonov commented on IGNITE-23318:


What's done here:
 * Recover the meta storage from persisted data, instead of recovering it from 
a snapshot. This way we only recover a limited amount of data; usually it would 
fit into a single mem-table, maybe several mem-tables at worst.
To do that, I introduced index/term/configuration values into the storage, just 
like it's done in partitions.
As of right now, the "raft snapshots" code for the meta storage is not 
affected; let's do that separately.
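Roughly, the recovery decision now looks like this (a sketch with hypothetical 
names, not the actual meta-storage code): the storage persists its last applied 
index/term/configuration, and on restart only the tail of the raft log after 
that index is re-applied.
{code:java}
// Sketch only: hypothetical storage/log shapes, just to illustrate the recovery flow.
class MetaStorageRecoverySketch {
    interface PersistedState {
        long lastAppliedIndex(); // persisted together with the term and the raft configuration
    }

    interface RaftLog {
        long lastIndex();

        void replay(long fromIndexExclusive, long toIndexInclusive);
    }

    static void recover(PersistedState storage, RaftLog log) {
        // Only the entries written after the last flushed state are re-applied;
        // that tail usually fits into a single mem-table.
        log.replay(storage.lastAppliedIndex(), log.lastIndex());
    }
}
{code}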

> Unable to restart cluster multiple times.
> -
>
> Key: IGNITE-23318
> URL: https://issues.apache.org/jira/browse/IGNITE-23318
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 3.0
>Reporter: Iurii Gerzhedovich
>Assignee: Ivan Bessonov
>Priority: Blocker
>  Labels: ignite-3
> Attachments: ignite.log
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We have TPCH benchmarks in Ignite 3 
> (org.apache.ignite.internal.benchmark.TpchBenchmark). The benchmark can 
> prepare data in a cluster and run benchmarks a few times on the same data set 
> without data reload. While running the benchmarks, I observed the following issues:
> 1. After a few such runs, there are situations when the nodes cannot assemble 
> into a cluster. 100% reproducible, but with a different number of restarts.
> 2. With every restart, the startup logs become longer and longer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23318) Unable to restart cluster multiple times.

2024-10-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23318:
---
Reviewer: Aleksandr Polovtsev

> Unable to restart cluster multiple times.
> -
>
> Key: IGNITE-23318
> URL: https://issues.apache.org/jira/browse/IGNITE-23318
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 3.0
>Reporter: Iurii Gerzhedovich
>Assignee: Ivan Bessonov
>Priority: Blocker
>  Labels: ignite-3
> Attachments: ignite.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have TPCH benchmarks in Ignite 3 
> (org.apache.ignite.internal.benchmark.TpchBenchmark). The benchmark can 
> prepare data in a cluster and run benchmarks a few times on the same data set 
> without data reload. While running the benchmarks, I observed the following issues:
> 1. After a few such runs, there are situations when the nodes cannot assemble 
> into a cluster. 100% reproducible, but with a different number of restarts.
> 2. With every restart, the startup logs become longer and longer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23393) RocksSnapshotManager#restoreSnapshot is not fail-proof

2024-10-09 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23393:
--

 Summary: RocksSnapshotManager#restoreSnapshot is not fail-proof
 Key: IGNITE-23393
 URL: https://issues.apache.org/jira/browse/IGNITE-23393
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


If we fail in between restoring different column families, the node could be 
restarted with a corrupted storage next time.

We should probably delete everything if snapshot installation has failed.

Also, when we start on a partially restored snapshot, we should be able to 
detect that.
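One possible shape of such a fix, sketched with plain JDK file APIs (the 
marker-file approach and the names are assumptions, not the current 
RocksSnapshotManager behavior):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: marker-based detection of a partially restored snapshot.
class SnapshotRestoreMarkerSketch {
    private static final String MARKER = "snapshot-restore-in-progress";

    static void restoreSnapshot(Path dbDir, Runnable restoreAllColumnFamilies) throws IOException {
        Path marker = dbDir.resolve(MARKER);

        Files.createFile(marker);          // restore started
        restoreAllColumnFamilies.run();    // may fail in between column families
        Files.delete(marker);              // only reached if every column family was restored
    }

    static boolean isPartiallyRestored(Path dbDir) {
        // A leftover marker on startup means the previous restore did not complete;
        // the storage should be wiped and the snapshot installed again.
        return Files.exists(dbDir.resolve(MARKER));
    }
}
{code}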



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23365) Fix huge log message with assignments

2024-10-04 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23365:
--

 Summary: Fix huge log message with assignments
 Key: IGNITE-23365
 URL: https://issues.apache.org/jira/browse/IGNITE-23365
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
 Attachments: assignments.log

When starting a node, we can write a message like this:
{code:java}
[2024-10-04T13:08:50,921][INFO 
][%node_3344%JRaft-ReadOnlyService-Disruptor_stripe_0-0][AssignmentsTracker] 
Assignment cache initialized for placement driver 
[groupAssignments={12_part_12=TokenizedAssignmentsImpl [nodes=UnmodifiableSet 
[Assignment [consistentId=node_3345, isPeer=true]], token=1334], 
20_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3346, isPeer=true]], token=1969], 
22_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3344, isPeer=true]], token=997], 
12_part_13=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3345, isPeer=true]], token=1323], 
20_part_21=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3344, isPeer=true]], token=1004], 
22_part_23=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3346, isPeer=true]], token=1978], 
25_part_24=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3346, isPeer=true]], token=2019], 
12_part_14=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3345, isPeer=true]], token=1327], 
20_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3344, isPeer=true]], token=1012], 
22_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
[consistentId=node_3346, isPeer=true]], token=1974],...{code}
The full message is in the attachment. We should make it shorter.
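One possible way to shorten it, sketched here as an assumption rather than the 
final design, is to log the total number of groups plus a small sample instead 
of the whole map:
{code:java}
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: logs the total count and a bounded sample of assignments.
final class AssignmentsLogFormatSketch {
    static String shorten(Map<String, ?> groupAssignments, int limit) {
        String sample = groupAssignments.entrySet().stream()
                .limit(limit)
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", "));

        return "groupAssignments (" + groupAssignments.size() + " total, showing " + limit + "): [" + sample + "]";
    }
}
{code}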



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23365) Fix huge log message with assignments

2024-10-04 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23365:
---
Attachment: assignments.log

> Fix huge log message with assignments
> -
>
> Key: IGNITE-23365
> URL: https://issues.apache.org/jira/browse/IGNITE-23365
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: assignments.log
>
>
> When starting a node, we can write a message like this:
> {code:java}
> [2024-10-04T13:08:50,921][INFO 
> ][%node_3344%JRaft-ReadOnlyService-Disruptor_stripe_0-0][AssignmentsTracker] 
> Assignment cache initialized for placement driver 
> [groupAssignments={12_part_12=TokenizedAssignmentsImpl [nodes=UnmodifiableSet 
> [Assignment [consistentId=node_3345, isPeer=true]], token=1334], 
> 20_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3346, isPeer=true]], token=1969], 
> 22_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3344, isPeer=true]], token=997], 
> 12_part_13=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3345, isPeer=true]], token=1323], 
> 20_part_21=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3344, isPeer=true]], token=1004], 
> 22_part_23=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3346, isPeer=true]], token=1978], 
> 25_part_24=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3346, isPeer=true]], token=2019], 
> 12_part_14=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3345, isPeer=true]], token=1327], 
> 20_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3344, isPeer=true]], token=1012], 
> 22_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment 
> [consistentId=node_3346, isPeer=true]], token=1974],...{code}
> Full message is in the attachment. We should make it shorter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-23318) Unable to restart cluster multiple times.

2024-10-03 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-23318:
--

Assignee: Ivan Bessonov

> Unable to restart cluster multiple times.
> -
>
> Key: IGNITE-23318
> URL: https://issues.apache.org/jira/browse/IGNITE-23318
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 3.0
>Reporter: Iurii Gerzhedovich
>Assignee: Ivan Bessonov
>Priority: Blocker
>  Labels: ignite-3
> Attachments: ignite.log
>
>
> We have TPCH benchmarks in Ignite 3 
> (org.apache.ignite.internal.benchmark.TpchBenchmark). The benchmark can 
> prepare data in a cluster and run benchmarks a few times on the same data set 
> without data reload. While running the benchmarks, I observed the following issues:
> 1. After a few such runs, there are situations when the nodes cannot assemble 
> into a cluster. 100% reproducible, but with a different number of restarts.
> 2. With every restart, the startup logs become longer and longer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-10-02 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Description: 
h1. Preface

The current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
than it should be. There are multiple obvious reasons for that:
 * Writing into the WAL +and+ the memtable
 * Creating unique keys for every record
 * Inability to efficiently serialize data: we must have an intermediate state 
before we pass data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on a local environment is not optimal, it still shows a huge 
improvement in writing speed (7.5x) and reading speed (3.5x). Enabling 
{{fsync}} sort of equalizes the writing speed, but we still expect that a 
simpler log implementation would be faster due to smaller overall overhead.
h3. Integration testing

A benchmark with 3 servers and 1 client writing data in multiple threads shows 
a throughput improvement of 34438 vs 30299.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-38-53.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-38-57.png!

A benchmark with single-threaded insertions in embedded mode shows a throughput 
improvement of 4072 vs 3739.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-42-49.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-43-09.png!
h1. Observations

Despite a drastic difference in log throughput, the increase in user operation 
throughput is only about 10%. This means that we lose a lot of time elsewhere, 
and optimizing those parts could significantly increase performance too. Log 
optimizations would become more evident after that.
h1. Unsolved issues

There are multiple issues with the new log implementation, some of which have 
been mentioned in IGNITE-22843:
 * {{Logit}} pre-allocates _a lot_ of data on drive. Considering that we use 
the "log per partition" paradigm, it's too wasteful.
 * Storing a separate log file per partition is not scalable anyway; it's too 
difficult to optimize batches and {{fsync}} in this approach.
 * Using the same log for all tables in a distribution zone won't really solve 
the issue; the best it could do is make it {_}manageable{_}, in some sense.

h1. Shortly about how Logit works

Each log consists of 3 sets of files:
 * "segment" files with data.
 * "configuration" files with raft configuration.
 * "index" files with pointers to segment and configuration files.

"segment" and "configuration" files contain chunks of data in the following 
format:
|Magic header|Payload size|Payload itself|

"index" files contain the following pieces of data:
|Magic header|Log entry type (data/cfg)|offset|position|

It's a fixed-length tuple that contains a "link" to one of the data files. Each 
"index" file is basically an offset table, and it is used to resolve "logIndex" 
into real log data.
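For illustration, a fixed-length index entry like the one described could look 
as follows (field widths and the magic value are assumptions, not Logit's 
actual on-disk format):
{code:java}
import java.nio.ByteBuffer;

// Sketch only: illustrative layout of a fixed-length "index" entry that links
// a logIndex to an offset/position inside a "segment" or "configuration" file.
final class IndexEntrySketch {
    static final short MAGIC = (short) 0x57AB; // assumed value
    static final int ENTRY_SIZE = Short.BYTES + Byte.BYTES + Long.BYTES + Integer.BYTES;

    static void write(ByteBuffer buf, byte entryType, long offset, int position) {
        buf.putShort(MAGIC).put(entryType).putLong(offset).putInt(position);
    }

    /** Because entries are fixed-length, a logIndex maps directly to a place in the index file. */
    static long entryFileOffset(long logIndex, long firstLogIndex) {
        return (logIndex - firstLogIndex) * ENTRY_SIZE;
    }
}
{code}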
h1. What we should change

The list of actions that we need to take to make this log fit the required 
criteria includes:
 * Merge "configuration" and "segment" files into one, to have fewer files on 
drive; the distinction is arbitrary anyway. Let's call it a "data" file.
 * Use the same "data" file for multiple raft groups.
It's important to note that we can't use a "data" file per stripe, because the 
stripe calculation function is not {_}stable{_}: it allocates {{stripeId}} 
dynamically in order to have a smoother distribution at runtime.
 * The log should be able to enforce checkpoints/flushes in storage engines, in 
order to safely truncate data upon reaching a threshold (we truncate logs from 
multiple raft groups at the same time, that's why we need it).
This means that we will change the way Raft canonically makes snapshots; 
instead, we will have our own approach, similar to what we have in Ignite 2.x. 
Or we will abuse the snapshots logic and trigger them outside of the schedule.
 * In order to make {{fsync}} faster, we should get rid of "index" 

[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-10-02 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Description: 
h1. Preface

The current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
than it should be. There are multiple obvious reasons for that:
 * Writing into the WAL +and+ the memtable
 * Creating unique keys for every record
 * Inability to efficiently serialize data: we must have an intermediate state 
before we pass data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on a local environment is not optimal, it still shows a huge 
improvement in writing speed (7.5x) and reading speed (3.5x). Enabling 
{{fsync}} sort of equalizes the writing speed, but we still expect that a 
simpler log implementation would be faster due to smaller overall overhead.
h3. Integration testing

A benchmark with 3 servers and 1 client writing data in multiple threads shows 
a throughput improvement of 34438 vs 30299.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-38-53.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-38-57.png!

A benchmark with single-threaded insertions in embedded mode shows a throughput 
improvement of 4072 vs 3739.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-42-49.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-43-09.png!
h1. Observations

Despite a drastic difference in log throughput, the increase in user operation 
throughput is only about 10%. This means that we lose a lot of time elsewhere, 
and optimizing those parts could significantly increase performance too. Log 
optimizations would become more evident after that.
h1. Unsolved issues

There are multiple issues with the new log implementation, some of which have 
been mentioned in IGNITE-22843:
 * {{Logit}} pre-allocates _a lot_ of data on drive. Considering that we use 
the "log per partition" paradigm, it's too wasteful.
 * Storing a separate log file per partition is not scalable anyway; it's too 
difficult to optimize batches and {{fsync}} in this approach.
 * Using the same log for all tables in a distribution zone won't really solve 
the issue; the best it could do is make it {_}manageable{_}, in some sense.

h1. Shortly about how Logit works

Each log consists of 3 sets of files:
 * "segment" files with data.
 * "configuration" files with raft configuration.
 * "index" files with pointers to segment and configuration files.

"segment" and "configuration" files contain chunks of data in the following 
format:
|Magic header|Payload size|Payload itself|

"index" files contain the following pieces of data:
|Magic header|Log entry type (data/cfg)|offset|position|

It's a fixed-length tuple that contains a "link" to one of the data files. Each 
"index" file is basically an offset table, and it is used to resolve "logIndex" 
into real log data.
h1. What we should change

The list of actions that we need to take to make this log fit the required 
criteria includes:
 * Merge "configuration" and "segment" files into one, to have fewer files on 
drive; the distinction is arbitrary anyway. Let's call it a "data" file.
 * Use the same "data" file for multiple raft groups.
It's important to note that we can't use a "data" file per stripe, because the 
stripe calculation function is not {_}stable{_}: it allocates {{stripeId}} 
dynamically in order to have a smoother distribution at runtime.
 * The log should be able to enforce checkpoints/flushes in storage engines, in 
order to safely truncate data upon reaching a threshold (we truncate logs from 
multiple raft groups at the same time, that's why we need it).
This means that we will change the way Raft canonically makes snapshots; 
instead, we will have our own approach, similar to what we have in Ignite 2.x. 
Or we will abuse the snapshots logic and trigger them outside of the schedule.
 * In order to make {{fsync}} faster, we should get rid of "index" 

[jira] [Updated] (IGNITE-23325) Checkpoint read-lock timeout under high load

2024-10-01 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23325:
---
Description: 
We encounter the following situation under a very intensive load. It should not 
happen.

Can be reproduced in {{ignite-22835-2}} on commit 
{{6bc1f97d0d2506b666975eea57b78ad8609b69d7}}.
{noformat}
[2024-09-25T11:03:12,180][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint started [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, 
checkpointBeforeWriteLockTime=30ms, checkpointWriteLockWait=1ms, 
checkpointListenersExecuteTime=2ms, checkpointWriteLockHoldTime=4ms, 
splitAndSortPagesDuration=276ms, pages=80904, reason='too many dirty pages']
227808.089 ops/s
[2024-09-25T11:03:13,438][WARN 
][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-10][PersistentPageMemory]
 Page replacements started, pages will be rotated with disk, this will affect 
storage performance (consider increasing 
PageMemoryDataRegionConfiguration#setMaxSize for data region) [region=default]
[2024-09-25T11:03:13,796][INFO ][checkpoint-runner-io0][CheckpointPagesWriter] 
Checkpoint pages were not written yet due to unsuccessful page write lock 
acquisition and will be retried [pageCount=1]
[2024-09-25T11:03:14,071][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, 
pages=80904, pagesWriteTime=1614ms, fsyncTime=273ms, totalTime=2203ms]
[2024-09-25T11:03:14,073][INFO ][%node_3344%compaction-thread][Compactor] 
Starting new compaction round [files=64]
[2024-09-25T11:03:15,828][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint started [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, 
checkpointBeforeWriteLockTime=21ms, checkpointWriteLockWait=0ms, 
checkpointListenersExecuteTime=6ms, checkpointWriteLockHoldTime=6ms, 
splitAndSortPagesDuration=205ms, pages=77234, reason='too many dirty pages']
231068.630 ops/s
271814.068 ops/s
# Warmup Iteration   8: [2024-09-25T11:03:23,547][INFO 
][%node_3344%compaction-thread][Compactor] Starting new compaction round 
[files=64]
[2024-09-25T11:03:23,547][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, 
pages=77234, pagesWriteTime=2547ms, fsyncTime=5099ms, totalTime=7951ms]
26376.850 ops/s
# Warmup Iteration   9: [2024-09-25T11:03:23,685][INFO 
][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started 
[checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, 
checkpointBeforeWriteLockTime=9ms, checkpointWriteLockWait=0ms, 
checkpointListenersExecuteTime=1ms, checkpointWriteLockHoldTime=1ms, 
splitAndSortPagesDuration=125ms, pages=95251, reason='too many dirty pages']
[2024-09-25T11:03:34,706][INFO ][%node_3344%compaction-thread][Compactor] 
Compaction round finished [duration=11159ms]
[2024-09-25T11:03:34,707][INFO ][%node_3344%compaction-thread][Compactor] 
Starting new compaction round [files=41]
[2024-09-25T11:03:35,444][INFO ][%node_3344%lease-updater][LeaseUpdater] Leases 
updated (printed once per 10 iteration(s)): [inCurrentIteration=LeaseStats 
[leasesCreated=0, leasesPublished=0, leasesProlonged=64, 
leasesWithoutCandidates=0], active=64, currentAssignmentsSize=64].
[2024-09-25T11:03:39,075][WARN 
][%node_3344%meta-storage-safe-time-0][TrackableNetworkMessageHandler] Message 
handling has been too long [duration=11ms, message=class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]
[2024-09-25T11:03:40,287][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, 
pages=95251, pagesWriteTime=3098ms, fsyncTime=13146ms, totalTime=16740ms]
[2024-09-25T11:03:40,307][WARN 
][%node_3344%JRaft-FSMCaller-Disruptor_stripe_24-0][FailureProcessor] Possible 
failure suppressed according to a configured handler [hnd=NoOpFailureHandler 
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT]
org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock 
acquisition has been timed out.
at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.b

[jira] [Created] (IGNITE-23326) Configuration parser allows duplicated keys

2024-10-01 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23326:
--

 Summary: Configuration parser allows duplicated keys
 Key: IGNITE-23326
 URL: https://issues.apache.org/jira/browse/IGNITE-23326
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


Currently, this code leads to no warnings or errors:
{code:java}
String configTemplate = "ignite {\n"
+ "  \"network\": {\n"
+ "\"port\":{},\n"
+ "\"nodeFinder\":{\n"
+ "  \"netClusterNodes\": [ {} ]\n"
+ "}\n"
+ "  },\n"
+ "  storage.profiles: {"
+ "" + DEFAULT_STORAGE_PROFILE + ".engine: aipersist, "
+ "" + DEFAULT_STORAGE_PROFILE + ".size: 2073741824 "
+ "  },\n"
+ "  storage.profiles: {"
+ "" + DEFAULT_STORAGE_PROFILE + ".engine: aipersist, "
+ "" + DEFAULT_STORAGE_PROFILE + ".size: 2073741824 " // Avoid 
page replacement.
+ "  },\n"
+ "  clientConnector: { port:{} },\n"
+ "  rest.port: {},\n"
+ "  raft.fsync = " + fsync()
+ "}"; {code}
This behavior is confusing and error-prone for the end user; we shouldn't 
allow it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23325) Checkpoint read-lock timeout under high load

2024-10-01 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23325:
--

 Summary: Checkpoint read-lock timeout under high load
 Key: IGNITE-23325
 URL: https://issues.apache.org/jira/browse/IGNITE-23325
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


We encounter the following situation under a very intensive load. It should not 
happen.
{noformat}
[2024-09-25T11:03:12,180][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint started [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, 
checkpointBeforeWriteLockTime=30ms, checkpointWriteLockWait=1ms, 
checkpointListenersExecuteTime=2ms, checkpointWriteLockHoldTime=4ms, 
splitAndSortPagesDuration=276ms, pages=80904, reason='too many dirty pages']
227808.089 ops/s
[2024-09-25T11:03:13,438][WARN 
][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-10][PersistentPageMemory]
 Page replacements started, pages will be rotated with disk, this will affect 
storage performance (consider increasing 
PageMemoryDataRegionConfiguration#setMaxSize for data region) [region=default]
[2024-09-25T11:03:13,796][INFO ][checkpoint-runner-io0][CheckpointPagesWriter] 
Checkpoint pages were not written yet due to unsuccessful page write lock 
acquisition and will be retried [pageCount=1]
[2024-09-25T11:03:14,071][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, 
pages=80904, pagesWriteTime=1614ms, fsyncTime=273ms, totalTime=2203ms]
[2024-09-25T11:03:14,073][INFO ][%node_3344%compaction-thread][Compactor] 
Starting new compaction round [files=64]
[2024-09-25T11:03:15,828][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint started [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, 
checkpointBeforeWriteLockTime=21ms, checkpointWriteLockWait=0ms, 
checkpointListenersExecuteTime=6ms, checkpointWriteLockHoldTime=6ms, 
splitAndSortPagesDuration=205ms, pages=77234, reason='too many dirty pages']
231068.630 ops/s
271814.068 ops/s
# Warmup Iteration   8: [2024-09-25T11:03:23,547][INFO 
][%node_3344%compaction-thread][Compactor] Starting new compaction round 
[files=64]
[2024-09-25T11:03:23,547][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, 
pages=77234, pagesWriteTime=2547ms, fsyncTime=5099ms, totalTime=7951ms]
26376.850 ops/s
# Warmup Iteration   9: [2024-09-25T11:03:23,685][INFO 
][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started 
[checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, 
checkpointBeforeWriteLockTime=9ms, checkpointWriteLockWait=0ms, 
checkpointListenersExecuteTime=1ms, checkpointWriteLockHoldTime=1ms, 
splitAndSortPagesDuration=125ms, pages=95251, reason='too many dirty pages']
[2024-09-25T11:03:34,706][INFO ][%node_3344%compaction-thread][Compactor] 
Compaction round finished [duration=11159ms]
[2024-09-25T11:03:34,707][INFO ][%node_3344%compaction-thread][Compactor] 
Starting new compaction round [files=41]
[2024-09-25T11:03:35,444][INFO ][%node_3344%lease-updater][LeaseUpdater] Leases 
updated (printed once per 10 iteration(s)): [inCurrentIteration=LeaseStats 
[leasesCreated=0, leasesPublished=0, leasesProlonged=64, 
leasesWithoutCandidates=0], active=64, currentAssignmentsSize=64].
[2024-09-25T11:03:39,075][WARN 
][%node_3344%meta-storage-safe-time-0][TrackableNetworkMessageHandler] Message 
handling has been too long [duration=11ms, message=class 
org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl]
[2024-09-25T11:03:40,287][INFO ][%node_3344%checkpoint-thread][Checkpointer] 
Checkpoint finished [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, 
pages=95251, pagesWriteTime=3098ms, fsyncTime=13146ms, totalTime=16740ms]
[2024-09-25T11:03:40,307][WARN 
][%node_3344%JRaft-FSMCaller-Disruptor_stripe_24-0][FailureProcessor] Possible 
failure suppressed according to a configured handler [hnd=NoOpFailureHandler 
[super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT]
org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock 
acquisition has been timed out.
at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130)
 ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175)
 ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMe

[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-23 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Description: 
h1. Preface

The current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
than it should be. There are multiple obvious reasons for that:
 * Writing into the WAL +and+ the memtable
 * Creating unique keys for every record
 * Inability to efficiently serialize data: we must have an intermediate state 
before we pass data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on a local environment is not optimal, it still shows a huge 
improvement in writing speed (7.5x) and reading speed (3.5x). Enabling 
{{fsync}} sort of equalizes the writing speed, but we still expect that a 
simpler log implementation would be faster due to smaller overall overhead.
h3. Integration testing

A benchmark with 3 servers and 1 client writing data in multiple threads shows 
a throughput improvement of 34438 vs 30299.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-38-53.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-38-57.png!

A benchmark of single-threaded insertions in embedded mode shows a throughput 
improvement of 4072 vs 3739.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-42-49.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-43-09.png!
h1. Observations

Despite the drastic difference in log throughput, the increase in user operation 
throughput is only about 10%. This means we lose a lot of time elsewhere, and 
optimizing those parts could significantly increase performance as well. Log 
optimizations would become more evident after that.
h1. Unsolved issues

There are multiple issues with the new log implementation; some of them have 
been mentioned in IGNITE-22843:
 * {{Logit}} pre-allocates _a lot_ of data on disk. Considering that we use the 
"log per partition" paradigm, it's too wasteful.
 * Storing a separate log file per partition is not scalable anyway; it's too 
difficult to optimize batching and {{fsync}} in this approach.
 * Using the same log for all tables in a distribution zone won't really solve 
the issue; the best it could do is make it {_}manageable{_}, in some sense.

h1. A short overview of how Logit works

Each log consists of 3 sets of files:
 * "segment" files with data.
 * "configuration" files with raft configuration.
 * "index" files with pointers to segment and configuration files.

"segment" and "configuration" files contain chunks of data in a following 
format:

 
|Magic header|Payload size|Payload itself|

"index" files contain following pieces of data:
|Magic header|Log entry type (data/cfg)|offset|position|

It's a fixed-length tuple that contains a "link" to one of the data files. Each 
"index" file is basically an offset table, used to resolve a "logIndex" into the 
real log data (a minimal sketch of such a lookup follows).
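
A minimal sketch of how such an offset table can be used to resolve a log index 
into a position in a data file. The entry layout (1-byte magic, 1-byte type, int 
file offset, long position) and all names are assumptions made up for the 
example, not Logit's actual on-disk format:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative index-file lookup: fixed-length entries addressed by log index. */
public class IndexLookupSketch {
    static final int HEADER_SIZE = 16;                                 // assumed index file header size
    static final int ENTRY_SIZE = 1 + 1 + Integer.BYTES + Long.BYTES;  // magic, type, offset, position

    /** Reads the index entry for {@code logIndex} and returns the position in the data file. */
    static long resolve(Path indexFile, long firstLogIndex, long logIndex) throws IOException {
        long entryPos = HEADER_SIZE + (logIndex - firstLogIndex) * ENTRY_SIZE;

        try (FileChannel ch = FileChannel.open(indexFile, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE);
            ch.read(buf, entryPos);
            buf.flip();

            byte magic = buf.get();        // magic header of the entry
            byte type = buf.get();         // 0 = data, 1 = configuration (assumed encoding)
            int fileOffset = buf.getInt(); // which segment/configuration file to read
            long position = buf.getLong(); // position of the record inside that file

            // A real implementation would validate 'magic' and 'type', and pick
            // the target file by 'fileOffset' before reading the record itself.
            return position;
        }
    }
}
{code}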
h1. What we should change

The list of actions needed to make this log fit the required criteria includes:
 * Merge "configuration" and "segment" files into one, to have fewer files on 
drive, the distinction is arbitrary anyway. Let's call it a "data" file.
 * Use the same "data" file for multiple raft groups.
It's important to note that we can't use "data" file per stripe, because stripe 
calculation function is not {_}stable{_}, it allocates {{stripeId}} dynamically 
in order to have a smoother distribution in runtime.
 * The log should be able to enforce checkpoints/flushes in storage engines, in 
order to safely truncate data upon reaching a threshold (we truncate logs from 
multiple raft groups at the same time, which is why we need this).
This means that we will change the way Raft canonically takes snapshots; instead 
we will have our own approach, similar to what we have in Ignite 2.x, or we will 
abuse the snapshot logic and trigger snapshots outside of the regular schedule.
 * In order to make {{fsync}} faster, we should get rid of "inde



[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Description: 
h1. Preface

The current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
than it should be. There are multiple obvious reasons for that:
 * Writing into WAL +and+ memtable
 * Creating unique keys for every record
 * Inability to serialize data efficiently: we must build an intermediate 
representation before passing the data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on a local environment is not optimal, it still shows a huge 
improvement in write speed (7.5x) and read speed (3.5x). Enabling {{fsync}} 
more or less equalizes write speed, but we still expect a simpler log 
implementation to be faster due to its smaller overall overhead.
h3. Integration testing

A benchmark with 3 servers and 1 client writing data from multiple threads shows 
a throughput improvement of 34438 vs 30299.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-38-53.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-38-57.png!

A benchmark of single-threaded insertions in embedded mode shows a throughput 
improvement of 4072 vs 3739.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-42-49.png!

{{{}Logit{}}}:

!Screenshot from 2024-09-20 10-43-09.png!
h1. Observations

Despite the drastic difference in log throughput, the increase in user operation 
throughput is only about 10%. This means we lose a lot of time elsewhere, and 
optimizing those parts could significantly increase performance as well. Log 
optimizations would become more evident after that.
h1. Unsolved issues

There are multiple issues with the new log implementation; most of them have been 
mentioned in 
[IGNITE-22843|https://issues.apache.org/jira/browse/IGNITE-22843?focusedCommentId=17871250&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17871250].

  was:
h1. Preface

The current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
than it should be. There are multiple obvious reasons for that:
 * Writing into WAL +and+ memtable
 * Creating unique keys for every record
 * Inability to serialize data efficiently: we must build an intermediate 
representation before passing the data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

 
{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on a local environment is not optimal, it still shows a huge 
improvement in write speed (7.5x) and read speed (3.5x). Enabling {{fsync}} 
more or less equalizes write speed, but we still expect a simpler log 
implementation to be faster due to its smaller overall overhead.
h3. Integration testing

A benchmark with 3 servers and 1 client writing data from multiple threads shows 
a throughput of 34438 vs 30299.

{{{}RocksDB{}}}:

!Screenshot from 2024-09-20 10-38-

[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: Screenshot from 2024-09-20 10-38-53.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: (was: image-2024-09-20-10-43-23-213.png)




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: Screenshot from 2024-09-20 10-38-57.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: (was: Screenshot from 2024-09-20 10-38-57.png)




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: Screenshot from 2024-09-20 10-43-09.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)



[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: image-2024-09-20-10-43-23-213.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: Screenshot from 2024-09-20 10-42-49.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: Screenshot from 2024-09-20 10-38-57.png




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: (was: image-2024-09-20-10-39-22-043.png)




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Description: 
h1. Preface

Current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
then it should be. There are multiple obvious reasons for that:
 * Writing into WAL +and+ memtable
 * Creating unique keys for every record
 * Inability to efficiently serialize data, we must have an intermediate state 
before we pass data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

 
{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on local environment is not optimal, is still shows a huge 
improvement in writing speed (7.5x) and reading speed (3.5x). Enabling 
{{fsync}} sort-of equalizes writing speed, but we still expect that simpler log 
implementation would be faster dues to smaller overall overhead.
h3. Integration testing

Benchmark for 3 servers and 1 client writing data in multiple threads show the 
following.

{{{}RocksDB{}}}:

 

!image-2024-09-20-10-39-22-043.png!

{{{}Logit{}}}:

 

  was:
h1. Preface

Current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
then it should be. There are multiple obvious reasons for that:
 * Writing into WAL +and+ memtable
 * Creating unique keys for every record
 * Inability to efficiently serialize data, we must have an intermediate state 
before we pass data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

 
{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on local environment is not optimal, is still shows a huge 
improvement in writing speed (7.5x) and reading speed (3.5x). Enabling 
{{fsync}} sort-of equalizes writing speed, but we still expect that simpler log 
implementation would be faster dues to smaller overall overhead.
h3. Integration testing

 

 

 


> Ignite 3 new log storage
> 
>
> Key: IGNITE-23240
> URL: https://issues.apache.org/jira/browse/IGNITE-23240
> Project: Ignite
>  Issue Type: Epic
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: Screenshot from 2024-09-20 10-38-53.png
>
>
> h1. Preface
> Current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
> than it should be. There are multiple obvious reasons for that:
>  * Writing into WAL +and+ memtable
>  * Creating unique keys for every record
>  * Inability to efficiently serialize data, we must have an intermediate 
> state before we pass data into {{{}RocksDB{}}}'s API.
> h1. Benchmarks
> h3. Local benchmarks
> Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my 
> local environment with fsync disabled. I got the following results:
>  * {{{}Logit{}}}:
>  
> {noformat}
> Test write:
>   Log number      : 1024000
> 

[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23240:
---
Attachment: image-2024-09-20-10-39-22-043.png

> Ignite 3 new log storage
> 
>
> Key: IGNITE-23240
> URL: https://issues.apache.org/jira/browse/IGNITE-23240
> Project: Ignite
>  Issue Type: Epic
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: image-2024-09-20-10-39-22-043.png
>
>
> h1. Preface
> Current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
> than it should be. There are multiple obvious reasons for that:
>  * Writing into WAL +and+ memtable
>  * Creating unique keys for every record
>  * Inability to efficiently serialize data, we must have an intermediate 
> state before we pass data into {{{}RocksDB{}}}'s API.
> h1. Benchmarks
> h3. Local benchmarks
> Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my 
> local environment with fsync disabled. I got the following results:
>  * {{{}Logit{}}}:
>  
> {noformat}
> Test write:
>   Log number      : 1024000
>   Log Size        : 16384
>   Batch Size      : 100
>   Cost time(s)    : 23.541
>   Total size      : 16777216000
>   Throughput(bps) : 712680684
>   Throughput(rps) : 43498
> Test read:
>   Log number      : 1024000
>   Log Size        : 16384
>   Batch Size      : 100
>   Cost time(s)    : 3.808
>   Total size      : 16777216000
>   Throughput(bps) : 4405781512
>   Throughput(rps) : 268907
> Test done!{noformat}
>  * {{{}RocksDB{}}}:
> {noformat}
> Test write:
>   Log number      : 1024000
>   Log Size        : 16384
>   Batch Size      : 100
>   Cost time(s)    : 178.785
>   Total size      : 16777216000
>   Throughput(bps) : 93840176
>   Throughput(rps) : 5727
> Test read:
>   Log number      : 1024000
>   Log Size        : 16384
>   Batch Size      : 100
>   Cost time(s)    : 13.572
>   Total size      : 16777216000
>   Throughput(bps) : 1236163866
>   Throughput(rps) : 75449
> Test done!{noformat}
> While testing on a local environment is not optimal, it still shows a huge 
> improvement in writing speed (7.5x) and reading speed (3.5x). Enabling 
> {{fsync}} more or less equalizes writing speed, but we still expect that a simpler 
> log implementation would be faster due to smaller overall overhead.
> h3. Integration testing
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23240) Ignite 3 new log storage

2024-09-20 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23240:
--

 Summary: Ignite 3 new log storage
 Key: IGNITE-23240
 URL: https://issues.apache.org/jira/browse/IGNITE-23240
 Project: Ignite
  Issue Type: Epic
Reporter: Ivan Bessonov
 Attachments: image-2024-09-20-10-39-22-043.png

h1. Preface

Current implementation, based on {{{}RocksDB{}}}, is known to be way slower 
than it should be. There are multiple obvious reasons for that:
 * Writing into WAL +and+ memtable
 * Creating unique keys for every record
 * Inability to efficiently serialize data, we must have an intermediate state 
before we pass data into {{{}RocksDB{}}}'s API.

h1. Benchmarks
h3. Local benchmarks

Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local 
environment with fsync disabled. I got the following results:
 * {{{}Logit{}}}:

 
{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 23.541
  Total size      : 16777216000
  Throughput(bps) : 712680684
  Throughput(rps) : 43498
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 3.808
  Total size      : 16777216000
  Throughput(bps) : 4405781512
  Throughput(rps) : 268907
Test done!{noformat}
 * {{{}RocksDB{}}}:

{noformat}
Test write:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 178.785
  Total size      : 16777216000
  Throughput(bps) : 93840176
  Throughput(rps) : 5727
Test read:
  Log number      : 1024000
  Log Size        : 16384
  Batch Size      : 100
  Cost time(s)    : 13.572
  Total size      : 16777216000
  Throughput(bps) : 1236163866
  Throughput(rps) : 75449
Test done!{noformat}
While testing on a local environment is not optimal, it still shows a huge 
improvement in writing speed (7.5x) and reading speed (3.5x). Enabling 
{{fsync}} more or less equalizes writing speed, but we still expect that a simpler 
log implementation would be faster due to smaller overall overhead.
h3. Integration testing

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22843) Writing into RAFT log is too long

2024-09-19 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882920#comment-17882920
 ] 

Ivan Bessonov commented on IGNITE-22843:


Totally agree with what Roman said, I'll elaborate on this topic in a separate 
JIRA. Thank you!

> Writing into RAFT log is too long
> -
>
> Key: IGNITE-22843
> URL: https://issues.apache.org/jira/browse/IGNITE-22843
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> h3. Motivation
> We are using RocksDB as RAFT log storage. Writing in the log is significantly 
> longer than writing in the memory-mapped buffer (as we used in Ignite 2).
> {noformat}
> appendLogEntry 0.8 6493700 6494500
> Here is hidden 0.5 us
> flushLog 20.1 6495000 6515100
> Here is hidden 2.8 us
> {noformat}
> h3. Definition of done
> We should find a way to implement faster log storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22843) Writing into RAFT log is too long

2024-09-17 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22843:
---
Reviewer: Roman Puchkovskiy

> Writing into RAFT log is too long
> -
>
> Key: IGNITE-22843
> URL: https://issues.apache.org/jira/browse/IGNITE-22843
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> We are using RocksDB as RAFT log storage. Writing in the log is significantly 
> longer than writing in the memory-mapped buffer (as we used in Ignite 2).
> {noformat}
> appendLogEntry 0.8 6493700 6494500
> Here is hidden 0.5 us
> flushLog 20.1 6495000 6515100
> Here is hidden 2.8 us
> {noformat}
> h3. Definition of done
> We should find a way to implement faster log storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23212) Page replacement doesn't work sometimes

2024-09-16 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23212:
--

 Summary: Page replacement doesn't work sometimes
 Key: IGNITE-23212
 URL: https://issues.apache.org/jira/browse/IGNITE-23212
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


Under a sophisticated load, we sometimes see the following exception:
{noformat}
org.apache.ignite.lang.IgniteException: Error while executing 
addWriteCommitted: [rowId=RowId [partitionId=13, 
uuid=0191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13]
at 
java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?]
at 
org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789) 
~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
 ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
 ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.util.ViewUtils.copyExceptionWithCauseIfPossible(ViewUtils.java:91)
 ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.util.ViewUtils.ensurePublicException(ViewUtils.java:71)
 ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at org.apache.ignite.internal.util.ViewUtils.sync(ViewUtils.java:54) 
~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:207)
 ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:60)
 ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
at site.ycsb.db.ignite3.IgniteClient.insert(IgniteClient.java:49) 
[ignite3-binding-2024.15.jar:?]
at site.ycsb.DBWrapper.insert(DBWrapper.java:284) [core-2024.15.jar:?]
at site.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:657) 
[core-2024.15.jar:?]
at site.ycsb.ClientThread.run(ClientThread.java:181) 
[core-2024.15.jar:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.ignite.lang.IgniteException: Error while executing 
addWriteCommitted: [rowId=RowId [partitionId=13, 
uuid=0191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13]
at 
java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?]
at 
org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789) 
~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
 ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
 ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.client.TcpClientChannel.readError(TcpClientChannel.java:549)
 ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.client.TcpClientChannel.processNextMessage(TcpClientChannel.java:435)
 ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
at 
org.apache.ignite.internal.client.TcpClientChannel.lambda$onMessage$3(TcpClientChannel.java:277)
 ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
at 
java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
 ~[?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) 
~[?:?]
at 
java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
 ~[?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) ~[?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
~[?:?]
at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) 
~[?:?]
Caused by: org.apache.ignite.lang.IgniteException: 
org.apache.ignite.lang.IgniteException: IGN-CMN-65535 
TraceId:60b4295a-8c18-4cdb-93e8-266bc9aaed88 Error while executing 
addWriteCommitted: [rowId=RowId [partitionId=13, 
uuid=0191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13]
at 
org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.lambda$mapToPublicException$2(IgniteExceptionMapperUtil.java:88)
at 
org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapCheckingResultIsPublic(IgniteExceptionMapperUtil.java:141)
at 
org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:137)
at 
org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:88)
at 
org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.lambda$convertToPublicFuture$3(IgniteExceptionMapperUtil.java:178)
at 
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at

[jira] [Updated] (IGNITE-23056) Verbose logging of delta-files compaction

2024-09-12 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23056:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Verbose logging of delta-files compaction
> -
>
> Key: IGNITE-23056
> URL: https://issues.apache.org/jira/browse/IGNITE-23056
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> For checkpoints we have a very extensive log message that shows the duration 
> of each checkpoint's phase. We don't have that for the compactor.
> In this Jira we need to implement that. The list of phases and statistics is 
> at the developers' discretion.
> As a bonus, we might want to print some values in microseconds instead of 
> milliseconds, and not use fast timestamps (they don't have enough 
> granularity). While we're doing that for compactor logs, we might as well 
> update the checkpoint's logs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23189) RocksDB tests flush too often

2024-09-11 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23189:
---
Description: Write buffer size in the RocksDB unit tests for the corresponding 
storage engine is too small: it's flushed literally after every insertion. This 
makes these tests longer than they have to be, sometimes several seconds 
instead of several hundred milliseconds. We should make this size bigger  
(was: Write buffer size in tests is too small, it's flushed literally after 
every insertion)
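The exact fix lives in the Ignite test configuration, but for illustration this is the plain {{RocksDB}} option that controls how much data the memtable buffers before a flush is triggered (the path and the 64 MiB value here are made up for the example):
{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class WriteBufferExample {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();

        try (Options options = new Options()
                .setCreateIfMissing(true)
                // A tiny write buffer flushes after almost every insertion;
                // a larger one keeps unit tests from flushing constantly.
                .setWriteBufferSize(64 * 1024 * 1024);
             RocksDB db = RocksDB.open(options, "/tmp/write-buffer-example")) {
            db.put("key".getBytes(), "value".getBytes());
        }
    }
}
{code}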

> RocksDB tests flush too often
> -
>
> Key: IGNITE-23189
> URL: https://issues.apache.org/jira/browse/IGNITE-23189
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Write buffer size in the RocksDB unit tests for the corresponding storage engine is 
> too small: it's flushed literally after every insertion. This makes these 
> tests longer than they have to be, sometimes several seconds instead of 
> several hundred milliseconds. We should make this size bigger



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23189) RocksDB tests flush too often

2024-09-11 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23189:
--

 Summary: RocksDB tests flush too often
 Key: IGNITE-23189
 URL: https://issues.apache.org/jira/browse/IGNITE-23189
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 3.0


Write buffer size in tests is too small, it's flushed literally after every 
insertion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22843) Writing into RAFT log is too long

2024-09-10 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22843:
--

Assignee: Ivan Bessonov

> Writing into RAFT log is too long
> -
>
> Key: IGNITE-22843
> URL: https://issues.apache.org/jira/browse/IGNITE-22843
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> We are using RocksDB as RAFT log storage. Writing in the log is significantly 
> longer than writing in the memory-mapped buffer (as we used in Ignite 2).
> {noformat}
> appendLogEntry 0.8 6493700 6494500
> Here is hidden 0.5 us
> flushLog 20.1 6495000 6515100
> Here is hidden 2.8 us
> {noformat}
> h3. Definition of done
> We should find a way to implement faster log storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22609) MetaStorageListener can access KeyValueStorage after it had been closed

2024-09-05 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22609:
--

Assignee: Ivan Bessonov

> MetaStorageListener can access KeyValueStorage after it had been closed
> ---
>
> Key: IGNITE-22609
> URL: https://issues.apache.org/jira/browse/IGNITE-22609
> Project: Ignite
>  Issue Type: Bug
>Reporter: Aleksandr Polovtsev
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: hs_err_pid2936001.log
>
>
> See the attached stacktrace for details. Looks like we are trying to process 
> a read command in {{MetaStorageListener}} while the underlying storage has 
> already been closed.  
> This may be happening because it looks like we don't guarantee that 
> {{RaftManager#stopRaftNodes}} waits for all read commands to be processed. If 
> this is the case, a possible solution would be to add a busy lock either to 
> the {{MetaStorageListener}} or to the {{KeyValueStorage}}. But this needs to 
> be verified first. 
> Also, it must be checked if similar Raft Listeners in other components are 
> affected by the same issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-22598) Failed to allocate temporary buffer for checkpoint

2024-08-30 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-22598.

Resolution: Duplicate

> Failed to allocate temporary buffer for checkpoint
> --
>
> Key: IGNITE-22598
> URL: https://issues.apache.org/jira/browse/IGNITE-22598
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
> Attachments: 
> poc-tester-SERVER-192.168.1.41-id-0-2024-06-27-09-14-17-client.log.2
>
>
> h3. Motivation
> Many exceptions might appear in the log of the throughput test. Afterwards, the 
> partition storage is in an undefined state. Unsurprisingly, continuing to work 
> with the storage leads to further issues.
> {noformat}
> 2024-06-27 12:19:46:881 +0300 
> [INFO][%poc-tester-SERVER-192.168.1.41-id-0%JRaft-FSMCaller-Disruptor_stripe_6-0][ActionRequestProcessor]
>  Error occurred on a user's state machine
> org.apache.ignite.internal.storage.StorageException: IGN-STORAGE-1 
> TraceId:0d512917-7a88-4a7c-94c9-03d86304997d Failed to put value into index
> at 
> org.apache.ignite.internal.storage.pagememory.index.hash.PageMemoryHashIndexStorage.lambda$put$1(PageMemoryHashIndexStorage.java:123)
> at 
> org.apache.ignite.internal.storage.pagememory.index.AbstractPageMemoryIndexStorage.busy(AbstractPageMemoryIndexStorage.java:336)
> at 
> org.apache.ignite.internal.storage.pagememory.index.AbstractPageMemoryIndexStorage.busyNonDataRead(AbstractPageMemoryIndexStorage.java:317)
> at 
> org.apache.ignite.internal.storage.pagememory.index.hash.PageMemoryHashIndexStorage.put(PageMemoryHashIndexStorage.java:109)
> at 
> org.apache.ignite.internal.table.distributed.TableSchemaAwareIndexStorage.put(TableSchemaAwareIndexStorage.java:83)
> at 
> org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler.putToIndex(IndexUpdateHandler.java:270)
> at 
> org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler.addToIndexes(IndexUpdateHandler.java:69)
> at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.tryProcessRow(StorageUpdateHandler.java:173)
> at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.lambda$handleUpdate$0(StorageUpdateHandler.java:114)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:165)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:668)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:155)
> at 
> org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:76)
> at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.handleUpdate(StorageUpdateHandler.java:109)
> at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.handleUpdateCommand(PartitionListener.java:289)
> at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:209)
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:166)
> at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:702)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:571)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:539)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:458)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:131)
> at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:125)
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:326)
> at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:283)
> at 
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
> at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.ignite.internal.pagememory.tree.CorruptedTreeExc

[jira] [Updated] (IGNITE-23084) Implement checkpoint buffer protection

2024-08-30 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23084:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Implement checkpoint buffer protection
> --
>
> Key: IGNITE-23084
> URL: https://issues.apache.org/jira/browse/IGNITE-23084
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Current implementation of checkpoint allows for checkpoint buffer overflow. 
> We should port a part that prioritizes cp-buffer pages in checkpoint writer, 
> if it's close to overflow.
> Throttling should not be ported as of right now.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-23106) Wait for free space in checkpoint buffer

2024-08-30 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-23106:
---
Description: 
In PersistentPageMemory#postWriteLockPage we use a spin wait instead of 
properly waiting for a notification from the checkpointer. This is, most likely, 
not optimal, and we should implement a fair waiting algorithm.

Another thing to consider: we should make the checkpoint buffer size configurable.
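A minimal sketch of the "fair waiting" idea, assuming a simplified gate around the checkpoint buffer (names are hypothetical, this is not the actual PersistentPageMemory code):
{code:java}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class CheckpointBufferGate {
    private final ReentrantLock lock = new ReentrantLock(true); // fair lock: FIFO wakeups
    private final Condition spaceAvailable = lock.newCondition();
    private final int capacity;
    private int usedPages;

    CheckpointBufferGate(int capacity) {
        this.capacity = capacity;
    }

    /** Write path: blocks instead of spinning until the checkpointer frees space. */
    void acquirePage() throws InterruptedException {
        lock.lock();
        try {
            while (usedPages >= capacity) {
                spaceAvailable.await();
            }
            usedPages++;
        } finally {
            lock.unlock();
        }
    }

    /** Checkpointer side: called when a page is drained from the checkpoint buffer. */
    void releasePage() {
        lock.lock();
        try {
            usedPages--;
            spaceAvailable.signal();
        } finally {
            lock.unlock();
        }
    }
}
{code}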

  was:In PersistentPageMemory#postWriteLockPage we use a spin wait instead of 
properly waiting for a notification from checkpointer. This is, most likely, 
not optimal, and we should implement a fair waiting algorithm.


> Wait for free space in checkpoint buffer
> 
>
> Key: IGNITE-23106
> URL: https://issues.apache.org/jira/browse/IGNITE-23106
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> In PersistentPageMemory#postWriteLockPage we use a spin wait instead of 
> properly waiting for a notification from checkpointer. This is, most likely, 
> not optimal, and we should implement a fair waiting algorithm.
> Another thing to consider - we should make checkpoint buffer size 
> configurable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23115) Checkpoint single partition from a single thread

2024-08-30 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23115:
--

 Summary: Checkpoint single partition from a single thread
 Key: IGNITE-23115
 URL: https://issues.apache.org/jira/browse/IGNITE-23115
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


As far as I know, writing multiple files from multiple threads is more 
efficient than writing a single file from multiple threads. But the latter is exactly 
what we do.

We should make an alternative implementation that would distribute partitions 
between threads and check whether it performs better than the current implementation.
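A rough sketch of the proposed alternative, assuming each partition's dirty pages can be wrapped into a single write task (names are illustrative, not the actual checkpointer code):
{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PartitionCheckpointer {
    /** Each task writes all pages of exactly one partition, so every file has a single writer thread. */
    void writePartitions(List<Runnable> partitionWriteTasks, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            partitionWriteTasks.forEach(pool::submit);
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }
}
{code}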



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23106) Wait for free space in checkpoint buffer

2024-08-29 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23106:
--

 Summary: Wait for free space in checkpoint buffer
 Key: IGNITE-23106
 URL: https://issues.apache.org/jira/browse/IGNITE-23106
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


In PersistentPageMemory#postWriteLockPage we use a spin wait instead of 
properly waiting for a notification from checkpointer. This is, most likely, 
not optimal, and we should implement a fair waiting algorithm.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23105) Data race in aipersist partition destruction

2024-08-29 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23105:
--

 Summary: Data race in aipersist partition destruction
 Key: IGNITE-23105
 URL: https://issues.apache.org/jira/browse/IGNITE-23105
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{{CheckpointProgressImpl#onStartPartitionProcessing}} and 
{{CheckpointProgressImpl#onFinishPartitionProcessing}} don't work as intended 
for several reasons:
 * There's a race: we could call {{onFinish}} before {{onStart}} is called in a 
concurrent thread. This might happen if there's only a handful of dirty pages 
in each partition and there is more than one checkpoint thread. Basically, 
this protection doesn't work.
 * Even if that particular race didn't exist, this code still wouldn't work, 
because some of the pages could be added to the {{pageIdsToRetry}} map. That map will 
be processed later, when {{writePages}} is finished, meaning that we mark unfinished partitions as 
finished.
 * Due to the aforementioned bugs, I didn't bother including these methods in 
{{{}drainCheckpointBuffers{}}}. As a result, this method requires a fix too (a 
possible counting-based alternative is sketched below).
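For illustration, a counting-based approach that is insensitive to the ordering problem above: the expected page count per partition is fixed before any checkpoint thread starts, and a partition is only considered processed when its counter reaches zero (names are hypothetical, not the actual {{CheckpointProgressImpl}} code):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionProcessingTracker {
    private final Map<Integer, AtomicInteger> remainingPages = new ConcurrentHashMap<>();

    /** Initialized once, before any checkpoint thread starts writing. */
    void init(Map<Integer, Integer> dirtyPagesPerPartition) {
        dirtyPagesPerPartition.forEach(
                (partId, pages) -> remainingPages.put(partId, new AtomicInteger(pages)));
    }

    /** Called after a page is durably written, including pages written later from the retry set. */
    boolean onPageWritten(int partitionId) {
        // Returns true only when the partition is fully processed, regardless of thread interleaving.
        return remainingPages.get(partitionId).decrementAndGet() == 0;
    }
}
{code}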



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23103) RandomLruPageReplacementPolicy is not fully ported

2024-08-29 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23103:
--

 Summary: RandomLruPageReplacementPolicy is not fully ported
 Key: IGNITE-23103
 URL: https://issues.apache.org/jira/browse/IGNITE-23103
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


There should be a line
{code:java}
if (relRmvAddr == rndAddr || pinned || skip || (dirty && (checkpointPages == null || !checkpointPages.contains(fullId)))) { {code}
instead of
{code:java}
if (relRmvAddr == rndAddr || pinned || skip || dirty) { {code}
Due to this mistake we have several conditions that always evaluate to 
constants, namely
 * {{!dirty}} - always true
 * {{pageTs < dirtyTs && dirty && !storMeta}} - always false

Ideally, we should add tests that would cover this situation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-23084) Implement checkpoint buffer protection

2024-08-28 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-23084:
--

Assignee: Ivan Bessonov

> Implement checkpoint buffer protection
> --
>
> Key: IGNITE-23084
> URL: https://issues.apache.org/jira/browse/IGNITE-23084
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Current implementation of checkpoint allows for checkpoint buffer overflow. 
> We should port a part that prioritizes cp-buffer pages in checkpoint writer, 
> if it's close to overflow.
> Throttling should not be ported as of right now.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23084) Implement checkpoint buffer protection

2024-08-28 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23084:
--

 Summary: Implement checkpoint buffer protection
 Key: IGNITE-23084
 URL: https://issues.apache.org/jira/browse/IGNITE-23084
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


The current checkpoint implementation allows the checkpoint buffer to overflow. We 
should port the part that prioritizes cp-buffer pages in the checkpoint writer when 
it's close to overflow.

Throttling should not be ported right now.
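A rough sketch of the prioritization being ported, under the assumption of a simple "used/capacity" view of the checkpoint buffer (names and the threshold are illustrative, not the actual implementation):
{code:java}
class CheckpointWriterLoop {
    interface CheckpointBuffer {
        int usedPages();
        int capacity();
        Long pollPage(); // next page sitting in the checkpoint buffer, or null if empty
    }

    private static final double DANGER_RATIO = 0.75; // "close to overflow" threshold

    private final CheckpointBuffer buffer;

    CheckpointWriterLoop(CheckpointBuffer buffer) {
        this.buffer = buffer;
    }

    /** Picks the next page to write: cp-buffer pages first when the buffer is close to overflow. */
    void writeNextPage(Iterable<Long> regularDirtyPages) {
        if ((double) buffer.usedPages() / buffer.capacity() >= DANGER_RATIO) {
            Long cpPage = buffer.pollPage();
            if (cpPage != null) {
                writePage(cpPage);
                return;
            }
        }
        for (Long page : regularDirtyPages) {
            writePage(page);
            return; // a single page per call in this simplified loop
        }
    }

    private void writePage(long pageId) {
        // Write the page to its partition file (omitted).
    }
}
{code}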



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put

2024-08-26 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22878:
---
Description: 
h1. Results

I put it right here, because comments can be missed easily.
 * The main reason for the performance dips is that we locate both the raft log 
and the table data on the same storage device. We should test a configuration where 
they are separated.
 * The {{rocksdb}}-based log storage adds minor issues during its flush and 
compaction; it might cause 10-20% dips. It's not too critical, but it once 
again shows the downsides of the current implementation.
Reducing the number of threads that write SST files and compact them doesn't 
seem to do anything, although it's hard to say precisely. This part is not 
configurable, but I would investigate separately whether or not it would make 
sense to set those values to 1.
 * Nothing really changes when you disable fsync.
 * Table data checkpoints and compaction have the most impact. For some reason, 
the first checkpoint impacts performance the worst, maybe due to some kind of 
warmup.
Making checkpoints more frequent helps smooth out the graph a little.
Reducing the number of checkpoint threads and compaction threads also helps 
smooth out the graph; the effects are more visible. Checkpoints become longer, 
obviously, but still don't overlap in single-put KV tests even under high load.

What's implemented in current JIRA:
 * Basic logs of rocksdb compaction.
 * Basic logs of aipersist compaction, that should be expanded in 
https://issues.apache.org/jira/browse/IGNITE-23056.

h1. Description

Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

Benchmark: 
[https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java]
 
h1. Test environment

6 AWS VMs of type c5d.4xlarge:
 * vCPU    16
 * Memory    32
 * Storage    400 NVMe SSD
 * Network    up to 10 Gbps

h1. Test

Start 3 Ignite nodes (one node per host). Configuration:
 * raft.fsync=false
 * partitions=16
 * replicas=1

Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load 
threads and works with own key range. Parameters:
 * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=510 -p 
insertcount=500 -s}}
 * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}}
 * {{{}Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=1020 -p 
insertcount=500 -s{}

h1. Results

Results from each client are in the separate files (attached). 

From these files we can draw transactions-per-second graphs:

!cl1.png!!cl2.png!!cl3.png!

Take a look at these sinks. We need to investigate the cause of them.

  was:
h1. Results

I put it right here, because comments can be missed easily.
 * The main reason of performance dips is the fact that we locate both raft log 
and table data on the same storage device. We should test a configuration where 
they are separated.
 * {{rocksdb}} based log storage adds minor issues during its flush and 
compaction, it might cause 10-20% dips. It's not too critical, but it once 
again shows downsides of current implementation.
Reducing the number of threads that write SST files and compact them doesn't 
seem to do anything, although it's hard to say precisely. This part is not 
configurable, but I would investigate separately, whether or not it would make 
sense to set those values to 1.
 * Nothing really changes when you disable fsync.
 * Table data checkpoints and compaction have the most impact. For some reason, 
first checkpoint impacts the performance the worst, maybe due to some kind of a 
warmup.
Making checkpoints more frequent helps smoothing out the graph a little.
Reducing the number of checkpoint threads and compaction threads also helps 
smoothing out the graph, effects are more visible. Checkpoints become longer, 
obviously, but still don't overlap in single-put KV tests even under high load.

h1. Description

Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

Benchmark: 
[https://githu

[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put

2024-08-26 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22878:
---
Description: 
h1. Results

I put it right here, because comments can be missed easily.
 * The main reason of performance dips is the fact that we locate both raft log 
and table data on the same storage device. We should test a configuration where 
they are separated.
 * {{rocksdb}} based log storage adds minor issues during its flush and 
compaction, it might cause 10-20% dips. It's not too critical, but it once 
again shows downsides of current implementation.
Reducing the number of threads that write SST files and compact them doesn't 
seem to do anything, although it's hard to say precisely. This part is not 
configurable, but I would investigate separately, whether or not it would make 
sense to set those values to 1.
 * Nothing really changes when you disable fsync.
 * Table data checkpoints and compaction have the most impact. For some reason, 
first checkpoint impacts the performance the worst, maybe due to some kind of a 
warmup.
Making checkpoints more frequent helps smoothing out the graph a little.
Reducing the number of checkpoint threads and compaction threads also helps 
smoothing out the graph, effects are more visible. Checkpoints become longer, 
obviously, but still don't overlap in single-put KV tests even under high load.

h1. Description

Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

Benchmark: 
[https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java]
 
h1. Test environment

6 AWS VMs of type c5d.4xlarge:
 * vCPU    16
 * Memory    32
 * Storage    400 NVMe SSD
 * Network    up to 10 Gbps

h1. Test

Start 3 Ignite nodes (one node per host). Configuration:
 * raft.fsync=false
 * partitions=16
 * replicas=1

Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load 
threads and works with own key range. Parameters:
 * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=510 -p 
insertcount=500 -s}}
 * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}}
 * {{{}Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=1020 -p 
insertcount=500 -s{}

h1. Results

Results from each client are in the separate files (attached). 

From these files we can draw transactions-per-second graphs:

!cl1.png!!cl2.png!!cl3.png!

Take a look at these sinks. We need to investigate the cause of them.

  was:
h1. Results

I put it right here, because comments can be missed easily.
 * The main reason of performance dips is the fact that we locate both raft log 
and table data on the same storage device. We should test a configuration where 
they are separated.
 * {{rocksdb}} based log storage adds minor issues during its flush and 
compaction, it might cause 10-20% dips. It's not too critical, but it once 
again shows downsides of current implementation.
Reducing the number of threads that write SST files and compact them doesn't 
seem to do anything, although it's hard to say precisely. This part is not 
configurable, but I would investigate separately, whether or not it would make 
sense to set those values to 1.
 * Table data checkpoints and compaction have the most impact. For some reason, 
first checkpoint impacts the performance the worst, maybe due to some kind of a 
warmup.
Making checkpoints more frequent helps smoothing out the graph a little.
Reducing the number of checkpoint threads and compaction threads also helps 
smoothing out the graph, effects are more visible. Checkpoints become longer, 
obviously, but still don't overlap in single-put KV tests even under high load.

h1. Description

Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

Benchmark: 
[https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java]
 
h1. Test environment

6 AWS VMs of type c5d.4xlarge:
 * vCPU    16
 * Memory    32
 * Storage    400 NVMe SSD
 * Network    up to 10 Gbps

h1

[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put

2024-08-26 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22878:
---
Description: 
h1. Results

I put it right here, because comments can be missed easily.
 * The main reason of performance dips is the fact that we locate both raft log 
and table data on the same storage device. We should test a configuration where 
they are separated.
 * {{rocksdb}} based log storage adds minor issues during its flush and 
compaction, it might cause 10-20% dips. It's not too critical, but it once 
again shows downsides of current implementation.
Reducing the number of threads that write SST files and compact them doesn't 
seem to do anything, although it's hard to say precisely. This part is not 
configurable, but I would investigate separately, whether or not it would make 
sense to set those values to 1.
 * Table data checkpoints and compaction have the most impact. For some reason, 
first checkpoint impacts the performance the worst, maybe due to some kind of a 
warmup.
Making checkpoints more frequent helps smoothing out the graph a little.
Reducing the number of checkpoint threads and compaction threads also helps 
smoothing out the graph, effects are more visible. Checkpoints become longer, 
obviously, but still don't overlap in single-put KV tests even under high load.

h1. Description

Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

Benchmark: 
[https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java]
 
h1. Test environment

6 AWS VMs of type c5d.4xlarge:
 * vCPU    16
 * Memory    32
 * Storage    400 NVMe SSD
 * Network    up to 10 Gbps

h1. Test

Start 3 Ignite nodes (one node per host). Configuration:
 * raft.fsync=false
 * partitions=16
 * replicas=1

Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load 
threads and works with own key range. Parameters:
 * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=510 -p 
insertcount=500 -s}}
 * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}}
 * {{{}Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=1020 -p 
insertcount=500 -s{}

h1. Results

Results from each client are in the separate files (attached). 

From these files we can draw transactions-per-second graphs:

!cl1.png!!cl2.png!!cl3.png!

Take a look at these sinks. We need to investigate the cause of them.

  was:
Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a

Benchmark: 
[https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java]
 
h1. Test environment

6 AWS VMs of type c5d.4xlarge:
 * vCPU    16
 * Memory    32
 * Storage    400 NVMe SSD
 * Network    up to 10 Gbps

h1. Test

Start 3 Ignite nodes (one node per host). Configuration:
 * raft.fsync=false
 * partitions=16
 * replicas=1

Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load 
threads and works with own key range. Parameters:
 * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=510 -p 
insertcount=500 -s}}
 * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}}
 * {{Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
/opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
-p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
s

[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put

2024-08-23 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22878:
---
Reviewer: Aleksandr Polovtsev

> Periodic latency sinks on key-value KeyValueView#put
> 
>
> Key: IGNITE-22878
> URL: https://issues.apache.org/jira/browse/IGNITE-22878
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 3.0.0-beta2
>Reporter: Ivan Artiukhov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: 2024-08-01-11-36-02_192.168.208.148_kv_load.txt, 
> 2024-08-01-11-36-02_192.168.209.141_kv_load.txt, 
> 2024-08-01-11-36-02_192.168.209.191_kv_load.txt, cl1.png, cl2.png, cl3.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a
> Benchmark: 
> [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java]
>  
> h1. Test environment
> 6 AWS VMs of type c5d.4xlarge:
>  * vCPU    16
>  * Memory    32
>  * Storage    400 NVMe SSD
>  * Network    up to 10 Gbps
> h1. Test
> Start 3 Ignite nodes (one node per host). Configuration:
>  * raft.fsync=false
>  * partitions=16
>  * replicas=1
> Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load 
> threads and works with own key range. Parameters:
>  * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
> /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
> hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
> -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
> status.interval=1 -p partitions=16 -p insertstart=510 -p 
> insertcount=500 -s}}
>  * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
> /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
> hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
> -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
> status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 
> -s}}
>  * {{Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
> /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
> hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
> -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
> status.interval=1 -p partitions=16 -p insertstart=1020 -p 
> insertcount=500 -s
> h1. Results
> Results from each client are in the separate files (attached). 
> From these files we can draw transactions-per-second graphs:
> !cl1.png!!cl2.png!!cl3.png!
> Take a look at these sinks. We need to investigate the cause of them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-23056) Verbose logging of delta-files compaction

2024-08-22 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-23056:
--

 Summary: Verbose logging of delta-files compaction
 Key: IGNITE-23056
 URL: https://issues.apache.org/jira/browse/IGNITE-23056
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


For checkpoints we have a very extensive log message that shows the duration 
of each checkpoint's phase. We don't have that for the compactor.

In this Jira we need to implement that. The list of phases and statistics is at 
the developers' discretion.

As a bonus, we might want to print some values in microseconds instead of 
milliseconds, and not use fast timestamps (they don't have enough granularity). 
While we're doing that for compactor logs, we might as well update the checkpoint's 
logs.
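A minimal sketch of the timing part, assuming phase durations are measured with {{System.nanoTime()}} and reported in microseconds (illustrative only, not the actual compactor code):
{code:java}
import java.util.concurrent.TimeUnit;

class PhaseTimer {
    /** Runs a phase and returns its duration in microseconds, avoiding coarse "fast" timestamps. */
    static long runPhaseMicros(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        return TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long micros = runPhaseMicros(() -> { /* e.g. merge one delta file */ });
        System.out.println("Delta-file merge phase took " + micros + " us");
    }
}
{code}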



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22987) Log rocksdb flush events

2024-08-14 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22987:
---
Reviewer: Roman Puchkovskiy  (was: Philipp Shergalis)

> Log rocksdb flush events
> 
>
> Key: IGNITE-22987
> URL: https://issues.apache.org/jira/browse/IGNITE-22987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * {{org.apache.ignite.internal.rocksdb.flush.RocksDbFlushListener}} should 
> log its events and basic info about them (once per flush, we don't need an 
> individual log message per CF)
>  * We should add this listener to log storage, these logs will be most 
> valuable in it



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22987) Log rocksdb flush events

2024-08-14 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22987:
--

Assignee: Ivan Bessonov

> Log rocksdb flush events
> 
>
> Key: IGNITE-22987
> URL: https://issues.apache.org/jira/browse/IGNITE-22987
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> * {{org.apache.ignite.internal.rocksdb.flush.RocksDbFlushListener}} should 
> log its events and basic info about them (once per flush, we don't need an 
> individual log message per CF)
>  * We should add this listener to log storage, these logs will be most 
> valuable in it



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22987) Log rocksdb flush events

2024-08-14 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22987:
--

 Summary: Log rocksdb flush events
 Key: IGNITE-22987
 URL: https://issues.apache.org/jira/browse/IGNITE-22987
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


* {{org.apache.ignite.internal.rocksdb.flush.RocksDbFlushListener}} should log 
its events and basic info about them (once per flush, we don't need an 
individual log message per CF)
 * We should add this listener to the log storage; these logs will be most valuable 
there (a rough sketch of such a flush listener is shown below)
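For illustration only (this is not Ignite's {{RocksDbFlushListener}}): the plain RocksDB Java API exposes flush callbacks via {{AbstractEventListener}}, so a single log line per flush would look roughly like this:
{code:java}
import org.rocksdb.AbstractEventListener;
import org.rocksdb.FlushJobInfo;
import org.rocksdb.RocksDB;

class LoggingFlushListener extends AbstractEventListener {
    @Override
    public void onFlushCompleted(RocksDB db, FlushJobInfo flushJobInfo) {
        // One message per flush; per-CF details could be appended here if needed.
        System.out.println("RocksDB flush completed: " + flushJobInfo);
    }
}
{code}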



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put

2024-08-13 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22878:
--

Assignee: Ivan Bessonov

> Periodic latency sinks on key-value KeyValueView#put
> 
>
> Key: IGNITE-22878
> URL: https://issues.apache.org/jira/browse/IGNITE-22878
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 3.0.0-beta2
>Reporter: Ivan Artiukhov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: 2024-08-01-11-36-02_192.168.208.148_kv_load.txt, 
> 2024-08-01-11-36-02_192.168.209.141_kv_load.txt, 
> 2024-08-01-11-36-02_192.168.209.191_kv_load.txt, cl1.png, cl2.png, cl3.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a
> Benchmark: 
> [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java]
>  
> h1. Test environment
> 6 AWS VMs of type c5d.4xlarge:
>  * vCPU    16
>  * Memory    32
>  * Storage    400 NVMe SSD
>  * Network    up to 10 Gbps
> h1. Test
> Start 3 Ignite nodes (one node per host). Configuration:
>  * raft.fsync=false
>  * partitions=16
>  * replicas=1
> Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load 
> threads and works with own key range. Parameters:
>  * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
> /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
> hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
> -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
> status.interval=1 -p partitions=16 -p insertstart=510 -p 
> insertcount=500 -s}}
>  * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
> /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
> hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
> -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
> status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 
> -s}}
>  * {{Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P 
> /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p 
> hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 
> -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p 
> status.interval=1 -p partitions=16 -p insertstart=1020 -p 
> insertcount=500 -s
> h1. Results
> Results from each client are in the separate files (attached). 
> From these files we can draw transactions-per-second graphs:
> !cl1.png!!cl2.png!!cl3.png!
> Take a look at these sinks. We need to investigate the cause of them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22952) IgniteDeploymentException upon using Compute API under Java 21

2024-08-08 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22952:
---
Release Note: Fixed lambda serialization issues in code deployment for Java 
21

> IgniteDeploymentException upon using Compute API under Java 21
> --
>
> Key: IGNITE-22952
> URL: https://issues.apache.org/jira/browse/IGNITE-22952
> Project: Ignite
>  Issue Type: Bug
>  Components: compute
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.17
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Start a node via bin/ignite.sh
>  * Start the {{CacheAffinityExample}} on Java 21. The example is started with 
> the same JVM options which are used to start a node:
> {code:java}
> -DIGNITE_UPDATE_NOTIFIER=false
> -Xmx1g
> -Xms1g
> -DCONSISTENT_ID=1001
> --add-opens=java.base/jdk.internal.access=ALL-UNNAMED
> --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
> --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
> --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
> --add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED
> --add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED
> --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED
> --add-opens=java.base/java.io=ALL-UNNAMED
> --add-opens=java.base/java.net=ALL-UNNAMED
> --add-opens=java.base/java.nio=ALL-UNNAMED
> --add-opens=java.base/java.security.cert=ALL-UNNAMED
> --add-opens=java.base/java.util=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
> --add-opens=java.base/java.lang=ALL-UNNAMED
> --add-opens=java.base/java.lang.invoke=ALL-UNNAMED
> --add-opens=java.base/java.math=ALL-UNNAMED
> --add-opens=java.base/java.time=ALL-UNNAMED
> --add-opens=java.base/sun.security.ssl=ALL-UNNAMED
> --add-opens=java.base/sun.security.x509=ALL-UNNAMED
> --add-opens=java.sql/java.sql=ALL-UNNAMED{code}
> Expected behavior:
>  * The example finishes without errors
> Actual behavior:
>  * The example fails with the following exception in the example’s log:
> {code:java}
> [2024-04-17T08:09:43.27][INFO][main][GridDeploymentLocalStore] Class locally 
> deployed: class 
> org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda/0x7f7264515000
> [2024-04-17T08:09:43.384][WARNING][p2p-#78][GridDeploymentCommunication] 
> Failed to resolve class 
> [originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, 
> class=o.a.i.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000,
>  req=GridDeploymentRequest 
> [rsrcName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class,
>  ldrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, isUndeploy=false, 
> nodeIds=null]]
> java.lang.ClassNotFoundException: 
> org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000
> at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
> at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:534)
> at java.base/java.lang.Class.forName(Class.java:513)
> at 
> org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processResourceRequest(GridDeploymentCommunication.java:218)
> at 
> org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processDeploymentRequest(GridDeploymentCommunication.java:155)
> at 
> org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.access$000(GridDeploymentCommunication.java:55)
> at 
> org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication$1.onMessage(GridDeploymentCommunication.java:91)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:158)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:1164)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
> at java.base/java.lang.Thread.run(Thread.java:1583){code}
> The following exception is seen in the server node’s log:
> {code:java}
> [2024-04-17T08:09:43.391][INFO][pub-#77][GridDeploymentPerVersionStore] 
> Failed t

[jira] [Updated] (IGNITE-22952) IgniteDeploymentException upon using Compute API under Java 21

2024-08-07 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22952:
---
Description: 
* Start a node via bin/ignite.sh
 * Start the {{CacheAffinityExample}} on Java 21. The example is started with 
the same JVM options which are used to start a node:

{code:java}
-DIGNITE_UPDATE_NOTIFIER=false
-Xmx1g
-Xms1g
-DCONSISTENT_ID=1001
--add-opens=java.base/jdk.internal.access=ALL-UNNAMED
--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED
--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED
--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.security.cert=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.math=ALL-UNNAMED
--add-opens=java.base/java.time=ALL-UNNAMED
--add-opens=java.base/sun.security.ssl=ALL-UNNAMED
--add-opens=java.base/sun.security.x509=ALL-UNNAMED
--add-opens=java.sql/java.sql=ALL-UNNAMED{code}
Expected behavior:
 * The example finishes without errors

Actual behavior:
 * The example fails with the following exception in the example’s log:

{code:java}
[2024-04-17T08:09:43.27][INFO][main][GridDeploymentLocalStore] Class locally 
deployed: class 
org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda/0x7f7264515000
[2024-04-17T08:09:43.384][WARNING][p2p-#78][GridDeploymentCommunication] Failed 
to resolve class [originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, 
class=o.a.i.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000, 
req=GridDeploymentRequest 
[rsrcName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class,
 ldrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, isUndeploy=false, 
nodeIds=null]]
java.lang.ClassNotFoundException: 
org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000
at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
    at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:534)
at java.base/java.lang.Class.forName(Class.java:513)
at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processResourceRequest(GridDeploymentCommunication.java:218)
at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processDeploymentRequest(GridDeploymentCommunication.java:155)
at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.access$000(GridDeploymentCommunication.java:55)
at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication$1.onMessage(GridDeploymentCommunication.java:91)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:158)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:1164)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583){code}
The following exception is seen in the server node’s log:
{code:java}
[2024-04-17T08:09:43.391][INFO][pub-#77][GridDeploymentPerVersionStore] Failed 
to get resource from node [nodeId=37bf182b-b8f7-471e-ae26-7145048e19d1, 
clsLdrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, 
resName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class,
 
classLoadersHierarchy=org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader->jdk.internal.loader.ClassLoaders$AppClassLoader->jdk.internal.loader.ClassLoaders$PlatformClassLoader,
 msg=Requested resource not found (ignoring locally) 
[originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, 
resourceName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class,
 classLoaderId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1]]
[2024-04-17T08:09:43.392][S

[jira] [Created] (IGNITE-22952) IgniteDeploymentException upon using Compute API under Java 21

2024-08-07 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22952:
--

 Summary: IgniteDeploymentException upon using Compute API under 
Java 21
 Key: IGNITE-22952
 URL: https://issues.apache.org/jira/browse/IGNITE-22952
 Project: Ignite
  Issue Type: Bug
  Components: compute
Reporter: Ivan Bessonov
Assignee: Ivan Bessonov
 Fix For: 2.17


* Start a node via bin/ignite.sh

 * Start the CacheAffinityExample on Java 21. The example is started with the 
same JVM options which are used to start a node:

{code:java}
'-DIGNITE_UPDATE_NOTIFIER=false', '-Xmx1g', '-Xms1g', '-DCONSISTENT_ID=1001', 
'--add-opens=java.base/jdk.internal.access=ALL-UNNAMED', 
'--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED', 
'--add-opens=java.base/sun.nio.ch=ALL-UNNAMED', 
'--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED', 
'--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED', 
'--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED', 
'--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED', 
'--add-opens=java.base/java.io=ALL-UNNAMED', 
'--add-opens=java.base/java.net=ALL-UNNAMED', 
'--add-opens=java.base/java.nio=ALL-UNNAMED', 
'--add-opens=java.base/java.security.cert=ALL-UNNAMED', 
'--add-opens=java.base/java.util=ALL-UNNAMED', 
'--add-opens=java.base/java.util.concurrent=ALL-UNNAMED', 
'--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED', 
'--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED', 
'--add-opens=java.base/java.lang=ALL-UNNAMED', 
'--add-opens=java.base/java.lang.invoke=ALL-UNNAMED', 
'--add-opens=java.base/java.math=ALL-UNNAMED', 
'--add-opens=java.base/java.time=ALL-UNNAMED', 
'--add-opens=java.base/sun.security.ssl=ALL-UNNAMED', 
'--add-opens=java.base/sun.security.x509=ALL-UNNAMED', 
'--add-opens=java.sql/java.sql=ALL-UNNAMED'{code}

Expected behaviour:
 * The example finishes without errors

Actual behaviour:
 * The example fails with the following exception in the example’s log:

{code:java}
[2024-04-17T08:09:43.27][INFO][main][GridDeploymentLocalStore] Class locally 
deployed: class 
org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda/0x7f7264515000
 [2024-04-17T08:09:43.384][WARNING][p2p-#78][GridDeploymentCommunication] 
Failed to resolve class 
[originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, 
class=o.a.i.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000, 
req=GridDeploymentRequest 
[rsrcName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class,
 ldrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, isUndeploy=false, 
nodeIds=null]] java.lang.ClassNotFoundException: 
org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000
 at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
 at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
 at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526) at 
java.base/java.lang.Class.forName0(Native Method) at 
java.base/java.lang.Class.forName(Class.java:534) at 
java.base/java.lang.Class.forName(Class.java:513) at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processResourceRequest(GridDeploymentCommunication.java:218)
 at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processDeploymentRequest(GridDeploymentCommunication.java:155)
 at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.access$000(GridDeploymentCommunication.java:55)
 at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication$1.onMessage(GridDeploymentCommunication.java:91)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:158)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:1164)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
 at java.base/java.lang.Thread.run(Thread.java:1583){code}

The following exception is seen in the server node’s log:

{code:java}
[2024-04-17T08:09:43.391][INFO][pub-#77][GridDeploymentPerVersionStore] Failed 
to get resource from node [nodeId=37bf182b-b8f7-471e-ae26-7145048e19d1, 
clsLdrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, 
resName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class,
 
classLoadersHierarchy=org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader->jdk.internal.loader.ClassLoaders$AppClassLoader->jdk.internal.loader.ClassLoaders$PlatformClassLoa

[jira] [Updated] (IGNITE-22667) Optimise RocksDB sorted indexes

2024-08-01 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22667:
---
Description: 
We should write a comparator for RocksDB in C++; contact [~ibessonov] for 
references.

Examples of native comparators for RocksDB can be found here:

[https://github.com/facebook/rocksdb/blob/2e09a54c4fb82e88bcaa3e7cfa8ccf3635d5/java/src/test/java/org/rocksdb/NativeComparatorWrapperTest.java]

[https://github.com/facebook/rocksdb/blob/06c8afeff5b9fd38a79bdd4ba1bbb9df572c8096/java/rocksjni/native_comparator_wrapper_test.cc]

This is only applicable to sorted indexes, because they use a complicated 
algorithm for comparing binary tuples. The schema of these tuples is encoded 
in the CF name, so reading it is not an issue.

  was:We should write comparator for RocksDB in C++, contact [~ibessonov] for 
references


> Optimise RocksDB sorted indexes
> ---
>
> Key: IGNITE-22667
> URL: https://issues.apache.org/jira/browse/IGNITE-22667
> Project: Ignite
>  Issue Type: Improvement
>  Components: persistence
>Reporter: Philipp Shergalis
>Priority: Major
>  Labels: ignite-3
>
> We should write a comparator for RocksDB in C++; contact [~ibessonov] for 
> references.
> Examples of native comparators for RocksDB can be found here:
> [https://github.com/facebook/rocksdb/blob/2e09a54c4fb82e88bcaa3e7cfa8ccf3635d5/java/src/test/java/org/rocksdb/NativeComparatorWrapperTest.java]
> [https://github.com/facebook/rocksdb/blob/06c8afeff5b9fd38a79bdd4ba1bbb9df572c8096/java/rocksjni/native_comparator_wrapper_test.cc]
> This is only applicable to sorted indexes, because they use a complicated 
> algorithm for comparing binary tuples. The schema of these tuples is encoded 
> in the CF name, so reading it is not an issue.
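For reference, a rough sketch of the status quo that this ticket aims to optimise (class and comparator names are illustrative, not the actual Ignite code): today the sorted-index column family uses a Java-side comparator, so every key comparison crosses the JNI boundary; a native C++ comparator removes that per-comparison overhead.
{code:java}
import java.nio.ByteBuffer;

import org.rocksdb.AbstractComparator;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.ComparatorOptions;

// Java-side comparator shape that a native C++ comparator would replace.
class BinaryTupleJavaComparator extends AbstractComparator {
    BinaryTupleJavaComparator(ComparatorOptions options) {
        super(options);
    }

    @Override
    public String name() {
        return "binary-tuple-comparator";
    }

    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        // Placeholder: the real logic compares binary tuples according to the
        // column schema encoded in the CF name.
        return a.compareTo(b);
    }
}

class SortedIndexCfOptionsSketch {
    static ColumnFamilyOptions sortedIndexCfOptions() {
        return new ColumnFamilyOptions()
                .setComparator(new BinaryTupleJavaComparator(new ComparatorOptions()));
    }
}
{code}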



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22852) Remove extra leaf access while modifying data in B+Tree

2024-07-27 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22852:
--

 Summary: Remove extra leaf access while modifying data in B+Tree
 Key: IGNITE-22852
 URL: https://issues.apache.org/jira/browse/IGNITE-22852
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Currently, the put operation does the following when it finds a leaf node:
 # acquire read lock
 # check triangle invariant
 # find insertion point
 # release read lock
 # acquire write lock
 # check triangle invariant
 # find insertion point
 # insert/replace data
 # maybe release write lock

I'm simplifying it a little bit, but the fact is that steps 4-7 could 
potentially be skipped if we had an option to convert the read lock into a 
write lock.

There's already an API for that in the offheap lock class 
({{upgradeToWriteLock}}). We could use it, or add another method with "try" 
semantics that gives up if the write lock can't be acquired within a spin-wait 
with a limited number of iterations.

The same approach is possible for both "put" and "remove"; "invoke" requires 
some intermediate actions, so I wouldn't modify it for now.

As a result, we can expect slightly improved performance of put/remove 
operations. We will see it in benchmarks, most likely.
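To illustrate the idea with a self-contained sketch (this uses the JDK's {{StampedLock}} rather than the actual offheap lock): the read-phase work is done under a read lock, the lock is then converted to a write lock in place, and only if the conversion fails do we fall back to the current release/re-acquire path.
{code:java}
import java.util.concurrent.locks.StampedLock;

class LeafPutSketch {
    private final StampedLock lock = new StampedLock();

    void put(Runnable findInsertionPoint, Runnable insert) {
        long stamp = lock.readLock();
        try {
            findInsertionPoint.run();                        // steps 1-3

            long writeStamp = lock.tryConvertToWriteLock(stamp);
            if (writeStamp != 0L) {                          // upgrade succeeded: steps 4-7 are skipped
                stamp = writeStamp;
            } else {                                         // fallback: today's behaviour
                lock.unlockRead(stamp);
                stamp = lock.writeLock();
                findInsertionPoint.run();                    // re-check under the write lock
            }

            insert.run();                                    // step 8
        } finally {
            lock.unlock(stamp);
        }
    }
}
{code}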



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency

2024-07-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22819:
---
Description: 
Following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because 
of the difference of their clocks. {{evictIdempotentCommandsCache}} works 
differently on different nodes for the same raft commands.

The real problem here is that it might (or might not) call the 
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase 
local revision.
 
Revision is always local, it's never replicated. Revision mismatch leads to 
different evaluation of conditions in conditional updates and invokes. Simple 
example of such an issue would be a skipped configuration update on one or 
several nodes in cluster.
 
 What can we do about it:
 * make an alternative for {{removeAll}} that doesn't increase local revision
 * call {{removeAll}} even if the list is empty (see the sketch after this list)
 * never invalidate cache locally, but rather replicate cache invalidation with 
a special command
 * there's a TODO that says "clear this during compaction". That's a bad 
option, it would lead to either frequent compactions, or huge memory overheads
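A minimal sketch of the "call {{removeAll}} even if the list is empty" option (the surrounding method and helper are assumed, this is not the actual MetaStorageWriteHandler code); the point is simply that every node performs the same number of revision-bumping storage operations for the same replicated command.
{code:java}
// Hypothetical shape of the eviction path; only the unconditional removeAll matters.
void evictIdempotentCommandsCache(HybridTimestamp safeTime) {
    List<byte[]> commandIdStorageKeys = expiredCommandKeys(safeTime); // hypothetical helper

    // Previously the call was skipped for an empty list, so the local revision
    // advanced on some nodes and not on others. Calling it unconditionally keeps
    // revisions in lockstep across the replica group.
    storage.removeAll(commandIdStorageKeys, safeTime);
}
{code}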

  was:
Following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because 
of the difference of their clocks. {{evictIdempotentCommandsCache}} works 
differently on different nodes for the same raft commands.

The real problem here is that it might (or might not) call the 
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase 
local revision.
 
Revision is always local, it's never replicated. Revision mismatch leads to 
different evaluation of conditions in conditional updates and invokes. Simple 
example of such an issue would be a skipped configuration update on one or 
several nodes in cluster.
 
 What can we do about it:
 * make an alternative for {{removeAll}} that doesn't increase local revision
 * never invalidate cache locally, but rather replicate cache invalidation with 
a special command
 * there's a TODO that says "clear this during compaction". That's a 

[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency

2024-07-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22819:
---
Description: 
Following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because 
of the difference of their clocks. {{evictIdempotentCommandsCache}} works 
differently on different nodes for the same raft commands.

The real problem here is that it might (or might not) call the 
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase 
local revision.
 
Revision is always local, it's never replicated. Revision mismatch leads to 
different evaluation of conditions in conditional updates and invokes. Simple 
example of such an issue would be a skipped configuration update on one or 
several nodes in cluster.
 
 What can we do about it:
 * make an alternative for {{removeAll}} that doesn't increase local revision
 * never invalidate cache locally, but rather replicate cache invalidation with 
a special command
 * there's a TODO that says "clear this during compaction". That's a bad 
option, it would lead to either frequent compactions, or huge memory overheads

  was:
 

Following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because 
of the difference of their clocks. {{evictIdempotentCommandsCache}} works 
differently on different nodes for the same raft commands.

The real problem here is that it might (or might not) call the 
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase 
local revision.
 
Revision is always local, it's never replicated. Revision mismatch leads to 
different evaluation of conditions in conditional updates and invokes. Simple 
example of such an issue would be a skipped configuration update on one or 
several nodes in cluster.
 
 What can we do about it:
 * make an alternative for {{removeAll}} that doesn't increase local revision
 * never invalidate cache locally, but rather replicate cache invalidation with 
a special command
 * there's a TODO that says "clear this during compaction". That's a bad 
option, it would lead to either frequent

[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency

2024-07-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22819:
---
Description: 
 

Following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because 
of the difference of their clocks. {{evictIdempotentCommandsCache}} works 
differently on different nodes for the same raft commands.

The real problem here is that it might (or might not) call the 
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase 
local revision.
 
Revision is always local, it's never replicated. Revision mismatch leads to 
different evaluation of conditions in conditional updates and invokes. Simple 
example of such an issue would be a skipped configuration update on one or 
several nodes in cluster.
 
 What can we do about it:
 * make an alternative for {{removeAll}} that doesn't increase local revision
 * never invalidate cache locally, but rather replicate cache invalidation with 
a special command
 * there's a TODO that says "clear this during compaction". That's a bad 
option, it would lead to either frequent compactions, or huge memory overheads

  was:
 

Following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because 
of the difference of their clocks. {{evictIdempotentCommandsCache}} works 
differently on different nodes for the same raft commands.

The real problem here is that it might (or might not) call the 
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase 
local revision.
 
Revision is always local, it's never replicated. Revision mismatch leads to 
different evaluation of conditions in conditional updates and invokes. Simple 
example of such an issue would be a skipped configuration update on one of or 
several nodes in cluster.
 
 
 


> Metastorage revisions inconsistency
> ---
>
> Key: IGNITE-22819
> URL: https://issues.apache.org/jira/browse/IGNITE-22819
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: 

[jira] [Created] (IGNITE-22819) Metastorage revisions inconsistency

2024-07-24 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22819:
--

 Summary: Metastorage revisions inconsistency
 Key: IGNITE-22819
 URL: https://issues.apache.org/jira/browse/IGNITE-22819
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


 

Following situation might happen:
{code:java}
[2024-07-24T09:29:17,220][INFO 
][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=0, 
composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:220 +0300, logical=1, 
composite=112840052389969921], removedEntriesCount=0, cacheSize=240].

... 

[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].
[2024-07-24T09:29:17,257][INFO 
][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler]
 Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=0, 
composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp 
[physical=2024-07-24 09:29:17:257 +0300, logical=1, 
composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code}
Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because 
of the difference of their clocks. {{evictIdempotentCommandsCache}} works 
differently on different nodes for the same raft commands.

The real problem here is that it might (or might not) call the 
{{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase 
local revision.
 
Revision is always local, it's never replicated. A revision mismatch leads to 
different evaluation of conditions in conditional updates and invokes. A simple 
example of such an issue would be a skipped configuration update on one or 
several nodes in the cluster.
 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from

2024-07-15 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22736:
--

Fix Version/s: 3.0.0-beta2
 Assignee: Ivan Bessonov
   Labels: ignite-3  (was: )

> PartitionCommandsMarshallerImpl corrupts the buffer it reads from
> -
>
> Key: IGNITE-22736
> URL: https://issues.apache.org/jira/browse/IGNITE-22736
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> {{PartitionCommandsMarshallerImpl#unmarshall}} receives a buffer, that's 
> requested from the log manager, for example.
> The instance of byte buffer that it receives might be acquired from on-heap 
> cache of log entries. Modifying it would be
>  # not thread-safe, because multiple threads may start modifying it 
> concurrently
>  # illegal, because it stays in the cache for some time, and we basically 
> corrupt it by modifying it
> We shouldn't do that



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from

2024-07-15 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22736:
--

 Summary: PartitionCommandsMarshallerImpl corrupts the buffer it 
reads from
 Key: IGNITE-22736
 URL: https://issues.apache.org/jira/browse/IGNITE-22736
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{{PartitionCommandsMarshallerImpl#unmarshall}} receives a buffer that's 
requested from the log manager, for example.

The instance of byte buffer that it receives might be acquired from on-heap 
cache of log entries. Modifying it would be
 # not thread-safe, because multiple threads may start modifying it concurrently
 # illegal, because it stays in the cache for some time, and we basically 
corrupt it by modifying it

We shouldn't do that.
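Two standard remedies, sketched below (not necessarily the final fix): if unmarshalling only moves the buffer's position/limit, an independent {{duplicate()}} view is enough; if it patches the bytes themselves, the payload has to be copied first.
{code:java}
import java.nio.ByteBuffer;

final class SharedBufferAccess {
    /** Enough when unmarshalling only mutates position/limit: a view with its own cursor. */
    static ByteBuffer view(ByteBuffer cached) {
        return cached.duplicate();
    }

    /** Required when unmarshalling modifies the bytes: a private copy of the payload. */
    static ByteBuffer copy(ByteBuffer cached) {
        ByteBuffer copy = ByteBuffer.allocate(cached.remaining());
        copy.put(cached.duplicate()); // duplicate() so the cached buffer's position is untouched
        copy.flip();
        return copy;
    }

    private SharedBufferAccess() {
    }
}
{code}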



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22657) Investigate why ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions fails without sleep

2024-07-03 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22657:
--

 Summary: Investigate why 
ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions
 fails without sleep
 Key: IGNITE-22657
 URL: https://issues.apache.org/jira/browse/IGNITE-22657
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-21303) Exclude nodes in "error" state from manual group reconfiguration

2024-06-27 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-21303:
--

Assignee: Ivan Bessonov

> Exclude nodes in "error" state from manual group reconfiguration
> 
>
> Key: IGNITE-21303
> URL: https://issues.apache.org/jira/browse/IGNITE-21303
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> Instead of simply using the existing set of nodes as a baseline for new 
> assignments, we should either exclude peers in ERROR state from it, or force 
> data cleanup on such nodes. A third option: forbid such reconfiguration, 
> forcing the user to clear ERROR peers in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (IGNITE-22500) Remove unnecessary waits when creating an index

2024-06-26 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov resolved IGNITE-22500.

Resolution: Won't Fix

Regarding eliminating the BUILDING status from the catalog: we can't simply 
change a few lines, this task involves more changes. To my understanding, the 
following nuances are important:
 * ChangeIndexStatusTask should be changed. If we remove the 
REGISTERED->BUILDING transition, we won't have to update the catalog, which 
will lead to a small refactoring.
 * We would have to create {{CatalogEvent.INDEX_BUILDING}} event instead of 
updating the catalog.
 * This event will have nothing to do with catalog at this point, it should be 
renamed.
 * It will *not* be fired in a context of meta-storage watch execution, which 
might be a problem if listener implementations rely on it. Spoiler: they do.
 * Local recovery and other such stuff will be changed slightly, this part 
shouldn't be that hard.

Overall, I don't think we should do such an optimization in this issue 
specifically. It's not about "removing a wait that we don't need", it's about 
changing the internal protocol of index creation. I will file another Jira for 
that soon.

> Remove unnecessary waits when creating an index
> ---
>
> Key: IGNITE-22500
> URL: https://issues.apache.org/jira/browse/IGNITE-22500
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When creating an index with current defaults (DelayDuration=1sec, 
> MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 
> seconds on my machine (without concurrent transactions, on an empty table 
> that was just created).
> According to the design, we need to first wait for the REGISTERED state to 
> activate on all nodes, including the ones that are currently down; this is to 
> make sure that all transactions started on schema versions before the index 
> creation have finished before we start to build the index (this makes us 
> wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we 
> switch the index to the AVAILABLE state. This requires another wait of 
> DelayDuration+MaxClockSkew.
> Because of IGNITE-20378, in the second case we actually wait longer (for 
> additional IdleSafeTimePropagationPeriod+MaxClockSkew).
> The total of waits is thus 1.5+3=4.5sec. But index creation actually takes 
> 6-6.5 seconds. It looks like there are some additional delays (like 
> submitting to the Metastorage and executing its watches).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22500) Remove unnecessary waits when creating an index

2024-06-25 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859937#comment-17859937
 ] 

Ivan Bessonov commented on IGNITE-22500:


My thoughts on the topic:
 * _We have an additional switch from REGISTERED to BUILDING, which can in 
theory be eliminated from the catalog; it'll save us an additional second (DD 
is 500ms now)_
 * We can't lower DD for a specific status change, because it would violate the 
schema synchronization protocol. After waiting for "msSafeTime - DD - skew" (I 
don't remember the precise rules about clock skew) we rely on the fact that the 
catalog is up-to-date; breaking that invariant would lead to unforeseen 
consequences.
 * What we really need is:
 ** The ability to create indexes in the same DDL as the table itself. We do 
this implicitly for PK. For other indexes it's only a question of API
 ** For SQL scripts we could batch consecutive DDLs and create indexes at the 
same time as a table implicitly, which seems like an optimal choice. This way 
we don't need any special syntax
 ** Some DDL queries could be executed in parallel. Again, this seems more 
like a SQL issue to me.

> Remove unnecessary waits when creating an index
> ---
>
> Key: IGNITE-22500
> URL: https://issues.apache.org/jira/browse/IGNITE-22500
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When creating an index with current defaults (DelayDuration=1sec, 
> MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 
> seconds on my machine (without concurrent transactions, on an empty table 
> that was just created).
> According to the design, we need to first wait for the REGISTERED state to 
> activate on all nodes, including the ones that are currently down; this is to 
> make sure that all transactions started on schema versions before the index 
> creation have finished before we start to build the index (this makes us 
> wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we 
> switch the index to the AVAILABLE state. This requires another wait of 
> DelayDuration+MaxClockSkew.
> Because of IGNITE-20378, in the second case we actually wait longer (for 
> additional IdleSafeTimePropagationPeriod+MaxClockSkew).
> The total of waits is thus 1.5+3=4.5sec. But index creation actually takes 
> 6-6.5 seconds. It looks like there are some additional delays (like 
> submitting to the Metastorage and executing its watches).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22561) Get rid of ByteString in messages

2024-06-24 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22561:
--

 Summary: Get rid of ByteString in messages
 Key: IGNITE-22561
 URL: https://issues.apache.org/jira/browse/IGNITE-22561
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


Here I would include two types of improvements:
 * {{@Marshallable ByteString}} - this pattern became obsolete a long time 
ago. The {{ByteBuffer}} type is natively supported by the protocol, and it 
should eliminate unnecessary data copying, potentially making the system faster.
 * Pretty much the same thing, but for {{byte[]}}. It's used in classes like 
{{org.apache.ignite.internal.metastorage.dsl.Operation}}. If we migrate these 
properties to {{ByteBuffer}}, deserialization will become significantly 
faster, but in order to utilize it we would have to change the internal 
metastorage implementation a little bit (like optimizing memory usage in 
{{RocksDbKeyValueStorage#addDataToBatch}}).
If it requires too many changes, I propose doing it in a separate JIRA. My 
assumption is that it will not require too many changes, but we'll see.
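Purely as an illustration of the property migration (the message interface below is made up; only the {{@Marshallable}} / {{ByteString}} / {{ByteBuffer}} names come from this ticket):
{code:java}
import java.nio.ByteBuffer;

public interface ExampleCommandMessage {
    // Before: an extra marshalling step plus at least one byte[] copy.
    // @Marshallable
    // ByteString payload();

    // After: ByteBuffer is supported natively by the protocol, so on the read
    // path the payload can wrap the received array instead of being copied.
    ByteBuffer payload();
}
{code}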



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613
 ] 

Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 10:00 AM:
--

According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   Mode  Cnt     Score     Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  thrpt    5  2361.249 ±  66.884  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  thrpt    5    52.377 ±   3.769  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  thrpt    5  1713.443 ± 331.795  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  thrpt    5    14.916 ±   2.230  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  thrpt    5   833.372 ± 227.738  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  thrpt    5     3.281 ±   0.906  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  thrpt    5  2090.845 ± 792.226  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  thrpt    5    51.393 ±  16.872  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  thrpt    5  2188.459 ±  69.423  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  thrpt    5    52.705 ±   2.771  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  thrpt    5  2174.810 ±  61.331  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  thrpt    5    53.805 ±   1.000  ops/ms {code}
After:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   
Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  
thrpt    5  4389.765 ±   66.332  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  
thrpt    5    79.684 ±    0.965  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  
thrpt    5  2754.506 ±   58.151  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  
thrpt    5    17.435 ±    0.267  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  
thrpt    5  1066.381 ±   10.254  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  
thrpt    5     3.389 ±    0.688  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  
thrpt    5  2782.648 ±  173.791  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  
thrpt    5    69.952 ±    9.109  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  
thrpt    5  2752.568 ±   50.796  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  
thrpt    5    63.721 ±    2.902  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  
thrpt    5  2676.343 ± 1209.184  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  
thrpt    5    62.139 ±   17.144  ops/ms {code}
Short summary:
 * Depending on the number of byte arrays inside the message (which can't be 
optimized), marshalling became 0% to 85% faster according to the created 
benchmark, due to a combination of many different optimizations, such as
 ** avoiding the creation of serializers
 ** simpler and slightly faster byte buffers pool
 ** better binary UUID format
 ** low-level stuff in direct stream
 ** better {{writeVarInt}} / {{writeVarLong}}
 * If we take a look at the flamegraph, we could see that serialization itself 
is about 1.5-2.0 times slower than the following {{{}Arrays.copyOf{}}}, which 
is pretty good in my opinion.
 * Reading speed wasn't checked as thoroughly in this issue; I created another 
one: https://issues.apache.org/jira/browse/IGNITE-22559
Overall, reading speed doesn't depend on the size of individual byte buffers, 
because we simply wrap the original array. Other than that, the current 
optimizations show a 15%-35% increase in deserialization speed, due to
 ** {{...StreamImplV1}} optimizations
 ** faster {{readInt}} / {{readLong}}
 ** better binary UUID format

 * Further optimizations for reads are required. Here I mostly focused on 
writing speed. Reading speed turned out to be worse than writing speed for 
small commands, which I don't like.


was (Author: ibessonov):
According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:
{code:java}
Benchmark           

[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613
 ] 

Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 9:59 AM:
-

According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   Mode  Cnt     Score     Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  thrpt    5  2361.249 ±  66.884  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  thrpt    5    52.377 ±   3.769  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  thrpt    5  1713.443 ± 331.795  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  thrpt    5    14.916 ±   2.230  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  thrpt    5   833.372 ± 227.738  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  thrpt    5     3.281 ±   0.906  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  thrpt    5  2090.845 ± 792.226  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  thrpt    5    51.393 ±  16.872  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  thrpt    5  2188.459 ±  69.423  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  thrpt    5    52.705 ±   2.771  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  thrpt    5  2174.810 ±  61.331  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  thrpt    5    53.805 ±   1.000  ops/ms {code}
After:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   
Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  
thrpt    5  4389.765 ±   66.332  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  
thrpt    5    79.684 ±    0.965  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  
thrpt    5  2754.506 ±   58.151  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  
thrpt    5    17.435 ±    0.267  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  
thrpt    5  1066.381 ±   10.254  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  
thrpt    5     3.389 ±    0.688  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  
thrpt    5  2782.648 ±  173.791  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  
thrpt    5    69.952 ±    9.109  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  
thrpt    5  2752.568 ±   50.796  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  
thrpt    5    63.721 ±    2.902  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  
thrpt    5  2676.343 ± 1209.184  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  
thrpt    5    62.139 ±   17.144  ops/ms {code}
Short summary:
 * Depending on the number of byte arrays inside the message (which can't be 
optimized), marshalling became 0% to 85% faster according to the created 
benchmark, due to a combination of many different optimizations, such as
 ** avoiding the creation of serializers
 ** simpler and slightly faster byte buffers pool
 ** better binary UUID format
 ** low-level stuff in direct stream
 ** better {{writeVarInt}} / {{writeVarLong}}
 * If we take a look at the flamegraph, we could see that serialization itself 
is about 1.5-2.0 times slower than the following {{{}Arrays.copyOf{}}}, which 
is pretty good in my opinion.
 * Reading speed wasn't checked as thoroughly in this issue; I created another 
one: https://issues.apache.org/jira/browse/IGNITE-22559
Overall, reading speed doesn't depend on the size of individual byte buffers, 
because we simply wrap the original array. Other than that, the current 
optimizations show a 15%-35% increase in deserialization speed, due to
 ** {{...StreamImplV1}} optimizations
 ** faster {{readInt}} / {{readLong}}
 ** better binary UUID format

 *  
 * Further optimizations for reads are required. Here I mostly focused on 
writing speed. Reading speed turned out to be worse than writing speed for 
small commands, I don't like it.


was (Author: ibessonov):
According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:
{code:java}
Benchmark        

[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22544:
---
Reviewer: Philipp Shergalis

> Commands marshalling appears to be slow
> ---
>
> Key: IGNITE-22544
> URL: https://issues.apache.org/jira/browse/IGNITE-22544
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3, ignite3_performance
> Attachments: IGNITE-22544.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should benchmark the way we marshal commands using optimized marshaller 
> and make it faster. Some obvious places:
>  * byte buffers pool - we can replace the queue with a manual implementation 
> of a Treiber stack (see the sketch after this quote); it's trivial and 
> doesn't use as many CAS/volatile operations
>  * new serializers are allocated every time, but they can be put into static 
> final constants instead, or cached in fields of corresponding factories
>  * we can create a serialization factory per group, not per message, this way 
> we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
> like in Ignite 2, which would basically lead to static dispatch of 
> deserializer constructors and static access to serializers, instead of 
> dynamic dispatch (virtual call), which should be noticeably faster
>  * profiler might show other simple places, we must also compare 
> {{OptimizedMarshaller}} against other serialization algorithms in benchmarks
> EDIT: quick draft attached, it addresses points 1 and 2.
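A minimal sketch of the Treiber-stack buffer pool from the first bullet (assumed shape, not the committed implementation): a lock-free LIFO where both borrow and release are a single CAS loop on the top reference.
{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

final class ByteBufferStackPool {
    private static final int BUFFER_SIZE = 4096;

    private static final class Node {
        final ByteBuffer buf;
        Node next;

        Node(ByteBuffer buf) {
            this.buf = buf;
        }
    }

    private final AtomicReference<Node> top = new AtomicReference<>();

    ByteBuffer borrow() {
        Node head;
        do {
            head = top.get();
            if (head == null) {
                return ByteBuffer.allocate(BUFFER_SIZE); // pool is empty: allocate a fresh buffer
            }
        } while (!top.compareAndSet(head, head.next));

        head.buf.clear();
        return head.buf;
    }

    void release(ByteBuffer buf) {
        Node node = new Node(buf);
        Node head;
        do {
            head = top.get();
            node.next = head;
        } while (!top.compareAndSet(head, node));
    }
}
{code}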



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613
 ] 

Ivan Bessonov commented on IGNITE-22544:


According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   Mode  Cnt     Score     Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  thrpt    5  2361.249 ±  66.884  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  thrpt    5    52.377 ±   3.769  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  thrpt    5  1713.443 ± 331.795  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  thrpt    5    14.916 ±   2.230  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  thrpt    5   833.372 ± 227.738  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  thrpt    5     3.281 ±   0.906  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  thrpt    5  2090.845 ± 792.226  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  thrpt    5    51.393 ±  16.872  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  thrpt    5  2188.459 ±  69.423  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  thrpt    5    52.705 ±   2.771  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  thrpt    5  2174.810 ±  61.331  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  thrpt    5    53.805 ±   1.000  ops/ms {code}
After:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   
Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  
thrpt    5  4389.765 ±   66.332  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  
thrpt    5    79.684 ±    0.965  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  
thrpt    5  2754.506 ±   58.151  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  
thrpt    5    17.435 ±    0.267  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  
thrpt    5  1066.381 ±   10.254  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  
thrpt    5     3.389 ±    0.688  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  
thrpt    5  2782.648 ±  173.791  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  
thrpt    5    69.952 ±    9.109  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  
thrpt    5  2752.568 ±   50.796  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  
thrpt    5    63.721 ±    2.902  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  
thrpt    5  2676.343 ± 1209.184  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  
thrpt    5    62.139 ±   17.144  ops/ms {code}
Short summary:
 * Depending on the number of byte arrays inside the message (which can't be 
optimized), marshalling became 0% to 85% faster according to the created 
benchmark, thanks to a combination of many different optimizations, such as
 ** avoiding the creation of serializers
 ** a simpler and slightly faster byte buffers pool
 ** a better binary UUID format
 ** low-level optimizations in the direct stream
 ** better {{writeVarInt}} / {{writeVarLong}} (see the sketch after this list)
 * If we take a look at the flamegraph, we can see that serialization itself 
takes only about 1.5-2.0 times as long as the following {{{}Arrays.copyOf{}}}, 
which is pretty good in my opinion.
 * Reading speed wasn't checked as thoroughly in this issue; I created another 
one for that: https://issues.apache.org/jira/browse/IGNITE-22559
Overall, reading speed doesn't depend on the size of individual byte buffers, 
because we simply wrap the original array. Other than that, the current 
optimizations show a 15%-35% increase in deserialization speed, due to
 ** {{...StreamImplV1}} optimizations
 ** faster {{readInt}} / {{readLong}}
 ** a better binary UUID format
 * Further optimizations for reads are required. Here I mostly focused on 
writing speed; reading speed turned out to be worse than writing speed for 
small commands, which I don't like.
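
For context, the kind of variable-length integer encoding that {{writeVarInt}} / 
{{writeVarLong}} refer to can be sketched as a generic LEB128-style encoder; this 
is illustrative only, not the actual Ignite implementation:
{code:java}
/**
 * Illustrative LEB128-style variable-length encoding, the general technique behind
 * writeVarInt/writeVarLong-like methods (not the actual Ignite code).
 */
final class VarInts {
    /** Writes {@code val} into {@code buf} using 7 bits per byte; returns the new offset. */
    static int writeVarLong(byte[] buf, int off, long val) {
        while ((val & ~0x7FL) != 0) {
            buf[off++] = (byte) ((val & 0x7F) | 0x80); // high bit set: more bytes follow
            val >>>= 7;
        }

        buf[off++] = (byte) val; // last byte, high bit clear

        return off;
    }

    /** Reads a value written by {@link #writeVarLong}; offset tracking is omitted for brevity. */
    static long readVarLong(byte[] buf, int off) {
        long res = 0;

        for (int shift = 0; ; shift += 7) {
            byte b = buf[off++];

            res |= (long) (b & 0x7F) << shift;

            if ((b & 0x80) == 0) {
                return res;
            }
        }
    }
}
{code}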

 

 

> Commands marshalling appears to be slow
> ---
>
> Key: IGNITE-22544
> URL: https://issues.apache.org/jira/browse/IGNITE-22544
> 

[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613
 ] 

Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 8:25 AM:
-

According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   
Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  
thrpt    5  2361.249 ±   66.884  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  
thrpt    5    52.377 ±    3.769  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  
thrpt    5  1713.443 ±  331.795  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  
thrpt    5    14.916 ±    2.230  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  
thrpt    5   833.372 ±  227.738  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  
thrpt    5     3.281 ±    0.906  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  
thrpt    5  2090.845 ±  792.226  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  
thrpt    5    51.393 ±   16.872  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  
thrpt    5  2188.459 ±   69.423  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  
thrpt    5    52.705 ±    2.771  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  
thrpt    5  2174.810 ±   61.331  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  
thrpt    5    53.805 ±    1.000  ops/ms {code}
After:
{code:java}
Benchmark                                         (payloadSize)  (updateAll)   
Mode  Cnt     Score      Error   Units
UpdateCommandsMarshalingMicroBenchmark.marshal              128        false  
thrpt    5  4389.765 ±   66.332  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal              128         true  
thrpt    5    79.684 ±    0.965  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048        false  
thrpt    5  2754.506 ±   58.151  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             2048         true  
thrpt    5    17.435 ±    0.267  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192        false  
thrpt    5  1066.381 ±   10.254  ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal             8192         true  
thrpt    5     3.389 ±    0.688  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128        false  
thrpt    5  2782.648 ±  173.791  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal            128         true  
thrpt    5    69.952 ±    9.109  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048        false  
thrpt    5  2752.568 ±   50.796  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           2048         true  
thrpt    5    63.721 ±    2.902  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192        false  
thrpt    5  2676.343 ± 1209.184  ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal           8192         true  
thrpt    5    62.139 ±   17.144  ops/ms {code}
Short summary:
 * Depending on the number of byte arrays inside the message (which can't be 
optimized), marshalling became 0% to 85% faster according to the created 
benchmark, thanks to a combination of many different optimizations, such as
 ** avoiding the creation of serializers
 ** a simpler and slightly faster byte buffers pool
 ** a better binary UUID format
 ** low-level optimizations in the direct stream
 ** better {{writeVarInt}} / {{writeVarLong}}
 * If we take a look at the flamegraph, we can see that serialization itself 
takes only about 1.5-2.0 times as long as the following {{{}Arrays.copyOf{}}}, 
which is pretty good in my opinion.
 * Reading speed wasn't checked as thoroughly in this issue; I created another 
one for that: https://issues.apache.org/jira/browse/IGNITE-22559
Overall, reading speed doesn't depend on the size of individual byte buffers, 
because we simply wrap the original array. Other than that, the current 
optimizations show a 15%-35% increase in deserialization speed, due to
 ** {{...StreamImplV1}} optimizations
 ** faster {{readInt}} / {{readLong}}
 ** a better binary UUID format
 * Further optimizations for reads are required. Here I mostly focused on 
writing speed; reading speed turned out to be worse than writing speed for 
small commands, which I don't like.

 

 


was (Author: ibessonov):
According to 
{{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}.

Before:

 
{code:java}
Benchmar

[jira] [Updated] (IGNITE-22559) Optimize raft command deserialization

2024-06-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22559:
---
Description: 
# We should benchmark readInt / readLong against protobuf, since it uses the 
same binary format
 # We should create a much faster way of creating deserializers for messages. 
For example, we could generate "switch" statements like in Ignite 2 (see the 
sketch after this list), both for creating a message deserializer (compile-time 
generation) and for the message group deserialization factory (runtime 
generation, because we don't know the list of factories)
 # We should get rid of serializers and deserializers as separate classes and 
move the generated code into the message implementation. This way we save on 
allocations and we don't create a builder, which is also expensive; we should 
write directly into the fields of the target object like in Ignite 2.
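
A rough sketch of the per-group switch dispatch described in point 2, with 
hypothetical message types standing in for generated code:
{code:java}
/**
 * Illustrative shape of a generated per-group factory that dispatches with a switch
 * over the message type instead of map lookups and virtual calls
 * (hypothetical message types, not actual generated Ignite code).
 */
final class ExampleGroupDeserializationFactory {
    /** Instantiates an empty message of the given type within the group. */
    static Object createMessage(short messageType) {
        switch (messageType) {
            case 0:
                return new ExampleUpdateCommand();

            case 1:
                return new ExampleUpdateAllCommand();

            default:
                throw new IllegalArgumentException("Unknown message type: " + messageType);
        }
    }

    // Stand-ins for generated message implementations.
    static final class ExampleUpdateCommand {}

    static final class ExampleUpdateAllCommand {}
}
{code}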

> Optimize raft command deserialization
> -
>
> Key: IGNITE-22559
> URL: https://issues.apache.org/jira/browse/IGNITE-22559
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> # We should benchmark readInt / readLong against protobuf, since it uses the 
> same binary format
>  # We should create a much faster way of creating deserializers for messages. 
> For example, we could generate "switch" statements like in Ignite 2, both for 
> creating a message deserializer (compile-time generation) and for the message 
> group deserialization factory (runtime generation, because we don't know the 
> list of factories)
>  # We should get rid of serializers and deserializers as separate classes and 
> move the generated code into the message implementation. This way we save on 
> allocations and we don't create a builder, which is also expensive; we should 
> write directly into the fields of the target object like in Ignite 2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22559) Optimize raft command deserialization

2024-06-24 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22559:
--

 Summary: Optimize raft command deserialization
 Key: IGNITE-22559
 URL: https://issues.apache.org/jira/browse/IGNITE-22559
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-24 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22544:
--

Assignee: Ivan Bessonov

> Commands marshalling appears to be slow
> ---
>
> Key: IGNITE-22544
> URL: https://issues.apache.org/jira/browse/IGNITE-22544
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: IGNITE-22544.patch
>
>
> We should benchmark the way we marshal commands using optimized marshaller 
> and make it faster. Some obvious places:
>  * byte buffers pool - we can replace queue with a manual implementation of 
> Treiber stack, it's trivial and doesn't use as many CAS/volatile operations
>  * new serializers are allocated every time, but they can be put into static 
> final constants instead, or cached in fields of corresponding factories
>  * we can create a serialization factory per group, not per message, this way 
> we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
> like in Ignite 2, which would basically lead to static dispatch of 
> deserializer constructors and static access to serializers, instead of 
> dynamic dispatch (virtual call), which should be noticeably faster
>  * profiler might show other simple places, we must also compare 
> {{OptimizedMarshaller}} against other serialization algorithms in benchmarks
> EDIT: quick draft attached, it addresses points 1 and 2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22544:
---
Description: 
We should benchmark the way we marshal commands using optimized marshaller and 
make it faster. Some obvious places:
 * byte buffers pool - we can replace queue with a manual implementation of 
Treiber stack, it's trivial and doesn't use as many CAS/volatile operations
 * new serializers are allocated every time, but they can be put into static 
final constants instead, or cached in fields of corresponding factories
 * we can create a serialization factory per group, not per message, this way 
we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
like in Ignite 2, which would basically lead to static dispatch of deserializer 
constructors and static access to serializers, instead of dynamic dispatch 
(virtual call), which should be noticeably faster
 * profiler might show other simple places, we must also compare 
{{OptimizedMarshaller}} against other serialization algorithms in benchmarks

EDIT: quick draft attached, it addresses points 1 and 2.

  was:
We should benchmark the way we marshal commands using optimized marshaller and 
make it faster. Some obvious places:
 * byte buffers pool - we can replace queue with a manual implementation of 
Treiber stack, it's trivial and doesn't use as many CAS/volatile operations
 * new serializers are allocated every time, but they can be put into static 
final constants instead, or cached in fields of corresponding factories
 * we can create a serialization factory per group, not per message, this way 
we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
like in Ignite 2, which would basically lead to static dispatch of deserializer 
constructors and static access to serializers, instead of dynamic dispatch 
(virtual call), which should be noticeably faster
 * profiler might show other simple places, we must also compare 
{{OptimizedMarshaller}} against other serialization algorithms in benchmarks


> Commands marshalling appears to be slow
> ---
>
> Key: IGNITE-22544
> URL: https://issues.apache.org/jira/browse/IGNITE-22544
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: IGNITE-22544.patch
>
>
> We should benchmark the way we marshal commands using optimized marshaller 
> and make it faster. Some obvious places:
>  * byte buffers pool - we can replace queue with a manual implementation of 
> Treiber stack, it's trivial and doesn't use as many CAS/volatile operations
>  * new serializers are allocated every time, but they can be put into static 
> final constants instead, or cached in fields of corresponding factories
>  * we can create a serialization factory per group, not per message, this way 
> we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
> like in Ignite 2, which would basically lead to static dispatch of 
> deserializer constructors and static access to serializers, instead of 
> dynamic dispatch (virtual call), which should be noticeably faster
>  * profiler might show other simple places, we must also compare 
> {{OptimizedMarshaller}} against other serialization algorithms in benchmarks
> EDIT: quick draft attached, it addresses points 1 and 2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22544:
---
Attachment: IGNITE-22544.patch

> Commands marshalling appears to be slow
> ---
>
> Key: IGNITE-22544
> URL: https://issues.apache.org/jira/browse/IGNITE-22544
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
> Attachments: IGNITE-22544.patch
>
>
> We should benchmark the way we marshal commands using optimized marshaller 
> and make it faster. Some obvious places:
>  * byte buffers pool - we can replace queue with a manual implementation of 
> Treiber stack, it's trivial and doesn't use as many CAS/volatile operations
>  * new serializers are allocated every time, but they can be put into static 
> final constants instead, or cached in fields of corresponding factories
>  * we can create a serialization factory per group, not per message, this way 
> we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
> like in Ignite 2, which would basically lead to static dispatch of 
> deserializer constructors and static access to serializers, instead of 
> dynamic dispatch (virtual call), which should be noticeably faster
>  * profiler might show other simple places, we must also compare 
> {{OptimizedMarshaller}} against other serialization algorithms in benchmarks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22542:
---
Description: 
{{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
detect that we send a message to the local node, we handle it immediately in 
the same thread, which could be very bad for the throughput of the system.

"send"/"invoke" themselves appear to be slow as well, we should benchmark them. 
We should remove the instantiation of InetSocketAddress if possible, since 
resolving it takes time. Maybe we should create it unresolved or just cache it 
like in Ignite 2.
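
For the first point, a minimal sketch of handing loop-back messages to an 
inbound executor instead of running the handler on the sender thread 
(hypothetical names, not the actual DefaultMessagingService API):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/**
 * Sketch of offloading a loop-back message to an inbound executor instead of
 * invoking the handler on the sender thread (illustrative only).
 */
final class LocalDeliverySketch {
    private final ExecutorService inboundExecutor = Executors.newSingleThreadExecutor();

    /** Handler that the network inbound pipeline would normally invoke. */
    private final Consumer<Object> handler;

    LocalDeliverySketch(Consumer<Object> handler) {
        this.handler = handler;
    }

    void send(Object message, boolean isSelf) {
        if (isSelf) {
            // Do not process on the caller thread; hand it off just like a remote message.
            inboundExecutor.execute(() -> handler.accept(message));
        } else {
            // ... network path omitted ...
        }
    }
}
{code}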

  was:
{{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
detect that we send a message to the local node, we handle it immediately in 
the same thread, which could be very bad for the throughput of the system.

"send"/"invoke" themselves appear to be slow as well, we should benchmark them.


> Synchronous message handling on local node
> --
>
> Key: IGNITE-22542
> URL: https://issues.apache.org/jira/browse/IGNITE-22542
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
> detect that we send a message to the local node, we handle it immediately in 
> the same thread, which could be very bad for the throughput of the system.
> "send"/"invoke" themselves appear to be slow as well, we should benchmark 
> them. We should remove the instantiation of InetSocketAddress if possible, 
> since resolving it takes time. Maybe we should create it unresolved or just 
> cache it like in Ignite 2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22544) Commands marshalling appears to be slow

2024-06-20 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22544:
--

 Summary: Commands marshalling appears to be slow
 Key: IGNITE-22544
 URL: https://issues.apache.org/jira/browse/IGNITE-22544
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Bessonov


We should benchmark the way we marshal commands using optimized marshaller and 
make it faster. Some obvious places:
 * byte buffers pool - we can replace the queue with a manual implementation of 
a Treiber stack (see the sketch after this list); it's trivial and doesn't use 
as many CAS/volatile operations
 * new serializers are allocated every time, but they can be put into static 
final constants instead, or cached in fields of corresponding factories
 * we can create a serialization factory per group, not per message, this way 
we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, 
like in Ignite 2, which would basically lead to static dispatch of deserializer 
constructors and static access to serializers, instead of dynamic dispatch 
(virtual call), which should be noticeably faster
 * profiler might show other simple places, we must also compare 
{{OptimizedMarshaller}} against other serialization algorithms in benchmarks
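
A minimal sketch of the Treiber-stack buffer pool mentioned in the first bullet 
(illustrative names only, not the actual Ignite code):
{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative lock-free pool of byte buffers based on a Treiber stack. */
final class BufferPool {
    /** Immutable stack node. */
    private static final class Node {
        final ByteBuffer buf;
        final Node next;

        Node(ByteBuffer buf, Node next) {
            this.buf = buf;
            this.next = next;
        }
    }

    /** Top of the stack; the only shared mutable state. */
    private final AtomicReference<Node> top = new AtomicReference<>();

    private final int bufferSize;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Pops a pooled buffer, or allocates a new one if the pool is empty. */
    ByteBuffer borrow() {
        while (true) {
            Node cur = top.get();

            if (cur == null) {
                return ByteBuffer.allocate(bufferSize);
            }

            // A single CAS on the head, no per-element volatile bookkeeping as in a linked queue.
            if (top.compareAndSet(cur, cur.next)) {
                return cur.buf;
            }
        }
    }

    /** Pushes a buffer back onto the stack. */
    void release(ByteBuffer buf) {
        buf.clear();

        while (true) {
            Node cur = top.get();

            if (top.compareAndSet(cur, new Node(buf, cur))) {
                return;
            }
        }
    }
}
{code}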



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node

2024-06-20 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-22542:
---
Ignite Flags:   (was: Docs Required,Release Notes Required)

> Synchronous message handling on local node
> --
>
> Key: IGNITE-22542
> URL: https://issues.apache.org/jira/browse/IGNITE-22542
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
> detect that we send a message to the local node, we handle it immediately in 
> the same thread, which could be very bad for the throughput of the system.
> "send"/"invoke" themselves appear to be slow as well, we should benchmark 
> them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22542) Synchronous message handling on local node

2024-06-20 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22542:
--

 Summary: Synchronous message handling on local node
 Key: IGNITE-22542
 URL: https://issues.apache.org/jira/browse/IGNITE-22542
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we 
detect that we send a message to the local node, we handle it immediately in 
the same thread, which could be very bad for the throughput of the system.

"send"/"invoke" themselves appear to be slow as well, we should benchmark them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22500) Remove unnecessary waits when creating an index

2024-06-19 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22500:
--

Assignee: Ivan Bessonov

> Remove unnecessary waits when creating an index
> ---
>
> Key: IGNITE-22500
> URL: https://issues.apache.org/jira/browse/IGNITE-22500
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When creating an index with current defaults (DelayDuration=1sec, 
> MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 
> seconds on my machine (without concurrent transactions, on an empty table 
> that was just created).
> According to the design, we need to first wait for the REGISTERED state to 
> activate on all nodes, including the ones that are currently down; this is to 
> make sure that all transactions started on schema versions before the index 
> creation have finished before we start to build the index (this makes us 
> waiting for DelayDuration+MaxClockSkew). Then, after the build finishes, we 
> switch the index to the AVAILABLE state. This requires another wait of 
> DelayDuration+MaxClockSkew.
> Because of IGNITE-20378, in the second case we actually wait longer (for 
> additional IdleSafeTimePropagationPeriod+MaxClockSkew).
> The total of waits is thus 1.5+3=4.5sec. But index creation actually takes 
> 6-6.5 seconds. It looks like there are some additional delays (like 
> submitting to the Metastorage and executing its watches).
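
For reference, the waits described above add up as follows (a rough breakdown 
based on the figures quoted in the description):
{code:java}
// Defaults: DelayDuration = 1 s, MaxClockSkew = 0.5 s, IdleSafeTimePropagationPeriod = 1 s.
//
// Wait for REGISTERED to activate:  DelayDuration + MaxClockSkew                     = 1.5 s
// Wait to switch to AVAILABLE:      DelayDuration + MaxClockSkew
//                                   + IdleSafeTimePropagationPeriod + MaxClockSkew
//                                     (extra wait caused by IGNITE-20378)            = 3.0 s
// Expected total of waits                                                            = 4.5 s
// Observed index creation time: 6-6.5 s, i.e. roughly 1.5-2 s of additional delays.
{code}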



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance

2024-06-19 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-21661:
---
Reviewer: Kirill Tkalenko

> Test scenario where all stable nodes are lost during a partially completed 
> rebalance
> 
>
> Key: IGNITE-21661
> URL: https://issues.apache.org/jira/browse/IGNITE-21661
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following case is possible:
>  * Nodes A, B and C are assigned to a partition
>  * B and C go offline
>  * new distribution is A, D and E
>  * EDIT: rebalance can only be started with one more "resetPartitions"
>  * full state transfer from A to D is completed
>  * full state transfer from A to E is not
>  * A goes offline
>  * we perform "resetPartitions"
> Ideally, we should use D as a new leader somehow, but the bare minimum should 
> be a partition that is functional, maybe an empty one. We should test this case.
>  
> This might be a good place to add more tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (IGNITE-22502) Change default DelayDuration to 500ms

2024-06-19 Thread Ivan Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-22502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov reassigned IGNITE-22502:
--

Assignee: Ivan Bessonov

> Change default DelayDuration to 500ms
> -
>
> Key: IGNITE-22502
> URL: https://issues.apache.org/jira/browse/IGNITE-22502
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: ignite-3
>
> When executing a DDL, we must wait for DelayDuration+MaxClockSkew. 
> DelayDuration for small clusters (which will probably be the usual mode of 
> operation) does not need to be long, so it makes sense to lower the default 
> from 1 second to 0.5 second.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IGNITE-22509) Deadlock during the node stop

2024-06-14 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-22509:
--

 Summary: Deadlock during the node stop
 Key: IGNITE-22509
 URL: https://issues.apache.org/jira/browse/IGNITE-22509
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Bessonov


{code:java}
"%itcskvt_n_1%Raft-Group-Client-1@51623" prio=5 tid=0x4a6e nid=NA waiting for 
monitor entry
  java.lang.Thread.State: BLOCKED
     waiting for main@1 to release lock on <0xca23> (a 
org.apache.ignite.internal.app.LifecycleManager)
      at 
org.apache.ignite.internal.app.LifecycleManager.lambda$allComponentsStartFuture$1(LifecycleManager.java:130)
      at 
org.apache.ignite.internal.app.LifecycleManager$$Lambda$2852.843214322.accept(Unknown
 Source:-1)
      at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
      at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
      at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
      at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl$$Lambda$5439.1444714785.run(Unknown
 Source:-1)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
      at java.util.concurrent.FutureTask.run(FutureTask.java:-1)
      at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
      at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      at java.lang.Thread.run(Thread.java:829)
 {code}
Holds busy lock in {{{}RaftGroupServiceImpl.sendWithRetry{}}}.
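Taken together with the second trace below, this looks like a lock-ordering 
deadlock: this thread holds the busy lock and waits for the 
{{LifecycleManager}} monitor, while the main thread holds that monitor and 
waits for the busy lock to drain during shutdown. A minimal, self-contained 
sketch of the same pattern (hypothetical classes, not the Ignite code):
{code:java}
import java.util.concurrent.Semaphore;

/** Minimal reproduction of the lock-ordering pattern above (hypothetical classes). */
final class DeadlockSketch {
    private final Object monitor = new Object();          // plays the role of the LifecycleManager monitor
    private final Semaphore busyLock = new Semaphore(1);   // plays the role of IgniteSpinBusyLock

    /** Raft client side: enters the busy section, then needs the component monitor in a callback. */
    void clientThread() throws InterruptedException {
        busyLock.acquire();
        try {
            synchronized (monitor) {
                // ... complete the future, touch component state ...
            }
        } finally {
            busyLock.release();
        }
    }

    /** Node stop side: holds the component monitor, then blocks the busy lock to drain it. */
    void mainThread() throws InterruptedException {
        synchronized (monitor) {
            busyLock.acquire(); // deadlocks if clientThread() is inside its busy section
            busyLock.release();
        }
    }
}
{code}
The main thread's side of the deadlock is shown in the next trace.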
{code:java}
"main@1" prio=5 tid=0x1 nid=NA sleeping
  java.lang.Thread.State: TIMED_WAITING
     blocks %itcskvt_n_1%Raft-Group-Client-1@51623
      at java.lang.Thread.sleep(Thread.java:-1)
      at 
org.apache.ignite.internal.util.IgniteSpinReadWriteLock.writeLock(IgniteSpinReadWriteLock.java:255)
      at 
org.apache.ignite.internal.util.IgniteSpinBusyLock.block(IgniteSpinBusyLock.java:68)
      at 
org.apache.ignite.internal.raft.RaftGroupServiceImpl.shutdown(RaftGroupServiceImpl.java:491)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageServiceContext.close(MetaStorageServiceContext.java:75)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageServiceImpl.close(MetaStorageServiceImpl.java:272)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5148.891107.accept(Unknown
 Source:-1)
      at 
org.apache.ignite.internal.util.IgniteUtils.cancelOrConsume(IgniteUtils.java:967)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.lambda$stopAsync$13(MetaStorageManagerImpl.java:452)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5141.633101377.close(Unknown
 Source:-1)
      at 
org.apache.ignite.internal.util.IgniteUtils.lambda$closeAllManually$1(IgniteUtils.java:611)
      at 
org.apache.ignite.internal.util.IgniteUtils$$Lambda$4822.1427077270.accept(Unknown
 Source:-1)
      at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
      at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
      at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
      at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
      at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
      at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
      at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
      at 
org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:609)
      at 
org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:643)
      at 
org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.stopAsync(MetaStorageManagerImpl.java:449)
      at 
org.apache.ignite.internal.util.IgniteUtils.lambda$stopAsync$6(IgniteUtils.java:1213)
      at 
org.apache.ignite.internal.util.IgniteUtils$$Lambda$5013.753691797.apply(Unknown
 Source:-1)
      at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
      at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipe
