[jira] [Commented] (IGNITE-23413) Catalog compaction. Component to determine minimum catalog version required by rebalance.
[ https://issues.apache.org/jira/browse/IGNITE-23413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895644#comment-17895644 ] Ivan Bessonov commented on IGNITE-23413: What I think needs to be done: * We should fix the current local meta-storage version and use it for further reads. * We should read all pending and planned assignments; there's a timestamp stored in each of them. We calculate the minimal one. * If there are no pending or planned assignments, we might return the timestamp that's associated with the meta-storage revision. This approach almost works, but there are situations with nuances. These are the cases where we're in the middle of some operation. Let's examine them: * A zone is created/altered, but assignments are not yet saved. This is the case when the assignments timestamp is below the latest zone's timestamp. ** For ALTER, returning an older timestamp from assignments is not a problem; it'll eventually become more recent. ** For CREATE, we should probably determine whether assignments are not yet saved, and use the timestamp from the catalog. * A list of data nodes is updated, but assignments are not yet re-calculated because of a timeout. ** The current code uses the "latest" catalog state when it transforms data nodes into assignments, so it is safe to use the timestamp of the latest catalog version. There's a Jira that aims to fix it: https://issues.apache.org/jira/browse/IGNITE-22723. This means that the current approach might not work in the future. Anyway, considering everything above, there's one situation that we must keep in mind: * The DZ is updated at catalog version 15. * Assignments are calculated for that exact catalog version. * Nothing changes for a long time, several days for example. * During that time, the catalog version increases and becomes, for example, 75. If nothing changes, we should be able to remove versions 15-74, because the DZ settings from versions 15 and 75 are identical. It seems like the proposed algorithm works exactly as we need. > Catalog compaction. Component to determine minimum catalog version required > by rebalance. > - > > Key: IGNITE-23413 > URL: https://issues.apache.org/jira/browse/IGNITE-23413 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Pereslegin >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Each rebalance procedure uses specific catalog version, it "holds" the > timestamp corresponding to the latest (at the moment of rebalancing start) > version of the catalog > To be able safely perform catalog compaction, we need to design and implement > a component that can determine the minimum version required for active > rebalances (to avoid deleting this version during compaction). > {code:java} > interface RebalanceMinimumRequiredTimeProvider { > /** > * Returns the minimum time required for rebalance, > * or current timestamp if there are no active > * rebalances and there is a guarantee that all rebalances > * launched in the future will use catalog version > * corresponding to the current time or greater. > */ > long minimumRequiredTime(); > } > {code} > The component can be either global or local (whichever is easier to > implement). This means that the compaction procedure can call the component > on all nodes in the cluster and calculate the minimum. > The component must be able to track rebalances that may be triggered during > "replay" of the metastorage raft log. > The component should return only monotonically increasing values. -- This message was sent by Atlassian Jira (v8.20.10#820010)
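As an illustration of the algorithm sketched in the comment above, here is a minimal, hypothetical Java sketch. The {{AssignmentsReader}} and {{MetaStorageView}} interfaces are illustrative stand-ins (not actual Ignite 3 APIs), and the provider interface is repeated only to keep the example self-contained: fix the local meta-storage revision, take the minimal timestamp across pending and planned assignments, and fall back to the timestamp of that revision when there are none.
{code:java}
import java.util.OptionalLong;
import java.util.stream.LongStream;

// Interface from the issue description, repeated for self-containment.
interface RebalanceMinimumRequiredTimeProvider {
    long minimumRequiredTime();
}

// Illustrative stand-ins, not actual Ignite 3 APIs.
interface AssignmentsReader {
    LongStream pendingTimestamps(long revision);

    LongStream plannedTimestamps(long revision);
}

interface MetaStorageView {
    long currentRevision();

    long timestampOf(long revision);
}

class RebalanceMinimumRequiredTimeProviderImpl implements RebalanceMinimumRequiredTimeProvider {
    private final AssignmentsReader assignments;
    private final MetaStorageView metaStorage;

    RebalanceMinimumRequiredTimeProviderImpl(AssignmentsReader assignments, MetaStorageView metaStorage) {
        this.assignments = assignments;
        this.metaStorage = metaStorage;
    }

    @Override
    public long minimumRequiredTime() {
        // Fix the current local meta-storage revision and use it for all further reads.
        long revision = metaStorage.currentRevision();

        // Minimal timestamp across all pending and planned assignments at that revision.
        OptionalLong minAssignmentTime = LongStream.concat(
                assignments.pendingTimestamps(revision),
                assignments.plannedTimestamps(revision))
            .min();

        // If there are none, fall back to the timestamp associated with the fixed revision.
        return minAssignmentTime.orElse(metaStorage.timestampOf(revision));
    }
}
{code}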
[jira] [Updated] (IGNITE-23582) Xmx and GC options are ignored on startup
[ https://issues.apache.org/jira/browse/IGNITE-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23582: --- Labels: ignite-3 (was: ) > Xmx and GC options are ignored on startup > - > > Key: IGNITE-23582 > URL: https://issues.apache.org/jira/browse/IGNITE-23582 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Zlenko >Assignee: Ivan Zlenko >Priority: Critical > Labels: ignite-3 > Fix For: 3.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Even though we can set values for Xmx and GC options in vars.env they are not > applied to the service. > We need to fix this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23588) B+Tree corruption during concurrent removes
[ https://issues.apache.org/jira/browse/IGNITE-23588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23588: --- Description: {{ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true}} fails on TC sometimes. Can be reproduced locally. In order to make it faster, data region can be reduced to 32Mb, number of threads to 16 and number of keys to about 8000 (I used {{{}threads*50{}}}). It's not clear how precisely it happens, but during a remove the {{needReplaceInner}} logic does not work as it should sometimes, leading to an inner node that holds an obsolete key. Must be fixed in both Ignite 2 and Ignite 3. {code:java} [org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()] org.opentest4j.AssertionFailedError: Removed row: 683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()] org.opentest4j.AssertionFailedError: Removed row: 683 at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2918) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2844) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2787) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.compare(BplusTree.java:5748) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.findInsertionPoint(BplusTree.java:5652) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run0(BplusTree.java:398) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6422) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run(BplusTree.java:370) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6398) at app//org.apache.ignite.internal.pagememory.util.PageHandler.readPage(PageHandler.java:157) at app//org.apache.ignite.internal.pagememory.datastructure.DataStructure.read(DataStructure.java:391) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.read(BplusTree.java:6639) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2300) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.doRemove(BplusTree.java:2238) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.remove(BplusTree.java:2067) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest.lambda$doTestMassiveRemove$0(AbstractBplusTreePageMemoryTest.java:894) at app//org.apache.ignite.internal.testframework.IgniteTestUtils.lambda$runMultiThreaded$2(IgniteTestUtils.java:569) at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) {code} was: ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true fails on TC sometimes. Can be reproduced locally. 
In order to make it faster, data region can be reduced to 32Mb, number of threads to 16 and number of keys to about 8000 (I used {{{}threads*50{}}}). It's not clear how precisely it happens, but during a remove the {{needReplaceInner}} logic does not work as it should sometimes, leading to an inner node that holds an obsolete key. Must be fixed in both Ignite 2 and Ignite 3. {code:java} [org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()] org.opentest4j.AssertionFailedError: Removed row: 683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()] org.opentest4j.AssertionFailedError: Removed row: 683 at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupR
[jira] [Created] (IGNITE-23588) B+Tree corruption during concurrent removes
Ivan Bessonov created IGNITE-23588: -- Summary: B+Tree corruption during concurrent removes Key: IGNITE-23588 URL: https://issues.apache.org/jira/browse/IGNITE-23588 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov ItBplusTreePersistentPageMemoryTest#testMassiveRemove2_true fails on TC sometimes. Can be reproduced locally. In order to make it faster, data region can be reduced to 32Mb, number of threads to 16 and number of keys to about 8000 (I used {{{}threads*50{}}}). It's not clear how precisely it happens, but during a remove the {{needReplaceInner}} logic does not work as it should sometimes, leading to an inner node that holds an obsolete key. Must be fixed in both Ignite 2 and Ignite 3. {code:java} [org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()] org.opentest4j.AssertionFailedError: Removed row: 683[11:44:19][org.apache.ignite.internal.pagememory.tree.persistence.ItBplusTreePersistentPageMemoryTest.testMassiveRemove2_true()] org.opentest4j.AssertionFailedError: Removed row: 683 at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.checkNotRemoved(AbstractBplusTreePageMemoryTest.java:2948) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2965) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$LongInnerIo.getLookupRow(AbstractBplusTreePageMemoryTest.java:2918) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2844) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest$TestTree.compare(AbstractBplusTreePageMemoryTest.java:2787) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.compare(BplusTree.java:5748) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.findInsertionPoint(BplusTree.java:5652) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run0(BplusTree.java:398) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6422) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$Search.run(BplusTree.java:370) at app//org.apache.ignite.internal.pagememory.tree.BplusTree$GetPageHandler.run(BplusTree.java:6398) at app//org.apache.ignite.internal.pagememory.util.PageHandler.readPage(PageHandler.java:157) at app//org.apache.ignite.internal.pagememory.datastructure.DataStructure.read(DataStructure.java:391) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.read(BplusTree.java:6639) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2300) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.removeDown(BplusTree.java:2320) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.doRemove(BplusTree.java:2238) at app//org.apache.ignite.internal.pagememory.tree.BplusTree.remove(BplusTree.java:2067) at app//org.apache.ignite.internal.pagememory.tree.AbstractBplusTreePageMemoryTest.lambda$doTestMassiveRemove$0(AbstractBplusTreePageMemoryTest.java:894) at app//org.apache.ignite.internal.testframework.IgniteTestUtils.lambda$runMultiThreaded$2(IgniteTestUtils.java:569) at java.base@11.0.17/java.lang.Thread.run(Thread.java:834) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23549) MetaStorageListener doesn't flush on snapshot creation
Ivan Bessonov created IGNITE-23549: -- Summary: MetaStorageListener doesn't flush on snapshot creation Key: IGNITE-23549 URL: https://issues.apache.org/jira/browse/IGNITE-23549 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Just like with partitions, we need to force flushing of metastorage data to the storage, so that the log won't be truncated before the data is persisted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
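A minimal sketch of the intended behavior, assuming a storage that exposes an asynchronous flush and a raft-style snapshot callback; the names below are illustrative and do not reflect the actual MetaStorageListener API:
{code:java}
import java.util.concurrent.CompletableFuture;

// Illustrative stand-in for a storage with an asynchronous flush.
interface KeyValueStorage {
    CompletableFuture<Void> flush();
}

class SnapshotAwareListener {
    private final KeyValueStorage storage;

    SnapshotAwareListener(KeyValueStorage storage) {
        this.storage = storage;
    }

    /** Called by the raft machinery when a snapshot is taken. */
    CompletableFuture<Void> onSnapshotSave() {
        // Force the data to disk before the snapshot is acknowledged,
        // so that the raft log can be truncated safely afterwards.
        return storage.flush();
    }
}
{code}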
[jira] [Created] (IGNITE-23547) Limit meta-storage log storage size
Ivan Bessonov created IGNITE-23547: -- Summary: Limit meta-storage log storage size Key: IGNITE-23547 URL: https://issues.apache.org/jira/browse/IGNITE-23547 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Technically, this issue affects all raft logs, but let's start with the meta-storage. Given the constant background updates of meta-storage data, the log always keeps growing. The resulting log size might easily exceed several gigabytes if the cluster has been running for a few hours, which is too much for service data. We should make raft snapshot frequency configurable and provide a more frequent default. Also, it would be nice to come up with size limits, as we do for the WAL in Ignite 2.x. -- This message was sent by Atlassian Jira (v8.20.10#820010)
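A rough sketch of the size-limit idea, analogous to WAL limits in Ignite 2.x; the threshold value and the {{LogStorage}} interface are assumptions for illustration only:
{code:java}
// Illustrative stand-in: reports the on-disk size of a raft log.
interface LogStorage {
    long sizeOnDiskBytes();
}

class LogSizeWatcher {
    /** Assumed default limit; the real value should be configurable. */
    private static final long MAX_LOG_SIZE_BYTES = 512L * 1024 * 1024;

    private final LogStorage log;
    private final Runnable snapshotTrigger;

    LogSizeWatcher(LogStorage log, Runnable snapshotTrigger) {
        this.log = log;
        this.snapshotTrigger = snapshotTrigger;
    }

    /** Invoked periodically; forces a raft snapshot so that the log prefix can be truncated. */
    void check() {
        if (log.sizeOnDiskBytes() > MAX_LOG_SIZE_BYTES) {
            snapshotTrigger.run();
        }
    }
}
{code}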
[jira] [Created] (IGNITE-23550) Test and optimize metastorage snapshot transfer and recovery speed for new nodes
Ivan Bessonov created IGNITE-23550: -- Summary: Test and optimize metastorage snapshot transfer and recovery speed for new nodes Key: IGNITE-23550 URL: https://issues.apache.org/jira/browse/IGNITE-23550 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Test and optimize metastorage snapshot transfer and recovery speed for new nodes. Let's assume that we have a 100Mb+ meta-storage snapshot and 100k+ entries in the raft log replicated as log entries. How long would it take for a new node to join the cluster under these conditions? Will something break? What can we do to make it work? The goal is that the joining process should work for long-running clusters. It should be reasonably fast as well: less than 10 seconds for sure, depending, of course, on network capabilities. No timeout errors should occur if it takes more than 10 seconds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23548) Investigate the project for missing logs of unexpected "Throwable"
Ivan Bessonov created IGNITE-23548: -- Summary: Investigate the project for missing logs of unexpected "Throwable" Key: IGNITE-23548 URL: https://issues.apache.org/jira/browse/IGNITE-23548 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Primary candidates: * {{netty}}, there's no default handler * some thread pools might not have default handlers either ** this includes disruptors -- This message was sent by Atlassian Jira (v8.20.10#820010)
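An illustrative sketch of the fix direction for thread pools: give every executor thread an uncaught-exception handler that logs the {{Throwable}}. The logging call is generic and the factory name is hypothetical:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class LoggingThreadFactory implements ThreadFactory {
    private final AtomicInteger idx = new AtomicInteger();
    private final String prefix;

    LoggingThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + idx.getAndIncrement());

        // Without this handler, an unexpected Throwable silently kills the thread.
        t.setUncaughtExceptionHandler((thread, e) ->
                System.err.println("Uncaught exception in " + thread.getName() + ": " + e));

        return t;
    }

    static ExecutorService newPool(String prefix, int threads) {
        return Executors.newFixedThreadPool(threads, new LoggingThreadFactory(prefix));
    }
}
{code}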
[jira] [Assigned] (IGNITE-23413) Catalog compaction. Component to determine minimum catalog version required by rebalance.
[ https://issues.apache.org/jira/browse/IGNITE-23413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-23413: -- Assignee: Ivan Bessonov > Catalog compaction. Component to determine minimum catalog version required > by rebalance. > - > > Key: IGNITE-23413 > URL: https://issues.apache.org/jira/browse/IGNITE-23413 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Pereslegin >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Each rebalance procedure uses specific catalog version, it "holds" the > timestamp corresponding to the latest (at the moment of rebalancing start) > version of the catalog > To be able safely perform catalog compaction, we need to design and implement > a component that can determine the minimum version required for active > rebalances (to avoid deleting this version during compaction). > {code:java} > interface RebalanceMinimumRequiredTimeProvider { > /** > * Returns the minimum time required for rebalance, > * or current timestamp if there are no active > * rebalances and there is a guarantee that all rebalances > * launched in the future will use catalog version > * corresponding to the current time or greater. > */ > long minimumRequiredTime(); > } > {code} > The component can be either global or local (whichever is easier to > implement). This means that the compaction procedure can call the component > on all nodes in the cluster and calculate the minimum. > The component must be able to track rebalances that may be triggered during > "replay" of the metastorage raft log. > The component should return only monotonically increasing values. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-23128) NullPointerException if non-existent profile passed into zone and table
[ https://issues.apache.org/jira/browse/IGNITE-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-23128: -- Assignee: Roman Puchkovskiy > NullPointerException if non-existent profile passed into zone and table > > > Key: IGNITE-23128 > URL: https://issues.apache.org/jira/browse/IGNITE-23128 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Zlenko >Assignee: Roman Puchkovskiy >Priority: Major > Labels: ignite-3 > > If we try to create a table with a zone containing a profile that is not > described in the node configuration, we will get a NullPointerException when trying > to create such a table. > To reproduce, you need to do the following: > 1. Execute the command > {code:sql} > create zone test with storage_profiles='IAmAPhantomProfile' > {code} > Where IAmAPhantomProfile is not described in the storage.profiles section of the node > configuration. > 2. Execute the command > {code:sql} > create table test (I int) with storage_profile='IAmAPhantomProfile' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-23318) Unable to restart cluster multiple times.
[ https://issues.apache.org/jira/browse/IGNITE-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17889220#comment-17889220 ] Ivan Bessonov commented on IGNITE-23318: What's done here: * Recover the meta storage from persisted data, instead of recovering it from a snapshot. This way we will only recover a limited amount of data; usually it would fit into a single mem-table, maybe several mem-tables at worst. To do that, I introduced index/term/configuration values into the storage, just like it's done in partitions. As of right now, the "raft snapshots" code for the meta storage is not affected; let's do that separately. > Unable to restart cluster multiple times. > - > > Key: IGNITE-23318 > URL: https://issues.apache.org/jira/browse/IGNITE-23318 > Project: Ignite > Issue Type: Bug >Affects Versions: 3.0 >Reporter: Iurii Gerzhedovich >Assignee: Ivan Bessonov >Priority: Blocker > Labels: ignite-3 > Attachments: ignite.log > > Time Spent: 1h 50m > Remaining Estimate: 0h > > We have TPCH benchmarks in Ignite 3 > (org.apache.ignite.internal.benchmark.TpchBenchmark). The benchmark can > prepare data in a cluster and run benchmarks a few times on the same data set > without data reload. During run the benchmarks I observe the following issues: > 1. After a few such runs, I have situations when the nodes cannot assemble > into a cluster. 100% reproducible but with a different number of restarts. > 2. With every restart the startup logs become longer and longer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
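A rough illustration of the idea mentioned in the comment: persist the last applied raft index/term together with the applied data, so that recovery can start from the persisted state instead of replaying a snapshot. The key names and the {{WriteBatch}} interface below are assumptions, not the actual meta-storage schema:
{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for an atomic batch of storage writes.
interface WriteBatch {
    void put(byte[] key, byte[] value);
}

class AppliedStateKeys {
    static final byte[] LAST_APPLIED_INDEX = "ms.lastAppliedIndex".getBytes(StandardCharsets.UTF_8);
    static final byte[] LAST_APPLIED_TERM = "ms.lastAppliedTerm".getBytes(StandardCharsets.UTF_8);

    /** Writes the index and term in the same atomic batch as the applied command. */
    static void saveAppliedState(WriteBatch batch, long index, long term) {
        batch.put(LAST_APPLIED_INDEX, longToBytes(index));
        batch.put(LAST_APPLIED_TERM, longToBytes(term));
    }

    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }
}
{code}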
[jira] [Updated] (IGNITE-23318) Unable to restart cluster multiple times.
[ https://issues.apache.org/jira/browse/IGNITE-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23318: --- Reviewer: Aleksandr Polovtsev > Unable to restart cluster multiple times. > - > > Key: IGNITE-23318 > URL: https://issues.apache.org/jira/browse/IGNITE-23318 > Project: Ignite > Issue Type: Bug >Affects Versions: 3.0 >Reporter: Iurii Gerzhedovich >Assignee: Ivan Bessonov >Priority: Blocker > Labels: ignite-3 > Attachments: ignite.log > > Time Spent: 10m > Remaining Estimate: 0h > > We have TPCH benchmarks in Ignite 3 > (org.apache.ignite.internal.benchmark.TpchBenchmark). The benchmark can > prepare data in a cluster and run benchmarks a few times on the same data set > without data reload. During run the benchmarks I observe the following issues: > 1. After a few such runs, I have situations when the nodes cannot assemble > into a cluster. 100% reproducible but with a different number of restarts. > 2. With every restart the startup logs become longer and longer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23393) RocksSnapshotManager#restoreSnapshot is not fail-proof
Ivan Bessonov created IGNITE-23393: -- Summary: RocksSnapshotManager#restoreSnapshot is not fail-proof Key: IGNITE-23393 URL: https://issues.apache.org/jira/browse/IGNITE-23393 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov If we fail in-between different column families, the node could be restarted with a corrupted storage next time. We should probably delete everything if snapshot installation failed. Also, when we start on a partially restored snapshot, we should be able to detect that. -- This message was sent by Atlassian Jira (v8.20.10#820010)
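One possible shape of the detection part, sketched under the assumption that a marker file is acceptable: the marker is removed before a restore starts and created only after the whole snapshot is restored, so a missing marker on the next start means the previous restore did not complete. Paths and names are illustrative:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SnapshotRestoreGuard {
    private final Path markerFile;

    SnapshotRestoreGuard(Path storageDir) {
        this.markerFile = storageDir.resolve("restore.completed");
    }

    /** Called right before restoring a snapshot into the storage directory. */
    void beforeRestore() throws IOException {
        Files.deleteIfExists(markerFile);
    }

    /** Called only after all column families have been restored successfully. */
    void afterRestore() throws IOException {
        Files.createFile(markerFile);
    }

    /** On start: if a restore was attempted and the marker is missing, the storage is incomplete. */
    boolean isPartiallyRestored() {
        return !Files.exists(markerFile);
    }
}
{code}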
[jira] [Created] (IGNITE-23365) Fix huge log message with assignments
Ivan Bessonov created IGNITE-23365: -- Summary: Fix huge log message with assignments Key: IGNITE-23365 URL: https://issues.apache.org/jira/browse/IGNITE-23365 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Attachments: assignments.log When starting a node, we can write a message like this: {code:java} [2024-10-04T13:08:50,921][INFO ][%node_3344%JRaft-ReadOnlyService-Disruptor_stripe_0-0][AssignmentsTracker] Assignment cache initialized for placement driver [groupAssignments={12_part_12=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3345, isPeer=true]], token=1334], 20_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3346, isPeer=true]], token=1969], 22_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3344, isPeer=true]], token=997], 12_part_13=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3345, isPeer=true]], token=1323], 20_part_21=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3344, isPeer=true]], token=1004], 22_part_23=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3346, isPeer=true]], token=1978], 25_part_24=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3346, isPeer=true]], token=2019], 12_part_14=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3345, isPeer=true]], token=1327], 20_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3344, isPeer=true]], token=1012], 22_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment [consistentId=node_3346, isPeer=true]], token=1974],...{code} Full message is in the attachment. We should make it shorter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
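A sketch of one way to shorten it: log aggregate numbers plus a small sample of groups instead of the full assignments map. The types are simplified stand-ins for the real assignment classes:
{code:java}
import java.util.Map;
import java.util.stream.Collectors;

class AssignmentsLogging {
    /** Builds a short summary instead of dumping the whole groupAssignments map. */
    static String summary(Map<String, ?> groupAssignments) {
        String sample = groupAssignments.keySet().stream()
                .limit(3)
                .collect(Collectors.joining(", "));

        return "Assignment cache initialized for placement driver "
                + "[groups=" + groupAssignments.size() + ", sample=[" + sample + ", ...]]";
    }
}
{code}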
[jira] [Updated] (IGNITE-23365) Fix huge log message with assignments
[ https://issues.apache.org/jira/browse/IGNITE-23365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23365: --- Attachment: assignments.log > Fix huge log message with assignments > - > > Key: IGNITE-23365 > URL: https://issues.apache.org/jira/browse/IGNITE-23365 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: assignments.log > > > When starting a node, we can write a message like this: > {code:java} > [2024-10-04T13:08:50,921][INFO > ][%node_3344%JRaft-ReadOnlyService-Disruptor_stripe_0-0][AssignmentsTracker] > Assignment cache initialized for placement driver > [groupAssignments={12_part_12=TokenizedAssignmentsImpl [nodes=UnmodifiableSet > [Assignment [consistentId=node_3345, isPeer=true]], token=1334], > 20_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3346, isPeer=true]], token=1969], > 22_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3344, isPeer=true]], token=997], > 12_part_13=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3345, isPeer=true]], token=1323], > 20_part_21=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3344, isPeer=true]], token=1004], > 22_part_23=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3346, isPeer=true]], token=1978], > 25_part_24=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3346, isPeer=true]], token=2019], > 12_part_14=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3345, isPeer=true]], token=1327], > 20_part_22=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3344, isPeer=true]], token=1012], > 22_part_20=TokenizedAssignmentsImpl [nodes=UnmodifiableSet [Assignment > [consistentId=node_3346, isPeer=true]], token=1974],...{code} > Full message is in the attachment. We should make it shorter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-23318) Unable to restart cluster multiple times.
[ https://issues.apache.org/jira/browse/IGNITE-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-23318: -- Assignee: Ivan Bessonov > Unable to restart cluster multiple times. > - > > Key: IGNITE-23318 > URL: https://issues.apache.org/jira/browse/IGNITE-23318 > Project: Ignite > Issue Type: Bug >Affects Versions: 3.0 >Reporter: Iurii Gerzhedovich >Assignee: Ivan Bessonov >Priority: Blocker > Labels: ignite-3 > Attachments: ignite.log > > > We have TPCH benchmarks in Ignite 3 > (org.apache.ignite.internal.benchmark.TpchBenchmark). The benchmark can > prepare data in a cluster and run benchmarks a few times on the same data set > without data reload. During run the benchmarks I observe the following issues: > 1. After a few such runs, I have situations when the nodes cannot assemble > into a cluster. 100% reproducible but with a different number of restarts. > 2. With every restart the startup logs become longer and longer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Description: h1. Preface The current implementation, based on {{{}RocksDB{}}}, is known to be way slower than it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data: we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on a local environment is not optimal, it still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that a simpler log implementation would be faster due to the smaller overall overhead. h3. Integration testing A benchmark with 3 servers and 1 client writing data in multiple threads shows a 34438 vs 30299 throughput improvement. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-38-53.png! {{{}Logit{}}}: !Screenshot from 2024-09-20 10-38-57.png! A benchmark with single-thread insertions in embedded mode shows a 4072 vs 3739 throughput improvement. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-42-49.png! {{{}Logit{}}}: !Screenshot from 2024-09-20 10-43-09.png! h1. Observations Despite a drastic difference in log throughput, the user operations throughput increase is only about 10%. This means that we lose a lot of time elsewhere, and optimizing those parts could significantly increase performance too. Log optimizations would become more evident after that. h1. Unsolved issues There are multiple issues with the new log implementation; some of them have been mentioned in IGNITE-22843: * {{Logit}} pre-allocates _a lot_ of data on drive. Considering that we use the "log per partition" paradigm, it's too wasteful. * Storing a separate log file per partition is not scalable anyway; it's too difficult to optimize batches and {{fsync}} in this approach. * Using the same log for all tables in a distribution zone won't really solve the issue; the best it could do is make it {_}manageable{_}, in some sense. h1. Shortly about how Logit works Each log consists of 3 sets of files: * "segment" files with data. * "configuration" files with raft configuration. * "index" files with pointers to segment and configuration files. 
"segment" and "configuration" files contain chunks of data in a following format: |Magic header|Payload size|Payload itself| "index" files contain following pieces of data: |Magic header|Log entry type (data/cfg)|offset|position| It's a fixed-length tuple, that contains a "link" to one of data files. Each "index" file is basically an offset table, and it is used to resolve "logIndex" into real log data. h1. What we should change A list of actions, that we need to do to make this log fit the required criteria includes: * Merge "configuration" and "segment" files into one, to have fewer files on drive, the distinction is arbitrary anyway. Let's call it a "data" file. * Use the same "data" file for multiple raft groups. It's important to note that we can't use "data" file per stripe, because stripe calculation function is not {_}stable{_}, it allocates {{stripeId}} dynamically in order to have a smoother distribution in runtime. * Log should be able to enforce checkpoints/flushes in storage engines, in order to safely truncate data upon reaching a threshold (we truncate logs from multiple raft groups at the same time, that's why we need it). This means that we will change the way Raft canonically makes snapshots, instead we will have our own approach, similar to what we have in Ignite 2.x. Or we will abuse snapshots logic and trigger them outside of the schedule. * In order to make {{fsync}} faster, we should get rid of "index"
[jira] [Updated] (IGNITE-23325) Checkpoint read-lock timeout under high load
[ https://issues.apache.org/jira/browse/IGNITE-23325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23325: --- Description: We encounter the following situation while having a very intensive load. It should not happen. Can be reproduced in {{ignite-22835-2}} on {{6bc1f97d0d2506b666975eea57b78ad8609b69d7}} commit. {noformat} [2024-09-25T11:03:12,180][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, checkpointBeforeWriteLockTime=30ms, checkpointWriteLockWait=1ms, checkpointListenersExecuteTime=2ms, checkpointWriteLockHoldTime=4ms, splitAndSortPagesDuration=276ms, pages=80904, reason='too many dirty pages'] 227808.089 ops/s [2024-09-25T11:03:13,438][WARN ][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-10][PersistentPageMemory] Page replacements started, pages will be rotated with disk, this will affect storage performance (consider increasing PageMemoryDataRegionConfiguration#setMaxSize for data region) [region=default] [2024-09-25T11:03:13,796][INFO ][checkpoint-runner-io0][CheckpointPagesWriter] Checkpoint pages were not written yet due to unsuccessful page write lock acquisition and will be retried [pageCount=1] [2024-09-25T11:03:14,071][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, pages=80904, pagesWriteTime=1614ms, fsyncTime=273ms, totalTime=2203ms] [2024-09-25T11:03:14,073][INFO ][%node_3344%compaction-thread][Compactor] Starting new compaction round [files=64] [2024-09-25T11:03:15,828][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, checkpointBeforeWriteLockTime=21ms, checkpointWriteLockWait=0ms, checkpointListenersExecuteTime=6ms, checkpointWriteLockHoldTime=6ms, splitAndSortPagesDuration=205ms, pages=77234, reason='too many dirty pages'] 231068.630 ops/s 271814.068 ops/s # Warmup Iteration 8: [2024-09-25T11:03:23,547][INFO ][%node_3344%compaction-thread][Compactor] Starting new compaction round [files=64] [2024-09-25T11:03:23,547][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, pages=77234, pagesWriteTime=2547ms, fsyncTime=5099ms, totalTime=7951ms] 26376.850 ops/s # Warmup Iteration 9: [2024-09-25T11:03:23,685][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, checkpointBeforeWriteLockTime=9ms, checkpointWriteLockWait=0ms, checkpointListenersExecuteTime=1ms, checkpointWriteLockHoldTime=1ms, splitAndSortPagesDuration=125ms, pages=95251, reason='too many dirty pages'] [2024-09-25T11:03:34,706][INFO ][%node_3344%compaction-thread][Compactor] Compaction round finished [duration=11159ms] [2024-09-25T11:03:34,707][INFO ][%node_3344%compaction-thread][Compactor] Starting new compaction round [files=41] [2024-09-25T11:03:35,444][INFO ][%node_3344%lease-updater][LeaseUpdater] Leases updated (printed once per 10 iteration(s)): [inCurrentIteration=LeaseStats [leasesCreated=0, leasesPublished=0, leasesProlonged=64, leasesWithoutCandidates=0], active=64, currentAssignmentsSize=64]. 
[2024-09-25T11:03:39,075][WARN ][%node_3344%meta-storage-safe-time-0][TrackableNetworkMessageHandler] Message handling has been too long [duration=11ms, message=class org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl] [2024-09-25T11:03:40,287][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, pages=95251, pagesWriteTime=3098ms, fsyncTime=13146ms, totalTime=16740ms] [2024-09-25T11:03:40,307][WARN ][%node_3344%JRaft-FSMCaller-Disruptor_stripe_24-0][FailureProcessor] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT] org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock acquisition has been timed out. at org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.b
[jira] [Created] (IGNITE-23326) Configuration parser allows duplicated keys
Ivan Bessonov created IGNITE-23326: -- Summary: Configuration parser allows duplicated keys Key: IGNITE-23326 URL: https://issues.apache.org/jira/browse/IGNITE-23326 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Currently, this code leads to no warnings or errors: {code:java} String configTemplate = "ignite {\n" + " \"network\": {\n" + "\"port\":{},\n" + "\"nodeFinder\":{\n" + " \"netClusterNodes\": [ {} ]\n" + "}\n" + " },\n" + " storage.profiles: {" + "" + DEFAULT_STORAGE_PROFILE + ".engine: aipersist, " + "" + DEFAULT_STORAGE_PROFILE + ".size: 2073741824 " + " },\n" + " storage.profiles: {" + "" + DEFAULT_STORAGE_PROFILE + ".engine: aipersist, " + "" + DEFAULT_STORAGE_PROFILE + ".size: 2073741824 " // Avoid page replacement. + " },\n" + " clientConnector: { port:{} },\n" + " rest.port: {},\n" + " raft.fsync = " + fsync() + "}"; {code} This behavior is confusing and error-prone for the end user; we shouldn't allow it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
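For context, the snippet below demonstrates the underlying HOCON behavior, assuming the node configuration text is parsed with the Typesafe Config parser: duplicated object keys are silently merged and later scalar values win, so the user never sees an error. The profile name and sizes are illustrative:
{code:java}
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

class DuplicateKeysDemo {
    public static void main(String[] args) {
        // The same "storage.profiles" key appears twice, as in the reproducer above.
        String hocon =
                "storage.profiles { default.engine: aipersist, default.size: 1073741824 }\n"
                + "storage.profiles { default.engine: aipersist, default.size: 2073741824 }\n";

        Config config = ConfigFactory.parseString(hocon);

        // Prints 2073741824: the second block silently overrides the first one.
        System.out.println(config.getLong("storage.profiles.default.size"));
    }
}
{code}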
[jira] [Created] (IGNITE-23325) Checkpoint read-lock timeout under high load
Ivan Bessonov created IGNITE-23325: -- Summary: Checkpoint read-lock timeout under high load Key: IGNITE-23325 URL: https://issues.apache.org/jira/browse/IGNITE-23325 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov We encounter the following situation while having a very intensive load. It should not happen {noformat} [2024-09-25T11:03:12,180][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, checkpointBeforeWriteLockTime=30ms, checkpointWriteLockWait=1ms, checkpointListenersExecuteTime=2ms, checkpointWriteLockHoldTime=4ms, splitAndSortPagesDuration=276ms, pages=80904, reason='too many dirty pages'] 227808.089 ops/s [2024-09-25T11:03:13,438][WARN ][org.apache.ignite.internal.benchmark.UpsertKvBenchmark.upsert-jmh-worker-10][PersistentPageMemory] Page replacements started, pages will be rotated with disk, this will affect storage performance (consider increasing PageMemoryDataRegionConfiguration#setMaxSize for data region) [region=default] [2024-09-25T11:03:13,796][INFO ][checkpoint-runner-io0][CheckpointPagesWriter] Checkpoint pages were not written yet due to unsuccessful page write lock acquisition and will be retried [pageCount=1] [2024-09-25T11:03:14,071][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=db0f392d-7d71-4ef1-bb50-c17eb2d02d82, pages=80904, pagesWriteTime=1614ms, fsyncTime=273ms, totalTime=2203ms] [2024-09-25T11:03:14,073][INFO ][%node_3344%compaction-thread][Compactor] Starting new compaction round [files=64] [2024-09-25T11:03:15,828][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, checkpointBeforeWriteLockTime=21ms, checkpointWriteLockWait=0ms, checkpointListenersExecuteTime=6ms, checkpointWriteLockHoldTime=6ms, splitAndSortPagesDuration=205ms, pages=77234, reason='too many dirty pages'] 231068.630 ops/s 271814.068 ops/s # Warmup Iteration 8: [2024-09-25T11:03:23,547][INFO ][%node_3344%compaction-thread][Compactor] Starting new compaction round [files=64] [2024-09-25T11:03:23,547][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=11157014-7a08-41e0-9e80-e26cf01656a8, pages=77234, pagesWriteTime=2547ms, fsyncTime=5099ms, totalTime=7951ms] 26376.850 ops/s # Warmup Iteration 9: [2024-09-25T11:03:23,685][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint started [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, checkpointBeforeWriteLockTime=9ms, checkpointWriteLockWait=0ms, checkpointListenersExecuteTime=1ms, checkpointWriteLockHoldTime=1ms, splitAndSortPagesDuration=125ms, pages=95251, reason='too many dirty pages'] [2024-09-25T11:03:34,706][INFO ][%node_3344%compaction-thread][Compactor] Compaction round finished [duration=11159ms] [2024-09-25T11:03:34,707][INFO ][%node_3344%compaction-thread][Compactor] Starting new compaction round [files=41] [2024-09-25T11:03:35,444][INFO ][%node_3344%lease-updater][LeaseUpdater] Leases updated (printed once per 10 iteration(s)): [inCurrentIteration=LeaseStats [leasesCreated=0, leasesPublished=0, leasesProlonged=64, leasesWithoutCandidates=0], active=64, currentAssignmentsSize=64]. 
[2024-09-25T11:03:39,075][WARN ][%node_3344%meta-storage-safe-time-0][TrackableNetworkMessageHandler] Message handling has been too long [duration=11ms, message=class org.apache.ignite.raft.jraft.rpc.WriteActionRequestImpl] [2024-09-25T11:03:40,287][INFO ][%node_3344%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=b1b2541a-e948-46cf-821b-412ea120146c, pages=95251, pagesWriteTime=3098ms, fsyncTime=13146ms, totalTime=16740ms] [2024-09-25T11:03:40,307][WARN ][%node_3344%JRaft-FSMCaller-Disruptor_stripe_24-0][FailureProcessor] Possible failure suppressed according to a configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_CRITICAL_OPERATION_TIMEOUT] org.apache.ignite.internal.lang.IgniteInternalException: Checkpoint read lock acquisition has been timed out. at org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:242) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:130) ~[ignite-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:175) ~[ignite-storage-page-memory-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMe
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
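To make the index-file idea above more concrete, here is a minimal sketch of how a fixed-length entry can act as an offset table that resolves a logIndex into a location in a data file. All names and field widths below are assumptions for illustration only, not the actual Logit format:
{code:java}
import java.nio.ByteBuffer;

/** Hypothetical fixed-length index entry; the field widths are assumptions, not the real Logit layout. */
final class IndexEntry {
    // | magic (2 bytes) | entry type (1 byte, data/cfg) | offset (8 bytes) | position (4 bytes) |
    static final int SIZE = 2 + 1 + 8 + 4;

    final byte type;     // data or configuration record
    final long offset;   // where the payload starts inside the target file
    final int position;  // position used to pick the target data file

    IndexEntry(byte type, long offset, int position) {
        this.type = type;
        this.offset = offset;
        this.position = position;
    }

    /** Resolves a logIndex by treating the index file as a flat table of fixed-size entries. */
    static IndexEntry resolve(ByteBuffer indexFile, long firstLogIndex, long logIndex) {
        int base = (int) ((logIndex - firstLogIndex) * SIZE);
        // Skip the 2-byte magic header of the entry, then read the remaining fields.
        byte type = indexFile.get(base + 2);
        long offset = indexFile.getLong(base + 3);
        int position = indexFile.getInt(base + 11);
        return new IndexEntry(type, offset, position);
    }
}
{code}
Because every entry has the same size, the lookup is a single multiplication plus an absolute read, which is what makes the index file cheap to consult compared to a key-value lookup.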
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Description: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads shows 34438 vs 30299 throughput improvement. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-38-53.png! {{{}Logit{}}}: !Screenshot from 2024-09-20 10-38-57.png! Benchmark for single thread insertions in embedded mode shows 4072 vs 3739 throughput improvement. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-42-49.png! {{{}Logit{}}}: !Screenshot from 2024-09-20 10-43-09.png! h1. Observations Despite a drastic difference in log throughput, user operations throughput increase is only about 10%. This means that we lose a lot of time elsewhere, and optimizing those parts could significantly increase performance too. Log optimizations would become more evident after that. h1. Unsolved issues There are multiple issues with new log implementation, some of them have been mentioned in IGNITE-22843 * {{Logit}} pre-allocates _a lot_ of data on drive. Considering that we use "log per partition" paradigm, it's too wasteful. * Storing separate log file per partition is not scalable anyway, it's too difficult to optimize batches and {{fsync}} in this approach. * Using the same log for all tables in a distribution zone won't really solve the issue, the best it could do is to make it {_}manageable{_}, in some sense. h1. Shortly about how Logit works Each log consists of 3 sets of files: * "segment" files with data. * "configuration" files with raft configuration. * "index" files with pointers to segment and configuration files. 
"segment" and "configuration" files contain chunks of data in a following format: |Magic header|Payload size|Payload itself| "index" files contain following pieces of data: |Magic header|Log entry type (data/cfg)|offset|position| It's a fixed-length tuple, that contains a "link" to one of data files. Each "index" file is basically an offset table, and it is used to resolve "logIndex" into real log data. h1. What we should change A list of actions, that we need to do to make this log fit the required criteria includes: * was: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Description: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads shows 34438 vs 30299 throughput improvement. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-38-53.png! {{{}Logit{}}}: !Screenshot from 2024-09-20 10-38-57.png! Benchmark for single thread insertions in embedded mode shows 4072 vs 3739 throughput improvement. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-42-49.png! {{{}Logit{}}}: !Screenshot from 2024-09-20 10-43-09.png! h1. Observations Despite a drastic difference in log throughput, user operations throughput increase is only about 10%. This means that we lose a lot of time elsewhere, and optimizing those parts could significantly increase performance too. Log optimizations would become more evident after that. h1. Unsolved issues There are multiple issues with new log implementation, most of them have been mentioned in [IGNITE-22843|https://issues.apache.org/jira/browse/IGNITE-22843?focusedCommentId=17871250&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17871250] was: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. 
I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads show 34438 vs 30299 throughput. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-38-
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: Screenshot from 2024-09-20 10-38-53.png > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > the following. > {{{}RocksDB{}}}: > > > {{{}Logit{}}}: > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Description: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads show the following. {{{}RocksDB{}}}: {{{}Logit{}}}: was: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. 
Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads show the following. {{{}RocksDB{}}}: !image-2024-09-20-10-39-22-043.png! {{{}Logit{}}}: > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fs
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: (was: image-2024-09-20-10-43-23-213.png) > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png, Screenshot from > 2024-09-20 10-38-57.png, Screenshot from 2024-09-20 10-42-49.png, Screenshot > from 2024-09-20 10-43-09.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > 34438 vs 30299 throughput. > {{{}RocksDB{}}}: > !Screenshot from 2024-09-20 10-38-53.png! > {{{}Logit{}}}: > !Screenshot from 2024-09-20 10-38-57.png! > Single thread insertions in embedded mode show -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: Screenshot from 2024-09-20 10-38-57.png > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: (was: Screenshot from 2024-09-20 10-38-57.png) > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > the following. > {{{}RocksDB{}}}: > > > {{{}Logit{}}}: > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: Screenshot from 2024-09-20 10-43-09.png > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png, Screenshot from > 2024-09-20 10-38-57.png, Screenshot from 2024-09-20 10-42-49.png, Screenshot > from 2024-09-20 10-43-09.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > 34438 vs 30299 throughput. > {{{}RocksDB{}}}: > !Screenshot from 2024-09-20 10-38-53.png! > {{{}Logit{}}}: > !Screenshot from 2024-09-20 10-38-57.png! > Single thread insertions in embedded mode show -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Description: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads show 34438 vs 30299 throughput. {{{}RocksDB{}}}: !Screenshot from 2024-09-20 10-38-53.png! {{{}Logit{}}}: !Screenshot from 2024-09-20 10-38-57.png! Single thread insertions in embedded mode show was: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). 
Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads show the following. {{{}RocksDB{}}}: {{{}Logit{}}}: > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png, Screenshot from > 2024-09-20 10-38-57.png, Screenshot from 2024-09-20 10-42-49.png, Screenshot > from 2024-09-20 10-43-09.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: image-2024-09-20-10-43-23-213.png > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png, Screenshot from > 2024-09-20 10-38-57.png, Screenshot from 2024-09-20 10-42-49.png, Screenshot > from 2024-09-20 10-43-09.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > the following. > {{{}RocksDB{}}}: > > > {{{}Logit{}}}: > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: Screenshot from 2024-09-20 10-42-49.png > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png, Screenshot from > 2024-09-20 10-38-57.png, Screenshot from 2024-09-20 10-42-49.png, Screenshot > from 2024-09-20 10-43-09.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > the following. > {{{}RocksDB{}}}: > > > {{{}Logit{}}}: > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: Screenshot from 2024-09-20 10-38-57.png > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png, Screenshot from > 2024-09-20 10-38-57.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > the following. > {{{}RocksDB{}}}: > > > {{{}Logit{}}}: > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: (was: image-2024-09-20-10-39-22-043.png) > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > Benchmark for 3 servers and 1 client writing data in multiple threads show > the following. > {{{}RocksDB{}}}: > > !image-2024-09-20-10-39-22-043.png! > {{{}Logit{}}}: > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Description: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing Benchmark for 3 servers and 1 client writing data in multiple threads show the following. {{{}RocksDB{}}}: !image-2024-09-20-10-39-22-043.png! {{{}Logit{}}}: was: h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. 
Integration testing > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: Screenshot from 2024-09-20 10-38-53.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 >
[jira] [Updated] (IGNITE-23240) Ignite 3 new log storage
[ https://issues.apache.org/jira/browse/IGNITE-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23240: --- Attachment: image-2024-09-20-10-39-22-043.png > Ignite 3 new log storage > > > Key: IGNITE-23240 > URL: https://issues.apache.org/jira/browse/IGNITE-23240 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: image-2024-09-20-10-39-22-043.png > > > h1. Preface > Current implementation, based on {{{}RocksDB{}}}, is known to be way slower > then it should be. There are multiple obvious reasons for that: > * Writing into WAL +and+ memtable > * Creating unique keys for every record > * Inability to efficiently serialize data, we must have an intermediate > state before we pass data into {{{}RocksDB{}}}'s API. > h1. Benchmarks > h3. Local benchmarks > Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my > local environment with fsync disabled. I got the following results: > * {{{}Logit{}}}: > > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 23.541 > Total size : 16777216000 > Throughput(bps) : 712680684 > Throughput(rps) : 43498 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 3.808 > Total size : 16777216000 > Throughput(bps) : 4405781512 > Throughput(rps) : 268907 > Test done!{noformat} > * {{{}RocksDB{}}}: > {noformat} > Test write: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 178.785 > Total size : 16777216000 > Throughput(bps) : 93840176 > Throughput(rps) : 5727 > Test read: > Log number : 1024000 > Log Size : 16384 > Batch Size : 100 > Cost time(s) : 13.572 > Total size : 16777216000 > Throughput(bps) : 1236163866 > Throughput(rps) : 75449 > Test done!{noformat} > While testing on local environment is not optimal, is still shows a huge > improvement in writing speed (7.5x) and reading speed (3.5x). Enabling > {{fsync}} sort-of equalizes writing speed, but we still expect that simpler > log implementation would be faster dues to smaller overall overhead. > h3. Integration testing > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23240) Ignite 3 new log storage
Ivan Bessonov created IGNITE-23240: -- Summary: Ignite 3 new log storage Key: IGNITE-23240 URL: https://issues.apache.org/jira/browse/IGNITE-23240 Project: Ignite Issue Type: Epic Reporter: Ivan Bessonov Attachments: image-2024-09-20-10-39-22-043.png h1. Preface Current implementation, based on {{{}RocksDB{}}}, is known to be way slower then it should be. There are multiple obvious reasons for that: * Writing into WAL +and+ memtable * Creating unique keys for every record * Inability to efficiently serialize data, we must have an intermediate state before we pass data into {{{}RocksDB{}}}'s API. h1. Benchmarks h3. Local benchmarks Local benchmarks ({{{}LogStorageBenchmarks{}}}) have been performed on my local environment with fsync disabled. I got the following results: * {{{}Logit{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 23.541 Total size : 16777216000 Throughput(bps) : 712680684 Throughput(rps) : 43498 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 3.808 Total size : 16777216000 Throughput(bps) : 4405781512 Throughput(rps) : 268907 Test done!{noformat} * {{{}RocksDB{}}}: {noformat} Test write: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 178.785 Total size : 16777216000 Throughput(bps) : 93840176 Throughput(rps) : 5727 Test read: Log number : 1024000 Log Size : 16384 Batch Size : 100 Cost time(s) : 13.572 Total size : 16777216000 Throughput(bps) : 1236163866 Throughput(rps) : 75449 Test done!{noformat} While testing on local environment is not optimal, is still shows a huge improvement in writing speed (7.5x) and reading speed (3.5x). Enabling {{fsync}} sort-of equalizes writing speed, but we still expect that simpler log implementation would be faster dues to smaller overall overhead. h3. Integration testing -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-22843) Writing into RAFT log is too long
[ https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882920#comment-17882920 ] Ivan Bessonov commented on IGNITE-22843: Totally agree with what Roman said, I'll elaborate on this topic in a separate JIRA. Thank you! > Writing into RAFT log is too long > - > > Key: IGNITE-22843 > URL: https://issues.apache.org/jira/browse/IGNITE-22843 > Project: Ignite > Issue Type: Improvement >Reporter: Vladislav Pyatkov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Time Spent: 1h > Remaining Estimate: 0h > > h3. Motivation > We are using RocksDB as RAFT log storage. Writing in the log is significantly > longer than writing in the memory-mapped buffer (as we used in Ignite 2). > {noformat} > appendLogEntry 0.8 6493700 6494500 > Here is hidden 0.5 us > flushLog 20.1 6495000 6515100 > Here is hidden 2.8 us > {noformat} > h3. Definition of done > We should find a way to implement faster log storage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
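For context on the comparison above, here is a minimal sketch of an append path that goes through a memory-mapped buffer. This is illustration code only (names and record layout are assumptions), not Ignite 2.x or 3.x sources, and it ignores segment rotation, recovery and concurrency:
{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Minimal append-only log backed by a memory-mapped file (illustration only). */
class MappedLog implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buf;

    MappedLog(Path file, long capacity) throws IOException {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
    }

    /** Appends one record: a 4-byte length prefix followed by the payload. */
    void append(byte[] payload) {
        buf.putInt(payload.length);
        buf.put(payload);
    }

    /** Forces dirty pages to disk; the fsync-equivalent for the mapped region. */
    void sync() {
        buf.force();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
{code}
With this approach a record append is just two buffer writes, and durability is controlled explicitly via {{force()}}, instead of going through RocksDB's WAL-plus-memtable path.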
[jira] [Updated] (IGNITE-22843) Writing into RAFT log is too long
[ https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22843: --- Reviewer: Roman Puchkovskiy > Writing into RAFT log is too long > - > > Key: IGNITE-22843 > URL: https://issues.apache.org/jira/browse/IGNITE-22843 > Project: Ignite > Issue Type: Improvement >Reporter: Vladislav Pyatkov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > h3. Motivation > We are using RocksDB as RAFT log storage. Writing in the log is significantly > longer than writing in the memory-mapped buffer (as we used in Ignite 2). > {noformat} > appendLogEntry 0.8 6493700 6494500 > Here is hidden 0.5 us > flushLog 20.1 6495000 6515100 > Here is hidden 2.8 us > {noformat} > h3. Definition of done > We should find a way to implement faster log storage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23212) Page replacement doesn't work sometimes
Ivan Bessonov created IGNITE-23212: -- Summary: Page replacement doesn't work sometimes Key: IGNITE-23212 URL: https://issues.apache.org/jira/browse/IGNITE-23212 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Under a sophisticated load, we sometimes see the following exception: {noformat} org.apache.ignite.lang.IgniteException: Error while executing addWriteCommitted: [rowId=RowId [partitionId=13, uuid=0191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13] at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ViewUtils.copyExceptionWithCauseIfPossible(ViewUtils.java:91) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ViewUtils.ensurePublicException(ViewUtils.java:71) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ViewUtils.sync(ViewUtils.java:54) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:207) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:60) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at site.ycsb.db.ignite3.IgniteClient.insert(IgniteClient.java:49) [ignite3-binding-2024.15.jar:?] at site.ycsb.DBWrapper.insert(DBWrapper.java:284) [core-2024.15.jar:?] at site.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:657) [core-2024.15.jar:?] at site.ycsb.ClientThread.run(ClientThread.java:181) [core-2024.15.jar:?] at java.lang.Thread.run(Thread.java:829) [?:?] Caused by: org.apache.ignite.lang.IgniteException: Error while executing addWriteCommitted: [rowId=RowId [partitionId=13, uuid=0191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13] at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) ~[?:?] at org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525) ~[ignite-core-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.TcpClientChannel.readError(TcpClientChannel.java:549) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.TcpClientChannel.processNextMessage(TcpClientChannel.java:435) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at org.apache.ignite.internal.client.TcpClientChannel.lambda$onMessage$3(TcpClientChannel.java:277) ~[ignite-client-3.0.0-SNAPSHOT.jar:?] at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426) ~[?:?] at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) ~[?:?] at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) ~[?:?] at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) ~[?:?] at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) ~[?:?] at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) ~[?:?] 
Caused by: org.apache.ignite.lang.IgniteException: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:60b4295a-8c18-4cdb-93e8-266bc9aaed88 Error while executing addWriteCommitted: [rowId=RowId [partitionId=13, uuid=0191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13] at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.lambda$mapToPublicException$2(IgniteExceptionMapperUtil.java:88) at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapCheckingResultIsPublic(IgniteExceptionMapperUtil.java:141) at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:137) at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:88) at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.lambda$convertToPublicFuture$3(IgniteExceptionMapperUtil.java:178) at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) at
[jira] [Updated] (IGNITE-23056) Verbose logging of delta-files compaction
[ https://issues.apache.org/jira/browse/IGNITE-23056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23056: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Verbose logging of delta-files compaction > - > > Key: IGNITE-23056 > URL: https://issues.apache.org/jira/browse/IGNITE-23056 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > For checkpoints we have a very extensive log message that shows the duration > of each checkpoint's phase. We don't have that for the compactor. > In this Jira we need to implement that. The list of phases and statistics is > at the developer's discretion. > As a bonus, we might want to print some values in microseconds instead of > milliseconds, and not use fast timestamps (they don't have enough > granularity). While we're doing that for compactor logs, we might as well > update checkpoint's logs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
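To illustrate the granularity point from IGNITE-23056 (this is not the actual compactor code, and the phase name is hypothetical): measure a phase with {{System.nanoTime()}} and report it in microseconds instead of relying on a coarse "fast" timestamp.
{code:java}
// Sketch only: timing a compaction phase with nanoTime and logging microseconds.
import java.util.concurrent.TimeUnit;

public class PhaseTimingSketch {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();

        Thread.sleep(3); // stand-in for a real phase, e.g. "merge delta file"

        long micros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
        System.out.println("Compaction phase finished [mergeDeltaFileUs=" + micros + "]");
    }
}
{code}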
[jira] [Updated] (IGNITE-23189) RocksDB tests flush too often
[ https://issues.apache.org/jira/browse/IGNITE-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23189: --- Description: The write buffer size in the rocksdb unit tests for the corresponding storage engine is too small; it's flushed literally after every insertion. This makes these tests longer than they have to be, sometimes several seconds instead of several hundred milliseconds. We should make this size bigger. (was: Write buffer size in tests is too small, it's flushed literally after every insertion) > RocksDB tests flush too often > - > > Key: IGNITE-23189 > URL: https://issues.apache.org/jira/browse/IGNITE-23189 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The write buffer size in the rocksdb unit tests for the corresponding storage engine is > too small; it's flushed literally after every insertion. This makes these > tests longer than they have to be, sometimes several seconds instead of > several hundred milliseconds. We should make this size bigger. -- This message was sent by Atlassian Jira (v8.20.10#820010)
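A minimal sketch of the kind of change IGNITE-23189 implies, assuming the tests configure their column families through RocksJava's {{ColumnFamilyOptions}}; the 64 MiB value and the class name are assumptions, not the actual test code.
{code:java}
// Sketch: enlarge the write buffer used by RocksDB-based unit tests so that the
// memtable is not flushed after nearly every insertion.
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.RocksDB;

public class TestWriteBufferSketch {
    static {
        RocksDB.loadLibrary(); // the native library must be loaded before creating options
    }

    static ColumnFamilyOptions testCfOptions() {
        // 64 MiB instead of a tiny buffer; the concrete value is a judgment call.
        return new ColumnFamilyOptions().setWriteBufferSize(64L * 1024 * 1024);
    }
}
{code}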
[jira] [Created] (IGNITE-23189) RocksDB tests flush too often
Ivan Bessonov created IGNITE-23189: -- Summary: RocksDB tests flush too often Key: IGNITE-23189 URL: https://issues.apache.org/jira/browse/IGNITE-23189 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 3.0 Write buffer size in tests is too small, it's flushed literally after every insertion -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22843) Writing into RAFT log is too long
[ https://issues.apache.org/jira/browse/IGNITE-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22843: -- Assignee: Ivan Bessonov > Writing into RAFT log is too long > - > > Key: IGNITE-22843 > URL: https://issues.apache.org/jira/browse/IGNITE-22843 > Project: Ignite > Issue Type: Improvement >Reporter: Vladislav Pyatkov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > h3. Motivation > We are using RocksDB as RAFT log storage. Writing in the log is significantly > longer than writing in the memory-mapped buffer (as we used in Ignite 2). > {noformat} > appendLogEntry 0.8 6493700 6494500 > Here is hidden 0.5 us > flushLog 20.1 6495000 6515100 > Here is hidden 2.8 us > {noformat} > h3. Definition of done > We should find a way to implement faster log storage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22609) MetaStorageListener can access KeyValueStorage after it had been closed
[ https://issues.apache.org/jira/browse/IGNITE-22609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22609: -- Assignee: Ivan Bessonov > MetaStorageListener can access KeyValueStorage after it had been closed > --- > > Key: IGNITE-22609 > URL: https://issues.apache.org/jira/browse/IGNITE-22609 > Project: Ignite > Issue Type: Bug >Reporter: Aleksandr Polovtsev >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: hs_err_pid2936001.log > > > See the attached stacktrace for details. Looks like we are trying to process > a read command in {{MetaStorageListener}} while the underlying storage has > already been closed. > This may be happening because it looks like we don't guarantee that > {{RaftManager#stopRaftNodes}} waits for all read commands to be processed. If > this is the case, a possible solution would be to add a busy lock either to > the {{MetaStorageListener}} or to the {{KeyValueStorage}}. But this needs to > be verified first. > Also, it must be checked if similar Raft Listeners in other components are > affected by the same issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
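A hedged sketch of the busy-lock idea from IGNITE-22609, assuming Ignite's {{IgniteSpinBusyLock}} (enterBusy/leaveBusy/block); the listener method and the {{Runnable}} stand-in are simplified placeholders, not the real {{MetaStorageListener}} signatures.
{code:java}
// Sketch: guard read-command processing with a busy lock so that a stopped storage
// is never touched. Method names and types are placeholders.
import org.apache.ignite.internal.util.IgniteSpinBusyLock;

class GuardedListenerSketch {
    private final IgniteSpinBusyLock busyLock = new IgniteSpinBusyLock();

    /** Placeholder for processing a single read command against the KeyValueStorage. */
    void onRead(Runnable readCommand) {
        if (!busyLock.enterBusy()) {
            // The component is stopping; reject instead of touching a closed KeyValueStorage.
            throw new IllegalStateException("Component is stopping");
        }
        try {
            readCommand.run();
        } finally {
            busyLock.leaveBusy();
        }
    }

    /** Called on stop: blocks new commands and waits for in-flight ones to leave. */
    void stop() {
        busyLock.block();
        // It is now safe to close the underlying KeyValueStorage.
    }
}
{code}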
[jira] [Resolved] (IGNITE-22598) Failed to allocate temporary buffer for checkpoint
[ https://issues.apache.org/jira/browse/IGNITE-22598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-22598. Resolution: Duplicate > Failed to allocate temporary buffer for checkpoint > -- > > Key: IGNITE-22598 > URL: https://issues.apache.org/jira/browse/IGNITE-22598 > Project: Ignite > Issue Type: Bug >Reporter: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > Attachments: > poc-tester-SERVER-192.168.1.41-id-0-2024-06-27-09-14-17-client.log.2 > > > h3. Motivation > Many exceptions might appear in the log of the throughput test. Afterwards, the partition > storage is in an undefined state. Not surprisingly, continuing to work with the > storage leads to other issues. > {noformat} > 2024-06-27 12:19:46:881 +0300 > [INFO][%poc-tester-SERVER-192.168.1.41-id-0%JRaft-FSMCaller-Disruptor_stripe_6-0][ActionRequestProcessor] > Error occurred on a user's state machine > org.apache.ignite.internal.storage.StorageException: IGN-STORAGE-1 > TraceId:0d512917-7a88-4a7c-94c9-03d86304997d Failed to put value into index > at > org.apache.ignite.internal.storage.pagememory.index.hash.PageMemoryHashIndexStorage.lambda$put$1(PageMemoryHashIndexStorage.java:123) > at > org.apache.ignite.internal.storage.pagememory.index.AbstractPageMemoryIndexStorage.busy(AbstractPageMemoryIndexStorage.java:336) > at > org.apache.ignite.internal.storage.pagememory.index.AbstractPageMemoryIndexStorage.busyNonDataRead(AbstractPageMemoryIndexStorage.java:317) > at > org.apache.ignite.internal.storage.pagememory.index.hash.PageMemoryHashIndexStorage.put(PageMemoryHashIndexStorage.java:109) > at > org.apache.ignite.internal.table.distributed.TableSchemaAwareIndexStorage.put(TableSchemaAwareIndexStorage.java:83) > at > org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler.putToIndex(IndexUpdateHandler.java:270) > at > org.apache.ignite.internal.table.distributed.index.IndexUpdateHandler.addToIndexes(IndexUpdateHandler.java:69) > at > org.apache.ignite.internal.table.distributed.StorageUpdateHandler.tryProcessRow(StorageUpdateHandler.java:173) > at > org.apache.ignite.internal.table.distributed.StorageUpdateHandler.lambda$handleUpdate$0(StorageUpdateHandler.java:114) > at > org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:165) > at > org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:668) > at > org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:155) > at > org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:76) > at > org.apache.ignite.internal.table.distributed.StorageUpdateHandler.handleUpdate(StorageUpdateHandler.java:109) > at > org.apache.ignite.internal.table.distributed.raft.PartitionListener.handleUpdateCommand(PartitionListener.java:289) > at > org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:209) > at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) > at > org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:166) > at > org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:702) > at > 
org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:571) > at > org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:539) > at > org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:458) > at > org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:131) > at > org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:125) > at > org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:326) > at > org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:283) > at > com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167) > at > com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: org.apache.ignite.internal.pagememory.tree.CorruptedTreeExc
[jira] [Updated] (IGNITE-23084) Implement checkpoint buffer protection
[ https://issues.apache.org/jira/browse/IGNITE-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23084: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Implement checkpoint buffer protection > -- > > Key: IGNITE-23084 > URL: https://issues.apache.org/jira/browse/IGNITE-23084 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Current implementation of checkpoint allows for checkpoint buffer overflow. > We should port a part that prioritizes cp-buffer pages in checkpoint writer, > if it's close to overflow. > Throttling should not be ported as of right now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
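A hedged sketch of the prioritization described in IGNITE-23084; the types, the {{pollPage}} method and the 2/3 threshold are assumptions for illustration, not the ported Ignite 2 code. The idea: when the checkpoint buffer is close to overflow, the checkpoint writer drains pages held in that buffer before continuing with its regular list.
{code:java}
// Sketch with made-up names: prioritize checkpoint-buffer pages once the buffer
// is close to overflow, so page replacement is not blocked by an overflowing buffer.
import java.util.Queue;
import java.util.function.LongConsumer;

class CheckpointWriterSketch {
    interface CpBuffer {
        int usedPages();
        int maxPages();
        Long pollPage(); // a page currently held in the checkpoint buffer, or null
    }

    void writePages(Queue<Long> regularPages, CpBuffer cpBuffer, LongConsumer writePage) {
        while (!regularPages.isEmpty()) {
            // Close to overflow (assumed threshold: 2/3 of capacity): drain the buffer first.
            while (cpBuffer.usedPages() > cpBuffer.maxPages() * 2 / 3) {
                Long cpPage = cpBuffer.pollPage();
                if (cpPage == null) {
                    break;
                }
                writePage.accept(cpPage);
            }
            writePage.accept(regularPages.poll());
        }
    }
}
{code}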
[jira] [Updated] (IGNITE-23106) Wait for free space in checkpoint buffer
[ https://issues.apache.org/jira/browse/IGNITE-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-23106: --- Description: In PersistentPageMemory#postWriteLockPage we use a spin wait instead of properly waiting for a notification from checkpointer. This is, most likely, not optimal, and we should implement a fair waiting algorithm. Another thing to consider - we should make checkpoint buffer size configurable. was:In PersistentPageMemory#postWriteLockPage we use a spin wait instead of properly waiting for a notification from checkpointer. This is, most likely, not optimal, and we should implement a fair waiting algorithm. > Wait for free space in checkpoint buffer > > > Key: IGNITE-23106 > URL: https://issues.apache.org/jira/browse/IGNITE-23106 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > In PersistentPageMemory#postWriteLockPage we use a spin wait instead of > properly waiting for a notification from checkpointer. This is, most likely, > not optimal, and we should implement a fair waiting algorithm. > Another thing to consider - we should make checkpoint buffer size > configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010)
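A sketch of what "properly waiting" in IGNITE-23106 could look like, built only on {{java.util.concurrent}} primitives; the class and method names are hypothetical and this is not the actual {{PersistentPageMemory}} code. Writers block on a condition, and the checkpointer signals it whenever checkpoint-buffer pages are released.
{code:java}
// Sketch: a fair wait for free space in the checkpoint buffer instead of a spin wait.
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class CheckpointBufferSpace {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition spaceAvailable = lock.newCondition();
    private int freePages;

    CheckpointBufferSpace(int capacity) {
        this.freePages = capacity;
    }

    /** Called by a writer thread before copying a page into the checkpoint buffer. */
    void acquirePage() throws InterruptedException {
        lock.lock();
        try {
            while (freePages == 0) {
                spaceAvailable.await(); // blocks instead of burning CPU in a spin loop
            }
            freePages--;
        } finally {
            lock.unlock();
        }
    }

    /** Called by the checkpointer after a checkpoint-buffer page has been written out. */
    void releasePage() {
        lock.lock();
        try {
            freePages++;
            spaceAvailable.signal();
        } finally {
            lock.unlock();
        }
    }
}
{code}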
[jira] [Created] (IGNITE-23115) Checkpoint single partition from a single thread
Ivan Bessonov created IGNITE-23115: -- Summary: Checkpoint single partition from a single thread Key: IGNITE-23115 URL: https://issues.apache.org/jira/browse/IGNITE-23115 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov As far as I know, writing multiple files from multiple threads is more efficient than writing a single file from multiple threads. But the latter is exactly what we do. We should make an alternative implementation that would distribute partitions between threads and check if it performs better than the current implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
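A hedged sketch of the alternative proposed in IGNITE-23115; all types and the round-robin assignment are placeholders, not the real checkpoint code. Partitions are assigned to checkpoint threads up front, so each partition file is written by exactly one thread.
{code:java}
// Sketch: distribute whole partitions across checkpoint threads so a single partition
// file is only ever written by one thread.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PartitionPerThreadCheckpointSketch {
    static void writeCheckpoint(List<Integer> partitionIds, int threads) throws InterruptedException {
        // Bucket i gets every partition with index % threads == i.
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < partitionIds.size(); i++) {
            buckets.get(i % threads).add(partitionIds.get(i));
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<Integer> bucket : buckets) {
            pool.submit(() -> {
                for (int partId : bucket) {
                    writePartitionFile(partId); // all dirty pages of this partition, one thread
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static void writePartitionFile(int partId) {
        // Placeholder for writing all dirty pages of the given partition sequentially.
    }
}
{code}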
[jira] [Created] (IGNITE-23106) Wait for free space in checkpoint buffer
Ivan Bessonov created IGNITE-23106: -- Summary: Wait for free space in checkpoint buffer Key: IGNITE-23106 URL: https://issues.apache.org/jira/browse/IGNITE-23106 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov In PersistentPageMemory#postWriteLockPage we use a spin wait instead of properly waiting for a notification from checkpointer. This is, most likely, not optimal, and we should implement a fair waiting algorithm. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23105) Data race in aipersist partition destruction
Ivan Bessonov created IGNITE-23105: -- Summary: Data race in aipersist partition destruction Key: IGNITE-23105 URL: https://issues.apache.org/jira/browse/IGNITE-23105 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {{CheckpointProgressImpl#onStartPartitionProcessing}} and {{CheckpointProgressImpl#onFinishPartitionProcessing}} don't work as intended for several reasons: * There's a race: we could call {{onFinish}} before {{onStart}} is called in a concurrent thread. This might happen if there's only a handful of dirty pages in each partition and there is more than one checkpoint thread. Basically, this protection doesn't work. * Even if that particular race didn't exist, this code still doesn't work, because some of the pages could be added to the {{pageIdsToRetry}} map. That map will be processed later, when {{writePages}} is finished, meaning that we mark unfinished partitions as finished. * Due to the aforementioned bugs, I didn't bother including these methods in {{{}drainCheckpointBuffers{}}}. As a result, this method requires a fix too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23103) RandomLruPageReplacementPolicy is not fully ported
Ivan Bessonov created IGNITE-23103: -- Summary: RandomLruPageReplacementPolicy is not fully ported Key: IGNITE-23103 URL: https://issues.apache.org/jira/browse/IGNITE-23103 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov There should be a line {code:java} if (relRmvAddr == rndAddr || pinned || skip || (dirty && (checkpointPages == null || !checkpointPages.contains(fullId)))) { {code} instead of {code:java} if (relRmvAddr == rndAddr || pinned || skip || dirty) { {code} Due to this mistake we have several conditions that are always evaluated to constants, namely: * {{!dirty}} - always true * {{pageTs < dirtyTs && dirty && !storMeta}} - always false Ideally, we should add tests that would cover this situation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-23084) Implement checkpoint buffer protection
[ https://issues.apache.org/jira/browse/IGNITE-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-23084: -- Assignee: Ivan Bessonov > Implement checkpoint buffer protection > -- > > Key: IGNITE-23084 > URL: https://issues.apache.org/jira/browse/IGNITE-23084 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Current implementation of checkpoint allows for checkpoint buffer overflow. > We should port a part that prioritizes cp-buffer pages in checkpoint writer, > if it's close to overflow. > Throttling should not be ported as of right now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23084) Implement checkpoint buffer protection
Ivan Bessonov created IGNITE-23084: -- Summary: Implement checkpoint buffer protection Key: IGNITE-23084 URL: https://issues.apache.org/jira/browse/IGNITE-23084 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Current implementation of checkpoint allows for checkpoint buffer overflow. We should port a part that prioritizes cp-buffer pages in checkpoint writer, if it's close to overflow. Throttling should not be ported as of right now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put
[ https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22878: --- Description: h1. Results I put it right here, because comments can be missed easily. * The main reason of performance dips is the fact that we locate both raft log and table data on the same storage device. We should test a configuration where they are separated. * {{rocksdb}} based log storage adds minor issues during its flush and compaction, it might cause 10-20% dips. It's not too critical, but it once again shows downsides of current implementation. Reducing the number of threads that write SST files and compact them doesn't seem to do anything, although it's hard to say precisely. This part is not configurable, but I would investigate separately, whether or not it would make sense to set those values to 1. * Nothing really changes when you disable fsync. * Table data checkpoints and compaction have the most impact. For some reason, first checkpoint impacts the performance the worst, maybe due to some kind of a warmup. Making checkpoints more frequent helps smoothing out the graph a little. Reducing the number of checkpoint threads and compaction threads also helps smoothing out the graph, effects are more visible. Checkpoints become longer, obviously, but still don't overlap in single-put KV tests even under high load. What's implemented in current JIRA: * Basic logs of rocksdb compaction. * Basic logs of aipersist compaction, that should be expanded in https://issues.apache.org/jira/browse/IGNITE-23056. h1. Description Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a Benchmark: [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] h1. Test environment 6 AWS VMs of type c5d.4xlarge: * vCPU 16 * Memory 32 * Storage 400 NVMe SSD * Network up to 10 Gbps h1. Test Start 3 Ignite nodes (one node per host). Configuration: * raft.fsync=false * partitions=16 * replicas=1 Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load threads and works with own key range. Parameters: * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=510 -p insertcount=500 -s}} * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}} * {{{}Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=1020 -p insertcount=500 -s{} h1. Results Results from each client are in the separate files (attached). >From these files we can draw transactions-per-second graphs: !cl1.png!!cl2.png!!cl3.png! Take a look at these sinks. We need to investigate the cause of them. was: h1. Results I put it right here, because comments can be missed easily. 
* The main reason of performance dips is the fact that we locate both raft log and table data on the same storage device. We should test a configuration where they are separated. * {{rocksdb}} based log storage adds minor issues during its flush and compaction, it might cause 10-20% dips. It's not too critical, but it once again shows downsides of current implementation. Reducing the number of threads that write SST files and compact them doesn't seem to do anything, although it's hard to say precisely. This part is not configurable, but I would investigate separately, whether or not it would make sense to set those values to 1. * Nothing really changes when you disable fsync. * Table data checkpoints and compaction have the most impact. For some reason, first checkpoint impacts the performance the worst, maybe due to some kind of a warmup. Making checkpoints more frequent helps smoothing out the graph a little. Reducing the number of checkpoint threads and compaction threads also helps smoothing out the graph, effects are more visible. Checkpoints become longer, obviously, but still don't overlap in single-put KV tests even under high load. h1. Description Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a Benchmark: [https://githu
[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put
[ https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22878: --- Description: h1. Results I put it right here, because comments can be missed easily. * The main reason of performance dips is the fact that we locate both raft log and table data on the same storage device. We should test a configuration where they are separated. * {{rocksdb}} based log storage adds minor issues during its flush and compaction, it might cause 10-20% dips. It's not too critical, but it once again shows downsides of current implementation. Reducing the number of threads that write SST files and compact them doesn't seem to do anything, although it's hard to say precisely. This part is not configurable, but I would investigate separately, whether or not it would make sense to set those values to 1. * Nothing really changes when you disable fsync. * Table data checkpoints and compaction have the most impact. For some reason, first checkpoint impacts the performance the worst, maybe due to some kind of a warmup. Making checkpoints more frequent helps smoothing out the graph a little. Reducing the number of checkpoint threads and compaction threads also helps smoothing out the graph, effects are more visible. Checkpoints become longer, obviously, but still don't overlap in single-put KV tests even under high load. h1. Description Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a Benchmark: [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] h1. Test environment 6 AWS VMs of type c5d.4xlarge: * vCPU 16 * Memory 32 * Storage 400 NVMe SSD * Network up to 10 Gbps h1. Test Start 3 Ignite nodes (one node per host). Configuration: * raft.fsync=false * partitions=16 * replicas=1 Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load threads and works with own key range. Parameters: * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=510 -p insertcount=500 -s}} * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}} * {{{}Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=1020 -p insertcount=500 -s{} h1. Results Results from each client are in the separate files (attached). >From these files we can draw transactions-per-second graphs: !cl1.png!!cl2.png!!cl3.png! Take a look at these sinks. We need to investigate the cause of them. was: h1. Results I put it right here, because comments can be missed easily. * The main reason of performance dips is the fact that we locate both raft log and table data on the same storage device. We should test a configuration where they are separated. 
* {{rocksdb}} based log storage adds minor issues during its flush and compaction, it might cause 10-20% dips. It's not too critical, but it once again shows downsides of current implementation. Reducing the number of threads that write SST files and compact them doesn't seem to do anything, although it's hard to say precisely. This part is not configurable, but I would investigate separately, whether or not it would make sense to set those values to 1. * Table data checkpoints and compaction have the most impact. For some reason, first checkpoint impacts the performance the worst, maybe due to some kind of a warmup. Making checkpoints more frequent helps smoothing out the graph a little. Reducing the number of checkpoint threads and compaction threads also helps smoothing out the graph, effects are more visible. Checkpoints become longer, obviously, but still don't overlap in single-put KV tests even under high load. h1. Description Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a Benchmark: [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] h1. Test environment 6 AWS VMs of type c5d.4xlarge: * vCPU 16 * Memory 32 * Storage 400 NVMe SSD * Network up to 10 Gbps h1
[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put
[ https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22878: --- Description: h1. Results I put it right here, because comments can be missed easily. * The main reason of performance dips is the fact that we locate both raft log and table data on the same storage device. We should test a configuration where they are separated. * {{rocksdb}} based log storage adds minor issues during its flush and compaction, it might cause 10-20% dips. It's not too critical, but it once again shows downsides of current implementation. Reducing the number of threads that write SST files and compact them doesn't seem to do anything, although it's hard to say precisely. This part is not configurable, but I would investigate separately, whether or not it would make sense to set those values to 1. * Table data checkpoints and compaction have the most impact. For some reason, first checkpoint impacts the performance the worst, maybe due to some kind of a warmup. Making checkpoints more frequent helps smoothing out the graph a little. Reducing the number of checkpoint threads and compaction threads also helps smoothing out the graph, effects are more visible. Checkpoints become longer, obviously, but still don't overlap in single-put KV tests even under high load. h1. Description Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a Benchmark: [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] h1. Test environment 6 AWS VMs of type c5d.4xlarge: * vCPU 16 * Memory 32 * Storage 400 NVMe SSD * Network up to 10 Gbps h1. Test Start 3 Ignite nodes (one node per host). Configuration: * raft.fsync=false * partitions=16 * replicas=1 Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load threads and works with own key range. Parameters: * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=510 -p insertcount=500 -s}} * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}} * {{{}Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=1020 -p insertcount=500 -s{} h1. Results Results from each client are in the separate files (attached). >From these files we can draw transactions-per-second graphs: !cl1.png!!cl2.png!!cl3.png! Take a look at these sinks. We need to investigate the cause of them. was: Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a Benchmark: [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] h1. Test environment 6 AWS VMs of type c5d.4xlarge: * vCPU 16 * Memory 32 * Storage 400 NVMe SSD * Network up to 10 Gbps h1. 
Test Start 3 Ignite nodes (one node per host). Configuration: * raft.fsync=false * partitions=16 * replicas=1 Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load threads and works with own key range. Parameters: * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=510 -p insertcount=500 -s}} * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 -s}} * {{Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p s
[jira] [Updated] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put
[ https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22878: --- Reviewer: Aleksandr Polovtsev > Periodic latency sinks on key-value KeyValueView#put > > > Key: IGNITE-22878 > URL: https://issues.apache.org/jira/browse/IGNITE-22878 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 3.0.0-beta2 >Reporter: Ivan Artiukhov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: 2024-08-01-11-36-02_192.168.208.148_kv_load.txt, > 2024-08-01-11-36-02_192.168.209.141_kv_load.txt, > 2024-08-01-11-36-02_192.168.209.191_kv_load.txt, cl1.png, cl2.png, cl3.png > > Time Spent: 10m > Remaining Estimate: 0h > > Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a > Benchmark: > [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] > > h1. Test environment > 6 AWS VMs of type c5d.4xlarge: > * vCPU 16 > * Memory 32 > * Storage 400 NVMe SSD > * Network up to 10 Gbps > h1. Test > Start 3 Ignite nodes (one node per host). Configuration: > * raft.fsync=false > * partitions=16 > * replicas=1 > Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load > threads and works with own key range. Parameters: > * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P > /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p > hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 > -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p > status.interval=1 -p partitions=16 -p insertstart=510 -p > insertcount=500 -s}} > * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P > /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p > hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 > -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p > status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 > -s}} > * {{Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P > /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p > hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 > -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p > status.interval=1 -p partitions=16 -p insertstart=1020 -p > insertcount=500 -s > h1. Results > Results from each client are in the separate files (attached). > From these files we can draw transactions-per-second graphs: > !cl1.png!!cl2.png!!cl3.png! > Take a look at these sinks. We need to investigate the cause of them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-23056) Verbose logging of delta-files compaction
Ivan Bessonov created IGNITE-23056: -- Summary: Verbose logging of delta-files compaction Key: IGNITE-23056 URL: https://issues.apache.org/jira/browse/IGNITE-23056 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov For checkpoints we have a very extensive log message that shows the duration of each checkpoint's phase. We don't have that for the compactor. In this Jira we need to implement that. The list of phases and statistics is at the developer's discretion. As a bonus, we might want to print some values in microseconds instead of milliseconds, and not use fast timestamps (they don't have enough granularity). While we're doing that for compactor logs, we might as well update checkpoint's logs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22987) Log rocksdb flush events
[ https://issues.apache.org/jira/browse/IGNITE-22987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22987: --- Reviewer: Roman Puchkovskiy (was: Philipp Shergalis) > Log rocksdb flush events > > > Key: IGNITE-22987 > URL: https://issues.apache.org/jira/browse/IGNITE-22987 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 10m > Remaining Estimate: 0h > > * {{org.apache.ignite.internal.rocksdb.flush.RocksDbFlushListener}} should > log its events and basic info about them (once per flush, we don't need an > individual log message per CF) > * We should add this listener to log storage, these logs will be most > valuable in it -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22987) Log rocksdb flush events
[ https://issues.apache.org/jira/browse/IGNITE-22987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22987: -- Assignee: Ivan Bessonov > Log rocksdb flush events > > > Key: IGNITE-22987 > URL: https://issues.apache.org/jira/browse/IGNITE-22987 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > * {{org.apache.ignite.internal.rocksdb.flush.RocksDbFlushListener}} should > log its events and basic info about them (once per flush, we don't need an > individual log message per CF) > * We should add this listener to log storage, these logs will be most > valuable in it -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22987) Log rocksdb flush events
Ivan Bessonov created IGNITE-22987: -- Summary: Log rocksdb flush events Key: IGNITE-22987 URL: https://issues.apache.org/jira/browse/IGNITE-22987 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov * {{org.apache.ignite.internal.rocksdb.flush.RocksDbFlushListener}} should log its events and basic info about them (once per flush, we don't need an individual log message per CF) * We should add this listener to log storage, these logs will be most valuable in it -- This message was sent by Atlassian Jira (v8.20.10#820010)
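A hedged sketch of the "one log line per flush" idea from IGNITE-22987, expressed with RocksJava's {{AbstractEventListener}}; the real {{org.apache.ignite.internal.rocksdb.flush.RocksDbFlushListener}} is Ignite's own interface and may look different, so treat the API usage below as an assumption.
{code:java}
// Sketch: log a single message per completed RocksDB flush (not per column family write).
import org.rocksdb.AbstractEventListener;
import org.rocksdb.FlushJobInfo;
import org.rocksdb.RocksDB;

public class LoggingFlushListenerSketch extends AbstractEventListener {
    @Override
    public void onFlushCompleted(RocksDB db, FlushJobInfo info) {
        // One line per flush with basic info; a real implementation would use the Ignite logger.
        System.out.println("RocksDB flush completed [cf=" + info.getColumnFamilyName() + "]");
    }
}
{code}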
[jira] [Assigned] (IGNITE-22878) Periodic latency sinks on key-value KeyValueView#put
[ https://issues.apache.org/jira/browse/IGNITE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22878: -- Assignee: Ivan Bessonov > Periodic latency sinks on key-value KeyValueView#put > > > Key: IGNITE-22878 > URL: https://issues.apache.org/jira/browse/IGNITE-22878 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 3.0.0-beta2 >Reporter: Ivan Artiukhov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: 2024-08-01-11-36-02_192.168.208.148_kv_load.txt, > 2024-08-01-11-36-02_192.168.209.141_kv_load.txt, > 2024-08-01-11-36-02_192.168.209.191_kv_load.txt, cl1.png, cl2.png, cl3.png > > Time Spent: 10m > Remaining Estimate: 0h > > Build under test: Ignite 3, rev. 1e8959c0a000f0901085eb0b11b37db4299fa72a > Benchmark: > [https://github.com/gridgain/YCSB/blob/ycsb-2024.14/ignite3/src/main/java/site/ycsb/db/ignite3/IgniteClient.java] > > h1. Test environment > 6 AWS VMs of type c5d.4xlarge: > * vCPU 16 > * Memory 32 > * Storage 400 NVMe SSD > * Network up to 10 Gbps > h1. Test > Start 3 Ignite nodes (one node per host). Configuration: > * raft.fsync=false > * partitions=16 > * replicas=1 > Start 3 YCSB clients (one client per host). Each YCSB client spawns 32 load > threads and works with own key range. Parameters: > * Client 1: {{-db site.ycsb.db.ignite3.IgniteClient -load -P > /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p > hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 > -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p > status.interval=1 -p partitions=16 -p insertstart=510 -p > insertcount=500 -s}} > * Client 2: {{-db site.ycsb.db.ignite3.IgniteClient -load -P > /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p > hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 > -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p > status.interval=1 -p partitions=16 -p insertstart=0 -p insertcount=500 > -s}} > * {{Client 3: {{-db site.ycsb.db.ignite3.IgniteClient -load -P > /opt/pubagent/poc/config/ycsb/workloads/workloadc -threads 32 -p > hosts=192.168.208.221,192.168.210.120,192.168.211.201 -p recordcount=1530 > -p warmupops=10 -p dataintegrity=true -p measurementtype=timeseries -p > status.interval=1 -p partitions=16 -p insertstart=1020 -p > insertcount=500 -s > h1. Results > Results from each client are in the separate files (attached). > From these files we can draw transactions-per-second graphs: > !cl1.png!!cl2.png!!cl3.png! > Take a look at these sinks. We need to investigate the cause of them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22952) IgniteDeploymentException upon using Compute API under Java 21
[ https://issues.apache.org/jira/browse/IGNITE-22952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22952: --- Release Note: Fixed lambda serialization issues in code deployment for Java 21 > IgniteDeploymentException upon using Compute API under Java 21 > -- > > Key: IGNITE-22952 > URL: https://issues.apache.org/jira/browse/IGNITE-22952 > Project: Ignite > Issue Type: Bug > Components: compute >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Fix For: 2.17 > > Time Spent: 20m > Remaining Estimate: 0h > > * Start a node via bin/ignite.sh > * Start the {{CacheAffinityExample}} on Java 21. The example is started with > the same JVM options which are used to start a node: > {code:java} > -DIGNITE_UPDATE_NOTIFIER=false > -Xmx1g > -Xms1g > -DCONSISTENT_ID=1001 > --add-opens=java.base/jdk.internal.access=ALL-UNNAMED > --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED > --add-opens=java.base/sun.nio.ch=ALL-UNNAMED > --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED > --add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED > --add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED > --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED > --add-opens=java.base/java.io=ALL-UNNAMED > --add-opens=java.base/java.net=ALL-UNNAMED > --add-opens=java.base/java.nio=ALL-UNNAMED > --add-opens=java.base/java.security.cert=ALL-UNNAMED > --add-opens=java.base/java.util=ALL-UNNAMED > --add-opens=java.base/java.util.concurrent=ALL-UNNAMED > --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED > --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED > --add-opens=java.base/java.lang=ALL-UNNAMED > --add-opens=java.base/java.lang.invoke=ALL-UNNAMED > --add-opens=java.base/java.math=ALL-UNNAMED > --add-opens=java.base/java.time=ALL-UNNAMED > --add-opens=java.base/sun.security.ssl=ALL-UNNAMED > --add-opens=java.base/sun.security.x509=ALL-UNNAMED > --add-opens=java.sql/java.sql=ALL-UNNAMED{code} > Expected behavior: > * The example finishes without errors > Actual behavior: > * The example fails with the following exception in the example’s log: > {code:java} > [2024-04-17T08:09:43.27][INFO][main][GridDeploymentLocalStore] Class locally > deployed: class > org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda/0x7f7264515000 > [2024-04-17T08:09:43.384][WARNING][p2p-#78][GridDeploymentCommunication] > Failed to resolve class > [originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, > class=o.a.i.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000, > req=GridDeploymentRequest > [rsrcName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class, > ldrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, isUndeploy=false, > nodeIds=null]] > java.lang.ClassNotFoundException: > org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000 > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:534) > at java.base/java.lang.Class.forName(Class.java:513) > at > org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processResourceRequest(GridDeploymentCommunication.java:218) > at > 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processDeploymentRequest(GridDeploymentCommunication.java:155) > at > org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.access$000(GridDeploymentCommunication.java:55) > at > org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication$1.onMessage(GridDeploymentCommunication.java:91) > at > org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727) > at > org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:158) > at > org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:1164) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) > at java.base/java.lang.Thread.run(Thread.java:1583){code} > The following exception is seen in the server node’s log: > {code:java} > [2024-04-17T08:09:43.391][INFO][pub-#77][GridDeploymentPerVersionStore] > Failed t
[jira] [Updated] (IGNITE-22952) IgniteDeploymentException upon using Compute API under Java 21
[ https://issues.apache.org/jira/browse/IGNITE-22952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22952: --- Description: * Start a node via bin/ignite.sh * Start the {{CacheAffinityExample}} on Java 21. The example is started with the same JVM options which are used to start a node: {code:java} -DIGNITE_UPDATE_NOTIFIER=false -Xmx1g -Xms1g -DCONSISTENT_ID=1001 --add-opens=java.base/jdk.internal.access=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED --add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.security.cert=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.math=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/sun.security.ssl=ALL-UNNAMED --add-opens=java.base/sun.security.x509=ALL-UNNAMED --add-opens=java.sql/java.sql=ALL-UNNAMED{code} Expected behavior: * The example finishes without errors Actual behavior: * The example fails with the following exception in the example’s log: {code:java} [2024-04-17T08:09:43.27][INFO][main][GridDeploymentLocalStore] Class locally deployed: class org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda/0x7f7264515000 [2024-04-17T08:09:43.384][WARNING][p2p-#78][GridDeploymentCommunication] Failed to resolve class [originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, class=o.a.i.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000, req=GridDeploymentRequest [rsrcName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class, ldrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, isUndeploy=false, nodeIds=null]] java.lang.ClassNotFoundException: org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000 at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526) at java.base/java.lang.Class.forName0(Native Method) at java.base/java.lang.Class.forName(Class.java:534) at java.base/java.lang.Class.forName(Class.java:513) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processResourceRequest(GridDeploymentCommunication.java:218) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processDeploymentRequest(GridDeploymentCommunication.java:155) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.access$000(GridDeploymentCommunication.java:55) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication$1.onMessage(GridDeploymentCommunication.java:91) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727) at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:158) at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:1164) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1583){code} The following exception is seen in the server node’s log: {code:java} [2024-04-17T08:09:43.391][INFO][pub-#77][GridDeploymentPerVersionStore] Failed to get resource from node [nodeId=37bf182b-b8f7-471e-ae26-7145048e19d1, clsLdrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, resName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class, classLoadersHierarchy=org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader->jdk.internal.loader.ClassLoaders$AppClassLoader->jdk.internal.loader.ClassLoaders$PlatformClassLoader, msg=Requested resource not found (ignoring locally) [originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, resourceName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class, classLoaderId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1]] [2024-04-17T08:09:43.392][S
[jira] [Created] (IGNITE-22952) IgniteDeploymentException upon using Compute API under Java 21
Ivan Bessonov created IGNITE-22952: -- Summary: IgniteDeploymentException upon using Compute API under Java 21 Key: IGNITE-22952 URL: https://issues.apache.org/jira/browse/IGNITE-22952 Project: Ignite Issue Type: Bug Components: compute Reporter: Ivan Bessonov Assignee: Ivan Bessonov Fix For: 2.17 * Start a node via bin/ignite.sh * Start the CacheAffinityExample on Java 21. The example is started with the same JVM options which are used to start a node: {{}} {code:java} '-DIGNITE_UPDATE_NOTIFIER=false', '-Xmx1g', '-Xms1g', '-DCONSISTENT_ID=1001', '--add-opens=java.base/jdk.internal.access=ALL-UNNAMED', '--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED', '--add-opens=java.base/sun.nio.ch=ALL-UNNAMED', '--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED', '--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED', '--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED', '--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED', '--add-opens=java.base/java.io=ALL-UNNAMED', '--add-opens=java.base/java.net=ALL-UNNAMED', '--add-opens=java.base/java.nio=ALL-UNNAMED', '--add-opens=java.base/java.security.cert=ALL-UNNAMED', '--add-opens=java.base/java.util=ALL-UNNAMED', '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED', '--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED', '--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED', '--add-opens=java.base/java.lang=ALL-UNNAMED', '--add-opens=java.base/java.lang.invoke=ALL-UNNAMED', '--add-opens=java.base/java.math=ALL-UNNAMED', '--add-opens=java.base/java.time=ALL-UNNAMED', '--add-opens=java.base/sun.security.ssl=ALL-UNNAMED', '--add-opens=java.base/sun.security.x509=ALL-UNNAMED', '--add-opens=java.sql/java.sql=ALL-UNNAMED'{code} {{}} Expected behaviour: * The example finishes without errors Actual behaviour: * The example fails with the following exception in the example’s log: {{}} {code:java} [2024-04-17T08:09:43.27][INFO][main][GridDeploymentLocalStore] Class locally deployed: class org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda/0x7f7264515000 [2024-04-17T08:09:43.384][WARNING][p2p-#78][GridDeploymentCommunication] Failed to resolve class [originatingNodeId=21e1dbde-b1b3-4eb2-8d8e-0418e4dfeb1b, class=o.a.i.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000, req=GridDeploymentRequest [rsrcName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class, ldrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, isUndeploy=false, nodeIds=null]] java.lang.ClassNotFoundException: org.apache.ignite.examples.datagrid.CacheAffinityExample$$Lambda.0x7f7264515000 at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526) at java.base/java.lang.Class.forName0(Native Method) at java.base/java.lang.Class.forName(Class.java:534) at java.base/java.lang.Class.forName(Class.java:513) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processResourceRequest(GridDeploymentCommunication.java:218) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.processDeploymentRequest(GridDeploymentCommunication.java:155) at org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication.access$000(GridDeploymentCommunication.java:55) at 
org.apache.ignite.internal.managers.deployment.GridDeploymentCommunication$1.onMessage(GridDeploymentCommunication.java:91) at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1727) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:158) at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:1164) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1583){code} {{}} The following exception is seen in the server node’s log: {{}} {code:java} [2024-04-17T08:09:43.391][INFO][pub-#77][GridDeploymentPerVersionStore] Failed to get resource from node [nodeId=37bf182b-b8f7-471e-ae26-7145048e19d1, clsLdrId=dbcba1bee81-37bf182b-b8f7-471e-ae26-7145048e19d1, resName=org/apache/ignite/examples/datagrid/CacheAffinityExample$$Lambda/0x7f7264515000.class, classLoadersHierarchy=org.apache.ignite.internal.managers.deployment.GridDeploymentClassLoader->jdk.internal.loader.ClassLoaders$AppClassLoader->jdk.internal.loader.ClassLoaders$PlatformClassLoa
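A note not from the ticket itself: the failure comes down to lambda classes being JVM hidden classes on modern Java, whose names contain a slash and cannot be resolved via Class.forName. A minimal, self-contained probe (hypothetical class name, plain JDK) that demonstrates the naming problem the peer-class-loading request runs into:
{code:java}
import java.util.function.Supplier;

public class LambdaClassNameProbe {
    public static void main(String[] args) throws Exception {
        Supplier<String> lambda = () -> "hello";

        // On Java 21 this prints something like LambdaClassNameProbe$$Lambda/0x00007f...,
        // i.e. a hidden class name containing a slash.
        String name = lambda.getClass().getName();
        System.out.println(name);

        // Hidden classes are not discoverable by name, so a lookup like the one done
        // by peer class loading fails with ClassNotFoundException.
        Class.forName(name);
    }
}
{code}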
[jira] [Updated] (IGNITE-22667) Optimise RocksDB sorted indexes
[ https://issues.apache.org/jira/browse/IGNITE-22667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22667: --- Description: We should write comparator for RocksDB in C++, contact [~ibessonov] for references Examples of native comparators for RocksDB can be found here: [https://github.com/facebook/rocksdb/blob/2e09a54c4fb82e88bcaa3e7cfa8ccf3635d5/java/src/test/java/org/rocksdb/NativeComparatorWrapperTest.java] [https://github.com/facebook/rocksdb/blob/06c8afeff5b9fd38a79bdd4ba1bbb9df572c8096/java/rocksjni/native_comparator_wrapper_test.cc] Only applicable for sorted indexes, because they use complicated algorithm for comparing binary tuples. Schema of these tuples is encoded in CF name, so reading it is not an issue. was:We should write comparator for RocksDB in C++, contact [~ibessonov] for references > Optimise RocksDB sorted indexes > --- > > Key: IGNITE-22667 > URL: https://issues.apache.org/jira/browse/IGNITE-22667 > Project: Ignite > Issue Type: Improvement > Components: persistence >Reporter: Philipp Shergalis >Priority: Major > Labels: ignite-3 > > We should write comparator for RocksDB in C++, contact [~ibessonov] for > references > Examples of native comparators for RocksDB can be found here: > [https://github.com/facebook/rocksdb/blob/2e09a54c4fb82e88bcaa3e7cfa8ccf3635d5/java/src/test/java/org/rocksdb/NativeComparatorWrapperTest.java] > [https://github.com/facebook/rocksdb/blob/06c8afeff5b9fd38a79bdd4ba1bbb9df572c8096/java/rocksjni/native_comparator_wrapper_test.cc] > Only applicable for sorted indexes, because they use complicated algorithm > for comparing binary tuples. Schema of these tuples is encoded in CF name, so > reading it is not an issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
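For orientation, a rough Java-side sketch of what such a comparator looks like today, assuming the current RocksJava {{AbstractComparator}} API ({{name()}} / {{compare(ByteBuffer, ByteBuffer)}}); the issue proposes moving exactly this comparison logic into C++ so that each key comparison avoids the JNI round-trip. The comparison body below is a placeholder, not the real binary tuple algorithm:
{code:java}
import java.nio.ByteBuffer;

import org.rocksdb.AbstractComparator;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.ComparatorOptions;

/** Placeholder Java-side comparator; the real one decodes binary tuples using the schema encoded in the CF name. */
class BinaryTupleComparatorSketch extends AbstractComparator {
    BinaryTupleComparatorSketch(ComparatorOptions options) {
        super(options);
    }

    @Override
    public String name() {
        return "binary-tuple-comparator";
    }

    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        // Placeholder: plain lexicographic comparison instead of binary tuple decoding.
        return a.compareTo(b);
    }
}

class ComparatorUsage {
    static ColumnFamilyOptions sortedIndexOptions() {
        // Every memtable/SST comparison goes through this callback, hence the JNI overhead
        // that a native (C++) comparator would remove.
        return new ColumnFamilyOptions().setComparator(new BinaryTupleComparatorSketch(new ComparatorOptions()));
    }
}
{code}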
[jira] [Created] (IGNITE-22852) Remove extra leaf access while modifying data in B+Tree
Ivan Bessonov created IGNITE-22852: -- Summary: Remove extra leaf access while modifying data in B+Tree Key: IGNITE-22852 URL: https://issues.apache.org/jira/browse/IGNITE-22852 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Currently, the put operation does the following when it finds a leaf node: # acquire read lock # check triangle invariant # find insertion point # release read lock # acquire write lock # check triangle invariant # find insertion point # insert/replace data # maybe release write lock I'm simplifying it a little bit, but steps 4-7 could potentially be skipped if we had an option to convert a read lock into a write lock. There's already an API for that in the offheap lock class ({{{}upgradeToWriteLock{}}}). We could use it, or add another method with "try" semantics that gives up if the write lock cannot be acquired within a limited number of spin-wait iterations (a sketch of the idea is shown below). The same approach is possible for "put" and "remove". "invoke" requires some intermediary actions, so I wouldn't modify it for now. As a result, we can expect slightly improved performance of put/remove operations, which should be visible in benchmarks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
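The tree uses its own offheap page lock (the {{upgradeToWriteLock}} mentioned above), but the same idea can be sketched with the JDK's {{StampedLock}}, whose {{tryConvertToWriteLock}} has exactly the "try"-upgrade semantics: if the upgrade succeeds, the insertion point found under the read lock is reused; otherwise we fall back to the current re-traversal:
{code:java}
import java.util.concurrent.locks.StampedLock;

class LeafPutSketch {
    private final StampedLock lock = new StampedLock();

    void put(Object row) {
        long stamp = lock.readLock();
        try {
            // Steps 1-3: check the triangle invariant and find the insertion point under the read lock.

            long writeStamp = lock.tryConvertToWriteLock(stamp);
            if (writeStamp != 0L) {
                // Upgrade succeeded: skip steps 4-7 and insert/replace using the already-found position.
                stamp = writeStamp;
            } else {
                // Upgrade failed: fall back to the current behaviour.
                lock.unlockRead(stamp);
                stamp = lock.writeLock();
                // Re-check the invariant, re-find the insertion point, then insert/replace.
            }
        } finally {
            lock.unlock(stamp);
        }
    }
}
{code}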
[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency
[ https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22819: --- Description: Following situation might happen: {code:java} [2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240]. ... [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237]. [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code} Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because of the difference of their clocks. {{evictIdempotentCommandsCache}} works differently on different nodes for the same raft commands. The real problem here is that it might (or might not) call the {{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase local revision. Revision is always local, it's never replicated. Revision mismatch leads to different evaluation of conditions in conditional updates and invokes. Simple example of such an issue would be a skipped configuration update on one or several nodes in cluster. What can we do about it: * make an alternative for {{removeAll}} that doesn't increase local revision * call {{removeAll}} even if the list is empty * never invalidate cache locally, but rather replicate cache invalidation with a special command * there's a TODO that says "clear this during compaction". That's a bad option, it would lead to either frequent compactions, or huge memory overheads was: Following situation might happen: {code:java} [2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240]. ... [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237]. 
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code} Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because of the difference of their clocks. {{evictIdempotentCommandsCache}} works differently on different nodes for the same raft commands. The real problem here is that it might (or might not) call the {{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase local revision. Revision is always local, it's never replicated. Revision mismatch leads to different evaluation of conditions in conditional updates and invokes. Simple example of such an issue would be a skipped configuration update on one or several nodes in cluster. What can we do about it: * make an alternative for {{removeAll}} that doesn't increase local revision * never invalidate cache locally, but rather replicate cache invalidation with a special command * there's a TODO that says "clear this during compaction". That's a
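A minimal sketch of the second option above ("call {{removeAll}} even if the list is empty"), with simplified types; the point is that every replica then performs the same number of revision-bumping storage operations for the same raft command, regardless of what its local clock allowed it to evict:
{code:java}
import java.util.Collection;
import java.util.List;

/** Simplified storage interface; the real call is storage.removeAll(commandIdStorageKeys, safeTime). */
interface KeyValueStorage {
    void removeAll(Collection<byte[]> keys, long safeTime);
}

final class IdempotentCacheEvictionSketch {
    private final KeyValueStorage storage;

    IdempotentCacheEvictionSketch(KeyValueStorage storage) {
        this.storage = storage;
    }

    void evict(List<byte[]> commandIdStorageKeys, long safeTime) {
        // Always issue removeAll, even for an empty key list, so the local revision
        // advances identically on every node for the same raft command.
        storage.removeAll(commandIdStorageKeys, safeTime);
    }
}
{code}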
[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency
[ https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22819: --- Description: Following situation might happen: {code:java} [2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240]. ... [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237]. [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code} Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because of the difference of their clocks. {{evictIdempotentCommandsCache}} works differently on different nodes for the same raft commands. The real problem here is that it might (or might not) call the {{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase local revision. Revision is always local, it's never replicated. Revision mismatch leads to different evaluation of conditions in conditional updates and invokes. Simple example of such an issue would be a skipped configuration update on one or several nodes in cluster. What can we do about it: * make an alternative for {{removeAll}} that doesn't increase local revision * never invalidate cache locally, but rather replicate cache invalidation with a special command * there's a TODO that says "clear this during compaction". That's a bad option, it would lead to either frequent compactions, or huge memory overheads was: Following situation might happen: {code:java} [2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240]. ... [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237]. 
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code} Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because of the difference of their clocks. {{evictIdempotentCommandsCache}} works differently on different nodes for the same raft commands. The real problem here is that it might (or might not) call the {{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase local revision. Revision is always local, it's never replicated. Revision mismatch leads to different evaluation of conditions in conditional updates and invokes. Simple example of such an issue would be a skipped configuration update on one or several nodes in cluster. What can we do about it: * make an alternative for {{removeAll}} that doesn't increase local revision * never invalidate cache locally, but rather replicate cache invalidation with a special command * there's a TODO that says "clear this during compaction". That's a bad option, it would lead to either frequent
[jira] [Updated] (IGNITE-22819) Metastorage revisions inconsistency
[ https://issues.apache.org/jira/browse/IGNITE-22819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22819: --- Description: Following situation might happen: {code:java} [2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240]. ... [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237]. [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code} Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because of the difference of their clocks. {{evictIdempotentCommandsCache}} works differently on different nodes for the same raft commands. The real problem here is that it might (or might not) call the {{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase local revision. Revision is always local, it's never replicated. Revision mismatch leads to different evaluation of conditions in conditional updates and invokes. Simple example of such an issue would be a skipped configuration update on one or several nodes in cluster. What can we do about it: * make an alternative for {{removeAll}} that doesn't increase local revision * never invalidate cache locally, but rather replicate cache invalidation with a special command * there's a TODO that says "clear this during compaction". That's a bad option, it would lead to either frequent compactions, or huge memory overheads was: Following situation might happen: {code:java} [2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240]. ... [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237]. 
[2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code} Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because of the difference of their clocks. {{evictIdempotentCommandsCache}} works differently on different nodes for the same raft commands. The real problem here is that it might (or might not) call the {{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase local revision. Revision is always local, it's never replicated. Revision mismatch leads to different evaluation of conditions in conditional updates and invokes. Simple example of such an issue would be a skipped configuration update on one of or several nodes in cluster. > Metastorage revisions inconsistency > --- > > Key: IGNITE-22819 > URL: https://issues.apache.org/jira/browse/IGNITE-22819 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Priority:
[jira] [Created] (IGNITE-22819) Metastorage revisions inconsistency
Ivan Bessonov created IGNITE-22819: -- Summary: Metastorage revisions inconsistency Key: IGNITE-22819 URL: https://issues.apache.org/jira/browse/IGNITE-22819 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov Following situation might happen: {code:java} [2024-07-24T09:29:17,220][INFO ][%itcskvt_n_0%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=0, composite=112840052389969920], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:220 +0300, logical=1, composite=112840052389969921], removedEntriesCount=0, cacheSize=240]. ... [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_2%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237]. [2024-07-24T09:29:17,257][INFO ][%itcskvt_n_1%JRaft-FSMCaller-Disruptormetastorage_stripe_0-0][MetaStorageWriteHandler] Idempotent command cache cleanup finished [cleanupTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=0, composite=112840052392394752], cleanupCompletionTimestamp=HybridTimestamp [physical=2024-07-24 09:29:17:257 +0300, logical=1, composite=112840052392394753], removedEntriesCount=3, cacheSize=237].{code} Note that {{removedEntriesCount}} is 0 on a leader and 3 on followers, because of the difference of their clocks. {{evictIdempotentCommandsCache}} works differently on different nodes for the same raft commands. The real problem here is that it might (or might not) call the {{{}storage.removeAll(commandIdStorageKeys, safeTime){}}}, which would increase local revision. Revision is always local, it's never replicated. Revision mismatch leads to different evaluation of conditions in conditional updates and invokes. Simple example of such an issue would be a skipped configuration update on one of or several nodes in cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from
[ https://issues.apache.org/jira/browse/IGNITE-22736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22736: -- Fix Version/s: 3.0.0-beta2 Assignee: Ivan Bessonov Labels: ignite-3 (was: ) > PartitionCommandsMarshallerImpl corrupts the buffer it reads from > - > > Key: IGNITE-22736 > URL: https://issues.apache.org/jira/browse/IGNITE-22736 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > {{PartitionCommandsMarshallerImpl#unmarshall}} receives a buffer, that's > requested from the log manager, for example. > The instance of byte buffer that it receives might be acquired from on-heap > cache of log entries. Modifying it would be > # not thread-safe, because multiple threads may start modifying it > concurrently > # illegal, because it stays in the cache for some time, and we basically > corrupt it by modifying it > We shouldn't do that -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22736) PartitionCommandsMarshallerImpl corrupts the buffer it reads from
Ivan Bessonov created IGNITE-22736: -- Summary: PartitionCommandsMarshallerImpl corrupts the buffer it reads from Key: IGNITE-22736 URL: https://issues.apache.org/jira/browse/IGNITE-22736 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {{PartitionCommandsMarshallerImpl#unmarshall}} receives a buffer that is requested from the log manager, for example. The byte buffer instance that it receives might be acquired from the on-heap cache of log entries. Modifying it would be # not thread-safe, because multiple threads may start modifying it concurrently # illegal, because it stays in the cache for some time, and we basically corrupt it by modifying it We shouldn't do that; a defensive sketch is shown below. -- This message was sent by Atlassian Jira (v8.20.10#820010)
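One common way to avoid mutating a shared buffer is to work on a view of it; a sketch of the defensive approach (plain {{ByteBuffer}} API, not the actual marshaller code):
{code:java}
import java.nio.ByteBuffer;

final class SafeUnmarshalSketch {
    /**
     * Returns a view of the shared buffer that can be consumed (position/limit moved)
     * without touching the original, which may still live in the on-heap log-entry cache.
     */
    static ByteBuffer readableView(ByteBuffer shared) {
        // asReadOnlyBuffer() shares the content but has independent position/limit/mark
        // and forbids writes through the view; the byte order must be carried over explicitly.
        return shared.asReadOnlyBuffer().order(shared.order());
    }
}
{code}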
[jira] [Created] (IGNITE-22657) Investigate why ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions fails without sleep
Ivan Bessonov created IGNITE-22657: -- Summary: Investigate why ItDisasterRecoveryReconfigurationTest#testIncompleteRebalanceAfterResetPartitions fails without sleep Key: IGNITE-22657 URL: https://issues.apache.org/jira/browse/IGNITE-22657 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-21303) Exclude nodes in "error" state from manual group reconfiguration
[ https://issues.apache.org/jira/browse/IGNITE-21303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-21303: -- Assignee: Ivan Bessonov > Exclude nodes in "error" state from manual group reconfiguration > > > Key: IGNITE-21303 > URL: https://issues.apache.org/jira/browse/IGNITE-21303 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > Instead of simply using existing set of node as a baseline for new > assignments, we should either exclude peers in ERROR state from it, or force > data cleanup on such nodes. Third option - forbid such reconfiguration, > forcing user to clear ERROR peers in advance -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IGNITE-22500) Remove unnecessary waits when creating an index
[ https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov resolved IGNITE-22500. Resolution: Won't Fix About eliminating a BUILDING status from catalog, we can't simply change a few lines, this task involves more changes. To my understanding, following nuances are important: * ChangeIndexStatusTask should be changed. If we remove REGISTERED->BUILDING change, then we wouldn't have to update catalog, this will lead to small refactoring. * We would have to create {{CatalogEvent.INDEX_BUILDING}} event instead of updating the catalog. * This event will have nothing to do with catalog at this point, it should be renamed. * It will *not* be fired in a context of meta-storage watch execution, which might be a problem if listener implementations rely on it. Spoiler: they do. * Local recovery and other such stuff will be changed slightly, this part shouldn't be that hard. Overall, I don't think that we should do such an optimization in this issue specifically. It's not about "removing wait that we don't need", it's about changing the internal protocol of index creation. I will file another Jira for that soon > Remove unnecessary waits when creating an index > --- > > Key: IGNITE-22500 > URL: https://issues.apache.org/jira/browse/IGNITE-22500 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > When creating an index with current defaults (DelayDuration=1sec, > MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 > seconds on my machine (without concurrent transactions, on an empty table > that was just created). > According to the design, we need to first wait for the REGISTERED state to > activate on all nodes, including the ones that are currently down; this is to > make sure that all transactions started on schema versions before the index > creation have finished before we start to build the index (this makes us > waiting for DelayDuration+MaxClockSkew). Then, after the build finishes, we > switch the index to the AVAILABLE state. This requires another wait of > DelayDuration+MaxClockSkew. > Because of IGNITE-20378, in the second case we actually wait longer (for > additional IdleSafeTimePropagationPeriod+MaxClockSkew). > The total of waits is thus 1.5+3=4.5sec. But index creation actually takes > 6-6.5 seconds. It looks like there are some additional delays (like > submitting to the Metastorage and executing its watches). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-22500) Remove unnecessary waits when creating an index
[ https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859937#comment-17859937 ] Ivan Bessonov commented on IGNITE-22500: My thoughts on the topic: * _We have an additional switch from REGISTERED to BUILDING, which can in theory be eliminated from the catalog; it'll save us an additional second (DD is 500ms now)_ * We can't lower DD for a specific status change, because it would violate the schema synchronization protocol. After waiting for "msSafeTime - DD - skew" (don't remember the precise rules about clock skew) we rely on the fact that the catalog is up-to-date; breaking that invariant would lead to unforeseen consequences. * What we really need is: ** The ability to create indexes in the same DDL as the table itself. We do this implicitly for the PK. For other indexes it's only a question of API ** For SQL scripts we could batch consecutive DDLs and create indexes at the same time as a table implicitly, which seems like an optimal choice. This way we don't need any special syntax ** Some DDL queries can be executed in parallel, why not. Again, this seems more like a SQL issue to me > Remove unnecessary waits when creating an index > --- > > Key: IGNITE-22500 > URL: https://issues.apache.org/jira/browse/IGNITE-22500 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > When creating an index with current defaults (DelayDuration=1sec, > MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 > seconds on my machine (without concurrent transactions, on an empty table > that was just created). > According to the design, we need to first wait for the REGISTERED state to > activate on all nodes, including the ones that are currently down; this is to > make sure that all transactions started on schema versions before the index > creation have finished before we start to build the index (this makes us > waiting for DelayDuration+MaxClockSkew). Then, after the build finishes, we > switch the index to the AVAILABLE state. This requires another wait of > DelayDuration+MaxClockSkew. > Because of IGNITE-20378, in the second case we actually wait longer (for > additional IdleSafeTimePropagationPeriod+MaxClockSkew). > The total of waits is thus 1.5+3=4.5sec. But index creation actually takes > 6-6.5 seconds. It looks like there are some additional delays (like > submitting to the Metastorage and executing its watches). -- This message was sent by Atlassian Jira (v8.20.10#820010)
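For reference, a back-of-the-envelope check of the waits quoted in the issue description, using the defaults mentioned there (DelayDuration=1s, MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1s):
{code:java}
public class IndexCreationWaits {
    public static void main(String[] args) {
        long delayDuration = 1_000;           // DelayDuration, ms
        long maxClockSkew = 500;              // MaxClockSkew, ms
        long idleSafeTimePropagation = 1_000; // IdleSafeTimePropagationPeriod, ms

        long registeredWait = delayDuration + maxClockSkew;          // 1.5 s to activate REGISTERED everywhere
        long availableWait = delayDuration + maxClockSkew
                + idleSafeTimePropagation + maxClockSkew;            // 3.0 s, including the extra wait from IGNITE-20378

        // Prints 4500, versus the observed 6-6.5 s of total index creation time.
        System.out.println("total wait, ms: " + (registeredWait + availableWait));
    }
}
{code}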
[jira] [Created] (IGNITE-22561) Get rid of ByteString in messages
Ivan Bessonov created IGNITE-22561: -- Summary: Get rid of ByteString in messages Key: IGNITE-22561 URL: https://issues.apache.org/jira/browse/IGNITE-22561 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov Here I would include two types of improvements: * {{@Marshallable ByteString}} - this pattern became obsolete a long time ago. The {{ByteBuffer}} type is natively supported by the protocol, and it should eliminate unnecessary data copying, potentially making the system faster (see the illustration after this list) * Pretty much the same thing, but for {{{}byte[]{}}}. It's used in classes like {{{}org.apache.ignite.internal.metastorage.dsl.Operation{}}}. If we migrate these properties to {{ByteBuffer}} then deserialization will become significantly faster, but in order to utilize it we would have to change the internal metastorage implementation a little bit (like optimizing memory usage in {{{}RocksDbKeyValueStorage#addDataToBatch{}}}). If it requires too many changes then I propose doing it in a separate JIRA. My assumption is that it will not require too many changes, but we'll see. -- This message was sent by Atlassian Jira (v8.20.10#820010)
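A small plain-JDK illustration (not the actual message classes) of why {{ByteBuffer}} properties avoid the copy that {{byte[]}} properties force during deserialization:
{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

public class WrapVsCopy {
    public static void main(String[] args) {
        byte[] frame = new byte[1024]; // pretend this is the received network frame
        int offset = 100;
        int length = 256;

        // byte[] property: the payload has to be copied out of the frame.
        byte[] copied = Arrays.copyOfRange(frame, offset, offset + length);

        // ByteBuffer property: just a view over the same array, no copy.
        ByteBuffer wrapped = ByteBuffer.wrap(frame, offset, length).slice();

        System.out.println(copied.length + " bytes copied vs " + wrapped.remaining() + " bytes wrapped");
    }
}
{code}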
[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613 ] Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 10:00 AM: -- According to {{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}. Before:
{code:java}
Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units
UpdateCommandsMarshalingMicroBenchmark.marshal 128 false thrpt 5 2361.249 ± 66.884 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt 5 52.377 ± 3.769 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 false thrpt 5 1713.443 ± 331.795 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt 5 14.916 ± 2.230 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 false thrpt 5 833.372 ± 227.738 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt 5 3.281 ± 0.906 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 false thrpt 5 2090.845 ± 792.226 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 true thrpt 5 51.393 ± 16.872 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 false thrpt 5 2188.459 ± 69.423 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt 5 52.705 ± 2.771 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 false thrpt 5 2174.810 ± 61.331 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt 5 53.805 ± 1.000 ops/ms
{code}
After:
{code:java}
Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units
UpdateCommandsMarshalingMicroBenchmark.marshal 128 false thrpt 5 4389.765 ± 66.332 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt 5 79.684 ± 0.965 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 false thrpt 5 2754.506 ± 58.151 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt 5 17.435 ± 0.267 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 false thrpt 5 1066.381 ± 10.254 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt 5 3.389 ± 0.688 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 false thrpt 5 2782.648 ± 173.791 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 true thrpt 5 69.952 ± 9.109 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 false thrpt 5 2752.568 ± 50.796 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt 5 63.721 ± 2.902 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 false thrpt 5 2676.343 ± 1209.184 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt 5 62.139 ± 17.144 ops/ms
{code}
Short summary: * Depending on the number of byte arrays inside the message (which can't be optimized), marshaling became 0% to 85% faster according to the created benchmark, due to a combination of different optimizations, such as ** avoiding the creation of serializers ** simpler and slightly faster byte buffers pool ** better binary UUID format ** low-level stuff in direct stream ** better {{writeVarInt}} / {{writeVarLong}} * If we take a look at the flamegraph, we can see that serialization itself is about 1.5-2.0 times slower than the following {{{}Arrays.copyOf{}}}, which is pretty good in my opinion. * Reading speed wasn't so thoroughly checked in this issue; I created another one: https://issues.apache.org/jira/browse/IGNITE-22559 Overall, reading speed doesn't depend on the size of individual byte buffers, because we simply wrap the original array.
Other than that, the current optimizations show a 15%-35% increase in deserialization speed, due to ** {{...StreamImplV1}} optimizations ** faster {{readInt}} / {{readLong}} ** better binary UUID format * Further optimizations for reads are required. Here I mostly focused on writing speed. Reading speed turned out to be worse than writing speed for small commands, which I don't like. was (Author: ibessonov): According to {{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}. Before: {code:java} Benchmark
[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613 ] Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 9:59 AM: - According to {{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}. Before: {code:java} Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units UpdateCommandsMarshalingMicroBenchmark.marshal 128false thrpt5 2361.249 ± 66.884 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt552.377 ± 3.769 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048false thrpt5 1713.443 ± 331.795 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt514.916 ± 2.230 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192false thrpt5 833.372 ± 227.738 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt5 3.281 ± 0.906 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal128false thrpt5 2090.845 ± 792.226 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal128 true thrpt551.393 ± 16.872 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048false thrpt5 2188.459 ± 69.423 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt552.705 ± 2.771 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192false thrpt5 2174.810 ± 61.331 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt553.805 ± 1.000 ops/ms {code} After: {code:java} Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units UpdateCommandsMarshalingMicroBenchmark.marshal 128 false thrpt 5 4389.765 ± 66.332 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt 5 79.684 ± 0.965 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048 false thrpt 5 2754.506 ± 58.151 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt 5 17.435 ± 0.267 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192 false thrpt 5 1066.381 ± 10.254 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt 5 3.389 ± 0.688 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 false thrpt 5 2782.648 ± 173.791 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 true thrpt 5 69.952 ± 9.109 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 false thrpt 5 2752.568 ± 50.796 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt 5 63.721 ± 2.902 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 false thrpt 5 2676.343 ± 1209.184 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt 5 62.139 ± 17.144 ops/ms {code} Short summary: * Depending on the number of byte arrays inside of the message (which can't be optimized), marshaling became from 0% to 85% faster according to created benchmark, due to a combination of a lot of different optimizations, such as ** avoiding the creation of serializers ** simpler and slightly faster byte buffers pool ** better binary UUID format ** low-level stuff in direct stream ** better {{writeVarInt}} / {{writeVarLong}} * If we take a look at the flamegraph, we could see that serialization itself is about 1.5-2.0 times slower than the following {{{}Arrays.copyOf{}}}, which is pretty good in my opinion. * Reading speed wasn't so thoroughly checked in this issue, I created another one: https://issues.apache.org/jira/browse/IGNITE-22559 Overall, reading speed doesn't depend on the size of individual byte buffers, because we simple wrap the original array. 
Other then that, current optimizations show 15%-35% increase in deserialization speed, due to ** {{...StreamImplV1}} optimizations ** faster {{readInt}} / {{readLong}} ** better binary UUID format * * Further optimizations for reads are required. Here I mostly focused on writing speed. Reading speed turned out to be worse than writing speed for small commands, I don't like it. was (Author: ibessonov): According to {{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}. Before: {code:java} Benchmark
[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22544: --- Reviewer: Philipp Shergalis > Commands marshalling appears to be slow > --- > > Key: IGNITE-22544 > URL: https://issues.apache.org/jira/browse/IGNITE-22544 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3, ignite3_performance > Attachments: IGNITE-22544.patch > > Time Spent: 10m > Remaining Estimate: 0h > > We should benchmark the way we marshal commands using optimized marshaller > and make it faster. Some obvious places: > * byte buffers pool - we can replace queue with a manual implementation of > Treiber stack, it's trivial and doesn't use as many CAS/volatile operations > * new serializers are allocated every time, but they can be put into static > final constants instead, or cached in fields of corresponding factories > * we can create a serialization factory per group, not per message, this way > we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, > like in Ignite 2, which would basically lead to static dispatch of > deserializer constructors and static access to serializers, instead of > dynamic dispatch (virtual call), which should be noticeably faster > * profiler might show other simple places, we must also compare > {{OptimizedMarshaller}} against other serialization algorithms in benchmarks > EDIT: quick draft attached, it addresses points 1 and 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613 ] Ivan Bessonov commented on IGNITE-22544: According to {{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}. Before:
{code:java}
Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units
UpdateCommandsMarshalingMicroBenchmark.marshal 128 false thrpt 5 2361.249 ± 66.884 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt 5 52.377 ± 3.769 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 false thrpt 5 1713.443 ± 331.795 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt 5 14.916 ± 2.230 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 false thrpt 5 833.372 ± 227.738 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt 5 3.281 ± 0.906 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 false thrpt 5 2090.845 ± 792.226 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 true thrpt 5 51.393 ± 16.872 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 false thrpt 5 2188.459 ± 69.423 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt 5 52.705 ± 2.771 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 false thrpt 5 2174.810 ± 61.331 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt 5 53.805 ± 1.000 ops/ms
{code}
After:
{code:java}
Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units
UpdateCommandsMarshalingMicroBenchmark.marshal 128 false thrpt 5 4389.765 ± 66.332 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt 5 79.684 ± 0.965 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 false thrpt 5 2754.506 ± 58.151 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt 5 17.435 ± 0.267 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 false thrpt 5 1066.381 ± 10.254 ops/ms
UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt 5 3.389 ± 0.688 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 false thrpt 5 2782.648 ± 173.791 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 true thrpt 5 69.952 ± 9.109 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 false thrpt 5 2752.568 ± 50.796 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt 5 63.721 ± 2.902 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 false thrpt 5 2676.343 ± 1209.184 ops/ms
UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt 5 62.139 ± 17.144 ops/ms
{code}
Short summary: * Depending on the number of byte arrays inside the message (which can't be optimized), marshaling became 0% to 85% faster according to the created benchmark, due to a combination of different optimizations, such as ** avoiding the creation of serializers ** simpler and slightly faster byte buffers pool ** better binary UUID format ** low-level stuff in direct stream ** better {{writeVarInt}} / {{writeVarLong}} * If we take a look at the flamegraph, we can see that serialization itself is about 1.5-2.0 times slower than the following {{{}Arrays.copyOf{}}}, which is pretty good in my opinion. * Reading speed wasn't so thoroughly checked in this issue; I created another one: https://issues.apache.org/jira/browse/IGNITE-22559 Overall, reading speed doesn't depend on the size of individual byte buffers, because we simply wrap the original array.
Other than that, the current optimizations show a 15%-35% increase in deserialization speed, due to ** {{...StreamImplV1}} optimizations ** faster {{readInt}} / {{readLong}} ** better binary UUID format * Further optimizations for reads are required. Here I mostly focused on writing speed. Reading speed turned out to be worse than writing speed for small commands, which I don't like. > Commands marshalling appears to be slow > --- > > Key: IGNITE-22544 > URL: https://issues.apache.org/jira/browse/IGNITE-22544 >
[jira] [Comment Edited] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859613#comment-17859613 ] Ivan Bessonov edited comment on IGNITE-22544 at 6/24/24 8:25 AM: - According to {{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}. Before: {code:java} Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units UpdateCommandsMarshalingMicroBenchmark.marshal 128false thrpt5 2361.249 ± 66.884 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt552.377 ± 3.769 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048false thrpt5 1713.443 ± 331.795 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt514.916 ± 2.230 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192false thrpt5 833.372 ± 227.738 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt5 3.281 ± 0.906 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal128false thrpt5 2090.845 ± 792.226 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal128 true thrpt551.393 ± 16.872 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048false thrpt5 2188.459 ± 69.423 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt552.705 ± 2.771 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192false thrpt5 2174.810 ± 61.331 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt553.805 ± 1.000 ops/ms {code} After: {code:java} Benchmark (payloadSize) (updateAll) Mode Cnt Score Error Units UpdateCommandsMarshalingMicroBenchmark.marshal 128 false thrpt 5 4389.765 ± 66.332 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 128 true thrpt 5 79.684 ± 0.965 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048 false thrpt 5 2754.506 ± 58.151 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 2048 true thrpt 5 17.435 ± 0.267 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192 false thrpt 5 1066.381 ± 10.254 ops/ms UpdateCommandsMarshalingMicroBenchmark.marshal 8192 true thrpt 5 3.389 ± 0.688 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 false thrpt 5 2782.648 ± 173.791 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 128 true thrpt 5 69.952 ± 9.109 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 false thrpt 5 2752.568 ± 50.796 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 2048 true thrpt 5 63.721 ± 2.902 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 false thrpt 5 2676.343 ± 1209.184 ops/ms UpdateCommandsMarshalingMicroBenchmark.unmarshal 8192 true thrpt 5 62.139 ± 17.144 ops/ms {code} Short summary: * Depending on the number of byte arrays inside of the message (which can't be optimized), marshaling became from 0% to 85% faster according to created benchmark, due to a combination of a lot of different optimizations, such as * ** avoiding the creation of serializers ** simpler and slightly faster byte buffers pool ** better binary UUID format ** low-level stuff in direct stream ** better {{writeVarInt}} / {{writeVarLong}} * If we take a look at the flamegraph, we could see that serialization itself is about 1.5-2.0 times slower than the following {{{}Arrays.copyOf{}}}, which is pretty good in my opinion. * Reading speed wasn't so thoroughly checked in this issue, I created another one: https://issues.apache.org/jira/browse/IGNITE-22559 Overall, reading speed doesn't depend on the size of individual byte buffers, because we simple wrap the original array. 
Other then that, current optimizations show 15%-35% increase in deserialization speed, due to ** {{...StreamImplV1}} optimizations ** faster {{readInt}} / {{readLong}} ** better binary UUID format * Further optimizations for reads are required. Here I mostly focused on writing speed. Reading speed turned out to be worse than writing speed for small commands, I don't like it. was (Author: ibessonov): According to {{{}org.apache.ignite.internal.benchmarks.UpdateCommandsMarshalingMicroBenchmark{}}}. Before: {code:java} Benchmar
[jira] [Updated] (IGNITE-22559) Optimize raft command deserialization
[ https://issues.apache.org/jira/browse/IGNITE-22559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22559: --- Description: # We should benchmark readInt / readLong against protobuf, since it uses the same binary format # We should create much faster way of creating deserializers for messages. For example, we could generate "switch" statements like in Ignite 2. Both for creating message deserializer (compile time generation) and for message group deserialization factory (runtime generation, because we don't know the list of factories) # We should get rid of serializers and deserializers as separate classes and move generated code into message implementation. This way we save on allocations and we don't create builder, which is also expensive, we should write directly into fields of target object like in Ignite 2. > Optimize raft command deserialization > - > > Key: IGNITE-22559 > URL: https://issues.apache.org/jira/browse/IGNITE-22559 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > # We should benchmark readInt / readLong against protobuf, since it uses the > same binary format > # We should create much faster way of creating deserializers for messages. > For example, we could generate "switch" statements like in Ignite 2. Both for > creating message deserializer (compile time generation) and for message group > deserialization factory (runtime generation, because we don't know the list > of factories) > # We should get rid of serializers and deserializers as separate classes and > move generated code into message implementation. This way we save on > allocations and we don't create builder, which is also expensive, we should > write directly into fields of target object like in Ignite 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
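A sketch of point 2 above: a generated, switch-based group factory with static dispatch, in the spirit of Ignite 2. The command classes here are hypothetical stubs, not the real generated messages:
{code:java}
/** Hypothetical command implementations, stubbed for illustration. */
class UpdateCommandStub {}
class UpdateAllCommandStub {}

/** Generated-style group factory: one switch instead of a per-message map lookup and virtual dispatch. */
final class GroupDeserializerFactorySketch {
    Object newMessage(short messageType) {
        switch (messageType) {
            case 1:
                return new UpdateCommandStub();
            case 2:
                return new UpdateAllCommandStub();
            default:
                throw new IllegalArgumentException("Unknown message type: " + messageType);
        }
    }
}
{code}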
[jira] [Created] (IGNITE-22559) Optimize raft command deserialization
Ivan Bessonov created IGNITE-22559: -- Summary: Optimize raft command deserialization Key: IGNITE-22559 URL: https://issues.apache.org/jira/browse/IGNITE-22559 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22544: -- Assignee: Ivan Bessonov > Commands marshalling appears to be slow > --- > > Key: IGNITE-22544 > URL: https://issues.apache.org/jira/browse/IGNITE-22544 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: IGNITE-22544.patch > > > We should benchmark the way we marshal commands using optimized marshaller > and make it faster. Some obvious places: > * byte buffers pool - we can replace queue with a manual implementation of > Treiber stack, it's trivial and doesn't use as many CAS/volatile operations > * new serializers are allocated every time, but they can be put into static > final constants instead, or cached in fields of corresponding factories > * we can create a serialization factory per group, not per message, this way > we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, > like in Ignite 2, which would basically lead to static dispatch of > deserializer constructors and static access to serializers, instead of > dynamic dispatch (virtual call), which should be noticeably faster > * profiler might show other simple places, we must also compare > {{OptimizedMarshaller}} against other serialization algorithms in benchmarks > EDIT: quick draft attached, it addresses points 1 and 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22544: --- Description: We should benchmark the way we marshal commands using optimized marshaller and make it faster. Some obvious places: * byte buffers pool - we can replace queue with a manual implementation of Treiber stack, it's trivial and doesn't use as many CAS/volatile operations * new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of corresponding factories * we can create a serialization factory per group, not per message, this way we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (virtual call), which should be noticeably faster * profiler might show other simple places, we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks EDIT: quick draft attached, it addresses points 1 and 2. was: We should benchmark the way we marshal commands using optimized marshaller and make it faster. Some obvious places: * byte buffers pool - we can replace queue with a manual implementation of Treiber stack, it's trivial and doesn't use as many CAS/volatile operations * new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of corresponding factories * we can create a serialization factory per group, not per message, this way we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (virtual call), which should be noticeably faster * profiler might show other simple places, we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks > Commands marshalling appears to be slow > --- > > Key: IGNITE-22544 > URL: https://issues.apache.org/jira/browse/IGNITE-22544 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: IGNITE-22544.patch > > > We should benchmark the way we marshal commands using optimized marshaller > and make it faster. Some obvious places: > * byte buffers pool - we can replace queue with a manual implementation of > Treiber stack, it's trivial and doesn't use as many CAS/volatile operations > * new serializers are allocated every time, but they can be put into static > final constants instead, or cached in fields of corresponding factories > * we can create a serialization factory per group, not per message, this way > we will remove unnecessary indirection. Group factory can use {{{}switch{}}}, > like in Ignite 2, which would basically lead to static dispatch of > deserializer constructors and static access to serializers, instead of > dynamic dispatch (virtual call), which should be noticeably faster > * profiler might show other simple places, we must also compare > {{OptimizedMarshaller}} against other serialization algorithms in benchmarks > EDIT: quick draft attached, it addresses points 1 and 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22544) Commands marshalling appears to be slow
[ https://issues.apache.org/jira/browse/IGNITE-22544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22544: --- Attachment: IGNITE-22544.patch > Commands marshalling appears to be slow > --- > > Key: IGNITE-22544 > URL: https://issues.apache.org/jira/browse/IGNITE-22544 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Attachments: IGNITE-22544.patch > > > We should benchmark the way we marshal commands using the optimized marshaller > and make it faster. Some obvious places: > * byte buffers pool - we can replace the queue with a manual implementation of > a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations > * new serializers are allocated every time, but they can be put into static > final constants instead, or cached in fields of the corresponding factories > * we can create a serialization factory per group, not per message; this way > we will remove unnecessary indirection. A group factory can use {{{}switch{}}}, > like in Ignite 2, which would basically lead to static dispatch of > deserializer constructors and static access to serializers, instead of > dynamic dispatch (virtual call), which should be noticeably faster > * a profiler might show other simple places; we must also compare > {{OptimizedMarshaller}} against other serialization algorithms in benchmarks -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node
[ https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22542: --- Description: {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we detect that we send a message to the local node, we handle it immediately in the same thread, which could be very bad for the throughput of the system. "send"/"invoke" themselves appear to be slow as well; we should benchmark them. We should remove the instantiation of InetSocketAddress for sure, if possible; it takes time to resolve it. Maybe we should create it unresolved or just cache it like in Ignite 2. was: {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we detect that we send a message to the local node, we handle it immediately in the same thread, which could be very bad for the throughput of the system. "send"/"invoke" themselves appear to be slow as well; we should benchmark them. > Synchronous message handling on local node > -- > > Key: IGNITE-22542 > URL: https://issues.apache.org/jira/browse/IGNITE-22542 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we > detect that we send a message to the local node, we handle it immediately in > the same thread, which could be very bad for the throughput of the system. > "send"/"invoke" themselves appear to be slow as well; we should benchmark > them. We should remove the instantiation of InetSocketAddress for sure, if > possible; it takes time to resolve it. Maybe we should create it unresolved > or just cache it like in Ignite 2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
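A minimal sketch of the caching idea from the description above, assuming a hypothetical {{AddressCache}} helper (this is not the actual {{DefaultMessagingService}} code):
{code:java}
// Hypothetical helper: avoids resolving InetSocketAddress on every send by creating it
// unresolved and caching it per "host:port" key.
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class AddressCache {
    private final Map<String, InetSocketAddress> cache = new ConcurrentHashMap<>();

    /** Returns a cached, unresolved address; no DNS lookup happens on the hot path. */
    InetSocketAddress addressOf(String host, int port) {
        return cache.computeIfAbsent(host + ':' + port,
                key -> InetSocketAddress.createUnresolved(host, port));
    }
}
{code}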
[jira] [Created] (IGNITE-22544) Commands marshalling appears to be slow
Ivan Bessonov created IGNITE-22544: -- Summary: Commands marshalling appears to be slow Key: IGNITE-22544 URL: https://issues.apache.org/jira/browse/IGNITE-22544 Project: Ignite Issue Type: Improvement Reporter: Ivan Bessonov We should benchmark the way we marshal commands using the optimized marshaller and make it faster. Some obvious places: * byte buffers pool - we can replace the queue with a manual implementation of a Treiber stack; it's trivial and doesn't use as many CAS/volatile operations * new serializers are allocated every time, but they can be put into static final constants instead, or cached in fields of the corresponding factories * we can create a serialization factory per group, not per message; this way we will remove unnecessary indirection. A group factory can use {{{}switch{}}}, like in Ignite 2, which would basically lead to static dispatch of deserializer constructors and static access to serializers, instead of dynamic dispatch (virtual call), which should be noticeably faster * a profiler might show other simple places; we must also compare {{OptimizedMarshaller}} against other serialization algorithms in benchmarks -- This message was sent by Atlassian Jira (v8.20.10#820010)
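To illustrate the per-group factory bullet, a rough sketch follows; the serializer interface, the group name and the message type constants are invented for the example and do not reflect the real generated factories:
{code:java}
// Rough sketch: one factory per message group, serializers held in static final fields
// and selected with a switch (static dispatch) instead of a factory per message type.
import java.io.DataOutput;
import java.io.IOException;

interface MessageSerializer {
    void writeTo(Object message, DataOutput out) throws IOException;
}

final class ExampleGroupSerializationFactory {
    // Serializers are created once and reused instead of being allocated per message.
    private static final MessageSerializer WRITE_REQUEST = (msg, out) -> { /* write fields */ };
    private static final MessageSerializer READ_REQUEST = (msg, out) -> { /* write fields */ };

    /** Dispatches by message type, Ignite 2 style, avoiding a virtual call per message. */
    static MessageSerializer serializer(short messageType) {
        switch (messageType) {
            case 0:
                return WRITE_REQUEST;
            case 1:
                return READ_REQUEST;
            default:
                throw new IllegalArgumentException("Unknown message type: " + messageType);
        }
    }
}
{code}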
[jira] [Updated] (IGNITE-22542) Synchronous message handling on local node
[ https://issues.apache.org/jira/browse/IGNITE-22542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-22542: --- Ignite Flags: (was: Docs Required,Release Notes Required) > Synchronous message handling on local node > -- > > Key: IGNITE-22542 > URL: https://issues.apache.org/jira/browse/IGNITE-22542 > Project: Ignite > Issue Type: Bug >Reporter: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we > detect that we send a message to the local node, we handle it immediately in > the same thread, which could be very bad for the throughput of the system. > "send"/"invoke" themselves appear to be slow as well; we should benchmark > them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22542) Synchronous message handling on local node
Ivan Bessonov created IGNITE-22542: -- Summary: Synchronous message handling on local node Key: IGNITE-22542 URL: https://issues.apache.org/jira/browse/IGNITE-22542 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {{org.apache.ignite.internal.network.DefaultMessagingService#isSelf}} - if we detect that we send a message to the local node, we handle it immediately in the same thread, which could be very bad for the throughput of the system. "send"/"invoke" themselves appear to be slow as well; we should benchmark them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22500) Remove unnecessary waits when creating an index
[ https://issues.apache.org/jira/browse/IGNITE-22500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22500: -- Assignee: Ivan Bessonov > Remove unnecessary waits when creating an index > --- > > Key: IGNITE-22500 > URL: https://issues.apache.org/jira/browse/IGNITE-22500 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > When creating an index with current defaults (DelayDuration=1sec, > MaxClockSkew=500ms, IdleSafeTimePropagationPeriod=1sec), it takes 6-6.5 > seconds on my machine (without concurrent transactions, on an empty table > that was just created). > According to the design, we need to first wait for the REGISTERED state to > activate on all nodes, including the ones that are currently down; this is to > make sure that all transactions started on schema versions before the index > creation have finished before we start to build the index (this makes us > wait for DelayDuration+MaxClockSkew). Then, after the build finishes, we > switch the index to the AVAILABLE state. This requires another wait of > DelayDuration+MaxClockSkew. > Because of IGNITE-20378, in the second case we actually wait longer (for an > additional IdleSafeTimePropagationPeriod+MaxClockSkew). > The total of waits is thus 1.5+3=4.5sec. But index creation actually takes > 6-6.5 seconds. It looks like there are some additional delays (like > submitting to the Metastorage and executing its watches). -- This message was sent by Atlassian Jira (v8.20.10#820010)
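For reference, the expected waits from the description above, spelled out as plain arithmetic (not Ignite code):
{code:java}
// Back-of-the-envelope check of the expected waits with the current defaults.
public class IndexCreationWaitEstimate {
    public static void main(String[] args) {
        long delayDuration = 1_000;                 // ms
        long maxClockSkew = 500;                    // ms
        long idleSafeTimePropagationPeriod = 1_000; // ms

        // Wait for the REGISTERED state to activate on all nodes.
        long registeredWait = delayDuration + maxClockSkew; // 1.5 s

        // Wait before switching to AVAILABLE, plus the extra wait caused by IGNITE-20378.
        long availableWait = delayDuration + maxClockSkew
                + idleSafeTimePropagationPeriod + maxClockSkew; // 3 s

        System.out.println(registeredWait + availableWait); // 4500 ms, vs. the observed 6-6.5 s
    }
}
{code}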
[jira] [Updated] (IGNITE-21661) Test scenario where all stable nodes are lost during a partially completed rebalance
[ https://issues.apache.org/jira/browse/IGNITE-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov updated IGNITE-21661: --- Reviewer: Kirill Tkalenko > Test scenario where all stable nodes are lost during a partially completed > rebalance > > > Key: IGNITE-21661 > URL: https://issues.apache.org/jira/browse/IGNITE-21661 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Bessonov >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > The following case is possible: > * Nodes A, B and C for a partition > * B and C go offline > * new distribution is A, D and E > * EDIT: rebalance can only be started with one more "resetPartitions" > * full state transfer from A to D is completed > * full state transfer from A to E is not > * A goes offline > * we perform "resetPartitions" > Ideally, we should use D as a new leader somehow, but the bare minimum should > be a partition that is functional, maybe an empty one. We should test this case. > > This might be a good place to add more tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-22502) Change default DelayDuration to 500ms
[ https://issues.apache.org/jira/browse/IGNITE-22502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Bessonov reassigned IGNITE-22502: -- Assignee: Ivan Bessonov > Change default DelayDuration to 500ms > - > > Key: IGNITE-22502 > URL: https://issues.apache.org/jira/browse/IGNITE-22502 > Project: Ignite > Issue Type: Improvement >Reporter: Roman Puchkovskiy >Assignee: Ivan Bessonov >Priority: Major > Labels: ignite-3 > > When executing a DDL, we must wait for DelayDuration+MaxClockSkew. > DelayDuration for small clusters (which will probably be the usual mode of > operation) does not need to be long, so it makes sense to lower the default > from 1 second to 0.5 second. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-22509) Deadlock during the node stop
Ivan Bessonov created IGNITE-22509: -- Summary: Deadlock during the node stop Key: IGNITE-22509 URL: https://issues.apache.org/jira/browse/IGNITE-22509 Project: Ignite Issue Type: Bug Reporter: Ivan Bessonov {code:java} "%itcskvt_n_1%Raft-Group-Client-1@51623" prio=5 tid=0x4a6e nid=NA waiting for monitor entry java.lang.Thread.State: BLOCKED waiting for main@1 to release lock on <0xca23> (a org.apache.ignite.internal.app.LifecycleManager) at org.apache.ignite.internal.app.LifecycleManager.lambda$allComponentsStartFuture$1(LifecycleManager.java:130) at org.apache.ignite.internal.app.LifecycleManager$$Lambda$2852.843214322.accept(Unknown Source:-1) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605) at org.apache.ignite.internal.raft.RaftGroupServiceImpl$$Lambda$5439.1444714785.run(Unknown Source:-1) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264) at java.util.concurrent.FutureTask.run(FutureTask.java:-1) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.lang.Thread.run(Thread.java:829) {code} Holds busy lock in {{{}RaftGroupServiceImpl.sendWithRetry{}}}. 
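The two dumps (the second one follows below) form a lock-ordering cycle: the Raft client thread holds the busy lock and waits for the {{LifecycleManager}} monitor, while the stopping thread holds that monitor and blocks the busy lock. A purely illustrative reproduction of the pattern, with plain JDK locks standing in for the Ignite classes:
{code:java}
// Illustrative only: plain JDK locks stand in for IgniteSpinBusyLock and the
// LifecycleManager monitor. Running this is expected to deadlock.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StopDeadlockSketch {
    private static final ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock();
    private static final Object lifecycleMonitor = new Object();

    public static void main(String[] args) {
        Thread raftClient = new Thread(() -> {
            busyLock.readLock().lock(); // "enterBusy" inside sendWithRetry
            try {
                sleep(200); // a retry is in flight
                synchronized (lifecycleMonitor) { // the completion callback needs the monitor
                    // completeExceptionally(...) would run here
                }
            } finally {
                busyLock.readLock().unlock();
            }
        }, "raft-group-client");
        raftClient.start();

        synchronized (lifecycleMonitor) { // node stop holds the monitor...
            sleep(100);
            busyLock.writeLock().lock(); // ...and block() waits for the reader: deadlock
            busyLock.writeLock().unlock();
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}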
{code:java} "main@1" prio=5 tid=0x1 nid=NA sleeping java.lang.Thread.State: TIMED_WAITING blocks %itcskvt_n_1%Raft-Group-Client-1@51623 at java.lang.Thread.sleep(Thread.java:-1) at org.apache.ignite.internal.util.IgniteSpinReadWriteLock.writeLock(IgniteSpinReadWriteLock.java:255) at org.apache.ignite.internal.util.IgniteSpinBusyLock.block(IgniteSpinBusyLock.java:68) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.shutdown(RaftGroupServiceImpl.java:491) at org.apache.ignite.internal.metastorage.impl.MetaStorageServiceContext.close(MetaStorageServiceContext.java:75) at org.apache.ignite.internal.metastorage.impl.MetaStorageServiceImpl.close(MetaStorageServiceImpl.java:272) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5148.891107.accept(Unknown Source:-1) at org.apache.ignite.internal.util.IgniteUtils.cancelOrConsume(IgniteUtils.java:967) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.lambda$stopAsync$13(MetaStorageManagerImpl.java:452) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl$$Lambda$5141.633101377.close(Unknown Source:-1) at org.apache.ignite.internal.util.IgniteUtils.lambda$closeAllManually$1(IgniteUtils.java:611) at org.apache.ignite.internal.util.IgniteUtils$$Lambda$4822.1427077270.accept(Unknown Source:-1) at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) at org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:609) at org.apache.ignite.internal.util.IgniteUtils.closeAllManually(IgniteUtils.java:643) at org.apache.ignite.internal.metastorage.impl.MetaStorageManagerImpl.stopAsync(MetaStorageManagerImpl.java:449) at org.apache.ignite.internal.util.IgniteUtils.lambda$stopAsync$6(IgniteUtils.java:1213) at org.apache.ignite.internal.util.IgniteUtils$$Lambda$5013.753691797.apply(Unknown Source:-1) at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipe