Re: [DISCUSS] Release pub-sub Ignite extensions
Thank you Alexey for preparing the release process. We should be good to release all the migrated modules.

Regards,
Saikat

On Fri, Nov 20, 2020 at 5:43 AM Alexey Goncharuk wrote:

> Igniters,
>
> I think we a bit overdue for releasing already migrated extension modules
> which were removed in Ignite 2.9. As Saikat mentioned, I suggest releasing
> the following modules:
> ignite-flink-ext
> ignite-flume-ext
> ignite-pub-sub-ext
> ignite-zeromq-ext
> ignite-twitter-ext
> ignite-rocketmq-ext
> ignite-mqtt-ext
> ignite-storm-ext
> ignite-camel-ext
> ignite-jms11-ext
> ignite-kafka-ext
>
> I can be a release manager for these (I discussed this with Mikhail Petrov
> - he was not intending to release these modules together with spring data).
> Each extension will be released separately (a separate tag), but I suggest
> having a single vote for them.
>
> Let me know if you have any objections. Meanwhile, I'll start preparing the
> artifacts and branches/tags.
>
> --AG
>
Re: Ignite extensions - ignite-spring-data release.
Hi Mikhail,

Since spring-data-commons is a common module that is used internally, we should be OK not renaming it to spring-data-commons-ext.

Thank you for clarifying.

Regards,
Saikat

On Thu, Nov 19, 2020 at 5:02 AM Mikhail Petrov wrote:

> Petr,
>
> The purpose of the spring-data-commons modules is to store the general
> classes needed by spring-data extensions to avoid redundant code
> duplication between different version of Spring Data integration. I
> don't think it can be reused outside the "extensions" scope. Why can't
> it be placed in the ignite-extensions repository?
>
> Alexey,
>
> I don't mind if all extensions are released. I proposed to release
> spring-data modules in the first place because Spring Data thin client
> support is not included in any Ignite release and is crucial for some
> users.
>
> Regards,
> Mikhail
>
> On 19.11.2020 12:31, Petr Ivanov wrote:
> > If it is not an extensions, so why do we put it to ignite-extensions repository?
> >
> > Do we need additional separate ignite-utilities repository for modules like spring-data-commons?
> >
> >
> >
> >> On 19 Nov 2020, at 12:08, Mikhail Petrov wrote:
> >>
> >> Saikat,
> >>
> >> spring-data-commons is a utility Ignite module that does not provide integration with anything and is only needed to store Spring Data version-independent classes for "spring-data" modules.
> >> So, spring-data-commons is not an "extension".
> >>
> >> Should we rename it in this case?
> >>
> >> Regards,
> >> Mikhail
> >>
> >>
> >> On 19.11.2020 10:55, Petr Ivanov wrote:
> >>> No 11 separate votes, but 11 separate tags is all I am proposing :)
> >>>
> >>> > On 19 Nov 2020, at 10:33, Denis Magda wrote:
> >>> >
> >>> > 11+ separate votes is an overkill. We certainly want, and agreed, to be
> >>> > able to release each extension separately. But I see nothing wrong if
> >>> > releases of N extensions are passed through a single vote.
> > On Wednesday, November 18, 2020, Petr Ivanov > wrote: > > > I would object against all together release of these modules if this > > process will be done in single release branch / tag. > > Despite of the fact that all these extensions are in single > repository, we > > have to treat them as separate projects with separate release cycle > and > > release each one of them in their own tag with correct naming we were > > discussing previously. > > > > > >> On 19 Nov 2020, at 04:26, Saikat Maitra > wrote: > >> > >> Hi, > >> > >> Mikhail, Can we please rename ignite-spring-data-commons to > >> ignite-spring-data-commons-ext? > >> > >> Denis, > >> > >> We are good to release the following migrated modules as well... > >> > >> ignite-flink-ext > >> ignite-flume-ext > >> ignite-pub-sub-ext > >> ignite-zeromq-ext > >> ignite-twitter-ext > >> ignite-rocketmq-ext > >> ignite-mqtt-ext > >> ignite-storm-ext > >> ignite-camel-ext > >> ignite-jms11-ext > >> ignite-kafka-ext > >> > >> It will be great if we can release all these modules also together. > >> > >> Regards, > >> Saikat > >> > >> > >> > >> > >> > >> > >> > >> On Wed, Nov 18, 2020 at 8:00 AM Mikhail Petrov < > pmgheap@gmail.com> > >> wrote: > >> > >>> Denis, > >>> > >>> I omitted "-ext" for simplicity. Currently, this suffix is present > in > >>> the name of all Spring Data integration modules [1], [2], [3]. > >>> > >>> [1] - > >>> > >>> https://github.com/apache/ignite-extensions/tree/master/ > > modules/spring-data-2.2-ext > >>> [2] - > >>> > >>> https://github.com/apache/ignite-extensions/tree/master/ > > modules/spring-data-2.0-ext > >>> [3] - > >>> > >>> https://github.com/apache/ignite-extensions/tree/master/ > > modules/spring-data-ext > >>> Regards, > >>> Mikhail > >>> > >>> On 18.11.2020 16:26, Denis Magda wrote: > Are we keeping the original names of theses Spring modules? In > separate > threads I saw that the names of other extensions end with “ext”. 
> > Also, how about making a single release of all the extensions > that were > migrated from the main Ignite repo. There are many of them > waiting for > >>> this > to happen. Saikat, Alex Goncharuk what do you think? > > Denis > > On Wednesday, November 18, 2020, Mikhail Petrov < > pmgheap@gmail.com > wrote: > > > Hello, Igniters. > > > > Since the migration of Ignite Spring Data modules to extensions, > thin > > client support for Spring Data integration was implemented. - > [1]. > > > > To make this feature available for users, I propose to start the > > release > > process of the following modules: > > > > * ignite-spring-data > > *
Re: 2.9.1 release scope and dates
Yaroslav, hi.

I suggest merging a minor javadoc fix: [1]. It should have appeared in 2.9.

Commits in master: d3e5b7c11ed037670700eea75851e619d5d1b6b1 and 1654e9fac61842424c08d26a08ef67569f74746a

[1] https://github.com/apache/ignite/pull/8448

19.11.2020 17:15, Ivan Daschinsky wrote:

Hi! Yaroslav, Max -- I have another ticket that would be nice to have in 2.9.1
https://issues.apache.org/jira/browse/IGNITE-13699

Fri, Nov 13, 2020 at 15:08, Yaroslav Molochkov:

Igniters, hello! I think the scope of 2.9.1 is finalized.

On 9 Nov 2020, at 12:04, Yaroslav Molochkov wrote:

Ivan, thanks! Added it to the list.

On 8 Nov 2020, at 14:13, Ivan Daschinsky wrote:

Yaroslav, there is another bug for the 2.9.1 release
https://issues.apache.org/jira/browse/IGNITE-13572

Thu, Nov 5, 2020, 19:23, Yaroslav Molochkov:

Ivan, hi! Sure.

UPD: I am the release manager and will be doing this with Maxim's help (since I don't have some user permissions)

On Thu, Nov 5, 2020 at 6:24 PM Ivan Daschinsky wrote:

Hi. I'd suggest adding this issue. This is a usability improvement for zk discovery, and this patch also incorporates fixes for JMX metrics concurrency issues
[1] -- https://issues.apache.org/jira/browse/IGNITE-13577

Thu, Nov 5, 2020, 16:20, Yaroslav Molochkov:

Igniters!

I'd like to help with the 2.9.1 release. The scope of this release includes the following issues:
https://issues.apache.org/jira/browse/IGNITE-13676?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1

Maxim Muzafarov agreed to help me with the process and he will be the release manager.

Scope freeze: Nov. 12th
Code freeze: Nov. 19th
Voting date: Nov. 26th
Release date: Nov. 31st

Tickets that were added (or are to be added) to the scope don't bring new features, only various bug fixes.
[DISCUSS] Release pub-sub Ignite extensions
Igniters,

I think we are a bit overdue for releasing the already migrated extension modules which were removed in Ignite 2.9. As Saikat mentioned, I suggest releasing the following modules:

ignite-flink-ext
ignite-flume-ext
ignite-pub-sub-ext
ignite-zeromq-ext
ignite-twitter-ext
ignite-rocketmq-ext
ignite-mqtt-ext
ignite-storm-ext
ignite-camel-ext
ignite-jms11-ext
ignite-kafka-ext

I can be the release manager for these (I discussed this with Mikhail Petrov - he was not intending to release these modules together with spring data). Each extension will be released separately (a separate tag), but I suggest having a single vote for them.

Let me know if you have any objections. Meanwhile, I'll start preparing the artifacts and branches/tags.

--AG
[MTCGA]: new failures in builds [5745955] needs to be handled
Hi Igniters, I've detected some new issue on TeamCity to be handled. You are more than welcomed to help. *New Critical Failure in master Thin client: Node.js https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_ThinClientNodeJs?branch=%3Cdefault%3E No changes in the build - Here's a reminder of what contributors were agreed to do https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute - Should you have any questions please contact dev@ignite.apache.org Best Regards, Apache Ignite TeamCity Bot https://github.com/apache/ignite-teamcity-bot Notification generated at 14:22:40 20-11-2020
[jira] [Created] (IGNITE-13742) Fix failed WalModeChangeAdvancedSelfTest.testMaintenanceIsSkippedIfWasFixedManuallyOnDowntime
Ivan Bessonov created IGNITE-13742: -- Summary: Fix failed WalModeChangeAdvancedSelfTest.testMaintenanceIsSkippedIfWasFixedManuallyOnDowntime Key: IGNITE-13742 URL: https://issues.apache.org/jira/browse/IGNITE-13742 Project: Ignite Issue Type: Sub-task Reporter: Ivan Bessonov https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=5803772702668480758&tab=testDetails -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re[2]: [DISCUSS] Page replacement improvement
>Zhenya,
>
>> Alexey, we already have changes that partially fixes this issue [1]
>IGNITE-13086 it's a minor improvement. We still have major problems with
>our page replacement algorithm (slow page selection and non-optimal
>page-fault rate). I think changing from random 5 pages to 7 will make
>things even worse (it's better for page-fault rate, but page selection will
>be slower).

All these words above need to be proven, I hope. +1 to Nikolay: we need proper reproducers or some graphs from version 2.9.

>
>> This approach still not applicable for real life
>Why do you think batch replacement is not applicable for real-life? It can
>be applied for workloads, where some big amount of data periodically used,
>but not very often. For example, when OLAP request over historical data
>raised pages to page-memory, and after such request this data is not needed
>for a long time. Or when OLTP transactions mostly add new data and process
>recent data but rarely touch historical data. In these cases with the
>current approach, we will enter "page replacement mode" after some period
>of time and never leave it. With batch page replacement there is a chance
>to prevent random-LRU page replacement or postpone it.

I understand the case you described, but who will pull the switch? A human, an artificial intelligence? Your approach assumes some internal triggering, and I don't like this.

>
>> But request once more, do you really observe such problems with 2.9 ver ?
>Any graphs maybe ?
>I don't have production usage feedback after IGNITE-13086, but I doubt
>something changed significantly.

Let's wait? :) In any case (Nikolay, Alex), IGNITE-13086 includes a Yardstick benchmark that was used to validate the PR; we can use it once more.

Thanks!

>
>
>Thu, Nov 19, 2020 at 09:18, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:
>
>>
>> Alexey, we already have changes that partially fixes this issue [1]
>> Easy way:
>> Looks like we already have converge in page replacement.
>> If we change 5 times touch iterator from random lru algo into, for >> example — 7 we will obtain fast improvement from scratch. >> >> » Batch page replacement >> This approach still not applicable for real life if you wan`t to observe >> ugly people for threshold (i.e. 12 h) interval. And, of course, you >> understand that dramatically reduce of such interval gives nothing? >> >> » Change the page replacement algorithm. >> That`s way i vote for ) But request once more, do you really observe such >> problems with 2.9 ver ? Any graphs maybe ? >> >> thanks ! >> >> [1] https://issues.apache.org/jira/browse/IGNITE-13086 >> >Hello, Igniters! >> > >> >Currently, for page replacement (page rotation between page-memory and >> >disk) we use Random-LRU algorithm. It has a low maintenance cost and >> >relatively simple implementation, but it has many disadvantages and >> affects >> >performance very much when replacement is started. We even have warnings >> in >> >the log when page replacement started and a special event for this. I know >> >Ignite deployments where administrators force to restart cluster nodes >> >periodically to avoid page replacement. >> > >> >I have a couple of proposals to improve page replacement in Ignite: >> > >> >*Batch page replacement.* >> > >> >Main idea: in some cases start background task to evict cold pages from >> >page-memory (for example, pages, last touched more than 12 hours ago). >> > >> >The task can be started: >> >- Automatically, triggered by some events, for example, when we expect a >> >start of Random-LRU page replacing soon (allocated more than 90% of >> >page-memory) + we have enough amount of cold pages (we need some metric to >> >calculate the number of cold pages) + some time passed since last batch >> >page replacement (to avoid too much resource consumption by background >> >batch replacement). 
>> >- Manually (JMX or control.sh), if an administrator wants to control the >> >time of batch replacement more precisely (for example, to avoid the start >> >of this task during peak time). >> > >> >Batch page replacement will be helpful in some workloads (when some data >> >much colder than another), it can prevent the starting of Random-LRU page >> >replacement, or if Random-LRU already started it can provide conditions to >> >stop it. >> > >> >*Change the page replacement algorithm.* >> > >> >Good page replacement algorithm should satisfy the requirements: >> >- low page-fault rates for typical workload >> >- low maintenance cost (low resource consumption to maintain additional >> >structures required for page replacement) >> >- fast searching of next page for replacement >> >- sequential scans resistance (one sequential scan should not evict all >> >relatively hot pages from page-memory) >> > >> >Our Random-LRU has low maintenance cost and sequential scan resistant, but >> >to find the next page for replacement in the best case we scan 5 pages, in >> >the worst case we can scan all data region segment. Also, due to random >> >nature, it
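
The automatic trigger conditions listed in the proposal above (page-memory more than 90% allocated, enough cold pages, and a pause since the previous batch run) could be combined roughly as in the sketch below. This is only an illustration under stated assumptions: the class name, the method signature, the one-hour pause and the cold-page threshold parameter are all hypothetical and are not part of Ignite or of the proposal.

```java
/** Hypothetical trigger check for the automatic start of batch page replacement. */
final class BatchReplacementTriggerSketch {
    /** Minimum pause between two batch runs, in milliseconds (assumed value). */
    static final long MIN_INTERVAL_MS = 60L * 60 * 1000;

    /**
     * Returns true only if all three conditions hold: page-memory is more than
     * 90% allocated, enough cold pages exist, and enough time has passed since
     * the previous batch run.
     */
    static boolean shouldStart(long allocatedPages, long totalPages,
                               long coldPages, long minColdPages,
                               long nowMs, long lastRunMs) {
        // Integer form of allocatedPages / totalPages > 0.9, avoiding floating point.
        boolean nearlyFull = allocatedPages * 10 > totalPages * 9;
        boolean enoughCold = coldPages >= minColdPages;
        boolean cooledDown = nowMs - lastRunMs >= MIN_INTERVAL_MS;

        return nearlyFull && enoughCold && cooledDown;
    }
}
```

The manual path (JMX or control.sh) would bypass such a check entirely; how the number of cold pages is measured is left open here, as it is in the proposal.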
[jira] [Created] (IGNITE-13741) NPE on client reconnect with static caches configured
Amelchev Nikita created IGNITE-13741:

Summary: NPE on client reconnect with static caches configured
Key: IGNITE-13741
URL: https://issues.apache.org/jira/browse/IGNITE-13741
Project: Ignite
Issue Type: Bug
Affects Versions: 2.8.1
Reporter: Amelchev Nikita

An NPE occurs when the client node reconnects to the cluster.

1. Configure a static replicated persisted cache.
2. Start 2 server nodes and 1 client node. Activate the cluster and fill the cache with any values.
3. Restart all servers.
4. When the client reconnects, it throws the exception from {{CacheAffinitySharedManager#processClientCachesRequests}}:

{noformat}
// Get ready exchange version.
AffinityTopologyVersion topVer = cctx.exchange().readyAffinityVersion();

DiscoCache discoCache = cctx.discovery().discoCache(topVer); // discoCache was null

ClusterNode node = discoCache.oldestAliveServerNode();
{noformat}

{noformat}
2020-11-17 17:22:59.471 [INFO ][exchange-worker-#65][org.apache.ignite.internal.processors.cache.GridCacheProcessor] Stopped cache [cacheName=test-persist-partitioned-atomic-cache]
2020-11-17 17:22:59.472 [WARN ][tcp-client-disco-msg-worker-#4][org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi] Client node was reconnected after it was already considered failed by the server topology (this could happen after all servers restarted or due to a long network outage between the client and servers). All continuous queries and remote event listeners created by this client will be unsubscribed, consider listening to EVT_CLIENT_NODE_RECONNECTED event to restore them.
2020-11-17 17:22:59.473 [INFO ][sys-#56][org.apache.ignite.internal.processors.cluster.ClusterProcessor] Writing cluster ID and tag to metastorage on ready for write ClusterIdAndTag [id=b4d4e0b1-23ca-4399-bf0d-5238bb42eccb, tag=admiring_herschel] 2020-11-17 17:22:59.473 [INFO ][disco-event-worker-#52][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager] Client node reconnected to topology: TcpDiscoveryNode [id=9864d053-9100-4ab1-892b-540f19b30c32, consistentId=Example_3, addrs=ArrayList [*], sockAddrs=HashSet [*], discPort=0, order=2, intOrder=0, lastExchangeTime=1605622970162, loc=true, ver=2.8.1, isClient=true] 2020-11-17 17:22:59.474 [INFO ][disco-event-worker-#52][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager] Topology snapshot [ver=2, locNode=9864d053, servers=1, clients=1, state=ACTIVE, CPUs=8, offheap=4.0GB, heap=6.9GB] 2020-11-17 17:22:59.474 [INFO ][disco-event-worker-#52][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager] ^-- Baseline [id=0, size=2, online=1, offline=1] 2020-11-17 17:22:59.476 [INFO ][disco-notifier-worker-#50][org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor] Received state change finish message: true 2020-11-17 17:22:59.477 [ERROR][exchange-worker-#65][org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager] Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=9eb245bd-5cdc-4d13-8eac-9e7c7585c83f, cachesToClose=null, startCaches= [test-persist-partitioned-atomic-cache]] java.lang.NullPointerException: null at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesRequests(CacheAffinitySharedManager.java:666) ~[ignite-core-2.8.1.jar:2.8.1] at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:408) ~[ignite-core-2.8.1.jar:2.8.1] at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2997) [ignite-core-2.8.1.jar:2.8.1] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3145) [ignite-core-2.8.1.jar:2.8.1] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063) [ignite-core-2.8.1.jar:2.8.1] at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.8.1.jar:2.8.1] at java.lang.Thread.run(Thread.java:748) 2020-11-17 17:22:59.479 [INFO ][exchange-worker-#65][org.apache.ignite.internal.exchange.time] Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], crd=false, evt=NODE_JOINED, evtNode=9864d053-9100-4ab1-892b-540f19b30c32, customEvt=null, allowMerge=true, exchangeFreeSwitch=false] {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
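
The failure mode in the report above (a per-version lookup that returns null, followed by an unconditional dereference) can be illustrated with a minimal sketch. The types below are hypothetical stand-ins, not Ignite's internal classes, and the report does not say why discoCache was null or how the issue should be fixed; the sketch only assumes that a missing snapshot is possible after a reconnect and shows one way to make that case explicit to the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a topology-version-indexed snapshot history. */
final class DiscoCacheNullGuardSketch {
    /** Stand-in for a discovery snapshot of a single topology version. */
    static final class Snapshot {
        final String oldestAliveServerNode;

        Snapshot(String oldestAliveServerNode) {
            this.oldestAliveServerNode = oldestAliveServerNode;
        }
    }

    /** Snapshots keyed by topology version. */
    private final Map<Long, Snapshot> history = new HashMap<>();

    void put(long topVer, String oldestServer) {
        history.put(topVer, new Snapshot(oldestServer));
    }

    /** Models losing the history, as assumed to be possible after a reconnect. */
    void clear() {
        history.clear();
    }

    /** Returns the oldest alive server for the version, or empty if no snapshot exists. */
    Optional<String> oldestServer(long topVer) {
        Snapshot snapshot = history.get(topVer);

        // Guard: without this check, snapshot.oldestAliveServerNode would throw an NPE.
        if (snapshot == null)
            return Optional.empty();

        return Optional.ofNullable(snapshot.oldestAliveServerNode);
    }
}
```

Returning an empty Optional forces the caller to decide what a missing snapshot means (retry, skip the request, or fail with a descriptive error) instead of failing inside the exchange worker.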
Re: [DISCUSS] Page replacement improvement
Hello, Alex.

Do you have a reproducer for the page memory issues you described? What are the consequences of these types of issues?

Can we create some «benchmark» that will measure improvements to the page replacement algorithm? Maybe we can use the ducktape branch for it?

> On Nov 20, 2020, at 12:05, Alex Plehanov wrote:
>
> Zhenya,
>
>> Alexey, we already have changes that partially fixes this issue [1]
> IGNITE-13086 it's a minor improvement. We still have major problems with
> our page replacement algorithm (slow page selection and non-optimal
> page-fault rate). I think changing from random 5 pages to 7 will make
> things even worse (it's better for page-fault rate, but page selection will
> be slower).
>
>> This approach still not applicable for real life
> Why do you think batch replacement is not applicable for real-life? It can
> be applied for workloads, where some big amount of data periodically used,
> but not very often. For example, when OLAP request over historical data
> raised pages to page-memory, and after such request this data is not needed
> for a long time. Or when OLTP transactions mostly add new data and process
> recent data but rarely touch historical data. In these cases with the
> current approach, we will enter "page replacement mode" after some period
> of time and never leave it. With batch page replacement there is a chance
> to prevent random-LRU page replacement or postpone it.
>
>> But request once more, do you really observe such problems with 2.9 ver ?
> Any graphs maybe ?
> I don't have production usage feedback after IGNITE-13086, but I doubt
> something changed significantly.
>
>
> Thu, Nov 19, 2020 at 09:18, Zhenya Stanilovsky:
>
>>
>> Alexey, we already have changes that partially fixes this issue [1]
>> Easy way:
>> Looks like we already have converge in page replacement.
>> If we change 5 times touch iterator from random lru algo into, for
>> example, 7 we will obtain fast improvement from scratch.
>> >> » Batch page replacement >> This approach still not applicable for real life if you wan`t to observe >> ugly people for threshold (i.e. 12 h) interval. And, of course, you >> understand that dramatically reduce of such interval gives nothing? >> >> » Change the page replacement algorithm. >> That`s way i vote for ) But request once more, do you really observe such >> problems with 2.9 ver ? Any graphs maybe ? >> >> thanks ! >> >> [1] https://issues.apache.org/jira/browse/IGNITE-13086 >>> Hello, Igniters! >>> >>> Currently, for page replacement (page rotation between page-memory and >>> disk) we use Random-LRU algorithm. It has a low maintenance cost and >>> relatively simple implementation, but it has many disadvantages and >> affects >>> performance very much when replacement is started. We even have warnings >> in >>> the log when page replacement started and a special event for this. I know >>> Ignite deployments where administrators force to restart cluster nodes >>> periodically to avoid page replacement. >>> >>> I have a couple of proposals to improve page replacement in Ignite: >>> >>> *Batch page replacement.* >>> >>> Main idea: in some cases start background task to evict cold pages from >>> page-memory (for example, pages, last touched more than 12 hours ago). >>> >>> The task can be started: >>> - Automatically, triggered by some events, for example, when we expect a >>> start of Random-LRU page replacing soon (allocated more than 90% of >>> page-memory) + we have enough amount of cold pages (we need some metric to >>> calculate the number of cold pages) + some time passed since last batch >>> page replacement (to avoid too much resource consumption by background >>> batch replacement). >>> - Manually (JMX or control.sh), if an administrator wants to control the >>> time of batch replacement more precisely (for example, to avoid the start >>> of this task during peak time). 
>>> >>> Batch page replacement will be helpful in some workloads (when some data >>> much colder than another), it can prevent the starting of Random-LRU page >>> replacement, or if Random-LRU already started it can provide conditions to >>> stop it. >>> >>> *Change the page replacement algorithm.* >>> >>> Good page replacement algorithm should satisfy the requirements: >>> - low page-fault rates for typical workload >>> - low maintenance cost (low resource consumption to maintain additional >>> structures required for page replacement) >>> - fast searching of next page for replacement >>> - sequential scans resistance (one sequential scan should not evict all >>> relatively hot pages from page-memory) >>> >>> Our Random-LRU has low maintenance cost and sequential scan resistant, but >>> to find the next page for replacement in the best case we scan 5 pages, in >>> the worst case we can scan all data region segment. Also, due to random >>> nature, it's not very effective in predicting the right page for >>> replacement to minimize the page-fault rate. And it
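
The sampling idea behind Random-LRU, as described in the quoted proposal (a handful of candidate pages examined in the best case), can be sketched roughly as follows. This is a simplified illustration and not Ignite's implementation: the class and method names are hypothetical, last-access timestamps are plain longs, and the retry logic that leads to the worst case mentioned above is not modeled.

```java
import java.util.Random;

/** Hypothetical sketch of sampling-based (random) LRU victim selection. */
final class RandomLruSketch {
    /**
     * Samples the given number of random page indexes and returns the one with
     * the oldest last-access timestamp among the sampled candidates.
     */
    static int pickVictim(long[] lastAccess, int samples, Random rnd) {
        if (lastAccess.length == 0 || samples < 1)
            throw new IllegalArgumentException("need at least one page and one sample");

        int victim = -1;

        for (int i = 0; i < samples; i++) {
            int candidate = rnd.nextInt(lastAccess.length);

            // Keep the least recently used page seen so far.
            if (victim < 0 || lastAccess[candidate] < lastAccess[victim])
                victim = candidate;
        }

        return victim;
    }
}
```

A larger sample makes it more likely that a truly cold page is among the candidates, at the price of more work per replacement, which is the trade-off behind the "5 versus 7 pages" exchange in this thread.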
Re: [DISCUSS] Page replacement improvement
Zhenya,

> Alexey, we already have changes that partially fixes this issue [1]
IGNITE-13086 is a minor improvement. We still have major problems with our page replacement algorithm (slow page selection and non-optimal page-fault rate). I think changing from random 5 pages to 7 will make things even worse (it's better for the page-fault rate, but page selection will be slower).

> This approach still not applicable for real life
Why do you think batch replacement is not applicable for real life? It can be applied to workloads where a large amount of data is used periodically, but not very often. For example, when an OLAP request over historical data loads pages into page-memory, and after that request the data is not needed for a long time. Or when OLTP transactions mostly add new data and process recent data but rarely touch historical data. In these cases, with the current approach, we will enter "page replacement mode" after some period of time and never leave it. With batch page replacement there is a chance to prevent random-LRU page replacement or postpone it.

> But request once more, do you really observe such problems with 2.9 ver ? Any graphs maybe ?
I don't have production usage feedback after IGNITE-13086, but I doubt something changed significantly.

Thu, Nov 19, 2020 at 09:18, Zhenya Stanilovsky:

>
> Alexey, we already have changes that partially fixes this issue [1]
> Easy way:
> Looks like we already have converge in page replacement.
> If we change 5 times touch iterator from random lru algo into, for
> example, 7 we will obtain fast improvement from scratch.
>
> » Batch page replacement
> This approach still not applicable for real life if you wan`t to observe
> ugly people for threshold (i.e. 12 h) interval. And, of course, you
> understand that dramatically reduce of such interval gives nothing?
>
> » Change the page replacement algorithm.
> That`s way i vote for ) But request once more, do you really observe such
> problems with 2.9 ver ?
Any graphs maybe ? > > thanks ! > > [1] https://issues.apache.org/jira/browse/IGNITE-13086 > >Hello, Igniters! > > > >Currently, for page replacement (page rotation between page-memory and > >disk) we use Random-LRU algorithm. It has a low maintenance cost and > >relatively simple implementation, but it has many disadvantages and > affects > >performance very much when replacement is started. We even have warnings > in > >the log when page replacement started and a special event for this. I know > >Ignite deployments where administrators force to restart cluster nodes > >periodically to avoid page replacement. > > > >I have a couple of proposals to improve page replacement in Ignite: > > > >*Batch page replacement.* > > > >Main idea: in some cases start background task to evict cold pages from > >page-memory (for example, pages, last touched more than 12 hours ago). > > > >The task can be started: > >- Automatically, triggered by some events, for example, when we expect a > >start of Random-LRU page replacing soon (allocated more than 90% of > >page-memory) + we have enough amount of cold pages (we need some metric to > >calculate the number of cold pages) + some time passed since last batch > >page replacement (to avoid too much resource consumption by background > >batch replacement). > >- Manually (JMX or control.sh), if an administrator wants to control the > >time of batch replacement more precisely (for example, to avoid the start > >of this task during peak time). > > > >Batch page replacement will be helpful in some workloads (when some data > >much colder than another), it can prevent the starting of Random-LRU page > >replacement, or if Random-LRU already started it can provide conditions to > >stop it. 
> > > >*Change the page replacement algorithm.* > > > >Good page replacement algorithm should satisfy the requirements: > >- low page-fault rates for typical workload > >- low maintenance cost (low resource consumption to maintain additional > >structures required for page replacement) > >- fast searching of next page for replacement > >- sequential scans resistance (one sequential scan should not evict all > >relatively hot pages from page-memory) > > > >Our Random-LRU has low maintenance cost and sequential scan resistant, but > >to find the next page for replacement in the best case we scan 5 pages, in > >the worst case we can scan all data region segment. Also, due to random > >nature, it's not very effective in predicting the right page for > >replacement to minimize the page-fault rate. And it's much time required > to > >totally evict old cold data. > > > >Usually, database management systems and operating systems use > >modifications of LRU algorithms. These algorithms have higher maintenance > >costs (pages list should be modified on each page access), but often they > >are effective from a "page-fault rate" point of view and have O(1) > >complexity for a searching page to replace. Simple LRU is not sequential > >scan resistant, but m
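
The properties of plain LRU mentioned in the quoted proposal (the page list is updated on every access, the victim is found in O(1), and a sequential scan can push out hot pages) can be seen in a minimal sketch built on the JDK's access-ordered LinkedHashMap. The class name and the capacity are hypothetical, and this is not one of the algorithms under discussion for Ignite, only plain LRU for comparison.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical plain-LRU page table with a fixed capacity. */
final class SimpleLruSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    SimpleLruSketch(int capacity) {
        // accessOrder = true: every get/put moves the entry to the tail of the list.
        super(16, 0.75f, true);

        this.capacity = capacity;
    }

    /** Evicts the head of the list (the least recently used entry) in O(1). */
    @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, touching page 1 and then adding page 3 evicts page 2, but a subsequent scan that adds pages 4 and 5 evicts page 1 as well, even though it was recently hot. This is the lack of sequential scan resistance that the proposal refers to.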