Re: [DISCUSS] Release pub-sub Ignite extensions

2020-11-20 Thread Saikat Maitra
Thank you, Alexey, for preparing the release process. We should be good to
release all the migrated modules.

Regards,
Saikat

On Fri, Nov 20, 2020 at 5:43 AM Alexey Goncharuk 
wrote:

> Igniters,
>
> I think we are a bit overdue for releasing the already migrated extension
> modules that were removed in Ignite 2.9. As Saikat mentioned, I suggest releasing
> the following modules:
> ignite-flink-ext
> ignite-flume-ext
> ignite-pub-sub-ext
> ignite-zeromq-ext
> ignite-twitter-ext
> ignite-rocketmq-ext
> ignite-mqtt-ext
> ignite-storm-ext
> ignite-camel-ext
> ignite-jms11-ext
> ignite-kafka-ext
>
> I can be the release manager for these (I discussed this with Mikhail Petrov;
> he was not intending to release these modules together with Spring Data).
> Each extension will be released separately (a separate tag), but I suggest
> having a single vote for them.
>
> Let me know if you have any objections. Meanwhile, I'll start preparing the
> artifacts and branches/tags.
>
> --AG
>


Re: Ignite extensions - ignite-spring-data release.

2020-11-20 Thread Saikat Maitra
Hi Mikhail,

Since spring-data-commons is a common module used internally, we should be
OK not renaming it to spring-data-commons-ext.

Thank you for clarifying.

Regards,
Saikat

On Thu, Nov 19, 2020 at 5:02 AM Mikhail Petrov 
wrote:

> Petr,
>
> The purpose of the spring-data-commons modules is to store the general
> classes needed by spring-data extensions to avoid redundant code
> duplication between different versions of the Spring Data integration. I
> don't think it can be reused outside the "extensions" scope. Why can't
> it be placed in the ignite-extensions repository?
>
> Alexey,
>
> I don't mind if all extensions are released. I proposed to release
> spring-data modules in the first place because Spring Data thin client
> support is not included in any Ignite release and is crucial for some
> users.
>
> Regards,
> Mikhail
>
> On 19.11.2020 12:31, Petr Ivanov wrote:
> > If it is not an extension, why do we put it in the ignite-extensions
> > repository?
> >
> > Do we need an additional, separate ignite-utilities repository for modules
> > like spring-data-commons?
> >
> >
> >
> >> On 19 Nov 2020, at 12:08, Mikhail Petrov  wrote:
> >>
> >> Saikat,
> >>
> >> spring-data-commons is a utility Ignite module that does not provide
> >> integration with anything and is only needed to store Spring Data
> >> version-independent classes for "spring-data" modules.
> >> So, spring-data-commons is not an "extension".
> >>
> >> Should we rename it in this case?
> >>
> >> Regards,
> >> Mikhail
> >>
> >>
> >> On 19.11.2020 10:55, Petr Ivanov wrote:
> >>> Not 11 separate votes, but 11 separate tags is all I am proposing :)
> >>>
> >>>
>  On 19 Nov 2020, at 10:33, Denis Magda  wrote:
> 
>  11+ separate votes is overkill. We certainly want, and have agreed, to be
>  able to release each extension separately. But I see nothing wrong with
>  passing releases of N extensions through a single vote.
> 
>  On Wednesday, November 18, 2020, Petr Ivanov  wrote:
> 
> > I would object to releasing these modules all together if the
> > process is done in a single release branch / tag.
> > Even though all these extensions live in a single repository, we
> > have to treat them as separate projects with separate release cycles and
> > release each of them under its own tag with the correct naming we were
> > discussing previously.
> >
> >
> >> On 19 Nov 2020, at 04:26, Saikat Maitra  wrote:
> >>
> >> Hi,
> >>
> >> Mikhail, can we please rename ignite-spring-data-commons to
> >> ignite-spring-data-commons-ext?
> >>
> >> Denis,
> >>
> >> We are good to release the following migrated modules as well...
> >>
> >> ignite-flink-ext
> >> ignite-flume-ext
> >> ignite-pub-sub-ext
> >> ignite-zeromq-ext
> >> ignite-twitter-ext
> >> ignite-rocketmq-ext
> >> ignite-mqtt-ext
> >> ignite-storm-ext
> >> ignite-camel-ext
> >> ignite-jms11-ext
> >> ignite-kafka-ext
> >>
> >> It would be great if we could release all these modules together as well.
> >>
> >> Regards,
> >> Saikat
> >>
> >> On Wed, Nov 18, 2020 at 8:00 AM Mikhail Petrov <pmgheap@gmail.com>
> >> wrote:
> >>
> >>> Denis,
> >>>
> >>> I omitted "-ext" for simplicity. Currently, this suffix is present in
> >>> the names of all Spring Data integration modules [1], [2], [3].
> >>>
> >>> [1] - https://github.com/apache/ignite-extensions/tree/master/modules/spring-data-2.2-ext
> >>> [2] - https://github.com/apache/ignite-extensions/tree/master/modules/spring-data-2.0-ext
> >>> [3] - https://github.com/apache/ignite-extensions/tree/master/modules/spring-data-ext
> >>> Regards,
> >>> Mikhail
> >>>
> >>> On 18.11.2020 16:26, Denis Magda wrote:
>  Are we keeping the original names of these Spring modules? In separate
>  threads I saw that the names of other extensions end with “ext”.
> 
>  Also, how about making a single release of all the extensions that were
>  migrated from the main Ignite repo? Many of them are waiting for this
>  to happen. Saikat, Alex Goncharuk, what do you think?
> 
>  Denis
> 
>  On Wednesday, November 18, 2020, Mikhail Petrov <pmgheap@gmail.com>
>  wrote:
> 
> > Hello, Igniters.
> >
> > Since the Ignite Spring Data modules were migrated to extensions, thin
> > client support for the Spring Data integration has been implemented [1].
> >
> > To make this feature available to users, I propose to start the
> > release process for the following modules:
> >
> > * ignite-spring-data
> > *

Re: 2.9.1 release scope and dates

2020-11-20 Thread Steshin Vladimir

Hi, Yaroslav.

I suggest merging a minor javadoc fix [1]. It should have been included in
2.9. Commits in master:

d3e5b7c11ed037670700eea75851e619d5d1b6b1
and
1654e9fac61842424c08d26a08ef67569f74746a

[1] https://github.com/apache/ignite/pull/8448



On 19.11.2020 17:15, Ivan Daschinsky wrote:

Hi!
Yaroslav, Max -- I have another ticket that would be nice to have in 2.9.1
https://issues.apache.org/jira/browse/IGNITE-13699

On Fri, 13 Nov 2020 at 15:08, Yaroslav Molochkov wrote:


Igniters, hello!

I think the scope of 2.9.1 is finalized.


On 9 Nov 2020, at 12:04, Yaroslav Molochkov wrote:

Ivan, thanks!

Added it to the list.


On 8 Nov 2020, at 14:13, Ivan Daschinsky  wrote:

Yaroslav, there is another bug for the 2.9.1 release:
https://issues.apache.org/jira/browse/IGNITE-13572

On Thu, 5 Nov 2020 at 19:23, Yaroslav Molochkov wrote:


Ivan, hi!
Sure.

UPD: I am the release manager and will be doing this with Maxim's help
(since I lack some user permissions).


On Thu, Nov 5, 2020 at 6:24 PM Ivan Daschinsky 
wrote:

Hi. I'd suggest adding this issue [1]. It is a usability improvement for
zk discovery, and the patch also incorporates fixes for JMX metrics
concurrency issues.

[1] -- https://issues.apache.org/jira/browse/IGNITE-13577

On Thu, 5 Nov 2020 at 16:20, Yaroslav Molochkov wrote:

Igniters!

I'd like to help with the 2.9.1 release. The scope of this release
includes the following issues:

https://issues.apache.org/jira/browse/IGNITE-13676?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1

Maxim Muzafarov agreed to help me with the process and he will be the
release manager.

Scope freeze: Nov. 12th
Code freeze: Nov. 19th
Voting date: Nov. 26th
Release date: Nov. 31st

Tickets that were added (or are to be added) to the scope bring no new
features, only various bug fixes.





[DISCUSS] Release pub-sub Ignite extensions

2020-11-20 Thread Alexey Goncharuk
Igniters,

I think we are a bit overdue for releasing the already migrated extension
modules that were removed in Ignite 2.9. As Saikat mentioned, I suggest releasing
the following modules:
ignite-flink-ext
ignite-flume-ext
ignite-pub-sub-ext
ignite-zeromq-ext
ignite-twitter-ext
ignite-rocketmq-ext
ignite-mqtt-ext
ignite-storm-ext
ignite-camel-ext
ignite-jms11-ext
ignite-kafka-ext

I can be the release manager for these (I discussed this with Mikhail Petrov;
he was not intending to release these modules together with Spring Data).
Each extension will be released separately (a separate tag), but I suggest
having a single vote for them.

Let me know if you have any objections. Meanwhile, I'll start preparing the
artifacts and branches/tags.

--AG


[MTCGA]: new failures in builds [5745955] needs to be handled

2020-11-20 Thread dpavlov . tasks
Hi Igniters,

 I've detected some new issues on TeamCity to be handled. You are more than welcome to help.

 *New Critical Failure in master Thin client: Node.js 
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_ThinClientNodeJs?branch=%3Cdefault%3E
 No changes in the build

 - Here's a reminder of what contributors agreed to do 
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute 
 - Should you have any questions please contact dev@ignite.apache.org 

Best Regards,
Apache Ignite TeamCity Bot 
https://github.com/apache/ignite-teamcity-bot
Notification generated at 14:22:40 20-11-2020 


[jira] [Created] (IGNITE-13742) Fix failed WalModeChangeAdvancedSelfTest.testMaintenanceIsSkippedIfWasFixedManuallyOnDowntime

2020-11-20 Thread Ivan Bessonov (Jira)
Ivan Bessonov created IGNITE-13742:
--

 Summary: Fix failed WalModeChangeAdvancedSelfTest.testMaintenanceIsSkippedIfWasFixedManuallyOnDowntime
 Key: IGNITE-13742
 URL: https://issues.apache.org/jira/browse/IGNITE-13742
 Project: Ignite
  Issue Type: Sub-task
Reporter: Ivan Bessonov


https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=5803772702668480758&tab=testDetails



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re[2]: [DISCUSS] Page replacement improvement

2020-11-20 Thread Zhenya Stanilovsky

>Zhenya,
>
>> Alexey, we already have changes that partially fix this issue [1]
>IGNITE-13086 is a minor improvement. We still have major problems with
>our page replacement algorithm (slow page selection and non-optimal
>page-fault rate). I think changing from 5 random pages to 7 will make
>things even worse (it's better for the page-fault rate, but page selection
>will be slower).
All these claims need to be proven, I hope. +1 to Nikolay: we need proper
reproducers or some graphs from version 2.9.

>
>> This approach is still not applicable to real life
>Why do you think batch replacement is not applicable to real life? It can
>be applied to workloads where some large amount of data is used
>periodically, but not very often. For example, when an OLAP request over
>historical data has loaded pages into page-memory, and after that request
>this data is not needed for a long time. Or when OLTP transactions mostly
>add new data and process recent data but rarely touch historical data. In
>these cases, with the current approach, we will enter "page replacement
>mode" after some period of time and never leave it. With batch page
>replacement there is a chance to prevent Random-LRU page replacement or
>postpone it.
I understand the case you described, but who will pull the switch? A human?
Artificial intelligence?
Your approach assumes some internal triggering; I don't like this.
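To make the "who pulls the switch" question concrete, here is a minimal sketch of the automatic trigger that Alex's proposal (quoted further down in this message) describes: start a batch only when more than 90% of page-memory is allocated, there are enough cold pages, and some minimum time has passed since the last batch. The class name, parameters, and how the inputs would be obtained are hypothetical and are not Ignite APIs; the proposal itself leaves the cold-page metric open.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch of the automatic trigger for batch page replacement.
 * Class name, parameters and inputs are illustrative, not Ignite APIs.
 */
final class BatchReplacementTrigger {
    private final double fillThreshold; // e.g. 0.90: more than 90% of page-memory allocated
    private final long minColdPages;    // "enough" cold pages to make a batch worthwhile
    private final Duration minInterval; // minimum time between two batch runs
    private Instant lastRun = Instant.EPOCH;

    BatchReplacementTrigger(double fillThreshold, long minColdPages, Duration minInterval) {
        this.fillThreshold = fillThreshold;
        this.minColdPages = minColdPages;
        this.minInterval = minInterval;
    }

    /** Returns true, and records the run, only when all three conditions hold. */
    synchronized boolean shouldRun(long allocatedPages, long totalPages, long coldPages, Instant now) {
        boolean nearlyFull = allocatedPages > fillThreshold * totalPages;
        boolean enoughCold = coldPages >= minColdPages;
        boolean intervalPassed = !now.isBefore(lastRun.plus(minInterval));

        if (nearlyFull && enoughCold && intervalPassed) {
            lastRun = now;
            return true;
        }
        return false;
    }
}
```

A manual trigger (JMX or control.sh, as the proposal also mentions) would simply bypass `shouldRun` and start the batch directly.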
 
>
>> But I ask once more: do you really observe such problems with version 2.9?
>> Any graphs, maybe?
>I don't have production usage feedback after IGNITE-13086, but I doubt
>anything has changed significantly.

Let's wait? :) In any case (Nikolay, Alex), IGNITE-13086 includes a Yardstick
benchmark that was used to prove the PR; we can use it once more.

Thanks!
>
>
>On Thu, 19 Nov 2020 at 09:18, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>
>wrote:
>
>>
>> Alexey, we already have changes that partially fix this issue [1].
>> Easy way:
>> Looks like we already have convergence in page replacement.
>> If we change the 5-times touch iterator in the Random-LRU algorithm to,
>> for example, 7, we will get a quick improvement from scratch.
>>
>> » Batch page replacement
>> This approach is still not applicable to real life if you want to observe
>> ugly people for a threshold interval (e.g. 12 h). And, of course, you
>> understand that drastically reducing such an interval gives nothing?
>>
>> » Change the page replacement algorithm.
>> That's the way I vote for :) But I ask once more: do you really observe
>> such problems with version 2.9? Any graphs, maybe?
>>
>> thanks!
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-13086
>> >Hello, Igniters!
>> >
>> >Currently, for page replacement (page rotation between page-memory and
>> >disk) we use the Random-LRU algorithm. It has a low maintenance cost and a
>> >relatively simple implementation, but it has many disadvantages and
>> >affects performance considerably once replacement starts. We even have
>> >warnings in the log when page replacement starts, and a special event for
>> >this. I know of Ignite deployments where administrators are forced to
>> >restart cluster nodes periodically to avoid page replacement.
>> >
>> >I have a couple of proposals to improve page replacement in Ignite:
>> >
>> >*Batch page replacement.*
>> >
>> >Main idea: in some cases, start a background task to evict cold pages from
>> >page-memory (for example, pages last touched more than 12 hours ago).
>> >
>> >The task can be started:
>> >- Automatically, triggered by some events, for example, when we expect
>> >Random-LRU page replacement to start soon (more than 90% of page-memory
>> >allocated) + we have enough cold pages (we need some metric to calculate
>> >the number of cold pages) + some time has passed since the last batch
>> >page replacement (to avoid excessive resource consumption by background
>> >batch replacement).
>> >- Manually (JMX or control.sh), if an administrator wants to control the
>> >time of batch replacement more precisely (for example, to avoid starting
>> >this task during peak time).
>> >
>> >Batch page replacement will be helpful for some workloads (when some data
>> >is much colder than the rest): it can prevent Random-LRU page replacement
>> >from starting, or, if Random-LRU has already started, it can provide
>> >conditions to stop it.
>> >
>> >*Change the page replacement algorithm.*
>> >
>> >A good page replacement algorithm should satisfy these requirements:
>> >- low page-fault rates for a typical workload
>> >- low maintenance cost (low resource consumption to maintain additional
>> >structures required for page replacement)
>> >- fast search for the next page to replace
>> >- sequential scan resistance (one sequential scan should not evict all
>> >relatively hot pages from page-memory)
>> >
>> >Our Random-LRU has low maintenance cost and is sequential scan resistant,
>> >but to find the next page for replacement we scan 5 pages in the best case,
>> >and in the worst case we can scan a whole data region segment. Also, due to
>> >its random nature, it

[jira] [Created] (IGNITE-13741) NPE on client reconnect with static caches configured

2020-11-20 Thread Amelchev Nikita (Jira)
Amelchev Nikita created IGNITE-13741:


 Summary: NPE on client reconnect with static caches configured
 Key: IGNITE-13741
 URL: https://issues.apache.org/jira/browse/IGNITE-13741
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.8.1
Reporter: Amelchev Nikita


An NPE occurred when the client node reconnected to the cluster.

1. Configure a static replicated persistent cache.
2. Start 2 server nodes and 1 client node. Activate the cluster and fill the cache with any values.
3. Restart all servers.
4. When the client reconnects, it throws an exception from
{{CacheAffinitySharedManager#processClientCachesRequests}}: 
{noformat}
// Get ready exchange version.
AffinityTopologyVersion topVer = cctx.exchange().readyAffinityVersion();

DiscoCache discoCache = cctx.discovery().discoCache(topVer);

// discoCache was null
ClusterNode node = discoCache.oldestAliveServerNode();
{noformat}

{noformat}
2020-11-17 17:22:59.471 [INFO ][exchange-worker-#65][org.apache.ignite.internal.processors.cache.GridCacheProcessor] Stopped cache [cacheName=test-persist-partitioned-atomic-cache]
2020-11-17 17:22:59.472 [WARN ][tcp-client-disco-msg-worker-#4][org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi] Client node was reconnected after it was already considered failed by the server topology (this could happen after all servers restarted or due to a long network outage between the client and servers). All continuous queries and remote event listeners created by this client will be unsubscribed, consider listening to EVT_CLIENT_NODE_RECONNECTED event to restore them.
2020-11-17 17:22:59.473 [INFO ][sys-#56][org.apache.ignite.internal.processors.cluster.ClusterProcessor] Writing cluster ID and tag to metastorage on ready for write ClusterIdAndTag [id=b4d4e0b1-23ca-4399-bf0d-5238bb42eccb, tag=admiring_herschel]
2020-11-17 17:22:59.473 [INFO ][disco-event-worker-#52][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager] Client node reconnected to topology: TcpDiscoveryNode [id=9864d053-9100-4ab1-892b-540f19b30c32, consistentId=Example_3, addrs=ArrayList [*], sockAddrs=HashSet [*], discPort=0, order=2, intOrder=0, lastExchangeTime=1605622970162, loc=true, ver=2.8.1, isClient=true]
2020-11-17 17:22:59.474 [INFO ][disco-event-worker-#52][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager] Topology snapshot [ver=2, locNode=9864d053, servers=1, clients=1, state=ACTIVE, CPUs=8, offheap=4.0GB, heap=6.9GB]
2020-11-17 17:22:59.474 [INFO ][disco-event-worker-#52][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]   ^-- Baseline [id=0, size=2, online=1, offline=1]
2020-11-17 17:22:59.476 [INFO ][disco-notifier-worker-#50][org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor] Received state change finish message: true
2020-11-17 17:22:59.477 [ERROR][exchange-worker-#65][org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager] Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=9eb245bd-5cdc-4d13-8eac-9e7c7585c83f, cachesToClose=null, startCaches=[test-persist-partitioned-atomic-cache]]
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesRequests(CacheAffinitySharedManager.java:666) ~[ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:408) ~[ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2997) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3145) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063) [ignite-core-2.8.1.jar:2.8.1]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.8.1.jar:2.8.1]
    at java.lang.Thread.run(Thread.java:748)
2020-11-17 17:22:59.479 [INFO ][exchange-worker-#65][org.apache.ignite.internal.exchange.time] Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], crd=false, evt=NODE_JOINED, evtNode=9864d053-9100-4ab1-892b-540f19b30c32, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]
{noformat}
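The failing lookup can be modeled in isolation: after all servers restart, the discovery cache for the client's ready topology version was null in this report, and dereferencing it throws the NPE. The sketch below uses hypothetical stand-in types (not Ignite classes) to show a null-safe lookup. It does not reflect the actual fix for IGNITE-13741, and what the caller should do when the result is empty (for example, wait for the next exchange or fail the request) is left open.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model of the lookup in the report: a
 * topology version -> discovery cache map that may lack the "ready" version
 * after a client reconnect. Types are stand-ins, not Ignite classes.
 */
final class ReconnectLookup {
    static final class DiscoCacheStub {
        final String oldestAliveServer;

        DiscoCacheStub(String oldestAliveServer) {
            this.oldestAliveServer = oldestAliveServer;
        }
    }

    private final Map<Long, DiscoCacheStub> discoCaches = new ConcurrentHashMap<>();

    void put(long topVer, DiscoCacheStub cache) {
        discoCaches.put(topVer, cache);
    }

    /** Null-safe variant: returns empty instead of throwing an NPE when the cache is missing. */
    Optional<String> oldestAliveServer(long readyTopVer) {
        return Optional.ofNullable(discoCaches.get(readyTopVer))
                       .map(cache -> cache.oldestAliveServer);
    }
}
```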




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Page replacement improvement

2020-11-20 Thread Nikolay Izhikov
Hello, Alex.

Do you have a reproducer for the page memory issues you described?
What are the consequences of these types of issues?

Can we create some «benchmark» that will measure improvements of the page
replacement algorithm?
Maybe we can use the ducktape branch for it?
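One minimal way to get the kind of benchmark asked for here is a trace-driven simulation that replays page accesses against a fixed-capacity page cache and counts page faults. The sketch below assumes a synthetic trace (a hot set, one sequential scan, then the hot set again) and uses an exact LRU built on `LinkedHashMap` in access order as the baseline. It is not Ignite code; a Random-LRU or any other candidate policy would have to be plugged in for a real comparison.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical micro-benchmark sketch: replays a page-access trace against a
 * fixed-capacity exact-LRU cache and counts page faults. Not Ignite code.
 */
final class PageFaultBench {
    /** Exact LRU with O(1) access and eviction via LinkedHashMap in access order. */
    static final class LruPages extends LinkedHashMap<Long, Boolean> {
        private final int capacity;

        LruPages(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: get/put move the entry to the tail
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
            return size() > capacity; // evict the least recently used page
        }
    }

    /** Counts accesses that miss the cache (page faults). */
    static long countFaults(List<Long> trace, int capacity) {
        LruPages pages = new LruPages(capacity);
        long faults = 0;
        for (long pageId : trace) {
            if (pages.get(pageId) == null) { // miss: the page has to be read from disk
                faults++;
                pages.put(pageId, Boolean.TRUE);
            }
        }
        return faults;
    }

    /** Hot set accessed 'rounds' times, one sequential scan over other pages, then the hot set again. */
    static List<Long> hotScanHotTrace(int hotPages, int rounds, int scanPages) {
        List<Long> trace = new ArrayList<>();
        for (int r = 0; r < rounds; r++)
            for (long p = 0; p < hotPages; p++) trace.add(p);
        for (long p = 0; p < scanPages; p++) trace.add(1_000_000L + p); // one-off scan
        for (int r = 0; r < rounds; r++)
            for (long p = 0; p < hotPages; p++) trace.add(p);
        return trace;
    }

    public static void main(String[] args) {
        List<Long> trace = hotScanHotTrace(100, 10, 1_000);
        System.out.println("faults = " + countFaults(trace, 200));
    }
}
```

With capacity 200, a hot set of 100 pages and a scan of 1,000 pages, exact LRU faults 1,200 times: the scan evicts the whole hot set, so every hot page faults again afterwards. A policy that kept the hot set would fault 1,100 times, which expresses the "sequential scan resistance" requirement from the proposal as a number.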


> On 20 Nov 2020, at 12:05, Alex Plehanov wrote:
> 
> Zhenya,
> 
>> Alexey, we already have changes that partially fix this issue [1]
> IGNITE-13086 is a minor improvement. We still have major problems with
> our page replacement algorithm (slow page selection and non-optimal
> page-fault rate). I think changing from 5 random pages to 7 will make
> things even worse (it's better for the page-fault rate, but page selection
> will be slower).
> 
>> This approach is still not applicable to real life
> Why do you think batch replacement is not applicable to real life? It can
> be applied to workloads where some large amount of data is used
> periodically, but not very often. For example, when an OLAP request over
> historical data has loaded pages into page-memory, and after that request
> this data is not needed for a long time. Or when OLTP transactions mostly
> add new data and process recent data but rarely touch historical data. In
> these cases, with the current approach, we will enter "page replacement
> mode" after some period of time and never leave it. With batch page
> replacement there is a chance to prevent Random-LRU page replacement or
> postpone it.
> 
>> But I ask once more: do you really observe such problems with version 2.9?
>> Any graphs, maybe?
> I don't have production usage feedback after IGNITE-13086, but I doubt
> anything has changed significantly.
> 
> 
> On Thu, 19 Nov 2020 at 09:18, Zhenya Stanilovsky wrote:
> 
>> 
>> Alexey, we already have changes that partially fix this issue [1].
>> Easy way:
>> Looks like we already have convergence in page replacement.
>> If we change the 5-times touch iterator in the Random-LRU algorithm to,
>> for example, 7, we will get a quick improvement from scratch.
>> 
>> » Batch page replacement
>> This approach is still not applicable to real life if you want to observe
>> ugly people for a threshold interval (e.g. 12 h). And, of course, you
>> understand that drastically reducing such an interval gives nothing?
>> 
>> » Change the page replacement algorithm.
>> That's the way I vote for :) But I ask once more: do you really observe
>> such problems with version 2.9? Any graphs, maybe?
>> 
>> thanks!
>> 
>> [1] https://issues.apache.org/jira/browse/IGNITE-13086
>>> Hello, Igniters!
>>> 
>>> Currently, for page replacement (page rotation between page-memory and
>>> disk) we use the Random-LRU algorithm. It has a low maintenance cost and a
>>> relatively simple implementation, but it has many disadvantages and
>>> affects performance considerably once replacement starts. We even have
>>> warnings in the log when page replacement starts, and a special event for
>>> this. I know of Ignite deployments where administrators are forced to
>>> restart cluster nodes periodically to avoid page replacement.
>>> 
>>> I have a couple of proposals to improve page replacement in Ignite:
>>> 
>>> *Batch page replacement.*
>>> 
>>> Main idea: in some cases, start a background task to evict cold pages from
>>> page-memory (for example, pages last touched more than 12 hours ago).
>>> 
>>> The task can be started:
>>> - Automatically, triggered by some events, for example, when we expect
>>> Random-LRU page replacement to start soon (more than 90% of page-memory
>>> allocated) + we have enough cold pages (we need some metric to calculate
>>> the number of cold pages) + some time has passed since the last batch
>>> page replacement (to avoid excessive resource consumption by background
>>> batch replacement).
>>> - Manually (JMX or control.sh), if an administrator wants to control the
>>> time of batch replacement more precisely (for example, to avoid starting
>>> this task during peak time).
>>> 
>>> Batch page replacement will be helpful for some workloads (when some data
>>> is much colder than the rest): it can prevent Random-LRU page replacement
>>> from starting, or, if Random-LRU has already started, it can provide
>>> conditions to stop it.
>>> 
>>> *Change the page replacement algorithm.*
>>> 
>>> A good page replacement algorithm should satisfy these requirements:
>>> - low page-fault rates for a typical workload
>>> - low maintenance cost (low resource consumption to maintain additional
>>> structures required for page replacement)
>>> - fast search for the next page to replace
>>> - sequential scan resistance (one sequential scan should not evict all
>>> relatively hot pages from page-memory)
>>> 
>>> Our Random-LRU has low maintenance cost and is sequential scan resistant,
>>> but to find the next page for replacement we scan 5 pages in the best case,
>>> and in the worst case we can scan a whole data region segment. Also, due to
>>> its random nature, it's not very effective at predicting the right page for
>>> replacement to minimize the page-fault rate. And it

Re: [DISCUSS] Page replacement improvement

2020-11-20 Thread Alex Plehanov
Zhenya,

> Alexey, we already have changes that partially fix this issue [1]
IGNITE-13086 is a minor improvement. We still have major problems with
our page replacement algorithm (slow page selection and non-optimal
page-fault rate). I think changing from 5 random pages to 7 will make
things even worse (it's better for the page-fault rate, but page selection
will be slower).
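The trade-off between sampling 5 and 7 pages can be illustrated with a generic sampled-LRU victim selection: pick `sampleSize` random slots and evict the one touched least recently. This is a simplified sketch under stated assumptions (one timestamp per slot, every slot evictable), not Ignite's Random-LRU implementation; it ignores whatever makes the worst case scan a whole segment.

```java
import java.util.Random;

/**
 * Hypothetical sketch of sampled ("random") LRU victim selection: look at
 * 'sampleSize' random slots and pick the least recently touched one.
 * Not Ignite's actual implementation.
 */
final class SampledLru {
    private final long[] lastTouch; // last access timestamp per page slot
    private final Random rnd;

    SampledLru(int slots, long seed) {
        lastTouch = new long[slots];
        rnd = new Random(seed);
    }

    void touch(int slot, long now) {
        lastTouch[slot] = now;
    }

    /** Samples 'sampleSize' random slots and returns the one with the oldest touch time. */
    int selectVictim(int sampleSize) {
        int victim = rnd.nextInt(lastTouch.length);
        for (int i = 1; i < sampleSize; i++) {
            int candidate = rnd.nextInt(lastTouch.length);
            if (lastTouch[candidate] < lastTouch[victim])
                victim = candidate;
        }
        return victim;
    }
}
```

Each call reads `sampleSize` slots, so going from 5 to 7 makes every selection about 40% more expensive, while the chosen victim is more likely to be among the truly oldest pages.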

> This approach is still not applicable to real life
Why do you think batch replacement is not applicable to real life? It can
be applied to workloads where some large amount of data is used
periodically, but not very often. For example, when an OLAP request over
historical data has loaded pages into page-memory, and after that request
this data is not needed for a long time. Or when OLTP transactions mostly
add new data and process recent data but rarely touch historical data. In
these cases, with the current approach, we will enter "page replacement
mode" after some period of time and never leave it. With batch page
replacement there is a chance to prevent Random-LRU page replacement or
postpone it.
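Batch replacement as described here, a background pass that evicts pages untouched for a long time (for example 12 hours) so that historical data loaded by an OLAP query leaves page-memory before Random-LRU starts, could look roughly like the sketch below. The per-page timestamp map, the method names, and the per-pass cap are assumptions for illustration, not Ignite APIs.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of batch page replacement: one background pass that
 * evicts pages not touched within 'coldAfter'. Names are illustrative, not Ignite APIs.
 */
final class ColdPageSweeper {
    // pageId -> last access time in millis (a stand-in for per-page timestamps)
    private final Map<Long, Long> lastTouchMillis = new HashMap<>();

    void touch(long pageId, long nowMillis) {
        lastTouchMillis.put(pageId, nowMillis);
    }

    boolean isLoaded(long pageId) {
        return lastTouchMillis.containsKey(pageId);
    }

    /** Evicts up to maxPages pages last touched before (now - coldAfter); returns their ids. */
    List<Long> sweep(long nowMillis, Duration coldAfter, int maxPages) {
        long cutoff = nowMillis - coldAfter.toMillis();
        List<Long> evicted = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = lastTouchMillis.entrySet().iterator();

        while (it.hasNext() && evicted.size() < maxPages) {
            Map.Entry<Long, Long> e = it.next();
            if (e.getValue() < cutoff) {
                // A real page-memory would have to write dirty pages back before freeing them.
                evicted.add(e.getKey());
                it.remove();
            }
        }
        return evicted;
    }
}
```

The `maxPages` cap bounds the work of one pass; deciding when to run a pass at all is the separate triggering question discussed in this thread.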

> But I ask once more: do you really observe such problems with version 2.9?
> Any graphs, maybe?
I don't have production usage feedback after IGNITE-13086, but I doubt
anything has changed significantly.


On Thu, 19 Nov 2020 at 09:18, Zhenya Stanilovsky wrote:

>
> Alexey, we already have changes that partially fix this issue [1].
> Easy way:
> Looks like we already have convergence in page replacement.
> If we change the 5-times touch iterator in the Random-LRU algorithm to,
> for example, 7, we will get a quick improvement from scratch.
>
> » Batch page replacement
> This approach is still not applicable to real life if you want to observe
> ugly people for a threshold interval (e.g. 12 h). And, of course, you
> understand that drastically reducing such an interval gives nothing?
>
> » Change the page replacement algorithm.
> That's the way I vote for :) But I ask once more: do you really observe
> such problems with version 2.9? Any graphs, maybe?
>
> thanks!
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13086
> >Hello, Igniters!
> >
> >Currently, for page replacement (page rotation between page-memory and
> >disk) we use the Random-LRU algorithm. It has a low maintenance cost and a
> >relatively simple implementation, but it has many disadvantages and
> >affects performance considerably once replacement starts. We even have
> >warnings in the log when page replacement starts, and a special event for
> >this. I know of Ignite deployments where administrators are forced to
> >restart cluster nodes periodically to avoid page replacement.
> >
> >I have a couple of proposals to improve page replacement in Ignite:
> >
> >*Batch page replacement.*
> >
> >Main idea: in some cases, start a background task to evict cold pages from
> >page-memory (for example, pages last touched more than 12 hours ago).
> >
> >The task can be started:
> >- Automatically, triggered by some events, for example, when we expect
> >Random-LRU page replacement to start soon (more than 90% of page-memory
> >allocated) + we have enough cold pages (we need some metric to calculate
> >the number of cold pages) + some time has passed since the last batch
> >page replacement (to avoid excessive resource consumption by background
> >batch replacement).
> >- Manually (JMX or control.sh), if an administrator wants to control the
> >time of batch replacement more precisely (for example, to avoid starting
> >this task during peak time).
> >
> >Batch page replacement will be helpful for some workloads (when some data
> >is much colder than the rest): it can prevent Random-LRU page replacement
> >from starting, or, if Random-LRU has already started, it can provide
> >conditions to stop it.
> >
> >*Change the page replacement algorithm.*
> >
> >A good page replacement algorithm should satisfy these requirements:
> >- low page-fault rates for a typical workload
> >- low maintenance cost (low resource consumption to maintain additional
> >structures required for page replacement)
> >- fast search for the next page to replace
> >- sequential scan resistance (one sequential scan should not evict all
> >relatively hot pages from page-memory)
> >
> >Our Random-LRU has low maintenance cost and is sequential scan resistant,
> >but to find the next page for replacement we scan 5 pages in the best case,
> >and in the worst case we can scan a whole data region segment. Also, due to
> >its random nature, it's not very effective at predicting the right page for
> >replacement to minimize the page-fault rate. And it takes a long time to
> >totally evict old cold data.
> >
> >Usually, database management systems and operating systems use
> >modifications of LRU algorithms. These algorithms have higher maintenance
> >costs (the page list must be modified on each page access), but they are
> >often effective from a "page-fault rate" point of view and have O(1)
> >complexity for searching for a page to replace. Simple LRU is not sequential
> >scan resistant, but m