[MTCGA]: new failures in builds [5716164] need to be handled
Hi Igniters, I've detected some new issues on TeamCity to be handled. You are more than welcome to help. If your changes could have led to these failures: we are grateful that you volunteered to contribute to this project, but things change and you may no longer be able to finalize your contribution. Could you respond to this email and indicate whether you wish to continue and fix the test failures, or step down so that a committer may revert your commit.

*New test failure in master-nightly IgnitePersistentStoreDataStructuresTest.testLatchVolatility https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-2498689540135370176&branch=%3Cdefault%3E&tab=testDetails
*New test failure in master-nightly IgnitePersistentStoreDataStructuresTest.testLockVolatility https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=8536744125057342252&branch=%3Cdefault%3E&tab=testDetails
*New test failure in master-nightly IgnitePersistentStoreDataStructuresTest.testSemaphoreVolatility https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-8607160794826656046&branch=%3Cdefault%3E&tab=testDetails

Changes that may have led to the failures were made by
- zstan https://ci.ignite.apache.org/viewModification.html?modId=909509

- Here's a reminder of what contributors agreed to do https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
- Should you have any questions please contact dev@ignite.apache.org

Best Regards,
Apache Ignite TeamCity Bot
https://github.com/apache/ignite-teamcity-bot
Notification generated at 06:52:42 07-11-2020
[MTCGA]: new failures in builds [5709314] need to be handled
Hi Igniters, I've detected some new issues on TeamCity to be handled. You are more than welcome to help.

*Test with high flaky rate in master WebSessionSelfTest.testRestarts https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=6720374228021379378&branch=%3Cdefault%3E&tab=testDetails

No changes in the build.

- Here's a reminder of what contributors agreed to do https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute
- Should you have any questions please contact dev@ignite.apache.org

Best Regards,
Apache Ignite TeamCity Bot
https://github.com/apache/ignite-teamcity-bot
Notification generated at 05:22:46 07-11-2020
[jira] [Created] (IGNITE-13685) Flaky failure of FunctionalTest.testOptimitsticRepeatableReadUpdatesValue
Aleksey Plekhanov created IGNITE-13685:
--
Summary: Flaky failure of FunctionalTest.testOptimitsticRepeatableReadUpdatesValue
Key: IGNITE-13685
URL: https://issues.apache.org/jira/browse/IGNITE-13685
Project: Ignite
Issue Type: Bug
Components: thin client
Reporter: Aleksey Plekhanov
Assignee: Aleksey Plekhanov

Test FunctionalTest.testOptimitsticRepeatableReadUpdatesValue is flaky. Root cause: the {{get()}} method on {{ForkJoinTask}} can sometimes help with the execution of the task in the common {{ForkJoinPool}}, so the task is executed in the current thread, which already holds a transaction, and {{cache.get()}} returns a value relative to this transaction. See stack trace:

{noformat}
java.util.concurrent.ExecutionException: org.junit.ComparisonFailure: expected: but was:
	at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006)
	at org.apache.ignite.client.FunctionalTest.testOptimitsticRepeatableReadUpdatesValue(FunctionalTest.java:719)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.junit.ComparisonFailure: expected: but was:
	at org.junit.Assert.assertEquals(Assert.java:115)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.ignite.client.FunctionalTest.lambda$testOptimitsticRepeatableReadUpdatesValue$10(FunctionalTest.java:712)
	at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1407)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinTask.tryExternalHelp(ForkJoinTask.java:381)
	at java.base/java.util.concurrent.ForkJoinTask.externalInterruptibleAwaitDone(ForkJoinTask.java:351)
	at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1004)
	... 13 more
{noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
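The root cause described above can be shown in isolation. Below is a minimal sketch (the class and method names are illustrative, not Ignite's actual test code) of the usual remedy: run the concurrent check on a dedicated thread, so that ForkJoinTask's "helping" behavior can never pull the task into the transaction-holding caller thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DedicatedThreadCheck {
    /** Returns true when the async check ran on a thread other than the caller. */
    static boolean runsOnAnotherThread() throws Exception {
        long callerId = Thread.currentThread().getId();

        // A single-thread executor guarantees the task never runs in the
        // calling thread -- unlike ForkJoinTask.get() on the common pool,
        // which may "help" by executing the task right in the caller
        // (the flaky-test root cause above).
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            long taskId = CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getId(), exec)
                .get();

            return taskId != callerId;
        }
        finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runsOnAnotherThread()); // prints "true"
    }
}
```

With the common ForkJoinPool, the same check could intermittently observe `taskId == callerId`, which is exactly the flakiness reported.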
Re: [DISCUSS] Ignite 3.0 development approach
Here are the slides from Alexey Goncharuk. Let's think this over and continue on Monday:
https://go.gridgain.com/rs/491-TWR-806/images/Ignite_3_Plans_and_development_process.pdf

Thu, Nov 5, 2020, 11:13, Anton Vinogradov:
> Folks,
>
> Should we perform cleanup work before (r)evolutional changes?
> My huge proposal is to get rid of things which we don't need anyway
> - local caches,
> - strange tx modes,
> - code overcomplexity because of the RollingUpgrade feature that never arrived in AI,
> - etc,
> before choosing the way.
>
> On Tue, Nov 3, 2020 at 3:31 PM Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>> Ksenia, thanks for scheduling this on such short notice!
>>
>> As for the original topic, I do support Alexey's idea. We're not going to
>> rewrite anything from scratch, as most of the components are going to be
>> moved as-is or with minimal modifications. However, the changes that are
>> proposed imply serious rework of the core parts of the code, which are not
>> properly decoupled from each other and from other parts. This makes the
>> incremental approach borderline impossible. Developing in a new repo,
>> however, addresses this concern. As a bonus, we can also refactor the code,
>> introduce better decoupling, get rid of kernel context, and develop unit
>> tests (finally!).
>>
>> Basically, this proposal only affects the *process*, not the set of changes
>> we had discussed before. Ignite 3.0 is our unique chance to make things right.
>>
>> -Val
>>
>> On Tue, Nov 3, 2020 at 3:06 AM Kseniya Romanova <romanova.ks@gmail.com> wrote:
>>> Pavel, all the interesting points will be published here in English anyway
>>> (as the principle "if it's not on the devlist, it didn't happen" is still
>>> relevant). This is just a quick call for a group of developers. Later we
>>> can do a separate presentation of the idea and a discussion in English, as
>>> we did for the Ignite 3.0 draft of changes.
>>> Tue, Nov 3, 2020, 13:52, Pavel Tupitsyn:
>>>> Kseniya,
>>>>
>>>> Thanks for scheduling this call.
>>>> Do you think we can switch to English if non-Russian speaking community
>>>> members decide to join?
>>>>
>>>> On Tue, Nov 3, 2020 at 1:32 PM Kseniya Romanova <romanova.ks@gmail.com> wrote:
>>>>> Let's make this community discussion open. Here's the link to the Zoom
>>>>> call in Russian for Friday 6 PM:
>>>>> https://www.meetup.com/Moscow-Apache-Ignite-Meetup/events/274360378/
>>>>>
>>>>> Tue, Nov 3, 2020, 12:49, Nikolay Izhikov:
>>>>>> Time works for me.
>>>>>>
>>>>>> Nov 3, 2020, 12:40, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>>>>>>> Nikolay,
>>>>>>>
>>>>>>> I am up for the call. I will try to explain my reasoning in greater
>>>>>>> detail and will be glad to hear the concerns. Will this Friday, Nov
>>>>>>> 6th, work?
>>>>>>>
>>>>>>> Tue, Nov 3, 2020, 10:09, Nikolay Izhikov <nizhi...@apache.org>:
>>>>>>>> Igniters, should we have a call for this topic?
>>>>>>>>
>>>>>>>> Nov 2, 2020, 18:53, Pavel Tupitsyn <ptupit...@apache.org> wrote:
>>>>>>>>> > not intend to rewrite everything from scratch
>>>>>>>>> > Every single test from Ignite 2.x should be moved to Ignite 3
>>>>>>>>> > regardless of how we choose to proceed.
>>>>>>>>>
>>>>>>>>> Alexey, thank you for the explanation, this addresses all of my
>>>>>>>>> concerns.
>>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2020 at 6:43 PM Andrey Mashenkov <andrey.mashen...@gmail.com> wrote:
>>>>>>>>>> Hi, Igniters.
>>>>>>>>>>
>>>>>>>>>> * AFAIU, we need a new repo if we want to apply different
>>>>>>>>>> restrictions to pull requests; otherwise I see no difference for
>>>>>>>>>> myself. E.g. make static analysis (do we have it?), compile, style,
>>>>>>>>>> and javadoc checks mandatory.
>>>>>>>>>>
>>>>>>>>>> I think that relaxed requirements here will lead to bad product
>>>>>>>>>> quality.
>>>>>>>>>>
>>>>>>>>>> * Agree with Pavel, we should 'keep' integration tests somehow.
>>>>>>>>>> During active development tests will be broken most of the time, so
>>>>>>>>>> I'd port them, e.g. suite-by-suite, once we have a stable and
>>>>>>>>>> featured environment to run them, and of course make the tests'
>>>>>>>>>> code clear and avoid bad/non-relevant ones.
>>>>>>>>>>
>>>>>>>>>> * I like the bottom-up approac
[DISCUSSION] Apache Ignite Release 2.10 (time, scope, manager)
Igniters, let's finalize the discussion [1] about the next upcoming major Apache Ignite 2.10 release. The major improvements related to the proposed release:
- Improvements for partition-clearing related parts
- Add tracing of SQL queries
- CPP: Implement Cluster API
- .NET: Thin client: Transactions
- .NET: Thin client: Continuous Query
- Java thin client: Kubernetes discovery
etc. Total: 166 RESOLVED issues [2].

Let's start the discussion about time and scope. I also propose myself as the release manager of Apache Ignite 2.10. If you'd like to lead this release, please let us know; I see no problem with choosing a better candidate.

Proposed release timeline:
Scope Freeze: December 10, 2020
Code Freeze: December 24, 2020
Voting Date: January 18, 2021
Release Date: January 25, 2021

Proposed release scope: [2]

WDYT?

[1] http://apache-ignite-developers.2346864.n4.nabble.com/2-9-1-release-proposal-tp49769p49867.html
[2] https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.10%20and%20status%20in%20(Resolved%2C%20Closed)%20and%20resolution%20%3D%20Fixed%20order%20by%20priority%20
[jira] [Created] (IGNITE-13684) Rewrite PageIo resolver from static to explicit dependency
Anton Kalashnikov created IGNITE-13684:
--
Summary: Rewrite PageIo resolver from static to explicit dependency
Key: IGNITE-13684
URL: https://issues.apache.org/jira/browse/IGNITE-13684
Project: Ignite
Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Ivan Bessonov

Right now, Ignite has a static PageIo resolver, which does not allow substituting a different implementation when needed. The current implementation should be rewritten to make the resolver an explicit dependency.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (IGNITE-13683) Added MVCC validation to ValidateIndexesClosure
Anton Kalashnikov created IGNITE-13683:
--
Summary: Added MVCC validation to ValidateIndexesClosure
Key: IGNITE-13683
URL: https://issues.apache.org/jira/browse/IGNITE-13683
Project: Ignite
Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Semyon Danilov

MVCC index validation should be added to ValidateIndexesClosure.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (IGNITE-13682) Added generic to maintenance mode feature
Anton Kalashnikov created IGNITE-13682:
--
Summary: Added generic to maintenance mode feature
Key: IGNITE-13682
URL: https://issues.apache.org/jira/browse/IGNITE-13682
Project: Ignite
Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

MaintenanceAction has no generic type parameter right now, which leads to unparameterized (raw-type) usage problems.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (IGNITE-13681) Non markers checkpoint implementation
Anton Kalashnikov created IGNITE-13681:
--
Summary: Non markers checkpoint implementation
Key: IGNITE-13681
URL: https://issues.apache.org/jira/browse/IGNITE-13681
Project: Ignite
Issue Type: Sub-task
Reporter: Anton Kalashnikov
Assignee: Anton Kalashnikov

A new version of the checkpoint needs to be implemented that will be simpler than the current one. The main differences compared to the current checkpoint:
* It doesn't contain any write operation to the WAL.
* It doesn't create checkpoint markers.
* It should be possible to configure a checkpoint listener on an exact data region only.

This checkpoint will be helpful for defragmentation and for recovery (it is not possible to use the current checkpoint during recovery right now).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Why WAL archives enabled by default?
Alex, thanks for pointing that out. Shame that I missed it.

Fri, Nov 6, 2020, 13:45, Alex Plehanov:
> Guys,
>
> We already have FileWriteAheadLogManager#maxSegCountWithoutCheckpoint.
> A checkpoint is triggered if there are too many WAL segments without a
> checkpoint. Looks like you are talking about this feature.
>
> Fri, Nov 6, 2020, 13:21, Ivan Daschinsky:
>> Kirill and I privately discussed the proposed approach. As far as I
>> understand, Kirill suggests implementing some heuristic to force a
>> checkpoint in some cases, if the user misconfigured the cluster by
>> mistake, in order to preserve the requested size of the WAL archive.
>> Currently, as for me, this approach is questionable, because it can cause
>> some performance problems. But as an option, it can be used, and it
>> should be switchable.
>>
>> Fri, Nov 6, 2020, 12:36, Ivan Daschinsky:
>>> Kirill, how will your approach help if the user tuned a cluster to do
>>> checkpoints rarely under load?
>>> No way.
>>>
>>> Fri, Nov 6, 2020, 12:19, ткаленко кирилл:
>>>> Ivan, I agree with you that the archive is primarily about optimization.
>>>>
>>>> If the size of the archive is critical for the user, we have no
>>>> protection against this; we can always go beyond this limit.
>>>> Thus, the user needs to remember this and configure it in some way.
>>>>
>>>> I suggest not exceeding this limit and giving the expected behavior to
>>>> the user. At the same time, the segments needed for recovery will
>>>> remain, and there will be no data loss.
>>>>
>>>> 06.11.2020, 11:29, "Ivan Daschinsky":
>>>>> Guys, first of all, archiving is not for PITR at all; this is an
>>>>> optimization.
>>>>> If we disable archiving, on every rollover we need to create a new
>>>>> file. If we enable archiving, we reserve 10 (by default) segments
>>>>> filled with zeroes.
>>>>> We use mmap by default, so if we use the no-archiver approach:
>>>>> 1. We first create a new empty file.
>>>>> 2. We call sun.nio.ch.FileChannelImpl#map on it; under the hood:
>>>>>    a. If the file is shorter than the WAL segment size, it calls
>>>>>       sun.nio.ch.FileDispatcherImpl#truncate0, which is just the
>>>>>       truncate system call [1].
>>>>>    b. Then it calls the mmap system call on this file via
>>>>>       sun.nio.ch.FileChannelImpl#map0; for the details, see [2].
>>>>> These manipulations are neither free nor cheap, so rollover will be
>>>>> much, much slower.
>>>>> If archiving is enabled, 10 segments are already preallocated at the
>>>>> moment of the node's start.
>>>>>
>>>>> When archiving is enabled, the archiver just copies the previous
>>>>> preallocated segment and moves it to the archive directory.
>>>>> This archived segment is crucial for recovery. When a new checkpoint
>>>>> finishes, all segments eligible for truncation are simply removed.
>>>>>
>>>>> If archiving is disabled, we also write WAL segments to the wal
>>>>> directory, and disabling archiving doesn't prevent you from storing
>>>>> segments if they are required for recovery.
>>>>>
>>>>>> Before increasing the size of the WAL archive (transferring to
>>>>>> archive /rollOver, compression, decompression), we can make sure that
>>>>>> there will be enough space in the archive, and if there is none, then
>>>>>> we will try to clean it. We cannot delete those segments that are
>>>>>> required for recovery (between the last two checkpoints) and
>>>>>> reserved, for example, for historical rebalancing.
>>>>>
>>>>> First of all, compression/decompression is off-topic here.
>>>>> Secondly, WAL segments are required only with idx higher than the LAST
>>>>> checkpoint marker.
>>>>> Thirdly, archiving and rolling over can happen during a checkpoint,
>>>>> and we can break everything accidentally.
>>>>> Fourthly, I see no benefit in overcomplicating already complicated
>>>>> logic. This is basically a problem of misunderstanding and tuning.
>>>>> There are a lot of similar topics for almost every DB. [3]
>>>>>
>>>>> [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
>>>>> [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
>>>>> [3] -- https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device&oq=pg+wal+no
>>>>>
>>>>> Fri, Nov 6, 2020, 10:42, ткаленко кирилл:
>>>>>> Hi, Ivan!
>>>>>>
>>>>>> I have only described ideas. But here are a few more details.
>>>>>>
>>>>>> We can take care not to go beyond
>>>>>> DataStorageConfiguration#maxWalArchiveSize.
>>>>>>
>>>>>> Before increasing the size of the WAL archive (transferring to
>>>>>> archive /rollOver, compression, decompression), we can make sure that
>>>>>> there will be enough space in the archive and if there is no such,
>>>>>> then we will try to cl
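Ivan's points 1–2 about the no-archiver rollover cost follow directly from the JDK API. A minimal sketch (the file name and the 64 KiB demo size are illustrative; real WAL segments are 64 MB by default): requesting a READ_WRITE mapping larger than the file forces FileChannel.map to first grow (truncate up) the file and then mmap it — exactly the extra syscalls described above, which a preallocated segment avoids.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SegmentMapDemo {
    /** Maps a freshly created empty "segment" file and returns its resulting size. */
    static long mapNewSegment(long segmentSize) throws IOException {
        // An empty file, like a brand-new WAL segment in the no-archiver setup.
        Path seg = Files.createTempFile("wal-segment", ".wal");
        try (FileChannel ch = FileChannel.open(seg,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // map() must first extend the file to segmentSize (ftruncate)
            // and then mmap it -- the two syscalls Ivan refers to.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, segmentSize);

            buf.putLong(0, 42L); // writes go straight to the mapped region

            return ch.size();    // the file has grown to the full segment size
        }
        finally {
            Files.deleteIfExists(seg);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mapNewSegment(64 * 1024)); // prints "65536"
    }
}
```

With archiving enabled, the segment file already exists at full size, so rollover skips the truncate-and-map work and simply reuses the preallocated region.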
Re: Why WAL archives enabled by default?
Kirill and I discussed privately proposed approach. As far as I understand, Kirill suggests to implement some heuristic to do a force checkpoint in some cases if user by mistake misconfigured cluster in order to preserve requested size of WAL archive. Currently, as for me, this approach is questionable, because it can cause some performance problems. But as an option, it can be used and should be switchable. пт, 6 нояб. 2020 г. в 12:36, Ivan Daschinsky : > Kirill, how your approach will help if user tuned a cluster to do > checkpoints rarely under load? > No way. > > пт, 6 нояб. 2020 г. в 12:19, ткаленко кирилл : > >> Ivan, I agree with you that the archive is primarily about optimization. >> >> If the size of the archive is critical for the user, we have no >> protection against this, we can always go beyond this limit. >> Thus, the user needs to remember this and configure it in some way. >> >> I suggest not to exceed this limit and give the expected behavior for the >> user. At the same time, the segments needed for recovery will remain and >> there will be no data loss. >> >> 06.11.2020, 11:29, "Ivan Daschinsky" : >> > Guys, fisrt of all, archiving is not for PITR at all, this is >> optimization. >> > If we disable archiving, every rollover we need to create new file. If >> we >> > enable archiving, we reserve 10 (by default) segments filled with >> zeroes. >> > We use mmap by default, so if we use no-archiver approach: >> > 1. We firstly create new empty file >> > 2. Call on it sun.nio.ch.FileChannelImpl#map, thats under the hood >> > a. If file is shorter, than wal segment size, it >> > calls sun.nio.ch.FileDispatcherImpl#truncate0, this is under the hood >> just >> > a system call truncate [1] >> > b. Than it calls system call mmap on this >> > file sun.nio.ch.FileChannelImpl#map0, under the hood see [2] >> > These manipulation are not free and cheap. So rollover will be much much >> > slower. 
>> > If archiving is enabled, 10 segments are already preallocated at the >> moment >> > of node's start. >> > >> > When archiving is enabled, archiver just copy previous preallocated >> segment >> > and move it to archive directory. >> > This archived segment is crucial for recovery. When new checkpoints >> > finished, all eligible for trunocating segments are just removed. >> > >> > If archiving is disabled, we also write WAL segments in wal directory >> and >> > disabling archiving don't prevent you from storing segments, if they are >> > required for recovery. >> > >> >>> Before increasing the size of WAL archive (transferring to archive >> > >> > /rollOver, compression, decompression), we can make sure that there >> will be >> > enough space in the archive and if there is no such, then we will try to >> >>> clean it. We cannot delete those segments that are required for >> recovery >> > >> > (between the last two checkpoints) and reserved for example for >> historical >> > rebalancing. >> > First of all, compression/decompression is offtopic here. >> > Secondly, wal segments are required only with idx higher than LAST >> > checkpoint marker. >> > Thirdly, archiving and rolling over can be during checkpoint and we can >> > broke everything accidentially. >> > Fourthly, I see no benefits to overcomplicated already complicated >> logic. >> > This is basically problem of misunderstanding and tuning. >> > There are a lot of similar topics for almost every DB. [3] >> > >> > [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html >> > [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html >> > [3] -- >> > >> https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device&oq=pg+wal+no >> > >> > пт, 6 нояб. 2020 г. в 10:42, ткаленко кирилл : >> > >> >> Hi, Ivan! >> >> >> >> I have only described ideas. But here are a few more details. >> >> >> >> We can take care not to go beyond >> >> DataStorageConfiguration#maxWalArchiveSize. 
>> >> >> >> Before increasing the size of WAL archive (transferring to archive >> >> /rollOver, compression, decompression), we can make sure that there >> will be >> >> enough space in the archive and if there is no such, then we will try >> to >> >> clean it. We cannot delete those segments that are required for >> recovery >> >> (between the last two checkpoints) and reserved for example for >> historical >> >> rebalancing. >> >> >> >> We can receive a notification about the change of checkpoints and the >> >> reservation / release of segments, thus we can know how many segments >> we >> >> can delete right now. >> >> >> >> 06.11.2020, 09:53, "Ivan Daschinsky" : >> >> >>> For example, when trying to move a segment to the archive. >> >> > >> >> > We cannot do this, we will lost data. We can truncate archived >> segment if >> >> > and only if it is not required for recovery. If last checkpoint >> marker >> >> > points to segment >> >> > with lower index, we cannot delete any segment with higher index. >> So the >> >> > only moment where we can remo
Re: [DISCUSS] Disable socket linger by default in TCP discovery SPI.
The tickets are: [1] disables linger by default and [2] is the doc.

[1] https://issues.apache.org/jira/browse/IGNITE-13643
[2] https://issues.apache.org/jira/browse/IGNITE-13662

05.11.2020 11:00, Anton Vinogradov wrote:
> Folks,
>
> It seems we've got an agreement that the fix is necessary.
> Do we need to do anything except the following?
> - zero linger as default
> - warning on SSL enabled on JVM before the fix
> - warning in the documentation
> - migration notes
>
> On Tue, Nov 3, 2020 at 2:38 PM Steshin Vladimir wrote:
>> Ilya, hi.
>>
>> Of course: the /TcpDiscoverySpi.setSoLinger(int)/ property. It has always
>> been there.
>>
>> 02.11.2020 20:14, Ilya Kasnacheev wrote:
>>> Hello!
>>>
>>> Is there any option to re-enable linger on SSL sockets?
>>> Telling people to re-configure does not help if they can't.
>>>
>>> Regards,
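For reference, the linger semantics discussed above map onto the standard SO_LINGER socket option. A minimal sketch with a plain java.net.Socket (TcpDiscoverySpi.setSoLinger applies the equivalent option to discovery sockets; the exact behavior below is the JDK's, not Ignite-specific):

```java
import java.net.Socket;

public class LingerDemo {
    public static void main(String[] args) throws Exception {
        Socket sock = new Socket();

        // Linger 0 (the proposed default): close() aborts the connection
        // immediately instead of blocking while unsent data is flushed --
        // the blocking close is what can hang on SSL sockets.
        sock.setSoLinger(true, 0);
        System.out.println(sock.getSoLinger()); // prints "0"

        // With linger disabled, close() returns at once and the OS finishes
        // the graceful shutdown in the background; getSoLinger() reports -1.
        sock.setSoLinger(false, 0);
        System.out.println(sock.getSoLinger()); // prints "-1"

        sock.close();
    }
}
```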
Re: delete is too slow, sometimes even causes OOM
Hi Frank! There is an old ticket [1]. We will try to prioritize it and finish it before the end of the year; it should prevent OOM in most cases.

[1] https://issues.apache.org/jira/browse/IGNITE-9182

Tue, Nov 3, 2020, 18:53, frank li:
> The current code logic for DELETE is as follows:
>
> if the WHERE clause contains a condition like "key=xxx", use fastUpdate,
>     which removes the related item directly;
> else
>     do select for update;
>     for each row, call the closure code "RMV" to remove it.
>
> 1. As "executeSelectForDml" gets the _KEY and _VAL columns for all
> candidate rows, it often causes OOM when there is a lot of data to delete.
> Why do we verify "val" during the remove operation?
>
> 2. After selection, why don't we just remove it with cache.remove, as
> fastUpdate does?

--
Live with a smile! :D
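Frank's OOM concern comes from materializing _KEY and _VAL for every candidate row at once. A minimal sketch of the batching workaround (a plain HashMap stands in for the cache; the names and the paging scheme are illustrative, not Ignite's DML code): collect only the matching keys, then remove them page by page through the key-based API, as the fastUpdate path does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedDelete {
    /** Removes all entries whose value equals {@code match}, pageSize keys at a time. */
    static <K, V> int deleteWhere(Map<K, V> cache, V match, int pageSize) {
        // "SELECT _KEY WHERE val = ?" -- collect only the keys, not full
        // (_KEY, _VAL) rows, so far less data is held in memory.
        List<K> keys = new ArrayList<>();
        for (Map.Entry<K, V> e : cache.entrySet())
            if (e.getValue().equals(match))
                keys.add(e.getKey());

        int removed = 0;

        for (int off = 0; off < keys.size(); off += pageSize)
            for (K k : keys.subList(off, Math.min(off + pageSize, keys.size()))) {
                cache.remove(k); // key-based removal, like the fastUpdate path
                removed++;
            }

        return removed;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();
        for (int i = 0; i < 10; i++)
            cache.put(i, i % 2 == 0 ? "even" : "odd");

        System.out.println(deleteWhere(cache, "even", 3)); // prints "5"
        System.out.println(cache.size());                  // prints "5"
    }
}
```

In a real distributed cache the batched key-based removals also avoid the single huge select-for-update result set, at the cost of weaker atomicity across batches.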
Re: Why WAL archives enabled by default?
Kirill, how will your approach help if a user has tuned the cluster to run checkpoints rarely under load? No way.

Fri, Nov 6, 2020 at 12:19, ткаленко кирилл:

> Ivan, I agree with you that the archive is primarily about optimization.
>
> If the size of the archive is critical for the user, we have no protection
> against this; we can always go beyond the limit. Thus, the user needs to
> remember this and configure it in some way.
>
> I suggest not exceeding this limit and giving the user the expected
> behavior. At the same time, the segments needed for recovery will remain,
> and there will be no data loss.
>
> 06.11.2020, 11:29, "Ivan Daschinsky":
>> Guys, first of all, archiving is not for PITR at all; it is an
>> optimization. If we disable archiving, we need to create a new file on
>> every rollover. If we enable archiving, we reserve 10 (by default)
>> segments filled with zeroes. We use mmap by default, so with the
>> no-archiver approach:
>> 1. We first create a new empty file.
>> 2. We call sun.nio.ch.FileChannelImpl#map on it, which under the hood:
>>    a. If the file is shorter than the WAL segment size, calls
>>       sun.nio.ch.FileDispatcherImpl#truncate0, which is just the
>>       truncate system call [1].
>>    b. Then calls the mmap system call on the file via
>>       sun.nio.ch.FileChannelImpl#map0; see [2].
>> These manipulations are neither free nor cheap, so rollover will be much
>> slower. If archiving is enabled, 10 segments are already preallocated at
>> node start.
>>
>> When archiving is enabled, the archiver simply copies the previous,
>> already filled segment to the archive directory. This archived segment
>> is crucial for recovery. When a new checkpoint finishes, all segments
>> eligible for truncation are simply removed.
>>
>> If archiving is disabled, we still write WAL segments to the wal
>> directory, and disabling archiving does not prevent segments from being
>> kept if they are required for recovery.
>>
>>> Before increasing the size of the WAL archive (transferring to the
>>> archive on rollover, compression, decompression), we can make sure
>>> that there will be enough space in the archive and, if there is not,
>>> try to clean it. We cannot delete segments that are required for
>>> recovery (between the last two checkpoints) or reserved, for example,
>>> for historical rebalancing.
>>
>> First of all, compression/decompression is off topic here. Secondly,
>> WAL segments are required only with an index higher than the LAST
>> checkpoint marker. Thirdly, archiving and rollover can happen during a
>> checkpoint, and we could accidentally break everything. Fourthly, I see
>> no benefit in overcomplicating already complicated logic. This is
>> basically a problem of misunderstanding and tuning. There are similar
>> topics for almost every DB. [3]
>>
>> [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
>> [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
>> [3] -- https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device&oq=pg+wal+no
>>
>> Fri, Nov 6, 2020 at 10:42, ткаленко кирилл:
>>> Hi, Ivan!
>>>
>>> I have only described ideas. But here are a few more details.
>>>
>>> We can take care not to go beyond
>>> DataStorageConfiguration#maxWalArchiveSize.
>>>
>>> Before increasing the size of the WAL archive (transferring to the
>>> archive on rollover, compression, decompression), we can make sure
>>> that there will be enough space in the archive and, if there is not,
>>> try to clean it. We cannot delete segments that are required for
>>> recovery (between the last two checkpoints) or reserved, for example,
>>> for historical rebalancing.
>>>
>>> We can receive notifications about checkpoint changes and the
>>> reservation/release of segments; thus we can know how many segments
>>> we can delete right now.
>>>
>>> 06.11.2020, 09:53, "Ivan Daschinsky":
>>>>> For example, when trying to move a segment to the archive.
>>>>
>>>> We cannot do this; we would lose data. We can truncate an archived
>>>> segment if and only if it is not required for recovery. If the last
>>>> checkpoint marker points to a segment with a lower index, we cannot
>>>> delete any segment with a higher index. So the only moment when we
>>>> can truncate segments is at the finish of a checkpoint.
>>>>
>>>> Fri, Nov 6, 2020 at 09:46, ткаленко кирилл:
>>>>> Hello, everybody!
>>>>>
>>>>> As far as I know, the WAL archive is used for PITR (a GridGain
>>>>> feature) and historical rebalancing.
>>>>>
>>>>> Facundo seems to have a problem with running out of space in the
>>>>> (/opt/work/walarchive) directory. Currently, the WAL archive is
>>>>> cleared at the end of a checkpoint. A potentially long transaction
>>>>> may prevent a checkpoint from starting, thereby not cleaning the
>>>>> WAL archive, which will lead to such an error.
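The truncation rule restated in the quoted exchange above (a segment may be removed only once its index is below the last checkpoint marker and no reservation, e.g. for historical rebalancing, holds it) can be sketched as a small policy class. This is an illustration, not Ignite code; all names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch of the truncation rule from the thread: an archived WAL segment is
// safe to delete only if its index is below the last checkpoint marker and
// it is not reserved (e.g. for historical rebalancing). Names are made up.
class WalTruncationPolicy {
    private volatile long lastCheckpointSegmentIdx;
    private final Set<Long> reservedSegments = new ConcurrentSkipListSet<>();

    /** Called at checkpoint finish -- the only safe moment for cleanup. */
    void onCheckpointFinished(long segmentIdx) {
        lastCheckpointSegmentIdx = segmentIdx;
    }

    void reserve(long idx) { reservedSegments.add(idx); }

    void release(long idx) { reservedSegments.remove(idx); }

    /** True if archived segment {@code idx} may be deleted. */
    boolean canTruncate(long idx) {
        return idx < lastCheckpointSegmentIdx && !reservedSegments.contains(idx);
    }
}
```

With this shape, cleanup is naturally driven from the checkpoint-finish callback, which matches Ivan's point that checkpoint finish is the only moment where removing segments is safe.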
Re: Why WAL archives enabled by default?
Ivan, I agree with you that the archive is primarily about optimization.

If the size of the archive is critical for the user, we have no protection against this; we can always go beyond the limit. Thus, the user needs to remember this and configure it in some way.

I suggest not exceeding this limit and giving the user the expected behavior. At the same time, the segments needed for recovery will remain, and there will be no data loss.

06.11.2020, 11:29, "Ivan Daschinsky":
> Guys, first of all, archiving is not for PITR at all; it is an
> optimization. If we disable archiving, we need to create a new file on
> every rollover. If we enable archiving, we reserve 10 (by default)
> segments filled with zeroes. We use mmap by default, so with the
> no-archiver approach:
> 1. We first create a new empty file.
> 2. We call sun.nio.ch.FileChannelImpl#map on it, which under the hood:
>    a. If the file is shorter than the WAL segment size, calls
>       sun.nio.ch.FileDispatcherImpl#truncate0, which is just the
>       truncate system call [1].
>    b. Then calls the mmap system call on the file via
>       sun.nio.ch.FileChannelImpl#map0; see [2].
> These manipulations are neither free nor cheap, so rollover will be much
> slower. If archiving is enabled, 10 segments are already preallocated at
> node start.
>
> When archiving is enabled, the archiver simply copies the previous,
> already filled segment to the archive directory. This archived segment
> is crucial for recovery. When a new checkpoint finishes, all segments
> eligible for truncation are simply removed.
>
> If archiving is disabled, we still write WAL segments to the wal
> directory, and disabling archiving does not prevent segments from being
> kept if they are required for recovery.
>
>> Before increasing the size of the WAL archive (transferring to the
>> archive on rollover, compression, decompression), we can make sure that
>> there will be enough space in the archive and, if there is not, try to
>> clean it. We cannot delete segments that are required for recovery
>> (between the last two checkpoints) or reserved, for example, for
>> historical rebalancing.
>
> First of all, compression/decompression is off topic here. Secondly, WAL
> segments are required only with an index higher than the LAST checkpoint
> marker. Thirdly, archiving and rollover can happen during a checkpoint,
> and we could accidentally break everything. Fourthly, I see no benefit in
> overcomplicating already complicated logic. This is basically a problem
> of misunderstanding and tuning. There are similar topics for almost every
> DB. [3]
>
> [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
> [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
> [3] -- https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device&oq=pg+wal+no
>
> Fri, Nov 6, 2020 at 10:42, ткаленко кирилл:
>> Hi, Ivan!
>>
>> I have only described ideas. But here are a few more details.
>>
>> We can take care not to go beyond
>> DataStorageConfiguration#maxWalArchiveSize.
>>
>> Before increasing the size of the WAL archive (transferring to the
>> archive on rollover, compression, decompression), we can make sure that
>> there will be enough space in the archive and, if there is not, try to
>> clean it. We cannot delete segments that are required for recovery
>> (between the last two checkpoints) or reserved, for example, for
>> historical rebalancing.
>>
>> We can receive notifications about checkpoint changes and the
>> reservation/release of segments; thus we can know how many segments we
>> can delete right now.
>>
>> 06.11.2020, 09:53, "Ivan Daschinsky":
>>>> For example, when trying to move a segment to the archive.
>>>
>>> We cannot do this; we would lose data. We can truncate an archived
>>> segment if and only if it is not required for recovery. If the last
>>> checkpoint marker points to a segment with a lower index, we cannot
>>> delete any segment with a higher index. So the only moment when we can
>>> truncate segments is at the finish of a checkpoint.
>>>
>>> Fri, Nov 6, 2020 at 09:46, ткаленко кирилл:
>>>> Hello, everybody!
>>>>
>>>> As far as I know, the WAL archive is used for PITR (a GridGain
>>>> feature) and historical rebalancing.
>>>>
>>>> Facundo seems to have a problem with running out of space in the
>>>> (/opt/work/walarchive) directory. Currently, the WAL archive is
>>>> cleared at the end of a checkpoint. A potentially long transaction
>>>> may prevent a checkpoint from starting, thereby not cleaning the WAL
>>>> archive, which will lead to such an error.
>>>>
>>>> At the moment, the workaround I see is to increase the size of the
>>>> (/opt/work/walarchive) directory in k8s and avoid long transactions,
>>>> or anything similar that modifies data and runs for a long time.
>>>>
>>>> It would be best to fix the logic of working with the WAL archive. I
>>>> think we should remove WAL archive cleanup from the end of the
>>>> checkpoint and do it on demand. For example, when trying to move a
>>>> segment to the archive.
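Kirill's proposal in the message above (before growing the archive, check that it will fit under the size limit and clean whatever is safe to clean) could look roughly like the sketch below. Only DataStorageConfiguration#maxWalArchiveSize is a real Ignite setting; every other name is hypothetical, and the recovery/reservation bookkeeping is simplified to single indexes.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of "ensure space before archiving": evict the oldest archived
// segments that are neither needed for recovery (index >= last checkpoint)
// nor reserved (e.g. for historical rebalancing). Illustrative only.
class WalArchiveBudget {
    private final long maxWalArchiveSize;          // mirrors DataStorageConfiguration#maxWalArchiveSize
    private final NavigableMap<Long, Long> segments = new TreeMap<>(); // idx -> size in bytes
    private long curSize;
    private long lastCheckpointIdx;
    private long reservedFromIdx = Long.MAX_VALUE; // lowest reserved segment index

    WalArchiveBudget(long maxWalArchiveSize) { this.maxWalArchiveSize = maxWalArchiveSize; }

    void onCheckpointFinished(long idx) { lastCheckpointIdx = idx; }

    void reserveFrom(long idx) { reservedFromIdx = idx; }

    /** Delete old segments until {@code segSize} more bytes fit; false if impossible. */
    boolean ensureFreeSpace(long segSize) {
        while (curSize + segSize > maxWalArchiveSize && !segments.isEmpty()) {
            long oldest = segments.firstKey();
            if (oldest >= lastCheckpointIdx || oldest >= reservedFromIdx)
                return false;                      // nothing more is safe to delete
            curSize -= segments.remove(oldest);    // a real impl would unlink the file here
        }
        return curSize + segSize <= maxWalArchiveSize;
    }

    void archive(long idx, long segSize) {
        if (!ensureFreeSpace(segSize))
            throw new IllegalStateException("WAL archive full and cannot be cleaned safely");
        segments.put(idx, segSize);
        curSize += segSize;
    }
}
```

Note that Ivan's objection applies directly to this sketch: if checkpoints are rare, lastCheckpointIdx lags behind, nothing old enough is safe to evict, and ensureFreeSpace simply fails rather than freeing space.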
Re: Why WAL archives enabled by default?
Guys, first of all, archiving is not for PITR at all; it is an optimization. If we disable archiving, we need to create a new file on every rollover. If we enable archiving, we reserve 10 (by default) segments filled with zeroes. We use mmap by default, so with the no-archiver approach:

1. We first create a new empty file.
2. We call sun.nio.ch.FileChannelImpl#map on it, which under the hood:
   a. If the file is shorter than the WAL segment size, calls
      sun.nio.ch.FileDispatcherImpl#truncate0, which is just the truncate
      system call [1].
   b. Then calls the mmap system call on the file via
      sun.nio.ch.FileChannelImpl#map0; see [2].

These manipulations are neither free nor cheap, so rollover will be much slower. If archiving is enabled, 10 segments are already preallocated at node start.

When archiving is enabled, the archiver simply copies the previous, already filled segment to the archive directory. This archived segment is crucial for recovery. When a new checkpoint finishes, all segments eligible for truncation are simply removed.

If archiving is disabled, we still write WAL segments to the wal directory, and disabling archiving does not prevent segments from being kept if they are required for recovery.

> Before increasing the size of the WAL archive (transferring to the
> archive on rollover, compression, decompression), we can make sure that
> there will be enough space in the archive and, if there is not, try to
> clean it. We cannot delete segments that are required for recovery
> (between the last two checkpoints) or reserved, for example, for
> historical rebalancing.

First of all, compression/decompression is off topic here. Secondly, WAL segments are required only with an index higher than the LAST checkpoint marker. Thirdly, archiving and rollover can happen during a checkpoint, and we could accidentally break everything. Fourthly, I see no benefit in overcomplicating already complicated logic. This is basically a problem of misunderstanding and tuning. There are similar topics for almost every DB. [3]

[1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
[2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
[3] -- https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device&oq=pg+wal+no

Fri, Nov 6, 2020 at 10:42, ткаленко кирилл:

> Hi, Ivan!
>
> I have only described ideas. But here are a few more details.
>
> We can take care not to go beyond
> DataStorageConfiguration#maxWalArchiveSize.
>
> Before increasing the size of the WAL archive (transferring to the
> archive on rollover, compression, decompression), we can make sure that
> there will be enough space in the archive and, if there is not, try to
> clean it. We cannot delete segments that are required for recovery
> (between the last two checkpoints) or reserved, for example, for
> historical rebalancing.
>
> We can receive notifications about checkpoint changes and the
> reservation/release of segments; thus we can know how many segments we
> can delete right now.
>
> 06.11.2020, 09:53, "Ivan Daschinsky":
>>> For example, when trying to move a segment to the archive.
>>
>> We cannot do this; we would lose data. We can truncate an archived
>> segment if and only if it is not required for recovery. If the last
>> checkpoint marker points to a segment with a lower index, we cannot
>> delete any segment with a higher index. So the only moment when we can
>> truncate segments is at the finish of a checkpoint.
>>
>> Fri, Nov 6, 2020 at 09:46, ткаленко кирилл:
>>> Hello, everybody!
>>>
>>> As far as I know, the WAL archive is used for PITR (a GridGain
>>> feature) and historical rebalancing.
>>>
>>> Facundo seems to have a problem with running out of space in the
>>> (/opt/work/walarchive) directory. Currently, the WAL archive is
>>> cleared at the end of a checkpoint. A potentially long transaction may
>>> prevent a checkpoint from starting, thereby not cleaning the WAL
>>> archive, which will lead to such an error.
>>>
>>> At the moment, the workaround I see is to increase the size of the
>>> (/opt/work/walarchive) directory in k8s and avoid long transactions,
>>> or anything similar that modifies data and runs for a long time.
>>>
>>> It would be best to fix the logic of working with the WAL archive. I
>>> think we should remove WAL archive cleanup from the end of the
>>> checkpoint and do it on demand. For example, when trying to move a
>>> segment to the archive.
>>>
>>> 06.11.2020, 01:58, "Denis Magda":
>>>> Folks,
>>>>
>>>> In my understanding, you need the archives only for features such as
>>>> PITR. Considering that the PITR functionality is not provided in
>>>> Ignite, why do we have the archives enabled by default?
>>>>
>>>> How about having this feature disabled by default to prevent the
>>>> following issues experienced by our users:
>>>>
>>>> http://apache-ignite-users.70518.x6.nabble.com/WAL-and-WAL-Archive-volume-size-recommendation-td34458.ht
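The rollover cost Ivan breaks down at the top of this message (mapping a fresh, short file forces a truncate(2) to extend it and then an mmap(2), while a preallocated segment skips the extension) can be illustrated with plain Java NIO. This is a sketch of the mechanism, not Ignite code; the class and method names are made up, and the 64 MB constant assumes Ignite's default WAL segment size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

// Illustration of the rollover path discussed in the thread, not Ignite code.
// FileChannel.map on a file shorter than the requested region extends the
// file first (a truncate(2) under the hood) and then mmaps it (mmap(2));
// mapping an already preallocated segment only pays for the mmap.
class SegmentMapper {
    static final int SEGMENT_SIZE = 64 * 1024 * 1024; // assumed default WAL segment size

    /** Rollover without preallocation: extend + map happen in this one call. */
    static MappedByteBuffer mapSegment(Path segment) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(segment.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, SEGMENT_SIZE);
        }
    }

    /** Preallocation at node start (archiving enabled): pay the extension once. */
    static void preallocate(Path segment) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(segment.toFile(), "rw")) {
            raf.setLength(SEGMENT_SIZE); // extends the file with zeroes up front
        }
    }
}
```

This is the crux of the "not free and cheap" argument: without preallocation every segment switch pays for the extension and mapping on the hot path, whereas preallocated segments amortize that work to node start.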