Hi Folks,

We have performance suggestions logged during Ignite start. Can we add data safety suggestions as well and list all of these options during startup?
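A minimal sketch of what such startup suggestions could look like, written in Python for brevity rather than as Ignite code. The `CacheSettings` class and the `data_safety_suggestions` function are hypothetical names, not part of the Ignite API; the warnings only restate the risks raised in the quoted messages below (LOG_ONLY and BACKGROUND durability, 0 backups, PRIMARY_SYNC with readFromBackup=true).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheSettings:
    """Hypothetical stand-in for the settings discussed in this thread (not the Ignite API)."""
    persistence_enabled: bool = True
    wal_mode: str = "LOG_ONLY"             # FSYNC, LOG_ONLY, BACKGROUND or NONE
    backups: int = 0
    write_sync_mode: str = "PRIMARY_SYNC"  # or FULL_SYNC
    read_from_backup: bool = True


def data_safety_suggestions(s: CacheSettings) -> List[str]:
    """Return one warning per setting that trades data safety for speed."""
    out = []
    if s.persistence_enabled:
        if s.wal_mode == "LOG_ONLY":
            out.append("WAL mode LOG_ONLY: recent updates may be lost if the OS or power fails; consider FSYNC.")
        elif s.wal_mode == "BACKGROUND":
            out.append("WAL mode BACKGROUND: recent updates may be lost if the Ignite process fails; consider FSYNC.")
        elif s.wal_mode == "NONE":
            out.append("WAL mode NONE: stored data may be corrupted after an abrupt stop; consider FSYNC.")
    if s.backups == 0:
        out.append("No backups: a node failure may cause partition loss.")
    elif s.write_sync_mode == "PRIMARY_SYNC":
        # Only relevant when there is at least one backup copy to fall behind.
        out.append("PRIMARY_SYNC: an update confirmed to the client may be lost if the primary node fails; consider FULL_SYNC.")
        if s.read_from_backup:
            out.append("readFromBackup=true with PRIMARY_SYNC: cache updates are no longer linearizable.")
    return out


if __name__ == "__main__":
    # Print the suggestions for the defaults described in this thread.
    for line in data_safety_suggestions(CacheSettings()):
        print(line)
```

A configuration with FSYNC, at least one backup, FULL_SYNC and readFromBackup=false produces no suggestions in this sketch.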
The argument that Ignite is distributed and may rely on backups for protection makes sense. Can anyone point to a test of this or a similar scenario:
- distributed recovery with LOG_ONLY mode enabled on 2 nodes, 1 backup;
- node 1 crashes during WAL logging of a transaction with a huge number of records, but node 2 managed to log all records of the TX (Node1 WAL: RR crash, Node2 WAL: RRRRRR);
- node 1 returns to the cluster;
- it should be possible to see a consistent state with all TX data after delta rebalancing.

If such a test exists, runs in TC, and is not flaky, I'm also OK with the new relaxed default.

Sincerely,
Dmitriy Pavlov

Mon, Feb 19, 2018 at 10:06, Alexey Goncharuk <alexey.goncha...@gmail.com>:

> In terms of 'safety', Ignite default settings are far from optimal. For in-memory mode, we have 0 backups by default, which means partition loss in case of node failure; we have readFromBackup=true and PRIMARY_SYNC by default, which effectively cancels the linearizability property for cache updates. So setting the default WAL mode to LOG_ONLY does not seem to be a bigger evil than what we currently have. If we are to move to safer defaults, we should change all of the affected settings.
>
> I also want to clarify the difference between guarantees in non-fsync modes. We should distinguish the loss of durability (the loss of the last update) because the update did not make it to disk and data loss because the disk content was shuffled due to an incomplete page write.
> In my understanding, the current situation is:
> FSYNC: loss of durability: not possible, data loss: not possible
> LOG_ONLY: loss of durability: possible only if OS/power fails, data loss: possible only if OS/power fails
> BACKGROUND: loss of durability: possible if Ignite process fails, data loss: possible only if OS/power fails
>
> The data loss situation can be mitigated in the cluster using a large enough replication factor (this is what Dmitriy was describing in the case of the LOG_ONLY and 3 backups configuration).
>
> Denis,
> I do not think it is fair to compare Ignite defaults to Cassandra's defaults because Cassandra is a _not_ transactional, _eventually consistent_ datastore; it claims much weaker guarantees than Ignite.
>
> All in all, I'm ok to change the WAL default right now, but I would revisit all those settings in 3.0 and make Ignite safe-first.
>
> 2018-02-17 3:24 GMT+03:00 Denis Magda <dma...@apache.org>:
>
> > Classic relational databases have no choice other than to use FSYNC by default. RDBMS is all about consistency.
> >
> > Distributed databases try to balance consistency and performance. For instance, why fsync every update if there is usually 1 backup copy? This is probably why VoltDB [1] and Cassandra use modes comparable to Ignite's LOG_ONLY.
> >
> > Ignite as a distributed database should care about both consistency and performance.
> >
> > My vote goes to FSYNC, LOG_ONLY (default), BACKGROUND, NONE.
> >
> > [1] https://docs.voltdb.com/UsingVoltDB/CmdLogConfig.php
> >
> > --
> > Denis
> >
> > On Fri, Feb 16, 2018 at 2:14 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> >
> > > Vova,
> > >
> > > I hear your concerns, but at the same time I know that one of the largest banks in eastern Europe is using Ignite in LOG_ONLY mode with 3 backups to move money.
> > > The rationale is that the probability of failure of 4 servers at the hardware level at the same time is very low. However, if the JVM process fails on any server, then it can be safely restarted without losing data. In my view, this is why LOG_ONLY mode makes sense as a default.
> > >
> > > I still vote to change the default to LOG_ONLY, deprecate the DEFAULT name altogether and add FSYNC mode instead.
> > >
> > > D.
> > >
> > > On Fri, Feb 16, 2018 at 4:05 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >
> > > > Sergey,
> > > >
> > > > We do not have backups by default either, so essentially we are losing data by default. Moreover, backups are a less reliable option than fsync because a lot of users cannot afford to put servers on separate power circuits, so a single power failure may easily power off the whole cluster at once, and data is still lost. This is normal practice even for enterprise deployments (e.g. asynchronous replication).
> > > >
> > > > To make things even worse, we employ PRIMARY_SYNC mode by default! So even if you configured backups, you may still lose data due to a single node failure - just shut down the PRIMARY after the commit is confirmed to the client and your recent update will disappear.
> > > >
> > > > So this is what a user should do to make himself safe:
> > > > 1) Learn about WAL modes
> > > > 2) Learn about backups
> > > > 3) Learn about synchronization modes
> > > > 4) Cross his fingers that he understood everything correctly and that there are no other hidden surprises in Ignite which could lead to data loss.
> > > >
> > > > Way too much for a product claiming to be A*C*ID and persistent, don't you think so?
> > > >
> > > > Leaving the default WAL mode with fsync resolves all these issues.
> > > >
> > > > Vladimir.
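The two arguments above (four servers rarely fail together versus one power failure taking the whole cluster down) can be compared with back-of-the-envelope arithmetic. The sketch below is illustrative only: the function name and every probability in it are made-up assumptions, not measurements. It treats hardware failures as independent with probability `p_node` per server and adds an optional probability `p_shared` of a common-cause event such as a shared power circuit failing.

```python
def p_all_copies_lost(p_node: float, copies: int, p_shared: float = 0.0) -> float:
    """Probability that every copy of a partition is lost in the same time window.

    p_node   -- assumed independent hardware failure probability of one server
    copies   -- primary plus backups (3 backups means copies=4)
    p_shared -- assumed probability of a common-cause failure hitting all servers
    """
    independent = p_node ** copies
    # Either the shared event happens, or it does not and all nodes fail on their own.
    return p_shared + (1.0 - p_shared) * independent


if __name__ == "__main__":
    # 1 primary + 3 backups, hypothetical 1% hardware failure chance per server.
    print(p_all_copies_lost(0.01, copies=4))
    # Same cluster, plus a hypothetical 0.1% chance of losing the shared power circuit.
    print(p_all_copies_lost(0.01, copies=4, p_shared=0.001))
```

With independent failures the result shrinks geometrically with each extra copy. Once a common-cause failure is possible, the total can never drop below `p_shared`, no matter how many backups are configured.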
> > > > On Fri, Feb 16, 2018 at 11:43 PM, Sergey Kozlov <skoz...@gridgain.com> wrote:
> > > >
> > > > > I suppose some approaches used by classic databases make no sense for Ignite. The FSYNC requirement for databases stems from their single-host nature. If you have corrupted db files, you have corrupted (lost) data.
> > > > >
> > > > > For Ignite, a sufficient number of backups and the failure detection logic can provide data consistency in the sense of "cluster data consistency".
> > > > >
> > > > > On Fri, Feb 16, 2018 at 8:57 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
> > > > >
> > > > > > Hi, all WAL modes except NONE protect from data consistency problems (B+Tree, pages, etc.), which is why I suggest we avoid saying 'corrupted' about 'unapplied updates'.
> > > > > >
> > > > > > Log Only and Background may cause unapplied updates in case of OS/process failures.
> > > > > >
> > > > > > None mode IMO is not an option in case data consistency is needed.
> > > > > >
> > > > > > Fri, Feb 16, 2018 at 20:49, Valentin Kulichenko <valentin.kuliche...@gmail.com>:
> > > > > >
> > > > > > > Guys,
> > > > > > >
> > > > > > > While we're on this topic, what is the difference between BACKGROUND and NONE in terms of semantics and provided guarantees? To me it looks like both guarantee to recover the state since the last checkpoint and anything else can potentially be lost, so from the user perspective they are the same. Am I missing something here?
> > > > > > >
> > > > > > > Also there is the following in Javadoc for NONE: "If an Ignite node is terminated in NONE mode abruptly, it is likely that the data stored on disk is corrupted and work directory will need to be cleared for a node restart.". If this is really the case, I'm not sure NONE makes sense at all. Why would I enable persistence if I'm likely to clear the storage on restart?
> > > > > > >
> > > > > > > -Val
> > > > > > >
> > > > > > > On Fri, Feb 16, 2018 at 8:39 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > > > > > >
> > > > > > > > What is the reason to have DEFAULT mode at all if you claim LOG_ONLY to be completely safe? :)
> > > > > > > >
> > > > > > > > And how could it be safe, given that without fsync we lose part of the WAL itself in case of a crash?
> > > > > > > >
> > > > > > > > Fri, Feb 16, 2018 at 19:32, Dmitry Pavlov <dpavlov....@gmail.com>:
> > > > > > > >
> > > > > > > > > Thank you. Data can't be corrupted in case of a crash because of WAL replay (since the last completed checkpoint). Physical records are used to restore possibly corrupted pages in the persistent store (we overwrite the so-called 'grey zone' - pages for which we don't know for sure whether they have been written).
> > > > > > > > >
> > > > > > > > > The only effect is that one or several of the last transactions are not written. That is not the same as corrupted data.
> > > > > > > > >
> > > > > > > > > Fri, Feb 16, 2018 at 19:19, Vladimir Ozerov <voze...@gridgain.com>:
> > > > > > > > >
> > > > > > > > > > Log only mode is not safe - data might be corrupted in case of a system crash. Oracle - fsync, Postgres - fsync, SQL Server - fsync, Cassandra - similar to our “background”.
> > > > > > > > > >
> > > > > > > > > > Fri, Feb 16, 2018 at 19:11, Dmitry Pavlov <dpavlov....@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > Hi Vladimir,
> > > > > > > > > > >
> > > > > > > > > > > What you are saying definitely makes sense.
> > > > > > > > > > >
> > > > > > > > > > > At the same time, LOG_ONLY is also a safe mode: the user will be able to restore the system after a crash. If that is not true, we should create a critical ticket and fix it.
> > > > > > > > > > >
> > > > > > > > > > > Do you know the defaults of other databases, such as Cassandra, Oracle, Postgres?
> > > > > > > > > > >
> > > > > > > > > > > Sincerely,
> > > > > > > > > > > Dmitriy Pavlov
> > > > > > > > > > >
> > > > > > > > > > > Fri, Feb 16, 2018 at 18:41, Vladimir Ozerov <voze...@gridgain.com>:
> > > > > > > > > > >
> > > > > > > > > > > > Igniters,
> > > > > > > > > > > >
> > > > > > > > > > > > Sorry for pouring oil on the flames, but from a database perspective moving from FSYNC to non-FSYNC mode appears to be a mistake. When you work with a database, your main expectation is that it will save your data.
> > > > > > > > > > > > All production database vendors make sure that you are safe, not that you are fast. Moreover, some vendors even prevent you from being in unsafe mode (e.g. you cannot disable fsync in SQL Server at all).
> > > > > > > > > > > >
> > > > > > > > > > > > If we continue going in this direction, we will end up with a product which is unsafe out of the box and requires tons of documentation to understand how to make it safe. Definitely not the right message to the market. This is like a car without brakes - would you like to drive it? If this is a Need For Speed game and you have unlimited lives (in-memory cache with backing store), then yes. If this is real life (with persistence), then no.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Feb 16, 2018 at 5:20 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Well, I cannot say that I like the name LOG_ONLY, but I would vote to keep it for now, given that it is already documented in many places, blogs, and examples.
> > > > > > > > > > > > >
> > > > > > > > > > > > > D.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Feb 16, 2018 at 8:13 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Looks like it's an Ignite term - I've never heard of it outside Ignite scope.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Though, renaming an existing enum value requires keeping the old one as deprecated. DEFAULT is confusing enough to pay this price.
> > > > > > > > > > > > > > As for LOG_ONLY, I think we can keep it as long as it has a good and definitive javadoc.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best Regards,
> > > > > > > > > > > > > > Ivan Rakov
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 16.02.2018 17:07, Dmitriy Setrakyan wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Igniters, just to clarify, does the term LOG_ONLY mean anything in the industry or is this just an Ignite term?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > D.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Feb 16, 2018 at 8:03 AM, Anton Vinogradov <avinogra...@gridgain.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Log only mode: flushes application buffers.
> > > > > > > > > > > > > > > > So it is a synced mode without the fsync guarantee. That's why I propose to rename it to SYNC.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Feb 16, 2018 at 4:49 PM, Ilya Lantukh <ilant...@gridgain.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I am OK with either FSYNC or STRICT variant.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > LOG_ONLY name means "log without fsync".
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Feb 16, 2018 at 4:05 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, Feb 16, 2018 at 7:02 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Why create a new term to define something that has already been defined?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > That makes sense. I'm ok with FSYNC.
> > > > > > > > > > > > > > > > > > > Anton, I don't understand why we should rename LOG_ONLY to SYNC. We started this discussion with bad naming of DEFAULT, but this has nothing to do with LOG_ONLY (even though it may be scientific - but SYNC sounds scientific as well).
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I agree with Ivan, we should not go wild with renaming. However, I would like to find out what is the meaning behind the LOG_ONLY name. Can someone explain?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > D.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > > > > Ilya
> > > > >
> > > > > --
> > > > > Sergey Kozlov
> > > > > GridGain Systems
> > > > > www.gridgain.com
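The naming question at the bottom of the thread comes down to how far each write travels before a commit is acknowledged. A simplified model, assuming only the descriptions given in this thread (LOG_ONLY flushes application buffers without fsync; BACKGROUND may lose updates when the process dies), could look like this in Python. `append_record` and `demo_sizes` are hypothetical helpers, and this is not how Ignite implements its WAL.

```python
import os
import tempfile
from typing import List


def append_record(f, record: bytes, mode: str) -> None:
    """Append one record to an open binary log file.

    BACKGROUND: leave the record in the process buffer (a background flush,
                not shown here, would push it out later).
    LOG_ONLY:   flush the process buffer to the OS, but do not fsync.
    FSYNC:      flush and then ask the OS to force the data to the device.
    """
    f.write(record)
    if mode == "BACKGROUND":
        return
    f.flush()
    if mode == "FSYNC":
        os.fsync(f.fileno())


def demo_sizes() -> List[int]:
    """Return the on-disk file size observed after each append."""
    sizes = []
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        # Explicit buffer size so small records stay in the process buffer.
        with open(path, "wb", buffering=65536) as wal:
            for record, mode in [(b"tx1\n", "BACKGROUND"),
                                 (b"tx2\n", "LOG_ONLY"),
                                 (b"tx3\n", "FSYNC")]:
                append_record(wal, record, mode)
                sizes.append(os.path.getsize(path))
    return sizes


if __name__ == "__main__":
    print(demo_sizes())
```

In this model the BACKGROUND record does not reach the file until a later flush, so it disappears if the process dies first. The LOG_ONLY record survives a process crash because the OS already holds it, but only the FSYNC record has been forced to the device.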