Ivan,

How soon are you going to merge the fix into master? Many persistence-related optimizations have already stacked up. We can probably release them sooner if the community agrees.
--
Denis

On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

> Thanks all!
> We seem to have reached a consensus on this issue. I'll just add the
> necessary fsyncs under IGNITE-7754.
>
> Best Regards,
> Ivan Rakov
>
> On 22.03.2018 15:13, Ilya Lantukh wrote:
>
>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect
>> from data corruption, it doesn't make sense.
>>
>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dma...@apache.org> wrote:
>>
>>> +1 for the fix of LOG_ONLY
>>>
>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk
>>> <alexey.goncha...@gmail.com> wrote:
>>>
>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the
>>>> provided performance results.
>>>>
>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>>>
>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much, and not
>>>>> a drop at all, given that we are fixing a bug. I.e. had we
>>>>> implemented it correctly in the first place, we would never have
>>>>> noticed any "drop".
>>>>> I do not understand why someone would like to use the current
>>>>> broken mode.
>>>>>
>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov
>>>>> <dpavlov....@gmail.com> wrote:
>>>>>
>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
>>>>>> corruption does not make much sense.
>>>>>>
>>>>>> What Ivan mentioned here as a drop is still a significant
>>>>>> performance boost relative to the old DEFAULT mode (FSYNC now).
>>>>>>
>>>>>> Sincerely,
>>>>>> Dmitriy Pavlov
>>>>>>
>>>>>> Wed, Mar 21, 2018 at 17:56, Ivan Rakov <ivan.glu...@gmail.com>:
>>>>>>
>>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>>> We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, independent
>>>>>>> of the WAL compaction enabled flag. It's a pretty significant
>>>>>>> drop: WAL compaction itself gives only a ~3% drop.
>>>>>>>
>>>>>>> I see two options here:
>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
>>>>>>> release AI 2.5 with a 7% drop.
>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it the default, and add a release
>>>>>>> note to AI 2.5 that we added power loss durability in the default
>>>>>>> mode, but the user may fall back to the previous LOG_ONLY in order
>>>>>>> to retain performance.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Ivan Rakov
>>>>>>>
>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>>
>>>>>>>> Val,
>>>>>>>>
>>>>>>>>> If a storage is in corrupted state, does it mean that it needs
>>>>>>>>> to be completely removed and cluster needs to be restarted
>>>>>>>>> without data?
>>>>>>>>
>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be
>>>>>>>> lost, but only in the *power loss / OS crash* case.
>>>>>>>> kill -9, JVM crash, death of a critical system thread and all
>>>>>>>> other cases that usually take place are variations of *process
>>>>>>>> crash*. All WAL modes (except NONE, of course) ensure
>>>>>>>> corruption safety in case of process crash.
>>>>>>>>
>>>>>>>>> If so, I'm not sure any mode that allows corruption makes much
>>>>>>>>> sense to me.
>>>>>>>>
>>>>>>>> It depends on the performance impact of enforcing power-loss
>>>>>>>> corruption safety. The price of full protection from power loss
>>>>>>>> is high - FSYNC is way slower (2-10 times) than other WAL modes.
>>>>>>>> The question is whether ensuring weaker guarantees (corruption
>>>>>>>> can't happen, but loss of the last updates can) will affect
>>>>>>>> performance as badly as strong guarantees.
>>>>>>>> I'll share benchmark results soon.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ivan Rakov
>>>>>>>>
>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>>
>>>>>>>>> Guys,
>>>>>>>>>
>>>>>>>>> What do we understand by "data corruption" here? If a storage
>>>>>>>>> is in corrupted state, does it mean that it needs to be
>>>>>>>>> completely removed and cluster needs to be restarted without
>>>>>>>>> data? If so, I'm not sure any mode that allows corruption makes
>>>>>>>>> much sense to me. How am I supposed to use a database, if
>>>>>>>>> virtually any failure can end with complete loss of data?
>>>>>>>>>
>>>>>>>>> In any case, this definitely should not be a default behavior.
>>>>>>>>> If a user ever switches to a corruption-unsafe mode, there
>>>>>>>>> should be a clear warning about this.
>>>>>>>>>
>>>>>>>>> -Val
>>>>>>>>>
>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov
>>>>>>>>> <ivan.glu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ticket to track changes:
>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ivan Rakov
>>>>>>>>>>
>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov
>>>>>>>>>>> <ivan.glu...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>
>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees
>>>>>>>>>>>> unless power loss has happened.
>>>>>>>>>>>> Seems like we need to measure the performance difference to
>>>>>>>>>>>> decide whether we need a separate WAL mode. If it is
>>>>>>>>>>>> invisible, we'll just fix these bugs without introducing a
>>>>>>>>>>>> new mode; if it is perceptible, we'll continue the
>>>>>>>>>>>> discussion about introducing LOG_ONLY_SAFE.
>>>>>>>>>>>> Makes sense?
>>>>>>>>>>>
>>>>>>>>>>> Yes, this sounds like the right approach.
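
For readers following the thread, the distinction between surviving a *process crash* and surviving *power loss* can be illustrated with a small Java sketch. This is not Ignite's WAL code and does not show where IGNITE-7754 places its fsyncs; the class `WalSyncSketch`, the enum `SyncPolicy` and the method `append` are hypothetical names made up for this illustration. The only assumption is the general behavior of `FileChannel`: a plain `write()` hands data to the OS, which is normally enough to survive a crash of the writing process, while `force(true)` additionally asks the OS to flush the data and file metadata to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; not Ignite's WAL implementation. */
public class WalSyncSketch {
    /** Hypothetical stand-ins for "write without fsync" and "write with fsync". */
    public enum SyncPolicy { WRITE_ONLY, WRITE_AND_FORCE }

    /** Appends one record to the log file and returns the file size after the append. */
    public static long append(Path log, String record, SyncPolicy policy) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));

            // write() may be partial, so loop until the whole buffer is handed to the OS.
            while (buf.hasRemaining())
                ch.write(buf);

            // Without force(), the bytes may still sit in the OS page cache:
            // usually safe against a process crash, not against power loss.
            if (policy == SyncPolicy.WRITE_AND_FORCE)
                ch.force(true); // request a flush of data and metadata to the device

            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal-sketch", ".log");
        try {
            append(log, "update-1", SyncPolicy.WRITE_ONLY);
            long size = append(log, "update-2", SyncPolicy.WRITE_AND_FORCE);
            System.out.println("bytes in log: " + size);
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

Calling `force()` on every append is the expensive end of the spectrum, comparable in spirit to the FSYNC mode described above; the cost of adding it at fewer points is what the benchmark numbers in the thread are about.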