Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Denis Magda Thu, 22 Mar 2018 07:50:02 -0700

Ivan,

How soon are you going to merge the fix into master? Many
persistence-related optimizations have already stacked up. Probably, we can
release them sooner if the community agrees.


--
Denis

On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

> Thanks all!
> We seem to have reached a consensus on this issue. I'll just add the
> necessary fsyncs under IGNITE-7754.
>
> Best Regards,
> Ivan Rakov
>
>
> On 22.03.2018 15:13, Ilya Lantukh wrote:
>
>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect from
>> data corruption, it doesn't make sense.
>>
>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dma...@apache.org> wrote:
>>
>>> +1 for the fix of LOG_ONLY
>>>
>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
>>> alexey.goncha...@gmail.com> wrote:
>>>
>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the provided
>>>> performance results.
>>>>
>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>>>
>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much, and not a
>>>>> drop at all, provided that we are fixing a bug. I.e. had we implemented
>>>>> it correctly in the first place, we would never have noticed any "drop".
>>>>> I do not understand why someone would like to use the current broken mode.
>>>>>
>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
>>>>>> corruption does not make much sense.
>>>>>>
>>>>>> What Ivan mentioned here as a drop is still a significant performance
>>>>>> boost in relation to the old DEFAULT mode (FSYNC now).
>>>>>>
>>>>>> Sincerely,
>>>>>> Dmitriy Pavlov
>>>>>>
>>>>>> Wed, Mar 21, 2018 at 17:56, Ivan Rakov <ivan.glu...@gmail.com>:
>>>>>>
>>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>>> We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of the
>>>>>>> WAL compaction enabled flag. It's a pretty significant drop: WAL
>>>>>>> compaction itself gives only a ~3% drop.
>>>>>>>
>>>>>>> I see two options here:
>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
>>>>>>> release AI 2.5 with a 7% drop.
>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it the default, and add a release note
>>>>>>> to AI 2.5 that we added power loss durability in the default mode, but
>>>>>>> the user may fall back to the previous LOG_ONLY in order to retain
>>>>>>> performance.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Ivan Rakov
>>>>>>>
>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>>
>>>>>>>> Val,
>>>>>>>>
>>>>>>>>> If a storage is in
>>>>>>>>> corrupted state, does it mean that it needs to be completely removed
>>>>>>>>> and cluster needs to be restarted without data?
>>>>>>>>>
>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be lost,
>>>>>>>> but only in the *power loss / OS crash* case.
>>>>>>>> kill -9, JVM crash, death of a critical system thread and all other
>>>>>>>> cases that usually take place are variations of *process crash*. All
>>>>>>>> WAL modes (except NONE, of course) ensure corruption safety in case
>>>>>>>> of process crash.
>>>>>>>>
>>>>>>>>> If so, I'm not sure any mode
>>>>>>>>> that allows corruption makes much sense to me.
>>>>>>>>>
>>>>>>>> It depends on the performance impact of enforcing power-loss
>>>>>>>> corruption safety. The price of full protection from power loss is
>>>>>>>> high: FSYNC is way slower (2-10 times) than other WAL modes. The
>>>>>>>> question is whether ensuring weaker guarantees (corruption can't
>>>>>>>> happen, but loss of the last updates can) will affect performance as
>>>>>>>> badly as strong guarantees.
>>>>>>>> I'll share benchmark results soon.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Ivan Rakov
>>>>>>>>
>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>>
>>>>>>>>> Guys,
>>>>>>>>>
>>>>>>>>> What do we mean by "data corruption" here? If a storage is in
>>>>>>>>> corrupted state, does it mean that it needs to be completely removed
>>>>>>>>> and cluster needs to be restarted without data? If so, I'm not sure
>>>>>>>>> any mode that allows corruption makes much sense to me. How am I
>>>>>>>>> supposed to use a database, if virtually any failure can end with
>>>>>>>>> complete loss of data?
>>>>>>>>> In any case, this definitely should not be a default behavior. If a
>>>>>>>>> user ever switches to a corruption-unsafe mode, there should be a
>>>>>>>>> clear warning about this.
>>>>>>>>>
>>>>>>>>> -Val
>>>>>>>>>
>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <ivan.glu...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Ticket to track changes:
>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ivan Rakov
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <ivan.glu...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>
>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees unless
>>>>>>>>>>>> power loss has happened.
>>>>>>>>>>>> Seems like we need to measure the performance difference to decide
>>>>>>>>>>>> whether we need a separate WAL mode. If it is invisible, we'll just
>>>>>>>>>>>> fix these bugs without introducing a new mode; if it is
>>>>>>>>>>>> perceptible, we'll continue the discussion about introducing
>>>>>>>>>>>> LOG_ONLY_SAFE.
>>>>>>>>>>>> Makes sense?
>>>>>>>>>>>>
>>>>>>>>>>> Yes, this sounds like the right approach.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>
>>
>
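
For readers following the thread, the distinction between process crash and
power loss discussed above can be illustrated with a minimal sketch. This is
not Ignite code: the WalSketch, Log and Mode names are hypothetical, and only
the LOG_ONLY and FSYNC labels are borrowed from the discussion. In this
sketch LOG_ONLY stops at FileChannel.write, which hands the bytes to the OS so
they outlive the JVM process, while FSYNC additionally calls FileChannel.force
after every record so the bytes are pushed to the storage device. Where the
fsyncs planned under IGNITE-7754 are placed is not stated in the thread, so
that intermediate behavior is deliberately not modeled here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WalSketch {
    /** Hypothetical modes: labels borrowed from the thread, semantics simplified. */
    enum Mode { LOG_ONLY, FSYNC }

    /** Appends length-prefixed records to a file and counts explicit fsync calls. */
    static final class Log implements AutoCloseable {
        private final FileChannel ch;
        private final Mode mode;
        int forceCalls;

        Log(Path file, Mode mode) throws IOException {
            this.ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
            this.mode = mode;
        }

        void append(String record) throws IOException {
            byte[] payload = record.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
            buf.putInt(payload.length);
            buf.put(payload);
            buf.flip();

            // After write() returns, the bytes are in the OS page cache:
            // they survive a process crash, but not necessarily a power loss.
            while (buf.hasRemaining())
                ch.write(buf);

            if (mode == Mode.FSYNC) {
                // force() asks the OS to push the file content to the device,
                // which is what makes every record cost a disk round trip.
                ch.force(false);
                forceCalls++;
            }
        }

        @Override public void close() throws IOException {
            ch.close();
        }
    }

    public static void main(String[] args) throws IOException {
        for (Mode mode : Mode.values()) {
            Path file = Files.createTempFile("wal-sketch", ".log");
            try (Log log = new Log(file, mode)) {
                long start = System.nanoTime();
                for (int i = 0; i < 1000; i++)
                    log.append("update-" + i);
                long micros = (System.nanoTime() - start) / 1000;
                System.out.println(mode + ": fsync calls = " + log.forceCalls
                    + ", elapsed = " + micros + " us");
            } finally {
                Files.deleteIfExists(file);
            }
        }
    }
}
```

Running main on a local disk gives a rough feel for how much the per-record
force call costs; the numbers depend heavily on the device and file system and
say nothing about the benchmark results attached to the JIRA ticket.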
