Re: BTRFS and databases

Martin Raiber Thu, 02 Aug 2018 06:15:03 -0700

On 02.08.2018 14:27 Austin S. Hemmelgarn wrote:
> On 2018-08-02 06:56, Qu Wenruo wrote:
>>
>> On 2018-08-02 18:45, Andrei Borzenkov wrote:
>>>
>>> Sent from iPhone
>>>
>>>> On 2 Aug 2018, at 10:02, Qu Wenruo <quwenruo.bt...@gmx.com>
>>>> wrote:
>>>>
>>>>> On 2018-08-01 11:45, MegaBrutal wrote:
>>>>> Hi all,
>>>>>
>>>>> I know it's a decade-old question, but I'd like to hear your thoughts
>>>>> as of today. By now, I have become a heavy BTRFS user. I use BTRFS almost
>>>>> everywhere, except in situations where there is obviously no benefit
>>>>> (e.g. /var/log, /boot). At home, all my desktop, laptop and server
>>>>> computers are mainly running on BTRFS with only a few file systems on
>>>>> ext4. I even installed BTRFS in corporate production systems (in those
>>>>> cases, the systems were mainly on ext4, but there were some specific
>>>>> file systems that exploited BTRFS features).
>>>>>
>>>>> But there is still one question that I can't get over: if you store a
>>>>> database (e.g. MySQL), would you prefer having a BTRFS volume mounted
>>>>> with nodatacow, or would you just simply use ext4?
>>>>>
>>>>> I know that with nodatacow, I take away most of the benefits of BTRFS
>>>>> (those are actually hurting database performance – the very CoW
>>>>> nature that is a blessing elsewhere is a drawback with databases).
>>>>> But are there any advantages to sticking with BTRFS for a database
>>>>> even though CoW is disabled, or should I just return to the old and
>>>>> reliable ext4 for those applications?
>>>>
>>>> I'm not a database expert, so I could be totally wrong, but what
>>>> about completely disabling the database write-ahead log (WAL) and
>>>> letting btrfs' data CoW handle data consistency completely?
>>>>
>>>
>>> This would make the content of the database after a crash completely
>>> unpredictable, thus making it impossible to reliably roll back a
>>> transaction.
>>
>> Btrfs itself (with datacow) can ensure the fs is updated completely.
>>
>> That is to say, even if a crash happens, the content of the fs will be
>> in the same state as of the previous btrfs transaction (btrfs sync).
>>
>> Thus there is no need to roll back the database transaction.
>> (Unless the database transaction is not in sync with the btrfs transaction)
>>
> Two issues with this statement:
>
> 1. Not all database software properly groups logically related
> operations that need to be atomic as a unit into transactions.
> 2. Even aside from point 1 and the possibility of database corruption,
> there are other legitimate reasons that you might need to roll back a
> transaction (for example, the rather obvious case of a transaction
> that should not have happened in the first place).


I have previously thought of a database transaction scheme that is based
on btrfs features. It has practical issues, though.
One would put a b-tree database file into a subvolume (e.g. trans_0).
When changing the b-tree database, one would create a snapshot (trans_1),
then change the file in the snapshot. On commit, sync trans_1, then
delete trans_0. On rollback, delete trans_1.
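
A minimal sketch of the steps above, in Python, just to make the sequence
concrete. The class name SnapshotTransaction, the base_dir layout and the
injectable run parameter are hypothetical and only for illustration; the
btrfs commands used are the regular btrfs-progs subcommands (subvolume
snapshot, filesystem sync, subvolume delete). It is a sketch under these
assumptions, not something I have run in production.

```python
import subprocess


def run_cmd(argv):
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(argv, check=True)


class SnapshotTransaction:
    """Hypothetical helper: one database transaction = one writable snapshot."""

    def __init__(self, base_dir, generation=0, run=run_cmd):
        self.base_dir = base_dir      # directory holding trans_N subvolumes
        self.generation = generation  # N of the currently committed trans_N
        self.run = run                # injectable so the steps can be dry-run

    def _path(self, n):
        return f"{self.base_dir}/trans_{n}"

    def begin(self):
        # Snapshot the committed subvolume; all changes go into the snapshot.
        new = self._path(self.generation + 1)
        self.run(["btrfs", "subvolume", "snapshot",
                  self._path(self.generation), new])
        return new

    def commit(self):
        old = self._path(self.generation)
        new = self._path(self.generation + 1)
        # Sync the new subvolume first, then drop the old state.
        self.run(["btrfs", "filesystem", "sync", new])
        self.run(["btrfs", "subvolume", "delete", old])
        self.generation += 1

    def rollback(self):
        # Throw away the snapshot; trans_N stays untouched.
        self.run(["btrfs", "subvolume", "delete",
                  self._path(self.generation + 1)])
```

Passing run=print (or a list's append) shows the command sequence without
touching a real file system.
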

Problems:
* Large overhead for small transactions (OLTP) -- problem in general for
copy-on-write b-tree databases
* Only root can create or destroy snapshots
* By default the Linux memory system starts write-back pretty much
immediately, so pages that get overwritten more than once in a
transaction (and are not kept in RAM) get written to disk more than
once, unless Linux is tuned to not do this.
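
For the write-back point, the knobs in question are the vm.dirty_* sysctls.
A small sketch that only reads the current values, assuming the standard
/proc/sys/vm files; the function name read_writeback_settings is made up
here, and which values would suit such a transaction scheme is not
something I am claiming.

```python
import os

# Standard Linux write-back tunables exposed under /proc/sys/vm.
TUNABLES = (
    "dirty_background_ratio",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_writeback_settings(vm_dir="/proc/sys/vm"):
    """Return {tunable: int value, or None if it cannot be read}."""
    settings = {}
    for name in TUNABLES:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_writeback_settings().items():
        print(name, "=", value)
```
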

I have used this method, albeit by reflinking the database file and then
modifying the reflink, but I think reflinking is slower than creating a
snapshot?
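
Roughly what the reflink variant can look like, as a sketch. FICLONE is the
Linux clone ioctl from <linux/fs.h>; the ".new" suffix, the function names
and committing via rename are assumptions made for this example, not
necessarily how it has to be done. The clone parameter exists only so the
flow can be tried on a file system without reflink support.

```python
import fcntl
import os

FICLONE = 0x40049409  # _IOW(0x94, 9, int) from <linux/fs.h>


def reflink(src_path, dst_path):
    """Clone src into dst, sharing extents (needs btrfs or similar)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())


def begin(db_path, clone=reflink):
    # Work on a clone; the committed file is left untouched.
    work = db_path + ".new"  # hypothetical naming
    clone(db_path, work)
    return work


def commit(db_path, work):
    # Flush the modified clone, then atomically replace the old file.
    # (A real implementation would also fsync the containing directory.)
    fd = os.open(work, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
    os.replace(work, db_path)


def rollback(work):
    # Discard the clone; the committed file is still the old state.
    os.unlink(work)
```
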
