On 02.08.2018 14:27 Austin S. Hemmelgarn wrote:
> On 2018-08-02 06:56, Qu Wenruo wrote:
>>
>> On 2018-08-02 18:45, Andrei Borzenkov wrote:
>>>
>>> Sent from iPhone
>>>
>>>> On 2 Aug 2018, at 10:02, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>>>
>>>>> On 2018-08-01 11:45, MegaBrutal wrote:
>>>>> Hi all,
>>>>>
>>>>> I know it's a decade-old question, but I'd like to hear your
>>>>> thoughts of today. By now, I became a heavy BTRFS user. Almost
>>>>> everywhere I use BTRFS, except in situations when it is obvious
>>>>> there is no benefit (e.g. /var/log, /boot). At home, all my
>>>>> desktop, laptop and server computers are mainly running on BTRFS
>>>>> with only a few file systems on ext4. I even installed BTRFS in
>>>>> corporate productive systems (in those cases, the systems were
>>>>> mainly on ext4; but there were some specific file systems those
>>>>> exploited BTRFS features).
>>>>>
>>>>> But there is still one question that I can't get over: if you
>>>>> store a database (e.g. MySQL), would you prefer having a BTRFS
>>>>> volume mounted with nodatacow, or would you just simply use ext4?
>>>>>
>>>>> I know that with nodatacow, I take away most of the benefits of
>>>>> BTRFS (those are actually hurting database performance – the
>>>>> exact CoW nature that is elsewhere a blessing, with databases
>>>>> it's a drawback). But are there any advantages of still sticking
>>>>> to BTRFS for a database albeit CoW is disabled, or should I just
>>>>> return to the old and reliable ext4 for those applications?
>>>>
>>>> Since I'm not a expert in database, so I can totally be wrong, but
>>>> what about completely disabling database write-ahead-log (WAL),
>>>> and let btrfs' data CoW to handle data consistency completely?
>>>>
>>>
>>> This would make content of database after crash completely
>>> unpredictable, thus making it impossible to reliably roll back
>>> transaction.
>>
>> Btrfs itself (with datacow) can ensure the fs is updated completely.
>>
>> That's to say, even a crash happens, the content of the fs will be
>> the same state as previous btrfs transaction (btrfs sync).
>>
>> Thus there is no need to rollback database transaction though.
>> (Unless database transaction is not sync to btrfs transaction)
>>
> Two issues with this statement:
>
> 1. Not all database software properly groups logically related
>    operations that need to be atomic as a unit into transactions.
> 2. Even aside from point 1 and the possibility of database
>    corruption, there are other legitimate reasons that you might
>    need to roll-back a transaction (for example, the rather obvious
>    case of a transaction that should not have happened in the first
>    place).

I thought of a database transaction scheme based on btrfs features
before. It has practical issues, though.

One would put a b-tree database file into a subvolume (e.g. trans_0).
When changing the b-tree database, one would create a snapshot
(trans_1), then change the file in the snapshot. On commit, sync
trans_1, then delete trans_0. On rollback, delete trans_1.

Problems:

* Large overhead for small transactions (OLTP) -- a problem in general
  for copy-on-write b-tree databases
* Only root can create or destroy snapshots
* By default the Linux memory system starts write-back pretty much
  immediately, so pages that get overwritten more than once in a
  transaction are also written out more than once (instead of being
  kept in RAM) unless Linux is tuned not to do this.

I have used this method, albeit by reflinking the database and then
modifying the reflink, but I think reflinking is slower than creating a
snapshot?
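
To make the begin/commit/rollback steps above concrete, here is a
rough Python sketch. It only shells out to the btrfs command-line
tool (btrfs subvolume snapshot, btrfs filesystem sync, btrfs subvolume
delete). The class name SnapshotTransaction, the trans_N naming helper
and the injectable "run" callback are my own assumptions for
illustration and are not part of any existing tool; treat it as a
sketch, not a tested implementation.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


class SnapshotTransaction:
    """Hypothetical helper: one transaction == one writable snapshot.

    The committed database lives in <root>/trans_<n>; the transaction
    works on a snapshot <root>/trans_<n+1>.
    """

    def __init__(self, root: str, generation: int = 0,
                 run: Runner = run_checked) -> None:
        self.root = PurePosixPath(root)
        self.generation = generation
        self.run = run          # injectable so the logic can be dry-run
        self.active = False

    def _subvol(self, n: int) -> PurePosixPath:
        return self.root / f"trans_{n}"

    def begin(self) -> PurePosixPath:
        """Snapshot the committed subvolume; return the working copy."""
        if self.active:
            raise RuntimeError("transaction already in progress")
        work = self._subvol(self.generation + 1)
        self.run(["btrfs", "subvolume", "snapshot",
                  str(self._subvol(self.generation)), str(work)])
        self.active = True
        return work

    def commit(self) -> None:
        """Sync the working snapshot, then drop the old state."""
        if not self.active:
            raise RuntimeError("no transaction in progress")
        self.run(["btrfs", "filesystem", "sync",
                  str(self._subvol(self.generation + 1))])
        self.run(["btrfs", "subvolume", "delete",
                  str(self._subvol(self.generation))])
        self.generation += 1
        self.active = False

    def rollback(self) -> None:
        """Discard the working snapshot; committed state is untouched."""
        if not self.active:
            raise RuntimeError("no transaction in progress")
        self.run(["btrfs", "subvolume", "delete",
                  str(self._subvol(self.generation + 1))])
        self.active = False
```

Passing something like run=print (or a list's append method) lets you
look at the commands without touching a real filesystem. The sketch
does not handle recovery after a crash, where both trans_N and
trans_N+1 may exist and something has to decide which one is valid.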