On 23/03/2024 03:41, Bruce Momjian wrote:
> On Fri, Mar 22, 2024 at 10:31:11PM +0100, Tomas Vondra wrote:
>> Right, but things change over time - current storage devices support
>> much larger sectors (LBA format), usually 4K. And if you do I/O with
>> this size, it's usually atomic.
>>
>> AFAIK if you built Postgres with 4K pages, on a device with 4K LBA
>> format, that would not need full-page writes - we always do I/O in 4k
>> pages, and block layer does I/O (during writeback from page cache) with
>> minimum guaranteed size = logical block size. 4K are great for OLTP
>> systems in general, it'd be even better if we didn't need to worry about
>> torn pages (but the tricky part is to be confident it's safe to disable
>> them on a particular system).
>
> Yes, even if the file system is 8k, and the storage is 8k, we only know
> that torn pages are impossible if the file system never overwrites
> existing 8k pages, but writes new ones and then makes it active. I
> think ZFS does that to handle snapshots.
>
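To make the page size vs. logical block size relationship above concrete, here is a rough sketch (not anything from Postgres or the kernel). The helper names are made up for illustration. The sysfs file is the one Linux exposes for the logical block size. Note that fitting inside one logical block only tells you the write is not split at that layer; it does not by itself prove that a given filesystem and device combination is safe with FPW=off.

```python
from pathlib import Path


def logical_block_size(dev: str) -> int:
    """Logical block size in bytes as reported by Linux for a block
    device name such as 'nvme0n1' (hypothetical helper)."""
    return int(Path(f"/sys/block/{dev}/queue/logical_block_size").read_text())


def fits_one_logical_block(offset: int, length: int, lbs: int) -> bool:
    """True if the byte range [offset, offset + length) lies entirely
    inside a single logical block of size lbs (hypothetical helper)."""
    first_block = offset // lbs
    last_block = (offset + length - 1) // lbs
    return length > 0 and first_block == last_block


# 4K page on a 4K LBA format: the page is exactly one logical block.
print(fits_one_logical_block(3 * 4096, 4096, 4096))
# 8K page on a 4K LBA format: the page spans two logical blocks.
print(fits_one_logical_block(0, 8192, 4096))
```

On a real system you would pass the value from logical_block_size() as lbs instead of a constant.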
I think we can also avoid torn writes if:
- the filesystem's data path always writes in aligned multiples of 8k, and
- the device supports 8k atomic writes.

Then we might be able to push the responsibility to the device without
having the overhead of a CoW FS or FPW=on. Of course, the performance
here depends on the vendor-specific implementation of atomics.

We are trying to enable the former by adding LBS (large block size)
support to XFS in Linux.

--
Pankaj