On Fri, Jun 30, 2023 at 03:51:18PM -0700, Andres Freund wrote:
> > For a 4kB write, to say it is not partially written would be to require
> > the operating system to guarantee that the 4kB write is not split into
> > smaller writes which might each be atomic because smaller atomic writes
> > would not help us.
>
> That's why we're talking about drives with 4k sector size - you *can't* split
> the writes below that.

Okay, good point.

> The problem is that, as far as I know, it's not always obvious what block size
> is being used on the actual storage level. It's not even trivial when
> operating on a filesystem directly stored on a single block device ([1]). Once
> there's things like LVM or disk encryption involved, it gets pretty hairy
> ([2]). Once you know all the block devices, it's not too bad, but ...
>
> Greetings,
>
> Andres Freund
>
> [1] On Linux I think you need to use stat() to figure out the st_dev for a
> file, then look in /proc/self/mountinfo for the block device, use the name
> of the file to look in /sys/block/$d/queue/physical_block_size.

I just got a new server:

    https://momjian.us/main/blogs/blog/2023.html#June_28_2023

so I tested this on my new M.2 NVMe storage device:

    $ cat /sys/block/nvme0n1/queue/physical_block_size
    262144

That's 256k, not 4k.

> [2] The above doesn't work because e.g. a device mapper target might only
> support 4k sectors, even though the sectors on the underlying storage device
> are 512b sectors. E.g. my root filesystem is encrypted, and if you follow the
> above recipe (with the added step of resolving the symlink to know the actual
> device name), you would see a 4k sector size. Even though the underlying NVMe
> disk only supports 512b sectors.

Good point.

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.
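
For reference, the recipe in [1] could be sketched roughly as follows in Python. This is only an illustration under some assumptions, not a tested tool: the function names are made up for this sketch, it goes through /sys/class/block rather than /sys/block so that a partition can be mapped to its parent disk, and it is subject to the limitation described in [2], i.e. it reports only what the topmost block device claims (a device mapper target may report a different size than the disk underneath it).

```python
import os


def find_mount_source(mountinfo_text, major, minor):
    """Return the mount source (e.g. /dev/nvme0n1p2) for the mount whose
    device number is major:minor, or None if no line matches.

    Hypothetical helper for this sketch; it takes the contents of
    /proc/self/mountinfo as a string so it can be tried on sample text.
    """
    wanted = f"{major}:{minor}"
    for line in mountinfo_text.splitlines():
        # Each line is "<fixed fields> [optional fields] - <fstype> <source> <opts>".
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        fields = left.split()
        # The third field is the device number as major:minor.
        if len(fields) < 3 or fields[2] != wanted:
            continue
        after = right.split()
        if len(after) >= 2:
            return after[1]
    return None


def physical_block_size(path):
    """Best-effort lookup of the physical block size reported for the block
    device backing `path`. Linux only. Returns None when it cannot tell."""
    st = os.stat(path)
    major, minor = os.major(st.st_dev), os.minor(st.st_dev)
    with open("/proc/self/mountinfo") as f:
        source = find_mount_source(f.read(), major, minor)
    if source is None or not source.startswith("/dev/"):
        return None
    # Resolve symlinks such as /dev/mapper/foo to the real device name.
    name = os.path.basename(os.path.realpath(source))
    # Assumption: a partition has no queue/ directory of its own in sysfs,
    # so also try the parent directory, which is the whole disk.
    sysdir = os.path.realpath(os.path.join("/sys/class/block", name))
    for d in (sysdir, os.path.dirname(sysdir)):
        candidate = os.path.join(d, "queue", "physical_block_size")
        if os.path.exists(candidate):
            with open(candidate) as f:
                return int(f.read())
    return None
```

Calling physical_block_size() on a path inside the data directory would then give the value for whatever device the filesystem is mounted from, or None when the mount source is not a /dev node.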