On Tue, May 10, 2016 at 05:54:44PM +0200, Quentin Casasnovas wrote:
> On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> > On 05/10/2016 09:41 AM, Alex Bligh wrote:
> > >
> > > On 10 May 2016, at 16:29, Eric Blake <ebl...@redhat.com> wrote:
> > >
> > >> So the kernel is currently one of the clients that does NOT honor block
> > >> sizes, and as such, servers should be prepared for ANY size up to
> > >> UINT_MAX (other than DoS handling).
> > >
> > > Interesting followup question:
> > >
> > > If the kernel does not fragment TRIM requests at all (in the
> > > same way it fragments read and write requests), I suspect
> > > something bad may happen with TRIM requests over 2^31
> > > in size (particularly over 2^32 in size), as the length
> > > field in nbd only has 32 bits.
> > >
> > > Whether it supports block size constraints or not, it is
> > > going to need to do *some* breaking up of requests.
> >
> > Does anyone have an easy way to cause the kernel to request a trim
> > operation that large on a > 4G export?  I'm not familiar enough with
> > EXT4 operation to know what file system operations you can run to
> > ultimately indirectly create a file system trim operation that large.
> > But maybe there is something simpler - does the kernel let you use the
> > fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> > FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
> >
>
> It was fairly reproducible here, we just used a random qcow2 image with
> some Debian minimal system pre-installed, mounted that qcow2 image through
> qemu-nbd then compiled a whole kernel inside it.  Then you can make clean
> and run fstrim on the mount point.  I'm assuming you can go faster than
> that by just writing a big file to the qcow2 image mounted without -o
> discard, delete the big file, then remount with -o discard + run fstrim.
>

Looks like there's an easier way:

  $ qemu-img create -f qcow2 foo.qcow2 10G
  $ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
  $ mkfs.ext4 /dev/nbd0
  mke2fs 1.42.13 (17-May-2015)
  Discarding device blocks: failed - Input/output error
  Creating filesystem with 2621440 4k blocks and 655360 inodes
  Filesystem UUID: 25aeb51f-0dea-4c1d-8b65-61f6bcdf97e9
  Superblock backups stored on blocks:
          32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

  Allocating group tables: done
  Writing inode tables: done
  Creating journal (32768 blocks): done
  Writing superblocks and filesystem accounting information: done

Notice the "Discarding device blocks: failed - Input/output error" line.
I bet that it is mkfs.ext4 trying to trim all blocks prior to writing the
filesystem, but it gets an I/O error while doing so.  I haven't verified
it is the same problem, but if it isn't, simply mount the resulting
filesystem and run fstrim on it:

  $ mount -o discard /dev/nbd0 /tmp/foo
  $ fstrim /tmp/foo
  fstrim: /tmp/foo: FITRIM ioctl failed: Input/output error

Quentin
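
For reference, the "breaking up of requests" Alex mentions in the quoted
text could look roughly like the sketch below.  It only illustrates the
arithmetic: the helper name split_discard is made up for this example, the
1 GiB cap is an arbitrary assumption chosen to stay well under the 32-bit
length field, and none of this is meant to describe how the kernel or
qemu-nbd actually handle discards.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): split a discard of LENGTH
# bytes starting at OFFSET into pieces of at most MAX bytes, printing one
# "offset length" pair per line.
split_discard() {
    sd_offset=$1
    sd_remaining=$2
    sd_max=$3
    while [ "$sd_remaining" -gt 0 ]; do
        # Take a full-size piece unless less than that is left.
        if [ "$sd_remaining" -gt "$sd_max" ]; then
            sd_chunk=$sd_max
        else
            sd_chunk=$sd_remaining
        fi
        echo "$sd_offset $sd_chunk"
        sd_offset=$((sd_offset + sd_chunk))
        sd_remaining=$((sd_remaining - sd_chunk))
    done
}

# A whole-device discard on the 10G export from the reproducer above,
# capped at 1 GiB per request (the cap is an assumption, see above).
split_discard 0 $((10 * 1024 * 1024 * 1024)) $((1 << 30))
```

Each printed pair has a length that fits comfortably in 32 bits, whereas
the single 10 GiB request would not.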