Re: [Qemu-devel] [PATCH v12 4/7] dm: enable synchronous dax
On Tue, Jun 11 2019 at 12:37pm -0400, Pankaj Gupta wrote:

> This patch sets the dax device 'DAXDEV_SYNC' flag if all the target
> devices of the device mapper support synchronous DAX. If the device
> mapper consists of both synchronous and asynchronous dax devices,
> we don't set the 'DAXDEV_SYNC' flag.
>
> 'dm_table_supports_dax' is refactored to take 'iterate_devices_fn'
> as an argument so that callers can pass the appropriate functions.
>
> Suggested-by: Mike Snitzer
> Signed-off-by: Pankaj Gupta

Thanks, and for the benefit of others: passing function pointers like this is perfectly fine IMHO because this code is _not_ in the fast path. These methods are only used during device creation.

Reviewed-by: Mike Snitzer
Re: [Qemu-devel] [PATCH v11 4/7] dm: enable synchronous dax
On Tue, Jun 11 2019 at 9:10am -0400, Pankaj Gupta wrote:

> Hi Mike,
>
> Thanks for the review. Please find my reply inline.
>
> > dm_table_supports_dax() is called multiple times (from
> > dm_table_set_restrictions and dm_table_determine_type). It is strange
> > for a getter to have the side-effect of also being a setter. Overloading
> > like this could get you in trouble in the future.
> >
> > Are you certain this is what you want?
>
> I agree with you.
>
> > Or would it be better to refactor dm_table_supports_dax() to take an
> > iterate_devices_fn arg and have callers pass the appropriate function?
> > Then have the dm_table_set_restrictions() caller do:
> >
> >   if (dm_table_supports_dax(t, device_synchronous, NULL))
> >           set_dax_synchronous(t->md->dax_dev);
> >
> > (The NULL arg implies the dm_table_supports_dax() refactoring would
> > take an int *data pointer rather than an int.)
> >
> > Mike
>
> I am sending the below patch as per your suggestion. Does it look
> close to what you had in mind?

Yes, it does.. just one nit I noticed, inlined below.

> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 350cf0451456..8d89acc8b8c2 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
...
> @@ -1910,8 +1919,13 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
>  	}
>  	blk_queue_write_cache(q, wc, fua);
>
> -	if (dm_table_supports_dax(t, PAGE_SIZE))
> +	if (dm_table_supports_dax(t, device_supports_dax, &page_size)) {
> +

No need for the empty line added here.

>  		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
> +		if (dm_table_supports_dax(t, device_synchronous, NULL))
> +			set_dax_synchronous(t->md->dax_dev);
> +	}
>  	else
>  		blk_queue_flag_clear(QUEUE_FLAG_DAX, q);

Thanks,
Mike
Re: [Qemu-devel] [PATCH v11 4/7] dm: enable synchronous dax
On Mon, Jun 10 2019 at 5:07am -0400, Pankaj Gupta wrote:

> This patch sets the dax device 'DAXDEV_SYNC' flag if all the target
> devices of the device mapper support synchronous DAX. If the device
> mapper consists of both synchronous and asynchronous dax devices,
> we don't set the 'DAXDEV_SYNC' flag.
>
> Signed-off-by: Pankaj Gupta
> ---
>  drivers/md/dm-table.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 350cf0451456..c5160d846fe6 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -890,10 +890,17 @@ static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
>  				       start, len);
>  }
>
> +static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
> +			      sector_t start, sector_t len, void *data)
> +{
> +	return dax_synchronous(dev->dax_dev);
> +}
> +
>  bool dm_table_supports_dax(struct dm_table *t, int blocksize)
>  {
>  	struct dm_target *ti;
>  	unsigned i;
> +	bool dax_sync = true;
>
>  	/* Ensure that all targets support DAX. */
>  	for (i = 0; i < dm_table_get_num_targets(t); i++) {
> @@ -906,7 +913,14 @@ bool dm_table_supports_dax(struct dm_table *t, int blocksize)
>  		    !ti->type->iterate_devices(ti, device_supports_dax, &blocksize))
>  			return false;
> +
> +		/* Check devices support synchronous DAX */
> +		if (dax_sync &&
> +		    !ti->type->iterate_devices(ti, device_synchronous, NULL))
> +			dax_sync = false;
>  	}
> +	if (dax_sync)
> +		set_dax_synchronous(t->md->dax_dev);
>
>  	return true;
>  }
> --
> 2.20.1

dm_table_supports_dax() is called multiple times (from dm_table_set_restrictions and dm_table_determine_type). It is strange for a getter to have the side-effect of also being a setter. Overloading like this could get you in trouble in the future.

Are you certain this is what you want?

Or would it be better to refactor dm_table_supports_dax() to take an iterate_devices_fn arg and have callers pass the appropriate function?
Then have the dm_table_set_restrictions() caller do:

  if (dm_table_supports_dax(t, device_synchronous, NULL))
          set_dax_synchronous(t->md->dax_dev);

(The NULL arg implies the dm_table_supports_dax() refactoring would take an int *data pointer rather than an int.)

Mike
Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
On Wed, Nov 26 2014 at 2:48pm -0500, Jens Axboe ax...@kernel.dk wrote:

> On 11/21/2014 08:49 AM, Mike Snitzer wrote:
> > On Fri, Nov 21 2014 at 4:54am -0500, Christoph Hellwig h...@infradead.org wrote:
> > > On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> > > > virtio_blk incorrectly established -1U as the default for these
> > > > queue_limits. Set these limits to sane default values to avoid
> > > > crashing the kernel. But the virtio-blk protocol should probably
> > > > be extended to allow proper stacking of the disk's limits from
> > > > the host.
> > > >
> > > > This change fixes a crash that was reported when virtio-blk was
> > > > used to test linux-dm.git commit 604ea90641b4 ("dm thin: adjust
> > > > max_sectors_kb based on thinp blocksize"), which will initially
> > > > set max_sectors to max_hw_sectors and then round down to the
> > > > first power-of-2 factor of the DM thin-pool's blocksize.
> > > > Basically that commit assumes drivers don't suck when
> > > > establishing max_hw_sectors, so it acted like a canary in the
> > > > coal mine.
> > >
> > > Is that a crash in the host or the guest? What kind of mishandling
> > > did you see? Unless the recent virtio standard changed anything,
> > > the host should be able to handle our arbitrary limits, and even
> > > if it doesn't, that is something we need to hash out with qemu and
> > > the virtio standards folks.
> >
> > Some good news: this guest crash isn't an issue with recent kernels
> > (so upstream, Fedora 20, RHEL7, etc. aren't impacted -- Jens, feel
> > free to drop my virtio_blk patch; even though some of its limits are
> > clearly broken I'll defer to the virtio_blk developers on the best
> > way forward -- sorry for the noise!). The BUG I saw only seems to
> > impact RHEL6 kernels so far (note to self: actually _test_ on
> > upstream before reporting a crash against upstream!)
> >
> > [root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
> > [root@RHEL-6 ~]# lvs
> >
> > Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
> >  kernel:Kernel panic - not syncing: Fatal exception
> >
> > Here is the RHEL6 guest crash, just for full disclosure:
> >
> > kernel BUG at fs/direct-io.c:696!
> > invalid opcode: [#1] SMP
> > last sysfs file: /sys/devices/virtual/block/dm-4/dev
> > CPU 0
> > Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6
> >  ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
> >  libcrc32c dm_mirror dm_region_hash dm_log dm_mod microcode
> >  virtio_balloon i2c_piix4 i2c_core virtio_net ext4 jbd2 mbcache
> >  virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic
> >  ata_piix [last unloaded: speedstep_lib]
> > Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
> > RIP: 0010:[811ce336] __blockdev_direct_IO_newtrunc+0x986/0x1270
> > RSP: 0018:88011a11ba48 EFLAGS: 00010287
> > RAX: RBX: 8801192fbd28 RCX: 1000
> > RDX: ea0003b3d218 RSI: 88011aac4300 RDI: 880118572378
> > RBP: 88011a11bbe8 R08: R09: R10: R11:
> > R12: 8801192fbd00 R13: R14: 880118c3cac0 R15:
> > FS: 7fde78bc37a0() GS:88002820() knlGS:
> > CS: 0010 DS: ES: CR0: 80050033
> > CR2: 012706f0 CR3: 00011a432000 CR4: 000407f0
> > DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400
> > Process lvs (pid: 1679, threadinfo 88011a11a000, task 8801185a4aa0)
> > Stack:
> >  88011a11bb48 88011a11baa8 8801000c 88011a11bb18
> >  d 88011a11bdc8 88011a11beb8 d
> >  000c1a11baa8 880118c3cb98 18c3ccb8
> > Call Trace:
> >  [811c9e90] ? blkdev_get_block+0x0/0x20
> >  [811cec97] __blockdev_direct_IO+0x77/0xe0
> >  [811c9e90] ? blkdev_get_block+0x0/0x20
> >  [811caf17] blkdev_direct_IO+0x57/0x60
> >  [811c9e90] ? blkdev_get_block+0x0/0x20
> >  [8112619b] generic_file_aio_read+0x6bb/0x700
> >  [811cba60] ? blkdev_get+0x10/0x20
> >  [811cba70] ? blkdev_open+0x0/0xc0
> >  [8118af4f] ? __dentry_open+0x23f/0x360
> >  [811ca2d1] blkdev_aio_read+0x51/0x80
> >  [8118dc6a] do_sync_read+0xfa/0x140
> >  [8109eaf0] ? autoremove_wake_function+0x0/0x40
> >  [811ca22c] ? block_ioctl+0x3c/0x40
> >  [811a34c2] ? vfs_ioctl+0x22/0xa0
> >  [811a3664] ? do_vfs_ioctl+0x84/0x580
> >  [8122cee6] ? security_file_permission+0x16/0x20
> >  [8118e625] vfs_read+0xb5/0x1a0
> >  [8118e761] sys_read+0x51/0x90
> >  [810e5aae] ? __audit_syscall_exit+0x25e/0x290
> >  [8100b072] system_call_fastpath+0x16/0x1b
> > Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b
> >  95 34 ff ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00
> >  0f 0b eb fe 8b 9d 34 ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
> > RIP [811ce336] __blockdev_direct_IO_newtrunc+0x986/0x1270
Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
On Wed, Nov 26 2014 at 3:54pm -0500, Jens Axboe ax...@kernel.dk wrote:

> On 11/26/2014 01:51 PM, Mike Snitzer wrote:
> > On Wed, Nov 26 2014 at 2:48pm -0500, Jens Axboe ax...@kernel.dk wrote:
> > > That code isn't even in mainline, as far as I can tell...
> >
> > Right, it is old RHEL6 code. But I've yet to determine what changed
> > upstream that enables this to just work with a really large
> > max_sectors (I haven't been looking either).
>
> Kind of hard for the rest of us to say, since it's triggering a BUG in
> code we don't have :-)

I never asked you or others to weigh in on old RHEL6 code. Once I realized upstream worked even with a _really_ high max_sectors, I said sorry for the noise.

But while you're here, I wouldn't mind getting your take on virtio-blk setting max_hw_sectors to -1U. As I said in my original reply to mst: it only makes sense to set a really high initial upper bound like that in a driver if that driver goes on to stack an underlying device's limit.
Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
On Wed, Nov 26 2014 at 4:53pm -0500, Jens Axboe ax...@kernel.dk wrote:

> On 11/26/2014 02:51 PM, Mike Snitzer wrote:
> > But while you're here, I wouldn't mind getting your take on
> > virtio-blk setting max_hw_sectors to -1U. As I said in my original
> > reply to mst: it only makes sense to set a really high initial upper
> > bound like that in a driver if that driver goes on to stack an
> > underlying device's limit.
>
> -1U should just work, IMHO; there's no reason we should need to cap it
> at some synthetic value. That said, it seems it should be one of those
> parameters that should be negotiated up and set appropriately.

I'm saying set it to the underlying device's value for max_hw_sectors -- not some synthetic value. So I think we're saying the same thing.

But it isn't immediately clear (to me) how that benefits virtio-blk users (obviously they are getting by today). So until that is pinned down I imagine nobody will care to extend the virtio-blk protocol to allow stacking max_hw_sectors and max_sectors up from the host.
Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size
On Fri, Nov 21 2014 at 4:54am -0500, Christoph Hellwig h...@infradead.org wrote:

> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> > virtio_blk incorrectly established -1U as the default for these
> > queue_limits. Set these limits to sane default values to avoid
> > crashing the kernel. But the virtio-blk protocol should probably be
> > extended to allow proper stacking of the disk's limits from the host.
> >
> > This change fixes a crash that was reported when virtio-blk was used
> > to test linux-dm.git commit 604ea90641b4 ("dm thin: adjust
> > max_sectors_kb based on thinp blocksize"), which will initially set
> > max_sectors to max_hw_sectors and then round down to the first
> > power-of-2 factor of the DM thin-pool's blocksize. Basically that
> > commit assumes drivers don't suck when establishing max_hw_sectors,
> > so it acted like a canary in the coal mine.
>
> Is that a crash in the host or the guest? What kind of mishandling did
> you see? Unless the recent virtio standard changed anything, the host
> should be able to handle our arbitrary limits, and even if it doesn't,
> that is something we need to hash out with qemu and the virtio
> standards folks.

Some good news: this guest crash isn't an issue with recent kernels (so upstream, Fedora 20, RHEL7, etc. aren't impacted -- Jens, feel free to drop my virtio_blk patch; even though some of its limits are clearly broken I'll defer to the virtio_blk developers on the best way forward -- sorry for the noise!). The BUG I saw only seems to impact RHEL6 kernels so far (note to self: actually _test_ on upstream before reporting a crash against upstream!)

[root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
[root@RHEL-6 ~]# lvs

Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
 kernel:Kernel panic - not syncing: Fatal exception

Here is the RHEL6 guest crash, just for full disclosure:

kernel BUG at fs/direct-io.c:696!
invalid opcode: [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-4/dev
CPU 0
Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6
 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c
 dm_mirror dm_region_hash dm_log dm_mod microcode virtio_balloon
 i2c_piix4 i2c_core virtio_net ext4 jbd2 mbcache virtio_blk virtio_pci
 virtio_ring virtio pata_acpi ata_generic ata_piix [last unloaded: speedstep_lib]
Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
RIP: 0010:[811ce336] __blockdev_direct_IO_newtrunc+0x986/0x1270
RSP: 0018:88011a11ba48 EFLAGS: 00010287
RAX: RBX: 8801192fbd28 RCX: 1000
RDX: ea0003b3d218 RSI: 88011aac4300 RDI: 880118572378
RBP: 88011a11bbe8 R08: R09: R10: R11:
R12: 8801192fbd00 R13: R14: 880118c3cac0 R15:
FS: 7fde78bc37a0() GS:88002820() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 012706f0 CR3: 00011a432000 CR4: 000407f0
DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400
Process lvs (pid: 1679, threadinfo 88011a11a000, task 8801185a4aa0)
Stack:
 88011a11bb48 88011a11baa8 8801000c 88011a11bb18
 d 88011a11bdc8 88011a11beb8 d
 000c1a11baa8 880118c3cb98 18c3ccb8
Call Trace:
 [811c9e90] ? blkdev_get_block+0x0/0x20
 [811cec97] __blockdev_direct_IO+0x77/0xe0
 [811c9e90] ? blkdev_get_block+0x0/0x20
 [811caf17] blkdev_direct_IO+0x57/0x60
 [811c9e90] ? blkdev_get_block+0x0/0x20
 [8112619b] generic_file_aio_read+0x6bb/0x700
 [811cba60] ? blkdev_get+0x10/0x20
 [811cba70] ? blkdev_open+0x0/0xc0
 [8118af4f] ? __dentry_open+0x23f/0x360
 [811ca2d1] blkdev_aio_read+0x51/0x80
 [8118dc6a] do_sync_read+0xfa/0x140
 [8109eaf0] ? autoremove_wake_function+0x0/0x40
 [811ca22c] ? block_ioctl+0x3c/0x40
 [811a34c2] ? vfs_ioctl+0x22/0xa0
 [811a3664] ? do_vfs_ioctl+0x84/0x580
 [8122cee6] ? security_file_permission+0x16/0x20
 [8118e625] vfs_read+0xb5/0x1a0
 [8118e761] sys_read+0x51/0x90
 [810e5aae] ? __audit_syscall_exit+0x25e/0x290
 [8100b072] system_call_fastpath+0x16/0x1b
Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b 95
 34 ff ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00 0f 0b
 eb fe 8b 9d 34 ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
RIP [811ce336] __blockdev_direct_IO_newtrunc+0x986/0x1270
RSP 88011a11ba48
---[ end trace 73be5dcaf8939399 ]---
Kernel panic - not syncing: Fatal exception
Re: [Qemu-devel] [RFC]QEMU disk I/O limits
On Tue, May 31 2011 at 2:39pm -0400, Anthony Liguori anth...@codemonkey.ws wrote:

> On 05/31/2011 12:59 PM, Vivek Goyal wrote:
> > On Tue, May 31, 2011 at 09:25:31AM -0500, Anthony Liguori wrote:
> > > On 05/31/2011 09:04 AM, Vivek Goyal wrote:
> > > > On Tue, May 31, 2011 at 08:50:40AM -0500, Anthony Liguori wrote:
> > > > > On 05/31/2011 08:45 AM, Vivek Goyal wrote:
> > > > > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
> > > > > > > Hello, all,
> > > > > > >
> > > > > > > I have prepared to work on a feature called "Disk I/O
> > > > > > > limits" for the qemu-kvm project. This feature will enable
> > > > > > > the user to cap the disk I/O amount performed by a VM. It
> > > > > > > is important for some storage resources to be shared among
> > > > > > > multiple VMs. As you know, if some VMs are doing excessive
> > > > > > > disk I/O, they will hurt the performance of other VMs.
> > > > > >
> > > > > > Hi Zhiyong,
> > > > > >
> > > > > > Why not use the kernel blkio controller for this? Why
> > > > > > reinvent the wheel and implement the feature again in qemu?
> > > > >
> > > > > The blkio controller only works for block devices. It doesn't
> > > > > work when using files.
> > > >
> > > > So can't we come up with something to easily determine which
> > > > device backs this file? Though that still won't work for
> > > > NFS-backed storage.
> > >
> > > Right. Additionally, in QEMU, we can rate limit based on concepts
> > > that make sense to a guest. We can limit the actual I/O ops
> > > visible to the guest, which means that we'll get consistent
> > > performance regardless of whether the backing file is qcow2, raw,
> > > LVM, or raw over NFS.
> >
> > Are you referring to merging taking place, which can change the
> > definition of IOPS as seen by the guest?
>
> No, with qcow2 it may take multiple real IOPs for what the guest sees
> as one IOP. That's really the main argument I'm making here. The only
> entity that knows what a guest IOP corresponds to is QEMU. On the
> backend, it may end up being a network request, multiple BIOs to
> physical disks, file access, etc.

Couldn't QEMU give the kernel a hint about the ratio of guest IOPs to real IOPs? Or is QEMU blind to the real IOPs that correspond to a guest IOP?
If QEMU is truly blind to the amount of real IOPs, then couldn't QEMU-driven throttling cause physical resources to be oversubscribed (by underestimating the backend work it is creating)?

Mike
[Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model
On Tue, Sep 14, 2010 at 1:56 AM, H. Peter Anvin h...@zytor.com wrote:

> On 09/13/2010 06:37 PM, Amos Kong wrote:
> > I've heard wonderful music (guest: win7), but mixed with a little
> > noise, not so fluent. Is the following debug msg normal?
>
> Yes, all of that is normal.
>
> I talked to malc earlier today, and I think I have a pretty good idea
> of how to deal with the rate-matching issues; I'm going to try to
> write it up tomorrow.

Hi,

I was just wondering if you've been able to put some time into the rate-matching issues? Has this usb-audio patch evolved and I'm just missing it?

Thanks for doing this work!

Mike