Re: [Qemu-devel] [PATCH v12 4/7] dm: enable synchronous dax

2019-06-11 Thread Mike Snitzer
On Tue, Jun 11 2019 at 12:37pm -0400,
Pankaj Gupta  wrote:

> This patch sets dax device 'DAXDEV_SYNC' flag if all the target
> devices of device mapper support synchronous DAX. If device
> mapper consists of both synchronous and asynchronous dax devices,
> we don't set 'DAXDEV_SYNC' flag.
> 
> 'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
> as an argument so that the callers can pass the appropriate functions.
> 
> Suggested-by: Mike Snitzer 
> Signed-off-by: Pankaj Gupta 

Thanks, and for the benefit of others, passing function pointers like
this is perfectly fine IMHO because this code is _not_ in the fast
path.  These methods are only for device creation.
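
For reference, the callback type being passed around here is device-mapper's
iterate_devices_callout_fn.  A sketch of its declaration as it appeared in
include/linux/device-mapper.h around this time (shown for orientation, not
part of the patch):

/*
 * Called once per underlying device of a target; 'data' carries the
 * caller's argument (e.g. a pointer to a blocksize, or NULL).
 */
typedef int (*iterate_devices_callout_fn)(struct dm_target *ti,
					  struct dm_dev *dev,
					  sector_t start, sector_t len,
					  void *data);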

Reviewed-by: Mike Snitzer 



Re: [Qemu-devel] [PATCH v11 4/7] dm: enable synchronous dax

2019-06-11 Thread Mike Snitzer
On Tue, Jun 11 2019 at  9:10am -0400,
Pankaj Gupta  wrote:

> Hi Mike,
> 
> Thanks for the review. Please find my reply inline.
> 
> > 
> > dm_table_supports_dax() is called multiple times (from
> > dm_table_set_restrictions and dm_table_determine_type).  It is strange
> > to have a getter have a side-effect of being a setter too.  Overloading
> > like this could get you in trouble in the future.
> > 
> > Are you certain this is what you want?
> 
> I agree with you.
> 
> > 
> > Or would it be better to refactor dm_table_supports_dax() to take an
> > iterate_devices_fn arg and have callers pass the appropriate function?
> > Then have dm_table_set_restrictions() caller do:
> > 
> >  if (dm_table_supports_dax(t, device_synchronous, NULL))
> >set_dax_synchronous(t->md->dax_dev);
> > 
> > (NULL arg implies dm_table_supports_dax() refactoring would take an int
> > *data pointer rather than int type).
> > 
> > Mike
> > 
> 
> I am sending the below patch as per your suggestion. Does it look
> close to what you have in mind?

Yes, it does... just one nit I noticed, inlined below.

> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 350cf0451456..8d89acc8b8c2 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c

...

> @@ -1910,8 +1919,13 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> }
> blk_queue_write_cache(q, wc, fua);
> 
> -   if (dm_table_supports_dax(t, PAGE_SIZE))
> +   if (dm_table_supports_dax(t, device_supports_dax, &page_size)) {
> +

No need for an empty newline here ^

> blk_queue_flag_set(QUEUE_FLAG_DAX, q);
> +   if (dm_table_supports_dax(t, device_synchronous, NULL))
> +   set_dax_synchronous(t->md->dax_dev);
> +   }
> else
> blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
> 

Thanks,
Mike



Re: [Qemu-devel] [PATCH v11 4/7] dm: enable synchronous dax

2019-06-10 Thread Mike Snitzer
On Mon, Jun 10 2019 at  5:07am -0400,
Pankaj Gupta  wrote:

>  This patch sets dax device 'DAXDEV_SYNC' flag if all the target
>  devices of device mapper support synchronous DAX. If device
>  mapper consists of both synchronous and asynchronous dax devices,
>  we don't set 'DAXDEV_SYNC' flag.
> 
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/md/dm-table.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 350cf0451456..c5160d846fe6 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -890,10 +890,17 @@ static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
>   start, len);
>  }
>  
> +static int device_synchronous(struct dm_target *ti, struct dm_dev *dev,
> +sector_t start, sector_t len, void *data)
> +{
> + return dax_synchronous(dev->dax_dev);
> +}
> +
>  bool dm_table_supports_dax(struct dm_table *t, int blocksize)
>  {
>   struct dm_target *ti;
>   unsigned i;
> + bool dax_sync = true;
>  
>   /* Ensure that all targets support DAX. */
>   for (i = 0; i < dm_table_get_num_targets(t); i++) {
> @@ -906,7 +913,14 @@ bool dm_table_supports_dax(struct dm_table *t, int blocksize)
>   !ti->type->iterate_devices(ti, device_supports_dax,
>   &blocksize))
>   return false;
> +
> + /* Check devices support synchronous DAX */
> + if (dax_sync &&
> + !ti->type->iterate_devices(ti, device_synchronous, NULL))
> + dax_sync = false;
>   }
> + if (dax_sync)
> + set_dax_synchronous(t->md->dax_dev);
>  
>   return true;
>  }
> -- 
> 2.20.1
> 

dm_table_supports_dax() is called multiple times (from
dm_table_set_restrictions and dm_table_determine_type).  It is strange
to have a getter have a side-effect of being a setter too.  Overloading
like this could get you in trouble in the future.

Are you certain this is what you want?

Or would it be better to refactor dm_table_supports_dax() to take an
iterate_devices_fn arg and have callers pass the appropriate function?
Then have dm_table_set_restrictions() caller do:

 if (dm_table_supports_dax(t, device_synchronous, NULL))
  set_dax_synchronous(t->md->dax_dev);

(NULL arg implies dm_table_supports_dax() refactoring would take an int
*data pointer rather than int type).

Mike
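
Putting the suggestion together, the refactored helper ends up with roughly
this shape (a sketch of the suggestion only -- the code actually merged in
v12 may differ in detail; dm_table_get_num_targets() and
dm_table_get_target() are the existing dm-table helpers):

bool dm_table_supports_dax(struct dm_table *t,
			   iterate_devices_callout_fn iterate_fn,
			   int *blocksize)
{
	struct dm_target *ti;
	unsigned i;

	/* Ensure that all targets satisfy the caller-supplied predicate. */
	for (i = 0; i < dm_table_get_num_targets(t); i++) {
		ti = dm_table_get_target(t, i);

		if (!ti->type->direct_access)
			return false;

		if (!ti->type->iterate_devices ||
		    !ti->type->iterate_devices(ti, iterate_fn, blocksize))
			return false;
	}

	return true;
}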



Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size

2014-11-26 Thread Mike Snitzer
On Wed, Nov 26 2014 at  2:48pm -0500,
Jens Axboe <ax...@kernel.dk> wrote:

> On 11/21/2014 08:49 AM, Mike Snitzer wrote:
> > On Fri, Nov 21 2014 at  4:54am -0500,
> > Christoph Hellwig <h...@infradead.org> wrote:
> >
> > > On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> > > > virtio_blk incorrectly established -1U as the default for these
> > > > queue_limits.  Set these limits to sane default values to avoid
> > > > crashing the kernel.  But the virtio-blk protocol should probably
> > > > be extended to allow proper stacking of the disk's limits from
> > > > the host.
> > > >
> > > > This change fixes a crash that was reported when virtio-blk was
> > > > used to test linux-dm.git commit 604ea90641b4 ("dm thin: adjust
> > > > max_sectors_kb based on thinp blocksize"), which will initially
> > > > set max_sectors to max_hw_sectors and then round down to the
> > > > first power-of-2 factor of the DM thin-pool's blocksize.
> > > > Basically that commit assumes drivers don't suck when
> > > > establishing max_hw_sectors, so it acted like a canary in the
> > > > coal mine.
> > >
> > > Is that a crash in the host or guest?  What kind of mishandling
> > > did you see?  Unless the recent virtio standard changed anything,
> > > the host should be able to handle our arbitrary limits, and even
> > > if it doesn't, that's something we need to hash out with qemu and
> > > the virtio standards folks.
> >
> > Some good news: this guest crash isn't an issue with recent kernels
> > (so upstream, Fedora 20, RHEL7, etc. aren't impacted -- Jens, feel
> > free to drop my virtio_blk patch; even though some of its limits are
> > clearly broken, I'll defer to the virtio_blk developers on the best
> > way forward -- sorry for the noise!).
> >
> > The BUG I saw only seems to impact RHEL6 kernels so far (note to
> > self: actually _test_ on upstream before reporting a crash against
> > upstream!)
> >
> > [root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
> > [root@RHEL-6 ~]# lvs
> >
> > Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
> >  kernel:Kernel panic - not syncing: Fatal exception
> >
> > Here is the RHEL6 guest crash, just for full disclosure:
> >
> > [snip: the full RHEL6 oops is reproduced in the 2014-11-21 message
> > below]

Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size

2014-11-26 Thread Mike Snitzer
On Wed, Nov 26 2014 at  3:54pm -0500,
Jens Axboe <ax...@kernel.dk> wrote:

> On 11/26/2014 01:51 PM, Mike Snitzer wrote:
> > On Wed, Nov 26 2014 at  2:48pm -0500,
> > Jens Axboe <ax...@kernel.dk> wrote:
> >
> > > That code isn't even in mainline, as far as I can tell...
> >
> > Right, it is old RHEL6 code.
> >
> > But I've yet to determine what changed upstream that enables this to
> > just work with a really large max_sectors (I haven't been looking
> > either).
>
> Kind of hard for the rest of us to say, since it's triggering a BUG in
> code we don't have :-)

I never asked you or others to weigh in on old RHEL6 code.  Once I
realized upstream worked even if max_sectors is _really_ high, I said
sorry for the noise.

But while you're here, I wouldn't mind getting your take on virtio-blk
setting max_hw_sectors to -1U.

As I said in my original reply to mst: it only makes sense to set a
really high initial upper bound like that in a driver if that driver
goes on to stack an underlying device's limit.



Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size

2014-11-26 Thread Mike Snitzer
On Wed, Nov 26 2014 at  4:53pm -0500,
Jens Axboe <ax...@kernel.dk> wrote:

> On 11/26/2014 02:51 PM, Mike Snitzer wrote:
> >
> > But while you're here, I wouldn't mind getting your take on virtio-blk
> > setting max_hw_sectors to -1U.
> >
> > As I said in my original reply to mst: it only makes sense to set a
> > really high initial upper bound like that in a driver if that driver
> > goes on to stack an underlying device's limit.
>
> -1U should just work, IMHO, there's no reason we should need to cap it
> at some synthetic value.  That said, it seems it should be one of
> those parameters that should be negotiated up and set appropriately.

I'm saying set it to the underlying device's value for max_hw_sectors --
not some synthetic value.  So I think we're saying the same thing.

But it isn't immediately clear (to me) how that benefits virtio-blk
users (obviously they are getting by today).  So until that is pinned
down I imagine nobody will care to extend the virtio-blk protocol to
allow stacking max_hw_sectors and max_sectors up.
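
To make the "negotiated up" idea concrete, here is a hypothetical sketch of
what stacking the host's limit might look like in virtblk_probe().  Note
that VIRTIO_BLK_F_MAX_SECTORS and the max_sectors config field do not exist
in the virtio-blk spec; they are invented here purely to illustrate the
protocol extension being discussed:

	u32 max_sectors;

	if (virtio_has_feature(vdev, VIRTIO_BLK_F_MAX_SECTORS)) {
		/* stack the host disk's real limit (hypothetical field) */
		virtio_cread(vdev, struct virtio_blk_config, max_sectors,
			     &max_sectors);
		blk_queue_max_hw_sectors(q, max_sectors);
	} else {
		/* fall back to a sane default rather than -1U */
		blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
	}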



Re: [Qemu-devel] virtio_blk: fix defaults for max_hw_sectors and max_segment_size

2014-11-21 Thread Mike Snitzer
On Fri, Nov 21 2014 at  4:54am -0500,
Christoph Hellwig <h...@infradead.org> wrote:

> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> > virtio_blk incorrectly established -1U as the default for these
> > queue_limits.  Set these limits to sane default values to avoid
> > crashing the kernel.  But the virtio-blk protocol should probably be
> > extended to allow proper stacking of the disk's limits from the host.
> >
> > This change fixes a crash that was reported when virtio-blk was used
> > to test linux-dm.git commit 604ea90641b4 ("dm thin: adjust
> > max_sectors_kb based on thinp blocksize"), which will initially set
> > max_sectors to max_hw_sectors and then round down to the first
> > power-of-2 factor of the DM thin-pool's blocksize.  Basically that
> > commit assumes drivers don't suck when establishing max_hw_sectors,
> > so it acted like a canary in the coal mine.
>
> Is that a crash in the host or guest?  What kind of mishandling did
> you see?  Unless the recent virtio standard changed anything, the host
> should be able to handle our arbitrary limits, and even if it doesn't,
> that's something we need to hash out with qemu and the virtio
> standards folks.

Some good news: this guest crash isn't an issue with recent kernels (so
upstream, Fedora 20, RHEL7, etc. aren't impacted -- Jens, feel free to
drop my virtio_blk patch; even though some of its limits are clearly
broken, I'll defer to the virtio_blk developers on the best way forward
-- sorry for the noise!).

The BUG I saw only seems to impact RHEL6 kernels so far (note to self:
actually _test_ on upstream before reporting a crash against upstream!)

[root@RHEL-6 ~]# echo 1073741824 > /sys/block/vdc/queue/max_sectors_kb
[root@RHEL-6 ~]# lvs

Message from syslogd@RHEL-6 at Nov 21 15:32:15 ...
 kernel:Kernel panic - not syncing: Fatal exception

Here is the RHEL6 guest crash, just for full disclosure:

kernel BUG at fs/direct-io.c:696!
invalid opcode:  [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-4/dev
CPU 0
Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ext2 
dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror 
dm_region_hash dm_log dm_mod microcode virtio_balloon i2c_piix4 i2c_core 
virtio_net ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi 
ata_generic ata_piix [last unloaded: speedstep_lib]

Pid: 1679, comm: lvs Not tainted 2.6.32 #6 Bochs Bochs
RIP: 0010:[811ce336]  [811ce336] 
__blockdev_direct_IO_newtrunc+0x986/0x1270
RSP: 0018:88011a11ba48  EFLAGS: 00010287
RAX:  RBX: 8801192fbd28 RCX: 1000
RDX: ea0003b3d218 RSI: 88011aac4300 RDI: 880118572378
RBP: 88011a11bbe8 R08:  R09: 
R10:  R11:  R12: 8801192fbd00
R13:  R14: 880118c3cac0 R15: 
FS:  7fde78bc37a0() GS:88002820() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 012706f0 CR3: 00011a432000 CR4: 000407f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process lvs (pid: 1679, threadinfo 88011a11a000, task 8801185a4aa0)
Stack:
 88011a11bb48 88011a11baa8 8801000c 88011a11bb18
d   88011a11bdc8 88011a11beb8
d 000c1a11baa8 880118c3cb98  18c3ccb8
Call Trace:
 [811c9e90] ? blkdev_get_block+0x0/0x20
 [811cec97] __blockdev_direct_IO+0x77/0xe0
 [811c9e90] ? blkdev_get_block+0x0/0x20
 [811caf17] blkdev_direct_IO+0x57/0x60
 [811c9e90] ? blkdev_get_block+0x0/0x20
 [8112619b] generic_file_aio_read+0x6bb/0x700
 [811cba60] ? blkdev_get+0x10/0x20
 [811cba70] ? blkdev_open+0x0/0xc0
 [8118af4f] ? __dentry_open+0x23f/0x360
 [811ca2d1] blkdev_aio_read+0x51/0x80
 [8118dc6a] do_sync_read+0xfa/0x140
 [8109eaf0] ? autoremove_wake_function+0x0/0x40
 [811ca22c] ? block_ioctl+0x3c/0x40
 [811a34c2] ? vfs_ioctl+0x22/0xa0
 [811a3664] ? do_vfs_ioctl+0x84/0x580
 [8122cee6] ? security_file_permission+0x16/0x20
 [8118e625] vfs_read+0xb5/0x1a0
 [8118e761] sys_read+0x51/0x90
 [810e5aae] ? __audit_syscall_exit+0x25e/0x290
 [8100b072] system_call_fastpath+0x16/0x1b
Code: fe ff ff c7 85 fc fe ff ff 00 00 00 00 48 89 95 10 ff ff ff 8b 95 34 ff 
ff ff e8 46 ac ff ff 3b 85 34 ff ff ff 0f 84 fc 02 00 00 0f 0b eb fe 8b 9d 34 
ff ff ff 8b 85 30 ff ff ff 01 d8 85 c0 0f
RIP  [811ce336] __blockdev_direct_IO_newtrunc+0x986/0x1270
 RSP 88011a11ba48
---[ end trace 73be5dcaf8939399 ]---
Kernel panic - not syncing: Fatal exception
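
For readers unfamiliar with the dm-thin change blamed above: the
max_sectors adjustment in linux-dm.git commit 604ea90641b4 works roughly as
follows (a sketch reconstructed from the commit's description, not a
verbatim copy of the tree) -- max_sectors is shrunk until it is a
power-of-2 factor of the thin-pool's blocksize, so a max-sized I/O never
straddles a thinp block:

static bool is_factor(sector_t block_size, uint32_t n)
{
	return !sector_div(block_size, n);
}

static void adjust_max_sectors(struct queue_limits *limits,
			       sector_t pool_blocksize)
{
	if (limits->max_sectors < pool_blocksize) {
		while (!is_factor(pool_blocksize, limits->max_sectors)) {
			limits->max_sectors--;
			limits->max_sectors =
				rounddown_pow_of_two(limits->max_sectors);
		}
	}
}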



Re: [Qemu-devel] [RFC]QEMU disk I/O limits

2011-05-31 Thread Mike Snitzer
On Tue, May 31 2011 at  2:39pm -0400,
Anthony Liguori <anth...@codemonkey.ws> wrote:

> On 05/31/2011 12:59 PM, Vivek Goyal wrote:
> > On Tue, May 31, 2011 at 09:25:31AM -0500, Anthony Liguori wrote:
> > > On 05/31/2011 09:04 AM, Vivek Goyal wrote:
> > > > On Tue, May 31, 2011 at 08:50:40AM -0500, Anthony Liguori wrote:
> > > > > On 05/31/2011 08:45 AM, Vivek Goyal wrote:
> > > > > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
> > > > > > > Hello, all,
> > > > > > >
> > > > > > > I have prepared to work on a feature called "Disk I/O
> > > > > > > limits" for the qemu-kvm project.  This feature will enable
> > > > > > > the user to cap the disk I/O amount performed by a VM.  It
> > > > > > > is important for some storage resources to be shared among
> > > > > > > multiple VMs.  As you know, if some of the VMs are doing
> > > > > > > excessive disk I/O, they will hurt the performance of the
> > > > > > > other VMs.
> > > > > >
> > > > > > Hi Zhiyong,
> > > > > >
> > > > > > Why not use the kernel blkio controller for this?  Why
> > > > > > reinvent the wheel and implement the feature again in qemu?
> > > > >
> > > > > blkio controller only works for block devices.  It doesn't work
> > > > > when using files.
> > > >
> > > > So can't we come up with something to easily determine which
> > > > device backs this file?  Though that still won't work for
> > > > NFS-backed storage.
> > >
> > > Right.
> > >
> > > Additionally, in QEMU, we can rate limit based on concepts that
> > > make sense to a guest.  We can limit the actual I/O ops visible to
> > > the guest, which means that we'll get consistent performance
> > > regardless of whether the backing file is qcow2, raw, LVM, or raw
> > > over NFS.
> >
> > Are you referring to merging taking place, which can change the
> > definition of IOPS as seen by the guest?
>
> No, with qcow2, it may take multiple real IOPs for what the guest
> sees as an IOP.
>
> That's really the main argument I'm making here.  The only entity
> that knows what a guest IOP corresponds to is QEMU.  On the backend,
> it may end up being a network request, multiple BIOs to physical
> disks, file access, etc.
Couldn't QEMU give a hint to the kernel about the ratio of guest IOP to
real IOPs?  Or is QEMU blind to the real IOPs that correspond to a guest
IOP?

If QEMU is truly blind to the amount of real IOPs, then couldn't
QEMU-driven throttling cause physical resources to be oversubscribed
(underestimating the backend work it is creating)?

Mike
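
A toy model of that oversubscription concern (every number below is
invented for illustration): if each guest-visible IOP fans out into several
host I/Os, a QEMU-side cap understates the backend load by the
amplification factor.

#include <stdio.h>

int main(void)
{
	unsigned int guest_iops_cap = 1000; /* per-VM limit enforced by QEMU */
	unsigned int num_vms = 8;
	double amplification = 2.5; /* host I/Os per guest IOP: qcow2
				     * metadata, refcount updates, etc. */

	printf("guest-visible load: %u IOPS\n", guest_iops_cap * num_vms);
	printf("actual backend load: %.0f IOPS\n",
	       guest_iops_cap * num_vms * amplification);
	return 0;
}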



[Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model

2010-10-14 Thread Mike Snitzer
On Tue, Sep 14, 2010 at 1:56 AM, H. Peter Anvin <h...@zytor.com> wrote:
> On 09/13/2010 06:37 PM, Amos Kong wrote:
> > I've heard wonderful music (guest: win7), but mixed with a little
> > noise, not so fluent.
> > Is the following debug msg normal?
>
> Yes, all of that is normal.  I talked to malc earlier today, and I think
> I have a pretty good idea for how to deal with the rate-matching issues;
> I'm going to try to write it up tomorrow.

Hi,

Was just wondering if you've been able to put some time into the
rate-matching issues?

Has this usb-audio patch evolved and I'm just missing it?

Thanks for doing this work!
Mike