Re: [dm-devel] [PATCH v4 0/29] block: Make blkdev_get_by_*() return handle

2023-09-29 Thread Jens Axboe
On Wed, Sep 27, 2023 at 3:34 AM Jan Kara  wrote:
>
> Hello,
>
> this is a v3 of the patch series which implements the idea of 
> blkdev_get_by_*()

v4?

> calls returning bdev_handle which is then passed to blkdev_put() [1]. This
> makes the get and put calls for bdevs more obviously matching and allows us to
> propagate context from get to put without having to modify all the users
> (again!). In particular I need to propagate used open flags to blkdev_put() to
> be able to count writeable opens and add support for blocking writes to mounted
> block devices. I'll send that series separately.
>
> The series is based on Btrfs tree's for-next branch [2] as of today as the
> series depends on Christoph's changes to btrfs device handling.  Patches have
> passed some reasonable testing - I've tested block changes, md, dm, bcache,
> xfs, btrfs, ext4, swap. More testing or review is always welcome. Thanks! I've
> pushed out the full branch to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git bdev_handle
>
> to ease review / testing. Christian, can you pull the patches to your tree
> to get some exposure in linux-next as well? Thanks!

For the block bits:

Acked-by: Jens Axboe 

-- 
Jens Axboe
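
For readers following along: the calling convention the cover letter describes
looks roughly like the sketch below. The handle layout and signatures here are
placeholders paraphrased from the cover letter, not copied from the series.

/*
 * Sketch only: get and put paired through a handle, so the open context
 * (mode, holder) no longer has to be passed back at put time.
 */
struct bdev_handle;	/* opaque here; defined by the series */

static int open_and_use_bdev(const char *path, void *holder,
			     const struct blk_holder_ops *ops)
{
	struct bdev_handle *handle;

	handle = blkdev_get_by_path(path, BLK_OPEN_READ | BLK_OPEN_WRITE,
				    holder, ops);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	/* ... submit IO against the block device carried by the handle ... */

	blkdev_put(handle);
	return 0;
}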

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] DM brokenness with NOWAIT

2023-09-15 Thread Jens Axboe
On 9/15/23 1:13 PM, Mikulas Patocka wrote:
> 
> 
> On Fri, 15 Sep 2023, Mike Snitzer wrote:
> 
>> On Fri, Sep 15 2023 at 12:14P -0400,
>> Jens Axboe  wrote:
>>
>>> On 9/15/23 10:04 AM, Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> Threw some db traffic into my testing mix, and that ended in tears
>>>> very quickly:
>>>>
>>>> CPU: 7 PID: 49609 Comm: ringbuf-read.t Tainted: GW  
>>>> 6.6.0-rc1-g39956d2dcd81 #129
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>>>> 1.16.2-debian-1.16.2-1 04/01/2014
>>>> Call Trace:
>>>>  
>>>>  dump_stack_lvl+0x11d/0x1b0
>>>>  __might_resched+0x3c3/0x5e0
>>>>  ? preempt_count_sub+0x150/0x150
>>>>  mempool_alloc+0x1e2/0x390
>>>>  ? sanity_check_pinned_pages+0x23/0x1010
>>>>  ? mempool_resize+0x7d0/0x7d0
>>>>  bio_alloc_bioset+0x417/0x8c0
>>>>  ? bvec_alloc+0x200/0x200
>>>>  ? __gup_device_huge+0x900/0x900
>>>>  bio_alloc_clone+0x53/0x100
>>>>  dm_submit_bio+0x27f/0x1a20
>>>>  ? lock_release+0x4b7/0x670
>>>>  ? pin_user_pages_fast+0xb6/0xf0
>>>>  ? blk_try_enter_queue+0x1a0/0x4d0
>>>>  ? dm_dax_direct_access+0x260/0x260
>>>>  ? rcu_is_watching+0x12/0xb0
>>>>  ? blk_try_enter_queue+0x1cc/0x4d0
>>>>  __submit_bio+0x239/0x310
>>>>  ? __bio_queue_enter+0x700/0x700
>>>>  ? kvm_clock_get_cycles+0x40/0x60
>>>>  ? ktime_get+0x285/0x470
>>>>  submit_bio_noacct_nocheck+0x4d9/0xb80
>>>>  ? should_fail_request+0x80/0x80
>>>>  ? preempt_count_sub+0x150/0x150
>>>>  ? folio_flags+0x6c/0x1e0
>>>>  submit_bio_noacct+0x53e/0x1b30
>>>>  blkdev_direct_IO.part.0+0x833/0x1810
>>>>  ? rcu_is_watching+0x12/0xb0
>>>>  ? lock_release+0x4b7/0x670
>>>>  ? blkdev_read_iter+0x40d/0x530
>>>>  ? reacquire_held_locks+0x4e0/0x4e0
>>>>  ? __blkdev_direct_IO_simple+0x780/0x780
>>>>  ? rcu_is_watching+0x12/0xb0
>>>>  ? __mark_inode_dirty+0x297/0xd50
>>>>  ? preempt_count_add+0x72/0x140
>>>>  blkdev_read_iter+0x2a4/0x530
>>>>  ? blkdev_write_iter+0xc40/0xc40
>>>>  io_read+0x369/0x1490
>>>>  ? rcu_is_watching+0x12/0xb0
>>>>  ? io_writev_prep_async+0x260/0x260
>>>>  ? __fget_files+0x279/0x410
>>>>  ? rcu_is_watching+0x12/0xb0
>>>>  io_issue_sqe+0x18a/0xd90
>>>>  io_submit_sqes+0x970/0x1ed0
>>>>  __do_sys_io_uring_enter+0x14d4/0x2650
>>>>  ? io_submit_sqes+0x1ed0/0x1ed0
>>>>  ? rcu_is_watching+0x12/0xb0
>>>>  ? __do_sys_io_uring_register+0x3f6/0x2190
>>>>  ? io_req_caches_free+0x500/0x500
>>>>  ? ksys_mmap_pgoff+0x85/0x5b0
>>>>  ? rcu_is_watching+0x12/0xb0
>>>>  ? trace_irq_enable.constprop.0+0xd0/0x100
>>>>  do_syscall_64+0x39/0xb0
>>>>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>
>>>> which seems to demonstrate a misunderstanding of what REQ_NOWAIT is
>>>> about. In particular, it seems to assume you can then submit from
>>>> atomic context? DM does an rcu_read_lock() and happily proceeds to
>>>> attempt to submit IO inside that RCU read-side critical section.
>>>
>>> Did a quick check to see where this came from, and it got added with:
>>>
>>> commit 563a225c9fd207326c2a2af9d59b4097cb31ce70
>>> Author: Mike Snitzer 
>>> Date:   Sat Mar 26 21:08:36 2022 -0400
>>>
>>> dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
>>>
>>> which conspicuously doesn't include any numbers on why this is necessary
>>> or a good thing, and notably probably wasn't tested? This landed in 5.19
>>> fwiw.
>>
>> Don't recall what I was thinking, and I clearly didn't properly test
>> either... should've consulted Mikulas.  Sorry for the trouble.
>>
>> Would you like to send a formal patch with your Signed-off-by and I'll
>> mark it for stable@ and get it to Linus?
>>
>> Mike
> 
> We could revert that commit or we could change all the remaining 
> GFP_NOIOs in drivers/md/dm.c to "bio_opf & REQ_NOWAIT ? GFP_NOWAIT : 
> GFP_NOIO". I'm not sure which one of these possibilities is better.
> 
> Converting GFP_NOIOs would complicate dm.c a bit, but it would make sure 
> that requests with REQ_NOWAIT don't really sleep. What do you think?

I've sent out a patch for this now. Getting rid of SRCU for NOWAIT may
indeed make sense, but I think that should get introduced separately and
with actual numbers demonstrating it is a win and by how much. IMHO it
doesn't necessarily need to be a big win; the main benefit here would be
that NOWAIT is supported a lot better.
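
For reference, the direction Mikulas describes above (deriving the allocation
flags from REQ_NOWAIT instead of taking the RCU path) would look roughly like
this. A sketch only, with a made-up function name, not the patch that was sent:

/* Honour REQ_NOWAIT by not sleeping in the allocation and failing with
 * -EAGAIN instead.
 */
static struct bio *dm_clone_bio_nowait_aware(struct bio *bio, struct bio_set *bs)
{
	gfp_t gfp = (bio->bi_opf & REQ_NOWAIT) ? GFP_NOWAIT : GFP_NOIO;
	struct bio *clone;

	clone = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs);
	if (!clone) {
		/* only possible for GFP_NOWAIT; complete the bio with -EAGAIN */
		bio_wouldblock_error(bio);
		return NULL;
	}
	return clone;
}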

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH] dm: don't attempt to queue IO under RCU protection

2023-09-15 Thread Jens Axboe
dm looks up the table for IO based on the request type, with an
assumption that if the request is marked REQ_NOWAIT, it's fine to
attempt to submit that IO while under RCU read lock protection. This
is not OK, as REQ_NOWAIT just means that we should not be sleeping
waiting on other IO; it does not mean that we can't potentially
schedule.

A simple test case demonstrates this quite nicely:

int main(int argc, char *argv[])
{
	struct iovec iov;
	int fd;

	fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
	posix_memalign(&iov.iov_base, 4096, 4096);
	iov.iov_len = 4096;
	preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
	return 0;
}

which will instantly spew:

BUG: sleeping function called from invalid context at 
include/linux/sched/mm.h:306
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5580, name: dm-nowait
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
CPU: 7 PID: 5580 Comm: dm-nowait Not tainted 6.6.0-rc1-g39956d2dcd81 #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 
 dump_stack_lvl+0x11d/0x1b0
 __might_resched+0x3c3/0x5e0
 ? preempt_count_sub+0x150/0x150
 mempool_alloc+0x1e2/0x390
 ? mempool_resize+0x7d0/0x7d0
 ? lock_sync+0x190/0x190
 ? lock_release+0x4b7/0x670
 ? internal_get_user_pages_fast+0x868/0x2d40
 bio_alloc_bioset+0x417/0x8c0
 ? bvec_alloc+0x200/0x200
 ? internal_get_user_pages_fast+0xb8c/0x2d40
 bio_alloc_clone+0x53/0x100
 dm_submit_bio+0x27f/0x1a20
 ? lock_release+0x4b7/0x670
 ? blk_try_enter_queue+0x1a0/0x4d0
 ? dm_dax_direct_access+0x260/0x260
 ? rcu_is_watching+0x12/0xb0
 ? blk_try_enter_queue+0x1cc/0x4d0
 __submit_bio+0x239/0x310
 ? __bio_queue_enter+0x700/0x700
 ? kvm_clock_get_cycles+0x40/0x60
 ? ktime_get+0x285/0x470
 submit_bio_noacct_nocheck+0x4d9/0xb80
 ? should_fail_request+0x80/0x80
 ? preempt_count_sub+0x150/0x150
 ? lock_release+0x4b7/0x670
 ? __bio_add_page+0x143/0x2d0
 ? iov_iter_revert+0x27/0x360
 submit_bio_noacct+0x53e/0x1b30
 submit_bio_wait+0x10a/0x230
 ? submit_bio_wait_endio+0x40/0x40
 __blkdev_direct_IO_simple+0x4f8/0x780
 ? blkdev_bio_end_io+0x4c0/0x4c0
 ? stack_trace_save+0x90/0xc0
 ? __bio_clone+0x3c0/0x3c0
 ? lock_release+0x4b7/0x670
 ? lock_sync+0x190/0x190
 ? atime_needs_update+0x3bf/0x7e0
 ? timestamp_truncate+0x21b/0x2d0
 ? inode_owner_or_capable+0x240/0x240
 blkdev_direct_IO.part.0+0x84a/0x1810
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 ? blkdev_read_iter+0x40d/0x530
 ? reacquire_held_locks+0x4e0/0x4e0
 ? __blkdev_direct_IO_simple+0x780/0x780
 ? rcu_is_watching+0x12/0xb0
 ? __mark_inode_dirty+0x297/0xd50
 ? preempt_count_add+0x72/0x140
 blkdev_read_iter+0x2a4/0x530
 do_iter_readv_writev+0x2f2/0x3c0
 ? generic_copy_file_range+0x1d0/0x1d0
 ? fsnotify_perm.part.0+0x25d/0x630
 ? security_file_permission+0xd8/0x100
 do_iter_read+0x31b/0x880
 ? import_iovec+0x10b/0x140
 vfs_readv+0x12d/0x1a0
 ? vfs_iter_read+0xb0/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 do_preadv+0x1b3/0x260
 ? do_readv+0x370/0x370
 __x64_sys_preadv2+0xef/0x150
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5af41ad806
Code: 41 54 41 89 fc 55 44 89 c5 53 48 89 cb 48 83 ec 18 80 3d e4 dd 0d 00 00 
74 7a 45 89 c1 49 89 ca 45 31 c0 b8 47 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 
be 00 00 00 48 85 c0 79 4a 48 8b 0d da 55
RSP: 002b:7ffd3145c7f0 EFLAGS: 0246 ORIG_RAX: 0147
RAX: ffda RBX:  RCX: 7f5af41ad806
RDX: 0001 RSI: 7ffd3145c850 RDI: 0003
RBP: 0008 R08:  R09: 0008
R10:  R11: 0246 R12: 0003
R13: 7ffd3145c850 R14: 55f5f0431dd8 R15: 0001
 

where in fact it is dm itself that attempts to allocate a bio clone with
GFP_NOIO under the rcu read lock, regardless of the request type.

Fix this by getting rid of the special casing for REQ_NOWAIT, and just
use the normal SRCU protected table lookup. Get rid of the bio based
table locking helpers at the same time, as they are now unused.

Cc: sta...@vger.kernel.org
Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from 
dm_submit_bio")
Signed-off-by: Jens Axboe 

---

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f0f118ab20fa..64a1f306c96c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -715,24 +715,6 @@ static void dm_put_live_table_fast(struct mapped_device 
*md) __releases(RCU)
rcu_read_unlock();
 }
 
-static inline struct dm_table *dm_get_live_table_bio(struct mapped_device *md,
-   int *srcu_idx, blk_opf_t bio_opf)
-{
-   if (bio_opf & REQ_NOWAIT)
-   return dm_get_live_table_fast(md);
-   else
-   return dm_get_live_table(md, srcu_idx);
-}
-
-static inline void dm_put_live_table_bio(struct ma

Re: [dm-devel] DM brokenness with NOWAIT

2023-09-15 Thread Jens Axboe
On 9/15/23 12:54 PM, Mike Snitzer wrote:
> On Fri, Sep 15 2023 at 12:14P -0400,
> Jens Axboe  wrote:
> 
>> On 9/15/23 10:04 AM, Jens Axboe wrote:
>>> Hi,
>>>
>>> Threw some db traffic into my testing mix, and that ended in tears
>>> very quickly:
>>>
>>> CPU: 7 PID: 49609 Comm: ringbuf-read.t Tainted: GW  
>>> 6.6.0-rc1-g39956d2dcd81 #129
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>>> 1.16.2-debian-1.16.2-1 04/01/2014
>>> Call Trace:
>>>  
>>>  dump_stack_lvl+0x11d/0x1b0
>>>  __might_resched+0x3c3/0x5e0
>>>  ? preempt_count_sub+0x150/0x150
>>>  mempool_alloc+0x1e2/0x390
>>>  ? sanity_check_pinned_pages+0x23/0x1010
>>>  ? mempool_resize+0x7d0/0x7d0
>>>  bio_alloc_bioset+0x417/0x8c0
>>>  ? bvec_alloc+0x200/0x200
>>>  ? __gup_device_huge+0x900/0x900
>>>  bio_alloc_clone+0x53/0x100
>>>  dm_submit_bio+0x27f/0x1a20
>>>  ? lock_release+0x4b7/0x670
>>>  ? pin_user_pages_fast+0xb6/0xf0
>>>  ? blk_try_enter_queue+0x1a0/0x4d0
>>>  ? dm_dax_direct_access+0x260/0x260
>>>  ? rcu_is_watching+0x12/0xb0
>>>  ? blk_try_enter_queue+0x1cc/0x4d0
>>>  __submit_bio+0x239/0x310
>>>  ? __bio_queue_enter+0x700/0x700
>>>  ? kvm_clock_get_cycles+0x40/0x60
>>>  ? ktime_get+0x285/0x470
>>>  submit_bio_noacct_nocheck+0x4d9/0xb80
>>>  ? should_fail_request+0x80/0x80
>>>  ? preempt_count_sub+0x150/0x150
>>>  ? folio_flags+0x6c/0x1e0
>>>  submit_bio_noacct+0x53e/0x1b30
>>>  blkdev_direct_IO.part.0+0x833/0x1810
>>>  ? rcu_is_watching+0x12/0xb0
>>>  ? lock_release+0x4b7/0x670
>>>  ? blkdev_read_iter+0x40d/0x530
>>>  ? reacquire_held_locks+0x4e0/0x4e0
>>>  ? __blkdev_direct_IO_simple+0x780/0x780
>>>  ? rcu_is_watching+0x12/0xb0
>>>  ? __mark_inode_dirty+0x297/0xd50
>>>  ? preempt_count_add+0x72/0x140
>>>  blkdev_read_iter+0x2a4/0x530
>>>  ? blkdev_write_iter+0xc40/0xc40
>>>  io_read+0x369/0x1490
>>>  ? rcu_is_watching+0x12/0xb0
>>>  ? io_writev_prep_async+0x260/0x260
>>>  ? __fget_files+0x279/0x410
>>>  ? rcu_is_watching+0x12/0xb0
>>>  io_issue_sqe+0x18a/0xd90
>>>  io_submit_sqes+0x970/0x1ed0
>>>  __do_sys_io_uring_enter+0x14d4/0x2650
>>>  ? io_submit_sqes+0x1ed0/0x1ed0
>>>  ? rcu_is_watching+0x12/0xb0
>>>  ? __do_sys_io_uring_register+0x3f6/0x2190
>>>  ? io_req_caches_free+0x500/0x500
>>>  ? ksys_mmap_pgoff+0x85/0x5b0
>>>  ? rcu_is_watching+0x12/0xb0
>>>  ? trace_irq_enable.constprop.0+0xd0/0x100
>>>  do_syscall_64+0x39/0xb0
>>>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>> which seems to demonstrate a misunderstanding of what REQ_NOWAIT is
>>> about. In particular, it seems to assume you can then submit from
>>> atomic context? DM does an rcu_read_lock() and happily proceeds to
>>> attempt to submit IO inside that RCU read-side critical section.
>>
>> Did a quick check to see where this came from, and it got added with:
>>
>> commit 563a225c9fd207326c2a2af9d59b4097cb31ce70
>> Author: Mike Snitzer 
>> Date:   Sat Mar 26 21:08:36 2022 -0400
>>
>> dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
>>
>> which conspicuously doesn't include any numbers on why this is necessary
>> or a good thing, and notably probably wasn't tested? This landed in 5.19
>> fwiw.
> 
> Don't recall what I was thinking, and I clearly didn't properly test
> either... should've consulted Mikulas.  Sorry for the trouble.
> 
> Would you like to send a formal patch with your Signed-off-by and I'll
> mark it for stable@ and get it to Linus?

Sure, I can do that.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] DM brokenness with NOWAIT

2023-09-15 Thread Jens Axboe
On 9/15/23 10:04 AM, Jens Axboe wrote:
> Hi,
> 
> Threw some db traffic into my testing mix, and that ended in tears
> very quickly:
> 
> CPU: 7 PID: 49609 Comm: ringbuf-read.t Tainted: GW  
> 6.6.0-rc1-g39956d2dcd81 #129
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
>  
>  dump_stack_lvl+0x11d/0x1b0
>  __might_resched+0x3c3/0x5e0
>  ? preempt_count_sub+0x150/0x150
>  mempool_alloc+0x1e2/0x390
>  ? sanity_check_pinned_pages+0x23/0x1010
>  ? mempool_resize+0x7d0/0x7d0
>  bio_alloc_bioset+0x417/0x8c0
>  ? bvec_alloc+0x200/0x200
>  ? __gup_device_huge+0x900/0x900
>  bio_alloc_clone+0x53/0x100
>  dm_submit_bio+0x27f/0x1a20
>  ? lock_release+0x4b7/0x670
>  ? pin_user_pages_fast+0xb6/0xf0
>  ? blk_try_enter_queue+0x1a0/0x4d0
>  ? dm_dax_direct_access+0x260/0x260
>  ? rcu_is_watching+0x12/0xb0
>  ? blk_try_enter_queue+0x1cc/0x4d0
>  __submit_bio+0x239/0x310
>  ? __bio_queue_enter+0x700/0x700
>  ? kvm_clock_get_cycles+0x40/0x60
>  ? ktime_get+0x285/0x470
>  submit_bio_noacct_nocheck+0x4d9/0xb80
>  ? should_fail_request+0x80/0x80
>  ? preempt_count_sub+0x150/0x150
>  ? folio_flags+0x6c/0x1e0
>  submit_bio_noacct+0x53e/0x1b30
>  blkdev_direct_IO.part.0+0x833/0x1810
>  ? rcu_is_watching+0x12/0xb0
>  ? lock_release+0x4b7/0x670
>  ? blkdev_read_iter+0x40d/0x530
>  ? reacquire_held_locks+0x4e0/0x4e0
>  ? __blkdev_direct_IO_simple+0x780/0x780
>  ? rcu_is_watching+0x12/0xb0
>  ? __mark_inode_dirty+0x297/0xd50
>  ? preempt_count_add+0x72/0x140
>  blkdev_read_iter+0x2a4/0x530
>  ? blkdev_write_iter+0xc40/0xc40
>  io_read+0x369/0x1490
>  ? rcu_is_watching+0x12/0xb0
>  ? io_writev_prep_async+0x260/0x260
>  ? __fget_files+0x279/0x410
>  ? rcu_is_watching+0x12/0xb0
>  io_issue_sqe+0x18a/0xd90
>  io_submit_sqes+0x970/0x1ed0
>  __do_sys_io_uring_enter+0x14d4/0x2650
>  ? io_submit_sqes+0x1ed0/0x1ed0
>  ? rcu_is_watching+0x12/0xb0
>  ? __do_sys_io_uring_register+0x3f6/0x2190
>  ? io_req_caches_free+0x500/0x500
>  ? ksys_mmap_pgoff+0x85/0x5b0
>  ? rcu_is_watching+0x12/0xb0
>  ? trace_irq_enable.constprop.0+0xd0/0x100
>  do_syscall_64+0x39/0xb0
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> which seems to demonstrate a misunderstanding of what REQ_NOWAIT is
> about. In particular, it seems to assume you can then submit from
> atomic context? DM does an rcu_read_lock() and happily proceeds to
> attempt to submit IO inside that RCU read-side critical section.

Did a quick check to see where this came from, and it got added with:

commit 563a225c9fd207326c2a2af9d59b4097cb31ce70
Author: Mike Snitzer 
Date:   Sat Mar 26 21:08:36 2022 -0400

dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio

which conspicuously doesn't include any numbers on why this is necessary
or a good thing, and notably probably wasn't tested? This landed in 5.19
fwiw.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



[dm-devel] DM brokenness with NOWAIT

2023-09-15 Thread Jens Axboe
Hi,

Threw some db traffic into my testing mix, and that ended in tears
very quickly:

CPU: 7 PID: 49609 Comm: ringbuf-read.t Tainted: GW  
6.6.0-rc1-g39956d2dcd81 #129
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 
 dump_stack_lvl+0x11d/0x1b0
 __might_resched+0x3c3/0x5e0
 ? preempt_count_sub+0x150/0x150
 mempool_alloc+0x1e2/0x390
 ? sanity_check_pinned_pages+0x23/0x1010
 ? mempool_resize+0x7d0/0x7d0
 bio_alloc_bioset+0x417/0x8c0
 ? bvec_alloc+0x200/0x200
 ? __gup_device_huge+0x900/0x900
 bio_alloc_clone+0x53/0x100
 dm_submit_bio+0x27f/0x1a20
 ? lock_release+0x4b7/0x670
 ? pin_user_pages_fast+0xb6/0xf0
 ? blk_try_enter_queue+0x1a0/0x4d0
 ? dm_dax_direct_access+0x260/0x260
 ? rcu_is_watching+0x12/0xb0
 ? blk_try_enter_queue+0x1cc/0x4d0
 __submit_bio+0x239/0x310
 ? __bio_queue_enter+0x700/0x700
 ? kvm_clock_get_cycles+0x40/0x60
 ? ktime_get+0x285/0x470
 submit_bio_noacct_nocheck+0x4d9/0xb80
 ? should_fail_request+0x80/0x80
 ? preempt_count_sub+0x150/0x150
 ? folio_flags+0x6c/0x1e0
 submit_bio_noacct+0x53e/0x1b30
 blkdev_direct_IO.part.0+0x833/0x1810
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 ? blkdev_read_iter+0x40d/0x530
 ? reacquire_held_locks+0x4e0/0x4e0
 ? __blkdev_direct_IO_simple+0x780/0x780
 ? rcu_is_watching+0x12/0xb0
 ? __mark_inode_dirty+0x297/0xd50
 ? preempt_count_add+0x72/0x140
 blkdev_read_iter+0x2a4/0x530
 ? blkdev_write_iter+0xc40/0xc40
 io_read+0x369/0x1490
 ? rcu_is_watching+0x12/0xb0
 ? io_writev_prep_async+0x260/0x260
 ? __fget_files+0x279/0x410
 ? rcu_is_watching+0x12/0xb0
 io_issue_sqe+0x18a/0xd90
 io_submit_sqes+0x970/0x1ed0
 __do_sys_io_uring_enter+0x14d4/0x2650
 ? io_submit_sqes+0x1ed0/0x1ed0
 ? rcu_is_watching+0x12/0xb0
 ? __do_sys_io_uring_register+0x3f6/0x2190
 ? io_req_caches_free+0x500/0x500
 ? ksys_mmap_pgoff+0x85/0x5b0
 ? rcu_is_watching+0x12/0xb0
 ? trace_irq_enable.constprop.0+0xd0/0x100
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

which seems to demonstrate a misunderstanding of what REQ_NOWAIT is
about. In particular, it seems to assume you can then submit from
atomic context? DM does an rcu_read_lock() and happily proceeds to
attempt to submit IO inside that RCU read-side critical section.

A test case for this is pretty trivial, just do RWF_NOWAIT IO on any dm
device:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>

int main(int argc, char *argv[])
{
	struct iovec iov;
	void *buf;
	int fd;

	fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (posix_memalign(&buf, 4096, 4096))
		return 1;

	iov.iov_base = buf;
	iov.iov_len = 4096;
	preadv2(fd, &iov, 1, 0, RWF_NOWAIT);

	return 0;
}

and watch the splat go by. I didn't check which kernel had this
brokenness introduced; a quick check shows it's in 6.5 too at least.
Really looks like someone added a fast NOWAIT version, but then didn't
actually test it at all...

Quick patch below makes it go away, as expected, as we'd resort to using
SRCU.

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f0f118ab20fa..64a1f306c96c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -715,24 +715,6 @@ static void dm_put_live_table_fast(struct mapped_device 
*md) __releases(RCU)
rcu_read_unlock();
 }
 
-static inline struct dm_table *dm_get_live_table_bio(struct mapped_device *md,
-   int *srcu_idx, blk_opf_t bio_opf)
-{
-   if (bio_opf & REQ_NOWAIT)
-   return dm_get_live_table_fast(md);
-   else
-   return dm_get_live_table(md, srcu_idx);
-}
-
-static inline void dm_put_live_table_bio(struct mapped_device *md, int srcu_idx,
-					 blk_opf_t bio_opf)
-{
-   if (bio_opf & REQ_NOWAIT)
-   dm_put_live_table_fast(md);
-   else
-   dm_put_live_table(md, srcu_idx);
-}
-
 static char *_dm_claim_ptr = "I belong to device-mapper";
 
 /*
@@ -1833,9 +1815,8 @@ static void dm_submit_bio(struct bio *bio)
struct mapped_device *md = bio->bi_bdev->bd_disk->private_data;
int srcu_idx;
struct dm_table *map;
-   blk_opf_t bio_opf = bio->bi_opf;
 
-   map = dm_get_live_table_bio(md, &srcu_idx, bio_opf);
+   map = dm_get_live_table(md, &srcu_idx);
 
/* If suspended, or map not yet available, queue this IO for later */
if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) ||
@@ -1851,7 +1832,7 @@ static void dm_submit_bio(struct bio *bio)
 
dm_split_and_process_bio(md, map, bio);
 out:
-   dm_put_live_table_bio(md, srcu_idx, bio_opf);
+   dm_put_live_table(md, srcu_idx);
 }
 
 static bool dm_poll_dm_io(struct dm_io *io, struct io_comp_batch *iob,

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v2 0/3] brd discard patches

2023-07-21 Thread Jens Axboe
On 7/21/23 7:48 AM, Mikulas Patocka wrote:
> This is a new version of the brd discard patches.

Can you please:

1) Ensure that your postings thread properly; it's all separate emails
   and the patches don't nest under the cover letter parent.

2) Include a changelog. What changed since v1?

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v2 3/4] brd: enable discard

2023-07-10 Thread Jens Axboe
On 7/10/23 9:24 AM, Mikulas Patocka wrote:
> 
> 
> On Mon, 10 Jul 2023, Li Nan wrote:
> 
>> Hi, Mikulas
>>
>> The lack of discard in ramdisk can cause some issues related to dm. see:
>> https://lore.kernel.org/all/20220228141354.1091687-1-luomen...@huawei.com/
>>
>> I noticed that your patch series already adds discard support for brd. But
>> this patch series has not been applied to mainline at present; may I ask if
>> you still plan to continue working on it?
>>
>> -- 
>> Thanks,
>> Nan
> 
> Hi
> 
> I got no response from ramdisk maintainer Jens Axboe. We should ask him 
> whether he doesn't want discard on ramdisk at all or whether he wants it.

When a series is posted and reviewers comment on required changes, I
always wait for a respin of that series with those addressed. That
didn't happen, so this didn't get applied.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH] blk-crypto: use dynamic lock class for blk_crypto_profile::lock

2023-07-05 Thread Jens Axboe


On Fri, 09 Jun 2023 23:11:39 -0700, Eric Biggers wrote:
> When a device-mapper device is passing through the inline encryption
> support of an underlying device, calls to blk_crypto_evict_key() take
> the blk_crypto_profile::lock of the device-mapper device, then take the
> blk_crypto_profile::lock of the underlying device (nested).  This isn't
> a real deadlock, but it causes a lockdep report because there is only
> one lock class for all instances of this lock.
> 
> [...]

Applied, thanks!

[1/1] blk-crypto: use dynamic lock class for blk_crypto_profile::lock
  commit: de9f927faf8dfb158763898e09a3e371f2ebd30d
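
For readers unfamiliar with the technique in the patch above: giving each
instance its own lockdep class is normally done by registering a per-instance
key and initializing the lock with it. A generic sketch (illustrative names,
not the applied patch):

struct my_profile {
	struct rw_semaphore lock;
	struct lock_class_key lockdep_key;	/* one lockdep class per instance */
};

static void my_profile_init(struct my_profile *p)
{
	lockdep_register_key(&p->lockdep_key);
	__init_rwsem(&p->lock, "&my_profile->lock", &p->lockdep_key);
}

static void my_profile_exit(struct my_profile *p)
{
	lockdep_unregister_key(&p->lockdep_key);
}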

Best regards,
-- 
Jens Axboe



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] decouple block open flags from fmode_t v2

2023-06-12 Thread Jens Axboe


On Thu, 08 Jun 2023 13:02:28 +0200, Christoph Hellwig wrote:
> this series adds a new blk_mode_t for block open flags instead of abusing
> fmode_t.  The block open flags work very differently from the normal use of
> fmode_t and only share the basic READ/WRITE flags with it.  None of the
> other normal FMODE_* flags is used, but instead there are three
> block-specific ones not used by anyone else, which can now be removed.
> 
> Note that I've only CCed maintainers and lists for drivers and file systems
> that have non-trivial changes, as otherwise the series would spam literally
> everyone in the block and file system world.
> 
> [...]

Applied, thanks!

[01/30] block: also call ->open for incremental partition opens
commit: 9d1c92872e7082f100f629a58b32fa0214aa1aec
[02/30] cdrom: remove the unused bdev argument to cdrom_open
commit: 764b83100b9aff52f950e408539c22a37cdedae8
[03/30] cdrom: remove the unused mode argument to cdrom_ioctl
commit: 473399b50de1fdc12606254351273c71d1786251
[04/30] cdrom: remove the unused cdrom_close_write release code
commit: a4cec8bc14c02e15006a71f02b0e1bbc72b9f796
[05/30] cdrom: track if a cdrom_device_info was opened for data
commit: 8cdf433e2b8e4fc6c7b4393deb93fb258175d537
[06/30] cdrom: remove the unused mode argument to cdrom_release
commit: 7ae24fcee9929f9002b84d8121144b2b3590b58c
[07/30] block: pass a gendisk on bdev_check_media_change
commit: 444aa2c58cb3b6cfe3b7cc7db6c294d73393a894
[08/30] block: pass a gendisk to ->open
commit: d32e2bf83791727a84ad5d3e3d713e82f9adbe30
[09/30] block: remove the unused mode argument to ->release
commit: ae220766d87cd6799dbf918fea10613ae14c0654
[10/30] block: rename blkdev_close to blkdev_release
commit: 7ee34cbc291a28134b60683b246ba58b4b676ec3
[11/30] swsusp: don't pass a stack address to blkdev_get_by_path
commit: c889d0793d9dc07e94a5fddcc05356157fab00b7
[12/30] bcache: don't pass a stack address to blkdev_get_by_path
commit: 29499ab060fec044161be73fb0e448eab97b4813
[13/30] rnbd-srv: don't pass a holder for non-exclusive blkdev_get_by_path
commit: 5ee607675debef509946f8a251d4c30a21493ec2
[14/30] btrfs: don't pass a holder for non-exclusive blkdev_get_by_path
commit: 2ef789288afd365f4245ba97e56189062de5148e
[15/30] block: use the holder as indication for exclusive opens
commit: 2736e8eeb0ccdc71d1f4256c9c9a28f58cc43307
[16/30] block: add a sb_open_mode helper
commit: 3f0b3e785e8b54a40c530fa77b7ab37bec925c57
[17/30] fs: remove sb->s_mode
commit: 81b1fb7d17c0110df839e13468ada9e99bb6e5f4
[18/30] scsi: replace the fmode_t argument to scsi_cmd_allowed with a simple 
bool
commit: 5f4eb9d5413fdfc779c099fdaf0ff417eb163145
[19/30] scsi: replace the fmode_t argument to scsi_ioctl with a simple bool
commit: 2e80089c18241699c41d0af0669cb93844ff0dc1
[20/30] scsi: replace the fmode_t argument to ->sg_io_fn with a simple bool
commit: 1991299e49fa58c3ba7e91599932f84bf537d592
[21/30] nvme: replace the fmode_t argument to the nvme ioctl handlers with a 
simple bool
commit: 7d9d7d59d44b7e9236d168472aa222b6543fae25
[22/30] mtd: block: use a simple bool to track open for write
commit: 658afed19ceed54a52b9e9e69c0791c8868ff55d
[23/30] rnbd-srv: replace sess->open_flags with a "bool readonly"
commit: 99b07780814e89f16bec2773c237eb25121f8502
[24/30] ubd: remove commented out code in ubd_open
commit: bd6abfc8e7898ce2163a1ffdbb9ec71a0a081267
[25/30] block: move a few internal definitions out of blkdev.h
commit: cfb425761c79b6056ae5bb73f8d400f03b513959
[26/30] block: remove unused fmode_t arguments from ioctl handlers
commit: 5e4ea834676e3b8965344ca61d36e1ae236249eb
[27/30] block: replace fmode_t with a block-specific type for block open flags
commit: 05bdb9965305bbfdae79b31d22df03d1e2cfcb22
[28/30] block: always use I_BDEV on file->f_mapping->host to find the bdev
commit: 4e762d8623448bb9d32711832ce977a65ff7636a
[29/30] block: store the holder in file->private_data
commit: ee3249a8ce78ef014a71b05157a43fba8dc764e3
[30/30] fs: remove the now unused FMODE_* flags
    commit: 0733ad8002916b9dbbbcfe6e92ad44d2657de1c1

Best regards,
-- 
Jens Axboe



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 01/24] driver core: return bool from driver_probe_done

2023-06-05 Thread Jens Axboe


On Wed, 31 May 2023 14:55:12 +0200, Christoph Hellwig wrote:
> bool is the most sensible return value for a yes/no return.  Also
> add __init as this function is only called from the early boot code.
> 
> 

Applied, thanks!

[01/24] driver core: return bool from driver_probe_done
commit: aa5f6ed8c21ec1aa5fd688118d8d5cd87c5ffc1d
[02/24] PM: hibernate: factor out a helper to find the resume device
commit: 02b42d58f3898134b900ff3030561099e38adb32
[03/24] PM: hibernate: remove the global snapshot_test variable
commit: d6545e687271ab27472eebff770f2de6a5f1a464
[04/24] PM: hibernate: move finding the resume device out of software_resume
commit: cc89c63e2fe37d476357c82390dfb12edcd41cdd
[05/24] init: remove pointless Root_* values
commit: f5524c3fadba35c075a5131bad74e3041507a694
[06/24] init: rename mount_block_root to mount_root_generic
commit: e3102722ffe77094ba9e7e46380792b3dd8a7abd
[07/24] init: refactor mount_root
commit: a6a41d39c2d91ff2543d31b6cc6070f3957e3aea
[08/24] init: pass root_device_name explicitly
commit: c8643c72bc42781fc169c6498a3902bec447099e
[09/24] init: don't remove the /dev/ prefix from error messages
commit: 73231b58b1b496d631fa0ecf9fa7f64f5a07c6e3
[10/24] init: handle ubi/mtd root mounting like all other root types
commit: 07d63cbb67cdb5e2a7720fdd8579b3be979c2d66
[11/24] init: factor the root_wait logic in prepare_namespace into a helper
commit: 3701c600a3e735b9fbac6f7a73e4c086090c97ca
[12/24] init: move the nfs/cifs/ram special cases out of name_to_dev_t
commit: c0c1a7dcb6f5db4500e6574294674213bc24940c
[13/24] init: improve the name_to_dev_t interface
commit: cf056a43121559d3642419917d405c3237ded90a
[14/24] init: clear root_wait on all invalid root= strings
commit: 079caa35f7863cd9958b4555ae873ea4d352a502
[15/24] block: move the code to do early boot lookup of block devices to block/
commit: 702f3189e454b3c3c2f3c99dbf30acf41aab707c
[16/24] block: move more code to early-lookup.c
commit: 7cadcaf1d82618852745e7206fffa2c72c17ce4b
[17/24] dm-snap: simplify the origin_dev == cow_dev check in snapshot_ctr
commit: 26110d5afe8117d1b505fe735ac709bdf063f4da
[18/24] dm: open code dm_get_dev_t in dm_init_init
commit: 49177377e910a8fd5cd1388c966d8fbb51075c3c
[19/24] dm: remove dm_get_dev_t
commit: d4a28d7defe79006e59293a4b43d518ba8483fb0
[20/24] dm: only call early_lookup_bdev from early boot context
commit: 7a126d5bf975f082281fb9b45d110cd49b7c3ee4
[21/24] PM: hibernate: don't use early_lookup_bdev in resume_store
commit: 1e8c813b083c4122dfeaa5c3b11028331026e85d
[22/24] mtd: block2mtd: factor the early block device open logic into a helper
commit: b2baa57475e3a24bb9ad27bb9047ea3be94627f5
[23/24] mtd: block2mtd: don't call early_lookup_bdev after the system is running
commit: 8d03187ee7328af8e18ef1782289e0b034e75485
[24/24] block: mark early_lookup_bdev as __init
commit: 2577f53f42947d8ca01666e3444bb7307319ea38

Best regards,
-- 
Jens Axboe



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v7 00/20] bio: check return values of bio_add_page

2023-05-31 Thread Jens Axboe


On Wed, 31 May 2023 04:50:23 -0700, Johannes Thumshirn wrote:
> We have two functions for adding a page to a bio, __bio_add_page() which is
> used to add a single page to a freshly created bio and bio_add_page() which is
> used to add a page to an existing bio.
> 
> While __bio_add_page() is expected to succeed, bio_add_page() can fail.
> 
> This series converts the callers of bio_add_page() which can easily use
> __bio_add_page() to using it and checks the return of bio_add_page() for
> callers that don't work on a freshly created bio.
> 
> [...]

Applied, thanks!

[01/20] swap: use __bio_add_page to add page to bio
commit: cb58bf91b138c1a8b18cca9503308789e26e3522
[02/20] drbd: use __bio_add_page to add page to bio
commit: 8f11f79f193c935da617375ba5ea4e768a73a094
[03/20] dm: dm-zoned: use __bio_add_page for adding single metadata page
commit: fc8ac3e539561aff1c0a255d701d9412d425373c
[04/20] fs: buffer: use __bio_add_page to add single page to bio
commit: 741af75d4027b1229fc6e62f4e3c4378dfe04897
[05/20] md: use __bio_add_page to add single page
commit: 3c383235c51dcd6198d37ac3ac06e2acad79f981
[06/20] md: raid5-log: use __bio_add_page to add single page
commit: b0a2f17cad9d3fa564d67c543f5d19343401fefd
[07/20] md: raid5: use __bio_add_page to add single page to new bio
commit: 6eea4ff8528d6a5b9f0eeb47992e48a8f44b5b8f
[08/20] jfs: logmgr: use __bio_add_page to add single page to bio
commit: 2896db174ced7a800863223f9e74543b98271ba0
[09/20] gfs2: use __bio_add_page for adding single page to bio
commit: effa7ddeeba782406c81b572791a142fbdaf6b05
[10/20] zonefs: use __bio_add_page for adding single page to bio
commit: 0fa5b08cf6e17b0a64ffcc5894d8efe186691ab8
[11/20] zram: use __bio_add_page for adding single page to bio
commit: 34848c910b911838e1e83e1370cb988b578c8860
[12/20] floppy: use __bio_add_page for adding single page to bio
commit: 5225229b8fdfb3e65520c43547ecf9a737161c3f
[13/20] md: check for failure when adding pages in alloc_behind_master_bio
commit: 6473bc325644b9c8473e6c92bfb520a68dce1e12
[14/20] md: raid1: use __bio_add_page for adding single page to bio
commit: 2f9848178cfa4ac68a5b46e63e5163a09b8bd80f
[15/20] md: raid1: check if adding pages to resync bio fails
commit: 2be32fe91ff54ff326b3a1608973544e835a
[16/20] dm-crypt: use __bio_add_page to add single page to clone bio
commit: 9be63ecfdd63f957b9ed25eaf85666d22a02d7a5
[17/20] block: mark bio_add_page as __must_check
commit: 5b3e39c1cc8e1cf31a398830dd665eb15546b4f7
[18/20] block: add bio_add_folio_nofail
commit: 42205551d1d43b1b42942fb7ef023cf954136cea
[19/20] fs: iomap: use bio_add_folio_nofail where possible
commit: f31c58ab3ddaf64503d7988197602d7443d5be37
[20/20] block: mark bio_add_folio as __must_check
commit: 9320744e4dbe10df6059b2b6531946c200a0ba3b

Best regards,
-- 
Jens Axboe



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v5 00/20] bio: check return values of bio_add_page

2023-05-30 Thread Jens Axboe
On 5/26/23 12:37 AM, Johannes Thumshirn wrote:
> On 24.05.23 17:02, Jens Axboe wrote:
>> On 5/2/23 4:19 AM, Johannes Thumshirn wrote:
>>> We have two functions for adding a page to a bio, __bio_add_page() which is
>>> used to add a single page to a freshly created bio and bio_add_page() which 
>>> is
>>> used to add a page to an existing bio.
>>>
>>> While __bio_add_page() is expected to succeed, bio_add_page() can fail.
>>>
>>> This series converts the callers of bio_add_page() which can easily use
>>> __bio_add_page() to using it and checks the return of bio_add_page() for
>>> callers that don't work on a freshly created bio.
>>>
>>> Lastly it marks bio_add_page() as __must_check so we don't have to go again
>>> and audit all callers.
>>
>> Looks fine to me, though it would be nice if the fs and dm people could
>> give this a quick look. Should not take long, any empty bio addition
>> should, by definition, be able to use a non-checked page addition for
>> the first page.
>>
> 
> I think the FS side is all covered. @Mike could you have a look at the dm
> patches?

Not the iomap one, that was my main concern. Not that this is tricky stuff,
but still...

-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v5 00/20] bio: check return values of bio_add_page

2023-05-24 Thread Jens Axboe
On 5/2/23 4:19 AM, Johannes Thumshirn wrote:
> We have two functions for adding a page to a bio, __bio_add_page() which is
> used to add a single page to a freshly created bio and bio_add_page() which is
> used to add a page to an existing bio.
> 
> While __bio_add_page() is expected to succeed, bio_add_page() can fail.
> 
> This series converts the callers of bio_add_page() which can easily use
> __bio_add_page() to using it and checks the return of bio_add_page() for
> callers that don't work on a freshly created bio.
> 
> Lastly it marks bio_add_page() as __must_check so we don't have to go again
> and audit all callers.

Looks fine to me, though it would be nice if the fs and dm people could
give this a quick look. Should not take long, any empty bio addition
should, by definition, be able to use a non-checked page addition for
the first page.
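
Spelled out, the pattern the series applies is roughly the following; a minimal
sketch, not taken from any of the patches:

/*
 * The first page added to a freshly allocated bio always fits, so
 * __bio_add_page() can be used unchecked there; later additions must check
 * the return value of bio_add_page().
 */
static struct bio *build_bio(struct block_device *bdev, struct page **pages,
			     unsigned int nr_pages)
{
	struct bio *bio;
	unsigned int i;

	bio = bio_alloc(bdev, nr_pages, REQ_OP_WRITE, GFP_NOIO);
	if (!bio)
		return NULL;

	/* fresh bio: cannot fail for the first page */
	__bio_add_page(bio, pages[0], PAGE_SIZE, 0);

	for (i = 1; i < nr_pages; i++) {
		if (bio_add_page(bio, pages[i], PAGE_SIZE, 0) != PAGE_SIZE) {
			/* no room left; caller submits this bio and chains a new one */
			break;
		}
	}
	return bio;
}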

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v5 00/20] bio: check return values of bio_add_page

2023-05-05 Thread Jens Axboe
On 5/5/23 2:09 AM, Johannes Thumshirn wrote:
> On 02.05.23 12:20, Johannes Thumshirn wrote:
>> We have two functions for adding a page to a bio, __bio_add_page() which is
>> used to add a single page to a freshly created bio and bio_add_page() which 
>> is
>> used to add a page to an existing bio.
>>
>> While __bio_add_page() is expected to succeed, bio_add_page() can fail.
>>
>> This series converts the callers of bio_add_page() which can easily use
>> __bio_add_page() to using it and checks the return of bio_add_page() for
>> callers that don't work on a freshly created bio.
>>
>> Lastly it marks bio_add_page() as __must_check so we don't have to go again
>> and audit all callers.
>>
>> Changes to v4:
>> - Rebased onto latest Linus' master
>> - Dropped already merged patches
>> - Added Sergey's Reviewed-by
>>
>> Changes to v3:
>> - Added __bio_add_folio and use it in iomap (Willy)
>> - Mark bio_add_folio must check (Willy)
>> - s/GFS/GFS2/ (Andreas)
>>
>> Changes to v2:
>> - Removed 'wont fail' comments pointed out by Song
>>
>> Changes to v1:
>> - Removed pointless comment pointed out by Willy
>> - Changed commit messages pointed out by Damien
>> - Colledted Damien's Reviews and Acks
> 
> Jens, any comments on this?

I'll take a look post -rc1.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 3/5] iomap: simplify iomap_init() with PAGE_SECTORS

2023-04-21 Thread Jens Axboe
On 4/21/23 4:30 PM, Luis Chamberlain wrote:
> On Fri, Apr 21, 2023 at 04:24:57PM -0600, Jens Axboe wrote:
>> On 4/21/23 4:02 PM, Luis Chamberlain wrote:
>>> On Fri, Apr 21, 2023 at 09:14:00PM +0100, Matthew Wilcox wrote:
>>>> On Fri, Apr 21, 2023 at 12:58:05PM -0700, Luis Chamberlain wrote:
>>>>> Just use the PAGE_SECTORS generic define. This produces no functional
>>>>> changes. While at it use left shift to simplify this even further.
>>>>
>>>> How is FOO << 2 simpler than FOO * 4?
>>>>
>>>>> - return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
>>>>> + return bioset_init(&iomap_ioend_bioset, PAGE_SECTORS << 2,
>>>
>>> We could just do:
>>>
>>>
>>> -   return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
>>> +   return bioset_init(&iomap_ioend_bioset, 4 * PAGE_SECTORS,
>>>
>>> The shift just seemed optimal if we're just going to change it.
>>
>> It's going to generate the same code, but the multiplication is arguably
>> easier to read (or harder to misread).
> 
> Then let's stick with the 4 * PAGE_SECTORS. Let me know if you need another
> patch.

Just send out a v2 at some point; you've also got a number of cases
where there are superfluous parentheses, at least in patch 4, and Willy
pointed one out in an earlier patch too. Didn't check the last one.

This will be 6.5 anyway I think, I already sent out the changes for the
6.4 merge window.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 3/5] iomap: simplify iomap_init() with PAGE_SECTORS

2023-04-21 Thread Jens Axboe
On 4/21/23 4:02 PM, Luis Chamberlain wrote:
> On Fri, Apr 21, 2023 at 09:14:00PM +0100, Matthew Wilcox wrote:
>> On Fri, Apr 21, 2023 at 12:58:05PM -0700, Luis Chamberlain wrote:
>>> Just use the PAGE_SECTORS generic define. This produces no functional
>>> changes. While at it use left shift to simplify this even further.
>>
>> How is FOO << 2 simpler than FOO * 4?
>>
>>> -   return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
>>> +   return bioset_init(&iomap_ioend_bioset, PAGE_SECTORS << 2,
> 
> We could just do:
> 
> 
> - return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
> + return bioset_init(&iomap_ioend_bioset, 4 * PAGE_SECTORS,
> 
> The shift just seemed optimal if we're just going to change it.

It's going to generate the same code, but the multiplication is arguably
easier to read (or harder to misread).

-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [dm-6.4 PATCH v2 3/9] dm bufio: improve concurrent IO performance

2023-03-24 Thread Jens Axboe
On 3/24/23 4:53 PM, Mike Snitzer wrote:
> On Fri, Mar 24 2023 at  3:34P -0400,
> Jens Axboe  wrote:
> 
>> Just some random drive-by comments.
>>
>>> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
>>> index 1de1bdcda1ce..a58f8ac3ba75 100644
>>> --- a/drivers/md/dm-bufio.c
>>> +++ b/drivers/md/dm-bufio.c
>>> +static void lru_destroy(struct lru *lru)
>>> +{
>>> +   BUG_ON(lru->cursor);
>>> +   BUG_ON(!list_empty(>iterators));
>>> +}
>>
>> Ehm no, WARN_ON_ONCE() for these presumably.
> 
> Yeah, I raised concern about the BUG_ONs with Joe. Will try to rid the
> code of BUG_ONs as follow-on work.

Please do.

>>> @@ -116,9 +366,579 @@ struct dm_buffer {
>>>  #endif
>>>  };
>>>  
>>> +/*--*/
>>> +
>>> +/*
>>> + * The buffer cache manages buffers, particularly:
>>> + *  - inc/dec of holder count
>>> + *  - setting the last_accessed field
>>> + *  - maintains clean/dirty state along with lru
>>> + *  - selecting buffers that match predicates
>>> + *
>>> + * It does *not* handle:
>>> + *  - allocation/freeing of buffers.
>>> + *  - IO
>>> + *  - Eviction or cache sizing.
>>> + *
>>> + * cache_get() and cache_put() are threadsafe, you do not need to
>>> + * protect these calls with a surrounding mutex.  All the other
>>> + * methods are not threadsafe; they do use locking primitives, but
>>> + * only enough to ensure get/put are threadsafe.
>>> + */
>>> +
>>> +#define NR_LOCKS 64
>>> +#define LOCKS_MASK (NR_LOCKS - 1)
>>> +
>>> +struct tree_lock {
>>> +   struct rw_semaphore lock;
>>> +} ____cacheline_aligned_in_smp;
>>> +
>>> +struct dm_buffer_cache {
>>> +   /*
>>> +* We spread entries across multiple trees to reduce contention
>>> +* on the locks.
>>> +*/
>>> +   struct tree_lock locks[NR_LOCKS];
>>> +   struct rb_root roots[NR_LOCKS];
>>> +   struct lru lru[LIST_SIZE];
>>> +};
>>
>> This:
>>
>> struct foo_tree {
>>  struct rw_semaphore lock;
>>  struct rb_root root;
>>  struct lru lru;
>> } ____cacheline_aligned_in_smp;
>>
>> would be a lot better.
>>
>> And where does this NR_LOCKS come from? Don't make up magic values out
>> of thin air. Should this be per-cpu? per-node? N per node? I'll bet you
>> that 64 is way too much for most use cases, and too little for others.
> 
> I cannot speak to the 64 magic value (other than I think it worked
> well for Joe's testbed).  But the point of this exercise is to split
> the lock to avoid contention.  Using 64 accomplishes this. Having
> there be more or less isn't _that_ critical.  The hash to get the
> region index isn't a function of cpu.  We aren't after lockless here.

I don't doubt it worked well in his setup, and it'll probably be fine in
a lot of setups. But the point still stands - it's a magic value, it
should at least be documented. And 64 is likely way too much on a lot of
machines.

> Now that said, will certainly discuss further with Joe and see if we
> can be smarter here.
> 
> Your suggestion to combine members certainly makes a lot of sense.  I
> ran with it relative to the bio-prison-v1 (patch 9) changes which have
> the same layout. Definitely a win on in-core memory as well as
> avoiding cacheline thrashing while accessing the lock and then the
> rb_root members (as is always expected):

Right, this is why I suggested doing it like that. It's not very smart
to split related members like that, wastes both memory and is less
efficient than doing the right thing.
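
Concretely, keeping the lock, tree, and LRU that are used together in one
cacheline-aligned unit would look something like the sketch below. NR_TREES
and the hash are placeholders, not values taken from the posted patch:

#define NR_TREES 64	/* placeholder; see the sizing discussion above */

struct dm_buffer_tree {
	struct rw_semaphore lock;
	struct rb_root root;
	struct lru lru;
} ____cacheline_aligned_in_smp;

struct dm_buffer_cache {
	struct dm_buffer_tree trees[NR_TREES];
};

static inline struct dm_buffer_tree *cache_tree(struct dm_buffer_cache *bc,
						sector_t block)
{
	/* same idea as the posted hash: spread blocks across the regions */
	return &bc->trees[hash_64(block, ilog2(NR_TREES))];
}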

>> I stopped reading here, the patch is just too long. Surely this could be
>> split up?
>>
>>  1 file changed, 1292 insertions(+), 477 deletions(-)
>>
>> That's not a patch, that's a patch series.
> 
> I don't want to upset you or the community by saying this, but:
> in this narrow instance where a sizable percentage of the file got
> rewritten: to properly review this work you need to look at the full
> scope of the changes in combination.

That's nonsense. That's like saying "to see what this series does, apply
the whole thing and compare it with the file before". With that logic,
why even split changes ever.

A big patch that could be split is harder to properly review than a lot
of small patches. Nobody ever reviews a 1000+ line patch. But I guess we
can just stop reviewing?

It should be split, it's really not up for debate.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [dm-6.4 PATCH v2 3/9] dm bufio: improve concurrent IO performance

2023-03-24 Thread Jens Axboe
And where does this NR_LOCKS come from? Don't make up magic values out
of thin air. Should this be per-cpu? per-node? N per node? I'll bet you
that 64 is way too much for most use cases, and too little for others.

> +static bool cache_insert(struct dm_buffer_cache *bc, struct dm_buffer *b)
> +{
> + bool r;
> +
> + BUG_ON(b->list_mode >= LIST_SIZE);
> +
> + cache_write_lock(bc, b->block);
> + BUG_ON(atomic_read(&b->hold_count) != 1);
> + r = __cache_insert(&bc->roots[cache_index(b->block)], b);
> + if (r)
> + lru_insert(&bc->lru[b->list_mode], &b->lru);
> + cache_write_unlock(bc, b->block);
> +
> + return r;
> +}

Again, not BUG_ON's.
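
The usual conversion is to warn once and fail gracefully instead, along these
lines (an illustrative rework of the quoted function, not a posted patch):

static bool cache_insert(struct dm_buffer_cache *bc, struct dm_buffer *b)
{
	bool r;

	if (WARN_ON_ONCE(b->list_mode >= LIST_SIZE))
		return false;

	cache_write_lock(bc, b->block);
	WARN_ON_ONCE(atomic_read(&b->hold_count) != 1);
	r = __cache_insert(&bc->roots[cache_index(b->block)], b);
	if (r)
		lru_insert(&bc->lru[b->list_mode], &b->lru);
	cache_write_unlock(bc, b->block);

	return r;
}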

> +/*
> + * Removes buffer from cache, ownership of the buffer passes back to the 
> caller.
> + * Fails if the hold_count is not one (ie. the caller holds the only 
> reference).
> + *
> + * Not threadsafe.
> + */
> +static bool cache_remove(struct dm_buffer_cache *bc, struct dm_buffer *b)
> +{
> + bool r;
> +
> + cache_write_lock(bc, b->block);
> +
> + if (atomic_read(&b->hold_count) != 1)
> + r = false;
> +
> + else {
> + r = true;
> + rb_erase(&b->node, &bc->roots[cache_index(b->block)]);
> + lru_remove(&bc->lru[b->list_mode], &b->lru);
> + }
> +
> + cache_write_unlock(bc, b->block);
> +
> + return r;
> +}

Braces again.

> +static struct dm_buffer *__find_next(struct rb_root *root, sector_t block)
> +{
> + struct rb_node *n = root->rb_node;
> + struct dm_buffer *b;
> + struct dm_buffer *best = NULL;
> +
> + while (n) {
> + b = container_of(n, struct dm_buffer, node);
> +
> + if (b->block == block)
> + return b;
> +
> + if (block <= b->block) {
> + n = n->rb_left;
> + best = b;
> + } else
> + n = n->rb_right;
> + }

And again.

> @@ -1141,7 +1904,6 @@ static void *new_read(struct dm_bufio_client *c, 
> sector_t block,
>   }
>  
>   *bp = b;
> -
>   return b->data;
>  }
>  

Unrelated change. There are a bunch of these.

I stopped reading here, the patch is just too long. Surely this could be
split up?

 1 file changed, 1292 insertions(+), 477 deletions(-)

That's not a patch, that's a patch series.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH -next RFC] block: count 'ios' and 'sectors' when io is done for bio-based device

2023-03-15 Thread Jens Axboe


On Thu, 23 Feb 2023 17:12:26 +0800, Yu Kuai wrote:
> While using iostat for raid, I occasionally observed very strange 'await'
> values, and it turns out that's because 'ios' and 'sectors' are
> counted in bdev_start_io_acct(), while 'nsecs' is counted in
> bdev_end_io_acct(). I'm not sure why they are counted like that,
> but I think this behaviour is obviously wrong because users will get
> wrong disk stats.
> 
> [...]

Applied, thanks!

[1/1] block: count 'ios' and 'sectors' when io is done for bio-based device
  commit: 5f27571382ca42daa3e3d40d1b252bf18c2b61d2

Best regards,
-- 
Jens Axboe



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v3] blk-mq: enforce op-specific segment limits in blk_insert_cloned_request

2023-03-02 Thread Jens Axboe


On Tue, 28 Feb 2023 17:06:55 -0700, Uday Shankar wrote:
> The block layer might merge together discard requests up until the
> max_discard_segments limit is hit, but blk_insert_cloned_request checks
> the segment count against max_segments regardless of the req op. This
> can result in errors like the following when discards are issued through
> a DM device and max_discard_segments exceeds max_segments for the queue
> of the chosen underlying device.
> 
> [...]

Applied, thanks!

[1/1] blk-mq: enforce op-specific segment limits in blk_insert_cloned_request
  commit: 49d24398327e32265eccdeec4baeb5a6a609c0bd

Best regards,
-- 
Jens Axboe



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [RFC for-6.2/block V2] block: Change the granularity of io ticks from ms to ns

2022-12-07 Thread Jens Axboe
On 12/7/22 5:35 PM, Keith Busch wrote:
> On Wed, Dec 07, 2022 at 11:17:12PM +, Chaitanya Kulkarni wrote:
>> On 12/7/22 15:08, Jens Axboe wrote:
>>>
>>> My default peak testing runs at 122M IOPS. That's also the peak IOPS of
>>> the devices combined, and with iostats disabled. If I enabled iostats,
>>> then the performance drops to 112M IOPS. It's no longer device limited,
>>> that's a drop of about 8.2%.
>>>
>>
>> Wow, clearly not acceptable. That's exactly why I asked for perf
>> numbers :).
> 
> For the record, we did say per-io ktime_get() has a measurable
> performance harm and should be aggregated.
> 
>   https://www.spinics.net/lists/linux-block/msg89937.html

Yes, I iterated that in the v1 posting as well, and mentioned it was the
reason the time batching was done. From the results I posted, if you
look at a profile of the run, here are the time related additions:

+   27.22%  io_uring  [kernel.vmlinux]  [k] read_tsc
+4.37%  io_uring  [kernel.vmlinux]  [k] ktime_get

which are #1 and #4, respectively. That's a LOT of added overhead. Not
sure why people think time keeping is free, particularly high
granularity time keeping. It's definitely not, and adding 2-3 per IO is
very noticeable.

-- 
Jens Axboe




Re: [dm-devel] [RFC for-6.2/block V2] block: Change the granularity of io ticks from ms to ns

2022-12-07 Thread Jens Axboe
On 12/7/22 3:32 PM, Gulam Mohamed wrote:
> As per the review comment from Jens Axboe, I am re-sending this patch
> against "for-6.2/block".
> 
> 
> Use ktime to change the granularity of IO accounting in block layer from
> milli-seconds to nano-seconds to get the proper latency values for the
> devices whose latency is in micro-seconds. After changing the granularity
> to nano-seconds the iostat command, which was showing incorrect values for
> %util, is now showing correct values.
> 
> We did not work on the patch to drop the logic for
> STAT_PRECISE_TIMESTAMPS yet. Will do it if this patch is ok.
> 
> The iostat command was run after starting fio with the following command
> on an NVMe disk. For the same fio command, the iostat %util was showing
> ~100% for the disks whose latencies are in the range of microseconds.
> With the kernel changes (granularity to nano-seconds), the %util was
> showing correct values. Following are the details of the test and their
> output:

My default peak testing runs at 122M IOPS. That's also the peak IOPS of
the devices combined, and with iostats disabled. If I enabled iostats,
then the performance drops to 112M IOPS. It's no longer device limited,
that's a drop of about 8.2%.

Adding this patch, and with iostats enabled, performance is at 91M IOPS.
That's a ~25% drop from no iostats, and a ~19% drop from the iostats we
have now...

Here's what I'd like to see changed:

- Split the patch up. First change all the types from unsigned long to
  u64, that can be done while retaining jiffies.

- Add an iostats == 2 setting, which enables this higher resolution
  mode. We'd still default to 1, lower granularity iostats enabled.

I think that's cleaner than one big patch, and means that patch 1 should
not really have any noticeable changes. That's generally how I like to
get things split. With that, I think there could be a way to get
this included.
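
A rough sketch of what the opt-in split could look like; the queue field and
the "iostats == 2" wiring are hypothetical, purely to illustrate gating the
expensive clock read:

/* Only pay for a nanosecond timestamp when the queue was explicitly switched
 * to the precise mode; the default keeps the essentially free jiffies read.
 */
static inline u64 bdev_io_stamp_ns(struct request_queue *q)
{
	if (q->iostats_mode == 2)		/* hypothetical field for iostats == 2 */
		return ktime_get_ns();

	return jiffies_to_nsecs(jiffies);	/* coarse, but nearly free to read */
}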

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [RFC] block: Change the granularity of io ticks from ms to ns

2022-12-07 Thread Jens Axboe
On 12/7/22 6:09 AM, Yu Kuai wrote:
> Hi,
> 
> On 2022/12/07 11:15, Ming Lei wrote:
>> On Wed, Dec 07, 2022 at 10:19:08AM +0800, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2022/12/07 2:15, Gulam Mohamed wrote:
>>>> Use ktime to change the granularity of IO accounting in block layer from
>>>> milli-seconds to nano-seconds to get the proper latency values for the
>>>> devices whose latency is in micro-seconds. After changing the granularity
>>>> to nano-seconds the iostat command, which was showing incorrect values for
>>>> %util, is now showing correct values.
>>>
>>> This patch doesn't correct the counting of io_ticks, it just changes the
>>> accounting error from jiffies (ms) to ns. The problem that util can be
>>> smaller or larger still exists.
>>
>> Agree.
>>
>>>
>>> However, I think this change makes sense considering that the error margin is
>>> much smaller, and the performance overhead should be minimal.
>>>
>>> Hi Ming, what do you think?
>>
>> I remember that ktime_get() has non-negligible overhead. Is there any
>> test data (iops/cpu utilization) from running fio or t/io_uring on
>> null_blk with this patch?
> 
> Yes, testing with null_blk is necessary, we don't want any performance
> regression.

null_blk is fine as a substitute, but I'd much rather run this on my
test bench with actual IO and devices.

> BTW, I thought it's fine because it's already used for tracking io
> latency.

Reading a nsec timestamp is a LOT more expensive than reading jiffies,
which is essentially free. If you look at the amount of work that's
gone into minimizing ktime_get() for the fast path in the IO stack,
then that's a testament to that.

So that's a very bad assumption, and definitely wrong.

-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [RFC] block: Change the granularity of io ticks from ms to ns

2022-12-07 Thread Jens Axboe
On 12/6/22 11:15 AM, Gulam Mohamed wrote:
> Use ktime to change the granularity of IO accounting in block layer from
> milli-seconds to nano-seconds to get the proper latency values for the
> devices whose latency is in micro-seconds. After changing the granularity
> to nano-seconds the iostat command, which was showing incorrect values for
> %util, is now showing correct values.
> 
> We did not work on the patch to drop the logic for
> STAT_PRECISE_TIMESTAMPS yet. Will do it if this patch is ok.
> 
> The iostat command was run after starting fio with the following command
> on an NVMe disk. For the same fio command, the iostat %util was showing
> ~100% for the disks whose latencies are in the range of microseconds.
> With the kernel changes (granularity to nano-seconds), the %util was
> showing correct values. Following are the details of the test and their
> output:

As mentioned, this will most likely have a substantial performance
impact. I'd test it, but your patch is nowhere near applying to the
current block tree. Please resend it against for-6.2/block so it can
get tested.

-- 
Jens Axboe




Re: [dm-devel] [PATCH] block: remove bio_set_op_attrs

2022-12-07 Thread Jens Axboe


On Tue, 06 Dec 2022 15:40:57 +0100, Christoph Hellwig wrote:
> This macro is obsolete, so replace the last few uses with open coded
> bi_opf assignments.
> 
> 

Applied, thanks!

[1/1] block: remove bio_set_op_attrs
  commit: c34b7ac65087554627f4840f4ecd6f2107a68fd1

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v3] virtio_blk: add VIRTIO_BLK_F_LIFETIME feature support

2022-12-05 Thread Jens Axboe
On 12/5/22 1:29 PM, Michael S. Tsirkin wrote:
> On Mon, Dec 05, 2022 at 11:53:51AM -0700, Jens Axboe wrote:
>> On 12/5/22 11:36?AM, Alvaro Karsz wrote:
>>> Hi,
>>>
>>>> Is this based on some spec? Because it looks pretty odd to me. There
>>>> can be a pretty wide range of two/three/etc level cells with wildly
>>>> different ranges of durability. And there's really not a lot of slc
>>>> for generic devices these days, if any.
>>>
>>> Yes, this is based on the virtio spec
>>> https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html
>>> section  5.2.6
>>
>> And where did this come from?
> 
> 
> Here's the commit log from the spec:
>   In many embedded systems, virtio-blk implementations are
>   backed by eMMC or UFS storage devices, which are subject to
>   predictable and measurable wear over time due to repeated write
>   cycles.
> 
>   For such systems, it can be important to be able to track
>   accurately the amount of wear imposed on the storage over
>   time and surface it to applications. In a native deployments
>   this is generally handled by the physical block device driver
>   but no such provision is made in virtio-blk to expose these
>   metrics for devices where it makes sense to do so.
> 
>   This patch adds support to virtio-blk for lifetime and wear
>   metrics to be exposed to the guest when a deployment of
>   virtio-blk is done over compatible eMMC or UFS storage.
> 
>   Signed-off-by: Enrico Granata 
> 
> Cc Enrico Granata as well.

Alvaro sent me this one privately too. To be honest, I don't like this
patch very much, but that's mostly because the spec for this caters to a
narrow use case of embedded flash. It's not a generic storage thing,
it's just for mmc/ufs and the embedded space. That's a missed
opportunity. And either it'll become mostly unused, or it'll actually be
used and then extended at some point.

While I'm not a fan at all, if you guys are willing to take it in
virtio-blk, then I won't stand in your way. I would rename it though to
more specifically detail what it deals with.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v3] virtio_blk: add VIRTIO_BLK_F_LIFETIME feature support

2022-12-05 Thread Jens Axboe
On 12/5/22 11:36 AM, Alvaro Karsz wrote:
> Hi,
> 
>> Is this based on some spec? Because it looks pretty odd to me. There
>> can be a pretty wide range of two/three/etc level cells with wildly
>> different ranges of durability. And there's really not a lot of slc
>> for generic devices these days, if any.
> 
> Yes, this is based on the virtio spec
> https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html
> section  5.2.6

And where did this come from?

-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v3] virtio_blk: add VIRTIO_BLK_F_LIFETIME feature support

2022-12-05 Thread Jens Axboe
On 12/5/22 9:20 AM, Alvaro Karsz wrote:
> Implement the VIRTIO_BLK_F_LIFETIME feature for VirtIO block devices.
> 
> This commit introduces a new ioctl command, VBLK_LIFETIME.
> 
> VBLK_LIFETIME ioctl asks for the block device to provide lifetime
> information by sending a VIRTIO_BLK_T_GET_LIFETIME command to the device.

s/VBLK_LIFETIME/VBLK_GET_LIFETIME

for the above.

-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH v3] virtio_blk: add VIRTIO_BLK_F_LIFETIME feature support

2022-12-05 Thread Jens Axboe
On 12/5/22 9:20 AM, Alvaro Karsz wrote:
> Implement the VIRTIO_BLK_F_LIFETIME feature for VirtIO block devices.
> 
> This commit introduces a new ioctl command, VBLK_LIFETIME.
> 
> VBLK_LIFETIME ioctl asks for the block device to provide lifetime
> information by sending a VIRTIO_BLK_T_GET_LIFETIME command to the device.
> 
> lifetime information fields:
> 
> - pre_eol_info: specifies the percentage of reserved blocks that are
>   consumed.
>   optional values following virtio spec:
>   *) 0 - undefined.
>   *) 1 - normal, < 80% of reserved blocks are consumed.
>   *) 2 - warning, 80% of reserved blocks are consumed.
>   *) 3 - urgent, 90% of reserved blocks are consumed.
> 
> - device_lifetime_est_typ_a: this field refers to wear of SLC cells and
>is provided in increments of 10% used, and so
>on, thru to 11 meaning estimated lifetime
>exceeded. All values above 11 are reserved.
> 
> - device_lifetime_est_typ_b: this field refers to wear of MLC cells and is
>provided with the same semantics as
>device_lifetime_est_typ_a.
> 
> The data received from the device will be sent as is to the user.
> No data check/decode is done by virtblk.

Is this based on some spec? Because it looks pretty odd to me. There
can be a pretty wide range of two/three/etc level cells with wildly
different ranges of durability. And there's really not a lot of slc
for generic devices these days, if any.
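
For reference, a standalone sketch of how a consumer might interpret the three fields the quoted commit message describes. The struct layout, field names and sample values here are assumptions lifted from the patch discussion (the VBLK_GET_LIFETIME ioctl was not merged as-is), not an established kernel ABI; only the field semantics follow the quoted text.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout following the quoted description, not a merged ABI. */
struct vblk_lifetime {
	uint16_t pre_eol_info;		/* 0 undefined, 1 <80%, 2 >=80%, 3 >=90% reserved blocks used */
	uint16_t lifetime_est_typ_a;	/* SLC wear: 1..10 in 10% steps, 11 = lifetime exceeded */
	uint16_t lifetime_est_typ_b;	/* MLC wear: same semantics as typ_a */
};

static void print_slc_wear(const struct vblk_lifetime *lt)
{
	if (lt->lifetime_est_typ_a == 0)
		puts("SLC wear: not reported");
	else if (lt->lifetime_est_typ_a >= 11)
		puts("SLC wear: estimated lifetime exceeded");
	else
		printf("SLC wear: roughly %u%% of estimated lifetime used\n",
		       lt->lifetime_est_typ_a * 10);
}

int main(void)
{
	/* Sample values; real data would come from the device. */
	struct vblk_lifetime sample = { .pre_eol_info = 1, .lifetime_est_typ_a = 3 };

	print_slc_wear(&sample);
	return 0;
}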

-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] pass a struct block_device to the blk-crypto interfaces v3

2022-11-21 Thread Jens Axboe
On Mon, 14 Nov 2022 05:29:41 +0100, Christoph Hellwig wrote:
> this series switches the blk-crypto interfaces to take block_device
> arguments instead of request_queues, and with that finishes off the
> project to hide struct request_queue from file systems.
> 
> Changes since v2:
>  - update a few comments
>  - fix a whitespace error
>  - remove now unused forward declarations
>  - fix spelling errors an not precise enough wording in commit messages
>  - move a few more declarations around inside or between headers
> 
> [...]

Applied, thanks!

[1/3] blk-crypto: don't use struct request_queue for public interfaces
  commit: fce3caea0f241f5d34855c82c399d5e0e2d91f07
[2/3] blk-crypto: add a blk_crypto_config_supported_natively helper
  commit: 6715c98b6cf003f26b1b2f655393134e9d999a05
[3/3] blk-crypto: move internal only declarations to blk-crypto-internal.h
  commit: 3569788c08235c6f3e9e6ca724b2df44787ff487

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCHv2 0/5] fix direct io device mapper errors

2022-11-16 Thread Jens Axboe
On Thu, 10 Nov 2022 10:44:56 -0800, Keith Busch wrote:
> From: Keith Busch 
> 
> The 6.0 kernel made some changes to the direct io interface to allow
> offsets in user addresses. This based on the hardware's capabilities
> reported in the request_queue's dma_alignment attribute.
> 
> dm-crypt, -log-writes and -integrity require direct io be aligned to the
> block size. Since it was only ever using the default 511 dma mask, this
> requirement may fail if formatted to something larger, like 4k, which
> will result in unexpected behavior with direct-io.
> 
> [...]

Applied, thanks!

[1/5] block: make dma_alignment a stacking queue_limit
  commit: c964d62f5cab7b43dd0534f22a96eab386c6ec5d
[2/5] dm-crypt: provide dma_alignment limit in io_hints
  commit: 86e4d3e8d1838ca88fb9267e669c36f6c8f7c6cd
[3/5] block: make blk_set_default_limits() private
  commit: b3228254bb6e91e57f920227f72a1a7d81925d81
[4/5] dm-integrity: set dma_alignment limit in io_hints
  commit: 29aa778bb66795e6a78b1c99beadc83887827868
[5/5] dm-log-writes: set dma_alignment limit in io_hints
  commit: 50a893359cd2643ee1afc96eedc9e7084cab49fa
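
What the three dm patches do in their ->io_hints hooks amounts to a one-liner. A kernel-style sketch with an illustrative target name, assuming the new stacking dma_alignment limit from patch 1:

/*
 * Kernel-style sketch only; the target name is illustrative. Advertise that
 * direct I/O buffers must be aligned to the target's logical block size via
 * the stacking dma_alignment queue limit.
 */
static void my_target_io_hints(struct dm_target *ti, struct queue_limits *limits)
{
	limits->dma_alignment = limits->logical_block_size - 1;
}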

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v3 00/10] fix delayed holder tracking

2022-11-16 Thread Jens Axboe
On Tue, 15 Nov 2022 22:10:44 +0800, Yu Kuai wrote:
> From: Yu Kuai 
> 
> Hi all,
> 
> this series tries to fix the delayed holder tracking that is only used by
> dm by moving it into dm, where we can track the lifetimes much better.
> v2 is from Christoph, here I send v3 with some additional fixes.
> 
> [...]

Applied, thanks!

[01/10] block: clear ->slave_dir when dropping the main slave_dir reference
commit: d90db3b1c8676bc88b4309c5a571333de2263b8e
[02/10] dm: remove free_table_devices
commit: 992ec6a92ac315d3301353ff3beb818fcc34e4e4
[03/10] dm: cleanup open_table_device
commit: b9a785d2dc6567b2fd9fc60057a6a945a276927a
[04/10] dm: cleanup close_table_device
commit: 7b5865831c1003122f737df5e16adaa583f1a595
[05/10] dm: make sure create and remove dm device won't race with open and 
close table
commit: d563792c8933a810d28ce0f2831f0726c2b15a31
[06/10] dm: track per-add_disk holder relations in DM
commit: 1a581b72169968f4154b5793828f3bc28b258b35
[07/10] block: remove delayed holder registration
commit: 7abc077788363ac7194aefd355306f8e974feff7
[08/10] block: fix use after free for bd_holder_dir
commit: 62f535e1f061b4c2cc76061b6b59af9f9335ee34
[09/10] block: store the holder kobject in bd_holder_disk
commit: 3b3449c1e6c3fe19f62607ff4f353f8bb82d5c4e
[10/10] block: don't allow a disk link holder to itself
commit: 077a4033541fc96fb0a955985aab7d1f353da831

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH] blk-mq: don't add non-pt request with ->end_io to batch

2022-10-27 Thread Jens Axboe
On Thu, 27 Oct 2022 16:57:09 +0800, Ming Lei wrote:
> dm-rq implements ->end_io callback for request issued to underlying queue,
> and it isn't passthrough request.
> 
> Commit ab3e1d3bbab9 ("block: allow end_io based requests in the completion
> batch handling") doesn't clear rq->bio and rq->__data_len for request
> with ->end_io in blk_mq_end_request_batch(), and this way is actually
> dangerous, but so far it is only for nvme passthrough request.
> 
> [...]

Applied, thanks!

[1/1] blk-mq: don't add non-pt request with ->end_io to batch
  (no commit info)

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v15 00/13] support zoned block devices with non-power-of-2 zone sizes

2022-09-30 Thread Jens Axboe
On 9/30/22 1:38 PM, Bart Van Assche wrote:
> On 9/30/22 08:13, Jens Axboe wrote:
>> On 9/29/22 12:31 AM, Pankaj Raghav wrote:
>>>> Hi Jens,
>>>>   Please consider this patch series for the 6.1 release.
>>>>
>>>
>>> Hi Jens, Christoph, and Keith,
>>>  All the patches have a Reviewed-by tag at this point. Can we queue this up
>>> for 6.1?
>>
>> It's getting pretty late for 6.1 and I'd really like to have both Christoph
>> and Martin sign off on these changes.
> 
> Hi Jens,
> 
> Agreed that it's getting late for 6.1.
> 
> Since this has not been mentioned in the cover letter, I want to add
> that in the near future we will need these patches for Android
> devices. JEDEC is working on supporting zoned storage for UFS devices,
> the storage devices used in all modern Android phones. Although it
> would be possible to make the offset between zone starts a power of
> two by inserting gap zones between data zones, UFS vendors asked not
> to do this and hence need support for zone sizes that are not a power
> of two. An advantage of not having to deal with gap zones is better
> filesystem performance since filesystem extents cannot span gap zones.
> Having to split filesystem extents because of gap zones reduces
> filesystem performance.

Noted. I'll find some time to review this as well separately, once we're
on the other side of the merge window.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v15 00/13] support zoned block devices with non-power-of-2 zone sizes

2022-09-30 Thread Jens Axboe
On 9/29/22 12:31 AM, Pankaj Raghav wrote:
>> Hi Jens,
>>   Please consider this patch series for the 6.1 release.
>>
> 
> Hi Jens, Christoph, and Keith,
>  All the patches have a Reviewed-by tag at this point. Can we queue this up
> for 6.1?

It's getting pretty late for 6.1 and I'd really like to have both Christoph
and Martin sign off on these changes.

-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH] blk-lib: fix blkdev_issue_secure_erase

2022-09-15 Thread Jens Axboe
On Wed, 14 Sep 2022 16:55:51 -0400 (EDT), Mikulas Patocka wrote:
> There's a bug in blkdev_issue_secure_erase. The statement
> "unsigned int len = min_t(sector_t, nr_sects, max_sectors);"
> sets the variable "len" to the length in sectors, but the statement
> "bio->bi_iter.bi_size = len" treats it as if it were in bytes.
> The statements "sector += len << SECTOR_SHIFT" and "nr_sects -= len <<
> SECTOR_SHIFT" are thinko.
> 
> [...]

Applied, thanks!

[1/1] blk-lib: fix blkdev_issue_secure_erase
  commit: c4fa368466cc1b60bb92f867741488930ddd6034
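
A standalone illustration of the unit handling the fix restores: "len" stays in 512-byte sectors, only the bio size (which is in bytes) gets the SECTOR_SHIFT conversion, and the sector counters advance by sectors. This mirrors the corrected arithmetic, not the actual blk-lib.c code.

#include <stdio.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

int main(void)
{
	uint64_t sector = 0, nr_sects = 1 << 20;	/* 1M sectors = 512 MiB */
	const unsigned int max_sectors = 8192;		/* per-bio limit, in sectors */
	uint64_t total_bytes = 0;
	unsigned int bios = 0;

	while (nr_sects) {
		unsigned int len = nr_sects < max_sectors ? nr_sects : max_sectors;

		total_bytes += (uint64_t)len << SECTOR_SHIFT;	/* bi_size is bytes */
		sector += len;					/* sectors, not len << 9 */
		nr_sects -= len;				/* sectors, not len << 9 */
		bios++;
	}
	printf("issued %u bios covering %llu bytes, ended at sector %llu\n",
	       bios, (unsigned long long)total_bytes, (unsigned long long)sector);
	return 0;
}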

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] clean up zoned device information v2

2022-07-06 Thread Jens Axboe
On Wed, 6 Jul 2022 09:03:34 +0200, Christoph Hellwig wrote:
> this cleans up the block layer zoned device information APIs and
> moves all fields currently in the request_queue to the gendisk as
> they aren't relevant for passthrough I/O.
> 
> Changes since v1:
>  - drop the blk-zoned/nvmet code sharing for now
>  - use a helper a little earlier
>  - various spelling fixes
> 
> [...]

Applied, thanks!

[01/16] block: remove a superflous ifdef in blkdev.h
commit: f1a8bbd1100d9cd117bc8b7fc0903982bbaf474f
[02/16] block: call blk_queue_free_zone_bitmaps from disk_release
commit: 6cc37a672a1e21245b931722a016b3bd4ae10e2d
[03/16] block: use bdev_is_zoned instead of open coding it
commit: edd1dbc83b1de3b98590b76e09b86ddf6887fce7
[04/16] block: simplify blk_mq_plug
commit: 6deacb3bfac2b720e707c566549a7041f17db9c8
[05/16] block: simplify blk_check_zone_append
commit: 052e545c9276f97e86368579fda32aa1ac017d51
[06/16] block: pass a gendisk to blk_queue_set_zoned
commit: 6b2bd274744e6454ba7bbbe6a09b44866f2f414a
[07/16] block: pass a gendisk to blk_queue_clear_zone_settings
commit: b3c72f8138b5f967a9fa527af84b35018897aba3
[08/16] block: pass a gendisk to blk_queue_free_zone_bitmaps
commit: 5d40066567a73a67ddb656ad118c6cfa1c4a6d71
[09/16] block: remove queue_max_open_zones and queue_max_active_zones
commit: 1dc0172027b0aa09823b430e395b1116d2745f36
[10/16] block: pass a gendisk to blk_queue_max_open_zones and 
blk_queue_max_active_zones
commit: 982977df48179c8c690868f398051074e68eef0f
[11/16] block: replace blkdev_nr_zones with bdev_nr_zones
commit: b623e347323f6464b20fb0d899a0a73522ed8f6c
[12/16] block: use bdev based helpers in blkdev_zone_mgmt{,all}
commit: 375c140c199ebd2866f9c50a0b8853ffca3f1b68
[13/16] nvmet:: use bdev based helpers in nvmet_bdev_zone_mgmt_emulate_all
commit: a239145ad18b59338a2b6c419c1a15a0e52d1315
[14/16] dm-zoned: cleanup dmz_fixup_devices
commit: fabed68c272389db85655a2933737d602f4008fb
[15/16] block: remove blk_queue_zone_sectors
commit: de71973c2951cb2ce4b46560f021f03b15906408
[16/16] block: move zone related fields to struct gendisk
commit: d86e716aa40643e3eb8c69fab3a198146bf76dd6

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 5.20 1/4] block: add bio_rewind() API

2022-06-29 Thread Jens Axboe
On Jun 29, 2022, at 1:26 PM, Kent Overstreet  wrote:
> 
> On Wed, Jun 29, 2022 at 01:00:52PM -0600, Jens Axboe wrote:
>>> On 6/29/22 12:40 PM, Kent Overstreet wrote:
>>> On Wed, Jun 29, 2022 at 11:16:10AM -0600, Jens Axboe wrote:
>>>> Not sure what Christoph change you are referring to, but all the ones
>>>> that I did to improve the init side were all backed by numbers I ran at
>>>> that time (and most/all of the commit messages will have that data). So
>>>> yes, it is indeed still very noticeable. Maybe not at 100K IOPS, but at
>>>> 10M on a core it most certainly is.
>>> 
>>> I was referring to 609be1066731fea86436f5f91022f82e592ab456. You
>>> signed off on it, you must remember it...?
>> 
>> I'm sure we all remember each and every commit that gets applied,
>> particularly with such a precise description of the change.
>> 
>>>> I'm all for having solid and maintainable code, obviously, but frivolous
>>>> bloating of structures and more expensive setup cannot be hand waved
>>>> away with "it doesn't matter if we touch 3 or 6 cachelines" because we
>>>> obviously have a disagreement on that.
>>> 
>>> I wouldn't propose inflating struct _bio_ like that. But Jens, to be
>>> blunt - I know we have different priorities in the way we write code.
>>> Your writeback throttling code was buggy for _ages_ and I had users
>>> hitting deadlocks there that I pinged you about, and I could not make
>>> heads or tails of how that code was supposed to work and not for lack
>>> of time spent trying!
>> 
>> OK Kent, you just wasted your 2nd chance here. Plonk. There are many
>> rebuttals that could be made here, but I won't waste my time on it, nor
>> would it be appropriate.
>> 
>> Come back when you know how to act professionally. Or don't come back
>> at all.
> 
> Jens, you're just acting like your code is immune to criticism, and I don't 
> have
> an eyeroll big enough for that. We all know how you care about chasing every
> last of those 10 million iops - and not much else.

Kent, the time for your unsolicited opinions and attacks have passed. Just go 
away, not interested in interacting with you. You have no idea what you’re 
talking about. 




--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


Re: [dm-devel] [PATCH 5.20 1/4] block: add bio_rewind() API

2022-06-29 Thread Jens Axboe
On 6/29/22 12:40 PM, Kent Overstreet wrote:
> On Wed, Jun 29, 2022 at 11:16:10AM -0600, Jens Axboe wrote:
>> Not sure what Christoph change you are referring to, but all the ones
>> that I did to improve the init side were all backed by numbers I ran at
>> that time (and most/all of the commit messages will have that data). So
>> yes, it is indeed still very noticeable. Maybe not at 100K IOPS, but at
>> 10M on a core it most certainly is.
> 
> I was referring to 609be1066731fea86436f5f91022f82e592ab456. You
> signed off on it, you must remember it...?

I'm sure we all remember each and every commit that gets applied,
particularly with such a precise description of the change.

>> I'm all for having solid and maintainable code, obviously, but frivolous
>> bloating of structures and more expensive setup cannot be hand waved
>> away with "it doesn't matter if we touch 3 or 6 cachelines" because we
>> obviously have a disagreement on that.
> 
> I wouldn't propose inflating struct _bio_ like that. But Jens, to be
> blunt - I know we have different priorities in the way we write code.
> Your writeback throttling code was buggy for _ages_ and I had users
> hitting deadlocks there that I pinged you about, and I could not make
> heads or tails of how that code was supposed to work and not for lack
> of time spent trying!

OK Kent, you just wasted your 2nd chance here. Plonk. There are many
rebuttals that could be made here, but I won't waste my time on it, nor
would it be appropriate.

Come back when you know how to act professionally. Or don't come back
at all.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 5.20 1/4] block: add bio_rewind() API

2022-06-29 Thread Jens Axboe
On 6/28/22 12:32 PM, Kent Overstreet wrote:
> On Tue, Jun 28, 2022 at 12:13:06PM -0600, Jens Axboe wrote:
>> It's much less about using whatever amount of memory for inflight IO,
>> and much more about not bloating fast path structures (of which the
>> bio is certainly one). All of this gunk has to be initialized for
>> each IO, and that's the real issue.
>>
>> Just look at the recent work for iov_iter and why ITER_UBUF makes
>> sense to do.
>>
>> This is not a commentary on this patchset, just a general
>> observation. Sizes of hot data structures DO matter, and quite a bit
>> so.
> 
> Younger me would have definitely been in agreement; initializing these
> structs definitely tends to show up in profiles.

Older me still greatly cares :-)

> These days I'm somewhat less inclined towards that view - profiles
> naturally highlight where your cache misses are happening, and
> initializing a freshly allocated data structure is always going to be
> a cache miss. But the difference between touching 3 and 6 contiguous
> cachelines is practically nil...  assuming we aren't doing anything
> stupid like using memset (despite what Linus wants from the CPU
> vendors, rep stos _still_ sucks) and perhaps inserting prefetches
> where appropriate.
> 
> And I see work going by that makes me really wonder if it was
> justified - in particular I _really_ want to know if Christoph's bio
> initialization change was justified by actual benchmarks and what
> those numbers were, vs. looking at profiles. Wasn't anything in the
> commit log...

Not sure what Christoph change you are referring to, but all the ones
that I did to improve the init side were all backed by numbers I ran at
that time (and most/all of the commit messages will have that data). So
yes, it is indeed still very noticeable. Maybe not at 100K IOPS, but at
10M on a core it most certainly is.

I'm all for having solid and maintainable code, obviously, but frivolous
bloating of structures and more expensive setup cannot be hand waved
away with "it doesn't matter if we touch 3 or 6 cachelines" because we
obviously have a disagreement on that.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 5.20 1/4] block: add bio_rewind() API

2022-06-28 Thread Jens Axboe
On 6/27/22 10:20 PM, Kent Overstreet wrote:
> On Mon, Jun 27, 2022 at 03:36:22PM +0800, Ming Lei wrote:
>> On Sun, Jun 26, 2022 at 04:14:58PM -0400, Kent Overstreet wrote:
>>> On Fri, Jun 24, 2022 at 10:12:52PM +0800, Ming Lei wrote:
>>>> Commit 7759eb23fd98 ("block: remove bio_rewind_iter()") removes
>>>> the similar API because the following reasons:
>>>>
>>>> ```
>>>> It is pointed that bio_rewind_iter() is one very bad API[1]:
>>>>
>>>> 1) bio size may not be restored after rewinding
>>>>
>>>> 2) it causes some bogus change, such as 5151842b9d8732 (block: reset
>>>> bi_iter.bi_done after splitting bio)
>>>>
>>>> 3) rewinding really makes things complicated wrt. bio splitting
>>>>
>>>> 4) unnecessary updating of .bi_done in fast path
>>>>
>>>> [1] https://marc.info/?t=15354992425=1=2
>>>>
>>>> So this patch takes Kent's suggestion to restore one bio into its 
>>>> original
>>>> state via saving bio iterator(struct bvec_iter) in 
>>>> bio_integrity_prep(),
>>>> given now bio_rewind_iter() is only used by bio integrity code.
>>>> ```
>>>>
>>>> However, it isn't easy to restore bio by saving 32 bytes bio->bi_iter, and 
>>>> saving
>>>> it only can't restore crypto and integrity info.
>>>>
>>>> Add bio_rewind() back for some use cases which may not be same with
>>>> previous generic case:
>>>>
>>>> 1) most of bio has fixed end sector since bio split is done from front of 
>>>> the bio,
>>>> if driver just records how many sectors between current bio's start sector 
>>>> and
>>>> the bio's end sector, the original position can be restored
>>>>
>>>> 2) if one bio's end sector won't change, usually bio_trim() isn't called, 
>>>> user can
>>>> restore original position by storing sectors from current 
>>>> ->bi_iter.bi_sector to
>>>> bio's end sector; together by saving bio size, 8 bytes can restore to
>>>> original bio.
>>>>
>>>> 3) dm's requeue use case: when BLK_STS_DM_REQUEUE happens, dm core needs to
>>>> restore to the original bio which represents current dm io to be requeued.
>>>> By storing sectors to the bio's end sector and dm io's size,
>>>> bio_rewind() can restore such original bio, then dm core code needn't to
>>>> allocate one bio beforehand just for handling BLK_STS_DM_REQUEUE which
>>>> is actually one unusual event.
>>>>
>>>> 4) Not like original rewind API, this one needn't to add .bi_done, and no 
>>>> any
>>>> effect on fast path
>>>
>>> It seems like perhaps the real issue here is that we need a real bio_iter,
>>> separate from bvec_iter, that also encapsulates iterating over integrity &
>>> fscrypt. 
>>
>> Not mention bio_iter, bvec_iter has been 32 bytes, which is too big to
>> hold in per-io data structure. With this patch, 8bytes is enough
>> to rewind one bio if the end sector is fixed.
> 
> Hold on though, does that check out? Why is that too big for per IO data
> structures?
> 
> By definition these structures are only for IOs in flight, and we don't _want_
> there to ever be very many of these or we're going to run into latency issues
> due to queue depth.

It's much less about using whatever amount of memory for inflight IO,
and much more about not bloating fast path structures (of which the bio
is certainly one). All of this gunk has to be initialized for each IO,
and that's the real issue.

Just look at the recent work for iov_iter and why ITER_UBUF makes sense
to do.

This is not a commentary on this patchset, just a general observation.
Sizes of hot data structures DO matter, and quite a bit so.
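
For context on the size argument, the 8-byte rewind state the quoted patch describes reduces to arithmetic along these lines. Standalone sketch with hypothetical names: since the end sector is fixed (splits happen from the front), saving "sectors to end" plus the size is enough to recompute the original position.

#include <stdio.h>
#include <stdint.h>

struct io_rewind_state {
	uint32_t sectors_to_end;	/* end_sector - bi_sector at save time */
	uint32_t size;			/* bi_size at save time, in bytes */
};

struct fake_iter {
	uint64_t bi_sector;
	uint32_t bi_size;
};

static void save_state(const struct fake_iter *it, uint64_t end_sector,
		       struct io_rewind_state *st)
{
	st->sectors_to_end = end_sector - it->bi_sector;
	st->size = it->bi_size;
}

static void rewind_to(struct fake_iter *it, uint64_t end_sector,
		      const struct io_rewind_state *st)
{
	it->bi_sector = end_sector - st->sectors_to_end;
	it->bi_size = st->size;
}

int main(void)
{
	struct fake_iter it = { .bi_sector = 1000, .bi_size = 64 * 512 };
	uint64_t end = it.bi_sector + it.bi_size / 512;
	struct io_rewind_state st;

	save_state(&it, end, &st);
	it.bi_sector += 16;		/* pretend 16 sectors were processed */
	it.bi_size -= 16 * 512;
	rewind_to(&it, end, &st);
	printf("rewound to sector %llu, size %u\n",
	       (unsigned long long)it.bi_sector, it.bi_size);
	return 0;
}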

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] clean up the chunk_sizehandling helpers a little

2022-06-17 Thread Jens Axboe
On Tue, 14 Jun 2022 11:09:28 +0200, Christoph Hellwig wrote:
> this series cleans up a bunch of block layer helpers related to the chunk
> size.
> 
> Diffstat:
>  block/blk-merge.c  |   28 
>  block/blk.h|   13 +
>  drivers/md/dm.c|   17 ++---
>  include/linux/blkdev.h |   39 +++
>  4 files changed, 42 insertions(+), 55 deletions(-)
> 
> [...]

Applied, thanks!

[1/6] block: factor out a chunk_size_left helper
  commit: d0e3310bb972f65c4b614c29f8022f57a52123c8
[2/6] dm: open code blk_max_size_offset in max_io_len
  commit: 6d5661a5d0e513dde5d49820315c5d6249a5c732
[3/6] block: open code blk_max_size_offset in blk_rq_get_max_sectors
  commit: 92ac28684e7eccf968b556893ca09c57d1fb3cdd
[4/6] block: cleanup variable naming in get_max_io_size
  commit: 08fdba80df1fd78a22b00e96ffd062a5bbaf8d8e
[5/6] block: fold blk_max_size_offset into get_max_io_size
  commit: d8f1d38c87b87ea3a0a0c58b6386333731e29470
[6/6] block: move blk_queue_get_max_sectors to blk.h
  commit: d8fca63495fb21e9b2dfcf722346aa844459139a
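
The arithmetic the factored-out helper boils down to, assuming a power-of-two chunk size, is small; the function name below is illustrative, not the one merged:

/*
 * Illustrative only: how many sectors remain before the next chunk
 * boundary at "offset", for a power-of-two chunk_sectors.
 */
static inline unsigned int chunk_sectors_left(sector_t offset,
					      unsigned int chunk_sectors)
{
	return chunk_sectors - (offset & (chunk_sectors - 1));
}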

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] bioset_exit poison from dm_destroy

2022-06-02 Thread Jens Axboe
On 6/2/22 2:12 AM, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 10:13:34AM -0400, Mike Snitzer wrote:
>> Please take the time to look at the code and save your judgement until
>> you do.  That said, I'm not in love with the complexity of how DM
>> handles bioset initialization.  But both you and Jens keep taking
>> shots at DM for doing things wrong without actually looking.
> 
> I'm not taking shots.  I'm just saying we should kill this API.  In
> the worse case the caller can keep track of if a bioset is initialized,
> but in most cases we should be able to deduct it in a nicer way.

Yeah ditto, it's more an observation on needing something like
this_foo_was_initialized() is just a bad way to program it to begin
with. The caller should know this already, rather than us needing
functions and state in the struct to tell you about it.

>> DM uses bioset_init_from_src().  Yet you've both assumed double frees
>> and such (while not entirely wrong your glossing over the detail that
>> there is intervening reinitialization of DM's biosets between the
>> bioset_exit()s)
> 
> And looking at the code, that use of bioset_init_from_src is completely
> broken.  It does not actually preallocated anything as intended by
> dm (maybe that isn't actually needed) but just uses the biosets in
> dm_md_mempools as an awkward way to carry parameters.  And completely
> loses bringing over the integrity allocations.  And no, this is not
> intended as a "cheap shot" against Jens who did that either..
> 
> This is what I think should fix this, and will allow us to remove
> bioset_init_from_src which was a bad idea from the start:

Based on a quick look, seems good to me.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] bioset_exit poison from dm_destroy

2022-06-01 Thread Jens Axboe
On 6/1/22 12:04 AM, Christoph Hellwig wrote:
> On Tue, May 31, 2022 at 02:58:00PM -0400, Mike Snitzer wrote:
>> Yes, we need the above to fix the crash.  Does it also make sense to
>> add this?
> 
> Can we just stop treating bio_sets so sloppily and make the callers
> handle their lifetime properly?  No one should have to use
> bioset_initialized (or double free bio_sets).

Yes, that was my point in the previous email - get rid of
bioset_initialized() and just fixup dm, which is the only user of it.
There really should be no need to have this kind of conditional checking
and initialization.
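
A kernel-style sketch of that caller-side pattern, using a hypothetical driver structure: the owner remembers whether bioset_init() succeeded, so bioset_exit() runs exactly once and no bioset_initialized()-style introspection is needed.

/* Hypothetical structure and function names; sketch only. */
struct my_dev {
	struct bio_set	bs;
	bool		bs_ready;
};

static int my_dev_setup_bioset(struct my_dev *d)
{
	int ret = bioset_init(&d->bs, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);

	if (ret)
		return ret;
	d->bs_ready = true;
	return 0;
}

static void my_dev_free_bioset(struct my_dev *d)
{
	if (d->bs_ready) {
		bioset_exit(&d->bs);
		d->bs_ready = false;
	}
}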

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] bioset_exit poison from dm_destroy

2022-05-31 Thread Jens Axboe
On 5/31/22 1:49 PM, Mike Snitzer wrote:
> On Tue, May 31 2022 at  3:00P -0400,
> Jens Axboe  wrote:
> 
>> On 5/31/22 12:58 PM, Mike Snitzer wrote:
>>> On Sun, May 29 2022 at  8:46P -0400,
>>> Jens Axboe  wrote:
>>>
>>>> On 5/28/22 6:17 PM, Matthew Wilcox wrote:
>>>>> Not quite sure whose bug this is.  Current Linus head running xfstests
>>>>> against ext4 (probably not ext4's fault?)
>>>>>
>>>>> 01818 generic/250 run fstests generic/250 at 2022-05-28 23:48:09
>>>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>>>> mode: none.
>>>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>>>> mode: none.
>>>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>>>> mode: none.
>>>>> 01818 Buffer I/O error on dev dm-0, logical block 367, async page read
>>>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>>>> mode: none.
>>>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>>>> 01818 general protection fault, probably for non-canonical address 
>>>>> 0xdead0122:  [#1] PREEMPT SMP NOPTI
>>>>> 01818 CPU: 0 PID: 1579117 Comm: dmsetup Kdump: loaded Not tainted 
>>>>> 5.18.0-11049-g1dec3d7fd0c3-dirty #262
>>>>> 01818 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 
>>>>> 04/01/2014
>>>>> 01818 RIP: 0010:__cpuhp_state_remove_instance+0xf0/0x1b0
>>>>> 01818 Code: a0 f8 d7 81 42 3b 1c 28 7f d9 4c 89 e1 31 d2 89 de 89 7d dc 
>>>>> e8 01 fd ff ff 8b 7d dc eb c5 49 8b 04 24 49 8b 54 24 08 48 85 c0 <48> 89 
>>>>> 02 74 04 48 89 50 08 48 b8 00 01 00 00 00 00 ad de 48 c7 c7
>>>>> 01818 RSP: 0018:888101fcfc60 EFLAGS: 00010286
>>>>> 01818 RAX: dead0100 RBX: 0017 RCX: 
>>>>> 01818 RDX: dead0122 RSI: 8881233b0ae8 RDI: 81e3b080
>>>>> 01818 RBP: 888101fcfc88 R08: 0008 R09: 0003
>>>>> 01818 R10:  R11: 002dc6c0 R12: 8881233b0ae8
>>>>> 01818 R13:  R14: 81e38f58 R15: 88817b5a3c00
>>>>> 01818 FS:  7ff56daec280() GS:88827580() 
>>>>> knlGS:
>>>>> 01818 CS:  0010 DS:  ES:  CR0: 80050033
>>>>> 01818 CR2: 5591ad94f198 CR3: 00017b5a0004 CR4: 00770eb0
>>>>> 01818 PKRU: 5554
>>>>> 01818 Call Trace:
>>>>> 01818  
>>>>> 01818  ? kfree+0x66/0x250
>>>>> 01818  bioset_exit+0x32/0x210
>>>>> 01818  cleanup_mapped_device+0x34/0xf0
>>>>> 01818  __dm_destroy+0x149/0x1f0
>>>>> 01818  ? table_clear+0xc0/0xc0
>>>>> 01818  dm_destroy+0xe/0x10
>>>>> 01818  dev_remove+0xd9/0x120
>>>>> 01818  ctl_ioctl+0x1cb/0x420
>>>>> 01818  dm_ctl_ioctl+0x9/0x10
>>>>> 01818  __x64_sys_ioctl+0x89/0xb0
>>>>> 01818  do_syscall_64+0x35/0x80
>>>>> 01818  entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>>>> 01818 RIP: 0033:0x7ff56de3b397
>>>>> 01818 Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 
>>>>> 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 
>>>>> 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
>>>>> 01818 RSP: 002b:7ffe55367ef8 EFLAGS: 0206 ORIG_RAX: 
>>>>> 0010
>>>>> 01818 RAX: ffda RBX: 7ff56df31a8e RCX: 7ff56de3b397
>>>>> 01818 RDX: 55daad7cab30 RSI: c138fd04 RDI: 0003
>>>>> 01818 RBP: 7ffe55367fb0 R08: 7ff56df81558 R09: 7ffe55367d60
>>>>> 01818 R10: 7ff56df808a2 R11: 0206 R12: 7ff56df808a2
>>>>> 01818 R13: 7ff56df808a2 R14: 7ff56df808a2 R15: 7ff56df808a2
>>>>> 01818  
>>>>> 01818 Modules linked in: crct10dif_generic crct10dif_common [last 
>>>>> unloaded: crc_t10dif]
>>>>
>>>> I suspect dm is calling bioset_exit() multiple times? Which it probably
>>>> should not.

Re: [dm-devel] bioset_exit poison from dm_destroy

2022-05-31 Thread Jens Axboe
On 5/31/22 12:58 PM, Mike Snitzer wrote:
> On Sun, May 29 2022 at  8:46P -0400,
> Jens Axboe  wrote:
> 
>> On 5/28/22 6:17 PM, Matthew Wilcox wrote:
>>> Not quite sure whose bug this is.  Current Linus head running xfstests
>>> against ext4 (probably not ext4's fault?)
>>>
>>> 01818 generic/250   run fstests generic/250 at 2022-05-28 23:48:09
>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>> mode: none.
>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>> mode: none.
>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>> mode: none.
>>> 01818 Buffer I/O error on dev dm-0, logical block 367, async page read
>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota 
>>> mode: none.
>>> 01818 EXT4-fs (dm-0): unmounting filesystem.
>>> 01818 general protection fault, probably for non-canonical address 
>>> 0xdead0122:  [#1] PREEMPT SMP NOPTI
>>> 01818 CPU: 0 PID: 1579117 Comm: dmsetup Kdump: loaded Not tainted 
>>> 5.18.0-11049-g1dec3d7fd0c3-dirty #262
>>> 01818 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 
>>> 04/01/2014
>>> 01818 RIP: 0010:__cpuhp_state_remove_instance+0xf0/0x1b0
>>> 01818 Code: a0 f8 d7 81 42 3b 1c 28 7f d9 4c 89 e1 31 d2 89 de 89 7d dc e8 
>>> 01 fd ff ff 8b 7d dc eb c5 49 8b 04 24 49 8b 54 24 08 48 85 c0 <48> 89 02 
>>> 74 04 48 89 50 08 48 b8 00 01 00 00 00 00 ad de 48 c7 c7
>>> 01818 RSP: 0018:888101fcfc60 EFLAGS: 00010286
>>> 01818 RAX: dead0100 RBX: 0017 RCX: 
>>> 01818 RDX: dead0122 RSI: 8881233b0ae8 RDI: 81e3b080
>>> 01818 RBP: 888101fcfc88 R08: 0008 R09: 0003
>>> 01818 R10:  R11: 002dc6c0 R12: 8881233b0ae8
>>> 01818 R13:  R14: 81e38f58 R15: 88817b5a3c00
>>> 01818 FS:  7ff56daec280() GS:88827580() 
>>> knlGS:
>>> 01818 CS:  0010 DS:  ES:  CR0: 80050033
>>> 01818 CR2: 5591ad94f198 CR3: 00017b5a0004 CR4: 00770eb0
>>> 01818 PKRU: 5554
>>> 01818 Call Trace:
>>> 01818  
>>> 01818  ? kfree+0x66/0x250
>>> 01818  bioset_exit+0x32/0x210
>>> 01818  cleanup_mapped_device+0x34/0xf0
>>> 01818  __dm_destroy+0x149/0x1f0
>>> 01818  ? table_clear+0xc0/0xc0
>>> 01818  dm_destroy+0xe/0x10
>>> 01818  dev_remove+0xd9/0x120
>>> 01818  ctl_ioctl+0x1cb/0x420
>>> 01818  dm_ctl_ioctl+0x9/0x10
>>> 01818  __x64_sys_ioctl+0x89/0xb0
>>> 01818  do_syscall_64+0x35/0x80
>>> 01818  entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>> 01818 RIP: 0033:0x7ff56de3b397
>>> 01818 Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 
>>> 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 
>>> f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
>>> 01818 RSP: 002b:7ffe55367ef8 EFLAGS: 0206 ORIG_RAX: 0010
>>> 01818 RAX: ffda RBX: 7ff56df31a8e RCX: 7ff56de3b397
>>> 01818 RDX: 55daad7cab30 RSI: c138fd04 RDI: 0003
>>> 01818 RBP: 7ffe55367fb0 R08: 7ff56df81558 R09: 7ffe55367d60
>>> 01818 R10: 7ff56df808a2 R11: 0206 R12: 7ff56df808a2
>>> 01818 R13: 7ff56df808a2 R14: 7ff56df808a2 R15: 7ff56df808a2
>>> 01818  
>>> 01818 Modules linked in: crct10dif_generic crct10dif_common [last unloaded: 
>>> crc_t10dif]
>>
>> I suspect dm is calling bioset_exit() multiple times? Which it probably
>> should not.
>>
>> The reset of bioset_exit() is resilient against this, so might be best
>> to include bio_alloc_cache_destroy() in that.
>>
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index a3893d80dccc..be3937b84e68 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -722,6 +722,7 @@ static void bio_alloc_cache_destroy(struct bio_set *bs)
>>  bio_alloc_cache_prune(cache, -1U);
>>  }
>>  free_percpu(bs->cache);
>> +bs->cache = NULL;
>>  }
>>  
>>  /**
> 
> Yes, we need the above to fix the crash.  Does it also make sense to
> add this?
> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 49eff01fb829..f410c78e9c0c 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -681,7 +681,7 @@ struct bio_set {
> 
>  static inline bool bioset_initialized(struct bio_set *bs)
>  {
> - return bs->bio_slab != NULL;
> + return (bs->bio_slab != NULL || bs->cache != NULL);
>  }

Should not be possible to have valid bs->cache without bs->bio_slab?


-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] bioset_exit poison from dm_destroy

2022-05-29 Thread Jens Axboe
On 5/28/22 6:17 PM, Matthew Wilcox wrote:
> Not quite sure whose bug this is.  Current Linus head running xfstests
> against ext4 (probably not ext4's fault?)
> 
> 01818 generic/250 run fstests generic/250 at 2022-05-28 23:48:09
> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: 
> none.
> 01818 EXT4-fs (dm-0): unmounting filesystem.
> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: 
> none.
> 01818 EXT4-fs (dm-0): unmounting filesystem.
> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: 
> none.
> 01818 Buffer I/O error on dev dm-0, logical block 367, async page read
> 01818 EXT4-fs (dm-0): unmounting filesystem.
> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: 
> none.
> 01818 EXT4-fs (dm-0): unmounting filesystem.
> 01818 general protection fault, probably for non-canonical address 
> 0xdead0122:  [#1] PREEMPT SMP NOPTI
> 01818 CPU: 0 PID: 1579117 Comm: dmsetup Kdump: loaded Not tainted 
> 5.18.0-11049-g1dec3d7fd0c3-dirty #262
> 01818 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 
> 04/01/2014
> 01818 RIP: 0010:__cpuhp_state_remove_instance+0xf0/0x1b0
> 01818 Code: a0 f8 d7 81 42 3b 1c 28 7f d9 4c 89 e1 31 d2 89 de 89 7d dc e8 01 
> fd ff ff 8b 7d dc eb c5 49 8b 04 24 49 8b 54 24 08 48 85 c0 <48> 89 02 74 04 
> 48 89 50 08 48 b8 00 01 00 00 00 00 ad de 48 c7 c7
> 01818 RSP: 0018:888101fcfc60 EFLAGS: 00010286
> 01818 RAX: dead0100 RBX: 0017 RCX: 
> 01818 RDX: dead0122 RSI: 8881233b0ae8 RDI: 81e3b080
> 01818 RBP: 888101fcfc88 R08: 0008 R09: 0003
> 01818 R10:  R11: 002dc6c0 R12: 8881233b0ae8
> 01818 R13:  R14: 81e38f58 R15: 88817b5a3c00
> 01818 FS:  7ff56daec280() GS:88827580() 
> knlGS:
> 01818 CS:  0010 DS:  ES:  CR0: 80050033
> 01818 CR2: 5591ad94f198 CR3: 00017b5a0004 CR4: 00770eb0
> 01818 PKRU: 5554
> 01818 Call Trace:
> 01818  
> 01818  ? kfree+0x66/0x250
> 01818  bioset_exit+0x32/0x210
> 01818  cleanup_mapped_device+0x34/0xf0
> 01818  __dm_destroy+0x149/0x1f0
> 01818  ? table_clear+0xc0/0xc0
> 01818  dm_destroy+0xe/0x10
> 01818  dev_remove+0xd9/0x120
> 01818  ctl_ioctl+0x1cb/0x420
> 01818  dm_ctl_ioctl+0x9/0x10
> 01818  __x64_sys_ioctl+0x89/0xb0
> 01818  do_syscall_64+0x35/0x80
> 01818  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 01818 RIP: 0033:0x7ff56de3b397
> 01818 Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 
> e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff 
> ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
> 01818 RSP: 002b:7ffe55367ef8 EFLAGS: 0206 ORIG_RAX: 0010
> 01818 RAX: ffda RBX: 7ff56df31a8e RCX: 7ff56de3b397
> 01818 RDX: 55daad7cab30 RSI: c138fd04 RDI: 0003
> 01818 RBP: 7ffe55367fb0 R08: 7ff56df81558 R09: 7ffe55367d60
> 01818 R10: 7ff56df808a2 R11: 0206 R12: 7ff56df808a2
> 01818 R13: 7ff56df808a2 R14: 7ff56df808a2 R15: 7ff56df808a2
> 01818  
> 01818 Modules linked in: crct10dif_generic crct10dif_common [last unloaded: 
> crc_t10dif]

I suspect dm is calling bioset_exit() multiple times? Which it probably
should not.

The reset of bioset_exit() is resilient against this, so might be best
to include bio_alloc_cache_destroy() in that.


diff --git a/block/bio.c b/block/bio.c
index a3893d80dccc..be3937b84e68 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -722,6 +722,7 @@ static void bio_alloc_cache_destroy(struct bio_set *bs)
bio_alloc_cache_prune(cache, -1U);
}
free_percpu(bs->cache);
+   bs->cache = NULL;
 }
 
 /**

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] fix and cleanup discard_alignment handling

2022-05-03 Thread Jens Axboe
On Mon, 18 Apr 2022 06:53:03 +0200, Christoph Hellwig wrote:
> the somewhat confusing name of the discard_alignment queue limit, that
> really is an offset for the discard granularity mislead a lot of driver
> authors to set it to an incorrect value.  This series tries to fix up
> all these cases.
> 
> Diffstat:
>  arch/um/drivers/ubd_kern.c |1 -
>  drivers/block/loop.c   |1 -
>  drivers/block/nbd.c|3 ---
>  drivers/block/null_blk/main.c  |1 -
>  drivers/block/rnbd/rnbd-srv-dev.h  |2 +-
>  drivers/block/virtio_blk.c |7 ---
>  drivers/block/xen-blkback/xenbus.c |4 ++--
>  drivers/md/dm-zoned-target.c   |2 +-
>  drivers/md/raid5.c |1 -
>  drivers/nvme/host/core.c   |1 -
>  drivers/s390/block/dasd_fba.c  |1 -
>  11 files changed, 8 insertions(+), 16 deletions(-)
> 
> [...]

Applied, thanks!

[01/11] ubd: don't set the discard_alignment queue limit
commit: 07c6e92a8478770a7302f7dde72f03a5465901bd
[02/11] nbd: don't set the discard_alignment queue limit
commit: 4a04d517c56e0616c6f69afc226ee2691e543712
[03/11] null_blk: don't set the discard_alignment queue limit
commit: fb749a87f4536d2fa86ea135ae4eff1072903438
[04/11] virtio_blk: fix the discard_granularity and discard_alignment queue 
limits
commit: 62952cc5bccd89b76d710de1d0b43244af0f2903
[05/11] dm-zoned: don't set the discard_alignment queue limit
commit: 44d583702f4429763c558624fac763650a1f05bf
[06/11] raid5: don't set the discard_alignment queue limit
commit: 3d50d368c92ade2f98a3d0d28b842a57c35284e9
[07/11] dasd: don't set the discard_alignment queue limit
commit: c3f765299632727fa5ea5a0acf118665227a4f1a
[08/11] loop: remove a spurious clear of discard_alignment
commit: 4418bfd8fb9602d9cd8747c3ad52fdbaa02e2ffd
[09/11] nvme: remove a spurious clear of discard_alignment
commit: 4e7f0ece41e1be8f876f320a0972a715daec0a50
[10/11] rnbd-srv: use bdev_discard_alignment
commit: 18292faa89d2bff3bdd33ab9c065f45fb6710e47
[11/11] xen-blkback: use bdev_discard_alignment
commit: c899b23533866910c90ef4386b501af50270d320
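
A kernel-style sketch of the mistake being removed, with an illustrative driver function name: discard_alignment is an offset relative to discard_granularity, not a second copy of it, so a device whose discard units start at LBA 0 should leave it at zero.

static void my_driver_config_discard(struct request_queue *q)
{
	blk_queue_max_discard_sectors(q, UINT_MAX);
	q->limits.discard_granularity = queue_logical_block_size(q);

	/*
	 * Wrong (the pattern this series removes): discard_alignment is an
	 * offset into the granularity, not the granularity itself.
	 *
	 * q->limits.discard_alignment = queue_logical_block_size(q);
	 */
}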

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] use block_device based APIs in block layer consumers v3

2022-04-18 Thread Jens Axboe
On Fri, 15 Apr 2022 06:52:31 +0200, Christoph Hellwig wrote:
> this series cleanups up the block layer API so that APIs consumed
> by file systems are (almost) only struct block_devic based, so that
> file systems don't have to poke into block layer internals like the
> request_queue.
> 
> I also found a bunch of existing bugs related to partition offsets
> and discard so these are fixed while going along.
> 
> [...]

Applied, thanks!

[01/27] target: remove an incorrect unmap zeroes data deduction
commit: 179d8609d8424529e95021df939ed7b0b82b37f1
[02/27] target: pass a block_device to target_configure_unmap_from_queue
commit: 817e8b51eb3d927ce6d56ecf9f48bc3c5b26168b
[03/27] target: fix discard alignment on partitions
commit: 968786b9ef56e75e0109158a4936ea962c1e
[04/27] drbd: remove assign_p_sizes_qlim
commit: 40349d0e16cedd0de561f59752c3249780fb749b
[05/27] drbd: use bdev based limit helpers in drbd_send_sizes
commit: 7a38acce229685968b770d1d9e64e01396b93643
[06/27] drbd: use bdev_alignment_offset instead of queue_alignment_offset
commit: c6f23b1a05441a26f765e59dd95e8ba7354f9388
[07/27] drbd: cleanup decide_on_discard_support
commit: 998e9cbcd615e5e6a7baa69e673ee845f812744e
[08/27] btrfs: use bdev_max_active_zones instead of open coding it
commit: c1e7b24416400ef13ff92a1c60c336c9a2834d7b
[09/27] ntfs3: use bdev_logical_block_size instead of open coding it
commit: f09dac9afb8e3ce4b6485dbc091a9b9c742db023
[10/27] mm: use bdev_is_zoned in claim_swapfile
commit: 9964e674559b02619fee2012a56839624143d02e
[11/27] block: add a bdev_nonrot helper
commit: 10f0d2a517796b8f6dc04fb0cc3e49003ae6b0bc
[12/27] block: add a bdev_write_cache helper
commit: 08e688fdb8f7e862092ae64cee20bc8b463d1046
[13/27] block: add a bdev_fua helper
commit: a557e82e5a01826f902bd94fc925c03f253cb712
[14/27] block: add a bdev_stable_writes helper
commit: 36d254893aa6a6e204075c3cce94bb572ac32c04
[15/27] block: add a bdev_max_zone_append_sectors helper
commit: 2aba0d19f4d8c8929b4b3b94a9cfde2aa20e6ee2
[16/27] block: use bdev_alignment_offset in part_alignment_offset_show
commit: 64dcc7c2717395b7c83ffb10f040d3be795d03c1
[17/27] block: use bdev_alignment_offset in disk_alignment_offset_show
commit: 640f2a23911b8388989547f89d055afbb910b88e
[18/27] block: move bdev_alignment_offset and queue_limit_alignment_offset out 
of line
commit: 89098b075cb74a80083bc4ed6b71d0ee18b6898f
[19/27] block: remove queue_discard_alignment
commit: 4e1462ffe8998749884d61f91be251a7a8719677
[20/27] block: use bdev_discard_alignment in part_discard_alignment_show
commit: f0f975a4dde890bfe25ce17bf07a6495453988a4
[21/27] block: move {bdev,queue_limit}_discard_alignment out of line
commit: 5c4b4a5c6f11c869a57c6bd977143430bc9dc43d
[22/27] block: refactor discard bio size limiting
commit: e3cc28ea28b5f8794db2aed24f8a0282ad2e85a2
[23/27] block: add a bdev_max_discard_sectors helper
commit: cf0fbf894bb543f472f682c486be48298eccf199
[24/27] block: remove QUEUE_FLAG_DISCARD
commit: 70200574cc229f6ba038259e8142af2aa09e6976
[25/27] block: add a bdev_discard_granularity helper
commit: 7b47ef52d0a2025fd1408a8a0990933b8e1e510f
[26/27] block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD
commit: 44abff2c0b970ae3d310b97617525dc01f248d7c
[27/27] direct-io: remove random prefetches
commit: c22198e78d523c8fa079bbb70b2523bb6aa51849
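
A kernel-style sketch of the consumer-side pattern the series enables: a filesystem keys everything off the struct block_device it already holds instead of reaching into the request_queue. The fs_bdev_features structure and function name are illustrative; only the bdev_*() helpers come from this series.

struct fs_bdev_features {
	bool		rotational;
	bool		has_discard;
	bool		volatile_write_cache;
	unsigned int	discard_granularity;
};

static void fs_probe_bdev_features(struct block_device *bdev,
				   struct fs_bdev_features *f)
{
	f->rotational		= !bdev_nonrot(bdev);
	f->has_discard		= bdev_max_discard_sectors(bdev) != 0;
	f->volatile_write_cache	= bdev_write_cache(bdev) && !bdev_fua(bdev);
	f->discard_granularity	= bdev_discard_granularity(bdev);
}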

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] (subset) [dm-5.19 PATCH 00/21] dm: changes staged for 5.19

2022-04-18 Thread Jens Axboe
On Sun, 17 Apr 2022 22:27:12 -0400, Mike Snitzer wrote:
> This patchset reflects what I've staged in linux-dm.git's "dm-5.19"
> branch (also staged in "for-next" for linux-next):
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
> 
> It build's on jens/for-5.19/block branch (which is based on v5.18-rc3)
> 
> I can still make changes or add Reviewed-by:s, etc. So please feel
> free to review.
> 
> [...]

Applied, thanks!

[01/21] block: change exported IO accounting interface from gendisk to bdev
commit: 5f0614a55ecebdf55f1a17db0b5f6b787ed009f1

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [dm-5.19 PATCH 00/21] dm: changes staged for 5.19

2022-04-18 Thread Jens Axboe
On 4/17/22 9:16 PM, Mike Snitzer wrote:
> On Sun, Apr 17 2022 at 11:00P -0400,
> Damien Le Moal  wrote:
> 
>> On 4/18/22 11:27, Mike Snitzer wrote:
>>> Hi,
>>>
>>> This patchset reflects what I've staged in linux-dm.git's "dm-5.19"
>>> branch (also staged in "for-next" for linux-next):
>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
>>>
>>> It build's on jens/for-5.19/block branch (which is based on v5.18-rc3)
>>
>> Mike, thanks for posting. We will run this on zoned stuff to check.
> 
> OK, I appreciate it..
> 
>> Note that patches 13 to 20 are empty...
> 
> Not sure what's going on there... basically any patch that wasn't from
> me (so 1, 13-19) isn't showing up in patchwork or the dm-devel
> archive.

The patches I received look normal, fwiw.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] cleanup bio_kmalloc v3

2022-04-17 Thread Jens Axboe
On Wed, 6 Apr 2022 08:12:23 +0200, Christoph Hellwig wrote:
> this series finishes off the bio allocation interface cleanups by dealing
> with the weirdest member of the famility.  bio_kmalloc combines a kmalloc
> for the bio and bio_vecs with a hidden bio_init call and magic cleanup
> semantics.
> 
> This series moves a few callers away from bio_kmalloc and then turns
> bio_kmalloc into a simple wrapper for a slab allocation of a bio and the
> inline biovecs.  The callers need to manually call bio_init instead with
> all that entails and the magic that turns bio_put into a kfree goes away
> as well, allowing for a proper debug check in bio_put that catches
> accidental use on a bio_init()ed bio.
> 
> [...]

Applied, thanks!

[1/5] btrfs: simplify ->flush_bio handling
  commit: f9e69aa9ccd7e51c47b147e45e03987ea0ef9aa3
[2/5] squashfs: always use bio_kmalloc in squashfs_bio_read
  commit: 46a2d4ccc49903923506685a8368ca88312bbdc9
[3/5] target/pscsi: remove pscsi_get_bio
  commit: 7655db80932d95f501a0811544d9520ec720e38d
[4/5] block: turn bio_kmalloc into a simple kmalloc wrapper
  commit: 066ff571011d8416e903d3d4f1f41e0b5eb91e1d
[5/5] pktcdvd: stop using bio_reset
  commit: 852ad96cb03621f7995764b4b31cbff9801d8bcd

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 4/5] block: turn bio_kmalloc into a simple kmalloc wrapper

2022-03-31 Thread Jens Axboe
On 3/31/22 3:18 PM, Marek Szyprowski wrote:
> Hi Christoph,
> 
> On 08.03.2022 07:15, Christoph Hellwig wrote:
>> Remove the magic autofree semantics and require the callers to explicitly
>> call bio_init to initialize the bio.
>>
>> This allows bio_free to catch accidental bio_put calls on bio_init()ed
>> bios as well.
>>
>> Signed-off-by: Christoph Hellwig 
> 
> This patch, which landed in today's next-20220331 as commit 57c47b42f454 
> ("block: turn bio_kmalloc into a simple kmalloc wrapper"), breaks badly 
> all my test systems, which use squashfs initrd:

The series has been reverted on the block side, so next linux-next should
be fine again. We'll try again for 5.19.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] cleanup bio_kmalloc v2

2022-03-31 Thread Jens Axboe
On 3/31/22 10:40 AM, Christoph Hellwig wrote:
> This should fix it:

Let's drop this one for 5.18, it's also causing a few conflicts and
would probably be more suited for 5.19 at this point.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] (subset) [PATCH v3 0/3] block/dm: use BIOSET_PERCPU_CACHE from bio_alloc_bioset

2022-03-30 Thread Jens Axboe
On Thu, 24 Mar 2022 16:35:23 -0400, Mike Snitzer wrote:
> This v3 is a rebase of the previous v2 series ontop of the revised v2
> patch that Christoph provided.
> 
> Linus hasn't pulled the for-5.18/dm-changes branch yet, so the 3rd DM
> patch cannot be applied yet.  But feel free to pickup the first 2
> block patches for 5.19 and I'll rebase dm-5.19 on block accordingly.
> 
> [...]

Applied, thanks!

[1/3] block: allow using the per-cpu bio cache from bio_alloc_bioset
  commit: a147e4805855e34f8e1027b88baf59a7f7c8b8d3
[2/3] block: allow use of per-cpu bio alloc cache by block drivers
  commit: e866e4dbad251b4dd1e134c295afd862333864bc

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] cleanup bio_kmalloc v2

2022-03-30 Thread Jens Axboe
On Tue, 8 Mar 2022 07:15:46 +0100, Christoph Hellwig wrote:
> this series finishes off the bio allocation interface cleanups by dealing
> with the weirdest member of the famility.  bio_kmalloc combines a kmalloc
> for the bio and bio_vecs with a hidden bio_init call and magic cleanup
> semantics.
> 
> This series moves a few callers away from bio_kmalloc and then turns
> bio_kmalloc into a simple wrapper for a slab allocation of a bio and the
> inline biovecs.  The callers need to manually call bio_init instead with
> all that entails and the magic that turns bio_put into a kfree goes away
> as well, allowing for a proper debug check in bio_put that catches
> accidental use on a bio_init()ed bio.
> 
> [...]

Applied, thanks!

[1/5] btrfs: simplify ->flush_bio handling
  commit: 6978ffddd5bba44e6b7614d52868cf4954e0529b
[2/5] squashfs: always use bio_kmalloc in squashfs_bio_read
  commit: 88a39feabf25efbaec775ffb48ea240af198994e
[3/5] target/pscsi: remove pscsi_get_bio
  commit: bbccc65bd7c1b22f050b65d8171fbdd8d72cf39c
[4/5] block: turn bio_kmalloc into a simple kmalloc wrapper
  commit: 57c47b42f4545b5f8fa288f190c0d68f96bc477f
[5/5] pktcdvd: stop using bio_reset
  commit: 1292fb59f283e76f55843d94f066c2f0b91dfb7e

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] cleanup bio_kmalloc v2

2022-03-30 Thread Jens Axboe
On 3/30/22 8:29 AM, Christoph Hellwig wrote:
> I just noticed this didn't make it into the 5.18 queue.  Which is a
> bit sad as it leaves us with a rather inconsistent bio API in 5.18.

Let me take a look, we might still be able to make it...

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v2 0/4] block/dm: use BIOSET_PERCPU_CACHE from bio_alloc_bioset

2022-03-23 Thread Jens Axboe
On 3/23/22 1:45 PM, Mike Snitzer wrote:
> Hi Jens,
> 
> I ran with your suggestion and DM now sees a ~7% improvement in hipri
> bio polling with io_uring (using dm-linear on null_blk, IOPS went from
> 900K to 966K).
> 
> Christoph,
> 
> I tried to address your review of the previous set. Patch 1 and 2 can
> obviously be folded but I left them split out for review purposes.
> Feel free to see if these changes are meaningful for nvme's use.
> Happy for either you to take on iterating on these block changes
> further or you letting me know what changes you'd like made.

Ran the usual peak testing, and it's good for about a 20% improvement
for me. 5.6M -> 6.6M IOPS on a single core, dm-linear.
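
For anyone reproducing this, a minimal liburing sketch of the polled I/O pattern t/io_uring exercises with -p1: IORING_SETUP_IOPOLL plus an O_DIRECT read against the dm device. Error handling is trimmed and the 4k block size is assumed to match the device's logical block size.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <liburing.h>

#define BS 4096

int main(int argc, char **argv)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	void *buf;
	int fd;

	if (argc < 2)
		return 1;
	/* polled completions require O_DIRECT and a poll-capable queue */
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0 || posix_memalign(&buf, BS, BS))
		return 1;

	if (io_uring_queue_init(128, &ring, IORING_SETUP_IOPOLL))
		return 1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, BS, 0);
	io_uring_submit(&ring);

	/* with IOPOLL this spins on the device instead of waiting for an IRQ */
	if (io_uring_wait_cqe(&ring, &cqe) == 0) {
		printf("read returned %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}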

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v6 2/2] dm: support bio polling

2022-03-09 Thread Jens Axboe
On 3/9/22 9:00 PM, Ming Lei wrote:
> Looks improvement on 4k is small, is it caused by pcie bw limit?
> What is the IOPS when running the same t/io_uring on single optane
> directly?

Yes, see what you responded to higher up:

"which ends up being bw limited for me, because the devices aren't
linked gen4".

Some of them are, but the adapters are a bit janky and we often end up
with gen3 links and hence limited to ~3.2GB/sec per drive. But with the
bw limits even on gen4, you're roughly at about 1.5M IOPS per drive at
that point, so would expect lower percentage wise gains for 4k with
polling.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v6 2/2] dm: support bio polling

2022-03-09 Thread Jens Axboe
On 3/8/22 6:13 PM, Ming Lei wrote:
> On Tue, Mar 08, 2022 at 06:02:50PM -0700, Jens Axboe wrote:
>> On 3/7/22 11:53 AM, Mike Snitzer wrote:
>>> From: Ming Lei 
>>>
>>> Support bio(REQ_POLLED) polling in the following approach:
>>>
>>> 1) only support io polling on normal READ/WRITE, and other abnormal IOs
>>> still fallback to IRQ mode, so the target io is exactly inside the dm
>>> io.
>>>
>>> 2) hold one refcnt on io->io_count after submitting this dm bio with
>>> REQ_POLLED
>>>
>>> 3) support dm native bio splitting, any dm io instance associated with
>>> current bio will be added into one list which head is bio->bi_private
>>> which will be recovered before ending this bio
>>>
>>> 4) implement .poll_bio() callback, call bio_poll() on the single target
>>> bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call
>>> dm_io_dec_pending() after the target io is done in .poll_bio()
>>>
>>> 5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
>>> which is based on Jeffle's previous patch.
>>
>> It's not the prettiest thing in the world with the overlay on bi_private,
>> but at least it's nicely documented now.
>>
>> I would encourage you to actually test this on fast storage, should make
>> a nice difference. I can run this on a gen2 optane, it's 10x the IOPS
>> of what it was tested on and should help better highlight where it
>> makes a difference.
>>
>> If either of you would like that, then send me a fool proof recipe for
>> what should be setup so I have a poll capable dm device.
> 
> Follows steps for setup dm stripe over two nvmes, then run io_uring on
> the dm stripe dev.

Thanks! Much easier when I don't have to figure it out... Setup:

CPU: 12900K
Drives: 2x P5800X gen2 optane (~5M IOPS each at 512b)

Baseline kernel:

sudo taskset -c 10 t/io_uring -d128 -b512 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 
/dev/dm-0
Added file /dev/dm-0 (submitter 0)
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
submitter=0, tid=1004
IOPS=2794K, BW=1364MiB/s, IOS/call=31/30, inflight=(124)
IOPS=2793K, BW=1363MiB/s, IOS/call=31/31, inflight=(62)
IOPS=2789K, BW=1362MiB/s, IOS/call=31/30, inflight=(124)
IOPS=2779K, BW=1357MiB/s, IOS/call=31/31, inflight=(124)
IOPS=2780K, BW=1357MiB/s, IOS/call=31/31, inflight=(62)
IOPS=2779K, BW=1357MiB/s, IOS/call=31/31, inflight=(62)
^CExiting on signal
Maximum IOPS=2794K

generating about 500K ints/sec, and using 4k blocks:

sudo taskset -c 10 t/io_uring -d128 -b4096 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 
/dev/dm-0
Added file /dev/dm-0 (submitter 0)
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
submitter=0, tid=967
IOPS=1683K, BW=6575MiB/s, IOS/call=24/24, inflight=(93)
IOPS=1685K, BW=6584MiB/s, IOS/call=24/24, inflight=(124)
IOPS=1686K, BW=6588MiB/s, IOS/call=24/24, inflight=(124)
IOPS=1684K, BW=6581MiB/s, IOS/call=24/24, inflight=(93)
IOPS=1686K, BW=6589MiB/s, IOS/call=24/24, inflight=(124)
IOPS=1687K, BW=6593MiB/s, IOS/call=24/24, inflight=(128)
IOPS=1687K, BW=6590MiB/s, IOS/call=24/24, inflight=(93)
^CExiting on signal
Maximum IOPS=1687K

which ends up being bw limited for me, because the devices aren't linked
gen4. That's about 1.4M ints/sec.

With the patched kernel, same test:

sudo taskset -c 10 t/io_uring -d128 -b512 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 
/dev/dm-0
Added file /dev/dm-0 (submitter 0)
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
submitter=0, tid=989
IOPS=4151K, BW=2026MiB/s, IOS/call=16/15, inflight=(128)
IOPS=4159K, BW=2031MiB/s, IOS/call=15/15, inflight=(128)
IOPS=4193K, BW=2047MiB/s, IOS/call=15/15, inflight=(128)
IOPS=4191K, BW=2046MiB/s, IOS/call=15/15, inflight=(128)
IOPS=4202K, BW=2052MiB/s, IOS/call=15/15, inflight=(128)
^CExiting on signal
Maximum IOPS=4202K

with basically zero interrupts, and 4k:

sudo taskset -c 10 t/io_uring -d128 -b4096 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 
/dev/dm-0
Added file /dev/dm-0 (submitter 0)
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
submitter=0, tid=1015
IOPS=1706K, BW=MiB/s, IOS/call=15/15, inflight=(128)
IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
^CExiting on signal
Maximum IOPS=1706K

again with basically zero interrupts.

That's about a 50% improvement for polled IO. This is using 2 gen2
optanes, which are good for ~5M IOPS each. Using two threads on a single
core, baseline kernel:

sudo taskset -c 10,1

Re: [dm-devel] [PATCH v6 2/2] dm: support bio polling

2022-03-08 Thread Jens Axboe
On 3/7/22 11:53 AM, Mike Snitzer wrote:
> From: Ming Lei 
> 
> Support bio(REQ_POLLED) polling in the following approach:
> 
> 1) only support io polling on normal READ/WRITE, and other abnormal IOs
> still fallback to IRQ mode, so the target io is exactly inside the dm
> io.
> 
> 2) hold one refcnt on io->io_count after submitting this dm bio with
> REQ_POLLED
> 
> 3) support dm native bio splitting, any dm io instance associated with
> current bio will be added into one list which head is bio->bi_private
> which will be recovered before ending this bio
> 
> 4) implement .poll_bio() callback, call bio_poll() on the single target
> bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call
> dm_io_dec_pending() after the target io is done in .poll_bio()
> 
> 5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
> which is based on Jeffle's previous patch.

It's not the prettiest thing in the world with the overlay on bi_private,
but at least it's nicely documented now.

I would encourage you to actually test this on fast storage, should make
a nice difference. I can run this on a gen2 optane, it's 10x the IOPS
of what it was tested on and should help better highlight where it
makes a difference.

If either of you would like that, then send me a fool proof recipe for
what should be setup so I have a poll capable dm device.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v6 1/2] block: add ->poll_bio to block_device_operations

2022-03-08 Thread Jens Axboe
On 3/7/22 11:53 AM, Mike Snitzer wrote:
> From: Ming Lei 
> 
> Prepare for supporting IO polling for bio-based driver.
> 
> Add ->poll_bio callback so that bio-based driver can provide their own
> logic for polling bio.
> 
> Also fix ->submit_bio_bio typo in comment block above
> __submit_bio_noacct.

Assuming you want to take this through the dm tree:

Reviewed-by: Jens Axboe 

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] remove bio_devname

2022-03-07 Thread Jens Axboe
On Fri, 4 Mar 2022 19:00:55 +0100, Christoph Hellwig wrote:
> this series removes the bio_devname helper and just switches all users
> to use the %pg format string directly.
> 
> Diffstat
>  block/bio.c   |6 --
>  block/blk-core.c  |   25 +++--
>  drivers/block/pktcdvd.c   |9 +
>  drivers/md/dm-crypt.c |   10 --
>  drivers/md/dm-integrity.c |5 ++---
>  drivers/md/md-multipath.c |9 -
>  drivers/md/raid1.c|5 ++---
>  drivers/md/raid5-ppl.c|   13 -
>  fs/ext4/page-io.c |5 ++---
>  include/linux/bio.h   |2 --
>  10 files changed, 26 insertions(+), 63 deletions(-)
> 
> [...]

Applied, thanks!

[01/10] block: fix and cleanup bio_check_ro
commit: 57e95e4670d1126c103305bcf34a9442f49f6d6a
[02/10] block: remove handle_bad_sector
commit: ad740780bbc2fe37856f944dbbaff07aac9db9e3
[03/10] pktcdvd: remove a pointless debug check in pkt_submit_bio
commit: 47c426d5241795cfcd9be748c44d1b2e2987ce70
[04/10] dm-crypt: stop using bio_devname
commit: 66671719650085f92fd460d2a356c33f9198dd35
[05/10] dm-integrity: stop using bio_devname
commit: 0a806cfde82fcd1fb856864e33d17c68d1b51dee
[06/10] md-multipath: stop using bio_devname
commit: ee1925bd834418218c782c94e889f826d40b14d5
[07/10] raid1: stop using bio_devname
commit: ac483eb375fa4a815a515945a5456086c197430e
[08/10] raid5-ppl: stop using bio_devname
commit: c7dec4623c9cde20dad8de319d177ed6aa382aaa
[09/10] ext4: stop using bio_devname
commit: 734294e47a2ec48fd25dcf2d96cdf2c6c6740c00
[10/10] block: remove bio_devname
commit: 97939610b893de068c82c347d06319cd231a4602

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v5 2/2] dm: support bio polling

2022-03-06 Thread Jens Axboe
On 3/6/22 7:20 PM, Ming Lei wrote:
> On Sun, Mar 06, 2022 at 06:48:15PM -0700, Jens Axboe wrote:
>> On 3/6/22 2:29 AM, Christoph Hellwig wrote:
>>>> +/*
>>>> + * Reuse ->bi_end_io as hlist head for storing all dm_io instances
>>>> + * associated with this bio, and this bio's bi_end_io has to be
>>>> + * stored in one of 'dm_io' instance first.
>>>> + */
>>>> +static inline struct hlist_head *dm_get_bio_hlist_head(struct bio *bio)
>>>> +{
>>>> +  WARN_ON_ONCE(!(bio->bi_opf & REQ_DM_POLL_LIST));
>>>> +
>>>>>>> +  return (struct hlist_head *)&bio->bi_end_io;
>>>> +}
>>>
>>> So this reuse is what I really hated.  I still think we should be able
>>> to find space in the bio by creatively shifting fields around to just
>>> add the hlist there directly, which would remove the need for this
>>> override and more importantly the quite cumbersome saving and restoring
>>> of the end_io handler.
>>
>> If it's possible, then that would be preferable. But I don't think
>> that's going to be easy to do...
> 
> I agree, now basically there isn't gap inside bio, so either adding one
> new field or reusing one existed field...

There's no amount of re-arranging that'll free up 8 bytes, that's just
not happening. I'm not a huge fan of growing struct bio for that, and
the oddity here is mostly (to me) that ->bi_end_io is the one overlayed.
That would usually belong to the owner of the bio.

Maybe some commenting would help? Is bi_next available at this point?

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v5 2/2] dm: support bio polling

2022-03-06 Thread Jens Axboe
On 3/6/22 2:29 AM, Christoph Hellwig wrote:
>> +/*
>> + * Reuse ->bi_end_io as hlist head for storing all dm_io instances
>> + * associated with this bio, and this bio's bi_end_io has to be
>> + * stored in one of 'dm_io' instance first.
>> + */
>> +static inline struct hlist_head *dm_get_bio_hlist_head(struct bio *bio)
>> +{
>> +WARN_ON_ONCE(!(bio->bi_opf & REQ_DM_POLL_LIST));
>> +
>> +return (struct hlist_head *)&bio->bi_end_io;
>> +}
> 
> So this reuse is what I really hated.  I still think we should be able
> to find space in the bio by creatively shifting fields around to just
> add the hlist there directly, which would remove the need for this
> override and more importantly the quite cumbersome saving and restoring
> of the end_io handler.

If it's possible, then that would be preferable. But I don't think
that's going to be easy to do...

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
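
For readers following the ->bi_end_io reuse discussion above, a rough sketch of
the save/overlay/restore pattern is shown below.  The flag and the cast come
from the quoted patch; every other name is made up for illustration, and the dm
code that was eventually merged is more involved.  The essential rule is that
the original completion handler must be saved before the field is reused and
restored before bio_endio() runs.

/*
 * Sketch only: assumes the per-io structure is zero-initialized and that
 * all accesses happen from the submitting context, so no extra locking
 * is shown here.
 */
struct example_io {
	bio_end_io_t *saved_end_io;	/* original ->bi_end_io, if this io took it */
	struct hlist_node node;
};

static void example_add_io(struct bio *bio, struct example_io *io)
{
	struct hlist_head *head = (struct hlist_head *)&bio->bi_end_io;

	if (!(bio->bi_opf & REQ_DM_POLL_LIST)) {
		/* First io for this bio: save the handler, then reuse the field. */
		io->saved_end_io = bio->bi_end_io;
		bio->bi_opf |= REQ_DM_POLL_LIST;
		INIT_HLIST_HEAD(head);
	}
	hlist_add_head(&io->node, head);
}

static void example_restore_bio(struct bio *bio)
{
	struct hlist_head *head = (struct hlist_head *)&bio->bi_end_io;
	struct hlist_head list = *head;	/* copy out before overwriting the field */
	struct example_io *io;

	bio->bi_opf &= ~REQ_DM_POLL_LIST;
	bio->bi_end_io = NULL;
	hlist_for_each_entry(io, &list, node) {
		if (io->saved_end_io)
			bio->bi_end_io = io->saved_end_io;	/* put the handler back */
	}
}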



Re: [dm-devel] [PATCH v4 1/2] block: add ->poll_bio to block_device_operations

2022-03-04 Thread Jens Axboe
On 3/4/22 2:26 PM, Mike Snitzer wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 94bf37f8e61d..e739c6264331 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -985,10 +985,16 @@ int bio_poll(struct bio *bio, struct io_comp_batch 
> *iob, unsigned int flags)
>  
>   if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT))
>   return 0;
> - if (WARN_ON_ONCE(!queue_is_mq(q)))
> - ret = 0;/* not yet implemented, should not happen */
> - else
> + if (queue_is_mq(q)) {
>   ret = blk_mq_poll(q, cookie, iob, flags);
> + } else {
> + struct gendisk *disk = q->disk;
> +
> + if (disk && disk->fops->poll_bio)
> + ret = disk->fops->poll_bio(bio, iob, flags);
> + else
> + ret = !WARN_ON_ONCE(1);

This is an odd way to do it, would be a lot more readable as

ret = 0;
WARN_ON_ONCE(1);

if we even need that WARN_ON?

> diff --git a/block/genhd.c b/block/genhd.c
> index e351fac41bf2..eb43fa63ba47 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -410,6 +410,8 @@ int __must_check device_add_disk(struct device *parent, 
> struct gendisk *disk,
>   struct device *ddev = disk_to_dev(disk);
>   int ret;
>  
> + WARN_ON_ONCE(queue_is_mq(disk->queue) && disk->fops->poll_bio);

Also seems kind of useless, maybe at least combine it with failing to
add the disk. This is a "I'm developing some new driver or feature"
failure, and would be more visible that way. And if you do that, then
the WARN_ON_ONCE() seems pointless anyway, and I'd just do:

if (queue_is_mq(disk->queue) && disk->fops->poll_bio)
return -EINVAL;

or something like that, with a comment saying why that doesn't make any
sense.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
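
As a reference for the restructuring suggested above, a simplified version of
the dispatch could look like the sketch below.  This is only an illustration
(the real bio_poll() carries additional checks), with the ->poll_bio hook taken
from the quoted patch; the add_disk()-time check can then simply refuse the
invalid combination with -EINVAL instead of warning, as noted in the review.

/* Simplified sketch of the suggested structure, not the merged code. */
int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
{
	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
	int ret = 0;

	if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT))
		return 0;

	if (queue_is_mq(q)) {
		ret = blk_mq_poll(q, cookie, iob, flags);
	} else {
		struct gendisk *disk = q->disk;

		/* Bio-based path: poll only if the driver provides a hook. */
		if (disk && disk->fops->poll_bio)
			ret = disk->fops->poll_bio(bio, iob, flags);
	}
	blk_queue_exit(q);
	return ret;
}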



Re: [dm-devel] remove REQ_OP_WRITE_SAME v2

2022-02-16 Thread Jens Axboe
On 2/9/22 10:51 PM, Christoph Hellwig wrote:
> On Wed, Feb 09, 2022 at 01:00:26PM -0500, Martin K. Petersen wrote:
>>
>> Christoph,
>>
>>> Now that we are using REQ_OP_WRITE_ZEROES for all zeroing needs in the
>>> kernel there is very little use left for REQ_OP_WRITE_SAME.  We only
>>> have two callers left, and both just export optional protocol features
>>> to remote systems: DRBD and the target code.
>>
>> No particular objections from me. I had a half-baked series to do the
>> same thing.
>>
>> One thing I would like is to either pull this series through SCSI or do
>> the block pieces in a post merge branch because I'm about to post my
>> discard/zeroing rework and that's going to clash with your changes.
> 
> I'd be fine with taking this through the SCSI tree.  Or we can wait
> another merge window to make your life easier.

Let's just use the SCSI tree - I didn't check if it throws any conflicts
right now, so probably something to check upfront...

If things pan out, you can add my Acked-by to the series.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] make the blk-mq stacking interface optional

2022-02-16 Thread Jens Axboe
On Tue, 15 Feb 2022 11:05:35 +0100, Christoph Hellwig wrote:
> this series requires an explicit select to use the blk-mq stacking
> interfaces.  This means they don't get build without dm support, and
> thus the buildbot should catch abuses like the one we had in the ufs
> driver more easily.  And while I touched this code I also ended up
> cleaning up various loose ends.
> 
> Diffstat:
>  block/Kconfig  |3 +++
>  block/blk-mq.c |   45 +
>  drivers/md/Kconfig |1 +
>  drivers/md/dm-rq.c |   26 +-
>  include/linux/blk-mq.h |3 +--
>  5 files changed, 23 insertions(+), 55 deletions(-)
> 
> [...]

Applied, thanks!

[1/5] blk-mq: make the blk-mq stacking code optional
  (no commit info)
[2/5] blk-mq: fold blk_cloned_rq_check_limits into blk_insert_cloned_request
  (no commit info)
[3/5] blk-mq: remove the request_queue argument to blk_insert_cloned_request
  (no commit info)
[4/5] dm: remove useless code from dm_dispatch_clone_request
  (no commit info)
[5/5] dm: remove dm_dispatch_clone_request
  (no commit info)

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] improve the bio cloning interface v2

2022-02-04 Thread Jens Axboe
On Wed, 2 Feb 2022 17:00:56 +0100, Christoph Hellwig wrote:
> this series changes the bio cloning interface to match the rest changes
> to the bio allocation interface and passes the block_device and operation
> to the cloning helpers.  In addition it renames the cloning helpers to
> be more descriptive.
> 
> To get there it requires a bit of refactoring in the device mapper code.
> 
> [...]

Applied, thanks!

[01/13] drbd: set ->bi_bdev in drbd_req_new
commit: c347a787e34cba0e5a80a04082dacaf259105605
[02/13] dm: add a clone_to_tio helper
commit: 6c23f0bd7f16d88c774db37b30c5da82811c41be
[03/13] dm: fold clone_bio into __clone_and_map_data_bio
commit: b1bee79237ce0ab43ef7fe66aa6e5c4783165012
[04/13] dm: fold __send_duplicate_bios into __clone_and_map_simple_bio
commit: 8eabf5d0a7bd9226d6cc25402dde67f372aae838
[05/13] dm: move cloning the bio into alloc_tio
commit: dc8e2021da71f6b2d5971f98ee3e528cf30c409c
[06/13] dm: pass the bio instead of tio to __map_bio
commit: 1561b396106d759fdf5f9a71b412e068f74d2cc9
[07/13] dm: retun the clone bio from alloc_tio
commit: 1d1068cecff70cb8e48c7cb0ba27cc3fd906eb31
[08/13] dm: simplify the single bio fast path in __send_duplicate_bios
commit: 891fced644a7529bfd4b1436b2341527ce8f68ad
[09/13] dm-cache: remove __remap_to_origin_clear_discard
commit: 3c4b455ef8acdacd0e5ecd33428d4f32f861637a
[10/13] block: clone crypto and integrity data in __bio_clone_fast
commit: 56b4b5abcdab6daf71c5536fca2772f178590e06
[11/13] dm: use bio_clone_fast in alloc_io/alloc_tio
commit: 92986f6b4c8a2c24d3a36b80140624f80fd93de4
[12/13] block: initialize the target bio in __bio_clone_fast
commit: a0e8de798dd6710a69d69ec57b246a0e34c4a695
[13/13] block: pass a block_device to bio_clone_fast
commit: abfc426d1b2fb2176df59851a64223b58ddae7e7

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [RFC PATCH 1/3] block: add copy offload support

2022-02-04 Thread Jens Axboe
On 2/4/22 5:09 AM, Mikulas Patocka wrote:
> 
> 
> On Thu, 3 Feb 2022, Bart Van Assche wrote:
> 
>> On 2/3/22 10:50, Mikulas Patocka wrote:
>>> On Tue, 1 Feb 2022, Bart Van Assche wrote:
>>>> On 2/1/22 10:32, Mikulas Patocka wrote:
>>>>>/**
>>>>> + * blk_queue_max_copy_sectors - set maximum copy offload sectors for
>>>>> the
>>>>> queue
>>>>> + * @q:  the request queue for the device
>>>>> + * @size:  the maximum copy offload sectors
>>>>> + */
>>>>> +void blk_queue_max_copy_sectors(struct request_queue *q, unsigned int
>>>>> size)
>>>>> +{
>>>>> + q->limits.max_copy_sectors = size;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors);
>>>>
>>>> Please either change the unit of 'size' into bytes or change its type into
>>>> sector_t.
>>>
>>> blk_queue_chunk_sectors, blk_queue_max_discard_sectors,
>>> blk_queue_max_write_same_sectors, blk_queue_max_write_zeroes_sectors,
>>> blk_queue_max_zone_append_sectors also have the unit of sectors and the
>>> argument is "unsigned int". Should blk_queue_max_copy_sectors be
>>> different?
>>
>> As far as I know using the type sector_t for variables that represent a 
>> number
>> of sectors is a widely followed convention:
>>
>> $ git grep -w sector_t | wc -l
>> 2575
>>
>> I would appreciate it if that convention would be used consistently, even if
>> that means modifying existing code.
>>
>> Thanks,
>>
>> Bart.
> 
> Changing the sector limit variables in struct queue_limits from
> unsigned int to sector_t would increase the size of the structure and
> its cache footprint.
> 
> And we can't send bios larger than 4GiB anyway because bi_size is
> 32-bit.
> 
> Jens, what do you think about it? Should the sectors limits be
> sector_t?

Why make it larger than it needs to?

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] improve the bio allocation interface v2

2022-02-02 Thread Jens Axboe
On Mon, 24 Jan 2022 10:10:48 +0100, Christoph Hellwig wrote:
> this series is posted early because it has wide-ranging changes and could use 
> some
> early ACKs before -rc1.
> 
> It changes the interface to the bio allocators to always pass a block_device 
> and
> the operation, which is information needed for every bio submitted through
> bio_submit.  This means the fields can be directly initialized in bio_init 
> instead
> of first being zeroed and thus should help to micro-optimize even better than 
> the
> __bio_set_dev that Pavel proposed while also cleaning up code.
> 
> [...]

Applied, thanks!

[01/19] fs: remove mpage_alloc
commit: d5f68a42da7a4516e7503c281a54a58727f07dc3
[02/19] nilfs2: remove nilfs_alloc_seg_bio
commit: f0d911927b3c7cf5f9edb5941d0287144a602d0d
[03/19] nfs/blocklayout: remove bl_alloc_init_bio
commit: 5d2ca2132f889bc2c90d6d07fc9fc129cfee8955
[04/19] ntfs3: remove ntfs_alloc_bio
commit: 39146b6f66ba5c107d5c5758a17f290846165b4d
[05/19] dm: bio_alloc can't fail if it is allowed to sleep
commit: 53db984e004c7116ce69e2f4a163664453336ae1
[06/19] dm-crypt: remove clone_init
commit: 3f868c09ea8f40f800c4c644c072d91c9eee0d71
[07/19] dm-snap: use blkdev_issue_flush instead of open coding it
commit: eba33b8ef1b90d8996eceb0569c06a4f784ef2b5
[08/19] dm-thin: use blkdev_issue_flush instead of open coding it
commit: 28d7d128aad5cd2178b158900d58365d1fd3de94
[09/19] drbd: bio_alloc can't fail if it is allow to sleep
commit: 4b1dc86d1857f1007865cab759f2285280692eee
[10/19] rnbd-srv: simplify bio mapping in process_rdma
commit: 1fe0640ff94feae6d21417e2f4f2829b882274b1
[11/19] rnbd-srv: remove struct rnbd_dev_blk_io
commit: d7b78de2b1552e3e7ce3a069f075cc2729aa5c34
[12/19] xen-blkback: bio_alloc can't fail if it is allow to sleep
commit: 7d8d0c658d48705fca35238a8ff601b7c5cbc0de
[13/19] block: move blk_next_bio to bio.c
commit: 3b005bf6acf009abd700e2c652c86e5c209cf63d
[14/19] block: pass a block_device and opf to blk_next_bio
commit: 0a3140ea0fae377c9eaa031b7db1670ae422ed47
[15/19] block: pass a block_device and opf to bio_alloc_bioset
commit: 609be1066731fea86436f5f91022f82e592ab456
[16/19] block: pass a block_device and opf to bio_alloc_kiocb
commit: b77c88c2100ce6a5ec8126c13599b5a7f6663e32
[17/19] block: pass a block_device and opf to bio_alloc
commit: 07888c665b405b1cd3577ddebfeb74f4717a84c4
[18/19] block: pass a block_device and opf to bio_init
commit: 49add4966d79244013fce35f95c6833fae82b8b1
[19/19] block: pass a block_device and opf to bio_reset
commit: a7c50c940477bae89fb2b4f51bd969a2d95d7512

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
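
To illustrate the interface change this series makes, the general before/after
shape of a caller is sketched below with hypothetical driver code; the argument
order matches the bio_alloc() conversion in these patches, but treat it as an
illustration rather than a reference.

/* Hypothetical caller, before this series: allocate, then set device/op. */
static struct bio *example_prep_write_old(struct block_device *bdev,
					  unsigned short nr_vecs)
{
	struct bio *bio = bio_alloc(GFP_KERNEL, nr_vecs);

	bio_set_dev(bio, bdev);
	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
	return bio;
}

/* The same caller after this series: device and opf are passed at allocation
 * time and initialized directly in bio_init(). */
static struct bio *example_prep_write_new(struct block_device *bdev,
					  unsigned short nr_vecs)
{
	return bio_alloc(bdev, nr_vecs, REQ_OP_WRITE | REQ_SYNC, GFP_KERNEL);
}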



Re: [dm-devel] [PATCH v4 0/3] block/dm: fix bio-based DM IO accounting

2022-01-28 Thread Jens Axboe
On Fri, 28 Jan 2022 10:58:38 -0500, Mike Snitzer wrote:
> [this v4 is final iteration, should be "ready"...]
> 
> Hi Jens,
> 
> Just over 3 years ago, with commit a1e1cb72d9649 ("dm: fix redundant
> IO accounting for bios that need splitting") I focused too narrowly on
> fixing the reported potential for redundant accounting for IO totals.
> Which, at least mentally for me, papered over how inaccurate all other
> bio-based DM's IO accounting is for bios that get split.
> 
> [...]

Applied, thanks!

[1/3] block: add bio_start_io_acct_time() to control start_time
  commit: 5a6cd1d29f2104bd0306a0f839c8b328395b784f
[2/3] dm: revert partial fix for redundant bio-based IO accounting
  commit: b6e31a39c63e0214937c8c586faa10122913e935
[3/3] dm: properly fix redundant bio-based IO accounting
  commit: 3c4ae3478082388ae9680a932d6bfd54c10fca0d

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] improve the bio allocation interface

2022-01-19 Thread Jens Axboe
On 1/18/22 12:19 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series is posted early because it has wide-ranging changes and
> could use some early ACKs before -rc1.
> 
> It changes the interface to the bio allocators to always pass a
> block_device and the operation, which is information needed for every
> bio submitted through bio_submit.  This means the fields can be
> directly initialized in bio_init instead of first being zeroed and
> thus should help to micro-optimize even better than the __bio_set_dev
> that Pavel proposed while also cleaning up code.

Looks pretty straight forward from the block core point of view. Didn't
look too closely at the fs/driver changes yet.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 2/3] scsi: make sure that request queue queiesce and unquiesce balanced

2021-11-02 Thread Jens Axboe
On 11/2/21 8:47 AM, James Bottomley wrote:
> On Tue, 2021-11-02 at 08:41 -0600, Jens Axboe wrote:
>> On 11/2/21 8:36 AM, Jens Axboe wrote:
>>> On 11/2/21 8:33 AM, James Bottomley wrote:
>>>> On Tue, 2021-11-02 at 06:59 -0600, Jens Axboe wrote:
>>>>> On 11/1/21 7:43 PM, James Bottomley wrote:
>>>>>> On Thu, 2021-10-21 at 22:59 +0800, Ming Lei wrote:
>>>>>>> For fixing queue quiesce race between driver and block
>>>>>>> layer(elevator switch, update nr_requests, ...), we need to
>>>>>>> support concurrent quiesce and unquiesce, which requires
>>>>>>> the two
>>>>>>> call balanced.
>>>>>>>
>>>>>>> It isn't easy to audit that in all scsi drivers, especially
>>>>>>> the two may be called from different contexts, so do it in
>>>>>>> scsi core with one per-device bit flag & global spinlock,
>>>>>>> basically zero cost since request queue quiesce is seldom
>>>>>>> triggered.
>>>>>>>
>>>>>>> Reported-by: Yi Zhang 
>>>>>>> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue
>>>>>>> quiesce/unquiesce")
>>>>>>> Signed-off-by: Ming Lei 
>>>>>>> ---
>>>>>>>  drivers/scsi/scsi_lib.c| 45
>>>>>>> ++
>>>>>>> 
>>>>>>> 
>>>>>>>  include/scsi/scsi_device.h |  1 +
>>>>>>>  2 files changed, 37 insertions(+), 9 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/scsi/scsi_lib.c
>>>>>>> b/drivers/scsi/scsi_lib.c
>>>>>>> index 51fcd46be265..414f4daf8005 100644
>>>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>>>> @@ -2638,6 +2638,40 @@ static int
>>>>>>> __scsi_internal_device_block_nowait(struct scsi_device
>>>>>>> *sdev)
>>>>>>> return 0;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static DEFINE_SPINLOCK(sdev_queue_stop_lock);
>>>>>>> +
>>>>>>> +void scsi_start_queue(struct scsi_device *sdev)
>>>>>>> +{
>>>>>>> +   bool need_start;
>>>>>>> +   unsigned long flags;
>>>>>>> +
>>>>>>> +   spin_lock_irqsave(&sdev_queue_stop_lock, flags);
>>>>>>> +   need_start = sdev->queue_stopped;
>>>>>>> +   sdev->queue_stopped = 0;
>>>>>>> +   spin_unlock_irqrestore(&sdev_queue_stop_lock, flags);
>>>>>>> +
>>>>>>> +   if (need_start)
>>>>>>> +   blk_mq_unquiesce_queue(sdev->request_queue);
>>>>>>
>>>>>> Well, this is a classic atomic pattern:
>>>>>>
>>>>>> if (cmpxchg(&sdev->queue_stopped, 1, 0))
>>>>>>  blk_mq_unquiesce_queue(sdev->request_queue);
>>>>>>
>>>>>> The reason to do it with atomics rather than spinlocks is
>>>>>>
>>>>>>1. no need to disable interrupts: atomics are locked
>>>>>>2. faster because a spinlock takes an exclusive line every
>>>>>> time but the
>>>>>>   read to check the value can be in shared mode in
>>>>>> cmpxchg
>>>>>>3. it's just shorter and better code.
>>>>>>
>>>>>> The only minor downside is queue_stopped now needs to be a
>>>>>> u32.
>>>>>
>>>>> Are you fine with the change as-is, or do you want it redone? I
>>>>> can drop the SCSI parts and just queue up the dm fix.
>>>>> Personally I think it'd be better to get it fixed upfront.
>>>>
>>>> Well, given the path isn't hot, I don't really care.  However,
>>>> what I don't want is to have to continually bat back patches from
>>>> the make work code churners trying to update this code for being
>>>> the wrong pattern.  I think at the very least it needs a comment
>>>> saying why we chose a suboptimal pattern to try to forestall
>>>> this.
>>>
>>> Right, with a comment it's probably better. And as you said, since
>>> it's not a hot path, don't think we'd be revisiting it anyway.
>>>
>>> I'll amend the patch with a comment.
>>
>> I started adding the comment and took another look at this, and that
>> made me change my mind. We really should make this a cmpxchg, it's
>> not even using a device lock here.
>>
>> I've dropped the two SCSI patches for now, Ming can you resend? If
>> James agrees, I really think queue_stopped should just have the type
>> changed and the patch redone with that using cmpxchg().
> 
> Well, that's what I suggested originally, so I agree ... I don't think
> 31 more bytes is going to be a huge burden to scsi_device.
  

Bits? :-)

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 2/3] scsi: make sure that request queue queiesce and unquiesce balanced

2021-11-02 Thread Jens Axboe
On 11/2/21 8:47 AM, James Bottomley wrote:
> On Tue, 2021-11-02 at 08:41 -0600, Jens Axboe wrote:
>> On 11/2/21 8:36 AM, Jens Axboe wrote:
>>> On 11/2/21 8:33 AM, James Bottomley wrote:
>>>> On Tue, 2021-11-02 at 06:59 -0600, Jens Axboe wrote:
>>>>> On 11/1/21 7:43 PM, James Bottomley wrote:
>>>>>> On Thu, 2021-10-21 at 22:59 +0800, Ming Lei wrote:
>>>>>>> For fixing queue quiesce race between driver and block
>>>>>>> layer(elevator switch, update nr_requests, ...), we need to
>>>>>>> support concurrent quiesce and unquiesce, which requires
>>>>>>> the two
>>>>>>> call balanced.
>>>>>>>
>>>>>>> It isn't easy to audit that in all scsi drivers, especially
>>>>>>> the two may be called from different contexts, so do it in
>>>>>>> scsi core with one per-device bit flag & global spinlock,
>>>>>>> basically zero cost since request queue quiesce is seldom
>>>>>>> triggered.
>>>>>>>
>>>>>>> Reported-by: Yi Zhang 
>>>>>>> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue
>>>>>>> quiesce/unquiesce")
>>>>>>> Signed-off-by: Ming Lei 
>>>>>>> ---
>>>>>>>  drivers/scsi/scsi_lib.c| 45
>>>>>>> ++
>>>>>>> 
>>>>>>> 
>>>>>>>  include/scsi/scsi_device.h |  1 +
>>>>>>>  2 files changed, 37 insertions(+), 9 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/scsi/scsi_lib.c
>>>>>>> b/drivers/scsi/scsi_lib.c
>>>>>>> index 51fcd46be265..414f4daf8005 100644
>>>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>>>> @@ -2638,6 +2638,40 @@ static int
>>>>>>> __scsi_internal_device_block_nowait(struct scsi_device
>>>>>>> *sdev)
>>>>>>> return 0;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static DEFINE_SPINLOCK(sdev_queue_stop_lock);
>>>>>>> +
>>>>>>> +void scsi_start_queue(struct scsi_device *sdev)
>>>>>>> +{
>>>>>>> +   bool need_start;
>>>>>>> +   unsigned long flags;
>>>>>>> +
>>>>>>> +   spin_lock_irqsave(&sdev_queue_stop_lock, flags);
>>>>>>> +   need_start = sdev->queue_stopped;
>>>>>>> +   sdev->queue_stopped = 0;
>>>>>>> +   spin_unlock_irqrestore(&sdev_queue_stop_lock, flags);
>>>>>>> +
>>>>>>> +   if (need_start)
>>>>>>> +   blk_mq_unquiesce_queue(sdev->request_queue);
>>>>>>
>>>>>> Well, this is a classic atomic pattern:
>>>>>>
>>>>>> if (cmpxchg(&sdev->queue_stopped, 1, 0))
>>>>>>  blk_mq_unquiesce_queue(sdev->request_queue);
>>>>>>
>>>>>> The reason to do it with atomics rather than spinlocks is
>>>>>>
>>>>>>1. no need to disable interrupts: atomics are locked
>>>>>>2. faster because a spinlock takes an exclusive line every
>>>>>> time but the
>>>>>>   read to check the value can be in shared mode in
>>>>>> cmpxchg
>>>>>>3. it's just shorter and better code.
>>>>>>
>>>>>> The only minor downside is queue_stopped now needs to be a
>>>>>> u32.
>>>>>
>>>>> Are you fine with the change as-is, or do you want it redone? I
>>>>> can drop the SCSI parts and just queue up the dm fix.
>>>>> Personally I think it'd be better to get it fixed upfront.
>>>>
>>>> Well, given the path isn't hot, I don't really care.  However,
>>>> what I don't want is to have to continually bat back patches from
>>>> the make work code churners trying to update this code for being
>>>> the wrong pattern.  I think at the very least it needs a comment
>>>> saying why we chose a suboptimal pattern to try to forestall
>>>> this.
>>>
>>> Right, with a comment it's probably better. And as you said, since
>>> it's not a hot path, don't think we'd be revisiting it anyway.
>>>
>>> I'll amend the patch with a comment.
>>
>> I started adding the comment and took another look at this, and that
>> made me change my mind. We really should make this a cmpxchg, it's
>> not even using a device lock here.
>>
>> I've dropped the two SCSI patches for now, Ming can you resend? If
>> James agrees, I really think queue_stopped should just have the type
>> changed and the patch redone with that using cmpxchg().
> 
> Well, that's what I suggested originally, so I agree ... I don't think
> 31 more bytes is going to be a huge burden to scsi_device.

Yeah I know, just saying I agree with that being the better solution.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 2/3] scsi: make sure that request queue queiesce and unquiesce balanced

2021-11-02 Thread Jens Axboe
On 11/2/21 8:36 AM, Jens Axboe wrote:
> On 11/2/21 8:33 AM, James Bottomley wrote:
>> On Tue, 2021-11-02 at 06:59 -0600, Jens Axboe wrote:
>>> On 11/1/21 7:43 PM, James Bottomley wrote:
>>>> On Thu, 2021-10-21 at 22:59 +0800, Ming Lei wrote:
>>>>> For fixing queue quiesce race between driver and block
>>>>> layer(elevator switch, update nr_requests, ...), we need to
>>>>> support concurrent quiesce and unquiesce, which requires the two
>>>>> call balanced.
>>>>>
>>>>> It isn't easy to audit that in all scsi drivers, especially the
>>>>> two may be called from different contexts, so do it in scsi core
>>>>> with one per-device bit flag & global spinlock, basically zero
>>>>> cost since request queue quiesce is seldom triggered.
>>>>>
>>>>> Reported-by: Yi Zhang 
>>>>> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue
>>>>> quiesce/unquiesce")
>>>>> Signed-off-by: Ming Lei 
>>>>> ---
>>>>>  drivers/scsi/scsi_lib.c| 45 ++
>>>>> 
>>>>> 
>>>>>  include/scsi/scsi_device.h |  1 +
>>>>>  2 files changed, 37 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>> index 51fcd46be265..414f4daf8005 100644
>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>> @@ -2638,6 +2638,40 @@ static int
>>>>> __scsi_internal_device_block_nowait(struct scsi_device *sdev)
>>>>>   return 0;
>>>>>  }
>>>>>  
>>>>> +static DEFINE_SPINLOCK(sdev_queue_stop_lock);
>>>>> +
>>>>> +void scsi_start_queue(struct scsi_device *sdev)
>>>>> +{
>>>>> + bool need_start;
>>>>> + unsigned long flags;
>>>>> +
>>>>> + spin_lock_irqsave(&sdev_queue_stop_lock, flags);
>>>>> + need_start = sdev->queue_stopped;
>>>>> + sdev->queue_stopped = 0;
>>>>> + spin_unlock_irqrestore(&sdev_queue_stop_lock, flags);
>>>>> +
>>>>> + if (need_start)
>>>>> + blk_mq_unquiesce_queue(sdev->request_queue);
>>>>
>>>> Well, this is a classic atomic pattern:
>>>>
>>>> if (cmpxchg(&sdev->queue_stopped, 1, 0))
>>>>blk_mq_unquiesce_queue(sdev->request_queue);
>>>>
>>>> The reason to do it with atomics rather than spinlocks is
>>>>
>>>>1. no need to disable interrupts: atomics are locked
>>>>2. faster because a spinlock takes an exclusive line every time
>>>> but the
>>>>   read to check the value can be in shared mode in cmpxchg
>>>>3. it's just shorter and better code.
>>>>
>>>> The only minor downside is queue_stopped now needs to be a u32.
>>>
>>> Are you fine with the change as-is, or do you want it redone? I
>>> can drop the SCSI parts and just queue up the dm fix. Personally
>>> I think it'd be better to get it fixed upfront.
>>
>> Well, given the path isn't hot, I don't really care.  However, what I
>> don't want is to have to continually bat back patches from the make
>> work code churners trying to update this code for being the wrong
>> pattern.  I think at the very least it needs a comment saying why we
>> chose a suboptimal pattern to try to forestall this.
> 
> Right, with a comment it's probably better. And as you said, since it's
> not a hot path, don't think we'd be revisiting it anyway.
> 
> I'll amend the patch with a comment.

I started adding the comment and took another look at this, and that
made me change my mind. We really should make this a cmpxchg, it's not
even using a device lock here.

I've dropped the two SCSI patches for now, Ming can you resend? If James
agrees, I really think queue_stopped should just have the type changed
and the patch redone with that using cmpxchg().

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 2/3] scsi: make sure that request queue queiesce and unquiesce balanced

2021-11-02 Thread Jens Axboe
On 11/2/21 8:33 AM, James Bottomley wrote:
> On Tue, 2021-11-02 at 06:59 -0600, Jens Axboe wrote:
>> On 11/1/21 7:43 PM, James Bottomley wrote:
>>> On Thu, 2021-10-21 at 22:59 +0800, Ming Lei wrote:
>>>> For fixing queue quiesce race between driver and block
>>>> layer(elevator switch, update nr_requests, ...), we need to
>>>> support concurrent quiesce and unquiesce, which requires the two
>>>> call balanced.
>>>>
>>>> It isn't easy to audit that in all scsi drivers, especially the
>>>> two may be called from different contexts, so do it in scsi core
>>>> with one per-device bit flag & global spinlock, basically zero
>>>> cost since request queue quiesce is seldom triggered.
>>>>
>>>> Reported-by: Yi Zhang 
>>>> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue
>>>> quiesce/unquiesce")
>>>> Signed-off-by: Ming Lei 
>>>> ---
>>>>  drivers/scsi/scsi_lib.c| 45 ++
>>>> 
>>>> 
>>>>  include/scsi/scsi_device.h |  1 +
>>>>  2 files changed, 37 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index 51fcd46be265..414f4daf8005 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -2638,6 +2638,40 @@ static int
>>>> __scsi_internal_device_block_nowait(struct scsi_device *sdev)
>>>>return 0;
>>>>  }
>>>>  
>>>> +static DEFINE_SPINLOCK(sdev_queue_stop_lock);
>>>> +
>>>> +void scsi_start_queue(struct scsi_device *sdev)
>>>> +{
>>>> +  bool need_start;
>>>> +  unsigned long flags;
>>>> +
>>>> +  spin_lock_irqsave(&sdev_queue_stop_lock, flags);
>>>> +  need_start = sdev->queue_stopped;
>>>> +  sdev->queue_stopped = 0;
>>>> +  spin_unlock_irqrestore(&sdev_queue_stop_lock, flags);
>>>> +
>>>> +  if (need_start)
>>>> +  blk_mq_unquiesce_queue(sdev->request_queue);
>>>
>>> Well, this is a classic atomic pattern:
>>>
>>> if (cmpxchg(&sdev->queue_stopped, 1, 0))
>>> blk_mq_unquiesce_queue(sdev->request_queue);
>>>
>>> The reason to do it with atomics rather than spinlocks is
>>>
>>>1. no need to disable interrupts: atomics are locked
>>>2. faster because a spinlock takes an exclusive line every time
>>> but the
>>>   read to check the value can be in shared mode in cmpxchg
>>>3. it's just shorter and better code.
>>>
>>> The only minor downside is queue_stopped now needs to be a u32.
>>
>> Are you fine with the change as-is, or do you want it redone? I
>> can drop the SCSI parts and just queue up the dm fix. Personally
>> I think it'd be better to get it fixed upfront.
> 
> Well, given the path isn't hot, I don't really care.  However, what I
> don't want is to have to continually bat back patches from the make
> work code churners trying to update this code for being the wrong
> pattern.  I think at the very least it needs a comment saying why we
> chose a suboptimal pattern to try to forestall this.

Right, with a comment it's probably better. And as you said, since it's
not a hot path, don't think we'd be revisiting it anyway.

I'll amend the patch with a comment.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 2/3] scsi: make sure that request queue queiesce and unquiesce balanced

2021-11-02 Thread Jens Axboe
On 11/1/21 7:43 PM, James Bottomley wrote:
> On Thu, 2021-10-21 at 22:59 +0800, Ming Lei wrote:
>> For fixing queue quiesce race between driver and block layer(elevator
>> switch, update nr_requests, ...), we need to support concurrent
>> quiesce
>> and unquiesce, which requires the two call balanced.
>>
>> It isn't easy to audit that in all scsi drivers, especially the two
>> may
>> be called from different contexts, so do it in scsi core with one
>> per-device
>> bit flag & global spinlock, basically zero cost since request queue
>> quiesce
>> is seldom triggered.
>>
>> Reported-by: Yi Zhang 
>> Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue
>> quiesce/unquiesce")
>> Signed-off-by: Ming Lei 
>> ---
>>  drivers/scsi/scsi_lib.c| 45 ++
>> 
>>  include/scsi/scsi_device.h |  1 +
>>  2 files changed, 37 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 51fcd46be265..414f4daf8005 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -2638,6 +2638,40 @@ static int
>> __scsi_internal_device_block_nowait(struct scsi_device *sdev)
>>  return 0;
>>  }
>>  
>> +static DEFINE_SPINLOCK(sdev_queue_stop_lock);
>> +
>> +void scsi_start_queue(struct scsi_device *sdev)
>> +{
>> +bool need_start;
>> +unsigned long flags;
>> +
>> +spin_lock_irqsave(&sdev_queue_stop_lock, flags);
>> +need_start = sdev->queue_stopped;
>> +sdev->queue_stopped = 0;
>> +spin_unlock_irqrestore(&sdev_queue_stop_lock, flags);
>> +
>> +if (need_start)
>> +blk_mq_unquiesce_queue(sdev->request_queue);
> 
> Well, this is a classic atomic pattern:
> 
> if (cmpxchg(&sdev->queue_stopped, 1, 0))
>   blk_mq_unquiesce_queue(sdev->request_queue);
> 
> The reason to do it with atomics rather than spinlocks is
> 
>1. no need to disable interrupts: atomics are locked
>2. faster because a spinlock takes an exclusive line every time but the
>   read to check the value can be in shared mode in cmpxchg
>3. it's just shorter and better code.
> 
> The only minor downside is queue_stopped now needs to be a u32.

Are you fine with the change as-is, or do you want it redone? I
can drop the SCSI parts and just queue up the dm fix. Personally
I think it'd be better to get it fixed upfront.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
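
For reference, the cmpxchg pattern James describes in this thread comes out
roughly as sketched below, assuming sdev->queue_stopped is widened from a
single bit to an unsigned int.  This is an illustration of the pattern only,
not necessarily the code that was finally merged.

void scsi_start_queue(struct scsi_device *sdev)
{
	/* Only the caller that flips 1 -> 0 actually unquiesces. */
	if (cmpxchg(&sdev->queue_stopped, 1, 0))
		blk_mq_unquiesce_queue(sdev->request_queue);
}

static void scsi_stop_queue(struct scsi_device *sdev, bool nowait)
{
	/* Only the caller that flips 0 -> 1 actually quiesces, so repeated
	 * or concurrent stop/start calls stay balanced. */
	if (!cmpxchg(&sdev->queue_stopped, 0, 1)) {
		if (nowait)
			blk_mq_quiesce_queue_nowait(sdev->request_queue);
		else
			blk_mq_quiesce_queue(sdev->request_queue);
	}
}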



Re: [dm-devel] [PATCH 0/3] block: keep quiesce & unquiesce balanced for scsi/dm

2021-11-01 Thread Jens Axboe
On Thu, 21 Oct 2021 22:59:15 +0800, Ming Lei wrote:
> Recently we merge the patch of e70feb8b3e68 ("blk-mq: support concurrent queue
> quiesce/unquiesce") for fixing race between driver and block layer wrt.
> queue quiesce.
> 
> Yi reported that srp/002 is broken with this patch, turns out scsi and
> dm don't keep the two balanced actually.
> 
> [...]

Applied, thanks!

[1/3] scsi: avoid to quiesce sdev->request_queue two times
  commit: 256117fb3b4f8832d6b29485d49d37ccc4c314d5
[2/3] scsi: make sure that request queue queiesce and unquiesce balanced
  commit: fba9539fc2109740e70e77c303dec50d1411e11f
[3/3] dm: don't stop request queue after the dm device is suspended
  commit: e719593760c34fbf346fc6e348113e042feb5f63

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v6 0/4] blk-crypto cleanups

2021-10-21 Thread Jens Axboe
On Mon, 18 Oct 2021 11:04:49 -0700, Eric Biggers wrote:
> This series renames struct blk_keyslot_manager to struct
> blk_crypto_profile, as it is misnamed; it doesn't always manage
> keyslots.  It's much more logical to think of it as the
> "blk-crypto profile" of a device, similar to blk_integrity_profile.
> 
> This series also improves the inline-encryption.rst documentation file,
> and cleans up blk-crypto-fallback a bit.
> 
> [...]

Applied, thanks!

[1/4] blk-crypto-fallback: properly prefix function and struct names
  commit: eebcafaebb17cb8fda671709fab5dd836bdc3a08
[2/4] blk-crypto: rename keyslot-manager files to blk-crypto-profile
  commit: 1e8d44bddf57f6d878e083f281a34d5c88feb7db
[3/4] blk-crypto: rename blk_keyslot_manager to blk_crypto_profile
  commit: cb77cb5abe1f4fae4a33b735606aae22f9eaa1c7
[4/4] blk-crypto: update inline encryption documentation
  commit: 8e9f666a6e66d3f882c094646d35536d2759103a

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] (subset) [PATCH 0/9] block: reviewed add_disk() error handling set

2021-10-21 Thread Jens Axboe
On Fri, 15 Oct 2021 16:30:19 -0700, Luis Chamberlain wrote:
> Jens,
> 
> I had last split up patches into 7 groups, but at this point now
> most changes are merged except a few more drivers. Instead of creating
> a new patch set for each of the 7 groups I'm creating 3 new groups of
> patches now:
> 
> [...]

Applied, thanks!

[3/9] dm: add add_disk() error handling
  commit: e7089f65dd51afeda5eb760506b5950d95f9ec29
[4/9] bcache: add error handling support for add_disk()
  commit: 2961c3bbcaec0ed7fb7b9a465b3796f37f2294e5
[5/9] xen-blkfront: add error handling support for add_disk()
  commit: 293a7c528803321479593d42d0898bb5a9769db1
[6/9] m68k/emu/nfblock: add error handling support for add_disk()
  commit: 21fd880d3da7564bab68979417cab7408e4f9642
[7/9] um/drivers/ubd_kern: add error handling support for add_disk()
  commit: 66638f163a2b5c5b462ca38525129b14a20117eb
[8/9] rnbd: add error handling support for add_disk()
  commit: 2e9e31bea01997450397d64da43b6675e0adb9e3
[9/9] mtd: add add_disk() error handling
  commit: 83b863f4a3f0de4ece7802d9121fed0c3e64145f

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] don't use ->bd_inode to access the block device size v3

2021-10-18 Thread Jens Axboe
On 10/18/21 7:04 PM, Kari Argillander wrote:
> On Mon, Oct 18, 2021 at 11:53:08AM -0600, Jens Axboe wrote:
> 
> snip..
> 
>> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
>> index 7b0326661a1e..a967b3fb3c71 100644
>> --- a/include/linux/genhd.h
>> +++ b/include/linux/genhd.h
>> @@ -236,14 +236,14 @@ static inline sector_t get_start_sect(struct 
>> block_device *bdev)
>>  return bdev->bd_start_sect;
>>  }
>>  
>> -static inline loff_t bdev_nr_bytes(struct block_device *bdev)
>> +static inline sector_t bdev_nr_sectors(struct block_device *bdev)
>>  {
>> -return i_size_read(bdev->bd_inode);
>> +return bdev->bd_nr_sectors;
>>  }
>>  
>> -static inline sector_t bdev_nr_sectors(struct block_device *bdev)
>> +static inline loff_t bdev_nr_bytes(struct block_device *bdev)
>>  {
>> -return bdev_nr_bytes(bdev) >> SECTOR_SHIFT;
>> +return bdev_nr_setors(bdev) << SECTOR_SHIFT;
> 
> setors -> sectors

Yep, did catch that prior.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] don't use ->bd_inode to access the block device size v3

2021-10-18 Thread Jens Axboe
On 10/18/21 11:49 AM, Christoph Hellwig wrote:
> On Mon, Oct 18, 2021 at 11:40:51AM -0600, Jens Axboe wrote:
>>  static inline loff_t bdev_nr_bytes(struct block_device *bdev)
>>  {
>> -return i_size_read(bdev->bd_inode);
>> +return bdev->bd_nr_sectors;
> 
> This hunk needs to go into bdev_nr_sectors, and the bdev_nr_bytes
> probably wants to call bdev_nr_sectors and do the shifting.

Makes sense.

commit dd018a580d0037f65d7dd801cbf3e053f36283de
Author: Jens Axboe 
Date:   Mon Oct 18 11:39:45 2021 -0600

block: cache inode size in bdev

Reading the inode size brings in a new cacheline for IO submit, and
it's in the hot path being checked for every single IO. When doing
millions of IOs per core per second, this is noticeable overhead.

Cache the nr_sectors in the bdev itself.

Signed-off-by: Jens Axboe 

diff --git a/block/genhd.c b/block/genhd.c
index 759bc06810f8..53495e3391e3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -58,6 +58,7 @@ void set_capacity(struct gendisk *disk, sector_t sectors)
 
	spin_lock(&bdev->bd_size_lock);
i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+   bdev->bd_nr_sectors = sectors;
	spin_unlock(&bdev->bd_size_lock);
 }
 EXPORT_SYMBOL(set_capacity);
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 9dbddc355b40..66ef9bc6d6a1 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -91,6 +91,7 @@ static void bdev_set_nr_sectors(struct block_device *bdev, 
sector_t sectors)
 {
	spin_lock(&bdev->bd_size_lock);
i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+   bdev->bd_nr_sectors = sectors;
	spin_unlock(&bdev->bd_size_lock);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 472e55e0e94f..fe065c394fff 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -39,6 +39,7 @@ struct bio_crypt_ctx;
 
 struct block_device {
sector_tbd_start_sect;
+   sector_tbd_nr_sectors;
struct disk_stats __percpu *bd_stats;
unsigned long   bd_stamp;
boolbd_read_only;   /* read-only policy */
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 7b0326661a1e..a967b3fb3c71 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -236,14 +236,14 @@ static inline sector_t get_start_sect(struct block_device 
*bdev)
return bdev->bd_start_sect;
 }
 
-static inline loff_t bdev_nr_bytes(struct block_device *bdev)
+static inline sector_t bdev_nr_sectors(struct block_device *bdev)
 {
-   return i_size_read(bdev->bd_inode);
+   return bdev->bd_nr_sectors;
 }
 
-static inline sector_t bdev_nr_sectors(struct block_device *bdev)
+static inline loff_t bdev_nr_bytes(struct block_device *bdev)
 {
-   return bdev_nr_bytes(bdev) >> SECTOR_SHIFT;
+   return bdev_nr_setors(bdev) << SECTOR_SHIFT;
 }
 
 static inline sector_t get_capacity(struct gendisk *disk)

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] don't use ->bd_inode to access the block device size v3

2021-10-18 Thread Jens Axboe
On Mon, 18 Oct 2021 12:11:00 +0200, Christoph Hellwig wrote:
> various drivers currently poke directly at the block device inode, which
> is a bit of a mess.  This series cleans up the places that read the
> block device size to use the proper helpers.  I have separate patches
> for many of the other bd_inode uses, but this series is already big
> enough as-is,
> 
> Changes since v2:
>  - bdev_nr_bytes should return loff_t
>  - fix a commit message typo
>  - drop a redundant note in a commit message
> 
> [...]

Applied, thanks!

[01/30] block: move the SECTOR_SIZE related definitions to blk_types.h
commit: ac076a376d4c1fa7f01bedad76bab96a981b7464
[02/30] block: add a bdev_nr_bytes helper
commit: 449c780f68d9adbab2373c996d4341e61c088685
[03/30] bcache: remove bdev_sectors
commit: 519070e1b8411c93b483fb50511c9d5d7932f62a
[04/30] drbd: use bdev_nr_sectors instead of open coding it
commit: eee1958b9a7b912fff33319e5737d861703c3a47
[05/30] dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them
commit: 34d7526093779e26c1a281992c7e91662f3afa85
[06/30] md: use bdev_nr_sectors instead of open coding it
commit: 1a70a0364bbbf29eab22c9fa4b3d71087df940a5
[07/30] nvmet: use bdev_nr_bytes instead of open coding it
commit: d61ec9eeaa161c6e385f4adebc5d671bc5290687
[08/30] target/iblock: use bdev_nr_bytes instead of open coding it
commit: 30de91d3df67291093736890b7496620d5025df9
[09/30] fs: use bdev_nr_bytes instead of open coding it in blkdev_max_block
commit: 011bb9476ef8f9867330e2bce22cf124d034cd33
[10/30] fs: simplify init_page_buffers
commit: 957c50dd8af9945fde3a3fb6c8baf5d638ef3177
[11/30] affs: use bdev_nr_sectors instead of open coding it
commit: ec003894a9db3858165dd61fb4cabf9a402aabe0
[12/30] btrfs: use bdev_nr_bytes instead of open coding it
commit: 167a1c754eae512e45de682e2cb4ea05f080fda5
[13/30] cramfs: use bdev_nr_bytes instead of open coding it
commit: cdf881e14aa127c7602110d208b3412b1412c1ab
[14/30] fat: use bdev_nr_sectors instead of open coding it
commit: 4513b8c903782c4f3963172d81414e08f48a0317
[15/30] hfs: use bdev_nr_sectors instead of open coding it
commit: 311b610de54a52c199e2a129da2c26ad5953edb3
[16/30] hfsplus: use bdev_nr_sectors instead of open coding it
commit: 03b67c1de5d3b085360f3d6dcf37560f44e8cb4b
[17/30] jfs: use bdev_nr_bytes instead of open coding it
commit: c1e80b87c3acd52817bea278310900ad2825686c
[18/30] nfs/blocklayout: use bdev_nr_bytes instead of open coding it
commit: 6b1b53cf606d70dc6dd375b42558cfe7e945
[19/30] nilfs2: use bdev_nr_bytes instead of open coding it
commit: a24d8bcfd590de5dc4a9e806c9e76558676c2eef
[20/30] ntfs3: use bdev_nr_bytes instead of open coding it
commit: 9242c8b0b4432b6929b030c729a1edd9d9116d4c
[21/30] pstore/blk: use bdev_nr_bytes instead of open coding it
commit: 989ab34bd83f075efdae2cf6026cec0507374696
[22/30] reiserfs: use bdev_nr_bytes instead of open coding it
commit: 8d147b3353c7fd853313521c4f66430d38d66391
[23/30] squashfs: use bdev_nr_bytes instead of open coding it
commit: 8538360bb42955166d0073ffb6dff6a4b0caa4ec
[24/30] block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate
commit: 7ad94c3008a3f5e0ff8af1e3ff1c7061955ccec4
[25/30] block: add a sb_bdev_nr_blocks helper
commit: 5793a4ebc76566fd24d7afdbcefb3311355fd077
[26/30] ext4: use sb_bdev_nr_blocks
commit: 3a10af74c8f1d390857cf87648573bc4f157e4ca
[27/30] jfs: use sb_bdev_nr_blocks
commit: cd8ac55f93923c65e18204c99b08a8c4cba3d187
[28/30] ntfs: use sb_bdev_nr_blocks
commit: 8e2c901e6d1c97bf862514901beaae3e248655d8
[29/30] reiserfs: use sb_bdev_nr_blocks
commit: 93361ef44a8931d281583ea9c608247fe8127528
[30/30] udf: use sb_bdev_nr_blocks
commit: ea8befeb35c47cf95012032850fe3f0ec80e5cde

Best regards,
-- 
Jens Axboe


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
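
The shape of the conversion performed across these patches is mechanical; the
sketch below uses a made-up driver helper to show the before and after of
reading a device's size.

/* Hypothetical driver helper, before: poking at the bdev inode directly. */
static sector_t example_dev_sectors_old(struct block_device *bdev)
{
	return i_size_read(bdev->bd_inode) >> SECTOR_SHIFT;
}

/* After: the new helpers hide the inode access behind the block layer. */
static sector_t example_dev_sectors_new(struct block_device *bdev)
{
	return bdev_nr_sectors(bdev);
}

static loff_t example_dev_bytes_new(struct block_device *bdev)
{
	return bdev_nr_bytes(bdev);
}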



Re: [dm-devel] don't use ->bd_inode to access the block device size v3

2021-10-18 Thread Jens Axboe
On 10/18/21 11:18 AM, Christoph Hellwig wrote:
> On Mon, Oct 18, 2021 at 11:16:08AM -0600, Jens Axboe wrote:
>> This looks good to me. Followup question, as it's related - I've got a
>> hacky patch that caches the inode size in the bdev:
>>
https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=c754951eb7193258c35a574bd1b7c4946ee4
>>
>> so we don't have to dip into the inode itself for the fast path. While
>> it's obviously not something being proposed for inclusion right now, is
>> there a world in which we can make something like that work?
> 
> There's just two places that update i_size for block devices:
> set_capacity and bdev_set_nr_sectors.  So you just need to update
> bd_nr_sectors there and you're done.

This on top of your patches should do the trick, then.


commit eebb7c5048163985fb21d6cb740ebac78cb46051
Author: Jens Axboe 
Date:   Mon Oct 18 11:39:45 2021 -0600

block: cache inode size in bdev

Reading the inode size brings in a new cacheline for IO submit, and
it's in the hot path being checked for every single IO. When doing
millions of IOs per core per second, this is noticeable overhead.

Cache the nr_sectors in the bdev itself.

Signed-off-by: Jens Axboe 

diff --git a/block/genhd.c b/block/genhd.c
index 759bc06810f8..53495e3391e3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -58,6 +58,7 @@ void set_capacity(struct gendisk *disk, sector_t sectors)
 
	spin_lock(&bdev->bd_size_lock);
i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+   bdev->bd_nr_sectors = sectors;
	spin_unlock(&bdev->bd_size_lock);
 }
 EXPORT_SYMBOL(set_capacity);
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 9dbddc355b40..66ef9bc6d6a1 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -91,6 +91,7 @@ static void bdev_set_nr_sectors(struct block_device *bdev, 
sector_t sectors)
 {
	spin_lock(&bdev->bd_size_lock);
i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+   bdev->bd_nr_sectors = sectors;
	spin_unlock(&bdev->bd_size_lock);
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 472e55e0e94f..fe065c394fff 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -39,6 +39,7 @@ struct bio_crypt_ctx;
 
 struct block_device {
sector_tbd_start_sect;
+   sector_tbd_nr_sectors;
struct disk_stats __percpu *bd_stats;
unsigned long   bd_stamp;
boolbd_read_only;   /* read-only policy */
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 7b0326661a1e..001f617f82da 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -238,7 +238,7 @@ static inline sector_t get_start_sect(struct block_device 
*bdev)
 
 static inline loff_t bdev_nr_bytes(struct block_device *bdev)
 {
-   return i_size_read(bdev->bd_inode);
+   return bdev->bd_nr_sectors;
 }
 
 static inline sector_t bdev_nr_sectors(struct block_device *bdev)

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] don't use ->bd_inode to access the block device size v3

2021-10-18 Thread Jens Axboe
On 10/18/21 4:11 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> various drivers currently poke directly at the block device inode, which
> is a bit of a mess.  This series cleans up the places that read the
> block device size to use the proper helpers.  I have separate patches
> for many of the other bd_inode uses, but this series is already big
> enough as-is,

This looks good to me. Followup question, as it's related - I've got a
hacky patch that caches the inode size in the bdev:

https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=c754951eb7193258c35a574bd1b7c4946ee4

so we don't have to dip into the inode itself for the fast path. While
it's obviously not something being proposed for inclusion right now, is
there a world in which we can make something like that work?

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v4 6/6] nbd: add error handling support for add_disk()

2021-09-27 Thread Jens Axboe
On 9/27/21 3:59 PM, Luis Chamberlain wrote:
> We never checked for errors on add_disk() as this function
> returned void. Now that this is fixed, use the shiny new
> error handling.
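
As a reminder of what the "shiny new error handling" amounts to in practice, a minimal sketch (function and label names are assumed, not taken from the nbd patch) of the pattern drivers switch to once add_disk() returns an int:

```c
/* Sketch only; mydrv_* names and the unwind helper are assumptions. */
static int mydrv_add(struct mydrv_device *dev)
{
	int err;

	err = add_disk(dev->disk);	/* now returns an error instead of void */
	if (err)
		goto out_cleanup_disk;
	return 0;

out_cleanup_disk:
	blk_cleanup_disk(dev->disk);
	return err;
}
```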

Applied, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v4 5/6] loop: add error handling support for add_disk()

2021-09-27 Thread Jens Axboe
On 9/27/21 3:59 PM, Luis Chamberlain wrote:
> We never checked for errors on add_disk() as this function
> returned void. Now that this is fixed, use the shiny new
> error handling.

Applied, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] add a bvec_virt helper

2021-08-16 Thread Jens Axboe
On 8/4/21 3:56 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series adds a bvec_virt helper to return the virtual address of the
> data in bvec to replace the open coded calculation, and as a reminder
> that generally bio/bvec data can be in high memory unless it is caller
> controlled or in an architecture specific driver where highmem is
> impossible.
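
For context, the helper being described boils down to something like the following sketch, replacing the open-coded page_address() plus bvec offset calculation:

```c
/* Sketch of the helper: only valid when the bvec's page cannot be in highmem. */
static inline void *bvec_virt(struct bio_vec *bvec)
{
	return page_address(bvec->bv_page) + bvec->bv_offset;
}
```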

Applied, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH] block: move some macros to blkdev.h

2021-08-11 Thread Jens Axboe
On 7/20/21 8:53 PM, Guoqing Jiang wrote:
> From: Guoqing Jiang 
> 
> Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the
> generic header file to remove redundancy.
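
For reference, these are small derived constants; a sketch matching their conventional definitions:

```c
#define PAGE_SECTORS_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)
#define PAGE_SECTORS		(1 << PAGE_SECTORS_SHIFT)
#define SECTOR_MASK		(PAGE_SECTORS - 1)
```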

Applied for 5.15, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] use regular gendisk registration in device mapper v2

2021-08-09 Thread Jens Axboe
On 8/4/21 3:41 AM, Christoph Hellwig wrote:
> Hi all,
> 
> The device mapper code currently has a somewhat odd gendisk registration
> scheme where it calls add_disk early, but uses a special flag to skip the
> "queue registration", which is a major part of add_disk.  This series
> improves the block layer holder tracking to work on an entirely
> unregistered disk and thus allows device mapper to use the normal scheme
> of calling add_disk when it is ready to accept I/O.
> 
> Note that this leads to a user visible change - the sysfs attributes on
> the disk and the dm directory hanging off it are now only visible once
> the initial table is loaded.  This did not make a difference to my testing
> using dmsetup and the lvm2 tools.

Applied, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] switch the block layer to use kmap_local_page v3

2021-07-27 Thread Jens Axboe
On 7/26/21 11:56 PM, Christoph Hellwig wrote:
> Hi all,
> 
> this series switches the core block layer code and all users of the
> existing bvec kmap helpers to use kmap_local_page.  Drivers that
> currently use open coded kmap_atomic calls will converted in a follow
> on series.
> 
> To do so a new kunmap variant is added that calls
> flush_kernel_dcache_page.  I'm not entirely sure where to call
> flush_dcache_page vs flush_kernel_dcache_page, so I've tried to follow
> the documentation here, but additional feedback would be welcome.
> 
> Note that the ps3disk has a minor conflict with the
> flush_kernel_dcache_page removal in linux-next through the -mm tree.
> I had hoped that change would go into 5.14, but it seems like it is
> being held for 5.15.
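
To illustrate the pattern the series converts to (the helper names below are assumed, not the exact ones added by the series): map a bvec with kmap_local_page(), and flush the kernel dcache before unmapping when the mapped data was modified.

```c
/* Assumed helper names; illustration of the kmap_local pattern only. */
static inline void *my_bvec_kmap_local(struct bio_vec *bvec)
{
	return kmap_local_page(bvec->bv_page) + bvec->bv_offset;
}

static inline void my_bvec_kunmap_local_dirty(struct bio_vec *bvec, void *addr)
{
	flush_kernel_dcache_page(bvec->bv_page);	/* data was written through the mapping */
	kunmap_local(addr);
}
```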

Applied for 5.15, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] simplify gendisk and request_queue allocation for blk-mq based drivers

2021-06-11 Thread Jens Axboe
On 6/2/21 12:53 AM, Christoph Hellwig wrote:
> Hi all,
> 
> this series is the second part of cleaning up lifetimes and allocation of
> the gendisk and request_queue structure.  It adds a new interface to
> allocate the disk and queue together for blk-mq based drivers, and uses that
> in all drivers that do not have any caveats in their gendisk and
> request_queue lifetime rules.
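
Roughly, the new interface lets a blk-mq driver allocate both objects in one call, as in this sketch (the driver-local names are assumed):

```c
/* Sketch: gendisk and request_queue allocated together (mydrv_* names assumed). */
static int mydrv_probe(struct mydrv *drv)
{
	struct gendisk *disk;

	disk = blk_mq_alloc_disk(&drv->tag_set, drv);	/* queuedata = drv */
	if (IS_ERR(disk))
		return PTR_ERR(disk);

	drv->disk = disk;
	disk->fops = &mydrv_fops;
	sprintf(disk->disk_name, "mydrv0");
	add_disk(disk);
	return 0;
}
```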

Applied, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v5 00/11] dm: Improve zoned block device support

2021-06-03 Thread Jens Axboe
On 6/2/21 12:32 PM, Mike Snitzer wrote:
> On Tue, Jun 01 2021 at  6:57P -0400,
> Damien Le Moal  wrote:
> 
>> On 2021/05/26 6:25, Damien Le Moal wrote:
>>> This series improves device mapper support for zoned block devices and
>>> of targets exposing a zoned device.
>>
>> Mike, Jens,
>>
>> Any feedback regarding this series ?
>>
>>>
>>> The first patch improves support for user requests to reset all zones of
>>> the target device. With the fix, such an operation behaves similarly to
>>> the physical block device implementation based on the single zone reset
>>> command with the ALL bit set.
>>>
>>> The following 2 patches are preparatory block layer patches.
>>>
>>> Patches 4 and 5 are two small fixes to DM core zoned block device support.
>>>
>>> Patch 6 reorganizes DM core code, moving conditionally defined zoned
>>> block device code into the new dm-zone.c file. This avoids sprinkling DM
>>> with zone related code defined under an #ifdef CONFIG_BLK_DEV_ZONED.
>>>
>>> Patch 7 improves DM zone report helper functions for target drivers.
>>>
>>> Patch 8 fixes a potential problem with BIO requeue on zoned target.
>>>
>>> Finally, patch 9 to 11 implement zone append emulation using regular
>>> writes for target drivers that cannot natively support this BIO type.
>>> The only target currently needing this emulation is dm-crypt. With this
>>> change, a zoned dm-crypt device behaves exactly like a regular zoned
>>> block device, correctly executing user zone append BIOs.
>>>
>>> This series passes the following tests:
>>> 1) zonefs tests on top of dm-crypt with a zoned nullblk device
>>> 2) zonefs tests on top of dm-crypt+dm-linear with an SMR HDD
>>> 3) btrfs fstests on top of dm-crypt with zoned nullblk devices.
>>>
>>> Comments are as always welcome.
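
The zone append emulation described above for patches 9 to 11 boils down to the following idea, shown here as a rough sketch with assumed names (struct my_zone, zone->wp, zone->wp_lock), not the actual DM code: pick the write location from a per-zone cached write pointer, issue a regular write there, and report the chosen sector back to the issuer on completion.

```c
/* Rough sketch with assumed types/fields; not the DM implementation. */
static void emulate_zone_append(struct my_zone *zone, struct bio *bio)
{
	sector_t sector;

	spin_lock(&zone->wp_lock);
	sector = zone->wp;			/* cached zone write pointer */
	zone->wp += bio_sectors(bio);
	spin_unlock(&zone->wp_lock);

	/* issue as a regular write at the chosen location */
	bio->bi_opf = (bio->bi_opf & ~REQ_OP_MASK) | REQ_OP_WRITE;
	bio->bi_iter.bi_sector = sector;

	/*
	 * On completion, the original zone append bio's bi_sector is set to
	 * 'sector' so the issuer learns where the data actually landed.
	 */
}
```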
> 
> I've picked up DM patches 4-8 because they didn't depend on the first
> 3 block patches.
> 
> But I'm fine with picking up 1-3 if Jens provides his Acked-by.
> And then I can pickup the remaining DM patches 9-11.

I'm fine with 1-3, you can add my Acked-by to those.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] simplify gendisk and request_queue allocation for bio based drivers

2021-06-01 Thread Jens Axboe
On 5/20/21 11:50 PM, Christoph Hellwig wrote:
> Hi all,
> 
> this series is the first part of cleaning up lifetimes and allocation of
> the gendisk and request_queue structure.  It adds a new interface to
> allocate the disk and queue together for bio based drivers, and a helper
> to clean up and free them when a driver is unloaded or a device is removed.
> 
> Together this removes the need to treat the gendisk and request_queue
> as separate entities for bio based drivers.
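
In practice a bio based driver can then allocate and tear down both objects with a matched pair of calls, roughly as in this sketch (driver-local names assumed):

```c
/* Sketch with assumed mydrv_* names. */
static int mydrv_create(struct mydrv *drv)
{
	drv->disk = blk_alloc_disk(NUMA_NO_NODE);	/* disk and queue allocated together */
	if (!drv->disk)
		return -ENOMEM;

	drv->disk->fops = &mydrv_fops;
	drv->disk->private_data = drv;
	sprintf(drv->disk->disk_name, "mydrv0");
	add_disk(drv->disk);
	return 0;
}

static void mydrv_destroy(struct mydrv *drv)
{
	del_gendisk(drv->disk);
	blk_cleanup_disk(drv->disk);	/* the matching cleanup/free helper */
}
```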

Applied, thanks.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH V5 10/12] block: add queue_to_disk() to get gendisk from request_queue

2021-04-12 Thread Jens Axboe
On 3/31/21 8:19 PM, Ming Lei wrote:
> From: Jeffle Xu 
> 
> Sometimes we need to get the corresponding gendisk from request_queue.
> 
> It is preferred that block drivers store private data in
> gendisk->private_data rather than request_queue->queuedata, e.g. see:
> commit c4a59c4e5db3 ("dm: stop using ->queuedata").
> 
> So if only request_queue is given, we need to get its corresponding
> gendisk to get the private data stored in that gendisk.
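
The helper amounts to walking from the queue's kobject up to the disk's device; a sketch of the definition together with a typical use (the caller-side names below are assumed):

```c
/* Sketch of the mapping from a request_queue back to its gendisk. */
#define queue_to_disk(q)	(dev_to_disk(kobj_to_dev((q)->kobj.parent)))

/* Assumed example caller: reach driver data when only a queue is at hand. */
static struct mydrv *mydrv_from_queue(struct request_queue *q)
{
	return queue_to_disk(q)->private_data;
}
```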

Applied this one as a separate cleanup/helper.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 4/4] dm: support I/O polling

2021-03-07 Thread Jens Axboe
On 3/7/21 8:54 PM, JeffleXu wrote:
> 
> 
> On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>>
>> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>>
>>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>>
>>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>>
>>>>>>
>>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>>
>>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>>>> cookie.
>>>>>>>
>>>>>>> Signed-off-by: Mikulas Patocka 
>>>>>>>
>>>>>>> ---
>>>>>>>   drivers/md/dm.c |    5 +
>>>>>>>   1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>>> ===
>>>>>>> --- linux-2.6.orig/drivers/md/dm.c    2021-03-02 19:26:34.0 +0100
>>>>>>> +++ linux-2.6/drivers/md/dm.c    2021-03-02 19:26:34.0 +0100
>>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>>   }
>>>>>>>   }
>>>>>>>   +    if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>>> +    while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>>> +   blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>>> +    }
>>>>>>> +
>>>>>>>   /* drop the extra reference count */
>>>>>>>   dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>>   }
>>>>>> It seems that the general idea of your design is to
>>>>>> 1) submit *one* split bio
>>>>>> 2) blk_poll(), waiting until the previously submitted split bio completes
>>>>> No, I submit all the bios and poll for the last one.
>>>>>
>>>>>> and then submit the next split bio, repeating the above process. I'm
>>>>>> afraid the performance may be an issue here, since the batch that each
>>>>>> blk_poll() call reaps may decrease.
>>>>> Could you benchmark it?
>>>> I only tested dm-linear.
>>>>
>>>> The configuration (dm table) of dm-linear is:
>>>> 0 1048576 linear /dev/nvme0n1 0
>>>> 1048576 1048576 linear /dev/nvme2n1 0
>>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>>
>>>>
>>>> fio script used is:
>>>> ```
>>>> $cat fio.conf
>>>> [global]
>>>> name=iouring-sqpoll-iopoll-1
>>>> ioengine=io_uring
>>>> iodepth=128
>>>> numjobs=1
>>>> thread
>>>> rw=randread
>>>> direct=1
>>>> registerfiles=1
>>>> hipri=1
>>>> runtime=10
>>>> time_based
>>>> group_reporting
>>>> randrepeat=0
>>>> filename=/dev/mapper/testdev
>>>> bs=4k
>>>>
>>>> [job-1]
>>>> cpus_allowed=14
>>>> ```
>>>>
>>>> IOPS (IRQ mode) | IOPS (iopoll mode, hipri=1)
>>>> ----------------|----------------------------
>>>>            213k | 19k
>>>>
>>>> At least, it doesn't work well with io_uring interface.
>>>>
>>>>
>>>
>>>
>>> Jeffle,
>>>
>>> I ran your above fio test on a linear LV split across 3 NVMes to
>>> second your split mapping
>>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio
>>> and io_uring,
>>> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles
>>> and hipri) which resulted ok:
>>>
>>>
>>>
>>> sync   |  libaio  |  IRQ mode (hipri=0) | iopoll (hipri=1)
>>> -------|----------|---------------------|-----------------
>>> 56.3K  |    290K  |    329K             | 351K
>>>
>>> I can't second your drastic hipri=1 drop here...
>>
>>
>> Sorry, email mess.
>>
>>
>> sync   |  libaio  |  IRQ mode (hipri=0) | iopoll (hipri=1)
>> -------|----------|---------------------|-----------------
>> 56.3K  |    290K  |    329K | 351K
>>
>>
>>
>> I can't second your drastic hipri=1 drop here...
>>
> 
> Hummm, that's indeed somewhat strange...
> 
> My test environment:
> - CPU: 128 cores, though only one CPU core is used since
> 'cpus_allowed=14' in fio configuration
> - memory: 983G memory free
> - NVMe: Huawai ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'
> 
> Maybe you didn't specify 'nvme.poll_queues=XXX'? In this case, IO still
> goes into IRQ mode, even if you have specified 'hipri=1'?

That would be my guess too, and the patches also have a very suspicious
clear of HIPRI which shouldn't be there (which would let that fly through).

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Re: [dm-devel] [PATCH 4/4] dm: support I/O polling

2021-03-05 Thread Jens Axboe
On 3/4/21 3:09 AM, Mikulas Patocka wrote:
> 
> 
> On Thu, 4 Mar 2021, JeffleXu wrote:
> 
>>> __split_and_process_non_flush records the poll cookie in ci.poll_cookie. 
>>> When we processed all the bios, we poll for the last cookie here:
>>>
>>> if (ci.poll_cookie != BLK_QC_T_NONE) {
>>> while (atomic_read(&ci.io->io_count) > 1 &&
>>>blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>> }
>>
>> So what will happen if one bio submitted to a dm device crosses the device
>> boundary among several target devices (e.g., dm-stripe)? Please refer to
>> the following call graph.
>>
>> ```
>> submit_bio
>>   __submit_bio_noacct
>>     disk->fops->submit_bio(), calling into __split_and_process_bio(),
>>       call __split_and_process_non_flush() once, submitting the *first* split bio
>>     disk->fops->submit_bio(), calling into __split_and_process_bio(),
>>       call __split_and_process_non_flush() once, submitting the *second* split bio
>> ...
>> ```
>>
>>
>> So the loop is in __submit_bio_noacct(), rather than
>> __split_and_process_bio(). Your design will send the first split bio,
>> and then poll on this split bio, then send the next split bio, polling
>> on this, go on and on...
> 
> No. It will send all the bios and poll for the last one.

I took a quick look, and this seems very broken. You must not poll off
the submission path, polling should be invoked by the higher layer when
someone wants to reap events. IOW, dm should not be calling blk_poll()
by itself, only off mq_ops->poll(). Your patch seems to do it off
submission once you submit the last bio in that batch, effectively
implementing sync polling for that series. That's not right.
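
To make the intended layering concrete, a rough sketch using the 5.12-era interfaces (the surrounding variables are omitted, so this is a fragment for illustration only): submission just returns a cookie and never spins; blk_poll() is called later, driven by the consumer that wants to reap completions, never from inside the driver's submission path.

```c
/* Sketch of the intended split; illustration only, not complete code. */

/* submission side: submit and remember the cookie, do NOT poll here */
blk_qc_t cookie = submit_bio(bio);

/* reaping side: invoked by the consumer (e.g. io_uring / the ->iopoll()
 * path), which is the only place blk_poll() should be called from */
blk_poll(bdev_get_queue(bio->bi_bdev), cookie, true);
```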

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 1/4] block: introduce a function submit_bio_noacct_mq_direct

2021-03-05 Thread Jens Axboe
On 3/4/21 3:14 AM, Mikulas Patocka wrote:
> 
> 
> On Wed, 3 Mar 2021, Jens Axboe wrote:
> 
>> On 3/2/21 12:05 PM, Mikulas Patocka wrote:
>>
>> There seems to be something wrong with how this series is being sent
>> out. I have 1/4 and 3/4, but both are just attachments.
>>
>> -- 
>> Jens Axboe
> 
> I used quilt to send it. I don't know what's wrong with it - if you look 
> at archives at 
> https://listman.redhat.com/archives/dm-devel/2021-March/thread.html , it 
> seems normal.

I guess the archives handle it, but it just shows up as an empty email with
an attachment. Not very conducive to review, so I do suggest you fix how
you're sending them out.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


