Re: [PATCH 01/19] bcache: Fix leak of bdev reference

2017-09-04 Thread Coly Li
On 2017/9/5 2:43 PM, Christoph Hellwig wrote:
> On Tue, Sep 05, 2017 at 01:30:04AM +0800, Coly Li wrote:
>>
>> When you mentioned "whole chunk of code", do you mean the following
>> block of code ?
>>
>>
>> 1960 if (IS_ERR(bdev)) {
>> = start of whole chunk of code 
>> 1961 if (bdev == ERR_PTR(-EBUSY)) {
>> 1962 bdev = lookup_bdev(strim(path));
>> 1963 mutex_lock(&bch_register_lock);
>> 1964 if (!IS_ERR(bdev) && bch_is_open(bdev))
>> 1965 err = "device already registered";
>> 1966 else
>> 1967 err = "device busy";
>> 1968 mutex_unlock(&bch_register_lock);
>> 1969 if (!IS_ERR(bdev))
>> 1970 bdput(bdev);
>> 1971 if (attr == &ksysfs_register_quiet)
>> 1972 goto out;
>> 1973 }
>> = end of whole chunk of code 
>> 1974 goto err;
>> 1975 }
>>
>> I don't mind removing it; I'm just double-checking that I haven't
>> misunderstood what you meant.
> 
> Yes, that's the problematic block.
> 
Understood. I will send out a patch; hopefully it can catch the 4.14
merge window.

Thanks for the hint.

-- 
Coly Li


Re: [PATCH] mkfs: Remove messages printed when blocksize < physical sectorsize

2017-09-04 Thread Dave Chinner
On Mon, Sep 04, 2017 at 11:31:39PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 05, 2017 at 11:14:42AM +0530, Chandan Rajendra wrote:
> > Linux kernel commit 6c6b6f28b3335fd85ec833ee0005d9c9dca6c003 (loop: set
> > physical block size to PAGE_SIZE) now sets PAGE_SIZE as the default
> > physical sector size of loop devices. On ppc64, this causes loop devices
> > to have 64k as the physical sector size.
> 
> Eek.  We'll need to revert the loop change ASAP!

And, FWIW, making the warning go away is probably a bad idea,
because XFS only supports devices with sector sizes up
to 32k:

#define XFS_MIN_SECTORSIZE_LOG  9   /* i.e. 512 bytes */
#define XFS_MAX_SECTORSIZE_LOG  15  /* i.e. 32768 bytes */

And so it should be warning about devices trying to tell it to use
something larger.
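
Conceptually, the check mkfs needs to keep making is something like the
following; an illustrative sketch only (the helper name is made up), not
the actual mkfs.xfs code:

#include <stdio.h>

#define XFS_MAX_SECTORSIZE	(1 << 15)	/* 32768 bytes, from XFS_MAX_SECTORSIZE_LOG */

/* Hypothetical helper: warn when a device advertises a physical sector
 * size larger than XFS supports, or larger than the chosen block size. */
static void warn_sector_size(const char *dev, int physical_sectsize,
			     int blocksize)
{
	if (physical_sectsize > XFS_MAX_SECTORSIZE)
		fprintf(stderr,
			"%s: physical sector size %d exceeds the %d byte maximum XFS supports\n",
			dev, physical_sectsize, XFS_MAX_SECTORSIZE);
	else if (blocksize < physical_sectsize)
		fprintf(stderr,
			"%s: block size %d is smaller than the physical sector size %d\n",
			dev, blocksize, physical_sectsize);
}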

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 01/19] bcache: Fix leak of bdev reference

2017-09-04 Thread Christoph Hellwig
On Tue, Sep 05, 2017 at 01:30:04AM +0800, Coly Li wrote:
> 
> When you mentioned "whole chunk of code", do you mean the following
> block of code ?
> 
> 
> 1960 if (IS_ERR(bdev)) {
> = start of whole chunk of code 
> 1961 if (bdev == ERR_PTR(-EBUSY)) {
> 1962 bdev = lookup_bdev(strim(path));
> 1963 mutex_lock(&bch_register_lock);
> 1964 if (!IS_ERR(bdev) && bch_is_open(bdev))
> 1965 err = "device already registered";
> 1966 else
> 1967 err = "device busy";
> 1968 mutex_unlock(&bch_register_lock);
> 1969 if (!IS_ERR(bdev))
> 1970 bdput(bdev);
> 1971 if (attr == &ksysfs_register_quiet)
> 1972 goto out;
> 1973 }
> = end of whole chunk of code 
> 1974 goto err;
> 1975 }
> 
> I don't mind removing it; I'm just double-checking that I haven't
> misunderstood what you meant.

Yes, that's the problematic block.


Re: [PATCH] mkfs: Remove messages printed when blocksize < physical sectorsize

2017-09-04 Thread Omar Sandoval
On Mon, Sep 04, 2017 at 11:31:39PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 05, 2017 at 11:14:42AM +0530, Chandan Rajendra wrote:
> > Linux kernel commit 6c6b6f28b3335fd85ec833ee0005d9c9dca6c003 (loop: set
> > physical block size to PAGE_SIZE) now sets PAGE_SIZE as the default
> > physical sector size of loop devices. On ppc64, this causes loop devices
> > to have 64k as the physical sector size.
> 
> Eek.  We'll need to revert the loop change ASAP!

Most annoying patch series ever. It hasn't made it to Linus' tree yet,
right? We can revert (although the later change depends on that), fold
in a fix, or apply a fix on top of it, whatever Jens prefers.


Re: [PATCH] mkfs: Remove messages printed when blocksize < physical sectorsize

2017-09-04 Thread Christoph Hellwig
On Tue, Sep 05, 2017 at 11:14:42AM +0530, Chandan Rajendra wrote:
> Linux kernel commit 6c6b6f28b3335fd85ec833ee0005d9c9dca6c003 (loop: set
> physical block size to PAGE_SIZE) now sets PAGE_SIZE as the default
> physical sector size of loop devices. On ppc64, this causes loop devices
> to have 64k as the physical sector size.

Eek.  We'll need to revert the loop change ASAP!



[PATCH] block: fix bio.c kernel-doc notation warning

2017-09-04 Thread Randy Dunlap
From: Randy Dunlap 

Sphinx treats symbols that end with '_' as a reference target, so fix
that by adding an ending '*' to it.

../block/bio.c:404: ERROR: Unknown target name: "gfp".

Signed-off-by: Randy Dunlap 
---
 block/bio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-413.orig/block/bio.c
+++ lnx-413/block/bio.c
@@ -400,7 +400,7 @@ static void punt_bios_to_rescuer(struct
 
 /**
  * bio_alloc_bioset - allocate a bio for I/O
- * @gfp_mask:   the GFP_ mask given to the slab allocator
+ * @gfp_mask:   the GFP_* mask given to the slab allocator
  * @nr_iovecs: number of iovecs to pre-allocate
  * @bs:the bio_set to allocate from.
  *




Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-04 Thread Ming Lei
On Tue, Sep 05, 2017 at 01:40:11AM +, Bart Van Assche wrote:
> On Tue, 2017-09-05 at 00:08 +0800, Ming Lei wrote:
> > On Mon, Sep 04, 2017 at 03:40:35PM +, Bart Van Assche wrote:
> > > Have you considered using the blk-mq "reserved request" mechanism to avoid
> > > starvation of power management requests instead of making the block layer
> > > even more complicated than it already is?
> > 
> > A reserved request is really a bad idea; it means the reserved request
> > can't be used for normal I/O. We all know the request/tag space is
> > precious, and some devices have quite a small tag space, such as SATA.
> > This will definitely affect performance.
> 
> Sorry, but I'm neither convinced that reserving a request for power management
> would be a bad idea, nor that it would have a significant performance impact,
> nor that it would be complicated to implement. Have you noticed that the Linux ATA
> implementation already reserves a request for internal use and thereby reduces
> the queue depth from 32 to 31 (see also ATA_TAG_INTERNAL)? What I would like to
> know is whether the performance impact of reserving a request is more or less
> than 1%.

First, we really can avoid the reservation. Why waste one precious tag
just for PM, which may never even happen during a machine's entire
runtime? For SATA, the internal tag is for EH, and I believe that
reservation is unavoidable.

Second, reserving one tag decreases the concurrent I/O by one, which
definitely hurts performance, especially for random I/O. Think about
why NVMe uses such large queue depths. Not to mention there are some
devices which have only one tag (.can_queue is 1); how can you reserve
a tag on that kind of device?

Finally, bad results will follow if you reserve one tag for PM only.
Supposing it is doable to reserve one tag, did you consider
how to do that? Tags can be shared host-wide; do you want to
reserve one tag just for one request_queue?
- If yes, lots of tags can be reserved/wasted for the rare PM
(or similar) commands; even worse, the whole tag space of the HBA may
not be enough for the reservation if there are lots of LUNs on
this HBA.

- If not, and you just reserve one tag for one HBA, then all
PM commands share that one reservation. During suspend/resume, all
these PM commands have to run exclusively (serialized) for the disks
attached to the HBA, which will slow down suspend/resume very
much because there may be lots of LUNs on this HBA.

That is why I said reserving one tag is really bad, isn't it?
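
Just to make the mechanism under discussion concrete: this is roughly how a
reserved tag would be carved out of a blk-mq tag set and used. A simplified
sketch only (the function below is a made-up example, and the .ops/.cmd_size
setup is omitted), not a proposal:

#include <linux/blk-mq.h>

/* Sketch: one tag out of the shared tag set is set aside at init time,
 * so every queue sharing this set sees only queue_depth - reserved_tags
 * tags for normal I/O. */
static int example_init_tag_set(struct blk_mq_tag_set *set)
{
	set->nr_hw_queues  = 1;
	set->queue_depth   = 32;	/* total tags per hw queue */
	set->reserved_tags = 1;		/* leaves 31 for normal I/O */
	/* .ops, .cmd_size, .numa_node etc. omitted for brevity */
	return blk_mq_alloc_tag_set(set);
}

/* A PM request would then be allocated from the reserved pool with
 * something like blk_mq_alloc_request(q, op, BLK_MQ_REQ_RESERVED). */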


Thanks,
Ming


Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-04 Thread Bart Van Assche
On Tue, 2017-09-05 at 00:08 +0800, Ming Lei wrote:
> On Mon, Sep 04, 2017 at 03:40:35PM +, Bart Van Assche wrote:
> > Have you considered using the blk-mq "reserved request" mechanism to avoid
> > starvation of power management requests instead of making the block layer
> > even more complicated than it already is?
> 
> A reserved request is really a bad idea; it means the reserved request
> can't be used for normal I/O. We all know the request/tag space is
> precious, and some devices have quite a small tag space, such as SATA.
> This will definitely affect performance.

Sorry, but I'm neither convinced that reserving a request for power management
would be a bad idea, nor that it would have a significant performance impact, nor
that it would be complicated to implement. Have you noticed that the Linux ATA
implementation already reserves a request for internal use and thereby reduces
the queue depth from 32 to 31 (see also ATA_TAG_INTERNAL)? What I would like to
know is whether the performance impact of reserving a request is more or less
than 1%.

Bart.

Re: [PATCH V4 00/14] blk-mq-sched: improve SCSI-MQ performance

2017-09-04 Thread Ming Lei
On Mon, Sep 04, 2017 at 11:12:49AM +0200, Paolo Valente wrote:
> 
> > On 2 Sep 2017, at 17:17, Ming Lei  wrote:
> > 
> > Hi,
> > 
> > In Red Hat internal storage tests of the blk-mq scheduler, we
> > found that I/O performance is quite bad with mq-deadline, especially
> > for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
> > SRP...).
> > 
> > It turns out one big issue causes the performance regression: requests
> > are still dequeued from the sw queue/scheduler queue even when the LLD's
> > queue is busy, so I/O merging becomes quite difficult, and
> > sequential I/O degrades a lot.
> > 
> > The first five patches improve this situation and bring back
> > some of the lost performance.
> > 
> > Patch 6 ~ 7 uses q->queue_depth as hint for setting up
> > scheduler queue depth.
> > 
> > Patches 8 ~ 15 improve bio merging via a hash table in the sw queue,
> > which makes bio merging more efficient than the current approach,
> > in which only the last 8 requests are checked. Since patches
> > 6 ~ 14 convert to the scheduler way of dequeuing one request
> > at a time from the sw queue for SCSI devices, the number of times
> > ctx->lock is acquired increases; merging bios via the hash
> > table decreases the hold time of ctx->lock and should eliminate the
> > effect of patch 14.
> > 
> > With these changes, SCSI-MQ sequential I/O performance is
> > improved a lot; Paolo reported that mq-deadline performance
> > improved greatly [2] in his dbench test with V2. A performance
> > improvement on lpfc/qla2xxx was also observed with V1 [1].
> > 
> > Bart also worried that this patchset may affect SRP, so here is
> > test data on SCSI SRP this time:
> > 
> > - fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs)
> > - system(16 cores, dual sockets, mem: 96G)
> > 
> >           | v4.13-rc6+* | v4.13-rc6+  | patched v4.13-rc6+
> > ----------------------------------------------------------
> > IOPS(K)   |  DEADLINE   |    NONE     |       NONE
> > ----------------------------------------------------------
> > read      |   587.81    |   511.96    |      518.51
> > randread  |   116.44    |   142.99    |      142.46
> > write     |   580.87    |   536.4     |      582.15
> > randwrite |   104.95    |   124.89    |      123.99
> > ----------------------------------------------------------
> > 
> > 
> >           | v4.13-rc6+  | v4.13-rc6+  | patched v4.13-rc6+
> > ----------------------------------------------------------
> > IOPS(K)   |  DEADLINE   | MQ-DEADLINE |    MQ-DEADLINE
> > ----------------------------------------------------------
> > read      |   587.81    |   158.7     |      450.41
> > randread  |   116.44    |   142.04    |      142.72
> > write     |   580.87    |   136.61    |      569.37
> > randwrite |   104.95    |   123.14    |      124.36
> > ----------------------------------------------------------
> > 
> > *: v4.13-rc6+ means v4.13-rc6 with block for-next
> > 
> > 
> > Please consider merging this for v4.14.
> > 
> > [1] http://marc.info/?l=linux-block&m=150151989915776&w=2
> > [2] https://marc.info/?l=linux-block&m=150217980602843&w=2
> > 
> > V4:
> > - add Reviewed-by tag
> > - some trivial changes: typo fixes in commit logs/comments and
> > variable names; no actual functional change
> > 
> > V3:
> > - totally round robin for picking req from ctx, as suggested
> > by Bart
> > - remove one local variable in __sbitmap_for_each_set()
> > - drop patches of single dispatch list, which can improve
> > performance on mq-deadline, but cause a bit degrade on
> > none because all hctxs need to be checked after ->dispatch
> > is flushed. Will post it again once it is mature.
> > - rebase on v4.13-rc6 with block for-next
> > 
> > V2:
> > - dequeue requests from sw queues in round-robin style,
> > as suggested by Bart, and introduce one helper in sbitmap
> > for this purpose
> > - improve bio merge via hash table from sw queue
> > - add comments about using DISPATCH_BUSY state in lockless way,
> > simplifying handling on busy state,
> > - hold ctx->lock when clearing ctx busy bit as suggested
> > by Bart
> > 
> > 
> 
> Tested-by: Paolo Valente 

Hi Jens,

Is there any chance of getting this patchset merged for v4.14?


Thanks,
Ming


Re: [PATCH V7 00/10] mmc: Add Command Queue support

2017-09-04 Thread Adrian Hunter
On 09/04/2017 04:48 PM, Ulf Hansson wrote:
> On 4 September 2017 at 09:06, Adrian Hunter  wrote:
>> On 01/09/17 16:28, Adrian Hunter wrote:
>>> On 01/09/17 15:58, Ulf Hansson wrote:
 + Christoph

 On 1 September 2017 at 13:42, Adrian Hunter  
 wrote:
> On 31/08/17 14:56, Adrian Hunter wrote:
>> Here is V7 of the hardware command queue patches without the software
>> command queue patches, now using blk-mq.
>>
>> HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a 
>> slight
>> 2% drop in sequential read speed but no change to sequential write.
>
> Any comments?

 A couple of overall comments, for now.

 To make sure we don't overlook something when converting to mq, I
 would prefer that we first convert the existing mmc block code to mq,
 then we add CMDQ on top.
>>>
>>> That doesn't make sense.  This patch set is not converting the legacy driver
>>> to mq therefore it cannot overlook anything for converting to mq.
>>
>> And then you go silent again.
> 
> We have weekends in Sweden - and I also work on other things than mmc :-).
> 
> I do admit, however, that I could have been reviewing a bit faster
> throughout the re-spins. Apologies for that, but I am only doing my
> best.
> 
>>
>> I can send blk-mq support for legacy requests in a few days if you like, but
>> I want to hear a better explanation of why you are delaying CQE support.
> 
> That would be very nice; however, be aware that we are in the merge
> window, so I am not picking up new material for 4.14 from this point. I
> assume you understand why.

Nope.  This is new functionality - doesn't affect anyone who doesn't have a
command queue engine.  Next to no chance of regressions.  Tested by several
in the community.  Substantially unchanged since February.  It is not even
very much code in the block driver.

> 
> Still, big changes are always nice to queue up early in a release
> cycle. Let's aim for that!

You said that in February.  Never happened.  You said you wanted blk-mq, so
I waited to re-base on top, but it never appeared.

> Moreover, I am not delaying CQE, but really want it to be merged asap!
> However, I also have the role of a maintainer and the things that
> come with it. For example, I would like the community to reach
> consensus around how to move forward with CQE before I decide to pick
> it up.

It has been more than 6 months.  That is enough time to wait for "consensus".




Re: [PATCH 01/19] bcache: Fix leak of bdev reference

2017-09-04 Thread Coly Li
On 2017/7/6 2:24 AM, Christoph Hellwig wrote:
> On Fri, Jun 30, 2017 at 01:42:50PM -0700, bca...@lists.ewheeler.net wrote:
>> From: Jan Kara 
>>
>> If blkdev_get_by_path() in register_bcache() fails, we try to lookup the
>> block device using lookup_bdev() to detect which situation we are in to
>> properly report error. However we never drop the reference returned to
>> us from lookup_bdev(). Fix that.
> 
> This look ok, but I think that whole chunk of code should just go
> away - adding a lookup_bdev and resulting mess just for a slightly
> different error message is just insane.

Hi Christoph,

When you mentioned "whole chunk of code", do you mean the following
block of code ?


1960 if (IS_ERR(bdev)) {
= start of whole chunk of code 
1961 if (bdev == ERR_PTR(-EBUSY)) {
1962 bdev = lookup_bdev(strim(path));
1963 mutex_lock(&bch_register_lock);
1964 if (!IS_ERR(bdev) && bch_is_open(bdev))
1965 err = "device already registered";
1966 else
1967 err = "device busy";
1968 mutex_unlock(&bch_register_lock);
1969 if (!IS_ERR(bdev))
1970 bdput(bdev);
1971 if (attr == &ksysfs_register_quiet)
1972 goto out;
1973 }
= end of whole chunk of code 
1974 goto err;
1975 }

I don't mind removing it; I'm just double-checking that I haven't
misunderstood what you meant.

Thanks.


-- 
Coly Li


Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-04 Thread Ming Lei
On Mon, Sep 04, 2017 at 04:18:32PM +, Bart Van Assche wrote:
> On Tue, 2017-09-05 at 00:08 +0800, Ming Lei wrote:
> > On Mon, Sep 04, 2017 at 03:40:35PM +, Bart Van Assche wrote:
> > > On Mon, 2017-09-04 at 15:16 +0800, Ming Lei wrote:
> > > > On Mon, Sep 04, 2017 at 04:13:26AM +, Bart Van Assche wrote:
> > > > > Allowing blk_get_request() to succeed after the DYING flag has been 
> > > > > set is
> > > > > completely wrong because that could result in a request being queued 
> > > > > after
> > > > > the DEAD flag has been set, resulting in either a hanging request or 
> > > > > a kernel
> > > > > crash. This is why it's completely wrong to add a 
> > > > > blk_queue_enter_live() call
> > > > > in blk_old_get_request() or blk_mq_alloc_request(). Hence my NAK for 
> > > > > any
> > > > > patch that adds a blk_queue_enter_live() call to any function called 
> > > > > from
> > > > > blk_get_request(). That includes the patch at the start of this 
> > > > > e-mail thread.
> > > > 
> > > > See above, this patch changes nothing about this fact, please look at
> > > > the patch carefully next time just before posting your long comment.
> > > 
> > > Are you really sure that your patch does not allow blk_get_request() to
> > > succeed after the DYING flag has been set? blk_mq_alloc_request() calls 
> > > both
> > > blk_queue_is_preempt_frozen() and blk_queue_enter_live() without holding
> > > any lock. A thread that is running concurrently with blk_mq_get_request()
> > > can unfreeze the queue after blk_queue_is_preempt_frozen() returned and
> > > before blk_queue_enter_live() is called. This means that with your patch
> > > series applied blk_get_request() can succeed after the DYING flag has been
> > > set, which is something we don't want. Additionally, I don't think we want
> > > to introduce any kind of locking in blk_mq_get_request() because that 
> > > would
> > > be a serialization point.
> >
> > Yeah, I am pretty sure.
> > 
> > First, blk_queue_freeze_preempt() is exclusive; that means it will wait
> > for completion of all pending freezes (both normal and preempt), and no other
> > freeze can be started while a preempt freeze is in progress.
> > It is actually a typical read/write lock use case, but
> > we need to support nested normal freezing, so we can't use an rwsem.
> 
> You seem to overlook that blk_get_request() can be called from another thread
> than the thread that is performing the freezing and unfreezing.

The freezing state of the queue can be observed from any context, so this
issue isn't related to the calling context; please see blk_queue_is_preempt_frozen(),
which is called from blk_get_request():

if (!percpu_ref_is_dying(&q->q_usage_counter))
return false;

spin_lock(&q->freeze_lock);
preempt_frozen = q->preempt_freezing;
preempt_unfreezing = q->preempt_unfreezing;
spin_unlock(&q->freeze_lock);

return preempt_frozen && !preempt_unfreezing;


If the queue is in a preempt freeze, blk_queue_enter_live() is called
before allocating the request; otherwise blk_queue_enter() is called.

BTW, we could add a check for the queue dying in this helper as a further
improvement.

-- 
Ming


Re: [PATCH V3 6/8] block: introduce preempt version of blk_[freeze|unfreeze]_queue

2017-09-04 Thread Ming Lei
On Mon, Sep 04, 2017 at 03:21:08PM +, Bart Van Assche wrote:
> On Sat, 2017-09-02 at 21:08 +0800, Ming Lei wrote:
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -565,6 +565,10 @@ struct request_queue {
> >  
> > int bypass_depth;
> > atomic_tmq_freeze_depth;
> > +   spinlock_t  freeze_lock;
> > +   unsignednormal_freezing:1;
> > +   unsignedpreempt_freezing:1;
> > +   unsignedpreempt_unfreezing:1;
> >  
> >  #if defined(CONFIG_BLK_DEV_BSG)
> > bsg_job_fn  *bsg_job_fn;
> 
> Request queues already have too many states, and you want to make request queues
> even more complicated by introducing several new state variables? Yikes!

The three flags are used in the freeze/unfreeze path only, and I don't think
they are too complicated to maintain. Actually, each state is simple
enough:

- normal_freezing means the queue is in a normal freeze; it is set
before blk_queue_freeze() returns. In this state, no request
can be allocated from the queue, just like the current blk queue
freezing.

- preempt_freezing means the queue is in a preempt freeze; the flag
is set before blk_queue_freeze_preempt() returns successfully. In
this state, only RQF_PREEMPT requests are allowed to be allocated.

- preempt_unfreezing means the queue is in a preempt unfreeze; it is
set at the entry of blk_queue_unfreeze_preempt(). In this state,
no request can be allocated from the queue.
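
In code, the transitions described above would look roughly like this; a
simplified illustrative sketch only, with made-up helper names, not the
exact V3 patch:

#include <linux/blkdev.h>

/* Sketch: mark the queue as preempt frozen, as described above. */
static void example_set_preempt_freezing(struct request_queue *q)
{
	spin_lock(&q->freeze_lock);
	q->preempt_freezing = 1;	/* set before blk_queue_freeze_preempt() returns */
	spin_unlock(&q->freeze_lock);
}

/* Sketch: mark the queue as preempt unfreezing at the start of unfreeze. */
static void example_start_preempt_unfreezing(struct request_queue *q)
{
	spin_lock(&q->freeze_lock);
	q->preempt_unfreezing = 1;	/* set at the entry of blk_queue_unfreeze_preempt() */
	spin_unlock(&q->freeze_lock);
}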

-- 
Ming


Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-04 Thread Bart Van Assche
On Tue, 2017-09-05 at 00:08 +0800, Ming Lei wrote:
> On Mon, Sep 04, 2017 at 03:40:35PM +, Bart Van Assche wrote:
> > On Mon, 2017-09-04 at 15:16 +0800, Ming Lei wrote:
> > > On Mon, Sep 04, 2017 at 04:13:26AM +, Bart Van Assche wrote:
> > > > Allowing blk_get_request() to succeed after the DYING flag has been set 
> > > > is
> > > > completely wrong because that could result in a request being queued 
> > > > after
> > > > the DEAD flag has been set, resulting in either a hanging request or a 
> > > > kernel
> > > > crash. This is why it's completely wrong to add a 
> > > > blk_queue_enter_live() call
> > > > in blk_old_get_request() or blk_mq_alloc_request(). Hence my NAK for any
> > > > patch that adds a blk_queue_enter_live() call to any function called 
> > > > from
> > > > blk_get_request(). That includes the patch at the start of this e-mail 
> > > > thread.
> > > 
> > > See above, this patch changes nothing about this fact, please look at
> > > the patch carefully next time just before posting your long comment.
> > 
> > Are you really sure that your patch does not allow blk_get_request() to
> > succeed after the DYING flag has been set? blk_mq_alloc_request() calls both
> > blk_queue_is_preempt_frozen() and blk_queue_enter_live() without holding
> > any lock. A thread that is running concurrently with blk_mq_get_request()
> > can unfreeze the queue after blk_queue_is_preempt_frozen() returned and
> > before blk_queue_enter_live() is called. This means that with your patch
> > series applied blk_get_request() can succeed after the DYING flag has been
> > set, which is something we don't want. Additionally, I don't think we want
> > to introduce any kind of locking in blk_mq_get_request() because that would
> > be a serialization point.
>
> Yeah, I am pretty sure.
> 
> First, blk_queue_freeze_preempt() is exclusive; that means it will wait
> for completion of all pending freezes (both normal and preempt), and no other
> freeze can be started while a preempt freeze is in progress.
> It is actually a typical read/write lock use case, but
> we need to support nested normal freezing, so we can't use an rwsem.

You seem to overlook that blk_get_request() can be called from another thread
than the thread that is performing the freezing and unfreezing.

Bart.

Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-04 Thread Ming Lei
On Mon, Sep 04, 2017 at 03:40:35PM +, Bart Van Assche wrote:
> On Mon, 2017-09-04 at 15:16 +0800, Ming Lei wrote:
> > On Mon, Sep 04, 2017 at 04:13:26AM +, Bart Van Assche wrote:
> > > Allowing blk_get_request() to succeed after the DYING flag has been set is
> > > completely wrong because that could result in a request being queued after
> > > the DEAD flag has been set, resulting in either a hanging request or a 
> > > kernel
> > > crash. This is why it's completely wrong to add a blk_queue_enter_live() 
> > > call
> > > in blk_old_get_request() or blk_mq_alloc_request(). Hence my NAK for any
> > > patch that adds a blk_queue_enter_live() call to any function called from
> > > blk_get_request(). That includes the patch at the start of this e-mail 
> > > thread.
> >
> > See above, this patch changes nothing about this fact, please look at
> > the patch carefully next time just before posting your long comment.
> 
> Are you really sure that your patch does not allow blk_get_request() to
> succeed after the DYING flag has been set? blk_mq_alloc_request() calls both

Yeah, I am pretty sure.

First, blk_queue_freeze_preempt() is exclusive; that means it will wait
for completion of all pending freezes (both normal and preempt), and no other
freeze can be started while a preempt freeze is in progress.
It is actually a typical read/write lock use case, but
we need to support nested normal freezing, so we can't use an rwsem.

Also, the DYING flag is checked before starting a preempt freeze; the
API will return, and the preempt_freezing flag isn't set, if DYING is set
(see the sketch below).

Second, after a preempt freeze is started:

- for the block legacy path, dying is always tested at the entry of
__get_request(), so no new request is allocated after the queue is dying.

- for blk-mq, it is normal for the DYING flag to be set just
between blk_queue_enter() and allocating the request, because
we depend on the LLD to handle that case. We could even strengthen this
by checking the dying flag in blk_queue_enter(), but that is just
an improvement; it does not mean V3 is incorrect.
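
A rough sketch of that ordering (illustrative only, with a made-up function
name; not the actual patch):

#include <linux/blkdev.h>

/* Sketch: a preempt freeze bails out on a dying queue, so the
 * preempt_freezing flag is never set once DYING has been set. */
static int example_freeze_queue_preempt(struct request_queue *q)
{
	if (blk_queue_dying(q))
		return -ENODEV;

	/* ... wait for all pending normal/preempt freezes to complete ... */

	spin_lock(&q->freeze_lock);
	q->preempt_freezing = 1;
	spin_unlock(&q->freeze_lock);
	return 0;
}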

> blk_queue_is_preempt_frozen() and blk_queue_enter_live() without holding
> any lock. A thread that is running concurrently with blk_mq_get_request()
> can unfreeze the queue after blk_queue_is_preempt_frozen() returned and
> before blk_queue_enter_live() is called. This means that with your patch

A preempt freeze is exclusive, so no other freeze can be started at all;
hence the issue you are worried about cannot happen.

> series applied blk_get_request() can succeed after the DYING flag has been
> set, which is something we don't want. Additionally, I don't think we want
> to introduce any kind of locking in blk_mq_get_request() because that would
> be a serialization point.

That needn't be worried about; as you saw, we can check
percpu_ref_is_dying() first, and only then acquire the lock to check
the flags to see whether the queue is freezing.

> 
> Have you considered using the blk-mq "reserved request" mechanism to avoid
> starvation of power management requests instead of making the block layer
> even more complicated than it already is?

A reserved request is really a bad idea; it means the reserved request
can't be used for normal I/O. We all know the request/tag space is
precious, and some devices have quite a small tag space, such as SATA.
This will definitely affect performance.

Also, I don't think the approach is complicated; the idea is actually
simple, and the implementation isn't complicated either.

> 
> Note: extending blk_mq_freeze/unfreeze_queue() to the legacy block layer
> could be useful to make scsi_wait_for_queuecommand() more elegant. However,
> I don't think we should spend our time on legacy block layer / SCSI core
> changes. The code I'm referring to is the following:

That could be a side product of this approach, but this patchset is just
focused on fixing the I/O hang.

-- 
Ming


Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-04 Thread Bart Van Assche
On Mon, 2017-09-04 at 15:16 +0800, Ming Lei wrote:
> On Mon, Sep 04, 2017 at 04:13:26AM +, Bart Van Assche wrote:
> > Allowing blk_get_request() to succeed after the DYING flag has been set is
> > completely wrong because that could result in a request being queued after
> > the DEAD flag has been set, resulting in either a hanging request or a 
> > kernel
> > crash. This is why it's completely wrong to add a blk_queue_enter_live() 
> > call
> > in blk_old_get_request() or blk_mq_alloc_request(). Hence my NAK for any
> > patch that adds a blk_queue_enter_live() call to any function called from
> > blk_get_request(). That includes the patch at the start of this e-mail 
> > thread.
>
> See above, this patch changes nothing about this fact, please look at
> the patch carefully next time just before posting your long comment.

Are you really sure that your patch does not allow blk_get_request() to
succeed after the DYING flag has been set? blk_mq_alloc_request() calls both
blk_queue_is_preempt_frozen() and blk_queue_enter_live() without holding
any lock. A thread that is running concurrently with blk_mq_get_request()
can unfreeze the queue after blk_queue_is_preempt_frozen() returned and
before blk_queue_enter_live() is called. This means that with your patch
series applied blk_get_request() can succeed after the DYING flag has been
set, which is something we don't want. Additionally, I don't think we want
to introduce any kind of locking in blk_mq_get_request() because that would
be a serialization point.

Have you considered using the blk-mq "reserved request" mechanism to avoid
starvation of power management requests instead of making the block layer
even more complicated than it already is?

Note: extending blk_mq_freeze/unfreeze_queue() to the legacy block layer
could be useful to make scsi_wait_for_queuecommand() more elegant. However,
I don't think we should spend our time on legacy block layer / SCSI core
changes. The code I'm referring to is the following:

/**
 * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
 * @sdev: SCSI device pointer.
 *
 * Wait until the ongoing shost->hostt->queuecommand() calls that are
 * invoked from scsi_request_fn() have finished.
 */
static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
{
WARN_ON_ONCE(sdev->host->use_blk_mq);

while (scsi_request_fn_active(sdev))
msleep(20);
}

Bart.

Re: [PATCH V3 6/8] block: introduce preempt version of blk_[freeze|unfreeze]_queue

2017-09-04 Thread Bart Van Assche
On Sat, 2017-09-02 at 21:08 +0800, Ming Lei wrote:
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -565,6 +565,10 @@ struct request_queue {
>  
>   int bypass_depth;
>   atomic_tmq_freeze_depth;
> + spinlock_t  freeze_lock;
> + unsignednormal_freezing:1;
> + unsignedpreempt_freezing:1;
> + unsignedpreempt_unfreezing:1;
>  
>  #if defined(CONFIG_BLK_DEV_BSG)
>   bsg_job_fn  *bsg_job_fn;

Request queues already have too many states, and you want to make request queues
even more complicated by introducing several new state variables? Yikes!

Bart.

Re: [PATCH V7 00/10] mmc: Add Command Queue support

2017-09-04 Thread Ulf Hansson
On 4 September 2017 at 09:06, Adrian Hunter  wrote:
> On 01/09/17 16:28, Adrian Hunter wrote:
>> On 01/09/17 15:58, Ulf Hansson wrote:
>>> + Christoph
>>>
>>> On 1 September 2017 at 13:42, Adrian Hunter  wrote:
 On 31/08/17 14:56, Adrian Hunter wrote:
> Here is V7 of the hardware command queue patches without the software
> command queue patches, now using blk-mq.
>
> HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
> 2% drop in sequential read speed but no change to sequential write.

 Any comments?
>>>
>>> A couple of overall comments, for now.
>>>
>>> To make sure we don't overlook something when converting to mq, I
>>> would prefer that we first convert the existing mmc block code to mq,
>>> then we add CMDQ on top.
>>
>> That doesn't make sense.  This patch set is not converting the legacy driver
>> to mq therefore it cannot overlook anything for converting to mq.
>
> And then you go silent again.

We have weekends in Sweden - and I also work on other things than mmc :-).

I do admit, however, that I could have been reviewing a bit faster
throughout the re-spins. Apologies for that, but I am only doing my
best.

>
> I can send blk-mq support for legacy requests in a few days if you like, but
> I want to hear a better explanation of why you are delaying CQE support.

That would be very nice; however, be aware that we are in the merge
window, so I am not picking up new material for 4.14 from this point. I
assume you understand why.

Still, big changes are always nice to queue up early in a release
cycle. Let's aim for that!

Moreover, I am not delaying CQE, but really want it to be merged asap!
However, I also have the role of a maintainer and the things that
come with it. For example, I would like the community to reach
consensus around how to move forward with CQE before I decide to pick
it up.

Kind regards
Uffe


Re: [PATCH 2/4] fs: support IOCB_NOWAIT in generic_file_buffered_read

2017-09-04 Thread Jan Kara
On Tue 29-08-17 16:13:19, Christoph Hellwig wrote:
> From: Milosz Tanski 
> 
> Allow generic_file_buffered_read to bail out early instead of waiting for
> the page lock or reading a page if IOCB_NOWAIT is specified.
> 
> Signed-off-by: Milosz Tanski 
> Reviewed-by: Christoph Hellwig 
> Reviewed-by: Jeff Moyer 
> Acked-by: Sage Weil 

This looks good to me now. You can add:

Reviewed-by: Jan Kara 

Honza


> ---
>  mm/filemap.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 4bcfa74ad802..eed394fd331c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1937,6 +1937,8 @@ static ssize_t generic_file_buffered_read(struct kiocb 
> *iocb,
>  
>   page = find_get_page(mapping, index);
>   if (!page) {
> + if (iocb->ki_flags & IOCB_NOWAIT)
> + goto would_block;
>   page_cache_sync_readahead(mapping,
>   ra, filp,
>   index, last_index - index);
> @@ -1950,6 +1952,11 @@ static ssize_t generic_file_buffered_read(struct kiocb 
> *iocb,
>   index, last_index - index);
>   }
>   if (!PageUptodate(page)) {
> + if (iocb->ki_flags & IOCB_NOWAIT) {
> + put_page(page);
> + goto would_block;
> + }
> +
>   /*
>* See comment in do_read_cache_page on why
>* wait_on_page_locked is used to avoid unnecessarily
> @@ -2131,6 +2138,8 @@ static ssize_t generic_file_buffered_read(struct kiocb 
> *iocb,
>   goto readpage;
>   }
>  
> +would_block:
> + error = -EAGAIN;
>  out:
>   ra->prev_pos = prev_index;
>   ra->prev_pos <<= PAGE_SHIFT;
> -- 
> 2.11.0
> 
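
As a usage note: from userspace, this buffered-read path would be exercised
with preadv2() and the RWF_NOWAIT flag (assuming that flag is wired up to
IOCB_NOWAIT, as in this series, and exposed to userspace). A rough sketch
only; the file path is just an example:

#define _GNU_SOURCE
#include <sys/uio.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>

int main(void)
{
	char buf[4096];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int fd = open("/tmp/testfile", O_RDONLY);
	ssize_t ret;

	if (fd < 0)
		return 1;

	/* Ask for the data only if it is already in the page cache;
	 * otherwise the kernel returns -EAGAIN instead of blocking. */
	ret = preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
	if (ret < 0 && errno == EAGAIN)
		printf("data not cached, read would block\n");
	return 0;
}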
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH] blk-mq: fix nr_requests wrong value when modify it from sysfs

2017-09-04 Thread Ming Lei
On Sun, Sep 3, 2017 at 9:46 PM, weiping zhang
 wrote:
> If blk-mq uses the "none" io scheduler, nr_requests gets a wrong value when
> a number > tag_set->queue_depth is written. blk_mq_tag_update_depth() will use
> the smaller value min(nr, set->queue_depth), but q->nr_requests still ends up
> with the wrong, uncapped value.
>
> Reproduce:
>
> echo none > /sys/block/nvme0n1/queue/scheduler
> echo 100 > /sys/block/nvme0n1/queue/nr_requests
> cat /sys/block/nvme0n1/queue/nr_requests
> 100
>
> Signed-off-by: weiping zhang 
> ---
>  block/blk-mq.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f84d145..8303e5e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2622,8 +2622,11 @@ int blk_mq_update_nr_requests(struct request_queue *q, 
> unsigned int nr)
>  * queue depth. This is similar to what the old code would do.
>  */
> if (!hctx->sched_tags) {
> -   ret = blk_mq_tag_update_depth(hctx, &hctx->tags,
> -   min(nr, 
> set->queue_depth),
> +   if (nr > set->queue_depth) {
> +   nr = set->queue_depth;
> +   pr_warn("reduce nr_request to %u\n", nr);
> +   }
> +   ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
> false);
> } else {
> ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,

Not sure if the pr_warn() is useful, but the idea is correct:

  Reviewed-by: Ming Lei 

-- 
Ming Lei


Re: Memory leak in BFQ?

2017-09-04 Thread Paolo Valente

> On 31 Aug 2017, at 22:42, Bart Van Assche  wrote:
> 
> Hello Paolo,
> 
> If I run the following commands:
> 
> # echo bfq > /sys/block/sda/queue/scheduler 
> # echo 3 >/proc/sys/vm/drop_caches; find / -xdev >/dev/null
> # echo none > /sys/block/sda/queue/scheduler
> # echo scan > /sys/kernel/debug/kmemleak
> 
> Then kmemleak reports the following:
> 
> unreferenced object 0x880405149158 (size 4096):
>  comm "bash", pid 1716, jiffies 4294960386 (age 75256.970s)
>  hex dump (first 32 bytes):
>48 a5 f4 a3 04 88 ff ff 01 00 00 00 00 00 00 00  H...
>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>  backtrace:
>[] kmemleak_alloc+0x4a/0xa0
>[] kmem_cache_alloc_node_trace+0x154/0x310
>[] bfq_pd_alloc+0x3c/0x3f0 [bfq]
>[] blkcg_activate_policy+0xac/0x170
>[] bfq_create_group_hierarchy+0x1c/0x66 [bfq]
>[] bfq_init_queue+0x28b/0x350 [bfq]
>[] blk_mq_init_sched+0xbb/0x150
>[] elevator_switch+0x62/0x210
>[] elv_iosched_store+0xe2/0x180
>[] queue_attr_store+0x59/0x90
>[] sysfs_kf_write+0x45/0x60
>[] kernfs_fop_write+0x124/0x1c0
>[] __vfs_write+0x28/0x150
>[] vfs_write+0xc7/0x1b0
>[] SyS_write+0x49/0xa0
>[] entry_SYSCALL_64_fastpath+0x18/0xad
> 
> Could this indicate a memory leak in the BFQ code?
> 

I guess so.  But I've been trying to reproduce it, to no avail.  Is it
readily reproducible on your system?  If so, did you happen, by chance,
to also find some other way to reproduce it?

Thanks,
Paolo

> Thanks,
> 
> Bart.



Re: [PATCH V4 00/14] blk-mq-sched: improve SCSI-MQ performance

2017-09-04 Thread Paolo Valente

> On 2 Sep 2017, at 17:17, Ming Lei  wrote:
> 
> Hi,
> 
> In Red Hat internal storage tests of the blk-mq scheduler, we
> found that I/O performance is quite bad with mq-deadline, especially
> for sequential I/O on some multi-queue SCSI devices (lpfc, qla2xxx,
> SRP...).
> 
> It turns out one big issue causes the performance regression: requests
> are still dequeued from the sw queue/scheduler queue even when the LLD's
> queue is busy, so I/O merging becomes quite difficult, and
> sequential I/O degrades a lot.
> 
> The first five patches improve this situation and bring back
> some of the lost performance.
> 
> Patch 6 ~ 7 uses q->queue_depth as hint for setting up
> scheduler queue depth.
> 
> Patches 8 ~ 15 improve bio merging via a hash table in the sw queue,
> which makes bio merging more efficient than the current approach,
> in which only the last 8 requests are checked. Since patches
> 6 ~ 14 convert to the scheduler way of dequeuing one request
> at a time from the sw queue for SCSI devices, the number of times
> ctx->lock is acquired increases; merging bios via the hash
> table decreases the hold time of ctx->lock and should eliminate the
> effect of patch 14.
> 
> With these changes, SCSI-MQ sequential I/O performance is
> improved a lot; Paolo reported that mq-deadline performance
> improved greatly [2] in his dbench test with V2. A performance
> improvement on lpfc/qla2xxx was also observed with V1 [1].
> 
> Bart also worried that this patchset may affect SRP, so here is
> test data on SCSI SRP this time:
> 
> - fio(libaio, bs:4k, dio, queue_depth:64, 64 jobs)
> - system(16 cores, dual sockets, mem: 96G)
> 
>           | v4.13-rc6+* | v4.13-rc6+  | patched v4.13-rc6+
> ----------------------------------------------------------
> IOPS(K)   |  DEADLINE   |    NONE     |       NONE
> ----------------------------------------------------------
> read      |   587.81    |   511.96    |      518.51
> randread  |   116.44    |   142.99    |      142.46
> write     |   580.87    |   536.4     |      582.15
> randwrite |   104.95    |   124.89    |      123.99
> ----------------------------------------------------------
> 
> 
>           | v4.13-rc6+  | v4.13-rc6+  | patched v4.13-rc6+
> ----------------------------------------------------------
> IOPS(K)   |  DEADLINE   | MQ-DEADLINE |    MQ-DEADLINE
> ----------------------------------------------------------
> read      |   587.81    |   158.7     |      450.41
> randread  |   116.44    |   142.04    |      142.72
> write     |   580.87    |   136.61    |      569.37
> randwrite |   104.95    |   123.14    |      124.36
> ----------------------------------------------------------
> 
> *: v4.13-rc6+ means v4.13-rc6 with block for-next
> 
> 
> Please consider merging this for v4.14.
> 
> [1] http://marc.info/?l=linux-block&m=150151989915776&w=2
> [2] https://marc.info/?l=linux-block&m=150217980602843&w=2
> 
> V4:
>   - add Reviewed-by tag
>   - some trivial changes: typo fixes in commit logs/comments and
>   variable names; no actual functional change
> 
> V3:
>   - totally round robin for picking req from ctx, as suggested
>   by Bart
>   - remove one local variable in __sbitmap_for_each_set()
>   - drop patches of single dispatch list, which can improve
>   performance on mq-deadline, but cause a bit degrade on
>   none because all hctxs need to be checked after ->dispatch
>   is flushed. Will post it again once it is mature.
>   - rebase on v4.13-rc6 with block for-next
> 
> V2:
>   - dequeue requests from sw queues in round-robin style,
>   as suggested by Bart, and introduce one helper in sbitmap
>   for this purpose
>   - improve bio merge via hash table from sw queue
>   - add comments about using DISPATCH_BUSY state in lockless way,
>   simplifying handling on busy state,
>   - hold ctx->lock when clearing ctx busy bit as suggested
>   by Bart
> 
> 

Tested-by: Paolo Valente 

> Ming Lei (14):
>  blk-mq-sched: fix scheduler bad performance
>  sbitmap: introduce __sbitmap_for_each_set()
>  blk-mq: introduce blk_mq_dispatch_rq_from_ctx()
>  blk-mq-sched: move actual dispatching into one helper
>  blk-mq-sched: improve dispatching from sw queue
>  blk-mq-sched: don't dequeue request until all in ->dispatch are
>flushed
>  blk-mq-sched: introduce blk_mq_sched_queue_depth()
>  blk-mq-sched: use q->queue_depth as hint for q->nr_requests
>  block: introduce rqhash helpers
>  block: move actual bio merge code into __elv_merge
>  block: add check on elevator for supporting bio merge via hashtable
>from blk-mq sw queue
>  block: introduce .last_merge and .hash to blk_mq_ctx
>  blk-mq-sched: refactor blk_mq_sched_t

Re: [PATCH BUGFIX/IMPROVEMENT V2 0/3] three bfq fixes restoring service guarantees with random sync writes in bg

2017-09-04 Thread Ming Lei
On Mon, Sep 4, 2017 at 4:14 PM, Mel Gorman  wrote:
> On Thu, Aug 31, 2017 at 03:42:57PM +0100, Mel Gorman wrote:
>> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
>> > [SECOND TAKE, with just the name of one of the tester fixed]
>> >
>> > Hi,
>> > while testing the read-write unfairness issues reported by Mel, I
>> > found BFQ failing to guarantee good responsiveness against heavy
>> > random sync writes in the background, i.e., multiple writers doing
>> > random writes and systematic fdatasync [1]. The failure was caused by
>> > three related bugs, because of which BFQ failed to guarantee to
>> > high-weight processes the expected fraction of the throughput.
>> >
>>
>> Queued on top of Ming's most recent series even though that's still a work
>> in progress. I should know in a few days how things stand.
>>
>
> The problems with parallel heavy writers seem to have disappeared with this
> series. There are still revisions taking place on Ming's series, so the overall
> setting of legacy vs mq is still a work in progress, but this series looks good.

Hi Mel and Paolo,

BTW, no actual functional change in V4.

Also, could you guys provide a Tested-by, since it looks like you are using
it in your tests?

Thanks,
Ming Lei


Re: [PATCH BUGFIX/IMPROVEMENT V2 0/3] three bfq fixes restoring service guarantees with random sync writes in bg

2017-09-04 Thread Paolo Valente

> On 4 Sep 2017, at 10:14, Mel Gorman  wrote:
> 
> On Thu, Aug 31, 2017 at 03:42:57PM +0100, Mel Gorman wrote:
>> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
>>> [SECOND TAKE, with just the name of one of the tester fixed]
>>> 
>>> Hi,
>>> while testing the read-write unfairness issues reported by Mel, I
>>> found BFQ failing to guarantee good responsiveness against heavy
>>> random sync writes in the background, i.e., multiple writers doing
>>> random writes and systematic fdatasync [1]. The failure was caused by
>>> three related bugs, because of which BFQ failed to guarantee to
>>> high-weight processes the expected fraction of the throughput.
>>> 
>> 
>> Queued on top of Ming's most recent series even though that's still a work
>> in progress. I should know in a few days how things stand.
>> 
> 
> The problems with parallel heavy writers seem to have disappeared with this
> series. There are still revisions taking place on Ming's series, so the overall
> setting of legacy vs mq is still a work in progress, but this series looks good.
> 

Great news!

Thanks for testing,
Paolo

> Thanks.
> 
> -- 
> Mel Gorman
> SUSE Labs



Re: [PATCH BUGFIX/IMPROVEMENT V2 0/3] three bfq fixes restoring service guarantees with random sync writes in bg

2017-09-04 Thread Mel Gorman
On Thu, Aug 31, 2017 at 03:42:57PM +0100, Mel Gorman wrote:
> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:
> > [SECOND TAKE, with just the name of one of the tester fixed]
> > 
> > Hi,
> > while testing the read-write unfairness issues reported by Mel, I
> > found BFQ failing to guarantee good responsiveness against heavy
> > random sync writes in the background, i.e., multiple writers doing
> > random writes and systematic fdatasync [1]. The failure was caused by
> > three related bugs, because of which BFQ failed to guarantee to
> > high-weight processes the expected fraction of the throughput.
> > 
> 
> Queued on top of Ming's most recent series even though that's still a work
> in progress. I should know in a few days how things stand.
> 

The problems with parallel heavy writers seem to have disappeared with this
series. There are still revisions taking place on Ming's series, so the overall
setting of legacy vs mq is still a work in progress, but this series looks good.

Thanks.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-04 Thread Ming Lei
On Mon, Sep 04, 2017 at 04:13:26AM +, Bart Van Assche wrote:
> On Sat, 2017-09-02 at 21:12 +0800, Ming Lei wrote:
> > Please let us know if V3 addresses your previous concern about calling
> > blk_queue_enter_live() during preempt freezing.
> 
> Do you understand how request queue cleanup works? The algorithm used for
> request queue cleanup is as follows:
> * Set the DYING flag. This flag makes all later blk_get_request() calls
>   fail.

If you look at __blk_freeze_queue_start(), you will see that a preempt
freeze will fail after the queue is dying, and blk_queue_enter_live() won't
be called at all if the queue is dying, right?

> * Wait until all pending requests have finished.
> * Set the DEAD flag. For the traditional block layer, this flag causes
>   blk_run_queue() not to call .request_fn() anymore. For blk-mq it is
>   guaranteed in another way that .queue_rq() won't be called anymore after
>   this flag has been set.
> 
> Allowing blk_get_request() to succeed after the DYING flag has been set is
> completely wrong because that could result in a request being queued after

See above, this patch changes nothing about this fact, please look at
the patch carefully next time just before posting your long comment.

> the DEAD flag has been set, resulting in either a hanging request or a kernel
> crash. This is why it's completely wrong to add a blk_queue_enter_live() call
> in blk_old_get_request() or blk_mq_alloc_request(). Hence my NAK for any
> patch that adds a blk_queue_enter_live() call to any function called from
> blk_get_request(). That includes the patch at the start of this e-mail thread.
> 
> Bart.

-- 
Ming


Re: [PATCH V7 00/10] mmc: Add Command Queue support

2017-09-04 Thread Adrian Hunter
On 01/09/17 16:28, Adrian Hunter wrote:
> On 01/09/17 15:58, Ulf Hansson wrote:
>> + Christoph
>>
>> On 1 September 2017 at 13:42, Adrian Hunter  wrote:
>>> On 31/08/17 14:56, Adrian Hunter wrote:
 Here is V7 of the hardware command queue patches without the software
 command queue patches, now using blk-mq.

 HW CMDQ offers 25% - 50% better random multi-threaded I/O.  I see a slight
 2% drop in sequential read speed but no change to sequential write.
>>>
>>> Any comments?
>>
>> A couple of overall comments, for now.
>>
>> To make sure we don't overlook something when converting to mq, I
>> would prefer that we first convert the existing mmc block code to mq,
>> then we add CMDQ on top.
> 
> That doesn't make sense.  This patch set is not converting the legacy driver
> to mq therefore it cannot overlook anything for converting to mq.

And then you go silent again.

I can send blk-mq support for legacy requests in a few days if you like, but
I want to hear a better explanation of why you are delaying CQE support.