Re: [PATCH] xen-blkfront: Fix handling of non-supported operations

2017-07-24 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH for-4.13] block: disable runtime-pm for blk-mq

2017-07-24 Thread Ming Lei
On Fri, Jul 21, 2017 at 01:46:10PM +0200, Christoph Hellwig wrote:
> The blk-mq code lacks support for looking at the rpm_status field, tracking
> active requests and the RQF_PM flag.
> 
> Due to the default switch to blk-mq for scsi people start to run into
> suspend / resume issue due to this fact, so make sure we disable the runtime
> PM functionality until it is properly implemented.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-core.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 970b9c9638c5..dbecbf4a64e0 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3421,6 +3421,10 @@ EXPORT_SYMBOL(blk_finish_plug);
>   */
>  void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
>  {
> + /* not support for RQF_PM and ->rpm_status in blk-mq yet */

The big problem should be that the SCSI device can be runtime suspended,
but never to get back in case of blk-mq.

> + if (q->mq_ops)
> + return;
> +
>   q->dev = dev;
>   q->rpm_status = RPM_ACTIVE;
>   pm_runtime_set_autosuspend_delay(q->dev, -1);
> -- 
> 2.11.0
> 

Either blk-mq or scsi-mq isn't ready for runtime PM, so
we can't enable runtime PM until both are ready.

Reviewed-by: Ming Lei 

-- 
Ming


Re: [PATCH] xen-blkfront: Fix handling of non-supported operations

2017-07-24 Thread Jens Axboe
On 07/21/2017 11:11 AM, Bart Van Assche wrote:
> This patch fixes the following sparse warnings:
> 
> drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 
> (different base types)
> drivers/block/xen-blkfront.c:916:45:expected restricted blk_status_t 
> [usertype] error
> drivers/block/xen-blkfront.c:916:45:got int [signed] error
> drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment 
> (different base types)
> drivers/block/xen-blkfront.c:1599:47:expected int [signed] error
> drivers/block/xen-blkfront.c:1599:47:got restricted blk_status_t 
> [usertype] 
> drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment 
> (different base types)
> drivers/block/xen-blkfront.c:1607:55:expected int [signed] error
> drivers/block/xen-blkfront.c:1607:55:got restricted blk_status_t 
> [usertype] 
> drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment 
> (different base types)
> drivers/block/xen-blkfront.c:1625:55:expected int [signed] error
> drivers/block/xen-blkfront.c:1625:55:got restricted blk_status_t 
> [usertype] 
> drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t 
> degrades to integer
> 
> Compile-tested only.

Applied, but I killed your stable tag. Why did you add that?

-- 
Jens Axboe



Re: [PATCH for-4.13] block: disable runtime-pm for blk-mq

2017-07-24 Thread Jens Axboe
On 07/21/2017 05:46 AM, Christoph Hellwig wrote:
> The blk-mq code lacks support for looking at the rpm_status field, tracking
> active requests and the RQF_PM flag.
> 
> Due to the default switch to blk-mq for scsi people start to run into
> suspend / resume issue due to this fact, so make sure we disable the runtime
> PM functionality until it is properly implemented.

Added, thanks.

-- 
Jens Axboe



Re: [PATCH] xen-blkfront: Fix handling of non-supported operations

2017-07-24 Thread Bart Van Assche
On Mon, 2017-07-24 at 08:46 -0600, Jens Axboe wrote:
> On 07/21/2017 11:11 AM, Bart Van Assche wrote:
> > This patch fixes the following sparse warnings:
> > 
> > drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 
> > (different base types)
> > drivers/block/xen-blkfront.c:916:45:expected restricted blk_status_t 
> > [usertype] error
> > drivers/block/xen-blkfront.c:916:45:got int [signed] error
> > drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment 
> > (different base types)
> > drivers/block/xen-blkfront.c:1599:47:expected int [signed] error
> > drivers/block/xen-blkfront.c:1599:47:got restricted blk_status_t 
> > [usertype] 
> > drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment 
> > (different base types)
> > drivers/block/xen-blkfront.c:1607:55:expected int [signed] error
> > drivers/block/xen-blkfront.c:1607:55:got restricted blk_status_t 
> > [usertype] 
> > drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment 
> > (different base types)
> > drivers/block/xen-blkfront.c:1625:55:expected int [signed] error
> > drivers/block/xen-blkfront.c:1625:55:got restricted blk_status_t 
> > [usertype] 
> > drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t 
> > degrades to integer
> > 
> > Compile-tested only.
> 
> Applied, but I killed your stable tag. Why did you add that?

Hello Jens,

That tag was added based on the output of git describe:
$ git describe 2a842acab109
v4.12-rc2-199-g2a842acab109

However, the following command shows that the above output is misleading and 
that the
stable tag is indeed not needed:
$ git log v4.12..origin/master | grep '^commit 2a842acab109'
commit 2a842acab109f40f0d7d10b38e9ca88390628996

Bart.

Re: [PATCH] blk-mq: map queues to all present CPUs

2017-07-24 Thread Jens Axboe
On 07/18/2017 09:04 AM, Christoph Hellwig wrote:
> We already do this for PCI mappings, and the higher level code now
> expects that CPU on/offlining doesn't have an affect on the queue
> mappings.

Applied for 4.13, thanks.

-- 
Jens Axboe



Re: [PATCH] block: note about cloned bios and bio_for_each_segment_all

2017-07-24 Thread Peter Zijlstra
On Tue, Jul 18, 2017 at 03:19:00PM -0700, Bart Van Assche wrote:
> Hello Peter,

Sorry for being late, I'm trying to recover from a few weeks of leave
and the inbox is in shambles.

> In a test I ran myself with kernel v4.12-rc1 I also noticed that a
> WARN_ON_ONCE() statement triggered an oops in report_bug() and killed the
> kernel thread of the caller instead of letting the caller continue. What I
> ran into is probably the same oops as in the above call trace. For the test
> I ran myself the disassembly is as follows:
> 
> (gdb) list *(report_bug+0x94)
> 0x812ba024 is in report_bug (lib/bug.c:177).
> 172 return BUG_TRAP_TYPE_WARN;
> 173
> 174 /*
> 175  * Since this is the only store, concurrency 
> is not an issue.
> 176  */
> 177 bug->flags |= BUGFLAG_DONE;
> 178 }
> 179 }
> 180
> 181 if (warning) {
> 
> Could this be related to patch "debug: Add _ONCE() logic to report_bug()"?
> Is this a known issue? If so, is a fix perhaps already available?

Yep..  please see:

  https://marc.info/?l=linux-kernel&m=150055119925677


Re: handling cache device disconnection more properly ( was Re: [PATCH] bcache: only recovery I/O error for writethrough mode)

2017-07-24 Thread Eric Wheeler
On Thu, 13 Jul 2017, Coly Li wrote:

> What we are discussing is more then the original patch, now the topic
> changes to how handling cache device disconnection more properly. So I
> change email thread subject.
> 
> On 2017/7/13 上午8:53, Eric Wheeler wrote:
> > On Wed, 12 Jul 2017, Coly Li wrote:
> > 
> >> On 2017/7/12 上午10:01, tang.jun...@zte.com.cn wrote:
>  I meant "it is very necessary for data base applications which always
>  use *writeback* mode and not switch to other mode during all their
>  online time."  ^_^
> >>>
> >>> I know, it is necessary, but not enough. who can promise they will not
> >>> switch during online time? This patch is logical imperfectly.
> >>
> >> Yes, I agree with you. Since Eric mentions dirty data map, an improved
> >> fix shows up in my head,
> >>
> >> When cache device disconnected from system,
> >> 0) If in non mode, do nothing.
> > 
> > Does non mode guarantee that nothing is dirty?  I'm not convinced of that.  
> > I think you can set non mode with dirty blocks. (Correct me if I'm wrong 
> > here.)
> > 
> 
> I think you are correct. Your question notices me, that it is still
> possible that user switches cache mode more then once as they want,
> maybe some sequence like this,
>  writeback -> writethrough -> none -> writeback -> none 
> So we should always check whether dirty map exists or clean, no matter
> what current cache mode is.
> 
> Nice hint :-)
> 
> 
> >> 1) If in writeback/writethough/writearound mode, and dirty map is clean,
> >>- switch to non mode
> >>- continue to handle I/O without cache device
> > 
> > Sure, that makes sense.  
> > 
> >> 2) If in writeback mode, and dirty map is not clean,
> > 
> > You would want to do a dirty map lookup for each IO.  How about this:
> > 
> > 2) If in _any_ mode, and dirty map is dirty for *that specific block*:
> > 
> >   If WRITE request completely overlaps the dirty segment then clear 
> > the dirty flag and pass through to the backing dev.
> >   otherwise:
> > - return -EIO immediately for WRITE request
> > - return -EIO immediately for READ request (*)
> > 
> > If the WRITE request completely overlaps the dirty segment as indicated 
> > from the in-memory metadata, then clear its dirty flag and write to the 
> > backing device.  Whatever was dirty isn't important anymore as it was 
> > overwritten.
> 
> What I worried here is, the lost dirty data blocks is probably to have
> inter-dependency, e.g. file system metadata (maybe database transaction
> records).
> 
> If the cache device disconnected and a single overlapped dirty block is
> permitted to go into backing device. It may cause a more worse metadata
> corruption and data lose on backing device.

I think we might be using the term "overlapped" in two different ways: I 
think you mean overlap as a request which only partially overwrites the 
data which is dirty.  My meaning for overlap was that it completely 
overlaps the dirty data, that is, the whole block specified by the WRITE 
request is already dirty at the time of submission and exactly matches 
what the dirty map indicates such that clearing the dirty map for the 
request does not clear the dirty map for any dirty data that is not 
overwritten by the request.

I agree that WRITE requests which do not fully overwrite the dirty block 
must -EIO.

We can only write to the backing device if the WRITE request being made is 
to an offset+length that is completely dirty, in which case the related 
cache block in-memory dirty flag can be cleared.  It must completely match 
the dirty block size so the write complete replaces the dirty area.  

Does this seem correct?  If not, please suggest an example to illustrate.

> > 
> > Unless there is a good reason to diverge, we would want this recovery 
> > logic would be the same for failed IOs from an existing cachedev (eg, with 
> > badblocks), and for cachedevs that are altogether missing.
> > 
> 
> For clean data lost, this is totally correct. For dirty data lost, it
> might not be always correct. At least for writeback mode, this recovery
> logic is buggy. Return a corrupted/stale data in silence is disaster,
> this logic should be fixed.

I think we are saying the same thing.  Always return valid data or -EIO.  
I'm just suggesting that the -EIO path from the cache device should be the 
same logic whether it is because the driver returned -EIO or because the 
cache device is missing.

> > 
> >> 3) If not in writeback mode, and dirty map is not clean. It means the
> >> cache mode is switched from writeback mode with dirty data lost, then
> >>- returns -EIO immediately for WRITE request
> >>- returns -EIO immediately for READ request (*)
> > 
> > For #2,3, do a dirty map lookup for every IO: if the block is clean, pass 
> > it to the backing device.  Only -EIO if the request cannot be recovered 
> > (block is dirty) and invoke pr_crit_once() to notify the user.  We want 
> > all IO requests to succeed to the extent possi

Re: [Nbd] [PATCH 1/3] nbd: allow multiple disconnects to be sent

2017-07-24 Thread Wouter Verhelst
On Fri, Jul 21, 2017 at 10:48:13AM -0400, jo...@toxicpanda.com wrote:
> From: Josef Bacik 
> 
> There's no reason to limit ourselves to one disconnect message per
> socket.  Sometimes networks do strange things, might as well let
> sysadmins hit the panic button as much as they want.

The protocol spec is pretty clear that any requests sent after the
disconnect request was sent out are not guaranteed to be processed
anymore.

Doesn't this allow more requests to be sent out? Or is the
NBD_DISCONNECT_REQUESTED flag enough to make that impossible?

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
 Hacklab


Re: [PATCH] xen-blkfront: Fix handling of non-supported operations

2017-07-24 Thread Omar Sandoval
On Mon, Jul 24, 2017 at 03:10:09PM +, Bart Van Assche wrote:
> On Mon, 2017-07-24 at 08:46 -0600, Jens Axboe wrote:
> > On 07/21/2017 11:11 AM, Bart Van Assche wrote:
> > > This patch fixes the following sparse warnings:
> > > 
> > > drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 
> > > 2 (different base types)
> > > drivers/block/xen-blkfront.c:916:45:expected restricted blk_status_t 
> > > [usertype] error
> > > drivers/block/xen-blkfront.c:916:45:got int [signed] error
> > > drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in 
> > > assignment (different base types)
> > > drivers/block/xen-blkfront.c:1599:47:expected int [signed] error
> > > drivers/block/xen-blkfront.c:1599:47:got restricted blk_status_t 
> > > [usertype] 
> > > drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in 
> > > assignment (different base types)
> > > drivers/block/xen-blkfront.c:1607:55:expected int [signed] error
> > > drivers/block/xen-blkfront.c:1607:55:got restricted blk_status_t 
> > > [usertype] 
> > > drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in 
> > > assignment (different base types)
> > > drivers/block/xen-blkfront.c:1625:55:expected int [signed] error
> > > drivers/block/xen-blkfront.c:1625:55:got restricted blk_status_t 
> > > [usertype] 
> > > drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t 
> > > degrades to integer
> > > 
> > > Compile-tested only.
> > 
> > Applied, but I killed your stable tag. Why did you add that?
> 
> Hello Jens,
> 
> That tag was added based on the output of git describe:
> $ git describe 2a842acab109
> v4.12-rc2-199-g2a842acab109
> 
> However, the following command shows that the above output is misleading and 
> that the
> stable tag is indeed not needed:
> $ git log v4.12..origin/master | grep '^commit 2a842acab109'
> commit 2a842acab109f40f0d7d10b38e9ca88390628996
> 
> Bart.

You want git tag --contains:

$ git tag --contains 2a842acab109
v4.13-rc1
v4.13-rc2


Re: [Nbd] [PATCH 1/3] nbd: allow multiple disconnects to be sent

2017-07-24 Thread Josef Bacik
On Mon, Jul 24, 2017 at 10:08:21PM +0200, Wouter Verhelst wrote:
> On Fri, Jul 21, 2017 at 10:48:13AM -0400, jo...@toxicpanda.com wrote:
> > From: Josef Bacik 
> > 
> > There's no reason to limit ourselves to one disconnect message per
> > socket.  Sometimes networks do strange things, might as well let
> > sysadmins hit the panic button as much as they want.
> 
> The protocol spec is pretty clear that any requests sent after the
> disconnect request was sent out are not guaranteed to be processed
> anymore.
> 
> Doesn't this allow more requests to be sent out? Or is the
> NBD_DISCONNECT_REQUESTED flag enough to make that impossible?
> 

This just allows users to call the disconnect ioctl/netlink thing multiple times
and have it send the DISCONNECT command if they want.  We've had problems with
our in-hosue nbd server missing messages, and it's just a pain to have to
unstuck it because the server messed up.  It's just for the rare case the server
is being weird, not because we expect/guarantee that subsequent disconnect
commands will be processed.  Thanks,

Josef