On Mon, Jul 29, 2013 at 12:45:33PM -0400, Paul Clements wrote:
> Michal,
> 
> thanks for this...a couple of preliminary questions...
> 
> On Mon, Jun 17, 2013 at 4:10 PM, Michal Belczyk <[email protected]> wrote:
> 
> > The main idea behind it is to be able to quickly detect broken replica
> > and switch over to another when used with any sort of mirror type device
> > built on top of any number of nbd devices.
> >
> 
> I'm wondering if a simpler change would have the same result. Namely, could
> you just apply the xmit_timeout, as currently implemented, to the recvmsg
> path (ignoring the timeout, of course, when no requests are queued). Is
> there a case that this would not handle that the proposed patch handles?
> They seem to be equivalent.

Oh... no, they would not be equivalent.
It is all my fault -- the commit message lacks very important information:

Timeouts implemented via timers directly on top of the kernel_sendmsg() or
kernel_recvmsg() routines apply to a single bio_vec being sent/received at
a time, while the proposed patch makes the whole block queue request the
subject of the timeout -- timeout handling was moved from sock_xmit() way
up to nbd_send_req() and nbd_do_it()...
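
To illustrate, this is roughly what the upstream send path looks like
today -- a simplified sketch from memory, not a verbatim copy of
drivers/block/nbd.c:

/*
 * Simplified sketch (from memory) of the upstream send path: every
 * sock_xmit() call, i.e. every single bio_vec, arms its own timer,
 * so xmit_timeout applies per 4KB chunk, not per request.
 */
static int sock_xmit(struct nbd_device *nbd, int send, void *buf,
		     int size, int msg_flags)
{
	struct timer_list ti;

	if (send && nbd->xmit_timeout) {
		init_timer(&ti);
		ti.function = nbd_xmit_timeout;	/* force_sig(SIGKILL) */
		ti.data = (unsigned long)current;
		ti.expires = jiffies + nbd->xmit_timeout;
		add_timer(&ti);
	}

	/* ... the kernel_sendmsg()/kernel_recvmsg() loop goes here ... */

	if (send && nbd->xmit_timeout)
		del_timer_sync(&ti);

	return size;
}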

Currently, assuming max_sectors_kb=512 for this example, a single
direct read request from an nbd device:

# dd if=/dev/nbd0 of=/dev/null bs=512K iflag=direct count=1

generates:

[1301353.424080] nbd0: request ffff8802253d22f0: got reply
[1301353.424089] nbd0: request ffff8802253d22f0: got 4096 bytes data
[1301353.424092] nbd0: request ffff8802253d22f0: got 4096 bytes data
[ ... 128 4KB bio_vec-s / 128 sock_xmit() calls? ]

So, assuming a timeout set to 30s, a single 512KB read request could
take up to 128 times 30s (512KB / 4KB = 128 chunks, i.e. 3840s), which
makes... 64 minutes, if the timeouts were implemented the way you ask...
Well, they actually _are_ implemented this way in the upstream code (on
the send path only), so a single direct write request like this will
generate 128 timer arms/disarms AND a 30s timeout actually means that
such a request can take up to 64 minutes...
I think this is wrong, and I believe that if a maximum-sized request
(bounded by the now-configurable max_sectors_kb tunable) cannot be
fulfilled within the specified timeout, then the NBD device queue should
die, hopefully to be handled by a DM layer on top... such a request
could then be happily requeued to a well-behaving replica, possibly
another NBD device -- see the sketch below...
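
For comparison, the shape of the per-request approach is roughly this --
an illustrative sketch only, not the patch verbatim (the req_timer field
name is made up here for illustration):

/*
 * Illustrative sketch of the per-request approach, not the patch
 * verbatim -- nbd->req_timer is a made-up field name.  One timer
 * covers the whole block queue request.
 */
static void nbd_send_req(struct nbd_device *nbd, struct request *req)
{
	/* arm once, before the first sock_xmit() call for this request */
	mod_timer(&nbd->req_timer, jiffies + nbd->xmit_timeout);

	/* ... send the request header and all of its bio_vecs ... */
}

static void nbd_do_it(struct nbd_device *nbd)
{
	/* ... receive the reply header and all of the data ... */

	/* disarm only once the whole reply is in; if the timer fires
	 * first, the socket is shut down and the queue dies, leaving
	 * the requeueing to a DM layer on top */
	del_timer_sync(&nbd->req_timer);
}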


> Also move nbd->pid modifications behind nbd->tx_lock wherever possible
> > to avoid races between the concurrent nbd-client invocations.
> >
> 
> Is there a bug related to this that you've seen, or is this just a
> precaution?

A precaution, but: 1) I don't like that the current code checks nbd->pid
with nbd->tx_lock held yet later, in nbd_do_it(), modifies it with that
same lock released, and 2) the proposed patch uses nbd->pid to determine
whether the device is in use (nbd-client alive) -- should it do that
some other way?
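
For 1), what I have in mind is roughly this (illustrative sketch, not
the patch verbatim):

/*
 * Illustrative sketch: set nbd->pid only with nbd->tx_lock held, so
 * two concurrent nbd-client invocations cannot both get past the
 * "device already in use" check.
 */
mutex_lock(&nbd->tx_lock);
if (nbd->pid) {
	mutex_unlock(&nbd->tx_lock);
	return -EBUSY;
}
nbd->pid = task_pid_nr(current);
mutex_unlock(&nbd->tx_lock);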

Thank you for the feedback!

-- 
Michal Belczyk Sr.
