On Mon, Jul 29, 2013 at 12:45:33PM -0400, Paul Clements wrote:
> Michal,
> 
> thanks for this... a couple of preliminary questions...
> 
> On Mon, Jun 17, 2013 at 4:10 PM, Michal Belczyk <[email protected]> wrote:
> 
> > The main idea behind it is to be able to quickly detect broken replica
> > and switch over to another when used with any sort of mirror type device
> > built on top of any number of nbd devices.
> 
> I'm wondering if a simpler change would have the same result.  Namely, could
> you just apply the xmit_timeout, as currently implemented, to the recvmsg
> path (ignoring the timeout, of course, when no requests are queued).  Is
> there a case that this would not handle that the proposed patch handles?
> They seem to be equivalent.
Oh... no, they would not be equivalent.  It is all my fault -- the commit
message lacks a very important piece of information:

Timeouts implemented via timers armed directly around the kernel_sendmsg()
or kernel_recvmsg() calls apply to a single bio_vec being sent/received at
a time, while the proposed patch makes the whole block queue request the
unit subject to the timeout -- timeout handling was moved from sock_xmit()
up into nbd_send_req() and nbd_do_it()...

Currently, assuming for this example max_sectors_kb=512, a single direct
read request from an nbd device:

# dd if=/dev/nbd0 of=/dev/null bs=512K iflag=direct count=1

generates:

[1301353.424080] nbd0: request ffff8802253d22f0: got reply
[1301353.424089] nbd0: request ffff8802253d22f0: got 4096 bytes data
[1301353.424092] nbd0: request ffff8802253d22f0: got 4096 bytes data
[ ... 128 4KB bio_vec-s / 128 sock_xmit() calls? ]

So, assuming a timeout set to 30s, a single 512KB read request could take
up to 128 times 30s, which makes... 64 minutes, if the timeouts were
implemented the way you suggest...  Well, they actually _are_ implemented
this way in the upstream code, so a single direct write request like this
arms and disarms the timer 128 times AND a 30s timeout actually means that
such a request can take up to 64 minutes...

I think that is wrong, and I believe that if a maximum-sized request
(bounded by the now configurable max_sectors_kb tunable) cannot be
fulfilled within the specified timeout, then this NBD device queue should
die, hopefully being handled by a DM layer on top... and such a request
could then be happily requeued to a well-behaving replica, possibly
another NBD device...

> > Also move nbd->pid modifications behind nbd->tx_lock wherever possible
> > to avoid races between the concurrent nbd-client invocations.
> 
> Is there a bug related to this that you've seen, or is this just a
> precaution?
Precaution, but:

1) I don't like that the current code checks nbd->pid with nbd->tx_lock
   held and then modifies it later in nbd_do_it() with that lock released,
   and

2) the proposed patch uses nbd->pid to determine whether the device is in
   use (nbd-client alive) -- should it do that some other way?

Thank you for the feedback!

-- 
Michal Belczyk Sr.

_______________________________________________
Nbd-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nbd-general
