Re: ATA over ethernet swapping
Hi!

> > But I'm able to compile kernel (-j 10) on 128MB machine, and I tried
> > cat /dev/zero | grep foo to exhaust memory... and could not reproduce
> > the deadlock. Should I pingflood? Tweak down amount of atomic memory
> > available to make deadlocks easier to reproduce?
>
> I usually test swap over NFS in the following manner: I set up a regular
> inet service on the machine (apache or a bunch of ncat sockets piping to
> files or something) and run a heavy workload on the machine (128M):
> 2*64M file backed thrashers and 2*64M anonymous thrashers. Then I start
> clients for the regular inet service, wait for a bit, and shut down the
> NFS server.
>
> This makes the machine grind to a halt. I then restart the NFS server,
> wait for it to reconnect and the client to come alive again.
>
> Without the last few swap-over-NFS patches this last bit - getting back
> out of that situation - never happens.
>
> The basic idea is to make connectivity to the machine where swap traffic
> goes very hard (pull a cable, cleanly shut down the server) and to keep
> other network traffic pounding the machine.

Hmm, I could not get swap-over-ata-over-ethernet to break. Maybe I
should not have a local / filesystem, because it allows the kernel to
get rid of some memory pressure by dropping clean pages?

Plus I guess ata-over-ethernet has some significant advantages, as it
works over ethernet directly, not over IP.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: ATA over ethernet swapping
On Thu, 2007-08-09 at 12:11 +0200, Pavel Machek wrote:
> Hi!
>
> > I've been working on this for quite some time. And should post again
> > soon. Please see the patches:
> >
> > http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/
> >
> > For now it requires that one uses SLUB; I hope that SLAB will go away
> > (will save me the trouble of adding support) and I guess I ought to do
> > SLOB some time (if that does stay).
> >
> > You'd need the first 22 patches of that series, and then call
> > sk_set_memalloc(sk) on the proper socket, and do some fiddling with the
> > reconnect logic. See nfs-swapfile.patch for examples.
>
> What do you use for testing? I set up ata over ethernet... swapping
> over that should deadlock w/o your patches.
>
> But I'm able to compile kernel (-j 10) on 128MB machine, and I tried
> cat /dev/zero | grep foo to exhaust memory... and could not reproduce
> the deadlock. Should I pingflood? Tweak down amount of atomic memory
> available to make deadlocks easier to reproduce?

I usually test swap over NFS in the following manner: I set up a regular
inet service on the machine (apache or a bunch of ncat sockets piping to
files or something) and run a heavy workload on the machine (128M):
2*64M file backed thrashers and 2*64M anonymous thrashers. Then I start
clients for the regular inet service, wait for a bit, and shut down the
NFS server.

This makes the machine grind to a halt. I then restart the NFS server,
wait for it to reconnect and the client to come alive again.

Without the last few swap-over-NFS patches this last bit - getting back
out of that situation - never happens.

The basic idea is to make connectivity to the machine where swap traffic
goes very hard (pull a cable, cleanly shut down the server) and to keep
other network traffic pounding the machine.
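The "anonymous thrasher" part of the workload described above can be approximated with a few lines of C. This is only a sketch of the idea, not the tool actually used in the thread (which is not shown): `thrash_anon` is a hypothetical name, and the function simply dirties one byte per page of a malloc'ed buffer so that the pages become anonymous, swappable memory.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty one byte in every page of an anonymous
 * buffer of `bytes` bytes, `passes` times over.  Returns the number of
 * page touches performed, or 0 if the allocation failed.
 */
size_t thrash_anon(size_t bytes, unsigned passes)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t step = page > 0 ? (size_t)page : 4096;
	/* volatile keeps the compiler from optimising the stores away */
	volatile unsigned char *buf = malloc(bytes);
	size_t touches = 0;

	if (!buf)
		return 0;

	for (unsigned p = 0; p < passes; p++) {
		for (size_t off = 0; off < bytes; off += step) {
			buf[off] = (unsigned char)(p + off);	/* dirty the page */
			touches++;
		}
	}

	free((void *)buf);
	return touches;
}
```

To mimic the setup above one would presumably run two processes, each calling something like `thrash_anon((size_t)64 << 20, passes)` with a large pass count, next to the file-backed thrashers and the network clients.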
Re: ATA over ethernet swapping
Hi!

> I've been working on this for quite some time. And should post again
> soon. Please see the patches:
>
> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/
>
> For now it requires that one uses SLUB; I hope that SLAB will go away
> (will save me the trouble of adding support) and I guess I ought to do
> SLOB some time (if that does stay).
>
> You'd need the first 22 patches of that series, and then call
> sk_set_memalloc(sk) on the proper socket, and do some fiddling with the
> reconnect logic. See nfs-swapfile.patch for examples.

What do you use for testing? I set up ata over ethernet... swapping
over that should deadlock w/o your patches.

But I'm able to compile kernel (-j 10) on 128MB machine, and I tried
cat /dev/zero | grep foo to exhaust memory... and could not reproduce
the deadlock. Should I pingflood? Tweak down amount of atomic memory
available to make deadlocks easier to reproduce?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: ATA over ethernet swapping and obfuscated code
On 7/31/07, Ed L. Cashin <[EMAIL PROTECTED]> wrote:
> It is easy to chat, though. Maybe someday I will test and submit a
> patch that implements this mechanism, but I'm hoping that somebody
> beats me to it. :)

There is already such a mechanism (planned).

swap over networked storage -v11
http://lwn.net/Articles/223057/

"...
Also knowing it is headed towards the VM needs a little help, hence we
introduce the socket flag SOCK_VMIO to mark sockets with.
..."

But it seems it did not make it into -mm.

HTH

Torsten
Re: ATA over ethernet swapping
I've been working on this for quite some time. And should post again
soon. Please see the patches:

http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/

For now it requires that one uses SLUB; I hope that SLAB will go away
(will save me the trouble of adding support) and I guess I ought to do
SLOB some time (if that does stay).

You'd need the first 22 patches of that series, and then call
sk_set_memalloc(sk) on the proper socket, and do some fiddling with the
reconnect logic. See nfs-swapfile.patch for examples.
Re: ATA over ethernet swapping
Hi!

> ...
> > Is the protocol documented somewhere? aoe.txt only points at
> > HOWTO... aha, protocol is linked from wikipedia.
> > http://www.coraid.com/documents/AoEr10.txt ... perhaps that should be
> > linked from aoe.txt, too?
>
> Perhaps. Most people reading the aoe.txt file won't need to refer to
> the protocol itself, though.

Some of your users are developers, too :-). Should I generate a patch?

> > Hmm, aoe protocol is really trivial. Perhaps netpoll/netconsole
> > infrastructure could be used to create driver good enough for
> > swapping? (Ok, it would not necessarily perform too well, but... we'd
> > simply wait for the reply synchronously. It should be pretty simple).
>
> I think that in general you still need a way to receive write
> confirmations without allocating memory, and the driver can't provide
> that mechanism. The problem is that when memory is scarce, writes of
> dirty data must be able to complete, but because memory is scarce,
> there might not be enough to receive and process write-response
> packets, and the driver has no way of affecting the situation. That's
> why I think a callback could work: The network layer could allow
> storage drivers to register a callback that recognizes write
> responses.

Hmm, ok, it is not as simple as I thought. include/linux/netpoll.h
already includes a mechanism to notify interested parties really early,
but drivers still call dev_alloc_skb() before that.

> Usually the callback would not be used, but if free pages became so
> scarce that network receives could not take place in a normal fashion,
> the (zero or few) registered callbacks would be used to quickly
> determine whether each packet was a write response. The distinction
> is important, because write responses can result in the freeing of
> pages.

Hmm, adding GFP_GIVE_ME_EMERGENCY_POOLS to dev_alloc_skb(), then doing
...

int netif_rx(struct sk_buff *skb)
{
	struct softnet_data *queue;
	unsigned long flags;

	/* if netpoll wants it, pretend we never saw it */
	if (netpoll_rx(skb))
		return NET_RX_DROP;

	if (memory_is_still_very_low())
		return NET_RX_DROP;

...should do the trick. Would something like that be acceptable to
network people?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: ATA over ethernet swapping and obfuscated code
On Tue, Jul 31, 2007 at 05:29:24PM +0200, Pavel Machek wrote:
...
> Is the protocol documented somewhere? aoe.txt only points at
> HOWTO... aha, protocol is linked from wikipedia.
> http://www.coraid.com/documents/AoEr10.txt ... perhaps that should be
> linked from aoe.txt, too?

Perhaps. Most people reading the aoe.txt file won't need to refer to
the protocol itself, though.

> Hmm, aoe protocol is really trivial. Perhaps netpoll/netconsole
> infrastructure could be used to create driver good enough for
> swapping? (Ok, it would not necessarily perform too well, but... we'd
> simply wait for the reply synchronously. It should be pretty simple).

I think that in general you still need a way to receive write
confirmations without allocating memory, and the driver can't provide
that mechanism. The problem is that when memory is scarce, writes of
dirty data must be able to complete, but because memory is scarce,
there might not be enough to receive and process write-response
packets, and the driver has no way of affecting the situation. That's
why I think a callback could work: The network layer could allow
storage drivers to register a callback that recognizes write
responses.

Usually the callback would not be used, but if free pages became so
scarce that network receives could not take place in a normal fashion,
the (zero or few) registered callbacks would be used to quickly
determine whether each packet was a write response. The distinction
is important, because write responses can result in the freeing of
pages.

When a storage driver's callback identified a write response, then a
reserved skb could be used to process the receive without allocating
memory. During the memory crunch packets that were not write
responses would be dropped just as they are already, but dirty pages
would be flushed. The mechanism would only take effect when free
pages were scarce.

It is easy to chat, though. Maybe someday I will test and submit a
patch that implements this mechanism, but I'm hoping that somebody
beats me to it. :)

-- 
Ed L Cashin <[EMAIL PROTECTED]>
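The callback mechanism described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not kernel code and not an existing API: `struct rx_path`, `rx_frame`, `rx_reserve_release` and `aoe_is_response` are hypothetical names, the single static buffer stands in for the pre-allocated skb, and the classifier only checks the AoE EtherType (0x88A2) and the response flag in the ver/flags byte. A real classifier would presumably also have to match the tag against an outstanding write, since the response flag alone does not distinguish writes from reads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN	14
#define ETH_P_AOE	0x88A2	/* AoE EtherType */
#define AOEFL_RSP	0x08	/* response flag in the ver/flags byte */
#define RESERVE_SIZE	1520	/* assumed size of the reserved buffer */

/* Callback type a storage driver would register (hypothetical). */
typedef bool (*wr_classifier)(const uint8_t *frame, size_t len);

enum rx_result { RX_NORMAL, RX_RESERVED, RX_DROPPED };

struct rx_path {
	wr_classifier is_write_response;	/* may be NULL: nothing registered */
	uint8_t reserve[RESERVE_SIZE];		/* stands in for the reserved skb */
	bool reserve_busy;
};

/* Cheap check: is this an AoE frame with the response flag set? */
bool aoe_is_response(const uint8_t *frame, size_t len)
{
	uint16_t type;

	if (len < ETH_HLEN + 1)
		return false;
	type = (uint16_t)((frame[12] << 8) | frame[13]);
	return type == ETH_P_AOE && (frame[ETH_HLEN] & AOEFL_RSP) != 0;
}

/*
 * Receive decision.  In the common case (memory not low) nothing changes.
 * Under memory pressure only frames the callback recognises are kept,
 * using the reserved buffer instead of a fresh allocation.
 */
enum rx_result rx_frame(struct rx_path *rx, const uint8_t *frame,
			size_t len, bool memory_low)
{
	if (!memory_low)
		return RX_NORMAL;

	if (rx->is_write_response && !rx->reserve_busy &&
	    len <= RESERVE_SIZE && rx->is_write_response(frame, len)) {
		memcpy(rx->reserve, frame, len);
		rx->reserve_busy = true;
		return RX_RESERVED;
	}
	return RX_DROPPED;
}

/* Called once the response has been processed and pages were freed. */
void rx_reserve_release(struct rx_path *rx)
{
	rx->reserve_busy = false;
}
```

With a single reserved buffer only one response can be in flight during the crunch; whether that is enough to guarantee forward progress is exactly the kind of question the real patches would have to answer.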
Re: ATA over ethernet swapping and obfuscated code
Hi!

(and thanks for the response).

> > I wanted to know if it is possible/okay to swap over AOE...
> >
> > According to
> > http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.20
> > .. it runs OOM even during normal use, so I guess swapping over it is
> > no-no?
>
> It can be done (e.g., to create virtual memory for running xfs_check
> on a diskless machine as a temporary measure), but it probably won't
> be a good idea until there is a mechanism that allows write responses
> to be (quickly recognized and then) received without allocating memory
> when there are no free pages.
>
> I think if we could register a very fast function to recognize write
> responses, which would be called only when free memory was very low,
> and then use a pre-allocated receive skb for receiving write
> responses, then we'd be OK, and the common case wouldn't be
> affected.

Is the protocol documented somewhere? aoe.txt only points at
HOWTO... aha, protocol is linked from wikipedia.
http://www.coraid.com/documents/AoEr10.txt ... perhaps that should be
linked from aoe.txt, too?

Hmm, aoe protocol is really trivial. Perhaps netpoll/netconsole
infrastructure could be used to create driver good enough for
swapping? (Ok, it would not necessarily perform too well, but... we'd
simply wait for the reply synchronously. It should be pretty simple).

> > Can I build both client and server for these using free software?
>
> Yes. A popular free target is the vblade (aoetools.sourceforge.net),
> and there are others. The most popular free software initiator is the
> aoe driver in Linux.

Good, thanks.

> > In the process, I looked at the aoe code, and parts of it look like
> > obfuscated C contest. The use of switch() as an if was particularly
> > creative; I'm not even sure if I translated it properly... can you
> > take a look?
>
> I recently submitted a set of patches, and Andrew Morton asked me to
> avoid the switch statement you are talking about, so thanks for the
> patch, but that code is going to be patched soon anyway.

Sorry for the noise. I was blind, the spin_unlock issue is not there
and I should have looked at -mm.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: ATA over ethernet swapping and obfuscated code
On Tue, Jul 31, 2007 at 03:58:31PM +0200, Pavel Machek wrote:
> Hi!
>
> I wanted to know if it is possible/okay to swap over AOE...
>
> According to
> http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.20
> .. it runs OOM even during normal use, so I guess swapping over it is
> no-no?

It can be done (e.g., to create virtual memory for running xfs_check
on a diskless machine as a temporary measure), but it probably won't
be a good idea until there is a mechanism that allows write responses
to be (quickly recognized and then) received without allocating memory
when there are no free pages.

I think if we could register a very fast function to recognize write
responses, which would be called only when free memory was very low,
and then use a pre-allocated receive skb for receiving write
responses, then we'd be OK, and the common case wouldn't be
affected.

> Can I build both client and server for these using free software?

Yes. A popular free target is the vblade (aoetools.sourceforge.net),
and there are others. The most popular free software initiator is the
aoe driver in Linux.

> In the process, I looked at the aoe code, and parts of it look like
> obfuscated C contest. The use of switch() as an if was particularly
> creative; I'm not even sure if I translated it properly... can you
> take a look?

I recently submitted a set of patches, and Andrew Morton asked me to
avoid the switch statement you are talking about, so thanks for the
patch, but that code is going to be patched soon anyway.

More below.

> (Patch is
>
> Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>
>
> but I did not even compile test it)
>
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index 05a9719..38ba35d 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -64,29 +64,26 @@ aoedev_newdev(ulong nframes)
>  
>  	d = kzalloc(sizeof *d, GFP_ATOMIC);
>  	f = kcalloc(nframes, sizeof *f, GFP_ATOMIC);
> -	switch (!d || !f) {
> -	case 0:
> -		d->nframes = nframes;
> -		d->frames = f;
> -		e = f + nframes;
> -		for (; f<e; f++) {
> -			f->tag = FREETAG;
> -			f->skb = new_skb(ETH_ZLEN);
> -			if (!f->skb)
> -				break;
> -		}
> -		if (f == e)
> -			break;
> +	if (!d || !f) {
> +		kfree(f);
> +		kfree(d);
> +		return NULL;
> +	}
> +
> +	d->nframes = nframes;
> +	d->frames = f;
> +	e = f + nframes;
> +	for (; f<e; f++) {
> +		f->tag = FREETAG;
> +		f->skb = new_skb(ETH_ZLEN);
> +		if (!f->skb)
> +			break;
> +	}
> +	if (f != e) {
>  		while (f > d->frames) {
>  			f--;
>  			dev_kfree_skb(f->skb);
>  		}
> -	default:
> -		if (f)
> -			kfree(f);
> -		if (d)
> -			kfree(d);
> -		return NULL;
>  	}
>  	INIT_WORK(&d->work, aoecmd_sleepwork);
>  	spin_lock_init(&d->lock);
>
>
> aoedev_by_sysminor_m() returns with spinlock held in error case; I
> guess that's bad.
>
> struct aoedev *
> aoedev_by_sysminor_m(ulong sysminor, ulong bufcnt)
> {
> 	struct aoedev *d;
> 	ulong flags;
>
> 	spin_lock_irqsave(&devlist_lock, flags);
>
> 	for (d=devlist; d; d=d->next)
> 		if (d->sysminor == sysminor)
> 			break;
>
> 	if (d == NULL) {
> 		d = aoedev_newdev(bufcnt);
> 		if (d == NULL) {
> 			spin_unlock_irqrestore(&devlist_lock, flags);
> 			printk(KERN_INFO "aoe: aoedev_newdev failure.\n");
> 			return NULL;
> 			~ here

I don't see what you mean. There's an unlock two lines before the
return.

> 		}
> 		d->sysminor = sysminor;
> 		d->aoemajor = AOEMAJOR(sysminor);
> 		d->aoeminor = AOEMINOR(sysminor);
> 	}
>
> 	spin_unlock_irqrestore(&devlist_lock, flags);
> 	return d;
> }
>

-- 
Ed L Cashin <[EMAIL PROTECTED]>
Re: ATA over ethernet swapping and obfuscated code
Hi Pavel,

On Tue, 31 Jul 2007 15:58:31 +0200 Pavel Machek <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I wanted to know if it is possible/okay to swap over AOE...
>
> According to
> http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.20
> .. it runs OOM even during normal use, so I guess swapping over it is
> no-no?
>
> Can I build both client and server for these using free software?
>
> In the process, I looked at the aoe code, and parts of it look like
> obfuscated C contest. The use of switch() as an if was particularly
> creative; I'm not even sure if I translated it properly... can you
> take a look?
>
> (Patch is
>
> Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>
>
> but I did not even compile test it)
>
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index 05a9719..38ba35d 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -64,29 +64,26 @@ aoedev_newdev(ulong nframes)
>  
>  	d = kzalloc(sizeof *d, GFP_ATOMIC);
>  	f = kcalloc(nframes, sizeof *f, GFP_ATOMIC);
> -	switch (!d || !f) {
> -	case 0:
> -		d->nframes = nframes;
> -		d->frames = f;
> -		e = f + nframes;
> -		for (; f<e; f++) {
> -			f->tag = FREETAG;
> -			f->skb = new_skb(ETH_ZLEN);
> -			if (!f->skb)
> -				break;
> -		}
> -		if (f == e)
> -			break;
> +	if (!d || !f) {
> +		kfree(f);
> +		kfree(d);
> +		return NULL;
> +	}
> +
> +	d->nframes = nframes;
> +	d->frames = f;
> +	e = f + nframes;
> +	for (; f<e; f++) {
> +		f->tag = FREETAG;
> +		f->skb = new_skb(ETH_ZLEN);
> +		if (!f->skb)
> +			break;
> +	}
> +	if (f != e) {
>  		while (f > d->frames) {
>  			f--;
>  			dev_kfree_skb(f->skb);
>  		}
> -	default:
> -		if (f)
> -			kfree(f);
> -		if (d)
> -			kfree(d);
> -		return NULL;
>  	}
>  	INIT_WORK(&d->work, aoecmd_sleepwork);
>  	spin_lock_init(&d->lock);

Creative it is.

>
>
> aoedev_by_sysminor_m() returns with spinlock held in error case; I
> guess that's bad.
>
> struct aoedev *
> aoedev_by_sysminor_m(ulong sysminor, ulong bufcnt)
> {
> 	struct aoedev *d;
> 	ulong flags;
>
> 	spin_lock_irqsave(&devlist_lock, flags);
>
> 	for (d=devlist; d; d=d->next)
> 		if (d->sysminor == sysminor)
> 			break;
>
> 	if (d == NULL) {
> 		d = aoedev_newdev(bufcnt);
> 		if (d == NULL) {
> 			spin_unlock_irqrestore(&devlist_lock, flags);
			~ what about here
> 			printk(KERN_INFO "aoe: aoedev_newdev failure.\n");
> 			return NULL;
> 			~ here
> 		}
> 		d->sysminor = sysminor;
> 		d->aoemajor = AOEMAJOR(sysminor);
> 		d->aoeminor = AOEMINOR(sysminor);
> 	}
>
> 	spin_unlock_irqrestore(&devlist_lock, flags);
> 	return d;
> }
>

Sébastien.
ATA over ethernet swapping and obfuscated code
Hi!

I wanted to know if it is possible/okay to swap over AOE...

According to
http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.20
.. it runs OOM even during normal use, so I guess swapping over it is
no-no?

Can I build both client and server for these using free software?

In the process, I looked at the aoe code, and parts of it look like
obfuscated C contest. The use of switch() as an if was particularly
creative; I'm not even sure if I translated it properly... can you
take a look?

(Patch is

Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>

but I did not even compile test it)

diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index 05a9719..38ba35d 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -64,29 +64,26 @@ aoedev_newdev(ulong nframes)
 
 	d = kzalloc(sizeof *d, GFP_ATOMIC);
 	f = kcalloc(nframes, sizeof *f, GFP_ATOMIC);
-	switch (!d || !f) {
-	case 0:
-		d->nframes = nframes;
-		d->frames = f;
-		e = f + nframes;
-		for (; f<e; f++) {
-			f->tag = FREETAG;
-			f->skb = new_skb(ETH_ZLEN);
-			if (!f->skb)
-				break;
-		}
-		if (f == e)
-			break;
+	if (!d || !f) {
+		kfree(f);
+		kfree(d);
+		return NULL;
+	}
+
+	d->nframes = nframes;
+	d->frames = f;
+	e = f + nframes;
+	for (; f<e; f++) {
+		f->tag = FREETAG;
+		f->skb = new_skb(ETH_ZLEN);
+		if (!f->skb)
+			break;
+	}
+	if (f != e) {
 		while (f > d->frames) {
 			f--;
 			dev_kfree_skb(f->skb);
 		}
-	default:
-		if (f)
-			kfree(f);
-		if (d)
-			kfree(d);
-		return NULL;
 	}
 	INIT_WORK(&d->work, aoecmd_sleepwork);
 	spin_lock_init(&d->lock);


aoedev_by_sysminor_m() returns with spinlock held in error case; I
guess that's bad.

struct aoedev *
aoedev_by_sysminor_m(ulong sysminor, ulong bufcnt)
{
	struct aoedev *d;
	ulong flags;

	spin_lock_irqsave(&devlist_lock, flags);

	for (d=devlist; d; d=d->next)
		if (d->sysminor == sysminor)
			break;

	if (d == NULL) {
		d = aoedev_newdev(bufcnt);
		if (d == NULL) {
			spin_unlock_irqrestore(&devlist_lock, flags);
			printk(KERN_INFO "aoe: aoedev_newdev failure.\n");
			return NULL;
			~ here
		}
		d->sysminor = sysminor;
		d->aoemajor = AOEMAJOR(sysminor);
		d->aoeminor = AOEMINOR(sysminor);
	}

	spin_unlock_irqrestore(&devlist_lock, flags);
	return d;
}

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
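One thing worth checking in the switch-to-if translation above: in the old code the skb unwinding loop fell through into `default:`, which freed `f` and `d` and returned NULL. With the `default:` arm removed, the `if (f != e)` branch as posted seems to free the skbs and then continue to INIT_WORK. A userspace sketch of an if-based version that keeps the failure path might look like the following. It is only a model: malloc/calloc/free stand in for the kernel allocators, and `newdev_model`, `new_skb_model` and `skb_budget` are hypothetical names introduced here for illustration and fault injection.

```c
#include <stddef.h>
#include <stdlib.h>

#define FREETAG	(-1)

struct frame {
	int tag;
	void *skb;
};

struct dev {
	size_t nframes;
	struct frame *frames;
};

/* Fault injection for the model: <0 means unlimited allocations. */
int skb_budget = -1;

static void *new_skb_model(void)
{
	if (skb_budget == 0)
		return NULL;		/* simulate allocation failure */
	if (skb_budget > 0)
		skb_budget--;
	return malloc(60);		/* roughly an ETH_ZLEN-sized buffer */
}

struct dev *newdev_model(size_t nframes)
{
	struct dev *d = calloc(1, sizeof *d);
	struct frame *f = calloc(nframes, sizeof *f);
	struct frame *e;

	if (!d || !f)
		goto fail;

	d->nframes = nframes;
	d->frames = f;
	e = f + nframes;
	for (; f < e; f++) {
		f->tag = FREETAG;
		f->skb = new_skb_model();
		if (!f->skb)
			break;
	}
	if (f != e) {
		/* unwind the buffers allocated so far ... */
		while (f > d->frames) {
			f--;
			free(f->skb);
		}
		/* ... and, as the old default: arm did, give up */
		goto fail;		/* f == d->frames again here */
	}
	return d;

fail:
	free(f);			/* free(NULL) is a no-op */
	free(d);
	return NULL;
}
```

Setting `skb_budget = 2` and calling `newdev_model(4)` exercises the unwind path: two buffers are allocated, the third allocation fails, and the function returns NULL after releasing everything.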