Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()
On Sun, 29 Jul 2007, Domen Puncer wrote:
> On 29/07/07 00:02 +0200, Jesper Juhl wrote:
> > Hi,
> >
> > Here's a small patch, prompted by a find by the Coverity checker,
> > that removes a potential NULL pointer dereference from
> > drivers/net/sb1000.c::sb1000_dev_ioctl().
> > The checker spotted that we do a NULL test of 'dev', yet we
> > dereference the pointer prior to that check.
> > This patch simply moves the dereference after the NULL test.
>
> But... it can't be called without a valid 'dev', no?
> A quick 'grep do_ioctl net/' confirms that all calls are in
> the form of 'dev->do_ioctl(dev, ...'.

Yup, I think so too ...

> > @@ -991,11 +991,13 @@ static int sb1000_dev_ioctl(struct net_device *dev,
> > struct ifreq *ifr, int cmd)
> > 	short PID[4];
> > 	int ioaddr[2], status, frequency;
> > 	unsigned int stats[5];
> > -	struct sb1000_private *lp = netdev_priv(dev);
> > +	struct sb1000_private *lp;
> >
> > 	if (!(dev && dev->flags & IFF_UP))
> > 		return -ENODEV;

I think we could get rid of the !dev check itself. Actually, the IFF_UP
check /also/ looks suspect to me for two reasons: (1) I remember Stephen
Hemminger once telling me dev->flags is legacy and unsafe, and that one of
the netif_xxx() functions should be used instead, and (2) I wonder if we
really require the interface to be up and *running* when we do this ioctl.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Intel Turbo Memory
Hi all,

My new laptop came with some of the above. Has anyone tried looking at
this to see what is involved in using it?

http://www.intel.com/support/chipsets/itm/

--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
Re: [PATCH] fix return value of i8042_aux_test_irq
On Thursday 26 July 2007 11:57, [EMAIL PROTECTED] wrote:
> On Fri, July 27, 2007 12:29 am, Alan Cox wrote:
> >> > A small number of boxes do share IRQ12 and it was switched to shared
> >> > for them.
> >> If that is the case interrupt handlers should be able to determine
> >> whether a certain interrupt comes from their respective devices, and
> >> return IRQ_HANDLED or IRQ_NONE accordingly. Returning IRQ_HANDLED
> >> unconditionally when IRQF_SHARED is set seems strange. Is this
> >> behavior intended?
> >
> > Sometimes you simply can't tell and in those cases you have no choice.
> As I mentioned in a previous email, i8042_interrupt considers that it
> should not handle an interrupt when there is no data to read and,
> accordingly, it returns IRQ_NONE in such cases. I was just wondering if we
> could follow the same approach to make i8042_aux_test_irq more
> IRQF_SHARED-friendly.
>

Yes, you are right. Patch applied to 'for-linus' branch of input tree.
Thank you.

--
Dmitry
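The approach discussed in this thread (claim the interrupt only when the device actually has data) can be sketched in plain userland C. This is only an illustration: `fake_irqreturn`, `fake_port` and `fake_test_irq` are hypothetical stand-ins, not the i8042 driver's types or functions, and real hardware would be queried through a status register rather than a flag.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED values. */
enum fake_irqreturn { FAKE_IRQ_NONE = 0, FAKE_IRQ_HANDLED = 1 };

/* Hypothetical device state: whether the controller has a byte waiting. */
struct fake_port {
	bool data_ready;
	unsigned int bytes_seen;
};

/*
 * Handler for a line that may be shared with other devices.
 * If our device has nothing to read, the interrupt was not ours,
 * so report that and let the other handlers on the line run.
 */
enum fake_irqreturn fake_test_irq(struct fake_port *port)
{
	if (!port->data_ready)
		return FAKE_IRQ_NONE;

	port->data_ready = false;	/* consume the pending byte */
	port->bytes_seen++;
	return FAKE_IRQ_HANDLED;
}
```

As Alan Cox notes in the quoted text, some hardware gives no way to tell whose interrupt it was, in which case this check is not available.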
Re: [ck] Re: Linus 2.6.23-rc1
> It's like CONFIG_HZ - more or less often debated, and now we have everyone
> happy by giving them the choice.

That's an interesting analogy -- since really the right answer there seems
not to be modal at all, but rather to do CONFIG_NO_HZ.

- R.
Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()
On 29/07/07 00:02 +0200, Jesper Juhl wrote:
> Hi,
>
> Here's a small patch, prompted by a find by the Coverity checker,
> that removes a potential NULL pointer dereference from
> drivers/net/sb1000.c::sb1000_dev_ioctl().
> The checker spotted that we do a NULL test of 'dev', yet we
> dereference the pointer prior to that check.
> This patch simply moves the dereference after the NULL test.

But... it can't be called without a valid 'dev', no?
A quick 'grep do_ioctl net/' confirms that all calls are in
the form of 'dev->do_ioctl(dev, ...'.


	Domen

> @@ -991,11 +991,13 @@ static int sb1000_dev_ioctl(struct net_device *dev,
> struct ifreq *ifr, int cmd)
> 	short PID[4];
> 	int ioaddr[2], status, frequency;
> 	unsigned int stats[5];
> -	struct sb1000_private *lp = netdev_priv(dev);
> +	struct sb1000_private *lp;
>
> 	if (!(dev && dev->flags & IFF_UP))
> 		return -ENODEV;
>
> +	lp = netdev_priv(dev);
> +
> 	ioaddr[0] = dev->base_addr;
> 	/* mem_start holds the second I/O address */
> 	ioaddr[1] = dev->mem_start;
>
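The ordering issue the checker reported can be sketched outside the kernel. The names below (`fake_dev`, `fake_priv`, `fake_dev_priv`, `FAKE_IFF_UP`) are hypothetical stand-ins and not the kernel's `struct net_device` or `netdev_priv()`; the sketch only shows the NULL test placed before the first dereference, as in the patch. Whether the NULL test is needed at all is exactly what the thread questions.

```c
#include <errno.h>
#include <stddef.h>

#define FAKE_IFF_UP 0x1	/* hypothetical flag, stands in for IFF_UP */

struct fake_priv {
	int rx_frames;
};

struct fake_dev {
	unsigned int flags;
	struct fake_priv *priv;
};

/* Hypothetical accessor that really dereferences its argument. */
static struct fake_priv *fake_dev_priv(const struct fake_dev *dev)
{
	return dev->priv;
}

/* Test the pointer first, dereference it only afterwards. */
int fake_ioctl(struct fake_dev *dev)
{
	struct fake_priv *lp;

	if (!(dev && (dev->flags & FAKE_IFF_UP)))
		return -ENODEV;

	lp = fake_dev_priv(dev);	/* safe: dev is known to be non-NULL */
	return lp ? lp->rx_frames : 0;
}
```

Had `lp` been initialised at its declaration, the accessor would have run before the test, which is the pattern the patch removes.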
[PATCH] Add /sys/module/name/notes
This patch adds the /sys/module/<name>/notes/ magic directory, which has a
file for each allocated SHT_NOTE section that appears in <name>.ko. This
is the counterpart for each module of /sys/kernel/notes for vmlinux.
Reading this delivers the contents of the module's SHT_NOTE sections. This
lets userland easily glean any detailed information about that module's
build that was stored there at compile time (e.g. by ld --build-id).

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 include/linux/module.h |    3 +
 kernel/module.c        |  106
 2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index b6a646c..65d0752 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -346,6 +346,9 @@ struct module
 
 	/* Section attributes */
 	struct module_sect_attrs *sect_attrs;
+
+	/* Notes attributes */
+	struct module_notes_attrs *notes_attrs;
 #endif
 
 	/* Per-cpu data. */
diff --git a/kernel/module.c b/kernel/module.c
index 33c04ad..d7bbe1a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1054,6 +1055,100 @@ static void remove_sect_attrs(struct module *mod)
 	}
 }
 
+/*
+ * /sys/module/foo/notes/.section.name gives contents of SHT_NOTE sections.
+ */
+
+struct module_notes_attrs {
+	struct kobject *dir;
+	unsigned int notes;
+	struct bin_attribute attrs[0];
+};
+
+static ssize_t module_notes_read(struct kobject *kobj,
+				 struct bin_attribute *bin_attr,
+				 char *buf, loff_t pos, size_t count)
+{
+	/*
+	 * The caller checked the pos and count against our size.
+	 */
+	memcpy(buf, bin_attr->private + pos, count);
+	return count;
+}
+
+static void free_notes_attrs(struct module_notes_attrs *notes_attrs,
+			     unsigned int i)
+{
+	if (notes_attrs->dir) {
+		while (i-- > 0)
+			sysfs_remove_bin_file(notes_attrs->dir,
+					      &notes_attrs->attrs[i]);
+		kobject_del(notes_attrs->dir);
+	}
+	kfree(notes_attrs);
+}
+
+static void add_notes_attrs(struct module *mod, unsigned int nsect,
+			    char *secstrings, Elf_Shdr *sechdrs)
+{
+	unsigned int notes, loaded, i;
+	struct module_notes_attrs *notes_attrs;
+	struct bin_attribute *nattr;
+
+	/* Count notes sections and allocate structures. */
+	notes = 0;
+	for (i = 0; i < nsect; i++)
+		if ((sechdrs[i].sh_flags & SHF_ALLOC) &&
+		    (sechdrs[i].sh_type == SHT_NOTE))
+			++notes;
+
+	if (notes == 0)
+		return;
+
+	notes_attrs = kzalloc(sizeof(*notes_attrs)
+			      + notes * sizeof(notes_attrs->attrs[0]),
+			      GFP_KERNEL);
+	if (notes_attrs == NULL)
+		return;
+
+	notes_attrs->notes = notes;
+	nattr = &notes_attrs->attrs[0];
+	for (loaded = i = 0; i < nsect; ++i) {
+		if (!(sechdrs[i].sh_flags & SHF_ALLOC))
+			continue;
+		if (sechdrs[i].sh_type == SHT_NOTE) {
+			nattr->attr.name = mod->sect_attrs->attrs[loaded].name;
+			nattr->attr.mode = S_IRUGO;
+			nattr->size = sechdrs[i].sh_size;
+			nattr->private = (void *) sechdrs[i].sh_addr;
+			nattr->read = module_notes_read;
+			++nattr;
+		}
+		++loaded;
+	}
+
+	notes_attrs->dir = kobject_add_dir(&mod->mkobj.kobj, "notes");
+	if (!notes_attrs->dir)
+		goto out;
+
+	for (i = 0; i < notes; ++i)
+		if (sysfs_create_bin_file(notes_attrs->dir,
+					  &notes_attrs->attrs[i]))
+			goto out;
+
+	mod->notes_attrs = notes_attrs;
+	return;
+
+  out:
+	free_notes_attrs(notes_attrs, i);
+}
+
+static void remove_notes_attrs(struct module *mod)
+{
+	if (mod->notes_attrs)
+		free_notes_attrs(mod->notes_attrs, mod->notes_attrs->notes);
+}
+
 #else
 
 static inline void add_sect_attrs(struct module *mod, unsigned int nsect,
@@ -1064,6 +1159,15 @@ static inline void add_sect_attrs(struct module *mod, unsigned int nsect,
 static inline void remove_sect_attrs(struct module *mod)
 {
 }
+
+static inline void add_notes_attrs(struct module *mod, unsigned int nsect,
+				   char *sectstrings, Elf_Shdr *sechdrs)
+{
+}
+
+static inline void remove_notes_attrs(struct module *mod)
+{
+}
 #endif /* CONFIG_KALLSYMS */
 
 #ifdef CONFIG_SYSFS
@@ -1198,6 +1302,7 @@ static void free_module(struct module *mod)
 {
 	/* Delete from
Re: [PATCH] arch/i386/kernel/apm.c: apm_init() warning fix
On Sun, 29 Jul 2007 10:49:18 +0800 Eugene Teo <[EMAIL PROTECTED]> wrote:
>
> arch/i386/kernel/apm.c: In function 'apm_init':
> arch/i386/kernel/apm.c:2240: warning: format '%lx' expects type 'long
> unsigned int', but argument 3 has type 'u32'
>
> apm_info.bios.offset is of type 'u32'.
>
> Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>

Acked-by: Stephen Rothwell <[EMAIL PROTECTED]>

--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
Re: [PATCH] Merge the Sonics Silicon Backplane subsystem
On Friday 27 July 2007 16:12, Andrew Morton wrote:
> On Fri, 27 Jul 2007 21:43:59 +0200
> Michael Buesch <[EMAIL PROTECTED]> wrote:
>
> > > Sure, but why is the locking interruptible rather than plain old
> > > mutex_lock()?
> >
> > Hm, well. We hold this mutex for several seconds, as writing takes
> > this long. So I simply thought it was worth allowing the waiter
> > to interrupt here. If you say that's not an issue, I'll be happy
> > to use mutex_lock() and reduce code complexity in this area.
>
> So.. is that what the _interruptible() is for? To allow an impatient user
> to ^c a read?
>
> If so, that sounds reasonable. It's worth a comment explaining these
> decisions to future readers, because it is hard to work out this sort of
> thinking just from the bare C code.

I think most sysfs ->show() and ->store() implementations use the
_interruptible() variant to allow the user to interrupt and return early.

--
Dmitry
Re: [PATCH] TSDEV - Don't flood dmesg with removal warnings
Hi Parag,

On Friday 27 July 2007 10:43, Parag Warudkar wrote:
> Ignore my previous whitespace damaged patch. This one should be good.
>
> tsdev.c warns about scheduled removal each time tsdev_open is called -
> So even for a default boot I get to see the warning 3 times -
>
> [ 340.537078] tsdev (compaq touchscreen emulation) is scheduled for removal.
> [ 340.537081] See Documentation/feature-removal-schedule.txt for details.
> [ 340.550314] tsdev (compaq touchscreen emulation) is scheduled for removal.
> [ 340.550318] See Documentation/feature-removal-schedule.txt for details.
> [ 340.565065] tsdev (compaq touchscreen emulation) is scheduled for removal.
> [ 340.565068] See Documentation/feature-removal-schedule.txt for details.
>
> Move the warning to tsdev_init() from tsdev_open so we don't end up
> printing a large string in dmesg everytime tsdev_open is called.
>

The printk was moved per Andrew's request to make it more annoying.
Obviously it is working ;) Do you know what is opening /dev/input/tsX
nodes?

--
Dmitry
Re: How can we make page replacement smarter (was: swap-prefetch)
Al Boldi wrote:
> Chris Snook wrote:
>> At best, reads can be read-ahead and cached, which is why sequential
>> swap-in sucks less. On-demand reads are as expensive as I/O can get.
>
> Which means that it should be at least as fast as swap-out, even faster
> because write to disk is usually slower than read on modern disks. But
> linux currently shows a distinct 2x slowdown for sequential swap-in wrt
> swap-out.

That's because writes are faster than reads in moderate quantities.

The disk caches writes, allowing the OS to write a whole bunch of data
into the disk cache and the disk can optimize the IO a bit internally.

The same optimization is not possible for reads.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: [linux-usb-devel] Edgeport UPS Monitoring Problems
Andrew Morton wrote:
On Fri, 27 Jul 2007 13:37:08 -0700 Nick Pasich <[EMAIL PROTECTED]> wrote:

Greg/Peter/Al,

added linux-usb-devel.

I've been using the edgeport 4 port USB to Serial Converter to monitor
APC Smart UPS's via apcupsd for quite awhile on various Linux boxes.

I just upgraded to Kernel Version 2.6.22.1 from 2.6.20.6 on a couple of
systems and both the edgeports stopped communicating.

I tried applying various patches, "PATCH 026/149" and "PATCH 082/149"
and one by Alan Cox.. but they didn't fix the problem.

I copied the 2.6.20.6 edgeport module sources to the new 2.6.22.1 tree
and everything works again.

	linux/drivers/usb/serial/io_edgeport.c
	linux/drivers/usb/serial/io_edgeport.h
	linux/drivers/usb/serial/io_edgeport.mod.c
	linux/drivers/usb/serial/io_tables.h

Straightforward regression, most serious. Thanks for reporting it.

I don't know much of anything about usb-serial, but I'll take a whack at
it. Could you enable debug for that driver, launch apcupsd, and report any
interesting messages that show up in dmesg?

I'd be especially interested in any "Not setting..." or "Not writing..."
messages, because some critical-looking code for baud rate setting and
similar became conditional in 2.6.22.1 whereas it was always executed
before. Apcupsd is going to be rather unhappy if the baud rate doesn't
change when it asks. The debug should show if these operations are being
ignored on your hw.

--Adam
Re: ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
Danny ter Haar wrote:

[ added linux-acpi and Len to CC ]

> Quoting Gabriel C ([EMAIL PROTECTED]):
>> Maybe try to :
>> disable BSG ( maybe some leftover bug )
>> boot acpi=off ( that got merged kind late )
>
> My first git bisected kernel wouldn't boot, but with
> acpi=off it would indeed boot!

Now that we think it is ACPI, this should be easy for you to bisect.

This commit
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=39804b20f62532fa05c2a8c3e2d1ae551fd0327b
merged ACPI so this one should be your first bad one.

Maybe Len has some idea and you don't need to bisect :)

>
> As did the 2.6.23-rc1-git5 kernel...
>
> I will bisect further to find out exactly what patch is
> playing up in my particular setup.
>
> thanks for the tip! ;-)

You are welcome :)

>
> Danny
>

Gabriel
[PATCH] bsg: Fix warning with CONFIG_BLK_DEV_BSG=n
The current stub definitions of bsg_register_queue() and
bsg_unregister_queue() as macros lead to

    drivers/scsi/scsi_sysfs.c: In function 'scsi_sysfs_add_sdev':
    drivers/scsi/scsi_sysfs.c:718: warning: unused variable 'rq'

because the first parameter of bsg_register_queue() is completely
discarded. As akpm says, "program in C, not in cpp." We might as well
get a little bit better type-checking when we fix this by converting
the stubs to empty inline functions.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>
---
 include/linux/bsg.h |    9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/bsg.h b/include/linux/bsg.h
index f415f89..69e23e1 100644
--- a/include/linux/bsg.h
+++ b/include/linux/bsg.h
@@ -60,8 +60,13 @@ struct bsg_class_device {
 extern int bsg_register_queue(struct request_queue *, struct device *, const char *);
 extern void bsg_unregister_queue(struct request_queue *);
 #else
-#define bsg_register_queue(disk, dev, name)	(0)
-#define bsg_unregister_queue(disk)	do { } while (0)
+static inline int bsg_register_queue(struct request_queue *q, struct device *gdev,
+				      const char *name)
+{
+	return 0;
+}
+
+static inline void bsg_unregister_queue(struct request_queue *q) { }
 #endif
 
 #endif /* __KERNEL__ */
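A small userland sketch of the difference between the two stub styles may help. Everything here (`fake_queue`, `fake_register_macro`, `fake_register_inline`, `fake_add_dev`) is a hypothetical name chosen for illustration, not part of the bsg API.

```c
/* Hypothetical stand-in for struct request_queue. */
struct fake_queue {
	int id;
};

/*
 * Macro stub, shown for comparison only: the preprocessor throws the
 * arguments away, so a variable passed here never counts as "used"
 * and the argument types are never checked.
 */
#define fake_register_macro(q, name)	(0)

/* Inline stub: the arguments are evaluated and type-checked, then ignored. */
static inline int fake_register_inline(struct fake_queue *q, const char *name)
{
	(void)q;
	(void)name;
	return 0;
}

/* Caller whose only use of rq is the call to the stub. */
int fake_add_dev(struct fake_queue *queue)
{
	struct fake_queue *rq = queue;

	return fake_register_inline(rq, "dev0");
}
```

If `fake_add_dev()` called `fake_register_macro(rq, "dev0")` instead, `rq` would disappear during preprocessing, and a compiler with unused-variable warnings enabled would be expected to complain much like the message quoted in the patch description.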
Re: [RFC/RFT 1/5] Input: implement proper locking in input core
Hi Indan,

On Friday 27 July 2007 19:28, Indan Zupancic wrote:
> Hi,
>
> Not real feedback, just some nitpicks.
>
> On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote:
> > +static int input_defuzz_abs_event(int value, int old_val, int fuzz)
> > +{
> > +	if (fuzz) {
> > +		if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
> > +			return value;
> >
> > -	add_input_randomness(type, code, value);
> > +		if (value > old_val - fuzz && value < old_val + fuzz)
> > +			return (old_val * 3 + value) / 4;
> >
> > -	switch (type) {
> > +		if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
> > +			return (old_val + value) / 2;
> > +	}
>
> Shouldn't the return values of the second and third case be reversed?
> In the 2nd check the new value is weighted for 1/4, while in the 3rd
> case it counts for 1/2, which breaks the "account new value more when
> it is closer to the old one" logic that I thought I saw here. So to sum up,
> should the second return be "return (old_val + value * 3) / 4"?

Thank you for bringing this up. Actually the 1st return value should be
"old_val", not value. The logic is to "gravitate towards old" when
difference is small.

> >
> > +/*
> > + * Generate software autorepeat event. Note that we take
> > + * dev->event_lock here to avoid racing with input_event
> > + * which may cause keys get "stuck".
> > + */
>
> Hurray. :-)
>
> > -	if (code > SW_MAX || !test_bit(code, dev->swbit) ||
> > 	    !!test_bit(code, dev->sw) == value)
> > -		return;
> > +		if (dev->rep[REP_PERIOD])
> > +			mod_timer(&dev->timer, jiffies +
> > +					msecs_to_jiffies(dev->rep[REP_PERIOD]));
> > +	}
>
> Perhaps use a local var for the "msecs_to_jiffies(dev->rep[REP_PERIOD])" part.
>

What would be the benefit of doing so?
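The "gravitate towards old" logic described in the reply can be sketched as a standalone function. This is a sketch under assumptions: the function name `defuzz_abs` is hypothetical, the first branch returns `old_val` as the reply says it should, and the final `return value;` for large differences is assumed because the quoted hunk does not show the end of the function.

```c
/*
 * Smooth a new absolute-axis reading against the previous one.
 * The closer the new value is to the old one, the more the old
 * value dominates the result.
 */
int defuzz_abs(int value, int old_val, int fuzz)
{
	if (fuzz) {
		/* Within fuzz/2: treat as noise and keep the old value. */
		if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
			return old_val;

		/* Within fuzz: weight the old value 3:1 against the new one. */
		if (value > old_val - fuzz && value < old_val + fuzz)
			return (old_val * 3 + value) / 4;

		/* Within 2*fuzz: plain average of old and new. */
		if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
			return (old_val + value) / 2;
	}

	/* Assumed fallthrough: large changes (or no fuzz) pass through. */
	return value;
}
```

With `old_val = 100` and `fuzz = 4`, readings of 101, 103, 106 and 120 come out as 100, 100, 103 and 120 respectively, so the new value counts for more as it moves further from the old one.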
>
> > +static void input_start_autorepeat(struct input_dev *dev, int code)
> > +{
> > +	if (test_bit(EV_REP, dev->evbit) &&
> > +	    dev->rep[REP_PERIOD] && dev->rep[REP_DELAY] &&
> > +	    dev->timer.data) {
> > +		dev->repeat_key = code;
> > +		mod_timer(&dev->timer,
> > +			  jiffies + msecs_to_jiffies(dev->rep[REP_DELAY]));
> > +	}
> > +}
>
> Same here.
>
>
> > +	case EV_KEY:
> > +		if (is_event_supported(code, dev->keybit, KEY_MAX) &&
> > +		    !!test_bit(code, dev->key) != value) {
>
> A bit confusing, test_bit() only returns 0 or 1 anyway, doesn't it?
> So "test_bit(code, dev->key) != value" should be all right.
> I noticed that the old code did it too, but still.

Is it guaranteed? I only expect it to return 0/non-0 values, not
necessarily 0 and 1.

>
> > -	case EV_MSC:
> > +	case EV_SW:
> > +		if (is_event_supported(code, dev->swbit, SW_MAX) &&
> > +		    !!test_bit(code, dev->sw) != value) {
>
> Same.
>
> > -		break;
> > +	case EV_LED:
> > +		if (is_event_supported(code, dev->ledbit, LED_MAX) &&
> > +		    !!test_bit(code, dev->led) != value) {
>
> And here.
>
>
> > +void input_inject_event(struct input_handle *handle,
> > +			unsigned int type, unsigned int code, int value)
> > {
> > -	struct input_dev *dev = (void *) data;
> > +	struct input_dev *dev = handle->dev;
> > +	struct input_handle *grab;
> >
> > -	if (!test_bit(dev->repeat_key, dev->key))
> > -		return;
> > +	if (is_event_supported(type, dev->evbit, EV_MAX)) {
> > +		spin_lock_irq(&dev->event_lock);
> >
> > -	input_event(dev, EV_KEY, dev->repeat_key, 2);
> > -	input_sync(dev);
> > +		grab = rcu_dereference(dev->grab);
> > +		if (!grab || grab == handle)
> > +			input_handle_event(dev, type, code, value);
>
> 'handle' can't be NULL, so can drop the "!grab" check, as checking
> "grab == handle" should be sufficient.
>

It is "or", not "and". The idea is to pass the event if device is not
grabbed by anyone _or_ if source of event is handle that grabbed the
device.
>
> > +/**
> > + * input_open_device - open input device
> > + * @handle: handle through which device is being accessed
> > + *
> > + * This function should be called by input handlers when they
> > + * want to start receive events from given input device.
> > + */
> > int input_open_device(struct input_handle *handle)
> > {
> > 	struct input_dev *dev = handle->dev;
> > -	int err;
> > +	int retval;
> >
> > -	err = mutex_lock_interruptible(&dev->mutex);
> > -	if (err)
> > -		return err;
> > +	retval = mutex_lock_interruptible(&dev->mutex);
> > +	if (retval)
> > +		return retval;
> > +
> > +	if (dev->going_away) {
> > +		retval = -ENODEV;
> > +		goto out;
> > +	}
> >
> > 	handle->open++;
> >
> > 	if (!dev->users++ && dev->open)
>
> Ugh, not your code, and perhaps it's me, but that looks weird.
> The ++ hidden
Re: [PATCH 2/2] ehca: correction include order according kernel coding style
thanks, I applied this by hand since it was so trivial.
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On Sat, 28 Jul 2007 21:33:59 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
>
> > What I think is killing us here is the blockdev pagecache: the pagecache
> > which backs those directory entries and inodes. These pages get read
> > multiple times because they hold multiple directory entries and multiple
> > inodes. These multiple touches will put those pages onto the active list
> > so they stick around for a long time and everything else gets evicted.
> >
> > I've never been very sure about this policy for the metadata pagecache. We
> > read the filesystem objects into the dcache and icache and then we won't
> > read from that page again for a long time (I expect). But the page will
> > still hang around for a long time.
> >
> > It could be that we should leave those pages inactive.
>
> Good idea for updatedb.
>
> However, it may be a bad idea for files that are often
> written to. Turning an inode write into a read plus a
> write does not sound like such a hot idea, we really
> want to keep those in the cache.

Remember that this problem applies to both inode blocks and to directory
blocks. Yes, it might be useful to hold onto an inode block for a future
write (atime, mtime, usually), but not a directory block.

> I think what you need is to ignore multiple references
> to the same page when they all happen in one time
> interval, counting them only if they happen in multiple
> time intervals.

Yes, the sudden burst of accesses for adjacent inode/dirents will be a
common pattern, and it'd make heaps of sense to treat that as a single
touch. It'd have to be done in the fs I guess, and it might be a bit hard
to do. And it turns out that embedding the touch_buffer() all the way down
in __find_get_block() was convenient, but it's going to be tricky to
change.

For now I'm fairly inclined to just nuke the touch_buffer() on the read
side and maybe add one on the modification codepaths and see what happens.

As always, testing is the problem.
> The use-once cleanup (which takes a page flag for PG_new,
> I know...) would solve that problem.
>
> However, it would introduce the problem of having to scan
> all the pages on the list before a page becomes freeable.
> We would have to add some background scanning (or a separate
> list for PG_new pages) to make the initial pageout run use
> an acceptable amount of CPU time.
>
> Not sure that complexity will be worth it...
>

I suspect that the situation we have now is so bad that pretty much
anything we do will be an improvement. I've always wondered "ytf is there
so much blockdev pagecache?"

This machine I'm typing at:

MemTotal:      3975080 kB
MemFree:        750400 kB
Buffers:        547736 kB
Cached:        1299532 kB
SwapCached:      12772 kB
Active:        1789864 kB
Inactive:       861420 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      3975080 kB
LowFree:        750400 kB
SwapTotal:     4875716 kB
SwapFree:      4715660 kB
Dirty:              76 kB
Writeback:           0 kB
Mapped:         638036 kB
Slab:           522724 kB
CommitLimit:   6863256 kB
Committed_AS:  1115632 kB
PageTables:      14452 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     36432 kB
VmallocChunk: 34359696379 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

More than a quarter of my RAM in fs metadata! Most of it I'll bet is on
the active list. And the fs on which I do most of the work is mounted
noatime..
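Rik's suggestion in this thread, counting repeated references to a page only when they fall in different time intervals, can be sketched roughly as follows. This is not kernel code: `fake_page`, `fake_touch_page`, the tick-based notion of time and the threshold of two counted touches are all assumptions made for illustration, and `interval_len` is assumed to be non-zero.

```c
#include <stdbool.h>

/* Hypothetical per-page bookkeeping; not the kernel's struct page. */
struct fake_page {
	unsigned long last_interval;	/* interval of the last counted touch */
	unsigned int touches;		/* touches counted so far */
	bool seen;			/* has the page been touched at all? */
};

#define FAKE_ACTIVATE_AFTER 2	/* assumed threshold for "active" */

/*
 * Record a reference at time `now`, with time divided into intervals of
 * `interval_len` ticks (must be non-zero). A burst of references inside
 * one interval counts as a single touch. Returns true once the page has
 * been touched in enough distinct intervals to be considered active.
 */
bool fake_touch_page(struct fake_page *pg, unsigned long now,
		     unsigned long interval_len)
{
	unsigned long interval = now / interval_len;

	if (!pg->seen || interval != pg->last_interval) {
		pg->seen = true;
		pg->last_interval = interval;
		pg->touches++;
	}
	return pg->touches >= FAKE_ACTIVATE_AFTER;
}
```

Under this scheme an updatedb-style burst that reads many dirents from one block in quick succession would leave the page with a single counted touch, while a page revisited in a later interval would be promoted.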
Re: [RFC/RFT 0/5] Input locking patches
Hi Indan,

On Friday 27 July 2007 18:25, Indan Zupancic wrote:
> Sorry for the babbling, just wanted to say that I've tested these
> patches and that they seem to fix real problems.
>

Thank you for testing the patches.

--
Dmitry
Re: [ofa-general] [PATCH 1/2] ehca: remove checkpatch.pl's warnings "externs should be avoided in .c files"
the patch looks fine except your mailer seems to have mangled it... can you resend so I can apply it? thanks...
Re: ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
Quoting Gabriel C ([EMAIL PROTECTED]):
> Maybe try to :
> disable BSG ( maybe some leftover bug )
> boot acpi=off ( that got merged kind late )

My first git bisected kernel wouldn't boot, but with
acpi=off it would indeed boot!

As did the 2.6.23-rc1-git5 kernel...

I will bisect further to find out exactly what patch is
playing up in my particular setup.

thanks for the tip! ;-)

Danny
--
Re: [RFC] scheduler: improve SMP fairness in CFS
Tong Li wrote:
> Without the global locking, the global synchronization here is simply
> ping-ponging a cache line once in a while. This doesn't look expensive to
> me, but if it does after benchmarking, adjusting sysctl_base_round_slice
> can reduce the ping-pong frequency. There might also be a smart
> implementation that can alleviate this problem.

Scaling it proportionally to migration cost and log2(cpus) should suffice.

> I don't understand why quantizing CPU time is a bad thing. Could you
> educate me on this?

It depends on how precisely you do it. We save a lot of power going
tickless. If round expiration is re-introducing ticks on idle CPUs, we
could waste a lot of power. Hardware is getting even more aggressive about
power saving, to the point of allowing individual cores to be completely
powered off when idle. We need to make sure the scheduler doesn't
interfere with power management.

	-- Chris
[PATCH] fs/partitions/check.c: add_partition() warning fixes
This patch fixes these warnings:

fs/partitions/check.c: In function ‘add_partition’:
fs/partitions/check.c:391: warning: ignoring return value of ‘kobject_add’, declared with attribute warn_unused_result
fs/partitions/check.c:394: warning: ignoring return value of ‘sysfs_create_link’, declared with attribute warn_unused_result
fs/partitions/check.c:401: warning: ignoring return value of ‘sysfs_create_file’, declared with attribute warn_unused_result

Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>
---
 fs/partitions/check.c |   21 ++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 783c57e..01397c1 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -371,6 +371,7 @@ void delete_partition(struct gendisk *disk, int part)
 void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len, int flags)
 {
 	struct hd_struct *p;
+	int err;
 
 	p = kzalloc(sizeof(*p), GFP_KERNEL);
 	if (!p)
@@ -388,20 +389,34 @@ void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len,
 	p->kobj.parent = &disk->kobj;
 	p->kobj.ktype = &ktype_part;
 	kobject_init(&p->kobj);
-	kobject_add(&p->kobj);
+	err = kobject_add(&p->kobj);
+	if (err)
+		goto err_out;
 	if (!disk->part_uevent_suppress)
 		kobject_uevent(&p->kobj, KOBJ_ADD);
-	sysfs_create_link(&p->kobj, &block_subsys.kobj, "subsystem");
+	err = sysfs_create_link(&p->kobj, &block_subsys.kobj, "subsystem");
+	if (err)
+		goto err_out_del_kobj;
 	if (flags & ADDPART_FLAG_WHOLEDISK) {
 		static struct attribute addpartattr = {
 			.name = "whole_disk",
 			.mode = S_IRUSR | S_IRGRP | S_IROTH,
 		};
 
-		sysfs_create_file(&p->kobj, &addpartattr);
+		err = sysfs_create_file(&p->kobj, &addpartattr);
+		if (err)
+			goto err_out_del_link;
 	}
 	partition_sysfs_add_subdir(p);
 	disk->part[part-1] = p;
+	return;
+
+err_out_del_link:
+	sysfs_remove_link(&p->kobj, "subsystem");
+err_out_del_kobj:
+	kobject_del(&p->kobj);
+err_out:
+	kfree(p);
 }
 
 static char *make_block_name(struct gendisk *disk)
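The error-unwinding shape used by this patch (check each step, then jump to a label that undoes only the steps already completed, in reverse order) can be sketched in userland C. All names below (`fake_part`, `fake_step`, `fake_add_part`, the `fail_mask` and `undone` parameters) are hypothetical and exist only so the pattern can be exercised; they do not correspond to kobject or sysfs calls.

```c
#include <stdlib.h>

/* Hypothetical object with three setup steps, mirroring add/link/file. */
struct fake_part {
	int kobj_added;
	int link_created;
	int file_created;
};

/* Hypothetical step: fails when its bit is set in fail_mask. */
static int fake_step(int *done, int fail_mask, int bit)
{
	if (fail_mask & bit)
		return -1;
	*done = 1;
	return 0;
}

/*
 * Returns the new object on success. On failure, undoes every step that
 * had already succeeded (newest first), counts them in *undone, frees
 * the object and returns NULL.
 */
struct fake_part *fake_add_part(int fail_mask, int *undone)
{
	struct fake_part *p = calloc(1, sizeof(*p));

	*undone = 0;
	if (!p)
		return NULL;
	if (fake_step(&p->kobj_added, fail_mask, 1))
		goto err_out;
	if (fake_step(&p->link_created, fail_mask, 2))
		goto err_out_del_kobj;
	if (fake_step(&p->file_created, fail_mask, 4))
		goto err_out_del_link;
	return p;

err_out_del_link:
	p->link_created = 0;	/* undo step 2 */
	(*undone)++;
err_out_del_kobj:
	p->kobj_added = 0;	/* undo step 1 */
	(*undone)++;
err_out:
	free(p);
	return NULL;
}
```

Because the labels fall through, a failure at the third step undoes two earlier steps, a failure at the second undoes one, and a failure at the first only frees the allocation.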
Re: ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
Danny ter Haar wrote:
> Quoting Bartlomiej Zolnierkiewicz ([EMAIL PROTECTED]):
>> Please retry with the latest -git kernel and if the problem is still
>> there install git, get kernel tree and run git-bisect.
>
> I ran over "make menuconfig" and did a few changes.
>
> http://www.dth.net/kernel/config-2.6.23-rc1-git5
>

Maybe try to :

disable BSG ( maybe some leftover bug )
boot acpi=off ( that got merged kind late )

>
> Danny

Gabriel
[PATCH] arch/i386/kernel/apm.c: apm_init() warning fix
arch/i386/kernel/apm.c: In function 'apm_init':
arch/i386/kernel/apm.c:2240: warning: format '%lx' expects type 'long unsigned int', but argument 3 has type 'u32'

apm_info.bios.offset is of type 'u32'.

Signed-off-by: Eugene Teo <[EMAIL PROTECTED]>
---
 arch/i386/kernel/apm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/i386/kernel/apm.c b/arch/i386/kernel/apm.c
index 47001d5..f02a8ac 100644
--- a/arch/i386/kernel/apm.c
+++ b/arch/i386/kernel/apm.c
@@ -2235,7 +2235,7 @@ static int __init apm_init(void)
 		apm_info.bios.cseg_16_len = 0; /* 64k */
 
 	if (debug) {
-		printk(KERN_INFO "apm: entry %x:%lx cseg16 %x dseg %x",
+		printk(KERN_INFO "apm: entry %x:%x cseg16 %x dseg %x",
 			apm_info.bios.cseg, apm_info.bios.offset,
 			apm_info.bios.cseg_16, apm_info.bios.dseg);
 		if (apm_info.bios.version > 0x100)
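The warning above comes down to the conversion specifier not matching the argument type. As a rough user-space illustration (not kernel code): format_entry() below is a hypothetical helper, and the 16-bit type used for the segment is an assumption made for the example. Only the 32-bit type of the offset comes from the patch description.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats a segment:offset pair in the style of the
 * debug message. PRIx32 is the portable user-space spelling for a 32-bit
 * hex value; in the kernel a u32 is an unsigned int, so plain %x matches,
 * which is what the patch switches to.
 */
static int format_entry(char *buf, size_t len, uint16_t cseg, uint32_t offset)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "apm: entry %x:%" PRIx32,
			(unsigned int)cseg, offset);
}
```

Passing a 32-bit value where %lx expects an unsigned long happens to work on i386, where both are 32 bits wide, but the mismatch is still what the compiler flags.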
[PATCH] List VESA graphics videomodes when vesafb is present
Hello, I had something like this in video.S for years, so it is probably time to try to push it upstream... Besides other problems it confirmed that when I connect HDTV to my nVidia, BIOS decides that 640x480 is largest possible resolution, as it is largest standard resolution smaller than non-interlaced 1920x540, and apparently BIOS I have does not believe that interlaced modes exist. Thanks, Petr Vandrovec List VESA videomodes when vesafb is available There is no CONFIG_VIDEO_VESA option, so code to retrieve VESA modes (even text ones) was always disabled - it was introduced by conversion, old video.S had #define CONFIG_VIDEO_VESA at the beginning. Modify video-vesa.c to list graphics videomodes when vesafb is built in and videomode is acceptable. Because there is lot of videomodes in some VESA BIOSes, list them in three columns (should be good for anybody - videocards with more than ~70 videomodes cannot be used with some OSes, so vendors usually try to not cross more than 60 videomodes reported by VESA). Which unfortunately means that card_name for "BIOS (scanned)" needs to be made shorter. And display color depth for videomodes. To avoid confusion depth is shown only if non-text videomodes are present in the list. 
--- commit 9453a2a4afee6cadb020838d0a35c2d11af25aa6 tree 5903315f7fd62db34f22107e63ec37a24bbf8075 parent 5de7fc0bf0e2e36a8dcf619e95576219dcc13d70 author Petr Vandrovec <[EMAIL PROTECTED]> Sun, 22 Jul 2007 23:54:34 -0700 committer Petr Vandrovec <[EMAIL PROTECTED]> Sun, 22 Jul 2007 23:54:34 -0700 arch/i386/boot/video-bios.c |3 ++- arch/i386/boot/video-vesa.c | 48 +++ arch/i386/boot/video-vga.c | 20 +- arch/i386/boot/video.c | 39 --- arch/i386/boot/video.h |3 ++- 5 files changed, 89 insertions(+), 24 deletions(-) diff --git a/arch/i386/boot/video-bios.c b/arch/i386/boot/video-bios.c index afea46c..376ff71 100644 --- a/arch/i386/boot/video-bios.c +++ b/arch/i386/boot/video-bios.c @@ -104,6 +104,7 @@ static int bios_probe(void) mi = GET_HEAP(struct mode_info, 1); mi->mode = VIDEO_FIRST_BIOS+mode; + mi->depth = 0; /* text */ mi->x = rdfs16(0x44a); mi->y = rdfs8(0x484)+1; nmodes++; @@ -116,7 +117,7 @@ static int bios_probe(void) __videocard video_bios = { - .card_name = "BIOS (scanned)", + .card_name = "BIOS", .probe = bios_probe, .set_mode = bios_set_mode, .unsafe = 1, diff --git a/arch/i386/boot/video-vesa.c b/arch/i386/boot/video-vesa.c index e6aa9eb..af7019f 100644 --- a/arch/i386/boot/video-vesa.c +++ b/arch/i386/boot/video-vesa.c @@ -28,7 +28,7 @@ static void vesa_store_mode_params_graphics(void); static int vesa_probe(void) { -#if defined(CONFIG_VIDEO_VESA) || defined(CONFIG_FIRMWARE_EDID) +#if defined(CONFIG_VIDEO_SELECT) || defined(CONFIG_FIRMWARE_EDID) u16 ax; u16 mode; addr_t mode_ptr; @@ -47,8 +47,8 @@ static int vesa_probe(void) vginfo.signature != VESA_MAGIC || vginfo.version < 0x0102) return 0; /* Not present */ -#endif /* CONFIG_VIDEO_VESA || CONFIG_FIRMWARE_EDID */ -#ifdef CONFIG_VIDEO_VESA +#endif /* CONFIG_VIDEO_SELECT || CONFIG_FIRMWARE_EDID */ +#ifdef CONFIG_VIDEO_SELECT set_fs(vginfo.video_mode_ptr.seg); mode_ptr = vginfo.video_mode_ptr.off; @@ -75,19 +75,49 @@ static int vesa_probe(void) /* Text Mode, TTY BIOS supported, supported by hardware */ mi 
= GET_HEAP(struct mode_info, 1); - mi->mode = mode + VIDEO_FIRST_VESA; - mi->x= vminfo.h_res; - mi->y= vminfo.v_res; + mi->mode = mode + VIDEO_FIRST_VESA; + mi->depth = 0; /* text */ + mi->x = vminfo.h_res; + mi->y = vminfo.v_res; nmodes++; } else if ((vminfo.mode_attr & 0x99) == 0x99) { #ifdef CONFIG_FB /* Graphics mode, color, linear frame buffer - supported -- register the mode but hide from + supported -- register the mode, and if there + is no VESA framebuffer then hide from the menu. Only do this if framebuffer is configured, however, otherwise the user will be left without a screen. */ mi = GET_HEAP(struct mode_info, 1); - mi->mode = mode + VIDEO_FIRST_VESA; + mi->mode = mode + VIDEO_FIRST_VESA; + mi->depth = 7; /*
Re: [RFC] scheduler: improve SMP fairness in CFS
Tong Li wrote: On Fri, 27 Jul 2007, Chris Snook wrote: Bill Huey (hui) wrote: You have to consider the target for this kind of code. There are applications where you need something that falls within a constant error bound. According to the numbers, the current CFS rebalancing logic doesn't achieve that to any degree of rigor. So CFS is ok for SCHED_OTHER, but not for anything more strict than that. I've said from the beginning that I think that anyone who desperately needs perfect fairness should be explicitly enforcing it with the aid of realtime priorities. The problem is that configuring and tuning a realtime application is a pain, and people want to be able to approximate this behavior without doing a whole lot of dirty work themselves. I believe that CFS can and should be enhanced to ensure SMP-fairness over potentially short, user-configurable intervals, even for SCHED_OTHER. I do not, however, believe that we should take it to the extreme of wasting CPU cycles on migrations that will not improve performance for *any* task, just to avoid letting some tasks get ahead of others. We should be as fair as possible but no fairer. If we've already made it as fair as possible, we should account for the margin of error and correct for it the next time we rebalance. We should not burn the surplus just to get rid of it. Proportional-share scheduling actually has one of its roots in real-time and having a p-fair scheduler is essential for real-time apps (soft real-time). Sounds like another scheduler class might be in order. I find CFS to be fair enough for most purposes. If the code that gives us near-perfect fairness at the expense of efficiency only runs when tasks have been given boosted priority by a privileged user, and only on the CPUs that have such tasks queued on them, the run time overhead and code complexity become much smaller concerns. On a non-NUMA box with single-socket, non-SMT processors, a constant error bound is fine. 
Once we add SMT, go multi-core, go NUMA, and add inter-chassis interconnects on top of that, we need to multiply this error bound at each stage in the hierarchy, or else we'll end up wasting CPU cycles on migrations that actually hurt the processes they're supposed to be helping, and hurt everyone else even more. I believe we should enforce an error bound that is proportional to migration cost. I think we are actually in agreement. When I say constant bound, it can certainly be a constant that's determined based on inputs from the memory hierarchy. The point is that it needs to be a constant independent of things like # of tasks. Agreed. But this patch is only relevant to SCHED_OTHER. The realtime scheduler doesn't have a concept of fairness, just priorities. That's why each realtime priority level has its own separate runqueue. Realtime schedulers are supposed to be dumb as a post, so they cannot heuristically decide to do anything other than precisely what you configured them to do, and so they don't get in the way when you're context switching a million times a second. Are you referring to hard real-time? As I said, an infrastructure that enables p-fair scheduling, EDF, or things alike is the foundation for real-time. I designed DWRR, however, with a target of non-RT apps, although I was hoping the research results might be applicable to RT. I'm referring to the static priority SCHED_FIFO and SCHED_RR schedulers, which are (intentionally) dumb as a post, allowing userspace to manage CPU time explicitly. Proportionally fair scheduling is a cool capability, but not a design goal of those schedulers. -- Chris
Re: [PATCH 2/3] core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe
On Sat, Jul 28, 2007 at 03:52:02PM -0700, Jeremy Fitzhardinge wrote:
> Neil Horman wrote:
> > Jeremy asked that I make a patch next week to address split_argv's requirement
> > that the argc parameter be non-NULL. I'll be fixing that next week, and what I
> > can do is further enhance it such that it ignores spaces in quoted strings,
> > which should address the case that concerns you. I.E I can make split_argv
> > behave such that:
> > echo "|\"foo bar\" --pid %p" > /proc/sys/kernel/core_pattern
> > results in the following argv:
> > {{"foo bar"}, {"--pid"}, {"1234"}}
> >
> > Which I think handles what you are looking for.
> >
>
> No, please don't. My original argv_split did that, and it was just way
> too complex. If you need complex quoting, you can always point it at a
> shell script and handle it there.
>
> J

OK, well then, it seems this corner case is much too hairy to just fix up immediately. Given that we certainly don't handle quoted strings now, and that this is a case that will almost never come up and can be easily worked around, let's address it at some time after we get this base functionality in place.

Regards
Neil

-- 
/***
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc. [EMAIL PROTECTED]
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***/
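To make the corner case discussed above concrete, here is a small user-space sketch of a splitter that breaks on whitespace only, with no quote handling. It is not the kernel's argv_split(): split_words() and free_words() are hypothetical names, and the sketch assumes only what the thread states, namely that quoted strings are not treated specially and that a NULL count pointer should be accepted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated word array produced by split_words(). */
static void free_words(char **argv)
{
	char **p;

	if (!argv)
		return;
	for (p = argv; *p; p++)
		free(*p);
	free(argv);
}

/*
 * Split str on whitespace only; quotes are ordinary characters.
 * Returns a NULL-terminated array, or NULL on allocation failure.
 * The word count is stored in *argcp when argcp is not NULL.
 */
static char **split_words(const char *str, int *argcp)
{
	const char *s;
	char **argv;
	int count = 0, i;

	assert(str != NULL);

	/* First pass: count the words. */
	for (s = str; *s; ) {
		while (*s && isspace((unsigned char)*s))
			s++;
		if (!*s)
			break;
		count++;
		while (*s && !isspace((unsigned char)*s))
			s++;
	}

	/* calloc() zeroes the array, so it is always NULL-terminated. */
	argv = calloc((size_t)count + 1, sizeof(*argv));
	if (!argv)
		return NULL;

	/* Second pass: copy each word. */
	for (s = str, i = 0; i < count; i++) {
		const char *start;
		size_t len;

		while (isspace((unsigned char)*s))
			s++;
		start = s;
		while (*s && !isspace((unsigned char)*s))
			s++;
		len = (size_t)(s - start);
		argv[i] = malloc(len + 1);
		if (!argv[i]) {
			free_words(argv);
			return NULL;
		}
		memcpy(argv[i], start, len);
		argv[i][len] = '\0';
	}

	if (argcp)
		*argcp = count;
	return argv;
}
```

With this behaviour the string "foo bar" --pid 1234 (quotes included) yields four words, the first two still carrying their quote characters, which is why a helper whose path contains spaces would have to be wrapped in a script, as Jeremy suggests.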
Re: [ck] Re: Linus 2.6.23-rc1
Con Kolivas <[EMAIL PROTECTED]> writes:
> Interesting... Trying to avoid reading email but with a flooded inbox
> it's quite hard to do.

Con, good to hear from you. Good luck with your future endeavors.

Charles
-- 
"Are [Linux users] lemmings collectively jumping off of the cliff of reliable, well-engineered commercial software?" (By Matt Welsh)
Re: [PATCH] UML - Console should handle spurious IRQS
Jeff Dike wrote:

The previous DEBUG_SHIRQ patch missed one case. The console doesn't set its host descriptors non-blocking.

Sorry, things looked okay when I tested on my UML environment (Puppy Linux). Some xterms popped around (because I was using "con=xterm") and the system was usable, so it gave me no indication something was wrong.

I thought of adding an extra debugging option to warn us when a blocking I/O operation is issued for a socket/fd, but UML-specific code is not consistent regarding glibc functions. That is, most of the time it calls os_*(), but sometimes it calls functions like recvfrom() directly. I'll grep the source code for such calls and send a patch to clean it up a bit.

There might still be such cases, I haven't tested all channel types yet.
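For context on why the blocking descriptor matters: if an interrupt handler is invoked when no data is pending (as DEBUG_SHIRQ deliberately provokes), a read on a blocking descriptor stalls, while a non-blocking one returns immediately. A minimal host-side sketch of the idea, using only POSIX calls; set_nonblocking() and read_if_ready() are hypothetical names for illustration and are not the os_*() helpers mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -errno on failure. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -errno;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -errno;
	return 0;
}

/*
 * Read the way an interrupt handler would: returns the number of bytes
 * read, 0 if the wakeup was spurious (no data pending), or -errno on a
 * real error. End-of-file also returns 0, a simplification for this sketch.
 */
static long read_if_ready(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	n = read(fd, buf, len);
	if (n >= 0)
		return (long)n;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return 0;
	return -errno;
}
```

Without the O_NONBLOCK flag, the read() in read_if_ready() would simply not return on a spurious wakeup, which matches the hang described in the thread.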
Re: How can we make page replacement smarter
Al Boldi wrote:
> Good idea, but unless we understand the problems involved, we are bound to
> repeat it. So my first question would be: Why is swap-in so slow?
>
> As I have posted in other threads, swap-in of consecutive pages suffers a 2x
> slowdown wrt swap-out, whereas swap-in of random pages suffers over 6x
> slowdown.
>
> Because it is hard to quantify the expected swap-in speed for random pages,
> let's first tackle the swap-in of consecutive pages, which should be at least
> as fast as swap-out. So again, why is swap-in so slow?

I suspect that this is a locality of reference issue.

Anonymous memory can get jumbled up by repeated free and malloc cycles of many smaller objects. The amount of anonymous memory is often smaller than or roughly the same size as system memory. Locality of reference to anonymous memory tends to be temporal in nature, with the same sets of pages being accessed over and over again.

Files are different. File content tends to be grouped in large related chunks, both logically in the file and on disk. Generally there is a lot more file data on a system than what fits in memory. Locality of reference to file data tends to be spatial in nature, with one file access leading up to the system accessing "nearby" data. The data is not necessarily touched again any time soon.

> Once we understand this problem, we may be able to suggest a smart
> improvement.

Like the one on http://linux-mm.org/PageoutFailureModes ?

I have the LRU lists split and am working on getting SEQ replacement implemented for the anonymous pages. The most recent (untested) patches are attached.

-- 
Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
--- linux-2.6.21.noarch/drivers/base/node.c.vmsplit 2007-04-25 23:08:32.0 -0400 +++ linux-2.6.21.noarch/drivers/base/node.c 2007-07-23 11:42:52.0 -0400 @@ -44,33 +44,37 @@ static ssize_t node_read_meminfo(struct si_meminfo_node(, nid); n = sprintf(buf, "\n" - "Node %d MemTotal: %8lu kB\n" - "Node %d MemFree: %8lu kB\n" - "Node %d MemUsed: %8lu kB\n" - "Node %d Active: %8lu kB\n" - "Node %d Inactive: %8lu kB\n" + "Node %d MemTotal: %8lu kB\n" + "Node %d MemFree:%8lu kB\n" + "Node %d MemUsed:%8lu kB\n" + "Node %d Active(anon): %8lu kB\n" + "Node %d Inactive(anon): %8lu kB\n" + "Node %d Active(file): %8lu kB\n" + "Node %d Inactive(file): %8lu kB\n" #ifdef CONFIG_HIGHMEM - "Node %d HighTotal:%8lu kB\n" - "Node %d HighFree: %8lu kB\n" - "Node %d LowTotal: %8lu kB\n" - "Node %d LowFree: %8lu kB\n" + "Node %d HighTotal: %8lu kB\n" + "Node %d HighFree: %8lu kB\n" + "Node %d LowTotal: %8lu kB\n" + "Node %d LowFree:%8lu kB\n" #endif - "Node %d Dirty:%8lu kB\n" - "Node %d Writeback:%8lu kB\n" - "Node %d FilePages:%8lu kB\n" - "Node %d Mapped: %8lu kB\n" - "Node %d AnonPages:%8lu kB\n" - "Node %d PageTables: %8lu kB\n" - "Node %d NFS_Unstable: %8lu kB\n" - "Node %d Bounce: %8lu kB\n" - "Node %d Slab: %8lu kB\n" - "Node %d SReclaimable: %8lu kB\n" - "Node %d SUnreclaim: %8lu kB\n", + "Node %d Dirty: %8lu kB\n" + "Node %d Writeback: %8lu kB\n" + "Node %d FilePages: %8lu kB\n" + "Node %d Mapped: %8lu kB\n" + "Node %d AnonPages: %8lu kB\n" + "Node %d PageTables: %8lu kB\n" + "Node %d NFS_Unstable: %8lu kB\n" + "Node %d Bounce: %8lu kB\n" + "Node %d Slab: %8lu kB\n" + "Node %d SReclaimable: %8lu kB\n" + "Node %d SUnreclaim: %8lu kB\n", nid, K(i.totalram), nid, K(i.freeram), nid, K(i.totalram - i.freeram), - nid, node_page_state(nid, NR_ACTIVE), - nid, node_page_state(nid, NR_INACTIVE), + nid, node_page_state(nid, NR_ACTIVE_ANON), + nid, node_page_state(nid, NR_INACTIVE_ANON), + nid, node_page_state(nid, NR_ACTIVE_FILE), + nid, node_page_state(nid, NR_INACTIVE_FILE), #ifdef 
CONFIG_HIGHMEM nid, K(i.totalhigh), nid, K(i.freehigh), --- linux-2.6.21.noarch/fs/proc/proc_misc.c.vmsplit 2007-07-05 12:06:14.0 -0400 +++ linux-2.6.21.noarch/fs/proc/proc_misc.c 2007-07-23 11:42:52.0 -0400 @@ -146,43 +146,47 @@ static int meminfo_read_proc(char *page, * Tagged format, for easy grepping and expansion. */ len = sprintf(page, - "MemTotal: %8lu kB\n" - "MemFree: %8lu kB\n" - "Buffers: %8lu kB\n" - "Cached: %8lu kB\n" - "SwapCached: %8lu kB\n" - "Active: %8lu kB\n" -
RE: 2.6.23-rc1-git3 init failure
> Boot failure on x86_64 (64X2), says it can't find init, specifically
> /init. 2.6.23-rc1-git1 boots and runs successfully. I haven't tried
> -git2. I shall reboot on 2.6.23-rc1-git3 tomorrow and record the full
> message.
> Strings from vmlinux in both the above:-
>
> Kernel alive
> /dev/console
> <4>Warning: unable to open an initial console.
> <4>Failed to execute %s
> <4>Failed to execute %s. Attempting defaults...
> /sbin/init
> /etc/init
> /bin/init
> /bin/sh
> No init found. Try passing init= option to kernel.
>
> Tried option "init=/sbin/init" and got the same failure.
> Regards
> Sid.

I see the same problem with 2.6.23-rc1-git5.

Freeing unused kernel memory: 236k freed
failed to execute /init
kernel panic - not syncing: No init found. Try passing init= option to kernel

Copying /sbin/init to / results in the same error.

openSUSE 10.3Alpha6plus
# rpm -qf /sbin/init
sysvinit-2.86-90

Regards
Sid.
-- 
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
Andrew Morton wrote:
> What I think is killing us here is the blockdev pagecache: the pagecache
> which backs those directory entries and inodes. These pages get read
> multiple times because they hold multiple directory entries and multiple
> inodes. These multiple touches will put those pages onto the active list so
> they stick around for a long time and everything else gets evicted.
>
> I've never been very sure about this policy for the metadata pagecache. We
> read the filesystem objects into the dcache and icache and then we won't
> read from that page again for a long time (I expect). But the page will
> still hang around for a long time. It could be that we should leave those
> pages inactive.

Good idea for updatedb.

However, it may be a bad idea for files that are often written to. Turning an inode write into a read plus a write does not sound like such a hot idea, we really want to keep those in the cache.

I think what you need is to ignore multiple references to the same page when they all happen in one time interval, counting them only if they happen in multiple time intervals.

The use-once cleanup (which takes a page flag for PG_new, I know...) would solve that problem.

However, it would introduce the problem of having to scan all the pages on the list before a page becomes freeable. We would have to add some background scanning (or a separate list for PG_new pages) to make the initial pageout run use an acceptable amount of CPU time.

Not sure that complexity will be worth it...

-- 
Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
Re: ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
Quoting Bartlomiej Zolnierkiewicz ([EMAIL PROTECTED]):
> Please retry with the latest -git kernel and if the problem is still
> there install git, get kernel tree and run git-bisect.

I ran over "make menuconfig" and did a few changes.

http://www.dth.net/kernel/config-2.6.23-rc1-git5

It boots, but freezes solid before doing any "work" as a firewall.
I was able to catch boot messages with netconsole:

http://www.dth.net/kernel/via_output_2.6.23-rc1-git5

It didn't respond after the last line. Magic SysRq etc., all nada.

Will start to get acquainted with git ;-)

Danny
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, Jul 28, 2007 at 03:18:24PM -0700, Linus Torvalds wrote: > I don't think anything was suppressed here. I disagree. See below. > You seem to say that more modular code would have helped make for a nicer > way to do schedulers, but if so, where were those patches to do that? > Con's patches didn't do that either. They just replaced the code. They replaced code because he would have liked to have taken scheduler code in possibly a completely different direction. This is a large conceptual change from what is currently there. That might also mean how the notion of bandwidth with regards to core frequency might be expressed in the system with regards to power saving and other things. Things get dropped often not because of pure technical reasons but because of person preference and the lack of willingness to ask where this might take us. The way that Con works and conceptualizes things is quite a bit different and more comprehensive in a lot of ways compared to how the regular kernel community operates. He's strong in this area and weak in general kernel hackery as a function of time and experience. That doesn't mean that he, his ideas and his code should be subject to an either/or situation with the scheduler and other ideas that have been rejected by various folks. He maintained -ck branch successfully for a long time and is a very capable developer. I do acknowledge that having a maintainer that you can trust is more important, but it should not be exclusionary in this way. I totally understand his reaction. > In fact, Ingo's patches _do_ add some modularity, and might make it easier > to replace the scheduler. So it would seem that you would argue for CFS, > not against it? It's not the same as sched plugin. Some folks might not like to use the rbtree that's in place and express things in a completely different manner. 
Take, for instance, Tong Li's stuff: with CFS it is a bit of a conceptual mismatch, since he attempts to express rebalancing in terms of expiry rounds, yet it would integrate more seamlessly with something like either the old O(1) scheduler or Con's stuff. It's also the only method posted to lkml that can deal with fairness across SMP situations with low error. Yet what's happening here is that his implementation is being rejected for size and complexity that stem from a conceptual mismatch in data structures. Because of this, his notion of trio as a general method of getting aggressive group fairness (by far the most complete conceptually on lkml; overdesign is a different topic altogether) may never see the light of day in Linux because of people's collective lack of foresight. To answer the question that you posed, no. I'm not arguing against it. I'm in favor of it going into the kernel like any deadline mechanism since it can be generalized, but the current development processes in the Linux kernel should not create an either/or situation with the scheduler code. There have been multiple rejections of ideas with regard to the scheduler code over the years that could have taken things in a very different and possibly completely kick-ass direction, and that were suppressed because of the development attitude of various Linux kernel developers. It's only now, all of a sudden, because of Con's work, that there's a flurry of development in this area, when this idea is shown to be superior, and even then it's conceptually incomplete and subject to a lot of arbitrary hacking. This is very different from Con's development style and mine as well. This is an area that could have been addressed sooner if the general community had admitted that there was a problem earlier and permitted more conscious and open change. I've seen changes in this area from Con be rejected time and time again, which affected the technical direction he originally wanted to take this.
Now, Con might have a communication problem here, but nobody asked him to clarify what he might have wanted and why, yet folks were very quick to dismiss him and nitpick him to death, even when he explained why he might have wanted a particular change in the first place. This is the "facilitation" part that's missing in the current kernel culture. This is a very important idea as the community grows, because I see folks that are capable of doing work get discouraged and locked out because of code maintainability issues and an inability to get folks to move in that direction, because the community has no consensus mechanism other than sucking up to developers. Con and folks like him should be permitted the opportunity to fail on their own account. If Linux were truly open, it would have dealt with this issue by now and there wouldn't be so much flamage from the general community. > > I think that's kind of a bogus assumption from the very get go. Scheduling > > in Linux is one of the most unevolved systems in the kernel that still > > could go through a large transformation and get big gains like what > > we've had over the last few months. This evident with both
Re: [PATCH] Fix lguest bzImage loading with CONFIG_RELOCATABLE=y
On Fri, 2007-07-27 at 12:45 +0200, Andi Kleen wrote:
> Rusty Russell <[EMAIL PROTECTED]> writes:
>
> > Jason Yeh sent his crashing .config: bzImages made with
> > CONFIG_RELOCATABLE=y put the relocs where the BSS is expected, and we
> > crash with unusual results such as:
>
> The normal kernel startup should already clear BSS. Why does
> this not work here? Can it be fixed?

Unfortunately, lguest doesn't go through the normal startup path (which does this in asm).

Thanks,
Rusty.
Re: [PATCH] Framebuffer: Consolidated cleanup of pvr2fb.c for Sega Dreamcast
On 29/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote: > Tony, > > Second time attempt at this and a much better job I think. > Sorry, given I've jsut said this *does* work at 24bpp and 32bpp I'd better clean up the Documentation patch,,, diff --git a/Documentation/fb/pvr2fb.txt b/Documentation/fb/pvr2fb.txt index 2bf6c23..0e4a3c6 100644 --- a/Documentation/fb/pvr2fb.txt +++ b/Documentation/fb/pvr2fb.txt @@ -9,14 +9,14 @@ one found in the Dreamcast. Advantages: * It provides a nice large console (128 cols + 48 lines with 1024x768) - without using tiny, unreadable fonts. + without using tiny, unreadable fonts (this size is NOT available on the + Dreamcast) * You can run XF86_FBDev on top of /dev/fb0 * Most important: boot logo :-) Disadvantages: - * Driver is currently limited to the Dreamcast PowerVR 2 implementation - at the time of this writing. + * Driver is largely untested on non-Dreamcast systems. Configuration = @@ -29,11 +29,13 @@ Accepted options: font:X- default font to use. All fonts are supported, including the SUN12x22 font which is very nice at high resolutions. -mode:X- default video mode. The following video modes are supported: -640x240-60, 640x480-60. +mode:X- default video mode +The following video modes are supported: +[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] The Dreamcast + defaults to [EMAIL PROTECTED] Note: the 640x240 mode is currently broken, and should not be -used for any reason. It is only mentioned as a reference. +used for any reason. It is only mentioned here as a reference. inverse - invert colors on screen (for LCD displays) @@ -49,13 +51,20 @@ cable:X - cable type. This can be any of the following: vga, rgb, and output:X - output type. This can be any of the following: pal, ntsc, and vga. If none is specified, we guess. +Video mode may also be specified in the form: + + [xres]x[yres][-[EMAIL PROTECTED] + + eg [EMAIL PROTECTED] + + X11 === -XF86_FBDev should work, in theory. 
At the time of this writing it is -totally untested and may or may not even portray the beginnings of -working. If you end up testing this, please let me know! +XF86_FBDev has been shown to work on the Dremcast in the past - though not yet +on any 2.6 series kernel. -- Paul Mundt <[EMAIL PROTECTED]> +Updated by Adrian McMenamin <[EMAIL PROTECTED]> diff --git a/drivers/video/pvr2fb.c b/drivers/video/pvr2fb.c index 3ac32f3..264b6a6 100644 --- a/drivers/video/pvr2fb.c +++ b/drivers/video/pvr2fb.c @@ -94,6 +94,7 @@ #define DISP_DIWCONF (DISP_BASE + 0xe8) #define DISP_DIWHSTRT (DISP_BASE + 0xec) #define DISP_DIWVSTRT (DISP_BASE + 0xf0) +#define DISP_PIXDEPTH (DISP_BASE + 0x108) /* Pixel clocks, one for TV output, doubled for VGA output */ #define TV_CLK 74239 @@ -143,6 +144,7 @@ static struct pvr2fb_par { unsigned char is_lowres; /* Is horizontal pixel-doubling enabled? */ unsigned long mmio_base; /* MMIO base */ + u32 palette[16]; } *currentpar; static struct fb_info *fb_info; @@ -320,7 +322,7 @@ static int pvr2fb_setcolreg(unsigned int regno, unsigned int red, if (regno > info->cmap.len) return 1; - + /* * We only support the hardware palette for 16 and 32bpp. 
It's also * expected that the palette format has been set by the time we get @@ -333,24 +335,25 @@ static int pvr2fb_setcolreg(unsigned int regno, unsigned int red, ((blue & 0xf800) >> 11); pvr2fb_set_pal_entry(par, regno, tmp); - ((u16*)(info->pseudo_palette))[regno] = tmp; break; case 24: /* RGB 888 */ red >>= 8; green >>= 8; blue >>= 8; - ((u32*)(info->pseudo_palette))[regno] = (red << 16) | (green << 8) | blue; + tmp = (red << 16) | (green << 8) | blue; break; case 32: /* ARGB */ red >>= 8; green >>= 8; blue >>= 8; tmp = (transp << 24) | (red << 16) | (green << 8) | blue; pvr2fb_set_pal_entry(par, regno, tmp); - ((u32*)(info->pseudo_palette))[regno] = tmp; break; default: pr_debug("Invalid bit depth %d?!?\n", info->var.bits_per_pixel); return 1; } + if (regno < 16) + ((u32*)(info->pseudo_palette))[regno] = tmp; + return 0; } @@ -598,6 +601,7 @@ static void pvr2_init_display(struct fb_info *info) /* bits per pixel */ fb_writel(fb_readl(DISP_DIWMODE) | (--bytesperpixel << 2), DISP_DIWMODE); + fb_writel(bytesperpixel << 2, DISP_PIXDEPTH); /* video enable, color sync, interlace, * hsync and vsync polarity (currently unused) */ @@ -789,7 +793,7 @@ static int __devinit pvr2fb_common_init(void) fb_info->fbops = _ops; fb_info->fix = pvr2_fix; fb_info->par = currentpar; - fb_info->pseudo_palette = (void *)(fb_info->par + 1); + fb_info->pseudo_palette = currentpar->palette; fb_info->flags = FBINFO_DEFAULT | FBINFO_HWACCEL_YPAN; if (video_output == VO_VGA) @@ -806,6 +810,8 @@ static int __devinit
Re: Linus 2.6.23-rc1
On Sun, 2007-07-29 at 01:41 +0200, Volker Armin Hemmann wrote:
> Hi,
>
> I never tried Con's patchset, for two reasons:
> I tried his 2.4 patches ones, and I never saw any improvements. So when people
> were reporting huge improvements with his SD scheduler, I compared that with
> the reports of huge improvements with his 2.4 kernel patches.

Well that's a reason if there ever were one...

> ...
> The second: too many patches. I only would have tried one or two, but the
> ck-patchset is a lot bigger.. and I am a little bit uneasy about that.

so use only the scheduler? nobody forces you to do many things..

>
> But I tried a lot of Ingo's cfs patches - and it was a very pleasant
> experience. Ingo reacted very fast on my feedback and when I hit a problem he
> really tried to find the cause and solve it - and it always was one patch, so
> I felt a lot less scared ;)
>
> My usual workload is very 'usual'. KDE desktop, kmail, konqueror, sometimes
> xine or amarok providing some background noise while typing away in kate,
> triplea, wesnoth or some other game when I need to 'rest' for a while. A lot
> of compiling in the background, because I am one of these gentoo users.
>
> With cfs the experience was much more pleasant than with the 'old' scheduler.
> Compiling did not hurt as much as usual anymore - the only thing that hurts
> is swap
>
> But there is another thing I do regularly: I play ut2004. Not every single
> day, but sometimes several times a day. 20minutes of mayhem and then back to
> the desktop.
>
> And I do not see any problems with cfs and ut2004. The maximum FPS are indeed
> a little bit lower (and you can argue that this really is not important if
> the pre-game FPS in a level looking down on the floor go down from 390 to
> 380FPS), but the minimum FPS went up!

well, surely CFS is better than the old vanilla scheduler, also with 3d, and if you have that high fps, i doubt you will notice the effects me and others are having. it is not that it is bad, it's just not as good as SD has shown to be possible..

>
> In scenes when my system is fighting hard to provide the FPS, when the action
> is high (like when fighting with half a douzend bots at a power node, while
> some other bots are shooting into the mess) CFS is much better than the old
> scheduler. It is a big difference if you get 6-10FPS or 15-25.
> (I am playing with maximum 'beautifullness' - I would be able to get a lot
> more FPS, if I wanted, but I want a nice scenery and maximum visual
> effects ...)
>
> From my point of view 3D is a lot better with cfs.

Better than old vanilla yes, but than SD? well, you should give it a try.

>
> Now the question for all the people who are bashing cfs for its bad 3d
> performance: what am I doing wrong?

As said, we never said CFS was worse than old vanilla, and we never said it was BAD, we did however say it's not as good as SD :)

>
> Glück Auf,
> Volker
[PATCH] Framebuffer: Consolidated cleanup of pvr2fb.c for Sega Dreamcast
Tony, Second time attempt at this and a much better job I think. This patch consolidates your earlier patch, some cleanup of the documentation and, crucially, some better handling of the pvr2 registers based on more up to date information. Testing shows that it seems to work pretty well at 16bpp, 24bpp and 32bpp - including proper rendering of the boot logo at all levels (previously this was a bit broken even at 16bpp) and giving white against black text. Really detailed testing (eg with X11) requires support for the maple bus - which isn't (currently - next project assuming this is okay) available, but I have no reason to think this is broken. Incidentally, substituing DIRECTCOLOR for TRUECOLOR appears to break the driver. Signed-off by: Adrian McMenamin <[EMAIL PROTECTED]> diff --git a/Documentation/fb/pvr2fb.txt b/Documentation/fb/pvr2fb.txt index 2bf6c23..3d08551 100644 --- a/Documentation/fb/pvr2fb.txt +++ b/Documentation/fb/pvr2fb.txt @@ -9,14 +9,14 @@ one found in the Dreamcast. Advantages: * It provides a nice large console (128 cols + 48 lines with 1024x768) - without using tiny, unreadable fonts. + without using tiny, unreadable fonts (this size is NOT available on the + Dreamcast) * You can run XF86_FBDev on top of /dev/fb0 * Most important: boot logo :-) Disadvantages: - * Driver is currently limited to the Dreamcast PowerVR 2 implementation - at the time of this writing. + * Driver is largely untested on non-Dremcast systems. Configuration = @@ -29,11 +29,15 @@ Accepted options: font:X- default font to use. All fonts are supported, including the SUN12x22 font which is very nice at high resolutions. -mode:X- default video mode. The following video modes are supported: -640x240-60, 640x480-60. +mode:X- default video mode +The following video modes are supported: +[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] The Dreamcast + defaults to [EMAIL PROTECTED] At the time of writing the +24bpp and 32bpp modes function poorly. 
Work to fix that is +ongoing Note: the 640x240 mode is currently broken, and should not be -used for any reason. It is only mentioned as a reference. +used for any reason. It is only mentioned here as a reference. inverse - invert colors on screen (for LCD displays) @@ -49,13 +53,20 @@ cable:X - cable type. This can be any of the following: vga, rgb, and output:X - output type. This can be any of the following: pal, ntsc, and vga. If none is specified, we guess. +Video mode may also be specified in the form: + + [xres]x[yres][-[EMAIL PROTECTED] + + eg [EMAIL PROTECTED] + + X11 === -XF86_FBDev should work, in theory. At the time of this writing it is -totally untested and may or may not even portray the beginnings of -working. If you end up testing this, please let me know! +XF86_FBDev has been shown to work on the Dremcast in the past - though not yet +on any 2.6 series kernel. -- Paul Mundt <[EMAIL PROTECTED]> +Updated by Adrian McMenamin <[EMAIL PROTECTED]> diff --git a/drivers/video/pvr2fb.c b/drivers/video/pvr2fb.c index 3ac32f3..264b6a6 100644 --- a/drivers/video/pvr2fb.c +++ b/drivers/video/pvr2fb.c @@ -94,6 +94,7 @@ #define DISP_DIWCONF (DISP_BASE + 0xe8) #define DISP_DIWHSTRT (DISP_BASE + 0xec) #define DISP_DIWVSTRT (DISP_BASE + 0xf0) +#define DISP_PIXDEPTH (DISP_BASE + 0x108) /* Pixel clocks, one for TV output, doubled for VGA output */ #define TV_CLK 74239 @@ -143,6 +144,7 @@ static struct pvr2fb_par { unsigned char is_lowres; /* Is horizontal pixel-doubling enabled? */ unsigned long mmio_base; /* MMIO base */ + u32 palette[16]; } *currentpar; static struct fb_info *fb_info; @@ -320,7 +322,7 @@ static int pvr2fb_setcolreg(unsigned int regno, unsigned int red, if (regno > info->cmap.len) return 1; - + /* * We only support the hardware palette for 16 and 32bpp. 
It's also * expected that the palette format has been set by the time we get @@ -333,24 +335,25 @@ static int pvr2fb_setcolreg(unsigned int regno, unsigned int red, ((blue & 0xf800) >> 11); pvr2fb_set_pal_entry(par, regno, tmp); - ((u16*)(info->pseudo_palette))[regno] = tmp; break; case 24: /* RGB 888 */ red >>= 8; green >>= 8; blue >>= 8; - ((u32*)(info->pseudo_palette))[regno] = (red << 16) | (green << 8) | blue; + tmp = (red << 16) | (green << 8) | blue; break; case 32: /* ARGB */ red >>= 8; green >>= 8; blue >>= 8; tmp = (transp << 24) | (red << 16) | (green << 8) | blue; pvr2fb_set_pal_entry(par, regno, tmp); - ((u32*)(info->pseudo_palette))[regno] = tmp; break; default: pr_debug("Invalid bit depth %d?!?\n", info->var.bits_per_pixel); return 1; } + if (regno < 16) + ((u32*)(info->pseudo_palette))[regno] = tmp; + return 0; } @@ -598,6 +601,7 @@ static void
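The last hunk quoted above moves the pseudo-palette store to a single place and guards it with `regno < 16`, matching the new `u32 palette[16]` member. A minimal user-space sketch of that bounds check follows. `demo_par`, `demo_pack_rgb888` and `demo_set_pseudo` are hypothetical names, not driver code, and the return value of `demo_set_pseudo` is this sketch's own convention (the patch itself silently skips the store).

```c
#include <stdint.h>

#define PSEUDO_PALETTE_LEN 16	/* matches "u32 palette[16]" in the patch */

/* Hypothetical stand-in for the driver's private data. */
struct demo_par {
	uint32_t palette[PSEUDO_PALETTE_LEN];
};

/* Pack 16-bit colour components into RGB 888, as the 24bpp case does. */
static uint32_t demo_pack_rgb888(unsigned red, unsigned green, unsigned blue)
{
	red >>= 8;
	green >>= 8;
	blue >>= 8;
	return ((uint32_t)red << 16) | ((uint32_t)green << 8) | blue;
}

/*
 * Store a packed colour, refusing indexes past the end of the array.
 * Returns 0 on success and 1 if the entry was skipped.
 */
static int demo_set_pseudo(struct demo_par *par, unsigned regno, uint32_t val)
{
	if (regno >= PSEUDO_PALETTE_LEN)
		return 1;
	par->palette[regno] = val;
	return 0;
}
```

Without the guard, any `regno` of 16 or more would write past the end of the 16-entry array.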
Re: ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
Quoting Bartlomiej Zolnierkiewicz ([EMAIL PROTECTED]): > Should be harmless for now but we would like to fix it the long-term, > please send "hdparm --Istdout /dev/hda" output. I allready made that available in the same subdir: http://www.dth.net/kernel/ Output repeated here: voyage:~# hdparm -I /dev/hda /dev/hda: ATA device, with non-removable media Model Number: PQI IDE DiskOnModule Serial Number: Firmware Revision: db01.20a Standards: Likely used: 1 Configuration: hard sectored not MFM encoded head switch time > 15us fixed drive disk xfer rate > 5Mbs Logical max current cylinders 500 500 heads 16 16 sectors/track 32 32 -- bytes/track: 0 bytes/sector: 528 CHS current addressable sectors: 256000 LBAuser addressable sectors: 256000 device size with M = 1024*1024: 125 MBytes device size with M = 1000*1000: 131 MBytes Capabilities: LBA, IORDY not likely Buffer type: 0002: dual port, multi-sector Buffer size: 1.0kB bytes avail on r/w long: 4 Cannot perform double-word IO R/W multiple sector transfer: Max = 1 Current = 0 DMA: not supported PIO: pio0 pio1 pio2 voyage:~# hdparm -tT /dev/hda /dev/hda: Timing cached reads: 116 MB in 2.00 seconds = 57.92 MB/sec Timing buffered disk reads: 16 MB in 3.01 seconds = 5.32 MB/sec # hdparm --Istdout /dev/hda /dev/hda: 045a 01f4 0010 0210 0020 0003 e800 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 0002 0002 0004 6462 3031 2e32 3061 5051 4920 4944 4520 4469 736b 4f6e 4d6f 6475 6c65 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 0001 0200 0200 0001 01f4 0010 0020 e800 0003 0100 e800 0003 > No IDE changes yet after 2.6.22-git17 so something else must have broke down. darn I also tried with libata driver on the 2.6.22 series but that wouldn't work as well. I contacted Alan Cox and he made some patches/suggestions that made 2.6.22-git16 with libata work! I hope his patch will make it into mainline soon. 
For the dmesg output of this kernel http://www.dth.net/kernel/via_output_2.6.22-git16-libata_fix_alan_cox > Please retry with the latest -git kernel and if the problem is still > there install git, get kernel tree and run git-bisect. > > Since 2.6.22-git17 works fine the initial "good" commit would be > e51f802babc5e368c60fbfd08c6c11269c9253b0 and the initial "bad" one > f695baf2df9e0413d3521661070103711545207a (for 2.6.23-rc1). > > If you need some practical examples on using git-bisect, this one > > http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/ > > is a very good one (IMO). I will certainly will read/try that. > > VP_IDE: VIA vt8231 (rev 10) IDE UDMA100 controller on pci:00:11.1 > > ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:pio, hdb:pio > > ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:pio, hdd:pio > > Marking TSC unstable due to: possible TSC halt in C2. > > Time: acpi_pm clocksource has been installed. > > hda: PQI IDE DiskOnModule, ATA DISK drive > > hda: IRQ probe failed (0xfff2) > > This message would indicate broken IRQ routing, however it is no > longer present in the log for the kernel 2.6.23-rc1-git4 ([3]). i booted with routeirq=pci but gave the same negative result. > Interesting that the kernel 2.6.22-git17 (log [2]) doesn't use via82cxxx > IDE host driver while the kernel 2.6.23-rc1-git4 (log [3]) does...? kernel configs are also in above mentioned webdir basically copied the 2.6.22-git17 and ran a make oldconfig. Did a diff -u on those config files and there is really a lot different: [snip] -# ATA/IDE/MFM/RLL support +# Device Drivers # [snip] # # IDE chipset support/bugfixes # +# CONFIG_BLK_DEV_TRM290 is not set
Re: [2/3] 2.6.23-rc1: known regressions v2
On Friday 27 July 2007, Michal Piotrowski wrote:
> IDE
>
> Subject         : ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
> References      : http://lkml.org/lkml/2007/7/27/298
> Last known good : ?
> Submitter       : dth <[EMAIL PROTECTED]>
> Caused-By       : ?
> Handled-By      : ?
> Status          : unknown

No IDE changes after 2.6.22-git17. Despite this I will try to follow
this bug report.

Thanks,
Bart
Re: [ck] Re: Linus 2.6.23-rc1
Interesting... Trying to avoid reading email but with a flooded inbox it's
quite hard to do.

A lot of useful discussion seems to have been generated in response to
people's _interpretation_ of my interview rather than what I actually said.
For example, everyone seems to think I quit because CFS was chosen over SD
(hint: it wasn't). Since it's generating good discussion I'll otherwise leave
it as is.

As a parting gesture, a couple of hints for CFS.

Any difference in behaviour between CFS and SD, since they both aim for
fairness, would come down to the way they interpret fair. Since CFS accounts
sleep time whereas SD does not, that would be the reason.

As for volanomark regressions, they're always the sched_yield implementation.
SD addressed a similar regression a few months back.

Good luck.

--
-ck
Re: [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.
On 29/07/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 7/29/07, Jesper Juhl <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > This patch makes sure we don't dereference a NULL pointer in
> > drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
> > struct net_device *net = pegasus->net; assignment.
> > The existing code checks if 'pegasus' is NULL and bails out if
> > it is, so we better not touch that pointer until after that check.
> > [...]
> > diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
> > index a05fd97..04cba6b 100644
> > --- a/drivers/net/usb/pegasus.c
> > +++ b/drivers/net/usb/pegasus.c
> > @@ -768,11 +768,13 @@ done:
> >  static void write_bulk_callback(struct urb *urb)
> >  {
> >  	pegasus_t *pegasus = urb->context;
> > -	struct net_device *net = pegasus->net;
> > +	struct net_device *net;
> >
> >  	if (!pegasus)
> >  		return;
> >
> > +	net = pegasus->net;
> > +
> >  	if (!netif_device_present(net) || !netif_running(net))
> >  		return;
>
> Is it really possible that we're called into this function with
> urb->context == NULL? If not, I'd suggest let's just get rid of
> the check itself, instead.
>

I'm not sure. I am not very familiar with this code. I just figured
that moving the assignment is potentially a little safer and it is
certainly no worse than the current code, so that's a safe and
potentially beneficial change.
Removing the check may be safe but I'm not certain enough to make that call...

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
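The pattern under discussion (declare the local, test the pointer, and only then read a member through it) can be sketched in plain C. This is an illustration only: `demo_ctx`, `demo_netdev` and `demo_callback` are hypothetical stand-ins for `pegasus_t`, `struct net_device` and `write_bulk_callback()`, and it makes no claim about whether `urb->context` can really be NULL.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and pegasus_t. */
struct demo_netdev {
	int running;
};

struct demo_ctx {
	struct demo_netdev *net;
};

/*
 * Returns 1 if the callback would go on to do work, 0 if it bails out early.
 * The member access happens only after ctx is known to be non-NULL.
 */
static int demo_callback(struct demo_ctx *ctx)
{
	struct demo_netdev *net;	/* declared, not yet initialised */

	if (!ctx)
		return 0;

	net = ctx->net;			/* safe: ctx was checked above */

	if (!net->running)
		return 0;
	return 1;
}
```

If the NULL test can never fire, removing it (as suggested in the quoted reply) would make the original initialiser correct again, so the two fixes are alternatives rather than complements.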
Re: ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
Hi, On Friday 27 July 2007, dth wrote: > I have a via mini-itx epia 5000 motherboard as a firewall. > It has an ide device: PQI 128MB flash DOM [1] which just plugs > directly into the ide connector on the motherboard. > 2.6.22-git17 works, although it gives some warning > hda: set_drive_speed_status: status=0x51 { DriveReady SeekComplete Error } > hda: set_drive_speed_status: error=0x04 { DriveStatusError } Should be harmless for now but we would like to fix it the long-term, please send "hdparm --Istdout /dev/hda" output. > Full dmesg output [2] > > When i compiled any 2.6.23-rc1 kernel out so far it always froze on me. > Yesterday i tried 2.6.23-rc1-git3 and i booted until: No IDE changes yet after 2.6.22-git17 so something else must have broke down. Please retry with the latest -git kernel and if the problem is still there install git, get kernel tree and run git-bisect. Since 2.6.22-git17 works fine the initial "good" commit would be e51f802babc5e368c60fbfd08c6c11269c9253b0 and the initial "bad" one f695baf2df9e0413d3521661070103711545207a (for 2.6.23-rc1). If you need some practical examples on using git-bisect, this one http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/ is a very good one (IMO). > VP_IDE: VIA vt8231 (rev 10) IDE UDMA100 controller on pci:00:11.1 > ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:pio, hdb:pio > ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:pio, hdd:pio > Marking TSC unstable due to: possible TSC halt in C2. > Time: acpi_pm clocksource has been installed. > hda: PQI IDE DiskOnModule, ATA DISK drive > hda: IRQ probe failed (0xfff2) This message would indicate broken IRQ routing, however it is no longer present in the log for the kernel 2.6.23-rc1-git4 ([3]). Interesting that the kernel 2.6.22-git17 (log [2]) doesn't use via82cxxx IDE host driver while the kernel 2.6.23-rc1-git4 (log [3]) does...? 
> Clocksource tsc unstable (delta = 84358771493 ns) > > So i went back to 2.6.22-git17. > This morning i saw a fresh git4 and compiled/installed that. > This kernel actually booted! (once) > I had a netconsole running to catch the lucky event [3] > > After about 2 minutes of working however, the whole machine froze. > No message at the console, not able to toggle numlock, no magic > sysrq key features: solid frozen. > After the power cycle i was not able to boot the same kernel > ever again. It was just like a timer had run out and it wouldn't > work anymore. > > I know my hardware is "ancient" and the flash thing is probably > not current. My firewall however is solar powerpowered (10 watt Neat. :) > power consumption) and it would be fun if i could run the latest/ > greatest kernels on it. With the buggy commit number obtained from git-bisect it should be fixed quite quickly. > Config files and hdparm info on the DOM in the same directory as > dmesg output. > > Any comments appriciated. > > Danny > > [1] http://www.memorydepot.com/ssd_diskonmodule.asp > [2] http://www.dth.net/kernel/via_output_2.6.22_git17 > [3] http://www.dth.net/kernel/via_output_2.6.23-rc1-git4 Thanks, Bart - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ide: sis5513.c: Add FSC Amilo A1630 PCI subvendor/dev to laptops
On Friday 27 July 2007, David Lamparter wrote:
> [PATCH] ide: sis5513.c: Add FSC Amilo A1630 PCI subvendor/dev to laptops
>
> Recognise the FSC Amilo A1630's incarnation of a SiS5513 chip as laptop to
> get UDMA100 support.
>
> Signed-off-by: David Lamparter <[EMAIL PROTECTED]>

applied, thanks
Re: Linus 2.6.23-rc1
Hi,

I never tried Con's patchset, for two reasons:

I tried his 2.4 patches once, and I never saw any improvements. So when people
were reporting huge improvements with his SD scheduler, I compared that with
the reports of huge improvements with his 2.4 kernel patches.
...
The second: too many patches. I only would have tried one or two, but the
ck-patchset is a lot bigger.. and I am a little bit uneasy about that.

But I tried a lot of Ingo's cfs patches - and it was a very pleasant
experience. Ingo reacted very fast to my feedback and when I hit a problem he
really tried to find the cause and solve it - and it always was one patch, so
I felt a lot less scared ;)

My usual workload is very 'usual'. KDE desktop, kmail, konqueror, sometimes
xine or amarok providing some background noise while typing away in kate,
triplea, wesnoth or some other game when I need to 'rest' for a while. A lot
of compiling in the background, because I am one of these gentoo users.

With cfs the experience was much more pleasant than with the 'old' scheduler.
Compiling did not hurt as much as usual anymore - the only thing that hurts
is swap.

But there is another thing I do regularly: I play ut2004. Not every single
day, but sometimes several times a day. 20 minutes of mayhem and then back to
the desktop.

And I do not see any problems with cfs and ut2004. The maximum FPS are indeed
a little bit lower (and you can argue that this really is not important if
the pre-game FPS in a level looking down on the floor go down from 390 to
380FPS), but the minimum FPS went up!

In scenes when my system is fighting hard to provide the FPS, when the action
is high (like when fighting with half a dozen bots at a power node, while
some other bots are shooting into the mess) CFS is much better than the old
scheduler. It is a big difference if you get 6-10FPS or 15-25.
(I am playing with maximum 'beautifulness' - I would be able to get a lot
more FPS, if I wanted, but I want a nice scenery and maximum visual
effects ...)

From my point of view 3D is a lot better with cfs.

Now the question for all the people who are bashing cfs for its bad 3d
performance: what am I doing wrong?

Glück Auf,
Volker
Re: [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.
Hi,

On 7/29/07, Jesper Juhl <[EMAIL PROTECTED]> wrote:
> Hello,
>
> This patch makes sure we don't dereference a NULL pointer in
> drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
> struct net_device *net = pegasus->net; assignment.
> The existing code checks if 'pegasus' is NULL and bails out if
> it is, so we better not touch that pointer until after that check.
> [...]
> diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
> index a05fd97..04cba6b 100644
> --- a/drivers/net/usb/pegasus.c
> +++ b/drivers/net/usb/pegasus.c
> @@ -768,11 +768,13 @@ done:
>  static void write_bulk_callback(struct urb *urb)
>  {
>  	pegasus_t *pegasus = urb->context;
> -	struct net_device *net = pegasus->net;
> +	struct net_device *net;
>
>  	if (!pegasus)
>  		return;
>
> +	net = pegasus->net;
> +
>  	if (!netif_device_present(net) || !netif_running(net))
>  		return;

Is it really possible that we're called into this function with
urb->context == NULL? If not, I'd suggest let's just get rid of
the check itself, instead.

Satyam
Re: [PATCH 2/3] core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe
Neil Horman wrote:
> Jeremy asked that I make a patch next week to address split_argv's requirement
> that the argc parameter be non-NULL. I'll be fixing that next week, and what I
> can do is further enhance it such that it ignores spaces in quoted strings,
> which should address the case that concerns you. I.E I can make split_argv
> behave such that:
> echo "|\"foo bar\" --pid %p" > /proc/sys/kernel/core_pattern
> results in the following argv:
> {{"foo bar"}, {"--pid"}, {"1234"}}
>
> Which I think handles what you are looking for.
>

No, please don't. My original argv_split did that, and it was just way
too complex. If you need complex quoting, you can always point it at a
shell script and handle it there.

    J
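To make the trade-off concrete, here is a rough user-space sketch of the simple behaviour being defended: split on whitespace only, interpret no quoting, and tolerate a NULL count pointer. It is not the kernel's implementation. `demo_argv_split` and `demo_argv_free` are hypothetical names, and `malloc`/`calloc` stand in for kernel allocation.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Free a vector returned by demo_argv_split(). */
static void demo_argv_free(char **argv)
{
	char **p;

	if (!argv)
		return;
	for (p = argv; *p; p++)
		free(*p);
	free(argv);
}

/*
 * Split str on whitespace into a NULL-terminated vector of heap copies.
 * No quoting or escaping is interpreted. *argcp, if non-NULL, receives
 * the word count. Returns NULL on allocation failure.
 */
static char **demo_argv_split(const char *str, int *argcp)
{
	const char *s;
	char **argv;
	int argc = 0, i = 0;

	/* First pass: count words. */
	for (s = str; *s; ) {
		while (*s && isspace((unsigned char)*s))
			s++;
		if (!*s)
			break;
		argc++;
		while (*s && !isspace((unsigned char)*s))
			s++;
	}

	argv = calloc((size_t)argc + 1, sizeof(*argv));
	if (!argv)
		return NULL;

	/* Second pass: copy each word. */
	for (s = str; i < argc; i++) {
		const char *start;
		size_t len;

		while (isspace((unsigned char)*s))
			s++;
		start = s;
		while (*s && !isspace((unsigned char)*s))
			s++;
		len = (size_t)(s - start);
		argv[i] = malloc(len + 1);
		if (!argv[i]) {
			demo_argv_free(argv);
			return NULL;
		}
		memcpy(argv[i], start, len);
		argv[i][len] = '\0';
	}

	if (argcp)
		*argcp = argc;
	return argv;
}
```

With this splitter the quoted pattern from the example above yields four words (`"foo`, `bar"`, `--pid`, `1234`) rather than three, which is exactly the case a wrapper shell script would have to handle.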
Re: [PATCH] lib: move kasprintf to a separate file
Sam Ravnborg wrote:
> kasprintf pulls in kmalloc which proved to be fatal for at least
> bootimage target on alpha.
> Move it to a separate file so only users of kasprintf are exposed
> to the dependency on kmalloc.
>

OK by me (that's what my original patch did), but it might be worth
documenting what environmental constraints vsprintf.c is under. I
didn't realize it was used by non-kernel code.

    J
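The dependency being described is easy to see in a user-space analogue: an allocating printf needs both a formatter and an allocator, whereas the formatter alone does not. The sketch below is an illustration under that assumption, not the kernel's `kasprintf()`. `demo_asprintf` is a hypothetical name, and `malloc` plays the role of `kmalloc`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly allocated buffer. The caller frees the result.
 * Returns NULL on formatting or allocation failure.
 */
static char *demo_asprintf(const char *fmt, ...)
{
	va_list ap, aq;
	char *buf;
	int len;

	va_start(ap, fmt);
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);	/* measure only */
	va_end(aq);
	if (len < 0) {
		va_end(ap);
		return NULL;
	}

	buf = malloc((size_t)len + 1);		/* the allocator dependency */
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap);
	va_end(ap);
	return buf;
}
```

Keeping a function like this in its own translation unit means code that links only the formatter never needs the allocator symbol, which is the point of the move.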
[PATCH][ACPI] Let's not gamble that a possible double free will never happen in asus_hotk_get_info()
Hi, The Coverity checker points out (CID: 1500) that we can in some cases end up doing a double free of 'model' in asus_hotk_get_info(). I'm not 100% sure it is right, but better safe than sorry, especially since this is so simple to turn into a non-issue - simply set 'model' to NULL after the first kfree() and then the second kfree() is harmless (if it actually can happen, and if it cannot happen then the cost is just a single extra assignment). Here is the function with Coverity's annotations (proposed patch at the end of the mail) ... 1141/* 1142 * This function is used to initialize the hotk with right values. In this 1143 * method, we can make all the detection we want, and modify the hotk struct 1144 */ 1145static int asus_hotk_get_info(void) 1146{ 1147struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL }; 1148union acpi_object *model = NULL; 1149int bsts_result; 1150char *string = NULL; 1151acpi_status status; 1152 1153/* 1154 * Get DSDT headers early enough to allow for differentiating between 1155 * models, but late enough to allow acpi_bus_register_driver() to fail 1156 * before doing anything ACPI-specific. Should we encounter a machine, 1157 * which needs special handling (i.e. its hotkey device has a different 1158 * HID), this bit will be moved. A global variable asus_info contains 1159 * the DSDT header. 
1160 */ 1161status = acpi_get_table(ACPI_SIG_DSDT, 1, _info); 1162if (ACPI_FAILURE(status)) 1163printk(KERN_WARNING " Couldn't get the DSDT table header\n"); 1164 1165/* We have to write 0 on init this far for all ASUS models */ 1166if (!write_acpi_int(hotk->handle, "INIT", 0, )) { 1167printk(KERN_ERR " Hotkey initialization failed\n"); 1168return -ENODEV; 1169} 1170 1171/* This needs to be called for some laptops to init properly */ 1172if (!read_acpi_int(hotk->handle, "BSTS", _result)) 1173printk(KERN_WARNING " Error calling BSTS\n"); 1174else if (bsts_result) 1175printk(KERN_NOTICE " BSTS called, 0x%02x returned\n", 1176 bsts_result); 1177 1178/* 1179 * Try to match the object returned by INIT to the specific model. 1180 * Handle every possible object (or the lack of thereof) the DSDT 1181 * writers might throw at us. When in trouble, we pass NULL to 1182 * asus_model_match() and try something completely different. 1183 */ 1184if (buffer.pointer) { Event alias: aliasing "(buffer).pointer" with "model" Also see events: [freed_arg][double_free] 1185model = buffer.pointer; 1186switch (model->type) { 1187case ACPI_TYPE_STRING: 1188string = model->string.pointer; 1189break; 1190case ACPI_TYPE_BUFFER: 1191string = model->buffer.pointer; 1192break; At conditional (1): "default" taking true path 1193default: Event freed_arg: Pointer "model" freed by function "kfree" Also see events: [alias][double_free] 1194kfree(model); 1195break; 1196} 1197} 1198hotk->model = asus_model_match(string); At conditional (2): "(hotk)->model == 23" taking false path 1199if (hotk->model == END_MODEL) { /* match failed */ 1200if (asus_info && 1201strncmp(asus_info->oem_table_id, "ODEM", 4) == 0) { 1202hotk->model = P30; 1203printk(KERN_NOTICE 1204 " Samsung P30 detected, supported\n"); 1205} else { 1206hotk->model = M2E; 1207printk(KERN_NOTICE " unsupported model %s, trying " 1208 "default values\n", string); 1209printk(KERN_NOTICE 1210 " send /proc/acpi/dsdt to the developers\n"); 1211} 
1212hotk->methods = _conf[hotk->model]; 1213return AE_OK; 1214} 1215hotk->methods = _conf[hotk->model]; 1216printk(KERN_NOTICE " %s model detected, supported\n", string); 1217 1218/* Sort of per-model blacklist */ At conditional (3): "strncmp == 0" taking false path 1219if (strncmp(string, "L2B", 3) == 0) 1220hotk->methods->lcd_status =
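The fix proposed in this mail (clear the pointer after the first free so that a later free becomes a no-op) can be shown with a small user-space sketch. It assumes only that freeing a NULL pointer does nothing, which holds for both `free()` and `kfree()`. `demo_free`, `demo_two_stage` and the counter are hypothetical names used for illustration.

```c
#include <stdlib.h>

/* Counts how many times a non-NULL pointer reached demo_free(). */
static int demo_real_frees;

static void demo_free(void *p)
{
	if (p)
		demo_real_frees++;
	free(p);	/* free(NULL) is defined to do nothing */
}

/*
 * Two-stage cleanup: an early free on one branch, and an unconditional
 * free at the end. Clearing the pointer after the first free makes the
 * second one a no-op.
 */
static void demo_two_stage(int take_early_branch)
{
	char *model = malloc(16);

	if (take_early_branch) {
		demo_free(model);
		model = NULL;	/* without this line, the free below repeats */
	}

	demo_free(model);
}
```

The cost is one extra assignment on the early path, which matches the estimate given in the mail.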
[PATCH 2/2] Wait for page writeback when directly reclaiming contiguous areas
From: Mel Gorman <[EMAIL PROTECTED]>

Lumpy reclaim works by selecting a lead page from the LRU list and then
selecting pages for reclaim from the order-aligned area of pages. In the
situation where all pages in that region are inactive and not referenced by
any process over time, it works well.

In the situation where there is even light load on the system, the pages may
not free quickly. Out of an area of 1024 pages, maybe only 950 of them are
freed when the allocation attempt occurs because lumpy reclaim returned early.
This patch alters the behaviour of direct reclaim for large contiguous blocks.

The first attempt to call shrink_page_list() is asynchronous but if it fails,
the pages are submitted a second time and the calling process waits for the IO
to complete. It'll retry up to 5 times for the pages to be fully freed. This
may stall allocators waiting for contiguous memory but that should be expected
behaviour for high-order users. It is preferable behaviour to potentially
queueing unnecessary areas for IO. Note that kswapd will not stall in this
fashion.

[[EMAIL PROTECTED]: update to version 2]
Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]>

Changelog:

Changes in V2:
  - remove retry loop
  - fix up active accounting (count deactivate events correctly)
  - use our own sync/async flag type
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 99ec7fa..1c21714 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -271,6 +271,12 @@ static void handle_write_error(struct address_space *mapping,
 	unlock_page(page);
 }
 
+/* Request for sync pageout. */
+typedef enum {
+	PAGEOUT_IO_ASYNC,
+	PAGEOUT_IO_SYNC,
+} pageout_io_t;
+
 /* possible outcome of pageout() */
 typedef enum {
 	/* failed to write page out, page is locked */
@@ -287,7 +293,8 @@ typedef enum {
  * pageout is called by shrink_page_list() for each dirty page.
  * Calls ->writepage().
  */
-static pageout_t pageout(struct page *page, struct address_space *mapping)
+static pageout_t pageout(struct page *page, struct address_space *mapping,
+						pageout_io_t sync_writeback)
 {
 	/*
 	 * If the page is dirty, only perform writeback if that write
@@ -346,6 +353,15 @@ static pageout_t pageout(struct page *page, struct address_space *mapping)
 			ClearPageReclaim(page);
 			return PAGE_ACTIVATE;
 		}
+
+		/*
+		 * Wait on writeback if requested to. This happens when
+		 * direct reclaiming a large contiguous area and the
+		 * first attempt to free a range of pages fails
+		 */
+		if (PageWriteback(page) && sync_writeback == PAGEOUT_IO_SYNC)
+			wait_on_page_writeback(page);
+
 		if (!PageWriteback(page)) {
 			/* synchronous write or broken a_ops? */
 			ClearPageReclaim(page);
@@ -423,7 +439,8 @@ cannot_free:
  * shrink_page_list() returns the number of reclaimed pages
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
-					struct scan_control *sc)
+					struct scan_control *sc,
+					pageout_io_t sync_writeback)
 {
 	LIST_HEAD(ret_pages);
 	struct pagevec freed_pvec;
@@ -458,8 +475,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (page_mapped(page) || PageSwapCache(page))
 			sc->nr_scanned++;
 
-		if (PageWriteback(page))
-			goto keep_locked;
+		if (PageWriteback(page)) {
+			if (sync_writeback == PAGEOUT_IO_SYNC)
+				wait_on_page_writeback(page);
+			else
+				goto keep_locked;
+		}
 
 		referenced = page_referenced(page, 1);
 		/* In active use or really unfreeable? Activate it. */
@@ -505,7 +526,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 
 			/* Page is dirty, try to write it out here */
-			switch(pageout(page, mapping)) {
+			switch (pageout(page, mapping, sync_writeback)) {
 			case PAGE_KEEP:
 				goto keep_locked;
 			case PAGE_ACTIVATE:
@@ -786,7 +807,29 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 		spin_unlock_irq(&zone->lru_lock);
 
 		nr_scanned += nr_scan;
-		nr_freed = shrink_page_list(&page_list, sc);
+		nr_freed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
+
+		/*
+		 * If we are direct reclaiming for contiguous pages and we do
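The control flow the patch introduces, an asynchronous pass that skips pages still under writeback followed by a synchronous pass that waits for them, can be modelled in a few lines of user-space C. This is a toy model under stated assumptions: every `demo_*` name is hypothetical, "waiting" simply marks the write as complete, and the conditions the real patch checks before the second pass (its final hunk is cut off above) are left out.

```c
#include <stddef.h>

enum demo_io { DEMO_IO_ASYNC, DEMO_IO_SYNC };

/* Hypothetical page model: only the writeback state matters here. */
struct demo_page {
	int under_writeback;
	int freed;
};

/* Stand-in for waiting on I/O: here the write simply completes. */
static void demo_wait_on_writeback(struct demo_page *page)
{
	page->under_writeback = 0;
}

/* Try to free every page not already freed; return the count freed in this pass. */
static size_t demo_shrink(struct demo_page *pages, size_t n, enum demo_io mode)
{
	size_t i, freed = 0;

	for (i = 0; i < n; i++) {
		struct demo_page *page = &pages[i];

		if (page->freed)
			continue;
		if (page->under_writeback) {
			if (mode == DEMO_IO_SYNC)
				demo_wait_on_writeback(page);
			else
				continue;	/* async: keep the page, move on */
		}
		page->freed = 1;
		freed++;
	}
	return freed;
}

/* Async first; fall back to a sync pass only if the area is incomplete. */
static size_t demo_reclaim_area(struct demo_page *pages, size_t n)
{
	size_t freed = demo_shrink(pages, n, DEMO_IO_ASYNC);

	if (freed < n)
		freed += demo_shrink(pages, n, DEMO_IO_SYNC);
	return freed;
}
```

In this model an area with two of four pages under writeback frees two pages on the async pass and all four once the sync pass runs, which is the effect the commit message describes for contiguous allocations.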
[PATCH 1/2] ensure we count pages transitioning inactive via clear_active_flags
We are transitioning pages from active to inactive in
clear_active_flags; those need counting as PGDEACTIVATE vm events.

Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]>
Acked-by: Mel Gorman <[EMAIL PROTECTED]>
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d419e10..99ec7fa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -777,6 +777,7 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 			     (sc->order > PAGE_ALLOC_COSTLY_ORDER)?
 					     ISOLATE_BOTH : ISOLATE_INACTIVE);
 		nr_active = clear_active_flags(&page_list);
+		__count_vm_events(PGDEACTIVATE, nr_active);
 		__mod_zone_page_state(zone, NR_ACTIVE, -nr_active);
 		__mod_zone_page_state(zone, NR_INACTIVE,
[PATCH 0/2] Synchronous Lumpy Reclaim V2
As pointed out by Mel, when reclaim is applied at higher orders a significant
amount of IO may be started. As this takes finite time to drain, reclaim
will consider more areas than ultimately needed to satisfy the request. This
leads to more reclaim than strictly required and reduced success rates.

I was able to confirm Mel's test results on systems locally. These show
that even under light load the success rates drop off far more than
expected. Testing with a modified version of his patch (which follows)
I was able to allocate almost all of ZONE_MOVABLE with a near idle system.
I ran 5 test passes sequentially following system boot (the system has
29 hugepages in ZONE_MOVABLE):

  2.6.23-rc1      11   8   6   7   7
  sync_lumpy v2   28  28  29  29  26

These show that although hugely better than the near 0% success normally
expected we can only allocate about 1/4 of the zone. Using synchronous
reclaim for these allocations we get close to 100% as expected.

I have also run our standard high order tests and these show no
regressions in allocation success rates at rest, and some significant
improvements under load.

Following this email are two patches, both should be considered as
bug fixes to lumpy reclaim:

ensure-we-count-pages-transitioning-inactive-via-clear_active_flags:
  this is a bug fix for Lumpy Reclaim fixing up a bug in VM Event
  accounting when it marks pages inactive, and

Wait-for-page-writeback-when-directly-reclaiming-contiguous-areas:
  updates reclaim making direct reclaim synchronous when applied
  at orders above PAGE_ALLOC_COSTLY_ORDER.

Patches against 2.6.23-rc1. Andrew please consider for -mm and
for pushing to mainline.

-apw
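For reference, the "about 1/4" and "close to 100%" figures follow directly from the five runs quoted above against the 29-hugepage zone: the baseline averages 39/145, roughly 27%, and the synchronous version 140/145, roughly 97%. A small sketch of that arithmetic, with the hypothetical helper name `demo_success_pct`:

```c
#include <stddef.h>

/* Mean number of hugepages allocated, as a percentage of the zone total. */
static double demo_success_pct(const int *runs, size_t n, int total)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += runs[i];
	return 100.0 * sum / ((double)n * total);
}
```

Called with `{11, 8, 6, 7, 7}` and `{28, 28, 29, 29, 26}`, `n = 5` and `total = 29`, it reproduces the two percentages given here.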
Re: [patch] mm: reduce pagetable-freeing latencies
> The onstack array seems fine to me, even if you do end up deciding on
> an array of one. Is there any evidence that it's a problem getting a
> page for the freeing (other than in circumstances that are already
> badly slowed down)? It's obvious that we need a fallback route,
> but optimizing throughput on that route seems premature.

Hrm, no evidence of that so far indeed. I'm worried by the stack usage of the unmap_mapping_ranges(), but apart from that, no.

Apart from that, yeah, I suppose we can have a macro defining how many on-stack backups we have and adjust it if we see that being a problem. I'm not a fan of dynamic on-stack allocations.

Ben.
Re: [ck] Re: Linus 2.6.23-rc1
Linus Torvalds wrote:
> I personally feel that modal behaviour is bad, so it would introduce what
> is in my opinion bad code, and likely result in problems not being found
> and fixed as well (because people would pick the thing that "works for
> them", and ignore the problems in the other module).

I'm sorry, but this argument doesn't hold water. It was invoked years ago and turned out to be incorrect - the new CFS scheduler is not just a fixed old scheduler, it's a completely redesigned one.

--
With respect,
Alex Besogonov ([EMAIL PROTECTED])
Re: sound is interrupting with new kernels
On 7/23/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> could you try CONFIG_HZ_1000 instead of the 250 you are using currently?
> Also, please enable CONFIG_SCHED_DEBUG to improve the output of
> cfs-debug-info.sh.
>
> Ingo
>

Hi, Ingo. Sorry for the slow response, I hadn't had an opportunity to reboot the machine into a new kernel. I've built 2.6.22 with CONFIG_HZ_1000 and CONFIG_PREEMPT - nothing changed =(. Interestingly, in Totem (the GNOME video player) the sound isn't interrupted while watching video. I'll try other kernels later to find out what works well for my case. In gxine (xine-based) the sound is interrupted too!

> firstly, could you check whether the ogg123/mpg321 console apps work
> without any audio skipping? If they work fine, does Amarok work fine?
> (Amarok is an X app that has a high-quality latency design - most other
> X based players are affected by X communication latencies.)

I'm not using Amarok but Audacious. It seems that everything is all right with it. I'll send more test results later.

Best wishes!
Dima
Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Jan Engelhardt wrote:
>
> I generally run with CONFIG_HZ=100, CONFIG_NO_HZ=n, CONFIG_PREEMPT_NONE.

Ok, HZ=100 is likely the worst case, as it effectively multiplies all the scheduler latencies by 10 (rather than by 4, which is what the default 250Hz does).

That said, I think most testing showed that the CFS scheduler tunables didn't have a huge amount of impact on how things felt, so that factor-of-ten might not even matter that much. The 3D game issues may well be totally elsewhere.

But it's certainly worth looking at.

Linus
[PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.
Hello,

This patch makes sure we don't dereference a NULL pointer in drivers/net/usb/pegasus.c::write_bulk_callback() in the initial

	struct net_device *net = pegasus->net;

assignment. The existing code checks if 'pegasus' is NULL and bails out if it is, so we better not touch that pointer until after that check.

Please consider merging.

Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---
 drivers/net/usb/pegasus.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index a05fd97..04cba6b 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -768,11 +768,13 @@ done:
 static void write_bulk_callback(struct urb *urb)
 {
 	pegasus_t *pegasus = urb->context;
-	struct net_device *net = pegasus->net;
+	struct net_device *net;
 
 	if (!pegasus)
 		return;
 
+	net = pegasus->net;
+
 	if (!netif_device_present(net) || !netif_running(net))
 		return;
 
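The ordering issue the patch addresses can be tried outside the kernel. The following is a userspace sketch only: the struct layouts, field names and return codes are invented stand-ins for the driver's types, and the function is not the real write_bulk_callback(). It shows the declaration split from the dereference so that the NULL test comes first.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types, for illustration only. */
struct net_device_sketch {
	int present;	/* models netif_device_present() */
	int running;	/* models netif_running() */
};

struct pegasus_sketch {
	struct net_device_sketch *net;
};

/*
 * Returns 0 when the callback would go on to do its work, -1 when it
 * bails out early. The point is the ordering: 'pegasus' is tested
 * before pegasus->net is read.
 */
static int write_bulk_callback_sketch(struct pegasus_sketch *pegasus)
{
	struct net_device_sketch *net;	/* declared, not yet assigned */

	if (!pegasus)
		return -1;

	net = pegasus->net;		/* safe: pegasus is known non-NULL */

	if (!net->present || !net->running)
		return -1;

	return 0;
}
```

With the initializer form (net = pegasus->net at the declaration) the load would happen before the check, and a compiler may also treat the later NULL test as redundant. Whether 'pegasus' can really be NULL in this callback is a separate question that the sketch does not answer.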
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Bill Huey wrote: > > My argument is that schedule development is open ended. Although having > a central scheduler to hack is a a good thing, it shouldn't lock out or > supress development from other groups that might be trying to solve the > problem in unique ways. I don't think anything was suppressed here. You seem to say that more modular code would have helped make for a nicer way to do schedulers, but if so, where were those patches to do that? Con's patches didn't do that either. They just replaced the code. In fact, Ingo's patches _do_ add some modularity, and might make it easier to replace the scheduler. So it would seem that you would argue for CFS, not against it? > I think that's kind of a bogus assumption from the very get go. Scheduling > in Linux is one of the most unevolved systems in the kernel that still > could go through a large transformation and get big gains like what > we've had over the last few months. This evident with both schedulers, > both do well and it's a good thing overall the CFS is going in. > > Now, the way it happened is completely screwed up in so many ways that I > don't see how folks can miss it. This is not just Ingo versus Con, this > is the current Linux community and how it makes decision from the top down > and the current cultural attitude towards developers doing things that > are: I don't think so. I think you're barking up the totally wrong tree here. I think that what happened was very simple: somebody showed that we did badly and had benchmarks to show for it, and that in turn resulted in a huge spurt of coding from the people involved. The fact that you think this is "broken" is interesting. I can point to a very real example of where this also happened, and where I bet you don't think the process was "broken". Do you remember the mindcraft study? Exact same thing. Somebody came in, and showed that Linux did really badly on some benchmark, and that an alternate approach was much better. 
What happened? A huge spurt of development in a pretty short timeframe, that totally _obliterated_ the mindcraft results. It could have happened independently, but the fact is, it didn't.

These kinds of events where somebody shows (with real numbers and code) that things can be done better really *are* a good way to do development, and it's how development generally ends up happening. It's hugely motivational, both because competition is motivational in itself, but also because somebody showing that things can be done so much better opens people's eyes to it.

And if you think the scheduler situation is different, why? Was it just because the mindcraft study compared against Windows NT, not another version of Linux patches?

The thing is, development is slow and gradual, but at the same time, it happens in spurts (btw, if you have ever studied evolution, you'll find the exact same thing: evolution is slow and gradual, but it also happens in sudden "spurts" where you have relatively much bigger changes happening because you get into a feedback loop).

Another comparison to evolution: most of the competitive pressure actually comes from the _same_ species, not from outside. It's not so much rabbits competing against foxes (although that happens too), quite a lot of it is rabbits competing against other rabbits!

Linus
Re: Linus 2.6.23-rc1
On Jul 28 2007 14:33, Linus Torvalds wrote: > >Btw, people who actually have 3D games installed (I have exactly one: >ppracer, and I can't really say that I care about how it feels), if you >don't have CONFIG_HZ=1000, this really is worth testing. > >I think Ingo probably ran with CONFIG_NO_HZ and HZ_1000, but the default >timer tick is actually 250Hz, which makes all the default scheduler values >come out four times bigger than they are documented/supposed to be. I generally run with CONFIG_HZ=100, CONFIG_NO_HZ=n, CONFIG_PREEMPT_NONE. Jan
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On Saturday 28 July 2007 17:06:50 [EMAIL PROTECTED] wrote:
> On Sat, 28 Jul 2007, Daniel Hazelton wrote:
> > On Saturday 28 July 2007 04:55:58 [EMAIL PROTECTED] wrote:
> >> On Sat, 28 Jul 2007, Rene Herman wrote:
> >>> On 07/27/2007 09:43 PM, [EMAIL PROTECTED] wrote:
> On Fri, 27 Jul 2007, Rene Herman wrote:
> > On 07/27/2007 07:45 PM, Daniel Hazelton wrote:
> >>
> >> nobody is arguing that swap prefetch helps in the second case.
> >
> > Actually, I made a mistake when tracking the thread and reading the code
> > for the patch and started to argue just that. But I have to admit I made
> > a mistake - the patch's author has stated (as Rene was kind enough to
> > point out) that swap prefetch can't help when memory is filled.
>
> I stand corrected, thanks for speaking up and correcting your position.

If you had made the statement before I decided to speak up you would have been correct :)

Anyway, I try to always admit when I've made a mistake - it's part of my philosophy. (There have been times when I haven't done it, but I'm trying to make that stop entirely)

> >> what people are arguing is that there are situations where it helps for
> >> the first case. on some machines and versions of updatedb the nightly run
> >> of updatedb can cause both sets of problems. but the nightly updatedb
> >> run is not the only thing that can cause problems
> >
> > Solving the cache filling memory case is difficult. There have been a
> > number of discussions about it. The simplest solution, IMHO, would be to
> > place a (configurable) hard limit on the maximum size any of the kernel's
> > caches can grow to. (The only solution that was discussed, however, is a
> > complex beast)
>
> limiting the size of the cache is also the wrong thing to do in many
> situations. it's only right if the cache pushes out other data you care
> about, if you are trying to do one thing as fast as you can you really do
> want the system to use all the memory it can for the cache.

After thinking about this, you are partially correct. There are those sorts of situations where you want the system to use all the memory it can for caches. OTOH, if those situations could be described in some sort of simple heuristic, then a soft limit that uses those heuristics to determine when to let the cache expand could exploit the benefits of having both a limited and an unlimited cache. (And, potentially, if the heuristic has allowed a cache to expand beyond the limit, then, when the heuristic shows the oversize cache is no longer necessary, it could trigger an automatic reclaim of that memory.)

(I'm willing to help write and test code to do exactly this. There is no guarantee that I'll be able to help with more than testing - I don't understand the parts of the code involved all that well)

DRH

--
Dialup is like pissing through a pipette. Slow and excruciatingly painful.
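To make the soft-limit idea above a little more concrete, here is a rough sketch in plain C. Everything in it is hypothetical: there is no such structure or hook in the kernel, the names are invented, and the "heuristic" is reduced to a boolean the caller supplies, since the mail leaves open what it would actually measure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one cache, sizes in pages. */
struct soft_cache {
	size_t used;		/* pages currently held by the cache */
	size_t soft_limit;	/* configurable limit, may be exceeded */
};

/*
 * May the cache take another page? Below the soft limit: always.
 * Above it: only while the (caller-supplied) heuristic says the
 * workload benefits from an oversize cache.
 */
static bool cache_may_grow(const struct soft_cache *c, bool workload_benefits)
{
	return workload_benefits || c->used < c->soft_limit;
}

/*
 * Pages to give back once the heuristic no longer justifies the
 * oversize cache. Zero while it still does, or while under the limit.
 */
static size_t cache_pages_to_reclaim(const struct soft_cache *c,
				     bool workload_benefits)
{
	if (workload_benefits || c->used <= c->soft_limit)
		return 0;
	return c->used - c->soft_limit;
}
```

The sketch only captures the two behaviours described: growth past the limit when the heuristic allows it, and automatic trimming back to the limit when it stops doing so. Choosing the heuristic is the hard part and is not attempted here.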
Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Linus Torvalds wrote:
>
> Yes, it's what "/proc/sys/kernel/sched_granularity_ns" is supposed to
> tweak, but maybe there's some misfeature there, or maybe the default is
> just bad for games, or whatever.
>
> Ingo: that sysctl_sched_granularity initialization doesn't make sense. You
> talk about it being in units of nanoseconds, but then you do
>
>	2000000000ULL/HZ
>
> which is nonsensical.

Btw, people who actually have 3D games installed (I have exactly one: ppracer, and I can't really say that I care about how it feels), if you don't have CONFIG_HZ=1000, this really is worth testing.

I think Ingo probably ran with CONFIG_NO_HZ and HZ_1000, but the default timer tick is actually 250Hz, which makes all the default scheduler values come out four times bigger than they are documented/supposed to be.

On SMP, that scheduler granularity then gets doubled once more if you have two CPU's, so rather than 2ms by default, it ends up being 16ms (and the time slices themselves end up being bigger than that).

So doing some testing with a simple

	echo 2000000 > /proc/sys/kernel/sched_granularity_ns
	echo 1000000 > /proc/sys/kernel/sched_batch_wakeup_granularity_ns
	echo 8000000 > /proc/sys/kernel/sched_runtime_limit_ns

might be worth doing (and if you vary numbers to see if it matters, please do let people know!)

Linus
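The scaling being discussed is easy to check by hand. The sketch below assumes the initializer is 2000000000ULL/HZ (the value that yields the documented 2 ms at HZ=1000) and models the SMP adjustment as a plain multiplier supplied by the caller, 2 for a two-CPU box as described in the mail. The function name is invented, and how the kernel actually derives the SMP factor is not modelled.

```c
#include <assert.h>

/*
 * Default scheduler granularity in nanoseconds for a given timer
 * frequency. Assumption: the initializer is 2000000000ULL/HZ.
 * smp_factor is 1 on UP and 2 for two CPUs per the description in
 * the mail; it is a caller-supplied stand-in, not the kernel's formula.
 */
static unsigned long long sched_granularity_ns(unsigned int hz,
					       unsigned int smp_factor)
{
	assert(hz > 0);
	return 2000000000ULL / hz * smp_factor;
}
```

Under these assumptions HZ=1000 gives 2,000,000 ns (2 ms), HZ=250 gives 8,000,000 ns (four times bigger), HZ=250 on two CPUs gives 16,000,000 ns (16 ms), and HZ=100 gives 20,000,000 ns, the factor of ten mentioned elsewhere in the thread.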
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, Jul 28, 2007 at 11:06:09PM +0200, Diego Calleja wrote:
> So your argument is that SD shouldn't have been merged either, because it
> would have resulted in one scheduler over the other?

My argument is that scheduler development is open ended. Although having a central scheduler to hack is a good thing, it shouldn't lock out or suppress development from other groups that might be trying to solve the problem in unique ways.

This can be accomplished in a couple of ways:

1) scheduler modularity

Clearly Con is highly qualified to experiment with scheduler code and this should be technically facilitated by some means, if not by a maintainer. He's only a part time maintainer and nobody helped him with this stuff, nor did they try to understand what his scheduler was trying to do, other than Tong Li.

2) better code modularity

Now, cleaner code would help with this a lot. If that was in place, we might not need (1) and a pluggable scheduler. It would limit the amount of refactoring for folks so that their code can drop in more easily. There's a significant amount of churn, and it locks out developers by default since they have to constantly clean up the code in question while another developer can commit without consideration of how it affects others. That's their right as a maintainer, but also as maintainer, they should give a proper amount of consideration to how others might intend to extend the code so that development remains "inclusive".

This notion of "open source, open development" is false when working under those circumstances.

> > where capable but one is locked out now because of the choices of
> > current high level kernel developers in Linux.
>
> Well, there are two schedulers...it's obvious that "high level kernel
> developers" needed to choose one.

I think that's kind of a bogus assumption from the very get go. Scheduling in Linux is one of the most unevolved systems in the kernel that still could go through a large transformation and get big gains like what we've had over the last few months. This is evident with both schedulers; both do well and it's a good thing overall that CFS is going in.

Now, the way it happened is completely screwed up in so many ways that I don't see how folks can miss it. This is not just Ingo versus Con, this is the current Linux community and how it makes decisions from the top down and the current cultural attitude towards developers doing things that are:

1) architecturally significant

for which they will get flamed to death by the established Linux kernel culture before they can get any users to report bugs after their posting on lkml.

2) conceptually different

which is subject to the reasons above, but also gets flamed to death unless it comes from folks internal to the Linux development processes.

When groups get to a certain size like it has, there needs to be a revision of development processes so that they can scale and be "inclusive" to the overall spirit of the Linux development process. When that breaks down, we get situations like what we have with Con leaving development. Other developers like me get turned off by the situation, feel the same as Con and stop Linux development. That's my current status as well.

> The main problem is clearly that no scheduler was clearly better than the
> other. This reminds me of the LVM2/MD vs EVMS in the 2.5 days - both
> of them were good enough, but only one of them could be merged. The
> difference is that EVMS developers didn't get that annoyed, and not only
> they didn't quit but they continued developing their userspace tools to
> make it work with the solution included in the kernel

That's a good point to have folks not go down that particular path. But Con was kind of put down during the arguments with Ingo about his assumptions of the problems and then was personally crapped on by having his own idea undergo a complete reversal in opinion by Ingo, with Ingo then doing his own version of Con's work, displacing him.

How would you feel in that situation? I'd be pretty damn pissed.

[For the record, Peter Zijlstra did the same thing to me, which is annoying, but since he's my buddy, it doesn't get as rude as the above situation, and he included me in every private mail about his work so that I don't feel like RH is paying him to undermine my brilliance, it's ok :)]

bill
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Diego Calleja wrote: > El Sat, 28 Jul 2007 11:05:25 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]> > escribió: > > > > So "modal" things are good for fixing behaviour in the short run. But they > > are a total disaster in the long run, and even in the short run they tend > > to have problems (simply because there will be cases that straddle the > > line, and show some of _both_ issues, and now *neither* mode is the right > > one) > > I fully agree with this, but plugsched could have avoided this useless > "division" > on the topic of SD vs CFS. IMO that counts as an advantage, too ;) Sure. I actually think it's a huge advantage (see the ManagementStyle file on pissing people off), but at the same time, I don't like playing politics with technology. The kernel is a technical project, and I make technical decisions. So I absolutely detest adding code for "political" reasons. I personally feel that modal behaviour is bad, so it would introduce what is in my opinion bad code, and likely result in problems not being found and fixed as well (because people would pick the thing that "works for them", and ignore the problems in the other module). So while I don't like making irreversible decisions (and the choice of CFS wasn't irreversible in itself, but if it pisses off Con, _that_ is generally not reversible), I dislike even more making a half-assed decision. So rather than making a choice at all, my other choice would have been to not merge _either_ scheduler, and let people just continue to fight it out. Would that have made people happier? I seriously doubt it. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On Sat, 28 Jul 2007, Daniel Hazelton wrote:
> On Saturday 28 July 2007 04:55:58 [EMAIL PROTECTED] wrote:
>> On Sat, 28 Jul 2007, Rene Herman wrote:
>>> On 07/27/2007 09:43 PM, [EMAIL PROTECTED] wrote:
>>>> On Fri, 27 Jul 2007, Rene Herman wrote:
>>>>> On 07/27/2007 07:45 PM, Daniel Hazelton wrote:
>>
>> nobody is arguing that swap prefetch helps in the second case.
>
> Actually, I made a mistake when tracking the thread and reading the code
> for the patch and started to argue just that. But I have to admit I made
> a mistake - the patch's author has stated (as Rene was kind enough to
> point out) that swap prefetch can't help when memory is filled.

I stand corrected, thanks for speaking up and correcting your position.

>> what people are arguing is that there are situations where it helps for
>> the first case. on some machines and versions of updatedb the nightly run
>> of updatedb can cause both sets of problems. but the nightly updatedb
>> run is not the only thing that can cause problems
>
> Solving the cache filling memory case is difficult. There have been a
> number of discussions about it. The simplest solution, IMHO, would be to
> place a (configurable) hard limit on the maximum size any of the kernel's
> caches can grow to. (The only solution that was discussed, however, is a
> complex beast)

limiting the size of the cache is also the wrong thing to do in many situations. it's only right if the cache pushes out other data you care about, if you are trying to do one thing as fast as you can you really do want the system to use all the memory it can for the cache.

David Lang
Re: [ck] Re: Linus 2.6.23-rc1
On Fri, 2007-07-27 at 19:35 -0700, Linus Torvalds wrote: > As a long-term maintainer, trust me, I know what matters. And a person who > can actually be bothered to follow up on problem reports is a *hell* of a > lot more important than one who just argues with reporters. > > Linus Once again linus blows a nut getting off about this and that. The fact of the matter linus is a one sided. The fact is linus says what he wants and people think he is god. The fact is noone get code in unless they are a major player in a linux distro. Ingo had much advantage by using fedora users. The fact Con did not take all bugs serious yes that is a player of the game but linus is GOD so all bow before him before he blows his back out while jacking off to his rants about how the kernel and other projects should run. Jory - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007 13:07:05 -0700, Bill Huey (hui) <[EMAIL PROTECTED]> wrote:
> of how crappy X is. This is an open argument on how to solve, but it
> should not have resulted in really one scheduler over the other. Both

So your argument is that SD shouldn't have been merged either, because it would have resulted in one scheduler over the other?

> where capable but one is locked out now because of the choices of
> current high level kernel developers in Linux.

Well, there are two schedulers...it's obvious that "high level kernel developers" needed to choose one.

The main problem is clearly that no scheduler was clearly better than the other. This reminds me of LVM2/MD vs EVMS in the 2.5 days - both of them were good enough, but only one of them could be merged. The difference is that the EVMS developers didn't get that annoyed, and not only did they not quit, but they continued developing their userspace tools to make them work with the solution included in the kernel (http://lwn.net/Articles/14714/)
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On Sat, 28 Jul 2007, Alan Cox wrote:
> It is. Prefetched pages can be dropped on the floor without additional I/O.
>
> Which is essentially free for most cases. In addition your disk access
> may well have been in idle time (and should be for this sort of stuff)
> and if it was in the same chunk as something nearby was effectively free
> anyway.

as I understand it the swap-prefetch only kicks in if the device is idle

> Actual physical disk ops are a precious resource and anything that mostly
> reduces the number will be a win - not to say swap prefetch is the right
> answer but accidentally or otherwise there are good reasons it may happen
> to help.
>
> Bigger more linear chunks of writeout/readin is much more important I
> suspect than swap prefetching.

I'm sure this is true while you are doing the swapout or swapin and the system is waiting for it. but with prefetch you may be able to avoid doing the swapin at a time when the system is waiting for it by doing it at a time when the system is otherwise idle.

David Lang
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On Sat, 28 Jul 2007, Rene Herman wrote:
> On 07/28/2007 10:55 AM, [EMAIL PROTECTED] wrote:
>> it looks to me like unless the code was really bad (and after 23 months
>> in -mm it doesn't sound like it is)
>
> Not to sound pretentious or anything but I assume that Andrew has a fairly
> good overview of exactly how broken -mm can be at times. How many -mm users
> use it anyway? He himself said he's not convinced of usefulness having not
> seen it help for him (and notice that most developers are also users),
> turned it off due to it annoying him at some point and hasn't seen a
> serious investigation into potential downsides.

if that was the case then people should be responding to the request to get it merged with 'but it caused problems for me when I tried it'

I haven't seen any comments like that.

>> that the only significant con left is the potential to mask other
>> problems.
>
> Which is not a made-up issue, mind you. As an example, I just now tried GNU
> locate and saw it's a complete pig and specifically unsuitable for the low
> memory boxes under discussion. Upon completion, it actually frees enough
> memory that swap-prefetch _could_ help on some boxes, while the real issue
> is that they should first and foremost dump GNU locate.

I see the conclusion as being exactly the opposite.

here is a workload with some badly designed userspace software that the kernel can make much more pleasant for users.

arguing that users should never use badly designed software in userspace doesn't seem like an argument that will gain much traction. I'm not saying the kernel needs to fix the software itself (ala the sched_yield issues), but the kernel should try and keep such software from hurting the rest of the system where it can.

in this case it can't help it while the bad software is running, but it could minimize the impact after it finishes.
>> however there are many legitimate cases where it is definitely doing the
>> right thing (swapout was correct in pushing out the pages, but now the
>> cause of that pressure is gone). the amount of benefit from this will
>> vary from situation to situation, but it's not reasonable to claim that
>> this provides no benefit (you have benchmark numbers that show it in
>> synthetic benchmarks, and you have user reports that show it in the
>> real-world)
>
> I certainly would not want to argue anything of the sort no. As said a few
> times, I agree that swap-prefetch makes sense and has at least the
> potential to help some situations that you really wouldn't even want to try
> and fix any other way, simply because nothing's broken.

so there is a legitimate situation where swap-prefetch will help significantly, what is the downside that prevents it from being included? (reading this thread it sometimes seems like the downside is that updatedb shouldn't cause this problem and so if you fixed updatedb there would be no legitimate benefit, or alternately this patch doesn't help updatedb so there's no legitimate benefit)

>> there are lots of things in the kernel whose job is to pre-fill the
>> memory with data that may (or may not) be useful in the future. this is
>> just another method of filling the cache. it does so by saying "the user
>> wanted these pages in the recent past, so it's a reasonable guess to say
>> that the user will want them again in the future"
>
> Well, _that_ is what the kernel is already going to great lengths at doing,
> and it decided that those pages us poor overnight OO.o users want in the
> morning weren't reasonable guesses.
> The kernel also won't any time soon be reading our minds, so any solution
> would need either user intervention (we could devise a way to tell the
> kernel "hey ho, I consider these pages to be very important -- try not to
> swap them out", possibly even with an "and if you do, please pull them back
> in when possible") or we can let swap-prefetch do the "just in case" thing
> it is doing.

it's not that they shouldn't have been swapped out (they should have been), it's that the reason they were swapped out no longer exists.

> While swap-prefetch may not be the be all end all of solutions I agree that
> having a machine sit around with free memory and applications in swap seems
> not too useful if (as is the case) fetched pages can be dropped immediately
> when it turns out swap-prefetch made the wrong decision.
>
> So that's for the concept. As to implementation, if I try and look at the
> code, it seems to be trying hard to really be free and as such, potential
> downsides seem limited. It's a rather core concept though and as such needs
> someone with a _lot_ more VM clue to ack. Sorry for not knowing, but who's
> maintaining/submitting the thing now that Con's not? He or she should
> preferably address any concerns it seems.

I've seen it mentioned that there is still a maintainer but I missed who it is, but I haven't seen any concerns that can be addressed, they all seem to be 'this is a core concept, people need to think about it' or 'but someone may find
Re: [ck] Re: Linus 2.6.23-rc1
On Jul 28 2007 22:51, Diego Calleja wrote: >El Sat, 28 Jul 2007 11:05:25 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]> >escribió: > >> So "modal" things are good for fixing behaviour in the short run. But they >> are a total disaster in the long run, and even in the short run they tend >> to have problems (simply because there will be cases that straddle the >> line, and show some of _both_ issues, and now *neither* mode is the right >> one) > >I fully agree with this, but plugsched could have avoided this useless >"division" >on the topic of SD vs CFS. IMO that counts as an advantage, too ;) > It's like CONFIG_HZ - more or less often debated, and now we have everyone happy by giving them the choice. Jan --
Re: keyboard stopped working after de9ce703c6b807b1dfef5942df4f2fadd0fdb67a
On Tue, Jul 17, 2007 at 03:52:07PM -0400, Dmitry Torokhov wrote: > Hi Christoph, > > On 7/17/07, Christoph Pfister <[EMAIL PROTECTED]> wrote: > > > >Yep, attached (cold reboot, not pressing any keys, 2.6.21). > > > > Ok, I see. You don't use PS/2 mouse and so BIOS told us that mouse is > absent and reassigned IRQ12 to EHCI controller. However we do not > listen to BIOS on i386 (for historical reasons) and proceed with > registering AUX port... Now IRQ12 is shared between AUX port and EHCI > and the keyboard controller is unhappy whereas before (with polling > timer) it would release IRQ12 and close port... Here I should add that IRQ sharing between ISA/LPC where i8042 lives and PCI where EHCI lives doesn't work - only one of the sides will ever be able to trigger interrupts, depending on the bridge config. > Does your keyboard start working if you boot with i8042.noaux? -- Vojtech Pavlik Director SuSE Labs
[PATCH] add a missing LIBS_Y to arch/alpha/boot Makefile
> kasprintf pulls in kmalloc which proved to be fatal for at least > bootimage target on alpha. > Move it to a separate file so only users of kasprintf are exposed > to the dependency on kmalloc. With the addition of the missing $(LIBS_Y) it really seems to compile now. I cannot boot-test it before some time next week. Add $(LIBS_Y) to get lib/lib.a so srm_printk is present. Signed-off-by: Meelis Roos <[EMAIL PROTECTED]> diff --git a/arch/alpha/boot/Makefile b/arch/alpha/boot/Makefile index e1ae14c..cd14388 100644 --- a/arch/alpha/boot/Makefile +++ b/arch/alpha/boot/Makefile @@ -104,7 +104,7 @@ OBJ_bootlx := $(obj)/head.o $(obj)/main.o OBJ_bootph := $(obj)/head.o $(obj)/bootp.o OBJ_bootpzh := $(obj)/head.o $(obj)/bootpz.o $(obj)/misc.o -$(obj)/bootloader: $(obj)/bootloader.lds $(OBJ_bootlx) FORCE +$(obj)/bootloader: $(obj)/bootloader.lds $(OBJ_bootlx) $(LIBS_Y) FORCE $(call if_changed,ld) $(obj)/bootpheader: $(obj)/bootloader.lds $(OBJ_bootph) $(LIBS_Y) FORCE -- Meelis Roos ([EMAIL PROTECTED])
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007 11:05:25 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]> wrote: > So "modal" things are good for fixing behaviour in the short run. But they > are a total disaster in the long run, and even in the short run they tend > to have problems (simply because there will be cases that straddle the > line, and show some of _both_ issues, and now *neither* mode is the right > one) I fully agree with this, but plugsched could have avoided this useless "division" on the topic of SD vs CFS. IMO that counts as an advantage, too ;)
Re: kernel panic w/ 2.6.22.1, VIA EPIA Mini ITX
> Checked the RAM on the box? Kinda weird if you're getting VRAM corruption, I > wonder if this is due to the RAM failing at the point where the framebuffer > is mapped? you are probably right... I removed one of the RAMs, no crash anymore. This is not ... nice. The manual says that the board can handle up to 1GB (2x512MB). I don't think the RAM itself is damaged -- these are brand-new Kingston "ValueRam" PC133 SD-RAMs with lifelong guarantee. I wonder if this behaviour only appears with my particular board, or if all VIA EPIA Mini ITX 500 are affected (I've run out of boards to test ...) thanks, herp > Try running memtest86 on it. good idea.
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, jos poortvliet wrote: > > Your point here seems to be: this is how it went, and it was right. Ok, got > that. But I wanted to bring out more than what you make sound like "that's what happened, deal with it". I tried to explain _why_ the choices that were made were in fact made. And it's a (I think) important thing for people to be aware of. The fact is, "personality" and "work with the other developers" is a big issue. You cannot just go off and do your own thing in your private world, and then expect it to be accepted without any discussion or other people showing up and doing alternate things. That's _especially_ true in an area that has a respected and working maintainer. >Yet, Con walked away (and not just over SD). Seeing Con go, I wonder > how many did leave without this splash. We've had people go with a splash before. Quite frankly, the current scheduler situation looks very much like the CML2 situation. Anybody remember that? The developer there also got rejected, the improvement was made differently (and much more in line with existing practices and maintainership), and life went on. Eric Raymond, however, left with a splash. It's not common, but it's not unheard of. Anybody who thinks that developers don't have huge egos probably haven't ever met a software engineer. And I suspect kernel people have bigger egos than most. No wonder there are clashes every once in a while - it's a wonder there aren't _more_ of them. > How and why? And is it due to a deeper problem? Well, one part of it is that the way to make changes in the kernel community is to do them incrementally. Small and incremental improvements are much easier to merge. If you go off and rewrite a subsystem, you shouldn't expect it to get merged, at least not unless it can live side-by-side with the old one (the new firewire stack is an example of that, and most filesystems are this way too). And the closer to some central part you get, the harder that gets. 
So the *bulk* of the kernel stuff can be handled either incrementally, or side-by-side, and as a result, you actually seldom see issues like this. The kernel is extremely modular, and a large reason for that is exactly to avoid couplings. Some (very few) things cannot be done incrementally. That's why I bring up CML2 as a fairly good example of this having happened before. Some things require flag-days. But you should pretty much *assume* that if there is a flag-day, and if there is a maintainer, the maintainer has to be involved. Does "maintainership" give infinite powers? No. I'll take patches that bypass maintainers, but there needs to be some reason for them (ie in some sense the maintainer needs to have done a bad job, or the patch just needs to be trivial enough - or it cuts across maintainership areas - that it's not even _worth_ going through all maintainers). So maintainers aren't "everything". But they are important. You can't just ignore them and go do your own thing, and then expect it to be merged. Linus
Re: Volanomark slows by 80% under CFS
On Fri, Jul 27, 2007 at 10:47:21PM -0400, Rik van Riel wrote: > Tim Chen wrote: > > Ingo, > > > > Volanomark slows by 80% with CFS scheduler on 2.6.23-rc1. > > Benchmark was run on a 2 socket Core2 machine. > > > > The change in scheduler treatment of sched_yield > > could play a part in changing Volanomark behavior. > > In CFS, sched_yield is implemented > > by dequeueing and requeueing a process. The time a process > > has spent running probably reduced the cpu time due it > > by only a bit. The process could get re-queued pretty close > > to head of the queue, and may get scheduled again pretty > > quickly if there is still a lot of cpu time due. > > I wonder if this explains the 30% drop in top performance > seen with the MySQL sysbench benchmark when the scheduler > changed to CFS... > > See http://people.freebsd.org/~jeff/sysbench.png From the author's blog, from when he did that graph: http://jeffr-tech.livejournal.com/10103.html "So I updated the image for the second time today to include Ingo's cfs scheduler. This kernel is from the rpm on his website. I double checked that it was not using tcmalloc at the time and switching back to a 2.6.21 kernel returned to the expected perf. Basically, it has the same performance as the FreeBSD 4BSD scheduler now. Which is to say the peak is terrible but it has virtually no dropoff and performs better under load than the default 2.6.21 scheduler. " Dave -- http://www.codemonkey.org.uk
Re: alpha compile failure (srm_printk)
On Sat, Jul 28, 2007 at 10:04:50PM +0200, Sam Ravnborg wrote: > > The fix is to split kasprintf out to a separate file to > avoid pulling in more stuff than necessary. I just sent a patch doing this and I assume you take care of the alpha changes. Thanks, Sam
[PATCH] lib: move kasprintf to a separate file
kasprintf pulls in kmalloc which proved to be fatal for at least bootimage target on alpha. Move it to a separate file so only users of kasprintf are exposed to the dependency on kmalloc. Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]> Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: Meelis Roos <[EMAIL PROTECTED]> --- lib/Makefile|2 +- lib/kasprintf.c | 44 lib/vsprintf.c | 35 --- 3 files changed, 45 insertions(+), 36 deletions(-) create mode 100644 lib/kasprintf.c diff --git a/lib/Makefile b/lib/Makefile index 6149663..d9e5f1c 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -2,7 +2,7 @@ # Makefile for some libs needed in the kernel. # -lib-y := ctype.o string.o vsprintf.o cmdline.o \ +lib-y := ctype.o string.o vsprintf.o kasprintf.o cmdline.o \ rbtree.o radix-tree.o dump_stack.o \ idr.o int_sqrt.o bitmap.o extable.o prio_tree.o \ sha1.o irq_regs.o reciprocal_div.o argv_split.o diff --git a/lib/kasprintf.c b/lib/kasprintf.c new file mode 100644 index 000..c5ff1fd --- /dev/null +++ b/lib/kasprintf.c @@ -0,0 +1,44 @@ +/* + * linux/lib/kasprintf.c + * + * Copyright (C) 1991, 1992 Linus Torvalds + */ + +#include +#include +#include +#include + +/* Simplified asprintf. */ +char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap) +{ + unsigned int len; + char *p; + va_list aq; + + va_copy(aq, ap); + len = vsnprintf(NULL, 0, fmt, aq); + va_end(aq); + + p = kmalloc(len+1, gfp); + if (!p) + return NULL; + + vsnprintf(p, len+1, fmt, ap); + + return p; +} +EXPORT_SYMBOL(kvasprintf); + +char *kasprintf(gfp_t gfp, const char *fmt, ...) +{ + va_list ap; + char *p; + + va_start(ap, fmt); + p = kvasprintf(gfp, fmt, ap); + va_end(ap); + + return p; +} +EXPORT_SYMBOL(kasprintf); diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 6b6734d..7b481ce 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -978,38 +978,3 @@ int sscanf(const char * buf, const char * fmt, ...) } EXPORT_SYMBOL(sscanf); - - -/* Simplified asprintf. 
*/ -char *kvasprintf(gfp_t gfp, const char *fmt, va_list ap) -{ - unsigned int len; - char *p; - va_list aq; - - va_copy(aq, ap); - len = vsnprintf(NULL, 0, fmt, aq); - va_end(aq); - - p = kmalloc(len+1, gfp); - if (!p) - return NULL; - - vsnprintf(p, len+1, fmt, ap); - - return p; -} -EXPORT_SYMBOL(kvasprintf); - -char *kasprintf(gfp_t gfp, const char *fmt, ...) -{ - va_list ap; - char *p; - - va_start(ap, fmt); - p = kvasprintf(gfp, fmt, ap); - va_end(ap); - - return p; -} -EXPORT_SYMBOL(kasprintf); -- 1.5.1.rc3.g84b7-dirty - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
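The kvasprintf() being moved in the patch above uses a two-pass pattern: measure the formatted length with a NULL buffer, allocate exactly that much, then format for real. A minimal userspace sketch of the same pattern follows, assuming malloc() as a stand-in for kmalloc(); the names sketch_vasprintf and sketch_asprintf are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace analogue of kvasprintf(); malloc() stands in for kmalloc(). */
char *sketch_vasprintf(const char *fmt, va_list ap)
{
	va_list aq;
	int len;
	char *p;

	assert(fmt != NULL);

	/* Pass 1: measure only. vsnprintf() consumes its va_list, so pass a copy. */
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);
	va_end(aq);
	if (len < 0)
		return NULL;

	p = malloc((size_t)len + 1);
	if (!p)
		return NULL;

	/* Pass 2: format into the exactly sized buffer, using the original list. */
	vsnprintf(p, (size_t)len + 1, fmt, ap);
	return p;
}

/* Variadic front end, mirroring how kasprintf() wraps kvasprintf(). */
char *sketch_asprintf(const char *fmt, ...)
{
	va_list ap;
	char *p;

	va_start(ap, fmt);
	p = sketch_vasprintf(fmt, ap);
	va_end(ap);
	return p;
}
```

The caller owns the returned buffer and has to free() it. In the kernel version the allocation step is the kmalloc() call, which is the dependency the patch keeps out of vsprintf.o.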
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, Jul 28, 2007 at 09:28:36PM +0200, jos poortvliet wrote: > Your point here seems to be: this is how it went, and it was right. Ok, got > that. Yet, Con walked away (and not just over SD). Seeing Con go, I wonder > how many did leave without this splash. How many didn't even get involved at > all??? Did THAT have to happen? I don't blame you for it - the point is that > somewhere in the process a valuable kernel hacker went away. How and why? And > is it due to a deeper problem? Absolutely, the current Linux community hasn't realized how large the community has gotten, and the internal processes for dealing with new developers who aren't at companies like SuSE or RedHat haven't been extended to deal with it yet. It comes off as elitism, which it partially is. Nobody tries to facilitate or understand ideas in the larger community, which locks out folks like Con who try to do provocative things outside of the normal technical development mindset. He was punished for doing so, which is a huge failure of this community. Con basically got caught in a scheduler philosophical argument of whether to push a policy into userspace or to nice a process instead because of how crappy X is. This is an open argument on how to solve, but it should not have resulted in really one scheduler over the other. Both were capable but one is locked out now because of the choices of current high level kernel developers in Linux. There are a lot of good kernel folks in many different communities that look at something like this and would be turned off to participating in Linux development. And I have a good record of doing rather interesting stuff in kernel. bill
Re: kernel panic w/ 2.6.22.1, VIA EPIA Mini ITX
> I have absolutely no idea what triggers this crash. Checked the RAM on the box? Kinda weird if you're getting VRAM corruption, I wonder if this is due to the RAM failing at the point where the framebuffer is mapped? Try running memtest86 on it. -- Cheers, Alistair. 137/1 Warrender Park Road, Edinburgh, UK.
Re: alpha compile failure (srm_printk)
On Sat, Jul 28, 2007 at 08:05:08PM +0300, Meelis Roos wrote: > Retested this compile error with today's 2.6.23-rc1+git, still the same. > > > > LD arch/alpha/boot/bootloader > > > arch/alpha/boot/bootloader.lds:25: undefined symbol `srm_printk' > > > referenced in expression > > > > I was unable to reproduce these errors on 2.6.22-rc4 with your config. > > Hmm. Just make works, make bootimage does not. I debugged this further > today and I can not see how it can work. > > The link command in question is > ld -static -uvsprintf -T arch/alpha/boot/bootloader.lds > arch/alpha/boot/head.o arch/alpha/boot/main.o -o arch/alpha/boot/bootloader > and it still tells > arch/alpha/boot/bootloader.lds:25: undefined symbol `srm_printk' referenced > in expression > > arch/alpha/boot/bootloader.lds contains a single related line referring > to srm_printk: > printk = srm_printk; > This only seems to define an alias to srm_printk so not important (and > unused)? Hi Meelis. I took the time to investigate this a bit. The relevant files in this case (arch/alpha/Makefile + boot/Makefile) have not seen many changes when browsing the git tree. So I looked back a bit further in the bitkeeper based tree. Before boot/Makefile was converted to kbuild style, bootloader indeed referenced $(LIBS). That was lost in the process. So adding $(LIBS_Y) is the right thing to do. The linker error you get is because kasprintf uses kmalloc, and it gets pulled in when srm_printk uses vsprintf. The fix is to split kasprintf out to a separate file to avoid pulling in more stuff than necessary. PS. I had trouble compiling objstrip and had to do a lot of ugly hacks before it compiled. I assume it is a toolchain issue.. Sam
Re: ATA scsi driver misbehavior under kdump capture kernel
On Fri, 27 Jul 2007, Cliff Wickman wrote: > I've run into a problem with the ATA SCSI disk driver when running in a > kdump dump-capture kernel. > > I'm running on 2-processor x86_64 box. It has 2 scsi disks, /dev/sda and > /dev/sdb > > My kernel is 2.6.22, and built to be a dump capturing kernel loaded by kexec. > When I boot this kernel by itself, it finds both sda and sdb. > > But when it is loaded by kexec and booted on a panic it only finds sda. > > Any ideas from those familiar with the ATA driver? No, I just wanted to suggest asking on the kexec (fastboot seems to be deprecated) mailing list. Thanks Guennadi > > > -Cliff Wickman > SGI > > > > I put some printk's into it and get this: > > Standalone: > >[nv_adma_error_handler] > cpw: ata_host_register probe port 1 (error_handler:81348625) > cpw: ata_host_register call ata_port_probe > cpw: ata_host_register call ata_port_schedule > cpw: ata_host_register call ata_port_wait_eh > cpw: ata_port_wait_eh entered > cpw: ata_port_wait_eh, preparing to wait > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > cpw: ata_dev_configure entered > cpw: ata_dev_configure testing class > cpw: ata_dev_configure class is ATA_DEV_ATA > ata2.00: ATA-6: ST3200822AS, 3.01, max UDMA/133 > ata2.00: 390721968 sectors, multi 16: LBA48 > cpw: ata_dev_configure exiting > cpw: ata_dev_configure entered > cpw: ata_dev_configure testing class > cpw: ata_dev_configure class is ATA_DEV_ATA > cpw: ata_dev_configure exiting > cpw: ata_dev_set_mode printing: > ata2.00: configured for UDMA/133 > cpw: ata_port_wait_eh, finished wait > cpw: ata_port_wait_eh exiting > cpw: ata_host_register done with probe port 1 > > > When loaded with kexec and booted on a panic: > > cpw: ata_host_register probe port 1 (error_handler:81348625) > cpw: ata_host_register call ata_port_probe > cpw: ata_host_register call ata_port_schedule > cpw: ata_host_register call ata_port_wait_eh > cpw: ata_port_wait_eh entered > cpw: ata_port_wait_eh, preparing to wait > 
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) > cpw: ata_port_wait_eh, finished wait > cpw: ata_port_wait_eh exiting > cpw: ata_host_register done with probe port 1 > > --- Guennadi Liakhovetski
Re: [RFC] scheduler: improve SMP fairness in CFS
On Fri, 27 Jul 2007, Chris Snook wrote: Bill Huey (hui) wrote: You have to consider the target for this kind of code. There are applications where you need something that falls within a constant error bound. According to the numbers, the current CFS rebalancing logic doesn't achieve that to any degree of rigor. So CFS is ok for SCHED_OTHER, but not for anything more strict than that. I've said from the beginning that I think that anyone who desperately needs perfect fairness should be explicitly enforcing it with the aid of realtime priorities. The problem is that configuring and tuning a realtime application is a pain, and people want to be able to approximate this behavior without doing a whole lot of dirty work themselves. I believe that CFS can and should be enhanced to ensure SMP-fairness over potentially short, user-configurable intervals, even for SCHED_OTHER. I do not, however, believe that we should take it to the extreme of wasting CPU cycles on migrations that will not improve performance for *any* task, just to avoid letting some tasks get ahead of others. We should be as fair as possible but no fairer. If we've already made it as fair as possible, we should account for the margin of error and correct for it the next time we rebalance. We should not burn the surplus just to get rid of it. Proportional-share scheduling actually has one of its roots in real-time and having a p-fair scheduler is essential for real-time apps (soft real-time). On a non-NUMA box with single-socket, non-SMT processors, a constant error bound is fine. Once we add SMT, go multi-core, go NUMA, and add inter-chassis interconnects on top of that, we need to multiply this error bound at each stage in the hierarchy, or else we'll end up wasting CPU cycles on migrations that actually hurt the processes they're supposed to be helping, and hurt everyone else even more. I believe we should enforce an error bound that is proportional to migration cost. 
I think we are actually in agreement. When I say constant bound, it can certainly be a constant that's determined based on inputs from the memory hierarchy. The point is that it needs to be a constant independent of things like # of tasks. But this patch is only relevant to SCHED_OTHER. The realtime scheduler doesn't have a concept of fairness, just priorities. That's why each realtime priority level has its own separate runqueue. Realtime schedulers are supposed to be dumb as a post, so they cannot heuristically decide to do anything other than precisely what you configured them to do, and so they don't get in the way when you're context switching a million times a second. Are you referring to hard real-time? As I said, an infrastructure that enables p-fair scheduling, EDF, or the like is the foundation for real-time. I designed DWRR, however, with a target of non-RT apps, although I was hoping the research results might be applicable to RT. tong
[PATCH -mm] Fix libata warnings with CONFIG_PM=n
Hi Jeff, I noticed these warnings with CONFIG_PM=n ... drivers/ata/libata-core.c:5993: warning: 'ata_host_disable_link_pm' defined but not used drivers/ata/libata-core.c:6004: warning: 'ata_host_enable_link_pm' defined but not used ... Signed-off-by: Gabriel Craciunescu <[EMAIL PROTECTED]> --- --- linux-2.6.23-rc1/drivers/ata/libata-core.c.orig 2007-07-28 21:17:31.0 +0200 +++ linux-2.6.23-rc1/drivers/ata/libata-core.c 2007-07-28 21:17:48.0 +0200 @@ -5989,6 +5989,7 @@ int ata_flush_cache(struct ata_device *d return 0; } +#ifdef CONFIG_PM static void ata_host_disable_link_pm(struct ata_host *host) { int i; @@ -6011,7 +6012,7 @@ static void ata_host_enable_link_pm(stru } } -#ifdef CONFIG_PM + static int ata_host_request_pm(struct ata_host *host, pm_message_t mesg, unsigned int action, unsigned int ehi_flags, int wait)
Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Jan Engelhardt wrote: > > Time to investigate... Well, one thing that would be worth doing is to simply create a trace of time-slices for both schedulers. It could easily be some hacky thing that just saves the process name and TSC at each scheduling event in some fairly small fixed-sized per-CPU circular buffer, and have a /sys interface that reads it out, and then you do sleep 60 ; cat /sys/cpubuffer > buffer and play the game for 60 seconds (so that you get a buffer that represents perhaps the last 10 seconds of play). It could *literally* just be an effect of the time quanta used, and CFS just deciding that it's not interactive and giving things too long of a CPU slice. Yes, it's what "/proc/sys/kernel/sched_granularity_ns" is supposed to tweak, but maybe there's some misfeature there, or maybe the default is just bad for games, or whatever. Ingo: that sysctl_sched_granularity initialization doesn't make sense. You talk about it being in units of nanoseconds, but then you do 2000000000ULL/HZ which is nonsensical. That value is "2 seconds" (not 2ms like the comment says) in nanoseconds, but then divided by HZ, so what's the meaning of that HZ thing? Nothing in the scheduler should care about jiffies, why is that related to HZ? All the scheduler clocks are in ns. Linus
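To make the objection about the initializer concrete, here is a small sketch of the arithmetic. It assumes the constant is two seconds expressed in nanoseconds (2000000000), as the message describes it, divided by the build-time HZ; the helper name sched_granularity_ns is made up for illustration and is not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code: the value the initializer would
 * yield for a given HZ, assuming the constant is 2 s in nanoseconds.
 *
 *   HZ = 1000  ->   2000000 ns  ( 2 ms)
 *   HZ =  250  ->   8000000 ns  ( 8 ms)
 *   HZ =  100  ->  20000000 ns  (20 ms)
 */
unsigned long long sched_granularity_ns(unsigned int hz)
{
	assert(hz > 0);
	return 2000000000ULL / hz;
}
```

Under that assumption the default only matches the "2ms" in the comment when HZ is 1000; on other HZ settings the default granularity is several times larger, even though the scheduler clocks themselves are in nanoseconds.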
kernel panic w/ 2.6.22.1, VIA EPIA Mini ITX
good day, when setting up a more or less old board, 2.6.22.1 crashes during an "emerge --sync" (gentoo installation). first, the screen will start funny - like a crashed C64 (from the "good" old days). I made a photo, you can look at it here: http://wildsau.enemy.org/~kernel/epia-crash.jpg actually, that's not a static picture, but there's also a lotta blinking going on. the image you see is from a non-framebuffer mode kernel, since I first suspected the framebuffer being the culprit, because when I framebuffer mode, the crash will result in many coloured pixels on the screen, like when doing "cat /dev/random /dev/fb0". But since this happens in textmode too, something else is triggering this. anyway hardware: the board is a VIA EPIA 500Mhz fanless (Mini ITX) with a C3 processor and 1GB (2x512) SD-RAM. the kernel complains about this: Jul 28 22:30:17 localhost BUG: unable to handle kernel paging request at virtual address 00ff or, to me more specific: Jul 28 22:30:17 localhost BUG: unable to handle kernel paging request at virtual address 00ff Jul 28 22:30:17 localhost printing eip: Jul 28 22:30:17 localhost c016c506 Jul 28 22:30:17 localhost *pde = Jul 28 22:30:17 localhost Oops: [#1] Jul 28 22:30:17 localhost Modules linked in: parport_pc parport uhci_hcd usbcore Jul 28 22:30:17 localhost CPU:0 Jul 28 22:30:17 localhost EIP:0060:[]Not tainted VLI Jul 28 22:30:17 localhost EFLAGS: 00010206 (2.6.22.1 #2) Jul 28 22:30:17 localhost EIP is at __d_lookup+0x66/0xe0 Jul 28 22:30:17 localhost eax: 4c482617 ebx: 00ff ecx: 0011 edx: c 1819180 Jul 28 22:30:17 localhost esi: 00ff edi: f7634005 ebp: ef848780 esp: f 7a87dc0 Jul 28 22:30:17 localhost ds: 007b es: 007b fs: gs: 0033 ss: 0068 Jul 28 22:30:17 localhost Process bash (pid: 4209, ti=f7a86000 task=c1bb8070 tas k.ti=f7a86000) Jul 28 22:30:17 localhost Stack: 4c482617 f7a87e1c f7a87e3c f7fe7114 f7a87f30 00 11 f7634005 f7634016 Jul 28 22:30:17 localhost f7a87e3c f7a87f04 f7a87e3c c0162f48 f7a87e48 c18e6220 c01a82b0 
f7ef12a0 Jul 28 22:30:17 localhost f7634016 f7a87e3c f7ef12a0 f7a87f04 c0164a69 f7634005 0001 4c482617 Jul 28 22:30:17 localhost Call Trace: Jul 28 22:30:17 localhost [] do_lookup+0x28/0x190 Jul 28 22:30:17 localhost [] ext3_permission+0x0/0x10 Jul 28 22:30:17 localhost [] __link_path_walk+0x669/0xb20 Jul 28 22:30:17 localhost [] mntput_no_expire+0x1b/0x70 Jul 28 22:30:17 localhost [] link_path_walk+0x63/0xc0 Jul 28 22:30:17 localhost [] link_path_walk+0x43/0xc0 Jul 28 22:30:17 localhost [] do_path_lookup+0x64/0x180 Jul 28 22:30:17 localhost [] getname+0xb3/0xe0 Jul 28 22:30:17 localhost [] __user_walk_fd+0x3b/0x60 Jul 28 22:30:17 localhost [] vfs_stat_fd+0x22/0x60 Jul 28 22:30:17 localhost [] sys_stat64+0xf/0x30 Jul 28 22:30:17 localhost [] syscall_call+0x7/0xb Jul 28 22:30:17 localhost === Jul 28 22:30:17 localhost Code: ea 31 d0 8b 15 b4 9e 3b c0 89 7c 24 18 21 d0 8b 15 bc 9e 3b c0 8b 34 82 85 f6 75 0f eb 54 8d b4 26 00 00 00 00 85 db 89 de 74 47 <8b> 1e 8d 74 26 00 8d 6e f4 8b 04 24 3b 45 18 75 e9 8b 44 24 0c Jul 28 22:30:17 localhost EIP: [] __d_lookup+0x66/0xe0 SS:ESP 0068:f7a 87dc0 Jul 28 22:30:17 localhost BUG: unable to handle kernel paging request at virtual address 00ff Jul 28 22:30:17 localhost printing eip: Jul 28 22:30:17 localhost c016c506 Jul 28 22:30:17 localhost *pde = Jul 28 22:30:17 localhost Oops: [#2] Jul 28 22:30:17 localhost Modules linked in: parport_pc parport uhci_hcd usbcore Jul 28 22:30:17 localhost CPU:0 Jul 28 22:30:17 localhost EIP:0060:[]Not tainted VLI Jul 28 22:30:17 localhost EFLAGS: 00010206 (2.6.22.1 #2) Jul 28 22:30:17 localhost EIP is at __d_lookup+0x66/0xe0 Jul 28 22:30:17 localhost eax: 00284951 ebx: 00ff ecx: 0011 edx: c 1819180 Jul 28 22:30:17 localhost esi: 00ff edi: f7bb1005 ebp: efa49688 esp: e 4837dc0 Jul 28 22:30:17 localhost ds: 007b es: 007b fs: gs: 0033 ss: 0068 Jul 28 22:30:17 localhost Process utempter (pid: 4308, ti=e4836000 task=f7c34090 task.ti=e4836000) Jul 28 22:30:17 localhost Stack: 00284951 e4837e1c 
e4837e3c f7d3a5ec e4837e48 00 03 f7bb1005 f7bb1009 Jul 28 22:30:17 localhost e4837e3c e4837f04 e4837e3c c0162f48 e4837e48 c18e6e20 c0158690 c1b54cc0 I have absolutely no idea what triggers this crash. cheers, herp - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: Linus 2.6.23-rc1
On Saturday 28 July 2007, Linus Torvalds wrote: > > Compare this to SD for a while. Ponder. > > Linus Your point here seems to be: this is how it went, and it was right. Ok, got that. Yet, Con walked away (and not just over SD). Seeing Con go, I wonder how many did leave without this splash. How many didn't even get involved at all??? Did THAT have to happen? I don't blame you for it - the point is that somewhere in the process a valuable kernel hacker went away. How and why? And is it due to a deeper problem? -- Disclaimer: Everything I do, think and say is based on the worldview I have now. I am not responsible for changes to the world, or to the picture I have of it, nor for any behaviour of mine that follows from them. Everything I say is meant kindly, unless explicitly stated otherwise. Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html A: Because it destroys the flow of the conversation Q: Why is top-posting bad?
Re: [RFC] scheduler: improve SMP fairness in CFS
On Fri, 27 Jul 2007, Chris Snook wrote: I don't think that achieving a constant error bound is always a good thing. We all know that fairness has overhead. If I have 3 threads and 2 processors, and I have a choice between fairly giving each thread 1.0 billion cycles during the next second, or unfairly giving two of them 1.1 billion cycles and giving the other 0.9 billion cycles, then we can have a useful discussion about where we want to draw the line on the fairness/performance tradeoff. On the other hand, if we can give two of them 1.1 billion cycles and still give the other one 1.0 billion cycles, it's madness to waste those 0.2 billion cycles just to avoid user jealousy. The more complex the memory topology of a system, the more "free" cycles you'll get by tolerating short-term unfairness. As a crude heuristic, scaling some fairly low tolerance by log2(NCPUS) seems appropriate, but eventually we should take the boot-time computed migration costs into consideration. I think we are in agreement. To avoid confusion, I think we should be more precise on what fairness means. Lag (i.e., ideal fair time - actual service time) is the commonly used metric for fairness. The definition is that a scheduler is proportionally fair if for any task in any time interval, the task's lag is bounded by a constant (note it's in terms of absolute time). The knob here is this constant and can help trade off performance and fairness. The reason for a constant bound is that we want consistent fairness properties regardless of the number of tasks. For example, we don't want the system to be much less fair as the number of tasks increases. With DWRR, the lag bound is the max weight of currently running tasks, multiplied by sysctl_base_round_slice. So if all tasks are of nice 0, i.e., weight 1, and sysctl_base_round_slice equals 30 ms, then we are guaranteed each task is at most 30ms off of the ideal case. This is a useful property. 
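The lag bound described above can be written down directly. The following is only a sketch of the stated formula (max weight of the currently running tasks times the base round slice), not code from the DWRR patch; the function name and the millisecond units are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: worst-case lag in milliseconds as stated in the
 * message, i.e. (max weight among running tasks) * base round slice.
 * The result depends on the largest weight, not on how many tasks run.
 */
unsigned long dwrr_lag_bound_ms(const unsigned long *weights, size_t ntasks,
				unsigned long base_round_slice_ms)
{
	unsigned long max_weight = 0;
	size_t i;

	assert(weights != NULL || ntasks == 0);

	for (i = 0; i < ntasks; i++)
		if (weights[i] > max_weight)
			max_weight = weights[i];

	return max_weight * base_round_slice_ms;
}
```

With three nice-0 tasks (weight 1) and a 30 ms slice this gives 30 ms, and it still gives 30 ms with three hundred such tasks, which is the "constant independent of the number of tasks" property. Raising the slice to 100 ms raises the bound to 100 ms.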
Just like what you mentioned about the migration cost, this property allows the scheduler or user to accurately reason about the tradeoffs. If we want to trade fairness for performance, we can increase sysctl_base_round_slice to, say, 100ms; doing so we also know accurately the worst impact it has on fairness. Adding system calls, while great for research, is not something which is done lightly in the published kernel. If we're going to implement a user interface beyond simply interpreting existing priorities more precisely, it would be nice if this was part of a framework with a broader vision, such as a scheduler economy. Agreed. I've seen papers on scheduler economy but am not familiar enough to comment on it. Scheduling Algorithm: The scheduler keeps a set of data structures, called Trio groups, to maintain the weight or reservation of each thread group (including one or more threads) and the local weight of each member thread. When scheduling a thread, it consults these data structures and computes (in constant time) a system-wide weight for the thread that represents an equivalent CPU share. Consequently, the scheduling algorithm, DWRR, operates solely based on the system-wide weight (or weight for short, hereafter) of each thread. Having a flat space of system-wide weights for individual threads avoids performing separate scheduling at each level of the group hierarchy and thus greatly simplifies the implementation for group scheduling. Implementing a flat weight space efficiently is nontrivial. I'm curious to see how you reworked the original patch without global locking. I simply removed the locking and changed a little bit in idle_balance(). The lock was trying to prevent a thread from reading or writing the global highest round value while another thread is writing to it. For writes, it's simple to ensure without locking that only one write takes effect when multiple writes are concurrent.
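One way to get the "only one concurrent write takes effect" property without a lock is a compare-and-swap on the counter. The sketch below uses C11 atomics in user space purely as an illustration; it is not the code from the patch (which is not shown in this thread), and `highest_round`, `try_advance_round` and `read_highest_round` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global "highest round" value. */
static _Atomic unsigned long highest_round;

/*
 * Try to advance the highest round from `seen` to `seen + 1`.
 * If several CPUs race with the same `seen`, exactly one
 * compare-and-swap succeeds; the others find that the round has
 * already moved on and leave it alone.
 */
bool try_advance_round(unsigned long seen)
{
	unsigned long expected = seen;

	return atomic_compare_exchange_strong(&highest_round, &expected,
					      seen + 1);
}

/*
 * Readers take a plain atomic load. The value can already be stale
 * by the time the caller acts on it.
 */
unsigned long read_highest_round(void)
{
	return atomic_load(&highest_round);
}
```

A second caller that still believes the round is 5 after someone else has advanced it to 6 simply fails the compare-and-swap, so the counter never skips or moves backwards.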
For the case that there's one write going on and multiple threads read, without locking, the only problem is that a reader may read a stale value and thus thinks the current highest round is X while it's actually X + 1. The end effect is that a thread can be at most two rounds behind the highest round. This changes DWRR's lag bound to 2 * (max weight of current tasks) * sysctl_base_round_slice, which is still constant. I had a feeling this patch was originally designed for the O(1) scheduler, and this is why. The old scheduler had expired arrays, so adding a round-expired array wasn't a radical departure from the design. CFS does not have an expired rbtree, so adding one *is* a radical departure from the design. I think we can implement DWRR or something very similar without using this implementation method. Since we've already got a tree of queued tasks, it might be easiest to basically break off one subtree (usually just one task, but not necessarily) and migrate it to a less loaded tree whenever we can reduce the difference
Re: Linus 2.6.23-rc1
On Jul 28 2007 10:50, Linus Torvalds wrote: >On Sat, 28 Jul 2007, Kasper Sandberg wrote: >> >> First off, i've personally run tests on many more machines than my own, >> i've had lots of people try on their machines, and i've seen totally >> unrelated posts to lkml, plus i've seen the experiences people are >> writing about on IRC. Frankly, im not just thinking of myself. > >Ok, good. Has anybody tried to figure out why 3D games seem to be such a >special case? Is it specific to 3D? I would not think so. dosbox, bochs should have the same issue. Games with "a lot of motion" usually implement their event handling and screen drawing in a busy loop to get the maximum possible frame rate. Usually, only the GL thread would need to run at full power, and the input subsystem could be reduced to a simple event-based loop (for example reading a pipe in blocking mode). This could IMO make games a bit more responsive. However, most games combine the input subsystem and graphics output in one thread. Due to the way CFS works, it may mean that processes get scheduled too fairly, though I'd suspect that a GL busy loop has no interactivity bonus at all anyway in the old scheduler or SD. I/O is also something that can hurt games in their framerate and/or handling (something the user cares most about). Since I have not tried 2.6.23-rc yet, I can only speak for the old scheduler. I have always turned cron off so that updatedb does not run, because it makes games sluggish for some reason, even though updatedb (or subordinate processes) doesn't take a lot of CPU time according to `top`. What's more, running BOINC in the background (nice 20) while running unreal (nice 0), everything is ok. (But not if BOINC is at nice 0). Time to investigate... Jan
RE: Volanomark slows by 80% under CFS
> > Volanomark runs better > > and is only 40% (instead of 80%) down from old scheduler > > without CFS. > 40 or 80 % is still a huge regression. > Dmitry Adamushko Can anyone explain precisely what Volanomark is doing? If it's something dumb like "looping on sched_yield until the 'right' thread runs and finishes what we're waiting for" then I think any regression can be ignored. This applies if and only if CFS' sched_yield behavior is sane and Volano's is insane. A sane sched_yield implementation must do two things: 1) Reward processes that actually do yield most of their CPU time to another process. 2) Make an effort to run every ready-to-run process at the same or higher static priority level before re-scheduling this process. (That won't always be possible due to SMP issues, but a reasonable effort is needed.) If CFS is doing these two things, and Volanomark is looping on sched_yield until the 'right thread' runs, then CFS is doing the right thing and Volanomark isn't. Volanomark deserves to lose. If CFS binds processes to processors more tightly and thus sched_yield can't yield to a process that was planned to run on another CPU in the future, that would be a legitimate complaint about CFS. DS
Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )
Andrew Morton wrote: > On Sat, 28 Jul 2007 17:44:45 +0200 Gabriel C <[EMAIL PROTECTED]> wrote: > >> Hi, >> >> I got this compile error with a randconfig ( >> http://194.231.229.228/MM/randconfig-auto-82.broken.netpoll.c ). >> >> ... >> >> net/core/netpoll.c: In function 'netpoll_poll': >> net/core/netpoll.c:155: error: 'struct net_device' has no member named >> 'poll_controller' >> net/core/netpoll.c:159: error: 'struct net_device' has no member named >> 'poll_controller' >> net/core/netpoll.c: In function 'netpoll_setup': >> net/core/netpoll.c:670: error: 'struct net_device' has no member named >> 'poll_controller' >> make[2]: *** [net/core/netpoll.o] Error 1 >> make[1]: *** [net/core] Error 2 >> make: *** [net] Error 2 >> make: *** Waiting for unfinished jobs >> >> ... >> >> >> I think is because KGDBOE selects just NETPOLL. >> > > Looks like it. > > Select went and selected NETPOLL and NETPOLL_TRAP but things like > CONFIG_NETDEVICES and CONFIG_NET_POLL_CONTROLLER remain unset. `select' > remains evil. > > Something like this.. > > --- a/lib/Kconfig.kgdb~kgdb-kconfig-fix > +++ a/lib/Kconfig.kgdb > @@ -175,8 +175,7 @@ endchoice > config KGDBOE > tristate "KGDB: On ethernet" if !KGDBOE_NOMODULE > depends on m && KGDB > - select NETPOLL > - select NETPOLL_TRAP > + depends on NETPOLL_TRAP && NET_POLL_CONTROLLER > help > Uses the NETPOLL API to communicate with the host GDB via UDP. > In order for this to work, the ethernet interface specified must > _ > > That doesn't fix it. With that patch an 'make oldconfig' all NETPOLL stuff gone and we end up with : ... 
drivers/built-in.o: In function `option_setup': /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:160: undefined reference to `netpoll_parse_options' drivers/built-in.o: In function `configure_kgdboe': /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:183: undefined reference to `netpoll_setup' /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:189: undefined reference to `netpoll_cleanup' drivers/built-in.o: In function `eth_post_exception_handler': /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:119: undefined reference to `netpoll_set_trap' drivers/built-in.o: In function `eth_pre_exception_handler': /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:111: undefined reference to `netpoll_set_trap' drivers/built-in.o: In function `eth_flush_buf': /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:138: undefined reference to `netpoll_send_udp' drivers/built-in.o: In function `eth_get_char': /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:127: undefined reference to `netpoll_poll' drivers/built-in.o: In function `cleanup_kgdboe': /work/crazy/linux-git/MM/linux-2.6.23-rc1/drivers/net/kgdboe.c:217: undefined reference to `netpoll_cleanup' make: *** [.tmp_vmlinux1] Error 1 ... If I get that right select is needed here because all NETPOLL{_*} depends on if NETDEVICES && if NET_ETHERNET. Also doing ... select NETPOLL_TRAP select NETPOLL select NET_POLL_CONTROLLER ... makes the driver happy and everything compiles fine. I think there may be a logical issue ( again if I got it right ). We need some ethernet card to work with kgdboe right ? but we don't have any if !NETDEVICES && !NET_ETHERNET. So maybe some ' depends on ... && NETDEVICES!=n && NET_ETHERNET!=n ' is needed too ? 
( really sorry if I said something stupid, these Kconfig depends are not really easy to figure out for me ) Gabriel
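Putting the two suggestions from the message above together, the KGDBOE entry might look roughly like the fragment below. This is an untested sketch: the three `select` lines are the ones reported above to compile, while the extra `depends on NETDEVICES && NET_ETHERNET` line is only the possibility raised at the end of the message, not a confirmed fix.

```
config KGDBOE
	tristate "KGDB: On ethernet" if !KGDBOE_NOMODULE
	depends on m && KGDB
	# assumption: only offer KGDBOE when a network device can exist
	depends on NETDEVICES && NET_ETHERNET
	select NETPOLL
	select NETPOLL_TRAP
	select NET_POLL_CONTROLLER
	help
	  Uses the NETPOLL API to communicate with the host GDB via UDP.
```

Since `select` forces a symbol on without checking that symbol's own dependencies, the explicit `depends on` line is what would keep this entry out of a NETDEVICES=n configuration.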
Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)
On Sat, 28 Jul 2007, Rafael J. Wysocki wrote: > > OK, I'll prepare a patch to introduce CONFIG_SUSPEND, but that will require > quite a bit of (compilation) testing on different architectures. Sure. I'm not too worried, the fallout should be of the trivial kind. Also, mind basing it on the (independent) cleanups that Adrian already sent out. This is all intertwined.. Linus
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, jos poortvliet wrote: > > Actually, the tag you were looking for was "" > http://osnews.com/permalink.php?news_id=18350_id=259044 > > Now I wonder. Apparently, one person complaining about SD was reason to keep > it out http://osnews.com/permalink.php?news_id=18350_id=258997 > > Will this first post stop CFS from entering the kernel? You seem to be not understanding the argument. It wasn't about "one person complaining". Of *course* people will complain. That always happens, and sometimes with totally bogus complaints (the most common being "I'm not used to it"). The problem was the reaction to complaints. Ingo got lots of complaints too. He was very responsive to them (which is not something surprising - he's been doing this a long time), and while some of the tangents he went off on were definitely bogus (the whole renicing thing), they were still useful as part of the discussion. And Ingo got other - totally unrelated - developers involved too, ie the group fairness logic came from Vatsa. And he ended up supporting not just scheduler people, but also talking to the block layer people about the scheduler timer usage as a fast clock for block requests etc. And you have to realize that to me, as the top-level maintainer and one who seldom actually does big coding things, but just ends up making sure that people work with others, and fix the problems that crop up, *that* kind of behaviour is much much MUCH more important than the code itself. Can you see that? Can you see how big of a difference those whole approaches make? > Now I'll try to be a bit more constructive. I hope your benevolent > dictatorship allows self reflection. Nobody is very good at self-reflection, I'm afraid. > Sure, the difference in behaviour (not in code) between SD and CFS is small, > and for me it doesn't matter. I'm fine with CFS in the kernel, it's a huge > improvement over the previous one. 
But why, while there was a seemingly good > alternative, did THAT one stay in that long? And this argument goes for more > code 'out there', btw. Actually, nobody pushed SD to me, and neither Con nor anybody else tried to get me to merge it until some time in March of this year, I think. Do you think I go trolling for code to merge? No. I actually _require_ that people send it to me, and that I also get the feeling that people are asking for it! In other words, my job is not to "merge code" (even though I sometimes describe it that way), my job is actually largely to "say no". You shouldn't see me as the person who goes out and tries to get everything together - quite the reverse. My job is to say "too late for the merge window", or "too experimental", or "you need to show numbers" or "are there going to be any _users_ for this"? > Some things get into the kernel, other don't. Some get in too soon, others > too late. Sure. But shouldn't we try to improve this process, instead of > saying 'it is what it is, get over it'? Umm. The absolute *last* thing we want to do is to merge earlier. In fact, one of our biggest problems is that people send half-cooked stuff to me (and even more so, to Andrew). So in this case, if you've been on the CK mailing list, ask yourself: why wasn't parts of it pushed up to the standard kernel? Asking "why didn't Linus take it earlier" is exactly the wrong thing to do, since nobody even _asked_ me to. I never _ever_ got a patch saying "please merge this". Seriously. (Btw, on that note: please don't send me patches saying "please merge this". I want more than just that. I want an explanation, and I want it to be in many small pieces, and I want to feel like it got tested and is likely to be an obvious improvement). So now look at what happened to CFS: - Ingo pushed it, and has been a maintainer of the area and shown himself over years to be able to work with others and react to reports of problems. 
- It was fairly obviously an improvement over the previous status quo (although I expect that there will be regressions - almost nothing is ever a _pure_ improvement, if it's in any way non-trivial) - Even so, I asked for (and got) a series of independent patches. Compare this to SD for a while. Ponder. Linus
Re: [2.6 patch] let SUSPEND select HOTPLUG_CPU
On Saturday, 28 July 2007 00:25, Adrian Bunk wrote: > On Thu, Jul 26, 2007 at 01:55:18PM -0700, Linus Torvalds wrote: > > > > > > On Thu, 26 Jul 2007, Rafael J. Wysocki wrote: > > > > > > My point is we have ACPI dependent on PM, so if you want ACPI, you end > > > up with all of the STR stuff built in, which is what you don't like (if I > > > understand that correctly). If we have CONFIG_SUSPEND, you'll be able to > > > choose ACPI alone. :-) > > > > Good point. > > > > Anyway, I think the ACPI problem really is as trivial as the following > > three-liner removal fix. If the user doesn't want suspend, ACPI shouldn't > > force it on him. > > > > A nicer fix might be to also make some of the ACPI helper routines depend > > on whether they are needed or not (which in turn will depend on whether > > suspend support has been compiled into the kernel), but quite frankly, > > that's secondary at least for me. > > > > So if we have a few ACPI routines that will never get called (because we > > don't even enable the interfaces that would *cause* them to be called), I > > don't think that's a huge problem. It's a beauty wart, but nobody really > > cares (and it's even something that we could get the compiler to optimize > > away for us if we really cared). > > > > Linus > > > > --- > > Don't force-enable suspend/hibernate support just for ACPI > > > > It's a totally independent decision for the user whether he wants > > suspend and/or hibernation support, and ACPI shouldn't care. 
> > > > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]> > > --- > > drivers/acpi/Kconfig |3 --- > > 1 files changed, 0 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig > > index 251344c..22b401b 100644 > > --- a/drivers/acpi/Kconfig > > +++ b/drivers/acpi/Kconfig > > @@ -11,9 +11,6 @@ menuconfig ACPI > > depends on PCI > > depends on PM > > select PNP > > - # for sleep > > - select HOTPLUG_CPU if X86 && SMP > > - select SUSPEND_SMP if X86 && SMP > > default y > > ---help--- > > Advanced Configuration and Power Interface (ACPI) support for > > The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive, so > what about something like the patch below? > > This should address a main issue behind Len's patch. > > cu > Adrian > > > <-- snip --> > > > An implementation detail of the suspend code that is not intuitive for > the user is the HOTPLUG_CPU dependency of SOFTWARE_SUSPEND if SMP. > > This patch changes SOFTWARE_SUSPEND if SMP to select HOTPLUG_CPU instead > of depending on it. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> > > --- > > kernel/power/Kconfig | 20 ++-- > 1 file changed, 14 insertions(+), 6 deletions(-) > > --- a/kernel/power/Kconfig > +++ b/kernel/power/Kconfig > @@ -72,9 +72,22 @@ config PM_TRACE > CAUTION: this option will cause your machine's real-time clock to be > set to an invalid time after a resume. > > +config SUSPEND_SMP_POSSIBLE > + bool > + depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC)) > + depends on SMP > + default y > + > +config SUSPEND_SMP > + bool > + depends on SUSPEND_SMP_POSSIBLE && SOFTWARE_SUSPEND > + select HOTPLUG_CPU > + default y That should not depend on SOFTWARE_SUSPEND (it's equivalent to HIBERNATION). Greetings, Rafael -- "Premature optimization is the root of all evil." 
- Donald Knuth
Re: CONFIG_SUSPEND? (was: Re: [GIT PATCH] ACPI patches for 2.6.23-rc1)
On Saturday, 28 July 2007 18:55, Linus Torvalds wrote: > > On Sat, 28 Jul 2007, Linus Torvalds wrote: > > > > And it's the *top*level* code that selects HOTPLUG_CPU. Through > > SUSPEND_SMP (which will select HOTPLUG_CPU) and SOFTWARE_SUSPEND. > > In other words, the problem seems to be that > > kernel/power/main.c: > suspend_devices_and_enter() > > does the proper "disable/enable_nonboot_cpus()", but it does so without > having enabled CPU hotplug. > > And you seem to think that it's ACPI that should enable the hotplug, even > though the code that actually needs it is _outside_ ACPI. And I think > that's wrong, and that this is a bug. > > So I think the real issue is that we allow that > "suspend_devices_and_enter()" code to be compiled without HOTPLUG_CPU in > the first place. It's not supposed to work that way. > > Of course, it may well be that other architectures can happily suspend > even with multiple CPU's active, which may be the cause of this mess. But > I really think it shouldn't be ACPI that has to select the CPU hotplug, > since it's not ACPI that _uses_ it in the first place. > > Rafael: making a config option for STR (the same way we have a config > option for hibernate), and just not allowing it on SMP without HOTPLUG_CPU > seems to be the right thing. Len is right in that we do insane things > right now (trying to STR with multiple CPU's still active), and I just > don't think he's the one that should work around it! Well, I agree and that's why I asked. :-) OK, I'll prepare a patch to introduce CONFIG_SUSPEND, but that will require quite a bit of (compilation) testing on different architectures. Greetings, Rafael -- "Premature optimization is the root of all evil." - Donald Knuth
Re: [PATCH] fix IDE legacy mode resource
Yoichi Yuasa wrote: Hi, I got the following error on MIPS Cobalt. MIPS Cobalt has the 0x1000 offset between resource and bus region. PCI: Unable to reserve I/O region #1:[EMAIL PROTECTED] for device :00:09.1 pata_via :00:09.1: failed to request/iomap BARs for port 0 (errno=-16) PCI: Unable to reserve I/O region #3:[EMAIL PROTECTED] for device :00:09.1 pata_via :00:09.1: failed to request/iomap BARs for port 1 (errno=-16) pata_via :00:09.1: no available native port At this point, these resources should be the bus regions. Signed-off-by: Yoichi Yuasa <[EMAIL PROTECTED]> I'm not sure I understand what's going on here... could you or someone provide additional explanation as to why this is a fix? Thanks, Jeff
Re: Linus 2.6.23-rc1
On Sat, 2007-07-28 at 10:50 -0700, Linus Torvalds wrote: > > On Sat, 28 Jul 2007, Kasper Sandberg wrote: > > > > First off, i've personally run tests on many more machines than my own, > > i've had lots of people try on their machines, and i've seen totally > > unrelated posts to lkml, plus i've seen the experiences people are > > writing about on IRC. Frankly, im not just thinking of myself. > > Ok, good. Has anybody tried to figure out why 3D games seem to be such a > special case? > > I know Ingo looked at it, and seemed to think that he found and fixed > something. But it sounds like it's worth a lot more discussion. > Yes, but the various patches i've received seem not to solve it, they simply changed the load at which CFS seemed to perform well. On irc there has been wild speculation as to whether it's the sched_yield() stuff in most 3d drivers, but my tests with stubbing it out, and altering behavior, have not changed anything. > > Okay, i wasnt going to ask, but ill do it anyway, did you even read the > > threads about SD? > > I don't _ever_ go on specialty mailing lists. I don't read -mm, and I > don't read the -fs mailing lists. I don't think they are interesting. > > And I tried to explain why: people who concentrate on one thing tend to > become this self-selecting group that never looks at anything else, and > then rejects outside input from people who hadn't become part of the "mind > meld". > > That's what I think I saw - I saw the reactions from where external people > were talking and cc'ing me. > > And yes, it's quite possible that I also got a very one-sided picture of > it. I'm not disputing that. Con was also ill for a rather critical period, > which was certainly not helping it all. > > > Con was extremely polite to everyone, and he did work > > with a multitude of people, you seem to be totally deadlocked into the > > ONE incident with a person that was unhappy with SD, simply for being a > > fair scheduler. 
> > Hey, maybe that one incident just ended up being a rather big portion of > what I saw. Too bad. That said, the end result (Con's public gripes about > other kernel developers) mostly reinforced my opinion that I did the right > choice. > > But maybe you can show a better side of it all. I don't think _any_ > scheduler is perfect, and almost all of the time, the RightAnswer(tm) ends > up being not "one or the other", but "somewhere in between". > > It's not like we've come to the end of the road: the baseline has just > improved. If you guys can show that SD actually is better at some loads, > without penalizing others, we can (and will) revisit this issue. well, as far as my tests show, the only real difference between SD and CFS in terms of performance is 3d: both will deliver basically the same FPS in a given application, but SD does it smoothly, which is the best way to explain it. What happens with CFS, as i experience it, is that it seems to allocate resources in bursts. > > So what you should take away from this is that: from what I saw over the > last couple of months, it really wasn't much of a decision. The difference > in how Ingo and Con reacted to peoples reports was pretty stark. And no, I > haven't followed the ck mailing list, and so yes, I obviously did get just > a part of the picture, but the part I got was pretty damn unambiguous. I really think you should try reading the SD and RSDL threads on lkml again; the only place where Con hasn't been extremely forthcoming was deep in the thread where Mike was unhappy with SD not giving X more priority than fairness dictates. > > But at the same time, no technical decision is ever written in stone. It's > all a balancing act. I've replaced the scheduler before, I'm 100% sure > we'll replace it again. Schedulers are actually not at all that important > in the end: they are a very very small detail in the kernel. 
> > Linus
Re: [ck] Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Jan Engelhardt wrote: > > You cannot please everybody in the scheduler question, that is clear, > then why not offer dedicated scheduling alternatives (plugsched comes to mind) > and let them choose what pleases them most, and handles their workload best? This is one approach, but it's actually one that I personally think is often the worst possible choice. Why? Because it ends up meaning that you never get the cross-pollination from different approaches (they stay separate "modes"), and it's also usually really bad for users in that it forces the user to make some particular choice that the user is usually not even aware of. So I personally think that it's much better to find a setup that works "well enough" for people, without having modal behaviour. People complain and gripe now, but what people seem to be missing is that it's a journey, not an end-of-the-line destination. We haven't had a single release kernel with the new scheduler yet, so the only people who have tried it are either (a) interested in schedulers in the first place (which I think is *not* a good subset, because they have very specific expectations of what is right and what is wrong, and they come into the whole thing with that mental baggage) (b) people who test -rc1 kernels (I love you guys, but sadly, you're not nearly as common as I'd like ;) so the fact is, we'll find out more information about where CFS falls down, and where it does well, and we'll be able to *fix* it and tweak it. In contrast, if you go for a modal approach, you tend to always fixate those two modes forever, and you'll never get something that works well: people have to switch modes when they switch workloads. [ This, btw, has nothing to do with schedulers per se. We have had these exact same issues in the memory management too - which is a lot more complex than scheduling, btw. 
The whole page replacement algorithm is something where you could easily have "specialized" algorithms in order to work really well under certain loads, but exactly as with scheduling, I will argue that it's a lot better to be "good across a wide swath of loads" than to try to be "perfect in one particular modal setup". ] This is also, btw, why I think that people who argue for splitting desktop kernels from server kernels are total morons, and only show that they don't know what the hell they are talking about. The fact is, the work we've done on server loads has improved the desktop experience _immensely_, with all the scalability work (or the work on large memory configurations, etc etc) that went on there, and that used to be totally irrelevant for the desktop. And btw, the same is very much true in reverse: a lot of the stuff that was done for desktop reasons (hotplug etc) has been a _huge_ boon for the server side, and while there were certainly issues that had to be resolved (the sysfs stuff so central to the hotplug model used tons of memory when you had ten thousand disks, and server people were sometimes really unhappy), a lot of the big improvements actually happen because something totally _unrelated_ needed them, and then it just turns out that it's good for the desktop too, even if it started out as a server thing or vice versa. This is why the whole "modal" mindset is stupid. It basically freezes a choice that shouldn't be frozen. It sets up an artificial barrier between two kinds of uses (whether they be about "server" vs "desktop" or "3D gaming" vs "audio processing", or anything else), and that frozen choice actually ends up being a barrier to development in the long run. So "modal" things are good for fixing behaviour in the short run. 
But they are a total disaster in the long run, and even in the short run they tend to have problems (simply because there will be cases that straddle the line, and show some of _both_ issues, and now *neither* mode is the right one) Linus
Re: [ck] Re: Linus 2.6.23-rc1
On Saturday 28 July 2007, Linus Torvalds wrote: > On Sat, 28 Jul 2007, Michael Chang wrote: > > I do recall there is one issue on which Con wouldn't budge -- anything > > that involved boosting certain kinds of processes in the kernel. > > I did that myself, so that's a non-issue. > > No. The complaints were about the CK scheduler not being as responsive > under load as even the _old_ scheduler was. I don't know why people ignore > this fact. It was a long thread back in March or April, and I'm pretty > sure the CK mailing list was cc'd. Of course it wasn't. The speed of tasks slows proportionally with the amount of system usage. That's the whole point, and CFS can't fix that either, can it? > Sure, most people don't actually have load-averages above ten etc, but > it's important to do those well _too_. > > Linus http://osnews.com/permalink.php?news_id=18350_id=259044 Now I wonder. Apparently, one person complaining about SD was reason to keep it out http://osnews.com/permalink.php?news_id=18350_id=258997 Will this first post stop CFS from entering the kernel? Now I'll try to be a bit more constructive. I hope your benevolent dictatorship allows self reflection. Sure, the difference in behaviour (not in code) between SD and CFS is small, and for me it doesn't matter. I'm fine with CFS in the kernel, it's a huge improvement over the previous one. But why, while there was a seemingly good alternative, did THAT one stay in that long? And this argument goes for more code 'out there', btw. Some things get into the kernel, other don't. Some get in too soon, others too late. Sure. But shouldn't we try to improve this process, instead of saying 'it is what it is, get over it'? For me, that's the purpose of this whole discussion. We're losing valuable code and contributors, yet at the same time code which isn't mature yet enters the kernel. Acknowledging there is a problem is the first step in solving it. 
Of course, I don't have answers - but I do feel strongly that you think there is no issue. Is there, or isn't there? And if there is, what do you plan to do about it? Your influence on the behaviour of the people around you, your 'lieutenants', is huge. Larger than you might think. And in many cases, ppl following someone behave more extreme. That's a big reason why the LKML isn't very polite nor inviting (mind you, I don't think that's necessarily a bad thing, that's up to you to decide). You might want to think about ways to improve the whole process. Again, I'm no Linus, it's your call. And you can make a big difference, I'm sure. Greetings, Jos
Re: [PATCH] Framebuffer: Fix 16bpp colour output in Dreamcast pvr2fb
On 28/07/07, Ondrej Zajicek <[EMAIL PROTECTED]> wrote: > On Sat, Jul 28, 2007 at 03:51:38PM +0100, Adrian McMenamin wrote: > > Tony, > > > > This patch - on top of your others - fixes the colour output for 16bpp > > RGB565 output in the Dreamcast - it was a simple out by one error in > > the bit shift. > > > @@ -330,27 +331,28 @@ static int pvr2fb_setcolreg(unsigned int regno, > > unsigned int red, > > case 16: /* RGB 565 */ > > tmp = (red & 0xf800) | > > ((green & 0xfc00) >> 5) | > > - ((blue & 0xf800) >> 11); > > + ((blue & 0xf800) >> 10); > > This mixes lsb of green with msb of blue. If you want RGB 565, > then >> 11 is correct. If you want RGB 555, green should > be anded with 0xf800. > You are, of course, quite right, which makes it all the more strange that it appeared to fix the problem. Back to the drawing board then.
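To make the overlap Ondrej describes concrete, here is a minimal standalone sketch of the two packings from the quoted hunk. It assumes 16-bit colour components, as the masks in the hunk suggest; the helper names pack_rgb565, pack_blue_shift10 and rgb565_selfcheck are made up for illustration and are not part of pvr2fb.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 16-bit colour components into RGB565, as in the original line:
 * red -> bits 15..11, green -> bits 10..5, blue -> bits 4..0.
 * Hypothetical helper for illustration, not the driver code. */
static inline uint16_t pack_rgb565(unsigned int red, unsigned int green,
				   unsigned int blue)
{
	return (uint16_t)((red & 0xf800) |
			  ((green & 0xfc00) >> 5) |
			  ((blue & 0xf800) >> 11));
}

/* Same packing with the proposed ">> 10" on blue, kept for comparison. */
static inline uint16_t pack_blue_shift10(unsigned int red, unsigned int green,
					 unsigned int blue)
{
	return (uint16_t)((red & 0xf800) |
			  ((green & 0xfc00) >> 5) |
			  ((blue & 0xf800) >> 10));
}

/* Pure blue input makes the difference visible. */
static inline void rgb565_selfcheck(void)
{
	/* ">> 11": blue fills bits 4..0 exactly. */
	assert(pack_rgb565(0, 0, 0xffff) == 0x001f);
	/* ">> 10": blue lands in bits 5..1, so bit 0 is lost ... */
	assert(pack_blue_shift10(0, 0, 0xffff) == 0x003e);
	/* ... and bit 5, the lsb of the green field (mask 0x07e0), is set. */
	assert((pack_blue_shift10(0, 0, 0xffff) & 0x07e0) == 0x0020);
}
```

With ">> 10" a pure blue input sets the lowest green bit and drops the lowest blue bit, which matches the "mixes lsb of green with msb of blue" remark. This only illustrates the bit layout; it says nothing about why the change appeared to help on real hardware.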
2.6.23-rc1, KVM-AMD problem
Hi, I'm getting periodic oopses running KVM-33 on 2.6.23-rc1. Here is a digital photo of the oops. Alarmingly, a lot of the time it triple faults the machine and I don't get a chance to grab it. This time I was lucky, though. http://devzero.co.uk/~alistair/kvm-2.6.23-rc1.jpg Unfortunately, some of the oops text scrolled out of the screen. I will endeavour to reproduce the bug over serial console, but I can make no guarantees. The CPU is an AMD X2 BE-2350, chipset is AMD 690G. -- Cheers, Alistair. 137/1 Warrender Park Road, Edinburgh, UK.
Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Kasper Sandberg wrote: > > First off, i've personally run tests on many more machines than my own, > i've had lots of people try on their machines, and i've seen totally > unrelated posts to lkml, plus i've seen the experiences people are > writing about on IRC. Frankly, im not just thinking of myself. Ok, good. Has anybody tried to figure out why 3D games seem to be such a special case? I know Ingo looked at it, and seemed to think that he found and fixed something. But it sounds like it's worth a lot more discussion. > Okay, i wasnt going to ask, but ill do it anyway, did you even read the > threads about SD? I don't _ever_ go on specialty mailing lists. I don't read -mm, and I don't read the -fs mailing lists. I don't think they are interesting. And I tried to explain why: people who concentrate on one thing tend to become this self-selecting group that never looks at anything else, and then rejects outside input from people who hadn't become part of the "mind meld". That's what I think I saw - I saw the reactions from where external people were talking and cc'ing me. And yes, it's quite possible that I also got a very one-sided picture of it. I'm not disputing that. Con was also ill for a rather critical period, which was certainly not helping it all. > Con was extremely polite to everyone, and he did work > with a multitude of people, you seem to be totally deadlocked into the > ONE incident with a person that was unhappy with SD, simply for being a > fair scheduler. Hey, maybe that one incident just ended up being a rather big portion of what I saw. Too bad. That said, the end result (Con's public gripes about other kernel developers) mostly reinforced my opinion that I did the right choice. But maybe you can show a better side of it all. I don't think _any_ scheduler is perfect, and almost all of the time, the RightAnswer(tm) ends up being not "one or the other", but "somewhere in between". 
It's not like we've come to the end of the road: the baseline has just improved. If you guys can show that SD actually is better at some loads, without penalizing others, we can (and will) revisit this issue. So what you should take away from this is that: from what I saw over the last couple of months, it really wasn't much of a decision. The difference in how Ingo and Con reacted to people's reports was pretty stark. And no, I haven't followed the ck mailing list, and so yes, I obviously did get just a part of the picture, but the part I got was pretty damn unambiguous. But at the same time, no technical decision is ever written in stone. It's all a balancing act. I've replaced the scheduler before, I'm 100% sure we'll replace it again. Schedulers are actually not at all that important in the end: they are a very very small detail in the kernel. Linus