Re: [PATCH][RFC][POWERPC] i2c: adds support for i2c bus on 8xx
On Fri, Apr 20, 2007 at 08:27:14AM +0400, Vitaly Bordug wrote:

diff --git a/arch/powerpc/platforms/8xx/mpc885ads_setup.c b/arch/powerpc/platforms/8xx/mpc885ads_setup.c
index 9bd81c7..d32e066 100644
--- a/arch/powerpc/platforms/8xx/mpc885ads_setup.c
+++ b/arch/powerpc/platforms/8xx/mpc885ads_setup.c
@@ -51,6 +51,7 @@ static void init_smc1_uart_ioports(struc
 static void init_smc2_uart_ioports(struct fs_uart_platform_info* fpi);
 static void init_scc3_ioports(struct fs_platform_info* ptr);
 static void init_irda_ioports(void);
+static void init_i2c_ioports(void);
 
 void __init mpc885ads_board_setup(void)
 {
@@ -120,6 +121,10 @@ #endif
 #ifdef CONFIG_8XX_SIR
 	init_irda_ioports();
 #endif
+
+#ifdef CONFIG_I2C_RPXLITE
+	init_i2c_ioports();
+#endif

Does it hurt to always do it, even when the driver is not enabled? That'd do away with an ifdef.

Also, if you move the static function up, you don't need a prototype. That goes for other stuff in this file too.

 }
 
@@ -361,6 +366,15 @@ static void init_irda_ioports()
 	immr_unmap(cp);
 }
 
+static void init_i2c_ioports()
+{
+	cpm8xx_t *cp = (cpm8xx_t *)immr_map(im_cpm);
+
+	setbits32(&cp->cp_pbpar, 0x0030);
+	setbits32(&cp->cp_pbdir, 0x0030);
+	setbits16(&cp->cp_pbodr, 0x0030);
+}

Looks like you moved this out of the driver and into the platform code. What happens to other platforms where it's used?

+
 int platform_device_skip(const char *model, int id)
 {
 #ifdef CONFIG_MPC8xx_SECOND_ETH_SCC3
diff --git a/arch/powerpc/sysdev/fsl_soc.c b/arch/powerpc/sysdev/fsl_soc.c
index 419b688..7ecd537 100644
--- a/arch/powerpc/sysdev/fsl_soc.c
+++ b/arch/powerpc/sysdev/fsl_soc.c
@@ -331,7 +331,7 @@ static int __init fsl_i2c_of_init(void)
 	for (np = NULL, i = 0;
 	     (np = of_find_compatible_node(np, "i2c", "fsl-i2c")) != NULL;
 	     i++) {
-		struct resource r[2];
+		struct resource r[3];

Why? No code that uses it has been changed. Is it a bugfix?

 		struct fsl_i2c_platform_data i2c_data;
 		const unsigned char *flags = NULL;
 
@@ -1215,4 +1215,63 @@ err:
 
 arch_initcall(fs_irda_of_init);
 
+static const char *i2c_regs = "regs";
+static const char *i2c_pram = "pram";
+static const char *i2c_irq = "interrupt";
+
+static int __init fsl_i2c_cpm_of_init(void)
+{
+	struct device_node *np;
+	unsigned int i;
+	struct platform_device *i2c_dev;
+	int ret;
+
+	for (np = NULL, i = 0;
+	     (np = of_find_compatible_node(np, "i2c", "fsl-i2c-cpm")) != NULL;
+	     i++) {
+		struct resource r[3];
+		struct fsl_i2c_platform_data i2c_data;
+
+		memset(&r, 0, sizeof(r));
+		memset(&i2c_data, 0, sizeof(i2c_data));
+
+		ret = of_address_to_resource(np, 0, &r[0]);
+		if (ret)
+			goto err;
+		r[0].name = i2c_regs;
+
+		ret = of_address_to_resource(np, 1, &r[1]);
+		if (ret)
+			goto err;
+		r[1].name = i2c_pram;
+
+		r[2].start = r[2].end = irq_of_parse_and_map(np, 0);
+		r[2].flags = IORESOURCE_IRQ;
+		r[2].name = i2c_irq;
+
+		i2c_dev = platform_device_register_simple("fsl-i2c-cpm", i, &r[0], 3);
+		if (IS_ERR(i2c_dev)) {
+			ret = PTR_ERR(i2c_dev);
+			goto err;
+		}
+
+		ret =
+		    platform_device_add_data(i2c_dev, &i2c_data,
+					     sizeof(struct
+						    fsl_i2c_platform_data));
+		if (ret)
+			goto unreg;
+	}
+
+	return 0;
+
+unreg:
+	platform_device_unregister(i2c_dev);
+err:
+	return ret;
+}
+
+arch_initcall(fsl_i2c_cpm_of_init);

This could all be done with an of_platform driver instead, and avoid the above. (Someone else already suggested that, I believe.)

 #endif /* CONFIG_8xx */
diff --git a/drivers/i2c/algos/Kconfig b/drivers/i2c/algos/Kconfig
index 5889907..7d7fb87 100644
--- a/drivers/i2c/algos/Kconfig
+++ b/drivers/i2c/algos/Kconfig
@@ -37,6 +37,8 @@ config I2C_ALGOPCA
 config I2C_ALGO8XX
 	tristate "MPC8xx CPM I2C interface"
 	depends on 8xx
+	help
+	  8xx I2C Algorithm
 
 config I2C_ALGO_SGI
 	tristate "I2C SGI interfaces"
diff --git a/drivers/i2c/algos/Makefile b/drivers/i2c/algos/Makefile
index cac1051..1bd3b37 100644
--- a/drivers/i2c/algos/Makefile
+++ b/drivers/i2c/algos/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_I2C_ALGOBIT) += i2c-algo-bi
 obj-$(CONFIG_I2C_ALGOPCF)	+= i2c-algo-pcf.o
 obj-$(CONFIG_I2C_ALGOPCA)	+= i2c-algo-pca.o
 obj-$(CONFIG_I2C_ALGO_SGI)	+= i2c-algo-sgi.o
+obj-$(CONFIG_I2C_ALGO8XX)	+= i2c-algo-8xx.o
 
 ifeq ($(CONFIG_I2C_DEBUG_ALGO),y)
 EXTRA_CFLAGS += -DDEBUG
diff --git
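The two structural review points (define the static helper before its caller so no prototype is needed, and call it without an #ifdef) can be sketched in plain userspace C. This is only an illustration: `struct toy_cpm` and the `toy_*` functions are hypothetical stand-ins for the CPM port B registers and the kernel's setbits helpers, and whether the unconditional call is actually harmless on real hardware is exactly the question the review leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three port B registers touched by the patch. */
struct toy_cpm {
    uint32_t pbpar;
    uint32_t pbdir;
    uint16_t pbodr;
};

/* Read-modify-write OR, modelled on ordinary memory instead of MMIO. */
static void toy_setbits32(uint32_t *reg, uint32_t mask) { *reg |= mask; }
static void toy_setbits16(uint16_t *reg, uint16_t mask) { *reg |= mask; }

/* Defined before its only caller, so no forward prototype is required. */
static void toy_init_i2c_ioports(struct toy_cpm *cp)
{
    assert(cp != NULL);
    toy_setbits32(&cp->pbpar, 0x0030);
    toy_setbits32(&cp->pbdir, 0x0030);
    toy_setbits16(&cp->pbodr, 0x0030);
}

/* The call site carries no #ifdef: the pins are set up unconditionally. */
static void toy_board_setup(struct toy_cpm *cp)
{
    toy_init_i2c_ioports(cp);
}
```

Because the helpers only OR bits in, any bits already set in a register survive the call.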
Re: [patch] CFS scheduler, -v5
On Sunday 22 April 2007, Nick Piggin wrote:

On Mon, Apr 23, 2007 at 03:12:29AM +0200, Ingo Molnar wrote:

i'm pleased to announce release -v5 of the CFS scheduler patchset. The patch against v2.6.21-rc7 and v2.6.20.7 can be downloaded from:

http://redhat.com/~mingo/cfs-scheduler/

this CFS release mainly fixes regressions and improves interactivity:

13 files changed, 211 insertions(+), 199 deletions(-)

the biggest user-visible change in -v5 are various interactivity improvements (especially under higher load) to fix reported regressions, and an improved way of handling nice levels. There's also a new sys_sched_yield_to() syscall implementation for i686 and x86_64.

All known regressions have been fixed. (knock on wood)

I think the granularity is still much too low. Why not increase it to something more reasonable as a default?

I haven't approached that yet, but I just noticed, having been booted to this for all of 5 minutes, that although I told it not to renice x when my script ran 'make oldconfig', and I answered n, but there it is, sitting at -19 according to htop.

The .config says otherwise:

[EMAIL PROTECTED] linux-2.6.21-rc7-CFS-v5]# grep RENICE .config
# CONFIG_RENICE_X is not set

So v5 reniced X in spite of the 'no' setting.

Although I hadn't noticed it, one way or the other, I just set it (X) back to the default -1 so that I'm comparing the same apples when I do compare.

--
Cheers, Gene
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Fortune finishes the great quotations, #2
If at first you don't succeed, think how many people you've made happy.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[report] renicing X, cfs-v5 vs sd-0.46
* Linus Torvalds [EMAIL PROTECTED] wrote:

The X server should not be re-niced. It was done in the past, and it was wrong then (and caused problems - we had to tell people to undo it, because some distros had started doing it by default).

If you have a single client, the X server is *not* more important than the client, and indeed, renicing the X server causes bad patterns: just because the client sends a request does not mean that the X server should immediately be given the CPU as being more important.

You are completely right in the case of traditional schedulers.

Note that this is not the case for CFS though. CFS has natural, built-in buffering against high-rate preemptions from lower nice-level SCHED_OTHER tasks. So while X will indeed get more CPU time (and that i think is fully justified), it won't get nearly as high of a context-switch rate as under priority/runqueue-based schedulers.

To demonstrate this i have done the following simple experiment: i started 4 xterms on a single-CPU box, then i started the 'yes' utility in each xterm and resized all of the xterms to just 2 lines vertical. This generates a _lot_ of screen refresh events. Naturally, such a workload utilizes the whole CPU.

Using CFS-v5, with Xorg at nice 0, the context-switch rate is low:

procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa st
 2  0      0 472132  13712 178604    0    0     0    32  113    170 83 17  0  0  0
 2  0      0 472172  13712 178604    0    0     0     0  112    184 85 15  0  0  0
 2  0      0 472196  13712 178604    0    0     0     0  108    162 83 17  0  0  0
 1  0      0 472076  13712 178604    0    0     0     0  115    189 86 14  0  0  0

X's CPU utilization is 49%, the xterms go to 12% each. Userspace utilization is 85%, system utilization is 15%.

Renicing X to -10 increases context-switching, but not dramatically so, because it is throttled by CFS:

procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa st
 4  0      0 475752  13492 176320    0    0     0    64  116   1498 85 15  0  0  0
 4  0      0 475752  13492 176320    0    0     0     0  107   1488 84 16  0  0  0
 4  0      0 475752  13492 176320    0    0     0     0  140   1514 86 14  0  0  0
 4  0      0 475752  13492 176320    0    0     0     0  107   1477 85 15  0  0  0
 4  0      0 475752  13492 176320    0    0     0     0  122   1498 84 16  0  0  0

The system is still usable, Xorg is 44% busy, each xterm is 14% busy. User utilization is 85%, system utilization is 15% - just like in the first case.

Performance of scrolling is exactly the same in both cases (i have tested this by inserting periodic beeps after every 10,000 lines of text scrolled) - but the screen refresh rate is a lot more eye-pleasing in the nice -10 case. (screen refresh happens at ~500 Hz, while in the nice 0 case it happens at ~40 Hz and visibly flickers. This is especially noticeable if the xterms have full size.)

I have tested the same workload on vanilla v2.6.21-rc7 and on SD-0.46 too, and they give roughly the same xterm scheduling behavior when Xorg is at nice 0:

procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa st
 4  0      0 450564  14844 194976    0    0     0     0  287    594 58 10 32  0  0
 4  0      0 450704  14844 194976    0    0     0     0  108    370 89 11  0  0  0
 0  0      0 449588  14844 194976    0    0     0     0  175    434 85 13  2  0  0
 3  0      0 450688  14852 194976    0    0     0    32  242    315 62  9 29  0  0

but when Xorg is reniced to -10 on the vanilla or SD schedulers, it indeed gives the markedly higher context-switching behavior you predicted:

procs -----------memory---------- ---swap-- -----io---- --system--- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa st
 5  0      0 452272  13936 194896    0    0     0     0  126  14147 78 22  0  0  0
 4  0      0 452252  13944 194896    0    0     0    64  155  14143 80 20  0  0  0
 5  0      0 452612  13944 194896    0    0     0     0  187  14031 79 21  0  0  0
 4  0      0 452624  13944 194896    0    0     0     0  121  14300 82 18  0  0  0

User time drops to 78%, system time increases to 22%. Scrolling performance clearly decreases.

so i agree that renicing X can be a very bad idea, but it very much depends on the scheduler implementation too.

Ingo
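To put numbers on the comparison, the "cs" (context switches per second) samples from the four vmstat runs above can be averaged. The sketch below is not part of the original report; the array and function names are made up for illustration, and only the sample values come from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of a series of vmstat "cs" samples. */
static double mean_cs(const unsigned *samples, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* "cs" columns copied from the four vmstat listings in the report. */
static const unsigned cfs_nice_0[]       = { 170, 184, 162, 189 };
static const unsigned cfs_nice_m10[]     = { 1498, 1488, 1514, 1477, 1498 };
static const unsigned vanilla_nice_0[]   = { 594, 370, 434, 315 };
static const unsigned vanilla_nice_m10[] = { 14147, 14143, 14031, 14300 };
```

On these samples the means are 176.25 (CFS, nice 0), 1495 (CFS, nice -10), 428.25 (vanilla/SD, nice 0) and 14155.25 (vanilla/SD, nice -10), so with X at nice -10 the older schedulers switch contexts roughly 9.5 times as often as CFS-v5 in this test.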
Re: [patch] CFS scheduler, -v5
* Nick Piggin [EMAIL PROTECTED] wrote:

the biggest user-visible change in -v5 are various interactivity improvements (especially under higher load) to fix reported regressions, and an improved way of handling nice levels. There's also a new sys_sched_yield_to() syscall implementation for i686 and x86_64.

All known regressions have been fixed. (knock on wood)

I think the granularity is still much too low. Why not increase it to something more reasonable as a default?

note that CFS's granularity value is not directly comparable to timeslice length:

[ Note: while CFS's default preemption granularity is currently set to 5 msecs, this value does not directly transform into timeslices: for example two CPU-intense tasks will have effective timeslices of 10 msecs with this setting. ]

also, i just checked SD: 0.46 defaults to 8 msecs rr_interval (on 1 CPU systems), which is lower than the 10 msecs effective timeslice length CFS-v5 achieves on two CPU-bound tasks.

(in -v6 i'll scale the granularity up a bit with the number of CPUs, like SD does. That should get the right result on larger SMP boxes too.)

while i agree it's a tad too fine-grained still, I agree with Con's choice: rather err on the side of being too fine-grained and lose some small amount of throughput on cache-intense workloads like compile jobs, than err on the side of being visibly too choppy for users on the desktop.

Ingo
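The 5 msec granularity versus 10 msec effective timeslice point can be reproduced with a toy model. The preemption rule used here is an assumption made for illustration (the running task is switched out once it has received at least one granularity more CPU time than the other task); it is not taken from the CFS-v5 source, and `steady_slice_ms` is a made-up name.

```c
#include <assert.h>

/*
 * Toy model, not scheduler code: two always-runnable tasks, 1 ms ticks.
 * Assumed rule: preempt the running task once its accumulated runtime
 * leads the other task's by at least gran_ms.
 * Returns the length in ms of the last completed slice.
 */
static int steady_slice_ms(int gran_ms, int total_ms)
{
    long runtime[2] = { 0, 0 };
    int cur = 0, slice = 0, last_slice = 0;

    assert(gran_ms > 0 && total_ms > 0);
    for (int t = 0; t < total_ms; t++) {
        runtime[cur]++;
        slice++;
        if (runtime[cur] - runtime[1 - cur] >= gran_ms) {
            last_slice = slice;   /* slice finished, switch tasks */
            slice = 0;
            cur = 1 - cur;
        }
    }
    return last_slice;
}
```

Under this rule a task that has just been switched in starts one granularity behind and must get one granularity ahead before it is preempted, so in steady state each slice is twice the granularity: `steady_slice_ms(5, 1000)` gives 10, matching the figure quoted for two CPU-intense tasks.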
Re: [patch] CFS scheduler, -v5
* Gene Heskett [EMAIL PROTECTED] wrote:

I haven't approached that yet, but I just noticed, having been booted to this for all of 5 minutes, that although I told it not to renice x when my script ran 'make oldconfig', and I answered n, but there it is, sitting at -19 according to htop.

The .config says otherwise:

[EMAIL PROTECTED] linux-2.6.21-rc7-CFS-v5]# grep RENICE .config
# CONFIG_RENICE_X is not set

So v5 reniced X in spite of the 'no' setting.

Hmm, apparently your X uses ioperm() while mine uses iopl(), and i only turned off the renicing for iopl. (I fixed this in my tree and it will show up in -v6.)

Although I hadn't noticed it, one way or the other, I just set it (X) back to the default -1 so that I'm comparing the same apples when I do compare.

note that CFS handles negative nice levels differently from other schedulers, so the disadvantages of aggressively reniced X (lost throughput due to overscheduling, worse interactivity) do _not_ apply to CFS.

I think the 'fair' setting would be whatever the scheduler writer recommends: for SD, X probably performs better at around nice 0 (i'll let Con correct me if his experience is different). On CFS, nice -10 is perfectly fine too, and you'll have a zippier desktop under higher loads. (on servers this might be unnecessary/disadvantageous so there this can be turned off.)

(also, in my tree i've changed the default from -19 to -10 to make it less scary to people and to leave more levels to the sysadmin, this change too will show up in -v6.)

Ingo
[PATCH] kthread: Spontaneous exit support
This patch implements the kthread helper functions kthread_start and kthread_end which make it simple to support a kernel thread that may decide to exit on its own before we request it to. It is still assumed that eventually we will get around to requesting that the kernel thread stop.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/kthread.h | 23 +++
 kernel/kthread.c        | 18 ++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index a8ea31d..4f1eff1 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -28,6 +28,29 @@ struct task_struct *kthread_create(int (*threadfn)(void *data),
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 int kthread_stop(struct task_struct *k);
 
+/**
+ * kthread_start - create and wake a thread.
+ * @threadfn: the function to run until kthread_should_stop().
+ * @data: data ptr for @threadfn.
+ * @namefmt: printf-style name for the thread.
+ *
+ * Description: Convenient wrapper for kthread_create() followed by
+ * get_task_struct() and wake_up_process. kthread_start should be paired
+ * with kthread_end() so we don't leak task structs.
+ *
+ * Returns the kthread or ERR_PTR(-ENOMEM).
+ */
+#define kthread_start(threadfn, data, namefmt, ...)			\
+({									\
+	struct task_struct *__k						\
+		= kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
+	if (!IS_ERR(__k)) {						\
+		get_task_struct(__k);					\
+		wake_up_process(__k);					\
+	}								\
+	__k;								\
+})
+int kthread_end(struct task_struct *k);
 
 static inline int __kthread_should_stop(struct task_struct *tsk)
 {
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 9b3c19f..d6d63c6 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -179,6 +179,24 @@ int kthread_stop(struct task_struct *tsk)
 }
 EXPORT_SYMBOL(kthread_stop);
 
+/**
+ * kthread_end - signal a kthread and wait for it to exit.
+ * @task: The kthread to end.
+ *
+ * Description: Convenient wrapper for kthread_stop() followed by
+ * put_task_struct(). Returns the kthread exit code.
+ *
+ * kthread_start()/kthread_end() can handle kthread that spontaneously exit
+ * before the kthread is requested to terminate.
+ */
+int kthread_end(struct task_struct *task)
+{
+	int ret;
+	ret = kthread_stop(task);
+	put_task_struct(task);
+	return ret;
+}
+EXPORT_SYMBOL(kthread_end);
 
 static __init void kthreadd_setup(void)
 {
--
1.5.0.g53756
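Why the extra get_task_struct() matters can be shown with a deliberately simplified userspace model of the reference counting. Everything below is hypothetical (`struct toy_task` and the `toy_*` functions are not kernel APIs), and real task_struct lifetime rules are more involved; the sketch only shows that a reference taken at start keeps the object valid until the matching end call, even if the thread has already exited by itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task_struct with a usage count. */
struct toy_task {
    int refs;        /* number of outstanding references */
    bool freed;      /* set when the last reference is dropped */
    int exit_code;   /* value the thread function returned */
};

static void toy_get(struct toy_task *t)
{
    assert(!t->freed);
    t->refs++;
}

static void toy_put(struct toy_task *t)
{
    assert(!t->freed && t->refs > 0);
    if (--t->refs == 0)
        t->freed = true;
}

/* Models kthread_start(): creation holds one reference, the caller takes another. */
static void toy_start(struct toy_task *t)
{
    t->refs = 1;
    t->freed = false;
    t->exit_code = 0;
    toy_get(t);
}

/* Models the thread returning on its own: the creation reference is dropped. */
static void toy_thread_exits(struct toy_task *t, int code)
{
    t->exit_code = code;
    toy_put(t);
}

/* Models kthread_end(): collect the exit code, then drop the caller's reference. */
static int toy_end(struct toy_task *t)
{
    int ret;

    assert(!t->freed);   /* still valid thanks to the reference from toy_start() */
    ret = t->exit_code;
    toy_put(t);
    return ret;
}
```

In this model, without the `toy_get()` in `toy_start()` the spontaneous exit would drop the only reference and `toy_end()` would be handed a freed object, which is the situation the start/end pairing is meant to rule out.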
SATA errors/messages after upgrade to 2.6.20.7
It is a Samsung HD501LJ SATA drive connected to a 631xESB/632xESB controller. Reading and writing every block of the drive does not generate any other errors/failures. This is observed in 2.6.20.7 like clockwork on any badblocks -v run or rebuild of an MD raid1 array onto the disk. It, however, was not observed on 2.6.18 in 182 badblocks -v runs followed by a rebuild of the MD raid1 array. Any idea what it might be?

Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4.00: (irq_stat 0x4008)
Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4.00: cmd 60/80:00:14:16:c4/00:00:05:00:00/40 tag 0 cdb 0x0 data 65536 in
Apr 23 14:45:34 stdsrv-x86-64bit kernel:          res 51/40:00:40:16:c4/6f:00:05:00:00/40 Emask 0x9 (media error)
Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4.00: configured for UDMA/133
Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4: EH complete
Apr 23 14:45:37 stdsrv-x86-64bit kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 23 14:45:37 stdsrv-x86-64bit kernel: ata4.00: (irq_stat 0x4008)
Apr 23 14:45:37 stdsrv-x86-64bit kernel: ata4.00: cmd 60/80:00:14:16:c4/00:00:05:00:00/40 tag 0 cdb 0x0 data 65536 in
Apr 23 14:45:37 stdsrv-x86-64bit kernel:          res 51/40:00:40:16:c4/6f:00:05:00:00/40 Emask 0x9 (media error)
Apr 23 14:45:37 stdsrv-x86-64bit kernel: ata4.00: configured for UDMA/133
Apr 23 14:45:37 stdsrv-x86-64bit kernel: ata4: EH complete
Apr 23 14:45:40 stdsrv-x86-64bit kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 23 14:45:49 stdsrv-x86-64bit kernel: ata4.00: (irq_stat 0x4008)
Apr 23 14:45:49 stdsrv-x86-64bit kernel: ata4.00: cmd 60/80:00:14:16:c4/00:00:05:00:00/40 tag 0 cdb 0x0 data 65536 in
Apr 23 14:45:49 stdsrv-x86-64bit kernel:          res 51/40:00:40:16:c4/6f:00:05:00:00/40 Emask 0x9 (media error)
Apr 23 14:45:49 stdsrv-x86-64bit kernel: ata4.00: configured for UDMA/133
Apr 23 14:45:50 stdsrv-x86-64bit kernel: ata4: EH complete
Apr 23 14:45:50 stdsrv-x86-64bit kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 23 14:45:50 stdsrv-x86-64bit kernel: ata4.00: (irq_stat 0x4008)
Apr 23 14:45:50 stdsrv-x86-64bit kernel: ata4.00: cmd 60/80:00:14:16:c4/00:00:05:00:00/40 tag 0 cdb 0x0 data 65536 in
Apr 23 14:45:51 stdsrv-x86-64bit kernel:          res 51/40:00:40:16:c4/6f:00:05:00:00/40 Emask 0x9 (media error)
Apr 23 14:45:51 stdsrv-x86-64bit kernel: ata4.00: configured for UDMA/133
Apr 23 14:45:51 stdsrv-x86-64bit kernel: ata4: EH complete
Apr 23 14:45:51 stdsrv-x86-64bit kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 23 14:45:52 stdsrv-x86-64bit kernel: ata4.00: (irq_stat 0x4008)
Apr 23 14:45:52 stdsrv-x86-64bit kernel: ata4.00: cmd 60/80:00:14:16:c4/00:00:05:00:00/40 tag 0 cdb 0x0 data 65536 in
Apr 23 14:45:52 stdsrv-x86-64bit kernel:          res 51/40:00:40:16:c4/6f:00:05:00:00/40 Emask 0x9 (media error)
Apr 23 14:45:52 stdsrv-x86-64bit kernel: ata4.00: configured for UDMA/133
Apr 23 14:45:52 stdsrv-x86-64bit kernel: ata4: EH complete
Apr 23 14:45:52 stdsrv-x86-64bit kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 23 14:45:53 stdsrv-x86-64bit kernel: ata4.00: (irq_stat 0x4008)
Apr 23 14:45:53 stdsrv-x86-64bit kernel: ata4.00: cmd 60/80:00:14:16:c4/00:00:05:00:00/40 tag 0 cdb 0x0 data 65536 in
Apr 23 14:45:53 stdsrv-x86-64bit kernel:          res 51/40:00:40:16:c4/6f:00:05:00:00/40 Emask 0x9 (media error)
Apr 23 14:45:54 stdsrv-x86-64bit kernel: ata4.00: configured for UDMA/133
Apr 23 14:45:54 stdsrv-x86-64bit kernel: ata4: EH complete
Apr 23 14:45:54 stdsrv-x86-64bit kernel: SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
Apr 23 14:45:54 stdsrv-x86-64bit kernel: sdd: Write Protect is off
Apr 23 14:45:54 stdsrv-x86-64bit kernel: SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 23 14:45:54 stdsrv-x86-64bit kernel: SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
Apr 23 14:45:55 stdsrv-x86-64bit kernel: sdd: Write Protect is off
Apr 23 14:45:55 stdsrv-x86-64bit kernel: SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
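For anyone trying to locate the sector behind the repeated "media error", the cmd and res lines can be decoded by hand. The sketch below assumes the usual libata layout of those lines (command or status, feature or error, nsect, lbal, lbam, lbah, then the five high-order "hob" bytes, then device) and assumes that for command 0x60 (READ FPDMA QUEUED) the sector count travels in the feature bytes. Both are stated from memory rather than from this thread, so treat them as assumptions; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 48-bit LBA from the six LBA bytes of a taskfile. */
static uint64_t tf_lba48(uint8_t lbal, uint8_t lbam, uint8_t lbah,
                         uint8_t hob_lbal, uint8_t hob_lbam, uint8_t hob_lbah)
{
    return (uint64_t)lbal |
           ((uint64_t)lbam << 8) |
           ((uint64_t)lbah << 16) |
           ((uint64_t)hob_lbal << 24) |
           ((uint64_t)hob_lbam << 32) |
           ((uint64_t)hob_lbah << 40);
}

/* Sector count of an NCQ read, assumed to be carried in the feature bytes. */
static unsigned ncq_sector_count(uint8_t feature, uint8_t hob_feature)
{
    return (unsigned)feature | ((unsigned)hob_feature << 8);
}

/* Offset of the sector reported in "res" relative to the start of the request. */
static uint64_t failed_offset(uint64_t start_lba, uint64_t failed_lba)
{
    assert(failed_lba >= start_lba);
    return failed_lba - start_lba;
}
```

With the bytes from the log, the request starts at LBA 0x05c41614 (96736788) and covers 0x80 = 128 sectors, which is consistent with the "data 65536 in" field. The result taskfile points at LBA 0x05c41640 (96736832), 44 sectors into the request, with the error register at 0x40. Under these assumptions the drive keeps reporting the same sector as unreadable on every retry.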
Re: [RFC][PATCH -mm 2/3] freezer: Introduce freezer_flags
On 4/23/07, Paul Jackson [EMAIL PROTECTED] wrote:

One more question - why would I want to do this?

Check out the FAQ in Documentation/power/swsusp.txt.

Is this like something that would be useful on a laptop, to suspend activity and reduce battery drain, while preserving the current state of one's sessions and avoiding having to logout or shutdown?

Yes, the original purpose for the inclusion of the freezer code was to support suspend-resume (mainly for laptops, but suspend-resume could be useful in other circumstances too, see the FAQ).

Is it useful for quieting a system down before doing hot plug or unplug of key components, such as processors and memory?

Yes, the freezer is (proposed to be, at least) moving on from being merely a suspend-resume-only thing to other usage scenarios, such as kprobes and hotplug.
Re: [patch] CFS scheduler, -v5 (build problem - make headers_check fails)
Ingo Molnar wrote:

i'm pleased to announce release -v5 of the CFS scheduler patchset. The patch against v2.6.21-rc7 and v2.6.20.7 can be downloaded from:

FYI, make headers_check seems to fail on this:

[EMAIL PROTECTED] linux-2.6]$ make headers_check
[snip]
  CHECK include/linux/usb/cdc.h
  CHECK include/linux/usb/audio.h
make[2]: *** No rule to make target `/src/linux-2.6/usr/include/linux/.check.sched.h', needed by `__headerscheck'. Stop.
make[1]: *** [linux] Error 2
make: *** [headers_check] Error 2
[EMAIL PROTECTED] linux-2.6]$

This also fails if I have CONFIG_HEADERS_CHECK=y in my .config; unset CONFIG_HEADERS_CHECK and it builds just fine.

-Zach
Re: [patch] CFS scheduler, -v5
On Mon, Apr 23, 2007 at 04:55:53AM +0200, Ingo Molnar wrote:

* Nick Piggin [EMAIL PROTECTED] wrote:

the biggest user-visible change in -v5 are various interactivity improvements (especially under higher load) to fix reported regressions, and an improved way of handling nice levels. There's also a new sys_sched_yield_to() syscall implementation for i686 and x86_64.

All known regressions have been fixed. (knock on wood)

I think the granularity is still much too low. Why not increase it to something more reasonable as a default?

note that CFS's granularity value is not directly comparable to timeslice length:

Right, but it does introduce the kbuild regression, and as we discussed, this will be only worse on newer CPUs with bigger caches or less naturally context switchy workloads.

[ Note: while CFS's default preemption granularity is currently set to 5 msecs, this value does not directly transform into timeslices: for example two CPU-intense tasks will have effective timeslices of 10 msecs with this setting. ]

also, i just checked SD: 0.46 defaults to 8 msecs rr_interval (on 1 CPU systems), which is lower than the 10 msecs effective timeslice length CFS-v5 achieves on two CPU-bound tasks.

This is about an order of magnitude more than the current scheduler, so I still think it is too small.

(in -v6 i'll scale the granularity up a bit with the number of CPUs, like SD does. That should get the right result on larger SMP boxes too.)

I don't really like the scaling with SMP thing. The cache effects are still going to be significant on small systems, and there are lots of non-desktop users of those (eg. clusters).

while i agree it's a tad too fine-grained still, I agree with Con's choice: rather err on the side of being too fine-grained and lose some small amount of throughput on cache-intense workloads like compile jobs, than err on the side of being visibly too choppy for users on the desktop.

So cfs gets too choppy if you make the effective timeslice comparable to mainline?

My approach is completely the opposite. For testing, I prefer to make the timeslice as large as possible so any problems or regressions are really noticeable and will be reported; it can be scaled back to be smaller once those kinks are ironed out.
Re: [patch] CFS scheduler, -v5
* Nick Piggin [EMAIL PROTECTED] wrote:

note that CFS's granularity value is not directly comparable to timeslice length:

Right, but it does introduce the kbuild regression, [...]

Note that i increased the granularity from 1msec to 5msecs after your kbuild report, could you perhaps retest kbuild with the default settings of -v5?

[...] and as we discussed, this will be only worse on newer CPUs with bigger caches or less naturally context switchy workloads.

yeah - but they'll all be quad core, so the SMP timeslice multiplicator should do the trick. Most of the CFS testers use single-CPU systems.

(in -v6 i'll scale the granularity up a bit with the number of CPUs, like SD does. That should get the right result on larger SMP boxes too.)

I don't really like the scaling with SMP thing. The cache effects are still going to be significant on small systems, and there are lots of non-desktop users of those (eg. clusters).

CFS using clusters will want to tune the granularity up drastically anyway, to 1 second or more, to maximize throughput. I think a small default with a scale-up-on-SMP rule is pretty sane. We'll gather some more kbuild data and see what happens, ok?

while i agree it's a tad too fine-grained still, I agree with Con's choice: rather err on the side of being too fine-grained and lose some small amount of throughput on cache-intense workloads like compile jobs, than err on the side of being visibly too choppy for users on the desktop.

So cfs gets too choppy if you make the effective timeslice comparable to mainline?

it doesn't in any test i do, but again, i'm erring on the side of it being more interactive.

Ingo
Re: [PATCH] lazy freeing of memory through MADV_FREE
Nick Piggin wrote:

Rik van Riel wrote:

Nick Piggin wrote:

Rik van Riel wrote:

Here are the transactions/seconds for each combination:

I've added a 5th column, with just your mmap_sem patch and without my madv_free patch. It is run with the glibc patch, which should make it fall back to MADV_DONTNEED after the first MADV_FREE call fails.

threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem  mmap_sem
      1      610        609               596                   545       534
      2     1032       1136              1196                  1200      1180
      4     1070       1128              2014                  2024      2027
      8     1000       1088              1665                  2087      2089
     16      779       1073              1310                  1999      2012

Not doing the mprotect calls is the big one I guess, especially the fact that we don't need to take the mmap_sem for writing.

With both our patches, single and two thread performance with MySQL sysbench is somewhat better than with just your patch, 4 and 8 thread performance are basically the same and just your patch gives a slight benefit with 16 threads.

I guess I should benchmark up to 64 or 128 threads tomorrow, to see if this is just luck or if the cache benefit of doing the page faults and reusing hot pages is faster than not having page faults at all.

I should run some benchmarks on other systems, too. Some of these results could be an artifact of my quad core CPU. The results could be very different on other systems...

Yeah. That's funny, because it means either there is some contention on the mmap_sem (or ptl) at 1 thread, or that my patch alters the uncontended performance.

Maybe MySQL has various different threads to do different tasks. Something to look into...
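The fallback described for the glibc patch (try MADV_FREE, and switch to MADV_DONTNEED once the kernel rejects it) can be sketched from userspace. This is not the glibc patch itself. It assumes a Linux system, uses the MADV_FREE constant as found in current `<sys/mman.h>` headers (which may not match the 2007 patch), and `release_range` and `demo_release` are made-up names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Cleared after the first EINVAL so later calls skip straight to the fallback. */
static int lazy_free_supported = 1;

/* Tell the kernel that [addr, addr + len) no longer holds useful data. */
static int release_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
#ifdef MADV_FREE
    if (lazy_free_supported) {
        if (madvise(addr, len, MADV_FREE) == 0)
            return 0;
        if (errno != EINVAL)
            return -1;
        lazy_free_supported = 0;   /* kernel without lazy freeing: remember that */
    }
#endif
    return madvise(addr, len, MADV_DONTNEED);
}

/* Map 1 MiB of anonymous memory, dirty it, release it, unmap it. */
static int demo_release(void)
{
    size_t len = (size_t)1 << 20;
    int ret;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    p[0] = 1;
    ret = release_range(p, len);
    munmap(p, len);
    return ret;
}
```

Remembering the first failure keeps kernels without lazy freeing from paying for a failed system call on every release.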
Re: Question about Reiser4
Eric Hopper wrote:

I know that this whole effort has been put in disarray by the prosecution of Hans Reiser, but I'm curious as to its status.

It was in disarray well before. Many of the reiser4 features, like filesystem plugins, make more technical sense in the Linux VFS, but made more business sense for Namesys as a reiserfs 4 thing. That led to a stalemate.

Is Reiser4 going to be going into the Linus kernel anytime soon?

I wouldn't count on it.
Re: [PATCH] lazy freeing of memory through MADV_FREE
Rik van Riel wrote:

Nick Piggin wrote:

Rik van Riel wrote:

Nick Piggin wrote:

Rik van Riel wrote:

Here are the transactions/seconds for each combination:

I've added a 5th column, with just your mmap_sem patch and without my madv_free patch. It is run with the glibc patch, which should make it fall back to MADV_DONTNEED after the first MADV_FREE call fails.

threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem  mmap_sem
      1      610        609               596                   545       534
      2     1032       1136              1196                  1200      1180
      4     1070       1128              2014                  2024      2027
      8     1000       1088              1665                  2087      2089
     16      779       1073              1310                  1999      2012

Now that I think about it - this is all with the rawhide kernel configuration, which has an ungodly number of debug config options enabled.

I should try this with a more normal kernel, on various different systems.

It would also be helpful if other people tried this same benchmark, and others, on their systems.
Re: Question about Reiser4
Eric Hopper wrote:

I know that this whole effort has been put in disarray by the prosecution of Hans Reiser, but I'm curious as to its status.

It was in disarray well before. Many of the reiser4 features, like filesystem plugins, make more technical sense in the Linux VFS, but made more business sense for Namesys as a reiserfs 4 thing. That led to a stalemate.

Shouldn't it be a matter of stability though? Benchmarks suggest that reiser4 is a good file system; reiser4 is the successor to the already-accepted reiserfs; we've got experimental ext4 support but no reiser4 support, etc. I don't see why something like plugins should matter. If it works enough to be marked as experimental, why shouldn't reiser4 support be included? It's a pain for me personally to have to patch any kernel with reiser4 support so I can use the reiser4 fs.

William Heimbigner
[EMAIL PROTECTED]
Re: [PATCH] lazy freeing of memory through MADV_FREE
Rik van Riel wrote: I've added a 5th column, with just your mmap_sem patch and without my madv_free patch. It is run with the glibc patch, which should make it fall back to MADV_DONTNEED after the first MADV_FREE call fails.

Thanks! (I edited slightly so it doesn't wrap)

threads  vanilla  new glibc  madv_free  mmap_sem  both
   1       610       609        596        534     545
   2      1032      1136       1196       1180    1200
   4      1070      1128       2014       2027    2024
   8      1000      1088       1665       2089    2087
  16       779      1073       1310       2012    1999

Not doing the mprotect calls is the big one I guess, especially the fact that we don't need to take the mmap_sem for writing.

Yes. With both our patches, single and two thread performance with MySQL sysbench is somewhat better than with just your patch, 4 and 8 thread performance are basically the same and just your patch gives a slight benefit with 16 threads. I guess I should benchmark up to 64 or 128 threads tomorrow, to see if this is just luck or if the cache benefit of doing the page faults and reusing hot pages is faster than not having page faults at all. I should run some benchmarks on other systems, too. Some of these results could be an artifact of my quad core CPU.

The results could be very different on other systems... I'm getting the 16 core box out of retirement as we speak :)

-- SUSE Labs, Novell Inc.
Re: [patch] CFS scheduler, -v5
On Mon, Apr 23, 2007 at 05:43:10AM +0200, Ingo Molnar wrote: * Nick Piggin [EMAIL PROTECTED] wrote: note that CFS's granularity value is not directly comparable to timeslice length: Right, but it does introduce the kbuild regression, [...] Note that i increased the granularity from 1msec to 5msecs after your kbuild report, could you perhaps retest kbuild with the default settings of -v5? I'm looking at mysql again today, but I will try eventually. It was just a simple kbuild. [...] and as we discussed, this will be only worse on newer CPUs with bigger caches or less naturally context switchy workloads. yeah - but they'll all be quad core, so the SMP timeslice multiplicator should do the trick. Most of the CFS testers use single-CPU systems. But desktop users could have quad thread and even 8 thread CPUs soon, so if the number doesn't work for both then you're in trouble. It just smells like a hack to scale with CPU numbers. (in -v6 i'll scale the granularity up a bit with the number of CPUs, like SD does. That should get the right result on larger SMP boxes too.) I don't really like the scaling with SMP thing. The cache effects are still going to be significant on small systems, and there are lots of non-desktop users of those (eg. clusters). CFS using clusters will want to tune the granularity up drastically anyway, to 1 second or more, to maximize throughput. I think a small default with a scale-up-on-SMP rule is pretty sane. We'll gather some more kbuild data and see what happens, ok? while i agree it's a tad too finegrained still, I agree with Con's choice: rather err on the side of being too finegrained and lose some small amount of throughput on cache-intense workloads like compile jobs, than err on the side of being visibly too choppy for users on the desktop. So cfs gets too choppy if you make the effective timeslice comparable to mainline? it doesn't in any test i do, but again, i'm erring on the side of it being more interactive. 
I'd start by erring on the side of trying to ensure no obvious performance regressions like this because that's the easy part. Suppose everybody finds your scheduler wonderfully interactive, but you can't make it so with a larger timeslice? For _real_ desktop systems, sure, erring on the side of being more interactive is fine. For RFC patches for testing, I really think you could be taking advantage of the fact that people will give you feedback on the issue.
Re: [RFC][PATCH -mm 2/3] freezer: Introduce freezer_flags
Hi Rafael,

+/*
+ * Per task flags used by the freezer
+ *
+ * They should not be referred to directly outside of this file.
+ */
+#define TFF_NOFREEZE 0 /* task should not be frozen */
+#define TFF_FREEZE 8 /* task should go to the refrigerator ASAP */
+#define TFF_SKIP 9 /* do not count this task as freezable */
+#define TFF_FROZEN 10 /* task is frozen */

Aren't NOFREEZE and SKIP doing the same thing? One of them appears superfluous. I'm looking at 21-rc6-mm1 and vfork(2) seems to be its only user. Seeing how vfork(2) used it, can't the call to freezer_do_not_count() be replaced with a call to freezer_exempt()? Similarly, the freezer_count() after the wait_for_completion might just as well be a clear of the NOFREEZE bit followed by a try_to_freeze().

Could you please explain the rationale behind the SKIP flag? I do see that SKIP seems to be relevant for only userspace threads and presumably only kernel threads are allowed to set NOFREEZE, but why this distinction between the two?

Also, I do have several gripes against the naming of some of these functions:

static inline int freezing(struct task_struct *p)

This could be called task_should_freeze().

/*
- * Sometimes we may need to cancel the previous 'freeze' request
+ * Cancel the previous 'freeze' request
 */
static inline void do_not_freeze(struct task_struct *p)

This definitely needs to be undo_freeze() or unfreeze(). do_not_freeze() sounds like what freeze_exempt() does.

static inline void frozen_process(struct task_struct *p)

frozen_process() sounds like what frozen() is supposed to do. This could instead be mark_task_frozen(), or even mark_frozen(), because only the current task can ever mark *itself* frozen before freezing itself.

static inline void freezer_do_not_count(void)
static inline void freezer_count(void)

These could be called freezer_skip() and freezer_do_not_skip(). Better to stick to consistent naming / terminology.
Cheers, Satyam
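To make the naming discussion easier to follow, here is a minimal userspace sketch, not the kernel implementation. The TFF_* bit numbers are copied from the quoted patch; `struct task`, its `freezer_flags` field and the non-atomic `tff_*` helpers are hypothetical stand-ins, and the wrapper names simply follow the renames proposed above. What the real frozen_process() does beyond setting the FROZEN bit is not shown in the quoted text, so the sketch assumes nothing more.

```c
#include <stdbool.h>

/* Bit numbers copied from the quoted patch. */
#define TFF_NOFREEZE 0   /* task should not be frozen */
#define TFF_FREEZE   8   /* task should go to the refrigerator ASAP */
#define TFF_SKIP     9   /* do not count this task as freezable */
#define TFF_FROZEN   10  /* task is frozen */

/* Hypothetical stand-in for the relevant part of task_struct. */
struct task {
	unsigned long freezer_flags;
};

/* Non-atomic bit helpers (assumption: the kernel would use atomic bitops). */
static inline bool tff_test(const struct task *p, int bit)
{
	return (p->freezer_flags >> bit) & 1UL;
}

static inline void tff_set(struct task *p, int bit)
{
	p->freezer_flags |= 1UL << bit;
}

static inline void tff_clear(struct task *p, int bit)
{
	p->freezer_flags &= ~(1UL << bit);
}

/* Proposed name for freezing(): has a freeze been requested? */
static inline bool task_should_freeze(const struct task *p)
{
	return tff_test(p, TFF_FREEZE);
}

/* Proposed name for do_not_freeze(): cancel a previous freeze request. */
static inline void undo_freeze(struct task *p)
{
	tff_clear(p, TFF_FREEZE);
}

/* Proposed name for frozen_process(): record that the task is frozen. */
static inline void mark_frozen(struct task *p)
{
	tff_set(p, TFF_FROZEN);
}

/* Proposed names for freezer_do_not_count() / freezer_count(). */
static inline void freezer_skip(struct task *p)
{
	tff_set(p, TFF_SKIP);
}

static inline void freezer_do_not_skip(struct task *p)
{
	tff_clear(p, TFF_SKIP);
}
```

With the verbs lined up like this, each helper name says which bit it touches and in which direction, which is the consistency the review asks for.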
RE: How to make mmap'ed kernel buffer non-cacheable
Hi Alan, I believe that dma_alloc_coherent will mark the kernel buffer as uncached at allocation time. But that is not my intention. I have mapped some user space memory to the kernel buffer and I wish to ensure that the contents of both are coherent and correctly ordered. In other words I wish to flush the contents of the kernel buffer to user space as soon as new data is available in my kernel buffer. How to do that? Will doing msync from the user space help? Rather than flushing every time (or msyncing) I intend to make my user-to-kernel mapping non-cacheable so that multiple flushes can be avoided.

Bhuvan

Hi, I am working on audio device driver development on Linux. I have a kernel buffer which I have mapped to user space using an mmap call from user space. My problem is that the data which comes to the kernel buffer is getting dropped in user space and I get only 50-60% of the data, which is randomly ordered. The user to kernel level buffer address translation code is fine and I suspect this data dropping is occurring because the kernel buffer is cacheable. Please suggest some way of making the entire buffer non-cacheable. I have been stuck on this for quite a while now.

The dma mapping API (or the PCI equivalent) provides the necessary behaviours for DMA receive, DMA send and consistent memory space in a portable fashion. That may not be done using uncacheable memory in all cases, as not all processors even support uncacheable memory spaces. If you are using the ALSA core routines (snd_dma_alloc_coherent) then ALSA already uses dma_alloc_coherent to ensure the memory is allocated for the appropriate use and will be marked uncached by the kernel.
Re: [PATCH] lazy freeing of memory through MADV_FREE
Nick Piggin wrote: So where is the down_write coming from in this workload, I wonder? Heap management? What syscalls? Trying to answer this question, I straced the mysql threads that showed up in top when running a single threaded sysbench workload. There were no mmap, munmap, brk, mprotect or madvise system calls in the trace. MySQL has me puzzled, but it seems to have some other people interested too. I think I'll go play a bit with ebizzy now, to see how other workloads are affected by our kernel changes.
Re: How to make mmap'ed kernel buffer non-cacheable
Bhuvan Kumar MITTAL wrote: Hi Alan, I believe that dma_alloc_coherent will mark the kernel buffer as uncached at allocation time. But that is not my intention. I have mapped some user space memory to the kernel buffer and I wish to ensure that the contents of both are coherent and correctly ordered. In other words I wish to flush the contents of the kernel buffer to user space as soon as new data is available in my kernel buffer. How to do that? Will doing msync from the user space help?

msync is only for pagecache. If you modify user mapped RAM from the kernel, or wish to read user modified RAM from the kernel, you should issue a flush_dcache_page after and before, respectively. See Documentation/cachetlb.txt. Does that fix it? What are the details of your platform?

-- SUSE Labs, Novell Inc.
Re: [PATCH] lazy freeing of memory through MADV_FREE
Jakub Jelinek wrote: On Fri, Apr 20, 2007 at 07:52:44PM -0400, Rik van Riel wrote: It turns out that Nick's patch does not improve peak performance much, but it does prevent the decline when running with 16 threads on my quad core CPU! We _definitely_ want both patches, there's a huge benefit in having them both. Here are the transactions/seconds for each combination:

threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem
   1       610       609           596                 545
   2      1032      1136          1196                1200
   4      1070      1128          2014                2024
   8      1000      1088          1665                2087
  16       779      1073          1310                1999

FYI, I have uploaded a testing glibc that uses MADV_FREE and falls back to MADV_DONTNEED if MADV_FREE is not available, to http://people.redhat.com/jakub/glibc/2.5.90-21.1/

Hmm, I wonder how glibc malloc stacks up to tcmalloc on this test (after the mmap_sem patch as well). I'll try running that as well!

-- SUSE Labs, Novell Inc.
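The fallback described in this thread (try MADV_FREE, and switch to MADV_DONTNEED once a MADV_FREE call fails) can be sketched in userspace as below. This is only an illustration under assumptions, not the code in the glibc test build: `lazy_free()` and `madv_free_works` are hypothetical names, and the sketch assumes a libc whose headers may or may not define MADV_FREE.

```c
#define _DEFAULT_SOURCE   /* for madvise() and MAP_ANONYMOUS under strict -std */
#include <stddef.h>
#include <sys/mman.h>

#ifdef MADV_FREE
/* Hypothetical flag: cleared after the first MADV_FREE call fails. */
static int madv_free_works = 1;
#endif

/*
 * Hypothetical helper: tell the kernel that [addr, addr + len) is no
 * longer needed. Prefers the lazy MADV_FREE and falls back to
 * MADV_DONTNEED when MADV_FREE is unavailable or has failed once.
 * Returns 0 on success, -1 with errno set on failure.
 */
int lazy_free(void *addr, size_t len)
{
#ifdef MADV_FREE
	if (madv_free_works) {
		if (madvise(addr, len, MADV_FREE) == 0)
			return 0;
		/* Remember the failure so later calls skip MADV_FREE. */
		madv_free_works = 0;
	}
#endif
	return madvise(addr, len, MADV_DONTNEED);
}
```

Remembering the failure means an old kernel pays for the failed call only once, and every later release goes straight to MADV_DONTNEED.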
Re: [PATCH] fault injection: fix failslab with CONFIG_NUMA
On Sun, 22 Apr 2007, Akinobu Mita wrote: Currently failslab injects failures into cache_alloc(). But with CONFIG_NUMA enabled that is not enough to make the actual slab allocator functions (kmalloc, kmem_cache_alloc, ...) return NULL. This patch moves the fault injection hook inside __cache_alloc() and __cache_alloc_node(). These are lower in the call path than cache_alloc() and make it possible to inject failures into the slab allocators with CONFIG_NUMA.

Looks good to me. Acked-by: Pekka Enberg [EMAIL PROTECTED]
Re: [PATCH] x86: Fix potential overflow in perfctr reservation
Hello. In article [EMAIL PROTECTED] (at Sun, 22 Apr 2007 01:09:17 -0700), Andrew Morton [EMAIL PROTECTED] says: [PATCH] x86: Fix potential overflow in perfctr reservation : This created a warning storm: arch/i386/kernel/nmi.c: In function 'avail_to_resrv_perfctr_nmi_bit': arch/i386/kernel/nmi.c:129: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type arch/i386/kernel/nmi.c:129: warning: passing argument 2 of 'variable_test_bit' from incompatible pointer type : diff -puN arch/i386/kernel/nmi.c~fix-x86-fix-potential-overflow-in-perfctr-reservation arch/i386/kernel/nmi.c --- a/arch/i386/kernel/nmi.c~fix-x86-fix-potential-overflow-in-perfctr-reservation +++ a/arch/i386/kernel/nmi.c @@ -126,7 +126,7 @@ int avail_to_resrv_perfctr_nmi_bit(unsig int cpu; BUG_ON(counter NMI_MAX_COUNTER_BITS); for_each_possible_cpu (cpu) { - if (test_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) + if (test_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) return 0; } return 1; : I worry rather a lot about how well runtime tested this very late change was, and whether it works correctly even with this fix applied. Perhaps we should just revert?

Is DEFINE_PER_CPU(type, var[num]) really valid? I guess it should be DEFINE_PER_CPU(type[num], var), no?

[I386] NMI: Fix per_cpu() usage. Per-cpu array should be declared as DEFINE_PER_CPU(type[size], name), not as DEFINE_PER_CPU(type, name[size]). 
Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c index 9f1e8c1..eddb4f7 100644 --- a/arch/i386/kernel/nmi.c +++ b/arch/i386/kernel/nmi.c @@ -48,8 +48,8 @@ int nmi_watchdog_enabled; #define NMI_MAX_COUNTER_BITS 66 #define NMI_MAX_COUNTER_LONGS BITS_TO_LONGS(NMI_MAX_COUNTER_BITS) -static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner[NMI_MAX_COUNTER_LONGS]); -static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[NMI_MAX_COUNTER_LONGS]); +static DEFINE_PER_CPU(unsigned long [NMI_MAX_COUNTER_LONGS], perfctr_nmi_owner); +static DEFINE_PER_CPU(unsigned long [NMI_MAX_COUNTER_LONGS], evntsel_nmi_owner); static cpumask_t backtrace_mask = CPU_MASK_NONE; /* nmi_active: @@ -126,7 +126,7 @@ int avail_to_resrv_perfctr_nmi_bit(unsigned int counter) int cpu; BUG_ON(counter NMI_MAX_COUNTER_BITS); for_each_possible_cpu (cpu) { - if (test_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) + if (test_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) return 0; } return 1; @@ -142,7 +142,7 @@ int avail_to_resrv_perfctr_nmi(unsigned int msr) BUG_ON(counter NMI_MAX_COUNTER_BITS); for_each_possible_cpu (cpu) { - if (test_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) + if (test_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) return 0; } return 1; @@ -157,7 +157,7 @@ static int __reserve_perfctr_nmi(int cpu, unsigned int msr) counter = nmi_perfctr_msr_to_bit(msr); BUG_ON(counter NMI_MAX_COUNTER_BITS); - if (!test_and_set_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) + if (!test_and_set_bit(counter, per_cpu(perfctr_nmi_owner, cpu))) return 1; return 0; } @@ -171,7 +171,7 @@ static void __release_perfctr_nmi(int cpu, unsigned int msr) counter = nmi_perfctr_msr_to_bit(msr); BUG_ON(counter NMI_MAX_COUNTER_BITS); - clear_bit(counter, per_cpu(perfctr_nmi_owner, cpu)); + clear_bit(counter, per_cpu(perfctr_nmi_owner, cpu)); } int reserve_perfctr_nmi(unsigned int msr) @@ -207,7 +207,7 @@ int __reserve_evntsel_nmi(int cpu, unsigned int msr) 
counter = nmi_evntsel_msr_to_bit(msr); BUG_ON(counter NMI_MAX_COUNTER_BITS); - if (!test_and_set_bit(counter, per_cpu(evntsel_nmi_owner, cpu)[0])) + if (!test_and_set_bit(counter, per_cpu(evntsel_nmi_owner, cpu))) return 1; return 0; } @@ -221,7 +221,7 @@ static void __release_evntsel_nmi(int cpu, unsigned int msr) counter = nmi_evntsel_msr_to_bit(msr); BUG_ON(counter NMI_MAX_COUNTER_BITS); - clear_bit(counter, per_cpu(evntsel_nmi_owner, cpu)[0]); + clear_bit(counter, per_cpu(evntsel_nmi_owner, cpu)); } int reserve_evntsel_nmi(unsigned int msr) -- YOSHIFUJI Hideaki @ USAGI Project [EMAIL PROTECTED] GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
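Why the owner bitmap has to be an array of longs, and why the bit operations want a plain `unsigned long *`, can be shown with a small userspace sketch. It is only an illustration under assumptions: NMI_MAX_COUNTER_BITS and BITS_TO_LONGS mirror the patch, while `NR_CPUS_SKETCH`, `perfctr_owner`, `test_and_set_bit_sketch()` and `reserve_counter()` are hypothetical, non-atomic stand-ins for the per-cpu variable and the kernel bitops.

```c
#include <limits.h>
#include <stdbool.h>

#define NMI_MAX_COUNTER_BITS  66  /* value from the patch */
#define BITS_PER_LONG         (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)      (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define NMI_MAX_COUNTER_LONGS BITS_TO_LONGS(NMI_MAX_COUNTER_BITS)
#define NR_CPUS_SKETCH        4   /* assumption, for illustration only */

/* One bitmap per CPU; 66 bits do not fit into a single long. */
static unsigned long perfctr_owner[NR_CPUS_SKETCH][NMI_MAX_COUNTER_LONGS];

/* Non-atomic stand-in for test_and_set_bit(): returns the old bit value. */
static bool test_and_set_bit_sketch(unsigned int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long *word = addr + nr / BITS_PER_LONG;
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Returns true if the counter was free on this CPU and is now reserved. */
static bool reserve_counter(int cpu, unsigned int counter)
{
	/*
	 * perfctr_owner[cpu] decays to unsigned long *, which is what the
	 * bit helper expects. Passing &perfctr_owner[cpu] would instead be
	 * a pointer to the whole array, an incompatible pointer type.
	 */
	return !test_and_set_bit_sketch(counter, perfctr_owner[cpu]);
}
```

A bit number above the width of one long lands in a later word of the array, so a single-long bitmap would be overrun.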
[PATCH 1/2] x86_64: Reflect the relocatability of the kernel in the ELF header.
Currently because vmlinux does not reflect that the kernel is relocatable we still have to support CONFIG_PHYSICAL_START. So this patch adds a small c program to do what we cannot do with a linker script, set the elf header type to ET_DYN.

This should remove the last obstacle to removing CONFIG_PHYSICAL_START on x86_64.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 arch/x86_64/Kconfig  |    4 +++
 arch/x86_64/Makefile |   10 +++
 scripts/Makefile     |   11 ---
 scripts/mketrel.c    |   70 ++
 4 files changed, 90 insertions(+), 5 deletions(-)
 create mode 100644 scripts/mketrel.c

diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 16d9bf3..773b487 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -121,6 +121,10 @@ config ARCH_HAS_ILOG2_U64
 	bool
 	default n

+config ELF_RELOCATABLE
+	bool
+	default y
+
 source "init/Kconfig"

diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index 9dd91b2..5ae79ab 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -124,6 +124,16 @@ define archhelp
   echo  '  isoimage     - Create a boot CD-ROM image'
 endef

+ifeq ($(CONFIG_RELOCATABLE),y)
+define cmd_vmlinux__
+	$(LD) $(LDFLAGS) $(LDFLAGS_vmlinux) -o $@ \
+	-T $(vmlinux-lds) $(vmlinux-init) \
+	--start-group $(vmlinux-main) --end-group \
+	$(filter-out $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) FORCE ,$^) && \
+	scripts/mketrel $@
+endef
+endif
+
 CLEAN_FILES += arch/$(ARCH)/boot/fdimage \
 	       arch/$(ARCH)/boot/image.iso \
 	       arch/$(ARCH)/boot/mtools.conf
diff --git a/scripts/Makefile b/scripts/Makefile
index 1c73c5a..ddba550 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -7,11 +7,12 @@
 # conmakehash: Create chartable
 # conmakehash: Create arrays for initializing the kernel console tables

-hostprogs-$(CONFIG_KALLSYMS)     += kallsyms
-hostprogs-$(CONFIG_LOGO)         += pnmtologo
-hostprogs-$(CONFIG_VT)           += conmakehash
-hostprogs-$(CONFIG_PROM_CONSOLE) += conmakehash
-hostprogs-$(CONFIG_IKCONFIG)     += bin2c
+hostprogs-$(CONFIG_KALLSYMS)		+= kallsyms
+hostprogs-$(CONFIG_LOGO)		+= pnmtologo
+hostprogs-$(CONFIG_VT)			+= conmakehash
+hostprogs-$(CONFIG_PROM_CONSOLE)	+= conmakehash
+hostprogs-$(CONFIG_IKCONFIG)		+= bin2c
+hostprogs-$(CONFIG_ELF_RELOCATABLE)	+= mketrel

 always := $(hostprogs-y) $(hostprogs-m)

diff --git a/scripts/mketrel.c b/scripts/mketrel.c
new file mode 100644
index 000..effa312
--- /dev/null
+++ b/scripts/mketrel.c
@@ -0,0 +1,70 @@
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <elf.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <stdarg.h>
+#include <stdlib.h>
+
+static int fd;
+unsigned char e_ident[EI_NIDENT];
+
+void die(const char * str, ...)
+{
+	va_list args;
+	va_start(args, str);
+	vfprintf(stderr, str, args);
+	fputc('\n', stderr);
+	exit(1);
+}
+
+void file_open(const char *name)
+{
+	if ((fd = open(name, O_RDWR, 0)) < 0)
+		die("Unable to open `%s': %m", name);
+}
+
+static void mketrel(void)
+{
+	unsigned char e_type[2];
+	if (read(fd, e_ident, sizeof(e_ident)) != sizeof(e_ident))
+		die("Cannot read ELF header: %s\n", strerror(errno));
+
+	if (memcmp(e_ident, ELFMAG, 4) != 0)
+		die("No ELF magic\n");
+
+	if ((e_ident[EI_CLASS] != ELFCLASS64) &&
+	    (e_ident[EI_CLASS] != ELFCLASS32))
+		die("Unrecognized ELF class: %x\n", e_ident[EI_CLASS]);
+
+	if ((e_ident[EI_DATA] != ELFDATA2LSB) &&
+	    (e_ident[EI_DATA] != ELFDATA2MSB))
+		die("Unrecognized ELF data encoding: %x\n", e_ident[EI_DATA]);
+
+	if (e_ident[EI_VERSION] != EV_CURRENT)
+		die("Unknown ELF version: %d\n", e_ident[EI_VERSION]);
+
+	if (e_ident[EI_DATA] == ELFDATA2LSB) {
+		e_type[0] = ET_REL & 0xff;
+		e_type[1] = ET_REL >> 8;
+	} else {
+		e_type[1] = ET_REL & 0xff;
+		e_type[0] = ET_REL >> 8;
+	}
+
+	if (write(fd, e_type, sizeof(e_type)) != sizeof(e_type))
+		die("Cannot write ELF type: %s\n", strerror(errno));
+}
+
+int main(int argc, char **argv)
+{
+	if (argc != 2)
+		die("Usage: mketrel: vmlinux");
+	file_open(argv[1]);
+	mketrel();
+	close(fd);
+	return 0;
+}
--
1.5.1.1.181.g2de0
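The header rewrite that mketrel performs can also be sketched on an in-memory buffer, which makes the byte-order handling easy to check. This is an illustration under assumptions rather than the patch itself: `elf_set_type()` is a hypothetical helper, and the type to write is a parameter because the description above names ET_DYN while the code as posted writes ET_REL.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: overwrite e_type in an ELF header held in memory.
 * e_type is the 16-bit field that immediately follows e_ident, so it
 * lives at byte offset EI_NIDENT in both the 32-bit and 64-bit layouts.
 * Returns 0 on success, -1 if the buffer does not look like an ELF header.
 */
int elf_set_type(unsigned char *hdr, size_t len, uint16_t type)
{
	if (len < EI_NIDENT + 2)
		return -1;
	if (memcmp(hdr, ELFMAG, SELFMAG) != 0)
		return -1;
	if (hdr[EI_CLASS] != ELFCLASS32 && hdr[EI_CLASS] != ELFCLASS64)
		return -1;
	if (hdr[EI_VERSION] != EV_CURRENT)
		return -1;

	/* Store the value in the byte order the file itself declares. */
	switch (hdr[EI_DATA]) {
	case ELFDATA2LSB:
		hdr[EI_NIDENT]     = type & 0xff;
		hdr[EI_NIDENT + 1] = type >> 8;
		break;
	case ELFDATA2MSB:
		hdr[EI_NIDENT]     = type >> 8;
		hdr[EI_NIDENT + 1] = type & 0xff;
		break;
	default:
		return -1;
	}
	return 0;
}
```

A file-based tool would read the first EI_NIDENT + 2 bytes, call a helper like this, and write the two e_type bytes back at offset EI_NIDENT.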
[PATCH 2/2] x86_64: Remove CONFIG_PHYSICAL_START and CONFIG_RELOCATABLE
Now that the vmlinux is marked as relocatable there is no reason to retain the CONFIG_PHYSICAL_START option, as we can put the binary we have at any 2MB aligned address in memory. With CONFIG_PHYSICAL_START gone the handful of code lines that depend on CONFIG_RELOCATABLE no longer make sense to be conditional and can be removed. The big win of this patch (besides Kconfig simplicity) is that the nasty BUILD_BUG_ON test for people misaligning their kernel when using CONFIG_PHYSICAL_START can be removed as this case can only happen with CONFIG_PHYSICAL_START selected. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] --- arch/x86_64/Kconfig| 55 +--- arch/x86_64/Makefile |2 - arch/x86_64/boot/compressed/head.S | 13 + arch/x86_64/boot/setup.S |4 -- arch/x86_64/defconfig |2 - arch/x86_64/kernel/head64.c|7 include/asm-x86_64/page.h |2 +- 7 files changed, 3 insertions(+), 82 deletions(-) diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig index 773b487..713c1ad 100644 --- a/arch/x86_64/Kconfig +++ b/arch/x86_64/Kconfig @@ -565,62 +565,9 @@ config CRASH_DUMP which are loaded in the main kernel with kexec-tools into a specially reserved region and then later executed after a crash by kdump/kexec. The crash dump kernel must be compiled - to a memory address not used by the main kernel or BIOS using - PHYSICAL_START. + to a memory address not used by the main kernel or BIOS For more details see Documentation/kdump/kdump.txt -config RELOCATABLE - bool Build a relocatable kernel(EXPERIMENTAL) - depends on EXPERIMENTAL - help - Builds a relocatable kernel. This enables loading and running - a kernel binary from a different physical address than it has - been compiled for. - - One use is for the kexec on panic case where the recovery kernel - must live at a different physical address than the primary - kernel. - - Note: If CONFIG_RELOCATABLE=y, then kernel run from the address - it has been loaded at and compile time physical address - (CONFIG_PHYSICAL_START) is ignored. 
- -config PHYSICAL_START - hex Physical address where the kernel is loaded if (EMBEDDED || CRASH_DUMP) - default 0x20 - help - This gives the physical address where the kernel is loaded. It - should be aligned to 2MB boundary. - - If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then - bzImage will decompress itself to above physical address and - run from there. Otherwise, bzImage will run from the address where - it has been loaded by the boot loader and will ignore above physical - address. - - In normal kdump cases one does not have to set/change this option - as now bzImage can be compiled as a completely relocatable image - (CONFIG_RELOCATABLE=y) and be used to load and run from a different - address. This option is mainly useful for the folks who don't want - to use a bzImage for capturing the crash dump and want to use a - vmlinux instead. - - So if you are using bzImage for capturing the crash dump, leave - the value here unchanged to 0x20 and set CONFIG_RELOCATABLE=y. - Otherwise if you plan to use vmlinux for capturing the crash dump - change this value to start of the reserved region (Typically 16MB - 0x100). In other words, it can be set based on the X value as - specified in the [EMAIL PROTECTED] command line boot parameter - passed to the panic-ed kernel. Typically this parameter is set as - [EMAIL PROTECTED] Please take a look at - Documentation/kdump/kdump.txt for more details about crash dumps. - - Usage of bzImage for capturing the crash dump is advantageous as - one does not have to build two kernels. Same kernel can be used - as production kernel and capture kernel. - - Don't change this unless you know what you are doing. 
- config SECCOMP bool Enable seccomp to safely compute untrusted bytecode depends on PROC_FS diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile index 5ae79ab..5d96f4f 100644 --- a/arch/x86_64/Makefile +++ b/arch/x86_64/Makefile @@ -124,7 +124,6 @@ define archhelp echo ' isoimage - Create a boot CD-ROM image' endef -ifeq ($(CONFIG_RELOCATABLE),y) define cmd_vmlinux__ $(LD) $(LDFLAGS) $(LDFLAGS_vmlinux) -o $@ \ -T $(vmlinux-lds) $(vmlinux-init)\ @@ -132,7 +131,6 @@ define cmd_vmlinux__ $(filter-out $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) FORCE ,$^) \ scripts/mketrel $@ endef -endif CLEAN_FILES += arch/$(ARCH)/boot/fdimage \ arch/$(ARCH)/boot/image.iso \ diff
Re: SATA errors/messages after upgrade to 2.6.20.7
[EMAIL PROTECTED] wrote: It is a Samsung HD501LJ SATA drive connected to 631xESB/632xESB controller. Reading and writing every block of the drive does not generate any other errors/failures. This is observed in 2.6.20.7 like a clockwork on any badblocks -v run or rebuild of a MD raid1 array onto the disk. It, however, was not observed on 2.6.18 in 182 badblocks -v runs followed by rebuild of MD raid1 array. Any idea what it might be? Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4.00: (irq_stat 0x4008) Apr 23 14:45:34 stdsrv-x86-64bit kernel: ata4.00: cmd 60/80:00:14:16:c4/00:00:05:00:00/40 tag 0 cdb 0x0 data 65536 in Apr 23 14:45:34 stdsrv-x86-64bit kernel: res 51/40:00:40:16:c4/6f:00:05:00:00/40 Emask 0x9 (media error) Does 'smartctl -d ata -t long /dev/X' return errors? Media error is typically just that... Jeff
Re: [patch] CFS scheduler, -v5
On Mon, Apr 23, 2007 at 07:16:59AM +0200, Markus Trippelsdorf wrote: On Mon, Apr 23, 2007 at 03:12:29AM +0200, Ingo Molnar wrote: i'm pleased to announce release -v5 of the CFS scheduler patchset. The patch against v2.6.21-rc7 and v2.6.20.7 can be downloaded from: ... - feature: add initial sys_sched_yield_to() implementation. Not hooked into the futex code yet, but testers are encouraged to give the syscalls a try, on i686 the new syscall is __NR_yield_to==320, on x86_64 it's __NR_yield_to==280. The prototype is sys_sched_yield_to(pid_t), as suggested by Ulrich Drepper. The new version does not link here (amd64,smp): LD .tmp_vmlinux1 arch/x86_64/kernel/built-in.o:(.rodata+0x1dd8): undefined reference to `sys_yield_to' Changing sys_yield_to to sys_sched_yield_to in include/asm-x86_64/unistd.h fixes the problem. -- Markus
Re: [patch] CFS scheduler, -v5
On Mon, Apr 23, 2007 at 03:12:29AM +0200, Ingo Molnar wrote: i'm pleased to announce release -v5 of the CFS scheduler patchset. The patch against v2.6.21-rc7 and v2.6.20.7 can be downloaded from: ... - feature: add initial sys_sched_yield_to() implementation. Not hooked into the futex code yet, but testers are encouraged to give the syscalls a try, on i686 the new syscall is __NR_yield_to==320, on x86_64 it's __NR_yield_to==280. The prototype is sys_sched_yield_to(pid_t), as suggested by Ulrich Drepper. The new version does not link here (amd64,smp): LD .tmp_vmlinux1 arch/x86_64/kernel/built-in.o:(.rodata+0x1dd8): undefined reference to `sys_yield_to' -- Markus
Re: [PATCH] use spinlock instead of binary mutex in CDU-31A driver
On Mon, Apr 23, 2007 at 01:25:58AM +0200, Andi Kleen wrote:

Matthias Kaehlcke [EMAIL PROTECTED] writes:

-static DECLARE_MUTEX(sony_sem);/* Semaphore for drive hardware access */
+static DEFINE_MUTEX(sony_mtx); /* Mutex for drive hardware access */

That's not a spinlock. Also normally some rationale is added to the description for a change?

sorry, i messed up the description of the change, i meant mutex instead of spinlock (in the last few days i reported some spinlock-related bugs ...). the rationale is that, according to http://lwn.net/Articles/167034/, binary semaphores that aren't given in interrupt context or locked and unlocked by different processes should be replaced by mutexes.

thanks for your comments

--
Matthias Kaehlcke
Linux Application Developer
Barcelona

The possibility of realizing a dream is what makes life interesting

using free software / Debian GNU/Linux | http://debian.org
gpg --keyserver pgp.mit.edu --recv-keys 47D8E5D4
Re: regression with gammu on 2.6.21-rc7
On Fri, Apr 20, 2007 at 10:58:53AM +0200, Wolfgang Erig wrote: Hello, I have a regression with 2.6.21-rc7-g80d74d51. The utility gammu to talk to my mobile does not work anymore. With 2.6.20 gammu runs fine. Distribution is the latest Debian/testing Wolfgang

$ gammu --backup backup
Press Ctrl+C to break...
I/O possible
$ uname -a
Linux max 2.6.21-rc7-g80d74d51 #9 SMP Wed Apr 18 21:41:41 CEST 2007 i686 GNU/Linux
$ tail messages
Apr 20 08:04:36 max kernel: ACPI: PCI Interrupt :00:1b.0[A] - GSI 16 (level, low) - IRQ 16
Apr 20 08:04:36 max kernel: extern: link up, 100Mbps, full-duplex, lpa 0x45E1
Apr 20 08:04:36 max kernel: intern: setting half-duplex.
Apr 20 08:09:02 max kernel: usb 2-2: USB disconnect, address 3
Apr 20 08:09:02 max kernel: pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
Apr 20 08:09:02 max kernel: pl2303 2-2:1.0: device disconnected
Apr 20 08:10:24 max kernel: usb 2-2: new full speed USB device using uhci_hcd and address 4
Apr 20 08:10:25 max kernel: usb 2-2: configuration #1 chosen from 1 choice
Apr 20 08:10:25 max kernel: pl2303 2-2:1.0: pl2303 converter detected
Apr 20 08:10:25 max kernel: usb 2-2: pl2303 converter now attached to ttyUSB0

That looks ok, I'm guessing you yanked it out and then back in? Or is the problem that the device was removed?

thanks, greg k-h
Re: Question about Reiser4
William Heimbigner wrote: Eric Hopper wrote: I know that this whole effort has been put in disarray by the prosecution of Hans Reiser, but I'm curious as to its status. It was in disarray well before. Many of the reiser4 features, like filesystem plugins, make more technical sense in the Linux VFS, but made more business sense for Namesys as a reiserfs 4 thing. That led to a stalemate. Shouldn't it be a matter of stability though? A lot of other things matter. Things like a willingness to maintain the code after it gets merged, or at least turning the code into something the community is willing to maintain if the original developers stop maintaining it. Benchmarks suggest that reiser4 is a good file system; reiser4 is the successor to the already-accepted reiserfs; we've got experimental ext4 support but no reiser4 support, etc. Namesys kind of abandoned reiserfs after work on reiser4 started. Taking in a new code base on such a track record is not a good idea when the code is not in a shape where the community wants to maintain it. I don't see why something like plugins should matter. If it works enough to be marked as experimental, why shouldn't reiser4 support be included? It's a pain for me personally to have to patch any kernel with reiser4 support so I can use the reiser4 fs. You basically have three options: 1) keep patching every time you upgrade the kernel 2) use another filesystem 3) become the new reiser4 maintainer and turn the code into something that Linus is willing to accept -- Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
Re: Question about Reiser4
William Heimbigner wrote:

Eric Hopper wrote:

I know that this whole effort has been put in disarray by the prosecution of Hans Reiser, but I'm curious as to its status.

It was in disarray well before. Many of the reiser4 features, like filesystem plugins, make more technical sense in the Linux VFS, but made more business sense for Namesys as a reiserfs 4 thing. That led to a stalemate.

Shouldn't it be a matter of stability though?

A lot of other things matter. Things like a willingness to maintain the code after it gets merged, or at least turning the code into something the community is willing to maintain if the original developers stop maintaining it.

Benchmarks suggest that reiser4 is a good file system; reiser4 is the successor to the already-accepted reiserfs; we've got experimental ext4 support but no reiser4 support, etc.

Namesys kind of abandoned reiserfs after work on reiser4 started. Taking in a new code base on such a track record is not a good idea when the code is not in a shape where the community wants to maintain it.

I don't see why something like plugins should matter. If it works enough to be marked as experimental, why shouldn't reiser4 support be included? It's a pain for me personally to have to patch any kernel with reiser4 support so I can use the reiser4 fs.

You basically have three options:

1) keep patching every time you upgrade the kernel
2) use another filesystem
3) become the new reiser4 maintainer and turn the code into something that Linus is willing to accept

I suppose. I have a feeling there's an underlying issue behind code standards (and even then, I think that code standards are ultimately an excuse for not integrating reiser4 support into the kernel, but that's just my opinion). However, is the code really in such a shape that the community doesn't want to maintain it? Obviously there's a significant number of people interested in reiser4 - if there weren't, questions like this wouldn't keep getting asked.
William Heimbigner [EMAIL PROTECTED]