Re: [PATCH 1/13] timestamp fixes
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> 1/13
>

ugh, has this been tested? It needs the patch below.

	Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -2704,11 +2704,11 @@ need_resched_nonpreemptible:

 	schedstat_inc(rq, sched_cnt);
 	now = sched_clock();
-	if (likely((long long)now - prev->timestamp < NS_MAX_SLEEP_AVG))
+	if (likely((long long)now - prev->timestamp < NS_MAX_SLEEP_AVG)) {
 		run_time = now - prev->timestamp;
 		if (unlikely((long long)now - prev->timestamp < 0))
 			run_time = 0;
-	else
+	} else
 		run_time = NS_MAX_SLEEP_AVG;

 	/*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
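For readers following the fix: in the removed lines the outer `if` has no braces, so it guards only the first assignment, and the inner `if`/`else` then runs unconditionally and always overwrites run_time with either 0 or the cap. The following is a minimal user-space sketch of the intended clamping. The names clamp_run_time and NS_MAX_SLEEP_AVG_SKETCH, and the cap value, are hypothetical and are not taken from the kernel.

```c
#include <stdint.h>

/* Hypothetical cap for illustration; the kernel derives its own value. */
#define NS_MAX_SLEEP_AVG_SKETCH 1000000000LL

/*
 * Clamp the time a task just ran to the range [0, cap].
 * `now` and `timestamp` are nanosecond clock readings; if they come from
 * unsynchronised clocks, `now - timestamp` can come out negative.
 */
static uint64_t clamp_run_time(uint64_t now, uint64_t timestamp)
{
	int64_t delta = (int64_t)(now - timestamp);

	if (delta < NS_MAX_SLEEP_AVG_SKETCH) {
		/* Within the cap: use the real delta, but never a negative one. */
		if (delta < 0)
			return 0;
		return (uint64_t)delta;
	}
	/* At or beyond the cap: saturate. */
	return NS_MAX_SLEEP_AVG_SKETCH;
}
```

With the braces in place the `else` pairs with the outer test, so an ordinary delta such as clamp_run_time(100, 40) comes back unchanged.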
[PATCH] Non-DMA mode for floppy on PowerPC, new version
Here is a cleaned-up version of my 2.6.8 kernel patch. This patch allows the floppy drive to be used in non-DMA mode on PegasosPPC and AmigaOne machines.

To use it:

1. Do not build the floppy driver as a module; link it statically. Passing parameters to it from insmod is still problematic; at least it doesn't work properly on my system. Maybe I'll clean it up in the future.
2. Specify floppy=nodma in the kernel's arguments. You'll also need to specify your drive type there using floppy=,,cmos. For example, floppy=0,4,cmos specifies type 4 (1.44 MB 3.5") for drive 0 on my system. The default drive type is 2.88 MB.

This patch does not affect operation of the driver in DMA mode, so it's safe to use on any platform.

-- 
Best regards,
Pavel Fedin, mailto:[EMAIL PROTECTED]

--- linux-2.6.8.1-10mdk/include/asm-ppc/floppy.h.orig	2004-08-14 06:55:10.0 -0400
+++ linux-2.6.8.1-10mdk/include/asm-ppc/floppy.h	2005-02-24 09:41:54.594830800 -0500
@@ -11,30 +11,163 @@
 #ifndef __ASM_PPC_FLOPPY_H
 #define __ASM_PPC_FLOPPY_H
+#include
+
+#define CSW fd_routine[can_use_virtual_dma & 1]
+
 #define fd_inb(port) inb_p(port)
 #define fd_outb(value,port) outb_p(value,port)
-#define fd_enable_dma() enable_dma(FLOPPY_DMA)
-#define fd_disable_dma() disable_dma(FLOPPY_DMA)
-#define fd_request_dma() request_dma(FLOPPY_DMA,"floppy")
-#define fd_free_dma() free_dma(FLOPPY_DMA)
-#define fd_clear_dma_ff() clear_dma_ff(FLOPPY_DMA)
-#define fd_set_dma_mode(mode) set_dma_mode(FLOPPY_DMA,mode)
-#define fd_set_dma_addr(addr) set_dma_addr(FLOPPY_DMA,(unsigned int)virt_to_bus(addr))
-#define fd_set_dma_count(count) set_dma_count(FLOPPY_DMA,count)
+#define fd_disable_dma() CSW._disable_dma(FLOPPY_DMA)
+#define fd_request_dma() CSW._request_dma(FLOPPY_DMA,"floppy")
+#define fd_free_dma() CSW._free_dma(FLOPPY_DMA)
+#define fd_get_dma_residue() CSW._get_dma_residue(FLOPPY_DMA)
+#define fd_dma_mem_alloc(size) CSW._dma_mem_alloc(size)
+#define fd_dma_setup(addr, size, mode, io) CSW._dma_setup(addr, size, mode, io)
 #define fd_enable_irq()
enable_irq(FLOPPY_IRQ) #define fd_disable_irq()disable_irq(FLOPPY_IRQ) -#define fd_cacheflush(addr,size) /* nothing */ -#define fd_request_irq()request_irq(FLOPPY_IRQ, floppy_interrupt, \ - SA_INTERRUPT|SA_SAMPLE_RANDOM, \ - "floppy", NULL) #define fd_free_irq() free_irq(FLOPPY_IRQ, NULL); -__inline__ void virtual_dma_init(void) +static int virtual_dma_count; +static int virtual_dma_residue; +static char *virtual_dma_addr; +static int virtual_dma_mode; +static int doing_pdma; + +static irqreturn_t floppy_hardint(int irq, void *dev_id, struct pt_regs * regs) +{ + unsigned char st; + + if (!doing_pdma) + return floppy_interrupt(irq, dev_id, regs); + + { + int lcount; + char *lptr; + + st = 1; + for (lcount=virtual_dma_count, lptr=virtual_dma_addr; + lcount; lcount--, lptr++) { + st=inb(virtual_dma_port+4) & 0xa0 ; + if (st != 0xa0) + break; + if (virtual_dma_mode) + outb_p(*lptr, virtual_dma_port+5); + else + *lptr = inb_p(virtual_dma_port+5); + } + virtual_dma_count = lcount; + virtual_dma_addr = lptr; + st = inb(virtual_dma_port+4); + } + + if (st == 0x20) + return IRQ_HANDLED; + if (!(st & 0x20)) { + virtual_dma_residue += virtual_dma_count; + virtual_dma_count=0; + doing_pdma = 0; + floppy_interrupt(irq, dev_id, regs); + return IRQ_HANDLED; + } + return IRQ_HANDLED; +} + +static void vdma_disable_dma(unsigned int dummy) { - /* Nothing to do on PowerPC */ + doing_pdma = 0; + virtual_dma_residue += virtual_dma_count; + virtual_dma_count=0; } +static int vdma_request_dma(unsigned int dmanr, const char * device_id) +{ + return 0; +} + +static void vdma_nop(unsigned int dummy) +{ +} + + +static int vdma_get_dma_residue(unsigned int dummy) +{ + return virtual_dma_count + virtual_dma_residue; +} + + +static int fd_request_irq(void) +{ + if (can_use_virtual_dma) + return request_irq(FLOPPY_IRQ, floppy_hardint,SA_INTERRUPT, + "floppy", NULL); + else + return request_irq(FLOPPY_IRQ, floppy_interrupt, + SA_INTERRUPT|SA_SAMPLE_RANDOM, + "floppy", NULL);
netdev-2.6, wireless-2.6 queues updated
See attached changelog. I'm too slack to post a patch tonight.

Please do a

	bk pull bk://gkernel.bkbits.net/netdev-2.6

This will update the following files:

 drivers/net/bagetlance.c | 1368 -
 include/linux/dp83840.h | 41
 Documentation/networking/bonding.txt | 2101 +
 Documentation/networking/e100.txt | 3
 Documentation/networking/ixgb.txt | 9
 MAINTAINERS | 7
 arch/arm/mach-pxa/lubbock.c | 2
 arch/arm/mach-sa1100/neponset.c | 2
 drivers/net/3c503.c | 67
 drivers/net/3c509.c | 4
 drivers/net/3c515.c | 32
 drivers/net/3c527.c | 2
 drivers/net/3c59x.c | 2
 drivers/net/8139cp.c | 100
 drivers/net/8139too.c | 293 -
 drivers/net/Kconfig | 59
 drivers/net/Makefile | 2
 drivers/net/Space.c | 11
 drivers/net/amd8111e.c | 6
 drivers/net/arcnet/arc-rawmode.c | 4
 drivers/net/arcnet/arc-rimi.c | 14
 drivers/net/arcnet/arcnet.c | 30
 drivers/net/arcnet/com20020.c | 6
 drivers/net/arcnet/com90io.c | 4
 drivers/net/arcnet/com90xx.c | 8
 drivers/net/arcnet/rfc1051.c | 8
 drivers/net/arcnet/rfc1201.c | 12
 drivers/net/au1000_eth.c | 1361 -
 drivers/net/au1000_eth.h | 55
 drivers/net/b44.c | 2
 drivers/net/b44.h | 14
 drivers/net/bonding/bond_3ad.c | 2
 drivers/net/bonding/bond_3ad.h | 1
 drivers/net/bonding/bond_alb.c | 12
 drivers/net/bonding/bond_main.c | 35
 drivers/net/cs89x0.c | 4
 drivers/net/depca.c | 4
 drivers/net/dgrs.c | 6
 drivers/net/e100.c | 4
 drivers/net/e1000/e1000.h | 3
 drivers/net/e1000/e1000_ethtool.c | 11
 drivers/net/e1000/e1000_hw.c | 86
 drivers/net/e1000/e1000_hw.h | 11
 drivers/net/e1000/e1000_main.c | 249 -
 drivers/net/eepro100.c | 17
 drivers/net/epic100.c | 2
 drivers/net/es3210.c | 32
 drivers/net/ethertap.c | 4
 drivers/net/ewrk3.c | 87
 drivers/net/fealnx.c | 275 -
 drivers/net/hamradio/baycom_epp.c | 53
 drivers/net/hamradio/baycom_par.c | 8
 drivers/net/hamradio/baycom_ser_fdx.c | 7
 drivers/net/hamradio/baycom_ser_hdx.c | 7
 drivers/net/hamradio/bpqether.c | 17
 drivers/net/hamradio/dmascc.c | 2073
 drivers/net/hamradio/hdlcdrv.c | 48
 drivers/net/hamradio/mkiss.c | 12
 drivers/net/hamradio/yam.c | 38
 drivers/net/ibm_emac/ibm_emac.h | 4
 drivers/net/ibmlana.c | 99
 drivers/net/ibmlana.h | 1
 drivers/net/ioc3-eth.c | 83
 drivers/net/irda/act200l-sir.c | 3
 drivers/net/irda/donauboe.c | 2
 drivers/net/irda/irtty-sir.c | 4
 drivers/net/irda/ma600-sir.c | 12
 drivers/net/irda/sir_dev.c | 4
 drivers/net/irda/tekram-sir.c | 3
 drivers/net/ixgb/ixgb.h | 3
 drivers/net/ixgb/ixgb_ee.c | 16
 drivers/net/ixgb/ixgb_ee.h | 3
 drivers/net/ixgb/ixgb_ethtool.c | 5
 drivers/net/ixgb/ixgb_hw.c | 2
 drivers/net/ixgb/ixgb_hw.h | 2
 drivers/net/ixgb/ixgb_ids.h | 2
 drivers/net/ixgb/ixgb_main.c | 73
 drivers/net/ixgb/ixgb_osdep.h | 2
 drivers/net/ixgb/ixgb_param.c | 2
 drivers/net/jazzsonic.c | 217
 drivers/net/loopback.c | 2
 drivers/net/lp486e.c | 8
 drivers/net/meth.c | 275 -
 drivers/net/meth.h | 2
 drivers/net
Re: [Lse-tech] Re: A common layer for Accounting packages
On Wed, 2005-02-23 at 11:11 -0800, Jay Lan wrote:
> Guillaume Thouvenin wrote:
> > It's what I'm proposing. The problem is to be alerted when a new process
> > is created in order to add it in the correct group of processes if the
> > parent belongs to one (or several) groups. The notification can be done
> > with the fork connector patch.
>
> I am not quite comfortable of ELSA requesting a fork hook this way.
> How many hooks in the stock kernel that are related to accounting? Can
> anyone answer this question? I know of 'acct_process()' in exit.c used
> by the BSD accounting and ELSA is requesting a hook in fork. If people
> raise the same question again a few years later, how many people will
> still remember this ELSA hook?

The fork connector is not related to accounting. It's a connector that
allows information to be sent to a user space application when a fork
occurs in the kernel. This information is used by ELSA, but I think that
this hook will be used by other user space applications and, IMHO, it's
not incompatible with a specific hook for an accounting tool if needed.

Guillaume
[PATCH 13/13] basic tuning
13/13

Do some basic initial tuning.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/asm-x86_64/topology.h
===
--- linux-2.6.orig/include/asm-x86_64/topology.h	2005-02-24 17:39:07.615911131 +1100
+++ linux-2.6/include/asm-x86_64/topology.h	2005-02-24 17:39:07.990864853 +1100
@@ -52,12 +52,11 @@
 	.cache_nice_tries = 2, \
 	.busy_idx = 3, \
 	.idle_idx = 2, \
-	.newidle_idx = 1, \
+	.newidle_idx = 0, \
 	.wake_idx = 1, \
 	.forkexec_idx = 1, \
 	.per_cpu_gain = 100, \
 	.flags = SD_LOAD_BALANCE \
-	| SD_BALANCE_NEWIDLE \
 	| SD_BALANCE_FORK \
 	| SD_BALANCE_EXEC \
 	| SD_WAKE_BALANCE, \
Index: linux-2.6/include/linux/topology.h
===
--- linux-2.6.orig/include/linux/topology.h	2005-02-24 17:39:07.616911007 +1100
+++ linux-2.6/include/linux/topology.h	2005-02-24 17:39:07.991864730 +1100
@@ -118,15 +118,14 @@
 	.cache_nice_tries = 1, \
 	.per_cpu_gain = 100, \
 	.busy_idx = 2, \
-	.idle_idx = 0, \
-	.newidle_idx = 1, \
+	.idle_idx = 1, \
+	.newidle_idx = 2, \
 	.wake_idx = 1, \
-	.forkexec_idx = 0, \
+	.forkexec_idx = 1, \
 	.flags = SD_LOAD_BALANCE \
 	| SD_BALANCE_NEWIDLE \
 	| SD_BALANCE_EXEC \
-	| SD_WAKE_AFFINE \
-	| SD_WAKE_BALANCE, \
+	| SD_WAKE_AFFINE, \
 	.last_balance = jiffies, \
 	.balance_interval = 1, \
 	.nr_balance_failed = 0, \
Re: [8/14] Orinoco driver updates - PCMCIA initialization cleanups
Dominik Brodowski wrote:

@@ -184,6 +186,7 @@
 	dev_list = link;
 	client_reg.dev_info = &dev_info;
+	client_reg.Attributes = INFO_IO_CLIENT | INFO_CARD_SHARE;

That's not needed any longer for 2.6.

So who wants to send the incremental update patch? :)

	Jeff
[PATCH 11/13] sched-domains aware balance-on-fork
11/13

Reimplement the balance-on-exec balancing to be sched-domains aware. Use this to also do balance-on-fork balancing. Make x86_64 do balance on fork over the NUMA domain.

The problem with the non-sched-domains-aware balancing became apparent on dual-core, multi-socket Opterons. What we want is for the new tasks to be sent to a different socket, but more often than not, we would first load up our sibling core, or fill two cores of a single remote socket before selecting a new one.

This gives large improvements to STREAM on such systems.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/asm-x86_64/topology.h
===
--- linux-2.6.orig/include/asm-x86_64/topology.h	2005-02-24 17:39:07.320947536 +1100
+++ linux-2.6/include/asm-x86_64/topology.h	2005-02-24 17:43:37.077660523 +1100
@@ -54,9 +54,11 @@
 	.idle_idx = 2, \
 	.newidle_idx = 1, \
 	.wake_idx = 1, \
+	.forkexec_idx = 1, \
 	.per_cpu_gain = 100, \
 	.flags = SD_LOAD_BALANCE \
 	| SD_BALANCE_NEWIDLE \
+	| SD_BALANCE_FORK \
 	| SD_BALANCE_EXEC \
 	| SD_WAKE_BALANCE, \
 	.last_balance = jiffies, \
Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h	2005-02-24 17:39:06.806011090 +1100
+++ linux-2.6/include/linux/sched.h	2005-02-24 17:43:37.274636222 +1100
@@ -423,10 +423,11 @@
 #define SD_LOAD_BALANCE 1 /* Do load balancing on this domain.
*/ #define SD_BALANCE_NEWIDLE 2 /* Balance when about to become idle */ #define SD_BALANCE_EXEC 4 /* Balance on exec */ -#define SD_WAKE_IDLE 8 /* Wake to idle CPU on task wakeup */ -#define SD_WAKE_AFFINE 16 /* Wake task to waking CPU */ -#define SD_WAKE_BALANCE 32 /* Perform balancing at task wakeup */ -#define SD_SHARE_CPUPOWER 64 /* Domain members share cpu power */ +#define SD_BALANCE_FORK 8 /* Balance on fork, clone */ +#define SD_WAKE_IDLE 16 /* Wake to idle CPU on task wakeup */ +#define SD_WAKE_AFFINE 32 /* Wake task to waking CPU */ +#define SD_WAKE_BALANCE 64 /* Perform balancing at task wakeup */ +#define SD_SHARE_CPUPOWER 128 /* Domain members share cpu power */ struct sched_group { struct sched_group *next; /* Must be a circular list */ @@ -455,6 +456,7 @@ unsigned int idle_idx; unsigned int newidle_idx; unsigned int wake_idx; + unsigned int forkexec_idx; int flags; /* See SD_* */ /* Runtime fields. */ Index: linux-2.6/include/linux/topology.h === --- linux-2.6.orig/include/linux/topology.h 2005-02-24 17:39:07.320947536 +1100 +++ linux-2.6/include/linux/topology.h 2005-02-24 17:43:37.078660399 +1100 @@ -90,6 +90,7 @@ .idle_idx = 0, \ .newidle_idx = 0, \ .wake_idx = 0, \ + .forkexec_idx = 0, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ @@ -120,6 +121,7 @@ .idle_idx = 0, \ .newidle_idx = 1, \ .wake_idx = 1, \ + .forkexec_idx = 0, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-02-24 17:39:07.322947289 +1100 +++ linux-2.6/kernel/sched.c 2005-02-24 17:43:37.274636222 +1100 @@ -891,6 +891,79 @@ return max(rq->cpu_load[type-1], load_now); } +/* + * find_idlest_group finds and returns the least busy CPU group within the + * domain. 
+ */ +static struct sched_group * +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu) +{ + struct sched_group *idlest = NULL, *this = NULL, *group = sd->groups; + unsigned long min_load = ULONG_MAX, this_load = 0; + int load_idx = sd->forkexec_idx; + int imbalance = 100 + (sd->imbalance_pct-100)/2; + + do { + unsigned long load, avg_load; + int local_group; + int i; + + local_group = cpu_isset(this_cpu, group->cpumask); + /* XXX: put a cpus allowed check */ + + /* Tally up the load of all CPUs in the group */ + avg_load = 0; + + for_each_cpu_mask(i, group->cpumask) { + /* Bias balancing toward cpus of our domain */ + if (local_group) +load = source_load(i, load_idx); + else +load = target_load(i, load_idx); + + avg_load += load; + } + + /* Adjust by relative CPU power of the group */ + avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power; + + if (local_group) { + this_load = avg_load; + this = group; + } else if (avg_load < min_load) { + min_load = avg_load; + idlest = group; + } + group = group->next; + } while (group != sd->groups); + + if (!idlest || 100*this_load < imbalance*min_load) + return NULL; + return idlest; +} + +/* + * find_idlest_queue - find the idlest runqueue among the cpus in group. + */ +static int find_idlest_cpu(struct sched_group *group, int this_cpu) +{ + unsigned long load, min_load = ULONG_MAX; + int idlest = -1; + int i; + +
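The group-selection rule in find_idlest_group above can be tried out in user space. The following is a minimal sketch under stated assumptions: struct group_sketch, pick_idlest_group and LOAD_SCALE_SKETCH are hypothetical names, each group's load is assumed to be already summed over its CPUs, and the source_load/target_load bias from the patch is left out. It keeps only the two ideas visible in the patch: loads are scaled by the group's cpu_power, and a remote group is chosen only when the local group exceeds the idlest one by half the domain's imbalance_pct margin.

```c
#include <limits.h>
#include <stddef.h>

#define LOAD_SCALE_SKETCH 128UL	/* stand-in for SCHED_LOAD_SCALE */

struct group_sketch {
	unsigned long load;		/* load summed over the group's CPUs */
	unsigned long cpu_power;	/* relative capacity, must be non-zero */
};

/*
 * Return the index of the idlest remote group, or -1 if the task should
 * stay in the local group (index `local`).
 */
static int pick_idlest_group(const struct group_sketch *groups, size_t n,
			     size_t local, unsigned int imbalance_pct)
{
	unsigned long min_load = ULONG_MAX, this_load = 0;
	unsigned long imbalance = 100 + (imbalance_pct - 100) / 2;
	int idlest = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Adjust by the relative CPU power of the group. */
		unsigned long avg_load =
			groups[i].load * LOAD_SCALE_SKETCH / groups[i].cpu_power;

		if (i == local) {
			this_load = avg_load;
		} else if (avg_load < min_load) {
			min_load = avg_load;
			idlest = (int)i;
		}
	}

	/* Stay local unless the local group is clearly busier. */
	if (idlest < 0 || 100 * this_load < imbalance * min_load)
		return -1;
	return idlest;
}
```

With imbalance_pct = 125 the margin is 112, so two equally loaded groups keep the task local, while a fully idle remote group wins against a loaded local one.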
[PATCH 10/13] remove aggressive idle balancing
10/13 Remove the very aggressive idle stuff that has recently gone into 2.6 - it is going against the direction we are trying to go. Hopefully we can regain performance through other methods. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-i386/topology.h === --- linux-2.6.orig/include/asm-i386/topology.h 2005-02-24 17:39:06.805011214 +1100 +++ linux-2.6/include/asm-i386/topology.h 2005-02-24 17:39:07.320947536 +1100 @@ -85,7 +85,6 @@ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_EXEC \ | SD_BALANCE_NEWIDLE \ -| SD_WAKE_IDLE \ | SD_WAKE_BALANCE, \ .last_balance = jiffies, \ .balance_interval = 1, \ Index: linux-2.6/include/asm-x86_64/topology.h === --- linux-2.6.orig/include/asm-x86_64/topology.h 2005-02-24 17:39:06.805011214 +1100 +++ linux-2.6/include/asm-x86_64/topology.h 2005-02-24 17:43:37.503607973 +1100 @@ -58,7 +58,6 @@ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ -| SD_WAKE_IDLE \ | SD_WAKE_BALANCE, \ .last_balance = jiffies, \ .balance_interval = 1, \ Index: linux-2.6/include/linux/topology.h === --- linux-2.6.orig/include/linux/topology.h 2005-02-24 17:39:06.806011090 +1100 +++ linux-2.6/include/linux/topology.h 2005-02-24 17:43:37.503607973 +1100 @@ -124,7 +124,6 @@ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ | SD_WAKE_AFFINE \ -| SD_WAKE_IDLE \ | SD_WAKE_BALANCE, \ .last_balance = jiffies, \ .balance_interval = 1, \ Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-02-24 17:39:07.057979992 +1100 +++ linux-2.6/kernel/sched.c 2005-02-24 17:43:37.504607850 +1100 @@ -412,22 +412,6 @@ return rq; } -#ifdef CONFIG_SCHED_SMT -static int cpu_and_siblings_are_idle(int cpu) -{ - int sib; - for_each_cpu_mask(sib, cpu_sibling_map[cpu]) { - if (idle_cpu(sib)) - continue; - return 0; - } - - return 1; -} -#else -#define cpu_and_siblings_are_idle(A) idle_cpu(A) -#endif - #ifdef CONFIG_SCHEDSTATS /* * Called when a process is dequeued from the active array and given @@ -1650,16 +1634,15 @@ /* * 
Aggressive migration if: - * 1) the [whole] cpu is idle, or + * 1) task is cache cold, or * 2) too many balance attempts have failed. */ - if (cpu_and_siblings_are_idle(this_cpu) || \ - sd->nr_balance_failed > sd->cache_nice_tries) + if (sd->nr_balance_failed > sd->cache_nice_tries) return 1; if (task_hot(p, rq->timestamp_last_tick, sd)) - return 0; + return 0; return 1; } @@ -2131,7 +2114,7 @@ if (cpu_isset(cpu, visited_cpus)) continue; cpu_set(cpu, visited_cpus); -if (!cpu_and_siblings_are_idle(cpu) || cpu == busiest_cpu) +if (cpu == busiest_cpu) continue; target_rq = cpu_rq(cpu);
[PATCH 12/13] schedstats additions for sched-balance-fork
12/13 Add SCHEDSTAT statistics for sched-balance-fork. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/sched.h === --- linux-2.6.orig/include/linux/sched.h 2005-02-24 17:39:07.616911007 +1100 +++ linux-2.6/include/linux/sched.h 2005-02-24 17:39:07.819885956 +1100 @@ -480,10 +480,16 @@ unsigned long alb_failed; unsigned long alb_pushed; - /* sched_balance_exec() stats */ - unsigned long sbe_attempts; + /* SD_BALANCE_EXEC stats */ + unsigned long sbe_cnt; + unsigned long sbe_balanced; unsigned long sbe_pushed; + /* SD_BALANCE_FORK stats */ + unsigned long sbf_cnt; + unsigned long sbf_balanced; + unsigned long sbf_pushed; + /* try_to_wake_up() stats */ unsigned long ttwu_wake_remote; unsigned long ttwu_move_affine; Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-02-24 17:39:07.618910761 +1100 +++ linux-2.6/kernel/sched.c 2005-02-24 17:43:36.887683960 +1100 @@ -307,7 +307,7 @@ * bump this up when changing the output format or the meaning of an existing * format, so that tools can adapt (or abort) */ -#define SCHEDSTAT_VERSION 11 +#define SCHEDSTAT_VERSION 12 static int show_schedstat(struct seq_file *seq, void *v) { @@ -354,9 +354,10 @@ sd->lb_nobusyq[itype], sd->lb_nobusyg[itype]); } - seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu\n", + seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu\n", sd->alb_cnt, sd->alb_failed, sd->alb_pushed, - sd->sbe_pushed, sd->sbe_attempts, + sd->sbe_cnt, sd->sbe_balanced, sd->sbe_pushed, + sd->sbf_cnt, sd->sbf_balanced, sd->sbf_pushed, sd->ttwu_wake_remote, sd->ttwu_move_affine, sd->ttwu_move_balance); } #endif @@ -1262,24 +1263,34 @@ sd = tmp; if (sd) { + int new_cpu; struct sched_group *group; + schedstat_inc(sd, sbf_cnt); cpu = task_cpu(p); group = find_idlest_group(sd, p, cpu); - if (group) { - int new_cpu; - new_cpu = find_idlest_cpu(group, cpu); - if (new_cpu != -1 && new_cpu != cpu && - cpu_isset(new_cpu, p->cpus_allowed)) { -set_task_cpu(p, new_cpu); 
-task_rq_unlock(rq, &flags); -rq = task_rq_lock(p, &flags); -cpu = task_cpu(p); - } + if (!group) { + schedstat_inc(sd, sbf_balanced); + goto no_forkbalance; + } + + new_cpu = find_idlest_cpu(group, cpu); + if (new_cpu == -1 || new_cpu == cpu) { + schedstat_inc(sd, sbf_balanced); + goto no_forkbalance; + } + + if (cpu_isset(new_cpu, p->cpus_allowed)) { + schedstat_inc(sd, sbf_pushed); + set_task_cpu(p, new_cpu); + task_rq_unlock(rq, &flags); + rq = task_rq_lock(p, &flags); + cpu = task_cpu(p); } } -#endif +no_forkbalance: +#endif /* * We decrease the sleep average of forking parents * and children as well, to keep max-interactive tasks @@ -1616,30 +1627,28 @@ struct sched_domain *tmp, *sd = NULL; int new_cpu, this_cpu = get_cpu(); - /* Prefer the current CPU if there's only this task running */ - if (this_rq()->nr_running <= 1) - goto out; - for_each_domain(this_cpu, tmp) if (tmp->flags & SD_BALANCE_EXEC) sd = tmp; if (sd) { struct sched_group *group; - schedstat_inc(sd, sbe_attempts); + schedstat_inc(sd, sbe_cnt); group = find_idlest_group(sd, current, this_cpu); - if (!group) + if (!group) { + schedstat_inc(sd, sbe_balanced); goto out; + } new_cpu = find_idlest_cpu(group, this_cpu); - if (new_cpu == -1) + if (new_cpu == -1 || new_cpu == this_cpu) { + schedstat_inc(sd, sbe_balanced); goto out; - - if (new_cpu != this_cpu) { - schedstat_inc(sd, sbe_pushed); - put_cpu(); - sched_migrate_task(current, new_cpu); - return; } + + schedstat_inc(sd, sbe_pushed); + put_cpu(); + sched_migrate_task(current, new_cpu); + return; } out: put_cpu();
[PATCH 8/13] generalised CPU load averaging
8/13 Do CPU load averaging over a number of different intervals. Allow each interval to be chosen by sending a parameter to source_load and target_load. 0 is instantaneous, idx > 0 returns a decaying average with the most recent sample weighted at 2^(idx-1). To a maximum of 3 (could be easily increased). So generally a higher number will result in more conservative balancing. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-i386/topology.h === --- linux-2.6.orig/include/asm-i386/topology.h 2005-02-24 17:31:22.664322588 +1100 +++ linux-2.6/include/asm-i386/topology.h 2005-02-24 17:43:37.733579601 +1100 @@ -77,6 +77,10 @@ .imbalance_pct = 125, \ .cache_hot_time = (10*100), \ .cache_nice_tries = 1, \ + .busy_idx = 3, \ + .idle_idx = 1, \ + .newidle_idx = 2, \ + .wake_idx = 1, \ .per_cpu_gain = 100, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_EXEC \ Index: linux-2.6/include/asm-x86_64/topology.h === --- linux-2.6.orig/include/asm-x86_64/topology.h 2005-02-24 17:31:22.664322588 +1100 +++ linux-2.6/include/asm-x86_64/topology.h 2005-02-24 17:43:37.733579601 +1100 @@ -49,7 +49,11 @@ .busy_factor = 32, \ .imbalance_pct = 125, \ .cache_hot_time = (10*100), \ - .cache_nice_tries = 1, \ + .cache_nice_tries = 2, \ + .busy_idx = 3, \ + .idle_idx = 2, \ + .newidle_idx = 1, \ + .wake_idx = 1, \ .per_cpu_gain = 100, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ Index: linux-2.6/include/linux/sched.h === --- linux-2.6.orig/include/linux/sched.h 2005-02-24 17:31:28.428610071 +1100 +++ linux-2.6/include/linux/sched.h 2005-02-24 17:43:37.503607973 +1100 @@ -451,6 +451,10 @@ unsigned long long cache_hot_time; /* Task considered cache hot (ns) */ unsigned int cache_nice_tries; /* Leave cache hot tasks for # tries */ unsigned int per_cpu_gain; /* CPU % gained by adding domain cpus */ + unsigned int busy_idx; + unsigned int idle_idx; + unsigned int newidle_idx; + unsigned int wake_idx; int flags; /* See SD_* */ /* Runtime fields. 
*/ Index: linux-2.6/include/linux/topology.h === --- linux-2.6.orig/include/linux/topology.h 2005-02-24 17:31:22.665322464 +1100 +++ linux-2.6/include/linux/topology.h 2005-02-24 17:43:37.733579601 +1100 @@ -86,6 +86,10 @@ .cache_hot_time = 0, \ .cache_nice_tries = 0, \ .per_cpu_gain = 25, \ + .busy_idx = 0, \ + .idle_idx = 0, \ + .newidle_idx = 0, \ + .wake_idx = 0, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ @@ -112,6 +116,10 @@ .cache_hot_time = (5*100/2), \ .cache_nice_tries = 1, \ .per_cpu_gain = 100, \ + .busy_idx = 2, \ + .idle_idx = 0, \ + .newidle_idx = 1, \ + .wake_idx = 1, \ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE \ | SD_BALANCE_EXEC \ Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-02-24 17:39:06.530045151 +1100 +++ linux-2.6/kernel/sched.c 2005-02-24 17:43:37.913557397 +1100 @@ -204,7 +204,7 @@ */ unsigned long nr_running; #ifdef CONFIG_SMP - unsigned long cpu_load; + unsigned long cpu_load[3]; #endif unsigned long long nr_switches; @@ -884,23 +884,27 @@ * We want to under-estimate the load of migration sources, to * balance conservatively. 
*/ -static inline unsigned long source_load(int cpu) +static inline unsigned long source_load(int cpu, int type) { runqueue_t *rq = cpu_rq(cpu); unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE; + if (type == 0) + return load_now; - return min(rq->cpu_load, load_now); + return min(rq->cpu_load[type-1], load_now); } /* * Return a high guess at the load of a migration-target cpu */ -static inline unsigned long target_load(int cpu) +static inline unsigned long target_load(int cpu, int type) { runqueue_t *rq = cpu_rq(cpu); unsigned long load_now = rq->nr_running * SCHED_LOAD_SCALE; + if (type == 0) + return load_now; - return max(rq->cpu_load, load_now); + return max(rq->cpu_load[type-1], load_now); } #endif @@ -965,7 +969,7 @@ runqueue_t *rq; #ifdef CONFIG_SMP unsigned long load, this_load; - struct sched_domain *sd; + struct sched_domain *sd, *this_sd = NULL; int new_cpu; #endif @@ -984,72 +988,64 @@ if (unlikely(task_running(rq, p))) goto out_activate; -#ifdef CONFIG_SCHEDSTATS + new_cpu = cpu; + schedstat_inc(rq, ttwu_cnt); if (cpu == this_cpu) { schedstat_inc(rq, ttwu_local); - } else { - for_each_domain(this_cpu, sd) { - if (cpu_isset(cpu, sd->span)) { -schedstat_inc(sd, ttwu_wake_remote); -break; - } + goto out_set_cpu; + } + + for_each_domain(this_cpu, sd) { + if (
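The indexed load lookup can be sketched outside the kernel. The source_load/target_load logic below follows the hunk above (index 0 is the instantaneous load, index N reads cpu_load[N-1], and sources take the minimum while targets take the maximum). The update rule in update_cpu_load is an assumption, since that part of the patch is not visible in this excerpt: it simply gives slot i a decay in which the newest sample has weight 1/2^i. All names ending in _sketch are hypothetical.

```c
#define LOAD_SCALE_SKETCH 128UL	/* stand-in for SCHED_LOAD_SCALE */
#define NR_LOAD_IDX 3

struct rq_sketch {
	unsigned long nr_running;
	unsigned long cpu_load[NR_LOAD_IDX];
};

/* ASSUMED update rule: slot i decays with the newest sample at 1/2^i. */
static void update_cpu_load(struct rq_sketch *rq)
{
	unsigned long load_now = rq->nr_running * LOAD_SCALE_SKETCH;
	unsigned long scale = 1;
	int i;

	for (i = 0; i < NR_LOAD_IDX; i++, scale <<= 1)
		rq->cpu_load[i] =
			(rq->cpu_load[i] * (scale - 1) + load_now) / scale;
}

/* Low guess at the load of a migration source (as in the patch). */
static unsigned long source_load(const struct rq_sketch *rq, int type)
{
	unsigned long load_now = rq->nr_running * LOAD_SCALE_SKETCH;

	if (type == 0)
		return load_now;
	return rq->cpu_load[type - 1] < load_now ?
		rq->cpu_load[type - 1] : load_now;
}

/* High guess at the load of a migration target (as in the patch). */
static unsigned long target_load(const struct rq_sketch *rq, int type)
{
	unsigned long load_now = rq->nr_running * LOAD_SCALE_SKETCH;

	if (type == 0)
		return load_now;
	return rq->cpu_load[type - 1] > load_now ?
		rq->cpu_load[type - 1] : load_now;
}
```

Under that assumed rule a higher index reacts more slowly to a change in nr_running, which is one way to read the remark that a higher number gives more conservative balancing.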
[PATCH 9/13] less affine wakups
9/13

Do fewer affine wakeups. We're trying to reduce dbt2-pgsql idle time regressions here... make sure we don't move tasks the wrong way in an imbalance condition. Also, remove the cache coldness requirement from the calculation - this seems to induce sharp cutoff points where behaviour will suddenly change on some workloads if the load creeps slightly over or under some point. It is good for periodic balancing because in that case we otherwise have no other context to determine what task to move.

But also make a minor tweak to "wake balancing" - the imbalance tolerance is now set at half the domain's imbalance, so we get the opportunity to do wake balancing before the more random periodic rebalancing gets performed.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:06.808010844 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:37.734579478 +1100
@@ -1014,38 +1014,45 @@
 		int idx = this_sd->wake_idx;
 		unsigned int imbalance;

+		imbalance = 100 + (this_sd->imbalance_pct - 100) / 2;
+
 		load = source_load(cpu, idx);
 		this_load = target_load(this_cpu, idx);

-		/*
-		 * If sync wakeup then subtract the (maximum possible) effect of
-		 * the currently running task from the load of the current CPU:
-		 */
-		if (sync)
-			this_load -= SCHED_LOAD_SCALE;
-
-		/* Don't pull the task off an idle CPU to a busy one */
-		if (load < SCHED_LOAD_SCALE/2 && this_load > SCHED_LOAD_SCALE/2)
-			goto out_set_cpu;
-
 		new_cpu = this_cpu; /* Wake to this CPU if we can */

-		if ((this_sd->flags & SD_WAKE_AFFINE) &&
-			!task_hot(p, rq->timestamp_last_tick, this_sd)) {
-			/*
-			 * This domain has SD_WAKE_AFFINE and p is cache cold
-			 * in this domain.
-			 */
-			schedstat_inc(this_sd, ttwu_move_affine);
-			goto out_set_cpu;
-		} else if ((this_sd->flags & SD_WAKE_BALANCE) &&
-				imbalance*this_load <= 100*load) {
+		if (this_sd->flags & SD_WAKE_AFFINE) {
+			unsigned long tl = this_load;
 			/*
-			 * This domain has SD_WAKE_BALANCE and there is
-			 * an imbalance.
+			 * If sync wakeup then subtract the (maximum possible)
+			 * effect of the currently running task from the load
+			 * of the current CPU:
 			 */
-			schedstat_inc(this_sd, ttwu_move_balance);
-			goto out_set_cpu;
+			if (sync)
+				tl -= SCHED_LOAD_SCALE;
+
+			if ((tl <= load &&
+				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
+				100*(tl + SCHED_LOAD_SCALE) <= imbalance*load) {
+				/*
+				 * This domain has SD_WAKE_AFFINE and
+				 * p is cache cold in this domain, and
+				 * there is no bad imbalance.
+				 */
+				schedstat_inc(this_sd, ttwu_move_affine);
+				goto out_set_cpu;
+			}
+		}
+
+		/*
+		 * Start passive balancing when half the imbalance_pct
+		 * limit is reached.
+		 */
+		if (this_sd->flags & SD_WAKE_BALANCE) {
+			if (imbalance*this_load <= 100*load) {
+				schedstat_inc(this_sd, ttwu_move_balance);
+				goto out_set_cpu;
+			}
 		}
 	}

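The new wakeup decision can be exercised as a pure function. This is a minimal sketch, not the kernel code: wake_decide, the enum, the F_* flag values and LOAD_SCALE_SKETCH are hypothetical, and the three loads are passed in as plain numbers instead of being read from runqueues. The conditions themselves are copied from the added lines of the patch.

```c
#define LOAD_SCALE_SKETCH 128UL	/* stand-in for SCHED_LOAD_SCALE */

/* Hypothetical flag values, for this sketch only. */
#define F_WAKE_AFFINE	1u
#define F_WAKE_BALANCE	2u

enum wake_choice { STAY_PREV, MOVE_AFFINE, MOVE_BALANCE };

/*
 * prev_src_load: low guess at the load of the task's previous CPU
 * prev_tgt_load: high guess at the load of the task's previous CPU
 * this_load:     high guess at the load of the waking CPU
 * When `sync` is set the caller is assumed to pass
 * this_load >= LOAD_SCALE_SKETCH, because the waker itself is running.
 */
static enum wake_choice wake_decide(unsigned long prev_src_load,
				    unsigned long prev_tgt_load,
				    unsigned long this_load, int sync,
				    unsigned int flags,
				    unsigned int imbalance_pct)
{
	/* Tolerance is half of the domain's imbalance margin. */
	unsigned long imbalance = 100 + (imbalance_pct - 100) / 2;

	if (flags & F_WAKE_AFFINE) {
		unsigned long tl = this_load;

		/* Discount the waker, which is about to sleep. */
		if (sync)
			tl -= LOAD_SCALE_SKETCH;

		if ((tl <= prev_src_load &&
		     tl + prev_tgt_load <= LOAD_SCALE_SKETCH) ||
		    100 * (tl + LOAD_SCALE_SKETCH) <= imbalance * prev_src_load)
			return MOVE_AFFINE;
	}

	if (flags & F_WAKE_BALANCE) {
		if (imbalance * this_load <= 100 * prev_src_load)
			return MOVE_BALANCE;
	}

	return STAY_PREV;
}
```

For example, with imbalance_pct = 125 an idle waking CPU pulls a task from a CPU carrying one task's worth of load, while a waking CPU that is already twice as loaded as the previous one leaves the task where it was.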
Re: [RFC] PCI bridge driver rewrite
When you start writing the PCI root bridge driver you'll run into the AGP drivers that are already attached to the bridge. I was surprised by this, since I expected AGP to be attached to the AGP bridge, but I have now learned that it is a root bridge function.

An ISA LPC bridge driver would be nice too. It would let you turn off serial ports, etc., and let other systems know how many ports there are. No real need for this, just a nice toy.

Does this work to cause a probe based on PCI class?

static struct pci_device_id p2p_id_tbl[] = {
	{ PCI_DEVICE_CLASS(PCI_CLASS_BRIDGE_PCI << 8, 0x00) },
	{ 0 },
};

I would like to install a driver that gets called whenever new CLASS_VGA hardware shows up via hotplug. It won't attach to the device; it will just add some sysfs attributes. The framebuffer drivers need to attach to the device. If I add attributes this way, how can I remove them?

-- 
Jon Smirl
[EMAIL PROTECTED]
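On the class-match question, a small user-space sketch may help. It assumes the PCI core compares class codes as ((table class XOR device class) AND mask) == 0, which is my understanding of how ID-table matching works but is not confirmed anywhere in this thread. The struct and function names are hypothetical; only the PCI-to-PCI bridge class value 0x0604 is taken from the PCI class code assignments.

```c
#include <stdint.h>

/* PCI-to-PCI bridge: base class 0x06, subclass 0x04. */
#define CLASS_BRIDGE_PCI_SKETCH 0x0604u

/* Hypothetical stand-in for the class fields of a PCI ID table entry. */
struct class_id_sketch {
	uint32_t class_code;	/* 24 bits: base class, subclass, prog-if */
	uint32_t class_mask;	/* which of those bits must match */
};

/* ASSUMED matching rule: compare only the bits selected by the mask. */
static int class_matches(const struct class_id_sketch *id, uint32_t dev_class)
{
	return ((id->class_code ^ dev_class) & id->class_mask) == 0;
}
```

If that assumption holds, a mask of 0x00 selects no bits, so such an entry would match every device class. A mask of 0xffff00 would compare base class and subclass while ignoring the programming interface byte.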
[PATCH 7/13] better active balancing heuristic
7/13

Fix up active load balancing a bit so it doesn't get called when it shouldn't. Reset the nr_balance_failed counter at more points where we have found conditions to be balanced. This reduces the overly aggressive active balancing seen on some workloads.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:39:05.851128944 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:38.162526682 +1100
@@ -2009,6 +2009,7 @@

 	schedstat_inc(sd, lb_balanced[idle]);

+	sd->nr_balance_failed = 0;
 	/* tune up the balancing interval */
 	if (sd->balance_interval < sd->max_interval)
 		sd->balance_interval *= 2;
@@ -2034,16 +2035,14 @@
 	schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
 	group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
 	if (!group) {
-		schedstat_inc(sd, lb_balanced[NEWLY_IDLE]);
 		schedstat_inc(sd, lb_nobusyg[NEWLY_IDLE]);
-		goto out;
+		goto out_balanced;
 	}

 	busiest = find_busiest_queue(group);
 	if (!busiest || busiest == this_rq) {
-		schedstat_inc(sd, lb_balanced[NEWLY_IDLE]);
 		schedstat_inc(sd, lb_nobusyq[NEWLY_IDLE]);
-		goto out;
+		goto out_balanced;
 	}

 	/* Attempt to move tasks */
@@ -2054,11 +2053,16 @@
 			imbalance, sd, NEWLY_IDLE, &all_pinned);
 	if (!nr_moved)
 		schedstat_inc(sd, lb_failed[NEWLY_IDLE]);
+	else
+		sd->nr_balance_failed = 0;

 	spin_unlock(&busiest->lock);
-
-out:
 	return nr_moved;
+
+out_balanced:
+	schedstat_inc(sd, lb_balanced[NEWLY_IDLE]);
+	sd->nr_balance_failed = 0;
+	return 0;
 }

 /*
[PATCH 5/13] find_busiest_group cleanup
5/13

Clean up find_busiest_group a bit. New sched-domains code means we can't have groups without a CPU.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c	2005-02-24 17:31:29.298502546 +1100
+++ linux-2.6/kernel/sched.c	2005-02-24 17:43:38.629469074 +1100
@@ -1771,7 +1771,7 @@
 	do {
 		unsigned long load;
 		int local_group;
-		int i, nr_cpus = 0;
+		int i;

 		local_group = cpu_isset(this_cpu, group->cpumask);

@@ -1785,13 +1785,9 @@
 			else
 				load = source_load(i);

-			nr_cpus++;
 			avg_load += load;
 		}

-		if (!nr_cpus)
-			goto nextgroup;
-
 		total_load += avg_load;
 		total_pwr += group->cpu_power;

[PATCH 4/13] find_busiest_group fixlets
4/13

Fix up a few small warts in the periodic multiprocessor rebalancing code.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c 2005-02-24 17:31:28.431609701 +1100
+++ linux-2.6/kernel/sched.c 2005-02-24 17:43:38.806447240 +1100
@@ -1830,13 +1830,12 @@
 	 * by pulling tasks to us. Be careful of negative numbers as they'll
 	 * appear as very large values with unsigned longs.
 	 */
-	*imbalance = min(max_load - avg_load, avg_load - this_load);
-
 	/* How much load to actually move to equalise the imbalance */
-	*imbalance = (*imbalance * min(busiest->cpu_power, this->cpu_power))
-				/ SCHED_LOAD_SCALE;
+	*imbalance = min((max_load - avg_load) * busiest->cpu_power,
+				(avg_load - this_load) * this->cpu_power)
+			/ SCHED_LOAD_SCALE;
-	if (*imbalance < SCHED_LOAD_SCALE - 1) {
+	if (*imbalance < SCHED_LOAD_SCALE) {
 		unsigned long pwr_now = 0, pwr_move = 0;
 		unsigned long tmp;
@@ -1862,14 +1861,16 @@
 							max_load - tmp);
 		/* Amount of load we'd add */
-		tmp = SCHED_LOAD_SCALE*SCHED_LOAD_SCALE/this->cpu_power;
-		if (max_load < tmp)
-			tmp = max_load;
+		if (max_load*busiest->cpu_power <
+				SCHED_LOAD_SCALE*SCHED_LOAD_SCALE)
+			tmp = max_load*busiest->cpu_power/this->cpu_power;
+		else
+			tmp = SCHED_LOAD_SCALE*SCHED_LOAD_SCALE/this->cpu_power;
 		pwr_move += this->cpu_power*min(SCHED_LOAD_SCALE, this_load + tmp);
 		pwr_move /= SCHED_LOAD_SCALE;
-		/* Move if we gain another 8th of a CPU worth of throughput */
-		if (pwr_move < pwr_now + SCHED_LOAD_SCALE / 8)
+		/* Move if we gain throughput */
+		if (pwr_move <= pwr_now)
 			goto out_balanced;
 		*imbalance = 1;
@@ -1877,7 +1878,7 @@
 	}
 	/* Get rid of the scaling factor, rounding down as we divide */
-	*imbalance = (*imbalance + 1) / SCHED_LOAD_SCALE;
+	*imbalance = *imbalance / SCHED_LOAD_SCALE;
 	return busiest;
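
The reworked imbalance calculation from the first hunk of 4/13 can be tried out in user space. What follows is only a sketch under stated assumptions: SCHED_LOAD_SCALE is assumed to be 128 here, and min_ul(), imbalance_old() and imbalance_new() are hypothetical helper names, not functions from sched.c. As the comment in the patch warns, the caller has to guarantee max_load >= avg_load >= this_load, because the subtractions are done on unsigned longs.

```c
/* Assumed value for illustration; the patch itself does not show it. */
#define SCHED_LOAD_SCALE 128UL

/* Hypothetical stand-in for the kernel's min() macro. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Old form: take the smaller load gap, then scale by the smaller cpu_power. */
static unsigned long imbalance_old(unsigned long max_load,
				   unsigned long avg_load,
				   unsigned long this_load,
				   unsigned long busiest_power,
				   unsigned long this_power)
{
	unsigned long imb = min_ul(max_load - avg_load, avg_load - this_load);

	return imb * min_ul(busiest_power, this_power) / SCHED_LOAD_SCALE;
}

/* New form: weight each gap by its own group's cpu_power, then take the min. */
static unsigned long imbalance_new(unsigned long max_load,
				   unsigned long avg_load,
				   unsigned long this_load,
				   unsigned long busiest_power,
				   unsigned long this_power)
{
	return min_ul((max_load - avg_load) * busiest_power,
		      (avg_load - this_load) * this_power)
		/ SCHED_LOAD_SCALE;
}
```

With max_load = 512, avg_load = 256, this_load = 192, a busiest-group power of 128 and a local power of 256, the old form gives 64 and the new form gives 128, because each side is now scaled by its own cpu_power before the minimum is taken.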
[PATCH 3/13] rework schedstats
3/13 I have an updated userspace parser for this thing, if you are still keeping it on your website. Move balancing fields into struct sched_domain, so we can get more useful results on systems with multiple domains (eg SMT+SMP, CMP+NUMA, SMP+NUMA, etc). Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/sched.h === --- linux-2.6.orig/include/linux/sched.h 2005-02-24 17:31:24.598083557 +1100 +++ linux-2.6/include/linux/sched.h 2005-02-24 17:43:38.161526805 +1100 @@ -462,17 +462,26 @@ /* load_balance() stats */ unsigned long lb_cnt[MAX_IDLE_TYPES]; unsigned long lb_failed[MAX_IDLE_TYPES]; + unsigned long lb_balanced[MAX_IDLE_TYPES]; unsigned long lb_imbalance[MAX_IDLE_TYPES]; + unsigned long lb_gained[MAX_IDLE_TYPES]; + unsigned long lb_hot_gained[MAX_IDLE_TYPES]; unsigned long lb_nobusyg[MAX_IDLE_TYPES]; unsigned long lb_nobusyq[MAX_IDLE_TYPES]; + /* Active load balancing */ + unsigned long alb_cnt; + unsigned long alb_failed; + unsigned long alb_pushed; + /* sched_balance_exec() stats */ unsigned long sbe_attempts; unsigned long sbe_pushed; /* try_to_wake_up() stats */ - unsigned long ttwu_wake_affine; - unsigned long ttwu_wake_balance; + unsigned long ttwu_wake_remote; + unsigned long ttwu_move_affine; + unsigned long ttwu_move_balance; #endif }; Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-02-24 17:31:27.503724395 +1100 +++ linux-2.6/kernel/sched.c 2005-02-24 17:43:38.983425407 +1100 @@ -246,35 +246,13 @@ unsigned long yld_cnt; /* schedule() stats */ - unsigned long sched_noswitch; unsigned long sched_switch; unsigned long sched_cnt; unsigned long sched_goidle; - /* pull_task() stats */ - unsigned long pt_gained[MAX_IDLE_TYPES]; - unsigned long pt_lost[MAX_IDLE_TYPES]; - - /* active_load_balance() stats */ - unsigned long alb_cnt; - unsigned long alb_lost; - unsigned long alb_gained; - unsigned long alb_failed; - /* try_to_wake_up() stats */ unsigned long ttwu_cnt; - unsigned long ttwu_attempts; - unsigned 
long ttwu_moved; - - /* wake_up_new_task() stats */ - unsigned long wunt_cnt; - unsigned long wunt_moved; - - /* sched_migrate_task() stats */ - unsigned long smt_cnt; - - /* sched_balance_exec() stats */ - unsigned long sbe_cnt; + unsigned long ttwu_local; #endif }; @@ -329,7 +307,7 @@ * bump this up when changing the output format or the meaning of an existing * format, so that tools can adapt (or abort) */ -#define SCHEDSTAT_VERSION 10 +#define SCHEDSTAT_VERSION 11 static int show_schedstat(struct seq_file *seq, void *v) { @@ -347,22 +325,14 @@ /* runqueue-specific stats */ seq_printf(seq, - "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu " - "%lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", + "cpu%d %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu", cpu, rq->yld_both_empty, - rq->yld_act_empty, rq->yld_exp_empty, - rq->yld_cnt, rq->sched_noswitch, + rq->yld_act_empty, rq->yld_exp_empty, rq->yld_cnt, rq->sched_switch, rq->sched_cnt, rq->sched_goidle, - rq->alb_cnt, rq->alb_gained, rq->alb_lost, - rq->alb_failed, - rq->ttwu_cnt, rq->ttwu_moved, rq->ttwu_attempts, - rq->wunt_cnt, rq->wunt_moved, - rq->smt_cnt, rq->sbe_cnt, rq->rq_sched_info.cpu_time, + rq->ttwu_cnt, rq->ttwu_local, + rq->rq_sched_info.cpu_time, rq->rq_sched_info.run_delay, rq->rq_sched_info.pcnt); - for (itype = SCHED_IDLE; itype < MAX_IDLE_TYPES; itype++) - seq_printf(seq, " %lu %lu", rq->pt_gained[itype], - rq->pt_lost[itype]); seq_printf(seq, "\n"); #ifdef CONFIG_SMP @@ -373,17 +343,21 @@ cpumask_scnprintf(mask_str, NR_CPUS, sd->span); seq_printf(seq, "domain%d %s", dcnt++, mask_str); for (itype = SCHED_IDLE; itype < MAX_IDLE_TYPES; - itype++) { -seq_printf(seq, " %lu %lu %lu %lu %lu", + itype++) { +seq_printf(seq, " %lu %lu %lu %lu %lu %lu %lu %lu", sd->lb_cnt[itype], +sd->lb_balanced[itype], sd->lb_failed[itype], sd->lb_imbalance[itype], +sd->lb_gained[itype], +sd->lb_hot_gained[itype], sd->lb_nobusyq[itype], sd->lb_nobusyg[itype]); } - seq_printf(seq, " %lu %lu %lu %lu\n", + seq_printf(seq, " 
%lu %lu %lu %lu %lu %lu %lu %lu\n", + sd->alb_cnt, sd->alb_failed, sd->alb_pushed, sd->sbe_pushed, sd->sbe_attempts, - sd->ttwu_wake_affine, sd->ttwu_wake_balance); + sd->ttwu_wake_remote, sd->ttwu_move_affine, sd->ttwu_move_balance); } #endif } @@ -996,7 +970,6 @@ #endif rq = task_rq_lock(p, &flags); - schedstat_inc(rq, ttwu_cnt); old_state = p->state; if (!(old_state & state)) goto out; @@ -1011,8 +984,21 @@ if (unlikely(task_running(rq, p))) goto out_activate; - new_cpu = cpu; +#ifdef CONFIG_SCHEDSTATS + schedstat_inc(rq, ttwu_
[PATCH 2/13] improve pinned task handling
2/13

John Hawkes explained the problem best:

	A large number of processes that are pinned to a single CPU results
	in every other CPU's load_balance() seeing this overloaded CPU as
	"busiest", yet move_tasks() never finds a task to pull-migrate. This
	condition occurs during module unload, but can also occur as a
	denial-of-service using sys_sched_setaffinity(). Several hundred
	CPUs performing this fruitless load_balance() will livelock on the
	busiest CPU's runqueue lock. A smaller number of CPUs will livelock
	if the pinned task count gets high.

Expanding slightly on John's patch, this one attempts to work out whether the
balancing failure has been due to too many tasks pinned on the runqueue. This
allows it to be basically invisible to the regular balancing paths (ie. when
there are no pinned tasks). We can use this extra knowledge to shut down the
balancing faster, and ensure the migration threads don't start running which
is another problem observed in the wild.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c 2005-02-24 17:31:27.042781371 +1100
+++ linux-2.6/kernel/sched.c 2005-02-24 17:43:39.180401105 +1100
@@ -1650,7 +1650,7 @@
  */
 static inline
 int can_migrate_task(task_t *p, runqueue_t *rq, int this_cpu,
-		struct sched_domain *sd, enum idle_type idle)
+		struct sched_domain *sd, enum idle_type idle, int *pinned)
 {
 	/*
 	 * We do not migrate tasks that are:
@@ -1660,8 +1660,10 @@
 	 */
 	if (task_running(rq, p))
 		return 0;
-	if (!cpu_isset(this_cpu, p->cpus_allowed))
+	if (!cpu_isset(this_cpu, p->cpus_allowed)) {
+		*pinned++;
 		return 0;
+	}
 	/*
 	 * Aggressive migration if:
@@ -1687,11 +1689,11 @@
  */
 static int move_tasks(runqueue_t *this_rq, int this_cpu, runqueue_t *busiest,
 		unsigned long max_nr_move, struct sched_domain *sd,
-		enum idle_type idle)
+		enum idle_type idle, int *all_pinned)
 {
 	prio_array_t *array, *dst_array;
 	struct list_head *head, *curr;
-	int idx, pulled = 0;
+	int idx, pulled = 0, pinned = 0;
 	task_t *tmp;
 	if (max_nr_move <= 0 || busiest->nr_running <= 1)
@@ -1735,7 +1737,7 @@
 	curr = curr->prev;
-	if (!can_migrate_task(tmp, busiest, this_cpu, sd, idle)) {
+	if (!can_migrate_task(tmp, busiest, this_cpu, sd, idle, &pinned)) {
 		if (curr != head)
 			goto skip_queue;
 		idx++;
@@ -1761,6 +1763,9 @@
 		goto skip_bitmap;
 	}
 out:
+	*all_pinned = 0;
+	if (unlikely(pinned >= max_nr_move) && pulled == 0)
+		*all_pinned = 1;
 	return pulled;
 }
@@ -1935,7 +1940,7 @@
 	struct sched_group *group;
 	runqueue_t *busiest;
 	unsigned long imbalance;
-	int nr_moved;
+	int nr_moved, all_pinned;
 	spin_lock(&this_rq->lock);
 	schedstat_inc(sd, lb_cnt[idle]);
@@ -1974,9 +1979,14 @@
 		 */
 		double_lock_balance(this_rq, busiest);
 		nr_moved = move_tasks(this_rq, this_cpu, busiest,
-						imbalance, sd, idle);
+						imbalance, sd, idle,
+						&all_pinned);
 		spin_unlock(&busiest->lock);
 	}
+	/* All tasks on this runqueue were pinned by CPU affinity */
+	if (unlikely(all_pinned))
+		goto out_balanced;
+
 	spin_unlock(&this_rq->lock);
 	if (!nr_moved) {
@@ -2041,7 +2051,7 @@
 	struct sched_group *group;
 	runqueue_t *busiest = NULL;
 	unsigned long imbalance;
-	int nr_moved = 0;
+	int nr_moved = 0, all_pinned;
 	schedstat_inc(sd, lb_cnt[NEWLY_IDLE]);
 	group = find_busiest_group(sd, this_cpu, &imbalance, NEWLY_IDLE);
@@ -2061,7 +2071,7 @@
 	schedstat_add(sd, lb_imbalance[NEWLY_IDLE], imbalance);
 	nr_moved = move_tasks(this_rq, this_cpu, busiest,
-					imbalance, sd, NEWLY_IDLE);
+					imbalance, sd, NEWLY_IDLE, &all_pinned);
 	if (!nr_moved)
 		schedstat_inc(sd, lb_failed[NEWLY_IDLE]);
@@ -2119,6 +2129,7 @@
 	cpu_group = sd->groups;
 	do {
 		for_each_cpu_mask(cpu, cpu_group->cpumask) {
+			int all_pinned;
 			if (busiest_rq->nr_running <= 1)
 				/* no more tasks left to move */
 				return;
@@ -2139,7 +2150,7 @@
 			/* move a task from busiest_rq to target_rq */
 			double_lock_balance(busiest_rq, target_rq);
 			if (move_tasks(target_rq, cpu, busiest_rq,
-					1, sd, SCHED_IDLE)) {
+					1, sd, SCHED_IDLE, &all_pinned)) {
 				schedstat_inc(busiest_rq, alb_lost);
 				schedstat_inc(target_rq, alb_gained);
 			} else {
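
One detail in the can_migrate_task() hunk of 2/13 is worth checking before applying it. In C the postfix ++ binds tighter than unary *, so `*pinned++` advances the local copy of the pointer and leaves the caller's counter untouched. The user-space sketch below shows the difference; both function names are hypothetical, and this is an observation about C precedence only, not a statement about what was eventually merged.

```c
/*
 * As written in the posted hunk: parsed as *(pinned++). The pointer copy
 * is advanced, the pointed-to counter is read and discarded. The (void)
 * cast is added here only to silence the "value computed is not used"
 * compiler warning.
 */
static void count_pinned_as_posted(int *pinned)
{
	(void)*pinned++;
}

/* What the surrounding code appears to intend: bump the caller's counter. */
static void count_pinned_parenthesised(int *pinned)
{
	(*pinned)++;
}
```

If the counter never moves, the `pinned >= max_nr_move` test in move_tasks() can only be true when max_nr_move is zero, so all_pinned would effectively never be set.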
[PATCH 1/13] timestamp fixes
1/13

Some fixes for unsynchronised TSCs. A task's timestamp may have been set by
another CPU. Although we try to adjust this correctly with the
timestamp_last_tick field, there is no guarantee this will be exactly right.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c 2005-02-24 17:31:25.384986289 +1100
+++ linux-2.6/kernel/sched.c 2005-02-24 17:43:39.356379395 +1100
@@ -648,6 +648,7 @@
 static void recalc_task_prio(task_t *p, unsigned long long now)
 {
+	/* Caller must always ensure 'now >= p->timestamp' */
 	unsigned long long __sleep_time = now - p->timestamp;
 	unsigned long sleep_time;
@@ -2703,8 +2704,10 @@
 	schedstat_inc(rq, sched_cnt);
 	now = sched_clock();
-	if (likely(now - prev->timestamp < NS_MAX_SLEEP_AVG))
+	if (likely((long long)now - prev->timestamp < NS_MAX_SLEEP_AVG))
 		run_time = now - prev->timestamp;
+		if (unlikely((long long)now - prev->timestamp < 0))
+			run_time = 0;
 	else
 		run_time = NS_MAX_SLEEP_AVG;
@@ -2782,6 +2785,8 @@
 	if (!rt_task(next) && next->activated > 0) {
 		unsigned long long delta = now - next->timestamp;
+		if (unlikely((long long)now - next->timestamp < 0))
+			delta = 0;
 		if (next->activated == 1)
 			delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
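
The schedule() hunk of 1/13 is easy to misread once the indentation is gone. Without braces the `else` pairs with the second `if`, so every non-negative delta ends up as NS_MAX_SLEEP_AVG; the follow-up reply at the top of this thread adds the braces. The intended three-way clamp can be sketched in user space as below. clamp_run_time() is a hypothetical helper and the NS_MAX_SLEEP_AVG value is a placeholder, not the kernel's definition. The sketch also casts the whole difference to a signed type: if the timestamp is itself an unsigned long long, casting only `now` would leave the subtraction unsigned and the `< 0` test could never fire.

```c
/* Placeholder value (1 second in ns); the kernel derives its own constant. */
#define NS_MAX_SLEEP_AVG 1000000000LL

/*
 * Clamp the elapsed time between two clock readings:
 *   - a negative delta (timestamp taken on a CPU whose clock is ahead)
 *     becomes 0,
 *   - a delta below the ceiling is returned unchanged,
 *   - anything else is capped at NS_MAX_SLEEP_AVG.
 * Converting the unsigned difference to long long relies on the usual
 * two's-complement behaviour of mainstream compilers.
 */
static unsigned long long clamp_run_time(unsigned long long now,
					 unsigned long long timestamp)
{
	long long delta = (long long)(now - timestamp);

	if (delta < 0)
		return 0;
	if (delta < NS_MAX_SLEEP_AVG)
		return (unsigned long long)delta;
	return NS_MAX_SLEEP_AVG;
}
```

The same clamp-to-zero idea applies to the `delta` computed for `next` in the last hunk.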
[PATCH 0/13] Multiprocessor CPU scheduler patches
Hi,

I hope that you can include the following set of CPU scheduler patches in -mm soon, if you have no other significant performance work going on.

There are some fairly significant changes, with a few basic aims:
* Improve SMT behaviour
* Improve CMP behaviour, CMP/NUMA scheduling (ie. Opteron)
* Reduce task movement, esp over NUMA nodes.

They are not going to be very well tuned for most usages at the moment (unfortunately dbt2/3-pgsql on OSDL isn't working, which is a good one). So hopefully I can address regressions as they come up.

There are a few problems with the scheduler currently:

Problem #1:
It has _very_ aggressive idle CPU pulling. Not only does it not really obey imbalances, it is also wrong for eg. an SMT CPU whose sibling is not idle. The reason this was done really is to bring down idle time on some workloads (dbt2-pgsql, other database stuff).

So I address this in the following ways: reduce special casing for idle balancing, revert some of the recent moves toward even more aggressive balancing.

Then provide a range of averaging levels for CPU "load averages", and we choose which to use in which situation on a sched-domain basis. This allows idle balancing to use a more instantaneous value for calculating load, so idle CPUs need not wait many timer ticks for the load averages to catch up. This can hopefully solve our idle time problems.

Also, further moderate "affine wakeups", which can tend to move most tasks to one CPU on some workloads and cause idle problems.

Problem #2:
The second problem is that balance-on-exec is not sched-domains aware. This means it will tend to (for example) fill up two cores of a CPU on one socket, then fill up two cores on the next socket, etc. What we want is to try to spread load evenly across memory controllers.

So make that sched-domains aware following the same pattern as find_busiest_group / find_busiest_queue.

Problem #3:
Lastly, implement balance-on-fork/clone again. I have come to the realisation that for NUMA, this is probably the best solution. Run-cloned-child-last has run out of steam on CMP systems. What it was supposed to do was provide a period where the child could be pulled to another CPU before it starts running and allocating memory. Unfortunately on CMP systems, this tends to just be to the other sibling.

Also, having such a difference between thread and process creation was not really ideal, so we balance on all types of fork/clone. This really helps some things (like STREAM) on CMP Opterons, but also hurts others, so naturally it is settable per-domain.

Problem #4:
Sched domains isn't very useful to me in its current form. Bring it up to date with what I've been using. I don't think anyone other than myself uses it so that should be OK.

Nick
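
Problem #1 in the cover letter mentions a range of averaging levels for the CPU load, so that idle balancing can look at a near-instantaneous value while other paths use a slower one. The cover letter does not show how the levels are computed; the sketch below is one plausible scheme, assuming that level i is an exponential average that moves 1/2^i of the way toward the current load on each tick. NR_LOAD_LEVELS, struct cpu_load_avg and load_avg_update() are hypothetical names for illustration only.

```c
#include <stddef.h>

/* Hypothetical number of averaging levels. */
#define NR_LOAD_LEVELS 3

struct cpu_load_avg {
	/* level[0] tracks the instantaneous load, higher levels react slower */
	unsigned long level[NR_LOAD_LEVELS];
};

/*
 * Fold the load observed at this tick into every level. Level i keeps
 * (2^i - 1)/2^i of its old value and takes 1/2^i of the new sample, so
 * level 0 simply follows this_load.
 */
static void load_avg_update(struct cpu_load_avg *avg, unsigned long this_load)
{
	unsigned long scale = 1;
	size_t i;

	for (i = 0; i < NR_LOAD_LEVELS; i++, scale *= 2)
		avg->level[i] = (avg->level[i] * (scale - 1) + this_load) / scale;
}
```

Starting from all zeroes, one update with a load of 128 leaves the levels at 128, 64 and 32. A balancing path that wants to react quickly would read a low level, and a conservative one a high level.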
Re: [RFC] PCI bridge driver rewrite
On Thu, 2005-02-24 at 01:45 -0500, Jon Smirl wrote:
> On Thu, 24 Feb 2005 01:22:01 -0500, Adam Belay <[EMAIL PROTECTED]> wrote:
> > For the past couple weeks I have been reorganizing the PCI subsystem to
> > better utilize the driver model. Specifically, the bus detection code
> > is now using a standard PCI driver. It turns out to be a major
>
> What about VGA routing? Most PCI buses do it with the normal VGA bit
> but big hardware supports multiple legacy IO spaces via the bridge
> chips.
>
> Are you going to make sysfs entries for the bridges? If so I'd like a
> VGA attribute that directly reads the VGA bit from the hardware and
> display it instead of using the shadow copy.

Yeah, actually I've been thinking about this issue a lot. I think it would make a lot of sense to export this sort of thing under the "pci_bus" class in sysfs. The ISA enable bit should probably also be exported. Furthermore, we should be verifying the BIOS's configuration of VGA and ISA. I'll try to integrate this in my future releases. I appreciate the code. I also have a number of resource management plans for the VGA enable bit that I'll get into in my next set of patches.

>
> Jesse can comment on the specific support needed for multiple legacy IO
> spaces.
>

That would be great. Most of my experience has been with only a couple legacy IO port ranges passing through the bridge.

Thanks,
Adam
Re: [8/14] Orinoco driver updates - PCMCIA initialization cleanups
> @@ -184,6 +186,7 @@
>  	dev_list = link;
>
>  	client_reg.dev_info = &dev_info;
> +	client_reg.Attributes = INFO_IO_CLIENT | INFO_CARD_SHARE;

That's not needed any longer for 2.6.

	Dominik
Re: Xterm Hangs - Possible scheduler defect?
"Chad N. Tindel" <[EMAIL PROTECTED]> wrote: > > > `xterm' is waiting for the other CPU to schedule a kernel thread (which is > > bound to that CPU). Once that kernel thread has done a little bit of work, > > `xterm' can terminate. > > > > But kernel threads don't run with realtime policy, so your userspace app > > has permanently starved that kernel thread. > > > > It's potentially quite a problem, really. For example it could prevent > > various tty operations from completing, it will prevent kjournald from ever > > writing back anything (on uniprocessor, etc). I've been waiting for > > someone to complain ;) > > > > But the other side of the coin is that a SCHED_FIFO userspace task > > presumably has extreme latency requirements, so it doesn't *want* to be > > preempted by some routine kernel operation. People would get irritated if > > we were to do that. > > > > So what to do? > > It shouldn't need to preempt the kernel operation. Why is the design such > that > the necessary kernel thread can't run on the other CPU? > This particular kernel function is implemented via a kernel thread per CPU, with each thread bound to each CPU. The xterm-does-exit cleanup code is waiting for the thread which is bound to the busy CPU to do something. No other CPU can, or is allowed, to do that thread's work. If it were to do so, the implicit locking which we get from the per-cpuness would be violated. I don't know if any clients of the workqueue code rely upon the pinned-to-cpu feature. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] PCI bridge driver rewrite
On Thu, 24 Feb 2005 01:22:01 -0500, Adam Belay <[EMAIL PROTECTED]> wrote:
> For the past couple weeks I have been reorganizing the PCI subsystem to
> better utilize the driver model. Specifically, the bus detection code
> is now using a standard PCI driver. It turns out to be a major

What about VGA routing? Most PCI buses do it with the normal VGA bit but big hardware supports multiple legacy IO spaces via the bridge chips.

Are you going to make sysfs entries for the bridges? If so I'd like a VGA attribute that directly reads the VGA bit from the hardware and display it instead of using the shadow copy.

/* sysfs show for VGA routing bridge */
static ssize_t vga_bridge_show(struct device *dev, char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	u16 l;

	/* don't trust the shadow PCI_BRIDGE_CTL_VGA in pdev */
	/* user space (X) may change hardware without telling the kernel */
	pci_read_config_word(pdev, PCI_BRIDGE_CONTROL, &l);
	return sprintf(buf, "%d\n", (l & PCI_BRIDGE_CTL_VGA) != 0);
}

I also use these functions to control VGA routing, maybe they should be part of bridge support.

static void bridge_yes(struct pci_dev *pdev)
{
	struct pci_dev *bridge;
	struct pci_bus *bus;

	/* Make sure the bridges route to us */
	bus = pdev->bus;
	while (bus) {
		bridge = bus->self;
		if (bridge) {
			bus->bridge_ctl |= PCI_BRIDGE_CTL_VGA;
			pci_write_config_word(bridge, PCI_BRIDGE_CONTROL,
					      bus->bridge_ctl);
		}
		bus = bus->parent;
	}
}

static void bridge_no(struct pci_dev *pdev)
{
	struct pci_dev *bridge;
	struct pci_bus *bus;

	/* Make sure the bridges don't route to us */
	bus = pdev->bus;
	while (bus) {
		bridge = bus->self;
		if (bridge) {
			bus->bridge_ctl &= ~PCI_BRIDGE_CTL_VGA;
			pci_write_config_word(bridge, PCI_BRIDGE_CONTROL,
					      bus->bridge_ctl);
		}
		bus = bus->parent;
	}
}

Jesse can comment on the specific support needed for multiple legacy IO spaces.

-- 
Jon Smirl
[EMAIL PROTECTED]
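
The two routing helpers in Jon's mail, bridge_yes() and bridge_no(), differ only in whether they set or clear the VGA bit, so if they do move into bridge support they could be folded into one walk up the bus hierarchy. The sketch below models that in user space. struct mock_bus and bridge_route_vga() are hypothetical stand-ins (has_bridge for `bus->self != NULL`, hw_ctl for the register written through pci_write_config_word()), and BRIDGE_CTL_VGA is assumed to match the kernel's PCI_BRIDGE_CTL_VGA value of 0x08.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to match PCI_BRIDGE_CTL_VGA. */
#define BRIDGE_CTL_VGA 0x08

/* Hypothetical user-space model of the fields the helpers touch. */
struct mock_bus {
	struct mock_bus *parent;
	int has_bridge;       /* stands in for bus->self != NULL */
	uint16_t bridge_ctl;  /* shadow copy kept by the kernel */
	uint16_t hw_ctl;      /* stands in for the hardware register */
};

/* Walk from the device's bus to the root, setting or clearing VGA routing. */
static void bridge_route_vga(struct mock_bus *bus, int enable)
{
	for (; bus; bus = bus->parent) {
		if (!bus->has_bridge)
			continue;
		if (enable)
			bus->bridge_ctl |= BRIDGE_CTL_VGA;
		else
			bus->bridge_ctl &= (uint16_t)~BRIDGE_CTL_VGA;
		/* in the kernel this would be pci_write_config_word() */
		bus->hw_ctl = bus->bridge_ctl;
	}
}
```

A root bus without an upstream bridge is skipped, exactly as the `if (bridge)` test does in the original functions.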
Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector
On Wed, 2005-02-23 at 14:41 +0300, Evgeniy Polyakov wrote:
> > Please assume that > originally written for> will always be listening.
> >
> > > > What happened to the idea of sending an on/off message down the netlink
> > > > socket?
> > ...
> > Arrange for the userspace daemon to send a message to the fork_connector
> > subsystem turning it on or off. So we can bypass all this code in the
> > common case where is listening, but your daemon is
> > not.
>
> Ok, now I see(I'm not a fork connector author, so I did not receive them).
> That will require to add real fork connector with callback routing.
> Guillaume?

Yes the connector's callback is a good solution. I will add a fork enable/disable callback in drivers/connector/connector.c that will switch a global variable when called from user space. It will be something like:

void cn_fork_callback(void)
{
	if (cn_already_initialized)
		cn_fork_enable = cn_fork_enable ? 0 : 1;
}

With cn_fork_enable set to 0 by default. In the do_fork() I will replace the statement "if (cn_already_initialized)" by "if (cn_fork_enable)"

> > Without a lock you can have two messages with the same sequence number.
> > Even if the daemon which you're planning on implementing can handle that,
> > we shouldn't allow it.
>
> Yes, they can have the same number, but does it cost atomic/lock overhead?
> Anyway, simple spin_lock() should be enough in do_fork() context.
> Guillaume?

I will protect the incrementation by a spin_lock(&fork_cn_lock).

Guillaume
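
The two points agreed in this exchange, an enable switch that lets do_fork() skip the connector code and a sequence number that is never handed out twice, can be modelled in user space. This is a sketch, not the proposed kernel code: it swaps the spin_lock(&fork_cn_lock) mentioned above for a C11 atomic counter so that it stays self-contained, and cn_fork_toggle() and fork_event_seq() are hypothetical names. Only cn_fork_enable comes from the mail.

```c
#include <stdatomic.h>

/* Off by default, as described in the mail. */
static int cn_fork_enable;

/* Stand-in for the lock-protected sequence counter. */
static atomic_uint fork_seq;

/* Models the callback that user space triggers to flip the switch. */
static void cn_fork_toggle(void)
{
	cn_fork_enable = cn_fork_enable ? 0 : 1;
}

/*
 * Models the check in do_fork(): returns 0 and does nothing while the
 * switch is off, otherwise stores a unique sequence number in *seq and
 * returns 1. atomic_fetch_add() guarantees two callers never get the
 * same value.
 */
static int fork_event_seq(unsigned int *seq)
{
	if (!cn_fork_enable)
		return 0;
	*seq = atomic_fetch_add(&fork_seq, 1);
	return 1;
}
```

In the common case where nobody is listening, the cost is a single test of cn_fork_enable.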
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 04:56 +, Hugh Dickins wrote:
> On Wed, 23 Feb 2005, Lee Revell wrote:
> > On Wed, 2005-02-23 at 20:53 +, Hugh Dickins wrote:
> > > On Wed, 23 Feb 2005, Hugh Dickins wrote:
> > > > Please replace by new patch below, which I'm now running through
> > > > lmbench.
> > >
> > > That second patch seems fine, and I see no lmbench regression from it.
> >
> > Should go into 2.6.11, right?
>
> That's up to Andrew (and Linus).
>
> I was thinking that way when I rushed you the patch. But given that
> you have remaining unresolved latency issues nearby (zap_pte_range,
> clear_page_range), and given the warning shot that I screwed up my
> first attempt, I'd be inclined to say hold off.
>
> It's a pity: for a while we were thinking 2.6.11 would be a big step
> forward for mainline latency; but it now looks to me like these tests
> have come too late in the cycle to be dealt with safely.
>
> In other mail, you do expect people still to be using Ingo's patches,
> so probably this patch should stick there (and in -mm) for now.

Well all of these were fixed in the past so it may not be unreasonable to fix them for 2.6.11.

Lee
Re: 2.6.11-rc5
Quoting Linus Torvalds ([EMAIL PROTECTED]):
>
>
> Hey, I hoped -rc4 was the last one, but we had some laptop resource
> conflicts, various ppc TLB flush issues, some possible stack overflows in
> networking and a number of other details warranting a quick -rc5 before
> the final 2.6.11.
>
> This time it's really supposed to be a quickie, so people who can, please
> check it out, and we'll make the real 2.6.11 asap.
>
> Mostly pretty small changes (the largest is a new SATA driver that crept
> in, our bad). But worth another quick round.

Are you sure you uploaded the correct patch file ?

-rw-rw-r--  1 536 536    50907 Feb 24 04:13 ChangeLog-2.6.11-rc5
-rw-rw-r--  1 536 536        0 Feb 24 04:13 LATEST-IS-2.6.11-rc5
-rw-rw-r--  1 536 536 46586159 Feb 24 04:20 linux-2.6.11-rc5.tar.gz
-rw-rw-r--  1 536 536      248 Feb 24 04:20 linux-2.6.11-rc5.tar.gz.sign
-rw-rw-r--  1 536 536       37 Feb 24 04:20 patch-2.6.11-rc5.gz
-rw-rw-r--  1 536 536 37080033 Feb 24 04:20 linux-2.6.11-rc5.tar.bz2
-rw-rw-r--  1 536 536      248 Feb 24 04:20 linux-2.6.11-rc5.tar.bz2.sign
-rw-rw-r--  1 536 536      248 Feb 24 04:20 linux-2.6.11-rc5.tar.sign
-rw-rw-r--  1 536 536       14 Feb 24 04:20 patch-2.6.11-rc5.bz2
-rw-rw-r--  1 536 536      248 Feb 24 04:20 patch-2.6.11-rc5.bz2.sign
-rw-rw-r--  1 536 536      248 Feb 24 04:20 patch-2.6.11-rc5.gz.sign
-rw-rw-r--  1 536 536      248 Feb 24 04:20 patch-2.6.11-rc5.sign
drwxrwsr-x  2 536 536     8192 Feb 24 04:57 incr
drwxrwsr-x  4 536 536    16384 Feb 24 05:00 .
lftp ftp.kernel.org:/pub/linux/kernel/v2.6/testing>

Cheers
Mike
Re: 2.6.11-rc5
On Wed, Feb 23, 2005 at 08:18:08PM -0800, Linus Torvalds wrote:
>
>
> Hey, I hoped -rc4 was the last one, but we had some laptop resource
> conflicts, various ppc TLB flush issues, some possible stack overflows in
> networking and a number of other details warranting a quick -rc5 before
> the final 2.6.11.
>
> This time it's really supposed to be a quickie, so people who can, please
> check it out, and we'll make the real 2.6.11 asap.
>
> Mostly pretty small changes (the largest is a new SATA driver that crept
> in, our bad). But worth another quick round.

Very small.

[ ] patch-2.6.11-rc5.bz2       23-Feb-2005 20:20   14
[ ] patch-2.6.11-rc5.bz2.sign  23-Feb-2005 20:20  248
[ ] patch-2.6.11-rc5.gz        23-Feb-2005 20:20   37
[ ] patch-2.6.11-rc5.gz.sign   23-Feb-2005 20:20  248
[ ] patch-2.6.11-rc5.sign      23-Feb-2005 20:20  248

Seems to have passed the gpg signature test on my end.

-- 
Mathematics is the supreme nostalgia of our time.
[RFC] PCI bridge driver rewrite
Hi all,

For the past couple weeks I have been reorganizing the PCI subsystem to better utilize the driver model. Specifically, the bus detection code is now using a standard PCI driver. It turns out to be a major undertaking, as the PCI probing code is closely tied into a lot of other PCI components, and is spread throughout various architecture specific areas. I'm hoping that these changes will allow for a much cleaner and more functional PCI implementation.

The basic flow of the new code is as follows:
1.) A standard "driver core" driver binds to a bridge device.
2.) When "*probe" is called it sets up the hardware and allocates a "struct pci_bus".
3.) The "struct pci_bus" is filled with information about the detected bridge.
4.) The driver then registers the "struct pci_bus" with the PCI Bus Class.
5.) The PCI Bus Class makes the bridge available to sysfs.
6.) It then detects hardware attached to the bridge.
7.) Each new PCI bridge device is registered with the driver model.
8.) All remaining PCI devices are registered with the driver model.

Steps 7 and 8 allow for better resource management.

I've attached an early version of my code. It has most of the new PCI bus class registration code in place, and an early implementation of the PCI-to-PCI bridge driver.

The following remains to be done:
1.) refine and cleanup the new PCI Bus API
2.) export the new API in "linux/pci.h", and cleanup any users of the old code.
3.) fix every PCI hotplug driver.
4.) write a bridge driver for the PCI root bridge
5.) write a bridge driver for Cardbus hardware
6.) refine device registration order
7.) redesign PCI bus number assignment and support bus renumbering
8.) redesign PCI resource management to be compatible with the new code
9.) testing on various architectures
10.) Write "*suspend" and "*resume" routines for PCI bridges. Any ideas on what needs to be done?
11.) fix "PCI_LEGACY" (I may have broken it, but it should be trivial)

I look forward to any comments or suggestions.

Thanks, Adam diffstat: Makefile |9 bus-class.c | 225 +++ bus/Makefile |6 bus/bus-p2p.c | 133 ++ device.c | 142 +++ pci.h |4 probe.c | 546 -- remove.c | 126 - 9 files changed, 598 insertions(+), 593 deletions(-) Patch is against 2.6.11-RC3. diff -urN linux/drivers/pci/bus/bus-p2p.c linux-pci/drivers/pci/bus/bus-p2p.c --- linux/drivers/pci/bus/bus-p2p.c 1969-12-31 19:00:00.0 -0500 +++ linux-pci/drivers/pci/bus/bus-p2p.c 2005-02-24 00:19:05.0 -0500 @@ -0,0 +1,133 @@ +/* + * bus-p2p.c - a generic PCI bus driver for PCI<->PCI bridges + * + */ + +#include +#include +#include + +static struct pci_device_id p2p_id_tbl[] = { + { PCI_DEVICE_CLASS(PCI_CLASS_BRIDGE_PCI << 8, 0x00) }, + { 0 }, +}; +MODULE_DEVICE_TABLE(pci, p2p_id_tbl); + +static void p2p_setup_bus_numbers(struct pci_dev *dev, struct pci_bus *bus) +{ + u32 buses; + + pci_read_config_dword(dev, PCI_PRIMARY_BUS, &buses); + + bus->primary = buses & 0xFF; + bus->secondary = (buses >> 8) & 0xFF; + bus->subordinate = (buses >> 16) & 0xFF; +} + +static void pci_enable_crs(struct pci_dev *dev) +{ + u16 cap, rpctl; + int rpcap = pci_find_capability(dev, PCI_CAP_ID_EXP); + if (!rpcap) + return; + + pci_read_config_word(dev, rpcap + PCI_CAP_FLAGS, &cap); + if (((cap & PCI_EXP_FLAGS_TYPE) >> 4) != PCI_EXP_TYPE_ROOT_PORT) + return; + + pci_read_config_word(dev, rpcap + PCI_EXP_RTCTL, &rpctl); + rpctl |= PCI_EXP_RTCTL_CRSSVE; + pci_write_config_word(dev, rpcap + PCI_EXP_RTCTL, rpctl); +} + +static void p2p_prepare_hardware(struct pci_dev *dev, struct pci_bus *bus) +{ + u16 bctl; + + /* Disable MasterAbortMode during probing to avoid reporting + of bus errors (in some architectures) */ + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &bctl); + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, + bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT); + + bus->bridge_ctl = bctl; + + pci_enable_crs(dev); +} + +/* FIXME: these need to be defined in linux/pci.h */ +extern struct pci_bus * pci_alloc_bus(void); +extern int pci_add_bus(struct pci_bus 
*bus); +extern struct pci_bus * pci_derive_parent(struct device *); + +static int p2p_probe(struct pci_dev *dev, const struct pci_device_id *id) +{ + int err, i; + struct pci_bus *bus; + + if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) + return -ENODEV; + + bus = pci_alloc_bus(); + + if (!bus) + return -ENOMEM; + + bus->bridge = &dev->dev; + bus->parent = pci_derive_parent(&bus->self->dev); + if (!bus->parent) { + err = -ENODEV; + goto out; + } + + bus->ops = bus->parent->ops; +
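
The p2p_setup_bus_numbers() helper in the patch above reads one 32-bit word at PCI_PRIMARY_BUS and splits it into three bus numbers. The same decoding can be checked in user space with the sketch below. struct bus_numbers and decode_bus_numbers() are hypothetical names, and the layout assumed is the one the patch itself uses: primary bus in bits 0-7, secondary in bits 8-15, subordinate in bits 16-23, with the top byte ignored.

```c
#include <stdint.h>

/* Hypothetical container for the three fields the patch stores in pci_bus. */
struct bus_numbers {
	uint8_t primary;
	uint8_t secondary;
	uint8_t subordinate;
};

/* Split the dword read at PCI_PRIMARY_BUS the same way the patch does. */
static struct bus_numbers decode_bus_numbers(uint32_t buses)
{
	struct bus_numbers n;

	n.primary = buses & 0xFF;
	n.secondary = (buses >> 8) & 0xFF;
	n.subordinate = (buses >> 16) & 0xFF;
	return n;
}
```

For a register value of 0x40050100 this yields primary 0, secondary 1 and subordinate 5, meaning the bridge sits on bus 0 and forwards configuration cycles for buses 1 through 5.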
Re: 2.6.11-rc5
>This time it's really supposed to be a quickie, so people who can,
>please check it out, and we'll make the real 2.6.11 asap.

Out of diskspace on kernel.org?

http://www.kernel.org/pub/linux/kernel/v2.6/testing/
[...]
patch-2.6.11-rc5.bz2       23-Feb-2005 20:20   14
patch-2.6.11-rc5.bz2.sign  23-Feb-2005 20:20  248
patch-2.6.11-rc5.gz        23-Feb-2005 20:20   37
patch-2.6.11-rc5.gz.sign   23-Feb-2005 20:20  248
patch-2.6.11-rc5.sign      23-Feb-2005 20:20  248

Mvh
Mats Johannesson
--
[PATCH 5/5] I8K - convert to platform device (sysfs)
i8k.c | 117 ++ 1 files changed, 117 insertions(+) Index: dtor/drivers/char/i8k.c === --- dtor.orig/drivers/char/i8k.c +++ dtor/drivers/char/i8k.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include @@ -87,6 +88,13 @@ static struct file_operations i8k_fops = .ioctl = i8k_ioctl, }; +static struct device_driver i8k_driver = { + .name = "i8k", + .bus= &platform_bus_type, +}; + +static struct platform_device *i8k_device; + struct smm_regs { unsigned int eax; unsigned int ebx __attribute__ ((packed)); @@ -406,6 +414,89 @@ static int i8k_open_fs(struct inode *ino return single_open(file, i8k_proc_show, NULL); } +static ssize_t i8k_sysfs_cpu_temp_show(struct device *dev, char *buf) +{ + int temp = i8k_get_cpu_temp(); + + return temp < 0 ? -EIO : sprintf(buf, "%d\n", temp); +} + +static ssize_t i8k_sysfs_fan1_show(struct device *dev, char *buf) +{ + int status = i8k_get_fan_status(0); + return status < 0 ? -EIO : sprintf(buf, "%d\n", status); +} + +static ssize_t i8k_sysfs_fan1_set(struct device *dev, const char *buf, size_t count) +{ + unsigned long state; + char *rest; + + if (restricted && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + state = simple_strtoul(buf, &rest, 10); + if (*rest || state > I8K_FAN_MAX) + return -EINVAL; + + if (i8k_set_fan(0, state) < 0) + return -EIO; + + return count; +} + +static ssize_t i8k_sysfs_fan2_show(struct device *dev, char *buf) +{ + int status = i8k_get_fan_status(1); + return status < 0 ? 
-EIO : sprintf(buf, "%d\n", status); +} + +static ssize_t i8k_sysfs_fan2_set(struct device *dev, const char *buf, size_t count) +{ + unsigned long state; + char *rest; + + if (restricted && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + state = simple_strtoul(buf, &rest, 10); + if (*rest || state > I8K_FAN_MAX) + return -EINVAL; + + if (i8k_set_fan(1, state) < 0) + return -EIO; + + return count; +} + +static ssize_t i8k_sysfs_fan1_speed_show(struct device *dev, char *buf) +{ + int speed = i8k_get_fan_speed(0); + return speed < 0 ? -EIO : sprintf(buf, "%d\n", speed); +} + +static ssize_t i8k_sysfs_fan2_speed_show(struct device *dev, char *buf) +{ + int speed = i8k_get_fan_speed(1); + return speed < 0 ? -EIO : sprintf(buf, "%d\n", speed); +} + +static ssize_t i8k_sysfs_power_status_show(struct device *dev, char *buf) +{ + int status = power_status ? i8k_get_power_status() : -1; + return status < 0 ? -EIO : sprintf(buf, "%d\n", status); +} + +static struct device_attribute i8k_device_attrs[] = { + __ATTR(cpu_temp, 0444, i8k_sysfs_cpu_temp_show, NULL), + __ATTR(fan1_state, 0644, i8k_sysfs_fan1_show, i8k_sysfs_fan1_set), + __ATTR(fan2_state, 0644, i8k_sysfs_fan2_show, i8k_sysfs_fan2_set), + __ATTR(fan1_speed, 0444, i8k_sysfs_fan1_speed_show, NULL), + __ATTR(fan2_speed, 0444, i8k_sysfs_fan2_speed_show, NULL), + __ATTR(power_status, 0444, i8k_sysfs_power_status_show, NULL), + __ATTR_NULL +}; + static struct dmi_system_id __initdata i8k_dmi_table[] = { { .ident = "Dell Inspiron", @@ -490,6 +581,7 @@ static int __init i8k_probe(void) static int __init i8k_init(void) { struct proc_dir_entry *proc_i8k; + int err, i; /* Are we running on an supported laptop? 
*/ if (i8k_probe()) @@ -503,15 +595,40 @@ static int __init i8k_init(void) proc_i8k->proc_fops = &i8k_fops; proc_i8k->owner = THIS_MODULE; + err = driver_register(&i8k_driver); + if (err) + goto fail1; + + i8k_device = platform_device_register_simple("i8k", -1, NULL, 0); + if (IS_ERR(i8k_device)) { + err = PTR_ERR(i8k_device); + goto fail2; + } + + for (i = 0; attr_name(i8k_device_attrs[i]); i++) { + err = device_create_file(&i8k_device->dev, &i8k_device_attrs[i]); + if (err) + goto fail3; + } + printk(KERN_INFO "Dell laptop SMM driver v%s Massimo Dal Zotto ([EMAIL PROTECTED])\n", I8K_VERSION); return 0; + +fail3: while (--i >= 0) + device_remove_file(&i8k_device->dev, &i8k_device_attrs[i]); + platform_device_unregister(i8k_device); +fail2: driver_unregister(&i8k_driver); +fail1: remove_proc_entry("i8k", NULL); + return err; } static void __exit i8k_exit(void) { + platform_device_unregister(i8k_device); + driver_unregister(&i8k_driver); remove_proc_entry("i8k", NULL); }
[PATCH 4/5] I8K - switch to module_{init|exit}
=== I8K: use module_{init|exit} instead of old style #ifdef MODULE code, some formatting changes. Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]> i8k.c | 149 - misc.c |4 - 2 files changed, 47 insertions(+), 106 deletions(-) Index: dtor/drivers/char/misc.c === --- dtor.orig/drivers/char/misc.c +++ dtor/drivers/char/misc.c @@ -67,7 +67,6 @@ extern int rtc_DP8570A_init(void); extern int rtc_MK48T08_init(void); extern int pmu_device_init(void); extern int tosh_init(void); -extern int i8k_init(void); #ifdef CONFIG_PROC_FS static void *misc_seq_start(struct seq_file *seq, loff_t *pos) @@ -317,9 +316,6 @@ static int __init misc_init(void) #ifdef CONFIG_TOSHIBA tosh_init(); #endif -#ifdef CONFIG_I8K - i8k_init(); -#endif if (register_chrdev(MISC_MAJOR,"misc",&misc_fops)) { printk("unable to get major %d for misc devices\n", MISC_MAJOR); Index: dtor/drivers/char/i8k.c === --- dtor.orig/drivers/char/i8k.c +++ dtor/drivers/char/i8k.c @@ -87,14 +87,14 @@ static struct file_operations i8k_fops = .ioctl = i8k_ioctl, }; -typedef struct { +struct smm_regs { unsigned int eax; unsigned int ebx __attribute__ ((packed)); unsigned int ecx __attribute__ ((packed)); unsigned int edx __attribute__ ((packed)); unsigned int esi __attribute__ ((packed)); unsigned int edi __attribute__ ((packed)); -} SMMRegisters; +}; static inline char *i8k_get_dmi_data(int field) { @@ -104,7 +104,7 @@ static inline char *i8k_get_dmi_data(int /* * Call the System Management Mode BIOS. Code provided by Jonathan Buzzard. 
*/ -static int i8k_smm(SMMRegisters * regs) +static int i8k_smm(struct smm_regs *regs) { int rc; int eax = regs->eax; @@ -134,9 +134,8 @@ static int i8k_smm(SMMRegisters * regs) :"a"(regs) :"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory"); - if ((rc != 0) || ((regs->eax & 0x) == 0x) || (regs->eax == eax)) { + if (rc != 0 || (regs->eax & 0x) == 0x || regs->eax == eax) return -EINVAL; - } return 0; } @@ -147,15 +146,9 @@ static int i8k_smm(SMMRegisters * regs) */ static int i8k_get_bios_version(void) { - SMMRegisters regs = { 0, 0, 0, 0, 0, 0 }; - int rc; - - regs.eax = I8K_SMM_BIOS_VERSION; - if ((rc = i8k_smm(&regs)) < 0) { - return rc; - } + struct smm_regs regs = { .eax = I8K_SMM_BIOS_VERSION, }; - return regs.eax; + return i8k_smm(&regs) < 0 ? : regs.eax; } /* @@ -163,13 +156,11 @@ static int i8k_get_bios_version(void) */ static int i8k_get_fn_status(void) { - SMMRegisters regs = { 0, 0, 0, 0, 0, 0 }; + struct smm_regs regs = { .eax = I8K_SMM_FN_STATUS, }; int rc; - regs.eax = I8K_SMM_FN_STATUS; - if ((rc = i8k_smm(&regs)) < 0) { + if ((rc = i8k_smm(&regs)) < 0) return rc; - } switch ((regs.eax >> I8K_FN_SHIFT) & I8K_FN_MASK) { case I8K_FN_UP: @@ -188,20 +179,13 @@ static int i8k_get_fn_status(void) */ static int i8k_get_power_status(void) { - SMMRegisters regs = { 0, 0, 0, 0, 0, 0 }; + struct smm_regs regs = { .eax = I8K_SMM_POWER_STATUS, }; int rc; - regs.eax = I8K_SMM_POWER_STATUS; - if ((rc = i8k_smm(&regs)) < 0) { + if ((rc = i8k_smm(&regs)) < 0) return rc; - } - switch (regs.eax & 0xff) { - case I8K_POWER_AC: - return I8K_AC; - default: - return I8K_BATTERY; - } + return (regs.eax & 0xff) == I8K_POWER_AC ? 
I8K_AC : I8K_BATTERY; } /* @@ -209,16 +193,10 @@ static int i8k_get_power_status(void) */ static int i8k_get_fan_status(int fan) { - SMMRegisters regs = { 0, 0, 0, 0, 0, 0 }; - int rc; + struct smm_regs regs = { .eax = I8K_SMM_GET_FAN, }; - regs.eax = I8K_SMM_GET_FAN; regs.ebx = fan & 0xff; - if ((rc = i8k_smm(&regs)) < 0) { - return rc; - } - - return (regs.eax & 0xff); + return i8k_smm(&regs) < 0 ? : regs.eax & 0xff; } /* @@ -226,16 +204,10 @@ static int i8k_get_fan_status(int fan) */ static int i8k_get_fan_speed(int fan) { - SMMRegisters regs = { 0, 0, 0, 0, 0, 0 }; - int rc; + struct smm_regs regs = { .eax = I8K_SMM_GET_SPEED, }; - regs.eax = I8K_SMM_GET_SPEED; regs.ebx = fan & 0xff; - if ((rc = i8k_smm(&regs)) < 0) { - return rc; - } - - return (regs.eax & 0x) * I8K_FAN_MULT; + return i8k_smm(&regs) < 0 ? : (regs.eax & 0x) * I8K_FAN_MULT; } /* @@ -243,18 +215,12 @@ static in
[PATCH 2/5] I8K - use standard DMI functions
=== I8K: Change to use stock dmi infrastructure instead of homegrown parsing code. The driver now requres box's DMI data to match list of supported models so driver can be safely compiled-in by default without fear of it poking into random SMM BIOS code. DMI checks can be ignored with i8k.ignore_dmi option. Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]> Documentation/kernel-parameters.txt |3 arch/i386/kernel/dmi_scan.c |1 drivers/char/i8k.c | 304 ++-- include/linux/dmi.h |1 4 files changed, 60 insertions(+), 249 deletions(-) Index: dtor/arch/i386/kernel/dmi_scan.c === --- dtor.orig/arch/i386/kernel/dmi_scan.c +++ dtor/arch/i386/kernel/dmi_scan.c @@ -416,6 +416,7 @@ static void __init dmi_decode(struct dmi dmi_save_ident(dm, DMI_PRODUCT_VERSION, 6); dmi_printk(("Serial Number: %s\n", dmi_string(dm, data[7]))); + dmi_save_ident(dm, DMI_PRODUCT_SERIAL, 7); break; case 2: dmi_printk(("Board Vendor: %s\n", Index: dtor/include/linux/dmi.h === --- dtor.orig/include/linux/dmi.h +++ dtor/include/linux/dmi.h @@ -9,6 +9,7 @@ enum dmi_field { DMI_SYS_VENDOR, DMI_PRODUCT_NAME, DMI_PRODUCT_VERSION, + DMI_PRODUCT_SERIAL, DMI_BOARD_VENDOR, DMI_BOARD_NAME, DMI_BOARD_VERSION, Index: dtor/drivers/char/i8k.c === --- dtor.orig/drivers/char/i8k.c +++ dtor/drivers/char/i8k.c @@ -20,7 +20,7 @@ #include #include #include -#include +#include #include #include @@ -52,18 +52,7 @@ #define I8K_TEMPERATURE_BUG1 -#define DELL_SIGNATURE "Dell Computer" - -static char *supported_models[] = { - "Inspiron", - "Latitude", - NULL -}; - -static char system_vendor[48] = "?"; -static char product_name[48] = "?"; -static char bios_version[4] = "?"; -static char serial_number[16] = "?"; +static char bios_version[4]; MODULE_AUTHOR("Massimo Dal Zotto ([EMAIL PROTECTED])"); MODULE_DESCRIPTION("Driver for accessing SMM BIOS on Dell laptops"); @@ -73,6 +62,10 @@ static int force; module_param(force, bool, 0); MODULE_PARM_DESC(force, "Force loading without checking for supported models"); +static int 
ignore_dmi; +module_param(ignore_dmi, bool, 0); +MODULE_PARM_DESC(ignore_dmi, "Continue probing hardware even if DMI data does not match"); + static int restricted; module_param(restricted, bool, 0); MODULE_PARM_DESC(restricted, "Allow fan control if SYS_ADMIN capability set"); @@ -99,11 +92,10 @@ typedef struct { unsigned int edi __attribute__ ((packed)); } SMMRegisters; -typedef struct { - u8 type; - u8 length; - u16 handle; -} DMIHeader; +static inline char *i8k_get_dmi_data(int field) +{ + return dmi_get_system_info(field) ? : "N/A"; +} /* * Call the System Management Mode BIOS. Code provided by Jonathan Buzzard. @@ -163,15 +155,6 @@ static int i8k_get_bios_version(void) } /* - * Read the machine id. - */ -static int i8k_get_serial_number(unsigned char *buff) -{ - strlcpy(buff, serial_number, sizeof(serial_number)); - return 0; -} - -/* * Read the Fn key status. */ static int i8k_get_fn_status(void) @@ -328,7 +311,7 @@ static int i8k_get_dell_signature(void) static int i8k_ioctl(struct inode *ip, struct file *fp, unsigned int cmd, unsigned long arg) { - int val; + int val = 0; int speed; unsigned char buff[16]; int __user *argp = (int __user *)arg; @@ -343,7 +326,7 @@ static int i8k_ioctl(struct inode *ip, s case I8K_MACHINE_ID: memset(buff, 0, 16); - val = i8k_get_serial_number(buff); + strlcpy(buff, i8k_get_dmi_data(DMI_PRODUCT_SERIAL), sizeof(buff)); break; case I8K_FN_STATUS: @@ -451,10 +434,10 @@ static int i8k_get_info(char *buffer, ch n = sprintf(buffer, "%s %s %s %d %d %d %d %d %d %d\n", I8K_PROC_FMT, bios_version, - serial_number, + dmi_get_system_info(DMI_PRODUCT_SERIAL) ? : "N/A", cpu_temp, - left_fan, - right_fan, left_speed, right_speed, ac_power, fn_key); + left_fan, right_fan, left_speed, right_speed, + ac_power, fn_key); return n; } @@ -486,201 +469,23 @@ static ssize_t i8k_read(struct file *f, return len; } -static char *__init string_trim(char *s, int size) -{ - int len; - char *p; - -
[PATCH 3/5] I8K - switch to seq_file
=== I8K: Change proc code to use seq_file. Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]> i8k.c | 64 ++-- 1 files changed, 22 insertions(+), 42 deletions(-) Index: dtor/drivers/char/i8k.c === --- dtor.orig/drivers/char/i8k.c +++ dtor/drivers/char/i8k.c @@ -20,13 +20,14 @@ #include #include #include +#include #include #include #include #include -#define I8K_VERSION"1.13 14/05/2002" +#define I8K_VERSION"1.14 21/02/2005" #define I8K_SMM_FN_STATUS 0x0025 #define I8K_SMM_POWER_STATUS 0x0069 @@ -74,13 +75,16 @@ static int power_status; module_param(power_status, bool, 0600); MODULE_PARM_DESC(power_status, "Report power status in /proc/i8k"); -static ssize_t i8k_read(struct file *, char __user *, size_t, loff_t *); +static int i8k_open_fs(struct inode *inode, struct file *file); static int i8k_ioctl(struct inode *, struct file *, unsigned int, unsigned long); static struct file_operations i8k_fops = { - .read = i8k_read, - .ioctl = i8k_ioctl, + .open = i8k_open_fs, + .read = seq_read, + .llseek = seq_lseek, + .release= single_release, + .ioctl = i8k_ioctl, }; typedef struct { @@ -400,9 +404,9 @@ static int i8k_ioctl(struct inode *ip, s /* * Print the information for /proc/i8k. */ -static int i8k_get_info(char *buffer, char **start, off_t fpos, int length) +static int i8k_proc_show(struct seq_file *seq, void *offset) { - int n, fn_key, cpu_temp, ac_power; + int fn_key, cpu_temp, ac_power; int left_fan, right_fan, left_speed, right_speed; cpu_temp= i8k_get_cpu_temp(); /* 11100 ??s */ @@ -431,42 +435,18 @@ static int i8k_get_info(char *buffer, ch * 9) AC power * 10) Fn Key status */ - n = sprintf(buffer, "%s %s %s %d %d %d %d %d %d %d\n", - I8K_PROC_FMT, - bios_version, - dmi_get_system_info(DMI_PRODUCT_SERIAL) ? : "N/A", - cpu_temp, - left_fan, right_fan, left_speed, right_speed, - ac_power, fn_key); - - return n; + return seq_printf(seq, "%s %s %s %d %d %d %d %d %d %d\n", + I8K_PROC_FMT, + bios_version, + dmi_get_system_info(DMI_PRODUCT_SERIAL) ? 
: "N/A", + cpu_temp, + left_fan, right_fan, left_speed, right_speed, + ac_power, fn_key); } -static ssize_t i8k_read(struct file *f, char __user * buffer, size_t len, - loff_t * fpos) +static int i8k_open_fs(struct inode *inode, struct file *file) { - int n; - char info[128]; - - n = i8k_get_info(info, NULL, 0, 128); - if (n <= 0) { - return n; - } - - if (*fpos >= n) { - return 0; - } - - if ((*fpos + len) >= n) { - len = n - *fpos; - } - - if (copy_to_user(buffer, info, len) != 0) { - return -EFAULT; - } - - *fpos += len; - return len; + return single_open(file, i8k_proc_show, NULL); } static struct dmi_system_id __initdata i8k_dmi_table[] = { @@ -562,10 +542,10 @@ int __init i8k_init(void) return -ENODEV; /* Register the proc entry */ - proc_i8k = create_proc_info_entry("i8k", 0, NULL, i8k_get_info); - if (!proc_i8k) { + proc_i8k = create_proc_entry("i8k", 0, NULL); + if (!proc_i8k) return -ENOENT; - } + proc_i8k->proc_fops = &i8k_fops; proc_i8k->owner = THIS_MODULE;
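The custom read function removed by this patch formatted the whole record into a buffer and then clamped the copy to the caller's offset and length; that is the bookkeeping seq_file now takes over. A minimal userspace sketch of that windowing follows. It is an illustration, not the driver's code: `window_read` is a hypothetical name, memcpy stands in for copy_to_user, and unlike the removed lines as quoted above it copies from `info + *fpos` rather than from the start of the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the removed i8k_read():
 * copy at most len bytes of the n-byte record `info`, starting at
 * offset *fpos, into `out`, then advance *fpos.
 * Returns the number of bytes copied, or 0 at end of record.
 */
static long window_read(const char *info, long n, char *out,
			size_t len, long *fpos)
{
	if (n <= 0)
		return n;	/* nothing formatted, or an error code */

	if (*fpos < 0 || *fpos >= n)
		return 0;	/* offset at or past the end: EOF */

	if (len > (size_t)(n - *fpos))
		len = (size_t)(n - *fpos);	/* clamp to what is left */

	memcpy(out, info + *fpos, len);	/* stands in for copy_to_user() */
	*fpos += (long)len;
	return (long)len;
}
```

With single_open() and seq_read() the driver only supplies the show callback, so none of this offset arithmetic has to live in i8k.c.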
[PATCH 0/5] I8K driver facelift
Hi, here are some changes that freshen up the I8K driver (the Dell Inspiron/Latitude platform driver). The patches have been tested on an Inspiron 8100.

i8k-lindent.patch - pass the driver through Lindent to comply with CodingStyle requirements (4 spaces vs. TAB indentation)
i8k-use-dmi.patch - use standard DMI handling functions instead of homemade ones. The driver now requires DMI data to match the list of supported models; this way the driver can be safely enabled without fear of it poking into SMM code on the wrong box. DMI checks can be ignored with the i8k.ignore_dmi option.
i8k-seqfile.patch - switch the proc handling code to seq_file instead of having a custom read function split the output to fit into the user's buffer.
i8k-cleanup.patch - use module_{init|exit} instead of old-style module initialization code, some formatting changes.
i8k-sysfs.patch - make i8k a platform device and export temperature and both fan states as sysfs attributes. Writing into the fan1_state and fan2_state attributes allows switching fans on and off without the need for a special utility.

Please consider for inclusion. Thanks!
-- 
Dmitry
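The fan1_state and fan2_state attributes take a decimal fan state. The check the store handlers in patch 5/5 apply (the whole string must parse as a number no larger than the maximum state) can be approximated in userspace as below. This is a sketch under assumptions: `parse_fan_state` is a hypothetical name, strtoul replaces the kernel's simple_strtoul (so it also tolerates leading whitespace and a '+' sign), and FAN_MAX = 2 is an assumed stand-in for I8K_FAN_MAX.

```c
#include <errno.h>
#include <stdlib.h>

#define FAN_MAX 2	/* assumed stand-in for I8K_FAN_MAX */

/*
 * Hypothetical helper approximating the handler's validation:
 * returns the requested state (0..FAN_MAX) or -EINVAL.
 */
static int parse_fan_state(const char *buf)
{
	char *rest;
	unsigned long state = strtoul(buf, &rest, 10);

	/* Reject trailing characters and out-of-range states. */
	if (*rest != '\0' || state > FAN_MAX)
		return -EINVAL;

	return (int)state;
}
```

Note that a check written this way also rejects a trailing newline, so a plain `echo 1` would appear to fail where `echo -n 1` succeeds.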
[PATCH 1/5] I8K - pass though Lindent
=== I8K: pass through Lindent to change 4 spaces identation to TABs Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]> i8k.c | 954 +- 1 files changed, 477 insertions(+), 477 deletions(-) Index: dtor/drivers/char/i8k.c === --- dtor.orig/drivers/char/i8k.c +++ dtor/drivers/char/i8k.c @@ -55,14 +55,14 @@ #define DELL_SIGNATURE "Dell Computer" static char *supported_models[] = { -"Inspiron", -"Latitude", -NULL + "Inspiron", + "Latitude", + NULL }; static char system_vendor[48] = "?"; -static char product_name [48] = "?"; -static char bios_version [4] = "?"; +static char product_name[48] = "?"; +static char bios_version[4] = "?"; static char serial_number[16] = "?"; MODULE_AUTHOR("Massimo Dal Zotto ([EMAIL PROTECTED])"); @@ -86,64 +86,63 @@ static int i8k_ioctl(struct inode *, str unsigned long); static struct file_operations i8k_fops = { -.read = i8k_read, -.ioctl = i8k_ioctl, + .read = i8k_read, + .ioctl = i8k_ioctl, }; typedef struct { -unsigned int eax; -unsigned int ebx __attribute__ ((packed)); -unsigned int ecx __attribute__ ((packed)); -unsigned int edx __attribute__ ((packed)); -unsigned int esi __attribute__ ((packed)); -unsigned int edi __attribute__ ((packed)); + unsigned int eax; + unsigned int ebx __attribute__ ((packed)); + unsigned int ecx __attribute__ ((packed)); + unsigned int edx __attribute__ ((packed)); + unsigned int esi __attribute__ ((packed)); + unsigned int edi __attribute__ ((packed)); } SMMRegisters; typedef struct { -u8 type; -u8 length; -u16handle; + u8 type; + u8 length; + u16 handle; } DMIHeader; /* * Call the System Management Mode BIOS. Code provided by Jonathan Buzzard. 
*/ -static int i8k_smm(SMMRegisters *regs) +static int i8k_smm(SMMRegisters * regs) { -int rc; -int eax = regs->eax; + int rc; + int eax = regs->eax; -asm("pushl %%eax\n\t" \ - "movl 0(%%eax),%%edx\n\t" \ - "push %%edx\n\t" \ - "movl 4(%%eax),%%ebx\n\t" \ - "movl 8(%%eax),%%ecx\n\t" \ - "movl 12(%%eax),%%edx\n\t" \ - "movl 16(%%eax),%%esi\n\t" \ - "movl 20(%%eax),%%edi\n\t" \ - "popl %%eax\n\t" \ - "out %%al,$0xb2\n\t" \ - "out %%al,$0x84\n\t" \ - "xchgl %%eax,(%%esp)\n\t" - "movl %%ebx,4(%%eax)\n\t" \ - "movl %%ecx,8(%%eax)\n\t" \ - "movl %%edx,12(%%eax)\n\t" \ - "movl %%esi,16(%%eax)\n\t" \ - "movl %%edi,20(%%eax)\n\t" \ - "popl %%edx\n\t" \ - "movl %%edx,0(%%eax)\n\t" \ - "lahf\n\t" \ - "shrl $8,%%eax\n\t" \ - "andl $1,%%eax\n" \ - : "=a" (rc) - : "a" (regs) - : "%ebx", "%ecx", "%edx", "%esi", "%edi", "memory"); - -if ((rc != 0) || ((regs->eax & 0x) == 0x) || (regs->eax == eax)) { - return -EINVAL; -} + asm("pushl %%eax\n\t" + "movl 0(%%eax),%%edx\n\t" + "push %%edx\n\t" + "movl 4(%%eax),%%ebx\n\t" + "movl 8(%%eax),%%ecx\n\t" + "movl 12(%%eax),%%edx\n\t" + "movl 16(%%eax),%%esi\n\t" + "movl 20(%%eax),%%edi\n\t" + "popl %%eax\n\t" + "out %%al,$0xb2\n\t" + "out %%al,$0x84\n\t" + "xchgl %%eax,(%%esp)\n\t" + "movl %%ebx,4(%%eax)\n\t" + "movl %%ecx,8(%%eax)\n\t" + "movl %%edx,12(%%eax)\n\t" + "movl %%esi,16(%%eax)\n\t" + "movl %%edi,20(%%eax)\n\t" + "popl %%edx\n\t" + "movl %%edx,0(%%eax)\n\t" + "lahf\n\t" + "shrl $8,%%eax\n\t" + "andl $1,%%eax\n":"=a"(rc) + :"a"(regs) + :"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory"); -return 0; + if ((rc != 0) || ((regs->eax & 0x) == 0x) || (regs->eax == eax)) { + return -EINVAL; + } + + return 0; } /* @@ -152,15 +151,15 @@ static int i8k_smm(SMMRegisters *regs) */ static int i8k_get_bios_version(void) { -SMMRegisters regs = { 0, 0, 0, 0, 0, 0 }; -int rc; + SMMRegisters regs = { 0, 0, 0, 0, 0, 0 }; + int rc; -regs.eax = I8K_SMM_BIOS_VERSION; -if ((rc=i8k_smm(®s)) < 0) { - return rc; -} + regs.eax = I8K_SMM_BIOS_VERSION; + if 
((rc = i8k_smm(&regs)) < 0) { + return rc; + } -return regs.eax; + return regs.eax; } /* @@ -168,8 +167,8 @@ static int i8k_get_bios_version(void) */ static int i8k_get_serial_number(unsigned char *buff) { -strlcpy(buff, serial_number, sizeof(serial_number)); -return 0; + strlcpy(buff, serial_number, sizeof(serial_number)); + return 0; } /* @@ -177,2
A Proposal for an MMU abstraction layer
1. Rationale

Currently the Linux kernel implements a hierarchical page table with 4 layers. Architectures that have fewer layers can cause the kernel not to generate code for certain layers. However, there are other ways for an MMU to describe page tables to the system. For example, the Itanium (and other CPUs) support hashed page table structures or linear page tables. IA64 has to simulate the hierarchical layers through its linear page tables and implements the higher layers in software.

Moreover, different architectures have different means of implementing huge page table entries. On IA32 this is realized by omitting the lower layer entries and providing a single PMD entry replacing 512/1024 PTE entries. On IA64 a PTE entry is used for that purpose. Other architectures realize huge page table entries through groups of PTE entries. There are hooks for each of these methods in the kernel. Moreover, huge pages are not handled like other pages: they are managed through a file system, and only one size of huge page is supported. It would be much better if huge pages were handled more like regular pages, with support for multiple page sizes (which may then lead to support for variable page sizes in the VM).

It would be best to hide these implementation differences in an mmu abstraction layer. Various architectures could then implement their own way of representing page table entries. We would provide legacy 4 layer, 3 layer and 2 layer implementations that would take care of the existing implementations. These generic implementations can then be taken by an architecture and amended to provide huge page table entries in a way fitting for that architecture. For IA64 and other platforms that allow alternate ways of maintaining translations, we could avoid maintaining a hierarchical table.

There are a couple of additional features for page tables that could then also be worked into that abstraction layer: A. Global translation entries. B. 
Variable page size. C. Use a transactional scheme to allow a variety of synchronization schemes.

Early idea for an mmu abstraction layer API
===========================================

Three new opaque types:

mmu_entry_t
mmu_translation_set_t
mmu_transaction_t

*mmu_entry_t* replaces the existing pte_t and has roughly the same features. However, mmu_entry_t describes a translation of a logical address to a physical address in general. This means that mmu_entry_t must be able to represent all possible mappings, including mappings for huge pages and pages of various sizes, if these features are supported by the method of handling page tables. If statistics need to be kept about entries, then the entry will also contain a number indicating which counter to update when inserting or deleting this type of entry [spare bits may be used for this purpose].

*mmu_translation_set_t* represents a virtual address space for a process and is essentially a set of mmu_entry_t's plus any additional management information needed to manage an address space.

*mmu_transaction_t* allows transactions to be performed on translation entries and maintains the state of a transaction. The state information makes it possible to undo changes or to commit them in a way that must appear atomic to any other access in the system.
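One way to picture how the three types relate is the toy model below. It is only a sketch under loud assumptions: the proposal leaves the types opaque, so the struct layouts, the fixed-size array, and the toy_* function names are all invented for illustration, and committing by plain struct copy does not provide the atomicity the proposal asks for.

```c
#include <stddef.h>

#define TOY_ENTRIES 8	/* arbitrary size for the illustration */

/* Toy layouts; the real types are meant to be opaque. */
typedef struct {
	unsigned long phys;	/* physical address of the mapping */
	unsigned int valid;	/* non-zero if the entry is in use */
} mmu_entry_t;

typedef struct {
	mmu_entry_t e[TOY_ENTRIES];	/* one slot per "virtual page" */
} mmu_translation_set_t;

typedef struct {
	mmu_translation_set_t *set;	/* set being modified */
	mmu_translation_set_t shadow;	/* private working copy */
} mmu_transaction_t;

/* Begin: snapshot the set so changes stay invisible until commit. */
static void toy_begin(mmu_transaction_t *ta, mmu_translation_set_t *set)
{
	ta->set = set;
	ta->shadow = *set;
}

/* Update one slot in the working copy; returns -1 if out of range. */
static int toy_update(mmu_transaction_t *ta, size_t idx, mmu_entry_t ent)
{
	if (idx >= TOY_ENTRIES)
		return -1;
	ta->shadow.e[idx] = ent;
	return 0;
}

/* Commit: publish the working copy (not atomic in this toy). */
static void toy_commit(mmu_transaction_t *ta)
{
	*ta->set = ta->shadow;
}

/* Forget: discard pending changes by re-reading the live set. */
static void toy_forget(mmu_transaction_t *ta)
{
	ta->shadow = *ta->set;
}
```

The point of the shape is that callers only ever touch entries through a transaction, which leaves each architecture free to choose how the set is stored and how commit is synchronized.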
Operations on mmu_translation_set_t
-----------------------------------

void mmu_new_translation_set(struct mmu_translation_set_t *t);
Generates an empty translation set

void mmu_dup_translation_set(struct mmu_translation_set_t *t, struct mmu_translation_set *t);
Generates a duplicate of a translation set

void mmu_remove_translation_set(struct mmu_translation_set *t);
Removes a translation set

void mmu_clear_range(struct mmu_translation_set_t *t, unsigned long start, unsigned long end);
Wipe out a range of addresses in the translation set

void mmu_copy_range(struct mmu_translation_set *dest, struct mmu_translation_set_t *src, unsigned long dest_start, unsigned long src_start, unsigned long length);

These functions are not implemented for the period in which old and new schemes are coexisting since this would require a major change to mm_struct.

Transactional operations

void mmu_transaction(struct mmu_transaction_t *ta, struct mmu_translation_set_t *tr);
Begin a transaction

For the coexistence period this is implemented as
mmu_transaction(struct mmu_transaction_t , struct mm_struct *mm, struct vm_area_struct *);

void mmu_commit(struct mmu_transaction_t);
Commit changes done

void mmu_forget(struct mmu_transaction_t);
Undo changes done

struct mmu_entry_t mmu_find(struct mmu_transaction_t *ta, unsigned long address);
Find mmu entry and make this the current entry

void mmu_update(struct mmu_transaction_t *ta, mmu_entry_t entry);
Update the current entry

void mmu_add(struct mmu_transaction_t *ta, mmu_entry_t ent
Re: [PATCH 2/2] page table iterators
On Thu, 2005-02-24 at 05:12 +, Hugh Dickins wrote:
> On Thu, 24 Feb 2005, Nick Piggin wrote:
> > OK after sleeping on it, I'm warming to your way.
> >
> > I don't think it makes something like David's modifications any
> > easier, but mine didn't go a long way to that end either. And
> > being a more incremental approach gives us more room to move in
> > future (for example, maybe toward something that really *will*
> > accommodate the bitmap walking code nicely).
>
> I'll take a quick look at David's today.
> Just so long as we don't make them harder.
>

No, I think we may want to move to something better abstracted: it makes things sufficiently complex that you wouldn't want to have it open coded everywhere. But no, you're not making it harder than the present situation.

> > So I'd be pretty happy for you to queue this up with Andrew for
> > 2.6.12. Anyone else?
>
> Oh, okay, thanks. You weren't very happy with p??_limit(addr, end),
> and good naming is important to me. I didn't care for your tentative
> p??_span or p??_span_end. Would p??_end be better? p??_enda would
> be fun for one of them...
>

pud_addr_end?
Re: [PATCH 2/2] page table iterators
On Thu, 24 Feb 2005, Nick Piggin wrote:
> Hugh Dickins wrote:
>
> > I'm inlining pmd and pud levels, but not pte and pgd levels.
>
> OK - that's probably sufficient for debugging. There is only so
> much that can go wrong in the middle levels...

Yes, that was my thinking.

> how does it look
> performance wise? (I can give it a test when it gets split out)

Yesterday shattered in various directions, I hope to try today.

> > One point worth making, I do believe throughout that whatever the
> > address layout, "end" cannot be 0 - BUG_ON(addr >= end) assures.

Of course, that does allow some simplifications in your for_each macros; but it still looked like my p??_limits were better for shortest codepath, and close to yours for codesize.

> OK after sleeping on it, I'm warming to your way.
>
> I don't think it makes something like David's modifications any
> easier, but mine didn't go a long way to that end either. And
> being a more incremental approach gives us more room to move in
> future (for example, maybe toward something that really *will*
> accommodate the bitmap walking code nicely).

I'll take a quick look at David's today. Just so long as we don't make them harder.

> So I'd be pretty happy for you to queue this up with Andrew for
> 2.6.12. Anyone else?

Oh, okay, thanks. You weren't very happy with p??_limit(addr, end), and good naming is important to me. I didn't care for your tentative p??_span or p??_span_end. Would p??_end be better? p??_enda would be fun for one of them...

Hugh
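For readers without the patch at hand, the idea behind a p??_limit(addr, end) style helper can be sketched as follows: given an address and the end of the range being walked, return where the current table-level region stops, clamped to `end`. This is an illustration of the concept only, not the code from the patch under discussion; `region_limit`, `count_steps` and the `size` parameter (standing in for PMD_SIZE, PUD_SIZE and so on) are hypothetical.

```c
/*
 * Hypothetical helper: end of the `size`-aligned region that contains
 * addr, clamped to `end`. `size` must be a power of two and addr < end.
 */
static unsigned long region_limit(unsigned long addr, unsigned long end,
				  unsigned long size)
{
	unsigned long boundary = (addr + size) & ~(size - 1);

	/*
	 * Comparing boundary - 1 with end - 1 keeps the result correct
	 * when addr + size wraps to 0 at the top of the address space.
	 */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Walk [addr, end) one region at a time and count the steps taken. */
static unsigned int count_steps(unsigned long addr, unsigned long end,
				unsigned long size)
{
	unsigned int steps = 0;
	unsigned long next;

	do {
		next = region_limit(addr, end, size);
		steps++;	/* a real walker would handle [addr, next) here */
		addr = next;
	} while (addr != end);

	return steps;
}
```

The do/while shape relies on addr < end on entry, which matches the BUG_ON(addr >= end) assumption mentioned in the message.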
Re: kernel BUG at mm/rmap.c:483!
On Wed, 23 Feb 2005, Ammar T. Al-Sayegh wrote: > - Original Message - From: "Hugh Dickins" <[EMAIL PROTECTED]> > > though quite possibly you cannot afford > > such experiments on this server, and will revert to 2.4 for now. > > The problem is that my server is already in production > mode. I'm running great portion of my business on it, > where there is very little tolerance for downtime. I feared as much. > Because the server is located in a remote datacenter, > every time it goes down it takes several hours to have > someone sent up there to manually reboot it for a hefty > emergency fee. So this bug has already cost me a lot of > money, and I'm worried that it will cost me a lot of my > clients as well if it persists. I'm very sorry for that. > Remote hands are rather expensive, so it will cost me > $100/hr to have someone runs memtest86 on my server > since I can't perform it remotely. I'll do it though > since that's your recommendation for the time being. > Hope it will not take more than an hour to run the > test, and hope it turns out as bad memory modules as > you expect because I hate to downgrade after all the > time and money I expended on the upgrade. One hour will be enough if it does find a problem in that time, worth a shot; but not enough to give confidence in the memory if it does not find one, 12 hours better. I actually wonder whether rmap.c:483 is the best memory tester (serious answer would be, in some cases yes, but not in all). Do let me know. If I can find time to rejig the debug patch against your kernel, it would itself keep your server running, replacing the BUG_ON by printks and safety. But without knowing what it will report, I can't judge how satisfactory that would be (and it's unlikely to lead us to the final answer in one go). 
Hugh
Re: Xterm Hangs - Possible scheduler defect?
> But the other side of the coin is that a SCHED_FIFO userspace task
> presumably has extreme latency requirements, so it doesn't *want* to be
> preempted by some routine kernel operation. People would get irritated if
> we were to do that.

Just to follow up a bit. People writing apps that run at SCHED_FIFO know that they aren't getting hard real-time, and they are OK with that. If they wanted something more they'd run on RTLinux. Why would it be wrong to preempt the SCHED_FIFO process in this case, assuming that it is too hard to fix a broken design that doesn't allow the necessary kernel threads to run on any CPU?

Chad
Re: Xterm Hangs - Possible scheduler defect?
> `xterm' is waiting for the other CPU to schedule a kernel thread (which is
> bound to that CPU). Once that kernel thread has done a little bit of work,
> `xterm' can terminate.
>
> But kernel threads don't run with realtime policy, so your userspace app
> has permanently starved that kernel thread.
>
> It's potentially quite a problem, really. For example it could prevent
> various tty operations from completing, it will prevent kjournald from ever
> writing back anything (on uniprocessor, etc). I've been waiting for
> someone to complain ;)
>
> But the other side of the coin is that a SCHED_FIFO userspace task
> presumably has extreme latency requirements, so it doesn't *want* to be
> preempted by some routine kernel operation. People would get irritated if
> we were to do that.
>
> So what to do?

It shouldn't need to preempt the kernel operation. Why is the design such that the necessary kernel thread can't run on the other CPU?

Chad
Re: [PATCH 2.6.11+ sata_qstor] libata: sata_qstor cosmetic fixes
Mark Lord wrote: Minor patch for new 2.6.xx sata_qstor driver attached, as per Alexey's fine-toothed comb! :) Signed-off-by: Mark Lord <[EMAIL PROTECTED]> I had to apply this manually, since your mailer "corrupts" the patch by encoding text/plain as base64. Please fix your mailer... The ideal is an inline patch, rather than an attachment anyway. e.g. To: ... From: ... Subject: ... Patch description Patch cat'd to 'sendmail -t'. Sendmail (or another MTA which provides a /usr/sbin/sendmail wrapper) will automatically fill in other headers like Message-ID and Date. Jeff
module insert question
Hi, Can you please let me know which files the OS looks into to load modules? I see the following messages during boot, or rather during installation: == Finished bus probing modules to insert tg3 aic79xx == Which files does the OS look into to load tg3 and aic79xx after it has finished bus probing? I guess modprobe.conf, modules.alias, modules.pcimap. With regards, Anil
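One way to picture the lookup being asked about: as far as I understand it, each probed device yields an alias string, and that string is compared against the shell-style patterns recorded in modules.alias to pick a module name. Below is a minimal userspace sketch of that matching step, assuming fnmatch()-style globbing. The table and the `lookup_module` helper are made up for illustration, the two patterns are examples rather than lines copied from a real modules.alias, and this is not modprobe's actual code.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical stand-in for a few "alias <pattern> <module>" lines. */
struct alias_entry {
	const char *pattern;
	const char *module;
};

/* Example patterns only; real entries come from modules.alias. */
static const struct alias_entry alias_table[] = {
	{ "pci:v000014E4d00001644sv*sd*bc*sc*i*", "tg3" },
	{ "pci:v00009005d0000801Fsv*sd*bc*sc*i*", "aic79xx" },
};

/* Return the first module whose pattern matches, or NULL if none does. */
static const char *lookup_module(const char *modalias)
{
	size_t i;

	for (i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++) {
		if (fnmatch(alias_table[i].pattern, modalias, 0) == 0)
			return alias_table[i].module;
	}
	return NULL;
}
```

Calling `lookup_module()` with an alias string for a device that no pattern covers returns NULL, which corresponds to "no module to insert" for that device.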
about printk
Dear all, I am new to this place, so please correct me if I am wrong. Before console_init, printk just fills up the printk buffer. After console_init, will the messages be printed out immediately? Best regards, Mike Lee
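The behaviour described in the question can be modelled in a few lines of userspace C: messages are always stored, and they reach the console only once one has been registered, at which point the backlog is replayed. This is a toy model under that assumption, with invented names (`toy_log`, `toy_printk`, `toy_register_console`); the real kernel log is a byte ring buffer and also applies loglevel filtering, which is ignored here.

```c
#include <stddef.h>

#define TOY_LOG_MAX 16

/* Toy model only: not the kernel's data structure. */
struct toy_log {
	const char *msgs[TOY_LOG_MAX];
	size_t logged;     /* messages stored in the buffer so far */
	size_t emitted;    /* messages already handed to the console */
	int have_console;  /* set once a console has been registered */
};

/* Push everything not yet shown out to the (pretend) console. */
static void toy_flush(struct toy_log *log)
{
	if (log->have_console)
		log->emitted = log->logged;
}

/* Store a message; it is emitted right away only if a console exists. */
static void toy_printk(struct toy_log *log, const char *msg)
{
	if (log->logged < TOY_LOG_MAX)
		log->msgs[log->logged++] = msg;
	toy_flush(log);
}

/* Registering a console replays whatever was buffered before it. */
static void toy_register_console(struct toy_log *log)
{
	log->have_console = 1;
	toy_flush(log);
}
```

In this model two early `toy_printk()` calls leave `emitted` at 0, `toy_register_console()` brings it to 2, and every later message is emitted as soon as it is logged.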
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Wed, 23 Feb 2005, Lee Revell wrote: > On Wed, 2005-02-23 at 20:53 +0000, Hugh Dickins wrote: > > On Wed, 23 Feb 2005, Hugh Dickins wrote: > > > Please replace by new patch below, which I'm now running through lmbench. > > > > That second patch seems fine, and I see no lmbench regression from it. > > Should go into 2.6.11, right? That's up to Andrew (and Linus). I was thinking that way when I rushed you the patch. But given that you have remaining unresolved latency issues nearby (zap_pte_range, clear_page_range), and given the warning shot that I screwed up my first attempt, I'd be inclined to say hold off. It's a pity: for a while we were thinking 2.6.11 would be a big step forward for mainline latency; but it now looks to me like these tests have come too late in the cycle to be dealt with safely. In other mail, you do expect people still to be using Ingo's patches, so probably this patch should stick there (and in -mm) for now. Hugh
[2/14] Orinoco driver updates - update printk()s
Reformats printk()s, comments, labels and other cosmetic strings in the orinoco driver. Also moves, removes, and adds ratelimiting in some places. Behavioural changes are trivial/cosmetic only. This reduces the cosmetic/trivial differences between the current kernel version, and the CVS version of the driver; one small step towards full merge. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/hermes.c === --- working-2.6.orig/drivers/net/wireless/hermes.c 2005-02-10 14:47:39.572667480 +1100 +++ working-2.6/drivers/net/wireless/hermes.c 2005-02-10 14:47:41.293405888 +1100 @@ -48,6 +48,7 @@ #include #include #include +#include #include #include "hermes.h" @@ -232,13 +233,16 @@ err = hermes_issue_cmd(hw, cmd, parm0); if (err) { if (! hermes_present(hw)) { - printk(KERN_WARNING "hermes @ %p: " - "Card removed while issuing command.\n", - hw->iobase); + if (net_ratelimit()) + printk(KERN_WARNING "hermes @ %p: " + "Card removed while issuing command " + "0x%04x.\n", hw->iobase, cmd); err = -ENODEV; } else - printk(KERN_ERR "hermes @ %p: Error %d issuing command.\n", - hw->iobase, err); + if (net_ratelimit()) + printk(KERN_ERR "hermes @ %p: " + "Error %d issuing command 0x%04x.\n", + hw->iobase, err, cmd); goto out; } @@ -251,17 +255,16 @@ } if (! hermes_present(hw)) { - printk(KERN_WARNING "hermes @ %p: " - "Card removed while waiting for command completion.\n", - hw->iobase); + printk(KERN_WARNING "hermes @ %p: Card removed " + "while waiting for command 0x%04x completion.\n", + hw->iobase, cmd); err = -ENODEV; goto out; } if (! 
(reg & HERMES_EV_CMD)) { - printk(KERN_ERR "hermes @ %p: " - "Timeout waiting for command completion.\n", - hw->iobase); + printk(KERN_ERR "hermes @ %p: Timeout waiting for " + "command 0x%04x completion.\n", hw->iobase, cmd); err = -ETIMEDOUT; goto out; } @@ -481,14 +484,13 @@ *length = rlength; if (rtype != rid) - printk(KERN_WARNING "hermes @ %p: " - "hermes_read_ltv(): rid (0x%04x) does not match type (0x%04x)\n", - hw->iobase, rid, rtype); + printk(KERN_WARNING "hermes @ %p: %s(): " + "rid (0x%04x) does not match type (0x%04x)\n", + hw->iobase, __FUNCTION__, rid, rtype); if (HERMES_RECLEN_TO_BYTES(rlength) > bufsize) printk(KERN_WARNING "hermes @ %p: " "Truncating LTV record from %d to %d bytes. " - "(rid=0x%04x, len=0x%04x)\n", - hw->iobase, + "(rid=0x%04x, len=0x%04x)\n", hw->iobase, HERMES_RECLEN_TO_BYTES(rlength), bufsize, rid, rlength); nwords = min((unsigned)rlength - 1, bufsize / 2); Index: working-2.6/drivers/net/wireless/orinoco_pci.c === --- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-02-10 14:47:39.573667328 +1100 +++ working-2.6/drivers/net/wireless/orinoco_pci.c 2005-02-10 14:47:41.294405736 +1100 @@ -151,24 +151,18 @@ /* Assert the reset until the card notice */ hermes_write_regn(hw, PCI_COR, HERMES_PCI_COR_MASK); - printk(KERN_NOTICE "Reset done"); timeout = jiffies + (HERMES_PCI_COR_ONT * HZ / 1000); while(time_before(jiffies, timeout)) { - printk("."); mdelay(1); } - printk(";\n"); //mdelay(HERMES_PCI_COR_ONT); /* Give time for the card to recover from this hard effort */ hermes_write_regn(hw, PCI_COR, 0x); - printk(KERN_NOTICE "Clear Reset"); timeout = jiffies + (HERMES_PCI_COR_OFFT * HZ / 1000); while(time_before(jiffies, timeout)) { - printk("."); mdelay(1); } - printk(";\n"); //mdelay(HERMES_PCI_COR_OFFT); /* The card is ready when it's no longer busy */ @@ -183,7 +177,6 @@ printk(KERN_ERR PFX "Busy timeout\n"); return -ETIMEDOUT; } - printk(K
[7/14] Orinoco driver updates - use modern module_parm()
Add descriptions to module parameters in the orinoco driver, and also add permissions to allow them to be exported in sysfs. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-10 13:19:14.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-10 13:24:03.0 +1100 @@ -461,12 +461,14 @@ /* Level of debugging. Used in the macros in orinoco.h */ #ifdef ORINOCO_DEBUG int orinoco_debug = ORINOCO_DEBUG; -module_param(orinoco_debug, int, 0); +module_param(orinoco_debug, int, 0644); +MODULE_PARM_DESC(orinoco_debug, "Debug level"); EXPORT_SYMBOL(orinoco_debug); #endif static int suppress_linkstatus; /* = 0 */ -module_param(suppress_linkstatus, bool, 0); +module_param(suppress_linkstatus, bool, 0644); +MODULE_PARM_DESC(suppress_linkstatus, "Don't log link status changes"); // /* Compile time configuration and compatibility stuff */ -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson
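For readers unfamiliar with the third argument of module_param(): it is a file mode for the parameter's sysfs entry, where 0 means no entry is created and 0644 means readable by everyone and writable by the owner (root). A small userspace sketch that spells a mode out in ls style; the helper name `perm_to_string` is invented here and nothing below is kernel code.

```c
#include <stddef.h>

/* Render the low nine permission bits as an ls-style string. */
static void perm_to_string(unsigned int perm, char out[10])
{
	static const char flags[] = "rwxrwxrwx";
	size_t i;

	for (i = 0; i < 9; i++) {
		/* Bit 8 is owner-read, bit 0 is other-execute. */
		out[i] = (perm & (1u << (8 - i))) ? flags[i] : '-';
	}
	out[9] = '\0';
}
```

With the patch applied, both parameters go from mode 0 (not visible in sysfs) to the mode this helper renders for 0644.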
[5/14] Orinoco driver updates - cleanup low-level code
Apply some cleanups to the low-level orinoco handling code in hermes.[ch]. This cleans up some error handling code, corrects an error code to something more accurate, and also increases a timeout value. This last can (when the hardware plays up) cause long delays with spinlocks held, which is bad, but is rather less prone to prematurely giving up, which has the unfortunate habit of fatally confusing the hardware in other ways :-/. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/hermes.c === --- working-2.6.orig/drivers/net/wireless/hermes.c 2005-01-12 15:22:34.263633584 +1100 +++ working-2.6/drivers/net/wireless/hermes.c 2004-11-05 13:59:07.0 +1100 @@ -383,12 +383,17 @@ reg = hermes_read_reg(hw, oreg); } - if (reg & HERMES_OFFSET_BUSY) { - return -ETIMEDOUT; - } + if (reg != offset) { + printk(KERN_ERR "hermes @ %p: BAP%d offset %s: " + "reg=0x%x id=0x%x offset=0x%x\n", hw->iobase, bap, + (reg & HERMES_OFFSET_BUSY) ? "timeout" : "error", + reg, id, offset); + + if (reg & HERMES_OFFSET_BUSY) { + return -ETIMEDOUT; + } - if (reg & HERMES_OFFSET_ERR) { - return -EIO; + return -EIO;/* error or wrong offset */ } return 0; @@ -476,7 +481,7 @@ rlength = hermes_read_reg(hw, dreg); if (! rlength) - return -ENOENT; + return -ENODATA; rtype = hermes_read_reg(hw, dreg); Index: working-2.6/drivers/net/wireless/hermes.h === --- working-2.6.orig/drivers/net/wireless/hermes.h 2005-01-12 11:13:41.0 +1100 +++ working-2.6/drivers/net/wireless/hermes.h 2004-11-05 13:53:55.0 +1100 @@ -340,7 +340,7 @@ #ifdef __KERNEL__ /* Timeouts */ -#define HERMES_BAP_BUSY_TIMEOUT (500) /* In iterations of ~1us */ +#define HERMES_BAP_BUSY_TIMEOUT (1) /* In iterations of ~1us */ /* Basic control structure */ typedef struct hermes { -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! 
http://www.ozlabs.org/people/dgibson
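The new error handling in the BAP seek path of [5/14] boils down to a three-way decision on the offset register, which can be sketched in userspace as follows. The function name `bap_seek_result` and the `TOY_OFFSET_BUSY` value are assumptions made for this illustration; the real bit definitions live in hermes.h and should be checked there.

```c
#include <errno.h>

/* Assumed value of the "busy" bit; see hermes.h for the real one. */
#define TOY_OFFSET_BUSY 0x8000

/*
 * Mirror of the patched logic: success only if the register reads back
 * exactly the requested offset; a set busy bit means the seek timed out;
 * anything else (error bit or wrong offset) is reported as an I/O error.
 */
static int bap_seek_result(unsigned int reg, unsigned int offset)
{
	if (reg == offset)
		return 0;
	if (reg & TOY_OFFSET_BUSY)
		return -ETIMEDOUT;
	return -EIO;
}
```

Compared with the old code, a register that is neither busy nor flagged as an error but still holds the wrong offset is now rejected with -EIO instead of being treated as success.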
Re: [14/14] Orinoco driver updates - update version and changelog
applied patches 1-14 to netdev-2.6. We'll let it sit there for a bit, for testing and such. (netdev-2.6 gets auto-propagated to -mm) Thanks for your patience and perseverance. Jeff
Re: ext2/3 files per directory limits
Ron Peterson <[EMAIL PROTECTED]> wrote: > > I would like to better understand ext2/3's performance characteristics. > > I'm specifically interested in how ext2/3 will handle a /var/spool/mail > directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as > 75,000 messages daily. Virtually all access is via imap, w/ approx > ~1000 imapd processes running during peak load. Local delivery is via > procmail, which by default uses both kernel-supported locking calls and > .lock files. > > I understand that various tuning parameters will have an impact, > e.g. putting the journal on a separate device, setting the noatime mount > option, etc. I also understand that there are other mailbox formats and > other strategies for locating mail spools (e.g. in user's home > directories). > > I'm interested in people's thoughts on these issues, but I'm mostly > interested in whether or not the scenario I described falls within > ext2/3's designed capabilities. > noatime will help. increasing the journal size _may_ help. With 6k files per directory you'll benefit from indexed directories (htree). Use `tune2fs -O dir_index'. dir_index isn't available for ext2.
2.6.11-rc5
Hey, I hoped -rc4 was the last one, but we had some laptop resource conflicts, various ppc TLB flush issues, some possible stack overflows in networking and a number of other details warranting a quick -rc5 before the final 2.6.11. This time it's really supposed to be a quickie, so people who can, please check it out, and we'll make the real 2.6.11 asap. Mostly pretty small changes (the largest is a new SATA driver that crept in, our bad). But worth another quick round. Linus Summary of changes from v2.6.11-rc4 to v2.6.11-rc5 : o ppc32: Wrong vaddr in flush_hash_one_pte() : o [libata] add ->bmdma_{stop,status} hooks Alan Stern: o USB Hub driver: Add reset recovery-time delay Andrew Morton: o mca resource layout fix o end_buffer_async_read printk ratelimiting o strip.c build fix o alpha: struct resource fix o ppc32: resource layout fixes o sparc64 rusage build fix o sparc64 usb build fix o x86_64: resource layout fix Anton Blanchard: o ppc64: Fix 32bit largepage issue Antonino Daplas: o fbdev: Fix gcc 4.0 compile failure Arjan van de Ven: o Allow heap to be marked executable too Arnaldo Carvalho de Melo: o [TCP]: Fix excessive stack usage resulting in OOPS with 4KSTACKS Art Haas: o [SPARC]:Check prom_getproperty() return value in prom_nodematch() Bartlomiej Zolnierkiewicz: o [ide] fix ide_get_error_location() for LBA28 Ben Dooks: o [ARM PATCH] 2480/1: IXP4XX - cleanup resource for i2c controller o [ARM PATCH] 2481/1: IXP2000 - replace sti/cli with local_irq{save,restore} o [ide] Kconfig for VR1000 machine driver selection Benjamin Herrenschmidt: o radeonfb: typos fixes o radeonfb: Fix hang on boot with some laptops o Fix possible race with 4level-fixup.h o Check for wraps in copy_page_range o Fix buf in zeromap_pud_range() losing virtual address o radeonfb: Workaround memory corruption accel problem o ppc32: fix ptep_test_and_clear_young o ppc32: kernel mapping breakage Bjorn Helgaas: o de214x.c uses uninitialized pci_dev->irq Bob Breuer: o [SPARC]: Check 
prom_getproperty return value Brian Murphy: o USB: ehci requeue revisit Christoph Hellwig: o block new writers on frozen filesystems Corey Minyard: o IPMI: Fix LAN bridging Daniel Ritz: o PCI: support PCI_PM_CAP version 1 David Brownell: o USB: ehci patch for NF4 port miscounting David S. Miller: o [COMPAT]: TUNSETIFF needs to copy back data after ioctl o [SPARC]: Fix cg3 fb blanking o [SPARC]: Fix video mode probing in atyfb driver o [TG3]: Always check tg3_readphy() return value o [TG3]: Update driver version and reldate o [SPARC64]: auxio_register is pointer not integer o [SPARC64]: Put PROM trampolines into asm file o [SPARC64]: Fix access_ok() and friends warnings o [SPARC64]: Fix access_ok() args in sys_sparc32.c:get_tv32() o [SPARC64]: Use common sys_ipc() compat code o [SPARC64]: BUG on rediculious memcpy lengths Dmitry Torokhov: o ALPS: do not activate on unsupported models François Romieu: o dscc4: use of uncompletely initialized struct o dscc4: code factorisation o dscc4: error status checking and pci janitoring o dscc4: removal of unneeded casts o dscc4: removal of unneeded variable o r8169: endianness fixes o r8169: merge of Realtek's code o r8169: typo in debugging code o r8169: screaming irq when the device is closed o r8169: synchronization and balancing when the device is closed o r8169: fix rx skb allocation error logging o r8169: skb alignment nitpicking o r8169: removal of unused #define o r8169: uniformize comments o r8169: IRQ races during change of mtu o r8169: factor out some code Gary N. Spiess: o natsemi long cable fix Herbert Xu: o ISDN locking fix o [IPSEC]: Move dst->child loop from dst_ifdown to xfrm_dst_ifdown o [NET]: Add netdev argument to dst ifdown Hideaki Yoshifuji: o [IPV6]: Fix IPV6_PKTINFO et al. 
handling in udpv6_recvmsg() Hirokazu Takata: o m32r: build fix for SMP kernel o m32r: fix sys_clone() o m32r: defconfig updates o m32r: warning fix Jeff Garzik: o [libata sata_via] minor cleanups o [libata sata_via] add support for VT6421 SATA o [libata] do not call pci_disable_device() for certain errors o libata kfree fix o [libata] Add missing hooks, to avoid oops in advanced SATA drivers Joe Korty: o memset argument order misuses John W. Linville: o libata: fix command queue leak when xlat_func fails Krzysztof Helt: o [SPARC32]: Need to clear PSR_EF in psr of childregs on fork() on SMP Len Brown: o [ACPI] ACPICA 20050211 from Bob Moore Lennert Buytenhek: o [ARM PATCH] 2485/1: fix enp2611 coexistence with other machine types o [ARM PATCH] 2486/1: fix incorrect comment in arch/arm/kernel/debug.S o [ARM PATCH] 2487/1: minor IRQ routing tweaks for ENP-2611 o [ARM PATCH] 2493/1: put IXP2000 slowport in 8-bit mode after boot o [ARM PATCH]
Re: [6/14] Orinoco driver updates - cleanup PCI initialization
FYI, pci_set_drvdata() needs to be one of the last functions called during PCI ->probe(). Jeff
Re: [7/14] Orinoco driver updates - use modern module_parm()
David Gibson wrote: Add descrptions to module parameters in the orinoco driver, and also add permissions to allow them to be exported in sysfs. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-10 13:19:14.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-10 13:24:03.0 +1100 @@ -461,12 +461,14 @@ /* Level of debugging. Used in the macros in orinoco.h */ #ifdef ORINOCO_DEBUG int orinoco_debug = ORINOCO_DEBUG; -module_param(orinoco_debug, int, 0); +module_param(orinoco_debug, int, 0644); +MODULE_PARM_DESC(orinoco_debug, "Debug level"); EXPORT_SYMBOL(orinoco_debug); #endif eventually it would be nice to support netif_msg_* Jeff
[8/14] Orinoco driver updates - PCMCIA initialization cleanups
Cleanup the various bits of initialization code for PCMCIA / PC-Card orinoco devices. This includes one important bugfix where we could fail to take the lock in some circumstances. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco_cs.c === --- working-2.6.orig/drivers/net/wireless/orinoco_cs.c 2005-02-18 12:04:03.157157240 +1100 +++ working-2.6/drivers/net/wireless/orinoco_cs.c 2005-02-18 12:11:49.0 +1100 @@ -57,8 +57,8 @@ /* Some D-Link cards have buggy CIS. They do work at 5v properly, but * don't have any CIS entry for it. This workaround it... */ static int ignore_cis_vcc; /* = 0 */ - module_param(ignore_cis_vcc, int, 0); +MODULE_PARM_DESC(ignore_cis_vcc, "Allow voltage mismatch between card and socket"); // /* Magic constants */ @@ -128,6 +128,7 @@ if (err) return err; + msleep(100); clear_bit(0, &card->hard_reset_in_progress); return 0; @@ -166,9 +167,10 @@ link->priv = dev; /* Interrupt setup */ - link->irq.Attributes = IRQ_TYPE_EXCLUSIVE; + link->irq.Attributes = IRQ_TYPE_EXCLUSIVE | IRQ_HANDLE_PRESENT; link->irq.IRQInfo1 = IRQ_LEVEL_ID; - link->irq.Handler = NULL; + link->irq.Handler = orinoco_interrupt; + link->irq.Instance = dev; /* General socket configuration defaults can go here. 
In this * client, we assume very little, and rely on the CIS for @@ -184,6 +186,7 @@ dev_list = link; client_reg.dev_info = &dev_info; + client_reg.Attributes = INFO_IO_CLIENT | INFO_CARD_SHARE; client_reg.EventMask = CS_EVENT_CARD_INSERTION | CS_EVENT_CARD_REMOVAL | CS_EVENT_RESET_PHYSICAL | CS_EVENT_CARD_RESET | @@ -309,8 +312,8 @@ cistpl_cftable_entry_t *cfg = &(parse.cftable_entry); cistpl_cftable_entry_t dflt = { .index = 0 }; - if (pcmcia_get_tuple_data(handle, &tuple) != 0 || - pcmcia_parse_tuple(handle, &tuple, &parse) != 0) + if ( (pcmcia_get_tuple_data(handle, &tuple) != 0) + || (pcmcia_parse_tuple(handle, &tuple, &parse) != 0)) goto next_entry; if (cfg->flags & CISTPL_CFTABLE_DEFAULT) @@ -349,8 +352,7 @@ dflt.vpp1.param[CISTPL_POWER_VNOM] / 1; /* Do we need to allocate an interrupt? */ - if (cfg->irq.IRQInfo1 || dflt.irq.IRQInfo1) - link->conf.Attributes |= CONF_ENABLE_IRQ; + link->conf.Attributes |= CONF_ENABLE_IRQ; /* IO window settings */ link->io.NumPorts1 = link->io.NumPorts2 = 0; @@ -402,14 +404,7 @@ * a handler to the interrupt, unless the 'Handler' member of * the irq structure is initialized. 
*/ - if (link->conf.Attributes & CONF_ENABLE_IRQ) { - link->irq.Attributes = IRQ_TYPE_EXCLUSIVE | IRQ_HANDLE_PRESENT; - link->irq.IRQInfo1 = IRQ_LEVEL_ID; - link->irq.Handler = orinoco_interrupt; - link->irq.Instance = dev; - - CS_CHECK(RequestIRQ, pcmcia_request_irq(link->handle, &link->irq)); - } + CS_CHECK(RequestIRQ, pcmcia_request_irq(link->handle, &link->irq)); /* We initialize the hermes structure before completing PCMCIA * configuration just in case the interrupt handler gets @@ -434,8 +429,6 @@ SET_MODULE_OWNER(dev); card->node.major = card->node.minor = 0; - /* register_netdev will give us an ethX name */ - dev->name[0] = '\0'; SET_NETDEV_DEV(dev, &handle_to_dev(handle)); /* Tell the stack we exist */ if (register_netdev(dev) != 0) { @@ -458,8 +451,7 @@ if (link->conf.Vpp1) printk(", Vpp %d.%d", link->conf.Vpp1 / 10, link->conf.Vpp1 % 10); - if (link->conf.Attributes & CONF_ENABLE_IRQ) - printk(", irq %d", link->irq.AssignedIRQ); + printk(", irq %d", link->irq.AssignedIRQ); if (link->io.NumPorts1) printk(", io 0x%04x-0x%04x", link->io.BasePort1, link->io.BasePort1 + link->io.NumPorts1 - 1); @@ -525,12 +517,12 @@ case CS_EVENT_CARD_REMOVAL: link->state &= ~DEV_PRESENT; if (link->state & DEV_CONFIG) { - orinoco_lock(priv, &flags); + unsigned long flags; + spin_lock_irqsave(&priv->lock, flags); netif_device_detach(dev); priv->hw_unavailable++; - - orinoco_unlock(priv, &flags
Re: ext2/3 files per directory limits
On Wed, 2005-02-23 at 22:11 -0500, Ron Peterson wrote: > I would like to better understand ext2/3's performance characteristics. > > I'm specifically interested in how ext2/3 will handle a /var/spool/mail > directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as > 75,000 messages daily. Virtually all access is via imap, w/ approx > ~1000 imapd processes running during peak load. Local delivery is via > procmail, which by default uses both kernel-supported locking calls and > .lock files. > > I understand that various tuning parameters will have an impact, > e.g. putting the journal on a separate device, setting the noatime mount > option, etc. I also understand that there are other mailbox formats and > other strategies for locating mail spools (e.g. in user's home > directories). > > I'm interested in people's thoughts on these issues, but I'm mostly > interested in whether or not the scenario I described falls within > ext2/3's designed capabilities. Yes, ext2 and ext3 can handle that load easily. You should not have to do any special tuning. The real question is why in the world you would want to use mbox format for this. It simply does not scale. Use maildir. Lee
[12/14] Orinoco driver updates - WEP updates
Updates to the WEP configuration code. This adds support for shared key authentication on Agere firmwares. It also adds support (in some cases) for changing the WEP keys without disabling the MAC port (thus triggering a reassociation by the firmware). This is needed by 802.1x implementations, although it's not clear if the code so far is sufficient to allow working 802.1x. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:55.904651256 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-24 14:50:57.300439064 +1100 @@ -1437,55 +1437,46 @@ return err; } -static int __orinoco_hw_setup_wep(struct orinoco_private *priv) +/* Change the WEP keys and/or the current keys. Can be called + * either from __orinoco_hw_setup_wep() or directly from + * orinoco_ioctl_setiwencode(). In the later case the association + * with the AP is not broken (if the firmware can handle it), + * which is needed for 802.1x implementations. 
*/ +static int __orinoco_hw_setup_wepkeys(struct orinoco_private *priv) { hermes_t *hw = &priv->hw; int err = 0; - int master_wep_flag; - int auth_flag; switch (priv->firmware_type) { - case FIRMWARE_TYPE_AGERE: /* Agere style WEP */ - if (priv->wep_on) { - err = hermes_write_wordrec(hw, USER_BAP, - HERMES_RID_CNFTXKEY_AGERE, - priv->tx_key); - if (err) - return err; - - err = HERMES_WRITE_RECORD(hw, USER_BAP, - HERMES_RID_CNFWEPKEYS_AGERE, - &priv->keys); - if (err) - return err; - } + case FIRMWARE_TYPE_AGERE: + err = HERMES_WRITE_RECORD(hw, USER_BAP, + HERMES_RID_CNFWEPKEYS_AGERE, + &priv->keys); + if (err) + return err; err = hermes_write_wordrec(hw, USER_BAP, - HERMES_RID_CNFWEPENABLED_AGERE, - priv->wep_on); + HERMES_RID_CNFTXKEY_AGERE, + priv->tx_key); if (err) return err; break; - - case FIRMWARE_TYPE_INTERSIL: /* Intersil style WEP */ - case FIRMWARE_TYPE_SYMBOL: /* Symbol style WEP */ - master_wep_flag = 0;/* Off */ - if (priv->wep_on) { + case FIRMWARE_TYPE_INTERSIL: + case FIRMWARE_TYPE_SYMBOL: + { int keylen; int i; - /* Fudge around firmware weirdness */ + /* Force uniform key length to work around firmware bugs */ keylen = le16_to_cpu(priv->keys[priv->tx_key].len); + if (keylen > LARGE_KEY_SIZE) { + printk(KERN_ERR "%s: BUG: Key %d has oversize length %d.\n", + priv->ndev->name, priv->tx_key, keylen); + return -E2BIG; + } + /* Write all 4 keys */ for(i = 0; i < ORINOCO_MAX_KEYS; i++) { -/* int keylen = le16_to_cpu(priv->keys[i].len); */ - - if (keylen > LARGE_KEY_SIZE) { - printk(KERN_ERR "%s: BUG: Key %d has oversize length %d.\n", - priv->ndev->name, i, keylen); - return -E2BIG; - } - err = hermes_write_ltv(hw, USER_BAP, HERMES_RID_CNFDEFAULTKEY0 + i, HERMES_BYTES_TO_RECLEN(keylen), @@ -1500,27 +1491,63 @@ priv->tx_key); if (err) return err; - - if (priv->wep_restrict) { - auth_flag = 2; - master_wep_flag = 3; - } else { - /* Authentication is where Intersil and Symbol -* firmware differ... */ -
Re: PPC RT Patch..
john cooper wrote: Ingo, We've had a PPC port of your RT work underway with a focus on trace instrumentation. This is based upon realtime-preempt-2.6.11-rc2-V0.7.37-02. A diff is attached. To the extent possible the tracing facilities are the same as your x86 work. In the process a few PPC/gcc issues needed to be resolved. There is also a bug fix contained for tlb_gather_mmu() which was causing debug assertions to be generated in a path which attempted to sleep with a non-zero preempt count. Manish Lachwani mentioned to me that he faced the same issue with the MIPS RT support and that when he discussed it with Ingo that the solution was for include/asm-ppc/tlb.h to include/asm-generic/tlb-simple.h when PREEMPT_RT is turned on. The patch does this for the #ifdef CONFIG_PPC_STD_MMU case, but not for the #else case. I don't know which case is used for the Ampro board. This does build and function when SMP is configured, though we have not yet verified it on other than a uniprocessor. As a simplifying assumption, testing has thus far concentrated on the following modes: PREEMPT_NONE - verify baseline regression PREEMPT_RT && !PREEMPT_SMP - typical for an embedded RT PPC application PREEMPT_RT && PREEMPT_SMP - kicks in live locking code which otherwise receives no coverage. This is functionally equivalent to the above config on a single CPU target thus no MP dynamic testing is achieved. Still quite useful IMHO. The target used for development/testing is an Ampro EnCore PP1 which sports a 300Mhz MPC8245. For testing this boots with NFS as root. An mp3 decode at nice --20 is launched which requires just under 20% of the CPU to maintain an uninterrupted audio decode and output. To this a series of "du -s /" are launched to soak up excess CPU bandwidth. Perhaps not rigorous but a fair sanity check and load for the purpose at hand. Under these conditions maximum scheduling latencies are seen in the 120-150us range. 
Note no attempt has yet been made to optimize arch specific paths and full trace instrumentation has been enabled. I've written some logging code to help find problems such as the tlb issue above. As it has not been made general I've removed it from this patch. At some point I'll likely revisit this. Comments/suggestions welcome. I am glad to see the instrumentation and measurement related code in your patch. (My patch of last week ("Frank's patch") is lacking that code.) Other differences between the two patches are: arch/ppc/syslib/i8259.c Frank neglected to convert i8259_lock to a raw spinlock. arch/ppc/kernel/signal.c John added an enable of irqs in do_signal() #ifdef CONFIG_PREEMPT_RT arch/ppc/kernel/traps.c John added an enable of irqs and preempt_check_resched() in _exception(). various files Frank added the intrusive variable tb_to_us for use by cycles_to_usec() and added an ugly #ifdef in cycles_to_usec(). John hard-coded cpu_khz for one specific board so that no change would be needed in cycles_to_usec(). various files John has the mmu_gather fix that is described above. John's patch and Frank's patch are otherwise mostly the same, except for the differences that result from being based on different kernel versions. I am glad to see that because it means that two sets of eyes have agreed. Frank's patch may have missed some EXPORT_SYMBOL()s in arch/ppc/lib/locks.c. I'll check those over again tomorrow. -john -Frank -- Frank Rowand <[EMAIL PROTECTED]> MontaVista Software, Inc - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[13/14] Orinoco driver updates - update firmware detection
Update firmware detection code. This will now reliably detect Intersil firmwares past version 1.x; failing to do so was a serious flaw in the previous code. It cleans up the code, and reduces the size of the private structure by using single bits for the various firmware feature flags. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:57.300439064 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-24 14:50:59.879047056 +1100 @@ -2047,39 +2047,54 @@ /* Initialization */ // -struct sta_id { +struct comp_id { u16 id, variant, major, minor; } __attribute__ ((packed)); -static int determine_firmware_type(struct net_device *dev, struct sta_id *sta_id) +static inline fwtype_t determine_firmware_type(struct comp_id *nic_id) { - /* FIXME: this is fundamentally broken */ - unsigned int firmver = ((u32)sta_id->major << 16) | sta_id->minor; - - if (sta_id->variant == 1) + if (nic_id->id < 0x8000) return FIRMWARE_TYPE_AGERE; - else if ((sta_id->variant == 2) && - ((firmver == 0x10001) || (firmver == 0x20001))) + else if (nic_id->id == 0x8000 && nic_id->major == 0) return FIRMWARE_TYPE_SYMBOL; else return FIRMWARE_TYPE_INTERSIL; } -static void determine_firmware(struct net_device *dev) +/* Set priv->firmware type, determine firmware properties */ +static int determine_firmware(struct net_device *dev) { struct orinoco_private *priv = netdev_priv(dev); hermes_t *hw = &priv->hw; int err; - struct sta_id sta_id; + struct comp_id nic_id, sta_id; unsigned int firmver; char tmp[SYMBOL_MAX_VER_LEN+1]; + /* Get the hardware version */ + err = HERMES_READ_RECORD(hw, USER_BAP, HERMES_RID_NICID, &nic_id); + if (err) { + printk(KERN_ERR "%s: Cannot read hardware identity: error %d\n", + dev->name, err); + return err; + } + + le16_to_cpus(&nic_id.id); + le16_to_cpus(&nic_id.variant); + le16_to_cpus(&nic_id.major); + le16_to_cpus(&nic_id.minor); + printk(KERN_DEBUG "%s: Hardware 
identity %04x:%04x:%04x:%04x\n", + dev->name, nic_id.id, nic_id.variant, + nic_id.major, nic_id.minor); + + priv->firmware_type = determine_firmware_type(&nic_id); + /* Get the firmware version */ err = HERMES_READ_RECORD(hw, USER_BAP, HERMES_RID_STAID, &sta_id); if (err) { - printk(KERN_WARNING "%s: Error %d reading firmware info. Wildly guessing capabilities...\n", + printk(KERN_ERR "%s: Cannot read station identity: error %d\n", dev->name, err); - memset(&sta_id, 0, sizeof(sta_id)); + return err; } le16_to_cpus(&sta_id.id); @@ -2090,8 +2105,23 @@ dev->name, sta_id.id, sta_id.variant, sta_id.major, sta_id.minor); - if (! priv->firmware_type) - priv->firmware_type = determine_firmware_type(dev, &sta_id); + switch (sta_id.id) { + case 0x15: + printk(KERN_ERR "%s: Primary firmware is active\n", + dev->name); + return -ENODEV; + case 0x14b: + printk(KERN_ERR "%s: Tertiary firmware is active\n", + dev->name); + return -ENODEV; + case 0x1f: /* Intersil, Agere, Symbol Spectrum24 */ + case 0x21: /* Symbol Spectrum24 Trilogy */ + break; + default: + printk(KERN_NOTICE "%s: Unknown station ID, please report\n", + dev->name); + break; + } /* Default capabilities */ priv->has_sensitivity = 1; @@ -2107,9 +2137,8 @@ case FIRMWARE_TYPE_AGERE: /* Lucent Wavelan IEEE, Lucent Orinoco, Cabletron RoamAbout, ELSA, Melco, HP, IBM, Dell 1150, Compaq 110/210 */ - printk(KERN_DEBUG "%s: Looks like a Lucent/Agere firmware " - "version %d.%02d\n", dev->name, - sta_id.major, sta_id.minor); + snprintf(priv->fw_name, sizeof(priv->fw_name) - 1, +"Lucent/Agere %d.%02d", sta_id.major, sta_id.minor); firmver = ((unsigned long)sta_id.major << 16) | sta_id.minor; @@ -2152,14 +2181,15 @@ tmp[SYMBOL_MAX_VER_LEN] = '\0'; } - printk(KERN_DEBUG "%s: Looks like a Symbol firmware " - "version [%s] (parsing to %X)\n", dev->name, - tmp, f
[11/14] Orinoco driver updates - delay Tx wake
Delay netif_wake_queue() until the packet has actually been transmitted, rather than just when the firmware has copied it into its internal buffers. This seems to prevent problems on some Intersil firmware versions (I suspect the problems were caused by the firmware's buffers filling up). Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-18 12:48:30.523655896 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-18 12:58:09.407652152 +1100 @@ -901,8 +901,6 @@ printk(KERN_WARNING "%s: Allocate event on unexpected fid (%04X)\n", dev->name, fid); return; - } else { - netif_wake_queue(dev); } hermes_write_regn(hw, ALLOCFID, DUMMY_FID); @@ -915,6 +913,8 @@ stats->tx_packets++; + netif_wake_queue(dev); + hermes_write_regn(hw, TXCOMPLFID, DUMMY_FID); } @@ -941,6 +941,7 @@ stats->tx_errors++; + netif_wake_queue(dev); hermes_write_regn(hw, TXCOMPLFID, DUMMY_FID); } -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[6/14] Orinoco driver updates - cleanup PCI initialization
Update the initialization code in the various PCI incarnations of the orinoco driver. This applies similar initialization and shutdown cleanups to the orinoco_pci, orinoco_plx and orinoco_tmd drivers. It also adds COR reset support to the orinoco_plx and orinoco_tmd drivers, improves PCI power management support in the orinoco_pci driver and adds a couple of extra supported cards to the ID tables. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco_pci.c === --- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-01-12 15:47:48.215477920 +1100 +++ working-2.6/drivers/net/wireless/orinoco_pci.c 2005-01-12 16:10:57.324301280 +1100 @@ -129,6 +129,11 @@ #define HERMES_PCI_COR_OFFT(500) /* ms */ #define HERMES_PCI_COR_BUSYT (500) /* ms */ +/* Orinoco PCI specific data */ +struct orinoco_pci_card { + void __iomem *pci_ioaddr; +}; + /* * Do a soft reset of the PCI card using the Configuration Option Register * We need this to get going... @@ -164,8 +169,9 @@ mdelay(1); reg = hermes_read_regn(hw, CMD); } - /* Did we timeout ? */ - if(time_after_eq(jiffies, timeout)) { + + /* Still busy? */ + if (reg & HERMES_CMD_BUSY) { printk(KERN_ERR PFX "Busy timeout\n"); return -ETIMEDOUT; } @@ -184,6 +190,7 @@ u16 __iomem *pci_ioaddr = NULL; unsigned long pci_iolen; struct orinoco_private *priv = NULL; + struct orinoco_pci_card *card; struct net_device *dev = NULL; err = pci_enable_device(pdev); @@ -192,24 +199,31 @@ return err; } + err = pci_request_regions(pdev, DRIVER_NAME); + if (err != 0) { + printk(KERN_ERR PFX "Cannot obtain PCI resources\n"); + goto fail_resources; + } + /* Resource 0 is mapped to the hermes registers */ pci_iorange = pci_resource_start(pdev, 0); pci_iolen = pci_resource_len(pdev, 0); pci_ioaddr = ioremap(pci_iorange, pci_iolen); - if (! 
pci_iorange) { + if (!pci_iorange) { printk(KERN_ERR PFX "Cannot remap hardware registers\n"); - goto fail; + goto fail_map; } /* Allocate network device */ - dev = alloc_orinocodev(0, NULL); + dev = alloc_orinocodev(sizeof(*card), orinoco_pci_cor_reset); if (! dev) { err = -ENOMEM; - goto fail; + goto fail_alloc; } priv = netdev_priv(dev); - dev->base_addr = (unsigned long) pci_ioaddr; + card = priv->card; + card->pci_ioaddr = pci_ioaddr; dev->mem_start = pci_iorange; dev->mem_end = pci_iorange + pci_iolen - 1; SET_MODULE_OWNER(dev); @@ -226,14 +240,14 @@ if (err) { printk(KERN_ERR PFX "Cannot allocate IRQ %d\n", pdev->irq); err = -EBUSY; - goto fail; + goto fail_irq; } dev->irq = pdev->irq; /* Perform a COR reset to start the card */ - if(orinoco_pci_cor_reset(priv) != 0) { + err = orinoco_pci_cor_reset(priv); + if (err) { printk(KERN_ERR PFX "Initial reset failed\n"); - err = -ETIMEDOUT; goto fail; } @@ -250,16 +264,19 @@ return 0; fail: - if (dev) { - if (dev->irq) - free_irq(dev->irq, dev); + free_irq(pdev->irq, dev); - free_orinocodev(dev); - } + fail_irq: + pci_set_drvdata(pdev, NULL); + free_orinocodev(dev); + + fail_alloc: + iounmap(pci_ioaddr); - if (pci_ioaddr) - iounmap(pci_ioaddr); + fail_map: + pci_release_regions(pdev); + fail_resources: pci_disable_device(pdev); return err; @@ -269,18 +286,14 @@ { struct net_device *dev = pci_get_drvdata(pdev); struct orinoco_private *priv = netdev_priv(dev); + struct orinoco_pci_card *card = priv->card; unregister_netdev(dev); - - if (dev->irq) - free_irq(dev->irq, dev); - - if (priv->hw.iobase) - iounmap(priv->hw.iobase); - + free_irq(dev->irq, dev); pci_set_drvdata(pdev, NULL); free_orinocodev(dev); - + iounmap(card->pci_ioaddr); + pci_release_regions(pdev); pci_disable_device(pdev); } @@ -312,6 +325,9 @@ orinoco_unlock(priv, &flags); + pci_save_state(pdev); + pci_set_power_state(pdev, 3); + return 0; } @@ -324,6 +340,9 @@ printk(KERN_DEBUG "%s: Orinoco-PCI waking up\n", dev->name); + pci_set_power_state(pdev, 
0); + pci_restore_state(pdev); + err = orinoco_reinit_firmware(dev); if (err) {
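The reworked error path in orinoco_pci_init_one() uses one label per acquired resource, unwound in reverse order. A user-space sketch of that pattern follows; the step names, acquire() and the trace buffer are hypothetical stand-ins for the real PCI calls named in the comments, not driver code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical probe steps, in acquisition order. */
enum { STEP_ENABLE, STEP_REGIONS, STEP_MAP, STEP_ALLOC, STEP_IRQ, STEP_RESET };

static char trace[64];
static int trace_len;

/* Record one acquire (upper case) or release (lower case) action. */
static void note(char c)
{
	assert(trace_len < (int)sizeof(trace) - 1);
	trace[trace_len++] = c;
	trace[trace_len] = '\0';
}

/* Pretend to acquire a resource; fails when step == fail_at. */
static int acquire(int step, int fail_at, char tag)
{
	if (step == fail_at)
		return -1;
	note(tag);
	return 0;
}

/* Returns 0 on success, -1 if the step selected by fail_at fails. */
static int probe(int fail_at)
{
	int err;

	trace_len = 0;
	trace[0] = '\0';

	err = acquire(STEP_ENABLE, fail_at, 'E');	/* pci_enable_device() */
	if (err)
		return err;
	err = acquire(STEP_REGIONS, fail_at, 'R');	/* pci_request_regions() */
	if (err)
		goto fail_resources;
	err = acquire(STEP_MAP, fail_at, 'M');		/* ioremap() */
	if (err)
		goto fail_map;
	err = acquire(STEP_ALLOC, fail_at, 'A');	/* alloc_orinocodev() */
	if (err)
		goto fail_alloc;
	err = acquire(STEP_IRQ, fail_at, 'I');		/* request_irq() */
	if (err)
		goto fail_irq;
	err = acquire(STEP_RESET, fail_at, 'C');	/* COR reset */
	if (err)
		goto fail;
	return 0;

 fail:
	note('i');	/* free_irq() */
 fail_irq:
	note('a');	/* free_orinocodev() */
 fail_alloc:
	note('m');	/* iounmap() */
 fail_map:
	note('r');	/* pci_release_regions() */
 fail_resources:
	note('e');	/* pci_disable_device() */
	return err;
}
```

Each label releases exactly what was acquired before the failing step, so the cleanup code needs no "if (dev)" or "if (pci_ioaddr)" checks like the ones the patch removes.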
[14/14] Orinoco driver updates - update version and changelog
Previous patches have brought the in-kernel orinoco driver roughly to parity with version 0.14alpha2 from out-of-tree. Update the version number and changelog accordingly. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:59.879047056 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-24 14:51:02.388665536 +1100 @@ -393,6 +393,29 @@ * in the rx_dropped statistics. * o Provided a module parameter to suppress linkstatus messages. * + * v0.13e -> v0.14alpha1 - 30 Sep 2003 - David Gibson + * o Replaced priv->connected logic with netif_carrier_on/off() + * calls. + * o Remove has_ibss_any and never set the CREATEIBSS RID when + * the ESSID is empty. Too many firmwares break if we do. + * o 2.6 merges: Replace pdev->slot_name with pci_name(), remove + * __devinitdata from PCI ID tables, use free_netdev(). + * o Enabled shared-key authentication for Agere firmware (from + * Robert J. Moore + * o Move netif_wake_queue() (back) to the Tx completion from the + * ALLOC event. This seems to prevent/mitigate the rolling + * error -110 problems at least on some Intersil firmwares. + * Theoretically reduces performance, but I can't measure it. + * Patch from Andrew Tridgell + * + * v0.14alpha1 -> v0.14alpha2 - 20 Oct 2003 - David Gibson + * o Correctly turn off shared-key authentication when requested + * (bugfix from Robert J. Moore). + * o Correct airport sleep interfaces for current 2.6 kernels. + * o Add code for key change without disabling/enabling the MAC + * port. This is supposed to allow 802.1x to work sanely, but + * doesn't seem to yet. + * * TODO * o New wireless extensions API (patch from Moustafa * Youssef, updated by Jim Carter and Pavel Roskin). 
Index: working-2.6/drivers/net/wireless/orinoco.h === --- working-2.6.orig/drivers/net/wireless/orinoco.h 2005-02-24 14:50:59.879047056 +1100 +++ working-2.6/drivers/net/wireless/orinoco.h 2005-02-24 14:51:02.389665384 +1100 @@ -7,7 +7,7 @@ #ifndef _ORINOCO_H #define _ORINOCO_H -#define DRIVER_VERSION "0.13e" +#define DRIVER_VERSION "0.14alpha2" #include #include -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[9/14] Orinoco driver updates - update is_ethersnap()
Make the is_ethersnap() function take a void * rather than a pointer to the internal header structure. This makes more logical sense and reduces dependencies between different parts of the code. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:48.426788064 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-24 14:50:50.125529816 +1100 @@ -966,15 +966,17 @@ /* Does the frame have a SNAP header indicating it should be * de-encapsulated to Ethernet-II? */ -static inline int is_ethersnap(struct header_struct *hdr) +static inline int is_ethersnap(void *_hdr) { + u8 *hdr = _hdr; + /* We de-encapsulate all packets which, a) have SNAP headers * (i.e. SSAP=DSAP=0xaa and CTRL=0x3 in the 802.2 LLC header * and where b) the OUI of the SNAP header is 00:00:00 or * 00:00:f8 - we need both because different APs appear to use * different OUIs for some reason */ - return (memcmp(&hdr->dsap, &encaps_hdr, 5) == 0) - && ( (hdr->oui[2] == 0x00) || (hdr->oui[2] == 0xf8) ); + return (memcmp(hdr, &encaps_hdr, 5) == 0) + && ( (hdr[5] == 0x00) || (hdr[5] == 0xf8) ); } static inline void orinoco_spy_gather(struct net_device *dev, u_char *mac, -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
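For reference, a stand-alone sketch of the byte-offset version of is_ethersnap(). The contents of encaps_hdr are an assumption here (DSAP, SSAP, CTRL followed by a zero OUI), since the patch only shows it being compared; the sketch also takes a const pointer, which the patch does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed contents: DSAP=0xaa, SSAP=0xaa, CTRL=0x03, then OUI 00:00:00.
 * Only the first five bytes are compared below. */
static const uint8_t encaps_hdr[] = { 0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00 };

/* _hdr points at the 802.2 LLC header: bytes 0-2 are DSAP/SSAP/CTRL,
 * bytes 3-5 are the SNAP OUI. */
static inline int is_ethersnap(const void *_hdr)
{
	const uint8_t *hdr = _hdr;

	assert(hdr != NULL);

	/* Match the LLC header plus the first two OUI bytes, then accept
	 * either 0x00 or 0xf8 as the last OUI byte. */
	return (memcmp(hdr, encaps_hdr, 5) == 0)
		&& ((hdr[5] == 0x00) || (hdr[5] == 0xf8));
}
```

Because the function only indexes bytes, it can be handed any buffer that starts at the LLC header, without the caller needing struct header_struct.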
[10/14] Orinoco driver updates - prohibit IBSS with no ESSID
Remove the has_ibss_any flag and never set the CREATEIBSS RID when the ESSID is empty. Too many firmwares break if we do. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-24 14:50:50.125529816 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-24 14:50:53.166067584 +1100 @@ -1580,21 +1580,26 @@ } if (priv->has_ibss) { - err = hermes_write_wordrec(hw, USER_BAP, - HERMES_RID_CNFCREATEIBSS, - priv->createibss); - if (err) { - printk(KERN_ERR "%s: Error %d setting CREATEIBSS\n", dev->name, err); - return err; - } + u16 createibss; - if ((strlen(priv->desired_essid) == 0) && (priv->createibss) - && (!priv->has_ibss_any)) { + if ((strlen(priv->desired_essid) == 0) && (priv->createibss)) { printk(KERN_WARNING "%s: This firmware requires an " "ESSID in IBSS-Ad-Hoc mode.\n", dev->name); /* With wvlan_cs, in this case, we would crash. * hopefully, this driver will behave better... * Jean II */ + createibss = 0; + } else { + createibss = priv->createibss; + } + + err = hermes_write_wordrec(hw, USER_BAP, + HERMES_RID_CNFCREATEIBSS, + createibss); + if (err) { + printk(KERN_ERR "%s: Error %d setting CREATEIBSS\n", + dev->name, err); + return err; } } @@ -2073,7 +2078,6 @@ priv->has_preamble = 0; priv->has_port3 = 1; priv->has_ibss = 1; - priv->has_ibss_any = 0; priv->has_wep = 0; priv->has_big_wep = 0; @@ -2089,7 +2093,6 @@ firmver = ((unsigned long)sta_id.major << 16) | sta_id.minor; priv->has_ibss = (firmver >= 0x60006); - priv->has_ibss_any = (firmver >= 0x60010); priv->has_wep = (firmver >= 0x40020); priv->has_big_wep = 1; /* FIXME: this is wrong - how do we tell Gold cards from the others? 
*/ Index: working-2.6/drivers/net/wireless/orinoco.h === --- working-2.6.orig/drivers/net/wireless/orinoco.h 2005-02-24 14:50:46.549073520 +1100 +++ working-2.6/drivers/net/wireless/orinoco.h 2005-02-24 14:50:53.167067432 +1100 @@ -57,7 +57,7 @@ #define FIRMWARE_TYPE_AGERE 1 #define FIRMWARE_TYPE_INTERSIL 2 #define FIRMWARE_TYPE_SYMBOL 3 - int has_ibss, has_port3, has_ibss_any, ibss_port; + int has_ibss, has_port3, ibss_port; int has_wep, has_big_wep; int has_mwo; int has_pm; -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
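The decision this patch makes before writing the CREATEIBSS record boils down to a small pure function. The helper below is hypothetical (the driver does this inline) and is only a sketch of the rule: an empty ESSID always forces the value to 0.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the value that would be written to
 * HERMES_RID_CNFCREATEIBSS for a given configuration. */
static unsigned short effective_createibss(const char *desired_essid,
					   int createibss)
{
	assert(desired_essid != NULL);

	/* Never ask the firmware to create an IBSS without an ESSID. */
	if (strlen(desired_essid) == 0 && createibss)
		return 0;

	return (unsigned short)createibss;
}
```

With has_ibss_any gone, the result depends only on the ESSID and the requested createibss setting, not on the firmware version.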
[4/14] Orinoco driver updates - add free_orinocodev()
Introduce a free_orinocodev() function into the orinoco driver, used by the hardware type/initialization modules to free the device structure in preference to directly calling free_netdev(). At the moment free_orinocodev() just calls free_netdev(). Future merges will make it clean up internal scanning state, so merging this now will reduce the diff noise. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.h === --- working-2.6.orig/drivers/net/wireless/orinoco.h 2005-02-18 12:04:03.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco.h 2005-02-18 12:04:03.0 +1100 @@ -107,6 +107,7 @@ extern struct net_device *alloc_orinocodev(int sizeof_card, int (*hard_reset)(struct orinoco_private *)); +extern void free_orinocodev(struct net_device *dev); extern int __orinoco_up(struct net_device *dev); extern int __orinoco_down(struct net_device *dev); extern int orinoco_stop(struct net_device *dev); Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-02-18 12:04:03.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-18 13:03:51.846593520 +1100 @@ -2398,6 +2398,11 @@ } +void free_orinocodev(struct net_device *dev) +{ + free_netdev(dev); +} + // /* Wireless extensions */ // @@ -4131,6 +4136,7 @@ // EXPORT_SYMBOL(alloc_orinocodev); +EXPORT_SYMBOL(free_orinocodev); EXPORT_SYMBOL(__orinoco_up); EXPORT_SYMBOL(__orinoco_down); Index: working-2.6/drivers/net/wireless/orinoco_cs.c === --- working-2.6.orig/drivers/net/wireless/orinoco_cs.c 2005-02-18 12:04:03.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco_cs.c 2005-02-18 12:04:03.0 +1100 @@ -235,7 +235,7 @@ dev); unregister_netdev(dev); } - free_netdev(dev); + free_orinocodev(dev); } /* orinoco_cs_detach */ /* Index: working-2.6/drivers/net/wireless/orinoco_pci.c === --- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-02-18 12:04:03.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco_pci.c 2005-02-18 
12:04:03.0 +1100 @@ -254,7 +254,7 @@ if (dev->irq) free_irq(dev->irq, dev); - free_netdev(dev); + free_orinocodev(dev); } if (pci_ioaddr) @@ -279,7 +279,7 @@ iounmap(priv->hw.iobase); pci_set_drvdata(pdev, NULL); - free_netdev(dev); + free_orinocodev(dev); pci_disable_device(pdev); } Index: working-2.6/drivers/net/wireless/orinoco_plx.c === --- working-2.6.orig/drivers/net/wireless/orinoco_plx.c 2005-02-18 12:04:03.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco_plx.c 2005-02-18 12:04:03.0 +1100 @@ -279,7 +279,7 @@ fail: free_irq(dev->irq, dev); fail_irq: - free_netdev(dev); + free_orinocodev(dev); fail_alloc: pci_iounmap(pdev, mem); fail_map: @@ -304,7 +304,7 @@ pci_set_drvdata(pdev, NULL); - free_netdev(dev); + free_orinocodev(dev); release_region(pci_resource_start(pdev, 3), pci_resource_len(pdev, 3)); Index: working-2.6/drivers/net/wireless/orinoco_tmd.c === --- working-2.6.orig/drivers/net/wireless/orinoco_tmd.c 2005-02-18 12:04:03.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco_tmd.c 2005-02-18 12:04:03.0 +1100 @@ -164,7 +164,7 @@ out4: pci_iounmap(pdev, mem); out3: - free_netdev(dev); + free_orinocodev(dev); out2: release_region(pccard_ioaddr, pccard_iolen); out: @@ -188,7 +188,7 @@ pci_set_drvdata(pdev, NULL); - free_netdev(dev); + free_orinocodev(dev); release_region(pci_resource_start(pdev, 2), pci_resource_len(pdev, 2)); Index: working-2.6/drivers/net/wireless/airport.c === --- working-2.6.orig/drivers/net/wireless/airport.c 2005-02-18 12:04:03.0 +1100 +++ working-2.6/drivers/net/wireless/airport.c 2005-02-18 12:04:03.0 +1100 @@ -149,7 +149,7 @@ ssleep(1); macio_set_drvdata(mdev, NULL); - free_netdev(dev); + free_orinoc
[3/14] Orinoco driver updates - use mdelay()/ssleep() more
Use mdelay() or ssleep() instead of various silly more complicated ways of delaying in the orinoco driver. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco_pci.c === --- working-2.6.orig/drivers/net/wireless/orinoco_pci.c 2005-01-12 15:13:18.819073992 +1100 +++ working-2.6/drivers/net/wireless/orinoco_pci.c 2005-01-12 15:15:33.137654464 +1100 @@ -151,19 +151,11 @@ /* Assert the reset until the card notice */ hermes_write_regn(hw, PCI_COR, HERMES_PCI_COR_MASK); - timeout = jiffies + (HERMES_PCI_COR_ONT * HZ / 1000); - while(time_before(jiffies, timeout)) { - mdelay(1); - } - //mdelay(HERMES_PCI_COR_ONT); + mdelay(HERMES_PCI_COR_ONT); /* Give time for the card to recover from this hard effort */ hermes_write_regn(hw, PCI_COR, 0x); - timeout = jiffies + (HERMES_PCI_COR_OFFT * HZ / 1000); - while(time_before(jiffies, timeout)) { - mdelay(1); - } - //mdelay(HERMES_PCI_COR_OFFT); + mdelay(HERMES_PCI_COR_OFFT); /* The card is ready when it's no longer busy */ timeout = jiffies + (HERMES_PCI_COR_BUSYT * HZ / 1000); Index: working-2.6/drivers/net/wireless/orinoco_plx.c === --- working-2.6.orig/drivers/net/wireless/orinoco_plx.c 2005-01-12 15:13:18.821073688 +1100 +++ working-2.6/drivers/net/wireless/orinoco_plx.c 2005-01-12 15:15:33.138654312 +1100 @@ -356,8 +356,7 @@ static void __exit orinoco_plx_exit(void) { pci_unregister_driver(&orinoco_plx_driver); - current->state = TASK_UNINTERRUPTIBLE; - schedule_timeout(HZ); + ssleep(1); } module_init(orinoco_plx_init); Index: working-2.6/drivers/net/wireless/orinoco_tmd.c === --- working-2.6.orig/drivers/net/wireless/orinoco_tmd.c 2005-01-12 15:13:18.820073840 +1100 +++ working-2.6/drivers/net/wireless/orinoco_tmd.c 2005-01-12 15:16:05.897674184 +1100 @@ -225,8 +225,7 @@ static void __exit orinoco_tmd_exit(void) { pci_unregister_driver(&orinoco_tmd_driver); - current->state = TASK_UNINTERRUPTIBLE; - schedule_timeout(HZ); + ssleep(1); } module_init(orinoco_tmd_init); -- David Gibson| 
I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson
[0/14] Orinoco driver updates
Jeff, please apply:

Here's a big stack of patches that make a significant step forward on the long overdue orinoco driver merge. Still quite a long way to go, but it's something. This patch stack is against Linus' vanilla + Viro's big iomap cleanup patch, as requested.

The first 9 patches make only trivial or cosmetic behavioural changes:

1/14  orinoco-carrier
	Use netif_carrier_*() macros instead of homegrown 'connected' variable.
2/14  orinoco-printks
	Update various printk()s and other cosmetic strings
3/14  orinoco-delays
	Use mdelay() and ssleep() instead of outdated ways of delaying.
4/14  orinoco-free-orinocodev
	Introduce free_orinocodev() function, to reduce noise in future diffs.
5/14  orinoco-cleanup-hermes
	Assorted cleanups to low-level hardware access code
6/14  orinoco-pci-updates
	Cleanup to initialization code for the PCI based orinoco devices.
7/14  orinoco-modparm
	Use modern module_parm macros for orinoco module.
8/14  orinoco-pccard-cleanups
	Cleanup to PCMCIA initialization code
9/14  orinoco-void-ethersnap
	Trivial change to is_ethersnap() function to reduce future diff noise.

The next 4 patches start to introduce real new functionality and bug fixes:

10/14 orinoco-no-ibss-any
	Disallow IBSS mode if no ESSID is set (too many firmwares break, otherwise)
11/14 orinoco-late-tx-wake
	Delay waking the Tx queue, fixes problems on a number of firmwares
12/14 orinoco-wep-updates
	Various updates to WEP setup code
13/14 orinoco-update-firmware-detection
	Updates and bugfixes to firmware detection logic

And the final one is another trivial one:

14/14 orinoco-is-now-0.14alpha2
	Update version and changelog to reflect the above patches.

-- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! 
http://www.ozlabs.org/people/dgibson
[1/14] Orinoco driver updates - use netif_carrier_*()
Removes the orinoco driver's custom and dodgy "connected" variable used to track whether or not we're associated with an AP. Replaces it instead with netif_carrier_ok() settings. Signed-off-by: David Gibson <[EMAIL PROTECTED]> Index: working-2.6/drivers/net/wireless/orinoco.c === --- working-2.6.orig/drivers/net/wireless/orinoco.c 2005-01-13 09:48:55.0 +1100 +++ working-2.6/drivers/net/wireless/orinoco.c 2005-02-10 14:22:32.179826024 +1100 @@ -784,7 +784,7 @@ return 1; } - if (! priv->connected) { + if (! netif_carrier_ok(dev)) { /* Oops, the firmware hasn't established a connection, silently drop the packet (this seems to be the safest approach). */ @@ -1269,6 +1269,7 @@ case HERMES_INQ_LINKSTATUS: { struct hermes_linkstatus linkstatus; u16 newstatus; + int connected; if (len != sizeof(linkstatus)) { printk(KERN_WARNING "%s: Unexpected size for linkstatus frame (%d bytes)\n", @@ -1280,15 +1281,14 @@ len / 2); newstatus = le16_to_cpu(linkstatus.linkstatus); - if ( (newstatus == HERMES_LINKSTATUS_CONNECTED) -|| (newstatus == HERMES_LINKSTATUS_AP_CHANGE) -|| (newstatus == HERMES_LINKSTATUS_AP_IN_RANGE) ) - priv->connected = 1; - else if ( (newstatus == HERMES_LINKSTATUS_NOT_CONNECTED) - || (newstatus == HERMES_LINKSTATUS_DISCONNECTED) - || (newstatus == HERMES_LINKSTATUS_AP_OUT_OF_RANGE) - || (newstatus == HERMES_LINKSTATUS_ASSOC_FAILED) ) - priv->connected = 0; + connected = (newstatus == HERMES_LINKSTATUS_CONNECTED) + || (newstatus == HERMES_LINKSTATUS_AP_CHANGE) + || (newstatus == HERMES_LINKSTATUS_AP_IN_RANGE); + + if (connected) + netif_carrier_on(dev); + else + netif_carrier_off(dev); if (newstatus != priv->last_linkstatus) print_linkstatus(dev, newstatus); @@ -1366,8 +1366,8 @@ } /* firmware will have to reassociate */ + netif_carrier_off(dev); priv->last_linkstatus = 0x; - priv->connected = 0; return 0; } @@ -1878,7 +1878,7 @@ priv->hw_unavailable++; priv->last_linkstatus = 0x; /* firmware will have to reassociate */ - priv->connected = 0; + 
netif_carrier_off(dev); orinoco_unlock(priv, &flags); @@ -2388,8 +2388,8 @@ * hardware */ INIT_WORK(&priv->reset_work, (void (*)(void *))orinoco_reset, dev); + netif_carrier_off(dev); priv->last_linkstatus = 0x; - priv->connected = 0; return dev; Index: working-2.6/drivers/net/wireless/orinoco.h === --- working-2.6.orig/drivers/net/wireless/orinoco.h 2004-10-29 13:16:58.0 +1000 +++ working-2.6/drivers/net/wireless/orinoco.h 2005-02-10 14:22:32.179826024 +1100 @@ -42,7 +42,6 @@ /* driver state */ int open; u16 last_linkstatus; - int connected; /* Net device stuff */ struct net_device *ndev; -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist. NOT _the_ _other_ _way_ | _around_! http://www.ozlabs.org/people/dgibson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
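One behavioural detail of this patch is easy to miss: the old code left priv->connected unchanged for link status codes it did not list, while the new code turns the carrier off for anything that is not one of the three "connected" codes. A small sketch of both rules follows; the numeric values of the HERMES_LINKSTATUS_* constants are assumptions made for illustration, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link status codes; the numeric values are assumed for this sketch. */
enum {
	HERMES_LINKSTATUS_NOT_CONNECTED   = 0x0000,
	HERMES_LINKSTATUS_CONNECTED       = 0x0001,
	HERMES_LINKSTATUS_DISCONNECTED    = 0x0002,
	HERMES_LINKSTATUS_AP_CHANGE       = 0x0003,
	HERMES_LINKSTATUS_AP_OUT_OF_RANGE = 0x0004,
	HERMES_LINKSTATUS_AP_IN_RANGE     = 0x0005,
	HERMES_LINKSTATUS_ASSOC_FAILED    = 0x0006
};

/* New rule: carrier is on only for the three "connected" codes. */
static bool new_carrier_state(uint16_t newstatus)
{
	return (newstatus == HERMES_LINKSTATUS_CONNECTED)
		|| (newstatus == HERMES_LINKSTATUS_AP_CHANGE)
		|| (newstatus == HERMES_LINKSTATUS_AP_IN_RANGE);
}

/* Old rule: unlisted codes keep the previous value of 'connected'. */
static bool old_connected_state(bool prev, uint16_t newstatus)
{
	if (new_carrier_state(newstatus))
		return true;
	if ((newstatus == HERMES_LINKSTATUS_NOT_CONNECTED)
	    || (newstatus == HERMES_LINKSTATUS_DISCONNECTED)
	    || (newstatus == HERMES_LINKSTATUS_AP_OUT_OF_RANGE)
	    || (newstatus == HERMES_LINKSTATUS_ASSOC_FAILED))
		return false;
	return prev;
}
```

For every code listed in the diff the two rules agree; they differ only for an unlisted code arriving while the link was previously up.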
Re: ext2/3 files per directory limits
On Wed, 23 Feb 2005, Ron Peterson wrote:

> I would like to better understand ext2/3's performance characteristics.
> I'm specifically interested in how ext2/3 will handle a /var/spool/mail
> directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as
> 75,000 messages daily. Virtually all access is via imap, w/ approx ~1000
> imapd processes running during peak load. Local delivery is via
> procmail, which by default uses both kernel-supported locking calls and
> .lock files.

At some point it makes sense to subdivide your mail load, because serialization of I/O on that one filesystem becomes a bigger issue than the performance of your filesystem...

We deliver into mbox formatted mailboxes inside users' homedirs; some folks do a similar thing with maildir.

In the end you can only make one filesystem so fast. Beyond that you need more filesystems to achieve any kind of reasonable scaling...

> I understand that various tuning parameters will have an impact, e.g.
> putting the journal on a separate device, setting the noatime mount
> option, etc. I also understand that there are other mailbox formats and
> other strategies for locating mail spools (e.g. in user's home
> directories).
>
> I'm interested in people's thoughts on these issues, but I'm mostly
> interested in whether or not the scenario I described falls within
> ext2/3's designed capabilities.
>
> Best.

--
Joel Jaeggli Unix Consulting [EMAIL PROTECTED]
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2
Re: mouse still losing sync and thus jumping around
On Wednesday 23 February 2005 22:05, Anthony DiSante wrote:
> Dmitry Torokhov wrote:
> > Yes, it usually happens either under high load, when mouse interrupts are
> > significantly delayed. Or sometimes it happens when applications poll
> > battery status and on some boxes it takes a pretty long time. And because
> > it is usually the same chip that serves keyboard/mouse it again delays
> > mouse interrupts.
>
> I have this problem with recent 2.6.10 kernels too, but it has nothing to do
> with load in my case; it happens whenever I switch my KVM to the linux box.
>

Hi Anthony,

This is a slightly different problem and we are trying to find a reliable solution for it.

> Long ago and far away, it used to be that switching out of X, then back in
> (ctrl-alt-F1, then ctrl-alt-F7) would reset the mouse and stop the jumping.
> At some point in late 2.4/early 2.6 that stopped working, and the only fix
> was to unplug the mouse from the KVM switch and re-plug it.
>
> In Oct 2004 I posted to lkml with subject "KVM -> jumping mouse... still no
> solution?" Dmitry Torokhov (hi :) responded that this would work on
> 2.6.9-rc3+:
>
> echo -n "reconnect" > /sys/bus/serio/devices/serioX/driver
>
> That was GREAT and it worked for a while, but now my last few 2.6.10 kernels
> don't seem to care when I do that, and again, unplugging the mouse is the
> only thing that works. I'm currently running 2.6.10-gentoo-r6.
>

It should still work fine, but in a slightly different form:

	echo -n "reconnect" > /sys/bus/serio/devices/serioX/drvctl

I.e. substitute "driver" with "drvctl", as "driver" is now a symlink to the currently bound driver that is set up by the driver core.

--
Dmitry
Re: [PATCH] override RLIMIT_SIGPENDING for non-RT signals
* Roland McGrath ([EMAIL PROTECTED]) wrote:
> > * Roland McGrath ([EMAIL PROTECTED]) wrote:
> > > Indeed, I think your patch does not go far enough. I can read POSIX to say
> > > that the siginfo_t data must be available when `kill' was used, as well.
> >
> > How? I only see reference to filling in SI_USER for rt signals?
> > Just curious...(I've only got SuSv3 and some crusty old POSIX rt docs).
>
> There is stuff about a SA_SIGINFO signal handler's siginfo_t argument
> "shall contain" the various specified information like si_pid/si_uid values
> for a kill caller.

OK, guess it's odd corner case, since they aren't queued anyway.

> > Good point. Although it's RLIMIT_SIGPENDING + (31 * user_nprocs). So
> > that could be 31 * 8k, for example.
>
> And a "good point" back to you, sir! I think the right way to think about
> this in terms of resource consumption is that sizeof(struct sigqueue)*31 is
> part of the potential per-process overhead that make up the consumption
> units one should have in mind when choosing how to set the RLIMIT_NPROC limit.

As in dynamic, and work with the patch that you sent to redo default sigpending as per nproc?

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
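To put a number on the "RLIMIT_SIGPENDING + (31 * user_nprocs)" estimate quoted above, here is a trivial worked calculation. The function name is made up for illustration, and the inputs (1024 queued signals, 8192 processes) are simply the figures mentioned in this discussion.

```c
#include <assert.h>

/* Upper bound on queued siginfo entries for one user, following the
 * estimate in the thread: the RLIMIT_SIGPENDING allowance plus one
 * entry for each of the 31 non-RT signals in every process. */
static unsigned long max_queued_signals(unsigned long sigpending,
					unsigned long nproc)
{
	return sigpending + 31UL * nproc;
}
```

With sigpending = 1024 and nproc = 8192 this gives 1024 + 253952 = 254976 entries, so the non-RT term dominates the rlimit itself by a wide margin.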
ext2/3 files per directory limits
I would like to better understand ext2/3's performance characteristics. I'm specifically interested in how ext2/3 will handle a /var/spool/mail directory w/ ~6000 mbox format inboxes, handling approx 1GB delivered as 75,000 messages daily. Virtually all access is via imap, w/ approx ~1000 imapd processes running during peak load. Local delivery is via procmail, which by default uses both kernel-supported locking calls and .lock files.

I understand that various tuning parameters will have an impact, e.g. putting the journal on a separate device, setting the noatime mount option, etc. I also understand that there are other mailbox formats and other strategies for locating mail spools (e.g. in users' home directories).

I'm interested in people's thoughts on these issues, but I'm mostly interested in whether or not the scenario I described falls within ext2/3's designed capabilities.

Best.

--
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso
Re: [PATCH] set RLIMIT_SIGPENDING limit based on RLIMIT_NPROC
* Roland McGrath ([EMAIL PROTECTED]) wrote: > While looking into the issues Jeremy had with the RLIMIT_SIGPENDING limit, > it occurred to me that the normal setting of this limit is bizarrely low. > The initial hard limit setting (MAX_SIGPENDING) was taken from the old > max_queued_signals parameter, which was for the entire system in aggregate. > But even as a per-user limit, the 1024 value is incongruously low for this. But the old default system-wide limit was 1024. And you could have spawned 8k processes then as well. So I don't think this matters much. > On my machine, RLIMIT_NPROC allows me 8192 processes, but only 1024 queued > signals, i.e. fewer even than one pending signal in each process. (To me, > this really puts in doubt the sensibility of using a per-user limit for > this rather than a per-process one, i.e. counted in sighand_struct or > signal_struct, which could have a much smaller reasonable value. I don't > recall the rationale for making this new limit per-user in the first place.) I don't either, the archives show using per-user as default choice (never saw a discussion otherwise). Users can easily queue signals to themselves (using multiple processes or not), and there was some concern that somebody actually wanted to be able queue up to 1024 (since it's what was allowed in the past). > This patch sets the default RLIMIT_SIGPENDING limit at boot time, using the > calculation that decides the default RLIMIT_NPROC limit. This uses the > same value for those two limits, which I think is still pretty conservative > on the RLIMIT_SIGPENDING value. It's an rlimit, so easily setable in userspace at login session time. I think we could raise it if people start complaining it's too low (hasn't seemed to be a problem yet). 
thanks, -chris -- Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
Re: mouse still losing sync and thus jumping around
Dmitry Torokhov wrote:
> Yes, it usually happens under high load, when mouse interrupts are
> significantly delayed. Sometimes it also happens when applications poll
> battery status, which takes a pretty long time on some boxes. And because
> it is usually the same chip that serves the keyboard and mouse, that again
> delays mouse interrupts.

I have this problem with recent 2.6.10 kernels too, but it has nothing to do with load in my case; it happens whenever I switch my KVM to the linux box.

Long ago and far away, it used to be that switching out of X, then back in (ctrl-alt-F1, then ctrl-alt-F7) would reset the mouse and stop the jumping. At some point in late 2.4/early 2.6 that stopped working, and the only fix was to unplug the mouse from the KVM switch and re-plug it.

In Oct 2004 I posted to lkml with subject "KVM -> jumping mouse... still no solution?" Dmitry Torokhov (hi :) responded that this would work on 2.6.9-rc3+:

    echo -n "reconnect" > /sys/bus/serio/devices/serioX/driver

That was GREAT and it worked for a while, but now my last few 2.6.10 kernels don't seem to care when I do that, and again, unplugging the mouse is the only thing that works. I'm currently running 2.6.10-gentoo-r6.

-Anthony DiSante http://nodivisions.com/
Re: [PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status
* Roland McGrath ([EMAIL PROTECTED]) wrote: > > Two questions: 1) This changes the interface for consumers of > > /proc/[pid]/status data, do we care? Adding new line like this should be > > safe enough. > > As far as I can tell, noone fretted about the addition of Threads:, > ShdPnd:, etc., which were not always there. Sounds good ;-) > > 2) Perhaps we should do /proc/[pid]/rlimit/ type dir for each value? > >This has been asked for before. > > Is the request to see the limit settings, or the current usage, or both? > What kind of format are you suggesting? I don't see a need for something > with a million little files. Also, for some of the limits the correct > current usage count is not trivial to ascertain. (And for others like > RLIMIT_FSIZE and RLIMIT_CORE, it is of course not meaningful at all.) Probably just one file per rlimit with usage, cur, max. thanks, -chris -- Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 13:41 +1100, Nick Piggin wrote: > Lee Revell wrote: > > > > Agreed, it would be much better to optimize this away than just add a > > scheduling point. It seems like we could do this lazily. > > > > Oh? What do you mean by lazy? IMO it is sort of implemented lazily now. > That is, we are too lazy to refcount page table pages in fastpaths, so > that pushes a lot of work to unmap time. Not necessarily a bad trade-off, > mind you. Just something I'm looking into. > I guess I was thinking we could be even more lazy, and somehow defer it until after unmap time (in lieu of memory pressure that is). Actually that's kind of what a lock break would do. Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: intel8x0: no sound in 2.6.11 rc3 & 4 (fine with 2.6.10)
On Wed, 23 Feb 2005 14:31:20 -0500, Bill Davidsen <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED] wrote: > > Hello > > > > I have read a post in lkml.org that states that the problem experienced in > > rc3 has gone (1). That is not the case for me. > > > > My audio device is > > > > :00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM > > (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01) > > I have found that I had to Mute __both__ "Headphone Jack Sense" and > > "Line Jack Sense" in order to ear the audio in rc4. > > I keep seeing this advice, but what tool do you use to mute them? I > don't see anything like that in alsamixer, aumix, or any other program I > tried. I have a T41p with: Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01), which looks to the be the same as Ulisses' device. It works fine for me in 2.6.11-rc4 (has worked fine for a while, as well). I have both a "Headphone Jack Sense" and "Line Jack Sense" (both set to off) mixer entry in alsamixer. I'm not sure why you're not seeing these entries... Thanks, Nish - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)
> Does this patch do anything useful?
>
> Jeff
>

Not really. It doesn't print the nobody cared message, but still hangs at boot. I'd give you a backtrace but my MAGIC_SYSRQ doesn't seem to be working right now.

-Brian

Linux version 2.6.11-rc4 ([EMAIL PROTECTED]) (gcc version 3.3.2) #28 Wed Feb 23 18:52:22 PST 2005
Built 1 zonelists
Kernel command line: root=/dev/ram rw ramdisk=36000 console=ttyS0
PID hash table entries: 1024 (order: 10, 16384 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 120832k available (2136k kernel code, 916k data, 108k init, 0k highmem)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 5709k freed
NET: Registered protocol family 16
PCI: Probing PCI hardware
SCSI subsystem initialized
Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
Initializing Cryptographic API
Serial: 8250/16550 driver $Revision: 1.90 $ 6 ports, IRQ sharing disabled
ttyS0 at MMIO 0x0 (irq = 0) is a 16550A
ttyS1 at MMIO 0x0 (irq = 1) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 36000K size 1024 blocksize
loop: loaded (max 8 devices)
mal0: Initialized, 1 tx channels, 1 rx channels
emac: IBM EMAC Ethernet driver, version 2.0
Maintained by Benjamin Herrenschmidt <[EMAIL PROTECTED]>
eth0: IBM emac, MAC 08:00:3e:26:15:59
eth0: Found Generic MII PHY (0x06)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ata1: SATA max UDMA/100 cmd 0xC9002E80 ctl 0xC9002E8A bmdma 0xC9002E00 irq 30
ata2: SATA max UDMA/100 cmd 0xC9002EC0 ctl 0xC9002ECA bmdma 0xC9002E08 irq 30
ata1: dev 0 ATA, max UDMA7, 234493056 sectors: lba48
eth0: Link is Up
eth0: Speed: 100, Full duplex.
Re: [PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status
> Two questions: 1) This changes the interface for consumers of > /proc/[pid]/status data, do we care? Adding new line like this should be > safe enough. As far as I can tell, noone fretted about the addition of Threads:, ShdPnd:, etc., which were not always there. > 2) Perhaps we should do /proc/[pid]/rlimit/ type dir for each value? >This has been asked for before. Is the request to see the limit settings, or the current usage, or both? What kind of format are you suggesting? I don't see a need for something with a million little files. Also, for some of the limits the correct current usage count is not trivial to ascertain. (And for others like RLIMIT_FSIZE and RLIMIT_CORE, it is of course not meaningful at all.) Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] override RLIMIT_SIGPENDING for non-RT signals
> * Roland McGrath ([EMAIL PROTECTED]) wrote: > > Indeed, I think your patch does not go far enough. I can read POSIX to say > > that the siginfo_t data must be available when `kill' was used, as well. > > How? I only see reference to filling in SI_USER for rt signals? > Just curious...(I've only got SuSv3 and some crusty old POSIX rt docs). There is stuff about a SA_SIGINFO signal handler's siginfo_t argument "shall contain" the various specified information like si_pid/si_uid values for a kill caller. > Good point. Although it's RLIMIT_SIGPENDING + (31 * user_nprocs). So > that could be 31 * 8k, for example. And a "good point" back to you, sir! I think the right way to think about this in terms of resource consumption is that sizeof(struct sigqueue)*31 is part of the potential per-process overhead that make up the consumption units one should have in mind when choosing how to set the RLIMIT_NPROC limit. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
Lee Revell wrote:
> On Thu, 2005-02-24 at 12:29 +1100, Nick Piggin wrote:
> > Lee Revell wrote:
> > > IIRC last time I really tested this a few months ago, the worst case
> > > latency on that machine was about 150us. Currently its 422us from the
> > > same clear_page_range code path.
> >
> > Well it should be pretty trivial to add a break in there.
> > I don't think it can get into 2.6.11 at this point though,
> > so we'll revisit this for 2.6.12 if the clear_page_range
> > optimisations don't get anywhere.
>
> Agreed, it would be much better to optimize this away than just add a
> scheduling point. It seems like we could do this lazily.

Oh? What do you mean by lazy? IMO it is sort of implemented lazily now. That is, we are too lazy to refcount page table pages in fastpaths, so that pushes a lot of work to unmap time. Not necessarily a bad trade-off, mind you. Just something I'm looking into.
Re: Xterm Hangs - Possible scheduler defect?
"Chad N. Tindel" <[EMAIL PROTECTED]> wrote: > > We have hit a defect where an exiting xterm process will hang. This is > running > on a 2-cpu IA-64 box. We have a multithreaded application, where one thread > is SCHED_FIFO and is running with priority 98, and the other thread is just > a normal SCHED_OTHER thread. The SCHED_FIFO thread is in a CPU bound tight > loop, but I wouldn't expect that to cause since there are 2 CPUs. > > However, it does seem to cause some problems. For example, if you ssh into > the system and run an Xterm using X11 forwarding, when you type "exit" in > the xterm window, the window hangs and doesn't close. Killing the CPU-bound > app causes the window to exit immediately. The sysrq output shows the > following: > > xterm D a001000bef60 0 2905 2876 > (NOTLB) > > Call Trace: > [] schedule+0xca0/0x1300 > sp=e00012257d20 bsp=e00012251080 > [] flush_cpu_workqueue+0x1a0/0x4a0 > sp=e00012257d30 bsp=e00012251020 > [] flush_workqueue+0x100/0x160 > sp=e00012257d90 bsp=e00012250fe8 > [] flush_scheduled_work+0x20/0x40 > sp=e00012257d90 bsp=e00012250fd0 > [] release_dev+0x8e0/0x1100 > sp=e00012257d90 bsp=e00012250f20 > [] tty_release+0x30/0x60 > sp=e00012257e30 bsp=e00012250ef8 > [] __fput+0x330/0x340 > sp=e00012257e30 bsp=e00012250ea8 > [] fput+0x40/0x60 > sp=e00012257e30 bsp=e00012250e88 > [] filp_close+0xd0/0x160 > sp=e00012257e30 bsp=e00012250e58 > [] sys_close+0x140/0x1a0 > sp=e00012257e30 bsp=e00012250dd8 > [] ia64_ret_from_syscall+0x0/0x20 > sp=e00012257e30 bsp=e00012250dd8 > > So it would appear that xterm is hung in close() trying to shutdown a tty. > The comment says that is calling flush_scheduled_work() to > "Wait for ->hangup_work and ->flip.work handlers to terminate". Perhaps > there > is some locking issue that is causing these to not run and complete? `xterm' is waiting for the other CPU to schedule a kernel thread (which is bound to that CPU). Once that kernel thread has done a little bit of work, `xterm' can terminate. 
But kernel threads don't run with realtime policy, so your userspace app has permanently starved that kernel thread. It's potentially quite a problem, really. For example it could prevent various tty operations from completing, it will prevent kjournald from ever writing back anything (on uniprocessor, etc). I've been waiting for someone to complain ;) But the other side of the coin is that a SCHED_FIFO userspace task presumably has extreme latency requirements, so it doesn't *want* to be preempted by some routine kernel operation. People would get irritated if we were to do that. So what to do? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)
BTW, please CC your replies to linux-ide@vger.kernel.org as well.
[PATCH] Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)
Does this patch do anything useful? Jeff = drivers/scsi/sata_sil.c 1.44 vs edited = --- 1.44/drivers/scsi/sata_sil.c2005-02-17 19:43:51 -05:00 +++ edited/drivers/scsi/sata_sil.c 2005-02-23 21:27:18 -05:00 @@ -65,6 +65,7 @@ static u32 sil_scr_read (struct ata_port *ap, unsigned int sc_reg); static void sil_scr_write (struct ata_port *ap, unsigned int sc_reg, u32 val); static void sil_post_set_mode (struct ata_port *ap); +static void sil_tf_load(struct ata_port *ap, struct ata_taskfile *tf); static struct pci_device_id sil_pci_tbl[] = { { 0x1095, 0x3112, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sil_3112 }, @@ -130,7 +131,7 @@ static struct ata_port_operations sil_ops = { .port_disable = ata_port_disable, .dev_config = sil_dev_config, - .tf_load= ata_tf_load, + .tf_load= sil_tf_load, .tf_read= ata_tf_read, .check_status = ata_check_status, .exec_command = ata_exec_command, @@ -197,6 +198,69 @@ MODULE_LICENSE("GPL"); MODULE_DEVICE_TABLE(pci, sil_pci_tbl); MODULE_VERSION(DRV_VERSION); + +static void sil_irq_enable(struct ata_port *ap, int disable) +{ + void __iomem *mmio = ap->host_set->mmio_base; + u32 tmp, new; + u32 bit = 1 << (22 + ap->port_no); + + tmp = readl(mmio + SIL_SYSCFG); + if (disable) + new = tmp | bit; + else + new = tmp & ~bit; + if (new != tmp) + writel(new, mmio + SIL_SYSCFG); +} + +static void sil_tf_load(struct ata_port *ap, struct ata_taskfile *tf) +{ + struct ata_ioports *ioaddr = &ap->ioaddr; + unsigned int is_addr = tf->flags & ATA_TFLAG_ISADDR; + + if (tf->ctl != ap->last_ctl) { + sil_irq_enable(ap, tf->ctl & ATA_NIEN); + writeb(tf->ctl, (void __iomem *) ap->ioaddr.ctl_addr); + ap->last_ctl = tf->ctl; + ata_wait_idle(ap); + } + + if (is_addr && (tf->flags & ATA_TFLAG_LBA48)) { + writeb(tf->hob_feature, (void __iomem *) ioaddr->feature_addr); + writeb(tf->hob_nsect, (void __iomem *) ioaddr->nsect_addr); + writeb(tf->hob_lbal, (void __iomem *) ioaddr->lbal_addr); + writeb(tf->hob_lbam, (void __iomem *) ioaddr->lbam_addr); + writeb(tf->hob_lbah, (void 
__iomem *) ioaddr->lbah_addr); + VPRINTK("hob: feat 0x%X nsect 0x%X, lba 0x%X 0x%X 0x%X\n", + tf->hob_feature, + tf->hob_nsect, + tf->hob_lbal, + tf->hob_lbam, + tf->hob_lbah); + } + + if (is_addr) { + writeb(tf->feature, (void __iomem *) ioaddr->feature_addr); + writeb(tf->nsect, (void __iomem *) ioaddr->nsect_addr); + writeb(tf->lbal, (void __iomem *) ioaddr->lbal_addr); + writeb(tf->lbam, (void __iomem *) ioaddr->lbam_addr); + writeb(tf->lbah, (void __iomem *) ioaddr->lbah_addr); + VPRINTK("feat 0x%X nsect 0x%X lba 0x%X 0x%X 0x%X\n", + tf->feature, + tf->nsect, + tf->lbal, + tf->lbam, + tf->lbah); + } + + if (tf->flags & ATA_TFLAG_DEVICE) { + writeb(tf->device, (void __iomem *) ioaddr->device_addr); + VPRINTK("device 0x%X\n", tf->device); + } + + ata_wait_idle(ap); +} static void sil_post_set_mode (struct ata_port *ap) {
Re: [PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status
* Roland McGrath ([EMAIL PROTECTED]) wrote: > Jeremy mentioned the aggravation of not being able to tell when your > processes are using up signal queue entries and hitting the > RLIMIT_SIGPENDING limit. This patch adds a line to /proc/PID/status > showing how many queue items are in use, and allowed, for your uid. Two questions: 1) This changes the interface for consumers of /proc/[pid]/status data, do we care? Adding new line like this should be safe enough. 2) Perhaps we should do /proc/[pid]/rlimit/ type dir for each value? This has been asked for before. thanks, -chris -- Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] override RLIMIT_SIGPENDING for non-RT signals
* Roland McGrath ([EMAIL PROTECTED]) wrote: > Indeed, I think your patch does not go far enough. I can read POSIX to say > that the siginfo_t data must be available when `kill' was used, as well. How? I only see reference to filling in SI_USER for rt signals? Just curious...(I've only got SuSv3 and some crusty old POSIX rt docs). > This patch makes it allocate the siginfo_t, even when that exceeds > {RLIMIT_SIGPENDING}, for any non-RT signal (< SIGRTMIN) not sent by > sigqueue (actually, any signal that couldn't have been faked by a sigqueue > call). Of course, in an extreme memory shortage situation, you are SOL and > violate POSIX a little before you die horribly from being out of memory > anyway. > The LEGACY_QUEUE logic already ensures that, for non-RT signals, at most > one is ever on the queue. So there really is no risk at all of unbounded > resource consumption; the usage can reach {RLIMIT_SIGPENDING} + 31, is all. Good point. Although it's RLIMIT_SIGPENDING + (31 * user_nprocs). So that could be 31 * 8k, for example. thanks, -chris -- Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
On Thu, 2005-02-24 at 12:29 +1100, Nick Piggin wrote: > Lee Revell wrote: > > > > IIRC last time I really tested this a few months ago, the worst case > > latency on that machine was about 150us. Currently its 422us from the > > same clear_page_range code path. > > > Well it should be pretty trivial to add a break in there. > I don't think it can get into 2.6.11 at this point though, > so we'll revisit this for 2.6.12 if the clear_page_range > optimisations don't get anywhere. > Agreed, it would be much better to optimize this away than just add a scheduling point. It seems like we could do this lazily. IMHO it's not critical that these latency fixes be merged until the VP feature gets merged, until then people will be using Ingo's patches anyway. Lee - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] set RLIMIT_SIGPENDING limit based on RLIMIT_NPROC
While looking into the issues Jeremy had with the RLIMIT_SIGPENDING limit, it occurred to me that the normal setting of this limit is bizarrely low. The initial hard limit setting (MAX_SIGPENDING) was taken from the old max_queued_signals parameter, which was for the entire system in aggregate. But even as a per-user limit, the 1024 value is incongruously low for this. On my machine, RLIMIT_NPROC allows me 8192 processes, but only 1024 queued signals, i.e. fewer even than one pending signal in each process. (To me, this really puts in doubt the sensibility of using a per-user limit for this rather than a per-process one, i.e. counted in sighand_struct or signal_struct, which could have a much smaller reasonable value. I don't recall the rationale for making this new limit per-user in the first place.) This patch sets the default RLIMIT_SIGPENDING limit at boot time, using the calculation that decides the default RLIMIT_NPROC limit. This uses the same value for those two limits, which I think is still pretty conservative on the RLIMIT_SIGPENDING value. Thanks, Roland Signed-off-by: Roland McGrath <[EMAIL PROTECTED]> --- linux-2.6/include/asm-generic/resource.h +++ linux-2.6/include/asm-generic/resource.h @@ -51,7 +51,7 @@ [RLIMIT_MEMLOCK]= { MLOCK_LIMIT, MLOCK_LIMIT }, \ [RLIMIT_AS] = { RLIM_INFINITY, RLIM_INFINITY }, \ [RLIMIT_LOCKS] = { RLIM_INFINITY, RLIM_INFINITY }, \ - [RLIMIT_SIGPENDING] = { MAX_SIGPENDING, MAX_SIGPENDING }, \ + [RLIMIT_SIGPENDING] = { 0, 0 }, \ [RLIMIT_MSGQUEUE] = { MQ_BYTES_MAX, MQ_BYTES_MAX }, \ } --- linux-2.6/include/linux/signal.h +++ linux-2.6/include/linux/signal.h @@ -8,8 +8,6 @@ #ifdef __KERNEL__ -#define MAX_SIGPENDING 1024 - /* * Real Time signals may be queued. 
*/ --- linux-2.6/kernel/fork.c +++ linux-2.6/kernel/fork.c @@ -129,6 +129,8 @@ void __init fork_init(unsigned long memp init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2; init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2; + init_task.signal->rlim[RLIMIT_SIGPENDING] = + init_task.signal->rlim[RLIMIT_NPROC]; } static struct task_struct *dup_task_struct(struct task_struct *orig) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: memory management weirdness
On Tuesday 22 February 2005 04:57 am, Martin MOKREJŠ wrote:
> The 3GB labeled file corresponds to fast case, 4GB is ugly slow.
> What can you gather from those files?

I did take a look but didn't analyze it further, since Andi mentioned it is a known BIOS bug. Sorry about the trouble - I didn't imagine it might be BIOS related. Generally speaking, it helps to have a profile available when things are going slow.

Parag
Re: 2.6.11-rc4-mm1 (VFS: Cannot open root device "301")
On Thu, Feb 24, 2005 at 03:03:33AM +0100, Benoit Boissinot wrote: > On Wed, 23 Feb 2005 16:41:59 -0800, Matt Mackall <[EMAIL PROTECTED]> wrote: > > On Wed, Feb 23, 2005 at 04:16:53PM -0800, Andrew Morton wrote: > > > Steven Cole <[EMAIL PROTECTED]> wrote: > > > > > > > > > Yes, that worked. 2.6.11-rc4-mm1 now boots OK, but hdb1 seems to be > > > > > missing. > > > > > > Looking at the IDE update in rc4-mm1: > > > > > > +void ide_init_disk(struct gendisk *disk, ide_drive_t *drive) > > > +{ > > > + ide_hwif_t *hwif = drive->hwif; > > > + unsigned int unit = drive->select.all & (1 << 4); > > > + > > If i grep in the tree, for select.all, it looks like from the initialization > that you can not recover the unit from select.all (ide.c line 235 and 1882) > since the function used is not invertible. They're fine, if a bit ugly. Unit is either 0 or 1. So: (unit<<4) | 0xa0 is equivalent to unit * 16 as the mask won't mask off any bits. > > > > > > Could someone try this? > > > > > > - unsigned int unit = drive->select.all & (1 << 4); > > > + unsigned int unit = (drive->select.all >> 4) & 1; > > > > Apparently there's already an 'hdb' sitting in drive->name, perhaps we > > ought to do disk->disk_name = drive->name for the non-devfs case. > > > init_hwif_default initialized it right. > > Could something like this work ? No, because they're arrays and not pointers. I've booted with the obvious strcpy, works fine. -- Mathematics is the supreme nostalgia of our time. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Lse-tech] Re: A common layer for Accounting packages
Jay wrote: > I think the microbenchmarking your link provides is irrelevant. In the cases such as you describe where it's just some sort of empty function call, then yes, I am willing to accept a wave of the hands and a simple explanation of how it's not significant. I've done the same myself ;). What about the case where accounting is enabled, and thus actually has to do work? How does that compare with just doing the traditional BSD accounting? I presume in that case that the benchmarking is no longer irrelevant. Though if you can make a decent case that it is, I'm willing to listen. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] show RLIMIT_SIGPENDING usage in /proc/PID/status
Jeremy mentioned the aggravation of not being able to tell when your processes are using up signal queue entries and hitting the RLIMIT_SIGPENDING limit. This patch adds a line to /proc/PID/status showing how many queue items are in use, and allowed, for your uid. I can certainly see the appeal of having a display of the number of queued items specific to each process, and even the items within the process broken down per signal number. However, those are not things that are directly counted, and ascertaining them requires iterating through the queue. This patch instead gives what can be readily determined in constant time using the accounting already done. I'm not sure something more complex is warranted just to facilitate one particular debugging need. With this, you can see quickly that this particular problem has come up. Then examination of each process's SigPnd/ShdPnd lines ought to give you an indication of which processes have any queued RT signals sitting around for a long time, and you can then attack those programs directly, though there is no way after the fact to determine how many queued signals with the same number a given process has (short of killing it and seeing the usage drop). Note you may still have a mystery if the leaking programs are not leaving pending RT signals queued, but rather preallocating queue items via timer_create. That usage is not readily apparent in any /proc information. 
Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>

--- linux-2.6/fs/proc/array.c
+++ linux-2.6/fs/proc/array.c
@@ -239,6 +239,7 @@ static inline char * task_sig(struct tas
 {
 	sigset_t pending, shpending, blocked, ignored, caught;
 	int num_threads = 0;
+	unsigned long qsize = 0, qlim = 0;

 	sigemptyset(&pending);
 	sigemptyset(&shpending);
@@ -255,11 +256,14 @@ static inline char * task_sig(struct tas
 		blocked = p->blocked;
 		collect_sigign_sigcatch(p, &ignored, &caught);
 		num_threads = atomic_read(&p->signal->count);
+		qsize = atomic_read(&p->user->sigpending);
+		qlim = p->signal->rlim[RLIMIT_SIGPENDING].rlim_cur;
 		spin_unlock_irq(&p->sighand->siglock);
 	}
 	read_unlock(&tasklist_lock);

 	buffer += sprintf(buffer, "Threads:\t%d\n", num_threads);
+	buffer += sprintf(buffer, "SigQ:\t%lu/%lu\n", qsize, qlim);

 	/* render them all */
 	buffer = render_sigset_t("SigPnd:\t", &pending, buffer);
Re: ppc32 weirdness with gcc-4.0 in 2.6.11-rc4
> -Memory: 255872k available (1788k kernel code, 976k data, 144k init, 0k highmem)
> +Memory: 255872k available (1776k kernel code, 0k data, 144k init, 0k highmem)

That is weird... (0k data)

> AGP special page: 0xc000
> Calibrating delay loop... 830.66 BogoMIPS (lpj=4153344)
> Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
> @@ -132,13 +132,7 @@
> VFS: Mounted root (ext3 filesystem) readonly.
> Freeing unused kernel memory: 144k init 4k chrp 8k prep
> usb 3-2: new full speed USB device using ohci_hcd and address 2
> -hub 3-2:1.0: USB hub found
> -hub 3-2:1.0: 3 ports detected
> -usb 3-2.1: new low speed USB device using ohci_hcd and address 3
> -input: USB HID v1.10 Mouse [Logitech Apple Optical USB Mouse] on usb-0001:10:1b.0-2.1
> -usb 3-2.3: new full speed USB device using ohci_hcd and address 4
> -input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:1b.0-2.3
> -input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:1b.0-2.3
> +usb 3-2: can't connect bus-powered hub to this port
> EXT3 FS on hda5, internal journal
> Adding 1048568k swap on /dev/hda3. Priority:-1 extents:1
> SCSI subsystem initialized
>
> Note: "Memory: ... 0k data ..." !? Surely that can't be correct.

Not sure what's up, but it's probably something being miscompiled. Can you check if the udelay/mdelay loops are correct?

Ben.
Re: uninterruptible sleep lockups
On Wed, 23 Feb 2005, linux-os wrote:
> On Wed, 23 Feb 2005, Bodo Eggert wrote:
> > linux-os <[EMAIL PROTECTED]> wrote:

> >> You don't seem to understand. A process that's stuck in 'D' state
> >> shows a SEVERE error, usually with a hardware driver.
> >
> > Or a network filesystem mount to a no longer existing server or share.
>
> But that's a whole different problem. That's a systemic problem
> of "fail-over". Network file-systems really need to interface
> with an intermediate virtual device that can isolate failed
> systems and make them look "perfect" to individual machines.
>
> If you don't do this, then as soon as somebody trips over a
> wire, your database is trashed. I'm surprised that NFS, PCNFS,
> SMB, etc., actually work as well as everybody seems to
> think they do. Until the architectural problem is resolved,
> there are still going to be hung processes, trashed databases,
> etc.

You don't run databases over a network filesystem unless you're begging for trouble. For the other common purposes you'll usually get more stable behaviour, since the failure on the client won't prevent the server from properly writing the metadata or flushing the cache.

> > How to clean up the stuck processes: (This requires a MMU)
> > Add an error path to each syscall (or create some generic error paths) and
> > keep the original stack frame. On errors, you can "longjump" (not exactly,
> > but similar) to the error path after copying the memory. The semaphore will
> > not be taken, and the code depending on the semaphore will not be executed.
> >
>
> Again, you are attacking the symptom. The problem could be resolved
> by using a local disk (or a disk file) for the immediate I/O and
> the I/O to the file-servers could occur whenever they are available.

a) There are systems without local storage.
b) It won't help while stat()ing a non-cached object.
c) This would involve race conditions for e.g. two disconnected nodes on reconnect.
AFAI can see, this race can be solved by:
c1) The final transaction must be delayed until it's ACKed or NACKed. This may delay the D-State for some seconds, but not enough.
c2) The server will have to keep track of the clients and need to be told when a user left for a trip to the south pole without unmounting. Very undesirable.
c3) Ignoring. Very, very undesirable.
c4) Requiring explicit transaction handling by the applications. Interesting, but not in the near future.
d) This won't allow synchronous updates without falling back to classic handling.
e) The users will update some files, get a positive reply and shut down their PCs before the changes can be committed to the server. If the server does not come back or the client is not rebooted within reasonable time, this will cause silent data loss.
f) This will require reliable identification of the network server.
g) I'm not only thinking of NFS/..., although I used it as _the_ example. E.g. if you see your IDE drive failing, you'll want to declare it dead instead of waiting $num_of_sectors times five minutes until the kernel decides to give up.

I agree that most D-states are problems that need to be fixed instead of being worked around, but sometimes you can't fix the problem without access to the crystal-ball-device. Therefore all devices that can block will need a manual override (with different probability), and the processes that were stuck will need a way to recover or be stuck forever.

Obviously the system is healthy enough to do some important and uninterruptible work after those errors occurred, so having them stuck will be OK for now. Instead, the next task might be freeing the file descriptors preventing you from unmounting your removable media or network share, or allowing really-forced umount.

--
Top 100 things you don't want the sysadmin to say:
54. Uh huh.."nu -k $USER".. no problemsure thing...
Re: 2.6.11-rc4-mm1 (VFS: Cannot open root device "301")
On Wed, 23 Feb 2005 16:41:59 -0800, Matt Mackall <[EMAIL PROTECTED]> wrote: > On Wed, Feb 23, 2005 at 04:16:53PM -0800, Andrew Morton wrote: > > Steven Cole <[EMAIL PROTECTED]> wrote: > > > > > > > Yes, that worked. 2.6.11-rc4-mm1 now boots OK, but hdb1 seems to be > > > > missing. > > > > Looking at the IDE update in rc4-mm1: > > > > +void ide_init_disk(struct gendisk *disk, ide_drive_t *drive) > > +{ > > + ide_hwif_t *hwif = drive->hwif; > > + unsigned int unit = drive->select.all & (1 << 4); > > + If i grep in the tree, for select.all, it looks like from the initialization that you can not recover the unit from select.all (ide.c line 235 and 1882) since the function used is not invertible. > > > > Could someone try this? > > > > - unsigned int unit = drive->select.all & (1 << 4); > > + unsigned int unit = (drive->select.all >> 4) & 1; > > Apparently there's already an 'hdb' sitting in drive->name, perhaps we > ought to do disk->disk_name = drive->name for the non-devfs case. > init_hwif_default initialized it right. Could something like this work ? regards, Benoit --- linux/drivers/ide/ide-probe.c 2005-02-23 12:16:32.0 +0100 +++ linux-test/drivers/ide/ide-probe.c 2005-02-24 03:02:06.0 +0100 @@ -1269,11 +1269,11 @@ EXPORT_SYMBOL_GPL(ide_unregister_region) void ide_init_disk(struct gendisk *disk, ide_drive_t *drive) { ide_hwif_t *hwif = drive->hwif; - unsigned int unit = drive->select.all & (1 << 4); + unsigned int unit = drive->name[2] - 'a' - hwif->index * MAX_DRIVES; disk->major = hwif->major; disk->first_minor = unit << PARTN_BITS; - sprintf(disk->disk_name, "hd%c", 'a' + hwif->index * MAX_DRIVES + unit); + disk->disk_name = drive->name; disk->queue = drive->queue; }
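The difference between the two expressions quoted above is easy to see in isolation. A minimal sketch, assuming the usual ATA layout in which bit 4 of the drive-select byte distinguishes master (0) from slave (1); the function and macro names are illustrative only and do not exist in the kernel:

```c
#include <assert.h>

/* Illustrative name, not a kernel symbol: position of the unit bit. */
#define SELECT_UNIT_BIT 4

/*
 * The expression from rc4-mm1. It masks the unit bit but leaves it in
 * place, so the result is 0 for the master and 16 for the slave.
 */
unsigned int unit_masked(unsigned char select_all)
{
	return select_all & (1 << SELECT_UNIT_BIT);
}

/*
 * The proposed replacement. It shifts the bit down before masking,
 * so the result is 0 for the master and 1 for the slave.
 */
unsigned int unit_shifted(unsigned char select_all)
{
	unsigned int unit = (select_all >> SELECT_UNIT_BIT) & 1;

	assert(unit <= 1);
	return unit;
}
```

Since ide_init_disk() feeds the result into both "unit << PARTN_BITS" and "'a' + hwif->index * MAX_DRIVES + unit", a value of 16 instead of 1 would skew the minor number and the generated name for the slave drive, which would be consistent with the missing hdb1 reported earlier in the thread.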
Re: 2.6.11-rc4 libata-core (irq 30: nobody cared!)
Retry... that patch got screwed up in the last email...

-Brian

--- libata-core.c.orig 2005-02-23 17:41:03.831836464 -0800
+++ libata-core.c 2005-02-23 17:54:51.287044152 -0800
@@ -3158,6 +3158,11 @@
 		if (qc && (!(qc->tf.ctl & ATA_NIEN))) {
 			handled |= ata_host_intr(ap, qc);
 		}
+		else {
+			/* bk - just ack spurious interrupt here - temp workaround */
+			ata_irq_ack(ap, 0);
+			printk(KERN_WARNING "ata%d: irq trap\n", ap->id);
+		}
 	}
 }
2.6.11-rc4 libata-core (irq 30: nobody cared!)
I see this problem with the sata_sil.c driver and SII3112 card. Others have reported seeing a similar problem: http://lkml.org/lkml/2005/2/6/41 There seems to be a pending interrupt from the drive, but the code has already set the NIEN bit, so the ATA_IRQ_TRAP macro doesn't help (the ata_interrupt handler never calls ata_host_intr in this case). I've implemented a quick workaround hack, but others should investigate a better fix (maybe acking pending interrupts before setting NIEN bit in ata_tf_load??) Regards, Brian --- libata-core.c.orig 2005-02-23 17:41:03.831836464 -0800 +++ libata-core.c 2005-02-23 17:31:07.930427248 -0800 @@ -3158,6 +3158,11 @@ if (qc && (!(qc->tf.ctl & ATA_NIEN))) { handled |= ata_host_intr(ap, qc); } + else { + /* bk - just ack spurious interrupt here - temp workaround */ + ata_irq_ack(ap, 0); + printk(KERN_WARNING "ata%d: irq trap\n", ap->id); + } } } Linux version 2.6.11-rc4 ([EMAIL PROTECTED]) (gcc version 3.3.2) #27 Wed Feb 23 17:49:05 PST 2005 Built 1 zonelists Kernel command line: root=/dev/ram rw ramdisk=36000 console=ttyS0 PID hash table entries: 1024 (order: 10, 16384 bytes) Console: colour dummy device 80x25 Dentry cache hash table entries: 32768 (order: 5, 131072 bytes) Inode-cache hash table entries: 16384 (order: 4, 65536 bytes) Memory: 120832k available (2136k kernel code, 916k data, 108k init, 0k highmem) Mount-cache hash table entries: 512 (order: 0, 4096 bytes) checking if image is initramfs...it isn't (no cpio magic); looks like an initrd Freeing initrd memory: 5709k freed NET: Registered protocol family 16 PCI: Probing PCI hardware SCSI subsystem initialized Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]). 
Initializing Cryptographic API Serial: 8250/16550 driver $Revision: 1.90 $ 6 ports, IRQ sharing disabled ttyS0 at MMIO 0x0 (irq = 0) is a 16550A ttyS1 at MMIO 0x0 (irq = 1) is a 16550A io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered RAMDISK driver initialized: 16 RAM disks of 36000K size 1024 blocksize loop: loaded (max 8 devices) mal0: Initialized, 1 tx channels, 1 rx channels emac: IBM EMAC Ethernet driver, version 2.0 Maintained by Benjamin Herrenschmidt <[EMAIL PROTECTED]> eth0: IBM emac, MAC 08:00:3e:26:15:59 eth0: Found Generic MII PHY (0x06) Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ata1: SATA max UDMA/100 cmd 0xC9002E80 ctl 0xC9002E8A bmdma 0xC9002E00 irq 30 ata2: SATA max UDMA/100 cmd 0xC9002EC0 ctl 0xC9002ECA bmdma 0xC9002E08 irq 30 irq 30: nobody cared! Call trace: [c0005630] dump_stack+0x18/0x28 [c003ae0c] __report_bad_irq+0x34/0xac [c003af38] note_interrupt+0x98/0xd4 [c003a92c] __do_IRQ+0x15c/0x160 [c0003e54] do_IRQ+0x50/0x98 [c0002f64] ret_from_except+0x0/0x18 [c0003ed4] default_idle+0x38/0x5c [c0003f20] cpu_idle+0x28/0x38 [c00023a4] rest_init+0x24/0x34 [c02dc614] start_kernel+0x170/0x1a8 [c00022a4] start_here+0x44/0xb0 handlers: [] (ata_interrupt+0x0/0x27c) Disabling IRQ #30 ata1: dev 0 ATA, max UDMA7, 234493056 sectors: lba48 eth0: Link is Up eth0: Speed: 100, Full duplex.
Re: [Lse-tech] Re: A common layer for Accounting packages
Hi Paul,

I think the microbenchmarking your link provides is irrelevant. Your link provides benchmarking of doing a fork. However, we are talking about inserting a callback routine in a fork and/or an exit. The overhead is a function call and time spent in the routine. The callback routine can be configured to "do {} while (0)" if a certain CONFIG flag is not set.

Thanks,
- jay

Paul Jackson wrote:
> So, I think such a fork/execve/exit hooks is harmless now.
>
> I don't recall seeing any microbenchmarking of the impact on fork/exit
> of such hooks. You might find such a benchmark in lmbench, or at
> http://bulk.fefe.de/scalability/.
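The "do {} while (0)" arrangement mentioned in this message can be sketched roughly as follows. CONFIG_ACCT_HOOK, acct_fork_hook() and fake_fork() are made-up names for illustration; the thread does not say what the real flag or hook would be called:

```c
#include <assert.h>

/* Counts hook invocations so the effect of the config flag is observable. */
static int hook_calls;

#ifdef CONFIG_ACCT_HOOK	/* hypothetical config flag */
static inline void acct_fork_hook(int pid)
{
	(void)pid;
	hook_calls++;
}
#else
/* With the flag unset the hook expands to an empty statement: no call is emitted. */
#define acct_fork_hook(pid) do { } while (0)
#endif

/* Stand-in for the fork path: it invokes the hook and then carries on. */
int fake_fork(int pid)
{
	assert(pid > 0);
	acct_fork_hook(pid);
	return pid;
}

int hook_call_count(void)
{
	return hook_calls;
}
```

Built without the flag, fake_fork() contains no trace of the hook, which is the zero-overhead case described above. Built with it, the cost is one function call plus whatever the hook body does.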
[PATCH] override RLIMIT_SIGPENDING for non-RT signals
Indeed, I think your patch does not go far enough. I can read POSIX to say that the siginfo_t data must be available when `kill' was used, as well. This patch makes it allocate the siginfo_t, even when that exceeds {RLIMIT_SIGPENDING}, for any non-RT signal (< SIGRTMIN) not sent by sigqueue (actually, any signal that couldn't have been faked by a sigqueue call). Of course, in an extreme memory shortage situation, you are SOL and violate POSIX a little before you die horribly from being out of memory anyway. The LEGACY_QUEUE logic already ensures that, for non-RT signals, at most one is ever on the queue. So there really is no risk at all of unbounded resource consumption; the usage can reach {RLIMIT_SIGPENDING} + 31, is all. It's already the case that the limit can be exceeded by (in theory) up to {RLIMIT_NPROC}-1 in race conditions because the bump and the limit check are not atomic. (Obviously you can only get anywhere near that many with assloads of preemption, but exceeding it by a few is not too unlikely.) This patch also fixes that accounting so that it should not be possible to exceed {RLIMIT_SIGPENDING} + SIGRTMIN-1 queue items per user in races. 
Thanks, Roland Signed-off-by: Roland McGrath <[EMAIL PROTECTED]> --- linux-2.6/kernel/signal.c +++ linux-2.6/kernel/signal.c @@ -260,19 +260,23 @@ next_signal(struct sigpending *pending, return sig; } -static struct sigqueue *__sigqueue_alloc(struct task_struct *t, int flags) +static struct sigqueue *__sigqueue_alloc(struct task_struct *t, int flags, +int override_rlimit) { struct sigqueue *q = NULL; - if (atomic_read(&t->user->sigpending) < + atomic_inc(&t->user->sigpending); + if (override_rlimit || + atomic_read(&t->user->sigpending) <= t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur) q = kmem_cache_alloc(sigqueue_cachep, flags); - if (q) { + if (unlikely(q == NULL)) { + atomic_dec(&t->user->sigpending); + } else { INIT_LIST_HEAD(&q->list); q->flags = 0; q->lock = NULL; q->user = get_uid(t->user); - atomic_inc(&q->user->sigpending); } return(q); } @@ -793,7 +797,9 @@ static int send_signal(int sig, struct s make sure at least one signal gets delivered and don't pass on the info struct. */ - q = __sigqueue_alloc(t, GFP_ATOMIC); + q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN && +((unsigned long) info < 2 || + info->si_code >= 0))); if (q) { list_add_tail(&q->list, &signals->list); switch ((unsigned long) info) { @@ -1316,7 +1322,7 @@ struct sigqueue *sigqueue_alloc(void) { struct sigqueue *q; - if ((q = __sigqueue_alloc(current, GFP_KERNEL))) + if ((q = __sigqueue_alloc(current, GFP_KERNEL, 0))) q->flags |= SIGQUEUE_PREALLOC; return(q); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
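The accounting change in this patch (bump the counter first, then test the limit, and undo the bump on failure) can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: `pending` stands in for user->sigpending, reserve_slot() and release_slot() are hypothetical names, the memory allocation step is left out, and the sketch tests the value returned by the atomic add where the patch does a separate atomic_read():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-user pending-signal counter. */
static atomic_long pending;

/*
 * Reserve one queue slot. The counter is incremented before the limit
 * is checked, so concurrent callers cannot all pass a stale
 * "below the limit" test. `override` models override_rlimit.
 * On failure the increment is undone.
 */
bool reserve_slot(long limit, bool override)
{
	long now;

	assert(limit >= 0);
	now = atomic_fetch_add(&pending, 1) + 1;
	if (override || now <= limit)
		return true;
	atomic_fetch_sub(&pending, 1);
	return false;
}

/* Give a previously reserved slot back. */
void release_slot(void)
{
	atomic_fetch_sub(&pending, 1);
}

long pending_count(void)
{
	return atomic_load(&pending);
}
```

With a limit of 1, a second ordinary reservation fails and leaves the count at 1, while a reservation with override set succeeds and pushes the count past the limit, which is the behaviour the description above allows for non-RT signals.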
Oops: Unable to handle kernel paging request
I run an installed and updated sarge. I downloaded 2.6.10 from kernel.org. When I use apt-get under 2.4.27, everything is OK. When I use apt-get under 2.6.10, I get a segfault. I run the same combination (although with a different configuration) on a Pentium III (Coppermine) and on an Intel(R) Pentium(R) M processor 1.50GHz, but there it works fine. The faulty system is configured for 686. I hope this is useful for you.

0 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 15:3 APIC version 20 ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Enabling APIC mode: Flat. Using 1 I/O APICs Using ACPI (MADT) for SMP configuration information Built 1 zonelists Kernel command line: root=/dev/hda1 ro mapped APIC to d000 (fee0) mapped IOAPIC to c000 (fec0) Initializing CPU#0 PID hash table entries: 4096 (order: 12, 65536 bytes) Detected 2993.709 MHz processor. Using tsc for high-res timesource Console: colour VGA+ 80x25 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 1030900k/1047744k available (1703k kernel code, 16028k reserved, 768k data, 288k init, 130168k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay loop... 5914.62 BogoMIPS (lpj=2957312) Mount-cache hash table entries: 512 (order: 0, 4096 bytes) CPU: After generic identify, caps: bfebfbff CPU: After vendor identify, caps: bfebfbff monitor/mwait feature present. using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 CPU: After all inits, caps:bfebfbff 0080 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU0: Intel P4/Xeon Extended MCE MSRs (12) available Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 04 per-CPU timeslice cutoff: 2925.86 usecs. task migration cache decay timeout: 3 msecs. Booting processor 1/1 eip 3000 Initializing CPU#1 Calibrating delay loop... 5980.16 BogoMIPS (lpj=2990080) CPU: After generic identify, caps: bfebfbff CPU: After vendor identify, caps: bfebfbff monitor/mwait feature present. CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K CPU: Physical Processor ID: 0 CPU: After all inits, caps:bfebfbff 0080 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. CPU1: Intel P4/Xeon Extended MCE MSRs (12) available CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 04 Total of 2 processors activated (11894.78 BogoMIPS). ENABLING IO-APIC IRQs ..TIMER: vector=0x31 pin1=2 pin2=-1 checking TSC synchronization across 2 CPUs: passed. 
Brought up 2 CPUs CPU0: domain 0: span 03 groups: 01 02 CPU1: domain 0: span 03 groups: 02 01 checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd Freeing initrd memory: 3684k freed NET: Registered protocol family 16 EISA bus registered PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=3 PCI: Using configuration type 1 ACPI: Subsystem revision 20041105 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (00:00) PCI: Probing PCI hardware (bus 00) PCI: Ignoring BAR0-3 of IDE controller :00:1f.1 PCI: Transparent bridge - :00:1e.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P3._PRT] ACPI: Power Resource [URP1] (off) ACPI: Power Resource [URP2] (off) ACPI: Power Resource [FDDP] (off) ACPI: Power Resource [LPTP] (off) ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 9 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15) ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *9 10 11 12 14 15) SCSI subsystem initialized PCI: Using ACPI for IRQ routing ** PCI interrupts are no longer routed automatically
Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02
Lee Revell wrote:
> On Thu, 2005-02-24 at 10:27 +1100, Nick Piggin wrote:
> If you are using i386 with 2-level page tables (no highmem), then the
> behaviour should be more or less identical. Odd. IIRC last time I really
> tested this a few months ago, the worst case latency on that machine was
> about 150us. Currently it's 422us from the same clear_page_range code
> path.
>
> On my Athlon XP the clear_page_range latency is not showing up at all,
> and the worst delay so far is only 35us, most of which is the timer
> interrupt. IOW that machine is showing the best achievable latency (with
> PREEMPT_DESKTOP). The machine seeing 422 us latencies in
> clear_page_range is a 600MHz C3, which is known to be a FSB limited
> architecture.

Well it should be pretty trivial to add a break in there. I don't think it can get into 2.6.11 at this point though, so we'll revisit this for 2.6.12 if the clear_page_range optimisations don't get anywhere.

Nick