[PATCH 0/5] sched: asymmetrical packing for POWER7 SMT4
This patch series implements asymmetric SMT packing which ensures consistently good performance on POWER7. I'm no scheduler expert so these probably need some review. It works for my simple test case on a POWER7 with SMT4. I can now run a number of CPU bound processes and they are pulled down to lower threads giving the best performance consistently. Suresh: some of this touches code you changed recently, so you may want to test in your environment. The 1st patch is a fix for SMT4 in general. The 2nd adds actual the asymmetrical packing infrastructure. The 3rd adds the powerpc specific hooks for POWER7. The 4th and 5th are fixes required for packing to work. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/5] sched: fix capacity calculations for SMT4
When calculating capacity we use the following calculation: capacity = DIV_ROUND_CLOSEST(power, SCHED_LOAD_SCALE); In SMT2, power will be 1178/2 (provided we are not scaling power with freq say) and SCHED_LOAD_SCALE will be 1024, resulting in capacity being 1. With SMT4 however, power will be 1178/4, hence capacity will end up as 0. Fix this by ensuring capacity is always at least 1 after this calculation. Signed-off-by: Michael Neuling mi...@neuling.org --- I'm not sure this is the correct fix but this works for me. Original post here: http://lkml.org/lkml/2010/3/30/884 --- kernel/sched_fair.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Index: linux-2.6-ozlabs/kernel/sched_fair.c === --- linux-2.6-ozlabs.orig/kernel/sched_fair.c +++ linux-2.6-ozlabs/kernel/sched_fair.c @@ -1482,6 +1482,7 @@ static int select_task_rq_fair(struct ta } capacity = DIV_ROUND_CLOSEST(power, SCHED_LOAD_SCALE); + capacity = max(capacity, 1UL); if (tmp-flags SD_POWERSAVINGS_BALANCE) nr_running /= 2; @@ -2488,6 +2489,7 @@ static inline void update_sg_lb_stats(st sgs-group_capacity = DIV_ROUND_CLOSEST(group-cpu_power, SCHED_LOAD_SCALE); + sgs-group_capacity = max(sgs-group_capacity, 1UL); } /** @@ -2795,9 +2797,11 @@ find_busiest_queue(struct sched_group *g for_each_cpu(i, sched_group_cpus(group)) { unsigned long power = power_of(i); - unsigned long capacity = DIV_ROUND_CLOSEST(power, SCHED_LOAD_SCALE); + unsigned long capacity; unsigned long wl; + capacity = max(DIV_ROUND_CLOSEST(power, SCHED_LOAD_SCALE), 1UL); + if (!cpumask_test_cpu(i, cpus)) continue; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 2/5] sched: add asymmetric packing option for sibling domain
Some CPUs perform better when tasks are run on lower thread numbers. In the case of POWER7, when higher threads are idled, the core can run in lower SMT modes and hence perform better. This creates a new sd flag to prefer lower threads. Based heavily on patch from Peter Zijlstra. Signed-off-by: Michael Neuling mi...@neuling.org --- Peter: Since this is based mainly off your initial patch, it should have your signed-off-by too, but I didn't want to add without your permission. Can I add it? --- include/linux/sched.h|4 ++ include/linux/topology.h |1 kernel/sched_fair.c | 64 --- 3 files changed, 65 insertions(+), 4 deletions(-) Index: linux-2.6-ozlabs/include/linux/sched.h === --- linux-2.6-ozlabs.orig/include/linux/sched.h +++ linux-2.6-ozlabs/include/linux/sched.h @@ -799,7 +799,7 @@ enum cpu_idle_type { #define SD_POWERSAVINGS_BALANCE0x0100 /* Balance for power savings */ #define SD_SHARE_PKG_RESOURCES 0x0200 /* Domain members share cpu pkg resources */ #define SD_SERIALIZE 0x0400 /* Only a single load balancing instance */ - +#define SD_ASYM_PACKING0x0800 /* Place busy groups earlier in the domain */ #define SD_PREFER_SIBLING 0x1000 /* Prefer to place tasks in a sibling domain */ enum powersavings_balance_level { @@ -834,6 +834,8 @@ static inline int sd_balance_for_package return SD_PREFER_SIBLING; } +extern int __weak arch_sd_sibiling_asym_packing(void); + /* * Optimise SD flags for power savings: * SD_BALANCE_NEWIDLE helps agressive task consolidation and power savings. Index: linux-2.6-ozlabs/include/linux/topology.h === --- linux-2.6-ozlabs.orig/include/linux/topology.h +++ linux-2.6-ozlabs/include/linux/topology.h @@ -102,6 +102,7 @@ int arch_update_cpu_topology(void); | 1*SD_SHARE_PKG_RESOURCES \ | 0*SD_SERIALIZE\ | 0*SD_PREFER_SIBLING \ + | arch_sd_sibiling_asym_packing() \ , \ .last_balance = jiffies, \ .balance_interval = 1,\ Index: linux-2.6-ozlabs/kernel/sched_fair.c === --- linux-2.6-ozlabs.orig/kernel/sched_fair.c +++ linux-2.6-ozlabs/kernel/sched_fair.c @@ -2493,6 +2493,31 @@ static inline void update_sg_lb_stats(st } /** + * update_sd_pick_busiest - return 1 on busiest + */ +static int update_sd_pick_busiest(struct sched_domain *sd, + struct sd_lb_stats *sds, + struct sched_group *sg, + struct sg_lb_stats *sgs) +{ + if (sgs-sum_nr_running sgs-group_capacity) + return 1; + + if (sgs-group_imb) + return 1; + + if ((sd-flags SD_ASYM_PACKING) sgs-sum_nr_running) { + if (!sds-busiest) + return 1; + + if (group_first_cpu(sds-busiest) group_first_cpu(sg)) + return 1; + } + + return 0; +} + +/** * update_sd_lb_stats - Update sched_group's statistics for load balancing. * @sd: sched_domain whose statistics are to be updated. * @this_cpu: Cpu for which load balance is currently performed. @@ -2546,9 +2571,8 @@ static inline void update_sd_lb_stats(st sds-this = group; sds-this_nr_running = sgs.sum_nr_running; sds-this_load_per_task = sgs.sum_weighted_load; - } else if (sgs.avg_load sds-max_load - (sgs.sum_nr_running sgs.group_capacity || - sgs.group_imb)) { + } else if (sgs.avg_load = sds-max_load + update_sd_pick_busiest(sd, sds, group, sgs)) { sds-max_load = sgs.avg_load; sds-busiest = group; sds-busiest_nr_running = sgs.sum_nr_running; @@ -2562,6 +2586,36 @@ static inline void update_sd_lb_stats(st } while (group != sd-groups); } +int __weak arch_sd_sibiling_asym_packing(void) +{ + return 0*SD_ASYM_PACKING; +} + +/** + * check_asym_packing - Check to see if we the group is packed into + * the sched doman + */ +static int check_asym_packing(struct sched_domain *sd, + struct sd_lb_stats *sds, + int this_cpu, unsigned long *imbalance) +{ + int busiest_cpu; + + if (!(sd-flags SD_ASYM_PACKING)) + return 0; + +
[PATCH 3/5] powerpc: enabled asymmetric SMT scheduling on POWER7
The POWER7 core has dynamic SMT mode switching which is controlled by the hypervisor. There are 3 SMT modes: SMT1 uses thread 0 SMT2 uses threads 0 1 SMT4 uses threads 0, 1, 2 3 When in any particular SMT mode, all threads have the same performance as each other (ie. at any moment in time, all threads perform the same). The SMT mode switching works such that when linux has threads 2 3 idle and 0 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the idle loop and the hypervisor will automatically switch to SMT2 for that core (independent of other cores). The opposite is not true, so if threads 0 1 are idle and 2 3 are active, we will stay in SMT4 mode. Similarly if thread 0 is active and threads 1, 2 3 are idle, we'll go into SMT1 mode. If we can get the core into a lower SMT mode (SMT1 is best), the threads will perform better (since they share less core resources). Hence when we have idle threads, we want them to be the higher ones. This adds a feature bit for asymmetric packing to powerpc and then enables it on POWER7. Signed-off-by: Michael Neuling mi...@neuling.org --- arch/powerpc/include/asm/cputable.h |3 ++- arch/powerpc/kernel/process.c | 13 + 2 files changed, 15 insertions(+), 1 deletion(-) Index: linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h === --- linux-2.6-ozlabs.orig/arch/powerpc/include/asm/cputable.h +++ linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h @@ -195,6 +195,7 @@ extern const char *powerpc_base_platform #define CPU_FTR_SAOLONG_ASM_CONST(0x0020) #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0040) #define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x0080) +#define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x0100) #ifndef __ASSEMBLY__ @@ -409,7 +410,7 @@ extern const char *powerpc_base_platform CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | CPU_FTR_SAO) + CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT) #define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c +++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c @@ -1265,3 +1265,16 @@ unsigned long randomize_et_dyn(unsigned return ret; } + +unsigned char asym_smt_print = 1; +int arch_sd_sibiling_asym_packing(void) +{ + if (cpu_has_feature(CPU_FTR_ASYM_SMT)){ + if (asym_smt_print) { + pr_info(Enabling Asymetric SMT scheduling\n); + asym_smt_print = 0; + } + return SD_ASYM_PACKING; + } + return 0; +} ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 5/5] sched: make fix_small_imbalance work with asymmetric packing
With the asymmetric packing infrastructure, fix_small_imbalance is causing idle higher threads to pull tasks off lower threads. This is being caused by an off-by-one error. Signed-off-by: Michael Neuling mi...@neuling.org --- I'm not sure this is the right fix but without it, higher threads pull tasks off the lower threads, then the packing pulls it back down, etc etc and tasks bounce around constantly. --- kernel/sched_fair.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6-ozlabs/kernel/sched_fair.c === --- linux-2.6-ozlabs.orig/kernel/sched_fair.c +++ linux-2.6-ozlabs/kernel/sched_fair.c @@ -2652,7 +2652,7 @@ static inline void fix_small_imbalance(s * SCHED_LOAD_SCALE; scaled_busy_load_per_task /= sds-busiest-cpu_power; - if (sds-max_load - sds-this_load + scaled_busy_load_per_task = + if (sds-max_load - sds-this_load + scaled_busy_load_per_task (scaled_busy_load_per_task * imbn)) { *imbalance = sds-busiest_load_per_task; return; ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 4/5] sched: Mark the balance type for use in need_active_balance()
need_active_balance() gates the asymmetric packing based due to power save logic, but for packing we don't care. This marks the type of balanace we are attempting to do perform from f_b_g() and stops need_active_balance() power save logic gating a balance in the asymmetric packing case. Signed-off-by: Michael Neuling mi...@neuling.org --- kernel/sched_fair.c | 24 +++- 1 file changed, 19 insertions(+), 5 deletions(-) Index: linux-2.6-ozlabs/kernel/sched_fair.c === --- linux-2.6-ozlabs.orig/kernel/sched_fair.c +++ linux-2.6-ozlabs/kernel/sched_fair.c @@ -91,6 +91,13 @@ const_debug unsigned int sysctl_sched_mi static const struct sched_class fair_sched_class; +enum balance_type { + BALANCE_NONE = 0, + BALANCE_LOAD, + BALANCE_POWER, + BALANCE_PACKING +}; + /** * CFS operations on generic schedulable entities: */ @@ -2783,7 +2790,8 @@ static inline void calculate_imbalance(s static struct sched_group * find_busiest_group(struct sched_domain *sd, int this_cpu, unsigned long *imbalance, enum cpu_idle_type idle, - int *sd_idle, const struct cpumask *cpus, int *balance) + int *sd_idle, const struct cpumask *cpus, int *balance, + enum balance_type *bt) { struct sd_lb_stats sds; @@ -2808,6 +2816,7 @@ find_busiest_group(struct sched_domain * if (!(*balance)) goto ret; + *bt = BALANCE_PACKING; if ((idle == CPU_IDLE || idle == CPU_NEWLY_IDLE) check_asym_packing(sd, sds, this_cpu, imbalance)) return sds.busiest; @@ -2828,6 +2837,7 @@ find_busiest_group(struct sched_domain * /* Looks like there is an imbalance. Compute it */ calculate_imbalance(sds, this_cpu, imbalance); + *bt = BALANCE_LOAD; return sds.busiest; out_balanced: @@ -2835,10 +2845,12 @@ out_balanced: * There is no obvious imbalance. But check if we can do some balancing * to save power. */ + *bt = BALANCE_POWER; if (check_power_save_busiest_group(sds, this_cpu, imbalance)) return sds.busiest; ret: *imbalance = 0; + *bt = BALANCE_NONE; return NULL; } @@ -2899,9 +2911,10 @@ find_busiest_queue(struct sched_group *g /* Working cpumask for load_balance and load_balance_newidle. */ static DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask); -static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle) +static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle, + enum balance_type *bt) { - if (idle == CPU_NEWLY_IDLE) { + if (idle == CPU_NEWLY_IDLE *bt != BALANCE_PACKING) { /* * The only task running in a non-idle cpu can be moved to this * cpu in an attempt to completely freeup the other CPU @@ -2946,6 +2959,7 @@ static int load_balance(int this_cpu, st struct rq *busiest; unsigned long flags; struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask); + enum balance_type bt; cpumask_copy(cpus, cpu_active_mask); @@ -2964,7 +2978,7 @@ static int load_balance(int this_cpu, st redo: update_shares(sd); group = find_busiest_group(sd, this_cpu, imbalance, idle, sd_idle, - cpus, balance); + cpus, balance, bt); if (*balance == 0) goto out_balanced; @@ -3018,7 +3032,7 @@ redo: schedstat_inc(sd, lb_failed[idle]); sd-nr_balance_failed++; - if (need_active_balance(sd, sd_idle, idle)) { + if (need_active_balance(sd, sd_idle, idle, bt)) { raw_spin_lock_irqsave(busiest-lock, flags); /* don't kick the migration_thread, if the curr ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/5] powerpc: enabled asymmetric SMT scheduling on POWER7
In message 20100409062118.f04b4cb...@localhost.localdomain you wrote: The POWER7 core has dynamic SMT mode switching which is controlled by the hypervisor. There are 3 SMT modes: SMT1 uses thread 0 SMT2 uses threads 0 1 SMT4 uses threads 0, 1, 2 3 When in any particular SMT mode, all threads have the same performance as each other (ie. at any moment in time, all threads perform the same). The SMT mode switching works such that when linux has threads 2 3 idle and 0 1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the idle loop and the hypervisor will automatically switch to SMT2 for that core (independent of other cores). The opposite is not true, so if threads 0 1 are idle and 2 3 are active, we will stay in SMT4 mode. Similarly if thread 0 is active and threads 1, 2 3 are idle, we'll go into SMT1 mode. If we can get the core into a lower SMT mode (SMT1 is best), the threads will perform better (since they share less core resources). Hence when we have idle threads, we want them to be the higher ones. This adds a feature bit for asymmetric packing to powerpc and then enables it on POWER7. Signed-off-by: Michael Neuling mi...@neuling.org --- arch/powerpc/include/asm/cputable.h |3 ++- arch/powerpc/kernel/process.c | 13 + 2 files changed, 15 insertions(+), 1 deletion(-) Index: linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h === --- linux-2.6-ozlabs.orig/arch/powerpc/include/asm/cputable.h +++ linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h @@ -195,6 +195,7 @@ extern const char *powerpc_base_platform #define CPU_FTR_SAO LONG_ASM_CONST(0x0020) #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0040) #define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x0080) +#define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x0100) #ifndef __ASSEMBLY__ @@ -409,7 +410,7 @@ extern const char *powerpc_base_platform CPU_FTR_MMCRA | CPU_FTR_SMT | \ CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \ CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ - CPU_FTR_DSCR | CPU_FTR_SAO) + CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT) #define CPU_FTRS_CELL(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c === --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c +++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c @@ -1265,3 +1265,16 @@ unsigned long randomize_et_dyn(unsigned return ret; } + +unsigned char asym_smt_print = 1; +int arch_sd_sibiling_asym_packing(void) +{ + if (cpu_has_feature(CPU_FTR_ASYM_SMT)){ + if (asym_smt_print) { + pr_info(Enabling Asymetric SMT scheduling\n); + asym_smt_print = 0; + } Oops... before anyone else laughs at me, I'm changing this to use printk_once. Mikey + return SD_ASYM_PACKING; + } + return 0; +} ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 net-next 0/2] timestamping support for gianfar
On 2010-04-07 Manfred Rudigier wrote: this patch series adds support for hardware time stamping to gianfar. It uses the new SO_TIMESTAMPING infrastructure to deliver raw hardware timestamps to user space applications. Freescale CPUs with an eTSEC are able to time stamp all incoming network packets and can also time stamp transmit packets when instructed. The time stamps are generated by the eTSEC timer clock module which is running either from an external oscillator or internal clock. The submitted patches do not initialize the timer clock module since the oscillator frequency might be different from board to board. Thus the user must configure the timer clock module by hand at the moment - otherwise no time stamps will be reported. Below is a simple example code which shows how to configure the timer clock module on the P2020DS/RDB. It can be used to quickly try out the patches. Testing was done with the time stamping program from Patrick Ohly which can be found in the kernel sources under Documentation/networking/timestamping. I have verified the functionality on the MPC8313RDB, P2020DS and P2020RDB board with the latest net-2.6 kernel. Send and receive time stamps could be retrieved on all eTSEC ports. Hello, I've modified the patches according to your comments so far. Version 2 changes: - Only two instead of four patches so every patch adds some useful functionality - hwts_rx_en and hwts_tx_en are no bitfields any more Manfred ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v2 net-next 1/2] gianfar: Add hardware RX timestamping support
The device is configured to insert hardware timestamps into all received packets. The RX timestamps are extracted from the padding alingment bytes during the clean_rx_ring operation and copied into the skb_shared_hwtstamps struct of the skb. This extraction only happens if the rx_filter was set to something else than HWTSTAMP_FILTER_NONE with the SIOCSHWTSTAMP ioctl command. Hardware timestamping is only supported for eTSEC devices. To indicate device support the new FSL_GIANFAR_DEV_HAS_TIMER flag was introduced. Signed-off-by: Manfred Rudigier manfred.rudig...@omicron.at --- drivers/net/gianfar.c | 66 +--- drivers/net/gianfar.h |5 +++ 2 files changed, 66 insertions(+), 5 deletions(-) diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 080d1ce..552e10d 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -82,6 +82,7 @@ #include linux/tcp.h #include linux/udp.h #include linux/in.h +#include linux/net_tstamp.h #include asm/io.h #include asm/irq.h @@ -377,6 +378,13 @@ static void gfar_init_mac(struct net_device *ndev) rctrl |= RCTRL_PADDING(priv-padding); } + /* Insert receive time stamps into padding alignment bytes */ + if (priv-device_flags FSL_GIANFAR_DEV_HAS_TIMER) { + rctrl = ~RCTRL_PAL_MASK; + rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE | RCTRL_PADDING(8); + priv-padding = 8; + } + /* keep vlan related bits if it's enabled */ if (priv-vlgrp) { rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT; @@ -501,7 +509,8 @@ void unlock_tx_qs(struct gfar_private *priv) /* Returns 1 if incoming frames use an FCB */ static inline int gfar_uses_fcb(struct gfar_private *priv) { - return priv-vlgrp || priv-rx_csum_enable; + return priv-vlgrp || priv-rx_csum_enable || + (priv-device_flags FSL_GIANFAR_DEV_HAS_TIMER); } static void free_tx_pointers(struct gfar_private *priv) @@ -742,7 +751,8 @@ static int gfar_of_init(struct of_device *ofdev, struct net_device **pdev) FSL_GIANFAR_DEV_HAS_CSUM | FSL_GIANFAR_DEV_HAS_VLAN | FSL_GIANFAR_DEV_HAS_MAGIC_PACKET | - FSL_GIANFAR_DEV_HAS_EXTENDED_HASH; + FSL_GIANFAR_DEV_HAS_EXTENDED_HASH | + FSL_GIANFAR_DEV_HAS_TIMER; ctype = of_get_property(np, phy-connection-type, NULL); @@ -772,6 +782,38 @@ err_grp_init: return err; } +static int gfar_hwtstamp_ioctl(struct net_device *netdev, + struct ifreq *ifr, int cmd) +{ + struct hwtstamp_config config; + struct gfar_private *priv = netdev_priv(netdev); + + if (copy_from_user(config, ifr-ifr_data, sizeof(config))) + return -EFAULT; + + /* reserved for future extensions */ + if (config.flags) + return -EINVAL; + + if (config.tx_type) + return -ERANGE; + + switch (config.rx_filter) { + case HWTSTAMP_FILTER_NONE: + priv-hwts_rx_en = 0; + break; + default: + if (!(priv-device_flags FSL_GIANFAR_DEV_HAS_TIMER)) + return -ERANGE; + priv-hwts_rx_en = 1; + config.rx_filter = HWTSTAMP_FILTER_ALL; + break; + } + + return copy_to_user(ifr-ifr_data, config, sizeof(config)) ? + -EFAULT : 0; +} + /* Ioctl MII Interface */ static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) { @@ -780,6 +822,9 @@ static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) if (!netif_running(dev)) return -EINVAL; + if (cmd == SIOCSHWTSTAMP) + return gfar_hwtstamp_ioctl(dev, rq, cmd); + if (!priv-phydev) return -ENODEV; @@ -982,7 +1027,8 @@ static int gfar_probe(struct of_device *ofdev, else priv-padding = 0; - if (dev-features NETIF_F_IP_CSUM) + if (dev-features NETIF_F_IP_CSUM || + priv-device_flags FSL_GIANFAR_DEV_HAS_TIMER) dev-hard_header_len += GMAC_FCB_LEN; /* Program the isrg regs only if number of grps 1 */ @@ -2474,6 +2520,17 @@ static int gfar_process_frame(struct net_device *dev, struct sk_buff *skb, skb_pull(skb, amount_pull); } + /* Get receive timestamp from the skb */ + if (priv-hwts_rx_en) { + struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb); + u64 *ns = (u64 *) skb-data; + memset(shhwtstamps, 0, sizeof(*shhwtstamps)); + shhwtstamps-hwtstamp = ns_to_ktime(*ns); + } + + if (priv-padding) + skb_pull(skb, priv-padding); + if (priv-rx_csum_enable) gfar_rx_checksum(skb, fcb); @@
[PATCH v2 net-next 2/2] gianfar: Add hardware TX timestamping support
If a packet has the skb_shared_tx-hardware flag set the device is instructed to generate a TX timestamp and write it back to memory after the frame is transmitted. During the clean_tx_ring operation the timestamp will be extracted and copied into the skb_shared_hwtstamps struct of the skb. TX timestamping is enabled by setting the tx_type to something else than HWTSTAMP_TX_OFF with the SIOCSHWTSTAMP ioctl command. It is only supported by eTSEC devices. Signed-off-by: Manfred Rudigier manfred.rudig...@omicron.at --- drivers/net/gianfar.c | 118 + drivers/net/gianfar.h |3 +- 2 files changed, 101 insertions(+), 20 deletions(-) diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c index 552e10d..becc3f3 100644 --- a/drivers/net/gianfar.c +++ b/drivers/net/gianfar.c @@ -795,8 +795,18 @@ static int gfar_hwtstamp_ioctl(struct net_device *netdev, if (config.flags) return -EINVAL; - if (config.tx_type) + switch (config.tx_type) { + case HWTSTAMP_TX_OFF: + priv-hwts_tx_en = 0; + break; + case HWTSTAMP_TX_ON: + if (!(priv-device_flags FSL_GIANFAR_DEV_HAS_TIMER)) + return -ERANGE; + priv-hwts_tx_en = 1; + break; + default: return -ERANGE; + } switch (config.rx_filter) { case HWTSTAMP_FILTER_NONE: @@ -1972,23 +1982,29 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) struct netdev_queue *txq; struct gfar __iomem *regs = NULL; struct txfcb *fcb = NULL; - struct txbd8 *txbdp, *txbdp_start, *base; + struct txbd8 *txbdp, *txbdp_start, *base, *txbdp_tstamp = NULL; u32 lstatus; - int i, rq = 0; + int i, rq = 0, do_tstamp = 0; u32 bufaddr; unsigned long flags; - unsigned int nr_frags, length; - + unsigned int nr_frags, nr_txbds, length; + union skb_shared_tx *shtx; rq = skb-queue_mapping; tx_queue = priv-tx_queue[rq]; txq = netdev_get_tx_queue(dev, rq); base = tx_queue-tx_bd_base; regs = tx_queue-grp-regs; + shtx = skb_tx(skb); + + /* check if time stamp should be generated */ + if (unlikely(shtx-hardware priv-hwts_tx_en)) + do_tstamp = 1; /* make space for additional header when fcb is needed */ if (((skb-ip_summed == CHECKSUM_PARTIAL) || - (priv-vlgrp vlan_tx_tag_present(skb))) + (priv-vlgrp vlan_tx_tag_present(skb)) || + unlikely(do_tstamp)) (skb_headroom(skb) GMAC_FCB_LEN)) { struct sk_buff *skb_new; @@ -2005,8 +2021,14 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) /* total number of fragments in the SKB */ nr_frags = skb_shinfo(skb)-nr_frags; + /* calculate the required number of TxBDs for this skb */ + if (unlikely(do_tstamp)) + nr_txbds = nr_frags + 2; + else + nr_txbds = nr_frags + 1; + /* check if there is space to queue this packet */ - if ((nr_frags+1) tx_queue-num_txbdfree) { + if (nr_txbds tx_queue-num_txbdfree) { /* no space, stop the queue */ netif_tx_stop_queue(txq); dev-stats.tx_fifo_errors++; @@ -2018,9 +2040,19 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) txq-tx_packets ++; txbdp = txbdp_start = tx_queue-cur_tx; + lstatus = txbdp-lstatus; + + /* Time stamp insertion requires one additional TxBD */ + if (unlikely(do_tstamp)) + txbdp_tstamp = txbdp = next_txbd(txbdp, base, + tx_queue-tx_ring_size); if (nr_frags == 0) { - lstatus = txbdp-lstatus | BD_LFLAG(TXBD_LAST | TXBD_INTERRUPT); + if (unlikely(do_tstamp)) + txbdp_tstamp-lstatus |= BD_LFLAG(TXBD_LAST | + TXBD_INTERRUPT); + else + lstatus |= BD_LFLAG(TXBD_LAST | TXBD_INTERRUPT); } else { /* Place the fragment addresses and lengths into the TxBDs */ for (i = 0; i nr_frags; i++) { @@ -2066,11 +2098,32 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev) gfar_tx_vlan(skb, fcb); } - /* setup the TxBD length and buffer pointer for the first BD */ + /* Setup tx hardware time stamping if requested */ + if (unlikely(do_tstamp)) { + shtx-in_progress = 1; + if (fcb == NULL) + fcb = gfar_add_fcb(skb); + fcb-ptp = 1; + lstatus |= BD_LFLAG(TXBD_TOE); + } + txbdp_start-bufPtr = dma_map_single(priv-ofdev-dev,
Re: [-next April 8] eHEA driver failure on powerpc
Sachin Sant wrote: With today's next release, eHEA network interface on couple of power6 boxes fails to initialize. # modprobe ehea IBM eHEA ethernet device driver (Release EHEA_0102) alloc irq_desc for 256 on node 0 alloc kstat_irqs on node 0 irq: irq 590080 on host null mapped to virtual irq 256 ehea: Error in ehea_plpar_hcall_norets: opcode=26c ret=fffc arg1=8300 arg2=0 arg3=70050400 arg4=fc9b arg5=200 arg6=0 arg7=0 ehea: Error in ehea_reg_mr_section: register_rpage_mr failed ehea: Error in ehea_reg_kernel_mr: registering mr failed ehea: Error in ehea_setup_ports: creating MR failed ehea 23c00200.lhea: setup_ports failed ehea: probe of 23c00200.lhea failed with error -5 I tracked this problem to the following commit. commit 7545ba6f82924d4523f8f8a2baf2e517a750265d powerpc/mm: Bump SECTION_SIZE_BITS from 16MB to 256MB If i revert this commit, the network interface is initialized properly. Verified this on two different power6 boxes. Thanks -Sachin -- - Sachin Sant IBM Linux Technology Center India Systems and Technology Labs Bangalore, India - ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 2/2] [V5] Add non-Virtex5 support for LL TEMAC driver
On Thu, Apr 8, 2010 at 11:08 AM, John Linn john.l...@xilinx.com wrote: This patch adds support for using the LL TEMAC Ethernet driver on non-Virtex 5 platforms by adding support for accessing the Soft DMA registers as if they were memory mapped instead of solely through the DCR's (available on the Virtex 5). The patch also updates the driver so that it runs on the MicroBlaze. The changes were tested on the PowerPC 440, PowerPC 405, and the MicroBlaze platforms. Signed-off-by: John Tyner jty...@cs.ucr.edu Signed-off-by: John Linn john.l...@xilinx.com Picked up and build tested both patches on 405, 440, 60x and ppc64. No build problems found either built-in or as a module. for both: Acked-by: Grant Likely grant.lik...@secretlab.ca g. --- V2 - Incorporated comments from Grant and added more logic to allow the driver to work on MicroBlaze. V3 - Only updated it to apply to head, minor change to include slab.h. Also verified that it now builds for MicroBlaze. Retested on PowerPC and MicroBlaze. V4 - Removed buffer alignment for skb and called the network functions that already do the alignment for cache line and word alignment. Added constants to MicroBlaze system to make sure network alignment is maintained. Also updated the Kconfig so it depends on Microblaze or PPC based on Grant's comment. V5 - Respun the patch on top of a new patch to the driver which removed the call to virt_to_bus as it's now illegal and caused a failure when building the driver in linux-next. Retested with 440, 405 and Microblaze. Grant, can you do a build test to verify no build issues? --- arch/microblaze/include/asm/system.h | 11 +++ drivers/net/Kconfig | 2 +- drivers/net/ll_temac.h | 14 +++- drivers/net/ll_temac_main.c | 137 + 4 files changed, 126 insertions(+), 38 deletions(-) diff --git a/arch/microblaze/include/asm/system.h b/arch/microblaze/include/asm/system.h index 59efb3f..48c4f03 100644 --- a/arch/microblaze/include/asm/system.h +++ b/arch/microblaze/include/asm/system.h @@ -12,6 +12,7 @@ #include asm/registers.h #include asm/setup.h #include asm/irqflags.h +#include asm/cache.h #include asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h @@ -96,4 +97,14 @@ extern struct dentry *of_debugfs_root; #define arch_align_stack(x) (x) +/* + * MicroBlaze doesn't handle unaligned accesses in hardware. + * + * Based on this we force the IP header alignment in network drivers. + * We also modify NET_SKB_PAD to be a cacheline in size, thus maintaining + * cacheline alignment of buffers. + */ +#define NET_IP_ALIGN 2 +#define NET_SKB_PAD L1_CACHE_BYTES + #endif /* _ASM_MICROBLAZE_SYSTEM_H */ diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 7b832c7..9073741 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2434,8 +2434,8 @@ config MV643XX_ETH config XILINX_LL_TEMAC tristate Xilinx LL TEMAC (LocalLink Tri-mode Ethernet MAC) driver + depends on PPC || MICROBLAZE select PHYLIB - depends on PPC_DCR_NATIVE help This driver supports the Xilinx 10/100/1000 LocalLink TEMAC core used in Xilinx Spartan and Virtex FPGAs diff --git a/drivers/net/ll_temac.h b/drivers/net/ll_temac.h index 1af66a1..c033584 100644 --- a/drivers/net/ll_temac.h +++ b/drivers/net/ll_temac.h @@ -5,8 +5,11 @@ #include linux/netdevice.h #include linux/of.h #include linux/spinlock.h + +#ifdef CONFIG_PPC_DCR #include asm/dcr.h #include asm/dcr-regs.h +#endif /* packet size info */ #define XTE_HDR_SIZE 14 /* size of Ethernet header */ @@ -290,9 +293,6 @@ This option defaults to enabled (set) */ #define TX_CONTROL_CALC_CSUM_MASK 1 -#define XTE_ALIGN 32 -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN) - #define MULTICAST_CAM_TABLE_NUM 4 /* TX/RX CURDESC_PTR points to first descriptor */ @@ -335,9 +335,15 @@ struct temac_local { struct mii_bus *mii_bus; /* MII bus reference */ int mdio_irqs[PHY_MAX_ADDR]; /* IRQs table for MDIO bus */ - /* IO registers and IRQs */ + /* IO registers, dma functions and IRQs */ void __iomem *regs; + void __iomem *sdma_regs; +#ifdef CONFIG_PPC_DCR dcr_host_t sdma_dcrs; +#endif + u32 (*dma_in)(struct temac_local *, int); + void (*dma_out)(struct temac_local *, int, u32); + int tx_irq; int rx_irq; int emac_num; diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c index ce9aa78..2b69d6c 100644 --- a/drivers/net/ll_temac_main.c +++ b/drivers/net/ll_temac_main.c @@ -20,9 +20,6 @@ * or rx, so this should be okay. * * TODO: - * - Fix driver to work on more than just Virtex5. Right now the driver - * assumes that the locallink DMA registers are accessed via DCR - *
RE: Upgrade to 2.6.26 results in Oops during request_irq
-Original Message- From: Sparks, Sam Sent: Thursday, April 08, 2010 4:15 PM In the interest of making it easier for someone to help, I've been able to replicate the problem with the following minimal kernel module: #include linux/interrupt.h #include linux/irq.h #include linux/module.h static unsigned int cpld_virq = NO_IRQ; unsigned value = 0xdeadbeef; static irqreturn_t cpld_isr(int irq, void *dev_id) { return IRQ_HANDLED; } void __exit cpld_cleanup(void) { irq_dispose_mapping(cpld_virq); return; } int __init cpld_init(void) { int retval; unsigned long cpld_interrupt = 23; cpld_virq = irq_create_mapping(NULL, cpld_interrupt); if (cpld_virq == NO_IRQ) { printk(KERN_ERR Could not map HW IRQ %lu\n, cpld_interrupt); return -EBUSY; } retval = request_irq(cpld_virq, cpld_isr, IRQF_DISABLED | IRQF_SHARED | IRQF_TRIGGER_FALLING, CPLD, value); if (retval) { irq_dispose_mapping(cpld_virq); return retval; } return 0; } module_init(cpld_init); module_exit(cpld_cleanup); MODULE_LICENSE(Dual BSD/GPL); It builds fine, but still gives me the same error when inserted into the kernel: # insmod /tmp/cpld.ko Unable to handle kernel paging request for instruction fetch Faulting instruction address: 0x Oops: Kernel access of bad area, sig: 11 [#1] PREEMPT SCPA-G2 Modules linked in: cpld(+) [last unloaded: immr] NIP: LR: c004bab8 CTR: REGS: dfb13df0 TRAP: 0400 Not tainted (2.6.26-twacs-100.0.0) MSR: 20001032 ME,IR,DR CR: 24244422 XER: 2000 TASK = df8b6c00[559] 'insmod' THREAD: dfb12000 GPR00: dfb13ea0 df8b6c00 0017 0001 0001 c02d1fb4 GPR08: 361a 44244484 10073f68 1ffcb000 007ffeb0 GPR16: 0080 b7f0 1006e3dc GPR24: 0002 2000 9032 df9da620 dfb12000 c02d40e4 0017 NIP [] 0x0 LR [c004bab8] setup_irq+0x4e0/0x510 Call Trace: [dfb13ea0] [c004ba70] setup_irq+0x498/0x510 (unreliable) [dfb13ed0] [c004bd64] request_irq+0xe0/0x130 [dfb13f00] [e107804c] cpld_init+0x4c/0xe8 [cpld] [dfb13f10] [c0048c4c] sys_init_module+0x14c/0x1d8 [dfb13f40] [c0010008] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xff27bb0 LR = 0x10019ca8 Instruction dump: Kernel panic - not syncing: Fatal exception Please let me know if I should be providing some additional information. Thanks, Sam ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
A better way to sequence driver initialization?
Guys: My recent post, Requesting a GPIO that hasn't been registered yet, and Anton's reply thereto (thanks, Anton!) on linuxppc-dev got me thinking about the problem of dependencies between devices in different classes, and/or between drivers/devices in general. I'd like to float an idea to see if it's worth pursuing. Changing the link order to get drivers to initialize in the right order will always be a problem for someone--- the order will be right for some people, but exactly wrong for others. And the problem is made worse for Device Tree-based systems, where just about everything including IRQ descriptors are created on a demand and/or as-available basis. What if we let the kernel sort those dependencies out for us, at runtime? I think it's possible, and I can't be the only one who would like to see this happen. There are two parts to my idea for a solution. First part is to modify do_initcalls() so that it launches each initcall function in its own kernel thread. Wait, don't panic yet! Second, we create/modify functions like gpio_request() so that if they fail, they can optionally stop in a wait queue until the call could succeed. Then gpiochip_add() would signal that queue each time a new gpiochip was added. Functions like request_irq() and clk_get()--- and probably many others--- would do the same thing, but with their own wait queues. Something like this: static wait_queue_head_t requestor_wq; int gpiochip_add(struct gpio_chip *chip) { ... wake_up_interruptible(requestor_wq); return status; } int gpio_request_wait(unsigned gpio, const char *label) { return wait_event_interruptible(requestor_wq, !gpio_request(gpio, label)); } Or we might want to make the above code be the actual gpio_request(), and bury the wait_event_interruptible() inside of that. Doing so would prevent having to touch a lot of code, but would definitely change the behavior of gpio_request(). Not sure which approach is best. Finally, I think that do_initcalls() would turn into something like this: static void __init do_initcalls(void) { initcall_t *fn; for (fn = __early_initcall_end; fn __initcall_end; fn++) kthread_create(do_one_initcall, *fn, do_initcalls); flush_scheduled_work(); } If someone doesn't tell me this is a stupid idea, I might post it to lkml. Now's your chance! :) b.g. -- Bill Gatliff b...@billgatliff.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] fsl_booke: Correct test for MMU_FTR_BIG_PHYS
The code was looking for this in cpu_features, not mmu_features. Fix this. Signed-off-by: Becky Bruce bec...@kernel.crashing.org --- arch/powerpc/mm/fsl_booke_mmu.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c index c539472..533cae5 100644 --- a/arch/powerpc/mm/fsl_booke_mmu.c +++ b/arch/powerpc/mm/fsl_booke_mmu.c @@ -116,7 +116,7 @@ void loadcam_entry(int idx) mtspr(SPRN_MAS2, TLBCAM[idx].MAS2); mtspr(SPRN_MAS3, TLBCAM[idx].MAS3); - if (cur_cpu_spec-cpu_features MMU_FTR_BIG_PHYS) + if (cur_cpu_spec-mmu_features MMU_FTR_BIG_PHYS) mtspr(SPRN_MAS7, TLBCAM[idx].MAS7); asm volatile(isync;tlbwe;isync : : : memory); @@ -152,7 +152,7 @@ static void settlbcam(int index, unsigned long virt, phys_addr_t phys, TLBCAM[index].MAS3 = (phys MAS3_RPN) | MAS3_SX | MAS3_SR; TLBCAM[index].MAS3 |= ((flags _PAGE_RW) ? MAS3_SW : 0); - if (cur_cpu_spec-cpu_features MMU_FTR_BIG_PHYS) + if (cur_cpu_spec-mmu_features MMU_FTR_BIG_PHYS) TLBCAM[index].MAS7 = (u64)phys 32; #ifndef CONFIG_KGDB /* want user access for breakpoints */ -- 1.6.0.6 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: A better way to sequence driver initialization?
Bill Gatliff wrote: If someone doesn't tell me this is a stupid idea, I might post it to lkml. Now's your chance! :) So I went ahead and tried it anyway: $ git diff init/main.c diff --git a/init/main.c b/init/main.c index dac44a9..1461d09 100644 --- a/init/main.c +++ b/init/main.c @@ -753,7 +753,11 @@ static void __init do_initcalls(void) initcall_t *fn; for (fn = __early_initcall_end; fn __initcall_end; fn++) +#if 1 + kthread_run(do_one_initcall, *fn, do_initcalls); +#else do_one_initcall(*fn); +#endif /* Make sure there is no pending stuff from the initcall sequence */ flush_scheduled_work(); Interestingly, things didn't blow up as quickly as I had expected they would. :) Booting with initcall_debug=1 both with and without the kthread_run() mod has proven insightful. The first OOPS I get with the mod looks like this: ... calling ch_init+0x0/0x18 @ 325 Unable to handle kernel NULL pointer dereference at virtual address pgd = c0004000 [] *pgd= Internal error: Oops: 5 [#4] PREEMPT last sysfs file: Modules linked in: CPU: 0 Tainted: GD (2.6.33-rc5-csb737-00010-gaba040a-dirty #59) PC is at kset_find_obj+0x18/0x98 LR is at kset_find_obj+0x18/0x98 pc : [c018156c] lr : [c018156c] psr: 2013 sp : c3937f68 ip : c3937f68 fp : r10: r9 : r8 : r7 : c0396a6f r6 : r5 : 0005 r4 : c03f03b0 r3 : 0003 r2 : c3936000 r1 : r0 : 0001 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 0005317f Table: 20004000 DAC: 0017 Process do_initcalls (pid: 325, stack limit = 0xc3936270) Stack: (0xc3937f68 to 0xc3938000) 7f60: c03fc050 c03f03b0 0005 c03f03b0 c01c471c 7f80: c03f03b0 0005 c001e62c c024007c 1b58d32c 7fa0: 0005 c002c39c c381bfac c001e62c 1b58d32c 0005 c3937fd4 7fc0: c381bfac c001e62c c002c340 c005cb3c c3937fd8 c3937fd8 7fe0: c002df78 [c018156c] (kset_find_obj+0x18/0x98) from [c01c471c] (driver_register+0x88/0x150) [c01c471c] (driver_register+0x88/0x150) from [c024007c] (__hid_register_driver+0x38/0x70) [c024007c] (__hid_register_driver+0x38/0x70) from [c002c39c] (do_one_initcall+0x5c/0x1c4) [c002c39c] (do_one_initcall+0x5c/0x1c4) from [c005cb3c] (kthread+0x78/0x80) [c005cb3c] (kthread+0x78/0x80) from [c002df78] (kernel_thread_exit+0x0/0x8) Code: e24dd004 e3a1 e1a07001 ebfaf76c (e5963000) ---[ end trace f9bb68fd55862436 ]--- note: do_initcalls[325] exited with preempt_count 1 ... There is a ch_init() function in drivers/hid/hid-cherry.c. I think that function is calling hid_register_driver() before hid_init() has set up hid_bus_type. That causes kset_find_obj() to die because the kset structure it gets passed in ch_init() refers to a bus that doesn't exist yet. It's a plain 'ole initialization race. Maybe there are fewer places that would need wait queues than I originally thought! At least for drivers that use the device API, kset_find_obj() might be the only place that needs to wait until a device or bus appears. If I get a second next week, I might hack one in there and see how much farther I get... b.g. -- Bill Gatliff b...@billgatliff.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: A better way to sequence driver initialization?
Bill Gatliff wrote: Maybe there are fewer places that would need wait queues than I originally thought! At least for drivers that use the device API, kset_find_obj() might be the only place that needs to wait until a device or bus appears. ... and kset_register() might be the only place that calls wake_up_interruptible(). Wow. b.g. -- Bill Gatliff b...@billgatliff.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: A better way to sequence driver initialization?
Bill Gatliff wrote: ... and kset_register() might be the only place that calls wake_up_interruptible(). Wow. Wow, indeed. With just the attached patch, I get exactly one OOPS now--- in ssc_free(), which I think is getting called because atmel_ssc_modinit() is racing with something. I wasn't able to get all the way to mounting my NFS rootfs, I think because I'm getting to ip_auto_config() before my ethernet controller has registered! :) $ git diff lib/kobject.c diff --git a/lib/kobject.c b/lib/kobject.c index b512b74..e75556c 100644 --- a/lib/kobject.c +++ b/lib/kobject.c @@ -18,6 +18,10 @@ #include linux/stat.h #include linux/slab.h +#include linux/sched.h +#include linux/wait.h +DECLARE_WAIT_QUEUE_HEAD(kset_wait); + /* * populate_dir - populate directory with attributes. * @kobj: object we're working on. @@ -721,6 +725,7 @@ int kset_register(struct kset *k) if (err) return err; kobject_uevent(k-kobj, KOBJ_ADD); + wake_up_interruptible(kset_wait); return 0; } @@ -744,7 +749,7 @@ void kset_unregister(struct kset *k) * looking for a matching kobject. If matching object is found * take a reference and return the object. */ -struct kobject *kset_find_obj(struct kset *kset, const char *name) +struct kobject *__kset_find_obj(struct kset *kset, const char *name) { struct kobject *k; struct kobject *ret = NULL; @@ -760,6 +765,14 @@ struct kobject *kset_find_obj(struct kset *kset, const char *name) return ret; } +struct kobject *kset_find_obj(struct kset *kset, const char *name) +{ + struct kobject *ret = NULL; + wait_event_interruptible(kset_wait, (ret = __kset_find_obj(kset, name))); + return ret; +} + static void kset_release(struct kobject *kobj) { struct kset *kset = container_of(kobj, struct kset, kobj); b.g. -- Bill Gatliff b...@billgatliff.com ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Continual reading from the PowerPc time base register is not stable
Sorry, I attached the wrong log, this attachment is the right one. 2010/3/25 Csdncannon csdncan...@gmail.com In my program, the value of the 64-bit time base register is read out, and you will find the later value is even smaller than the earlier value from the log “log_timebase”. While the kernel depends on the accuracy of the timebase for the compensation of the lost PIT interrupt, the negative value between two continual timebase reading will bring to the jump of the jiffies. And this timebase problem will bring to the instability of the gettimeofday system call. Do you have any idea about this problem, thanks for your any advice. Attached is the code and log. Thanks Gino gettime.log Description: Binary data ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: A better way to sequence driver initialization?
On Fri, Apr 9, 2010 at 1:23 PM, Bill Gatliff b...@billgatliff.com wrote: Guys: My recent post, Requesting a GPIO that hasn't been registered yet, and Anton's reply thereto (thanks, Anton!) on linuxppc-dev got me thinking about the problem of dependencies between devices in different classes, and/or between drivers/devices in general. I'd like to float an idea to see if it's worth pursuing. Changing the link order to get drivers to initialize in the right order will always be a problem for someone--- the order will be right for some people, but exactly wrong for others. And the problem is made worse for Device Tree-based systems, where just about everything including IRQ descriptors are created on a demand and/or as-available basis. What if we let the kernel sort those dependencies out for us, at runtime? I think it's possible, and I can't be the only one who would like to see this happen. There are two parts to my idea for a solution. First part is to modify do_initcalls() so that it launches each initcall function in its own kernel thread. Wait, don't panic yet! Is initcall the right granularity? Shouldn't it be that each .probe() hook gets its own thread? Otherwise one device missing its resources could block another device using the same driver (whose resources are available) from probing. Regardless, parallel probe has be attempted and failed before. There are lots of fiddly bits to get right. In fact, there *used* to be code in the kernel that does exactly that. It was put in 2.6.20, but removed in 2.6.21-rc1. Here's the relevant commits, and a very interesting thread discussing the issues: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=21c7f30b1d3f8a3de3128478daca3ce203fc8733 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5adc55da4a7758021bcc374904b0f8b076508a11 http://lkml.indiana.edu/hypermail/linux/kernel/0705.1/0205.html It is pointed out in that thread that a big part of the problem is that a large number of drivers in the tree just aren't safe for multithreaded probing which is kind of a showstopper. Now, maybe doing it at the initcall level makes it less scary and more sane, but I suspect that it will still expose a lot of broken code that assumes things are already set up because it has always been that way. Second, we create/modify functions like gpio_request() so that if they fail, they can optionally stop in a wait queue until the call could succeed. Then gpiochip_add() would signal that queue each time a new gpiochip was added. Functions like request_irq() and clk_get()--- and probably many others--- would do the same thing, but with their own wait queues. You could take a look at the bus notifier code too. I made an attempt at solving a similar problem between Ethernet MAC drivers and binding against the correct phy. I ended up solving the problem in a different way, but the experimenting I did convince me that it is a viable concept. Basically, if the driver depends on another resource/device then it can register a callback to be notified when a new device becomes available. Once the needed device shows up (or immediately if the device is already there), the driver gets called back and it can complete registration. Not quite the same model that you are talking about here, but it would solve the problem on a per-driver basis. Something like this: static wait_queue_head_t requestor_wq; int gpiochip_add(struct gpio_chip *chip) { ... wake_up_interruptible(requestor_wq); return status; } int gpio_request_wait(unsigned gpio, const char *label) { return wait_event_interruptible(requestor_wq, !gpio_request(gpio, label)); } Or we might want to make the above code be the actual gpio_request(), and bury the wait_event_interruptible() inside of that. Doing so would prevent having to touch a lot of code, but would definitely change the behavior of gpio_request(). Not sure which approach is best. Finally, I think that do_initcalls() would turn into something like this: static void __init do_initcalls(void) { initcall_t *fn; for (fn = __early_initcall_end; fn __initcall_end; fn++) kthread_create(do_one_initcall, *fn, do_initcalls); flush_scheduled_work(); } If someone doesn't tell me this is a stupid idea, I might post it to lkml. Now's your chance! :) Have you dug into the Arjan's asynchronous function call infrastructure? http://lwn.net/Articles/314808/ g. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev