date:20100409

[PATCH 0/5] sched: asymmetrical packing for POWER7 SMT4

2010-04-09 Thread Michael Neuling

This patch series implements asymmetric SMT packing which ensures
consistently good performance on POWER7.  

I'm no scheduler expert so these probably need some review.  It works
for my simple test case on a POWER7 with SMT4.  I can now run a number
of CPU bound processes and they are pulled down to lower threads
giving the best performance consistently. 

Suresh: some of this touches code you changed recently, so you may
want to test in your environment.

The 1st patch is a fix for SMT4 in general.
The 2nd adds actual the asymmetrical packing infrastructure.  
The 3rd adds the powerpc specific hooks for POWER7.
The 4th and 5th are fixes required for packing to work.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/5] sched: fix capacity calculations for SMT4

2010-04-09 Thread Michael Neuling

When calculating capacity we use the following calculation:

   capacity = DIV_ROUND_CLOSEST(power, SCHED_LOAD_SCALE);

In SMT2, power will be 1178/2 (provided we are not scaling power with
freq say) and SCHED_LOAD_SCALE will be 1024, resulting in capacity
being 1.

With SMT4 however, power will be 1178/4, hence capacity will end up as
0.

Fix this by ensuring capacity is always at least 1 after this
calculation.

Signed-off-by: Michael Neuling mi...@neuling.org
---
I'm not sure this is the correct fix but this works for me.  
Original post here: 
  http://lkml.org/lkml/2010/3/30/884

---

 kernel/sched_fair.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/kernel/sched_fair.c
===
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -1482,6 +1482,7 @@ static int select_task_rq_fair(struct ta
}
 
capacity = DIV_ROUND_CLOSEST(power, SCHED_LOAD_SCALE);
+   capacity = max(capacity, 1UL);
 
if (tmp-flags  SD_POWERSAVINGS_BALANCE)
nr_running /= 2;
@@ -2488,6 +2489,7 @@ static inline void update_sg_lb_stats(st
 
sgs-group_capacity =
DIV_ROUND_CLOSEST(group-cpu_power, SCHED_LOAD_SCALE);
+   sgs-group_capacity = max(sgs-group_capacity, 1UL);
 }
 
 /**
@@ -2795,9 +2797,11 @@ find_busiest_queue(struct sched_group *g
 
for_each_cpu(i, sched_group_cpus(group)) {
unsigned long power = power_of(i);
-   unsigned long capacity = DIV_ROUND_CLOSEST(power, 
SCHED_LOAD_SCALE);
+   unsigned long capacity;
unsigned long wl;
 
+   capacity = max(DIV_ROUND_CLOSEST(power, SCHED_LOAD_SCALE), 1UL);
+
if (!cpumask_test_cpu(i, cpus))
continue;
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/5] sched: add asymmetric packing option for sibling domain

2010-04-09 Thread Michael Neuling

Some CPUs perform better when tasks are run on lower thread numbers.
In the case of POWER7, when higher threads are idled, the core can run
in lower SMT modes and hence perform better.

This creates a new sd flag to prefer lower threads. 

Based heavily on patch from Peter Zijlstra.  

Signed-off-by: Michael Neuling mi...@neuling.org
---
Peter: Since this is based mainly off your initial patch, it should
have your signed-off-by too, but I didn't want to add without your
permission.  Can I add it?

---

 include/linux/sched.h|4 ++
 include/linux/topology.h |1 
 kernel/sched_fair.c  |   64 ---
 3 files changed, 65 insertions(+), 4 deletions(-)

Index: linux-2.6-ozlabs/include/linux/sched.h
===
--- linux-2.6-ozlabs.orig/include/linux/sched.h
+++ linux-2.6-ozlabs/include/linux/sched.h
@@ -799,7 +799,7 @@ enum cpu_idle_type {
 #define SD_POWERSAVINGS_BALANCE0x0100  /* Balance for power savings */
 #define SD_SHARE_PKG_RESOURCES 0x0200  /* Domain members share cpu pkg 
resources */
 #define SD_SERIALIZE   0x0400  /* Only a single load balancing 
instance */
-
+#define SD_ASYM_PACKING0x0800  /* Place busy groups earlier in 
the domain */
 #define SD_PREFER_SIBLING  0x1000  /* Prefer to place tasks in a sibling 
domain */
 
 enum powersavings_balance_level {
@@ -834,6 +834,8 @@ static inline int sd_balance_for_package
return SD_PREFER_SIBLING;
 }
 
+extern int __weak arch_sd_sibiling_asym_packing(void);
+
 /*
  * Optimise SD flags for power savings:
  * SD_BALANCE_NEWIDLE helps agressive task consolidation and power savings.
Index: linux-2.6-ozlabs/include/linux/topology.h
===
--- linux-2.6-ozlabs.orig/include/linux/topology.h
+++ linux-2.6-ozlabs/include/linux/topology.h
@@ -102,6 +102,7 @@ int arch_update_cpu_topology(void);
| 1*SD_SHARE_PKG_RESOURCES  \
| 0*SD_SERIALIZE\
| 0*SD_PREFER_SIBLING   \
+   | arch_sd_sibiling_asym_packing()   \
,   \
.last_balance   = jiffies,  \
.balance_interval   = 1,\
Index: linux-2.6-ozlabs/kernel/sched_fair.c
===
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -2493,6 +2493,31 @@ static inline void update_sg_lb_stats(st
 }
 
 /**
+ * update_sd_pick_busiest - return 1 on busiest
+ */
+static int update_sd_pick_busiest(struct sched_domain *sd,
+ struct sd_lb_stats *sds,
+ struct sched_group *sg,
+ struct sg_lb_stats *sgs)
+{
+   if (sgs-sum_nr_running  sgs-group_capacity)
+   return 1;
+
+   if (sgs-group_imb)
+   return 1;
+
+   if ((sd-flags  SD_ASYM_PACKING)  sgs-sum_nr_running) {
+   if (!sds-busiest)
+   return 1;
+
+   if (group_first_cpu(sds-busiest)  group_first_cpu(sg))
+   return 1;
+   }
+
+   return 0;
+}
+
+/**
  * update_sd_lb_stats - Update sched_group's statistics for load balancing.
  * @sd: sched_domain whose statistics are to be updated.
  * @this_cpu: Cpu for which load balance is currently performed.
@@ -2546,9 +2571,8 @@ static inline void update_sd_lb_stats(st
sds-this = group;
sds-this_nr_running = sgs.sum_nr_running;
sds-this_load_per_task = sgs.sum_weighted_load;
-   } else if (sgs.avg_load  sds-max_load 
-  (sgs.sum_nr_running  sgs.group_capacity ||
-   sgs.group_imb)) {
+   } else if (sgs.avg_load = sds-max_load 
+  update_sd_pick_busiest(sd, sds, group, sgs)) {
sds-max_load = sgs.avg_load;
sds-busiest = group;
sds-busiest_nr_running = sgs.sum_nr_running;
@@ -2562,6 +2586,36 @@ static inline void update_sd_lb_stats(st
} while (group != sd-groups);
 }
 
+int __weak arch_sd_sibiling_asym_packing(void)
+{
+   return 0*SD_ASYM_PACKING;
+}
+
+/**
+ * check_asym_packing - Check to see if we the group is packed into
+ * the sched doman
+ */
+static int check_asym_packing(struct sched_domain *sd,
+ struct sd_lb_stats *sds,
+ int this_cpu, unsigned long *imbalance)
+{
+   int busiest_cpu;
+
+   if (!(sd-flags  SD_ASYM_PACKING))
+   return 0;
+
+

[PATCH 3/5] powerpc: enabled asymmetric SMT scheduling on POWER7

2010-04-09 Thread Michael Neuling

The POWER7 core has dynamic SMT mode switching which is controlled by
the hypervisor.  There are 3 SMT modes:
SMT1 uses thread  0
SMT2 uses threads 0  1
SMT4 uses threads 0, 1, 2  3
When in any particular SMT mode, all threads have the same performance
as each other (ie. at any moment in time, all threads perform the same).  

The SMT mode switching works such that when linux has threads 2  3 idle
and 0  1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the
idle loop and the hypervisor will automatically switch to SMT2 for that
core (independent of other cores).  The opposite is not true, so if
threads 0  1 are idle and 2  3 are active, we will stay in SMT4 mode.

Similarly if thread 0 is active and threads 1, 2  3 are idle, we'll go
into SMT1 mode.  

If we can get the core into a lower SMT mode (SMT1 is best), the threads
will perform better (since they share less core resources).  Hence when
we have idle threads, we want them to be the higher ones.

This adds a feature bit for asymmetric packing to powerpc and then
enables it on POWER7. 

Signed-off-by: Michael Neuling mi...@neuling.org

---

 arch/powerpc/include/asm/cputable.h |3 ++-
 arch/powerpc/kernel/process.c   |   13 +
 2 files changed, 15 insertions(+), 1 deletion(-)

Index: linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h
===
--- linux-2.6-ozlabs.orig/arch/powerpc/include/asm/cputable.h
+++ linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h
@@ -195,6 +195,7 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_SAOLONG_ASM_CONST(0x0020)
 #define CPU_FTR_CP_USE_DCBTZ   LONG_ASM_CONST(0x0040)
 #define CPU_FTR_UNALIGNED_LD_STD   LONG_ASM_CONST(0x0080)
+#define CPU_FTR_ASYM_SMT   LONG_ASM_CONST(0x0100)
 
 #ifndef __ASSEMBLY__
 
@@ -409,7 +410,7 @@ extern const char *powerpc_base_platform
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | CPU_FTR_SAO)
+   CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
===
--- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
+++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
@@ -1265,3 +1265,16 @@ unsigned long randomize_et_dyn(unsigned 
 
return ret;
 }
+
+unsigned char asym_smt_print = 1;
+int arch_sd_sibiling_asym_packing(void)
+{
+   if (cpu_has_feature(CPU_FTR_ASYM_SMT)){
+   if (asym_smt_print) {
+   pr_info(Enabling Asymetric SMT scheduling\n);
+   asym_smt_print = 0;
+   }
+   return SD_ASYM_PACKING;
+   }
+   return 0;
+}
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 5/5] sched: make fix_small_imbalance work with asymmetric packing

2010-04-09 Thread Michael Neuling

With the asymmetric packing infrastructure, fix_small_imbalance is
causing idle higher threads to pull tasks off lower threads.  

This is being caused by an off-by-one error.  

Signed-off-by: Michael Neuling mi...@neuling.org
---
I'm not sure this is the right fix but without it, higher threads pull
tasks off the lower threads, then the packing pulls it back down, etc
etc and tasks bounce around constantly.

---

 kernel/sched_fair.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-ozlabs/kernel/sched_fair.c
===
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -2652,7 +2652,7 @@ static inline void fix_small_imbalance(s
 * SCHED_LOAD_SCALE;
scaled_busy_load_per_task /= sds-busiest-cpu_power;
 
-   if (sds-max_load - sds-this_load + scaled_busy_load_per_task =
+   if (sds-max_load - sds-this_load + scaled_busy_load_per_task 
(scaled_busy_load_per_task * imbn)) {
*imbalance = sds-busiest_load_per_task;
return;
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/5] sched: Mark the balance type for use in need_active_balance()

2010-04-09 Thread Michael Neuling

need_active_balance() gates the asymmetric packing based due to power
save logic, but for packing we don't care.

This marks the type of balanace we are attempting to do perform from
f_b_g() and stops need_active_balance() power save logic gating a
balance in the asymmetric packing case.

Signed-off-by: Michael Neuling mi...@neuling.org

---

 kernel/sched_fair.c |   24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

Index: linux-2.6-ozlabs/kernel/sched_fair.c
===
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -91,6 +91,13 @@ const_debug unsigned int sysctl_sched_mi
 
 static const struct sched_class fair_sched_class;
 
+enum balance_type {
+   BALANCE_NONE = 0,
+   BALANCE_LOAD,
+   BALANCE_POWER,
+   BALANCE_PACKING
+};
+
 /**
  * CFS operations on generic schedulable entities:
  */
@@ -2783,7 +2790,8 @@ static inline void calculate_imbalance(s
 static struct sched_group *
 find_busiest_group(struct sched_domain *sd, int this_cpu,
   unsigned long *imbalance, enum cpu_idle_type idle,
-  int *sd_idle, const struct cpumask *cpus, int *balance)
+  int *sd_idle, const struct cpumask *cpus, int *balance,
+  enum balance_type *bt)
 {
struct sd_lb_stats sds;
 
@@ -2808,6 +2816,7 @@ find_busiest_group(struct sched_domain *
if (!(*balance))
goto ret;
 
+   *bt = BALANCE_PACKING;
if ((idle == CPU_IDLE || idle == CPU_NEWLY_IDLE) 
check_asym_packing(sd, sds, this_cpu, imbalance))
return sds.busiest;
@@ -2828,6 +2837,7 @@ find_busiest_group(struct sched_domain *
 
/* Looks like there is an imbalance. Compute it */
calculate_imbalance(sds, this_cpu, imbalance);
+   *bt = BALANCE_LOAD;
return sds.busiest;
 
 out_balanced:
@@ -2835,10 +2845,12 @@ out_balanced:
 * There is no obvious imbalance. But check if we can do some balancing
 * to save power.
 */
+   *bt = BALANCE_POWER;
if (check_power_save_busiest_group(sds, this_cpu, imbalance))
return sds.busiest;
 ret:
*imbalance = 0;
+   *bt = BALANCE_NONE;
return NULL;
 }
 
@@ -2899,9 +2911,10 @@ find_busiest_queue(struct sched_group *g
 /* Working cpumask for load_balance and load_balance_newidle. */
 static DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
 
-static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle)
+static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
+  enum balance_type *bt)
 {
-   if (idle == CPU_NEWLY_IDLE) {
+   if (idle == CPU_NEWLY_IDLE  *bt != BALANCE_PACKING) {
/*
 * The only task running in a non-idle cpu can be moved to this
 * cpu in an attempt to completely freeup the other CPU
@@ -2946,6 +2959,7 @@ static int load_balance(int this_cpu, st
struct rq *busiest;
unsigned long flags;
struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask);
+   enum balance_type bt;
 
cpumask_copy(cpus, cpu_active_mask);
 
@@ -2964,7 +2978,7 @@ static int load_balance(int this_cpu, st
 redo:
update_shares(sd);
group = find_busiest_group(sd, this_cpu, imbalance, idle, sd_idle,
-  cpus, balance);
+  cpus, balance, bt);
 
if (*balance == 0)
goto out_balanced;
@@ -3018,7 +3032,7 @@ redo:
schedstat_inc(sd, lb_failed[idle]);
sd-nr_balance_failed++;
 
-   if (need_active_balance(sd, sd_idle, idle)) {
+   if (need_active_balance(sd, sd_idle, idle, bt)) {
raw_spin_lock_irqsave(busiest-lock, flags);
 
/* don't kick the migration_thread, if the curr
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 3/5] powerpc: enabled asymmetric SMT scheduling on POWER7

2010-04-09 Thread Michael Neuling



In message 20100409062118.f04b4cb...@localhost.localdomain you wrote:
 The POWER7 core has dynamic SMT mode switching which is controlled by
 the hypervisor.  There are 3 SMT modes:
   SMT1 uses thread  0
   SMT2 uses threads 0  1
   SMT4 uses threads 0, 1, 2  3
 When in any particular SMT mode, all threads have the same performance
 as each other (ie. at any moment in time, all threads perform the same).  
 
 The SMT mode switching works such that when linux has threads 2  3 idle
 and 0  1 active, it will cede (H_CEDE hypercall) threads 2 and 3 in the
 idle loop and the hypervisor will automatically switch to SMT2 for that
 core (independent of other cores).  The opposite is not true, so if
 threads 0  1 are idle and 2  3 are active, we will stay in SMT4 mode.
 
 Similarly if thread 0 is active and threads 1, 2  3 are idle, we'll go
 into SMT1 mode.  
 
 If we can get the core into a lower SMT mode (SMT1 is best), the threads
 will perform better (since they share less core resources).  Hence when
 we have idle threads, we want them to be the higher ones.
 
 This adds a feature bit for asymmetric packing to powerpc and then
 enables it on POWER7. 
 
 Signed-off-by: Michael Neuling mi...@neuling.org
 
 ---
 
  arch/powerpc/include/asm/cputable.h |3 ++-
  arch/powerpc/kernel/process.c   |   13 +
  2 files changed, 15 insertions(+), 1 deletion(-)
 
 Index: linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h
 ===
 --- linux-2.6-ozlabs.orig/arch/powerpc/include/asm/cputable.h
 +++ linux-2.6-ozlabs/arch/powerpc/include/asm/cputable.h
 @@ -195,6 +195,7 @@ extern const char *powerpc_base_platform
  #define CPU_FTR_SAO  LONG_ASM_CONST(0x0020)
  #define CPU_FTR_CP_USE_DCBTZ LONG_ASM_CONST(0x0040)
  #define CPU_FTR_UNALIGNED_LD_STD LONG_ASM_CONST(0x0080)
 +#define CPU_FTR_ASYM_SMT LONG_ASM_CONST(0x0100)
  
  #ifndef __ASSEMBLY__
  
 @@ -409,7 +410,7 @@ extern const char *powerpc_base_platform
   CPU_FTR_MMCRA | CPU_FTR_SMT | \
   CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
   CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
 - CPU_FTR_DSCR | CPU_FTR_SAO)
 + CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT)
  #define CPU_FTRS_CELL(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
   CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
   CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
 Index: linux-2.6-ozlabs/arch/powerpc/kernel/process.c
 ===
 --- linux-2.6-ozlabs.orig/arch/powerpc/kernel/process.c
 +++ linux-2.6-ozlabs/arch/powerpc/kernel/process.c
 @@ -1265,3 +1265,16 @@ unsigned long randomize_et_dyn(unsigned 
  
   return ret;
  }
 +
 +unsigned char asym_smt_print = 1;
 +int arch_sd_sibiling_asym_packing(void)
 +{
 + if (cpu_has_feature(CPU_FTR_ASYM_SMT)){
 + if (asym_smt_print) {
 + pr_info(Enabling Asymetric SMT scheduling\n);
 + asym_smt_print = 0;
 + }

Oops...  before anyone else laughs at me, I'm changing this to use
printk_once.

Mikey

 + return SD_ASYM_PACKING;
 + }
 + return 0;
 +}
 ___
 Linuxppc-dev mailing list
 Linuxppc-dev@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 net-next 0/2] timestamping support for gianfar

2010-04-09 Thread Manfred Rudigier

On 2010-04-07 Manfred Rudigier wrote:
 this patch series adds support for hardware time stamping to gianfar. It uses
 the new SO_TIMESTAMPING infrastructure to deliver raw hardware timestamps to
 user space applications.
 
 Freescale CPUs with an eTSEC are able to time stamp all incoming network 
 packets
 and can also time stamp transmit packets when instructed. The time stamps are
 generated by the eTSEC timer clock module which is running either from an
 external oscillator or internal clock.
 
 The submitted patches do not initialize the timer clock module since the
 oscillator frequency might be different from board to board. Thus the user
 must configure the timer clock module by hand at the moment - otherwise no
 time stamps will be reported. Below is a simple example code which
 shows how to configure the timer clock module on the P2020DS/RDB. It can be
 used to quickly try out the patches.
 
 Testing was done with the time stamping program from Patrick Ohly which can
 be found in the kernel sources under Documentation/networking/timestamping.
 I have verified the functionality on the MPC8313RDB, P2020DS and P2020RDB
 board with the latest net-2.6 kernel. Send and receive time stamps could be
 retrieved on all eTSEC ports.

Hello,

I've modified the patches according to your comments so far. 

Version 2 changes:
- Only two instead of four patches so every patch adds some useful functionality
- hwts_rx_en and hwts_tx_en are no bitfields any more

Manfred
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 net-next 1/2] gianfar: Add hardware RX timestamping support

2010-04-09 Thread Manfred Rudigier

The device is configured to insert hardware timestamps into all
received packets. The RX timestamps are extracted from the padding
alingment bytes during the clean_rx_ring operation and copied into the
skb_shared_hwtstamps struct of the skb. This extraction only happens if
the rx_filter was set to something else than HWTSTAMP_FILTER_NONE with
the SIOCSHWTSTAMP ioctl command.

Hardware timestamping is only supported for eTSEC devices. To indicate
device support the new FSL_GIANFAR_DEV_HAS_TIMER flag was introduced.

Signed-off-by: Manfred Rudigier manfred.rudig...@omicron.at
---
 drivers/net/gianfar.c |   66 +---
 drivers/net/gianfar.h |5 +++
 2 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 080d1ce..552e10d 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -82,6 +82,7 @@
 #include linux/tcp.h
 #include linux/udp.h
 #include linux/in.h
+#include linux/net_tstamp.h
 
 #include asm/io.h
 #include asm/irq.h
@@ -377,6 +378,13 @@ static void gfar_init_mac(struct net_device *ndev)
rctrl |= RCTRL_PADDING(priv-padding);
}
 
+   /* Insert receive time stamps into padding alignment bytes */
+   if (priv-device_flags  FSL_GIANFAR_DEV_HAS_TIMER) {
+   rctrl = ~RCTRL_PAL_MASK;
+   rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE | RCTRL_PADDING(8);
+   priv-padding = 8;
+   }
+
/* keep vlan related bits if it's enabled */
if (priv-vlgrp) {
rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
@@ -501,7 +509,8 @@ void unlock_tx_qs(struct gfar_private *priv)
 /* Returns 1 if incoming frames use an FCB */
 static inline int gfar_uses_fcb(struct gfar_private *priv)
 {
-   return priv-vlgrp || priv-rx_csum_enable;
+   return priv-vlgrp || priv-rx_csum_enable ||
+   (priv-device_flags  FSL_GIANFAR_DEV_HAS_TIMER);
 }
 
 static void free_tx_pointers(struct gfar_private *priv)
@@ -742,7 +751,8 @@ static int gfar_of_init(struct of_device *ofdev, struct 
net_device **pdev)
FSL_GIANFAR_DEV_HAS_CSUM |
FSL_GIANFAR_DEV_HAS_VLAN |
FSL_GIANFAR_DEV_HAS_MAGIC_PACKET |
-   FSL_GIANFAR_DEV_HAS_EXTENDED_HASH;
+   FSL_GIANFAR_DEV_HAS_EXTENDED_HASH |
+   FSL_GIANFAR_DEV_HAS_TIMER;
 
ctype = of_get_property(np, phy-connection-type, NULL);
 
@@ -772,6 +782,38 @@ err_grp_init:
return err;
 }
 
+static int gfar_hwtstamp_ioctl(struct net_device *netdev,
+   struct ifreq *ifr, int cmd)
+{
+   struct hwtstamp_config config;
+   struct gfar_private *priv = netdev_priv(netdev);
+
+   if (copy_from_user(config, ifr-ifr_data, sizeof(config)))
+   return -EFAULT;
+
+   /* reserved for future extensions */
+   if (config.flags)
+   return -EINVAL;
+
+   if (config.tx_type)
+   return -ERANGE;
+
+   switch (config.rx_filter) {
+   case HWTSTAMP_FILTER_NONE:
+   priv-hwts_rx_en = 0;
+   break;
+   default:
+   if (!(priv-device_flags  FSL_GIANFAR_DEV_HAS_TIMER))
+   return -ERANGE;
+   priv-hwts_rx_en = 1;
+   config.rx_filter = HWTSTAMP_FILTER_ALL;
+   break;
+   }
+
+   return copy_to_user(ifr-ifr_data, config, sizeof(config)) ?
+   -EFAULT : 0;
+}
+
 /* Ioctl MII Interface */
 static int gfar_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
@@ -780,6 +822,9 @@ static int gfar_ioctl(struct net_device *dev, struct ifreq 
*rq, int cmd)
if (!netif_running(dev))
return -EINVAL;
 
+   if (cmd == SIOCSHWTSTAMP)
+   return gfar_hwtstamp_ioctl(dev, rq, cmd);
+
if (!priv-phydev)
return -ENODEV;
 
@@ -982,7 +1027,8 @@ static int gfar_probe(struct of_device *ofdev,
else
priv-padding = 0;
 
-   if (dev-features  NETIF_F_IP_CSUM)
+   if (dev-features  NETIF_F_IP_CSUM ||
+   priv-device_flags  FSL_GIANFAR_DEV_HAS_TIMER)
dev-hard_header_len += GMAC_FCB_LEN;
 
/* Program the isrg regs only if number of grps  1 */
@@ -2474,6 +2520,17 @@ static int gfar_process_frame(struct net_device *dev, 
struct sk_buff *skb,
skb_pull(skb, amount_pull);
}
 
+   /* Get receive timestamp from the skb */
+   if (priv-hwts_rx_en) {
+   struct skb_shared_hwtstamps *shhwtstamps = skb_hwtstamps(skb);
+   u64 *ns = (u64 *) skb-data;
+   memset(shhwtstamps, 0, sizeof(*shhwtstamps));
+   shhwtstamps-hwtstamp = ns_to_ktime(*ns);
+   }
+
+   if (priv-padding)
+   skb_pull(skb, priv-padding);
+
if (priv-rx_csum_enable)
gfar_rx_checksum(skb, fcb);
 
@@

[PATCH v2 net-next 2/2] gianfar: Add hardware TX timestamping support

2010-04-09 Thread Manfred Rudigier

If a packet has the skb_shared_tx-hardware flag set the device is
instructed to generate a TX timestamp and write it back to memory after
the frame is transmitted. During the clean_tx_ring operation the
timestamp will be extracted and copied into the skb_shared_hwtstamps
struct of the skb.

TX timestamping is enabled by setting the tx_type to something else
than HWTSTAMP_TX_OFF with the SIOCSHWTSTAMP ioctl command. It is only
supported by eTSEC devices.

Signed-off-by: Manfred Rudigier manfred.rudig...@omicron.at
---
 drivers/net/gianfar.c |  118 +
 drivers/net/gianfar.h |3 +-
 2 files changed, 101 insertions(+), 20 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 552e10d..becc3f3 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -795,8 +795,18 @@ static int gfar_hwtstamp_ioctl(struct net_device *netdev,
if (config.flags)
return -EINVAL;
 
-   if (config.tx_type)
+   switch (config.tx_type) {
+   case HWTSTAMP_TX_OFF:
+   priv-hwts_tx_en = 0;
+   break;
+   case HWTSTAMP_TX_ON:
+   if (!(priv-device_flags  FSL_GIANFAR_DEV_HAS_TIMER))
+   return -ERANGE;
+   priv-hwts_tx_en = 1;
+   break;
+   default:
return -ERANGE;
+   }
 
switch (config.rx_filter) {
case HWTSTAMP_FILTER_NONE:
@@ -1972,23 +1982,29 @@ static int gfar_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct netdev_queue *txq;
struct gfar __iomem *regs = NULL;
struct txfcb *fcb = NULL;
-   struct txbd8 *txbdp, *txbdp_start, *base;
+   struct txbd8 *txbdp, *txbdp_start, *base, *txbdp_tstamp = NULL;
u32 lstatus;
-   int i, rq = 0;
+   int i, rq = 0, do_tstamp = 0;
u32 bufaddr;
unsigned long flags;
-   unsigned int nr_frags, length;
-
+   unsigned int nr_frags, nr_txbds, length;
+   union skb_shared_tx *shtx;
 
rq = skb-queue_mapping;
tx_queue = priv-tx_queue[rq];
txq = netdev_get_tx_queue(dev, rq);
base = tx_queue-tx_bd_base;
regs = tx_queue-grp-regs;
+   shtx = skb_tx(skb);
+
+   /* check if time stamp should be generated */
+   if (unlikely(shtx-hardware  priv-hwts_tx_en))
+   do_tstamp = 1;
 
/* make space for additional header when fcb is needed */
if (((skb-ip_summed == CHECKSUM_PARTIAL) ||
-   (priv-vlgrp  vlan_tx_tag_present(skb))) 
+   (priv-vlgrp  vlan_tx_tag_present(skb)) ||
+   unlikely(do_tstamp)) 
(skb_headroom(skb)  GMAC_FCB_LEN)) {
struct sk_buff *skb_new;
 
@@ -2005,8 +2021,14 @@ static int gfar_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
/* total number of fragments in the SKB */
nr_frags = skb_shinfo(skb)-nr_frags;
 
+   /* calculate the required number of TxBDs for this skb */
+   if (unlikely(do_tstamp))
+   nr_txbds = nr_frags + 2;
+   else
+   nr_txbds = nr_frags + 1;
+
/* check if there is space to queue this packet */
-   if ((nr_frags+1)  tx_queue-num_txbdfree) {
+   if (nr_txbds  tx_queue-num_txbdfree) {
/* no space, stop the queue */
netif_tx_stop_queue(txq);
dev-stats.tx_fifo_errors++;
@@ -2018,9 +2040,19 @@ static int gfar_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
txq-tx_packets ++;
 
txbdp = txbdp_start = tx_queue-cur_tx;
+   lstatus = txbdp-lstatus;
+
+   /* Time stamp insertion requires one additional TxBD */
+   if (unlikely(do_tstamp))
+   txbdp_tstamp = txbdp = next_txbd(txbdp, base,
+   tx_queue-tx_ring_size);
 
if (nr_frags == 0) {
-   lstatus = txbdp-lstatus | BD_LFLAG(TXBD_LAST | TXBD_INTERRUPT);
+   if (unlikely(do_tstamp))
+   txbdp_tstamp-lstatus |= BD_LFLAG(TXBD_LAST |
+   TXBD_INTERRUPT);
+   else
+   lstatus |= BD_LFLAG(TXBD_LAST | TXBD_INTERRUPT);
} else {
/* Place the fragment addresses and lengths into the TxBDs */
for (i = 0; i  nr_frags; i++) {
@@ -2066,11 +2098,32 @@ static int gfar_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
gfar_tx_vlan(skb, fcb);
}
 
-   /* setup the TxBD length and buffer pointer for the first BD */
+   /* Setup tx hardware time stamping if requested */
+   if (unlikely(do_tstamp)) {
+   shtx-in_progress = 1;
+   if (fcb == NULL)
+   fcb = gfar_add_fcb(skb);
+   fcb-ptp = 1;
+   lstatus |= BD_LFLAG(TXBD_TOE);
+   }
+
txbdp_start-bufPtr = dma_map_single(priv-ofdev-dev,

Re: [-next April 8] eHEA driver failure on powerpc

2010-04-09 Thread Sachin Sant


Sachin Sant wrote:

With today's next release, eHEA network interface on couple
of power6 boxes fails to initialize.

# modprobe ehea
IBM eHEA ethernet device driver (Release EHEA_0102)
 alloc irq_desc for 256 on node 0
 alloc kstat_irqs on node 0
irq: irq 590080 on host null mapped to virtual irq 256
ehea: Error in ehea_plpar_hcall_norets: opcode=26c 
ret=fffc arg1=8300 arg2=0 
arg3=70050400 arg4=fc9b arg5=200 arg6=0 arg7=0

ehea: Error in ehea_reg_mr_section: register_rpage_mr failed
ehea: Error in ehea_reg_kernel_mr: registering mr failed
ehea: Error in ehea_setup_ports: creating MR failed
ehea 23c00200.lhea: setup_ports failed
ehea: probe of 23c00200.lhea failed with error -5

I tracked this problem to the following commit.

commit 7545ba6f82924d4523f8f8a2baf2e517a750265d
powerpc/mm: Bump SECTION_SIZE_BITS from 16MB to 256MB

If i revert this commit, the network interface is initialized
properly. Verified this on two different power6 boxes.

Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] [V5] Add non-Virtex5 support for LL TEMAC driver

2010-04-09 Thread Grant Likely

On Thu, Apr 8, 2010 at 11:08 AM, John Linn john.l...@xilinx.com wrote:
 This patch adds support for using the LL TEMAC Ethernet driver on
 non-Virtex 5 platforms by adding support for accessing the Soft DMA
 registers as if they were memory mapped instead of solely through the
 DCR's (available on the Virtex 5).

 The patch also updates the driver so that it runs on the MicroBlaze.
 The changes were tested on the PowerPC 440, PowerPC 405, and the
 MicroBlaze platforms.

 Signed-off-by: John Tyner jty...@cs.ucr.edu
 Signed-off-by: John Linn john.l...@xilinx.com

Picked up and build tested both patches on 405, 440, 60x and ppc64.
No build problems found either built-in or as a module.

for both:
Acked-by: Grant Likely grant.lik...@secretlab.ca

g.


 ---

 V2 - Incorporated comments from Grant and added more logic to allow the driver
 to work on MicroBlaze.

 V3 - Only updated it to apply to head, minor change to include slab.h. Also
 verified that it now builds for MicroBlaze. Retested on PowerPC and 
 MicroBlaze.

 V4 - Removed buffer alignment for skb and called the network functions that
 already do the alignment for cache line and word alignment. Added constants
 to MicroBlaze system to make sure network alignment is maintained. Also 
 updated
 the Kconfig so it depends on Microblaze or PPC based on Grant's comment.

 V5 - Respun the patch on top of a new patch to the driver which removed the
 call to virt_to_bus as it's now illegal and caused a failure when building
 the driver in linux-next. Retested with 440, 405 and Microblaze.

 Grant, can you do a build test to verify no build issues?
 ---
  arch/microblaze/include/asm/system.h |   11 +++
  drivers/net/Kconfig                  |    2 +-
  drivers/net/ll_temac.h               |   14 +++-
  drivers/net/ll_temac_main.c          |  137 +
  4 files changed, 126 insertions(+), 38 deletions(-)

 diff --git a/arch/microblaze/include/asm/system.h 
 b/arch/microblaze/include/asm/system.h
 index 59efb3f..48c4f03 100644
 --- a/arch/microblaze/include/asm/system.h
 +++ b/arch/microblaze/include/asm/system.h
 @@ -12,6 +12,7 @@
  #include asm/registers.h
  #include asm/setup.h
  #include asm/irqflags.h
 +#include asm/cache.h

  #include asm-generic/cmpxchg.h
  #include asm-generic/cmpxchg-local.h
 @@ -96,4 +97,14 @@ extern struct dentry *of_debugfs_root;

  #define arch_align_stack(x) (x)

 +/*
 + * MicroBlaze doesn't handle unaligned accesses in hardware.
 + *
 + * Based on this we force the IP header alignment in network drivers.
 + * We also modify NET_SKB_PAD to be a cacheline in size, thus maintaining
 + * cacheline alignment of buffers.
 + */
 +#define NET_IP_ALIGN   2
 +#define NET_SKB_PAD    L1_CACHE_BYTES
 +
  #endif /* _ASM_MICROBLAZE_SYSTEM_H */
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index 7b832c7..9073741 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -2434,8 +2434,8 @@ config MV643XX_ETH

  config XILINX_LL_TEMAC
        tristate Xilinx LL TEMAC (LocalLink Tri-mode Ethernet MAC) driver
 +       depends on PPC || MICROBLAZE
        select PHYLIB
 -       depends on PPC_DCR_NATIVE
        help
          This driver supports the Xilinx 10/100/1000 LocalLink TEMAC
          core used in Xilinx Spartan and Virtex FPGAs
 diff --git a/drivers/net/ll_temac.h b/drivers/net/ll_temac.h
 index 1af66a1..c033584 100644
 --- a/drivers/net/ll_temac.h
 +++ b/drivers/net/ll_temac.h
 @@ -5,8 +5,11 @@
  #include linux/netdevice.h
  #include linux/of.h
  #include linux/spinlock.h
 +
 +#ifdef CONFIG_PPC_DCR
  #include asm/dcr.h
  #include asm/dcr-regs.h
 +#endif

  /* packet size info */
  #define XTE_HDR_SIZE                   14      /* size of Ethernet header */
 @@ -290,9 +293,6 @@ This option defaults to enabled (set) */

  #define TX_CONTROL_CALC_CSUM_MASK   1

 -#define XTE_ALIGN       32
 -#define BUFFER_ALIGN(adr) ((XTE_ALIGN - ((u32) adr)) % XTE_ALIGN)
 -
  #define MULTICAST_CAM_TABLE_NUM 4

  /* TX/RX CURDESC_PTR points to first descriptor */
 @@ -335,9 +335,15 @@ struct temac_local {
        struct mii_bus *mii_bus;        /* MII bus reference */
        int mdio_irqs[PHY_MAX_ADDR];    /* IRQs table for MDIO bus */

 -       /* IO registers and IRQs */
 +       /* IO registers, dma functions and IRQs */
        void __iomem *regs;
 +       void __iomem *sdma_regs;
 +#ifdef CONFIG_PPC_DCR
        dcr_host_t sdma_dcrs;
 +#endif
 +       u32 (*dma_in)(struct temac_local *, int);
 +       void (*dma_out)(struct temac_local *, int, u32);
 +
        int tx_irq;
        int rx_irq;
        int emac_num;
 diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
 index ce9aa78..2b69d6c 100644
 --- a/drivers/net/ll_temac_main.c
 +++ b/drivers/net/ll_temac_main.c
 @@ -20,9 +20,6 @@
  *   or rx, so this should be okay.
  *
  * TODO:
 - * - Fix driver to work on more than just Virtex5.  Right now the driver
 - *   assumes that the locallink DMA registers are accessed via DCR
 - *

RE: Upgrade to 2.6.26 results in Oops during request_irq

2010-04-09 Thread Sparks, Sam

-Original Message-
From: Sparks, Sam
Sent: Thursday, April 08, 2010 4:15 PM

In the interest of making it easier for someone to help, I've been able
to replicate the problem with the following minimal kernel module:

#include linux/interrupt.h
#include linux/irq.h
#include linux/module.h

static unsigned int cpld_virq = NO_IRQ;
unsigned value = 0xdeadbeef;

static irqreturn_t cpld_isr(int irq, void *dev_id) {
return IRQ_HANDLED;
}

void __exit cpld_cleanup(void) {
irq_dispose_mapping(cpld_virq);
return;
}

int __init cpld_init(void) {
int retval;
unsigned long cpld_interrupt = 23;

cpld_virq = irq_create_mapping(NULL, cpld_interrupt);
if (cpld_virq == NO_IRQ) {
printk(KERN_ERR Could not map HW IRQ %lu\n, cpld_interrupt);
return -EBUSY;
}

retval =  request_irq(cpld_virq, cpld_isr,
IRQF_DISABLED | IRQF_SHARED | IRQF_TRIGGER_FALLING,
CPLD, value);
if (retval) {
irq_dispose_mapping(cpld_virq);
return retval;
}

return 0;
}

module_init(cpld_init);
module_exit(cpld_cleanup);
MODULE_LICENSE(Dual BSD/GPL);

It builds fine, but still gives me the same error when inserted into the
kernel:

# insmod /tmp/cpld.ko
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SCPA-G2
Modules linked in: cpld(+) [last unloaded: immr]
NIP:  LR: c004bab8 CTR: 
REGS: dfb13df0 TRAP: 0400   Not tainted  (2.6.26-twacs-100.0.0)
MSR: 20001032 ME,IR,DR  CR: 24244422  XER: 2000
TASK = df8b6c00[559] 'insmod' THREAD: dfb12000
GPR00:  dfb13ea0 df8b6c00 0017 0001 0001 
c02d1fb4
GPR08: 361a    44244484 10073f68 1ffcb000
007ffeb0
GPR16:   0080  b7f0  1006e3dc

GPR24: 0002 2000  9032 df9da620 dfb12000 c02d40e4
0017
NIP [] 0x0
LR [c004bab8] setup_irq+0x4e0/0x510
Call Trace:
[dfb13ea0] [c004ba70] setup_irq+0x498/0x510 (unreliable)
[dfb13ed0] [c004bd64] request_irq+0xe0/0x130
[dfb13f00] [e107804c] cpld_init+0x4c/0xe8 [cpld]
[dfb13f10] [c0048c4c] sys_init_module+0x14c/0x1d8
[dfb13f40] [c0010008] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff27bb0
LR = 0x10019ca8
Instruction dump:
       
       
Kernel panic - not syncing: Fatal exception

Please let me know if I should be providing some additional information.
Thanks,
Sam

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

A better way to sequence driver initialization?

2010-04-09 Thread Bill Gatliff

Guys:


My recent post, Requesting a GPIO that hasn't been registered yet, and
Anton's reply thereto (thanks, Anton!) on linuxppc-dev got me thinking
about the problem of dependencies between devices in different  classes,
and/or between drivers/devices in general.  I'd like to float an idea to
see if it's worth pursuing.

Changing the link order to get drivers to initialize in the right order
will always be a problem for someone--- the order will be right for some
people, but exactly wrong for others.  And the problem is made worse for
Device Tree-based systems, where just about everything including IRQ
descriptors are created on a demand and/or as-available basis.  What if
we let the kernel sort those dependencies out for us, at runtime?  I
think it's possible, and I can't be the only one who would like to see
this happen.

There are two parts to my idea for a solution.  First part is to modify
do_initcalls() so that it launches each initcall function in its own
kernel thread.  Wait, don't panic yet!

Second, we create/modify functions like gpio_request() so that if they
fail, they can optionally stop in a wait queue until the call could
succeed.  Then gpiochip_add() would signal that queue each time a new
gpiochip was added. Functions like request_irq() and clk_get()--- and
probably many others--- would do the same thing, but with their own wait
queues.

Something like this:

static wait_queue_head_t requestor_wq;

int gpiochip_add(struct gpio_chip *chip)
{
...
wake_up_interruptible(requestor_wq);
return status;
}

int gpio_request_wait(unsigned gpio, const char *label)
{
return wait_event_interruptible(requestor_wq,
!gpio_request(gpio, label));
}


Or we might want to make the above code be the actual gpio_request(),
and bury the wait_event_interruptible() inside of that.  Doing so would
prevent having to touch a lot of code, but would definitely change the
behavior of gpio_request().  Not sure which approach is best.

Finally, I think that do_initcalls() would turn into something like this:

static void __init do_initcalls(void)
{
initcall_t *fn;

for (fn = __early_initcall_end; fn  __initcall_end; fn++)
kthread_create(do_one_initcall, *fn, do_initcalls);

flush_scheduled_work();
}


If someone doesn't tell me this is a stupid idea, I might post it to
lkml.  Now's your chance!  :)


b.g.

-- 
Bill Gatliff
b...@billgatliff.com

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] fsl_booke: Correct test for MMU_FTR_BIG_PHYS

2010-04-09 Thread Becky Bruce

The code was looking for this in cpu_features, not mmu_features.  Fix this.

Signed-off-by: Becky Bruce bec...@kernel.crashing.org
---
 arch/powerpc/mm/fsl_booke_mmu.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index c539472..533cae5 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -116,7 +116,7 @@ void loadcam_entry(int idx)
mtspr(SPRN_MAS2, TLBCAM[idx].MAS2);
mtspr(SPRN_MAS3, TLBCAM[idx].MAS3);
 
-   if (cur_cpu_spec-cpu_features  MMU_FTR_BIG_PHYS)
+   if (cur_cpu_spec-mmu_features  MMU_FTR_BIG_PHYS)
mtspr(SPRN_MAS7, TLBCAM[idx].MAS7);
 
asm volatile(isync;tlbwe;isync : : : memory);
@@ -152,7 +152,7 @@ static void settlbcam(int index, unsigned long virt, 
phys_addr_t phys,
 
TLBCAM[index].MAS3 = (phys  MAS3_RPN) | MAS3_SX | MAS3_SR;
TLBCAM[index].MAS3 |= ((flags  _PAGE_RW) ? MAS3_SW : 0);
-   if (cur_cpu_spec-cpu_features  MMU_FTR_BIG_PHYS)
+   if (cur_cpu_spec-mmu_features  MMU_FTR_BIG_PHYS)
TLBCAM[index].MAS7 = (u64)phys  32;
 
 #ifndef CONFIG_KGDB /* want user access for breakpoints */
-- 
1.6.0.6

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: A better way to sequence driver initialization?

2010-04-09 Thread Bill Gatliff

Bill Gatliff wrote:
 If someone doesn't tell me this is a stupid idea, I might post it to
 lkml.  Now's your chance!  :)
   

So I went ahead and tried it anyway:


$ git diff init/main.c
diff --git a/init/main.c b/init/main.c
index dac44a9..1461d09 100644
--- a/init/main.c
+++ b/init/main.c
@@ -753,7 +753,11 @@ static void __init do_initcalls(void)
initcall_t *fn;

for (fn = __early_initcall_end; fn  __initcall_end; fn++)
+#if 1
+   kthread_run(do_one_initcall, *fn, do_initcalls);
+#else
do_one_initcall(*fn);
+#endif

/* Make sure there is no pending stuff from the initcall sequence */
flush_scheduled_work();

Interestingly, things didn't blow up as quickly as I had expected they
would.  :)

Booting with initcall_debug=1 both with and without the kthread_run()
mod has proven insightful.  The first OOPS I get with the mod looks like
this:

...
calling  ch_init+0x0/0x18 @ 325
Unable to handle kernel NULL pointer dereference at virtual address

pgd = c0004000
[] *pgd= 
Internal error: Oops: 5 [#4] PREEMPT 
last sysfs file:
Modules linked in: 
CPU: 0 Tainted: GD  (2.6.33-rc5-csb737-00010-gaba040a-dirty #59)
PC is at kset_find_obj+0x18/0x98
LR is at kset_find_obj+0x18/0x98
pc : [c018156c] lr : [c018156c] psr: 2013
sp : c3937f68  ip : c3937f68  fp : 
r10:   r9 :   r8 : 
r7 : c0396a6f  r6 :   r5 : 0005  r4 : c03f03b0
r3 : 0003  r2 : c3936000  r1 :   r0 : 0001
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 20004000  DAC: 0017
Process do_initcalls (pid: 325, stack limit = 0xc3936270) 
Stack: (0xc3937f68 to 0xc3938000) 
7f60: c03fc050 c03f03b0 0005 c03f03b0  c01c471c
7f80: c03f03b0 0005 c001e62c   c024007c 
1b58d32c
7fa0: 0005 c002c39c c381bfac c001e62c 1b58d32c 0005 
c3937fd4
7fc0: c381bfac c001e62c c002c340 c005cb3c   c3937fd8
c3937fd8
7fe0:      c002df78 

[c018156c] (kset_find_obj+0x18/0x98) from [c01c471c]
(driver_register+0x88/0x150)
[c01c471c] (driver_register+0x88/0x150) from [c024007c]
(__hid_register_driver+0x38/0x70) 
[c024007c] (__hid_register_driver+0x38/0x70) from [c002c39c]
(do_one_initcall+0x5c/0x1c4) 
[c002c39c] (do_one_initcall+0x5c/0x1c4) from [c005cb3c]
(kthread+0x78/0x80)
[c005cb3c] (kthread+0x78/0x80) from [c002df78]
(kernel_thread_exit+0x0/0x8)
Code: e24dd004 e3a1 e1a07001 ebfaf76c (e5963000)
---[ end trace f9bb68fd55862436 ]--- 
note: do_initcalls[325] exited with preempt_count 1
...

There is a ch_init() function in drivers/hid/hid-cherry.c.  I think that
function is calling hid_register_driver() before hid_init() has set up
hid_bus_type.  That causes kset_find_obj() to die because the kset
structure it gets passed in ch_init() refers to a bus that doesn't exist
yet.  It's a plain 'ole initialization race.

Maybe there are fewer places that would need wait queues than I
originally thought!  At least for drivers that use the device API,
kset_find_obj() might be the only place that needs to wait until a
device or bus appears.  If I get a second next week, I might hack one in
there and see how much farther I get...


b.g.

-- 
Bill Gatliff
b...@billgatliff.com

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: A better way to sequence driver initialization?

2010-04-09 Thread Bill Gatliff

Bill Gatliff wrote:
 Maybe there are fewer places that would need wait queues than I
 originally thought!  At least for drivers that use the device API,
 kset_find_obj() might be the only place that needs to wait until a
 device or bus appears.

... and kset_register() might be the only place that calls
wake_up_interruptible().  Wow.


b.g.

-- 
Bill Gatliff
b...@billgatliff.com

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: A better way to sequence driver initialization?

2010-04-09 Thread Bill Gatliff

Bill Gatliff wrote:

 ... and kset_register() might be the only place that calls
 wake_up_interruptible().  Wow.
   

Wow, indeed.  With just the attached patch, I get exactly one OOPS
now--- in ssc_free(), which I think is getting called because
atmel_ssc_modinit() is racing with something.

I wasn't able to get all the way to mounting my NFS rootfs, I think
because I'm getting to ip_auto_config() before my ethernet controller
has registered!  :)


$ git diff lib/kobject.c
diff --git a/lib/kobject.c b/lib/kobject.c
index b512b74..e75556c 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -18,6 +18,10 @@
 #include linux/stat.h
 #include linux/slab.h

+#include linux/sched.h
+#include linux/wait.h
+DECLARE_WAIT_QUEUE_HEAD(kset_wait);
+
 /*
  * populate_dir - populate directory with attributes.
  * @kobj: object we're working on.
@@ -721,6 +725,7 @@ int kset_register(struct kset *k)
if (err)
return err;
kobject_uevent(k-kobj, KOBJ_ADD);
+   wake_up_interruptible(kset_wait);
return 0;
 }

@@ -744,7 +749,7 @@ void kset_unregister(struct kset *k)
  * looking for a matching kobject. If matching object is found
  * take a reference and return the object.
  */
-struct kobject *kset_find_obj(struct kset *kset, const char *name)
+struct kobject *__kset_find_obj(struct kset *kset, const char *name)
 {
struct kobject *k;
struct kobject *ret = NULL;
@@ -760,6 +765,14 @@ struct kobject *kset_find_obj(struct kset *kset,
const char *name)
return ret;
 }

+struct kobject *kset_find_obj(struct kset *kset, const char *name)
+{
+   struct kobject *ret = NULL;
+   wait_event_interruptible(kset_wait, (ret = __kset_find_obj(kset,
name)));
+   return ret;
+}
+
 static void kset_release(struct kobject *kobj)
 {
struct kset *kset = container_of(kobj, struct kset, kobj);


b.g.

-- 
Bill Gatliff
b...@billgatliff.com

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: Continual reading from the PowerPc time base register is not stable

2010-04-09 Thread Csdncannon

Sorry, I attached the wrong log, this attachment is the right one.


2010/3/25 Csdncannon csdncan...@gmail.com

  In my program, the value of the 64-bit time base register is read
 out, and you will find the later value is even smaller than the earlier
 value from the log “log_timebase”. While the kernel depends on the accuracy
 of the timebase for the compensation of the lost PIT interrupt, the negative
 value between two continual timebase reading will bring to the jump of the
 jiffies. And this timebase problem will bring to the instability of the
 gettimeofday system call.

  Do you have any idea about this problem, thanks for your any
 advice. Attached is the code and log.


 Thanks

 Gino






gettime.log
Description: Binary data
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: A better way to sequence driver initialization?

2010-04-09 Thread Grant Likely

On Fri, Apr 9, 2010 at 1:23 PM, Bill Gatliff b...@billgatliff.com wrote:
 Guys:


 My recent post, Requesting a GPIO that hasn't been registered yet, and
 Anton's reply thereto (thanks, Anton!) on linuxppc-dev got me thinking
 about the problem of dependencies between devices in different  classes,
 and/or between drivers/devices in general.  I'd like to float an idea to
 see if it's worth pursuing.

 Changing the link order to get drivers to initialize in the right order
 will always be a problem for someone--- the order will be right for some
 people, but exactly wrong for others.  And the problem is made worse for
 Device Tree-based systems, where just about everything including IRQ
 descriptors are created on a demand and/or as-available basis.  What if
 we let the kernel sort those dependencies out for us, at runtime?  I
 think it's possible, and I can't be the only one who would like to see
 this happen.

 There are two parts to my idea for a solution.  First part is to modify
 do_initcalls() so that it launches each initcall function in its own
 kernel thread.  Wait, don't panic yet!

Is initcall the right granularity?  Shouldn't it be that each .probe()
hook gets its own thread?  Otherwise one device missing its resources
could block another device using the same driver (whose resources are
available) from probing.  Regardless, parallel probe has be attempted
and failed before.  There are lots of fiddly bits to get right.

In fact, there *used* to be code in the kernel that does exactly that.
 It was put in 2.6.20, but removed in 2.6.21-rc1.  Here's the relevant
commits, and a very interesting thread discussing the issues:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=21c7f30b1d3f8a3de3128478daca3ce203fc8733
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5adc55da4a7758021bcc374904b0f8b076508a11
http://lkml.indiana.edu/hypermail/linux/kernel/0705.1/0205.html

It is pointed out in that thread that a big part of the problem is
that a large number of drivers in the tree just aren't safe for
multithreaded probing which is kind of a showstopper.  Now, maybe
doing it at the initcall level makes it less scary and more sane, but
I suspect that it will still expose a lot of broken code that assumes
things are already set up because it has always been that way.

 Second, we create/modify functions like gpio_request() so that if they
 fail, they can optionally stop in a wait queue until the call could
 succeed.  Then gpiochip_add() would signal that queue each time a new
 gpiochip was added. Functions like request_irq() and clk_get()--- and
 probably many others--- would do the same thing, but with their own wait
 queues.

You could take a look at the bus notifier code too.  I made an attempt
at solving a similar problem between Ethernet MAC drivers and binding
against the correct phy.  I ended up solving the problem in a
different way, but the experimenting I did convince me that it is a
viable concept.

Basically, if the driver depends on another resource/device then it
can register a callback to be notified when a new device becomes
available.  Once the needed device shows up (or immediately if the
device is already there), the driver gets called back and it can
complete registration.

Not quite the same model that you are talking about here, but it would
solve the problem on a per-driver basis.

 Something like this:

    static wait_queue_head_t requestor_wq;

    int gpiochip_add(struct gpio_chip *chip)
    {
        ...
        wake_up_interruptible(requestor_wq);
        return status;
    }

    int gpio_request_wait(unsigned gpio, const char *label)
    {
        return wait_event_interruptible(requestor_wq,
 !gpio_request(gpio, label));
    }


 Or we might want to make the above code be the actual gpio_request(),
 and bury the wait_event_interruptible() inside of that.  Doing so would
 prevent having to touch a lot of code, but would definitely change the
 behavior of gpio_request().  Not sure which approach is best.

 Finally, I think that do_initcalls() would turn into something like this:

    static void __init do_initcalls(void)
    {
            initcall_t *fn;

            for (fn = __early_initcall_end; fn  __initcall_end; fn++)
                    kthread_create(do_one_initcall, *fn, do_initcalls);

            flush_scheduled_work();
    }


 If someone doesn't tell me this is a stupid idea, I might post it to
 lkml.  Now's your chance!  :)

Have you dug into the Arjan's asynchronous function call infrastructure?

http://lwn.net/Articles/314808/

g.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/5] sched: asymmetrical packing for POWER7 SMT4

[PATCH 1/5] sched: fix capacity calculations for SMT4

[PATCH 2/5] sched: add asymmetric packing option for sibling domain

[PATCH 3/5] powerpc: enabled asymmetric SMT scheduling on POWER7

[PATCH 5/5] sched: make fix_small_imbalance work with asymmetric packing

[PATCH 4/5] sched: Mark the balance type for use in need_active_balance()

Re: [PATCH 3/5] powerpc: enabled asymmetric SMT scheduling on POWER7

[PATCH v2 net-next 0/2] timestamping support for gianfar

[PATCH v2 net-next 1/2] gianfar: Add hardware RX timestamping support

[PATCH v2 net-next 2/2] gianfar: Add hardware TX timestamping support

Re: [-next April 8] eHEA driver failure on powerpc

Re: [PATCH 2/2] [V5] Add non-Virtex5 support for LL TEMAC driver

RE: Upgrade to 2.6.26 results in Oops during request_irq

A better way to sequence driver initialization?

[PATCH] fsl_booke: Correct test for MMU_FTR_BIG_PHYS

Re: A better way to sequence driver initialization?

Re: A better way to sequence driver initialization?

Re: A better way to sequence driver initialization?

Re: Continual reading from the PowerPc time base register is not stable

Re: A better way to sequence driver initialization?

20 matches

Site Navigation

Mail list logo

Footer information