from:"Paul Mackerras"

Re: AGP bogosities

2005-03-10 Thread Paul Mackerras

Jesse Barnes writes:

> I have a system in my office with several gfx pipes on different AGP busses, 
> and I'd like that to work well too! :)

Interesting.  Could you post the output of lspci -v from that system?

What is the relationship in the PCI device tree between the video
cards and their bridges?  Is there for instance only one AGP bridge
per host bridge?

Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

AGP bogosities

2005-03-10 Thread Paul Mackerras

Linus,

I see that you did a cset -x on a changeset from Dave Jones that added
a bogus test for which AGP bridge a device is under.  That has left us
with code in agp_collect_device_status that will never find any device
(just take a look and you'll see).

In fact there are other bogosities in drivers/char/agp/generic.c.  I
can't believe Dave ever tested that code with an AGP 3.0 device.  If
you pass in a mode that has the AGP 3.0 bit set, agp_v3_parse_one()
will first clear that bit (and print a message), and then complain
because you haven't got that bit set in the mode, with a message that
the caller is broken.  Furthermore, if the mode passed in has both the
4x and 8x bits set, the new code will give you 4x where the old code
would give you 8x (which is what the caller wanted).
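
To illustrate the rate handling described above, here is a minimal
standalone sketch, not the kernel code.  It assumes the AGP 3.0 encoding
of the 3-bit rate field (bit 0 means 4x, bit 1 means 8x); the names
SKETCH_RATE_* and sketch_normalize_agp3_rate() are made up for this
example.

```c
#include <stdint.h>

/* Assumed AGP 3.0 rate field layout; names are hypothetical. */
#define SKETCH_RATE_4X   0x1u
#define SKETCH_RATE_8X   0x2u
#define SKETCH_RATE_MASK 0x7u

/*
 * Normalize the rate bits of a requested AGP 3.0 mode:
 *  - no rate bit set: fall back to 4x
 *  - 4x and 8x both set (or anything larger): keep only 8x,
 *    which is what a caller asking for both wants
 * All bits outside the rate field are left untouched.
 */
static uint32_t sketch_normalize_agp3_rate(uint32_t mode)
{
    uint32_t rate = mode & SKETCH_RATE_MASK;

    if (rate == 0)
        mode |= SKETCH_RATE_4X;
    else if (rate >= 3)
        mode = (mode & ~SKETCH_RATE_MASK) | SKETCH_RATE_8X;

    return mode;
}
```

A request with both rate bits set (0x3) comes back as 8x only, while a
request with a single valid rate bit is returned unchanged.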

The patch below fixes these problems.  It will work in the 99.99% of
cases where we have one AGP bridge and one AGP video card.  We should
eventually cope with multiple AGP bridges, but doing the matching of
bridges to video cards is a hard problem because the video card is not
necessarily a child or sibling of the PCI device that we use for
controlling the AGP bridge.  I think we need to see an actual example
of a system with multiple AGP bridges first.

Oh, and by the way, I have 3D working relatively well on my G5 with a
64-bit kernel (and 32-bit X server and clients), which is why I care
about AGP 3.0 support. :)

Paul.

diff -urN linux-2.5/drivers/char/agp/agp.h g5-bad/drivers/char/agp/agp.h
--- linux-2.5/drivers/char/agp/agp.h 2005-03-07 14:01:44.0 +1100
+++ g5/drivers/char/agp/agp.h   2005-03-11 11:54:54.0 +1100
@@ -322,7 +322,7 @@
 #define AGPCTRL_GTLBEN (1<<7)
 
 #define AGP2_RESERVED_MASK 0x00fffcc8
-#define AGP3_RESERVED_MASK 0x00ff00cc
+#define AGP3_RESERVED_MASK 0x00ff00c4
 
 #define AGP_ERRATA_FASTWRITES 1<<0
 #define AGP_ERRATA_SBA  1<<1
diff -urN linux-2.5/drivers/char/agp/generic.c g5-bad/drivers/char/agp/generic.c
--- linux-2.5/drivers/char/agp/generic.c 2005-03-11 11:47:37.0 +1100
+++ g5/drivers/char/agp/generic.c   2005-03-11 12:08:29.0 +1100
@@ -515,13 +515,9 @@
printk (KERN_INFO PFX "%s tried to set rate=x0. Setting to AGP3 x4 mode.\n", current->comm);
*requested_mode |= AGPSTAT3_4X;
}
-   if (tmp == 3) {
-   printk (KERN_INFO PFX "%s tried to set rate=x3. Setting to AGP3 x4 mode.\n", current->comm);
-   *requested_mode |= AGPSTAT3_4X;
-   }
-   if (tmp >3) {
-   printk (KERN_INFO PFX "%s tried to set rate=x%d. Setting to AGP3 x8 mode.\n", current->comm, tmp);
-   *requested_mode |= AGPSTAT3_8X;
+   if (tmp >= 3) {
+   printk (KERN_INFO PFX "%s tried to set rate=x%d. Setting to AGP3 x8 mode.\n", current->comm, tmp * 4);
+   *requested_mode = (*requested_mode & ~7) | AGPSTAT3_8X;
}
 
/* ARQSZ - Set the value to the maximum one.
@@ -642,11 +638,6 @@
return 0;
}
cap_ptr = pci_find_capability(device, PCI_CAP_ID_AGP);
-   if (!cap_ptr) {
-   pci_dev_put(device);
-   continue;
-   }
-   cap_ptr = 0;
}
 
/*

[PATCH] PPC64 NUMA memory fixup

2005-03-08 Thread Paul Mackerras

This patch is from Mike Kravetz <[EMAIL PROTECTED]>.

When I booted my new 720 on a kernel configured for NUMA, I received
the following during bootup:

WARNING: Unexpected node layout: region start 4400 length 200
NUMA is disabled

This is due to memory 'holes' within nodes.  If such holes are
encountered, then NUMA is disabled.  The following patch adds support
for such configurations.  My 720 now boots with the following message:

[boot]0012 Setup Arch
Node 0 Memory: 0x0-0x800 0x4400-0x12a00
Node 1 Memory: 0x800-0x4400 0x12a00-0x1ea00
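
The bookkeeping behind this can be sketched in a few lines.  This is a
simplified standalone illustration rather than the patch itself: the
struct and function names are hypothetical, page frame numbers are
passed in directly, and the span is grown with explicit min/max checks
so that regions may arrive in any order.

```c
/* Hypothetical per-node record: span plus pages actually present. */
struct sketch_node_span {
    unsigned long start_pfn;      /* first page frame of the node */
    unsigned long end_pfn;        /* one past the last frame; 0 = empty */
    unsigned long present_pages;  /* pages really backed by memory */
};

/* Add one memory region to a node, tolerating gaps between regions. */
static void sketch_node_add_region(struct sketch_node_span *n,
                                   unsigned long start_pfn,
                                   unsigned long nr_pages)
{
    unsigned long end_pfn = start_pfn + nr_pages;

    if (n->end_pfn == 0) {
        /* First region seen for this node. */
        n->start_pfn = start_pfn;
        n->end_pfn = end_pfn;
    } else {
        /* Grow the span in whichever direction is needed. */
        if (start_pfn < n->start_pfn)
            n->start_pfn = start_pfn;
        if (end_pfn > n->end_pfn)
            n->end_pfn = end_pfn;
    }
    n->present_pages += nr_pages;
}

/* Pages inside the span that are not backed by memory (the holes). */
static unsigned long sketch_node_hole_pages(const struct sketch_node_span *n)
{
    return (n->end_pfn - n->start_pfn) - n->present_pages;
}
```

Keeping the end of the span and the count of present pages as separate
fields is what lets a node contain a hole: the span covers the gap, the
present-page count does not.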

Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -Naupr linux-2.6.11-rc3/arch/ppc64/mm/numa.c 
linux-2.6.11-rc3.work/arch/ppc64/mm/numa.c
--- linux-2.6.11-rc3/arch/ppc64/mm/numa.c   2005-02-03 01:57:16.0 
+
+++ linux-2.6.11-rc3.work/arch/ppc64/mm/numa.c  2005-03-01 19:39:21.0 
+
@@ -40,7 +40,6 @@ int nr_cpus_in_node[MAX_NUMNODES] = { [0
 
 struct pglist_data *node_data[MAX_NUMNODES];
 bootmem_data_t __initdata plat_node_bdata[MAX_NUMNODES];
-static unsigned long node0_io_hole_size;
 static int min_common_depth;
 
 /*
@@ -49,7 +48,8 @@ static int min_common_depth;
  */
 static struct {
unsigned long node_start_pfn;
-   unsigned long node_spanned_pages;
+   unsigned long node_end_pfn;
+   unsigned long node_present_pages;
 } init_node_data[MAX_NUMNODES] __initdata;
 
 EXPORT_SYMBOL(node_data);
@@ -348,33 +348,28 @@ new_range:
if (max_domain < numa_domain)
max_domain = numa_domain;
 
-   /* 
-* For backwards compatibility, OF splits the first node
-* into two regions (the first being 0-4GB). Check for
-* this simple case and complain if there is a gap in
-* memory
+   /*
+* Initialize new node struct, or add to an existing one.
 */
-   if (init_node_data[numa_domain].node_spanned_pages) {
-   unsigned long shouldstart =
-   init_node_data[numa_domain].node_start_pfn +
-   init_node_data[numa_domain].node_spanned_pages;
-   if (shouldstart != (start / PAGE_SIZE)) {
-   /* Revert to non-numa for now */
-   printk(KERN_ERR
-  "WARNING: Unexpected node layout: "
-  "region start %lx length %lx\n",
-  start, size);
-   printk(KERN_ERR "NUMA is disabled\n");
-   goto err;
-   }
-   init_node_data[numa_domain].node_spanned_pages +=
+   if (init_node_data[numa_domain].node_end_pfn) {
+   if ((start / PAGE_SIZE) <
+   init_node_data[numa_domain].node_start_pfn)
+   init_node_data[numa_domain].node_start_pfn =
+   start / PAGE_SIZE;
+   else
+   init_node_data[numa_domain].node_end_pfn =
+   (start / PAGE_SIZE) +
+   (size / PAGE_SIZE);
+
+   init_node_data[numa_domain].node_present_pages +=
size / PAGE_SIZE;
} else {
node_set_online(numa_domain);
 
init_node_data[numa_domain].node_start_pfn =
start / PAGE_SIZE;
-   init_node_data[numa_domain].node_spanned_pages =
+   init_node_data[numa_domain].node_end_pfn =
+   init_node_data[numa_domain].node_start_pfn +
size / PAGE_SIZE;
}
 
@@ -391,14 +386,6 @@ new_range:
node_set_online(i);
 
return 0;
-err:
-   /* Something has gone wrong; revert any setup we've done */
-   for_each_node(i) {
-   node_set_offline(i);
-   init_node_data[i].node_start_pfn = 0;
-   init_node_data[i].node_spanned_pages = 0;
-   }
-   return -1;
 }
 
 static void __init setup_nonnuma(void)
@@ -426,12 +413,11 @@ static void __init setup_nonnuma(void)
node_set_online(0);
 
init_node_data[0].node_start_pfn = 0;
-   init_node_data[0].node_spanned_pages = lmb_end_of_DRAM() / PAGE_SIZE;
+   init_node_data[0].node_end_pfn = lmb_end_of_DRAM() / PAGE_SIZE;
+   init_node_data[0].node_present_pages = total_ram / PAGE_SIZE;
 
for (i = 0 ; i < top_of_ram; i += MEMORY_INCREMENT)
numa_memory_lookup_table[i >> MEMORY_INCREMENT_S

[PATCH] PPC64 fix eeh.h compile warnings

2005-03-08 Thread Paul Mackerras

This patch is from Nathan Lynch <[EMAIL PROTECTED]>.

Use static inlines instead of #defines for stub functions when
CONFIG_EEH=n, to eliminate "statement with no effect" warnings with
some toolchains.
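
As a rough illustration of the difference, consider the sketch below.
SKETCH_FEATURE stands in for CONFIG_EEH and all function names are
hypothetical.  With a macro stub such as "#define f(dn, dev) (0)", a
call used as a statement expands to "(0);", which is what triggers the
warning; an empty static inline is still a real function call and also
type-checks its arguments.

```c
/* Hypothetical feature switch standing in for CONFIG_EEH. */
#ifdef SKETCH_FEATURE
void sketch_init(void);
unsigned long sketch_check_failure(const volatile void *token,
                                   unsigned long val);
#else
/* Stubs: compile to nothing, but keep prototypes and type checking. */
static inline void sketch_init(void) { }

static inline unsigned long sketch_check_failure(const volatile void *token,
                                                 unsigned long val)
{
    (void)token;   /* unused in the stub */
    return val;    /* pass the value through unchanged */
}
#endif

/* A caller that works the same way with or without the feature. */
static unsigned long sketch_read(const volatile unsigned long *reg)
{
    unsigned long val = *reg;

    sketch_init();   /* call in statement position, no warning */
    return sketch_check_failure(reg, val);
}
```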

Signed-off-by: Nathan Lynch <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

Index: linux-2.6.11/include/asm-ppc64/eeh.h
===
--- linux-2.6.11.orig/include/asm-ppc64/eeh.h   2005-03-02 07:38:38.0 
+
+++ linux-2.6.11/include/asm-ppc64/eeh.h2005-03-03 01:39:25.0 
+
@@ -104,17 +104,30 @@ int eeh_unregister_notifier(struct notif
  */
 #define EEH_IO_ERROR_VALUE(size)   (~0U >> ((4 - (size)) * 8))
 
-#else
-#define eeh_init()
-#define eeh_check_failure(token, val) (val)
-#define eeh_dn_check_failure(dn, dev) (0)
-#define pci_addr_cache_build()
-#define eeh_add_device_early(dn)
-#define eeh_add_device_late(dev)
-#define eeh_remove_device(dev)
+#else /* !CONFIG_EEH */
+static inline void eeh_init(void) { }
+
+static inline unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
+{
+   return val;
+}
+
+static inline int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
+{
+   return 0;
+}
+
+static inline void pci_addr_cache_build(void) { }
+
+static inline void eeh_add_device_early(struct device_node *dn) { }
+
+static inline void eeh_add_device_late(struct pci_dev *dev) { }
+
+static inline void eeh_remove_device(struct pci_dev *dev) { }
+
 #define EEH_POSSIBLE_ERROR(val, type) (0)
 #define EEH_IO_ERROR_VALUE(size) (-1UL)
-#endif
+#endif /* CONFIG_EEH */
 
 /* 
  * MMIO read/write operations with EEH support.

[PATCH] PPC64 call idle_task_exit with irqs disabled

2005-03-08 Thread Paul Mackerras

This patch is from Nathan Lynch <[EMAIL PROTECTED]>.

Seeing this very occasionally during cpu hotplug testing:

 Badness in slb_flush_and_rebolt at arch/ppc64/mm/slb.c:52
 Call Trace:
 [c000ef0efbe0] [c00127a0] .__switch_to+0xa4/0xf0 (unreliable)
 [c000ef0efc80] [c0050178] .idle_task_exit+0xbc/0x15c
 [c000ef0efd10] [c000d108] .cpu_die+0x18/0x68
 [c000ef0efd90] [c001023c] .dedicated_idle+0x1fc/0x254
 [c000ef0efe80] [c000fc80] .cpu_idle+0x3c/0x54
 [c000ef0eff00] [c003aa90] .start_secondary+0x108/0x148
 [c000ef0eff90] [c000bd28] .enable_64b_mode+0x0/0x28

idle_task_exit can result in a call to slb_flush_and_rebolt, which
must not be called with interrupts enabled.  Make the call with
interrupts disabled.


Signed-off-by: Nathan Lynch <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

 pSeries_setup.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.11-bk2/arch/ppc64/kernel/pSeries_setup.c
===
--- linux-2.6.11-bk2.orig/arch/ppc64/kernel/pSeries_setup.c 2005-03-07 
04:09:29.0 +
+++ linux-2.6.11-bk2/arch/ppc64/kernel/pSeries_setup.c  2005-03-07 
04:15:22.0 +
@@ -322,8 +322,8 @@ static  void __init pSeries_discover_pic
 
 static void pSeries_mach_cpu_die(void)
 {
-   idle_task_exit();
local_irq_disable();
+   idle_task_exit();
/* Some hardware requires clearing the CPPR, while other hardware does not
 * it is safe either way
 */

[PATCH] PPC64 update irq affinity mask when migrating irqs

2005-03-08 Thread Paul Mackerras

This patch is from Nathan Lynch <[EMAIL PROTECTED]>.

When offlining a cpu, any device interrupts which are bound to the cpu
have their affinity forcibly reset to all cpus (the default).
However, the value in /proc/irq/XXX/smp_affinity remains unchanged.
Since we're doing this while all the other cpus are stopped, it should
be safe to just call desc->handler->set_affinity and manually update
the irq_affinity array.


Signed-off-by: Nathan Lynch <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

 xics.c |   11 ++-
 1 files changed, 2 insertions(+), 9 deletions(-)

Index: linux-2.6.11-bk2/arch/ppc64/kernel/xics.c
===
--- linux-2.6.11-bk2.orig/arch/ppc64/kernel/xics.c  2005-03-02 
07:38:10.0 +
+++ linux-2.6.11-bk2/arch/ppc64/kernel/xics.c   2005-03-07 03:52:08.0 
+
@@ -704,15 +704,8 @@ void xics_migrate_irqs_away(void)
   virq, cpu);
 
/* Reset affinity to all cpus */
-   xics_status[0] = default_distrib_server;
-
-   status = rtas_call(ibm_set_xive, 3, 1, NULL, irq,
-   xics_status[0], xics_status[1]);
-   if (status)
-   printk(KERN_ERR "migrate_irqs_away: irq=%d "
-   "ibm,set-xive returns %d\n",
-   virq, status);
-
+   desc->handler->set_affinity(virq, CPU_MASK_ALL);
+   irq_affinity[virq] = CPU_MASK_ALL;
 unlock:
spin_unlock_irqrestore(&desc->lock, flags);
}

[PATCH] PPC64 error code cleanups for rtas wrappers

2005-03-08 Thread Paul Mackerras

This patch is from John Rose <[EMAIL PROTECTED]>

This patch changes the rtas wrapper functions in rtas.c to map RTAS
failure codes to conventional error values.  The goal is to make
failure conditions obvious in the wrapper functions and in the caller
code.
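
Seen from a caller, the convention amounts to the following standalone
sketch.  The RTAS status values and their errno mappings are the ones
used in the diff; the sketch_ function names are hypothetical and the
sketch omits the printk for unexpected codes.

```c
#include <errno.h>

/* Map a negative RTAS status to a conventional -errno value. */
static int sketch_rtas_error_rc(int rtas_rc)
{
    switch (rtas_rc) {
    case -1:    return -EIO;     /* hardware error */
    case -3:    return -EINVAL;  /* bad indicator/domain/etc */
    case -9000: return -EFAULT;  /* isolation error */
    case -9001: return -EEXIST;  /* outstanding TCE/PTE */
    case -9002: return -ENODEV;  /* no usable slot */
    default:    return -ERANGE;  /* unexpected RTAS error */
    }
}

/*
 * What each wrapper does with the final status: negative values are
 * translated, zero and positive values are passed through untouched.
 */
static int sketch_wrap_status(int rtas_rc)
{
    if (rtas_rc < 0)
        return sketch_rtas_error_rc(rtas_rc);
    return rtas_rc;
}
```

Because non-negative statuses pass through unchanged, callers test for
failure with "rc < 0" rather than "rc != 0".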

Signed-off-by: John Rose <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -puN arch/ppc64/kernel/pSeries_smp.c~01_rtas_rcs 
arch/ppc64/kernel/pSeries_smp.c
--- 2_6_linus_3/arch/ppc64/kernel/pSeries_smp.c~01_rtas_rcs 2005-03-02 
14:50:33.0 -0600
+++ 2_6_linus_3-johnrose/arch/ppc64/kernel/pSeries_smp.c2005-03-02 
14:50:33.0 -0600
@@ -151,7 +151,7 @@ static unsigned int find_physical_cpu_to
if (index) {
int state;
int rc = rtas_get_sensor(9003, *index, &state);
-   if (rc != 0 || state != 1)
+   if (rc < 0 || state != 1)
continue;
}
 
diff -puN arch/ppc64/kernel/rtas.c~01_rtas_rcs arch/ppc64/kernel/rtas.c
--- 2_6_linus_3/arch/ppc64/kernel/rtas.c~01_rtas_rcs2005-03-02 
14:50:33.0 -0600
+++ 2_6_linus_3-johnrose/arch/ppc64/kernel/rtas.c   2005-03-02 
14:50:33.0 -0600
@@ -255,29 +255,59 @@ rtas_extended_busy_delay_time(int status
return ms; 
 }
 
-int
-rtas_get_power_level(int powerdomain, int *level)
+int rtas_error_rc(int rtas_rc)
+{
+   int rc;
+
+   switch (rtas_rc) {
+   case -1:/* Hardware Error */
+   rc = -EIO;
+   break;
+   case -3:/* Bad indicator/domain/etc */
+   rc = -EINVAL;
+   break;
+   case -9000: /* Isolation error */
+   rc = -EFAULT;
+   break;
+   case -9001: /* Outstanding TCE/PTE */
+   rc = -EEXIST;
+   break;
+   case -9002: /* No usable slot */
+   rc = -ENODEV;
+   break;
+   default:
+   printk(KERN_ERR "%s: unexpected RTAS error %d\n",
+   __FUNCTION__, rtas_rc);
+   rc = -ERANGE;
+   break;
+   }
+   return rc;
+}
+
+int rtas_get_power_level(int powerdomain, int *level)
 {
int token = rtas_token("get-power-level");
int rc;
 
if (token == RTAS_UNKNOWN_SERVICE)
-   return RTAS_UNKNOWN_OP;
+   return -ENOENT;
 
while ((rc = rtas_call(token, 1, 2, level, powerdomain)) == RTAS_BUSY)
udelay(1);
+
+   if (rc < 0)
+   return rtas_error_rc(rc);
return rc;
 }
 
-int
-rtas_set_power_level(int powerdomain, int level, int *setlevel)
+int rtas_set_power_level(int powerdomain, int level, int *setlevel)
 {
int token = rtas_token("set-power-level");
unsigned int wait_time;
int rc;
 
if (token == RTAS_UNKNOWN_SERVICE)
-   return RTAS_UNKNOWN_OP;
+   return -ENOENT;
 
while (1) {
rc = rtas_call(token, 2, 2, setlevel, powerdomain, level);
@@ -289,18 +319,20 @@ rtas_set_power_level(int powerdomain, in
} else
break;
}
+
+   if (rc < 0)
+   return rtas_error_rc(rc);
return rc;
 }
 
-int
-rtas_get_sensor(int sensor, int index, int *state)
+int rtas_get_sensor(int sensor, int index, int *state)
 {
int token = rtas_token("get-sensor-state");
unsigned int wait_time;
int rc;
 
if (token == RTAS_UNKNOWN_SERVICE)
-   return RTAS_UNKNOWN_OP;
+   return -ENOENT;
 
while (1) {
rc = rtas_call(token, 2, 2, state, sensor, index);
@@ -312,18 +344,20 @@ rtas_get_sensor(int sensor, int index, i
} else
break;
}
+
+   if (rc < 0)
+   return rtas_error_rc(rc);
return rc;
 }
 
-int
-rtas_set_indicator(int indicator, int index, int new_value)
+int rtas_set_indicator(int indicator, int index, int new_value)
 {
int token = rtas_token("set-indicator");
unsigned int wait_time;
int rc;
 
if (token == RTAS_UNKNOWN_SERVICE)
-   return RTAS_UNKNOWN_OP;
+   return -ENOENT;
 
while (1) {
rc = rtas_call(token, 3, 1, NULL, indicator, index, new_value);
@@ -337,6 +371,8 @@ rtas_set_indicator(int indicator, int in
break;
}
 
+   if (rc < 0)
+   return rtas_error_rc(rc);
return rc;
 }
 
diff -puN arch/ppc64/kernel/rtasd.c~01_rtas_rcs arch/ppc64/kernel/rtasd.c
--- 2_6_linus_3/arch/ppc64/kernel/r

[PATCH] PPC64 error code cleanups rpa[php,dlpar]

2005-03-08 Thread Paul Mackerras

This patch is from John Rose <[EMAIL PROTECTED]>

This patch changes the RPA PCI Hotplug and DLPAR modules to use more
conventional error values for return codes.  The goal is to make failure
conditions obvious in the wrapper functions and in the caller code.

Signed-off-by: John Rose <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -puN drivers/pci/hotplug/rpaphp.h~02_rpaphp_rcs 
drivers/pci/hotplug/rpaphp.h
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp.h~02_rpaphp_rcs  2005-03-07 
17:52:20.0 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp.h   2005-03-07 
17:52:20.0 -0600
@@ -45,11 +45,6 @@
 #define LED_ID 2   /* slow blinking */
 #define LED_ACTION 3   /* fast blinking */
 
-/* Error status from rtas_get-sensor */
-#define NEED_POWER    -9000 /* slot must be power up and unisolated to get state */
-#define PWR_ONLY      -9001 /* slot must be powerd up to get state, leave isolated */
-#define ERR_SENSE_USE -9002 /* No DR operation will succeed, slot is unusable  */
-
 /* Sensor values from rtas_get-sensor */
 #define EMPTY   0  /* No card in slot */
 #define PRESENT 1  /* Card in slot */
diff -puN drivers/pci/hotplug/rpaphp_core.c~02_rpaphp_rcs 
drivers/pci/hotplug/rpaphp_core.c
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp_core.c~02_rpaphp_rcs 2005-03-07 
17:52:20.0 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp_core.c  2005-03-07 
17:52:20.0 -0600
@@ -256,12 +256,12 @@ int rpaphp_get_drc_props(struct device_n
my_index = (int *) get_property(dn, "ibm,my-drc-index", NULL);
if (!my_index) {
/* Node isn't DLPAR/hotplug capable */
-   return 1;
+   return -EINVAL;
}
 
rc = get_children_props(dn->parent, &indexes, &names, &types, &domains);
if (rc < 0) {
-   return 1;
+   return -EINVAL;
}
 
name_tmp = (char *) &names[1];
@@ -284,7 +284,7 @@ int rpaphp_get_drc_props(struct device_n
type_tmp += (strlen(type_tmp) + 1);
}
 
-   return 1;
+   return -EINVAL;
 }
 
 static int is_php_type(char *drc_type)
diff -puN drivers/pci/hotplug/rpaphp_pci.c~02_rpaphp_rcs 
drivers/pci/hotplug/rpaphp_pci.c
--- 2_6_linus_3/drivers/pci/hotplug/rpaphp_pci.c~02_rpaphp_rcs  2005-03-07 
17:52:20.0 -0600
+++ 2_6_linus_3-johnrose/drivers/pci/hotplug/rpaphp_pci.c   2005-03-07 
17:52:20.0 -0600
@@ -81,8 +81,8 @@ static int rpaphp_get_sensor_state(struc
 
rc = rtas_get_sensor(DR_ENTITY_SENSE, slot->index, state);
 
-   if (rc) {
-   if (rc == NEED_POWER || rc == PWR_ONLY) {
+   if (rc < 0) {
+   if (rc == -EFAULT || rc == -EEXIST) {
dbg("%s: slot must be power up to get sensor-state\n",
__FUNCTION__);
 
@@ -91,14 +91,14 @@ static int rpaphp_get_sensor_state(struc
 */
rc = rtas_set_power_level(slot->power_domain, POWER_ON,
  &setlevel);
-   if (rc) {
+   if (rc < 0) {
dbg("%s: power on slot[%s] failed rc=%d.\n",
__FUNCTION__, slot->name, rc);
} else {
rc = rtas_get_sensor(DR_ENTITY_SENSE,
 slot->index, state);
}
-   } else if (rc == ERR_SENSE_USE)
+   } else if (rc == -ENODEV)
info("%s: slot is unusable\n", __FUNCTION__);
else
err("%s failed to get sensor state\n", __FUNCTION__);
@@ -413,7 +413,7 @@ static int setup_pci_hotplug_slot_info(s
if (slot->hotplug_slot->info->adapter_status == NOT_VALID) {
err("%s: NOT_VALID: skip dn->full_name=%s\n",
__FUNCTION__, slot->dn->full_name);
-   return -1;
+   return -EINVAL;
}
return 0;
 }
@@ -426,15 +426,15 @@ static int set_phb_slot_name(struct slot
 
dn = slot->dn;
if (!dn) {
-   return 1;
+   return -EINVAL;
}
phb = dn->phb;
if (!phb) {
-   return 1;
+   return -EINVAL;
}
bus = phb->bus;
if (!bus) {
-   return 1;
+   return -EINVAL;
}
 
sprintf(slot->name, "%04x:%02x:%02x.%x", pci_domain_nr(bus),
@@ -448,7 +448,7 @@ static int setup_pci_slot(struct slot *s
 
if (slot->type == PHB) {
rc = set_phb_slot_name(slot);
-   if (rc) {
+   if (

[PATCH] PPC64 set pci_io_base dynamically if necessary

2005-03-07 Thread Paul Mackerras

This patch is from John Rose <[EMAIL PROTECTED]>.

Upon DLPAR addition of a PCI Host Bridge to a system with purely virtual
I/O, set pci_io_base as necessary.

Signed-off-by: John Rose <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/kernel/pSeries_pci.c 
test/arch/ppc64/kernel/pSeries_pci.c
--- linux-2.5/arch/ppc64/kernel/pSeries_pci.c   2005-01-12 18:20:48.0 
+1100
+++ test/arch/ppc64/kernel/pSeries_pci.c2005-03-07 21:04:02.0 
+1100
@@ -424,16 +424,18 @@
unsigned int root_size_cells = 0;
struct pci_controller *phb;
struct pci_bus *bus;
+   int primary;
 
root_size_cells = prom_n_size_cells(root);
 
+   primary = list_empty(&hose_list);
phb = alloc_phb_dynamic(dn, root_size_cells);
if (!phb)
return NULL;
 
pci_process_bridge_OF_ranges(phb, dn);
 
-   pci_setup_phb_io_dynamic(phb);
+   pci_setup_phb_io_dynamic(phb, primary);
of_node_put(root);
 
pci_devs_phb_init_dynamic(phb);
diff -urN linux-2.5/arch/ppc64/kernel/pci.c test/arch/ppc64/kernel/pci.c
--- linux-2.5/arch/ppc64/kernel/pci.c   2005-03-07 08:21:53.0 +1100
+++ test/arch/ppc64/kernel/pci.c2005-03-07 21:04:02.0 +1100
@@ -619,7 +619,8 @@
res->end += io_virt_offset;
 }
 
-void __devinit pci_setup_phb_io_dynamic(struct pci_controller *hose)
+void __devinit pci_setup_phb_io_dynamic(struct pci_controller *hose,
+   int primary)
 {
unsigned long size = hose->pci_io_size;
unsigned long io_virt_offset;
@@ -631,6 +632,9 @@
hose->global_number, hose->io_base_phys,
(unsigned long) hose->io_base_virt);
 
+   if (primary)
+   pci_io_base = (unsigned long)hose->io_base_virt;
+
io_virt_offset = (unsigned long)hose->io_base_virt - pci_io_base;
res = &hose->io_resource;
res->start += io_virt_offset;
diff -urN linux-2.5/arch/ppc64/kernel/pci.h test/arch/ppc64/kernel/pci.h
--- linux-2.5/arch/ppc64/kernel/pci.h   2005-01-12 18:20:48.0 +1100
+++ test/arch/ppc64/kernel/pci.h2005-03-07 21:06:52.0 +1100
@@ -16,8 +16,7 @@
 
 extern void pci_setup_pci_controller(struct pci_controller *hose);
 extern void pci_setup_phb_io(struct pci_controller *hose, int primary);
-
-extern void pci_setup_phb_io_dynamic(struct pci_controller *hose);
+extern void pci_setup_phb_io_dynamic(struct pci_controller *hose, int primary);
 
 
 extern struct list_head hose_list;

[PATCH] ppc64: kprobes: handle trap variants while processing probes

2005-03-07 Thread Paul Mackerras

This patch is from Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>.

While processing a kprobe, we were not handling all of the trap
variants available on PowerPC. This led to the breakage of BUG()
handling on ppc64.
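
For reference, the classification can be sketched as a standalone
function.  This is an illustration, not the patch: it decodes the
opcode fields explicitly instead of using mask-and-compare macros, and
it relies on the PowerPC encodings of the four trap forms (tdi is
primary opcode 2, twi is primary opcode 3, tw and td are primary opcode
31 with extended opcodes 4 and 68).  The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if insn is any form of the PowerPC trap instruction. */
static bool sketch_is_trap_variant(uint32_t insn)
{
    uint32_t primary = insn >> 26;          /* top 6 bits */
    uint32_t xo = (insn >> 1) & 0x3ffu;     /* 10-bit extended opcode */

    if (primary == 2 || primary == 3)       /* tdi, twi */
        return true;
    if (primary == 31 && (xo == 4 || xo == 68))  /* tw, td */
        return true;
    return false;
}
```

The unconditional "trap" used for breakpoints is itself a tw encoding,
so a check like this accepts it along with the conditional forms that
BUG() style code may use.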

Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -Naurp temp/linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c 
linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c
--- temp/linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c   2005-02-03 
07:26:53.0 +0530
+++ linux-2.6.11-rc3/arch/ppc64/kernel/kprobes.c2005-02-10 
18:08:25.0 +0530
@@ -105,8 +105,16 @@ static inline int kprobe_handler(struct 
p = get_kprobe(addr);
if (!p) {
unlock_kprobes();
-#if 0
if (*addr != BREAKPOINT_INSTRUCTION) {
+   /* 
+* PowerPC has multiple variants of the "trap"
+* instruction. If the current instruction is a
+* trap variant, it could belong to someone else
+*/
+   kprobe_opcode_t cur_insn = *addr;
+   if (IS_TW(cur_insn) || IS_TD(cur_insn) || 
+   IS_TWI(cur_insn) || IS_TDI(cur_insn))
+   goto no_kprobe;
/*
 * The breakpoint instruction was removed right
 * after we hit it.  Another cpu has removed
@@ -116,7 +124,6 @@ static inline int kprobe_handler(struct 
 */
ret = 1;
}
-#endif
/* Not one of ours: let kernel handle it */
goto no_kprobe;
}
diff -Naurp temp/linux-2.6.11-rc3/include/asm-ppc64/kprobes.h 
linux-2.6.11-rc3/include/asm-ppc64/kprobes.h
--- temp/linux-2.6.11-rc3/include/asm-ppc64/kprobes.h   2005-02-03 
07:25:50.0 +0530
+++ linux-2.6.11-rc3/include/asm-ppc64/kprobes.h2005-02-10 
18:08:58.0 +0530
@@ -35,6 +35,11 @@ typedef unsigned int kprobe_opcode_t;
 #define BREAKPOINT_INSTRUCTION 0x7fe00008  /* trap */
 #define MAX_INSN_SIZE 1
 
+#define IS_TW(instr)   (((instr) & 0xfc0007fe) == 0x7c000008)
+#define IS_TD(instr)   (((instr) & 0xfc0007fe) == 0x7c000088)
+#define IS_TDI(instr)  (((instr) & 0xfc000000) == 0x08000000)
+#define IS_TWI(instr)  (((instr) & 0xfc000000) == 0x0c000000)
+
 #define JPROBE_ENTRY(pentry)   (kprobe_opcode_t *)((func_descr_t *)pentry)
 
 /* Architecture specific copy of original instruction */

[PATCH] PPC64: C99 initializers for hw_interrupt_type

2005-03-07 Thread Paul Mackerras

This patch is from Thomas Gleixner <[EMAIL PROTECTED]>.

Convert the initializers of hw_interrupt_type structures to C99
initializers.
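
A small standalone sketch of the idea follows.  The struct here is a
hypothetical stand-in, not the kernel's hw_interrupt_type definition.
With designated initializers each value is tied to a field name, and
any member that is not named is zero-initialized, so the explicit NULL
placeholders of a positional initializer are no longer needed.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interrupt controller ops table. */
struct sketch_irq_ops {
    const char *typename;
    unsigned int (*startup)(unsigned int irq);
    void (*shutdown)(unsigned int irq);
    void (*enable)(unsigned int irq);
    void (*disable)(unsigned int irq);
};

static unsigned int sketch_enabled_mask;

static void sketch_unmask(unsigned int irq)
{
    sketch_enabled_mask |= 1u << irq;
}

static void sketch_mask(unsigned int irq)
{
    sketch_enabled_mask &= ~(1u << irq);
}

/* Only the members that are used get named; the rest become NULL. */
static struct sketch_irq_ops sketch_pic = {
    .typename = "sketch",
    .enable   = sketch_unmask,
    .disable  = sketch_mask,
};
```

This form also keeps working if fields are later added to or reordered
in the struct, which a positional initializer does not.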

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN 2.6.11-rc5.orig/arch/ppc64/kernel/i8259.c 
2.6.11-rc5/arch/ppc64/kernel/i8259.c
--- 2.6.11-rc5.orig/arch/ppc64/kernel/i8259.c   2005-01-24 12:25:36.0 
+0100
+++ 2.6.11-rc5/arch/ppc64/kernel/i8259.c2005-02-26 20:54:19.0 
+0100
@@ -131,14 +131,11 @@
 }
 
 struct hw_interrupt_type i8259_pic = {
-" i8259",
-NULL,
-NULL,
-i8259_unmask_irq,
-i8259_mask_irq,
-i8259_mask_and_ack_irq,
-i8259_end_irq,
-NULL
+   .typename = " i8259",
+   .enable = i8259_unmask_irq,
+   .disable = i8259_mask_irq,
+   .ack = i8259_mask_and_ack_irq,
+   .end = i8259_end_irq,
 };
 
 void __init i8259_init(int offset)

[PATCH] PPC64 Fix init_boot_display link error

2005-03-07 Thread Paul Mackerras

This patch is from Amos Waterland <[EMAIL PROTECTED]>.

In pmac_setup.c, the function init_boot_display as currently written
only makes sense with CONFIG_BOOTX_TEXT enabled, and causes a link error
if it is not enabled.

Signed-off-by: Amos Waterland <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 1.15/arch/ppc64/kernel/pmac_setup.c 2005-01-08 00:43:52 -05:00
+++ edited/arch/ppc64/kernel/pmac_setup.c   2005-03-02 19:37:31 -05:00
@@ -244,7 +244,6 @@
 {
btext_drawchar(c);
 }
-#endif /* CONFIG_BOOTX_TEXT */
 
 static void __init init_boot_display(void)
 {
@@ -280,6 +279,7 @@
return;
}
 }
+#endif /* CONFIG_BOOTX_TEXT */
 
 /* 
  * Early initialization.

[PATCH] ppc64: Mode 2 PCI-X config space size fix

2005-03-06 Thread Paul Mackerras

This patch is from Brian King <[EMAIL PROTECTED]>.

When working with a PCI-X Mode 2 adapter on a PCI-X Mode 1 PPC64
system, the current code used to determine the config space size
of a device causes a PCI Master abort and an EEH error, and the
device is taken offline. This patch checks OF to see if the PCI
bridge supports PCI-X Mode 2 and fails config accesses beyond
256 bytes if it does not.
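
The two pieces of logic involved, the range check and the config
address layout, can be sketched standalone as below.  The bit layout
follows the pSeries hunks in the diff (the upper four bits of the
register offset are placed above the bus number); the sketch_ names and
the plain integer parameters are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Offsets 0..255 are always valid; offsets 256..4095 are valid only
 * when the bridge advertises extended (PCI-X Mode 2) config space.
 */
static int sketch_config_access_valid(int has_ext_config, uint32_t where)
{
    if (where < 256)
        return 1;
    if (where < 4096 && has_ext_config)
        return 1;
    return 0;
}

/*
 * Build the config address: low 8 bits of the offset at the bottom,
 * devfn and bus number above them, and offset bits 8..11 shifted up
 * by 20 so that they land in bits 28..31.
 */
static uint32_t sketch_config_addr(uint32_t busno, uint32_t devfn,
                                   uint32_t where)
{
    return ((where & 0xf00u) << 20) | (busno << 16) |
           (devfn << 8) | (where & 0xffu);
}
```

For offsets below 256 the result is identical to the old
"bus << 16 | devfn << 8 | where" form, so existing accesses are
unaffected.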

Signed-off-by: Brian King <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/kernel/iSeries_pci.c 
test/arch/ppc64/kernel/iSeries_pci.c
--- linux-2.5/arch/ppc64/kernel/iSeries_pci.c   2005-01-22 09:25:41.0 
+1100
+++ test/arch/ppc64/kernel/iSeries_pci.c2005-03-07 18:21:41.0 
+1100
@@ -610,6 +610,10 @@
 
if (node == NULL)
return PCIBIOS_DEVICE_NOT_FOUND;
+   if (offset > 255) {
+   *val = ~0;
+   return PCIBIOS_BAD_REGISTER_NUMBER;
+   }
 
fn = hv_cfg_read_func[(size - 1) & 3];
HvCall3Ret16(fn, &ret, node->DsaAddr.DsaAddr, offset, 0);
@@ -636,6 +640,8 @@
 
if (node == NULL)
return PCIBIOS_DEVICE_NOT_FOUND;
+   if (offset > 255)
+   return PCIBIOS_BAD_REGISTER_NUMBER;
 
fn = hv_cfg_write_func[(size - 1) & 3];
ret = HvCall4(fn, node->DsaAddr.DsaAddr, offset, val, 0);
diff -urN linux-2.5/arch/ppc64/kernel/pSeries_pci.c 
test/arch/ppc64/kernel/pSeries_pci.c
--- linux-2.5/arch/ppc64/kernel/pSeries_pci.c   2005-01-12 18:20:48.0 
+1100
+++ test/arch/ppc64/kernel/pSeries_pci.c2005-03-07 18:21:41.0 
+1100
@@ -52,6 +52,16 @@
 
 extern struct mpic *pSeries_mpic;
 
+static int config_access_valid(struct device_node *dn, int where)
+{
+   if (where < 256)
+   return 1;
+   if (where < 4096 && dn->pci_ext_config_space)
+   return 1;
+
+   return 0;
+}
+
 static int rtas_read_config(struct device_node *dn, int where, int size, u32 *val)
 {
int returnval = -1;
@@ -60,10 +70,11 @@
 
if (!dn)
return PCIBIOS_DEVICE_NOT_FOUND;
-   if (where & (size - 1))
+   if (!config_access_valid(dn, where))
return PCIBIOS_BAD_REGISTER_NUMBER;
 
-   addr = (dn->busno << 16) | (dn->devfn << 8) | where;
+   addr = ((where & 0xf00) << 20) | (dn->busno << 16) |
+   (dn->devfn << 8) | (where & 0xff);
buid = dn->phb->buid;
if (buid) {
ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
@@ -108,10 +119,11 @@
 
if (!dn)
return PCIBIOS_DEVICE_NOT_FOUND;
-   if (where & (size - 1))
+   if (!config_access_valid(dn, where))
return PCIBIOS_BAD_REGISTER_NUMBER;
 
-   addr = (dn->busno << 16) | (dn->devfn << 8) | where;
+   addr = ((where & 0xf00) << 20) | (dn->busno << 16) |
+   (dn->devfn << 8) | (where & 0xff);
buid = dn->phb->buid;
if (buid) {
ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val);
diff -urN linux-2.5/arch/ppc64/kernel/pci_dn.c test/arch/ppc64/kernel/pci_dn.c
--- linux-2.5/arch/ppc64/kernel/pci_dn.c2005-01-12 18:20:48.0 
+1100
+++ test/arch/ppc64/kernel/pci_dn.c 2005-03-07 18:21:41.0 +1100
@@ -37,6 +37,7 @@
 static void * __devinit update_dn_pci_info(struct device_node *dn, void *data)
 {
struct pci_controller *phb = data;
+   int *type = (int *)get_property(dn, "ibm,pci-config-space-type", NULL);
u32 *regs;
 
dn->phb = phb;
@@ -46,6 +47,8 @@
dn->busno = (regs[0] >> 16) & 0xff;
dn->devfn = (regs[0] >> 8) & 0xff;
}
+
+   dn->pci_ext_config_space = (type && *type == 1);
return NULL;
 }
 
diff -urN linux-2.5/include/asm-ppc64/prom.h test/include/asm-ppc64/prom.h
--- linux-2.5/include/asm-ppc64/prom.h  2005-01-29 09:58:49.0 +1100
+++ test/include/asm-ppc64/prom.h   2005-03-07 18:21:41.0 +1100
@@ -137,6 +137,7 @@
int devfn;  /* for pci devices */
int eeh_mode;   /* See eeh.h for possible EEH_MODEs */
int eeh_config_addr;
+   int pci_ext_config_space;   /* for pci devices */
struct  pci_controller *phb;/* for pci devices */
struct  iommu_table *iommu_table;   /* for phb's or bridges */
 

Re: [PATCH] trivial fix for 2.6.11 raid6 compilation on ppc w/ Altivec

2005-03-03 Thread Paul Mackerras

Jeff Garzik writes:
> Rene Rebe wrote:
> > Hi,
> > 
> > 
> > --- linux-2.6.11/drivers/md/raid6altivec.uc.vanilla2005-03-02 
> > 16:44:56.407107752 +0100
> > +++ linux-2.6.11/drivers/md/raid6altivec.uc2005-03-02 
> > 16:45:22.424152560 +0100
> > @@ -108,7 +108,7 @@
> >  int raid6_have_altivec(void)
> >  {
> >  /* This assumes either all CPUs have Altivec or none does */
> > -return cur_cpu_spec->cpu_features & CPU_FTR_ALTIVEC;
> > +return cur_cpu_spec[0]->cpu_features & CPU_FTR_ALTIVEC;
> 
> 
> I nominate this as a candidate for linux-2.6.11 release branch.  :)

No.  Unfortunately if you fix ppc64 here you will break ppc, and vice
versa.  Yes, we are going to reconcile the cur_cpu_spec definitions
between ppc and ppc64. :)

Paul.

Re: Page fault scalability patch V18: Drop first acquisition of ptl

2005-03-02 Thread Paul Mackerras

Andrew Morton writes:

> But if the approach which these patches take is not suitable for these
> architectures then they have no solution to the scalability problem.  The
> machines will perform suboptimally and more (perhaps conflicting)
> development will be needed.

We can do a pte_cmpxchg on ppc64.  We already have a busy bit in the
PTE and do most operations atomically, in order to ensure that we
don't get races or inconsistencies due to accesses to the PTE by the
low-level hash_page() routine (which instantiates a hardware PTE in
the hardware hash table based on a Linux PTE), because it already
accesses the linux page tables without taking the mm->page_table_lock.
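
A minimal user-space sketch of the idea, in C11: a compare-and-exchange
update on a PTE-like word that carries a busy bit.  The names pte_t,
PTE_BUSY and pte_cmpxchg_sketch, and the bit position, are assumptions
made for illustration and are not the ppc64 kernel's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE word and busy bit; the real ppc64 layout differs. */
typedef _Atomic uint64_t pte_t;
#define PTE_BUSY ((uint64_t)1 << 11)

/*
 * Replace *ptep with newval only if it still holds oldval and oldval
 * was not a busy entry.  Returns true when the swap happened.
 */
static bool pte_cmpxchg_sketch(pte_t *ptep, uint64_t oldval, uint64_t newval)
{
	if (oldval & PTE_BUSY)
		return false;	/* caller's snapshot was taken mid-update */
	return atomic_compare_exchange_strong(ptep, &oldval, newval);
}
```

If another path has set the busy bit since the caller read the entry,
the stored value no longer equals oldval and the exchange fails, so the
caller has to re-read and retry.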

However, there are other developments we are considering in this area:
notably Ben wants to change things so that when we invalidate a Linux
PTE we leave it busy until we actually remove the hardware PTE from
the hash table.  Also we are looking forward to DaveM's patch which
will change the generic MM code to give us the mm and address on all
PTE operations, which will simplify some things for us.  I don't
really want to have to think about pte_cmpxchg until those other
things are sorted out.

More generally, I would be interested to know what sorts of
applications or benchmarks show scalability problems on large machines
due to contention on mm->page_table_lock.

Paul.

Re: [PATCH] Consolidate compat_sys_waitid

2005-02-15 Thread Paul Mackerras

Stephen Rothwell writes:

> This patch does:
>   - consolidate the three implementations of compat_sys_waitid
> (some were called sys32_waitid).
>   - adds sys_waitid syscall to ppc
>   - adds sys_waitid and compat_sys_waitid syscalls to ppc64

Looks good to me.  Are you going to submit it to Andrew?

Paul.

[PATCH] PPC64 collect and export low-level cpu usage statistics

2005-02-09 Thread Paul Mackerras

POWER5 machines have a per-hardware-thread register which counts at a
rate which is proportional to the percentage of cycles on which the
cpu dispatches an instruction for this thread (if the thread gets all
the dispatch cycles it counts at the same rate as the timebase
register).  This register is also context-switched by the hypervisor.
Thus it gives a fine-grained measure of the actual cpu usage by the
thread over time.

This patch adds code to read this register every timer interrupt and
on every context switch.  The total over all virtual processors is
available through the existing /proc/ppc64/lparcfg file, giving a
way to measure the total cpu usage over the whole partition.
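
The accounting at a context switch can be sketched in plain C as
follows.  This is only an illustration of the bookkeeping: struct
thread_usage and account_switch are hypothetical names, and the counter
reading is passed in as a parameter rather than read from the register.

```c
#include <stdint.h>

/* Hypothetical per-thread bookkeeping, loosely modelled on the patch. */
struct thread_usage {
	uint64_t start_tb;	/* counter value when the thread was switched in */
	uint64_t accum_tb;	/* total counter ticks charged to the thread */
};

/*
 * Called at a context switch with the current counter reading: charge
 * the elapsed ticks to the outgoing thread and start the incoming one.
 */
static void account_switch(struct thread_usage *prev,
			   struct thread_usage *next, uint64_t now)
{
	prev->accum_tb += now - prev->start_tb;
	next->start_tb = now;
}
```

Because the subtraction is unsigned, the elapsed count stays correct
even if the counter wraps once between the two readings.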

Andrew, this is relatively non-invasive, but nevertheless you may
prefer to put it in -mm until 2.6.11 is out.

Signed-off-by: Manish Ahuja <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/kernel/lparcfg.c test/arch/ppc64/kernel/lparcfg.c
--- linux-2.5/arch/ppc64/kernel/lparcfg.c  2005-01-06 13:13:08.0 +1100
+++ test/arch/ppc64/kernel/lparcfg.c  2005-02-09 22:38:05.508190616 +1100
@@ -33,8 +33,9 @@
 #include 
 #include 
 #include 
+#include 
 
-#define MODULE_VERS "1.5"
+#define MODULE_VERS "1.6"
 #define MODULE_NAME "lparcfg"
 
 /* #define LPARCFG_DEBUG */
@@ -214,13 +215,20 @@
 }
 
 static unsigned long get_purr(void);
-/* ToDo:  get sum of purr across all processors.  The purr collection code
- * is coming, but at this time is still problematic, so for now this
- * function will return 0.
- */
+
+/* Track sum of all purrs across all processors. This is used to further */
+/* calculate usage values by different applications   */
+
 static unsigned long get_purr(void)
 {
unsigned long sum_purr = 0;
+   int cpu;
+   struct cpu_usage *cu;
+
+   for_each_cpu(cpu) {
+   cu = &per_cpu(cpu_usage_array, cpu);
+   sum_purr += cu->current_tb;
+   }
return sum_purr;
 }
 
diff -urN linux-2.5/arch/ppc64/kernel/process.c test/arch/ppc64/kernel/process.c
--- linux-2.5/arch/ppc64/kernel/process.c  2005-01-29 09:58:49.0 +1100
+++ test/arch/ppc64/kernel/process.c  2005-02-10 08:09:22.428216944 +1100
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef CONFIG_SMP
 struct task_struct *last_task_used_math = NULL;
@@ -168,6 +169,8 @@
 
 #endif /* CONFIG_ALTIVEC */
 
+DEFINE_PER_CPU(struct cpu_usage, cpu_usage_array);
+
 struct task_struct *__switch_to(struct task_struct *prev,
struct task_struct *new)
 {
@@ -206,6 +209,21 @@
new_thread = &new->thread;
old_thread = &current->thread;
 
+/* Collect purr utilization data per process and per processor wise */
+/* purr is nothing but processor time base  */
+
+#if defined(CONFIG_PPC_PSERIES)
+   if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) {
+   struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+   long unsigned start_tb, current_tb;
+   start_tb = old_thread->start_tb;
+   cu->current_tb = current_tb = mfspr(SPRN_PURR);
+   old_thread->accum_tb += (current_tb - start_tb);
+   new_thread->start_tb = current_tb;
+   }
+#endif
+
+
local_irq_save(flags);
last = _switch(old_thread, new_thread);
 
diff -urN linux-2.5/arch/ppc64/kernel/time.c test/arch/ppc64/kernel/time.c
--- linux-2.5/arch/ppc64/kernel/time.c  2005-01-22 09:25:41.0 +1100
+++ test/arch/ppc64/kernel/time.c   2005-02-10 08:09:34.412257896 +1100
@@ -334,6 +334,14 @@
}
 #endif
 
+/* collect purr register values often, for accurate calculations */
+#if defined(CONFIG_PPC_PSERIES)
+   if (cur_cpu_spec->firmware_features & FW_FEATURE_SPLPAR) {
+   struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+   cu->current_tb = mfspr(SPRN_PURR);
+   }
+#endif
+
irq_exit();
 
return 1;
diff -urN linux-2.5/include/asm-ppc64/processor.h test/include/asm-ppc64/processor.h
--- linux-2.5/include/asm-ppc64/processor.h  2005-01-17 08:47:37.0 +1100
+++ test/include/asm-ppc64/processor.h  2005-02-09 22:38:05.528187576 +1100
@@ -562,7 +562,9 @@
double  fpr[32];/* Complete floating point set */
unsigned long   fpscr;  /* Floating point status (plus pad) */
unsigned long   fpexc_mode; /* Floating-point exception mode */
-   unsigned long   pad[3]; /* was saved_msr, saved_softe */
+   unsigned long   start_tb;   /* Start purr when proc switched in */
+   unsigned long   accum_tb;   /* Total accumulated purr for process */
+   unsigned long   pad;/* was saved_msr, saved_softe */
 #ifdef CONFIG_ALTIVEC
/* Complete AltiVec register set */

Re: A scrub daemon (prezeroing)

2005-02-04 Thread Paul Mackerras

Christoph Lameter writes:

> scrubd clears pages of orders 7-4 by default. That means 2^4 to 2^7
> pages are cleared at once.

So are you saying that clearing an order 4 page will take measurably
less time than clearing 16 order 0 pages?  I find that hard to
believe.

Paul.

[PATCH] PPC64 replace last usage of vio dma mapping routines

2005-02-04 Thread Paul Mackerras

This patch is from Stephen Rothwell <[EMAIL PROTECTED]>.

This patch just replaces the last usage of the vio dma mapping routines
with the equivalent generic dma mapping routines.

Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -ruNp linus-bk/drivers/net/ibmveth.c linus-bk-vio.1/drivers/net/ibmveth.c
--- linus-bk/drivers/net/ibmveth.c  2004-12-08 04:06:06.0 +1100
+++ linus-bk-vio.1/drivers/net/ibmveth.c  2005-01-31 16:45:28.0 +1100
@@ -218,7 +218,8 @@ static void ibmveth_replenish_buffer_poo
ibmveth_assert(index != IBM_VETH_INVALID_MAP);
ibmveth_assert(pool->skbuff[index] == NULL);
 
-   dma_addr = vio_map_single(adapter->vdev, skb->data, pool->buff_size, DMA_FROM_DEVICE);
+   dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
+   pool->buff_size, DMA_FROM_DEVICE);
 
pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
pool->dma_addr[index] = dma_addr;
@@ -238,7 +239,9 @@ static void ibmveth_replenish_buffer_poo
pool->free_map[free_index] = IBM_VETH_INVALID_MAP;
pool->skbuff[index] = NULL;
pool->consumer_index--;
-   vio_unmap_single(adapter->vdev, pool->dma_addr[index], pool->buff_size, DMA_FROM_DEVICE);
+   dma_unmap_single(&adapter->vdev->dev,
+   pool->dma_addr[index], pool->buff_size,
+   DMA_FROM_DEVICE);
dev_kfree_skb_any(skb);
adapter->replenish_add_buff_failure++;
break;
@@ -299,7 +302,7 @@ static void ibmveth_free_buffer_pool(str
for(i = 0; i < pool->size; ++i) {
struct sk_buff *skb = pool->skbuff[i];
if(skb) {
-   vio_unmap_single(adapter->vdev,
+   dma_unmap_single(&adapter->vdev->dev,
 pool->dma_addr[i],
 pool->buff_size,
 DMA_FROM_DEVICE);
@@ -337,7 +340,7 @@ static void ibmveth_remove_buffer_from_p
 
adapter->rx_buff_pool[pool].skbuff[index] = NULL;
 
-   vio_unmap_single(adapter->vdev,
+   dma_unmap_single(&adapter->vdev->dev,
 adapter->rx_buff_pool[pool].dma_addr[index],
 adapter->rx_buff_pool[pool].buff_size,
 DMA_FROM_DEVICE);
@@ -408,7 +411,9 @@ static void ibmveth_cleanup(struct ibmve
 {
if(adapter->buffer_list_addr != NULL) {
if(!dma_mapping_error(adapter->buffer_list_dma)) {
-   vio_unmap_single(adapter->vdev, adapter->buffer_list_dma, 4096, DMA_BIDIRECTIONAL);
+   dma_unmap_single(&adapter->vdev->dev,
+   adapter->buffer_list_dma, 4096,
+   DMA_BIDIRECTIONAL);
adapter->buffer_list_dma = DMA_ERROR_CODE;
}
free_page((unsigned long)adapter->buffer_list_addr);
@@ -417,7 +422,9 @@ static void ibmveth_cleanup(struct ibmve
 
if(adapter->filter_list_addr != NULL) {
if(!dma_mapping_error(adapter->filter_list_dma)) {
-   vio_unmap_single(adapter->vdev, adapter->filter_list_dma, 4096, DMA_BIDIRECTIONAL);
+   dma_unmap_single(&adapter->vdev->dev,
+   adapter->filter_list_dma, 4096,
+   DMA_BIDIRECTIONAL);
adapter->filter_list_dma = DMA_ERROR_CODE;
}
free_page((unsigned long)adapter->filter_list_addr);
@@ -426,7 +433,10 @@ static void ibmveth_cleanup(struct ibmve
 
if(adapter->rx_queue.queue_addr != NULL) {
if(!dma_mapping_error(adapter->rx_queue.queue_dma)) {
-   vio_unmap_single(adapter->vdev, adapter->rx_queue.queue_dma, adapter->rx_queue.queue_len, DMA_BIDIRECTIONAL);
+   dma_unmap_single(&adapter->vdev->dev,
+   adapter->rx_queue.queue_dma,
+   adapter->rx_queue.queue_len,
+   DMA_BIDIRECTIONAL);
adapter->rx_queue.queue_dma = DMA_ERROR_CODE;
}
kfree(adapter->rx_queue.queue_addr);
@@ -472,9 +482,13 @@ static int ibmveth_open(struct net_devi

[PATCH] Fix devfs name for the hvcs driver

2005-02-04 Thread Paul Mackerras

This patch is from Jimi Xenidis <[EMAIL PROTECTED]>.

The hvcs driver does not register a devfs_name resulting in devfs
creating /dev/* entries.
The following one line patch remedies the problem.

Signed-off-by: Jimi Xenidis <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- orig/drivers/char/hvcs.c
+++ mod/drivers/char/hvcs.c
@@ -1363,6 +1363,7 @@
 
hvcs_tty_driver->driver_name = hvcs_driver_name;
hvcs_tty_driver->name = hvcs_device_node;
+   hvcs_tty_driver->devfs_name = hvcs_device_node;
 
/*
 * We'll let the system assign us a major number, indicated by leaving

[PATCH] PPC64 show -1 for physical_id of non-present cpus

2005-02-04 Thread Paul Mackerras

This patch is from Nathan Lynch <[EMAIL PROTECTED]>.

Make the physical_id cpu sysfs attribute on ppc64 show -1 instead of
65535 for non-present cpus.
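
The effect of the type change can be seen in a small stand-alone C
sketch.  It assumes that a non-present cpu has -1 stored in a 16-bit
field; the helper names format_id_unsigned and format_id_signed are
hypothetical and only mirror the two sprintf formats in the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 16-bit id the old way: unsigned field, "%u" conversion. */
static int format_id_unsigned(char *buf, size_t len, uint16_t id)
{
	return snprintf(buf, len, "%u\n", (unsigned int)id);
}

/* Format a 16-bit id the new way: signed field, "%d" conversion. */
static int format_id_signed(char *buf, size_t len, int16_t id)
{
	return snprintf(buf, len, "%d\n", (int)id);
}
```

Storing -1 in the unsigned field yields the bit pattern 0xffff, which
formats as 65535; the same pattern in the signed field formats as -1.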

Signed-off-by: Nathan Lynch <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -puN arch/ppc64/kernel/sysfs.c~make-cpu-physical_id-signed arch/ppc64/kernel/sysfs.c
--- linux-2.6.11-rc2-mm1/arch/ppc64/kernel/sysfs.c~make-cpu-physical_id-signed  2005-01-27 15:03:16.0 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/arch/ppc64/kernel/sysfs.c  2005-01-27 15:05:12.0 -0600
@@ -387,7 +387,7 @@ static ssize_t show_physical_id(struct s
 {
struct cpu *cpu = container_of(dev, struct cpu, sysdev);
 
-   return sprintf(buf, "%u\n", get_hard_smp_processor_id(cpu->sysdev.id));
+   return sprintf(buf, "%d\n", get_hard_smp_processor_id(cpu->sysdev.id));
 }
 static SYSDEV_ATTR(physical_id, 0444, show_physical_id, NULL);
 
diff -puN include/asm-ppc64/paca.h~make-cpu-physical_id-signed include/asm-ppc64/paca.h
--- linux-2.6.11-rc2-mm1/include/asm-ppc64/paca.h~make-cpu-physical_id-signed  2005-01-27 15:04:14.0 -0600
+++ linux-2.6.11-rc2-mm1-nathanl/include/asm-ppc64/paca.h  2005-01-27 15:04:51.0 -0600
@@ -68,7 +68,7 @@ struct paca_struct {
u64 stab_real;  /* Absolute address of segment table */
u64 stab_addr;  /* Virtual address of segment table */
void *emergency_sp; /* pointer to emergency stack */
-   u16 hw_cpu_id;  /* Physical processor number */
+   s16 hw_cpu_id;  /* Physical processor number */
u8 cpu_start;   /* At startup, processor spins until */
/* this becomes non-zero. */
 

[PATCH] PPC64 correct return code in syscall auditing

2005-02-04 Thread Paul Mackerras

This patch is from David Woodhouse <[EMAIL PROTECTED]>.

We were pretending that every syscall returned zero. Don't do that.

Signed-Off-By: David Woodhouse <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

= arch/ppc64/kernel/entry.S 1.51 vs edited =
--- 1.51/arch/ppc64/kernel/entry.S  Thu Jan 13 09:48:36 2005
+++ edited/arch/ppc64/kernel/entry.S  Thu Jan 20 16:14:50 2005
@@ -231,6 +231,7 @@
 syscall_exit_trace:
std r3,GPR3(r1)
bl  .save_nvgprs
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  .do_syscall_trace_leave
REST_NVGPRS(r1)
ld  r3,GPR3(r1)
@@ -324,6 +325,7 @@
ld  r4,TI_FLAGS(r4)
andi.   r4,r4,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP)
beq+    81f
+   addi    r3,r1,STACK_FRAME_OVERHEAD
bl  .do_syscall_trace_leave
 81:    b   .ret_from_except
 
= arch/ppc64/kernel/ptrace.c 1.13 vs edited =
--- 1.13/arch/ppc64/kernel/ptrace.c Fri Dec 17 08:09:09 2004
+++ edited/arch/ppc64/kernel/ptrace.c   Thu Jan 20 16:24:12 2005
@@ -313,10 +313,10 @@
do_syscall_trace();
 }
 
-void do_syscall_trace_leave(void)
+void do_syscall_trace_leave(struct pt_regs *regs)
 {
if (unlikely(current->audit_context))
-   audit_syscall_exit(current, 0); /* FIXME: pass pt_regs */
+   audit_syscall_exit(current, regs->result);
 
if ((test_thread_flag(TIF_SYSCALL_TRACE)
 || test_thread_flag(TIF_SINGLESTEP))

Re: A scrub daemon (prezeroing)

2005-02-04 Thread Paul Mackerras

Christoph Lameter writes:

> If the program does not use these cache lines then you have wasted time
> in the page fault handler allocating and handling them. That is what
> prezeroing does for you.

The program is going to access at least one cache line of the new
page.  On my G5, it takes _less_ time to clear the whole page and pull
in one cache line from L2 cache to L1 than it does to pull in that
same cache line from memory.

> Yes but its a short burst that only occurs very infrequestly and it takes

It occurs just as often as we clear pages in the page fault handler.
We aren't clearing any fewer pages by prezeroing, we are just clearing
them a bit earlier.

> advantage of all the optimizations that modern memory subsystems have for
> linear accesses. And if hardware exists that can offload that from the cpu
> then the cpu caches are only minimally affected.

I can believe that prezeroing could provide a benefit on some
machines, but I don't think it will provide any on ppc64.

Paul.

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Paul Mackerras

Christoph Lameter writes:

> You need to think about this in a different way. Prezeroing only makes
> sense if it can avoid using cache lines that the zeroing in the
> hot paths would have to use since it touches all cachelines on
> the page (the ppc instruction is certainly nice and avoids a cacheline
> read but it still uses a cacheline!). The zeroing in itself (within the

The dcbz instruction on the G5 (PPC970) establishes the new cache line
in the L2 cache and doesn't disturb the L1 cache (except to invalidate
the line in the L1 data cache if it is present there).  The L2 cache
is 512kB and 8-way set associative (LRU).  So zeroing a page is
unlikely to disturb the cache lines that the page fault handler is
using.  Then, when the page fault handler returns to the user program,
any cache lines that the program wants to touch are available in 12
cycles (L2 hit latency) instead of 200 - 300 (memory access latency).

> cpu caches) is extraordinarily fast and the zeroing of large portions of
> memory is so too. That is why the impact of scrubd is negligible since
> its extremely fast.

But that also disturbs cache lines that may well otherwise be useful.

> The point is to save activating cachelines not the time zeroing in itself
> takes. This only works if only parts of the page are needed immediately
> after the page fault. All of that has been documented in earlier posts on
> the subject.

As has my scepticism about pre-zeroing actually providing any benefit
on ppc64.  Nevertheless, the only definitive answer is to actually
measure the performance both ways.

Paul.

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Paul Mackerras

Rik van Riel writes:

> I'm not convinced.  Zeroing a page takes 2000-4000 CPU
> cycles, while faulting the page from RAM into cache takes
> 200-400 CPU cycles per cache line, or 6000-12000 CPU
> cycles.

On my G5 it takes ~200 cycles to zero a whole page.  In other words it
takes about the same time to zero a page as to bring in a single cache
line from memory.  (PPC has an instruction to establish a whole cache
line of zeroes in modified state without reading anything from
memory.)
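
For a rough sense of scale, the arithmetic behind the quoted estimate
can be written out in C.  The page and cache-line sizes below are
assumptions chosen for illustration (they are not stated in the
thread), and page_fetch_cycles is a hypothetical helper.

```c
/* Assumed sizes for illustration; not taken from the thread. */
#define PAGE_BYTES	4096u
#define LINE_BYTES	128u

/* Cycles to pull every cache line of one page in from memory. */
static unsigned int page_fetch_cycles(unsigned int cycles_per_line)
{
	return (PAGE_BYTES / LINE_BYTES) * cycles_per_line;
}
```

With these sizes a page holds 32 lines, so 200 to 400 cycles per line
comes to 6400 to 12800 cycles for the whole page, the same order as the
quoted 6000-12000, against roughly 200 cycles to zero the page.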

Thus I can't see how prezeroing can ever be a win on ppc64.

Regards,
Paul.

[PATCH] PPC64 use kref for device_node refcounting

2005-01-22 Thread Paul Mackerras

This patch is from Nathan Lynch <[EMAIL PROTECTED]>.

This changes struct device_node and associated code to use the kref
api for object refcounting and freeing.  I've given it some testing on
pSeries with cpu add/remove and verified that the release function
works.  The change is somewhat cosmetic but it does make the code
easier to understand... at least I think so =)

The only real change is that the refcount on all device_nodes is
initialized at 1, and the device node is freed when the refcount
reaches 0 (of_remove_node has the extra "put" to ensure that this
happens).  This lets us get rid of the OF_STALE flag and macros in
prom.h.
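
The lifetime rule described above (count starts at 1, object is freed
when it reaches 0) can be sketched in user-space C.  struct ref and
struct node are hypothetical stand-ins for struct kref and struct
device_node, and the counter here is a plain int, so unlike the kernel
API this sketch is not safe for concurrent use.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct kref and struct device_node. */
struct ref {
	int count;
};

struct node {
	struct ref ref;
	int *freed_flag;	/* set by the release function, for demonstration */
};

static void ref_init(struct ref *r) { r->count = 1; }
static void ref_get(struct ref *r) { r->count++; }

/* Drop one reference; call release() when the count reaches zero. */
static void ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (--r->count == 0)
		release(r);
}

/* Recover the enclosing node from its embedded ref, then free it. */
static void node_release(struct ref *r)
{
	struct node *n = (struct node *)((char *)r - offsetof(struct node, ref));

	*n->freed_flag = 1;
	free(n);
}
```

The initial count of 1 plays the role of the reference held by the
tree itself, which is why removing a node needs one extra "put" beyond
the caller's own.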

Signed-off-by: Nathan Lynch <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/kernel/prom.c test/arch/ppc64/kernel/prom.c
--- linux-2.5/arch/ppc64/kernel/prom.c  2005-01-22 09:25:41.0 +1100
+++ test/arch/ppc64/kernel/prom.c   2005-01-22 20:52:35.0 +1100
@@ -717,6 +717,7 @@
dad->next->sibling = np;
dad->next = np;
}
+   kref_init(&np->kref);
}
while(1) {
u32 sz, noff;
@@ -1475,24 +1476,31 @@
  * @node:  Node to inc refcount, NULL is supported to
  * simplify writing of callers
  *
- * Returns the node itself or NULL if gone.
+ * Returns node.
  */
 struct device_node *of_node_get(struct device_node *node)
 {
-   if (node && !OF_IS_STALE(node)) {
-   atomic_inc(&node->_users);
-   return node;
-   }
-   return NULL;
+   if (node)
+   kref_get(&node->kref);
+   return node;
 }
 EXPORT_SYMBOL(of_node_get);
 
+static inline struct device_node * kref_to_device_node(struct kref *kref)
+{
+   return container_of(kref, struct device_node, kref);
+}
+
 /**
- * of_node_cleanup - release a dynamically allocated node
- * @arg:  Node to be released
+ * of_node_release - release a dynamically allocated node
+ * @kref:  kref element of the node to be released
+ *
+ * In of_node_put() this function is passed to kref_put()
+ * as the destructor.
  */
-static void of_node_cleanup(struct device_node *node)
+static void of_node_release(struct kref *kref)
 {
+   struct device_node *node = kref_to_device_node(kref);
struct property *prop = node->properties;
 
if (!OF_IS_DYNAMIC(node))
@@ -1518,19 +1526,8 @@
  */
 void of_node_put(struct device_node *node)
 {
-   if (!node)
-   return;
-
-   WARN_ON(0 == atomic_read(&node->_users));
-
-   if (OF_IS_STALE(node)) {
-   if (atomic_dec_and_test(&node->_users)) {
-   of_node_cleanup(node);
-   return;
-   }
-   }
-   else
-   atomic_dec(&node->_users);
+   if (node)
+   kref_put(&node->kref, of_node_release);
 }
 EXPORT_SYMBOL(of_node_put);
 
@@ -1773,7 +1770,7 @@
 
np->properties = proplist;
OF_MARK_DYNAMIC(np);
-   of_node_get(np);
+   kref_init(&np->kref);
np->parent = derive_parent(path);
if (!np->parent) {
kfree(np);
@@ -1809,8 +1806,9 @@
 }
 
 /*
- * Remove an OF device node from the system.
- * Caller should have already "gotten" np.
+ * "Unplug" a node from the device tree.  The caller must hold
+ * a reference to the node.  The memory associated with the node
+ * is not freed until its refcount goes to zero.
  */
 int of_remove_node(struct device_node *np)
 {
@@ -1828,7 +1826,6 @@
of_cleanup_node(np);
 
write_lock(&devtree_lock);
-   OF_MARK_STALE(np);
remove_node_proc_entries(np);
if (allnodes == np)
allnodes = np->allnext;
@@ -1853,6 +1850,7 @@
}
write_unlock(&devtree_lock);
of_node_put(parent);
+   of_node_put(np); /* Must decrement the refcount */
return 0;
 }
 
diff -urN linux-2.5/include/asm-ppc64/prom.h test/include/asm-ppc64/prom.h
--- linux-2.5/include/asm-ppc64/prom.h  2005-01-06 13:13:10.0 +1100
+++ test/include/asm-ppc64/prom.h   2005-01-22 20:52:35.0 +1100
@@ -149,18 +149,15 @@
struct  proc_dir_entry *pde;   /* this node's proc directory */
struct  proc_dir_entry *name_link; /* name symlink */
struct  proc_dir_entry *addr_link; /* addr symlink */
-   atomic_t _users; /* reference count */
+   struct  kref kref;
unsigned long _flags;
 };
 
 extern struct device_node *of_chosen;
 
 /* flag descriptions */
-#define OF_STALE   0 /* node is slated for deletion */
 #define OF_DYNAMIC 1 /* node and properties were allocated via kmalloc */
 
-#define OF_IS_STALE(x) test_bit(OF_STALE, &x->_flags)
-#define OF_MARK_STALE(x) set_bi

[PATCH] PPC64 sparse fixes for cpu feature constants

2005-01-22 Thread Paul Mackerras

This patch is originally from Nathan Lynch <[EMAIL PROTECTED]>.

Sparse gives a warning "constant ... is so big it is long" for every
expression where we check bits in the cur_cpu_spec->cpu_features
value.  This patch removes the warnings by using the ASM_CONST macro.
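
A simplified sketch of what such a macro does in C code: it pastes a UL
suffix onto the literal so that the constant has an explicit unsigned
long type.  ASM_CONST_SKETCH, the two example feature bits and
has_feature are hypothetical names with made-up values, and the sketch
assumes unsigned long is 64 bits wide, as on ppc64.

```c
/*
 * Simplified stand-in for the kernel macro: in C code, append a UL
 * suffix so the constant has an explicit unsigned long type.
 */
#define ASM_CONST_SKETCH(x)	x##UL

/* Hypothetical feature bits; the values are for illustration only. */
#define FTR_LOW_EXAMPLE		ASM_CONST_SKETCH(0x0000000000000008)
#define FTR_HIGH_EXAMPLE	ASM_CONST_SKETCH(0x0000020000000000)

/* Test one feature bit in a feature word. */
static int has_feature(unsigned long features, unsigned long bit)
{
	return (features & bit) != 0;
}
```

Putting the suffix in the definition means each use site can test the
bit directly, without wrapping the name in the macro again.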

Signed-off-by: Nathan Lynch <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/include/asm-ppc64/cacheflush.h test/include/asm-ppc64/cacheflush.h
--- linux-2.5/include/asm-ppc64/cacheflush.h  2004-05-31 19:02:01.0 +1000
+++ test/include/asm-ppc64/cacheflush.h 2005-01-22 20:13:46.0 +1100
@@ -40,7 +40,7 @@
 
 static inline void flush_icache_range(unsigned long start, unsigned long stop)
 {
-   if (!(cur_cpu_spec->cpu_features & ASM_CONST(CPU_FTR_COHERENT_ICACHE)))
+   if (!(cur_cpu_spec->cpu_features & CPU_FTR_COHERENT_ICACHE))
__flush_icache_range(start, stop);
 }
 
diff -urN linux-2.5/include/asm-ppc64/cputable.h test/include/asm-ppc64/cputable.h
--- linux-2.5/include/asm-ppc64/cputable.h  2004-06-28 14:30:55.0 +1000
+++ test/include/asm-ppc64/cputable.h   2005-01-22 20:13:46.0 +1100
@@ -16,6 +16,7 @@
 #define __ASM_PPC_CPUTABLE_H
 
 #include 
+#include  /* for ASM_CONST */
 
 /* Exposed to userland CPU features - Must match ppc32 definitions */
 #define PPC_FEATURE_32 0x8000
@@ -103,38 +104,38 @@
 /* CPU kernel features */
 
 /* Retain the 32b definitions for the time being - use bottom half of word */
-#define CPU_FTR_SPLIT_ID_CACHE 0x0001
-#define CPU_FTR_L2CR   0x0002
-#define CPU_FTR_SPEC7450   0x0004
-#define CPU_FTR_ALTIVEC   0x0008
-#define CPU_FTR_TAU   0x0010
-#define CPU_FTR_CAN_DOZE   0x0020
-#define CPU_FTR_USE_TB 0x0040
-#define CPU_FTR_604_PERF_MON   0x0080
-#define CPU_FTR_601   0x0100
-#define CPU_FTR_HPTE_TABLE 0x0200
-#define CPU_FTR_CAN_NAP   0x0400
-#define CPU_FTR_L3CR   0x0800
-#define CPU_FTR_L3_DISABLE_NAP 0x1000
-#define CPU_FTR_NAP_DISABLE_L2_PR  0x2000
-#define CPU_FTR_DUAL_PLL_750FX 0x4000
+#define CPU_FTR_SPLIT_ID_CACHE ASM_CONST(0x0001)
+#define CPU_FTR_L2CR   ASM_CONST(0x0002)
+#define CPU_FTR_SPEC7450   ASM_CONST(0x0004)
+#define CPU_FTR_ALTIVEC   ASM_CONST(0x0008)
+#define CPU_FTR_TAU   ASM_CONST(0x0010)
+#define CPU_FTR_CAN_DOZE   ASM_CONST(0x0020)
+#define CPU_FTR_USE_TB ASM_CONST(0x0040)
+#define CPU_FTR_604_PERF_MON   ASM_CONST(0x0080)
+#define CPU_FTR_601   ASM_CONST(0x0100)
+#define CPU_FTR_HPTE_TABLE ASM_CONST(0x0200)
+#define CPU_FTR_CAN_NAP   ASM_CONST(0x0400)
+#define CPU_FTR_L3CR   ASM_CONST(0x0800)
+#define CPU_FTR_L3_DISABLE_NAP ASM_CONST(0x1000)
+#define CPU_FTR_NAP_DISABLE_L2_PR  ASM_CONST(0x2000)
+#define CPU_FTR_DUAL_PLL_750FX ASM_CONST(0x4000)
 
 /* Add the 64b processor unique features in the top half of the word */
-#define CPU_FTR_SLB   0x0001
-#define CPU_FTR_16M_PAGE   0x0002
-#define CPU_FTR_TLBIEL 0x0004
-#define CPU_FTR_NOEXECUTE  0x0008
-#define CPU_FTR_NODSISRALIGN   0x0010
-#define CPU_FTR_IABR   0x0020
-#define CPU_FTR_MMCRA  0x0040
-#define CPU_FTR_PMC8   0x0080
-#define CPU_FTR_SMT   0x0100
-#define CPU_FTR_COHERENT_ICACHE   0x0200
-#define CPU_FTR_LOCKLESS_TLBIE 0x0400
-#define CPU_FTR_MMCRA_SIHV 0x0800
+#define CPU_FTR_SLB   ASM_CONST(0x0001)
+#define CPU_FTR_16M_PAGE   ASM_CONST(0x0002)
+#define CPU_FTR_TLBIEL ASM_CONST(0x0004)
+#define CPU_FTR_NOEXECUTE  ASM_CONST(0x0008)
+#define CPU_FTR_NODSISRALIGN   ASM_CONST(0x0010)
+#define CPU_FTR_IABR   ASM_CONST(0x0020)
+#define CPU_FTR_MMCRA  ASM_CONST(0x0040)
+#define CPU_FTR_PMC8   ASM_CONST(0x0080)
+#define CPU_FTR_SMT   ASM_CONST(0x0100)
+#define CPU_FTR_COHERE

Re: [PATCH] PPC: fix stack alignment for signal handlers

2005-01-22 Thread Paul Mackerras

Roland McGrath writes:

> For PPC32 signal handlers, while the frame itself was of properly aligned
> size, no alignment of the starting stack pointer was done at all, so that a
> signal handler can still get a misaligned stack pointer if the interrupted
> registers had one, though the kernel isn't gratuitously misaligning good
> ones like it is for PPC64.  I added explicit alignment to fix that.

This part is unnecessary, because arch/ppc/kernel/signal.c:do_signal()
already aligns the stack pointer to a 16-byte boundary:

if ((ka.sa.sa_flags & SA_ONSTACK) && current->sas_ss_size
&& !on_sig_stack(regs->gpr[1]))
newsp = current->sas_ss_sp + current->sas_ss_size;
else
newsp = regs->gpr[1];
newsp &= ~0xfUL;

/* Whee!  Actually deliver the signal.  */
if (ka.sa.sa_flags & SA_SIGINFO)
handle_rt_signal(signr, &ka, &info, oldset, regs, newsp);
else
handle_signal(signr, &ka, &info, oldset, regs, newsp);

The additions to arch/ppc64/kernel/signal32.c are likewise
unnecessary, because do_signal32() also does newsp &= ~0xfUL (in fact
the code there is very similar to the ppc32 code).
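
The masking step can be isolated in a few lines of C; align_sp_16 is a
hypothetical helper name used here only to show what newsp &= ~0xfUL
does to a value.

```c
/* Round a stack pointer down to a 16-byte boundary (sp &= ~0xfUL). */
static unsigned long align_sp_16(unsigned long sp)
{
	return sp & ~0xfUL;
}
```

Clearing the low four bits can only lower the address, which is the
safe direction for a stack that grows downwards.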

You are correct about the 64-bit case though.  I thought we had fixed
that but evidently not.  Your patch looks fine as far as
arch/ppc64/kernel/signal.c is concerned.

Regards,
Paul.

[PATCH] PPC64: Trivial Cleanup: EEH_REGION

2005-01-21 Thread Paul Mackerras

This patch is originally from Linas Vepstas <[EMAIL PROTECTED]>.

This is a dumb, dorky cleanup patch:
Per last round of emails, the concept of EEH_REGION is gone, 
but a few stubs remained.  This patch removes them.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/mm/hash_utils.c test/arch/ppc64/mm/hash_utils.c
--- linux-2.5/arch/ppc64/mm/hash_utils.c  2005-01-06 13:13:08.0 +1100
+++ test/arch/ppc64/mm/hash_utils.c 2005-01-22 16:42:48.0 +1100
@@ -294,12 +294,6 @@
vsid = get_kernel_vsid(ea);
break;
 #if 0
-   case EEH_REGION_ID:
-   /*
-* Should only be hit if there is an access to MMIO space
-* which is protected by EEH.
-* Send the problem up to do_page_fault 
-*/
case KERNEL_REGION_ID:
/*
 * Should never get here - entire 0xC0... region is bolted.
diff -urN linux-2.5/arch/ppc64/mm/slb.c test/arch/ppc64/mm/slb.c
--- linux-2.5/arch/ppc64/mm/slb.c   2005-01-06 13:13:08.0 +1100
+++ test/arch/ppc64/mm/slb.c  2005-01-22 16:44:26.0 +1100
@@ -78,7 +78,7 @@
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
unsigned long offset = get_paca()->slb_cache_ptr;
-   unsigned long esid_data;
+   unsigned long esid_data = 0;
unsigned long pc = KSTK_EIP(tsk);
unsigned long stack = KSTK_ESP(tsk);
unsigned long unmapped_base;
@@ -97,11 +97,8 @@
}
 
/* Workaround POWER5 < DD2.1 issue */
-   if (offset == 1 || offset > SLB_CACHE_ENTRIES) {
-   /* flush segment in EEH region, we shouldn't ever
-* access addresses in this region. */
-   asm volatile("slbie %0" : : "r"(EEHREGIONBASE));
-   }
+   if (offset == 1 || offset > SLB_CACHE_ENTRIES)
+   asm volatile("slbie %0" : : "r" (esid_data));
 
get_paca()->slb_cache_ptr = 0;
get_paca()->context = mm->context;
diff -urN linux-2.5/include/asm-ppc64/page.h test/include/asm-ppc64/page.h
--- linux-2.5/include/asm-ppc64/page.h  2005-01-06 13:13:10.0 +1100
+++ test/include/asm-ppc64/page.h   2005-01-22 16:42:48.0 +1100
@@ -205,10 +205,8 @@
 #define KERNELBASE  PAGE_OFFSET
 #define VMALLOCBASE ASM_CONST(0xD000)
 #define IOREGIONBASE    ASM_CONST(0xE000)
-#define EEHREGIONBASE   ASM_CONST(0xA000)
 
 #define IO_REGION_ID   (IOREGIONBASE>>REGION_SHIFT)
-#define EEH_REGION_ID  (EEHREGIONBASE>>REGION_SHIFT)
 #define VMALLOC_REGION_ID  (VMALLOCBASE>>REGION_SHIFT)
 #define KERNEL_REGION_ID   (KERNELBASE>>REGION_SHIFT)
 #define USER_REGION_ID (0UL)

[PATCH] PPC64 replace schedule_timeout in __cpu_up

2005-01-21 Thread Paul Mackerras

This patch is from Nishanth Aravamudan <[EMAIL PROTECTED]>.

Replace schedule_timeout() with msleep to simplify the code and to
express the delay in milliseconds instead of HZ.

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/smp.c 2005-01-15 16:55:41.0 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/smp.c   2005-01-15 17:30:16.0 -0800
@@ -459,8 +459,7 @@ int __devinit __cpu_up(unsigned int cpu)
 * hotplug case.  Wait five seconds.
 */
for (c = 25; c && !cpu_callin_map[cpu]; c--) {
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(HZ/5);
+   msleep(200);
}
 #endif
 

[PATCH] PPC64 replace schedule_timeout in pSeries_cpu_die

2005-01-21 Thread Paul Mackerras

This patch is from Nishanth Aravamudan <[EMAIL PROTECTED]>.

Replace schedule_timeout() with msleep to simplify the code and to
express the delay in milliseconds instead of HZ.

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/pSeries_smp.c 2005-01-15 16:55:41.0 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/pSeries_smp.c   2005-01-15 17:21:12.0 -0800
@@ -107,8 +107,7 @@ void pSeries_cpu_die(unsigned int cpu)
cpu_status = query_cpu_stopped(pcpu);
if (cpu_status == 0 || cpu_status == -1)
break;
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(HZ/5);
+   msleep(200);
}
if (cpu_status != 0) {
printk("Querying DEAD? cpu %i (%i) shows %i\n",

[PATCH] PPC64 replace schedule_timeout in iSeries_pci_reset

2005-01-21 Thread Paul Mackerras

This patch is from Nishanth Aravamudan <[EMAIL PROTECTED]>.

Replace schedule_timeout() with msleep to simplify the code and to
express the delay in milliseconds instead of HZ.
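
The arithmetic behind this conversion can be sketched in plain userspace C. This is only an illustration, not kernel code: the helper names are hypothetical, and `hz` stands in for the kernel's HZ constant. The old code computed a delay in jiffies from a time given in tenths of a second; msleep() takes the same delay in milliseconds, independent of HZ.

```c
/* Old style: delay in jiffies for a time given in tenths of a second.
 * hz is a stand-in for the kernel's HZ (ticks per second). */
static unsigned long tenths_to_jiffies(unsigned int tenths, unsigned int hz)
{
    return ((unsigned long)tenths * hz) / 10;
}

/* New style: the same delay in milliseconds, the unit msleep() expects. */
static unsigned int tenths_to_msecs(unsigned int tenths)
{
    return tenths * 100;
}

/* Convert a jiffies count back to milliseconds so the two forms can be compared. */
static unsigned long jiffies_to_ms(unsigned long j, unsigned int hz)
{
    return (j * 1000) / hz;
}
```

With these helpers, the default assert time of 5 tenths of a second comes out as 500 ms and the default wait of 30 tenths as 3000 ms, whatever tick rate is assumed.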

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/iSeries_pci_reset.c   2005-01-15 16:55:41.0 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/iSeries_pci_reset.c 2005-01-15 17:17:54.0 -0800
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -49,7 +50,7 @@
 int iSeries_Device_ToggleReset(struct pci_dev *PciDev, int AssertTime,
int DelayTime)
 {
-   unsigned long AssertDelay, WaitDelay;
+   unsigned int AssertDelay, WaitDelay;
struct iSeries_Device_Node *DeviceNode =
(struct iSeries_Device_Node *)PciDev->sysdata;
 
@@ -62,14 +63,14 @@ int iSeries_Device_ToggleReset(struct pc
 * Set defaults, Assert is .5 second, Wait is 3 seconds.
 */
if (AssertTime == 0)
-   AssertDelay = (5 * HZ) / 10;
+   AssertDelay = 500;
else
-   AssertDelay = (AssertTime * HZ) / 10;
+   AssertDelay = AssertTime * 100;
 
if (DelayTime == 0)
-   WaitDelay = (30 * HZ) / 10;
+   WaitDelay = 3000;
else
-   WaitDelay = (DelayTime * HZ) / 10;
+   WaitDelay = DelayTime * 100;
 
/*
 * Assert reset
@@ -77,8 +78,7 @@ int iSeries_Device_ToggleReset(struct pc
DeviceNode->ReturnCode = HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode),
0x00, DeviceNode->AgentId, 1);
if (DeviceNode->ReturnCode == 0) {
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(AssertDelay);   /* Sleep for the time */
+   msleep(AssertDelay);/* Sleep for the time */
DeviceNode->ReturnCode =
HvCallPci_setSlotReset(ISERIES_BUS(DeviceNode),
0x00, DeviceNode->AgentId, 0);
@@ -86,8 +86,7 @@ int iSeries_Device_ToggleReset(struct pc
/*
 * Wait for device to reset
 */
-   set_current_state(TASK_UNINTERRUPTIBLE);  
-   schedule_timeout(WaitDelay);
+   msleep(WaitDelay);
}
if (DeviceNode->ReturnCode == 0)
PCIFR("Slot 0x%04X.%02 Reset\n", ISERIES_BUS(DeviceNode),

[PATCH] PPC64 replace schedule_timeout in die

2005-01-21 Thread Paul Mackerras

This patch is from Nishanth Aravamudan <[EMAIL PROTECTED]>.

Replace schedule_timeout() with ssleep to simplify the code and to
express the delay in seconds instead of HZ.

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

--- 2.6.11-rc1-kj-v/arch/ppc64/kernel/traps.c   2005-01-15 16:55:41.0 -0800
+++ 2.6.11-rc1-kj/arch/ppc64/kernel/traps.c 2005-01-15 17:30:39.0 -0800
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -137,8 +138,7 @@ int die(const char *str, struct pt_regs 
 
if (panic_on_oops) {
printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n");
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   schedule_timeout(5 * HZ);
+   ssleep(5);
panic("Fatal exception");
}
do_exit(SIGSEGV);

[PATCH] PPC64 Clear MSR_RI earlier in syscall exit path

2005-01-21 Thread Paul Mackerras

This patch is from Craig Chaney <[EMAIL PROTECTED]>.

This patch moves the restoring of the stack pointer in the system call
exit path to after the point where we clear the RI (recoverable
interrupt) bit in the MSR.  Normally, loading the stack pointer before
clearing RI doesn't cause any problem because there is no trap that
can normally occur in between.  But if we are tracing the code using a
tool that single-steps instructions, this can cause a problem.  In
this case, clearing RI serves as an indication that the following code
can't be safely single-stepped.

Signed-off-by: Craig Chaney <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -Naur clean/arch/ppc64/kernel/entry.S edited/arch/ppc64/kernel/entry.S
--- clean/arch/ppc64/kernel/entry.S 2004-09-26 14:24:27.0 +
+++ edited/arch/ppc64/kernel/entry.S 2004-09-27 14:36:29.221308744 +
@@ -185,10 +185,10 @@
beq-    1f  /* only restore r13 if */
ld  r13,GPR13(r1)   /* returning to usermode */
 1: ld  r2,GPR2(r1)
-   ld  r1,GPR1(r1)
li  r12,MSR_RI
andc    r10,r10,r12
mtmsrd  r10,1   /* clear MSR.RI */
+   ld  r1,GPR1(r1)
mtlr    r4
mtcr    r5
mtspr   SRR0,r7

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Christoph Lameter writes:

> I had the name "zero_page" in V1 and V2 of the patch where it was
> separate. Then someone complained about code duplication.

Well, if you duplicated each arch's clear_page implementation in
zero_page, then yes, that would be unnecessary code duplication.  I
would suggest that for architectures where the clear_page
implementation can easily be extended, rename it to clear_page_order
(or something) and #define clear_page(x) to be clear_page_order(x, 0).
For architectures where it can't, leave clear_page as clear_page and
define clear_page_order as an inline function that calls clear_page in
a loop.
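
A minimal userspace sketch of the fallback described above, for an architecture whose clear_page cannot easily be extended. The name clear_page_order comes from the suggestion itself; the PAGE_SIZE value and the memset-based clear_page are assumptions made only so that the example is self-contained.

```c
#include <string.h>

#define PAGE_SIZE 4096UL    /* assumed page size for this sketch */

/* Stand-in for an architecture's existing single-page routine. */
static void clear_page(void *page)
{
    memset(page, 0, PAGE_SIZE);
}

/* Generic fallback: clear 2^order contiguous pages by calling
 * clear_page once per page. */
static inline void clear_page_order(void *page, unsigned int order)
{
    unsigned long i, n = 1UL << order;
    char *p = page;

    for (i = 0; i < n; i++)
        clear_page(p + i * PAGE_SIZE);
}
```

An architecture that can extend its own routine would instead rename it to clear_page_order and define clear_page(x) as clear_page_order(x, 0), so existing callers need no change.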

> clear_page is called clear_page because it clears one page of *any* order
> not just higher orders. zero-order pages are not segregated nor are they
> intrinsically better just because they contain more memory ;-).

You have missed my point, which was about address constraints, not a
distinction between zero-order pages and higher-order pages.

Anyway, I remain of the opinion that your naming is inconsistent with
the naming of other functions that deal with zero-order and
higher-order pages, such as get_free_pages, alloc_pages, free_pages,
etc., and that your patch is unnecessarily intrusive.  I guess it's up
to Andrew to decide which way we go.

Paul.

[PATCH] PPC64 Fix in_be64 definition

2005-01-21 Thread Paul Mackerras

This patch is from Jake Moilanen <[EMAIL PROTECTED]>.

The instruction syntax for the in_be64 inline asm was incorrect for
the "m" constraint for the address parameter.  This patch fixes the
instruction in the inline asm.

Signed-off-by: Jake Moilanen <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -puN include/asm-ppc64/io.h~in_be64-fix include/asm-ppc64/io.h
--- linux-2.6-bk/include/asm-ppc64/io.h~in_be64-fix Tue Jan  4 15:33:22 2005
+++ linux-2.6-bk-moilanen/include/asm-ppc64/io.h Wed Jan  5 08:08:03 2005
@@ -371,7 +371,7 @@ static inline unsigned long in_be64(cons
 {
unsigned long ret;
 
-   __asm__ __volatile__("ld %0,0(%1); twi 0,%0,0; isync"
+   __asm__ __volatile__("ld%U1%X1 %0,%1; twi 0,%0,0; isync"
 : "=r" (ret) : "m" (*addr));
return ret;
 }

[PATCH] PPC64 xmon data breakpoints on partitioned systems

2005-01-21 Thread Paul Mackerras

This patch is originally from Jake Moilanen <[EMAIL PROTECTED]>,
substantially modified by me.

On PPC64 systems with a hypervisor, we can't set the Data Address
Breakpoint Register (DABR) directly, we have to do it through a
hypervisor call.

Signed-off-by: Jake Moilanen <[EMAIL PROTECTED]>
Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/arch/ppc64/xmon/xmon.c test/arch/ppc64/xmon/xmon.c
--- linux-2.5/arch/ppc64/xmon/xmon.c 2005-01-12 18:20:48.0 +1100
+++ test/arch/ppc64/xmon/xmon.c 2005-01-22 10:55:46.664345064 +1100
@@ -624,6 +624,17 @@
return 0;
 }
 
+/* On systems with a hypervisor, we can't set the DABR
+   (data address breakpoint register) directly. */
+static void set_controlled_dabr(unsigned long val)
+{
+   if (systemcfg->platform == PLATFORM_PSERIES_LPAR) {
+   int rc = plpar_hcall_norets(H_SET_DABR, val);
+   if (rc != H_Success)
+   xmon_printf("Warning: setting DABR failed (%d)\n", rc);
+   } else
+   set_dabr(val);
+}
 
 static struct bpt *at_breakpoint(unsigned long pc)
 {
@@ -711,7 +722,7 @@
 static void insert_cpu_bpts(void)
 {
if (dabr.enabled)
-   set_dabr(dabr.address | (dabr.enabled & 7));
+   set_controlled_dabr(dabr.address | (dabr.enabled & 7));
if (iabr && (cur_cpu_spec->cpu_features & CPU_FTR_IABR))
set_iabr(iabr->address
 | (iabr->enabled & (BP_IABR|BP_IABR_TE)));
@@ -739,7 +750,7 @@
 
 static void remove_cpu_bpts(void)
 {
-   set_dabr(0);
+   set_controlled_dabr(0);
if ((cur_cpu_spec->cpu_features & CPU_FTR_IABR))
set_iabr(0);
 }
@@ -1049,8 +1060,8 @@
 "b  [cnt]   set breakpoint at given instr addr\n"
 "bc   clear all breakpoints\n"
 "bc   clear breakpoint number n or at addr\n"
-"bi  [cnt]  set hardware instr breakpoint (broken?)\n"
-"bd  [cnt]  set hardware data breakpoint (broken?)\n"
+"bi  [cnt]  set hardware instr breakpoint (POWER3/RS64 only)\n"
+"bd  [cnt]  set hardware data breakpoint\n"
 "";
 
 static void

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Andrew Morton writes:

> It is, actually, from the POV of the page allocator.  It's a "higher order
> page" and is controlled by a struct page*, just like a zero-order page...

So why is the function that gets me one of these "higher order pages"
called "get_free_pages" with an "s"? :)

Christoph's patch is bigger than it needs to be because he has to
change all the occurrences of clear_page(x) to clear_page(x, 0), and
then he has to change a lot of architectures' clear_page functions to
be called _clear_page instead.  If he picked a different name for the
"clear a higher order page" function it would end up being less
invasive as well as less confusing.

The argument that clear_page is called that because it clears a higher
order page won't wash; all the clear_page implementations in his patch
are perfectly capable of clearing any contiguous set of 2^order pages
(oops, I mean "zero-order pages"), not just a "higher order page".

Paul.

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Andrew Morton writes:

> It is, actually, from the POV of the page allocator.  It's a "higher order
> page" and is controlled by a struct page*, just like a zero-order page...

OK.  I still reckon it's confusing terminology for the rest of us who
don't have our heads deep in the page allocator code.

Paul.

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Christoph Lameter writes:

> clear_page clears one page of the specified order.

Now you're really being confusing.  A cluster of 2^n contiguous pages
isn't one page by any normal definition.  Call it "clear_page_cluster"
or "clear_page_order" or something, but not "clear_page".

Paul.

Re: Extend clear_page by an order parameter

2005-01-21 Thread Paul Mackerras

Christoph Lameter writes:

> The zeroing of a page of an arbitrary order in page_alloc.c and in
> hugetlb.c may benefit from a clear_page that is capable of zeroing
> multiple pages at once (and scrubd too but that is now an independent
> patch). The following patch extends clear_page with a second parameter
> specifying the order of the page to be zeroed to allow an efficient
> zeroing of pages. Hope I caught everything

Wouldn't it be nicer to call the version that takes the order
parameter "clear_pages" and then define clear_page(p) as
clear_pages(p, 0) ?

Paul.

Re: [PATCH] PPC64: EEH Recovery

2005-01-20 Thread Paul Mackerras

Linas Vepstas writes:

> > 2. I don't see why the device nodes for the PCI subtree being reset
> >would go away, and thus I don't see the need for your eeh_cfg_tree
> >struct.
> 
> It's not the reset, it's the hot-plug remove.  The hot plug code assumes
> that you are going to physically remove the device from the slot, so
> it removes the device_node as part of the "unconfig".  

OK, I missed that.  It seems a bit bogus to me.  Could you point me at
where in the code this happens?

> > 3. Is there a good reason why we can't use the assigned-addresses
> >property on the relevant device tree nodes to tell us what to set
> >the BARs to?
> 
> Yes, the reason is that after a reset, that property doesn't hold any
> decent data.   I discussed this with the firmware developers, and their
> response was that it is the kernel's responsibility to compute
> (or save/restore) such values.  (Except for bridges, which they will do
> for us).

The not holding any decent data is a consequence of the device nodes
getting thrown away, isn't it?  I fail to see how resetting the device
can of itself affect our copy of the device tree.

> > In particular I think it should be a
> >userland write to a sysfs file that kicks off the restart process
> >rather than it just happening after 5 seconds.  Anyway, what
> >process or thread is executing that 5 second sleep?  Is it keventd
> >or something?
> 
> It's a workqueue.

Which get run in keventd's context.  In other words no other
workqueues will get run during the 5 second sleep, or at least not on
that cpu.

Paul.

Re: Horrible regression with -CURRENT from "Don't busy-lock-loop in preemptable spinlocks" patch

2005-01-19 Thread Paul Mackerras

Ingo Molnar writes:

> * Peter Chubb <[EMAIL PROTECTED]> wrote:
> 
> > >> Here's a patch that adds the missing read_is_locked() and
> > >> write_is_locked() macros for IA64.  When combined with Ingo's
> > >> patch, I can boot an SMP kernel with CONFIG_PREEMPT on.
> > >> 
> > >> However, I feel these macros are misnamed: read_is_locked() returns
> > >> true if the lock is held for writing; write_is_locked() returns
> > >> true if the lock is held for reading or writing.
> > 
> > Ingo> well, 'read_is_locked()' means: "will a read_lock() succeed"
> > 
> > Fail, surely?
> 
> yeah ... and with that i proved beyond doubt that the naming is indeed
> unintuitive :-)

Yes.  Intuitively read_is_locked() is true when someone has done a
read_lock and write_is_locked() is true when someone has done a
write_lock.

I suggest read_poll(), write_poll(), spin_poll(), which are like
{read,write,spin}_trylock but don't do the atomic op to get the lock,
that is, they don't change the lock value but return true if the
trylock would succeed, assuming no other cpu takes the lock in the
meantime.
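
A rough sketch of the proposed semantics, using C11 atomics in userspace rather than any kernel primitive. Only the names read_poll and write_poll come from the suggestion; the lock layout (0 = free, positive = number of readers, -1 = one writer) and the type name are assumptions chosen for illustration. The poll functions only read the lock word, while the trylock functions modify it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, >0 = reader count, -1 = writer. */
typedef struct {
    atomic_int count;
} rwlock_sketch_t;

/* Would a read_trylock succeed right now?  Does not modify the lock. */
static bool read_poll(rwlock_sketch_t *l)
{
    return atomic_load(&l->count) >= 0;
}

/* Would a write_trylock succeed right now?  Does not modify the lock. */
static bool write_poll(rwlock_sketch_t *l)
{
    return atomic_load(&l->count) == 0;
}

/* Take a read lock if no writer holds the lock. */
static bool read_trylock(rwlock_sketch_t *l)
{
    int c = atomic_load(&l->count);

    while (c >= 0) {
        /* On failure, c is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&l->count, &c, c + 1))
            return true;
    }
    return false;
}

/* Take the write lock only if the lock is completely free. */
static bool write_trylock(rwlock_sketch_t *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->count, &expected, -1);
}
```

As the message notes, a true result from a poll is only a hint: another cpu may take the lock between the poll and a later trylock.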

Regards,
Paul.

Re: [PATCH] raid6: altivec support

2005-01-19 Thread Paul Mackerras

David Woodhouse writes:

> Yeah I'm increasingly tempted to merge ppc32/ppc64 into one arch
> like mips/parisc/s390. Or would that get vetoed on the basis that we
> don't have all that horrid non-OF platform support in ppc64 yet, and
> we're still kidding ourselves that all those embedded vendors will
> either not notice ppc64 or will use OF?

I'm going to insist that every new ppc64 platform supplies a device
tree.  They don't have to have OF but they do need to have the booter
or wrapper supply a flattened device tree (which is just a few kB of
binary data as far as the booter/wrapper is concerned).  It doesn't
have to include all the 

As for merging ppc32 and ppc64, I think it would end up an awful ifdef
mess, but if you can see a clean way to do it, send me a patch. :)

Paul.

Re: [PATCH] PPC64: EEH Recovery

2005-01-18 Thread Paul Mackerras

Linas Vepstas writes:

> p.s.  It was not clear to me if the EEH patch previously sent 
> (6 January 2005, same subject line) will be wending its way into 
> the main Torvalds kernel tree, or not.  I hadn't really gotten
> confirmation one way or another.

I'm not really totally happy with it yet, on a number of fronts:

1. You're adding more PCI-specific stuff to the device_node struct,
   which I don't like.  I would prefer that the device_node tree
   contains basically just what we get from OF, and that we have a
   separate struct for storing ppc64-specific information for each PCI
   device.  Fixing that is outside the scope of your patch, though.

2. I don't see why the device nodes for the PCI subtree being reset
   would go away, and thus I don't see the need for your eeh_cfg_tree
   struct.

3. Is there a good reason why we can't use the assigned-addresses
   property on the relevant device tree nodes to tell us what to set
   the BARs to?

4. I think the 5 second sleep is quite bogus, and shows that we have
   the flow of control wrong.  In particular I think it should be a
   userland write to a sysfs file that kicks off the restart process
   rather than it just happening after 5 seconds.  Anyway, what
   process or thread is executing that 5 second sleep?  Is it keventd
   or something?

5. AFAICS userland will get an unplug notification for the device, but
   nothing to indicate that it is due to an EEH slot isolation event.  I
   think userland should be told about EEH events.

Regards,
Paul.


Re: Horrible regression with -CURRENT from "Don't busy-lock-loop in preemptable spinlocks" patch

2005-01-16 Thread Paul Mackerras

Chris Wedgwood writes:

> +#define rwlock_is_write_locked(x) ((x)->lock == 0)

AFAICS on i386 the lock word, although it goes to 0 when write-locked,
can then go negative temporarily when another cpu tries to get a read
or write lock.  So I think this should be

((signed int)(x)->lock <= 0)

(or the equivalent using atomic_read).
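
The difference between the two tests can be shown with a small userspace model of the lock word. The bias value below is the one believed to be used by the i386 rwlock code, but treat it and the helper names as assumptions; the model only mimics the arithmetic (a writer subtracts the bias, a reader attempt subtracts 1) and has no real locking.

```c
#define RW_LOCK_BIAS 0x01000000  /* assumed i386 bias; unlocked lock word value */

/* Test from the quoted patch: write-locked only when the word is exactly 0. */
static int is_write_locked_eq(int lock)
{
    return lock == 0;
}

/* Suggested test: also covers the transient negative values. */
static int is_write_locked_le(int lock)
{
    return lock <= 0;
}
```

Starting from RW_LOCK_BIAS, a successful write lock leaves the word at 0. If another cpu then attempts a read lock, it briefly decrements the word to -1 before backing off, and during that window only the signed `<= 0` form still reports the lock as write-locked.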

Paul.

[PATCH] Ioctl compatibility for TIOCMIWAIT and TIOCGICOUNT

2005-01-16 Thread Paul Mackerras

This patch lets us use TIOCMIWAIT and TIOCGICOUNT from a 32-bit
process on a 64-bit processor.  TIOCMIWAIT uses the argument as a
bitmap of things to wait for.  The argument for TIOCGICOUNT points to
a struct serial_icounter_struct, which only contains ints and arrays
of int.

Signed-off-by: Paul Mackerras <[EMAIL PROTECTED]>

diff -urN linux-2.5/include/linux/compat_ioctl.h test/include/linux/compat_ioctl.h
--- linux-2.5/include/linux/compat_ioctl.h  2004-11-17 09:38:21.0 +1100
+++ test/include/linux/compat_ioctl.h   2005-01-17 14:25:41.0 +1100
@@ -25,6 +25,8 @@
 COMPATIBLE_IOCTL(TIOCLINUX)
 COMPATIBLE_IOCTL(TIOCSBRK)
 COMPATIBLE_IOCTL(TIOCCBRK)
+ULONG_IOCTL(TIOCMIWAIT)
+COMPATIBLE_IOCTL(TIOCGICOUNT)
 /* Little t */
 COMPATIBLE_IOCTL(TIOCGETD)
 COMPATIBLE_IOCTL(TIOCSETD)

Re: Memory region check in drivers/pcmcia/rsrc_mgr.c

2001-07-08 Thread Paul Mackerras

Linus Torvalds writes:

> HOWEVER, you can change the resource checking to use the proper "parent
> resource" instead of using the root resource. I absolutely agree that
> using the root resource is wrong per se - it depends (incorrectly) on the
> fact that on all laptops the PCMCIA controller tends to be on the root
> bus.

I was able to do this more easily than I had expected, and there is a
(lightly-tested) patch below for comment and testing.  The main thing
is that the routines in rsrc_mgr.c now basically need to get a handle
for the parent resource for the pcmcia socket controller that we are
concerned with at the moment.  To get this I use the s->cap.cb_dev
field, which AFAICS gets set to the pci_dev for the controller for PCI
controllers and should be NULL for ISA controllers.  If there is a
better way to get hold of the pci_dev let me know.

I have added a socket_info_t *s argument to validate_mem,
find_io_region and find_mem_region, so that we can get at
s->cap.cb_dev.  The callers of these routines all have the
socket_info_t pointer readily to hand.  We could pass in &s->cap or
s->cap.cb_dev instead, but passing s seems to be the easiest and most
generally useful option.

If s or s->cap.cb_dev is NULL the routines fall back to the old
behaviour, i.e. using ioport_resource or iomem_resource.  Of course it
is possible that the ISA memory and I/O space could be a sub-node in
the ioport/mem_resource trees, and that we should be using those nodes
for ISA pcmcia controllers rather than ioport/mem_resource.  If that
is so then we need to define new isa_ioport_resource and
isa_iomem_resource variables and set them in the architecture-specific
PCI code.

I also fixed the problem that Jeff Garzik pointed out, which is that
the existing code in find_io_region does a check_io_resource followed
by a request_region, without checking the return from request_region,
which is potentially racy (anyone for an SMP laptop? :).  (And
find_mem_region does the analogous thing.)  I replaced the pair of
calls with a single call to a new function, request_io_resource, which
attempts to allocate the region in the socket controller's parent
resource.  Similarly there is a new request_mem_resource function
used in find_mem_region.
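
The point about replacing a check followed by a request with a single call can be sketched in userspace C. This is not the pcmcia code: the single flag standing for "the region", and both function names, are hypothetical. The idea is that the test and the claim happen in one atomic step, and the caller acts on the return value instead of on an earlier, separate check.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One flag models one region: set = already claimed. */
static atomic_flag region_taken = ATOMIC_FLAG_INIT;

/* Combined test-and-claim: returns true only for the caller that
 * actually obtained the region. */
static bool request_region_sketch(void)
{
    return !atomic_flag_test_and_set(&region_taken);
}

/* Give the region back so that it can be claimed again. */
static void release_region_sketch(void)
{
    atomic_flag_clear(&region_taken);
}
```

With a separate check and request, two cpus could both see the region as free and both proceed; with the combined form, exactly one of them gets a true result.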

> Note that the CardBus side gets this all right - I assume that a 32-bit
> CardBus card with a PCI driver should work on your powerbook even without
> this patch, no?

I assume so, but I don't have any cardbus devices to test it with.

Regards,
Paul.

diff -urN linux/drivers/pcmcia/cistpl.c pmac/drivers/pcmcia/cistpl.c
--- linux/drivers/pcmcia/cistpl.c   Thu Feb 22 14:25:19 2001
+++ pmac/drivers/pcmcia/cistpl.c Sun Jul  8 17:57:37 2001
@@ -264,11 +264,11 @@
(s->cis_mem.sys_start == 0)) {
int low = !(s->cap.features & SS_CAP_PAGE_REGS);
vs = s;
-   validate_mem(cis_readable, checksum_match, low);
+   validate_mem(cis_readable, checksum_match, low, s);
s->cis_mem.sys_start = 0;
vs = NULL;
if (find_mem_region(&s->cis_mem.sys_start, s->cap.map_size,
-   s->cap.map_size, low, "card services")) {
+   s->cap.map_size, low, "card services", s)) {
printk(KERN_NOTICE "cs: unable to map card memory!\n");
return CS_OUT_OF_RESOURCE;
}
diff -urN linux/drivers/pcmcia/cs.c pmac/drivers/pcmcia/cs.c
--- linux/drivers/pcmcia/cs.c   Wed Jul  4 14:33:24 2001
+++ pmac/drivers/pcmcia/cs.c Sun Jul  8 17:57:36 2001
@@ -797,7 +797,7 @@
return 1;
 for (i = 0; i < MAX_IO_WIN; i++) {
if (s->io[i].NumPorts == 0) {
-   if (find_io_region(base, num, align, name) == 0) {
+   if (find_io_region(base, num, align, name, s) == 0) {
s->io[i].Attributes = attr;
s->io[i].BasePort = *base;
s->io[i].NumPorts = s->io[i].InUse = num;
@@ -809,7 +809,7 @@
/* Try to extend top of window */
try = s->io[i].BasePort + s->io[i].NumPorts;
if ((*base == 0) || (*base == try))
-   if (find_io_region(&try, num, 0, name) == 0) {
+   if (find_io_region(&try, num, 0, name, s) == 0) {
*base = try;
s->io[i].NumPorts += num;
s->io[i].InUse += num;
@@ -818,7 +818,7 @@
/* Try to extend bottom of window */
try = s->io[i].BasePort - num;
if ((*base == 0) || (*base == try))
-   if (find_io_region(&try, num, 0, name) == 0) {
+   if (find_io_region(&try, num, 0, name, s) == 0) {
s->io[i].BasePort = *base = try;
s->io[i].NumPorts += num;
s->io[i].InUse += num;
@@ -1960,7 +1960,7 @@
find_mem_region(&win->base, win->size, align,
(req->Attributes & WIN_MAP_BELOW_1MB) ||
!(s->cap.features & SS_CAP_PAGE_REGS),
-   (*handle)->dev_info))
+   (*handle)->dev_info, s))

Memory region check in drivers/pcmcia/rsrc_mgr.c

2001-07-07 Thread Paul Mackerras


In drivers/pcmcia/rsrc_mgr.c, there is code that checks whether a given
range of PCI memory addresses is available for the pcmcia code to
use.  This code uses a macro, check_mem_resource(), to check whether a
particular region is available, defined like this:

#define check_mem_resource(b,n) check_resource(&iomem_resource, (b), (n))

This code is now causing me problems on my powerbook because we now
register the regions mapped by each PCI host bridge in the
iomem_resource structure.  The basic problem is that check_resource
only checks at the top level of the iomem_resource tree.  I think that
we should be using check_mem_region instead, which will descend the
tree until it finds out whether the region is actually in use or not.
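
The difference between checking only the top level and descending the tree can be illustrated with a simplified model. The struct below is a stand-in for the kernel's struct resource, not its real definition, and the function names and the `busy` flag handling are assumptions about the general idea rather than a copy of the kernel's logic.

```c
#include <stddef.h>

/* Simplified stand-in for a resource tree node (assumption). */
struct res {
    unsigned long start, end;   /* inclusive address range */
    int busy;                   /* 1 = claimed by a driver, 0 = container
                                   such as a host bridge window */
    struct res *child, *sibling;
};

static int overlaps(const struct res *r, unsigned long start, unsigned long end)
{
    return r->start <= end && start <= r->end;
}

/* Top-level check: any overlap with a direct child of root is a conflict. */
static int conflict_top_level(const struct res *root,
                              unsigned long start, unsigned long end)
{
    const struct res *r;

    for (r = root->child; r != NULL; r = r->sibling)
        if (overlaps(r, start, end))
            return 1;
    return 0;
}

/* Descending check: step into non-busy containers and report a conflict
 * only if a busy node overlaps or the range does not fit in the parent. */
static int conflict_descend(const struct res *parent,
                            unsigned long start, unsigned long end)
{
    const struct res *r;

    if (start < parent->start || end > parent->end)
        return 1;
    for (r = parent->child; r != NULL; r = r->sibling) {
        if (!overlaps(r, start, end))
            continue;
        if (r->busy)
            return 1;
        return conflict_descend(r, start, end);
    }
    return 0;
}
```

Once a host bridge window is registered as a child of the root, the top-level check reports every address inside that window as in use, while the descending check still finds the free space between the devices below the bridge.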

The patch below does this (and makes a similar correction for I/O
space).  With this patch applied, the pcmcia stuff works fine on my
powerbook, and I end up with something like this in /proc/iomem:

8000-afff : /pci@f200
  8000-8007 : Apple Computer Inc. KeyLargo Mac I/O
  9000-9fff : PCI CardBus #02
  a000-afff : Texas Instruments PCI1211
  a0001000-a0001fff : Apple Computer Inc. KeyLargo USB (#2)
a0001000-a0001fff : usb-ohci
  a0002000-a0002fff : Apple Computer Inc. KeyLargo USB
a0002000-a0002fff : usb-ohci
  a700-a7000fff : card services
b000-bfff : /pci@f000
  b000-b0003fff : ATI Technologies Inc Mobility M3 AGP 2x
b000-b0003fff : aty128fb MMIO
  b400-b7ff : ATI Technologies Inc Mobility M3 AGP 2x
b400-b7ff : aty128fb FB
f100-f1ff : /pci@f000
f300-f3ff : /pci@f200
  f300-f33f : PCI CardBus #02
f500-f5ff : /pci@f400
  f500-f5000fff : Apple Computer Inc. UniNorth FireWire
  f520-f53f : Apple Computer Inc. UniNorth GMAC

Linus, would you apply this patch to your tree?

Paul.

diff -urN linux/drivers/pcmcia/rsrc_mgr.c pmac/drivers/pcmcia/rsrc_mgr.c
--- linux/drivers/pcmcia/rsrc_mgr.c Sat Mar 31 03:06:19 2001
+++ pmac/drivers/pcmcia/rsrc_mgr.c  Wed Jun 20 14:25:25 2001
@@ -104,8 +104,8 @@
 
 ==*/
 
-#define check_io_resource(b,n) check_resource(&ioport_resource, (b), (n))
-#define check_mem_resource(b,n) check_resource(&iomem_resource, (b), (n))
+#define check_io_resource(b,n) check_region((b), (n))
+#define check_mem_resource(b,n) check_mem_region((b), (n))
 
 /*==
 

[PATCH] PPC interrupt mapping fix

2001-07-06 Thread Paul Mackerras


Linus,

The patch below fixes the interrupt assignments on PPC machines that
use Open Firmware, in the case where we have devices behind a PCI-PCI
bridge and multiple PCI host bridges.  The patch is moderately large
because I rewrote the procedure that parsed the open firmware
interrupt tree.  The previous routine was monolithic and almost
unreadable - I wrote a new version which uses several subroutines and
should be much more readable.  There are also some fixes to allow us
to use the interrupt tree on powermacs when booted with BootX, which
we couldn't do previously.

Please apply to your tree.

Paul.

diff -urN linux/arch/ppc/kernel/prom.c pmac/arch/ppc/kernel/prom.c
--- linux/arch/ppc/kernel/prom.c Wed Jul  4 14:33:18 2001
+++ pmac/arch/ppc/kernel/prom.c Wed Jul  4 22:53:29 2001
@@ -116,8 +116,11 @@
 unsigned int rtas_size;
 unsigned int old_rtas;
 
-/* Set for a newworld machine */
+/* Set for a newworld or CHRP machine */
 int use_of_interrupt_tree;
+struct device_node *dflt_interrupt_controller;
+int num_interrupt_controllers;
+
 int pmac_newworld;
 
 static struct device_node *allnodes;
@@ -1153,7 +1156,19 @@
*prev_propp = PTRUNRELOC(pp);
prev_propp = &pp->next;
}
-   *prev_propp = 0;
+   if (np->node != NULL) {
+   /* Add a "linux,phandle" property */
+   pp = (struct property *) mem_start;
+   *prev_propp = PTRUNRELOC(pp);
+   prev_propp = &pp->next;
+   namep = (char *) (pp + 1);
+   pp->name = PTRUNRELOC(namep);
+   strcpy(namep, RELOC("linux,phandle"));
+   mem_start = ALIGN((unsigned long)namep + strlen(namep) + 1);
+   pp->value = (unsigned char *) PTRUNRELOC(&np->node);
+   pp->length = sizeof(np->node);
+   }
+   *prev_propp = NULL;
 
/* get the node's full name */
l = (int) call_prom(RELOC("package-to-path"), 3, 1, node,
@@ -1186,19 +1201,46 @@
 finish_device_tree(void)
 {
unsigned long mem = (unsigned long) klimit;
+   struct device_node *np;
 
-   /* All newworld machines now use the interrupt tree */
-   struct device_node *np = allnodes;
-
-   while(np && (_machine == _MACH_Pmac)) {
+   /* All newworld pmac machines and CHRPs now use the interrupt tree */
+   for (np = allnodes; np != NULL; np = np->allnext) {
if (get_property(np, "interrupt-parent", 0)) {
-   pmac_newworld = 1;
+   use_of_interrupt_tree = 1;
break;
}
-   np = np->allnext;
}
-   if ((_machine == _MACH_chrp) || (boot_infos == 0 && pmac_newworld))
-   use_of_interrupt_tree = 1;
+   if (_machine == _MACH_Pmac && use_of_interrupt_tree)
+   pmac_newworld = 1;
+
+#ifdef CONFIG_BOOTX_TEXT
+   if (boot_infos && pmac_newworld) {
+   prom_print("WARNING ! BootX/miBoot booting is not supported on this machine\n");
+   prom_print("  You should use an Open Firmware bootloader\n");
+   }
+#endif /* CONFIG_BOOTX_TEXT */
+
+   if (use_of_interrupt_tree) {
+   /*
+* We want to find out here how many interrupt-controller
+* nodes there are, and if we are booted from BootX,
+* we need a pointer to the first (and hopefully only)
+* such node.  But we can't use find_devices here since
+* np->name has not been set yet.  -- paulus
+*/
+   int n = 0;
+   char *name;
+
+   for (np = allnodes; np != NULL; np = np->allnext) {
+   if ((name = get_property(np, "name", NULL)) == NULL
+   || strcmp(name, "interrupt-controller") != 0)
+   continue;
+   if (n == 0)
+   dflt_interrupt_controller = np;
+   ++n;
+   }
+   num_interrupt_controllers = n;
+   }
 
mem = finish_node(allnodes, mem, NULL, 1, 1);
dev_tree_size = mem - (unsigned long) allnodes;
@@ -1240,9 +1282,8 @@
if (ifunc != NULL) {
mem_start = ifunc(np, mem_start, naddrc, nsizec);
}
-   if (use_of_interrupt_tree) {
+   if (use_of_interrupt_tree)
mem_start = finish_node_interrupts(np, mem_start);
-   }
 
/* Look for #address-cells and #size-cells properties. */
ip = (int *) get_property(np, "#address-cells", 0);
@@ -1298,141 +1339,210 @@
return mem_start;
 }
 
-/* This routine walks the interrupt tree for a given device node and gather 
- * all necessary informations according to the draft interrupt mapping
- * for CHRP. The current version was only tested on Apple "Core99" machines
- * and may not handle cascaded controllers correctly.
+/*
+ * Find the interrupt pare

[PATCH] fix drivers/usb/scanner.c ioctl return

2001-07-06 Thread Paul Mackerras


The following patch corrects the return value from the ioctl function
in the USB scanner code, in the case where the ioctl is unrecognized.
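As a rough illustration of the convention involved (a sketch that
compiles in user space, not the scanner driver itself): the default
case of an ioctl handler reports -ENOTTY for a command it does not
recognize, since ENOIOCTLCMD is a kernel-internal code that user
programs are never supposed to see.  The command numbers and the name
demo_ioctl are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers, for illustration only. */
#define DEMO_GET_STATUS 1u
#define DEMO_RESET      2u

/*
 * Sketch of an ioctl-style dispatcher.  Returns 0 on success and a
 * negative errno value on failure, following the kernel convention.
 */
static int demo_ioctl(unsigned int cmd, int *status)
{
    assert(status != NULL);

    switch (cmd) {
    case DEMO_GET_STATUS:
        /* Nothing to change; *status already holds the value. */
        return 0;
    case DEMO_RESET:
        *status = 0;
        return 0;
    default:
        /* Unrecognized command: report ENOTTY to the caller. */
        return -ENOTTY;
    }
}
```

A caller passing a command other than the two defined above gets
-ENOTTY back, which the C library would turn into errno == ENOTTY.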

Linus, please apply.

Paul.

diff -urN linux/drivers/usb/scanner.c pmac/drivers/usb/scanner.c
--- linux/drivers/usb/scanner.c Sat Apr 28 23:02:49 2001
+++ pmac/drivers/usb/scanner.c  Thu Jun 28 17:28:25 2001
@@ -909,7 +909,7 @@
return result;
}
default:
-   return -ENOIOCTLCMD;
+   return -ENOTTY;
}
return 0;
 }

drivers/ide/sl82c105.c

2001-07-06 Thread Paul Mackerras


I am wondering who maintains drivers/ide/sl82c105.c, and who sent in
the recent changes to it.  We now have, at around line 278, this code:

unsigned int pci_init_sl82c105(struct pci_dev *dev, const char *msg)
{
return ide_special_settings(dev, msg);
}

The call to ide_special_settings gives a link error because
ide_special_settings is not exported from drivers/ide/ide-pci.c.
I can't see what the point of calling it is anyway, even if it were
exported, since ide_special_settings consists of a switch statement on
the device ID and none of the cases will match.

Paul (who uses sl82c105.c on his longtrail PPC CHRP box).

[PATCH] fix compile error in usb-ohci.c

2001-07-06 Thread Paul Mackerras


The following patch fixes a trivial error in drivers/usb/usb-ohci.c,
where a missing argument to ohci_pci_suspend will cause a compile
error if you have powerbook support enabled.

Linus, please apply.

Paul.

diff -urN linux/drivers/usb/usb-ohci.c pmac/drivers/usb/usb-ohci.c
--- linux/drivers/usb/usb-ohci.c Wed Jul  4 14:33:36 2001
+++ pmac/drivers/usb/usb-ohci.c Fri Jul  6 16:20:58 2001
@@ -2749,7 +2749,7 @@
 
switch (when) {
case PBOOK_SLEEP_NOW:
-   ohci_pci_suspend (ohci->ohci_dev);
+   ohci_pci_suspend (ohci->ohci_dev, 3);
break;
case PBOOK_WAKE:
ohci_pci_resume (ohci->ohci_dev);

[PATCH] fix compile error in imsttfb.c

2001-07-06 Thread Paul Mackerras


As it currently stands, drivers/video/imsttfb.c will give a compile
error if FBCON_HAS_CFB32 is defined.  This patch fixes that.

There used to be a declaration of `i' which was only used if
FBCON_HAS_CFB32 was defined.  I suspect that somebody was compiling
without FBCON_HAS_CFB32 and saw an unused variable warning from gcc
and decided to take out the declaration.  This patch will avoid that
warning.
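The idea, sketched outside the driver: give the case its own block
and declare the temporary there, so the variable exists only where it
is used.  The function name pack_cmap_entry and the 8-bit case are
invented for this example; only the 32-bit arithmetic mirrors the
patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expand an 8-bit palette index into a colormap entry.  For 32 bpp
 * the index is replicated into all four bytes, as in the cfb32 case.
 */
static uint32_t pack_cmap_entry(int bpp, unsigned int regno)
{
    uint32_t out = 0;

    assert(regno <= 0xff);

    switch (bpp) {
    case 8:
        out = regno;
        break;
    case 32: {
        /*
         * i lives only inside this block, so a build that
         * compiles this case out never sees an unused variable.
         */
        uint32_t i = (regno << 8) | regno;
        out = (i << 16) | i;
        break;
    }
    default:
        break;
    }
    return out;
}
```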

Linus, please apply.

Paul.

diff -urN linux/drivers/video/imsttfb.c linuxppc_2_4/drivers/video/imsttfb.c
--- linux/drivers/video/imsttfb.c   Thu Jul  5 14:46:16 2001
+++ linuxppc_2_4/drivers/video/imsttfb.c Thu Jul  5 10:58:09 2001
@@ -1278,10 +1278,11 @@
break;
 #endif
 #ifdef FBCON_HAS_CFB32
-   case 32:
-   i = (regno << 8) | regno;
+   case 32: {
+   int i = (regno << 8) | regno;
p->fbcon_cmap.cfb32[regno] = (i << 16) | i;
break;
+   }
 #endif
}
 

RE: Trouble Booting Linux PPC On Mac G4 2000

2001-07-06 Thread Paul Mackerras

Tim McDaniel writes:

> We are having a great degree of difficulty getting Linux PPC2
> running on a Mac G4 466 tower with 128MB of memory, One 30MB HD and one
> CR RW. This is not a NuBus based system. To the best of our knowledge we
> have followed the user manual to the tee, and even tried forcing video
> settings at the Xboot screen.   

One possible problem is that many Apple monitors only work at a fixed
horizontal frequency.  The Apple Studio 17 monitor (with the
transparent case) that I use with my G4 cube is like that: it will
only operate at horizontal scan rates between 79 and 82 kHz.  If the
kernel video driver chooses a video mode with a scan rate outside that
range the screen goes black.  So I have to put video=aty128fb:vmode:20
on the kernel command line to avoid that.  (It would be nice if the
kernel driver did DDC but it doesn't.)

Other than that, you might get more useful suggestions if you ask on
the [EMAIL PROTECTED] mailing list.

Paul.

Re: floating point problem

2001-07-05 Thread Paul Mackerras

[EMAIL PROTECTED] writes:

> In Linux PPC, the MSR[FP] bit (that is, the floating point available bit)
> is off (at least for non-SMP).
> 
> Due to this, whenever some floating point instruction is executed in 'user
> mode', it leads to an exception 'FPUnavailable'. The exception handler for

Yes, this is so that we don't have to save and restore the floating
point registers on every context switch.

> this exception, apart from setting the MSR[FP] bit, also sets the MSR[FE0]
> and MSR[FE1] bits. These bits basically enable the floating point
> exceptions so that if there are some floating point exception conditions
> encountered while executing a floating point instruction, an appropriate
> exception is raised.

You have control at user-level over whether the cpu will take an
exception (leading to a SIGFPE signal) or not by means of the FPSCR
register.  The VE, OE, UE, ZE and XE bits in the FPSCR control whether
the cpu will take an exception on floating-point invalid operation,
overflow, underflow, divide by zero and inexact result respectively.

If the kernel cleared the FE0 and FE1 bits, there would be no way for
an application to get a signal when a floating-point error occurred.
With FE0 and FE1 set, the application can control this using the
FPSCR, and get a signal, or not, as it prefers.

> But whenever some floating point instruction is executed in 'kernel mode',
> 'FPUnavailable' exception handler code does not set the 'MSR[FE0] and
> MSR[FE1]' bits.

Floating point is not intended to be used in the kernel except in a
couple of specific places.

> Problem is that we want to get the good results without changing the
> kernel. Either by having the user mode application to interact with some
> special module which can set the MSR[FP] bit before we execute the floating
> point instruction or by some other trick.  Is there any solution apart
> from changing the kernel?

Clear the appropriate bits in the FPSCR.  There is almost certainly a
glibc interface to do this but I don't know what it would be.

Paul.

[PATCH] fix typo in 2.4.6 for PPC

2001-07-04 Thread Paul Mackerras


The patch below fixes a typo in the PowerPC code in 2.4.6.  Without
this change, people attempting to compile up a kernel for a powermac
will get a compile error.
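For context, the comparison being fixed relies on the layout of the
first 32-bit word of PCI configuration space: the vendor ID sits in
the low 16 bits and the device ID in the high 16 bits.  A small
sketch of that packing follows; the helper names are invented for
this example and do not come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the vendor/device word as it reads from config offset 0. */
static uint32_t pci_vendev(uint16_t vendor, uint16_t device)
{
    return ((uint32_t)device << 16) | vendor;
}

/* Split a vendor/device word back into its two halves. */
static void pci_split_vendev(uint32_t vendev,
                             uint16_t *vendor, uint16_t *device)
{
    assert(vendor != NULL && device != NULL);
    *vendor = (uint16_t)(vendev & 0xffff);
    *device = (uint16_t)(vendev >> 16);
}
```

So the value shifted up by 16 has to be a device ID, and the value
added in the low half has to be a vendor ID, which is the form the
corrected line uses.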

Paul.

diff -urN linux/arch/ppc/kernel/pmac_pci.c linuxppc_2_4/arch/ppc/kernel/pmac_pci.c
--- linux/arch/ppc/kernel/pmac_pci.c Tue Jul  3 13:38:19 2001
+++ linuxppc_2_4/arch/ppc/kernel/pmac_pci.c Tue Jul  3 15:00:40 2001
@@ -249,7 +249,7 @@
out_le32(bp->cfg_addr, (1UL << BANDIT_DEVNUM) + PCI_VENDOR_ID);
udelay(2);
vendev = in_le32((volatile unsigned int *)bp->cfg_data);
-   if (vendev == (PCI_VENDOR_ID_APPLE_BANDIT << 16) + 
+   if (vendev == (PCI_DEVICE_ID_APPLE_BANDIT << 16) + 
PCI_VENDOR_ID_APPLE) {
/* read the revision id */
out_le32(bp->cfg_addr,

Re: about kmap_high function

2001-07-04 Thread Paul Mackerras

Stephen C. Tweedie writes:

> On Tue, Jul 03, 2001 at 10:47:20PM +1000, Paul Mackerras wrote:
> > On PPC it is a bit different.  Flushing a single TLB entry is
> > relatively cheap - the hardware broadcasts the TLB invalidation on the
> > bus (in most implementations) so there are no cross-calls required.  But
> > flushing the whole TLB is expensive because we (strictly speaking)
> > have to flush the whole of the MMU hash table as well.
> 
> How much difference is there? 

Between flushing a single TLB entry and flushing the whole TLB, or
between flushing a single entry and flushing a range?

Flushing the whole TLB (including the MMU hash table) would be
extremely expensive.  Consider a machine with 1GB of RAM.  The
recommended MMU hash table size would be 16MB (1024MB/64), although we
generally run with much less, maybe a quarter of that.  That's still
4MB of memory we have to scan through in order to find and clear all
the entries in the hash table, which is what would be required for
flushing the whole hash table.

What we do at present is (a) have a bit in the linux page tables which
indicates whether there is a corresponding entry in the MMU hash table
and (b) only flush the kernel portion of the address space (0xc0000000
- 0xffffffff) in flush_tlb_all().  We have a single page table tree
for kernel addresses, shared between all processes.  That all helps
but we still have to scan through all the page table pages for kernel
addresses to do a flush_tlb_all().

I just did some measurements on a 400MHz POWER3 machine with 1GB of
RAM.  This is a 64-bit machine but running a 32-bit kernel (so both
the kernel and userspace run in 32-bit mode).  It is a 1-cpu machine
and I am running an SMP kernel with highmem enabled, with 512MB of
lowmem and 512MB of highmem.  The MMU hash table is 4MB.

The time taken inside a single flush_tlb_page call depends on whether
the linux PTE indicates that there is a hardware PTE in the hash
table.  If not, it takes about 110ns; if so, it takes about 1us (I
measured 998.5ns but I rounded it :).

A call to flush_tlb_range for 1024 pages from flush_all_zero_pkmaps
(replacing the flush_tlb_all call) takes around 1080us, i.e. the cost
is pretty much linear in the number of pages.  The time for
flush_tlb_page was measured inside
the procedure whereas the time for flush_tlb_range was measured in the
caller, so the flush_tlb_range number includes procedure call and loop
overhead which the flush_tlb_page number doesn't.  I expect that
almost all the PTEs in the pkmap range would have a corresponding hash
table entry, since we would almost always touch a page that we have
kmap'd.

> We only flush once per kmap sweep, and
> we have 1024 entries in the global kmap pool, so the single tlb flush
> would have to be more than a thousand times less expensive overall
> than the global flush for that change to be worthwhile.

The time for doing a flush_tlb_all call in flush_all_zero_pkmaps was
3280us.  That is for the version which only flushes the kernel portion
of the address space.  Just doing a memset to 0 on the hash table
takes over 11ms (the memset goes at around 360MB/s but there is 4MB to
clear).  Clearing out the hash table properly would take much longer
since you are supposed to synchronize with the hardware when changing
each entry in the hash table and the memset is certainly not doing that.

So yes, the ratio is more than 1024 to 1.
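The arithmetic behind these figures can be restated as a short
sketch.  It uses only numbers quoted in this message; the function
names are invented for the example and nothing here measures
anything.

```c
#include <assert.h>

/* Recommended MMU hash table size in MB: RAM size divided by 64. */
static unsigned long recommended_htab_mb(unsigned long ram_mb)
{
    return ram_mb / 64;
}

/* Time in microseconds to stream over size_mb at mb_per_s. */
static double sweep_time_us(double size_mb, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return size_mb / mb_per_s * 1e6;
}

/* How many times more expensive one operation is than another. */
static double cost_ratio(double expensive_us, double cheap_us)
{
    assert(cheap_us > 0.0);
    return expensive_us / cheap_us;
}
```

With the quoted measurements, recommended_htab_mb(1024) gives 16,
sweep_time_us(4, 360) comes to a little over 11000us for the memset,
and cost_ratio(3280, 0.9985) is well above the 1024 threshold even
for the cheaper kernel-only flush_tlb_all.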

> If the page flush really is _that_ much faster, then sure, this
> decision can easily be made per-architecture: the kmap_high code
> already has all of the locking and refcounting to know when a per-page
> tlb flush would be safe.

My preference would be for architectures to be able to make this
decision.  I don't mind whether it is a flush call per page inside the
loop in flush_all_zero_pkmaps or a flush_tlb_range call at the end of
the loop.  I counted the average number of pages needing to be
flushed in the loop in flush_all_zero_pkmaps - it was 1023.9 for the
workload I was using, which was a kernel compile.

Using flush_tlb_range would be fine on PPC but as I noted before some
architectures assume that flush_tlb_range is only used on user
addresses at the moment.

Paul.

Re: virt_to_bus and virt_to_phys on Apple G4 target

2001-07-04 Thread Paul Mackerras

[EMAIL PROTECTED] writes:

> I am running linux 2.4.2 on Apple G4 machine. I think the 'PCI bus
> addresses' and 'physical addresses' are same on this architecture. I

They are the same on an Apple G4 but not necessarily on other PowerPC
machines.  It depends on the PCI host bridge implementation.

> expected the two be different but according to asm/io.h 'virt_to_bus(addr)
> = virt_to_phys(addr) + PCI_DRAM_OFFSET'. I printed the value of
> 'PCI_DRAM_OFFSET' and it came out to be zero. Is this correct?

Yes, for an Apple G4.

> If I somehow get the physical address of a user space buffer in a module
> and take this as a PCI bus address, will I be able to do DMA properly?

Yes, on an Apple G4.  If you use virt_to_bus then it should work on
all PowerPC machines that I know of (that run 32-bit PPC/Linux).  But
as Dave points out, you should use the interfaces described in
Documentation/DMA-mapping.txt instead if at all possible.  It's quite
possible that virt_to_bus will be removed during 2.5.x development.

Paul.

Re: readl() / writel() on PowerPC

2001-07-04 Thread Paul Mackerras

David T Eger writes:

> Am I missing something?  Is there some reason that readl() and
> writel() should byte-swap by default?

readl()/writel() are defined to access PCI memory space in units of 32
bits.  PCI is by definition little-endian, PowerPC is (natively at
least) big-endian, hence the byte-swap.  Same for inl/outl etc., but
not insl/outsl - they don't swap because they are typically used for
transferring arrays of bytes, just doing it 4 bytes at a time (2 at a
time for insw/outsw).

You can use __raw_readl/__raw_writel if you don't want byte-swapping,
but they also don't give you any barriers.  Thus if you do

__raw_writel(v, addr);
x = __raw_readl(addr);

it is quite possible for the read to hit the device before the write.
If you want to prevent that you need to put an iobarrier_rw() call in
between the read and the write.  You don't need a barrier between
successive writes unless you want to prevent any potential
store-gathering from happening, because PowerPCs don't reorder writes
to I/O regions.
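To make the byte-swap concrete, here is a sketch in plain C of what
reading a little-endian 32-bit register amounts to on a big-endian
CPU.  It is not the kernel's implementation (which uses
byte-reversing load/store instructions on PowerPC); the names
swab32_demo and le32_from_bytes are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32_demo(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) <<  8) |
           ((v & 0x00ff0000u) >>  8) |
           ((v & 0xff000000u) >> 24);
}

/*
 * Assemble a little-endian 32-bit value from four bytes in the order
 * they appear at increasing addresses.  Works on any host.
 */
static uint32_t le32_from_bytes(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

A device register holding 0x12345678 presents the bytes 78 56 34 12
at increasing addresses.  A plain big-endian 32-bit load of those
bytes yields 0x78563412, and swapping that gives back 0x12345678.
The string variants (insl/outsl) skip the swap because they move a
byte stream whose order in memory should be preserved.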

Paul.

Re: about kmap_high function

2001-07-03 Thread Paul Mackerras

Stephen C. Tweedie writes:

> kmap_high is intended to be called routinely for access to highmem
> pages.  It is coded to be as fast as possible as a result.  TLB
> flushes are expensive, especially on SMP, so kmap_high tries hard to
> avoid unnecessary flushes.

The code assumes that flushing a single TLB entry is expensive on SMP,
while flushing the whole TLB is relatively cheap - certainly cheaper
than flushing several individual entries.  And that assumption is of
course true on i386.

On PPC it is a bit different.  Flushing a single TLB entry is
relatively cheap - the hardware broadcasts the TLB invalidation on the
bus (in most implementations) so there are no cross-calls required.  But
flushing the whole TLB is expensive because we (strictly speaking)
have to flush the whole of the MMU hash table as well.

The MMU gets its PTEs from a hash table (which can be very large) and
we use the hash table as a kind of level-2 cache of PTEs, which means
that the flush_tlb_* routines have to flush entries from the MMU hash
table as well.  The hash table can store PTEs from many contexts, so
it can have a lot of PTEs in it at any given time.  So flushing the
whole TLB would imply going through every single entry in the hash
table and clearing it.  In fact, currently we cheat - flush_tlb_all
actually only flushes the kernel portion of the address space, which
is all that is required in the three places where flush_tlb_all is
called at the moment.

This is not a criticism, rather a request that we expand the
interfaces so that the architecture-specific code can make the
decisions about when and how to flush TLB entries.

For example, I would like to get rid of flush_tlb_all and define a
flush_tlb_kernel_range instead.  In all the places where flush_tlb_all
is currently used, we do actually know the range of addresses which
are affected, and having that information would let us do things a lot
more efficiently on PPC.  On other platforms we could define
flush_tlb_kernel_range to just flush the whole TLB, or whatever.

Note that there is already a flush_tlb_range which could be used, but
some architectures assume that it is only used on user addresses.

Regards,
Paul.

hang from HUP'ing init in linuxrc

2001-07-01 Thread Paul Mackerras


Recently I tried running the Debian installer on top of a 2.4.6-pre6
kernel.  It got up to the point of installing libc and then the system
hung.  It was still taking interrupts (I could change vt's, etc.) but
no user processes were running.

What was happening was rather interesting.  The init process was stuck
inside prepare_namespace(), in the while loop here (these are lines
749-751 of init/main.c):

pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
if (pid>0)
while (pid != wait(&i));

The installer had sent a HUP signal to init.  The init process thus
had current->sigpending == 1.  When it called wait, it got down into
sys_wait4 which worked out that there were children but none were
zombies, and at that point it would normally sleep, but because there
were signals pending, it returned -ERESTARTSYS.  Now, on the way out
from the system call, the kernel noticed that it was returning to
kernel mode and thus didn't deliver any signals, and sigpending stayed
at 1.

Thus the system was sitting in a tight loop calling wait() over and
over again in kernel mode in the init process.
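The sequence can be modelled in a few lines of ordinary C.  This is
only a toy model of the behaviour described above, not kernel code:
the struct, the function names and the DEMO_ERESTARTSYS constant are
all stand-ins invented for the example.

```c
#include <assert.h>

#define DEMO_ERESTARTSYS 512    /* stand-in for the kernel-internal code */

struct demo_task {
    int sigpending;     /* a signal is queued for this task */
    int kernel_mode;    /* the syscall was issued from kernel mode */
};

/*
 * Model of wait with live children but no zombies: it would sleep
 * (returns 0 here) unless a signal is pending.
 */
static int demo_wait(const struct demo_task *t)
{
    return t->sigpending ? -DEMO_ERESTARTSYS : 0;
}

/* Model of syscall exit: signals are delivered only on the way
 * back to user mode. */
static void demo_syscall_exit(struct demo_task *t)
{
    if (!t->kernel_mode)
        t->sigpending = 0;
}

/* Count immediate returns from wait before it can sleep, giving up
 * after limit attempts. */
static int demo_spin_count(struct demo_task *t, int limit)
{
    int spins = 0;

    assert(limit >= 0);
    while (spins < limit && demo_wait(t) != 0) {
        demo_syscall_exit(t);
        spins++;
    }
    return spins;
}
```

In this model a user-mode caller with a pending signal spins once
and then sleeps, while a kernel-mode caller uses up whatever limit
it is given, which is the tight loop seen in init.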

This was on PPC.  I had a look at the i386 code and AFAICS it will do
the same thing.  The check for whether we are returning to user mode
is in do_signal there (whereas PPC does the check in entry.S) but the
net effect in both cases is that we don't execute the main body of
do_signal when we are returning from a syscall from a process running
in kernel mode.

I'm not sure what the best way to fix this is.  The problem would crop
up whenever we have a kernel thread which wants to wait for a child
process.  I don't think we want to start delivering signals to kernel
threads in the same way that we do to usermode processes though.

Any suggestions?

Paul.

Re: Cosmetic JFFS patch.

2001-06-29 Thread Paul Mackerras

Cort Dougan writes:

> Can we then expect to see all mention of authors in drivers disappear from
> the boot? 

I think we'll either see a lot more or a lot less.  In my example I
would have had no particular problem with a message saying "PPP driver
copyright Al Longyear and Michael Callahan" or whatever.  What annoyed
me was the noisy copyright message about something that was only 20 or
30 lines of code, and not especially clever code at that.

If copyright messages on boot are the way we get credit for the work
we've done, then I have a few to add myself. :)

My personal preference is for a quieter boot, with basically no
copyright messages.  It's Linus' call though.

> Same with url's, version #'s and the like?

See all the previous messages in this thread. :)

>  The built by
> user@host message is a good bit of "drumming ones own drum" while
> contributing very little (running 'make' vs. writing the system).

Isn't that more a "who to blame" than credit?

Paul.

Re: Cosmetic JFFS patch.

2001-06-28 Thread Paul Mackerras

Linus Torvalds writes:

> There's another side to "drumming your own drum": it is often seen as
> actively offensive to some people who don't want to do the same thing.

I agree.  What usually seems to end up happening is that someone
writes 95% and gets no credit, someone else does 5% and puts in a
printk announcing their contribution loudly every time the system
boots.  I recall that the old PPP driver used to print "PPP Dynamic
channel allocation code copyright 1995 Caldera, Inc." which always
annoyed me because it was a completely trivial piece of code that the
notice was referring to.

Paul.

Re: softirq in pre3 and all linux ports

2001-06-20 Thread Paul Mackerras

Andrea Arcangeli writes:

> We should release the stack before running the softirq (some place uses
> softirqs to release the stack and avoid overflows).

Well if they are relying on having a lot of stack available then those
places are buggy.  Once the softirq is made pending it can run at any
time that interrupts are enabled.  You can't rely on a softirq handler
having any more stack available than a hard interrupt handler has.

> ip + tcp are more intensive than just queueing a packet in a backlog.
> That's why they're not done in irq context in the first place.

Ah, ok, I misunderstood, I thought you were saying that the softirq
framework itself had a lot of overhead.

> I don't have gigabit ethernet so I cannot flood my boxes to death.
> But I think it's real, and a softirq marking itself runnable again is
> another case to handle without live lockups or starvation.

As for the gigabit ethernet case, if we are having packets coming in
and generating hard interrupts at that sort of a rate then what we
really need is the sort of interrupt throttling that Jamal talked
about at the 2.5 kernel kickoff.

It seems to me that possibly softirqs are being used in some places
where a kernel thread would be more appropriate.  Instead of making
softirqs use a kernel thread, I think it would be better to find the
places that should use a thread and make them do so.  Softirqs are
still after all interrupt handlers (ones that run at a lower priority
than any hardware interrupt) and should be treated as such.

Paul.

Re: softirq in pre3 and all linux ports

2001-06-19 Thread Paul Mackerras

Andrea Arcangeli writes:

> With pre3 there are bugs introduced into mainline that are getting
> extended to all architectures.
> 
> First of all nuking the handle_softirq from entry.S is wrong. ppc
> copied without thinking and we'll need to resurrect it too for example

Well, I object to the "without thinking" bit.  It seems to me that
code that raises a softirq without having either hard interrupts or
BHs disabled is buggy - why would you want to do that?  And if we do
want to allow that, shouldn't we put the check in raise_softirq or the
equivalent, to get the minimum latency?

> Fourth if the tasklet or softirq or bottom half handler has been marked
> running again because of another event (like a nested irq) the kernel can
> starve userspace too. (softirqs are much heavier than the irq handler so
> it can also live lockup much more easily this way)

Soft irqs should definitely not be much heavier than an irq handler;
if they are, then we have implemented them wrongly somehow.

> So I recommend Linus merging this patch that fixes all the above
> mentioned bugs (the anti starvation/live lockup logic is called
> ksoftirqd):

ksoftirqd seems like the wrong solution to the problem to me.  If we
are really getting starved by softirqs then we need to look at whether
whatever is doing it should be a kernel thread itself rather than
doing it in softirqs.  Do you have a concrete example of the
starvation/live lockup that you can describe to us?

Regards,
Paul.

Re: Linux/PPC maintainer changing

2001-06-18 Thread Paul Mackerras


Cort has put an enormous amount of time and effort into maintaining
the PowerPC port of Linux over the past 5 or 6 years, and I for one
would like to acknowledge that publicly and thank him for that.  It
has not always been an easy task, I know, because there is a wide
range of opinions within the PPC/Linux camp and Cort has been the man
on the spot to sort out the balance between the competing interests.
And I for one will miss the time, effort and resources he has put into
the infrastructure things such as the repository, web pages, ftp site
etc.

I would also like to thank FSM Labs for contributing the space and
bandwidth for the PPC/Linux repository over the last couple of years.

Paul.

Re: any good diff merging utility?

2001-06-17 Thread Paul Mackerras

Ivan Vadovic writes:

> Well, are there any utilities to merge diffs? I couldn't find any on freshmeat.
> So what are you using to stack many patches onto the kernel tree? Just manually
> modify the diff? I'll try to write something more automatic if nothing comes up.

Try dirdiff - ftp://ftp.samba.org/pub/paulus/dirdiff-1.2.tar.gz.  I
use it all the time for merging in changes between Linus' official
tree, my own development tree, and the PPC/Linux bitkeeper trees.

Dirdiff is a tcl/tk-based utility for graphically displaying the
difference between directory trees.  It can handle from 2 to 5 trees.
It displays a main window where it shows which files are different.
You can select a file and get it to show the diffs between that file
in any two of the directory trees.  This comes up in another window
in a format like a unified diff but with the background of the line
colored according to which file it comes from.  You can also copy
files between trees with a menu item - in fact you can select whole
groups of files to be copied.  And you can use it to generate patches
too. :)

Once you have the differences between two versions of a file
displayed, you can do a merge between the two versions.  Each line of
differences has a little check box beside it.  If you check the box it
means you want to make that change (right-click or shift-click selects
a whole group of boxes).  When you have checked all the boxes you want
you select an item from the merge menu to say which tree you want to
update.  The new version of the file comes up in an edit window and
you can check it, make any further changes you want, etc.  Then you
can either save the result or close the window (discarding the merge).

It's hard to explain in words everything about how it works and how
you use it.  It isn't really a utility to merge diffs but it is very
useful in tracking and merging changes between several large source
trees.  I find it particularly useful because I am usually interested
only in a subset of the files (i.e. particularly arch/ppc and
include/asm-ppc).  So when Linus releases a new pre-patch, I update my
"official Linus source" tree and do another dirdiff.  If there are
changes to files under fs/ for instance, I just select all of them and
copy them over to my tree without looking at the diffs.  If there are
changes in arch/i386 for instance, I look at the diff to see if I am
going to need to make a similar change in arch/ppc.

Regards,
Paul.

Re: Inconsistent "#ifdef KERNEL" on different architectures

2001-06-05 Thread Paul Mackerras

Adrian Bunk writes:

> Whatever the right policy is, the main concern in my initial mail was the
> _consistency_ of the kernel headers between different architectures.
> So when you want to flush out these programs I see no reason to
> inconsistently change it only on one architecture.

Different architectures are maintained by different people who have
different perspectives on things.  The only thing you have any right
to expect any consistency in is the kernel API, and even there things
like error numbers etc. differ between architectures.

If you want consistency, you would either have to persuade Linus to
issue an edict or else persuade every single architecture maintainer
to do things the same way.  But if the motivation is to make it easier
for user-level programs to use things which are not intended to be
exported to userspace, then all you will achieve is that we will make
sure that you can't use those things from userspace.  And this
definitely includes things like atomics, bitops, memory barriers etc.
Take a copy by all means but don't rely on the kernel definitions for
your userspace programs.

It is the policy for all architectures that kernel headers should not
be used in userspace programs.  The "inconsistency" that you are
complaining about is only a difference in the extent to which
this policy is enforced.

Paul.

Re: Inconsistent "#ifdef KERNEL" on different architectures

2001-06-04 Thread Paul Mackerras

Adrian Bunk writes:

> (my main concern wasn't whether the "#ifdef __KERNEL__" is correct or not
> but I was wondering whether there's a reason why it's different on
> different architectures)

The only valid reason for userspace programs to be including kernel
headers is to get definitions that are part of the kernel API.  (And
in fact others here will go further and assert that there are *no*
valid reasons for userspace programs to include kernel headers.)

If you want some atomic functions or whatever for your userspace
program and the ones in the kernel look like they would be useful,
then take a copy of the relevant kernel code if you like, but don't
include the kernel headers directly.  If you do, you will get bitten
at some point in the future when we decide to change some internal
implementation detail in the kernel, and your program suddenly won't
compile any more.

This is why I added #ifdef __KERNEL__ around most of the contents
of include/asm-ppc/*.h.  It was done deliberately to flush out those
programs which are depending on kernel headers when they shouldn't.
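
As a rough illustration of that guard (a sketch only; the toy_ names
are hypothetical and are not taken from include/asm-ppc), a header can
keep its exported constants outside the guard and its internals inside
it:

```c
/* toy_bitops.h: hypothetical header, for illustration only. */
#ifndef TOY_BITOPS_H
#define TOY_BITOPS_H

/* Part of the exported API: visible to userspace and kernel alike. */
#define TOY_API_VERSION 1

#ifdef __KERNEL__
/* Internal helper: only visible when building the kernel itself. */
static inline void toy_set_bit(int nr, unsigned long *word)
{
	*word |= 1UL << nr;
}
#define TOY_HAVE_INTERNALS 1
#else
#define TOY_HAVE_INTERNALS 0
#endif /* __KERNEL__ */

#endif /* TOY_BITOPS_H */

/* Reports whether the internal helpers were visible at compile time. */
int toy_internals_visible(void)
{
	return TOY_HAVE_INTERNALS;
}
```

A userspace program built without __KERNEL__ sees TOY_API_VERSION but
no declaration of toy_set_bit(), so any attempt to use the helper is
caught at build time rather than at some later kernel upgrade.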

Paul.

Re: SyncPPP Generic PPP merge

2001-05-24 Thread Paul Mackerras

Jeff Mcadams writes:

> Indeed.  And let me just throw out another thought.  A clean abstraction
> of the various portions of the PPP functionality is beneficial in other
> ways.  My personal pet project being to add L2TP support to the kernel
> eventually.  A good abstraction of the framing capabilities and basic
> PPP processing would be rather useful in that project.

That is exactly what ppp_generic.c is intended to do - it abstracts
out the framing and encapsulation and low-level transport of PPP
frames into ppp "channels" (see for example ppp_async.c,
ppp_synctty.c) while ppp_generic.c does the basic PPP processing
(compression, multilink, handling the network interface device etc.).

You should be able to write an L2TP channel to work with ppp_generic -
all your code would need to know about is how to take a PPP frame and
encapsulate and send it, and how to receive and decapsulate PPP
frames.
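
A minimal userspace sketch of that split, assuming a made-up ops table
(struct toy_channel_ops and everything else prefixed toy_ is
hypothetical, and the 2-byte length header is a toy framing, not the
L2TP wire format).  The generic layer hands a PPP frame to the channel
through the ops table and knows nothing about the encapsulation:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real interface lives in the kernel. */
struct toy_channel;

struct toy_channel_ops {
	/* Encapsulate one PPP frame and send it; returns bytes sent or -1. */
	int (*start_xmit)(struct toy_channel *ch,
			  const unsigned char *frame, size_t len);
};

struct toy_channel {
	const struct toy_channel_ops *ops;
	unsigned char wire[64];	/* what went out on the "wire" */
	size_t wire_len;
};

#define TOY_HDR_LEN 2

/* Toy encapsulation: a 2-byte big-endian length, then the frame. */
static int toy_l2tp_start_xmit(struct toy_channel *ch,
			       const unsigned char *frame, size_t len)
{
	if (len > 0xffff || len + TOY_HDR_LEN > sizeof(ch->wire))
		return -1;
	ch->wire[0] = (unsigned char)(len >> 8);
	ch->wire[1] = (unsigned char)(len & 0xff);
	memcpy(ch->wire + TOY_HDR_LEN, frame, len);
	ch->wire_len = len + TOY_HDR_LEN;
	return (int)ch->wire_len;
}

static const struct toy_channel_ops toy_l2tp_ops = {
	.start_xmit = toy_l2tp_start_xmit,
};

/* The generic layer only ever goes through the ops table. */
int toy_generic_send(struct toy_channel *ch,
		     const unsigned char *frame, size_t len)
{
	return ch->ops->start_xmit(ch, frame, len);
}

/* Decapsulate: strip the toy header and hand back the PPP frame. */
int toy_l2tp_receive(const unsigned char *pkt, size_t pkt_len,
		     unsigned char *frame, size_t frame_cap)
{
	size_t len;

	if (pkt_len < TOY_HDR_LEN)
		return -1;
	len = ((size_t)pkt[0] << 8) | pkt[1];
	if (len != pkt_len - TOY_HDR_LEN || len > frame_cap)
		return -1;
	memcpy(frame, pkt + TOY_HDR_LEN, len);
	return (int)len;
}
```

Adding another transport in this model means writing another ops
table; toy_generic_send() does not change.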

[Note to myself: send in a Documentation/ppp_generic.txt which
describes the interface between ppp_generic.c and the channels.]

> I would agree that such a project would be 2.5 material.

Do it today if you like; I can't see that adding a new PPP channel
could break anything else.  It would be like adding a new driver.

Paul.

Re: PATCH: New iSeries Device Drivers (small update)

2001-05-23 Thread Paul Mackerras


Alan Cox writes:

> I was ignoring them because I think they should come via the PPC maintainers

It's OK Alan, Tom is one of the maintainers for Linux on i-Series
(AS/400) machines (we just haven't got around to sending the patch to
the MAINTAINERS file yet).  Cort and Tom and I are discussing how best
to merge in the i-Series support into arch/ppc and include/asm-ppc but
these drivers can go in as far as I am concerned (and AFAIK Cort
agrees).

Regards,
Paul.


Re: add page argument to copy/clear_user_page

2001-05-23 Thread Paul Mackerras

Linus Torvalds writes:

> > As for the `to' argument, yes it is redundant since it is just kmap(page).
> 
> And why not let "clear_page()" just do that itself?

OK, here's a patch that does that.

> The thing is, copy/clear_page shouldn't exist at all (or rather, the
> "highpage" versions should be renamed to the non-highpage names, because
> the non-highmem case simply isn't interesting any more).

Each architecture already had a clear_page that was functionally
equivalent to memset(p, 0, PAGE_SIZE), but often in assembler, and
likewise a copy_page that was equivalent to memcpy(d, s, PAGE_SIZE).
So I renamed all the existing clear_page's and copy_page's to
__clear_page and __copy_page (since they are "lower-level" or "raw"
clear/copy page routines).

In highmem.h I have renamed copy_highpage to copy_page and
clear_highpage to clear_page.  I also have default versions of
copy_user_page and clear_user_page which just do copy_page/clear_page
for those architectures that don't have any cache issues to deal
with.  Architectures can define __HAVE_ARCH_USER_PAGE in asm/page.h
and then define their own copy/clear_user_page routines if they want
to.
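
To make the layering concrete, here is a userspace sketch of the
arrangement (not the patch itself: struct toy_page, the toy_ function
names and TOY_HAVE_ARCH_USER_PAGE are stand-ins for struct page, the
kernel routines and __HAVE_ARCH_USER_PAGE, and the kmap step is
reduced to taking the address of an embedded buffer):

```c
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];	/* stands in for kmap(page) */
};

/* "Raw" routines: what each architecture already had, possibly in asm. */
static void toy_raw_clear_page(void *p)
{
	memset(p, 0, TOY_PAGE_SIZE);
}

static void toy_raw_copy_page(void *to, const void *from)
{
	memcpy(to, from, TOY_PAGE_SIZE);
}

/* Page-struct versions: the mapping step is done here, in one place. */
void toy_clear_page(struct toy_page *page)
{
	toy_raw_clear_page(page->data);
}

void toy_copy_page(struct toy_page *to, const struct toy_page *from)
{
	toy_raw_copy_page(to->data, from->data);
}

/* Defaults, used unless the architecture supplies its own versions. */
#ifndef TOY_HAVE_ARCH_USER_PAGE
void toy_clear_user_page(struct toy_page *page, unsigned long vaddr)
{
	(void)vaddr;	/* no cache aliasing to worry about here */
	toy_clear_page(page);
}

void toy_copy_user_page(struct toy_page *to, const struct toy_page *from,
			unsigned long vaddr)
{
	(void)vaddr;
	toy_copy_page(to, from);
}
#endif
```

An architecture with cache issues would define the opt-out symbol and
provide its own user-page routines; everyone else gets the defaults
for free.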

I have fixed up all the architectures except sparc64.  There the
copy/clear_user_page routines are in assembler and my sparc assembler
is pretty rusty these days (particularly when DaveM goes doing hairy
things with the %g registers :).  I'll let Dave fix that one up; the
change is that copy/clear_user_page take page * arguments instead of
void * arguments.

This patch is a fair bit bigger than the last one, but most of the
bulk is just the renaming of clear_page to __clear_page and copy_page
to __copy_page.  I also renamed memclear_highpage to memclear_page
(which isn't actually used anywhere) and memclear_highpage_flush to
memclear_page_flush.

Let me know what you think of this; if it's OK, could you apply it to
your tree?

Thanks,
Paul.

diff -urN linux/Documentation/cachetlb.txt linux.new/Documentation/cachetlb.txt
--- linux/Documentation/cachetlb.txt	Sat Mar 31 03:05:54 2001
+++ linux.new/Documentation/cachetlb.txt	Wed May 23 20:48:38 2001
@@ -260,8 +260,9 @@

 Here is the new interface:

-  void copy_user_page(void *to, void *from, unsigned long address)
-  void clear_user_page(void *to, unsigned long address)
+  void copy_user_page(struct page *to, struct page *from,
+ unsigned long address)
+  void clear_user_page(struct page *to, unsigned long address)

These two routines store data in user anonymous or COW
pages.  It allows a port to efficiently avoid D-cache alias
@@ -279,6 +280,11 @@

If D-cache aliasing is not an issue, these two routines may
simply call memcpy/memset directly and do nothing more.
+
+   There are default versions of these procedures supplied in
+   include/linux/highmem.h.  If a port does not want to use the
+   default versions it should declare them and define the symbol
+   __HAVE_ARCH_USER_PAGE in include/asm/page.h.

   void flush_dcache_page(struct page *page)

diff -urN linux/arch/alpha/kernel/alpha_ksyms.c linux.new/arch/alpha/kernel/alpha_ksyms.c
--- linux/arch/alpha/kernel/alpha_ksyms.c   Sat Apr 28 23:02:30 2001
+++ linux.new/arch/alpha/kernel/alpha_ksyms.c   Wed May 23 20:39:23 2001
@@ -98,8 +98,8 @@
 EXPORT_SYMBOL(__memset);
 EXPORT_SYMBOL(__memsetw);
 EXPORT_SYMBOL(__constant_c_memset);
-EXPORT_SYMBOL(copy_page);
-EXPORT_SYMBOL(clear_page);
+EXPORT_SYMBOL(__copy_page);
+EXPORT_SYMBOL(__clear_page);

 EXPORT_SYMBOL(__direct_map_base);
 EXPORT_SYMBOL(__direct_map_size);
diff -urN linux/arch/alpha/lib/clear_page.S linux.new/arch/alpha/lib/clear_page.S
--- linux/arch/alpha/lib/clear_page.S   Thu Feb 22 14:24:52 2001
+++ linux.new/arch/alpha/lib/clear_page.S   Wed May 23 20:39:23 2001
@@ -6,9 +6,9 @@

.text
.align 4
-   .global clear_page
-   .ent clear_page
-clear_page:
+   .global __clear_page
+   .ent __clear_page
+__clear_page:
.prologue 0

lda $0,128
@@ -36,4 +36,4 @@
unop
nop

-   .end clear_page
+   .end __clear_page
diff -urN linux/arch/alpha/lib/copy_page.S linux.new/arch/alpha/lib/copy_page.S
--- linux/arch/alpha/lib/copy_page.S	Thu Feb 22 14:24:52 2001
+++ linux.new/arch/alpha/lib/copy_page.S	Wed May 23 21:05:31 2001
@@ -6,9 +6,9 @@

.text
.align 4
-   .global copy_page
-   .ent copy_page
-copy_page:
+   .global __copy_page
+   .ent __copy_page
+__copy_page:
.prologue 0

lda $18,128
@@ -46,4 +46,4 @@
unop
nop

-   .end copy_page
+   .end __copy_page
diff -urN linux/arch/alpha/lib/ev6-clear_page.S linux.new/arch/alpha/lib/ev6-clear_page.S
--- linux/arch/alpha/lib/ev6-clear_page.S   Thu Feb 22 14:24:52 2001
+++ linux.new/arch/alpha/lib/ev6-clear_page.S   Wed May 23 20:39:23 2001
@@ -6,9 +6,9 @@

 .text
 .align 4
-.global clear_page

Re: SyncPPP IPCP/LCP loop problem and patch

2001-05-22 Thread Paul Mackerras

[EMAIL PROTECTED] writes:

> I've hit a problem with the syncPPP module within Linux.
> 
> Under certain conditions (hard to quantify exactly, but try several 8Mbps
> streams hitting a relatively slow, say 200MHz processor) the LCP/IPCP
> negotiation hits the following loop.

[snip]

> My solution in the patch that follows is to detect the flip-flop using a
> counter and then after three occurrences with no genuine IPCP traffic to
> modify behavior on receipt of the LCP conf REQ. After three attempts we
> acknowledge the LCP conf REQ but stay in the opened state rather than
> dropping back and restarting our own LCP negotiation. This is non-RFC1661
> behavior unless you consider it part of the general loop avoidance directive.

Seems to me that when you get the conf-request in opened state, you
should send your conf-request before sending the conf-ack to the
peer's conf-request.  I think this would short-circuit the loop (I
could be wrong though, it's getting late).

That behaviour would be in line with the FSM in rfc1661, where the
action for event RCR+ in Opened state is "tld,scr,sca/8", i.e. the one
action involves sending both the conf-request and the conf-ack.  It is
debatable to what extent that specifies the order of the messages but
it does list the conf-request first FWIW.
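
Read literally, that table entry could be coded as below.  This is
only a sketch of the ordering argument, not the syncppp code; the
toy_ names are made up, and the state numbers are the ones used in
the rfc1661 state table (8 = Ack-Sent, 9 = Opened):

```c
/* State numbers follow the RFC 1661 state table; only two are needed. */
enum { TOY_ACK_SENT = 8, TOY_OPENED = 9 };

/* Actions, using the RFC's abbreviations. */
enum toy_action { TOY_TLD, TOY_SCR, TOY_SCA };

/*
 * Handle event RCR+ (acceptable Configure-Request received) in the
 * Opened state.  Writes the actions, in order, into out[] (room for
 * three) and returns the next state.
 */
int toy_rcr_plus_in_opened(enum toy_action out[3])
{
	out[0] = TOY_TLD;	/* this-layer-down */
	out[1] = TOY_SCR;	/* send our own Configure-Request first */
	out[2] = TOY_SCA;	/* then acknowledge the peer's request */
	return TOY_ACK_SENT;
}
```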

Paul.

Re: [RFD w/info-PATCH] device arguments from lookup, partion code

2001-05-21 Thread Paul Mackerras

Alexander Viro writes:

> drivers/net/ppp_generic.c:
> ppp_set_compress(struct ppp *ppp, unsigned long arg)
> {
[snip]
> if (copy_from_user(&data, (void *) arg, sizeof(data))
> || (data.length <= CCP_MAX_OPTION_LENGTH
> && copy_from_user(ccp_option, data.ptr, data.length)))
> goto out;
> 
> And that's far from being uncommon. They _do_ follow pointers. Some - more
> than once.

:) That particular example is one that would probably be much cleaner
as a write on a control fd.  What is there currently is just a
relatively ugly way of getting a variable-sized lump of data from
usermode into the kernel.

Paul.

Re: add page argument to copy/clear_user_page

2001-05-20 Thread Paul Mackerras

Linus Torvalds writes:

> If you add the page argument, why leave the old arguments lingering there
> at all? They only create confusion, and add no information. 

You mean the `to' pointer argument, or the `vaddr' argument?  The
`vaddr' argument isn't redundant, it's the user virtual address where
the page is mapped, and sparc64 needs it in order to avoid D-cache
aliasing issues I believe. (Dave?)

As for the `to' argument, yes it is redundant since it is just kmap(page).
But copy/clear_user_page isn't the interface that gets called from the
MM stuff, copy/clear_user_highpage is, defined in include/linux/highmem.h.
These are two of a whole series of functions which all do kmap, do
something, kunmap.

IMHO having the kmap/kunmap calls in copy/clear_user_highpage in
include/linux/highmem.h is the best approach because it means that
most architectures can just #define copy/clear_user_page as
copy/clear_page in include/asm/page.h (as they do at the moment).  It
means that the kmap/kunmap calls are in one place only instead of
being duplicated in every architecture.

But we could instead push the kmap/kunmap down into copy/clear_user_page.
Then we might as well rename them into copy/clear_user_highpage.  Here
is how it might turn out in include/asm-i386/page.h (and asm-alpha,
asm-arm, asm-cris, asm-s390, asm-sh, asm-s390x...):

extern void clear_page(void *page);
extern void copy_page(void * _to, void * _from);

#define clear_user_highpage(page, vaddr)\
do {\
struct page *__page = page; \
clear_page(kmap(__page));   \
kunmap(__page); \
} while (0)

#define copy_user_highpage(to, from, vaddr) \
do {\
struct page *__to = to, *__from = from; \
copy_page(kmap(__to), kmap(__from));\
kunmap(__from); \
kunmap(__to);   \
} while (0)

Doing it with inline functions would be cleaner but would mean that we
would need the declaration of kmap/kunmap in page.h.  That would mean
that we would need to #include  in include/asm/page.h
which is starting to get pretty messy and inviting circular
inclusions.  We could move these declarations to another file in
include/asm - include/asm/highmem.h might seem the natural place but
it is only used if CONFIG_HIGHMEM is defined and not all ports have
it.

I assume nobody wants to do these functions out-of-line. :)

So on the whole the way I had it seems cleanest to me.  But I can whip
up a patch to do the kmap/kunmap in the architecture-specific files
instead, if you prefer - if so, do you prefer the macro version or the
inline function version?

Paul.

add page argument to copy/clear_user_page

2001-05-20 Thread Paul Mackerras


Linus,

The patch below adds a page * argument to copy_user_page and
clear_user_page.  These functions are used only in
include/linux/highmem.h to implement clear_user_highpage and
copy_user_highpage.  The idea is to pass in the pointer to the page
struct for the destination page so that, on architectures where it is
needed, we can use the PG_arch_1 bit of page->flags to indicate
whether the i-cache and d-cache are consistent for the page.  With the
extra argument, copy/clear_user_page can set the PG_arch_1 bit to the
"inconsistent" state.  We can then add architecture-specific code to
flush the i-cache when the bit is in the "inconsistent" state and a
user process wants to be able to execute from the page.
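
The scheme can be sketched in userspace roughly as follows.  This is
an illustration, not the ppc patch: struct toy_pg, the toy_ functions
and the bit position are hypothetical, and the polarity (bit set
means "i-cache may be stale") is an assumption made for the example:

```c
#define TOY_PG_ARCH_1 (1UL << 0)	/* illustrative bit position */

struct toy_pg {
	unsigned long flags;
};

static unsigned long toy_icache_flushes;	/* counts simulated flushes */

/* Assumed convention: bit set means "i-cache may be stale". */
void toy_clear_user_page(struct toy_pg *page)
{
	/* ... zero the page contents here ... */
	page->flags |= TOY_PG_ARCH_1;
}

/* Called when a process maps the page; exec is nonzero for execute. */
void toy_map_page(struct toy_pg *page, int exec)
{
	if (exec && (page->flags & TOY_PG_ARCH_1)) {
		toy_icache_flushes++;	/* stands in for the real flush */
		page->flags &= ~TOY_PG_ARCH_1;
	}
}

unsigned long toy_flush_count(void)
{
	return toy_icache_flushes;
}
```

A page that is never mapped executable is never flushed, and a page
mapped executable many times is flushed once.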

Sparc64, ppc and ia64 at least will benefit from this.  Using the
PG_arch_1 bit in this way lets us avoid doing unnecessary i-cache
flushes for the page - on ppc I have measured a 2.5% reduction in time
for a kernel compile by doing this.

David Miller and David Mosberger-Tang agree with this change; in
fact, if you look in include/asm-ia64/pgalloc.h you will see that
copy/clear_user_page already have the extra page * argument.

The patch below does nothing more than add the extra argument to all
the definitions of copy_user_page and clear_user_page (for all
architectures), and to the places where they are called.  At this
stage this extra argument will be unused (except on ia64).  Once this
change goes in, the various architecture maintainers who care can send
you the patches which will make use of the extra argument on their
architecture.  We have patches for ppc tested and ready to be
included, and I know DaveM has patches for sparc64.

Please apply this to your tree.

Thanks,
Paul.

diff -urN linux/Documentation/cachetlb.txt linux.new/Documentation/cachetlb.txt
--- linux/Documentation/cachetlb.txt	Sat Mar 31 03:05:54 2001
+++ linux.new/Documentation/cachetlb.txt	Sun May 20 16:44:46 2001
@@ -260,8 +260,9 @@
 
 Here is the new interface:
 
-  void copy_user_page(void *to, void *from, unsigned long address)
-  void clear_user_page(void *to, unsigned long address)
+  void copy_user_page(void *to, void *from, unsigned long address,
+ struct page *page)
+  void clear_user_page(void *to, unsigned long address, struct page *page)
 
These two routines store data in user anonymous or COW
pages.  It allows a port to efficiently avoid D-cache alias
@@ -279,6 +280,12 @@
 
If D-cache aliasing is not an issue, these two routines may
simply call memcpy/memset directly and do nothing more.
+
+   The "page" parameter points to the page struct for the page.
+   This allows a port to store information about the cache status
+   of the page in the page struct (for example, by using the
+   PG_arch_1 bit of the flags field) and update that status to
+   reflect the effect of the clear or copy.
 
   void flush_dcache_page(struct page *page)
 
diff -urN linux/arch/sh/mm/cache.c linux.new/arch/sh/mm/cache.c
--- linux/arch/sh/mm/cache.c	Sat Apr 28 23:02:38 2001
+++ linux.new/arch/sh/mm/cache.c	Sun May 20 16:45:47 2001
@@ -506,14 +506,15 @@
 /* Page is 4K, OC size is 16K, there are four lines. */
 #define CACHE_ALIAS 0x3000
 
-void clear_user_page(void *to, unsigned long address)
+void clear_user_page(void *to, unsigned long address, struct page *page)
 {
clear_page(to);
if (((address ^ (unsigned long)to) & CACHE_ALIAS))
__flush_page_to_ram(to);
 }
 
-void copy_user_page(void *to, void *from, unsigned long address)
+void copy_user_page(void *to, void *from, unsigned long address,
+   struct page *page)
 {
copy_page(to, from);
if (((address ^ (unsigned long)to) & CACHE_ALIAS))
diff -urN linux/include/asm-alpha/page.h linux.new/include/asm-alpha/page.h
--- linux/include/asm-alpha/page.h  Thu Feb 22 14:25:37 2001
+++ linux.new/include/asm-alpha/page.h  Sun May 20 16:50:42 2001
@@ -13,10 +13,10 @@
 #define STRICT_MM_TYPECHECKS
 
 extern void clear_page(void *page);
-#define clear_user_page(page, vaddr)   clear_page(page)
+#define clear_user_page(page, vaddr, pg)   clear_page(page)
 
 extern void copy_page(void * _to, void * _from);
-#define copy_user_page(to, from, vaddr)copy_page(to, from)
+#define copy_user_page(to, from, vaddr, page)  copy_page(to, from)
 
 #ifdef STRICT_MM_TYPECHECKS
 /*
diff -urN linux/include/asm-arm/page.h linux.new/include/asm-arm/page.h
--- linux/include/asm-arm/page.h	Mon Aug 14 02:54:15 2000
+++ linux.new/include/asm-arm/page.h	Sun May 20 16:50:41 2001
@@ -14,8 +14,8 @@
 #define clear_page(page)   memzero((void *)(page), PAGE_SIZE)
 extern void copy_page(void *to, void *from);
 
-#define clear_user_page(page, vaddr)   clear_page(page)
-#define copy_user_page(to, from, vaddr)copy_page(to, from)
+#define clear_user_page(page, vaddr, pg)   clear_page(page)
+#define copy_user_page(to, from, vaddr, page)  co

icache flushing in kernel/ptrace.c

2001-05-19 Thread Paul Mackerras


I would like to change access_one_page in kernel/ptrace.c to call
something other than flush_icache_page.  Currently it calls
flush_icache_page on the page after modifying it.  Now of course on
many architectures (including PPC) we need to do some sort of i-cache
flush; my contention is that flush_icache_page is the wrong interface
and that we should be calling flush_icache_range or something like it
instead.

The problem with flush_icache_page is that it is also called in
do_no_page and do_swap_page in mm/memory.c.  In the do_no_page case it
is called on a page which we have usually just got from the page
cache.  If the page is clean and has previously had the i-cache
flushed for it then there is no need to do the flush again.  But there
is no way (no reasonable way, anyway) for flush_icache_page to tell
whether it has been called from do_no_page or from access_one_page.

I have been able to get good speedups on PPC by using the PG_arch_1
bit on the page to indicate whether a page is i-cache clean (has had
the flush done), by delaying flushing until necessary (i.e. until a
process maps in the page and has requested execute permission on it),
and by not flushing the page if it has already been flushed.  (Anton
Blanchard has actually done a lot of this work with input from Dave
Miller.)

But to do this I need to make flush_icache_page do nothing, which
breaks ptrace.  For now I have duplicated most of the contents of
kernel/ptrace.c inside arch/ppc/kernel/ptrace.c and changed the
flush_icache_page to flush_icache_range (with appropriate parameters)
to fix this.  But this is not ideal.

AFAICT the architectures that need to maintain i-cache coherency in
software are alpha, ia64, m68k, mips, mips64, parisc, ppc and sparc64.
There seems to be a lot of variation in the assumptions about what
sorts of addresses flush_icache_range will be used on and what it
should do.

Going by the name, flush_icache_range would be the ideal interface for
flushing the range of bytes that have been modified by
access_one_page.  But it looks to me like using it might be suboptimal
on other architectures, e.g. alpha, due to the way that
flush_icache_range has been implemented.
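
For reference, the range in question is just the offset of the user
address within its page, applied to wherever the page is mapped in
the kernel.  A small sketch of that arithmetic (the toy_ names and
the 4K page size are assumptions for the example):

```c
#define TOY_PAGE_SIZE 4096UL
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))

struct toy_range {
	unsigned long start;
	unsigned long end;	/* exclusive */
};

/*
 * Given the kernel mapping of a page and the user address being
 * written, return the range of kernel addresses actually modified.
 */
struct toy_range toy_modified_range(unsigned long kmapped_base,
				    unsigned long user_addr,
				    unsigned long len)
{
	struct toy_range r;

	r.start = kmapped_base + (user_addr & ~TOY_PAGE_MASK);
	r.end = r.start + len;
	return r;
}
```

Flushing only [start, end) touches a handful of cache lines instead
of the whole page.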

Anyway, here's a proposed patch.  Could the various architecture
maintainers (particularly alpha) comment on what the impact would be
on their architectures?  If flush_icache_range isn't the right
interface either, could we invent one that would be?

Thanks,
Paul.

diff -urN linux/kernel/ptrace.c pmac/kernel/ptrace.c
--- linuxppc_2_4/kernel/ptrace.c	Wed Mar 21 09:39:08 2001
+++ pmac/kernel/ptrace.c	Mon Apr 16 12:00:11 2001
@@ -58,10 +58,11 @@
flush_cache_page(vma, addr);
 
if (write) {
-   maddr = kmap(page);
-   memcpy(maddr + (addr & ~PAGE_MASK), buf, len);
+   maddr = kmap(page) + (addr & ~PAGE_MASK);
+   memcpy(maddr, buf, len);
flush_page_to_ram(page);
-   flush_icache_page(vma, page);
+   flush_icache_range((unsigned long) maddr,
+  (unsigned long) maddr + len);
kunmap(page);
} else {
maddr = kmap(page);

[PATCH] [RESEND] fs/binfmt_elf.c changes vs 2.4.5-pre3

2001-05-18 Thread Paul Mackerras


Linus,

This patch against 2.4.5-pre3 makes 3 changes to fs/binfmt_elf.c:

1. It fixes the csp calculation so that it actually achieves the 16
   byte final alignment that the comment claims.  Previously the csp
   calculation didn't take the AT_NULL entry into account.  If you
   look at the current fs/binfmt_elf.c there is a "sp -= 2" that is
   not reflected in the csp calculation, unlike all the other
   decrements of sp.

2. It allows each architecture to add extra aux table entries by
   defining DLINFO_ARCH_ITEMS and ARCH_DLINFO in .  We need
   this on PowerPC to add entries for the cache line size, and to add
   entries for compatibility with older broken glibc's.

3. It removes the extra 16 bytes that were left free for PowerPC - in
   the past we had to move the auxiliary table up to cope with broken
   glibc's (now we cope by adding special AT_IGNORE entries using the
   ARCH_DLINFO macro).
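
The arithmetic behind change 1 can be sketched in userspace as below.
This is a model of the slot counting, not the binfmt_elf.c code:
toy_final_sp() and TOY_DLINFO_ITEMS are made-up names, the entry
count of 13 is illustrative, and unsigned long stands in for
elf_addr_t.  The point is that every slot later pushed, including
the two for AT_NULL, must be counted before the padding is computed:

```c
#include <stdint.h>

#define TOY_DLINFO_ITEMS 13	/* illustrative count of generic entries */

/* Returns the final stack address once everything has been pushed. */
uintptr_t toy_final_sp(uintptr_t top, unsigned argc, unsigned envc,
		       int have_platform, unsigned arch_items)
{
	const uintptr_t slot = sizeof(unsigned long);
	uintptr_t sp = top & ~(uintptr_t)15;
	uintptr_t slots = 0;
	uintptr_t csp;

	slots += (1 + TOY_DLINFO_ITEMS) * 2;	/* generic entries plus AT_NULL */
	slots += have_platform ? 2 : 0;		/* platform entry, if any */
	slots += (uintptr_t)arch_items * 2;	/* architecture extras */
	slots += envc + 1;			/* envp[] and its NULL */
	slots += argc + 1;			/* argv[] and its NULL */
	slots += 3;				/* argc, argv, envp */

	csp = sp - slots * slot;	/* where sp would land unpadded */
	sp -= csp & 15;			/* pad so the final value is aligned */
	return sp - slots * slot;
}
```

If any pushed slots are left out of the count, the padding is
computed for the wrong address and the final alignment can be off.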

Please apply this to your tree.

Thanks,
Paul.

diff -Nru a/fs/binfmt_elf.c b/fs/binfmt_elf.c
--- a/fs/binfmt_elf.c   Wed May 16 18:45:10 2001
+++ b/fs/binfmt_elf.c   Wed May 16 18:45:10 2001
@@ -135,12 +135,13 @@
 
/*
 * Force 16 byte _final_ alignment here for generality.
-* Leave an extra 16 bytes free so that on the PowerPC we
-* can move the aux table up to start on a 16-byte boundary.
 */
-   sp = (elf_addr_t *)((~15UL & (unsigned long)(u_platform)) - 16UL);
+   sp = (elf_addr_t *)(~15UL & (unsigned long)(u_platform));
csp = sp;
-   csp -= DLINFO_ITEMS*2 + (k_platform ? 2 : 0);
+   csp -= (1+DLINFO_ITEMS)*2 + (k_platform ? 2 : 0);
+#ifdef DLINFO_ARCH_ITEMS
+   csp -= DLINFO_ARCH_ITEMS*2;
+#endif
csp -= envc+1;
csp -= argc+1;
csp -= (!ibcs ? 3 : 1); /* argc itself */
@@ -174,6 +175,13 @@
NEW_AUX_ENT(10, AT_EUID, (elf_addr_t) current->euid);
NEW_AUX_ENT(11, AT_GID, (elf_addr_t) current->gid);
NEW_AUX_ENT(12, AT_EGID, (elf_addr_t) current->egid);
+#ifdef ARCH_DLINFO
+   /* 
+* ARCH_DLINFO must come last so platform specific code can enforce
+* special alignment requirements on the AUXV if necessary (eg. PPC).
+*/
+   ARCH_DLINFO;
+#endif
 #undef NEW_AUX_ENT
 
sp -= envc+1;

Re: isa_read/write not available on ppc - solution suggestions ??

2001-05-01 Thread Paul Mackerras

Linus Torvalds writes:

> I would suggest the opposite approach instead: make the PPC just support
> isa_readx/isa_writex instead.

We can certainly do that, no problem.

BUT that won't get a token ring pcmcia card working in the newer
powerbooks, such as the titanium G4 powerbook, because the PCI host
bridge doesn't map any cpu addresses to the bottom 16MB of PCI memory
space.  This is not a problem as far as pcmcia cards are concerned -
the pcmcia stuff just picks an appropriate address (typically in the
range 0x9000 - 0x9fff) and sets the pcmcia/cardbus bridge to
map that to the card.  But it means that the physical addresses for
the card's memory space will be above the 16MB point, so it is
essential to do the ioremap.

Paul.

Re: 2.4.3 oopses at lots of ppp sessions

2001-04-26 Thread Paul Mackerras

Marcell GAL writes:

> 2.4.3 (UP kernel UP machine, http://home.sch.bme.hu/~cell/.config) 
> oopses when I start lots of pppd eth0 simultaneously.
> (I guess the problem is not pppoe specific, but I do not know exactly)
> 
> The last pppd sighs: PPP: couldn't register device (-17)
> This is 2 oops not just 1...

Hmmm, somehow the list of ppp units has got a null pointer in it.  At
the moment I don't see how that can happen, but I will look into it.

Paul.

Re: [CHECKER] security rules?

2001-04-26 Thread Paul Mackerras

William Ie writes:

> 4.linux/2.4.3/drivers/net/ppp_async.c:345:ppp_async_ioctl
> case PPPIOCGFLAGS:
>   val = ap->flags | ap->rbits;
>   if (put_user(val, (int *) arg))
>   break;
>   err = 0;
>   break;
> case PPPIOCSFLAGS:
>   if (get_user(val, (int *) arg))
>   break;
>   ap->flags = val & ~SC_RCV_BITS;
>   spin_lock_bh(&ap->recv_lock);
>   ap->rbits = val & SC_RCV_BITS;
>   spin_unlock_bh(&ap->recv_lock);
>   err = 0;
>   break;
> seems to be getting and setting some flags without CAP_NET_ADMIN like in
> ppp_synctty.c

It is OK because this is a channel ioctl routine called from
ppp_generic.c as a result of an ioctl call on /dev/ppp, and it is not
possible to open /dev/ppp unless you have CAP_NET_ADMIN.
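
In other words the check is done once, at open time, and everything
reached through the open file inherits it.  A toy model of that
argument (all names here are hypothetical, and a plain int stands in
for the capability test):

```c
#include <errno.h>

struct toy_task {
	int cap_net_admin;	/* nonzero if the task holds the capability */
};

struct toy_file {
	int opened;
};

/* The only gate: opening the device requires the capability. */
int toy_ppp_open(const struct toy_task *t, struct toy_file *f)
{
	if (!t->cap_net_admin)
		return -EPERM;
	f->opened = 1;
	return 0;
}

/* Channel ioctls are reachable only through an already open file. */
int toy_channel_ioctl(const struct toy_file *f)
{
	if (!f->opened)
		return -EBADF;
	return 0;	/* no second capability check needed */
}
```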

Paul.

Re: [PATCH] PPP update against 2.4.4-pre5

2001-04-22 Thread Paul Mackerras


Byeong-ryeol Kim writes:

> I met 'unresolved symbol sk_chk_filter ...' after applying this patch
> and rebooting.( with CONFIG_PPP_FILTER=y )
> There should be the following lines in linux/net/netsyms.c or so:
> 
> #ifdef CONFIG_PPP_FILTER
> EXPORT_SYMBOL(sk_chk_filter);
> #endif

Good idea, actually let's put it next to the export of sk_run_filter,
as in the patch below.  Linus, could you apply this patch please?

Paul.

diff -urN linux/net/netsyms.c pmac/net/netsyms.c
--- linux/net/netsyms.c Sun Apr 22 17:07:40 2001
+++ pmac/net/netsyms.c  Mon Apr 23 11:24:31 2001
@@ -158,6 +158,7 @@
 
 #ifdef CONFIG_FILTER
 EXPORT_SYMBOL(sk_run_filter);
+EXPORT_SYMBOL(sk_chk_filter);
 #endif
 
 EXPORT_SYMBOL(neigh_table_init);

RE: [PATCH] ppp_generic, kernel 2.4.3

2001-04-22 Thread Paul Mackerras


Tim Wilson writes:

> Thanks for your reply. It seems I am finally talking to the right person (I
> had previously tried posting this on the pptp-server mailing list, and I
> also tried sending it to you directly, but no luck).

Sorry, life has been a little turbulent for me over the last couple of
months.

> Well, I do know that people set up Linux gateways as PPTP servers, and that
> they use MPPE to allow win98 clients to connect to those servers. That's
> what I was trying to do anyway. After the connect, the gateway log says that
> MPPE is negotiated, and the win98 client claims MPPE is being used, so all
> looks OK, but the gateway sends PPP frames in cleartext. If that's not a
> security hole, it is certainly not a Good Thing.

Well, it's a consequence of using a knife to drive in a nail. :)

Neither CCP nor the Linux CCP implementation is really designed to
support encryption.  There is a fairly strong assumption that if
things go pear-shaped you can always take CCP down and send stuff
uncompressed - it will be slower but it will still work.

> As my patch shows, the fix
> is quite easy, so regardless of what we call it, might as well fix it.

Sure, we can fix the problem you've pointed out, but that won't make
for a secure MPPE implementation.  (Is that an oxymoron, actually?)
What I am saying is that even with your fix there is still a lot more
work to do if you want to make sure that you never send or accept
unencrypted PPP frames.

>      Server            Client
> 1)        <--ConfReq
> 2)   ConfAck-->
> 3)   ConfReq-->
> 4)        <--ConfAck
> 
> The existing code (correctly) enables the compressor when it sends the
> ConfAck (2). Then, it (incorrectly) disables the compressor when sending the
> ConfReq in (3). With my fix, that doesn't happen; the compressor is disabled
> on reception of the ConfReq at (1), but it's not enabled yet anyway, so no
> harm done.

Good point.

>   if( ppp->flags & SC_CCP_UP) {
>   ppp->rstate &= ~SC_DECOMP_RUN;
>   ppp->xstate &= ~SC_COMP_RUN;
>   ppp->flags &= ~SC_CCP_UP;
>   }

Yep, with the exception that I wouldn't clear SC_CCP_UP, since that is
set and cleared by pppd.
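
As a minimal sketch of that behaviour: when CCP goes down, both the
compressor and the decompressor are stopped, whichever direction the
packet travelled, and SC_CCP_UP is left for pppd to manage.  The struct
and the flag values here are made up for illustration; they are not the
driver's real definitions.

```c
/* Hypothetical stand-ins for the driver's flag bits (values illustrative). */
#define SC_CCP_UP      0x01u  /* set and cleared by pppd */
#define SC_COMP_RUN    0x02u  /* transmit compressor running */
#define SC_DECOMP_RUN  0x04u  /* receive decompressor running */

/* Cut-down, hypothetical version of the per-unit PPP state. */
struct ppp_sketch {
	unsigned int flags;   /* holds SC_CCP_UP */
	unsigned int xstate;  /* holds SC_COMP_RUN */
	unsigned int rstate;  /* holds SC_DECOMP_RUN */
};

/* Called when a CCP packet that takes CCP down is seen, in either direction. */
static void ccp_going_down(struct ppp_sketch *ppp)
{
	if (ppp->flags & SC_CCP_UP) {
		ppp->rstate &= ~SC_DECOMP_RUN;
		ppp->xstate &= ~SC_COMP_RUN;
		/* SC_CCP_UP is deliberately left alone: pppd owns that bit. */
	}
}
```

If SC_CCP_UP is not set, nothing is touched, which matches the point
above that the compressor is not enabled yet at step (1) anyway.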

Here is an updated patch.

Paul.

diff -urN linux/drivers/net/ppp_generic.c pmac/drivers/net/ppp_generic.c
--- linux/drivers/net/ppp_generic.c Sun Apr 22 17:07:28 2001
+++ pmac/drivers/net/ppp_generic.c  Mon Apr 23 10:12:27 2001
@@ -1993,10 +1993,10 @@
/*
 * CCP is going down - disable compression.
 */
-   if (inbound)
+   if (ppp->flags & SC_CCP_UP) {
ppp->rstate &= ~SC_DECOMP_RUN;
-   else
ppp->xstate &= ~SC_COMP_RUN;
+   }
break;
 
case CCP_CONFACK:
@@ -2054,7 +2054,7 @@
ppp->xc_state = 0;
}
 
-   ppp->xstate &= ~SC_DECOMP_RUN;
+   ppp->rstate &= ~SC_DECOMP_RUN;
if (ppp->rc_state) {
ppp->rcomp->decomp_free(ppp->rc_state);
ppp->rc_state = 0;

Re: [PATCH] CONFIG_PPP_FILTER in -ac12 / -pre6

2001-04-22 Thread Paul Mackerras

Andrzej Krzysztofowicz writes:

> CONFIG_PPP_FILTER depends on CONFIG_FILTER (2.4.4-pre6, 2.4.3-ac12)
> [ sk_run_filter(), ...]
> So updated Config.in ...

> -   bool '  PPP filtering' CONFIG_PPP_FILTER
> +   dep_bool '  PPP filtering' CONFIG_PPP_FILTER $CONFIG_FILTER

Yep, definitely a good idea.  Thanks.

Paul.

Re: pmd_alloc, pte_alloc, Was Re: 2.4.3 and Alpha

2001-04-20 Thread Paul Mackerras

[EMAIL PROTECTED] writes:

>   Basically in the pmd, it would seem that the current design in 2.4.3 forces
> you to have pointers in there. Currently in our source we're using offsets
> instead of a 64 bit pointer... this of course saved us from having to alloc 2
> contiguous pages in memory. 

Nope, the representation of the pgd/pmd/pte entries is entirely up to
you (us :).  The pmd entries for example are accessed through pmd_none,
pmd_present, pte_offset, etc., and are set with pmd_populate.  Those
functions are all defined in asm/pgtable.h and asm/pgalloc.h.  So you
can make the representation whatever you like as long as those
functions all do the right thing.  Same goes for the pgd and pte
levels.
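
To illustrate the idea, here is a user-space sketch in which a pmd
entry holds a small index into a pool of PTE pages instead of a
pointer.  The accessor names follow the ones mentioned above, but the
signatures are simplified and the pool, the types and PTRS_PER_PTE are
assumptions made up for this example, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE 4   /* tiny, for illustration only */
#define NR_PTE_PAGES 8   /* hypothetical pool size */

typedef struct { uint64_t val; } pte_t;
/* A pmd entry is 1 + the index of a PTE page in the pool; 0 means empty. */
typedef struct { uint32_t off; } pmd_t;

static pte_t pte_pool[NR_PTE_PAGES * PTRS_PER_PTE];

static int pmd_none(pmd_t pmd)    { return pmd.off == 0; }
static int pmd_present(pmd_t pmd) { return pmd.off != 0; }

/* Record which PTE page this pmd entry refers to, as an offset. */
static void pmd_populate(pmd_t *pmd, pte_t *pte_page)
{
	pmd->off = (uint32_t)((pte_page - pte_pool) / PTRS_PER_PTE) + 1;
}

/* Turn the stored offset back into a pointer to the index'th PTE. */
static pte_t *pte_offset(pmd_t *pmd, unsigned int index)
{
	return pte_pool + (size_t)(pmd->off - 1) * PTRS_PER_PTE + index;
}
```

Callers only ever go through the accessors, so they cannot tell
whether the entry is a pointer or an offset.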

Paul.

Re: [PATCH] PPP update against 2.4.4-pre5

2001-04-20 Thread Paul Mackerras


Brown-paper bag time...

The patch I sent earlier didn't include the accompanying changes to
if_ppp.h and ppp_channel.h.  Here they are.

Paul.

diff -urN linux/include/linux/if_ppp.h pmac/include/linux/if_ppp.h
--- linux/include/linux/if_ppp.h Tue Mar 28 04:28:55 2000
+++ pmac/include/linux/if_ppp.h Mon Mar  5 12:16:15 2001
@@ -1,4 +1,4 @@
-/* $Id: if_ppp.h,v 1.19 1999/03/31 06:07:57 paulus Exp $   */
+/* $Id: if_ppp.h,v 1.21 2000/03/27 06:03:36 paulus Exp $   */
 
 /*
  * if_ppp.h - Point-to-Point Protocol definitions.
@@ -21,7 +21,7 @@
  */
 
 /*
- *  ==FILEVERSION 2324==
+ *  ==FILEVERSION 2724==
  *
  *  NOTE TO MAINTAINERS:
  * If you modify this file at all, please set the above date.
@@ -130,6 +130,8 @@
 #define PPPIOCSCOMPRESS _IOW('t', 77, struct ppp_option_data)
 #define PPPIOCGNPMODE  _IOWR('t', 76, struct npioctl) /* get NP mode */
 #define PPPIOCSNPMODE  _IOW('t', 75, struct npioctl)  /* set NP mode */
+#define PPPIOCSPASS _IOW('t', 71, struct sock_fprog) /* set pass filter */
+#define PPPIOCSACTIVE  _IOW('t', 70, struct sock_fprog) /* set active filt */
 #define PPPIOCGDEBUG   _IOR('t', 65, int)  /* Read debug level */
 #define PPPIOCSDEBUG   _IOW('t', 64, int)  /* Set debug level */
 #define PPPIOCGIDLE _IOR('t', 63, struct ppp_idle) /* get idle time */
diff -urN linux/include/linux/ppp_channel.h pmac/include/linux/ppp_channel.h
--- linux/include/linux/ppp_channel.h   Mon Apr  2 02:20:35 2001
+++ pmac/include/linux/ppp_channel.h Thu Apr 19 19:16:39 2001
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include 
 
 struct ppp_channel;
 
@@ -32,7 +31,6 @@
int (*start_xmit)(struct ppp_channel *, struct sk_buff *);
/* Handle an ioctl call that has come in via /dev/ppp. */
int (*ioctl)(struct ppp_channel *, unsigned int, unsigned long);
-   
 };
 
 struct ppp_channel {
@@ -78,16 +76,6 @@
  * in the start_xmit and ioctl routines for the channel by the time
  * that ppp_unregister_channel returns.
  */
-
-/* The following are temporary compatibility stuff */
-ssize_t ppp_channel_read(struct ppp_channel *chan, struct file *file,
-char *buf, size_t count);
-ssize_t ppp_channel_write(struct ppp_channel *chan, const char *buf,
- size_t count);
-unsigned int ppp_channel_poll(struct ppp_channel *chan, struct file *file,
- poll_table *wait);
-int ppp_channel_ioctl(struct ppp_channel *chan, unsigned int cmd,
- unsigned long arg);
 
 #endif /* __KERNEL__ */
 #endif

Re: Linux 2.4.3 Compile Errors - Power Mac

2001-04-20 Thread Paul Mackerras

Jeff Galloway writes:

> Compiler error message:
> 
> fork.c: In function `copy_mm':
> fork.c:353: fixed or forbidden register 68 (0) was spilled for class
> CR0_REGS.
> This may be due to a compiler bug or to impossible asm statements or
> clauses.

You need a newer gcc.  I suspect you have egcs installed; you need
to upgrade to gcc-2.95.2 or later.

Paul.

[PATCH] PPP update against 2.4.4-pre5

2001-04-20 Thread Paul Mackerras


Linus, Alan,

The patch below does two things:

- It takes out the rest of the compatibility stuff that is no longer
  used, and which has the possibility of accessing memory that has
  been kfree'd (this could happen if you did a blocking read on a tty
  in PPP line discipline, and the tty hangs up).  This possibility was
  pointed out by Kevin Buhr.

- It adds packet filtering to the PPP driver.  The main point of this
  is so that you can specify that certain sorts of packets don't count
  as activity, so they don't reset the idle timer and they don't bring
  up a demand-dialled link.  This is a useful feature that I get asked
  for periodically.  It's a small amount of code (in fact it's no extra
  code if you don't enable CONFIG_PPP_FILTER), and it's something I have
  had in my tree since last July without any problems.
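
As a rough sketch of how the two filters interact on transmit: the
pass filter decides whether a packet goes out at all, and the active
filter decides whether it counts as activity.  The patch itself stores
socket-filter programs (struct sock_fprog); the plain function
pointers, the struct and the sample filter below are hypothetical
stand-ins so that the example runs in user space.

```c
#include <stddef.h>

/* Hypothetical filter type: returns nonzero if the packet matches. */
typedef int (*pkt_filter_fn)(const unsigned char *pkt, size_t len);

struct ppp_idle_sketch {
	pkt_filter_fn pass_filter;    /* NULL: pass everything */
	pkt_filter_fn active_filter;  /* NULL: everything counts as activity */
	unsigned long last_xmit;      /* time of the last "active" packet */
};

/* Made-up sample filter: packets starting with 0xFF do not count. */
static int not_keepalive(const unsigned char *pkt, size_t len)
{
	return len > 0 && pkt[0] != 0xFF;
}

/* Returns 0 if the packet should be dropped, 1 if it should be sent. */
static int ppp_filter_xmit(struct ppp_idle_sketch *ppp,
			   const unsigned char *pkt, size_t len,
			   unsigned long now)
{
	if (ppp->pass_filter && !ppp->pass_filter(pkt, len))
		return 0;
	/* Only packets matching the active filter reset the idle timer. */
	if (!ppp->active_filter || ppp->active_filter(pkt, len))
		ppp->last_xmit = now;
	return 1;
}
```

A packet rejected by the active filter is still sent; it just leaves
last_xmit unchanged, so the idle timer keeps running.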

Linus, could this go in 2.4.4 please?

Thanks,
Paul.

diff -urN linux/Documentation/Configure.help pmac/Documentation/Configure.help
--- linux/Documentation/Configure.help  Fri Apr 20 17:04:13 2001
+++ pmac/Documentation/Configure.help   Fri Apr 20 17:45:20 2001
@@ -1756,6 +1756,10 @@
   certain types of data to get through the socket. Linux Socket
   Filtering works on all socket types except TCP for now. See the text
   file Documentation/networking/filter.txt for more information.
+
+  You need to say Y here if you want to use PPP packet filtering
+  (see the CONFIG_PPP_FILTER option below).
+
   If unsure, say N.
 
 Network packet filtering
@@ -7087,6 +7091,17 @@
 
   If unsure, say N.
 
+PPP filtering (EXPERIMENTAL)
+CONFIG_PPP_FILTER
+  Say Y here if you want to be able to filter the packets passing over
+  PPP interfaces.  This allows you to control which packets count as
+  activity (i.e. which packets will reset the idle timer or bring up
+  a demand-dialled link) and which packets are to be dropped entirely.
+  You need to say Y here if you wish to use the pass-filter and
+  active-filter options to pppd.
+
+  If unsure, say N.
+
 PPP support for async serial ports
 CONFIG_PPP_ASYNC
   Say Y (or M) here if you want to be able to use PPP over standard
diff -urN linux/drivers/net/Config.in pmac/drivers/net/Config.in
--- linux/drivers/net/Config.in Fri Apr 20 17:04:33 2001
+++ pmac/drivers/net/Config.in  Fri Apr 20 17:24:04 2001
@@ -227,6 +227,7 @@
 tristate 'PPP (point-to-point protocol) support' CONFIG_PPP
 if [ ! "$CONFIG_PPP" = "n" ]; then
dep_bool '  PPP multilink support (EXPERIMENTAL)' CONFIG_PPP_MULTILINK $CONFIG_EXPERIMENTAL
+   bool '  PPP filtering' CONFIG_PPP_FILTER
dep_tristate '  PPP support for async serial ports' CONFIG_PPP_ASYNC $CONFIG_PPP
dep_tristate '  PPP support for sync tty ports' CONFIG_PPP_SYNC_TTY $CONFIG_PPP
dep_tristate '  PPP Deflate compression' CONFIG_PPP_DEFLATE $CONFIG_PPP
diff -urN linux/drivers/net/ppp_async.c pmac/drivers/net/ppp_async.c
--- linux/drivers/net/ppp_async.c   Thu Feb 22 14:25:14 2001
+++ pmac/drivers/net/ppp_async.c Thu Mar 29 13:47:47 2001
@@ -244,11 +244,6 @@
err = 0;
break;
 
-   case PPPIOCATTACH:
-   case PPPIOCDETACH:
-   err = ppp_channel_ioctl(&ap->chan, cmd, arg);
-   break;
-
default:
err = -ENOIOCTLCMD;
}
diff -urN linux/drivers/net/ppp_generic.c pmac/drivers/net/ppp_generic.c
--- linux/drivers/net/ppp_generic.c Fri Apr 20 17:04:35 2001
+++ pmac/drivers/net/ppp_generic.c  Fri Apr 20 17:31:04 2001
@@ -19,7 +19,7 @@
  * PPP driver, written by Michael Callahan and Al Longyear, and
  * subsequently hacked by Paul Mackerras.
  *
- * ==FILEVERSION 2417==
+ * ==FILEVERSION 2902==
  */
 
 #include 
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -121,6 +122,10 @@
struct sk_buff_head mrq;/* MP: receive reconstruction queue */
 #endif /* CONFIG_PPP_MULTILINK */
struct net_device_stats stats;  /* statistics */
+#ifdef CONFIG_PPP_FILTER
+   struct sock_fprog pass_filter;  /* filter for packets to pass */
+   struct sock_fprog active_filter;/* filter for pkts to reset idle */
+#endif /* CONFIG_PPP_FILTER */
 };
 
 /*
@@ -621,6 +626,43 @@
err = 0;
break;
 
+#ifdef CONFIG_PPP_FILTER
+   case PPPIOCSPASS:
+   case PPPIOCSACTIVE:
+   {
+   struct sock_fprog uprog, *filtp;
+   struct sock_filter *code = NULL;
+   int len;
+
+   if (copy_from_user(&uprog, (void *) arg, sizeof(uprog)))
+   break;
+   if (uprog.len > 0) {
+   err = -ENOMEM;
+   len = uprog.len * sizeof(struct sock_filter);
+   code = kmalloc(len, GFP_KERNEL);
+   if (code == 0)
+

Re: FW: Linux 2.4.3 Compile Errors - Power Mac

2001-04-20 Thread Paul Mackerras

Jeff Galloway writes:

> I sent this report to the people indicated below, whose names I got from the
> MAINTAINERS file in the 2.4.3 distribution, but the email address for Mr.
> MacKerras is no longer good and Mr. Chastain wrote me back that he is not
> following 2.4 issues.

I have left Linuxcare and [EMAIL PROTECTED] no longer works.
Please use [EMAIL PROTECTED]

> The compiler error message along with the menuconfig-generated configuration
> file are set out in the attached MS Word document.  I've had similar
> problems with other versions of 2.4.

Hmmm, I have to go to a lot of trouble to read Word documents, so I
don't like receiving them.

Paul.

[PATCH] [RESENT] fix bugs in HID driver

2001-04-20 Thread Paul Mackerras


[Oops, re-sent with a subject line this time...]

Linus,

This patch fixes some bugs in drivers/usb/hid.c.  Johannes Erdfelt
(the maintainer) sent it to you previously but it got missed.  Could
it go in 2.4.4 please?  Here are the comments explaining the patch
that I wrote originally:

> The first hunk just fixes some typos in s32ton.  For example, with
> n == 8, the code as it was would return 0x80 if value > 127 but 0xff
> if value < -128.  With my change it returns 0x7f for value > 127 and
> 0x80 for value < -128.
>
> The second hunk fixes the "cdcd" problem that we see on Apple
> keyboards that can only handle 2-key rollover.  If you type "c" "d"
>  quickly on these keyboards, you get a report with the
> error-rollover code (1) in bytes 2 - 7 (instead of the codes for the
> keys that are down).  Without this patch the code thinks that all the
> keys that were down are now up.  When you release one key you get a
> normal report again and the code thinks that the remaining keys have
> been pressed again.  The patch makes the code just discard the report
> once it sees the error-rollover code.
>
> The remaining hunks fix some endianness problems in the code that sets
> the keyboard leds.

Thanks,
Paul.

diff -urN linux/drivers/usb/hid.c linuxppc_2_4/drivers/usb/hid.c
--- linux/drivers/usb/hid.c Thu Feb 22 14:25:27 2001
+++ linuxppc_2_4/drivers/usb/hid.c  Mon Feb 12 13:35:00 2001
@@ -698,7 +698,7 @@
 static __inline__ __u32 s32ton(__s32 value, unsigned n)
 {
__s32 a = value >> (n - 1);
-   if (a && a != -1) return value > 0 ? 1 << (n - 1) : (1 << n) - 1;
+   if (a && a != -1) return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
return value & ((1 << n) - 1);
 }
 
@@ -1016,9 +1016,15 @@
__s32 max = field->logical_maximum;
__s32 value[count]; /* WARNING: gcc specific */

-   for (n = 0; n < count; n++)
+   for (n = 0; n < count; n++) {
value[n] = min < 0 ? snto32(extract(data, offset + n * size, size), size) :
extract(data, offset + n * size, size);
+   /* Handle the ErrorRollOver code (1) by simply ignoring this report */
+   if (!(field->flags & HID_MAIN_ITEM_VARIABLE)
+   && value[n] >= min && value[n] <= max
+   && field->usage[value[n] - min].hid == HID_UP_KEYBOARD + 1)
+   return;
+   }
 
for (n = 0; n < count; n++) {
 
@@ -1231,7 +1237,7 @@
 
 static int hid_submit_out(struct hid_device *hid)
 {
-   hid->urbout.transfer_buffer_length = hid->out[hid->outtail].dr.length;
+   hid->urbout.transfer_buffer_length = le16_to_cpup(&hid->out[hid->outtail].dr.length);
hid->urbout.transfer_buffer = hid->out[hid->outtail].buffer;
hid->urbout.setup_packet = (void *) &(hid->out[hid->outtail].dr);
hid->urbout.dev = hid->dev;
@@ -1271,8 +1277,8 @@
hid_set_field(field, offset, value);
hid_output_report(field->report, hid->out[hid->outhead].buffer);
 
-   hid->out[hid->outhead].dr.value = 0x200 | field->report->id;
-   hid->out[hid->outhead].dr.length = ((field->report->size - 1) >> 3) + 1;
+   hid->out[hid->outhead].dr.value = cpu_to_le16(0x200 | field->report->id);
+   hid->out[hid->outhead].dr.length = cpu_to_le16((field->report->size + 7) >> 3);
 
hid->outhead = (hid->outhead + 1) & (HID_CONTROL_FIFO_SIZE - 1);
 
@@ -1445,7 +1451,7 @@
for (n = 0; n < HID_CONTROL_FIFO_SIZE; n++) {
hid->out[n].dr.requesttype = USB_TYPE_CLASS | USB_RECIP_INTERFACE;
hid->out[n].dr.request = USB_REQ_SET_REPORT;
-   hid->out[n].dr.index = hid->ifnum;
+   hid->out[n].dr.index = cpu_to_le16(hid->ifnum);
}
 
hid->input.name = hid->name;

[PATCH] update drivers/input/keybdev.c

2001-04-19 Thread Paul Mackerras


Linus,

The following patch updates drivers/input/keybdev.c so that we can
generate either linux keycodes or ADB keycodes from keyboards that use
the input layer.  We now have ADB keyboards and mice using the input
layer as well as USB, so it is very useful to have the flexibility to
choose at runtime which type of keycodes you want to receive.  You
already have in your tree all of the other changes that we need in
order to do this; just this one file got missed somehow.
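
A small sketch of the runtime choice, using only the first 32 entries
of the mac_keycodes table from the patch.  The global switch and the
function name are hypothetical; the patch asks
mac_hid_keyboard_sends_linux_keycodes() instead, and when Linux
keycodes are wanted it falls through to the existing raw emulation,
which is simplified here to returning the keycode unchanged.

```c
/* First 32 entries of mac_keycodes, indexed by Linux keycode.
 * 0 means "no mapping"; 128 encodes ADB keycode 0 so that it can be
 * told apart from an unmapped entry. */
static const unsigned char mac_keycodes_head[32] = {
	 0, 53, 18, 19, 20, 21, 23, 22, 26, 28, 25, 29, 27, 24, 51, 48,
	12, 13, 14, 15, 17, 16, 32, 34, 31, 35, 33, 30, 36, 54,128,  1,
};

/* Hypothetical runtime switch standing in for the real query function. */
static int sends_linux_keycodes = 0;

/* Returns the code to hand on, or -1 if the keycode has no mapping. */
static int translate_keycode(unsigned int keycode)
{
	if (sends_linux_keycodes)
		return (int)keycode;          /* simplified pass-through */
	if (keycode >= 32 || !mac_keycodes_head[keycode])
		return -1;
	return mac_keycodes_head[keycode] & 0x7f;
}
```

The mask with 0x7f is what turns the stored 128 back into ADB keycode
0, so Linux keycode 30 translates to 0 rather than being rejected as
unmapped.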

This change has been approved by the maintainer, Vojtech Pavlik.

If you decide not to take this patch, please let me know so I can
send you a patch to back out the corresponding changes that have
already been made in other files.

Paul.

diff -urN linux/drivers/input/keybdev.c pmac/drivers/input/keybdev.c
--- linux/drivers/input/keybdev.c   Thu Apr 19 15:03:43 2001
+++ pmac/drivers/input/keybdev.c Fri Apr 20 16:47:48 2001
@@ -38,7 +38,8 @@
 #include 
 
 #if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(__alpha__) || \
-defined(__mips__) || defined(CONFIG_SPARC64) || defined(CONFIG_SUPERH)
+defined(__mips__) || defined(CONFIG_SPARC64) || defined(CONFIG_SUPERH) || \
+defined(CONFIG_PPC) || defined(__mc68000__)
 
 static int x86_sysrq_alt = 0;
 #ifdef CONFIG_SPARC64
@@ -63,8 +64,46 @@
308,310,313,314,315,317,318,319,320,321,322,323,324,325,326,330,
332,340,341,342,343,344,345,346,356,359,365,368,369,370,371,372 };
 
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+extern int mac_hid_mouse_emulate_buttons(int, int, int);
+#endif /* CONFIG_MAC_EMUMOUSEBTN */
+#ifdef CONFIG_MAC_ADBKEYCODES
+extern int mac_hid_keyboard_sends_linux_keycodes(void);
+#else
+#define mac_hid_keyboard_sends_linux_keycodes() 0
+#endif /* CONFIG_MAC_ADBKEYCODES */
+#if defined(CONFIG_MAC_ADBKEYCODES) || defined(CONFIG_ADB_KEYBOARD)
+static unsigned char mac_keycodes[256] = {
+ 0, 53, 18, 19, 20, 21, 23, 22, 26, 28, 25, 29, 27, 24, 51, 48,
+12, 13, 14, 15, 17, 16, 32, 34, 31, 35, 33, 30, 36, 54,128,  1,
+ 2,  3,  5,  4, 38, 40, 37, 41, 39, 50, 56, 42,  6,  7,  8,  9,
+11, 45, 46, 43, 47, 44,123, 67, 58, 49, 57,122,120, 99,118, 96,
+97, 98,100,101,109, 71,107, 89, 91, 92, 78, 86, 87, 88, 69, 83,
+84, 85, 82, 65, 42,  0, 10,103,111,  0,  0,  0,  0,  0,  0,  0,
+76,125, 75,105,124,110,115, 62,116, 59, 60,119, 61,121,114,117,
+ 0,  0,  0,  0,127, 81,  0,113,  0,  0,  0,  0, 95, 55, 55,  0,
+ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
+ 0,  0,  0,  0,  0, 94,  0, 93,  0,  0,  0,  0,  0,  0,104,102 };
+#endif /* CONFIG_MAC_ADBKEYCODES || CONFIG_ADB_KEYBOARD */
+ 
 static int emulate_raw(unsigned int keycode, int down)
 {
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+   if (mac_hid_mouse_emulate_buttons(1, keycode, down))
+   return 0;
+#endif /* CONFIG_MAC_EMUMOUSEBTN */
+#if defined(CONFIG_MAC_ADBKEYCODES) || defined(CONFIG_ADB_KEYBOARD)
+   if (!mac_hid_keyboard_sends_linux_keycodes()) {
+   if (keycode > 255 || !mac_keycodes[keycode])
+   return -1;
+   
+   handle_scancode((mac_keycodes[keycode] & 0x7f), down);
+   return 0;
+   }
+#endif /* CONFIG_MAC_ADBKEYCODES || CONFIG_ADB_KEYBOARD */
+
if (keycode > 255 || !x86_keycodes[keycode])
return -1; 
 
@@ -103,28 +142,6 @@
if (keycode == KEY_STOP)
sparc_l1_a_state = down;
 #endif
-
-   return 0;
-}
-
-#elif defined(CONFIG_ADB_KEYBOARD)
-
-static unsigned char mac_keycodes[128] =
-   { 0, 53, 18, 19, 20, 21, 23, 22, 26, 28, 25, 29, 27, 24, 51, 48,
-12, 13, 14, 15, 17, 16, 32, 34, 31, 35, 33, 30, 36, 54,128,  1,
- 2,  3,  5,  4, 38, 40, 37, 41, 39, 50, 56, 42,  6,  7,  8,  9,
-11, 45, 46, 43, 47, 44,123, 67, 58, 49, 57,122,120, 99,118, 96,
-97, 98,100,101,109, 71,107, 89, 91, 92, 78, 86, 87, 88, 69, 83,
-84, 85, 82, 65, 42,  0, 10,103,111,  0,  0,  0,  0,  0,  0,  0,
-76,125, 75,105,124,  0,115, 62,116, 59, 60,119, 61,121,114,117,
- 0,  0,  0,  0,127, 81,  0,113,  0,  0,  0,  0,  0, 55, 55 };
-
-static int emulate_raw(unsigned int keycode, int down)
-{
-   if (keycode > 127 || !mac_keycodes[keycode])
-   return -1;
-
-   handle_scancode(mac_keycodes[keycode] & 0x7f, down);
 
return 0;
 }

Re: [2.4.3] PPP errors

2001-04-19 Thread Paul Mackerras

Manfred H. Winter writes:

> Apr  4 02:05:21 marvin pppd[1227]: Plugin /usr/lib/passwordfd.so loaded.
> Apr  4 02:05:21 marvin pppd[1227]: pppd 2.4.0 started by mahowi, uid 500
> Apr  4 02:05:21 marvin pppd[1227]: Perms of /dev/ttyS0 are ok, no 'mesg n' neccesary.

Just out of curiosity, what pppd are you running, with what patches?
I don't recognize the message about 'perms of /dev/ttyS0'.
Or does this message come from the passwordfd.so plugin?

> Modules Loaded serial sb sb_lib uart401 isa-pnp NVdriver opl3 sound
> soundcore ipt_MASQUERADE iptable_nat ip_conntrack ppp_generic slhc iptable_filter
> ip_tables af_packet khttpd autofs4 unix 8139too ide-scsi aic7xxx scsi_mod

The ppp_async module is not loaded - that's the problem.  You need it
for PPP over an async serial port such as /dev/ttyS0.

Paul.

[PATCH] [RESEND] update chipsfb driver

2001-03-24 Thread Paul Mackerras


Linus,

At present, drivers/video/chipsfb.c can only be used on PPC, and it
doesn't compile even on PPC.  The patch below makes it compile and, by
changing it to use the generic inb/outb, means that there is at least
a chance it can be used on other platforms.  The patch is against
2.4.3-pre7; could you apply it please?
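
For readers unfamiliar with the indexed-register style the driver
uses, here is a user-space sketch of the write_ind/read_ind pattern:
the register number goes to an address port, then the value is written
to or read from a data port.  Real port I/O is replaced by a simulated
register file; sim_outb, sim_inb and the backing arrays are
assumptions for this example only.  The port numbers 0x3c4/0x3c5 are
the ones the driver's read_sr macro uses.

```c
#include <stdint.h>

#define SIM_AP 0x3c4   /* address (index) port */
#define SIM_DP 0x3c5   /* data port */

static uint8_t sim_index;      /* last value written to the address port */
static uint8_t sim_regs[256];  /* register file behind the data port */

/* Simulated outb(val, port): same argument order as the generic outb. */
static void sim_outb(uint8_t val, unsigned port)
{
	if (port == SIM_AP)
		sim_index = val;
	else if (port == SIM_DP)
		sim_regs[sim_index] = val;
}

/* Simulated inb(port). */
static uint8_t sim_inb(unsigned port)
{
	return port == SIM_DP ? sim_regs[sim_index] : 0;
}

#define write_ind(num, val, ap, dp) do { \
	sim_outb((num), (ap)); sim_outb((val), (dp)); \
} while (0)
#define read_ind(num, var, ap, dp) do { \
	sim_outb((num), (ap)); (var) = sim_inb((dp)); \
} while (0)
```

Note the argument order: the generic outb takes the value first and
the port second, the reverse of the address-first out_8 calls that the
patch replaces.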

Paul.

diff -urN linux/drivers/video/chipsfb.c pmac/drivers/video/chipsfb.c
--- linux/drivers/video/chipsfb.c   Thu Feb 22 14:25:27 2001
+++ pmac/drivers/video/chipsfb.c Sat Mar  3 21:17:19 2001
@@ -29,17 +29,19 @@
 #include 
 #include 
 #include 
+#include 
+
 #ifdef CONFIG_FB_COMPAT_XPMAC
 #include 
-#endif
-#include 
-#include 
 #include 
+#endif
 #ifdef CONFIG_PMAC_BACKLIGHT
 #include 
 #endif
+#ifdef CONFIG_PMAC_PBOOK
 #include 
 #include 
+#endif
 
 #include 
 #include 
@@ -56,14 +58,13 @@
struct {
__u8 red, green, blue;
} palette[256];
+   struct pci_dev *pdev;
unsigned long frame_buffer_phys;
__u8 *frame_buffer;
unsigned long blitter_regs_phys;
__u32 *blitter_regs;
unsigned long blitter_data_phys;
__u8 *blitter_data;
-   unsigned long io_base_phys;
-   __u8 *io_base;
struct fb_info_chips *next;
 #ifdef CONFIG_PMAC_PBOOK
unsigned char *save_framebuffer;
@@ -74,10 +75,10 @@
 };
 
 #define write_ind(num, val, ap, dp) do { \
-   out_8(p->io_base + (ap), (num)); out_8(p->io_base + (dp), (val)); \
+   outb((num), (ap)); outb((val), (dp)); \
 } while (0)
 #define read_ind(num, var, ap, dp) do { \
-   out_8(p->io_base + (ap), (num)); var = in_8(p->io_base + (dp)); \
+   outb((num), (ap)); var = inb((dp)); \
 } while (0);
 
 /* extension registers */
@@ -97,10 +98,10 @@
 #define read_sr(num, var)  read_ind(num, var, 0x3c4, 0x3c5)
 /* attribute registers - slightly strange */
 #define write_ar(num, val) do { \
-   in_8(p->io_base + 0x3da); write_ind(num, val, 0x3c0, 0x3c0); \
+   inb(0x3da); write_ind(num, val, 0x3c0, 0x3c0); \
 } while (0)
 #define read_ar(num, var)  do { \
-   in_8(p->io_base + 0x3da); read_ind(num, var, 0x3c0, 0x3c1); \
+   inb(0x3da); read_ind(num, var, 0x3c0, 0x3c1); \
 } while (0)
 
 static struct fb_info_chips *all_chips;
@@ -117,7 +118,7 @@
  */
 int chips_init(void);
 
-static void chips_of_init(struct device_node *dp);
+static void chips_pci_init(struct pci_dev *dp);
 static int chips_get_fix(struct fb_fix_screeninfo *fix, int con,
 struct fb_info *info);
 static int chips_get_var(struct fb_var_screeninfo *var, int con,
@@ -253,29 +254,29 @@
 #endif /* CONFIG_PMAC_BACKLIGHT */
/* get the palette from the chip */
for (i = 0; i < 256; ++i) {
-   out_8(p->io_base + 0x3c7, i);
+   outb(i, 0x3c7);
udelay(1);
-   p->palette[i].red = in_8(p->io_base + 0x3c9);
-   p->palette[i].green = in_8(p->io_base + 0x3c9);
-   p->palette[i].blue = in_8(p->io_base + 0x3c9);
+   p->palette[i].red = inb(0x3c9);
+   p->palette[i].green = inb(0x3c9);
+   p->palette[i].blue = inb(0x3c9);
}
for (i = 0; i < 256; ++i) {
-   out_8(p->io_base + 0x3c8, i);
+   outb(i, 0x3c8);
udelay(1);
-   out_8(p->io_base + 0x3c9, 0);
-   out_8(p->io_base + 0x3c9, 0);
-   out_8(p->io_base + 0x3c9, 0);
+   outb(0, 0x3c9);
+   outb(0, 0x3c9);
+   outb(0, 0x3c9);
}
} else {
 #ifdef CONFIG_PMAC_BACKLIGHT
set_backlight_enable(1);
 #endif /* CONFIG_PMAC_BACKLIGHT */
for (i = 0; i < 256; ++i) {
-   out_8(p->io_base + 0x3c8, i);
+   outb(i, 0x3c8);
udelay(1);
-   out_8(p->io_base + 0x3c9, p->palette[i].red);
-   out_8(p->io_base + 0x3c9, p->palette[i].green);
-   out_8(p->io_base + 0x3c9, p->palette[i].blue);
+   outb(p->palette[i].red, 0x3c9);
+   outb(p->palette[i].green, 0x3c9);
+   outb(p->palette[i].blue, 0x3c9);
}
}
 }
@@ -307,11 +308,11 @@
p->palette[regno].red = red;
p->palette[regno].green = green;
p->palette[regno].blue = blue;
-   out_8(p->io_base + 0x3c8, regno);
+   outb(regno, 0x3c8);
udelay(1);
-   out_8(p->io_base + 0x3c9, red);
-   out_8(p->io_base + 0x3c9, green);
-   out_8(p->io_base + 0x3c9, blue);
+   outb(red, 0x3c9);
+   outb(green, 0x3c9);
+   outb(blue, 0x3c9);
 
 #ifdef FBCON_HAS_CFB16
if (regno < 16)
@@ -388,7 +389,7 @@
disp->visual = fix->

[PATCH] MM update for PPC

2001-03-23 Thread Paul Mackerras


Linus,

The patch below updates the MM code for PowerPC to correspond with the
recent generic MM changes.  The patch is against 2.4.3-pre7, and it
affects only arch/ppc/mm/init.c, include/asm-ppc/pgalloc.h, and
include/asm-ppc/semaphore.h.

The changes to semaphore.h are only necessary because the definition
of INIT_MM in sched.h uses __RWSEM_INITIALIZER with the argument of
RW_LOCK_BIAS, meaning an unlocked semaphore.  I think RW_LOCK_BIAS is
at the very least a horrible name for something that means an unlocked
semaphore, and in fact it is really a private definition used in the
i386 semaphore code which should never be used in generic code like
this.  (But no, I don't have a patch to fix this properly at the
moment.)

Paul.

diff -urN linux/arch/ppc/mm/init.c linuxppc_2_4/arch/ppc/mm/init.c
--- linux/arch/ppc/mm/init.c Wed Mar 21 15:43:54 2001
+++ linuxppc_2_4/arch/ppc/mm/init.c Thu Mar 22 10:39:23 2001
@@ -110,7 +110,7 @@
 #endif
 
 void MMU_init(void);
-static void *MMU_get_page(void);
+void *early_get_page(void);
 unsigned long prep_find_end_of_memory(void);
 unsigned long pmac_find_end_of_memory(void);
 unsigned long apus_find_end_of_memory(void);
@@ -125,7 +125,7 @@
 unsigned long m8260_find_end_of_memory(void);
 #endif /* CONFIG_8260 */
 static void mapin_ram(void);
-void map_page(unsigned long va, unsigned long pa, int flags);
+int map_page(unsigned long va, unsigned long pa, int flags);
 void set_phys_avail(unsigned long total_ram);
 extern void die_if_kernel(char *,struct pt_regs *,long);
 
@@ -206,41 +206,20 @@
pmd_val(*pmd) = (unsigned long) BAD_PAGETABLE;
 }
 
-pte_t *get_pte_slow(pmd_t *pmd, unsigned long offset)
-{
-	pte_t *pte;
-
-	if (pmd_none(*pmd)) {
-		if (!mem_init_done)
-			pte = (pte_t *) MMU_get_page();
-		else if ((pte = (pte_t *) __get_free_page(GFP_KERNEL)))
-			clear_page(pte);
-		if (pte) {
-			pmd_val(*pmd) = (unsigned long)pte;
-			return pte + offset;
-		}
-		pmd_val(*pmd) = (unsigned long)BAD_PAGETABLE;
-		return NULL;
-	}
-	if (pmd_bad(*pmd)) {
-		__bad_pte(pmd);
-		return NULL;
-	}
-	return (pte_t *) pmd_page(*pmd) + offset;
-}
-
 int do_check_pgt_cache(int low, int high)
 {
 	int freed = 0;
-	if(pgtable_cache_size > high) {
+	if (pgtable_cache_size > high) {
 		do {
-			if(pgd_quicklist)
-				free_pgd_slow(get_pgd_fast()), freed++;
-			if(pmd_quicklist)
-				free_pmd_slow(get_pmd_fast()), freed++;
-			if(pte_quicklist)
-				free_pte_slow(get_pte_fast()), freed++;
-		} while(pgtable_cache_size > low);
+			if (pgd_quicklist) {
+				free_pgd_slow(get_pgd_fast());
+				freed++;
+			}
+			if (pte_quicklist) {
+				pte_free_slow(pte_alloc_one_fast());
+				freed++;
+			}
+		} while (pgtable_cache_size > low);
 	}
 	return freed;
 }
@@ -383,6 +362,7 @@
 __ioremap(unsigned long addr, unsigned long size, unsigned long flags)
 {
unsigned long p, v, i;
+   int err;
 
/*
 * Choose an address to map it to.
@@ -453,10 +433,20 @@
flags |= _PAGE_GUARDED;
 
 	/*
-	 * Is it a candidate for a BAT mapping?
+	 * Should check if it is a candidate for a BAT mapping
 	 */
-	for (i = 0; i < size; i += PAGE_SIZE)
-		map_page(v+i, p+i, flags);
+
+	spin_lock(&init_mm.page_table_lock);
+	err = 0;
+	for (i = 0; i < size && err == 0; i += PAGE_SIZE)
+		err = map_page(v+i, p+i, flags);
+	spin_unlock(&init_mm.page_table_lock);
+	if (err) {
+		if (mem_init_done)
+			vfree((void *)v);
+		return NULL;
+	}
+
 out:
 	return (void *) (v + (addr & ~PAGE_MASK));
 }
@@ -492,7 +482,7 @@
return (pte_val(*pg) & PAGE_MASK) | (addr & ~PAGE_MASK);
 }
 
-void
+int
 map_page(unsigned long va, unsigned long pa, int flags)
 {
pmd_t *pd;
@@ -501,10 +491,13 @@
 	/* Use upper 10 bits of VA to index the first level map */
 	pd = pmd_offset(pgd_offset_k(va), va);
 	/* Use middle 10 bits of VA to index the second-level map */
-	pg = pte_alloc(pd, va);
+	pg = pte_alloc(&init_mm, pd, va);
+	if (pg == 0)
+		return -ENOMEM;
 	set_pte(pg, mk_pte_phys(pa & PAGE_MASK, __pgprot(flags)));
 	if (mem_init_done)
 		flush_hash_page(0, va);
+	return 0;
 }
 
 #ifndef CONFIG_8xx
@@ -830,21 +823,16 @@
}
 }
 
-/* In fact t

Re: PATCH against 2.4.2: TTY hangup on PPP channel corrupts kernel memory

2001-03-22 Thread Paul Mackerras


Kevin Buhr writes:

> I didn't realize my specific hang was a peculiarity of the older
> attachment style.  The channel created by pushing the PPP line

I didn't realize you were talking about linux 2.4.0 and pppd 2.3.11.

> discipline onto a TTY was connected to a unit with a PPPIOCATTACH
> ioctl on the TTY---this didn't really "attach" the channel; it still
> had a refcnt of only one.  Through the old compatibility interface, it
> was possible to call ppp_asynctty_read -> ppp_channel_read -> ppp_read
> on the channel's "struct ppp_file" and wait on the channel's "rwait".
> If the modem hung up, "do_tty_hangup" would call "ppp_asynctty_close"
> (with a reader still in "ppp_asynctty_read") and the "struct channel"
> would be freed in "ppp_unregister_channel".

That's one of the main reasons why I removed the compatibility
stuff. :)

> I think your analysis of how things presently are with 2.4.2 and a
> modern "pppd" is correct...
> 
> Since the new "pppd" uses an explicit PPPIOCATTCHAN / PPPIOCCONNECT
> sequence, the refcnt gets bumped to 2 and stays there while the
> channel is attached.  So, this specific hang isn't a problem anymore
> for "ppp_async.c".  It's still a problem with "ppp_synctty.c", though
> (when used with "pppd" 2.3.11, say).  Is the compatibility stuff in
> there slated for removal, too?

Yep, and we should take out the stuff in ppp_generic.c that was called
by the compatibility stuff in the channels, too.

> In particular, the comment above "ppp_asynctty_close" is misleading.
> It's true that the TTY layer won't call any further line discipline
> entries while the "close" is executing; however, there may be
> processes already sleeping in line discipline functions called before
> the hangup.  For example, "ppp_asynctty_close" could be called while
> we sleep in the "get_user" in "ppp_channel_ioctl" (called from
> "ppp_asynctty_ioctl").  Therefore, calling "PPPIOCATTACH" on an
> unattached PPP-disciplined TTY could, in unlikely circumstances
> (argument swapped out), lead to a crash.

Yuck.  I don't see that we can protect against this without having
some sort of lock in the tty structure, though.  We can't protect the
existence of the channel structure with a lock inside that structure.
Ideally the necessary protection would be provided at the tty level.
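
As a rough user-space illustration of that point (a sketch only, not
the tty or PPP code): the lock sits in the outer object, so it is
still valid after the channel has been freed.  The names toy_tty,
toy_chan, toy_ldisc_close and toy_ldisc_ioctl are hypothetical, and a
pthread mutex stands in for whatever lock the tty layer would provide.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-in for the per-channel state hung off the tty. */
struct toy_chan {
    int unit;
};

/* Toy tty: the lock lives here, outside the object it protects. */
struct toy_tty {
    pthread_mutex_t lock;
    struct toy_chan *disc_data;   /* NULL once the discipline is closed */
};

void toy_tty_init(struct toy_tty *tty, struct toy_chan *ch)
{
    assert(tty != NULL);
    pthread_mutex_init(&tty->lock, NULL);
    tty->disc_data = ch;
}

/* Close path: detach and free the channel under the tty-level lock. */
void toy_ldisc_close(struct toy_tty *tty)
{
    pthread_mutex_lock(&tty->lock);
    free(tty->disc_data);
    tty->disc_data = NULL;
    pthread_mutex_unlock(&tty->lock);
}

/* Ioctl path: look the channel up under the same lock, never cache it. */
int toy_ldisc_ioctl(struct toy_tty *tty, int *unit_out)
{
    int ret = -1;                 /* -1: the channel is already gone */

    pthread_mutex_lock(&tty->lock);
    if (tty->disc_data != NULL) {
        *unit_out = tty->disc_data->unit;
        ret = 0;
    }
    pthread_mutex_unlock(&tty->lock);
    return ret;
}
```

In the kernel the ioctl path can sleep in get_user, so either the
user-space copy would have to happen before the lock is taken or the
lock would have to be one that may be held across a sleep.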

> I assume PPPIOCATTACH (on the TTY) is deprecated in favor of
> PPPIOCATTCHAN / PPPIOCCONNECT (on the "/dev/ppp" handle).  Can we
> eliminate "ppp_channel_ioctl" from "ppp_async.c" entirely, as in the
> patch below?  We're requiring people to upgrade to "pppd" 2.4.0
> anyway, and it has no need for these calls.  This would give me a warm,
> fuzzy feeling.

Sure, that would be fine.  I'll make up a patch and send it to Linus.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PATCH against 2.4.2: TTY hangup on PPP channel corrupts kernel memory

2001-03-17 Thread Paul Mackerras

Kevin Buhr writes:

> If there's a hangup in the TTY layer on an async PPP channel,
> do_tty_hangup shuts down the PPP line discipline, and, in ppp_async.c,
> the function ppp_asynctty_close unregisters the channel.  In
> ppp_generic.c, ppp_unregister_channel merrily wakes up the rwait
> queue, then proceeds to destroy the channel, freeing the "struct
> channel" which contains the "struct ppp_file" that contains the
> "wait_queue_head_t rwait".  When the waiting process wakes up, it
> removes itself from the wait queue, modifying freed memory.

But the waiting process must have had an instance of /dev/ppp open and
attached to the channel in order to be doing anything with rwait,
within either ppp_file_read or ppp_poll.  The process of attaching to
the channel increases its refcnt, meaning that the channel shouldn't
be destroyed until the instance of /dev/ppp is closed and ppp_release
is called.

Note that pppd will not be blocking inside ppp_file_read since it sets
the file descriptor non-blocking.  Most of the time pppd would be
inside a select, so rwait would be in use by the poll/select code.

I presume that the generic file descriptor code ensures that the file
release function doesn't get called while any task is inside the read
or write function for that file, or while the file descriptor is in
use in a select or poll.  If that assumption is wrong then it would
indeed be possible for the channel to be destroyed while some process
is waiting on rwait.  But in any case it shouldn't be a problem in
practice since it would only be pppd that would have the channel open
and pppd is single-threaded, i.e. it couldn't be closing the file
descriptor while it is blocked inside read or select.

So, to put it in other words, this is the sequence (simplified):

fd = open("/dev/ppp", O_RDWR);
ioctl(fd, PPPIOCATTCHAN, &channel_number);
fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

select(...);    /* fd_sets including fd */
read(fd, ...);
...
close(fd);

I believe the channel structure is guaranteed to exist from the ioctl
to the close, and all the selects and reads (i.e. all the uses of
rwait) have to happen within that time interval.
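
The lifetime argument can be modelled in a few lines of user-space C.
This is a toy sketch under stated assumptions, not the ppp_generic.c
code: toy_channel, toy_register, toy_attach, toy_unregister and
toy_release are hypothetical names, and a plain int stands in for the
kernel's atomic refcnt.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the channel; "freed" lets a caller observe destruction. */
struct toy_channel {
    int refcnt;   /* the kernel would use an atomic counter */
    int *freed;   /* set to 1 when the structure is destroyed */
};

/* Registering the channel (tty side) takes the first reference. */
struct toy_channel *toy_register(int *freed)
{
    struct toy_channel *ch = malloc(sizeof(*ch));

    if (ch == NULL)
        return NULL;
    ch->refcnt = 1;
    ch->freed = freed;
    *freed = 0;
    return ch;
}

/* Drop one reference; the last one out destroys the channel. */
static void toy_put(struct toy_channel *ch)
{
    assert(ch->refcnt > 0);
    if (--ch->refcnt == 0) {
        *ch->freed = 1;
        free(ch);
    }
}

/* Models PPPIOCATTCHAN on a /dev/ppp instance: second reference. */
void toy_attach(struct toy_channel *ch)
{
    ch->refcnt++;
}

/* Models the hangup path: the line discipline gives up its reference. */
void toy_unregister(struct toy_channel *ch)
{
    toy_put(ch);
}

/* Models close() of the /dev/ppp instance giving up the other one. */
void toy_release(struct toy_channel *ch)
{
    toy_put(ch);
}
```

In this model a hangup (toy_unregister) while the file is still
attached leaves refcnt at 1, so nothing done between the attach and
the close touches freed memory.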

> A patch against 2.4.2 follows.  I've overloaded the "refcnt" in
> "struct ppp_file" to also keep track of rwaiters.  The last refcnt
> user destroys the channel and decreases the module use count.  I've
> tested this with printks in all the right places, and it seems to fix
> the problem correctly.

I'm not sure this is the right fix.  This sounds to me like the
refcounts are going awry somehow or there is an SMP race that I
haven't considered, and I am concerned that this patch will just cover
over the real problem.  Actually, given that you've seen it 4 times in
6 months it's more likely that it is an SMP race IMHO.

In any case I don't think your patch does the right thing with
ppp_poll, because poll_wait doesn't actually wait, it just adds rwait
to a list of things to watch for wakeups.  In other words, rwait will
be in use from the time poll_wait is called until the time that the
poll/select logic (in fs/select.c) decides that it's time to return to
the user.  So increasing the refcount around just the poll_wait call
won't help much.
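
To make that concrete, here is a toy user-space model (a sketch, not
the fs/select.c code; every toy_ name is hypothetical).  The poll
table keeps a pointer to the wait queue after the poll method has
returned, so a reference held only around the registration call is
already gone while the queue is still in use.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WAITS 8

/* Stands in for the object that owns rwait. */
struct toy_waitq {
    int refcnt;
};

/* Stands in for the table the select loop builds up. */
struct toy_poll_table {
    struct toy_waitq *queues[TOY_MAX_WAITS];
    size_t n;
};

/* Like poll_wait: records the queue and returns at once, never sleeps. */
void toy_poll_wait(struct toy_waitq *q, struct toy_poll_table *pt)
{
    if (pt != NULL && pt->n < TOY_MAX_WAITS)
        pt->queues[pt->n++] = q;
}

/* A poll method that holds a reference only around the poll_wait call. */
void toy_poll_bracketed(struct toy_waitq *q, struct toy_poll_table *pt)
{
    q->refcnt++;
    toy_poll_wait(q, pt);
    q->refcnt--;   /* reference dropped, but pt still points at q */
}

/* What the select loop does once it decides to return to the user. */
void toy_poll_freewait(struct toy_poll_table *pt)
{
    assert(pt != NULL);
    pt->n = 0;     /* only now does the queue stop being in use */
}
```

Between toy_poll_bracketed and toy_poll_freewait the count is back to
its old value while the table still refers to the queue, which is the
window the bracketed refcount does not cover.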

Do you have a way to reproduce the problem at will?  Have you seen it
happen on a UP box (i.e. could it be an SMP race)?  How sure are you
that your patch really fixes the problem?

Regards,
Paul.

-- 
Paul Mackerras, Open Source Research Fellow, Linuxcare, Inc.
+61 2 6262 8990 tel, +61 2 6262 8991 fax
[EMAIL PROTECTED], http://www.linuxcare.com.au/
Linuxcare.  Putting Open Source to work.

Re: Bug in ppp_async.c

2001-01-24 Thread Paul Mackerras

Albert D. Cahalan writes:

> Even Red Hat 7 only has the 2.3.11 version.
> 
> The 2.4.xx series is supposed to be stable. If there is any way
> you could add a compatibility hack, please do so.

Stable != backwards compatible to the year dot.  ppp-2.4.0 has been
out for over 5 months now.  Adding the compatibility stuff back in
would make the PPP subsystem much more complicated and less robust.
And pppd is not the only thing you would have to upgrade if you are
using a 2.4.0 with Red Hat 7.0 - I would expect that you would also at
least have to upgrade modutils, and switch over from ipchains to
iptables if you use the netfilter stuff.

Paul.

-- 
Paul Mackerras, Open Source Research Fellow, Linuxcare, Inc.
+61 2 6262 8990 tel, +61 2 6262 8991 fax
[EMAIL PROTECTED], http://www.linuxcare.com.au/
Linuxcare.  Support for the revolution.

Re: Bug in ppp_async.c

2001-01-23 Thread Paul Mackerras


Jo l'Indien writes:

> I found a bug in the 2.4.1-pre10 version of ppp_async.c
> 
> In fact, a lot of ioctls are not supported any more,
> which makes pppd fail to start.

I'll bet you're using an old pppd.  You need version 2.4.0 of pppd,
available from ftp://linuxcare.com.au/pub/ppp/, as documented in the
Documentation/Changes file.

> PS: sorry, but I don't know who is the current maintainer of this
> driver...

Me.

-- 
Paul Mackerras, Open Source Research Fellow, Linuxcare, Inc.
+61 2 6262 8990 tel, +61 2 6262 8991 fax
[EMAIL PROTECTED], http://www.linuxcare.com.au/
Linuxcare.  Support for the revolution.
