Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-07 Thread Albert Herranz
Alan Stern wrote:
>On a 64-bit processor, some of the accesses will be 64 bits wide
>instead of 32.  Does that matter for your purposes?
>

The wii uses a 32-bit processor, so this is safe in this case.

>What about ohci-hcd and uhci-hcd?  They both use non-32-bit accesses to 
>structures in coherent memory.
>

The wii has no uhci, but has 2 ohci controllers.
For ohci we need a similar approach as done for ehci.

>If you do it as described above then the buffers you're worried about
>won't be allocated in coherent memory to begin with, so no problems 
>will arise.

It turns out that we have more limitations.
The wii has 2 discontiguous memory areas (usually called MEM1 and MEM2). I have 
checked that the ehci controller doesn't work properly when performing dma to 
buffers allocated in MEM1 (it corrupts part of the data) but has no problems if 
the buffers sit within MEM2.
So usb buffers will need to be bounced anyway if they are part of MEM1.

This worked in the original patch as buffers were always bounced to MEM2 
buffers. Sigh.

>Alan Stern
>

Thanks,
Albert

PS: Your reply didn't get to me. I looked at the ML and found it (I'm not 
subscribed). Sorry for the late answer.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-07 Thread Alan Stern
On Sun, 7 Feb 2010, Albert Herranz wrote:

> The wii has no uhci, but has 2 ohci controllers.
> For ohci we need a similar approach as done for ehci.

So you'll need to write a patch splitting up the OHCI data structures 
in the same way the EHCI qh was split up.

> >If you do it as described above then the buffers you're worried about
> >won't be allocated in coherent memory to begin with, so no problems 
> >will arise.
> 
> It turns out that we have more limitations.
> The wii has 2 discontiguous memory areas (usually called MEM1 and MEM2). I 
> have checked that the ehci controller doesn't work properly when performing 
> dma to buffers allocated in MEM1 (it corrupts part of the data) but has no 
> problems if the buffers sit within MEM2.
> So usb buffers will need to be bounced anyway if they are part of MEM1.

This sounds like the sort of restriction that dma_map_single() should 
be capable of handling.

Alan Stern



powerpc: Reduce differences between pseries and ppc64 defconfigs

2010-02-07 Thread Anton Blanchard

The pseries and ppc64 defconfigs have drifted apart over the years. Reduce
some of the differences while still keeping the idea that the ppc64 defconfig
is cross platform but enables fewer features than pseries, eg NR_CPUS is
lower.

Also enable a number of common adapters as modules.

Signed-off-by: Anton Blanchard 
---

v2: Enable NUMA on ppc64_defconfig

Index: powerpc.git/arch/powerpc/configs/ppc64_defconfig
===
--- powerpc.git.orig/arch/powerpc/configs/ppc64_defconfig  2010-02-05 14:57:48.889717208 +1100
+++ powerpc.git/arch/powerpc/configs/ppc64_defconfig  2010-02-05 14:57:52.600960379 +1100
@@ -137,8 +137,9 @@ CONFIG_TRACEPOINTS=y
 CONFIG_MARKERS=y
 CONFIG_OPROFILE=y
 CONFIG_HAVE_OPROFILE=y
-# CONFIG_KPROBES is not set
+CONFIG_KPROBES=y
 CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
+CONFIG_KRETPROBES=y
 CONFIG_HAVE_IOREMAP_PROT=y
 CONFIG_HAVE_KPROBES=y
 CONFIG_HAVE_KRETPROBES=y
@@ -191,6 +192,7 @@ CONFIG_SCANLOG=m
 CONFIG_LPARCFG=y
 CONFIG_PPC_SMLPAR=y
 CONFIG_CMM=y
+CONFIG_DTL=y
 CONFIG_PPC_ISERIES=y
 
 #
@@ -328,9 +330,10 @@ CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
 CONFIG_KEXEC=y
 # CONFIG_PHYP_DUMP is not set
 CONFIG_IRQ_ALL_CPUS=y
-# CONFIG_NUMA is not set
+CONFIG_NUMA=y
+CONFIG_NODES_SHIFT=8
+CONFIG_MAX_ACTIVE_REGIONS=256
 CONFIG_ARCH_SELECT_MEMORY_MODEL=y
-CONFIG_ARCH_FLATMEM_ENABLE=y
 CONFIG_ARCH_SPARSEMEM_ENABLE=y
 CONFIG_ARCH_SPARSEMEM_DEFAULT=y
 CONFIG_ARCH_POPULATES_NODE_MAP=y
@@ -339,6 +342,7 @@ CONFIG_SELECT_MEMORY_MODEL=y
 # CONFIG_DISCONTIGMEM_MANUAL is not set
 CONFIG_SPARSEMEM_MANUAL=y
 CONFIG_SPARSEMEM=y
+CONFIG_NEED_MULTIPLE_NODES=y
 CONFIG_HAVE_MEMORY_PRESENT=y
 CONFIG_SPARSEMEM_EXTREME=y
 CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
@@ -354,11 +358,12 @@ CONFIG_PHYS_ADDR_T_64BIT=y
 CONFIG_ZONE_DMA_FLAG=1
 CONFIG_BOUNCE=y
 CONFIG_UNEVICTABLE_LRU=y
+CONFIG_NODES_SPAN_OTHER_NODES=y
 CONFIG_ARCH_MEMORY_PROBE=y
 CONFIG_PPC_HAS_HASH_64K=y
 # CONFIG_PPC_64K_PAGES is not set
 CONFIG_FORCE_MAX_ZONEORDER=13
-# CONFIG_SCHED_SMT is not set
+CONFIG_SCHED_SMT=y
 CONFIG_PROC_DEVICETREE=y
 # CONFIG_CMDLINE_BOOL is not set
 CONFIG_EXTRA_TARGETS=""
@@ -790,12 +795,12 @@ CONFIG_SCSI_IPR=y
 CONFIG_SCSI_IPR_TRACE=y
 CONFIG_SCSI_IPR_DUMP=y
 # CONFIG_SCSI_QLOGIC_1280 is not set
-# CONFIG_SCSI_QLA_FC is not set
+CONFIG_SCSI_QLA_FC=m
 # CONFIG_SCSI_QLA_ISCSI is not set
 CONFIG_SCSI_LPFC=m
 # CONFIG_SCSI_DC395x is not set
 # CONFIG_SCSI_DC390T is not set
-CONFIG_SCSI_DEBUG=m
+# CONFIG_SCSI_DEBUG is not set
 # CONFIG_SCSI_SRP is not set
 # CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
 # CONFIG_SCSI_DH is not set
@@ -867,9 +872,8 @@ CONFIG_MD_AUTODETECT=y
 CONFIG_MD_LINEAR=y
 CONFIG_MD_RAID0=y
 CONFIG_MD_RAID1=y
-CONFIG_MD_RAID10=y
-CONFIG_MD_RAID456=y
-CONFIG_MD_RAID5_RESHAPE=y
+CONFIG_MD_RAID10=m
+CONFIG_MD_RAID456=m
 CONFIG_MD_MULTIPATH=m
 CONFIG_MD_FAULTY=m
 CONFIG_BLK_DEV_DM=y
@@ -984,7 +988,7 @@ CONFIG_ACENIC=m
 CONFIG_ACENIC_OMIT_TIGON_I=y
 # CONFIG_DL2K is not set
 CONFIG_E1000=y
-# CONFIG_E1000E is not set
+CONFIG_E1000E=m
 # CONFIG_IP1000 is not set
 # CONFIG_IGB is not set
 # CONFIG_NS83820 is not set
@@ -1006,19 +1010,19 @@ CONFIG_GELIC_WIRELESS=y
 # CONFIG_ATL1E is not set
 # CONFIG_JME is not set
 CONFIG_NETDEV_1=y
-# CONFIG_CHELSIO_T1 is not set
-# CONFIG_CHELSIO_T3 is not set
+CONFIG_CHELSIO_T1=m
+CONFIG_CHELSIO_T3=m
 CONFIG_EHEA=m
 # CONFIG_ENIC is not set
-# CONFIG_IXGBE is not set
+CONFIG_IXGBE=m
 CONFIG_IXGB=m
-# CONFIG_S2IO is not set
-# CONFIG_MYRI10GE is not set
-# CONFIG_NETXEN_NIC is not set
+CONFIG_S2IO=m
+CONFIG_MYRI10GE=m
+CONFIG_NETXEN_NIC=m
 # CONFIG_NIU is not set
 CONFIG_PASEMI_MAC=y
-# CONFIG_MLX4_EN is not set
-# CONFIG_MLX4_CORE is not set
+CONFIG_MLX4_EN=m
+CONFIG_MLX4_CORE=m
 # CONFIG_TEHUTI is not set
 # CONFIG_BNX2X is not set
 # CONFIG_QLGE is not set
@@ -1169,7 +1173,7 @@ CONFIG_SERIAL_TXX9=y
 CONFIG_HAS_TXX9_SERIAL=y
 CONFIG_SERIAL_TXX9_NR_UARTS=6
 CONFIG_SERIAL_TXX9_CONSOLE=y
-# CONFIG_SERIAL_JSM is not set
+CONFIG_SERIAL_JSM=m
 # CONFIG_SERIAL_OF_PLATFORM is not set
 CONFIG_UNIX98_PTYS=y
 CONFIG_LEGACY_PTYS=y
@@ -1586,7 +1590,7 @@ CONFIG_USB_DEVICEFS=y
 CONFIG_USB_DEVICE_CLASS=y
 # CONFIG_USB_DYNAMIC_MINORS is not set
 # CONFIG_USB_OTG is not set
-# CONFIG_USB_MON is not set
+CONFIG_USB_MON=m
 # CONFIG_USB_WUSB is not set
 # CONFIG_USB_WUSB_CBAF is not set
 
@@ -1686,21 +1690,22 @@ CONFIG_USB_APPLEDISPLAY=m
 # CONFIG_NEW_LEDS is not set
 # CONFIG_ACCESSIBILITY is not set
 CONFIG_INFINIBAND=m
-# CONFIG_INFINIBAND_USER_MAD is not set
-# CONFIG_INFINIBAND_USER_ACCESS is not set
+CONFIG_INFINIBAND_USER_MAD=m
+CONFIG_INFINIBAND_USER_ACCESS=m
+CONFIG_INFINIBAND_USER_MEM=y
 CONFIG_INFINIBAND_ADDR_TRANS=y
 CONFIG_INFINIBAND_MTHCA=m
 CONFIG_INFINIBAND_MTHCA_DEBUG=y
-# CONFIG_INFINIBAND_IPATH is not set
+CONFIG_INFINIBAND_IPATH=m
 CONFIG_INFINIBAND_EHCA=m
 # CONFIG_INFINIBAND_AMSO1100 is not set
-# CONFIG_MLX4_INFINIBAND is not set
+CONFIG_MLX4_INFINIBAND=m
 # CONFIG_INFINIBAND_NES is not set
 CONFIG_INFINIBAND_IPOIB=m

[PATCH] powerpc: Convert mmu context allocator from idr to ida

2010-02-07 Thread Anton Blanchard

We can use the much more lightweight ida allocator since we don't
need the pointer storage idr provides.

Signed-off-by: Anton Blanchard 
---

Index: powerpc.git/arch/powerpc/mm/mmu_context_hash64.c
===
--- powerpc.git.orig/arch/powerpc/mm/mmu_context_hash64.c  2010-02-05 14:57:48.399712677 +1100
+++ powerpc.git/arch/powerpc/mm/mmu_context_hash64.c  2010-02-05 14:57:55.938461799 +1100
@@ -23,7 +23,7 @@
 #include 
 
 static DEFINE_SPINLOCK(mmu_context_lock);
-static DEFINE_IDR(mmu_context_idr);
+static DEFINE_IDA(mmu_context_ida);
 
 /*
  * The proto-VSID space has 2^35 - 1 segments available for user mappings.
@@ -39,11 +39,11 @@ int __init_new_context(void)
 	int err;
 
 again:
-	if (!idr_pre_get(&mmu_context_idr, GFP_KERNEL))
+	if (!ida_pre_get(&mmu_context_ida, GFP_KERNEL))
 		return -ENOMEM;
 
 	spin_lock(&mmu_context_lock);
-	err = idr_get_new_above(&mmu_context_idr, NULL, 1, &index);
+	err = ida_get_new_above(&mmu_context_ida, 1, &index);
 	spin_unlock(&mmu_context_lock);
 
 	if (err == -EAGAIN)
@@ -53,7 +53,7 @@ again:
 
 	if (index > MAX_CONTEXT) {
 		spin_lock(&mmu_context_lock);
-		idr_remove(&mmu_context_idr, index);
+		ida_remove(&mmu_context_ida, index);
 		spin_unlock(&mmu_context_lock);
 		return -ENOMEM;
 	}
@@ -85,7 +85,7 @@ int init_new_context(struct task_struct 
 void __destroy_context(int context_id)
 {
 	spin_lock(&mmu_context_lock);
-	idr_remove(&mmu_context_idr, context_id);
+	ida_remove(&mmu_context_ida, context_id);
 	spin_unlock(&mmu_context_lock);
 }
 EXPORT_SYMBOL_GPL(__destroy_context);
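
To illustrate why an ID-only allocator can be lighter than one that also stores a pointer per ID, here is a minimal userspace sketch. It is not the kernel's ida implementation; the names id_bitmap, id_alloc_above, id_free and the MAX_IDS cap are hypothetical and exist only for this example. All it keeps is one bit per ID.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_IDS 64	/* hypothetical cap: one 64-bit word of IDs */

/* One bit per ID; no per-ID pointer is stored anywhere. */
struct id_bitmap {
	uint64_t bits;
};

/* Allocate the lowest free ID >= start, or return -1 if none is left. */
static int id_alloc_above(struct id_bitmap *m, int start)
{
	assert(start >= 0);
	for (int id = start; id < MAX_IDS; id++) {
		uint64_t mask = UINT64_C(1) << id;

		if (!(m->bits & mask)) {
			m->bits |= mask;
			return id;
		}
	}
	return -1;
}

/* Release an ID so a later allocation can reuse it. */
static void id_free(struct id_bitmap *m, int id)
{
	assert(id >= 0 && id < MAX_IDS);
	m->bits &= ~(UINT64_C(1) << id);
}
```

Starting the search at 1, as the patch does, keeps ID 0 permanently unused. Locking is left out of the sketch; the patch keeps the existing spinlock around the allocator calls.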


[PATCH] powerpc: Only print clockevent settings once

2010-02-07 Thread Anton Blanchard

The clockevent multiplier and shift is useful information, but we
only need to print it once.

Signed-off-by: Anton Blanchard 
---

Index: powerpc.git/arch/powerpc/kernel/time.c
===
--- powerpc.git.orig/arch/powerpc/kernel/time.c  2010-02-05 14:57:48.839716602 +1100
+++ powerpc.git/arch/powerpc/kernel/time.c  2010-02-05 14:57:53.057212067 +1100
@@ -930,13 +930,17 @@ static void __init setup_clockevent_mult
 
 static void register_decrementer_clockevent(int cpu)
 {
+	static int printed = 0;
 	struct clock_event_device *dec = &per_cpu(decrementers, cpu).event;
 
 	*dec = decrementer_clockevent;
 	dec->cpumask = cpumask_of(cpu);
 
-	printk(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
-	       dec->name, dec->mult, dec->shift, cpu);
+	if (!printed) {
+		printed = 1;
+		printk(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
+		       dec->name, dec->mult, dec->shift, cpu);
+	}
 
 	clockevents_register_device(dec);
 }


powerpc: Make powerpc_firmware_features __read_mostly

2010-02-07 Thread Anton Blanchard

We use firmware_has_feature quite a lot these days, so it's worth putting
powerpc_firmware_features into __read_mostly.

Signed-off-by: Anton Blanchard 
---

Index: powerpc.git/arch/powerpc/kernel/firmware.c
===
--- powerpc.git.orig/arch/powerpc/kernel/firmware.c  2010-02-05 14:57:48.579712760 +1100
+++ powerpc.git/arch/powerpc/kernel/firmware.c  2010-02-05 14:57:54.688461988 +1100
@@ -17,5 +17,5 @@
 
 #include 
 
-unsigned long powerpc_firmware_features;
+unsigned long powerpc_firmware_features __read_mostly;
 EXPORT_SYMBOL_GPL(powerpc_firmware_features);


[PATCH] powerpc: reformat SD_NODE_INIT to match x86

2010-02-07 Thread Anton Blanchard

Clean up SD_NODE_INITS so we can easily compare it to x86. Similar to the
work in 47734f89be0614b5acbd6a532390f9c72f019648 (sched: Clean up topology.h)

Signed-off-by: Anton Blanchard 
---

Index: linux.trees.git/arch/powerpc/include/asm/topology.h
===
--- linux.trees.git.orig/arch/powerpc/include/asm/topology.h  2010-01-22 11:35:07.0 +1100
+++ linux.trees.git/arch/powerpc/include/asm/topology.h  2010-01-22 11:37:29.0 +1100
@@ -38,27 +38,33 @@ static inline int pcibus_to_node(struct 
 cpumask_of_node(pcibus_to_node(bus)))
 
 /* sched_domains SD_NODE_INIT for PPC64 machines */
-#define SD_NODE_INIT (struct sched_domain) {		\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
-	.min_interval		= 8,			\
-	.max_interval		= 32,			\
-	.busy_factor		= 32,			\
-	.imbalance_pct		= 125,			\
-	.cache_nice_tries	= 1,			\
-	.busy_idx		= 3,			\
-	.idle_idx		= 1,			\
-	.newidle_idx		= 0,			\
-	.wake_idx		= 0,			\
-	.flags			= SD_LOAD_BALANCE	\
-				| SD_BALANCE_EXEC	\
-				| SD_BALANCE_FORK	\
-				| SD_BALANCE_NEWIDLE	\
-				| SD_SERIALIZE,		\
-	.last_balance		= jiffies,		\
-	.balance_interval	= 1,			\
-	.nr_balance_failed	= 0,			\
+#define SD_NODE_INIT (struct sched_domain) {				\
+	.min_interval		= 8,					\
+	.max_interval		= 32,					\
+	.busy_factor		= 32,					\
+	.imbalance_pct		= 125,					\
+	.cache_nice_tries	= 1,					\
+	.busy_idx		= 3,					\
+	.idle_idx		= 1,					\
+	.newidle_idx		= 0,					\
+	.wake_idx		= 0,					\
+	.forkexec_idx		= 0,					\
+									\
+	.flags			= 1*SD_LOAD_BALANCE			\
+				| 1*SD_BALANCE_NEWIDLE			\
+				| 1*SD_BALANCE_EXEC			\
+				| 1*SD_BALANCE_FORK			\
+				| 0*SD_BALANCE_WAKE			\
+				| 0*SD_WAKE_AFFINE			\
+				| 0*SD_PREFER_LOCAL			\
+				| 0*SD_SHARE_CPUPOWER			\
+				| 0*SD_POWERSAVINGS_BALANCE		\
+				| 0*SD_SHARE_PKG_RESOURCES		\
+				| 1*SD_SERIALIZE			\
+				| 0*SD_PREFER_SIBLING			\
+				,					\
+	.last_balance		= jiffies,				\
+	.balance_interval	= 1,					\
 }
 
 extern void __init dump_numa_cpu_topology(void);
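
The "1*FLAG | 0*FLAG" layout used in the new SD_NODE_INIT lists every flag, enabled or not, so two architectures' initializers can be compared line by line. A small standalone sketch of the idiom follows; the FLAG_* names and values are hypothetical and are not the kernel's SD_* constants.

```c
#include <assert.h>

/* Hypothetical flag bits for the sketch; the real SD_* values differ. */
enum {
	FLAG_LOAD_BALANCE = 0x01,
	FLAG_BALANCE_EXEC = 0x02,
	FLAG_WAKE_AFFINE  = 0x04,
	FLAG_SERIALIZE    = 0x08,
};

/*
 * Multiplying by 1 keeps a flag, multiplying by 0 drops it.
 * Every flag stays visible in the source either way.
 */
static const unsigned int node_flags =
	  1*FLAG_LOAD_BALANCE
	| 1*FLAG_BALANCE_EXEC
	| 0*FLAG_WAKE_AFFINE
	| 1*FLAG_SERIALIZE
	;
```

Flipping a flag is then a one-character change on its own line, which keeps later diffs small.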


Re: [RFC PATCH 1/2] USB: add HCD_BOUNCE_BUFFERS host controller driver flag

2010-02-07 Thread Albert Herranz
Alan Stern wrote:
> On Sun, 7 Feb 2010, Albert Herranz wrote:
> 
>> The wii has no uhci, but has 2 ohci controllers.
>> For ohci we need a similar approach as done for ehci.
> 
> So you'll need to write a patch splitting up the OHCI data structures 
> in the same way the EHCI qh was split up.
> 

Yes.

>> It turns out that we have more limitations.
>> The wii has 2 discontiguous memory areas (usually called MEM1 and MEM2). I 
>> have checked that the ehci controller doesn't work properly when performing 
>> dma to buffers allocated in MEM1 (it corrupts part of the data) but has no 
>> problems if the buffers sit within MEM2.
>> So usb buffers will need to be bounced anyway if they are part of MEM1.
> 
> This sounds like the sort of restriction that dma_map_single() should 
> be capable of handling.
> 

On powerpc you can have per-device specific dma ops.
I'll work on that direction and create a special dma ops set for devices which 
need their dma buffers on mem2, and then use those for ehci-hlwd.
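
As a rough userspace model of what such a mapping hook would have to decide: if the buffer already lies in MEM2 it can be handed to the device directly, otherwise it is copied into a MEM2 bounce area first. Everything named here (MEM2_START, MEM2_END, bounce_pool, map_for_device) is hypothetical, the address window is made up for the example, and only the CPU-to-device direction is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical physical address window standing in for MEM2. */
#define MEM2_START 0x10000000u
#define MEM2_END   0x14000000u	/* exclusive */

struct bounce_pool {
	uint32_t phys_base;	/* pretend physical address of buf[0], inside MEM2 */
	uint8_t  buf[256];
	size_t   used;
};

/* True if [phys, phys + len) lies entirely inside the MEM2 window. */
static bool in_mem2(uint32_t phys, size_t len)
{
	return phys >= MEM2_START && phys <= MEM2_END &&
	       len <= (size_t)(MEM2_END - phys);
}

/*
 * Return the address the device should use. Buffers outside MEM2 are
 * copied into the bounce pool and the pool address is returned instead.
 */
static uint32_t map_for_device(struct bounce_pool *pool, uint32_t phys,
			       const void *cpu_addr, size_t len, bool *bounced)
{
	uint32_t dev_addr;

	if (in_mem2(phys, len)) {
		*bounced = false;
		return phys;
	}

	assert(len <= sizeof(pool->buf) - pool->used);
	memcpy(pool->buf + pool->used, cpu_addr, len);
	dev_addr = pool->phys_base + (uint32_t)pool->used;
	pool->used += len;
	*bounced = true;
	return dev_addr;
}
```

A real implementation would also need the matching unmap step that copies device-written data back and releases the bounce space.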

> Alan Stern
> 

Thanks,
Albert



[PATCH] powerpc: Quieten cede latency printk

2010-02-07 Thread Anton Blanchard

The cede latency stuff is relatively new and we don't need to complain about
it not working on older firmware.

Signed-off-by: Anton Blanchard 
---

Index: powerpc.git/arch/powerpc/platforms/pseries/hotplug-cpu.c
===
--- powerpc.git.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c  2010-02-05 17:36:22.509710985 +1100
+++ powerpc.git/arch/powerpc/platforms/pseries/hotplug-cpu.c  2010-02-05 17:36:30.118124726 +1100
@@ -396,14 +396,6 @@ static int parse_cede_parameters(void)
__pa(cede_parameters),
CEDE_LATENCY_PARAM_MAX_LENGTH);
 
-   if (call_status != 0)
-   printk(KERN_INFO "CEDE_LATENCY: \
-   %s %s Error calling get-system-parameter(0x%x)\n",
-   __FILE__, __func__, call_status);
-   else
-   printk(KERN_INFO "CEDE_LATENCY: \
-   get-system-parameter successful.\n");
-
return call_status;
 }
 


[PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread Michael Neuling
akpm, linus: this or something like it needs to go into 2.6.33 (& 32) to
fix 'ulimit -s'.  

Mikey

[PATCH] Restrict stack space reservation to rlimit

When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.

Also, reserve the same stack size independent of page size.

This fixes a bug unmasked by fc63cf237078c86214abcb2ee9926d8ad289da9b

Signed-off-by: Michael Neuling 
Cc: Anton Blanchard 
Cc: sta...@kernel.org
---
 fs/exec.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: clone1/fs/exec.c
===
--- clone1.orig/fs/exec.c
+++ clone1/fs/exec.c
@@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
 	return 0;
 }
 
-#define EXTRA_STACK_VM_PAGES	20	/* random */
+#define EXTRA_STACK_VM_SIZE	81920UL	/* randomly 20 4K pages */
 
 /*
  * Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm 
 		goto out_unlock;
 	}
 
+	stack_base = min(EXTRA_STACK_VM_SIZE,
+			 current->signal->rlim[RLIMIT_STACK].rlim_cur) -
+		PAGE_SIZE;
 #ifdef CONFIG_STACK_GROWSUP
-	stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_base = vma->vm_end + stack_base;
 #else
-	stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_base = vma->vm_start - stack_base;
 #endif
 	ret = expand_stack(vma, stack_base);
 	if (ret)


Re: Stack size protection broken on ppc64

2010-02-07 Thread Anton Blanchard

Hi,

> Cool, thanks.  The following is based on this and fixes the problem for
> me on PPC64 ie. the !CONFIG_STACK_GROWSUP case. 

Thanks! Seeing the original setting of EXTRA_STACK_VM_PAGES is more or
less random, I wonder if we should round EXTRA_STACK_VM_SIZE up to 128kB
(or even down to 64kB) so it operates better with > 4kB pages.

But in the end it's probably of little use for the default OVERCOMMIT_GUESS
setting, so the main thing is we don't terminate processes incorrectly.

Acked-by: Anton Blanchard 

Anton


[PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread Michael Neuling
When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.

Also, reserve the same stack size independent of page size.

This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba 
"mm: variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b 
"exec: setup_arg_pages() fails to return errors".

Signed-off-by: Michael Neuling 
Cc: Anton Blanchard 
Cc: sta...@kernel.org
---
Update commit message to include patch name and SHA1 of related
patches.  

 fs/exec.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: clone1/fs/exec.c
===
--- clone1.orig/fs/exec.c
+++ clone1/fs/exec.c
@@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
 	return 0;
 }
 
-#define EXTRA_STACK_VM_PAGES	20	/* random */
+#define EXTRA_STACK_VM_SIZE	81920UL	/* randomly 20 4K pages */
 
 /*
  * Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm 
 		goto out_unlock;
 	}
 
+	stack_base = min(EXTRA_STACK_VM_SIZE,
+			 current->signal->rlim[RLIMIT_STACK].rlim_cur) -
+		PAGE_SIZE;
 #ifdef CONFIG_STACK_GROWSUP
-	stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_base = vma->vm_end + stack_base;
 #else
-	stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_base = vma->vm_start - stack_base;
 #endif
 	ret = expand_stack(vma, stack_base);
 	if (ret)
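
The size computation in the hunk above can be tried out in isolation. The sketch below is a userspace model, not kernel code: stack_expand_bytes is a hypothetical helper, the rlimit and page size are passed in as plain parameters, and the guard against a limit smaller than one page is an addition of this sketch that the patch itself does not contain.

```c
#include <assert.h>

#define EXTRA_STACK_VM_SIZE 81920UL	/* 20 pages of 4 KiB, as in the patch */

/*
 * Bytes by which the stack VMA would be grown:
 * min(EXTRA_STACK_VM_SIZE, rlim_cur) - page_size.
 * Returning 0 when the result would go negative is an assumption
 * made for this sketch only.
 */
static unsigned long stack_expand_bytes(unsigned long rlim_cur,
					unsigned long page_size)
{
	unsigned long want = EXTRA_STACK_VM_SIZE < rlim_cur ?
			     EXTRA_STACK_VM_SIZE : rlim_cur;

	if (want < page_size)
		return 0;
	return want - page_size;
}
```

With an 8 MiB limit the reservation is bounded by the 80 KiB constant for both 4 KiB and 64 KiB pages; with a 64 KiB limit it is bounded by the limit instead.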


[PATCH] powerpc: Add last sysfs file and dump of ftrace buffer to oops printout

2010-02-07 Thread Anton Blanchard

Add printout of last accessed sysfs file, added to x86 in
ae87221d3ce49d9de1e43756da834fd0bf05a2ad (sysfs: crash debugging)

Also add the notify_die hook that allows us to print out the ftrace
buffer on oops. This is useful in conjunction with ftrace function_graph:


Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA pSeries
last sysfs file: /sys/class/net/tunl0/type
Dumping ftrace buffer:

...

  0)   |.sysrq_handle_crash() {
  0)   0.476 us|  .hash_page();
  0)   0.488 us|  .xmon_fault_handler();
  0)   |  .bad_page_fault() {
  0)   |.search_exception_tables() {
  0)   0.590 us|  .search_module_extables();
  0)   2.546 us|}
  0)   |.printk() {
  0)   |  .vprintk() {
  0)   0.488 us|._raw_spin_lock();
  0)   0.572 us|.emit_log_char();


Showing the function graph of a sysrq-c crash.

Signed-off-by: Anton Blanchard 
---

Index: powerpc.git/arch/powerpc/kernel/traps.c
===
--- powerpc.git.orig/arch/powerpc/kernel/traps.c  2010-02-08 11:15:51.463071942 +1100
+++ powerpc.git/arch/powerpc/kernel/traps.c  2010-02-08 11:17:29.914321833 +1100
@@ -146,6 +146,11 @@ int die(const char *str, struct pt_regs 
 #endif
 		printk("%s\n", ppc_md.name ? ppc_md.name : "");
 
+		sysfs_printk_last_file();
+		if (notify_die(DIE_OOPS, str, regs, err, 255,
+			       SIGSEGV) == NOTIFY_STOP)
+			return 1;
+
 		print_modules();
 		show_regs(regs);
 	} else {


powerpc/pseries: Fix kexec regression caused by CPPR tracking

2010-02-07 Thread Mark Nelson
The code to track the CPPR values added by commit
49bd3647134ea47420067aea8d1401e722bf2aac ("powerpc/pseries: Track previous
CPPR values to correctly EOI interrupts") broke kexec on pseries because
the kexec code in xics.c calls xics_set_cpu_priority() before the IPI has
been EOI'ed. This wasn't a problem previously but it now triggers a BUG_ON
in xics_set_cpu_priority() because os_cppr->index isn't 0:

Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA
kernel BUG at arch/powerpc/platforms/pseries/xics.c:791!
pSeries
Modules linked in: ehea dm_mirror dm_region_hash dm_log dm_zero dm_snapshot 
parport_pc parport dm_multipath autofs4
NIP: c00461bc LR: c0046260 CTR: c004bc08
REGS: cfffb770 TRAP: 0700   Not tainted  (2.6.33-rc6-autokern1)
MSR: 80021032   CR: 4822  XER: 0001
TASK = c0aef700[0] 'swapper' THREAD: c0bcc000 CPU: 0
GPR00: 0001 cfffb9f0 c0bcc9e8
GPR04:   00dc 0002
GPR08: c40036e8 c4002e40 0898
GPR12: 0002 c0bf8480 0350 c0792f28
GPR16: c07915e0  00419000 03da8990
GPR20: c08a8990 0010 c0ae92c0 0010
GPR24:  c0be2380  00200200
GPR28: 0001 0001 c0b249e8
NIP [c00461bc] .xics_set_cpu_priority+0x38/0xb8
LR [c0046260] .xics_teardown_cpu+0x24/0xa4
Call Trace:
[cfffb9f0] [ebf3] 0xebf3 (unreliable)
[cfffba60] [c0046260] .xics_teardown_cpu+0x24/0xa4
[cfffbae0] [c0046330] .xics_kexec_teardown_cpu+0x18/0xb4
[cfffbb60] [c004a150] .pseries_kexec_cpu_down_xics+0x20/0x38
[cfffbbf0] [c002e5b8] .kexec_smp_down+0x48/0x7c
[cfffbc70] [c00b2dd0] .generic_smp_call_function_interrupt+0xf4/0x1b4
[cfffbd20] [c002aed0] .smp_message_recv+0x48/0x100
[cfffbda0] [c0046ae0] .xics_ipi_dispatch+0x84/0x148
[cfffbe30] [c00d62dc] .handle_IRQ_event+0xc8/0x248
[cfffbf00] [c00d8eb4] .handle_percpu_irq+0x80/0xf4
[cfffbf90] [c0029048] .call_handle_irq+0x1c/0x2c
[c0bcfa30] [c000ec84] .do_IRQ+0x1b8/0x2a4
[c0bcfae0] [c0004804] hardware_interrupt_entry+0x1c/0x98

Fix this problem by setting the index on the CPPR stack to 0 before calling
xics_set_cpu_priority() in xics_teardown_cpu().

Also make it clear that we only want to set the priority when there's just
one CPPR value in the stack, and enforce it by updating the value of
os_cppr->stack[0] rather than os_cppr->stack[os_cppr->index].

While we're at it change the BUG_ON to a WARN_ON.
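
The intended ordering can be modelled in a few lines of plain C. This is only a sketch of the description above, not the code from the patch: cppr_stack, set_cpu_priority, teardown_cpu and MAX_NUM_PRIORITIES are hypothetical stand-ins, and the hardware access that the real function performs is left out.

```c
#include <assert.h>

#define MAX_NUM_PRIORITIES 3	/* hypothetical stack depth for the sketch */

/* Per-CPU record of stacked CPPR values; index points at the current top. */
struct cppr_stack {
	unsigned char stack[MAX_NUM_PRIORITIES];
	int index;
};

/* Only valid with a single entry on the stack, so always write slot 0. */
static void set_cpu_priority(struct cppr_stack *s, unsigned char cppr)
{
	s->stack[0] = cppr;
	/* the real code would program the interrupt controller here */
}

/*
 * Teardown may run from inside an IPI that has not been EOI'ed yet,
 * so reset the index before setting the priority.
 */
static void teardown_cpu(struct cppr_stack *s)
{
	s->index = 0;
	set_cpu_priority(s, 0);
}
```

In the kexec path from the trace, the stack still holds the IPI's entry when teardown runs, which is the case the index reset covers.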

Reported-by: Anton Blanchard 
Signed-off-by: Mark Nelson 
---
Ben, if it's not too late for 2.6.33 this would be really nice to have
as without it we can't kexec on pseries.

 arch/powerpc/platforms/pseries/xics.c |   14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

Index: upstream/arch/powerpc/platforms/pseries/xics.c
===
--- upstream.orig/arch/powerpc/platforms/pseries/xics.c
+++ upstream/arch/powerpc/platforms/pseries/xics.c
@@ -784,9 +784,13 @@ static void xics_set_cpu_priority(unsign
 {
struct xics_cppr *os_cppr = &__get_cpu_var(xics_cppr);

Re: [PATCH 01/10] arch/powerpc: Fix continuation line formats

2010-02-07 Thread Benjamin Herrenschmidt
On Mon, 2010-02-01 at 10:30 -0800, Joe Perches wrote:
> On Mon, 2010-02-01 at 13:16 +1100, Benjamin Herrenschmidt wrote:
> > On Sun, 2010-01-31 at 12:02 -0800, Joe Perches wrote:
> > > String constants that are continued on subsequent lines with \
> > > are not good.
> > > Signed-off-by: Joe Perches 
> > You want me to take that in the powerpc tree ?
> 
> Yes please.
> 
> > A minor glitch below tho...
> > > @@ -349,7 +349,7 @@ static int __init nvram_create_os_partition(void)
> > >   rc = ppc_md.nvram_write((char *)&seq_init, sizeof(seq_init), 
> > > &tmp_index);
> > >   if (rc <= 0) {
> > >   printk(KERN_ERR "nvram_create_os_partition: nvram_write "
> > > - "failed (%d)\n", rc);
> > > +"failed (%d)\n", rc);
> > >   return rc;
> > >   }
> > 
> > The above is objectionable :-)
> 
> Can you drop that section or do you need another patch?

I'll sort it out. Thanks.

Cheers,
Ben.




Re: [PATCH] powerpc: fix ioremap_flags() with book3e pte definition

2010-02-07 Thread Benjamin Herrenschmidt
On Fri, 2010-02-05 at 00:53 +0800, Li Yang wrote:
> We can't just clear the user read permission in book3e pte, because
> that will also clear supervisor read permission.  This surely isn't
> desired.  Fix the problem by adding the supervisor read back.
> 
> Signed-off-by: Li Yang 
> ---
>  arch/powerpc/mm/pgtable_32.c |5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index cb96cb2..aff7c04 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -144,6 +144,11 @@ ioremap_flags(phys_addr_t addr, unsigned long size, 
> unsigned long flags)
>   /* we don't want to let _PAGE_USER and _PAGE_EXEC leak out */
>   flags &= ~(_PAGE_USER | _PAGE_EXEC);
>  
> +#if defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT)
> + /* supervisor read permission has just been cleared, add back */
> + flags |= _PAGE_BAP_SR;
> +#endif
> +

So this is a bit fishy indeed. pgtable_64.c seems to have the same
problem in fact.

It boils down to the "hack" I have in the new PTE format which consists
of having _SR be part of _PAGE_USER. I wonder if I should change that
instead. Kumar, what do you reckon ?

_Maybe_ an option is to have _PAGE_NO_USER that defaults to be
~_PAGE_USER in pte-common.h and can be set in pte-book3e.h to
only strip out UR and UW ?
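
To make the masking problem concrete, here is a standalone sketch with made-up bit values (the real Book3E PTE layout differs, and BAP_SR, BAP_UR, BAP_UW, PAGE_USER, PAGE_EXEC and PAGE_NO_USER here are hypothetical stand-ins for the kernel macros). It compares clearing the whole user mask with the suggested mask that strips only UR and UW.

```c
#include <assert.h>

/* Hypothetical bit assignments for the sketch only. */
#define BAP_SR		0x01u	/* supervisor read */
#define BAP_UR		0x02u	/* user read */
#define BAP_UW		0x04u	/* user write */
#define PAGE_EXEC	0x08u

/* The "hack": supervisor read is folded into the user mask. */
#define PAGE_USER	(BAP_UR | BAP_SR)

/* Suggested alternative: a mask that removes only the user bits. */
#define PAGE_NO_USER	(~(BAP_UR | BAP_UW))

/* Clearing PAGE_USER also drops supervisor read. */
static unsigned int strip_naive(unsigned int flags)
{
	return flags & ~(PAGE_USER | PAGE_EXEC);
}

/* Clearing via PAGE_NO_USER keeps supervisor read intact. */
static unsigned int strip_with_no_user(unsigned int flags)
{
	return flags & PAGE_NO_USER & ~PAGE_EXEC;
}
```

With flags of SR, UR and EXEC set, the first helper returns 0 while the second leaves SR in place, so no per-platform fix-up is needed after the mask.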

Cheers,
Ben.




Re: [PATCH] powerpc: Only print clockevent settings once

2010-02-07 Thread Wolfram Sang
On Mon, Feb 08, 2010 at 09:28:01AM +1100, Anton Blanchard wrote:
> 
> The clockevent multiplier and shift is useful information, but we
> only need to print it once.
> 
> Signed-off-by: Anton Blanchard 
> ---
> 
> Index: powerpc.git/arch/powerpc/kernel/time.c
> ===
> --- powerpc.git.orig/arch/powerpc/kernel/time.c  2010-02-05 14:57:48.839716602 +1100
> +++ powerpc.git/arch/powerpc/kernel/time.c  2010-02-05 14:57:53.057212067 +1100
> @@ -930,13 +930,17 @@ static void __init setup_clockevent_mult
>  
>  static void register_decrementer_clockevent(int cpu)
>  {
> + static int printed = 0;
>   struct clock_event_device *dec = &per_cpu(decrementers, cpu).event;
>  
>   *dec = decrementer_clockevent;
>   dec->cpumask = cpumask_of(cpu);
>  
> - printk(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
> -dec->name, dec->mult, dec->shift, cpu);
> + if (!printed) {
> + printed = 1;
> + printk(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
> +dec->name, dec->mult, dec->shift, cpu);
> + }

What about printk_once from kernel.h?

-- 
Pengutronix e.K.   | Wolfram Sang|
Industrial Linux Solutions | http://www.pengutronix.de/  |



Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread Anton Blanchard
 
Hi,

> Why do we need page size independent stack size? It seems to have
> compatibility breaking risk.

I don't think so. The current behaviour is clearly wrong, we don't need a
16x larger stack just because you went from a 4kB to a 64kB base page
size. The user application stack usage is the same in both cases.

Anton
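
For reference, the 16x figure follows directly from the old formula, which reserved a fixed number of pages rather than a fixed number of bytes. A tiny sketch (old_reservation is a hypothetical helper written for this illustration):

```c
#include <assert.h>

#define EXTRA_STACK_VM_PAGES 20	/* value of the old constant in fs/exec.c */

/* Old behaviour: the reservation scales with the base page size. */
static unsigned long old_reservation(unsigned long page_size)
{
	return EXTRA_STACK_VM_PAGES * page_size;
}
```

That gives 80 KiB with 4 KiB pages and 1280 KiB with 64 KiB pages for the same application.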


Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread KOSAKI Motohiro
>  
> Hi,
> 
> > Why do we need page size independent stack size? It seems to have
> > compatibility breaking risk.
> 
> I don't think so. The current behaviour is clearly wrong, we don't need a
> 16x larger stack just because you went from a 4kB to a 64kB base page
> size. The user application stack usage is the same in both cases.

I wasn't discussing which behavior is better. Michael said he wants to apply
his patch to 2.6.32 & 2.6.33. The stable tree never accepts a patch that
breaks compatibility.

Your answer doesn't explain why we can't wait until the next merge window.


btw, personally, I like a page-size-independent stack size, but I'm not sure
why making the stack size independent is related to the bug fix.





Re: [PATCH] powerpc: Only print clockevent settings once

2010-02-07 Thread Anton Blanchard

Hi Wolfram,

> What about printk_once from kernel.h?

Thanks for the suggestion!

Anton
--

The clockevent multiplier and shift is useful information, but we
only need to print it once.

Signed-off-by: Anton Blanchard 
---

Index: powerpc.git/arch/powerpc/kernel/time.c
===
--- powerpc.git.orig/arch/powerpc/kernel/time.c  2010-02-08 11:45:12.933073040 +1100
+++ powerpc.git/arch/powerpc/kernel/time.c  2010-02-08 16:21:08.505571532 +1100
@@ -935,8 +935,8 @@ static void register_decrementer_clockev
 	*dec = decrementer_clockevent;
 	dec->cpumask = cpumask_of(cpu);
 
-	printk(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
-	       dec->name, dec->mult, dec->shift, cpu);
+	printk_once(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
+		    dec->name, dec->mult, dec->shift, cpu);
 
 	clockevents_register_device(dec);
 }
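
For readers unfamiliar with the helper, the idea behind a print-once macro can be sketched in userspace C. This is not the kernel's definition of printk_once; print_once, lines_emitted and register_cpu are hypothetical names, and the counter exists only so the behaviour can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int lines_emitted;	/* counts messages that were actually printed */

/*
 * Each expansion gets its own static flag, so the message at a given
 * call site is printed the first time only.
 */
#define print_once(...)				\
	do {					\
		static bool done_;		\
		if (!done_) {			\
			done_ = true;		\
			lines_emitted++;	\
			printf(__VA_ARGS__);	\
		}				\
	} while (0)

/* Stand-in for a per-CPU registration path that is called once per CPU. */
static void register_cpu(int cpu)
{
	print_once("clockevent: first registered on cpu[%d]\n", cpu);
}
```

Calling register_cpu() for CPUs 0 to 3 prints a single line, for CPU 0. The flag is not protected against concurrent callers, so two CPUs racing through the first call could both print.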


Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread Anton Blanchard

Hi,

> I wasn't discussing which behavior is better. Michael said he wants to apply
> his patch to 2.6.32 & 2.6.33. The stable tree never accepts a patch that
> breaks compatibility.
> 
> Your answer doesn't explain why we can't wait until the next merge window.
> 
> 
> btw, personally, I like a page-size-independent stack size, but I'm not sure
> why making the stack size independent is related to the bug fix.

OK sorry, I misunderstood your initial mail. I agree fixing the bit that
regressed in 2.6.32 is the most important thing. The difference in page size is
clearly wrong but since it isn't a regression we could probably live with it
until 2.6.34

Anton


Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread KOSAKI Motohiro
Hi

> akpm, linus: this or something like it needs to go into 2.6.33 (& 32) to
> fix 'ulimit -s'.  

"fix ulimit -s" is too cool explanation ;-)
We are not ESPers. Please describe what the bug is.


> Mikey
> 
> [PATCH] Restrict stack space reservation to rlimit
> 
> When reserving stack space for a new process, make sure we're not
> attempting to allocate more than rlimit allows.
> 
> Also, reserve the same stack size independent of page size.

Why do we need a page-size-independent stack size? It seems to carry a
risk of breaking compatibility.


> 
> This fixes a bug unmasked by fc63cf237078c86214abcb2ee9926d8ad289da9b
> 
> Signed-off-by: Michael Neuling 
> Cc: Anton Blanchard 
> Cc: sta...@kernel.org
> ---
>  fs/exec.c |9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Index: clone1/fs/exec.c
> ===
> --- clone1.orig/fs/exec.c
> +++ clone1/fs/exec.c
> @@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
>   return 0;
>  }
>  
> -#define EXTRA_STACK_VM_PAGES 20  /* random */
> +#define EXTRA_STACK_VM_SIZE  81920UL /* randomly 20 4K pages */
>  
>  /*
>   * Finalizes the stack vm_area_struct. The flags and permissions are updated,
> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm 
>   goto out_unlock;
>   }
>  
> + stack_base = min(EXTRA_STACK_VM_SIZE,
> +  current->signal->rlim[RLIMIT_STACK].rlim_cur) -
> + PAGE_SIZE;
>  #ifdef CONFIG_STACK_GROWSUP
> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_end + stack_base;
>  #else
> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_start - stack_base;
>  #endif
>   ret = expand_stack(vma, stack_base);
>   if (ret)





[PATCH] Clear MSR_RI during RTAS calls

2010-02-07 Thread Anton Blanchard

RTAS should never cause an exception but if it does (for example accessing
outside our RMO) then we might go a long way through the kernel before
oopsing. If we unset MSR_RI we should at least stop things on exception
exit.

Signed-off-by: Anton Blanchard 
---

The setting of MSR_RI during RTAS calls went in ages ago (2003), and I'm
not sure why. Clearly an exception inside RTAS is fatal since we most likely
have a bad r13.

Index: powerpc.git/arch/powerpc/kernel/entry_64.S
===
--- powerpc.git.orig/arch/powerpc/kernel/entry_64.S 2010-02-08 12:25:51.543072119 +1100
+++ powerpc.git/arch/powerpc/kernel/entry_64.S  2010-02-08 12:26:56.043513290 +1100
@@ -791,9 +791,8 @@ _GLOBAL(enter_rtas)

 li  r9,1
 rldicr  r9,r9,MSR_SF_LG,(63-MSR_SF_LG)
-   ori r9,r9,MSR_IR|MSR_DR|MSR_FE0|MSR_FE1|MSR_FP
+   ori r9,r9,MSR_IR|MSR_DR|MSR_FE0|MSR_FE1|MSR_FP|MSR_RI
    andc    r6,r0,r9
-   ori r6,r6,MSR_RI
    sync            /* disable interrupts so SRR0/1 */
    mtmsrd  r0      /* don't get trashed */
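 
The arithmetic behind this hunk can be checked in plain C. The sketch below
models "andc r6,r0,r9" as r0 & ~r9. The bit positions are assumed to match
arch/powerpc/include/asm/reg.h, and rtas_msr() with its clear_ri switch is a
hypothetical helper for illustration, not kernel code.

```c
#include <stdint.h>

/* MSR bit positions, assumed to match arch/powerpc/include/asm/reg.h. */
#define MSR_SF  (1ULL << 63)  /* 64-bit mode */
#define MSR_FP  (1ULL << 13)  /* floating point available */
#define MSR_FE0 (1ULL << 11)  /* FP exception mode 0 */
#define MSR_FE1 (1ULL << 8)   /* FP exception mode 1 */
#define MSR_IR  (1ULL << 5)   /* instruction relocate */
#define MSR_DR  (1ULL << 4)   /* data relocate */
#define MSR_RI  (1ULL << 1)   /* recoverable interrupt */

/*
 * Hypothetical helper: derive the MSR value loaded for the RTAS call (r6)
 * from the value held in r0. The mask plays the role of r9.
 */
static uint64_t rtas_msr(uint64_t r0, int clear_ri)
{
	uint64_t mask = MSR_SF | MSR_IR | MSR_DR | MSR_FE0 | MSR_FE1 | MSR_FP;
	uint64_t msr;

	if (clear_ri)
		mask |= MSR_RI;   /* patched code: RI is part of the mask */

	msr = r0 & ~mask;         /* andc r6,r0,r9 */

	if (!clear_ri)
		msr |= MSR_RI;    /* old code: ori r6,r6,MSR_RI */

	return msr;
}
```

With clear_ri set, the result never has MSR_RI, whatever r0 contained; with
it unset, MSR_RI is always forced on, which is the behaviour being removed.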
 


[PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread Michael Neuling
> >  
> > Hi,
> > 
> > > Why do we need page size independent stack size? It seems to have
> > > compatibility breaking risk.
> > 
> > I don't think so. The current behaviour is clearly wrong, we dont need a
> > 16x larger stack just because you went from a 4kB to a 64kB base page
> > size. The user application stack usage is the same in both cases.
> 
> I didn't discuss which behavior is better. Michael said he want to apply
> his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> compatibility patch.
> 
> Your answer doesn't explain why can't we wait it until next merge window.
> 
> btw, personally, I like a page-size-independent stack size, but I'm not sure
> why making the stack size independent is related to the bug fix.

I tend to agree.  

Below is just the bug fix to limit the reservation size based on rlimit.
We still reserve different stack sizes based on the page size as
before (unless we hit rlimit of course).

Mikey

Restrict stack space reservation to rlimit

When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.

This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba 
"mm: variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b 
"exec: setup_arg_pages() fails to return errors".

Signed-off-by: Michael Neuling 
Cc: Anton Blanchard 
Cc: sta...@kernel.org
---
 fs/exec.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6-ozlabs/fs/exec.c
===
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm 
        goto out_unlock;
    }
 
+   stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
+                    current->signal->rlim[RLIMIT_STACK].rlim_cur -
+                      PAGE_SIZE);
 #ifdef CONFIG_STACK_GROWSUP
-   stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+   stack_base = vma->vm_end + stack_base;
 #else
-   stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+   stack_base = vma->vm_start - stack_base;
 #endif
    ret = expand_stack(vma, stack_base);
    if (ret)
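
To see what the clamp does at different page sizes, the expression can be
evaluated in userspace. This is only a sketch: stack_reserve() is a
hypothetical function, the page size and rlimit are passed in as plain
numbers, and the guard for an rlimit smaller than one page is an addition of
this example (the subtraction would otherwise wrap in unsigned arithmetic),
not part of the patch.

```c
#define EXTRA_STACK_VM_PAGES 20UL  /* same value as in fs/exec.c */

/*
 * Hypothetical userspace model of the patched calculation: how many bytes
 * beyond the initial stack VMA are reserved at exec time.
 */
static unsigned long stack_reserve(unsigned long page_size,
				   unsigned long rlim_cur)
{
	unsigned long want = EXTRA_STACK_VM_PAGES * page_size;
	unsigned long room;

	/* Assumption of this sketch: avoid unsigned wrap-around. */
	if (rlim_cur < page_size)
		return 0;

	room = rlim_cur - page_size;
	return want < room ? want : room;   /* min(want, room) */
}
```

With an 8 MB limit the reservation is 80 KB at 4 KB pages and 1280 KB at
64 KB pages, as before. With 'ulimit -s 16' at 4 KB pages it drops to 12 KB,
and with 'ulimit -s 64' at 64 KB pages it drops to zero instead of exceeding
the limit.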



Re: [PATCH] Clear MSR_RI during RTAS calls

2010-02-07 Thread Anton Blanchard

> The setting of MSR_RI during RTAS calls went in ages ago (2003), and I'm
> not sure why. Clearly an exception inside RTAS is fatal since we most likely
> have a bad r13.

I wrote the r13 comment without thinking :) Regardless I think we want
to catch any RTAS exception ASAP.  

Anton


Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread KOSAKI Motohiro
> > >  
> > > Hi,
> > > 
> > > > Why do we need page size independent stack size? It seems to have
> > > > compatibility breaking risk.
> > > 
> > > I don't think so. The current behaviour is clearly wrong, we dont need a
> > > 16x larger stack just because you went from a 4kB to a 64kB base page
> > > size. The user application stack usage is the same in both cases.
> > 
> > I didn't discuss which behavior is better. Michael said he want to apply
> > his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> > compatibility patch.
> > 
> > Your answer doesn't explain why can't we wait it until next merge window.
> > 
> > btw, personally, I like a page-size-independent stack size, but I'm not sure
> > why making the stack size independent is related to the bug fix.
> 
> I tend to agree.  
> 
> Below is just the bug fix to limit the reservation size based on rlimit.
> We still reserve different stack sizes based on the page size as
> before (unless we hit rlimit of course).

Thanks.

I agree with most of your patch, but I have a few requests.


> Mikey
> 
> Restrict stack space reservation to rlimit
> 
> When reserving stack space for a new process, make sure we're not
> attempting to allocate more than rlimit allows.
> 
> This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba 
> "mm: variable length argument support" and unmasked by
> fc63cf237078c86214abcb2ee9926d8ad289da9b 
> "exec: setup_arg_pages() fails to return errors".

Your initial mail had the following problem use-case. Please add it
to the patch description.

On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
now more restrictive than it was before.  On 2.6.31 with 4k pages I
could run 'ulimit -s 16; /usr/bin/test' without a problem.  Now with
mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.


> 
> Signed-off-by: Michael Neuling 
> Cc: Anton Blanchard 
> Cc: sta...@kernel.org
> ---
>  fs/exec.c |7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6-ozlabs/fs/exec.c
> ===
> --- linux-2.6-ozlabs.orig/fs/exec.c
> +++ linux-2.6-ozlabs/fs/exec.c
> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm 
>   goto out_unlock;
>   }
>  
> + stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
> +  current->signal->rlim[RLIMIT_STACK].rlim_cur -
> +PAGE_SIZE);

It is a bit unclear why "- PAGE_SIZE" is necessary here.
Personally, I prefer something like the following, with an explicit comment:

        stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
        stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);

        /* Initial stack must not cause stack overflow. */
        if (stack_expand + PAGE_SIZE > stack_lim)
                stack_expand = stack_lim - PAGE_SIZE;

Note: accessing rlim_cur requires ACCESS_ONCE.


Thoughts?


>  #ifdef CONFIG_STACK_GROWSUP
> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_end + stack_base;
>  #else
> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> + stack_base = vma->vm_start - stack_base;
>  #endif
>   ret = expand_stack(vma, stack_base);
>   if (ret)
> 





Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread KOSAKI Motohiro
> 
> Hi,
> 
> > I didn't discuss which behavior is better. Michael said he want to apply
> > his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> > compatibility patch.
> > 
> > Your answer doesn't explain why can't we wait it until next merge window.
> > 
> > 
> > btw, personally, I like a page-size-independent stack size, but I'm not sure
> > why making the stack size independent is related to the bug fix.
> 
> OK sorry, I misunderstood your initial mail. I agree fixing the bit that
> regressed in 2.6.32 is the most important thing. The difference in page
> size is clearly wrong but since it isn't a regression we could probably
> live with it until 2.6.34

thanks!




[git pull] Please pull powerpc.git merge branch

2010-02-07 Thread Benjamin Herrenschmidt
Hi Linus !

Here's a small regression fix that should still make it into .33

Cheers,
Ben.

The following changes since commit 6339204ecc2aa2067a99595522de0403f0854bb8:
  Linus Torvalds (1):
Merge branch 'for-linus' of git://git.kernel.org/.../viro/vfs-2.6

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

Mark Nelson (1):
  powerpc/pseries: Fix kexec regression caused by CPPR tracking

 arch/powerpc/platforms/pseries/xics.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)




Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread Américo Wang
On Mon, Feb 8, 2010 at 2:05 PM, KOSAKI Motohiro
 wrote:
>> --- linux-2.6-ozlabs.orig/fs/exec.c
>> +++ linux-2.6-ozlabs/fs/exec.c
>> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
>>                       goto out_unlock;
>>       }
>>
>> +     stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
>> +                      current->signal->rlim[RLIMIT_STACK].rlim_cur -
>> +                        PAGE_SIZE);
>
> This line is a bit unclear why "- PAGE_SIZE" is necessary.
> personally, I like following likes explicit comments.
>
>        stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
>        stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);
>
>        /* Initial stack must not cause stack overflow. */
>        if (stack_expand + PAGE_SIZE > stack_lim)
>                stack_expand = stack_lim - PAGE_SIZE;
>
> note: accessing rlim_cur require ACCESS_ONCE.
>
>
> Thought?

It's better to use the helper function: rlimit().
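
A rough picture of what such a helper buys: it hides the ACCESS_ONCE() read
behind a function, so callers cannot forget it. The sketch below is a
userspace mock under stated assumptions. struct mock_signal, mock_rlimit()
and clamp_stack_expand() are hypothetical names, the macro is a simplified
volatile-cast form of ACCESS_ONCE() that relies on the GCC/Clang __typeof__
extension, and the real rlimit() may differ in detail.

```c
#define MOCK_RLIMIT_STACK 0
#define MOCK_RLIM_NLIMITS 1

/* Simplified form of ACCESS_ONCE(): force exactly one load of x. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct mock_rlimit_entry {
	unsigned long rlim_cur;
	unsigned long rlim_max;
};

struct mock_signal {
	struct mock_rlimit_entry rlim[MOCK_RLIM_NLIMITS];
};

/* Hypothetical helper in the spirit of rlimit(): one snapshot per call. */
static unsigned long mock_rlimit(struct mock_signal *sig, unsigned int limit)
{
	return ACCESS_ONCE(sig->rlim[limit].rlim_cur);
}

/* The explicit clamp from the review, written against the helper. */
static unsigned long clamp_stack_expand(struct mock_signal *sig,
					unsigned long page_size)
{
	unsigned long stack_expand = 20UL * page_size;
	unsigned long stack_lim = mock_rlimit(sig, MOCK_RLIMIT_STACK);

	/* Initial stack must not cause stack overflow. */
	if (stack_expand + page_size > stack_lim)
		stack_expand = stack_lim - page_size;

	return stack_expand;
}
```

Both the comparison and the subtraction use the same snapshot held in
stack_lim, so a concurrent change to the limit cannot make them disagree.
Like the reviewed snippet, this assumes the limit is at least one page.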

Re: [PATCH] Restrict stack space reservation to rlimit

2010-02-07 Thread KOSAKI Motohiro
> On Mon, Feb 8, 2010 at 2:05 PM, KOSAKI Motohiro
>  wrote:
> >> --- linux-2.6-ozlabs.orig/fs/exec.c
> >> +++ linux-2.6-ozlabs/fs/exec.c
> >> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
> >>                       goto out_unlock;
> >>       }
> >>
> >> +     stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
> >> +                      current->signal->rlim[RLIMIT_STACK].rlim_cur -
> >> +                        PAGE_SIZE);
> >
> > This line is a bit unclear why "- PAGE_SIZE" is necessary.
> > personally, I like following likes explicit comments.
> >
> >        stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> >        stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);
> >
> >        /* Initial stack must not cause stack overflow. */
> >        if (stack_expand + PAGE_SIZE > stack_lim)
> >                stack_expand = stack_lim - PAGE_SIZE;
> >
> > note: accessing rlim_cur require ACCESS_ONCE.
> >
> >
> > Thought?
> 
> It's better to use the helper function: rlimit().

AFAIK, the stable tree doesn't have rlimit(). But yes, making two patches
(one for mainline and one for stable) is a good idea.


