Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-27 Thread Lord Glauber Costa of Sealand
On 01/28/2013 11:37 AM, Lord Glauber Costa of Sealand wrote:
> On 01/26/2013 06:22 AM, Eric W. Biederman wrote:
>>
>> In the help text describing user namespaces recommend use of memory
>> control groups.  In many cases memory control groups are the only
>> mechanism there is to limit how much memory a user who can create
>> user namespaces can use.
>>
>> Signed-off-by: "Eric W. Biederman" 
>> ---
>>  Documentation/namespaces/resource-control.txt |   10 ++
>>  init/Kconfig  |7 +++
>>  2 files changed, 17 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/namespaces/resource-control.txt
>>
>> diff --git a/Documentation/namespaces/resource-control.txt 
>> b/Documentation/namespaces/resource-control.txt
>> new file mode 100644
>> index 000..3d8178a
>> --- /dev/null
>> +++ b/Documentation/namespaces/resource-control.txt
>> @@ -0,0 +1,10 @@
>> +There are a lot of kinds of objects in the kernel that don't have
>> +individual limits or that have limits that are ineffective when a set
>> +of processes is allowed to switch user ids.  With user namespaces
>> +enabled in a kernel for people who don't trust their users or their
>> +users programs to play nice this problems becomes more acute.
>> +
>> +Therefore it is recommended that memory control groups be enabled in
>> +kernels that enable user namespaces, and it is further recommended
>> +that userspace configure memory control groups to limit how much
>> +memory users they don't trust to play nice can use.
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 7d30240..c8c58bd 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1035,6 +1035,13 @@ config USER_NS
>>  help
>>This allows containers, i.e. vservers, to use user namespaces
>>to provide different user info for different servers.
>> +
>> +  When user namespaces are enabled in the kernel it is
>> +  recommended that the MEMCG and MEMCG_KMEM options also be
>> +  enabled and that user-space use the memory control groups to
>> +  limit the amount of memory a memory unprivileged users can
>> +  use.
>> +
>>If unsure, say N.
> 
> Since this becomes an official recommendation that people will likely
> follow, are we really that much concerned about the types of abuses the
> MEMCG_KMEM will prevent? Those are mostly metadata-based abuses users
> could do in their own local disks without mounting anything extra (and
> things that look like that)
> 
> Unless there is a specific concern here, shouldn't we say "... that the
> MEMCG (and possibly MEMCG_KMEM) options..." ?
> 
> 
I just saw in a later patch of yours that your concern here is not
limited to RAM backed by tmpfs, but extends to things like the internal
structures for userns, to avoid patterns of the form: 'for (;;)
unshare(...)'

Hmm, it does seem sensible. The kernel memory controller aims to
prevent exactly things like that. But they all existed before
userns: there are destructive patterns like that with sockets, dentries,
processes, and pretty much every other resource in the kernel. So
although the recommendation per se makes sense, I am wondering if it is
worth mentioning anything in the user_ns config?
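
For readers following the recommendation under discussion, here is a
minimal userspace sketch of what "configure memory control groups" might
look like with the cgroup v1 memory controller. The helper name
set_memcg_limit() and the group path in the usage note are assumptions
made for illustration and are not part of the patch; memory.limit_in_bytes
and memory.kmem.limit_in_bytes are the control files exposed by the v1
memory controller.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): write a byte limit into a
 * memory-cgroup control file such as memory.limit_in_bytes.
 * Returns 0 on success, -1 on failure.
 */
int set_memcg_limit(const char *control_file, unsigned long long bytes)
{
	FILE *f = fopen(control_file, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%llu\n", bytes) < 0) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer; a failed flush means the write failed. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller might use it as
set_memcg_limit("/sys/fs/cgroup/memory/untrusted/memory.limit_in_bytes", 512ULL << 20),
where the mount point and the "untrusted" group name are assumed. Limiting
kernel memory would presumably use memory.kmem.limit_in_bytes in the same
directory when MEMCG_KMEM is enabled.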




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] ARM: davinci: da850: add RTC DT entries

2013-01-27 Thread Mrugesh Katepallewar
Add RTC DT entries in da850 dts file.

Signed-off-by: Mrugesh Katepallewar 
---
Applies on top of v3.8-rc4 of Linus' tree.

This patch depends on
"ARM: davinci: da850: add interrupt-parent property in soc node"
https://patchwork.kernel.org/patch/2044101/

Tested on da850-evm device.

Test Procedure:
date 2013.01.28-10:00:00 (usage: date[.]MM.DD-hh:mm[:ss])
hwclock -w
reset board and check system time.

Changes Since V1:
Remove interrupt-parent property from RTC node.
Change RTC node name in dts and dtsi file.

:100644 100644 37dc5a3... af4b7cc... M  arch/arm/boot/dts/da850-evm.dts
:100644 100644 640ab75... 90be701... M  arch/arm/boot/dts/da850.dtsi
 arch/arm/boot/dts/da850-evm.dts |3 +++
 arch/arm/boot/dts/da850.dtsi|7 +++
 2 files changed, 10 insertions(+)

diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts
index 37dc5a3..af4b7cc 100644
--- a/arch/arm/boot/dts/da850-evm.dts
+++ b/arch/arm/boot/dts/da850-evm.dts
@@ -24,5 +24,8 @@
serial2: serial@1d0d000 {
status = "okay";
};
+   rtc0: rtc@1c23000 {
+   status = "okay";
+   };
};
 };
diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index 640ab75..90be701 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -56,5 +56,12 @@
interrupt-parent = <>;
status = "disabled";
};
+   rtc0: rtc@1c23000 {
+   compatible = "ti,da830-rtc";
+   reg = <0x23000 0x1000>;
+   interrupts = <19
+ 19>;
+   status = "disabled";
+   };
};
 };
-- 
1.7.9.5



Re: [PATCH v3 14/71] ARC: Low level IRQ/Trap/Exception Handling

2013-01-27 Thread Vineet Gupta
Hi Al,

On Thursday 24 January 2013 04:35 PM, Vineet Gupta wrote:
> Signed-off-by: Vineet Gupta 
> Cc: Al Viro 
> ---
>  arch/arc/include/asm/entry.h |  495 
>  arch/arc/kernel/entry.S  |  571 
> ++
>  2 files changed, 1066 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arc/include/asm/entry.h
>  create mode 100644 arch/arc/kernel/entry.S
> 
> diff --git a/arch/arc/include/asm/entry.h b/arch/arc/include/asm/entry.h
> new file mode 100644
> index 000..63705b1
> --- /dev/null
> +++ b/arch/arc/include/asm/entry.h
> @@ -0,0 +1,495 @@
> +/*
> + * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Vineetg: Aug 28th 2008: Bug #94984
> + *  -Zero Overhead Loop Context shd be cleared when entering IRQ/EXcp/Trap
> + *   Normally CPU does this automatically, however when doing FAKE rtie,
> + *   we also need to explicitly do this. The problem in macros
> + *   FAKE_RET_FROM_EXCPN and FAKE_RET_FROM_EXCPN_LOCK_IRQ was that this bit
> + *   was being "CLEARED" rather then "SET". Actually "SET" clears ZOL context
> + *
> + * Vineetg: May 5th 2008
> + *  - Defined Stack Switching Macro to be reused in all intr/excp hdlrs
> + *  - Shaved off 11 instructions from RESTORE_ALL_INT1 by using the
> + *  address Write back load ld.ab instead of seperate ld/add instn
> + *
> + * Amit Bhor, Sameer Dhavale: Codito Technologies 2004
> + */
> +
> +#ifndef __ASM_ARC_ENTRY_H
> +#define __ASM_ARC_ENTRY_H
> +
> +#ifdef __ASSEMBLY__
> +#include   /* For NR_syscalls defination */
> +#include 
> +#include 
> +#include 
> +#include  /* For THREAD_SIZE */
> +
> +/* Note on the LD/ST addr modes with addr reg wback
> + *
> + * LD.a same as LD.aw
> + *
> + * LD.a    reg1, [reg2, x]  => Pre Incr
> + *  Eff Addr for load = [reg2 + x]
> + *
> + * LD.ab   reg1, [reg2, x]  => Post Incr
> + *  Eff Addr for load = [reg2]
> + */
> +
> +/*--
> + * Save caller saved registers (scratch registers) ( r0 - r12 )
> + * Registers are pushed / popped in the order defined in struct ptregs
> + * in asm/ptrace.h
> + *-*/
> +.macro  SAVE_CALLER_SAVED
> + st.a    r0, [sp, -4]
> + st.a    r1, [sp, -4]
> + st.a    r2, [sp, -4]
> + st.a    r3, [sp, -4]
> + st.a    r4, [sp, -4]
> + st.a    r5, [sp, -4]
> + st.a    r6, [sp, -4]
> + st.a    r7, [sp, -4]
> + st.a    r8, [sp, -4]
> + st.a    r9, [sp, -4]
> + st.a    r10, [sp, -4]
> + st.a    r11, [sp, -4]
> + st.a    r12, [sp, -4]
> +.endm
> +
> +/*--
> + * Restore caller saved registers (scratch registers)
> + *-*/
> +.macro RESTORE_CALLER_SAVED
> + ld.ab   r12, [sp, 4]
> + ld.ab   r11, [sp, 4]
> + ld.ab   r10, [sp, 4]
> + ld.ab   r9, [sp, 4]
> + ld.ab   r8, [sp, 4]
> + ld.ab   r7, [sp, 4]
> + ld.ab   r6, [sp, 4]
> + ld.ab   r5, [sp, 4]
> + ld.ab   r4, [sp, 4]
> + ld.ab   r3, [sp, 4]
> + ld.ab   r2, [sp, 4]
> + ld.ab   r1, [sp, 4]
> + ld.ab   r0, [sp, 4]
> +.endm
> +
> +
> +/*--
> + * Save callee saved registers (non scratch registers) ( r13 - r25 )
> + *  on kernel stack.
> + * User mode callee regs need to be saved in case of
> + *-fork and friends for replicating from parent to child
> + *-before going into do_signal( ) for ptrace/core-dump
> + * Special case handling is required for r25 in case it is used by kernel
> + *  for caching task ptr. Low level exception/ISR save user mode r25
> + *  into task->thread.user_r25. So it needs to be retrieved from there and
> + *  saved into kernel stack with rest of callee reg-file
> + *-*/
> +.macro SAVE_CALLEE_SAVED_USER
> + st.a    r13, [sp, -4]
> + st.a    r14, [sp, -4]
> + st.a    r15, [sp, -4]
> + st.a    r16, [sp, -4]
> + st.a    r17, [sp, -4]
> + st.a    r18, [sp, -4]
> + st.a    r19, [sp, -4]
> + st.a    r20, [sp, -4]
> + st.a    r21, [sp, -4]
> + st.a    r22, [sp, -4]
> + st.a    r23, [sp, -4]
> + st.a    r24, [sp, -4]
> + st.a    r25, [sp, -4]
> +
> + /* move up by 1 word to "create" callee_regs->"stack_place_holder" */
> + sub sp, sp, 4
> +.endm
> +
> +/*--
> + * Save callee saved registers (non scratch registers) ( r13 - r25 )
> + * kernel mode callee regs needed to be saved in case of context switch
> + * If r25 is used for caching task 

Re: [PATCH v5 7/8] fat (exportfs): rebuild directory-inode if fat_dget() fails

2013-01-27 Thread Namjae Jeon
2013/1/26, OGAWA Hirofumi :
> Namjae Jeon  writes:
>
>> 2013/1/20, OGAWA Hirofumi :
>>> Namjae Jeon  writes:
>>>
>>>> We rewrite patch as your suggestion using dummy inode. Would please
>>>> you review below patch code ?
>>>
>>> Looks like good as initial. Clean and shorter.
>>>
>>> Next is, we have to think about race. I.e. if real inode was made, what
>>> happens? Is there no race?
>> Hi OGAWA.
>>
>> Although checking several routines to check hang case you said, I
>> didn't find anything.
>> And There is no any race on test result also. Am I missing something ?
>> Let me know your opinion.
>
> Hm, it's read-only. So, there may not be race for now, I'm sure there is
> race on write path though.
Yes, right. We checked/tested on read-only.
Have you perhaps found a race with rename and unlink?
If so, I think we can fix this issue with a lock like this:

+   mutex_lock(&MSDOS_SB(sb)->s_lock);
 parent_inode = fat_rebuild_parent(sb, parent_logstart);
+   mutex_unlock(&MSDOS_SB(sb)->s_lock);

Let me know your opinion.
Thanks.
>
> Thanks.
> --
> OGAWA Hirofumi 
>


Re: [PATCH v3 22/71] ARC: [Review] Prevent incorrect syscall restarts

2013-01-27 Thread Vineet Gupta
Hi Al,

On Thursday 24 January 2013 04:20 PM, Vineet Gupta wrote:
> Per Al Viro's "signals for dummies" https://lkml.org/lkml/2012/12/6/366
> there are 3 golden rules for (not) restarting syscalls:
> 
> " What we need to guarantee is
> * restarts do not happen on signals caught in interrupts or exceptions
> * restarts do not happen on signals caught in sigreturn()
> * restart should happen only once, even if we get through do_signal()
>   many times."
> 
> ARC Port already handled #1, this patch fixes #2 and #3.
> 
> We use the additional state in pt_regs->orig_r8 to ckh if restarting
> has already been done once.
> 
> Thanks to Al Viro for spotting this.
> 
> Signed-off-by: Vineet Gupta 
> Cc: Al Viro 
> ---
>  arch/arc/include/asm/ptrace.h |3 +++
>  arch/arc/kernel/signal.c  |   12 
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h
> index 1711d56..8bf7ff4 100644
> --- a/arch/arc/include/asm/ptrace.h
> +++ b/arch/arc/include/asm/ptrace.h
> @@ -135,6 +135,9 @@ struct user_regs_struct {
>  #define in_syscall(regs)(regs->event & orig_r8_IS_SCALL)
>  #define in_brkpt_trap(regs) (regs->event & orig_r8_IS_BRKPT)
>  
> +#define syscall_wont_restart(regs) (regs->event |= 
> orig_r8_IS_SCALL_RESTARTED)
> +#define syscall_restartable(regs) !(regs->event &  
> orig_r8_IS_SCALL_RESTARTED)
> +
>  #define current_pt_regs()\
>  ({   \
>   /* open-coded current_thread_info() */  \
> diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c
> index 887a383..9a1ea2b 100644
> --- a/arch/arc/kernel/signal.c
> +++ b/arch/arc/kernel/signal.c
> @@ -131,6 +131,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
>   if (restore_altstack(>uc.uc_stack))
>   goto badframe;
>  
> + /* Don't restart from sigreturn */
> + syscall_wont_restart(regs);
> +
>   return regs->r0;
>  
>  badframe:
> @@ -321,13 +324,13 @@ void do_signal(struct pt_regs *regs)
>  
>   signr = get_signal_to_deliver(, , regs, NULL);
>  
> - /* Are we from a system call? */
> - restart_scall = in_syscall(regs);
> + restart_scall = in_syscall(regs) && syscall_restartable(regs);
>  
>   if (signr > 0) {
> - if (restart_scall)
> + if (restart_scall) {
>   arc_restart_syscall(, regs);
> -
> + syscall_wont_restart(regs); /* No more restarts */
> + }
>   handle_signal(signr, , , regs);
>   return;
>   }
> @@ -342,6 +345,7 @@ void do_signal(struct pt_regs *regs)
>   regs->r8 = __NR_restart_syscall;
>   regs->ret -= 4;
>   }
> + syscall_wont_restart(regs); /* No more restarts */
>   }
>  
>   /* If there's no signal to deliver, restore the saved sigmask back */
> 

If this looks OK, can I get your ACK on this?

Thx,
-Vineet
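
The "restart only once" rule quoted above can be illustrated with a small
userspace model of the flag handling. This is a sketch under assumptions:
struct regs_model, the EVENT_* bit values and restart_decision() are
invented for illustration (the real orig_r8_IS_* constants are not shown
in the quoted diff); only the two helper names mirror the macros the patch
adds.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real constants are not in the diff. */
#define EVENT_IS_SCALL            0x1u
#define EVENT_IS_SCALL_RESTARTED  0x2u

struct regs_model {
	unsigned int event;	/* stands in for pt_regs->event */
};

bool in_syscall(const struct regs_model *r)
{
	return (r->event & EVENT_IS_SCALL) != 0;
}

bool syscall_restartable(const struct regs_model *r)
{
	return (r->event & EVENT_IS_SCALL_RESTARTED) == 0;
}

void syscall_wont_restart(struct regs_model *r)
{
	r->event |= EVENT_IS_SCALL_RESTARTED;
}

/*
 * Model of the decision taken on each pass through do_signal():
 * returns true if this pass would restart the syscall, and marks the
 * registers so that no later pass restarts it again.
 */
bool restart_decision(struct regs_model *r)
{
	bool restart = in_syscall(r) && syscall_restartable(r);

	if (restart)
		syscall_wont_restart(r);
	return restart;
}
```

With this model, the first call to restart_decision() on a syscall entry
returns true and every later call returns false; calling
syscall_wont_restart() first, as the patch does in rt_sigreturn, makes
even the first call return false.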


Re: [PATCH v3 32/71] ARC: [DeviceTree] Basic support

2013-01-27 Thread Vineet Gupta
Hi Rob,

On Thursday 24 January 2013 04:20 PM, Vineet Gupta wrote:
> This is minimal infrastructure needed for devicetree work.
> It uses an a sample "skeleton" devicetree - embedded in kernel image -
> to print the board, manufacturer by parsing the top-level "compatible"
> string.
> 
> As of now we don't need any additional "board" specific "machine_desc".
> 
> TODO: support interpreting the command line as boot-loader passed dtb
> 
> Signed-off-by: Vineet Gupta 
> 
> Cc: Arnd Bergmann 
> Cc: Grant Likely 
> Cc: devicetree-discuss-uLR06cmDAlY/bj5bz2r...@public.gmane.org
> Cc: Rob Herring 
> ---
>  arch/arc/Kconfig|9 +
>  arch/arc/Makefile   |9 +
>  arch/arc/boot/dts/Makefile  |   14 
>  arch/arc/boot/dts/skeleton.dts  |   10 ++
>  arch/arc/boot/dts/skeleton.dtsi |   21 
>  arch/arc/include/asm/prom.h |   15 
>  arch/arc/include/asm/sections.h |1 +
>  arch/arc/kernel/Makefile|2 +
>  arch/arc/kernel/devtree.c   |   69 
> +++
>  arch/arc/kernel/setup.c |9 +
>  arch/arc/mm/init.c  |   13 +++
>  11 files changed, 172 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arc/boot/dts/Makefile
>  create mode 100644 arch/arc/boot/dts/skeleton.dts
>  create mode 100644 arch/arc/boot/dts/skeleton.dtsi
>  create mode 100644 arch/arc/include/asm/prom.h
>  create mode 100644 arch/arc/kernel/devtree.c
> 
> diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
> index a353849..7666857 100644
> --- a/arch/arc/Kconfig
> +++ b/arch/arc/Kconfig
> @@ -24,8 +24,11 @@ config ARC
>   select GENERIC_SMP_IDLE_THREAD
>   select HAVE_GENERIC_HARDIRQS
>   select HAVE_MEMBLOCK
> + select IRQ_DOMAIN
>   select MODULES_USE_ELF_RELA
>   select NO_BOOTMEM
> + select OF
> + select OF_EARLY_FLATTREE
>  
>  config SCHED_OMIT_FRAME_POINTER
>   def_bool y
> @@ -320,6 +323,12 @@ config CMDLINE_UBOOT
> to it. kernel startup code will copy the string into cmdline buffer
> and also append CONFIG_CMDLINE.
>  
> +config ARC_BUILTIN_DTB_NAME
> + string "Built in DTB"
> + help
> +   Set the name of the DTB to embed in the vmlinux binary
> +   Leaving it blank selects the minimal "skeleton" dtb
> +
>  source "kernel/Kconfig.preempt"
>  
>  endmenu   # "ARC Architecture Configuration"
> diff --git a/arch/arc/Makefile b/arch/arc/Makefile
> index 4d52a3b..90570f9 100644
> --- a/arch/arc/Makefile
> +++ b/arch/arc/Makefile
> @@ -83,6 +83,9 @@ head-y  := arch/arc/kernel/head.o
>  # See arch/arc/Kbuild for content of core part of the kernel
>  core-y   += arch/arc/
>  
> +# w/o this dtb won't embed into kernel binary
> +core-y   += arch/arc/boot/dts/
> +
>  # w/o this ifneq, make ARCH=arc clean was crapping out
>  ifneq ($(platform-y),)
>  core-y   += arch/arc/plat-$(PLATFORM)/
> @@ -101,6 +104,12 @@ bootpImage: vmlinux
>  uImage: vmlinux
>   $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
>  
> +%.dtb %.dtb.S %.dtb.o:
> + $(Q)$(MAKE) $(build)=$(boot)/dts $(boot)/dts/$@
> +
> +dtbs:
> + $(Q)$(MAKE) $(build)=$(boot)/dts $(boot)/dts/$@
> +
>  archclean:
>   $(Q)$(MAKE) $(clean)=$(boot)
>  
> diff --git a/arch/arc/boot/dts/Makefile b/arch/arc/boot/dts/Makefile
> new file mode 100644
> index 000..4a972a3
> --- /dev/null
> +++ b/arch/arc/boot/dts/Makefile
> @@ -0,0 +1,14 @@
> +ifeq ($(CONFIG_OF),y)
> +
> +# Built-in dtb
> +builtindtb-y := skeleton
> +
> +ifneq ($(CONFIG_ARC_BUILTIN_DTB_NAME),"")
> + builtindtb-y:= $(CONFIG_ARC_BUILTIN_DTB_NAME)
> +endif
> +
> +obj-y+= $(patsubst "%",%,$(builtindtb-y)).dtb.o
> +
> +clean-files := *.dtb
> +
> +endif
> diff --git a/arch/arc/boot/dts/skeleton.dts b/arch/arc/boot/dts/skeleton.dts
> new file mode 100644
> index 000..25a84fb
> --- /dev/null
> +++ b/arch/arc/boot/dts/skeleton.dts
> @@ -0,0 +1,10 @@
> +/*
> + * Copyright (C) 2012 Synopsys, Inc. (www.synopsys.com)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +/dts-v1/;
> +
> +/include/ "skeleton.dtsi"
> diff --git a/arch/arc/boot/dts/skeleton.dtsi b/arch/arc/boot/dts/skeleton.dtsi
> new file mode 100644
> index 000..9b357d8
> --- /dev/null
> +++ b/arch/arc/boot/dts/skeleton.dtsi
> @@ -0,0 +1,21 @@
> +/*
> + * Copyright (C) 2012 Synopsys, Inc. (www.synopsys.com)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +/*
> + * Skeleton device tree; the bare minimum needed to boot; just include and
> + * add a compatible value.
> + */
> +
> +/ {
> + compatible = "snps,arc";
> + #address-cells = <1>;
> + #size-cells 

Re: [PATCH] pci-sysfs: replace mutex_lock with mutex_trylock to avoid potential deadlock situation

2013-01-27 Thread Jiang Liu
Hi all,
I have worked out a draft patch set to serialize hotplug operations,
but encountered some obstacles when dealing with PCI root buses. I will
try to rebase the patch set onto "PCI: Iterate pci host bridge instead of
pci root bus" from Yinghai. I also need to fix the bugs reported by Lin
Feng.

Regards!
Gerry

On 2013-1-26 4:59, Yinghai Lu wrote:
> On Fri, Jan 25, 2013 at 12:30 PM, Bjorn Helgaas  wrote:
> 
>> If we're going to fix the sysfs deadlock (and we should), I want to
>> either see an argument for why we don't have a problem outside of
>> sysfs, or I want to fix sysfs and non-sysfs at the same time.
> 
> Sure.
> 
> Jiang should have one patches to address that?
> 
> Yinghai
> 
> 




Re: [PATCH review 3/6] userns: Recommend use of memory control groups.

2013-01-27 Thread Lord Glauber Costa of Sealand
On 01/26/2013 06:22 AM, Eric W. Biederman wrote:
> 
> In the help text describing user namespaces recommend use of memory
> control groups.  In many cases memory control groups are the only
> mechanism there is to limit how much memory a user who can create
> user namespaces can use.
> 
> Signed-off-by: "Eric W. Biederman" 
> ---
>  Documentation/namespaces/resource-control.txt |   10 ++
>  init/Kconfig  |7 +++
>  2 files changed, 17 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/namespaces/resource-control.txt
> 
> diff --git a/Documentation/namespaces/resource-control.txt 
> b/Documentation/namespaces/resource-control.txt
> new file mode 100644
> index 000..3d8178a
> --- /dev/null
> +++ b/Documentation/namespaces/resource-control.txt
> @@ -0,0 +1,10 @@
> +There are a lot of kinds of objects in the kernel that don't have
> +individual limits or that have limits that are ineffective when a set
> +of processes is allowed to switch user ids.  With user namespaces
> +enabled in a kernel for people who don't trust their users or their
> +users programs to play nice this problems becomes more acute.
> +
> +Therefore it is recommended that memory control groups be enabled in
> +kernels that enable user namespaces, and it is further recommended
> +that userspace configure memory control groups to limit how much
> +memory users they don't trust to play nice can use.
> diff --git a/init/Kconfig b/init/Kconfig
> index 7d30240..c8c58bd 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1035,6 +1035,13 @@ config USER_NS
>   help
> This allows containers, i.e. vservers, to use user namespaces
> to provide different user info for different servers.
> +
> +   When user namespaces are enabled in the kernel it is
> +   recommended that the MEMCG and MEMCG_KMEM options also be
> +   enabled and that user-space use the memory control groups to
> +   limit the amount of memory a memory unprivileged users can
> +   use.
> +
> If unsure, say N.

Since this becomes an official recommendation that people will likely
follow, are we really that much concerned about the types of abuses the
MEMCG_KMEM will prevent? Those are mostly metadata-based abuses users
could do in their own local disks without mounting anything extra (and
things that look like that)

Unless there is a specific concern here, shouldn't we say "... that the
MEMCG (and possibly MEMCG_KMEM) options..." ?




Re: [PATCH v3 58/71] ARC: UAPI Disintegrate arch/arc/include/asm

2013-01-27 Thread Vineet Gupta
Hi David,

On Thursday 24 January 2013 04:20 PM, Vineet Gupta wrote:
> 1. ./genfilelist.pl arch/arc/include/asm/
> 
> 2. Create arch/arc/include/uapi/asm/Kbuild as follows
> 
>   +# UAPI Header export list
>   +include include/uapi/asm-generic/Kbuild.asm
> 
> 3. ./disintegrate-one.pl arch/arc/include/{,uapi/}asm/
> 
> 4. Edit arch/arc/include/asm/Kbuild to remove ref to
>   asm-generic/Kbuild.asm
> 
> To work around empty uapi/asm/setup.h added a placholder comment.
> 
> Signed-off-by: Vineet Gupta 
> Cc: David Howells 
> ---
>  arch/arc/include/asm/Kbuild|8 ---
>  arch/arc/include/asm/byteorder.h   |   18 --
>  arch/arc/include/asm/cachectl.h|   28 -
>  arch/arc/include/asm/page.h|   30 +-
>  arch/arc/include/asm/ptrace.h  |   37 +---
>  arch/arc/include/asm/setup.h   |3 +-
>  arch/arc/include/asm/sigcontext.h  |   23 
>  arch/arc/include/asm/signal.h  |   27 -
>  arch/arc/include/asm/swab.h|   98 
> 
>  arch/arc/include/asm/unistd.h  |   34 ---
>  arch/arc/include/uapi/asm/Kbuild   |   11 
>  arch/arc/include/uapi/asm/byteorder.h  |   18 ++
>  arch/arc/include/uapi/asm/cachectl.h   |   28 +
>  arch/arc/include/uapi/asm/page.h   |   39 +
>  arch/arc/include/uapi/asm/ptrace.h |   46 +++
>  arch/arc/include/uapi/asm/setup.h  |6 ++
>  arch/arc/include/uapi/asm/sigcontext.h |   23 
>  arch/arc/include/uapi/asm/signal.h |   27 +
>  arch/arc/include/uapi/asm/swab.h   |   98 
> 
>  arch/arc/include/uapi/asm/unistd.h |   34 +++
>  20 files changed, 335 insertions(+), 301 deletions(-)
>  delete mode 100644 arch/arc/include/asm/byteorder.h
>  delete mode 100644 arch/arc/include/asm/cachectl.h
>  delete mode 100644 arch/arc/include/asm/sigcontext.h
>  delete mode 100644 arch/arc/include/asm/signal.h
>  delete mode 100644 arch/arc/include/asm/swab.h
>  delete mode 100644 arch/arc/include/asm/unistd.h
>  create mode 100644 arch/arc/include/uapi/asm/Kbuild
>  create mode 100644 arch/arc/include/uapi/asm/byteorder.h
>  create mode 100644 arch/arc/include/uapi/asm/cachectl.h
>  create mode 100644 arch/arc/include/uapi/asm/page.h
>  create mode 100644 arch/arc/include/uapi/asm/ptrace.h
>  create mode 100644 arch/arc/include/uapi/asm/setup.h
>  create mode 100644 arch/arc/include/uapi/asm/sigcontext.h
>  create mode 100644 arch/arc/include/uapi/asm/signal.h
>  create mode 100644 arch/arc/include/uapi/asm/swab.h
>  create mode 100644 arch/arc/include/uapi/asm/unistd.h
> 
> diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
> index b24089c..48af742 100644
> --- a/arch/arc/include/asm/Kbuild
> +++ b/arch/arc/include/asm/Kbuild
> @@ -1,11 +1,3 @@
> -include include/asm-generic/Kbuild.asm
> -
> -# 7-Oct-12: Jeremy Bennett . Some of these
> -# headers, beyond those specified in the generic set are needed by user code.
> -
> -header-y += page.h
> -header-y += cachectl.h
> -
>  generic-y += auxvec.h
>  generic-y += bugs.h
>  generic-y += bitsperlong.h
> diff --git a/arch/arc/include/asm/byteorder.h 
> b/arch/arc/include/asm/byteorder.h
> deleted file mode 100644
> index 9da71d4..000
> --- a/arch/arc/include/asm/byteorder.h
> +++ /dev/null
> @@ -1,18 +0,0 @@
> -/*
> - * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - */
> -
> -#ifndef __ASM_ARC_BYTEORDER_H
> -#define __ASM_ARC_BYTEORDER_H
> -
> -#ifdef CONFIG_CPU_BIG_ENDIAN
> -#include 
> -#else
> -#include 
> -#endif
> -
> -#endif /* ASM_ARC_BYTEORDER_H */
> diff --git a/arch/arc/include/asm/cachectl.h b/arch/arc/include/asm/cachectl.h
> deleted file mode 100644
> index 51c73f0..000
> --- a/arch/arc/include/asm/cachectl.h
> +++ /dev/null
> @@ -1,28 +0,0 @@
> -/*
> - * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - */
> -
> -#ifndef __ARC_ASM_CACHECTL_H
> -#define __ARC_ASM_CACHECTL_H
> -
> -/*
> - * ARC ABI flags defined for Android's finegrained cacheflush requirements
> - */
> -#define CF_I_INV 0x0002
> -#define CF_D_FLUSH   0x0010
> -#define CF_D_FLUSH_INV   0x0020
> -
> -#define CF_DEFAULT   (CF_I_INV | CF_D_FLUSH)
> -
> -/*
> - * Standard flags expected by cacheflush system call users
> - */
> -#define ICACHE   CF_I_INV
> -#define DCACHE   CF_D_FLUSH
> -#define BCACHE   (CF_I_INV | CF_D_FLUSH)
> -
> -#endif
> diff --git a/arch/arc/include/asm/page.h 

Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 15:17 +0800, Alex Shi wrote: 
> On 01/28/2013 02:49 PM, Mike Galbraith wrote:
> > On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
> >> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> >>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
> >>>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> >>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
> >>>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
> >>>> performance change found.
> >>>
> >>> Ok, good, You could put that in one of the commit messages so that it is
> >>> there and people know that this patchset doesn't cause perf regressions
> >>> with the bunch of benchmarks.
> >>>
> >>>> I also tested balance policy/powersaving policy with above benchmark,
> >>>> found, the specjbb2005 drop much 30~50% on both of policy whenever
> >>>> with openjdk or jrockit. and hackbench drops a lots with powersaving
> >>>> policy on snb 4 sockets platforms. others has no clear change.
> >>>
> >>> I guess this is expected because there has to be some performance hit
> >>> when saving power...
> >>>
> >>
> >> BTW, I had tested the v3 version based on sched numa -- on tip/master.
> >> The specjbb just has about 5~7% dropping on balance/powersaving policy.
> >> The power scheduling done after the numa scheduling logical.
> > 
> > That makes sense.  How the numa scheduling numbers compare to mainline?
> > Do you have all three available, mainline, and tip w. w/o powersaving
> > policy?
> > 
> 
> I once caught 20~40% performance increasing on sched numa VS mainline
> 3.7-rc5. but have no baseline to compare balance/powersaving performance
> since lower data are acceptable for balance/powersaving and
> tip/master changes too quickly to follow up at that time.
> :)

(wow.  dram sucks, dram+smp sucks more, dram+smp+numa _sucks rocks_;)



Re: [PATCH v2] mm: clean up soft_offline_page()

2013-01-27 Thread Xishi Qiu
On 2013/1/26 13:02, Naoya Horiguchi wrote:

> Currently soft_offline_page() is hard to maintain because it has many
> return points and goto statements. All of this mess come from get_any_page().
> This function should only get page refcount as the name implies, but it does
> some page isolating actions like SetPageHWPoison() and dequeuing hugepage.
> This patch corrects it and introduces some internal subroutines to make
> soft offlining code more readable and maintainable.
> 
> ChangeLog v2:
>   - receive returned value from __soft_offline_page and soft_offline_huge_page
>   - place __soft_offline_page after soft_offline_page to reduce the diff
>   - rebased onto mmotm-2013-01-23-17-04
>   - add comment on double checks of PageHWpoison
> 
> Signed-off-by: Naoya Horiguchi 
> ---
>  mm/memory-failure.c | 154 
> 
>  1 file changed, 83 insertions(+), 71 deletions(-)
> 
> diff --git mmotm-2013-01-23-17-04.orig/mm/memory-failure.c 
> mmotm-2013-01-23-17-04/mm/memory-failure.c
> index c95e19a..302625b 100644
> --- mmotm-2013-01-23-17-04.orig/mm/memory-failure.c
> +++ mmotm-2013-01-23-17-04/mm/memory-failure.c
> @@ -1368,7 +1368,7 @@ static struct page *new_page(struct page *p, unsigned 
> long private, int **x)
>   * that is not free, and 1 for any other page type.
>   * For 1 the page is returned with increased page count, otherwise not.
>   */
> -static int get_any_page(struct page *p, unsigned long pfn, int flags)
> +static int __get_any_page(struct page *p, unsigned long pfn, int flags)
>  {
>   int ret;
>  
> @@ -1393,11 +1393,9 @@ static int get_any_page(struct page *p, unsigned long 
> pfn, int flags)
>   if (!get_page_unless_zero(compound_head(p))) {
>   if (PageHuge(p)) {
>   pr_info("%s: %#lx free huge page\n", __func__, pfn);
> - ret = dequeue_hwpoisoned_huge_page(compound_head(p));
> + ret = 0;
>   } else if (is_free_buddy_page(p)) {
>   pr_info("%s: %#lx free buddy page\n", __func__, pfn);
> - /* Set hwpoison bit while page is still isolated */
> - SetPageHWPoison(p);
>   ret = 0;
>   } else {
>   pr_info("%s: %#lx: unknown zero refcount page type 
> %lx\n",
> @@ -1413,42 +1411,62 @@ static int get_any_page(struct page *p, unsigned long 
> pfn, int flags)
>   return ret;
>  }
>  
> +static int get_any_page(struct page *page, unsigned long pfn, int flags)
> +{
> + int ret = __get_any_page(page, pfn, flags);
> +
> + if (ret == 1 && !PageHuge(page) && !PageLRU(page)) {
> + /*
> +  * Try to free it.
> +  */
> + put_page(page);
> + shake_page(page, 1);
> +
> + /*
> +  * Did it turn free?
> +  */
> + ret = __get_any_page(page, pfn, 0);
> + if (!PageLRU(page)) {
> + pr_info("soft_offline: %#lx: unknown non LRU page type 
> %lx\n",
> + pfn, page->flags);
> + return -EIO;
> + }
> + }
> + return ret;
> +}
> +
>  static int soft_offline_huge_page(struct page *page, int flags)
>  {
>   int ret;
>   unsigned long pfn = page_to_pfn(page);
>   struct page *hpage = compound_head(page);
>  
> + /*
> +  * This double-check of PageHWPoison is to avoid the race with
> +  * memory_failure(). See also comment in __soft_offline_page().
> +  */
> + lock_page(hpage);
>   if (PageHWPoison(hpage)) {
> + unlock_page(hpage);
> + put_page(hpage);
>   pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
> - ret = -EBUSY;
> - goto out;
> + return -EBUSY;
>   }
> -
> - ret = get_any_page(page, pfn, flags);
> - if (ret < 0)
> - goto out;
> - if (ret == 0)
> - goto done;
> + unlock_page(hpage);
>  
>   /* Keep page count to indicate a given hugepage is isolated. */
>   ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false,
>   MIGRATE_SYNC);
>   put_page(hpage);
> - if (ret) {
> + if (ret)
>   pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>   pfn, ret, page->flags);
> - goto out;
> - }
> -done:
>   /* keep elevated page count for bad page */
> - atomic_long_add(1 << compound_trans_order(hpage), &num_poisoned_pages);
> - set_page_hwpoison_huge_page(hpage);
> - dequeue_hwpoisoned_huge_page(hpage);

Hi Naoya,

Is num_poisoned_pages still incremented in soft_offline_huge_page()? I mean
for the in-use huge pages.

Thanks,
Xishi Qiu

> -out:
>   return ret;
>  }
>  
> +static int __soft_offline_page(struct page *page, int flags);
> +
>  /**
>   * soft_offline_page - Soft offline a 

RE: [PATCH v2 1/4] ARM: OMAP2+: dpll: round rate to closest value

2013-01-27 Thread Paul Walmsley
Hi

On Fri, 25 Jan 2013, Mohammed, Afzal wrote:

> On Fri, Jan 25, 2013 at 13:48:11, Paul Walmsley wrote:
> > On Wed, 23 Jan 2013, Afzal Mohammed wrote:
> 
> > > Currently round rate function would return proper rate iff requested
> > > rate exactly matches the PLL lockable rate. This causes set_rate to
> > > fail if exact rate could not be set. Instead round rate may return
> > > closest rate possible (less than the requested). And if any user is
> > > badly in need of exact rate, then return value of round rate could
> > > be used to decide whether to invoke set rate or not.
> > > 
> > > Modify round rate so that it return closest possible rate.
> > 
> > This doesn't look like the right approach to me.  For some PLLs, an exact 
> > rate is desired.
> 
> If exact rate is required, there is a way to achieve it as mentioned
> in the commit message, i.e. by first invoking round rate over reqd. rate
> and if it doesn't match, bail out w/o invoking set_rate.
> 
> And it seems requirement of CCF w.r.t to round rate is to return closest
> possible rate.

Hmm.  Maybe I need to take a closer look.  I'm a little worried that, 
since __clk_round_rate() can be called from omap3_noncore_dpll_set_rate(), 
we might wind up with inconsistent behavior.  Effectively we'd need to 
mandate that clk_round_rate() would have to be called first for any DPLL 
where we'd expect to set an exact rate.
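
The "round first, then set only if exact" pattern discussed above can be
sketched roughly as follows. This is a userspace illustration only, not the
OMAP DPLL code: demo_round_rate(), demo_set_rate_exact() and the table of
lockable rates are hypothetical stand-ins for clk_round_rate(), clk_set_rate()
and the real hardware limits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of rates the PLL can lock to, in Hz, ascending. */
static const unsigned long demo_lockable[] = {
	96000000UL, 100000000UL, 192000000UL
};
#define DEMO_NRATES (sizeof(demo_lockable) / sizeof(demo_lockable[0]))

/* Last rate accepted by demo_set_rate_exact(); 0 means "never set". */
static unsigned long demo_current_rate;

/*
 * Return the closest lockable rate that does not exceed the request,
 * or 0 if every lockable rate is above it.
 */
static unsigned long demo_round_rate(unsigned long req)
{
	unsigned long best = 0;
	size_t i;

	assert(req > 0);	/* a zero request is meaningless in this sketch */
	for (i = 0; i < DEMO_NRATES; i++)
		if (demo_lockable[i] <= req)
			best = demo_lockable[i];
	return best;
}

/*
 * Caller that needs an exact rate: round first and bail out without
 * touching the hardware when the rounded rate differs from the request.
 */
static int demo_set_rate_exact(unsigned long req)
{
	if (demo_round_rate(req) != req)
		return -1;
	demo_current_rate = req;
	return 0;
}
```

The concern raised above still applies: nothing forces a caller to go through
the exact-rate wrapper, so a direct set-rate call would silently accept the
rounded value.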


- Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 07:42 +0100, Mike Galbraith wrote:

> Back to original 1ms sleep, 8ms work, turning NUMA box into a single
> node 10 core box with numactl.

(aim7 in one 10 core node.. so spread, no delta.)

Benchmark                            Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII  "1.1"    powersaving  Jan 28 08:04:14 2013

Tasks  Jobs/Min  JTI  Real  CPU    Jobs/sec/task
1      441.0     100  13.7  3.7    7.3508
5      2516.6    98   12.0  8.1    8.3887
10     5215.1    98   11.6  11.9   8.6919
20     10475.4   99   11.6  21.7   8.7295
40     20216.8   99   12.0  38.2   8.4237
80     35568.6   99   13.6  71.4   7.4101
160    57102.5   98   17.0  138.2  5.9482
320    82099.9   97   23.6  271.1  4.2760
Benchmark                            Version  Machine  Run Date
AIM Multiuser Benchmark - Suite VII  "1.1"    balance  Jan 28 08:06:49 2013

Tasks  Jobs/Min  JTI  Real  CPU    Jobs/sec/task
1      439.4     100  13.8  3.8    7.3241
5      2583.1    98   11.7  7.2    8.6104
10     5325.1    99   11.4  11.0   8.8752
20     10687.8   99   11.3  23.6   8.9065
40     20200.0   99   12.0  38.7   8.4167
80     35464.5   98   13.7  71.4   7.3884
160    57203.5   98   16.9  137.9  5.9587
320    82065.2   98   23.6  271.1  4.2742
Benchmark                            Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII  "1.1"    performance  Jan 28 08:09:20 2013

Tasks  Jobs/Min  JTI  Real  CPU    Jobs/sec/task
1      438.8     100  13.8  3.8    7.3135
5      2634.8    99   11.5  7.2    8.7826
10     5396.3    99   11.2  11.4   8.9938
20     10725.7   99   11.3  24.0   8.9381
40     20183.2   99   12.0  38.5   8.4097
80     35620.9   99   13.6  71.4   7.4210
160    57203.5   98   16.9  137.8  5.9587
320    81995.8   98   23.7  271.3  4.2706



Re: [v3 2/2] ARM: tegra: Skip scu_enable(scu_base) if not Cortex A9

2013-01-27 Thread Hiroshi Doyu
Hi Russell,

On Tue, 22 Jan 2013 18:04:46 +0100
Olof Johansson  wrote:

> Since Russell had comments on it earlier, I'd like him to give a nod
> that he's happy with it too.

Is this ok for you?

The original patch is:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-January/143552.html


Re: [RESEND PATCH v5 1/4] zram: Fix deadlock bug in partial write

2013-01-27 Thread Pekka Enberg
On Mon, Jan 28, 2013 at 2:38 AM, Minchan Kim  wrote:
> Now zram allocates new page with GFP_KERNEL in zram I/O path
> if IO is partial. Unfortunately, It may cuase deadlock with

s/cuase/cause/g

> reclaim path so this patch solves the problem.

It'd be nice to know about the problem in more detail. I'm also
curious on why you decided on GFP_ATOMIC for the read path and
GFP_NOIO in the write path.

>
> Cc: sta...@vger.kernel.org
> Cc: Jerome Marchand 
> Acked-by: Nitin Gupta 
> Signed-off-by: Minchan Kim 
> ---
>  drivers/staging/zram/zram_drv.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 61fb8f1..b285b3a 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -220,7 +220,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
> user_mem = kmap_atomic(page);
> if (is_partial_io(bvec))
> /* Use  a temporary buffer to decompress the page */
> -   uncmem = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +   uncmem = kmalloc(PAGE_SIZE, GFP_ATOMIC);
> else
> uncmem = user_mem;
>
> @@ -268,7 +268,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>  * This is a partial IO. We need to read the full page
>  * before to write the changes.
>  */
> -   uncmem = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +   uncmem = kmalloc(PAGE_SIZE, GFP_NOIO);
> if (!uncmem) {
> pr_info("Error allocating temp memory!\n");
> ret = -ENOMEM;
> --
> 1.7.9.5
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/28/2013 02:49 PM, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
>> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>>>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>>>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>>>> performance change found.
>>>
>>> Ok, good, You could put that in one of the commit messages so that it is
>>> there and people know that this patchset doesn't cause perf regressions
>>> with the bunch of benchmarks.
>>>
>>>> I also tested balance policy/powersaving policy with above benchmark,
>>>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>>>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>>>> policy on snb 4 sockets platforms. others has no clear change.
>>>
>>> I guess this is expected because there has to be some performance hit
>>> when saving power...
>>>
>>
>> BTW, I had tested the v3 version based on sched numa -- on tip/master.
>> The specjbb just has about 5~7% dropping on balance/powersaving policy.
>> The power scheduling done after the numa scheduling logical.
> 
> That makes sense.  How the numa scheduling numbers compare to mainline?
> Do you have all three available, mainline, and tip w. w/o powersaving
> policy?
> 

I once measured a 20~40% performance increase with sched numa vs. mainline
3.7-rc5, but I have no baseline to compare balance/powersaving performance
against, since lower numbers are acceptable for balance/powersaving and
tip/master was changing too quickly to follow at that time.
:)

> -Mike
> 
> 


-- 
Thanks Alex


Re: [PATCH] userns: Allow the unprivileged users to mount mqueue fs

2013-01-27 Thread Eric W. Biederman
Gao feng  writes:

> On 2013/01/28 11:43, Eric W. Biederman wrote:
>> Gao feng  writes:
>> 
>>> This patch allow the unprivileged user to mount mqueuefs in
>>> user ns.
>>>
>>> If two userns share the same ipcns,the files in mqueue fs
>>> should be seen in both these two userns.
>>>
>>> If the userns has its own ipcns,it has its own mqueue fs too.
>>> ipcns has already done this job well.
>> 
>> I am a little dense.  When does userspace actually mount a mqueuefs?
>> My impression was that user space never needed to mount and actually
>> never could mount a mqueuefs.  MS_NO_USER isn't set so mounting a
>> mqueuefs is possible but when does it happen and why?
>> 
>
> Actually the files which representative messgae queue in mqueuefs contains
> some informations,such as QSIZE,NOTIFY,SIGNO,NOTIFY_PID.
>
> My workstation is Fedora 17,mqueuefs is mounted on /dev/mqueue by default.
> So I think at lest this patch is needed by some people.

I have just confirmed that the most you can do in a mounted mqueue fs is
to create files message queues.

Given that these filesystems exist anyway I don't see a problem.

Applied thanks.

Eric


Re: [PATCH v3 00/44] Meta Linux Kernel Port

2013-01-27 Thread Vineet Gupta
Hi Arnd / Stephen,

On Saturday 26 January 2013 05:55 AM, Arnd Bergmann wrote:
> On Friday 25 January 2013, James Hogan wrote:
>> Hi Arnd,
>>
>> On 10/01/13 15:30, James Hogan wrote:
>>> This patchset adds core architecture support to Linux for Imagination's
>>> Meta ATP (Meta 1) and HTP (Meta 2) processor cores. Most of the feedback
>>> from the RFC and v2 patchsets has now been addressed. All further
>>> feedback is most welcome.
>>>
>>> The patches are based on next-20130110, and can also be found in the
>>> following git tree:
>>>   git://github.com/jahogan/metag-linux.git metag-core
>>
>> Review seems to have gone quiet. I'm fairly happy with this core
>> patchset in it's currently form (only trivial alterations required since
>> the v3 patches, e.g. some review comments and rebasing on linus/master),
>> and would like to get it into the v3.9 merge window. What's the best way
>> forward? I presume I need to get acks on each individual patch?
> 
> 
> I've just looked through the entire series once more and could not find
> any show-stoppers. I consider this ready for 3.9, and I'm also quite happy
> with Vineet's ARC port, although I think he is still integrating some
> feedback comments.

AFAIKS, I've already addressed all the comments in v1 and v2 except moving the
clocksources/clockevent code to drivers. If that is a must, I can do that,
although personally I think it is too arch specific (tied to ARC specific RTSC
insn and such) to be moved out of arch code. Can you please skim thru the latest
v3 series, or just part #1 (changes since v2) if that's too big to reconfirm.

> I'd suggest that you both ask Stephen to add the trees to linux-next
> now (I thought you had done that already, but I don't see them there
> at the moment).

Stephen, can you please add the following branch (rebased off 3.8-rc5) to
linux-next

git://github.com/foss-for-synopsys-dwc-arc-processors/linux.git  arc-next

With next-20130125, there's a trivial conflict in init/Kconfig - fixable by
accepting both the hunks.

> You don't need Acked-by statements on every single patch, but having
> more of those is certainly benefitial. When it comes to the merge
> window, please send a pull request to Linus, and keep me on Cc,
> so I can weigh in with an additional Ack to the series.

While the first pull request can go directly from github, I presume the
logistics for setting up accounts on kernel.org will only kick start after the
first batch of code has been accepted. I can't seem to find any discussions on
lists to that effect.

> Until then, maybe you can have another look at each other's architecture
> trees (ARC and Meta). Since you are in exactly the same situation with
> upstream integration now, you are probably the best people to review
> the code, and you providing ACKs and constructive feedback to the other
> tree will helps others see that you are up to the job as an arch
> maintainer.

I'm certain we've both been looking at each other's patches in the last few
months - I certainly have for say DeviceTree support so it makes sense to
formalize it. Although "Reviewed-by" will probably be better than "Ack" in
this case.

> I have also given a few comments to one of you that
> may actually apply to the other one just as well, but I can't remember
> now what I discussed with whom ;-)

OK, will re-skim thru all such discussions, although ARC comments are likely
to be superset of metag :-)

BTW I went back to hexagon submission patches in 2011 and it seems every new
arch makes the exact same mistakes:
-idle sleep race
-faltering in TIF_WORK_RESUME and in fixing that breaking the syscall restarting
-redundant symbols in module.c
...

Would it make sense to have a checklist-for-new-arches with concrete examples
of broken and fixed code so that you, Al and other reviewers don't have to
repeat the same thing over and over again?

Thx,
-Vineet



Re: [net-next RFC] pktgen: don't wait for the device who doesn't free skb immediately after sent

2013-01-27 Thread Jason Wang
On Monday, December 03, 2012 08:01:11 AM Stephen Hemminger wrote:
> On Mon, 03 Dec 2012 14:45:46 +0800
> 
> Jason Wang  wrote:
> > On Tuesday, November 27, 2012 08:49:19 AM Stephen Hemminger wrote:
> > > On Tue, 27 Nov 2012 14:45:13 +0800
> > > 
> > > Jason Wang  wrote:
> > > > On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > > > On Mon, 26 Nov 2012 15:56:52 +0800
> > > > > 
> > > > > Jason Wang  wrote:
> > > > >> Some deivces do not free the old tx skbs immediately after it has
> > > > >> been
> > > > >> sent
> > > > >> (usually in tx interrupt). One such example is virtio-net which
> > > > >> optimizes for virt and only free the possible old tx skbs during
> > > > >> the
> > > > >> next packet sending. This would lead the pktgen to wait forever in
> > > > >> the
> > > > >> refcount of the skb if no other pakcet will be sent afterwards.
> > > > >> 
> > > > >> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > > > >> which could notify the pktgen that the device does not free skb
> > > > >> immediately after it has been sent and let it not to wait for the
> > > > >> refcount to be one.
> > > > >> 
> > > > >> Signed-off-by: Jason Wang 
> > > > > 
> > > > > Another alternative would be using skb_orphan() and skb->destructor.
> > > > > There are other cases where skb's are not freed right away.
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > > > the body of a message to majord...@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > > Hi Stephen:
> > > > 
> > > > Do you mean registering a skb->destructor for pktgen then set and
> > > > check
> > > > bits in skb->tx_flag?
> > > 
> > > Yes. Register a destructor that does something like update a counter
> > > (number of packets pending), then just spin while number of packets
> > > pending is over threshold.
> > 
> > Have some experiments on this, looks like it does not work weel when
> > clone_skb is used. For driver that call skb_orphan() in ndo_start_xmit,
> > the destructor is only called when the first packet were sent, but what
> > we need to know is when the last were sent. Any thoughts on this or we
> > can just introduce another flag (anyway we have something like
> > IFF_TX_SKB_SHARING) ?
> 
> The SKB_SHARING flag looks like the best solution then.
> Surprisingly, transmit buffer completion is a major bottleneck for 10G
> devices, and I suspect more changes will come.

It works, but we may lose some chances to use clone_skb and stress the device
and driver more. I'm thinking maybe we can go back to my original RFC and
introduce another flag. This flag may also be useful for BQL and zerocopy in
the future since both of them are sensitive to transmit buffer completion.
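
For reference, the destructor-based accounting suggested earlier in this
thread can be sketched in userspace roughly as below. Everything here is a
hypothetical stand-in (struct demo_skb, demo_xmit(), demo_free_skb(),
demo_may_send()); it is not pktgen code and it does not model clone_skb,
which is exactly where the approach fell short in my experiments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the destructor hook is modelled. */
struct demo_skb {
	void (*destructor)(struct demo_skb *skb);
};

/* Number of packets handed to the driver and not yet freed by it. */
static unsigned int demo_pending;

/* Called when the driver finally frees the packet. */
static void demo_pktgen_destructor(struct demo_skb *skb)
{
	(void)skb;
	assert(demo_pending > 0);
	demo_pending--;
}

/* Hand a packet to the (imaginary) driver and account for it. */
static void demo_xmit(struct demo_skb *skb)
{
	skb->destructor = demo_pktgen_destructor;
	demo_pending++;
}

/* What the driver does whenever it gets around to freeing old tx packets. */
static void demo_free_skb(struct demo_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;	/* run the hook at most once */
	}
}

/* The generator keeps sending only while pending packets are under the limit. */
static int demo_may_send(unsigned int threshold)
{
	return demo_pending < threshold;
}
```

With this scheme the generator spins on demo_may_send() instead of waiting
for a single skb's refcount to drop back to one.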


Re: [PATCH net-next 2/3] virtio_net: multiqueue support

2013-01-27 Thread Jason Wang
On Tuesday, December 04, 2012 03:24:22 PM Michael S. Tsirkin wrote:
> I found some bugs, see below.
> Also some style nitpicking, this is not mandatory to address.

Thanks for the review.
> 
> On Tue, Dec 04, 2012 at 07:07:57PM +0800, Jason Wang wrote:
> > This addes multiqueue support to virtio_net driver. In multiple queue
> > modes, the driver expects the number of queue paris is equal to the
> > number of vcpus. To eliminate the contention bettwen vcpus and
> > virtqueues, per-cpu virtqueue pairs
> > were implemented through:
> Lots of typos above - try running ispell on it :)
> 

Sure.
> > - select the txq based on the smp processor id.
> > - smp affinity hint were set to the vcpu that owns the queue pairs.
> > 
> > Signed-off-by: Krishna Kumar 
> > Signed-off-by: Jason Wang 
> > ---
> > 
> >  drivers/net/virtio_net.c        |  472 ++-
> >  include/uapi/linux/virtio_net.h |   16 ++
> >  2 files changed, 385 insertions(+), 103 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 266f712..912f5b2 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -81,16 +81,25 @@ struct virtnet_info {
> > 
> > struct virtio_device *vdev;
> > struct virtqueue *cvq;
> > struct net_device *dev;
> > 
> > -   struct send_queue sq;
> > -   struct receive_queue rq;
> > +   struct send_queue *sq;
> > +   struct receive_queue *rq;
> > 
> > unsigned int status;
> > 
> > +   /* Max # of queue pairs supported by the device */
> > +   u16 max_queue_pairs;
> > +
> > +   /* # of queue pairs currently used by the driver */
> > +   u16 curr_queue_pairs;
> > +
> > 
> > /* I like... big packets and I cannot lie! */
> > bool big_packets;
> > 
> > /* Host will merge rx buffers for big packets (shake it! shake it!) */
> > bool mergeable_rx_bufs;
> > 
> > +   /* Has control virtqueue */
> > +   bool has_cvq;
> > +
> > 
> > /* enable config space updates */
> > bool config_enable;
> > 
> > @@ -125,6 +134,32 @@ struct padded_vnet_hdr {
> > 
> > char padding[6];
> >  
> >  };
> > 
> > +static const struct ethtool_ops virtnet_ethtool_ops;
> > +
> > +
> > +/* Converting between virtqueue no. and kernel tx/rx queue no.
> > + * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
> > + */
> > +static int vq2txq(struct virtqueue *vq)
> > +{
> > +   return (virtqueue_get_queue_index(vq) - 1) / 2;
> > +}
> > +
> > +static int txq2vq(int txq)
> > +{
> > +   return txq * 2 + 1;
> > +}
> > +
> > +static int vq2rxq(struct virtqueue *vq)
> > +{
> > +   return virtqueue_get_queue_index(vq) / 2;
> > +}
> > +
> > +static int rxq2vq(int rxq)
> > +{
> > +   return rxq * 2;
> > +}
> > +
> > 
> >  static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb)
> >  {
> >  
> > return (struct skb_vnet_hdr *)skb->cb;
> > 
> > @@ -165,7 +200,7 @@ static void skb_xmit_done(struct virtqueue *vq)
> > 
> > virtqueue_disable_cb(vq);
> > 
> > /* We were probably waiting for more output buffers. */
> > 
> > -   netif_wake_queue(vi->dev);
> > +   netif_wake_subqueue(vi->dev, vq2txq(vq));
> > 
> >  }
> >  
> >  static void set_skb_frag(struct sk_buff *skb, struct page *page,
> > 
> > @@ -502,7 +537,7 @@ static bool try_fill_recv(struct receive_queue *rq, gfp_t gfp)
> >  static void skb_recv_done(struct virtqueue *rvq)
> >  {
> >  
> > struct virtnet_info *vi = rvq->vdev->priv;
> > 
> > -   struct receive_queue *rq = &vi->rq;
> > +   struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
> > 
> > /* Schedule NAPI, Suppress further interrupts if successful. */
> > if (napi_schedule_prep(&rq->napi)) {
> > 
> > @@ -532,15 +567,21 @@ static void refill_work(struct work_struct *work)
> > 
> > struct virtnet_info *vi =
> > 
> > container_of(work, struct virtnet_info, refill.work);
> > 
> > bool still_empty;
> > 
> > +   int i;
> > +
> > +   for (i = 0; i < vi->max_queue_pairs; i++) {
> > +   struct receive_queue *rq = &vi->rq[i];
> > 
> > -   napi_disable(&vi->rq.napi);
> > -   still_empty = !try_fill_recv(&vi->rq, GFP_KERNEL);
> > -   virtnet_napi_enable(&vi->rq);
> > +   napi_disable(&rq->napi);
> > +   still_empty = !try_fill_recv(rq, GFP_KERNEL);
> > +   virtnet_napi_enable(rq);
> > 
> > -   /* In theory, this can happen: if we don't get any buffers in
> > -* we will *never* try to fill again. */
> > -   if (still_empty)
> > -   schedule_delayed_work(&vi->refill, HZ/2);
> > +   /* In theory, this can happen: if we don't get any buffers in
> > +* we will *never* try to fill again.
> > +*/
> > +   if (still_empty)
> > +   schedule_delayed_work(&vi->refill, HZ/2);
> > +   }
> > 
> >  }
> >  
> >  static int virtnet_poll(struct napi_struct *napi, int budget)
> > 
> > @@ -650,7 +691,8 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >  static netdev_tx_t start_xmit(struct 
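
For readers following the index arithmetic in the quoted patch, the virtqueue
layout from its comment (0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq)
can be sketched with plain integers. The demo_* helpers below are illustrative
only: they take the virtqueue index directly instead of a struct virtqueue,
and demo_cvq_index() is an assumption derived from that comment, not a
function in the patch.

```c
#include <assert.h>

/* tx queues sit at the odd virtqueue indices: 1, 3, 5, ... */
static int demo_vq2txq(int vq_index)
{
	assert(vq_index >= 1 && vq_index % 2 == 1);
	return (vq_index - 1) / 2;
}

static int demo_txq2vq(int txq)
{
	return txq * 2 + 1;
}

/* rx queues sit at the even virtqueue indices: 0, 2, 4, ... */
static int demo_vq2rxq(int vq_index)
{
	assert(vq_index >= 0 && vq_index % 2 == 0);
	return vq_index / 2;
}

static int demo_rxq2vq(int rxq)
{
	return rxq * 2;
}

/* With pairs numbered 0..N the control vq follows them at 2N+2 = 2 * pairs. */
static int demo_cvq_index(int max_queue_pairs)
{
	return 2 * max_queue_pairs;
}
```

The two directions are inverses of each other, so a round trip such as
demo_vq2txq(demo_txq2vq(q)) gives back q for any queue number q.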

Re: [PATCH net-next 3/3] virtio-net: change the number of queues through ethtool

2013-01-27 Thread Jason Wang
On Tuesday, December 04, 2012 03:49:59 PM Michael S. Tsirkin wrote:
> On Tue, Dec 04, 2012 at 07:07:58PM +0800, Jason Wang wrote:
> > This patch implement the ethtool_{set|get}_channels method of ethool to
> > allow user to change the number of queues dymaically when the device is
> > running. This would let the user to configure it on demand.
> > 
> > Signed-off-by: Jason Wang 
> > ---
> > 
> >  drivers/net/virtio_net.c |   44
> >  1 files changed, 44 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 912f5b2..b9f9887 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1589,10 +1589,54 @@ static struct virtio_driver virtio_net_driver = {
> > 
> >  #endif
> >  };
> > 
> > +/* TODO: Eliminate OOO packets during switching */
> > +static int virtnet_set_channels(struct net_device *dev,
> > +   struct ethtool_channels *channels)
> > +{
> > +   struct virtnet_info *vi = netdev_priv(dev);
> > +   u16 queue_pairs = channels->combined_count;
> > +   u16 old_queue_pairs = vi->curr_queue_pairs;
> > +
> > +   /* We don't support separate rx/tx channels.
> > +* We don't allow setting 'other' channels.
> > +*/
> > +   if (channels->rx_count || channels->tx_count || channels->other_count)
> > +   return -EINVAL;
> > +
> > +   if (queue_pairs > vi->max_queue_pairs)
> > +   return -EINVAL;
> > +
> > +   vi->curr_queue_pairs = queue_pairs;
> > +   if (virtnet_set_queues(vi) == 0) {
> > +   netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> > +   netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> 
> Just use queue_pairs - it's shorter.

Ok.
> 
> > +
> > +   virtnet_set_affinity(vi, true);
> > +   } else
> > +   vi->curr_queue_pairs = old_queue_pairs;
> 
> Should be
>   ret = virtnet_set_queues(vi);
>   if (ret) {
>   vi->curr_queue_pairs = old_queue_pairs;
>   return ret;
>   }
> otherwise we loose error reporting.
> 

Right.
> Also it's better if virtnet_set_queues
> gets queue_pairs as parameter and set curr_queue_pairs
> on success.

True, looks simpler than current method, will do it in next version.
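
Something along these lines is what I have in mind for the next version.
This is a userspace sketch only: struct demo_info, demo_set_queues() and
demo_set_channels() are hypothetical stand-ins for virtnet_info,
virtnet_set_queues() and virtnet_set_channels(), and the fail_next field
exists purely to simulate the device rejecting the command.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_info {
	unsigned short max_queue_pairs;
	unsigned short curr_queue_pairs;
	int fail_next;		/* simulation knob: make the next command fail */
};

/*
 * Takes the requested number of pairs as a parameter and updates
 * curr_queue_pairs only on success, so callers need no rollback code.
 */
static int demo_set_queues(struct demo_info *vi, unsigned short queue_pairs)
{
	assert(vi != NULL);
	if (vi->fail_next)
		return -EIO;	/* pretend the control command was rejected */
	vi->curr_queue_pairs = queue_pairs;
	return 0;
}

static int demo_set_channels(struct demo_info *vi, unsigned short queue_pairs)
{
	int ret;

	if (queue_pairs > vi->max_queue_pairs)
		return -EINVAL;

	ret = demo_set_queues(vi, queue_pairs);
	if (ret)
		return ret;	/* propagate the error instead of returning 0 */

	/* The real driver would update the tx/rx queue counts and affinity here. */
	return 0;
}
```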
> 
> > +
> > +   return 0;
> > +}
> > +
> > +static void virtnet_get_channels(struct net_device *dev,
> > +struct ethtool_channels *channels)
> > +{
> > +   struct virtnet_info *vi = netdev_priv(dev);
> > +
> > +   channels->combined_count = vi->curr_queue_pairs;
> > +   channels->max_combined = vi->max_queue_pairs;
> > +   channels->max_other = 0;
> > +   channels->rx_count = 0;
> > +   channels->tx_count = 0;
> > +   channels->other_count = 0;
> > +}
> > +
> > 
> >  static const struct ethtool_ops virtnet_ethtool_ops = {
> >  
> > .get_drvinfo = virtnet_get_drvinfo,
> > .get_link = ethtool_op_get_link,
> > .get_ringparam = virtnet_get_ringparam,
> > 
> > +   .set_channels = virtnet_set_channels,
> > +   .get_channels = virtnet_get_channels,
> > 
> >  };
> >  
> >  static int __init init(void)
> 


RE: [PATCH] ARM: davinci: da850: add RTC driver DT entries

2013-01-27 Thread Katepallewar, Mrugesh
On Fri, Jan 25, 2013 at 16:52:18, Nori, Sekhar wrote:
> On 1/25/2013 4:34 PM, Katepallewar, Mrugesh wrote:
> > On Fri, Jan 25, 2013 at 16:04:22, Nori, Sekhar wrote:
> >> On 1/25/2013 11:43 AM, Mrugesh Katepallewar wrote:
> >>> Add RTC DT entries in da850 dts file.
> >>>
> >>> Signed-off-by: Mrugesh Katepallewar 
> >>> ---
> >>> Applies on top of v3.8-rc4 of linus tree.
> >>>
> >>> Tested on da850-evm device.
> >>>
> >>> Test Procedure:
> >>> date [.]MM.DD-hh:mm[:ss]
> >>> hwclock -w
> >>> reset board and check system time.
> >>>
> >>> :100644 100644 37dc5a3... b16efd4... M arch/arm/boot/dts/da850-evm.dts
> >>> :100644 100644 640ab75... a8eb1b1... M arch/arm/boot/dts/da850.dtsi
> >>>  arch/arm/boot/dts/da850-evm.dts |3 +++
> >>>  arch/arm/boot/dts/da850.dtsi|8 
> >>>  2 files changed, 11 insertions(+)
> >>>
> >>> diff --git a/arch/arm/boot/dts/da850-evm.dts 
> >>> b/arch/arm/boot/dts/da850-evm.dts index 37dc5a3..b16efd4 100644
> >>> --- a/arch/arm/boot/dts/da850-evm.dts
> >>> +++ b/arch/arm/boot/dts/da850-evm.dts
> >>> @@ -24,5 +24,8 @@
> >>>   serial2: serial@1d0d000 {
> >>>   status = "okay";
> >>>   };
> >>> + rtc@1c23000 {
> 
> I did not mention this last time, but this should be:
> 
>   rtc0: rtc@1c23000 {
> 
> to be consistent with rest of the file.

Okay. I will send this in next version.
> 
> >>> + status = "okay";
> >>> + };
> >>>   };
> >>>  };
> >>> diff --git a/arch/arm/boot/dts/da850.dtsi 
> >>> b/arch/arm/boot/dts/da850.dtsi index 640ab75..a8eb1b1 100644
> >>> --- a/arch/arm/boot/dts/da850.dtsi
> >>> +++ b/arch/arm/boot/dts/da850.dtsi
> >>> @@ -56,5 +56,13 @@
> >>>   interrupt-parent = <&intc>;
> >>>   status = "disabled";
> >>>   };
> >>> + rtc@1c23000 {
> 
> Here too.
> 
Okay. I will send this in next version.

> >>> + compatible = "ti,da830-rtc";
> >>> + reg = <0x23000 0x1000>;
> >>> + interrupts = <19
> >>> +   19>;
> >>
> >> Why two interrupts of the same number? If there is only one interrupt line 
> >> then only one should be specified, no?
> > We are using common OMAP RTC driver for da850 and other ti SoC's 
> > (e.g.am33xx). I have seen in am33xx.dtsi rtc timer and alarm interrupts are 
> > different. So, two interrupt numbers are expected from RTC DT node.
> 
> Okay. Looking at the OMAP-L138 TRM, I see that the IP on that SoC supports 
> both alarm and timer events and both arrive on the same interrupt number (as 
> opposed to am335x where they supposedly arrive on different interrupt lines). 
> What you have looks fine to me considering this.
> 
> Also, the interrupt-parent setting needs to be removed. See the patch 
> Prabhakar just submitted.

Okay. I will send this in next version.

> 
> Thanks,
> Sekhar
> 

Regards, 
Mrugesh

Re: [PATCH] x86/apic: check FADT settings after enable x2apic

2013-01-27 Thread Yinghai Lu
On Sun, Jan 27, 2013 at 9:05 PM, Wang, Song-Bo (Stoney)
 wrote:
> Hi Yinghai, hpa and others,
>
> Would you please review the patch on detecting x2apic FADT settings?
>
> We meet a BIOS system which works on x2apic physical mode by setting the bit 
> ACPI_FADT_APIC_PHYSICAL in FADT table.
> And for those systems with all cpuid < 255, the spec requires BIOS's default 
> mode in xapic.
> The kernel detects the default mode and do some initializations and will call 
> enable_IR_x2apic and change the mode to x2apic successfully.

Hi, Peter and Ingo,

I checked the patch, and looks right.

I updated the changelog and simplify the code a little bit.

Please check if you can put it into tip:x86/apic

Thanks

Yinghai


wang_hp_x2apic.patch
Description: Binary data


Re: [PATCH -v4 0/5] x86,smp: make ticket spinlock proportional backoff w/ auto tuning

2013-01-27 Thread Raghavendra K T

On 01/26/2013 12:35 AM, Rik van Riel wrote:

Many spinlocks are embedded in data structures; having many CPUs
pounce on the cache line the lock is in will slow down the lock
holder, and can cause system performance to fall off a cliff.

The paper "Non-scalable locks are dangerous" is a good reference:

http://pdos.csail.mit.edu/papers/linux:lock.pdf

In the Linux kernel, spinlocks are optimized for the case of
there not being contention. After all, if there is contention,
the data structure can be improved to reduce or eliminate
lock contention.

Likewise, the spinlock API should remain simple, and the
common case of the lock not being contended should remain
as fast as ever.

However, since spinlock contention should be fairly uncommon,
we can add functionality into the spinlock slow path that keeps
system performance from falling off a cliff when there is lock
contention.

Proportional delay in ticket locks is delaying the time between
checking the ticket based on a delay factor, and the number of
CPUs ahead of us in the queue for this lock. Checking the lock
less often allows the lock holder to continue running, resulting
in better throughput and preventing performance from dropping
off a cliff.
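
The proportional delay described above can be sketched roughly as follows. This is a user-space illustration only, not the code from the patch series: the function name, the `max_loops` cap and the plain integer tickets (no wraparound handling) are assumptions made for the example.

```python
def backoff_loops(my_ticket, head, delay_factor, max_loops=None):
    """Return how many delay-loop iterations to wait before re-reading the lock.

    my_ticket:    ticket number this CPU drew when it queued (hypothetical name)
    head:         ticket number currently being served
    delay_factor: tunable multiplier, e.g. 50 for a "50x" delay
    max_loops:    optional upper bound on the delay (an assumption for this sketch)
    """
    waiters_ahead = my_ticket - head  # CPUs queued in front of us
    if waiters_ahead <= 0:
        return 0  # it is our turn, so do not delay at all
    loops = delay_factor * waiters_ahead
    if max_loops is not None:
        loops = min(loops, max_loops)
    return loops


# Three CPUs ahead of us with a 50x delay factor.
print(backoff_loops(my_ticket=8, head=5, delay_factor=50))
```

A waiter far back in the queue polls the lock cache line less often than the waiter that is next in line, which is the property the cover letter relies on.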

The test case has a number of threads locking and unlocking a
semaphore. With just one thread, everything sits in the CPU
cache and throughput is around 2.6 million operations per
second, with a 5-10% variation.

Once a second thread gets involved, data structures bounce
from CPU to CPU, and performance deteriorates to about 1.25
million operations per second, with a 5-10% variation.

However, as more and more threads get added to the mix,
performance with the vanilla kernel continues to deteriorate.
Once I hit 24 threads, on a 24 CPU, 4 node test system,
performance is down to about 290k operations/second.

With a proportional backoff delay added to the spinlock
code, performance with 24 threads goes up to about 400k
operations/second with a 50x delay, and about 900k operations/second
with a 250x delay. However, with a 250x delay, performance with
2-5 threads is worse than with a 50x delay.

Making the code auto-tune the delay factor results in a system
that performs well with both light and heavy lock contention,
and should also protect against the (likely) case of the fixed
delay factor being wrong for other hardware.

The attached graph shows the performance of the multi threaded
semaphore lock/unlock test case, with 1-24 threads, on the
vanilla kernel, with 10x, 50x, and 250x proportional delay,
as well as the v1 patch series with autotuning for 2x and 2.7x
spinning before the lock is obtained, and with the v2 series.

The v2 series integrates several ideas from Michel Lespinasse
and Eric Dumazet, which should result in better throughput and
nicer behaviour in situations with contention on multiple locks.

For the v3 series, I tried out all the ideas suggested by
Michel. They made perfect sense, but in the end it turned
out they did not work as well as the simple, aggressive
"try to make the delay longer" policy I have now. Several
small bug fixes and cleanups have been integrated.

For the v4 series, I added code to keep the maximum spinlock
delay to a small value when running in a virtual machine. That
should solve the performance regression seen in virtual machines.

The performance issue observed with AIM7 is still a mystery.

Performance is within the margin of error of v2, so the graph
has not been updated.

Please let me know if you manage to break this code in any way,
so I can fix it...



After the introduction of patch 5, which limits the delay loops to 16,
I no longer see the degradation in virtual guests reported
earlier; I see improvements instead.

For the whole series:
Reviewed-by: Raghavendra K T 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> > On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
> >> I just reran some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> >> hackbench, fileio-cfq of sysbench, dbench, aiostress, and multithreaded
> >> loopback netperf, on my core2, nhm, wsm, and snb platforms. I found no
> >> clear performance change.
> > 
> > > Ok, good. You could put that in one of the commit messages so that it is
> > there and people know that this patchset doesn't cause perf regressions
> > with the bunch of benchmarks.
> > 
> >> I also tested the balance and powersaving policies with the above
> >> benchmarks and found that specjbb2005 drops 30~50% under both policies,
> >> with either openjdk or jrockit, and hackbench drops a lot with the
> >> powersaving policy on 4-socket snb platforms. The others show no clear change.
> > 
> > I guess this is expected because there has to be some performance hit
> > when saving power...
> > 
> 
> BTW, I tested the v3 version on top of sched numa, on tip/master.
> There, specjbb drops only about 5~7% with the balance/powersaving policies.
> The power scheduling is done after the numa scheduling logic.

That makes sense.  How do the numa scheduling numbers compare to mainline?
Do you have all three available: mainline, and tip with and without the
powersaving policy?

-Mike




Re: [PATCH] userns: Allow the unprivileged users to mount mqueue fs

2013-01-27 Thread Gao feng
On 2013/01/28 11:43, Eric W. Biederman wrote:
> Gao feng  writes:
> 
>> This patch allows an unprivileged user to mount mqueuefs in a
>> user ns.
>>
>> If two userns share the same ipcns, the files in the mqueue fs
>> should be visible in both userns.
>>
>> If a userns has its own ipcns, it has its own mqueue fs too.
>> ipcns already handles this well.
> 
> I am a little dense.  When does userspace actually mount a mqueuefs?
> My impression was that user space never needed to mount and actually
> never could mount a mqueuefs.  MS_NO_USER isn't set so mounting a
> mqueuefs is possible but when does it happen and why?
> 

Actually, the files that represent message queues in mqueuefs contain
some information, such as QSIZE, NOTIFY, SIGNO, and NOTIFY_PID.
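
As a rough illustration of how userspace might consume those fields, here is a small parser. It assumes each file holds whitespace-separated `KEY:value` pairs on one line; the helper name and the sample line are made up for the example rather than read from a real queue.

```python
def parse_mqueue_status(text):
    """Parse 'KEY:value' fields from the contents of one mqueuefs file."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition(":")
        # Keep only well-formed integer fields; ignore anything else.
        if sep and value.lstrip("-").isdigit():
            fields[key] = int(value)
    return fields


# Hypothetical contents of a file under /dev/mqueue (layout assumed).
sample = "QSIZE:129     NOTIFY:2    SIGNO:0    NOTIFY_PID:8260"
print(parse_mqueue_status(sample))
```

On a system where mqueuefs is mounted, the same function could be fed the result of reading a file under /dev/mqueue.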

My workstation runs Fedora 17, where mqueuefs is mounted on /dev/mqueue by default.
So I think this patch is needed by at least some people.

Thanks!
Gao

> I am trying to think through the logic here and I think this is safe
> but since I don't understand why we would mount an mqueue fs I am
> having trouble verifying that there are no silly reasons why this might
> be a bad idea.
> 
> But from what I can tell so far this seems like a good patch.
> 
> Eric
> 
> 
>> Signed-off-by: Gao feng 
>> ---
>>  ipc/mqueue.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
>> index 71a3ca1..023c986 100644
>> --- a/ipc/mqueue.c
>> +++ b/ipc/mqueue.c
>> @@ -1383,6 +1383,7 @@ static struct file_system_type mqueue_fs_type = {
>>  .name = "mqueue",
>>  .mount = mqueue_mount,
>>  .kill_sb = kill_litter_super,
>> +.fs_flags = FS_USERNS_MOUNT,
>>  };
>>  
>>  int mq_init_ns(struct ipc_namespace *ns)



Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 07:15 +0100, Mike Galbraith wrote: 
> On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: 
> > On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> > > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
> > >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
> > >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> >  On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> > > With aim7 compute on 4 node 40 core box, I see stable throughput
> > > improvement at tasks = nr_cores and below w. balance and powersaving. 
> > >> ... 
> >  Ok, this is sick. How is balance and powersaving better than perf? Both
> >  have much more jobs per minute than perf; is that because we do pack
> >  much more tasks per cpu with balance and powersaving?
> > >>>
> > >>> Maybe it is due to the lazy balancing on balance/powersaving. You can
> > >>> check the CS times in /proc/pid/status.
> > >>
> > >> Well, it's not wakeup path, limiting entry frequency per waker did zip
> > >> squat nada to any policy throughput.
> > > 
> > > monteverdi:/abuild/mike/:[0]# echo powersaving > 
> > > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043321  00058616
> > > 043313  00058616
> > > 043318  00058968
> > > 043317  00058968
> > > 043316  00059184
> > > 043319  00059192
> > > 043320  00059048
> > > 043314  00059048
> > > 043312  00058176
> > > 043315  00058184
> > > monteverdi:/abuild/mike/:[0]# echo balance > 
> > > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043337  00053448
> > > 04  00053456
> > > 043338  00052992
> > > 043331  00053448
> > > 043332  00053488
> > > 043335  00053496
> > > 043334  00053480
> > > 043329  00053288
> > > 043336  00053464
> > > 043330  00053496
> > > monteverdi:/abuild/mike/:[0]# echo performance > 
> > > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043348  00052488
> > > 043344  00052488
> > > 043349  00052744
> > > 043343  00052504
> > > 043347  00052504
> > > 043352  00052888
> > > 043345  00052504
> > > 043351  00052496
> > > 043346  00052496
> > > 043350  00052304
> > > monteverdi:/abuild/mike/:[0]#
> > 
> > similar with aim7 results. Thanks, Mike!
> > 
> > Would you like to collect vmstat info in the background?
> > > 
> > > Zzzt.  Wish I could turn turbo thingy off.
> > 
> > Do you mean the turbo mode of cpu frequency? I remember that some machines
> > can disable it in the BIOS.
> 
> Yeah, I can do that in my local x3550 box.  I can't fiddle with BIOS
> settings on the remote NUMA box.
> 
> This can't be anything but turbo gizmo mucking up the numbers I think,
> not that the numbers are invalid or anything, better numbers are better
> numbers no matter where/how they come about ;-)
> 
> The massive_intr load is dirt simple sleep/spin with bean counting.  It
> sleeps 1ms spins 8ms.  Change that to sleep 8ms, grind away for 1ms...
> 
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045150  6484
> 045157  6427
> 045156  6401
> 045152  6428
> 045155  6372
> 045154  6370
> 045158  6453
> 045149  6372
> 045151  6371
> 045153  6371
> monteverdi:/abuild/mike/:[0]# echo balance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045170  6380
> 045172  6374
> 045169  6376
> 045175  6376
> 045171  6334
> 045176  6380
> 045168  6374
> 045174  6334
> 045177  6375
> 045173  6376
> monteverdi:/abuild/mike/:[0]# echo performance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045198  6408
> 045191  6408
> 045197  6408
> 045192  6411
> 045194  6409
> 045196  6409
> 045195  6336
> 045189  6336
> 045193  6411
> 045190  6410

Back to original 1ms sleep, 8ms work, turning NUMA box into a single
node 10 core box with numactl.

monteverdi:/abuild/mike/:[0]# echo powersaving > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045286  00043872
045289  00043464
045284  00043488
045287  00043440
045283  00043416
045281  00044456
045285  00043456
045288  00044312
045280  00043048
045282  00043240
monteverdi:/abuild/mike/:[0]# echo balance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045300  00052536
045307  00052472
045304  00052536
045299  00052536
045305  00052520
045306  00052528
045302  00052528
045303  00052528
045308  00052512
045301  00052520
monteverdi:/abuild/mike/:[0]# echo performance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045339  

Linus GIT 3.8.0-rc5: INFO: possible circular locking dependency detected -- ((fb_notifier_list).rwsem){.+.+.+}, at: [] __blocking_notifier_call_chain+0x49/0x80

2013-01-27 Thread Miles Lane
Hi Daniel,
At the bottom of this message you will find dmesg output showing this
problem from the current Linus GIT tree.
Here is the text of the message you wrote about this
(http://marc.info/?l=dri-devel&m=135905755124554&w=2):
--
Patches for the known issues around console_lock vs fbdev_notifier are in -mm:

http://ozlabs.org/~akpm/mmots/broken-out/fb-rework-locking-to-fix-lock-ordering-on-takeover.patch
http://ozlabs.org/~akpm/mmots/broken-out/fb-yet-another-band-aid-for-fixing-lockdep-mess.patch

Unfortunately the patches seem to be stuck there for now despite quite
a few reports about this (including seemingly relevant background
noise about hangs in distro bugzillas).
-Daniel
--

[  489.832113] [ INFO: possible circular locking dependency detected ]
[  489.832115] 3.8.0-rc5 #99 Not tainted
[  489.832116] ---
[  489.832117] 99video/4306 is trying to acquire lock:
[  489.832129]  ((fb_notifier_list).rwsem){.+.+.+}, at:
[] __blocking_notifier_call_chain+0x49/0x80
[  489.832130]
[  489.832130] but task is already holding lock:
[  489.832136]  (console_lock){+.+.+.}, at: []
store_fbstate+0x43/0x71
[  489.832137]
[  489.832137] which lock already depends on the new lock.
[  489.832137]
[  489.832138]
[  489.832138] the existing dependency chain (in reverse order) is:
[  489.832141]
[  489.832141] -> #1 (console_lock){+.+.+.}:
[  489.832146][] lock_acquire+0xfe/0x14d
[  489.832150][] console_lock+0x64/0x66
[  489.832154][] register_con_driver+0x33/0x123
[  489.832158][] take_over_console+0x21/0x266
[  489.832161][] fbcon_takeover+0x56/0x98
[  489.832165][] fbcon_event_notify+0x3b6/0x6e4
[  489.832169][] notifier_call_chain+0x8c/0xc0
[  489.832173][]
__blocking_notifier_call_chain+0x5f/0x80
[  489.832176][] blocking_notifier_call_chain+0xf/0x11
[  489.832181][] fb_notifier_call_chain+0x16/0x18
[  489.832184][] register_framebuffer+0x216/0x27a
[  489.832189][] vesafb_probe+0x6df/0x75f
[  489.832193][] platform_drv_probe+0x34/0x5e
[  489.832196][] driver_probe_device+0x90/0x19b
[  489.832199][] __driver_attach+0x4e/0x6f
[  489.832202][] bus_for_each_dev+0x52/0x85
[  489.832205][] driver_attach+0x19/0x1b
[  489.832208][] bus_add_driver+0xf7/0x21a
[  489.832211][] driver_register+0x8c/0x110
[  489.832214][] platform_driver_register+0x41/0x43
[  489.832217][] platform_driver_probe+0x18/0x8a
[  489.832220][] vesafb_init+0x215/0x258
[  489.832224][] do_one_initcall+0x7a/0x130
[  489.832228][] kernel_init_freeable+0x109/0x191
[  489.832233][] kernel_init+0x9/0xd1
[  489.832236][] ret_from_fork+0x7c/0xb0
[  489.832240]
[  489.832240] -> #0 ((fb_notifier_list).rwsem){.+.+.+}:
[  489.832242][] __lock_acquire+0xacc/0xe0c
[  489.832245][] lock_acquire+0xfe/0x14d
[  489.832249][] down_read+0x3f/0x4b
[  489.832253][]
__blocking_notifier_call_chain+0x49/0x80
[  489.832257][] blocking_notifier_call_chain+0xf/0x11
[  489.832260][] fb_notifier_call_chain+0x16/0x18
[  489.832263][] fb_set_suspend+0x22/0x4b
[  489.832266][] store_fbstate+0x4e/0x71
[  489.832270][] dev_attr_store+0x13/0x1f
[  489.832274][] sysfs_write_file+0xe9/0x121
[  489.832278][] vfs_write+0x91/0xd0
[  489.832281][] sys_write+0x5a/0x8b
[  489.832284][] system_call_fastpath+0x16/0x1b
[  489.832285]
[  489.832285] other info that might help us debug this:
[  489.832285]
[  489.832286]  Possible unsafe locking scenario:
[  489.832286]
[  489.832287]        CPU0                    CPU1
[  489.832288]        ----                    ----
[  489.832290]   lock(console_lock);
[  489.832292]                                lock((fb_notifier_list).rwsem);
[  489.832294]                                lock(console_lock);
[  489.832296]   lock((fb_notifier_list).rwsem);
[  489.832297]
[  489.832297]  *** DEADLOCK ***
[  489.832297]
[  489.832298] 4 locks held by 99video/4306:
[  489.832304]  #0:  (>mutex){+.+.+.}, at:
[] sysfs_write_file+0x37/0x121
[  489.832310]  #1:  (s_active#204){.+.+.+}, at: []
sysfs_write_file+0xd1/0x121
[  489.832315]  #2:  (_info->lock){+.+.+.}, at:
[] lock_fb_info+0x18/0x37
[  489.832321]  #3:  (console_lock){+.+.+.}, at: []
store_fbstate+0x43/0x71
[  489.832321]
[  489.832321] stack backtrace:
[  489.832324] Pid: 4306, comm: 99video Not tainted 3.8.0-rc5 #99
[  489.832325] Call Trace:
[  489.832330]  [] print_circular_bug+0x1f6/0x204
[  489.832333]  [] __lock_acquire+0xacc/0xe0c
[  489.832337]  [] lock_acquire+0xfe/0x14d
[  489.832341]  [] ? __blocking_notifier_call_chain+0x49/0x80
[  489.832345]  [] down_read+0x3f/0x4b
[  489.832348]  [] ? __blocking_notifier_call_chain+0x49/0x80
[  489.832352]  [] 
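
The lock-order inversion reported above can be illustrated with a small lock-order graph. This is only a sketch of the general idea of detecting a cycle in "A was held while taking B" edges, not lockdep's actual implementation. The class and method names are hypothetical, and the two edges are read off the two dependency chains in the report.

```python
from collections import defaultdict


class LockOrderGraph:
    """Track 'held -> newly acquired' edges and refuse edges that close a cycle."""

    def __init__(self):
        self.after = defaultdict(set)  # lock -> locks taken while holding it

    def _reachable(self, src, dst):
        # Iterative depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, held, new):
        """Record held -> new; return False if that order contradicts an earlier one."""
        if self._reachable(new, held):
            return False  # new already leads to held, so this edge closes a cycle
        self.after[held].add(new)
        return True


graph = LockOrderGraph()
# Chain #1 (vesafb probe): the notifier rwsem is held when console_lock is taken.
boot_ok = graph.acquire("(fb_notifier_list).rwsem", "console_lock")
# Chain #0 (store_fbstate): console_lock is held when the notifier rwsem is taken.
sysfs_ok = graph.acquire("console_lock", "(fb_notifier_list).rwsem")
print(boot_ok, sysfs_ok)
```

The second call is rejected because the opposite order was already recorded, which mirrors the "possible circular locking dependency" message.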

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Simon Jeons
On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> Switching merge_across_nodes after running KSM is liable to oops on stale
> nodes still left over from the previous stable tree.  It's not something
> that people will often want to do, but it would be lame to demand a reboot
> when they're trying to determine which merge_across_nodes setting is best.
> 
> How can this happen?  We only permit switching merge_across_nodes when
> pages_shared is 0, and usually set run 2 to force that beforehand, which
> ought to unmerge everything: yet oopses still occur when you then run 1.
> 
> Three causes:
> 
> 1. The old stable tree (built according to the inverse merge_across_nodes)
> has not been fully torn down.  A stable node lingers until get_ksm_page()
> notices that the page it references no longer references it: but the page
> is not necessarily freed as soon as expected, particularly when swapcache.
> 
> Fix this with a pass through the old stable tree, applying get_ksm_page()
> to each of the remaining nodes (most found stale and removed immediately),
> with forced removal of any left over.  Unless the page is still mapped:
> I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> and EBUSY than BUG.
> 
> 2. __ksm_enter() has a nice little optimization, to insert the new mm
> just behind ksmd's cursor, so there's a full pass for it to stabilize
> (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> but not so nice when we're trying to unmerge all mms: we were missing
> those mms forked and inserted behind the unmerge cursor.  Easily fixed
> by inserting at the end when KSM_RUN_UNMERGE.
> 
> 3. It is possible for a KSM page to be faulted back from swapcache into
> an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
> 
> A long outstanding, unrelated bugfix sneaks in with that third fix:
> ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
> I/O error when read in from swap) to a page which it then marks Uptodate.
> Fix this case by not copying, letting do_swap_page() discover the error.
> 
> Signed-off-by: Hugh Dickins 
> ---
>  include/linux/ksm.h |   18 ++---
>  mm/ksm.c|   83 +++---
>  mm/memory.c |   19 -
>  3 files changed, 92 insertions(+), 28 deletions(-)
> 
> --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
> +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
> @@ -16,9 +16,6 @@
>  struct stable_node;
>  struct mem_cgroup;
>  
> -struct page *ksm_does_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address);
> -
>  #ifdef CONFIG_KSM
>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>   unsigned long end, int advice, unsigned long *vm_flags);
> @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
>   * We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
>   * but what if the vma was unmerged while the page was swapped out?
>   */
> -static inline int ksm_might_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address)
> -{
> - struct anon_vma *anon_vma = page_anon_vma(page);
> -
> - return anon_vma &&
> - (anon_vma->root != vma->anon_vma->root ||
> -  page->index != linear_page_index(vma, address));
> -}
> +struct page *ksm_might_need_to_copy(struct page *page,
> + struct vm_area_struct *vma, unsigned long address);
>  
>  int page_referenced_ksm(struct page *page,
>   struct mem_cgroup *memcg, unsigned long *vm_flags);
> @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
>   return 0;
>  }
>  
> -static inline int ksm_might_need_to_copy(struct page *page,
> +static inline struct page *ksm_might_need_to_copy(struct page *page,
>   struct vm_area_struct *vma, unsigned long address)
>  {
> - return 0;
> + return page;
>  }
>  
>  static inline int page_referenced_ksm(struct page *page,
> --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
> +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
> @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
>  /*
>   * Only called through the sysfs control interface:
>   */
> +static int remove_stable_node(struct stable_node *stable_node)
> +{
> + struct page *page;
> + int err;
> +
> + page = get_ksm_page(stable_node, true);
> + if (!page) {
> + /*
> +  * get_ksm_page did remove_node_from_stable_tree itself.
> +  */
> + return 0;
> + }
> +
> + if (WARN_ON_ONCE(page_mapped(page)))
> + err = 

Re: [PATCH v3 02/71] ARC: Build system: Makefiles, Kconfig, Linker script

2013-01-27 Thread Vineet Gupta
Hi Sam,

On Thursday 24 January 2013 04:20 PM, Vineet Gupta wrote:
> Arnd in his review pointed out that arch Kconfig organisation has several
> deficiencies:
> 
> * Build time entries for things which can be runtime extracted from DT
>   (e.g. SDRAM size, core clk frequency..)
> * Not multi-platform-image-build friendly (choice .. endchoice constructs)
> * cpu variants support (750/770) is exclusive.
> 
> The first 2 have been fixed in subsequent patches.
> Due to the nature of the 750 and 770, it is not possible to build for
> both together, w/o special runtime glue code which would hurt
> performance.
> 
> Signed-off-by: Vineet Gupta 
> Cc: Arnd Bergmann 
> Cc: Sam Ravnborg 
> ---
>  arch/arc/Kbuild|2 +
>  arch/arc/Kconfig   |  328 
> 
>  arch/arc/Kconfig.debug |   34 
>  arch/arc/Makefile  |  115 ++
>  arch/arc/boot/Makefile |   26 +++
>  arch/arc/include/asm/Kbuild|1 +
>  arch/arc/kernel/Makefile   |   16 ++
>  arch/arc/kernel/arcksyms.c |   56 +++
>  arch/arc/kernel/asm-offsets.c  |   46 ++
>  arch/arc/kernel/vmlinux.lds.S  |  116 ++
>  arch/arc/lib/Makefile  |9 +
>  arch/arc/mm/Makefile   |   10 ++
>  arch/arc/plat-arcfpga/Kconfig  |   33 
>  arch/arc/plat-arcfpga/Makefile |9 +
>  14 files changed, 801 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arc/Kbuild
>  create mode 100644 arch/arc/Kconfig
>  create mode 100644 arch/arc/Kconfig.debug
>  create mode 100644 arch/arc/Makefile
>  create mode 100644 arch/arc/boot/Makefile
>  create mode 100644 arch/arc/kernel/Makefile
>  create mode 100644 arch/arc/kernel/arcksyms.c
>  create mode 100644 arch/arc/kernel/asm-offsets.c
>  create mode 100644 arch/arc/kernel/vmlinux.lds.S
>  create mode 100644 arch/arc/lib/Makefile
>  create mode 100644 arch/arc/mm/Makefile
>  create mode 100644 arch/arc/plat-arcfpga/Kconfig
>  create mode 100644 arch/arc/plat-arcfpga/Makefile
> 
> diff --git a/arch/arc/Kbuild b/arch/arc/Kbuild
> new file mode 100644
> index 000..082d329
> --- /dev/null
> +++ b/arch/arc/Kbuild
> @@ -0,0 +1,2 @@
> +obj-y += kernel/
> +obj-y += mm/
> diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
> new file mode 100644
> index 000..b0b09ae
> --- /dev/null
> +++ b/arch/arc/Kconfig
> @@ -0,0 +1,328 @@
> +#
> +# Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License version 2 as
> +# published by the Free Software Foundation.
> +#
> +
> +config ARC
> + def_bool y
> + select ARCH_NO_VIRT_TO_BUS
> + # ARC Busybox based initramfs absolutely relies on DEVTMPFS for /dev
> + select DEVTMPFS if !INITRAMFS_SOURCE=""
> + select GENERIC_ATOMIC64
> + select GENERIC_CLOCKEVENTS
> + select GENERIC_FIND_FIRST_BIT
> + # for now, we don't need GENERIC_IRQ_PROBE, CONFIG_GENERIC_IRQ_CHIP
> + select GENERIC_IRQ_SHOW
> + select GENERIC_PENDING_IRQ if SMP
> + select GENERIC_SMP_IDLE_THREAD
> + select HAVE_GENERIC_HARDIRQS
> + select MODULES_USE_ELF_RELA
> +
> +config SCHED_OMIT_FRAME_POINTER
> + def_bool y
> +
> +config GENERIC_CSUM
> + def_bool y
> +
> +config RWSEM_GENERIC_SPINLOCK
> + def_bool y
> +
> +config ARCH_FLATMEM_ENABLE
> + def_bool y
> +
> +config MMU
> + def_bool y
> +
> +config NO_IOPORT
> + def_bool y
> +
> +config GENERIC_CALIBRATE_DELAY
> + def_bool y
> +
> +config GENERIC_HWEIGHT
> + def_bool y
> +
> +config BINFMT_ELF
> + def_bool y
> +
> +config HAVE_LATENCYTOP_SUPPORT
> + def_bool y
> +
> +config NO_DMA
> + def_bool n
> +
> +source "init/Kconfig"
> +source "kernel/Kconfig.freezer"
> +
> +menu "ARC Architecture Configuration"
> +
> +choice
> + prompt "ARC Platform"
> + default ARC_PLAT_FPGA_LEGACY
> +
> +config ARC_PLAT_FPGA_LEGACY
> + bool "\"Legacy\" ARC FPGA dev platform"
> + help
> +   Support for ARC development platforms, provided by Synopsys.
> +   These are based on FPGA or ISS. e.g.
> +   - ARCAngel4
> +   - ML509
> +   - MetaWare ISS
> +
> +#New platform adds here
> +endchoice
> +
> +menu "ARC CPU Configuration"
> +
> +choice
> + prompt "ARC Core"
> + default ARC_CPU_770
> +
> +config ARC_CPU_750D
> + bool "ARC750D"
> + help
> +   Support for ARC750 core
> +
> +config ARC_CPU_770
> + bool "ARC770"
> + select ARC_CPU_REL_4_10
> + help
> +   Support for ARC770 core introduced with Rel 4.10 (Summer 2011)
> +   This core has a bunch of cool new features:
> +   -MMU-v3: Variable Page Sz (4k, 8k, 16k), bigger J-TLB (128x4)
> +   Shared Address Spaces (for sharing TLB entries in MMU)
> +   -Caches: New Prog Model, Region Flush
> +   -Insns: endian swap, load-locked/store-conditional, time-stamp-ctr
> +
> 

Re: [PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-27 Thread Minchan Kim
On Fri, Jan 25, 2013 at 08:53:24PM +1100, paul.sz...@sydney.edu.au wrote:
> Dear Minchan,
> 
> > So what's the effect for user?
> > ...
> > It seems you saw old kernel.
> > ...
> > Current kernel includes ...
> > So I think we don't need this patch.
> 
> As I understand now, my patch is "right" and needed for older kernels;
> for newer kernels, the issue has been fixed in equivalent ways; it was
> an oversight that the change was not backported; and any justification
> you need, you can get from those "later better" patches.

I don't know what your problem is because you didn't describe it in the
changelog. Anyway, if you want to apply it to older kernels,
please read Documentation/stable_kernel_rules.txt.

In summary,

1. Define your problem.
2. Apply your fix and confirm that the problem goes away on the older kernel.
3. If it does, describe the problem and the effect of the fix in the changelog.
4. Send it to the stable maintainers and the mm maintainer.

That's all.

> 
> I asked:
> 
>   A question: what is the use or significance of vm_highmem_is_dirtyable?
>   It seems odd that it would be used in setting limits or thresholds, but
>   not used in decisions about where to put dirty things. Is that so, and is that
>   as it should be? What is the recommended setting of highmem_is_dirtyable?
> 
> The silence is deafening. I guess highmem_is_dirtyable is an aberration.

I hope this helps you find the primary cause of your problem.
git show 195cf453


> 
> Thanks, Paul
> 
> Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
> School of Mathematics and Statistics   University of SydneyAustralia
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org

-- 
Kind regards,
Minchan Kim


Re: Tux3 Report: Initial fsck has landed

2013-01-27 Thread David Lang

On Sun, 27 Jan 2013, Daniel Phillips wrote:


Compared to Ext2/3/4, Tux3 has a big disadvantage in terms of fsck: it does
not confine inode table blocks to fixed regions of the volume. Tux3 may store
any metadata block anywhere, and tends to stir things around to new locations
during normal operation. To overcome this disadvantage, we have the concept of
uptags:

   http://phunq.net/pipermail/tux3/2013-January/001973.html
   "What are uptags?"

With uptags we should be able to fall back to a full scan of a damaged volume
and get a pretty good idea of which blocks are actually lost metadata blocks,
and to which filesystem objects they might belong.


The thing that jumps out at me with this is the question of how you will avoid 
the 'filesystem image in a file' disaster that reiserfs had (where its fsck 
could mix up metadata chunks from the main filesystem with metadata chunks from 
any filesystem images that it happened to stumble across when scanning the disk).


Many people create such images with dd if=/dev/sda2 of=filesystem.image, and if 
you are doing virtualization, you may be running from one of these filesystem 
images. With virtualization, it's very likely that you will have many copies of 
a single image that are all identical.


Have you thought of how to deal with this problem?

David Lang


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: 
> On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
> >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
> >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
>  On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> > With aim7 compute on 4 node 40 core box, I see stable throughput
> > improvement at tasks = nr_cores and below w. balance and powersaving. 
> >> ... 
>  Ok, this is sick. How is balance and powersaving better than perf? Both
>  have much more jobs per minute than perf; is that because we do pack
>  much more tasks per cpu with balance and powersaving?
> >>>
> >>> Maybe it is due to the lazy balancing on balance/powersaving. You can
> >>> check the CS times in /proc/pid/status.
> >>
> >> Well, it's not wakeup path, limiting entry frequency per waker did zip
> >> squat nada to any policy throughput.
> > 
> > monteverdi:/abuild/mike/:[0]# echo powersaving > 
> > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 043321  00058616
> > 043313  00058616
> > 043318  00058968
> > 043317  00058968
> > 043316  00059184
> > 043319  00059192
> > 043320  00059048
> > 043314  00059048
> > 043312  00058176
> > 043315  00058184
> > monteverdi:/abuild/mike/:[0]# echo balance > 
> > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 043337  00053448
> > 04  00053456
> > 043338  00052992
> > 043331  00053448
> > 043332  00053488
> > 043335  00053496
> > 043334  00053480
> > 043329  00053288
> > 043336  00053464
> > 043330  00053496
> > monteverdi:/abuild/mike/:[0]# echo performance > 
> > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 043348  00052488
> > 043344  00052488
> > 043349  00052744
> > 043343  00052504
> > 043347  00052504
> > 043352  00052888
> > 043345  00052504
> > 043351  00052496
> > 043346  00052496
> > 043350  00052304
> > monteverdi:/abuild/mike/:[0]#
> 
> similar with aim7 results. Thanks, Mike!
> 
> Would you like to collect vmstat info in the background?
> > 
> > Zzzt.  Wish I could turn turbo thingy off.
> 
> Do you mean the turbo mode of the CPU frequency? I remember some machines
> can disable it in the BIOS.

Yeah, I can do that in my local x3550 box.  I can't fiddle with BIOS
settings on the remote NUMA box.

This can't be anything but turbo gizmo mucking up the numbers I think,
not that the numbers are invalid or anything, better numbers are better
numbers no matter where/how they come about ;-)

The massive_intr load is dirt simple sleep/spin with bean counting.  It
sleeps 1ms spins 8ms.  Change that to sleep 8ms, grind away for 1ms...

monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045150  6484
045157  6427
045156  6401
045152  6428
045155  6372
045154  6370
045158  6453
045149  6372
045151  6371
045153  6371
monteverdi:/abuild/mike/:[0]# echo balance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045170  6380
045172  6374
045169  6376
045175  6376
045171  6334
045176  6380
045168  6374
045174  6334
045177  6375
045173  6376
monteverdi:/abuild/mike/:[0]# echo performance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045198  6408
045191  6408
045197  6408
045192  6411
045194  6409
045196  6409
045195  6336
045189  6336
045193  6411
045190  6410
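
For readers who want to reproduce the shape of this load, a sleep/spin loop
with a cycle count can be sketched roughly as below. This is only an
illustrative model, not the massive_intr source; the name sleep_spin_load()
and its parameters are made up for this sketch, and it runs one worker
rather than ten.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Alternate sleeping for sleep_ms and busy-spinning for spin_ms until
 * run_ms has elapsed. Returns the number of completed sleep/spin cycles
 * (the "bean count").
 */
unsigned long sleep_spin_load(unsigned sleep_ms, unsigned spin_ms,
                              unsigned run_ms)
{
    const uint64_t end = now_ns() + (uint64_t)run_ms * 1000000ull;
    unsigned long cycles = 0;

    while (now_ns() < end) {
        struct timespec nap = {
            .tv_sec = sleep_ms / 1000,
            .tv_nsec = (long)(sleep_ms % 1000) * 1000000L,
        };
        uint64_t spin_end;

        nanosleep(&nap, NULL);          /* sleep phase */

        spin_end = now_ns() + (uint64_t)spin_ms * 1000000ull;
        while (now_ns() < spin_end)
            ;                           /* spin phase: burn CPU */
        cycles++;
    }
    return cycles;
}
```

Assuming the last massive_intr argument is a runtime in seconds,
sleep_spin_load(1, 8, 60000) would approximate one worker of the first
variant and sleep_spin_load(8, 1, 60000) one worker of the second.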



Re: Tux3 Report: Initial fsck has landed

2013-01-27 Thread Daniel Phillips
On Sun, Jan 27, 2013 at 10:02 PM, David Lang  wrote:
> On Sun, 27 Jan 2013, Daniel Phillips wrote:
> The thing that jumps out at me with this is the question of how you will
> avoid the 'filesystem image in a file' disaster that reiserfs had (where
> its fsck could mix up metadata chunks from the main filesystem with
> metadata chunks from any filesystem images that it happened to stumble
> across when scanning the disk)
>
> many people will dd if=/dev/sda2 of=filesystem.image, and if you are doing
> virtualization, you may be running out of one of these filesystem images.
> With virtualization, it's very likely that you will have many copies of a
> single image that are all identical.
>
> have you thought of how to deal with this problem?
>
> David Lang

Only superficially. Deep thoughts are in order. First, there needs to be a
hole in the filesystem structure, before we would even consider trying to
plug something in there. Once we know there is a hole, we want to
narrow down the list of candidates to fill it. If a candidate already lies
within a perfectly viable file, obviously we would not want to interpret
that as lost metadata. Unless the filesystem is really messed up...

That is about as far as I have got with the analysis. Clearly, much more
is required. Suggestions welcome.

Regards,

Daniel


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-27 Thread Santosh Shilimkar

On Tuesday 22 January 2013 08:35 PM, Santosh Shilimkar wrote:

On Tuesday 22 January 2013 08:21 PM, Russell King - ARM Linux wrote:

On Tue, Jan 22, 2013 at 03:44:03PM +0530, Santosh Shilimkar wrote:

Sorry for not being clear enough. On OMAP, the 32KHz clock is the only clock
that is always running (even during low power states), and hence the clock
source and clock event have been clocked from the 32KHz clock. As mentioned
by RMK, with a 32768 Hz clock and HZ = 100, there will always be an
error of 0.1%. This inaccuracy also affects the timer tick interval.
This is the reason OMAP has been using HZ = 128.
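
The 0.1% figure can be checked with a little integer arithmetic. The sketch
below is only an illustration: it assumes the timer is programmed with a
whole number of 32768 Hz cycles per tick, rounded to nearest, and the helper
names are invented for this example.

```c
#include <stdlib.h>

/* Clock cycles per tick, rounded to the nearest whole cycle. */
long long cycles_per_tick(long long clk_hz, long long hz)
{
    return (clk_hz + hz / 2) / hz;
}

/* Relative error of the resulting tick period, in parts per million. */
long long tick_error_ppm(long long clk_hz, long long hz)
{
    long long cycles = cycles_per_tick(clk_hz, hz);

    /* ideal period is clk_hz/hz cycles, actual period is 'cycles' */
    return llabs(cycles * hz - clk_hz) * 1000000LL / clk_hz;
}
```

With clk_hz = 32768, HZ = 100 needs 327.68 cycles per tick, which rounds to
328 and leaves an error of about 976 ppm (roughly 0.1%), while HZ = 128
divides evenly into 256 cycles with no error.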


Ok.  Let's look at this.  As far as time-of-day is concerned, this
shouldn't really matter with the clocksource/clockevent based system
that we now have (where *important point* platforms have been converted
over.)

Any platform providing a clocksource will override the jiffy-based
clocksource.  The measurement of time-of-day passing is now based on
the difference in values read from the clocksource, not from the actual
tick rate.

Anything _not_ providing a clock source will be reliant on jiffies
incrementing, which in turn _requires_ one timer interrupt per jiffies
at a known rate (which is HZ).

Now, that's the time of day, what about jiffies?  Well, jiffies is
incremented based on a certain number of nsec having passed since the
last jiffy update.  That means the code copes with dropped ticks and
the like.

However, if your actual interrupt rate is close to the desired HZ, then
it can lead to some interesting effects (and noise):

- if the interrupt rate is slightly faster than HZ, then you can end up
   with updates being delayed by 2x interrupt rate.
- if the interrupt rate is slightly slower than HZ, you can occasionally
   end up with jiffies incrementing by two.
- if your interrupt rate is dead on HZ, then other system noise can come
   into effect and you may get maybe zero, one or two jiffy increments per
   interrupt.

(You have to think about time passing in NS, where jiffy updates should
be vs where the timer interrupts happen.)  See tick_do_update_jiffies64()
for the details.
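
The effect described above can be modelled in a few lines. This is a
simplified sketch of nanosecond-based jiffy accounting, not the kernel's
tick_do_update_jiffies64() code; the struct and function names are invented
for this example.

```c
#include <stdint.h>

/* Simplified model of nanosecond-based jiffy accounting (not kernel code). */
struct jiffy_model {
    uint64_t last_update_ns;   /* time of the last accounted jiffy boundary */
    uint64_t period_ns;        /* length of one jiffy: 1e9 / HZ */
    uint64_t jiffies;          /* running jiffy count */
};

/*
 * Called for each (simulated) timer interrupt at time now_ns.
 * Returns how many jiffies were added: zero, one, or more.
 */
uint64_t jiffy_model_tick(struct jiffy_model *m, uint64_t now_ns)
{
    uint64_t delta, ticks;

    if (now_ns < m->last_update_ns)
        return 0;              /* ignore timestamps that go backwards */

    delta = now_ns - m->last_update_ns;
    if (delta < m->period_ns)
        return 0;              /* interrupt came early: nothing to add yet */

    ticks = delta / m->period_ns;
    m->last_update_ns += ticks * m->period_ns;
    m->jiffies += ticks;
    return ticks;
}
```

With a 10 ms period (HZ = 100), an interrupt at 9.9 ms adds nothing, the
next one at 19.8 ms adds one jiffy, and a late interrupt at 40 ms adds three
at once, which is how the model copes with dropped ticks.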

The timer infrastructure is jiffy based - which includes scheduling where
the scheduler does not use hrtimers.  That means a slight discrepancy
between HZ and the actual interrupt rate can cause around 1/HZ jitter.
That's a matter of fact due to how the code works.

So, actually, I don't think the accuracy of HZ has much overall effect on
the accuracy of jiffy based timers or timekeeping, _provided_ a platform
provides a clocksource.  For those which don't, the accuracy of the timer
interrupt to HZ is very important.

(This is just based on reading some code and not on practical
experiments - I'd suggest some research of this is done, trying HZ=100
on OMAP's 32kHz timers, checking whether there's any drift, checking
how accurately a single task can be woken from various select/poll/epoll
delays, and checking whether NTP works.)


Thanks for expanding it. It is really helpful.


And I think further discussion is pointless until such research has been
done (or someone who _really_ knows the time keeping/timer/sched code
inside out comments.)


Fully agree about experimentation to re-assess the drift.
From what I recollect, a few OMAP customers did
report the time drift issue, and that is how the switch
from 100 --> 128 happened.

Anyway I have added the suggested task to my long todo list.


So I checked for time drift with HZ = 100 on OMAP. I ran the
setup for 62 hours and 27 minutes, with the time synced once with an NTP
server. I measured about 174 milliseconds of drift, which is almost noise
considering the observed duration was ~224820 seconds.

I am re-running the setup with HZ = 128 for a similar time frame to see if
the minimal drift observed goes away.

Once through that, I will send a patch to update OMAP to use
HZ = 100 and possibly get rid of the custom OMAP HZ config.

Regards,
Santosh




[GIT] Networking

2013-01-27 Thread David Miller

Much more accumulated than I would have liked due to an
unexpected bout with a nasty flu.

1) AH and ESP input don't set ECN field correctly because the transport
   head of the SKB isn't set correctly, fix from Li RongQing.

2) If netfilter conntrack zones are disabled, we can return an uninitialized
   variable instead of the proper error code.  Fix from Borislav Petkov.

3) Fix double SKB free in ath9k driver beacon handling, from Felix Fietkau.

4) Remove bogus assumption about netns cleanup ordering in nf_conntrack,
   from Pablo Neira Ayuso.

5) Remove a bogus BUG_ON in the new TCP fastopen code, from Eric
   Dumazet.  It uses spin_is_locked() in its test and is therefore
   unsuitable for UP.

6) Fix SELINUX labelling regressions added by the tuntap multiqueue
   changes, from Paul Moore.

7) Fix CRC errors with jumbo frame receive in tg3 driver, from Nithin
   Nayak Sujir.

8) CXGB4 driver sets interrupt coalescing parameters only on first
   queue, rather than all of them.  Fix from Thadeu Lima de Souza
   Cascardo.

9) Fix regression in the dispatch of read/write registers in dm9601
   driver, from Tushar Behera.

10) ipv6_append_data miscalculates header length, from Romain KUNTZ.

11) Fix PMTU handling regressions on ipv4 routes, from Steffen
Klassert, Timo Teräs, and Julian Anastasov.

12) In 3c574_cs driver, add necessary parentheses to "x << y & z"
expression.  From Nickolai Zeldovich.

13) macvlan_get_size() causes underallocation of netlink message space,
fix from Eric Dumazet.

14) Avoid division by zero in xfrm_replay_advance_bmp(), from Nickolai
Zeldovich.  Amusingly the zero check was already there, we were
just performing it after the modulus :-)

15) Some more splice bug fixes from Eric Dumazet, which fix things mostly
emanating from how we now more aggressively use high-order pages in
SKBs.

16) Fix size calculation bug when freeing hash tables in the IPSEC xfrm
code, from Michal Kubecek.

17) Fix PMTU event propagation into socket cached routes, from Steffen
Klassert.

18) Fix off by one in TX buffer release in netxen driver, from Eric
Dumazet.

19) Fix ridiculous memory allocation requirements introduced by the
tuntap multiqueue changes, from Jason Wang.

20) Remove bogus AMD platform workaround in r8169 driver that causes major
problems in normal operation, from Timo Teräs.

21) virtio-net set affinity and select queue don't handle discontiguous
cpu numbers properly, fix from Wanlong Gao.

22) Fix a route refcounting issue in loopback driver, from Eric Dumazet.
There's a similar fix coming that we might add to the macvlan driver
as well.

23) Fix SKB leaks in batman-adv's distributed arp table code, from
Matthias Schiffer.

24) r8169 driver gives descriptor ownership back to the hardware before we're
done reading the VLAN tag out of it, fix from Francois Romieu.

25) Checksums not calculated properly in GRE tunnel driver, fix from
Pravin B Shelar.

26) Fix SCTP memory leak on namespace exit.
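
Items 12 and 14 above are both small C pitfalls. The sketch below shows them
in isolation; it is not taken from the 3c574_cs or xfrm code, the function
names are invented for illustration, and which grouping the driver actually
wanted is not stated in this summary.

```c
/* "x << y & z" groups as "(x << y) & z": shift binds tighter than &. */
unsigned shift_then_mask(unsigned x, unsigned y, unsigned z)
{
    return (x << y) & z;
}

/* Parentheses are required if the mask is meant for the shift count. */
unsigned shift_by_masked(unsigned x, unsigned y, unsigned z)
{
    return x << (y & z);
}

/*
 * Check the divisor before using it, not after.
 * Returns 0 and stores a % b in *out, or -1 if b is zero.
 */
int checked_mod(unsigned a, unsigned b, unsigned *out)
{
    if (b == 0)
        return -1;
    *out = a % b;
    return 0;
}
```

For x = 1, y = 4, z = 3 the first form yields 0 and the second yields 1, so
the two groupings are not interchangeable.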

Please pull, thanks a lot!

The following changes since commit 3152ba0f86428cebe8a9f8462d5be0a9aefa6289:

  Merge tag 'dt-fixes-for-3.8' of git://sources.calxeda.com/kernel/linux 
(2013-01-14 13:19:08 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 6642f91c92da07369cf1e582503ea3ccb4a7f1a9:

  dm9601: support dm9620 variant (2013-01-28 00:18:04 -0500)


AceLan Kao (3):
  Bluetooth: Add support for IMC Networks [13d3:3393]
  Bluetooth: Add support for Foxconn / Hon Hai [0489:e04e]
  Bluetooth: Add support for Foxconn / Hon Hai [0489:e056]

Amitkumar Karwar (2):
  mwifiex: update config_bands during infra association
  mwifiex: correct config_bands handling for ibss network

Anderson Lizardo (1):
  Bluetooth: Fix incorrect strncpy() in hidp_setup_hid()

Avinash Patil (1):
  mwifiex: fix typo in PCIe adapter NULL check

Bjørn Mork (7):
  net: qmi_wwan: add TP-LINK HSUPA Modem MA180
  net: qmi_wwan: add ONDA MT8205 4G LTE
  net: cdc_ncm: workaround for missing CDC Union
  net: cdc_mbim: send ZLP after max sized NTBs
  net: cdc_ncm: fix error path for single interface probing
  net: cdc_mbim: send ZLP only for the specific buggy device
  net: cdc_ncm: use IAD provided by the USB core

Bob Copeland (2):
  mac80211: set NEED_TXPROCESSING for PERR frames
  mac80211: add encrypt headroom to PERR frames

Dan Carpenter (1):
  ip6mr: limit IPv6 MRT_TABLE identifiers

Daniel Schaal (1):
  Bluetooth: Add support for GC-WB300D PCIe [04ca:3006] to ath3k.

Daniel Wagner (1):
  net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly

David S. Miller (6):
  Merge branch 'master' of git://1984.lsi.us.es/nf
  Merge branch 'wireless'
  Merge branch 'usb_cdc_fixes'
  Merge branch 

[PATCH V3 1/3] mtd: add new fields to nand_flash_dev{}

2013-01-27 Thread Huang Shijie
As time goes on, we increasingly find that we cannot get enough
information from some NAND chips' ID data. Take some Toshiba NAND chips
for example. I have four Toshiba NAND chips at hand:
TC58NVG2S0F, TC58NVG3S0F, TC58NVG5D2, TC58NVG6D2

The datasheets give the geometry (page size + OOB size, in bytes) of these chips:
TC58NVG2S0F : 4096 + 224
TC58NVG3S0F : 4096 + 232
TC58NVG5D2  : 8192 + 640
TC58NVG6D2  : 8192 + 640

But we cannot parse out the correct OOB size for these chips from the ID data.
So it is time to add some new fields to nand_flash_dev{} and update the
detection mechanisms.

This patch just adds some new fields to nand_flash_dev{}:
  @id[8] : the 8 bytes of ID data.
  @id_len: the valid length of the ID data.
  @oobsize: the OOB size.

Signed-off-by: Huang Shijie 
---
 drivers/mtd/devices/doc2000.c |2 +-
 drivers/mtd/devices/doc2001.c |2 +-
 drivers/mtd/devices/doc2001plus.c |2 +-
 drivers/mtd/nand/nand_base.c  |2 +-
 drivers/mtd/nand/nand_ids.c   |  196 -
 drivers/mtd/nand/nandsim.c|2 +-
 drivers/mtd/nand/pxa3xx_nand.c|2 +-
 drivers/mtd/nand/sm_common.c  |   61 ++--
 include/linux/mtd/nand.h  |8 +-
 9 files changed, 147 insertions(+), 130 deletions(-)

diff --git a/drivers/mtd/devices/doc2000.c b/drivers/mtd/devices/doc2000.c
index a4eb8b5..93f037f 100644
--- a/drivers/mtd/devices/doc2000.c
+++ b/drivers/mtd/devices/doc2000.c
@@ -379,7 +379,7 @@ static int DoC_IdentChip(struct DiskOnChip *doc, int floor, 
int chip)
 
/* Print and store the manufacturer and ID codes. */
for (i = 0; nand_flash_ids[i].name != NULL; i++) {
-   if (id == nand_flash_ids[i].id) {
+   if (id == nand_flash_ids[i].id[1]) {
/* Try to identify manufacturer */
for (j = 0; nand_manuf_ids[j].id != 0x0; j++) {
if (nand_manuf_ids[j].id == mfr)
diff --git a/drivers/mtd/devices/doc2001.c b/drivers/mtd/devices/doc2001.c
index f692795..15dd177 100644
--- a/drivers/mtd/devices/doc2001.c
+++ b/drivers/mtd/devices/doc2001.c
@@ -206,7 +206,7 @@ static int DoC_IdentChip(struct DiskOnChip *doc, int floor, 
int chip)
 
/* FIXME: to deal with multi-flash on multi-Millennium case more 
carefully */
for (i = 0; nand_flash_ids[i].name != NULL; i++) {
-   if ( id == nand_flash_ids[i].id) {
+   if (id == nand_flash_ids[i].id[1]) {
/* Try to identify manufacturer */
for (j = 0; nand_manuf_ids[j].id != 0x0; j++) {
if (nand_manuf_ids[j].id == mfr)
diff --git a/drivers/mtd/devices/doc2001plus.c 
b/drivers/mtd/devices/doc2001plus.c
index 4f2220a..80aef1b 100644
--- a/drivers/mtd/devices/doc2001plus.c
+++ b/drivers/mtd/devices/doc2001plus.c
@@ -314,7 +314,7 @@ static int DoC_IdentChip(struct DiskOnChip *doc, int floor, 
int chip)
return 0;
 
for (i = 0; nand_flash_ids[i].name != NULL; i++) {
-   if (id == nand_flash_ids[i].id) {
+   if (id == nand_flash_ids[i].id[1]) {
/* Try to identify manufacturer */
for (j = 0; nand_manuf_ids[j].id != 0x0; j++) {
if (nand_manuf_ids[j].id == mfr)
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index a8c1fb4..0e80ec4 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -3204,7 +3204,7 @@ static struct nand_flash_dev *nand_get_flash_type(struct 
mtd_info *mtd,
type = nand_flash_ids;
 
for (; type->name != NULL; type++)
-   if (*dev_id == type->id)
+   if (*dev_id == type->id[1])
break;
 
chip->onfi_version = 0;
diff --git a/drivers/mtd/nand/nand_ids.c b/drivers/mtd/nand/nand_ids.c
index e3aa274..99949f6 100644
--- a/drivers/mtd/nand/nand_ids.c
+++ b/drivers/mtd/nand/nand_ids.c
@@ -10,6 +10,8 @@
  */
 #include 
 #include 
+#include 
+
 /*
 *  Chip ID list
 *
@@ -24,47 +26,59 @@
 struct nand_flash_dev nand_flash_ids[] = {
 
 #ifdef CONFIG_MTD_NAND_MUSEUM_IDS
-   {"NAND 1MiB 5V 8-bit",  0x6e, 256, 1, 0x1000, 0},
-   {"NAND 2MiB 5V 8-bit",  0x64, 256, 2, 0x1000, 0},
-   {"NAND 4MiB 5V 8-bit",  0x6b, 512, 4, 0x2000, 0},
-   {"NAND 1MiB 3,3V 8-bit",0xe8, 256, 1, 0x1000, 0},
-   {"NAND 1MiB 3,3V 8-bit",0xec, 256, 1, 0x1000, 0},
-   {"NAND 2MiB 3,3V 8-bit",0xea, 256, 2, 0x1000, 0},
-   {"NAND 4MiB 3,3V 8-bit",0xd5, 512, 4, 0x2000, 0},
-   {"NAND 4MiB 3,3V 8-bit",0xe3, 512, 4, 0x2000, 0},
-   {"NAND 4MiB 3,3V 8-bit",0xe5, 512, 4, 0x2000, 0},
-   {"NAND 8MiB 3,3V 8-bit",0xd6, 512, 8, 0x2000, 0},
-
-   {"NAND 8MiB 1,8V 8-bit",

[PATCH V3 3/3] mtd: add the new detection method for the unparsable nand chips

2013-01-27 Thread Huang Shijie
Add the new detection method find_nand_type_by_fullid() for the
unparsable NAND chips. The new detection method is called first,
before all the other detection methods.

Signed-off-by: Huang Shijie 
---
 drivers/mtd/nand/nand_base.c |   33 -
 1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 0e80ec4..7f0431d 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -3152,6 +3152,29 @@ static void nand_decode_bbm_options(struct mtd_info *mtd,
chip->bbt_options |= NAND_BBT_SCAN2NDPAGE;
 }
 
+static struct nand_flash_dev *find_nand_type_by_fullid(struct mtd_info *mtd,
+   struct nand_chip *chip, u8 *id_data, int *busw)
+{
+   struct nand_flash_dev *type = nand_flash_full_ids;
+
+   for (; type->name != NULL; type++) {
+   if (!strncmp(type->id, id_data, type->id_len)) {
+   mtd->writesize = type->pagesize;
+   mtd->erasesize = type->erasesize;
+   mtd->oobsize = type->oobsize;
+
+   chip->cellinfo = id_data[2];
+   chip->chipsize = (uint64_t)type->chipsize << 20;
+   chip->options |= type->options;
+
+   *busw = type->options & NAND_BUSWIDTH_16;
+
+   break;
+   }
+   }
+   return type;
+}
+
 /*
  * Get the flash and manufacturer id and lookup if the type is supported.
  */
@@ -3200,8 +3223,16 @@ static struct nand_flash_dev *nand_get_flash_type(struct 
mtd_info *mtd,
return ERR_PTR(-ENODEV);
}
 
-   if (!type)
+   if (!type) {
+   /*
+* Some nand chips' information can not be parsed out
+* from the id data. So, try your luck in the full-id table.
+*/
+   type = find_nand_type_by_fullid(mtd, chip, id_data, &busw);
+   if (type->name != NULL)
+   goto ident_done;
type = nand_flash_ids;
+   }
 
for (; type->name != NULL; type++)
if (*dev_id == type->id[1])
-- 
1.7.0.4
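
The full-ID lookup can be tried outside the kernel with a cut-down table.
The sketch below is an illustration only: struct full_id_entry is a
hypothetical reduction of nand_flash_dev, the ID bytes and sizes are copied
from the table in patch 2/3, and memcmp() is swapped in for the patch's
strncmp() because ID bytes may legitimately be 0x00, where a string compare
would stop early. Unlike the patch, it returns NULL on a miss rather than
the table terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct nand_flash_dev. */
struct full_id_entry {
    const char *name;
    unsigned char id[8];    /* full ID data */
    unsigned id_len;        /* valid length of the ID data */
    unsigned pagesize;      /* bytes */
    unsigned oobsize;       /* bytes */
};

static const struct full_id_entry full_ids[] = {
    { "TC58NVG2S0F", {0x98, 0xdc, 0x90, 0x26, 0x76, 0x15, 0x01, 0x08}, 8, 4096, 224 },
    { "TC58NVG3S0F", {0x98, 0xd3, 0x90, 0x26, 0x76, 0x15, 0x02, 0x08}, 8, 4096, 232 },
    { "TC58NVG5D2",  {0x98, 0xd7, 0x94, 0x32, 0x76, 0x56, 0x09, 0x00}, 8, 8192, 640 },
    { "TC58NVG6D2",  {0x98, 0xde, 0x94, 0x82, 0x76, 0x56, 0x04, 0x20}, 8, 8192, 640 },
    { NULL, {0}, 0, 0, 0 }  /* end of table */
};

/* Return the entry whose first id_len ID bytes match id_data, or NULL. */
const struct full_id_entry *find_by_full_id(const unsigned char *id_data)
{
    const struct full_id_entry *e;

    for (e = full_ids; e->name != NULL; e++)
        if (memcmp(e->id, id_data, e->id_len) == 0)
            return e;
    return NULL;
}
```

Looking up the ID bytes of TC58NVG5D2 returns the entry with an 8192 byte
page and a 640 byte OOB area, which the ID decoding alone could not provide.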




[PATCH V3 0/3] mtd: use the full-id as the keyword

2013-01-27 Thread Huang Shijie
From: Huang Shijie 

I previously submitted a patch to use the full ID as the keyword for
some unparsable NAND chips. This is the second try.

As time goes on, we increasingly find that we cannot get enough
information from some NAND chips' ID data.
Take some Toshiba NAND chips for example.
I have four Toshiba NAND chips at hand:
TC58NVG2S0F, TC58NVG3S0F, TC58NVG5D2, TC58NVG6D2

The datasheets give the geometry (page size + OOB size, in bytes) of these chips:
TC58NVG2S0F : 4096 + 224
TC58NVG3S0F : 4096 + 232
TC58NVG5D2  : 8192 + 640
TC58NVG6D2  : 8192 + 640

But we cannot parse out the correct OOB size for these chips from the ID data.
So it is time to add some new fields to nand_flash_dev{}
and update the detection mechanisms.

v2 --> v3:
[1] remove the duplicated header.
[2] remove the field "ecc_len" in nand_flash_dev{}.
[3] fix some coding style warnings.
[4] add more comments

Huang Shijie (3):
  mtd: add new fields to nand_flash_dev{}
  mtd: add a new table for the unparsable nand chips
  mtd: add the new detection method for the unparsable nand chips

 drivers/mtd/devices/doc2000.c |2 +-
 drivers/mtd/devices/doc2001.c |2 +-
 drivers/mtd/devices/doc2001plus.c |2 +-
 drivers/mtd/nand/nand_base.c  |   35 ++-
 drivers/mtd/nand/nand_ids.c   |  217 +
 drivers/mtd/nand/nandsim.c|2 +-
 drivers/mtd/nand/pxa3xx_nand.c|2 +-
 drivers/mtd/nand/sm_common.c  |   61 +--
 include/linux/mtd/nand.h  |9 ++-
 9 files changed, 201 insertions(+), 131 deletions(-)




[PATCH V3 2/3] mtd: add a new table for the unparsable nand chips

2013-01-27 Thread Huang Shijie
We have four Toshiba NAND chips whose parameters cannot be parsed out of
the ID data.  Add a new table for such unparsable NAND chips.

Adding these entries to the nand_flash_ids table would make a mess:
each entry in nand_flash_ids stands for a class of NAND chips,
whereas the unparsable NAND chips are individual chips.

Signed-off-by: Huang Shijie 
---
 drivers/mtd/nand/nand_ids.c |   21 +
 include/linux/mtd/nand.h|1 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/nand/nand_ids.c b/drivers/mtd/nand/nand_ids.c
index 99949f6..4147c78 100644
--- a/drivers/mtd/nand/nand_ids.c
+++ b/drivers/mtd/nand/nand_ids.c
@@ -12,6 +12,27 @@
 #include 
 #include 
 
+/* This table uses the full ID data as the keyword. */
+struct nand_flash_dev nand_flash_full_ids[] = {
+   /* TOSHIBA */
+   {"TC58NVG2S0F 4G 3.3V 8-bit ",
+   {0x98, 0xdc, 0x90, 0x26, 0x76, 0x15, 0x01, 0x08},
+   SZ_4K, SZ_512, SZ_256K, 0, 8, 224},
+   {"TC58NVG3S0F 8G 3.3V 8-bit ",
+   {0x98, 0xd3, 0x90, 0x26, 0x76, 0x15, 0x02, 0x08},
+   SZ_4K, SZ_1K, SZ_256K, 0, 8, 232},
+   {"TC58NVG5D2 32G 3.3V 8-bit ",
+   {0x98, 0xd7, 0x94, 0x32, 0x76, 0x56, 0x09, 0x00},
+   SZ_8K, SZ_4K, SZ_1M, 0, 8, 640},
+   {"TC58NVG6D2 64G 3.3V 8-bit ",
+   {0x98, 0xde, 0x94, 0x82, 0x76, 0x56, 0x04, 0x20},
+   SZ_8K, SZ_8K, SZ_2M, 0, 8, 640},
+
+   /* end here */
+   {NULL,}
+};
+
+
 /*
 *  Chip ID list
 *
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index d8fd638..119c8e0 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -618,6 +618,7 @@ struct nand_manufacturers {
 };
 
 extern struct nand_flash_dev nand_flash_ids[];
+extern struct nand_flash_dev nand_flash_full_ids[];
 extern struct nand_manufacturers nand_manuf_ids[];
 
 extern int nand_scan_bbt(struct mtd_info *mtd, struct nand_bbt_descr *bd);
-- 
1.7.0.4




Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
>> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
>>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
 On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> With aim7 compute on 4 node 40 core box, I see stable throughput
> improvement at tasks = nr_cores and below w. balance and powersaving. 
>> ... 
 Ok, this is sick. How is balance and powersaving better than perf? Both
 have much more jobs per minute than perf; is that because we do pack
 much more tasks per cpu with balance and powersaving?
>>>
>>> Maybe it is due to the lazy balancing on balance/powersaving. You can
>>> check the CS times in /proc/pid/status.
>>
>> Well, it's not wakeup path, limiting entry frequency per waker did zip
>> squat nada to any policy throughput.
> 
> monteverdi:/abuild/mike/:[0]# echo powersaving > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043321  00058616
> 043313  00058616
> 043318  00058968
> 043317  00058968
> 043316  00059184
> 043319  00059192
> 043320  00059048
> 043314  00059048
> 043312  00058176
> 043315  00058184
> monteverdi:/abuild/mike/:[0]# echo balance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043337  00053448
> 04  00053456
> 043338  00052992
> 043331  00053448
> 043332  00053488
> 043335  00053496
> 043334  00053480
> 043329  00053288
> 043336  00053464
> 043330  00053496
> monteverdi:/abuild/mike/:[0]# echo performance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043348  00052488
> 043344  00052488
> 043349  00052744
> 043343  00052504
> 043347  00052504
> 043352  00052888
> 043345  00052504
> 043351  00052496
> 043346  00052496
> 043350  00052304
> monteverdi:/abuild/mike/:[0]#

similar with aim7 results. Thanks, Mike!

Would you like to collect vmstat info in the background?
> 
> Zzzt.  Wish I could turn turbo thingy off.

Do you mean the turbo mode of the CPU frequency? I remember some machines
can disable it in the BIOS.
> 
> -Mike
> 


-- 
Thanks Alex


[PATCH] perf: Add the rcu_read_lock to protect the list_for_each_entry_rcu.

2013-01-27 Thread Jun Chen

list_for_each_entry_rcu() should be guarded by rcu_read_lock(). This patch adds
rcu_read_lock() to protect the list_for_each_entry_rcu() walk.

Signed-off-by: Chen Jun 
---
 kernel/events/core.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 301079d..e2f2fa5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2442,6 +2442,7 @@ static void perf_adjust_freq_unthr_context(struct 
perf_event_context *ctx,
 
raw_spin_lock(&ctx->lock);
perf_pmu_disable(ctx->pmu);
+   rcu_read_lock();
 
list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
if (event->state != PERF_EVENT_STATE_ACTIVE)
@@ -2483,6 +2484,7 @@ static void perf_adjust_freq_unthr_context(struct 
perf_event_context *ctx,
event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
}
 
+   rcu_read_unlock();
perf_pmu_enable(ctx->pmu);
raw_spin_unlock(&ctx->lock);
 }
-- 
1.7.4.1





Re: [PATCH 0/4] mfd: palma: add RTC and GPIO support

2013-01-27 Thread Laxman Dewangan

On Monday 28 January 2013 02:58 AM, Samuel Ortiz wrote:

Hi Laxman,

On Sun, Jan 27, 2013 at 02:15:43PM +0530, Laxman Dewangan wrote:

On Sunday 27 January 2013 05:58 AM, Samuel Ortiz wrote:

Hi Laxman,

On Thu, Jan 03, 2013 at 04:16:56PM +0530, Laxman Dewangan wrote:

This series adds the RTC and GPIO drivers for the TI Palmas series PMIC.
The changes are split so that they are easy to apply in different subsystems.

Laxman Dewangan (4):
   mfd: palmas: add rtc irq number as irq resource for palmas-rtc
   mfd: palmas: add apis to access the Palmas' registers
   gpio: palmas: Add support for Palmas GPIO
   rtc: palmas: add RTC driver Palmas series PMIC

All 4 patches applied to my for-next branch, thanks.

Hi Samuel,

Thanks for taking care of this. But it seems 2/4 of this series was missed,
and it is required for 3/4 and 4/4 to compile.

Can you please apply the 2/4 before 3/4 and 4/4?

Patch #2 is applied, see:

http://git.kernel.org/?p=linux/kernel/git/sameo/mfd-2.6.git;a=commit;h=2c3a09ab5da9ef4438af445d8f051e92bac85616



Yes, it is there. Thank you very much. All patches are applied properly.


Thanks,
Laxman



Re: Doubts about listen backlog and tcp_max_syn_backlog

2013-01-27 Thread Vijay Subramanian
> +{ "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },
>
> (see the file debian/patches/CVS-20081003-statistics.c_sync.patch
>  in the net-tools src)
>
> i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
> that's counting TCPExtListenDrops.
>
> Theoretically, that number should be the same as that printed by nstat,
> as they are getting it from the same kernel stats counter. I have not
> looked at nstat code (I actually almost always dump the counters from
> /proc/net/{netstat + snmp} via a simple prettyprint script (will send
> you that offline).
>

nstat pretty much does what you describe, which is to parse the
/proc/net file(s) and print the contents. This is one advantage of
nstat over netstat: when you add a new MIB, you do not need to
update nstat.
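
To illustrate why no per-counter code is needed: /proc/net/netstat comes as
pairs of lines, a header line of counter names followed by a line of values,
both starting with the same prefix such as "TcpExt:". A generic lookup can
be sketched as below. This is not the nstat source, and lookup_counter() is
a name invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Skip spaces; return the next token and store its length in *len. */
static const char *next_token(const char *s, size_t *len)
{
    while (*s == ' ')
        s++;
    *len = strcspn(s, " \n");
    return s;
}

/*
 * Given a names line and the matching values line, find the counter
 * called key. Returns 0 and stores the value in *out, or -1 if the
 * lines do not belong together or the counter is not present.
 */
int lookup_counter(const char *names, const char *values,
                   const char *key, unsigned long long *out)
{
    size_t nlen, vlen, klen = strlen(key);
    const char *n = next_token(names, &nlen);
    const char *v = next_token(values, &vlen);

    /* Both lines must start with the same "Prefix:" token. */
    if (nlen == 0 || nlen != vlen || memcmp(n, v, nlen) != 0)
        return -1;

    for (;;) {
        n = next_token(n + nlen, &nlen);
        v = next_token(v + vlen, &vlen);
        if (nlen == 0 || vlen == 0)
            return -1;                  /* ran out of columns */
        if (nlen == klen && memcmp(n, key, klen) == 0) {
            *out = strtoull(v, NULL, 10);
            return 0;
        }
    }
}
```

Because the names are read from the file itself, a counter added to the
kernel later shows up without any change to a parser of this shape.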

> If the nstat and netstat counters don't match, something is fishy.
> That nstat output is broken.
>


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>> performance change found.
> 
> Ok, good, You could put that in one of the commit messages so that it is
> there and people know that this patchset doesn't cause perf regressions
> with the bunch of benchmarks.
> 
>> I also tested the balance and powersaving policies with the above benchmarks
>> and found that specjbb2005 drops 30~50% with both policies, whether
>> with openjdk or jrockit, and hackbench drops a lot with the powersaving
>> policy on snb 4 socket platforms. The others show no clear change.
> 
> I guess this is expected because there has to be some performance hit
> when saving power...
> 

BTW, I tested the v3 version based on sched numa, on tip/master.
There, specjbb drops only about 5~7% with the balance/powersaving policies.
The power scheduling is done after the numa scheduling logic.





RE: [PATCH 2/6] ARM: davinci: da850: add DT node for I2C0

2013-01-27 Thread Vishwanathrao Badarkhe, Manish
On Fri, Jan 25, 2013 at 16:20:13, Nori, Sekhar wrote:
> On 1/24/2013 5:05 PM, Vishwanathrao Badarkhe, Manish wrote:
> > Add I2C0 device tree node information to da850-evm.
> > Also, add I2C0 pin muxing information in da850-evm.
> > 
> > Signed-off-by: Vishwanathrao Badarkhe, Manish 
> > ---
> > Depends on patch
> > http://comments.gmane.org/gmane.linux.davinci/25993
> > 
> >  arch/arm/boot/dts/da850-evm.dts |   15 +++
> >  arch/arm/boot/dts/da850.dtsi|   10 ++
> >  2 files changed, 25 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/arm/boot/dts/da850-evm.dts 
> > b/arch/arm/boot/dts/da850-evm.dts index 8cac9d2..3d8290a 100755
> > --- a/arch/arm/boot/dts/da850-evm.dts
> > +++ b/arch/arm/boot/dts/da850-evm.dts
> > @@ -27,5 +27,20 @@
> > serial2: serial@1d0d000 {
> > status = "okay";
> > };
> > +   i2c0@1c22000 {
> 
> This should be
>   i2c0: i2c@1c22000
> 
> to follow the convention elsewhere in file.

Ok, I will change this in next version.

> 
> > +   status = "okay";
> > +   };
> > +   };
> > +};
> > +_core {
> > +   pinctrl-names = "default";
> > +   pinctrl-0 = <
> > +   &i2c0_pins
> > +   >;
> > +
> > +   i2c0_pins: pinmux_i2c0_pins{
> > +   pinctrl-single,bits = <
> > +   0x10 0x2200 0xff00  /* I2C0_SDA,I2C0_SCL */
> > +   >;
> 
> This should go into the dtsi file. See the discussion on NAND DT support 
> submitted by Anil Kumar.

I have seen Anil Kumar's discussion on pin muxing, which includes Linus's
patch to grab pin control handles from the device core, at the following
location:
http://lkml.indiana.edu/hypermail/linux/kernel/1301.2/00094.html

I made the corresponding changes for I2C0 pin muxing and saw the kernel
crash with a message like "i2c_davinci i2c_davinci.1: could not find pctldev
for node /soc/pinmux@1c14120/pinmux_i2c0_pins, deferring probe". This
happens because the I2C0 driver gets probed before the pin mux driver.

To resolve this issue, I changed the code to ensure that the pin control
driver gets probed before the I2C0 driver, by registering the pin control
driver during the arch_init call.

Hence, the above fix is required in order to move the I2C0 pin muxing into
the dtsi file.
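
As a rough illustration of the ordering dependency described above, here is
a minimal userspace sketch in C. It is not the davinci or pinctrl code: the
names pinctrl_ready, pinctrl_probe, i2c_probe and probe_all are hypothetical,
and registering a driver at an earlier initcall level is modelled simply as
placing it earlier in an array. EPROBE_DEFER is the error code a kernel
driver returns when something it depends on is not available yet.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517        /* same value as in the kernel's errno.h */

/* Hypothetical flag: set once the pin control driver has probed. */
static bool pinctrl_ready;

static int pinctrl_probe(void)
{
	pinctrl_ready = true;
	return 0;
}

/* The I2C probe needs its pins; without pinctrl it can only defer. */
static int i2c_probe(void)
{
	if (!pinctrl_ready)
		return -EPROBE_DEFER;
	return 0;
}

typedef int (*probe_fn)(void);

/* Probe drivers in array order; count how many had to defer. */
static int probe_all(const probe_fn *drv, size_t n)
{
	int deferred = 0;

	for (size_t i = 0; i < n; i++)
		if (drv[i]() == -EPROBE_DEFER)
			deferred++;
	return deferred;
}
```

Calling probe_all() with i2c_probe listed before pinctrl_probe reports one
deferral; with the order swapped, which is what earlier registration of the
pin control driver amounts to in this model, it reports none.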
> 
> Thanks,
> Sekhar
> 
> PS: You are using an old address for Kevin Hilman. The MAINTAINERS file has 
> been updated for a long time now. Liam's address is also wrong but I don't 
> have his updated e-mail.
> 


Regards, 
Manish

Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
> > On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> > > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> > >> With aim7 compute on 4 node 40 core box, I see stable throughput
> > >> improvement at tasks = nr_cores and below w. balance and powersaving. 
> ... 
> > > Ok, this is sick. How is balance and powersaving better than perf? Both
> > > have much more jobs per minute than perf; is that because we do pack
> > > much more tasks per cpu with balance and powersaving?
> > 
> > Maybe it is due to the lazy balancing on balance/powersaving. You can
> > check the CS times in /proc/pid/status.
> 
> Well, it's not wakeup path, limiting entry frequency per waker did zip
> squat nada to any policy throughput.

monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043321  00058616
043313  00058616
043318  00058968
043317  00058968
043316  00059184
043319  00059192
043320  00059048
043314  00059048
043312  00058176
043315  00058184
monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043337  00053448
04  00053456
043338  00052992
043331  00053448
043332  00053488
043335  00053496
043334  00053480
043329  00053288
043336  00053464
043330  00053496
monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043348  00052488
043344  00052488
043349  00052744
043343  00052504
043347  00052504
043352  00052888
043345  00052504
043351  00052496
043346  00052496
043350  00052304
monteverdi:/abuild/mike/:[0]#

Zzzt.  Wish I could turn turbo thingy off.

-Mike



RE: [PATCH] x86/apic: check FADT settings after enable x2apic

2013-01-27 Thread Wang, Song-Bo (Stoney)
Hi Yinghai, hpa and others,

Would you please review the patch on detecting x2apic FADT settings?

We have a system whose BIOS requests x2apic physical mode by setting the
ACPI_FADT_APIC_PHYSICAL bit in the FADT table.
For systems where every cpuid is < 255, the spec requires the BIOS to
default to xapic mode.
The kernel detects that default mode, does some initialization, then calls
enable_IR_x2apic and successfully changes the mode to x2apic.

So it is necessary to check the ACPI_FADT_APIC_PHYSICAL bit after the kernel
changes the mode from xapic to x2apic.

(*drv)->acpi_madt_oem_check is called when the default BIOS mode is detected;
(*drv)->probe is called after enable_IR_x2apic.

The previous FADT check (commit ea0dcf90) should be applied to
x2apic_phys_probe too.
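
To make the two check points concrete, here is a minimal userspace sketch in
C of the decision logic. It is a simplified model, not the kernel code:
struct apic_state and the function names are hypothetical, the kernel globals
are replaced by struct fields, and the "apic == &apic_x2apic_phys" test in
the real probe is left out. The two constants are stand-ins for the values
the kernel takes from the ACPICA headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values; the kernel gets these from the ACPICA headers. */
#define FADT2_REVISION_ID        3
#define ACPI_FADT_APIC_PHYSICAL  (1u << 19)

/* Hypothetical snapshot of the state the real code reads from globals. */
struct apic_state {
	bool     x2apic_enabled;   /* CPU is currently in x2apic mode */
	bool     x2apic_phys;      /* "x2apic_phys" given on the command line */
	uint8_t  fadt_revision;
	uint32_t fadt_flags;
};

/* True when the FADT asks for physical destination mode. */
static bool fadt_wants_phys(const struct apic_state *s)
{
	return s->fadt_revision >= FADT2_REVISION_ID &&
	       (s->fadt_flags & ACPI_FADT_APIC_PHYSICAL);
}

/* Early check: runs while the BIOS default mode is still in effect. */
static bool madt_oem_check(const struct apic_state *s)
{
	return s->x2apic_enabled && (s->x2apic_phys || fadt_wants_phys(s));
}

/* Late check: runs after the kernel may have switched to x2apic. */
static bool phys_probe(const struct apic_state *s)
{
	if (s->x2apic_enabled && s->x2apic_phys)
		return true;
	return s->x2apic_enabled && fadt_wants_phys(s);
}
```

With the FADT bit set and x2apic still disabled, madt_oem_check() returns
false. Once x2apic_enabled becomes true, phys_probe() returns true, which is
the case the early check alone would miss.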

Thanks,
Stoney

-Original Message-
From: Wang, Song-Bo (Stoney) 
Sent: Tuesday, January 15, 2013 9:51 AM
To: suresh.b.sid...@intel.com
Cc: Zhang, Lin-Bao (Linux Kernel R); Pearson, Greg; 
linux-kernel@vger.kernel.org; Wang, Song-Bo (Stoney)
Subject: [PATCH] x86/apic: check FADT settings after enable x2apic

The OS will enable x2apic mode even if the BIOS defaults to xapic mode.

The FADT settings check (commit ea0dcf903e7d76aa5d483d876215fedcfdfe140f)
should therefore also be applied after the default mode has been detected
and the mode has been changed (i.e. after enable_IR_x2apic has been called).

Signed-off-by: Stoney Wang 
---
 arch/x86/kernel/apic/x2apic_phys.c |   25 -
 1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_phys.c 
b/arch/x86/kernel/apic/x2apic_phys.c
index e03a1e1..76ea60d 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -20,18 +20,22 @@ static int set_x2apic_phys_mode(char *arg)
 }
 early_param("x2apic_phys", set_x2apic_phys_mode);
 
-static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
+static int x2apic_fadt_phys(void)
 {
-   if (x2apic_phys)
-   return x2apic_enabled();
-   else if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
-   (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) &&
-   x2apic_enabled()) {
+   if ((acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID) &&
+   (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) {
printk(KERN_DEBUG "System requires x2apic physical mode\n");
return 1;
}
-   else
-   return 0;
+   return 0;
+}
+
+static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id) 
+{
+   if (x2apic_enabled())
+   return x2apic_phys || x2apic_fadt_phys();
+
+   return 0;
 }
 
 static void
@@ -85,7 +89,10 @@ static int x2apic_phys_probe(void)
if (x2apic_mode && x2apic_phys)
return 1;
 
-   return apic == &apic_x2apic_phys;
+   if (apic == &apic_x2apic_phys)
+   return 1;
+
+   return x2apic_enabled() && x2apic_fadt_phys();
 }
 
 static struct apic apic_x2apic_phys = {
--
1.7.1



RE: [PATCH 5/6] ARM: regulator: add tps6507x device tree data

2013-01-27 Thread Vishwanathrao Badarkhe, Manish
On Sat, Jan 26, 2013 at 10:42:08, Mark Brown wrote:
> On Fri, Jan 25, 2013 at 06:29:49AM +, Vishwanathrao Badarkhe, Manish 
> wrote:
> > On Thu, Jan 24, 2013 at 17:30:51, Mark Brown wrote:
> 
> > I too doubt that whether it should be in architecture specific folder,
> 
> > My code is in reference to below patch:
> > arm/dts: regulator: Add tps65910 device tree 
> > data(d5d08e2e1672da627d7c9d34a9dc1089c653e23a)
> 
> > Could you please suggest me if it can be moved somewhere else?
> 
> We should have somewhere to put this sort of generic stuff, yes.  Not sure 
> where, possibly under drivers/of or some non-drivers part of the tree.
> 

Right now, nobody has put this kind of generic stuff anywhere other than
the architecture folder.
Later on, we can move all of this generic stuff somewhere else in one shot.


Regards, 
Manish


[PATCH Resend Resend] smp:Fix use un-initialized cpumask_ipi

2013-01-27 Thread Wang YanQing
Commit c7b798525b50256c8084215a139fa40b0114bfcc
[smp: Fix SMP function call empty cpu mask race]
uses the uninitialized variable cpumask_ipi when
CONFIG_CPUMASK_OFFSTACK is enabled.

Signed-off-by: Wang YanQing 
---
 I am sorry for missing it the first time.

 kernel/smp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 7c56aba..af86f5e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -57,6 +57,9 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, 
void *hcpu)
if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL,
cpu_to_node(cpu)))
return notifier_from_errno(-ENOMEM);
+   if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, GFP_KERNEL,
+   cpu_to_node(cpu)))
+   return notifier_from_errno(-ENOMEM);
break;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -66,6 +69,7 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, 
void *hcpu)
case CPU_DEAD:
case CPU_DEAD_FROZEN:
free_cpumask_var(cfd->cpumask);
+   free_cpumask_var(cfd->cpumask_ipi);
break;
 #endif
};
-- 
1.7.11.1.116.g8228a23
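
The pattern the patch restores, where every mask allocated on CPU bring-up
is also freed on CPU death, can be sketched in plain userspace C. This is an
illustration only: struct cfd_sketch, cfd_prepare and cfd_dead are
hypothetical names, calloc/free stand in for zalloc_cpumask_var_node and
free_cpumask_var, and the rollback of the first allocation when the second
one fails is an addition of this sketch that the patch itself does not make.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU data in kernel/smp.c. */
struct cfd_sketch {
	unsigned long *cpumask;
	unsigned long *cpumask_ipi;
};

/* Bring-up: allocate both masks zeroed, or leave neither allocated. */
static int cfd_prepare(struct cfd_sketch *cfd, size_t words)
{
	cfd->cpumask = calloc(words, sizeof(*cfd->cpumask));
	if (!cfd->cpumask)
		return -ENOMEM;

	cfd->cpumask_ipi = calloc(words, sizeof(*cfd->cpumask_ipi));
	if (!cfd->cpumask_ipi) {
		/* Sketch-only rollback so no half-initialized state remains. */
		free(cfd->cpumask);
		cfd->cpumask = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Teardown: free everything that cfd_prepare() allocated. */
static void cfd_dead(struct cfd_sketch *cfd)
{
	free(cfd->cpumask);
	free(cfd->cpumask_ipi);
	cfd->cpumask = NULL;
	cfd->cpumask_ipi = NULL;
}
```

A caller would pair cfd_prepare(&cfd, words) with a later cfd_dead(&cfd);
after a successful prepare both masks are valid and zero-filled.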


Re: Linux 3.0.61

2013-01-27 Thread Greg KH
diff --git a/Makefile b/Makefile
index 3359fcf..2d64957 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 VERSION = 3
 PATCHLEVEL = 0
-SUBLEVEL = 60
+SUBLEVEL = 61
 EXTRAVERSION =
 NAME = Sneaky Weasel
 
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0310da6..1d44903 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_TRAPS_H
 #define _ASM_X86_TRAPS_H
 
+#include 
 #include 
 #include/* TRAP_TRACE, ... */
 
@@ -87,4 +88,29 @@ asmlinkage void smp_thermal_interrupt(void);
 asmlinkage void mce_threshold_interrupt(void);
 #endif
 
+/* Interrupts/Exceptions */
+enum {
+   X86_TRAP_DE = 0,/*  0, Divide-by-zero */
+   X86_TRAP_DB,/*  1, Debug */
+   X86_TRAP_NMI,   /*  2, Non-maskable Interrupt */
+   X86_TRAP_BP,/*  3, Breakpoint */
+   X86_TRAP_OF,/*  4, Overflow */
+   X86_TRAP_BR,/*  5, Bound Range Exceeded */
+   X86_TRAP_UD,/*  6, Invalid Opcode */
+   X86_TRAP_NM,/*  7, Device Not Available */
+   X86_TRAP_DF,/*  8, Double Fault */
+   X86_TRAP_OLD_MF,/*  9, Coprocessor Segment Overrun */
+   X86_TRAP_TS,/* 10, Invalid TSS */
+   X86_TRAP_NP,/* 11, Segment Not Present */
+   X86_TRAP_SS,/* 12, Stack Segment Fault */
+   X86_TRAP_GP,/* 13, General Protection Fault */
+   X86_TRAP_PF,/* 14, Page Fault */
+   X86_TRAP_SPURIOUS,  /* 15, Spurious Interrupt */
+   X86_TRAP_MF,/* 16, x87 Floating-Point Exception */
+   X86_TRAP_AC,/* 17, Alignment Check */
+   X86_TRAP_MC,/* 18, Machine Check */
+   X86_TRAP_XF,/* 19, SIMD Floating-Point Exception */
+   X86_TRAP_IRET = 32, /* 32, IRET Exception */
+};
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 431ab11..65976cb 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -991,6 +991,9 @@ static int acpi_processor_setup_cpuidle(struct 
acpi_processor *pr)
return -EINVAL;
}
 
+   if (!dev)
+   return -EINVAL;
+
dev->cpu = pr->id;
for (i = 0; i < CPUIDLE_STATE_MAX; i++) {
dev->states[i].name[0] = '\0';
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 8300250..75a8d0f 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -402,6 +402,12 @@ static const struct pci_device_id ahci_pci_tbl[] = {
/* Promise */
{ PCI_VDEVICE(PROMISE, 0x3f20), board_ahci },   /* PDC42819 */
 
+   /* Asmedia */
+   { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci },   /* ASM1060 */
+   { PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci },   /* ASM1060 */
+   { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci },   /* ASM1061 */
+   { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },   /* ASM1062 */
+
/* Generic, PCI class code for AHCI */
{ PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
  PCI_CLASS_STORAGE_SATA_AHCI, 0xff, board_ahci },
diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
index d845dc4..6e33926 100644
--- a/drivers/dma/ioat/dma_v3.c
+++ b/drivers/dma/ioat/dma_v3.c
@@ -949,7 +949,7 @@ static int __devinit ioat_xor_val_self_test(struct 
ioatdma_device *device)
goto free_resources;
}
}
-   dma_sync_single_for_device(dev, dest_dma, PAGE_SIZE, DMA_TO_DEVICE);
+   dma_sync_single_for_device(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
 
/* skip validate if the capability is not present */
if (!dma_has_cap(DMA_XOR_VAL, dma_chan->device->cap_mask))
diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 02a52d1..66b6315 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -16,6 +16,7 @@
  */
 static char dmi_empty_string[] = "";
 
+static u16 __initdata dmi_ver;
 /*
  * Catch too early calls to dmi_check_system():
  */
@@ -118,12 +119,12 @@ static int __init dmi_walk_early(void (*decode)(const 
struct dmi_header *,
return 0;
 }
 
-static int __init dmi_checksum(const u8 *buf)
+static int __init dmi_checksum(const u8 *buf, u8 len)
 {
u8 sum = 0;
int a;
 
-   for (a = 0; a < 15; a++)
+   for (a = 0; a < len; a++)
sum += buf[a];
 
return sum == 0;
@@ -161,8 +162,10 @@ static void __init dmi_save_uuid(const struct dmi_header 
*dm, int slot, int inde
return;
 
for (i = 0; i < 16 && (is_ff || is_00); i++) {
-   if(d[i] != 0x00) is_ff = 0;
-   if(d[i] != 0xFF) is_00 = 0;
+   if (d[i] != 0x00)
+   is_00 = 0;
+   if (d[i] != 0xFF)
+   is_ff = 0;
}
 
   

Linux 3.0.61

2013-01-27 Thread Greg KH
I'm announcing the release of the 3.0.61 kernel.

All users of the 3.0 kernel series must upgrade.

The updated 3.0.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
linux-3.0.y
and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h



 Makefile   |2 
 arch/x86/include/asm/traps.h   |   26 +
 drivers/acpi/processor_idle.c  |3 +
 drivers/ata/ahci.c |6 ++
 drivers/dma/ioat/dma_v3.c  |2 
 drivers/firmware/dmi_scan.c|   78 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   21 +++
 drivers/gpu/drm/i915/i915_reg.h|3 +
 drivers/gpu/drm/i915/intel_display.c   |4 +
 drivers/misc/sgi-xp/xpc_main.c |   34 +++-
 drivers/pci/pcie/aspm.c|3 +
 drivers/scsi/sd.c  |   13 ++--
 drivers/staging/usbip/usbip_common.c   |   11 +---
 drivers/staging/usbip/usbip_common.h   |2 
 drivers/staging/usbip/vhci_rx.c|3 -
 drivers/tty/serial/8250.c  |2 
 drivers/usb/host/uhci-hcd.c|   15 +++--
 include/linux/pci_ids.h|2 
 kernel/trace/ftrace.c  |2 
 19 files changed, 188 insertions(+), 44 deletions(-)

Alan Cox (1):
  ahci: Add identifiers for ASM106x devices

Alan Stern (1):
  USB: UHCI: fix IRQ race during initialization

Bart Westgeest (1):
  staging: usbip: changed function return type to void

Chris Wilson (1):
  drm/i915: Invalidate the relocation presumed_offsets along the slow path

Colin Ian King (1):
  PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported

Daniel Vetter (1):
  drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled

Greg Kroah-Hartman (1):
  Linux 3.0.61

Jiri Slaby (1):
  serial: 8250, increase PASS_LIMIT

Joel D. Diaz (1):
  SCSI: sd: Reshuffle init_sd to avoid crash

Kees Cook (1):
  x86: Use enum instead of literals for trap values [PARTIAL]

Konrad Rzeszutek Wilk (1):
  ACPI / cpuidle: Fix NULL pointer issues when cpuidle is disabled

Robin Holt (1):
  SGI-XP: handle non-fatal traps

Shuah Khan (1):
  ioat: Fix DMA memory sync direction correct flag

Steven Rostedt (1):
  ftrace: Be first to run code modification on modules

Zhenzhong Duan (2):
  drivers/firmware/dmi_scan.c: check dmi version when get system uuid
  drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists
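
For readers who want the gist of the dmi_scan.c hunks in the accompanying
diff, here is a small userspace sketch in C of the two helpers they touch:
a checksum over a caller-supplied length instead of a fixed 15 bytes, and
the all-0x00 / all-0xFF UUID test with flag names that match what they
track. The function names checksum_ok and uuid_is_unset are hypothetical;
only the logic mirrors the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A table passes when the 8-bit sum of its first len bytes is zero. */
static bool checksum_ok(const uint8_t *buf, uint8_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum == 0;
}

/* A UUID made only of 0x00 bytes or only of 0xFF bytes is not saved. */
static bool uuid_is_unset(const uint8_t d[16])
{
	bool is_ff = true, is_00 = true;

	for (int i = 0; i < 16 && (is_ff || is_00); i++) {
		if (d[i] != 0x00)
			is_00 = false;
		if (d[i] != 0xFF)
			is_ff = false;
	}
	return is_ff || is_00;
}
```

The bytes 0x10, 0x20, 0x30, 0xA0 sum to 0x100, so checksum_ok() accepts all
four but rejects the first three alone, which shows why the length has to
match the structure being checked.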





Re: Linux 3.4.28

2013-01-27 Thread Greg KH

diff --git a/Makefile b/Makefile
index f139ce7..8ccebba 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 VERSION = 3
 PATCHLEVEL = 4
-SUBLEVEL = 27
+SUBLEVEL = 28
 EXTRAVERSION =
 NAME = Saber-toothed Squirrel
 
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index f3decb3..6cba428 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1018,6 +1018,9 @@ static int acpi_processor_setup_cpuidle_cx(struct 
acpi_processor *pr)
return -EINVAL;
}
 
+   if (!dev)
+   return -EINVAL;
+
dev->cpu = pr->id;
 
if (max_cstate == 0)
@@ -1205,6 +1208,7 @@ int acpi_processor_cst_has_changed(struct acpi_processor 
*pr)
}
 
/* Populate Updated C-state information */
+   acpi_processor_get_power_info(pr);
acpi_processor_setup_cpuidle_states(pr);
 
/* Enable all cpuidle devices */
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 93cbc44..71a4d04 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -53,6 +53,7 @@
 
 enum {
AHCI_PCI_BAR_STA2X11= 0,
+   AHCI_PCI_BAR_ENMOTUS= 2,
AHCI_PCI_BAR_STANDARD   = 5,
 };
 
@@ -405,7 +406,13 @@ static const struct pci_device_id ahci_pci_tbl[] = {
{ PCI_VDEVICE(PROMISE, 0x3f20), board_ahci },   /* PDC42819 */
 
/* Asmedia */
-   { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },   /* ASM1061 */
+   { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci },   /* ASM1060 */
+   { PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci },   /* ASM1060 */
+   { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci },   /* ASM1061 */
+   { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },   /* ASM1062 */
+
+   /* Enmotus */
+   { PCI_DEVICE(0x1c44, 0x8000), board_ahci },
 
/* Generic, PCI class code for AHCI */
{ PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
@@ -1079,9 +1086,11 @@ static int ahci_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
dev_info(&pdev->dev,
 "PDC42819 can only drive SATA devices with this 
driver\n");
 
-   /* The Connext uses non-standard BAR */
+   /* Both Connext and Enmotus devices use non-standard BARs */
if (pdev->vendor == PCI_VENDOR_ID_STMICRO && pdev->device == 0xCC06)
ahci_pci_bar = AHCI_PCI_BAR_STA2X11;
+   else if (pdev->vendor == 0x1c44 && pdev->device == 0x8000)
+   ahci_pci_bar = AHCI_PCI_BAR_ENMOTUS;
 
/* acquire resources */
rc = pcim_enable_device(pdev);
diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
index f7f1dc6..ed0e8b7 100644
--- a/drivers/dma/ioat/dma_v3.c
+++ b/drivers/dma/ioat/dma_v3.c
@@ -951,7 +951,7 @@ static int __devinit ioat_xor_val_self_test(struct 
ioatdma_device *device)
goto free_resources;
}
}
-   dma_sync_single_for_device(dev, dest_dma, PAGE_SIZE, DMA_TO_DEVICE);
+   dma_sync_single_for_device(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
 
/* skip validate if the capability is not present */
if (!dma_has_cap(DMA_XOR_VAL, dma_chan->device->cap_mask))
diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index b298158..fd3ae62 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -16,6 +16,7 @@
  */
 static char dmi_empty_string[] = "";
 
+static u16 __initdata dmi_ver;
 /*
  * Catch too early calls to dmi_check_system():
  */
@@ -118,12 +119,12 @@ static int __init dmi_walk_early(void (*decode)(const 
struct dmi_header *,
return 0;
 }
 
-static int __init dmi_checksum(const u8 *buf)
+static int __init dmi_checksum(const u8 *buf, u8 len)
 {
u8 sum = 0;
int a;
 
-   for (a = 0; a < 15; a++)
+   for (a = 0; a < len; a++)
sum += buf[a];
 
return sum == 0;
@@ -161,8 +162,10 @@ static void __init dmi_save_uuid(const struct dmi_header 
*dm, int slot, int inde
return;
 
for (i = 0; i < 16 && (is_ff || is_00); i++) {
-   if(d[i] != 0x00) is_ff = 0;
-   if(d[i] != 0xFF) is_00 = 0;
+   if (d[i] != 0x00)
+   is_00 = 0;
+   if (d[i] != 0xFF)
+   is_ff = 0;
}
 
if (is_ff || is_00)
@@ -172,7 +175,15 @@ static void __init dmi_save_uuid(const struct dmi_header 
*dm, int slot, int inde
if (!s)
return;
 
-   sprintf(s, "%pUB", d);
+   /*
+* As of version 2.6 of the SMBIOS specification, the first 3 fields of
+* the UUID are supposed to be little-endian encoded.  The specification
+* says that this is the defacto standard.
+*/
+   if (dmi_ver >= 0x0206)
+   sprintf(s, "%pUL", d);
+   else
+   sprintf(s, "%pUB", d);
 
 dmi_ident[slot] = s;
 }
@@ -404,29 +415,57 @@ static int 

Linux 3.4.28

2013-01-27 Thread Greg KH
I'm announcing the release of the 3.4.28 kernel.

All users of the 3.4 kernel series must upgrade.

The updated 3.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
linux-3.4.y
and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h



 Makefile   |2 
 drivers/acpi/processor_idle.c  |4 +
 drivers/ata/ahci.c |   13 
 drivers/dma/ioat/dma_v3.c  |2 
 drivers/firmware/dmi_scan.c|   78 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   21 +++
 drivers/gpu/drm/i915/i915_reg.h|3 +
 drivers/gpu/drm/i915/intel_display.c   |4 +
 drivers/pci/hotplug/pciehp.h   |2 
 drivers/pci/hotplug/pciehp_core.c  |   11 
 drivers/pci/hotplug/pciehp_ctrl.c  |8 +-
 drivers/pci/hotplug/pciehp_hpc.c   |   11 +++-
 drivers/pci/hotplug/shpchp.h   |1 
 drivers/pci/hotplug/shpchp_core.c  |   10 ---
 drivers/pci/hotplug/shpchp_ctrl.c  |2 
 drivers/pci/pcie/aer/aerdrv_core.c |1 
 drivers/pci/pcie/aspm.c|3 +
 drivers/scsi/sd.c  |   13 ++--
 drivers/usb/dwc3/gadget.c  |1 
 drivers/usb/host/uhci-hcd.c|   15 +++--
 include/linux/sched.h  |   11 +++-
 kernel/ptrace.c|   72 +-
 kernel/sched/core.c|3 -
 kernel/signal.c|   19 +++
 kernel/trace/ftrace.c  |2 
 security/integrity/evm/evm_crypto.c|4 -
 sound/usb/endpoint.c   |6 --
 27 files changed, 230 insertions(+), 92 deletions(-)

Alan Cox (1):
  ahci: Add identifiers for ASM106x devices

Alan Stern (1):
  USB: UHCI: fix IRQ race during initialization

Betty Dall (1):
  PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put()

Bjorn Helgaas (1):
  PCI: shpchp: Handle push button event asynchronously

Chris Wilson (1):
  drm/i915: Invalidate the relocation presumed_offsets along the slow path

Colin Ian King (1):
  PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported

Daniel Vetter (1):
  drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled

Dmitry Kasatkin (1):
  evm: checking if removexattr is not a NULL

Greg Kroah-Hartman (1):
  Linux 3.4.28

Hugh Daschbach (1):
  libata: ahci: Add support for Enmotus Bobcat device.

Joel D. Diaz (1):
  SCSI: sd: Reshuffle init_sd to avoid crash

Konrad Rzeszutek Wilk (1):
  ACPI / cpuidle: Fix NULL pointer issues when cpuidle is disabled

Oleg Nesterov (3):
  ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up()
  ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL
  wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED 
task

Pratyush Anand (1):
  usb: dwc3: gadget: fix ep->maxburst for ep0

Shuah Khan (1):
  ioat: Fix DMA memory sync direction correct flag

Steven Rostedt (1):
  ftrace: Be first to run code modification on modules

Takashi Iwai (1):
  ALSA: usb-audio: Fix regression by disconnection-race-fix patch

Thomas Schlichter (1):
  ACPI / processor: Get power info before updating the C-states

Yijing Wang (1):
  PCI: pciehp: Use per-slot workqueues to avoid deadlock

Zhenzhong Duan (2):
  drivers/firmware/dmi_scan.c: check dmi version when get system uuid
  drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists





Linux 3.7.5

2013-01-27 Thread Greg KH
I'm announcing the release of the 3.7.5 kernel.

All users of the 3.7 kernel series must upgrade.

The updated 3.7.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
linux-3.7.y
and can be browsed at the normal kernel.org git web browser:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary

thanks,

greg k-h



 Makefile   |4 
 arch/arm64/include/asm/elf.h   |5 
 arch/x86/kernel/cpu/perf_event.c   |6 -
 arch/x86/kernel/step.c |9 -
 drivers/acpi/processor_idle.c  |4 
 drivers/acpi/processor_perflib.c   |7 +
 drivers/ata/ahci.c |8 +
 drivers/ata/libahci.c  |6 -
 drivers/ata/libata-core.c  |   22 ++--
 drivers/ata/libata-eh.c|2 
 drivers/block/virtio_blk.c |7 +
 drivers/cpufreq/Kconfig.x86|2 
 drivers/cpufreq/acpi-cpufreq.c |7 +
 drivers/dma/ioat/dma_v3.c  |2 
 drivers/dma/tegra20-apb-dma.c  |8 +
 drivers/firmware/dmi_scan.c|   78 ---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   21 
 drivers/gpu/drm/i915/i915_reg.h|1 
 drivers/gpu/drm/i915/intel_pm.c|4 
 drivers/i2c/busses/i2c-mxs.c   |2 
 drivers/idle/intel_idle.c  |3 
 drivers/media/usb/gspca/kinect.c   |1 
 drivers/misc/ti-st/st_kim.c|   37 +++
 drivers/pci/hotplug/pciehp.h   |2 
 drivers/pci/hotplug/pciehp_core.c  |   11 --
 drivers/pci/hotplug/pciehp_ctrl.c  |8 -
 drivers/pci/hotplug/pciehp_hpc.c   |   11 +-
 drivers/pci/hotplug/shpchp.h   |3 
 drivers/pci/hotplug/shpchp_core.c  |   35 ++
 drivers/pci/hotplug/shpchp_ctrl.c  |6 -
 drivers/pci/pcie/aer/aerdrv_core.c |1 
 drivers/pci/pcie/aspm.c|3 
 drivers/scsi/sd.c  |   13 +-
 drivers/usb/dwc3/gadget.c  |1 
 drivers/usb/gadget/f_fs.c  |6 -
 drivers/usb/host/uhci-hcd.c|   15 +-
 drivers/usb/musb/cppi_dma.c|4 
 drivers/vfio/pci/vfio_pci_rdwr.c   |4 
 include/linux/ata.h|8 -
 include/linux/libata.h |4 
 include/linux/module.h |   10 -
 include/linux/sched.h  |   11 +-
 init/do_mounts_initrd.c|4 
 init/main.c|4 
 kernel/async.c |   27 +++--
 kernel/debug/kdb/kdb_main.c|2 
 kernel/module.c|  147 +++--
 kernel/ptrace.c|   72 +++---
 kernel/sched/core.c|3 
 kernel/signal.c|   19 +--
 kernel/trace/ftrace.c  |2 
 lib/bug.c  |1 
 security/integrity/evm/evm_crypto.c|4 
 sound/pci/hda/patch_conexant.c |9 +
 sound/pci/hda/patch_realtek.c  |1 
 tools/perf/Makefile|2 
 56 files changed, 491 insertions(+), 208 deletions(-)

Al Viro (1):
  make sure that /linuxrc has std{in,out,err}

Alan Stern (1):
  USB: UHCI: fix IRQ race during initialization

Alex Williamson (1):
  vfio-pci: Fix buffer overfill

Alexander Graf (1):
  virtio-blk: Don't free ida when disk is in use

Benoit Goby (1):
  usb: gadget: FunctionFS: Fix missing braces in parse_opts

Betty Dall (1):
  PCI/AER: pci_get_domain_bus_and_slot() call missing required pci_dev_put()

Bian Yu (1):
  libata: ahci: Fix lack of command retry after a success error handler.

Bjorn Helgaas (2):
  PCI: shpchp: Handle push button event asynchronously
  PCI: shpchp: Use per-slot workqueues to avoid deadlock

Borislav Petkov (1):
  powernow-k8: Add a kconfig dependency on acpi-cpufreq

Chris Wilson (1):
  drm/i915: Invalidate the relocation presumed_offsets along the slow path

Colin Ian King (1):
  PCI: Allow pcie_aspm=force even when FADT indicates it is unsupported

Daniel Vetter (1):
  drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled

David Ahern (1):
  perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side

David Henningsson (1):
  ALSA: hda - Fix mute led for another HP machine

Dmitry Kasatkin (1):
  evm: checking if removexattr is not a NULL

Fabio Estevam (1):
  i2c: mxs: Fix type of error code

Greg Kroah-Hartman (1):
  Linux 3.7.5

Hugh Daschbach (1):
  libata: ahci: Add support for Enmotus Bobcat device.

Jacob Schloss (1):
  media: 

[PATCH Resend] smp:Fix use un-initialized cpumask_ipi

2013-01-27 Thread Wang YanQing
Commit c7b798525b50256c8084215a139fa40b0114bfcc
[smp: Fix SMP function call empty cpu mask race]
uses the uninitialized variable cpumask_ipi when
CONFIG_CPUMASK_OFFSTACK is enabled.

Signed-off-by: Wang YanQing 
---
 I am sorry for missing it the first time.

 kernel/smp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 7c56aba..af86f5e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -57,6 +57,9 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, 
void *hcpu)
if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL,
cpu_to_node(cpu)))
return notifier_from_errno(-ENOMEM);
+   if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, GFP_KERNEL,
+   cpu_to_node(cpu)))
+   return notifier_from_errno(-ENOMEM);
break;
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -66,6 +69,7 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, 
void *hcpu)
case CPU_DEAD:
case CPU_DEAD_FROZEN:
free_cpumask_var(cfd->cpumask);
+   free_cpumask_var(cfd->cpumask_ipi);
break;
 #endif
};
-- 
1.7.11.1.116.g8228a23


Re: [ 11/15] ahci: Add identifiers for ASM106x devices

2013-01-27 Thread Greg Kroah-Hartman
On Fri, Jan 25, 2013 at 05:00:48PM -0500, Abdallah Chatila wrote:
> On Fri, Jan 25, 2013 at 01:45:21PM -0700, Jerry Snitselaar wrote:
> > 
> > There is a whitespace error in this patch:
> > 
> > Applying: ahci: Add identifiers for ASM106x devices
> > /root/linux/linux/.git/rebase-apply/patch:12: space before tab in indent.
> > /* Asmedia */
> > warning: 1 line adds whitespace errors.
> 
> Will send a new patch to remove the extra whitespace before the comment
> shortly.

I've edited it by hand and fixed it up.

thanks,

greg k-h


Re: [PATCH 4/6] cpufreq: Add a get_current_driver helper

2013-01-27 Thread Viresh Kumar
On Sun, Jan 20, 2013 at 3:54 PM, Borislav Petkov  wrote:
> From: Borislav Petkov 
>
> Add a helper function to return cpufreq_driver->name.
>
> Signed-off-by: Borislav Petkov 
> ---
>  drivers/cpufreq/cpufreq.c | 14 ++
>  include/linux/cpufreq.h   |  1 +
>  2 files changed, 15 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 1f93dbd72355..6ed3c1377caf 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1386,6 +1386,20 @@ static struct syscore_ops cpufreq_syscore_ops = {
> .resume = cpufreq_bp_resume,
>  };
>
> +/**
> + * cpufreq_get_current_driver - return current driver's name
> + *
> + * Return the name string of the currently loaded cpufreq driver
> + * or NULL, if none.
> + */
> +const char *cpufreq_get_current_driver(void)
> +{
> +   if (cpufreq_driver)
> +   return cpufreq_driver->name;
> +
> +   return NULL;
> +}
> +EXPORT_SYMBOL_GPL(cpufreq_get_current_driver);
>
>  /*
>   * NOTIFIER LISTS INTERFACE  *
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index a55b88eaf96a..a018da2d2a7c 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -407,4 +407,5 @@ void cpufreq_frequency_table_get_attr(struct 
> cpufreq_frequency_table *table,
>   unsigned int cpu);
>
>  void cpufreq_frequency_table_put_attr(unsigned int cpu);
> +extern const char *cpufreq_get_current_driver(void);

Two minor things here:
- You placed the routine in the wrong place. This spot is meant for
freq_table helpers.
- And you really don't need extern for function prototypes.

And after fixing these, feel free to add

Acked-by: Viresh Kumar 
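
The two review points can be illustrated with a small userspace sketch.
The struct, the global pointer and sketch_set_driver() below are stand-ins
invented for this example (assumptions, not kernel code); only the shape of
the helper follows the patch. In C, function declarations have external
linkage by default, so the prototype reads the same without "extern".

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct cpufreq_driver. */
struct cpufreq_driver_sketch {
	const char *name;
};

/*
 * Prototype as it could appear in a header: no 'extern' needed,
 * because function declarations have external linkage by default.
 */
const char *cpufreq_get_current_driver(void);

/* Stand-in for the global driver pointer; NULL means no driver loaded. */
static struct cpufreq_driver_sketch *cpufreq_driver;

/* Hypothetical helper for this sketch only: set or clear the driver. */
void sketch_set_driver(struct cpufreq_driver_sketch *drv)
{
	cpufreq_driver = drv;
}

/* Return the current driver's name, or NULL if none is registered. */
const char *cpufreq_get_current_driver(void)
{
	if (cpufreq_driver)
		return cpufreq_driver->name;

	return NULL;
}
```

Callers only need to handle the NULL case; nothing else about the driver
is exposed through this helper.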


Re: [PATCH 3/3] i2c-i801: SMBus patch for Intel Avoton DeviceIDs

2013-01-27 Thread Wolfram Sang

> @@ -166,6 +167,7 @@
>  #define PCI_DEVICE_ID_INTEL_5_3400_SERIES_SMBUS  0x3b30
>  #define PCI_DEVICE_ID_INTEL_LYNXPOINT_SMBUS  0x8c22
>  #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_SMBUS   0x9c22
> +#define PCI_DEVICE_ID_INTEL_AVOTON_SMBUS 0x1f3c

This seems to be sorted, please stick to that.

-- 
Pengutronix e.K.   | Wolfram Sang|
Industrial Linux Solutions | http://www.pengutronix.de/  |



[tip:x86/efi] x86, build: Dynamically find entry points in compressed startup code

2013-01-27 Thread tip-bot for David Woodhouse
Commit-ID:  99f857db8857aff691c51302f93648263ed07eb1
Gitweb: http://git.kernel.org/tip/99f857db8857aff691c51302f93648263ed07eb1
Author: David Woodhouse 
AuthorDate: Thu, 10 Jan 2013 14:31:59 +
Committer:  H. Peter Anvin 
CommitDate: Sun, 27 Jan 2013 20:19:37 -0800

x86, build: Dynamically find entry points in compressed startup code

We have historically hard-coded entry points in head.S just so it's easy
to build the executable/bzImage headers with references to them.

Unfortunately, this leads to boot loaders abusing these "known" addresses
even when they are *explicitly* told that they "should look at the ELF
header to find this address, as it may change in the future". And even
when the address in question *has* actually been changed in the past,
without fanfare or thought to compatibility.

Thus we have bootloaders doing stunningly broken things like jumping
to offset 0x200 in the kernel startup code in 64-bit mode, *hoping*
that startup_64 is still there (it has moved at least once
before). And hoping that it's actually a 64-bit kernel despite the
fact that we don't give them any indication of that fact.

This patch should hopefully remove the temptation to abuse internal
addresses in future, where sternly worded comments have not sufficed.
Instead of having hard-coded addresses and saying "please don't abuse
these", we actually pull the addresses out of the ELF payload into
zoffset.h, and make build.c shove them back into the right places in
the bzImage header.

Rather than including zoffset.h into build.c and thus having to rebuild
the tool for every kernel build, we parse it instead. The parsing code
is small and simple.

This patch doesn't actually move any of the interesting entry points, so
any offending bootloader will still continue to "work" after this patch
is applied. For some version of "work" which includes jumping into the
compressed payload and crashing, if the bzImage it's given is a 32-bit
kernel. No change there then.

[ hpa: some of the issues in the description are addressed or
  retconned by the 2.12 boot protocol.  This patch has been edited to
  only remove fixed addresses that were *not* thus retconned. ]
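
As a rough illustration of why parsing zoffset.h can stay small: the sed
rule in the Makefile emits lines of the form "#define ZO_<symbol> 0x<hex>",
so a prefix match plus strtoul() is enough to recover an address. The
function name parse_zoffset_line() is made up for this sketch; it is not
claimed to be the code that went into build.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Try to parse one line of the form "#define ZO_<sym> 0x<hex>".
 * Returns 1 and stores the value in *out on a match, 0 otherwise
 * (including when the symbol name is too long for the local buffer).
 */
int parse_zoffset_line(const char *line, const char *sym, unsigned long *out)
{
	char prefix[128];
	int n;

	/* The trailing space keeps "startup_3" from matching "startup_32". */
	n = snprintf(prefix, sizeof(prefix), "#define ZO_%s ", sym);
	if (n < 0 || (size_t)n >= sizeof(prefix))
		return 0;

	if (strncmp(line, prefix, (size_t)n) != 0)
		return 0;

	/* With base 16, strtoul() accepts an optional "0x" prefix. */
	*out = strtoul(line + n, NULL, 16);
	return 1;
}
```

A build tool could call this once per line and per symbol of interest,
leaving *out untouched for lines that do not match.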

Signed-off-by: David Woodhouse 
Link: http://lkml.kernel.org/r/1358513837.2397.247.ca...@shinybook.infradead.org
Signed-off-by: H. Peter Anvin 
Cc: Matt Fleming 
---
 arch/x86/boot/Makefile |  4 +-
 arch/x86/boot/compressed/head_32.S |  6 +--
 arch/x86/boot/compressed/head_64.S |  8 ++--
 arch/x86/boot/tools/build.c| 81 +-
 4 files changed, 72 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index ccce0ed..379814b 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -71,7 +71,7 @@ GCOV_PROFILE := n
 $(obj)/bzImage: asflags-y  := $(SVGA_MODE)
 
 quiet_cmd_image = BUILD   $@
-cmd_image = $(obj)/tools/build $(obj)/setup.bin $(obj)/vmlinux.bin > $@
+cmd_image = $(obj)/tools/build $(obj)/setup.bin $(obj)/vmlinux.bin $(obj)/zoffset.h > $@
 
 $(obj)/bzImage: $(obj)/setup.bin $(obj)/vmlinux.bin $(obj)/tools/build FORCE
$(call if_changed,image)
@@ -92,7 +92,7 @@ targets += voffset.h
 $(obj)/voffset.h: vmlinux FORCE
$(call if_changed,voffset)
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) . \(startup_32\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) . \(startup_32\|startup_64\|efi_pe_entry\|efi_stub_entry\|input_data\|_end\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
   cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index ccb2f4a..1e3184f 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -35,11 +35,11 @@ ENTRY(startup_32)
 #ifdef CONFIG_EFI_STUB
jmp preferred_addr
 
-   .balign 0x10
/*
 * We don't need the return address, so set up the stack so
-* efi_main() can find its arugments.
+* efi_main() can find its arguments.
 */
+ENTRY(efi_pe_entry)
add $0x4, %esp
 
callmake_boot_params
@@ -52,7 +52,7 @@ ENTRY(startup_32)
pushl   %ecx
sub $0x4, %esp
 
-   .org 0x30,0x90
+ENTRY(efi_stub_entry)
add $0x4, %esp
callefi_main
cmpl$0, %eax
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c4b171..f5d1aaa 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -201,12 +201,12 @@ ENTRY(startup_64)
 */
 #ifdef CONFIG_EFI_STUB
/*
-* The entry point for the PE/COFF executable is 0x210, so only
-* legacy boot loaders will execute this jmp.
+* The entry point for the PE/COFF executable is efi_pe_entry, so
+* only legacy boot loaders will execute this jmp.
 */
jmp preferred_addr
 
-   

Re: [PATCHv2 6/9] zsmalloc: promote to lib/

2013-01-27 Thread Minchan Kim
On Mon, Jan 28, 2013 at 01:01:16PM +0900, Minchan Kim wrote:
> On Mon, Jan 07, 2013 at 02:24:37PM -0600, Seth Jennings wrote:
> > This patch promotes the slab-based zsmalloc memory allocator
> > from the staging tree to lib/
> > 
> > zswap depends on this allocator for storing compressed RAM pages
> > in an efficient way under system wide memory pressure where
> > high-order (greater than 0) page allocations are very likely to
> > fail.
> > 
> > For more information on zsmalloc and its internals, read the
> > documentation at the top of the zsmalloc.c file.
> > 
> > Signed-off-by: Seth Jennings 
> 
> Seth, zsmalloc has a bug[1]; I sent a patch today. If it weren't known,
> it might be no problem to promote, but it's a known bug so let's fix it
> before promoting.
> 
> Another question: why do you promote zsmalloc in this patchset?
> It might make it hard for you to merge even zswap into staging.

When I looked at [8/9], I realized you are trying to merge this patch
into mm/, NOT staging. I don't know the history of why zsmalloc/zram/zcache
were in staging at the beginning, but personally I don't object to zswap
going into mm/ directly, because I have come to realize that staging is a
very deep hole to get out of, especially for mm-related stuff. ;-)

-- 
Kind regards,
Minchan Kim


[tip:x86/efi] x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode

2013-01-27 Thread tip-bot for David Woodhouse
Commit-ID:  b607e2126705ca28ecf21aa051172882bbdaae8a
Gitweb: http://git.kernel.org/tip/b607e2126705ca28ecf21aa051172882bbdaae8a
Author: David Woodhouse 
AuthorDate: Mon, 7 Jan 2013 22:09:49 +
Committer:  H. Peter Anvin 
CommitDate: Sun, 27 Jan 2013 20:19:37 -0800

x86, efi: Fix PCI ROM handing in EFI boot stub, in 32-bit mode

The 'Attributes' argument to pci->Attributes() function is 64-bit. So
when invoking in 32-bit mode it takes two registers, not just one.

This fixes memory corruption when booting via the 32-bit EFI boot stub.
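
A small sketch of the arithmetic behind this fix, assuming a stack-based
32-bit calling convention in which every argument slot is 32 bits wide and
a 64-bit value is stored low word first. The helper names are made up for
illustration. Judging by the call in the diff, the function takes three
word-sized arguments plus the 64-bit attributes value, which is why the
32-bit path needs a five-argument call with "0, 0" where the 64-bit path
passes a single "0".

```c
#include <stdint.h>

/* The two 32-bit stack slots that one 64-bit argument occupies. */
struct u64_slots {
	uint32_t lo;	/* low word, at the lower address on x86 */
	uint32_t hi;	/* high word */
};

/* Split a 64-bit value into its low and high 32-bit halves. */
struct u64_slots split_u64(uint64_t value)
{
	struct u64_slots s;

	s.lo = (uint32_t)(value & 0xffffffffu);
	s.hi = (uint32_t)(value >> 32);
	return s;
}

/* Slots needed for a call with n32 word-sized and n64 64-bit arguments. */
unsigned int arg_slots_32bit(unsigned int n32, unsigned int n64)
{
	return n32 + 2 * n64;
}
```

Passing one slot too few shifts every later argument, so the callee would
treat the result pointer as part of the 64-bit value and read the real
pointer from whatever follows on the stack.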

Signed-off-by: David Woodhouse 
Cc: 
Link: http://lkml.kernel.org/r/1358513837.2397.247.ca...@shinybook.infradead.org
Signed-off-by: H. Peter Anvin 
Cc: Matt Fleming 
---
 arch/x86/boot/compressed/eboot.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 448a86e..b7f2208 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -295,10 +295,15 @@ static efi_status_t setup_efi_pci(struct boot_params *params)
if (!pci)
continue;
 
+#ifdef CONFIG_X86_64
status = efi_call_phys4(pci->attributes, pci,
EfiPciIoAttributeOperationGet, 0,
&attributes);
-
+#else
+   status = efi_call_phys5(pci->attributes, pci,
+   EfiPciIoAttributeOperationGet, 0, 0,
+   &attributes);
+#endif
if (status != EFI_SUCCESS)
continue;
 


[tip:x86/efi] x86, efi: Fix 32-bit EFI handover protocol entry point

2013-01-27 Thread tip-bot for David Woodhouse
Commit-ID:  f791620fa7517e1045742c475a7f005db9a634b8
Gitweb: http://git.kernel.org/tip/f791620fa7517e1045742c475a7f005db9a634b8
Author: David Woodhouse 
AuthorDate: Mon, 7 Jan 2013 22:01:50 +
Committer:  H. Peter Anvin 
CommitDate: Sun, 27 Jan 2013 20:19:37 -0800

x86, efi: Fix 32-bit EFI handover protocol entry point

If the bootloader calls the EFI handover entry point as a standard function
call, then it'll have a return address on the stack. We need to pop that
before calling efi_main(), or the arguments will all be out of position on
the stack.

Signed-off-by: David Woodhouse 
Cc: 
Link: http://lkml.kernel.org/r/1358513837.2397.247.ca...@shinybook.infradead.org
Signed-off-by: H. Peter Anvin 
Cc: Matt Fleming 
---
 arch/x86/boot/compressed/head_32.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index aa4aaf1..ccb2f4a 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -50,8 +50,10 @@ ENTRY(startup_32)
pushl   %eax
pushl   %esi
pushl   %ecx
+   sub $0x4, %esp
 
.org 0x30,0x90
+   add $0x4, %esp
callefi_main
cmpl$0, %eax
movl%eax, %esi


[tip:x86/efi] x86, efi: Fix display detection in EFI boot stub

2013-01-27 Thread tip-bot for David Woodhouse
Commit-ID:  70a479cbe80296d3113e65cc2f713a5101061daf
Gitweb: http://git.kernel.org/tip/70a479cbe80296d3113e65cc2f713a5101061daf
Author: David Woodhouse 
AuthorDate: Mon, 7 Jan 2013 21:52:16 +
Committer:  H. Peter Anvin 
CommitDate: Sun, 27 Jan 2013 20:19:37 -0800

x86, efi: Fix display detection in EFI boot stub

When booting under OVMF we have precisely one GOP device, and it
implements the ConOut protocol.

We break out of the loop when we look at it... and then promptly abort
because 'first_gop' never gets set. We should set first_gop *before*
breaking out of the loop. Yes, it doesn't really mean "first" any more,
but that doesn't matter. It's only a flag to indicate that a suitable
GOP was found.

In fact, we'd do just as well to initialise 'width' to zero in this
function, then just check *that* instead of first_gop. But I'll do the
minimal fix for now (and for stable@).
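
The ordering problem can be reproduced in a simplified userspace model.
Everything here (struct gop_sketch, pick_gop(), the set_before_break
switch) is hypothetical and only mirrors the control flow described above:
first_gop acts as a "suitable GOP found" flag, and a NULL return models
the abort.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a GOP handle. */
struct gop_sketch {
	int supports_conout;
};

/*
 * Walk the handles and return the one whose mode info would be used,
 * or NULL if the function would abort. set_before_break selects the
 * patched ordering (1) or the original ordering (0).
 */
const struct gop_sketch *pick_gop(const struct gop_sketch *gops, size_t n,
				  int set_before_break)
{
	const struct gop_sketch *first_gop = NULL;	/* "found one" flag */
	const struct gop_sketch *chosen = NULL;		/* mode info source */
	size_t i;

	for (i = 0; i < n; i++) {
		int conout_found = gops[i].supports_conout;

		/* Only the first GOP or a ConOut GOP is of interest. */
		if (first_gop && !conout_found)
			continue;

		chosen = &gops[i];
		if (set_before_break)
			first_gop = &gops[i];	/* patched: set, then break */
		if (conout_found)
			break;
		first_gop = &gops[i];		/* original: set after break */
	}

	return first_gop ? chosen : NULL;
}
```

With a single ConOut-capable handle, as in the OVMF case, the original
ordering returns NULL while the patched ordering returns that handle.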

Signed-off-by: David Woodhouse 
Cc: 
Link: http://lkml.kernel.org/r/1358513837.2397.247.ca...@shinybook.infradead.org
Signed-off-by: H. Peter Anvin 
Cc: Matt Fleming 
---
 arch/x86/boot/compressed/eboot.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 18e329c..448a86e 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -432,10 +432,9 @@ static efi_status_t setup_gop(struct screen_info *si, efi_guid_t *proto,
 * Once we've found a GOP supporting ConOut,
 * don't bother looking any further.
 */
+   first_gop = gop;
if (conout_found)
break;
-
-   first_gop = gop;
}
}
 


Re: [PATCHv2 7/9] mm: break up swap_writepage() for frontswap backends

2013-01-27 Thread Minchan Kim
On Mon, Jan 07, 2013 at 02:24:38PM -0600, Seth Jennings wrote:
> swap_writepage() is currently where frontswap hooks into the swap
> write path to capture pages with the frontswap_store() function.
> However, if a frontswap backend wants to "resume" the writeback of
> a page to the swap device, it can't call swap_writepage() as
> the page will simply reenter the backend.
> 
> This patch separates swap_writepage() into a top and bottom half, the
> bottom half named __swap_writepage() to allow a frontswap backend,
> like zswap, to resume writeback beyond the frontswap_store() hook and
> be notified when the writeback completes.
> 
> Signed-off-by: Seth Jennings 

Looks good to me except for a few nitpicks.

Acked-by: Minchan Kim 
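
The top/bottom split described in the commit message can be sketched in
userspace as follows. All names (page_sketch, top_writepage(),
bottom_writepage(), set_backend() and the two sample hooks) are
hypothetical; the point is only that the bottom half bypasses the backend
hook and takes a caller-supplied completion callback.

```c
#include <stddef.h>

/* Hypothetical stand-in for a page; not a kernel type. */
struct page_sketch {
	int id;
	int written;	/* set once the "device write" has happened */
};

typedef void (*end_write_fn)(struct page_sketch *page);

/* Optional backend hook: returns 1 if it captured the page. */
static int (*backend_store)(struct page_sketch *page);
static int completions;

void set_backend(int (*store)(struct page_sketch *page))
{
	backend_store = store;
}

/* Sample backend that captures every page. */
int capture_all(struct page_sketch *page)
{
	(void)page;
	return 1;
}

/* Sample completion callback that just counts notifications. */
void count_completion(struct page_sketch *page)
{
	(void)page;
	completions++;
}

static void default_end_write(struct page_sketch *page)
{
	(void)page;	/* default completion does nothing here */
}

/* Bottom half: always writes to the device, then notifies the caller. */
int bottom_writepage(struct page_sketch *page, end_write_fn end_write)
{
	page->written = 1;
	if (end_write)
		end_write(page);
	return 0;
}

/* Top half: offer the page to the backend first, else write it out. */
int top_writepage(struct page_sketch *page)
{
	if (backend_store && backend_store(page))
		return 0;	/* captured: nothing reaches the device */
	return bottom_writepage(page, default_end_write);
}
```

A backend that later wants to push a captured page to the device calls the
bottom half directly with its own callback, so the page does not re-enter
the backend.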

> ---
>  include/linux/swap.h |4 
>  mm/page_io.c |   22 +-
>  mm/swap_state.c  |2 +-
>  3 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 8c66486..a3da829 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -321,6 +321,9 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t ent)
>  /* linux/mm/page_io.c */
>  extern int swap_readpage(struct page *);
>  extern int swap_writepage(struct page *page, struct writeback_control *wbc);
> +extern void end_swap_bio_write(struct bio *bio, int err);
> +extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
> + void (*end_write_func)(struct bio *, int));
>  extern int swap_set_page_dirty(struct page *page);
>  extern void end_swap_bio_read(struct bio *bio, int err);
>  
> @@ -335,6 +338,7 @@ extern struct address_space swapper_space;
>  extern void show_swap_cache_info(void);
>  extern int add_to_swap(struct page *);
>  extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
> +extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);

How is __add_to_swap_cache related to this patch?

>  extern void __delete_from_swap_cache(struct page *);
>  extern void delete_from_swap_cache(struct page *);
>  extern void free_page_and_swap_cache(struct page *);
> diff --git a/mm/page_io.c b/mm/page_io.c
> index c535d39..806085e 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -43,7 +43,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
>   return bio;
>  }
>  
> -static void end_swap_bio_write(struct bio *bio, int err)
> +void end_swap_bio_write(struct bio *bio, int err)

Why do you remove static in this patch? It's not related to the patch.

>  {
>   const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
>   struct page *page = bio->bi_io_vec[0].bv_page;
> @@ -180,15 +180,16 @@ bad_bmap:
>   goto out;
>  }
>  
> +int __swap_writepage(struct page *page, struct writeback_control *wbc,
> + void (*end_write_func)(struct bio *, int));
> +
>  /*
>   * We may have stale swap cache pages in memory: notice
>   * them here and get rid of the unnecessary final write.
>   */
>  int swap_writepage(struct page *page, struct writeback_control *wbc)
>  {
> - struct bio *bio;
> - int ret = 0, rw = WRITE;
> - struct swap_info_struct *sis = page_swap_info(page);
> + int ret = 0;
>  
>   if (try_to_free_swap(page)) {
>   unlock_page(page);
> @@ -200,6 +201,17 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>   end_page_writeback(page);
>   goto out;
>   }
> + ret = __swap_writepage(page, wbc, end_swap_bio_write);
> +out:
> + return ret;
> +}
> +
> +int __swap_writepage(struct page *page, struct writeback_control *wbc,
> + void (*end_write_func)(struct bio *, int))
> +{
> + struct bio *bio;
> + int ret = 0, rw = WRITE;
> + struct swap_info_struct *sis = page_swap_info(page);
>  
>   if (sis->flags & SWP_FILE) {
>   struct kiocb kiocb;
> @@ -227,7 +239,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>   return ret;
>   }
>  
> - bio = get_swap_bio(GFP_NOIO, page, end_swap_bio_write);
> + bio = get_swap_bio(GFP_NOIO, page, end_write_func);
>   if (bio == NULL) {
>   set_page_dirty(page);
>   unlock_page(page);
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 0cb36fb..7eded9c 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -67,7 +67,7 @@ void show_swap_cache_info(void)
>   * __add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
>   * but sets SwapCache flag and private instead of mapping and index.
>   */
> -static int __add_to_swap_cache(struct page *page, swp_entry_t entry)
> +int __add_to_swap_cache(struct page *page, swp_entry_t entry)

Ditto

>  {
>   int error;
>  
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .

-- 
Kind 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> > Switching merge_across_nodes after running KSM is liable to oops on stale
> > nodes still left over from the previous stable tree.  It's not something
> 
> Since this patch solves the problem, the description of
> merge_across_nodes (the value can be changed only when there are no KSM
> shared pages in the system) should be changed in this patch.

No.

The code could be changed to unmerge_and_remove_all_rmap_items()
automatically whenever merge_across_nodes is changed; but that's
not what Petr chose to do, and I didn't feel strongly to change it.


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Sun, 2013-01-27 at 15:05 -0800, Hugh Dickins wrote:
> > On Sat, 26 Jan 2013, Simon Jeons wrote:
> > > > How can this happen?  We only permit switching merge_across_nodes when
> > > > pages_shared is 0, and usually set run 2 to force that beforehand, which
> > > > ought to unmerge everything: yet oopses still occur when you then run 1.
> > > > 
> > > > Three causes:
> > > > 
> > > > 1. The old stable tree (built according to the inverse 
> > > > merge_across_nodes)
>^
> How to understand inverse merge_across_nodes here?

How not to understand it?  Either it was 0 before (in which case there
were as many stable trees as NUMA nodes) and is being changed to 1 (in
which case there is to be only one stable tree), or it was 1 before
(for one) and is being changed to 0 (for many).
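
Put as code, under the reading given above (function names are made up for
this sketch): the setting selects between one stable tree and one per NUMA
node, and the "inverse" setting is simply the value it had before the
switch.

```c
/*
 * Number of stable trees for a given merge_across_nodes setting:
 * 1 means a single tree, 0 means one tree per NUMA node.
 */
unsigned int nr_stable_trees(int merge_across_nodes, unsigned int nr_numa_nodes)
{
	return merge_across_nodes ? 1 : nr_numa_nodes;
}

/* The setting in force before a switch to 'merge_across_nodes'. */
int inverse_setting(int merge_across_nodes)
{
	return !merge_across_nodes;
}
```

So on a four-node machine, switching from 0 to 1 means four old trees were
built under the inverse setting and one is to be used from now on.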

> 
> > > > has not been fully torn down.  A stable node lingers until 
> > > > get_ksm_page()
> > > > notices that the page it references no longer references it: but the 
> > > > page
> 
> Do you mean page->mapping is NULL when call get_ksm_page()? Who clear it
> NULL?

I think I already pointed you to free_pages_prepare().

> 
> > > > is not necessarily freed as soon as expected, particularly when 
> > > > swapcache.
> 
> Why is it not necessarily freed as soon as expected?

As I answered below.

> > > > 
> > > 
> > > When can this happen?  
> > 
> > Whenever there's an additional reference to the page, beyond those for
> > its ptes in userspace - swapcache for example, or pinned by get_user_pages.
> > That delays its being freed (arriving at the "page->mapping = NULL;"
> > in free_pages_prepare()).  Or it might simply be sitting in a pagevec,
> > waiting for that to be filled up, to be freed as part of a batch.

> > > Forked mms will be unmerged just after ksmd's cursor since they're
> > > inserted behind it, so why would they be missed?
> > 
> > unmerge_and_remove_all_rmap_items() makes one pass through the list
> > from start to finish: insert behind the cursor and it will be missed.
> 
> Since forked mms will be inserted just after ksmd's cursor, they are the
> next to be scanned and unmerged; what am I missing?

mms forked are normally inserted just behind (== before) ksmd's cursor,
as I've said in comments and explanations several times.
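
A toy model of that point, with made-up names (slot_sketch,
insert_behind(), single_pass()) and a plain doubly linked list instead of
the kernel's list: a single forward pass never sees an entry that was
inserted behind (before) the cursor while the pass was under way.

```c
#include <stddef.h>

/* Hypothetical list node standing in for an mm slot. */
struct slot_sketch {
	struct slot_sketch *prev, *next;
	int visited;
};

/* Insert 'new_slot' immediately before 'cursor', i.e. behind it. */
void insert_behind(struct slot_sketch *cursor, struct slot_sketch *new_slot)
{
	new_slot->next = cursor;
	new_slot->prev = cursor->prev;
	if (cursor->prev)
		cursor->prev->next = new_slot;
	cursor->prev = new_slot;
}

/*
 * One forward pass from 'head' to the end of the list. When the pass
 * reaches 'fork_at', 'forked' is inserted behind the cursor, modelling
 * a fork during the pass. Returns the number of slots visited.
 */
int single_pass(struct slot_sketch *head, struct slot_sketch *fork_at,
		struct slot_sketch *forked)
{
	struct slot_sketch *cursor;
	int count = 0;

	for (cursor = head; cursor; cursor = cursor->next) {
		if (cursor == fork_at && forked)
			insert_behind(cursor, forked);
		cursor->visited = 1;
		count++;
	}
	return count;
}
```

After the pass the forked slot is linked into the list but was never
visited, which is the case the patch has to handle.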

Simon, I've had enough: you clearly have much more time to spare for
asking questions than I have for answering them repeatedly: I would
rather spend my time attending to 100 higher priorities.

Please try much harder to work these things out for yourself from the
source (perhaps with help from kernelnewbies.org), before interrogating
linux-kernel and linux-mm developers.  Sometimes your questions may
help everybody to understand better, but often they just waste our time.

I'll happily admit that mm, and mm/ksm.c in particular, is not the easiest
place to start in understanding the kernel, nor I the best expositor.

Best wishes,
Hugh


[PATCH 2/6] audit: flatten kauditd_thread wait queue code

2013-01-27 Thread Richard Guy Briggs
From: Richard Guy Briggs 

The wait queue control code in kauditd_thread() was nested deeper than
necessary.  The function has been flattened for better legibility.

Signed-off-by: Richard Guy Briggs 
---

This is a code clean up in preparation to add a multicast netlink socket to
kaudit for read-only userspace clients such as systemd, in addition to the
bidirectional audit userspace client.

 kernel/audit.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 4bf486c..1531efb 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -458,10 +458,11 @@ static void flush_hold_queue(void)
 
 static int kauditd_thread(void *dummy)
 {
-   struct sk_buff *skb;
-
set_freezable();
while (!kthread_should_stop()) {
+   struct sk_buff *skb;
+   DECLARE_WAITQUEUE(wait, current);
+
flush_hold_queue();
 
skb = skb_dequeue(&audit_skb_queue);
@@ -471,19 +472,18 @@ static int kauditd_thread(void *dummy)
kauditd_send_skb(skb);
else
audit_printk_skb(skb);
-   } else {
-   DECLARE_WAITQUEUE(wait, current);
-   set_current_state(TASK_INTERRUPTIBLE);
-   add_wait_queue(&kauditd_wait, &wait);
-
-   if (!skb_queue_len(&audit_skb_queue)) {
-   try_to_freeze();
-   schedule();
-   }
+   continue;
+   }
+   set_current_state(TASK_INTERRUPTIBLE);
+   add_wait_queue(&kauditd_wait, &wait);
 
-   __set_current_state(TASK_RUNNING);
-   remove_wait_queue(&kauditd_wait, &wait);
+   if (!skb_queue_len(&audit_skb_queue)) {
+   try_to_freeze();
+   schedule();
}
+
+   __set_current_state(TASK_RUNNING);
+   remove_wait_queue(&kauditd_wait, &wait);
}
return 0;
 }
-- 
1.8.0.2



[PATCH 3/6] audit: move kaudit thread start from auditd registration to kaudit init

2013-01-27 Thread Richard Guy Briggs
From: Richard Guy Briggs 

The kauditd_thread() task was started only after the auditd userspace daemon
registered itself with kaudit.  This was fine when only auditd consumed messages
from the kaudit netlink unicast socket.  With the addition of a multicast group
to that socket it is more convenient to have the thread start on init of the
kaudit kernel subsystem.

Signed-off-by: Richard Guy Briggs 
---

This is a code clean up in preparation to add a multicast netlink socket to
kaudit for read-only userspace clients such as systemd, in addition to the
bidirectional audit userspace client.

 kernel/audit.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 1531efb..02a5d9e 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -676,16 +676,6 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
if (err)
return err;
 
-   /* As soon as there's any sign of userspace auditd,
-* start kauditd to talk to it */
-   if (!kauditd_task)
-   kauditd_task = kthread_run(kauditd_thread, NULL, "kauditd");
-   if (IS_ERR(kauditd_task)) {
-   err = PTR_ERR(kauditd_task);
-   kauditd_task = NULL;
-   return err;
-   }
-
loginuid = audit_get_loginuid(current);
sessionid = audit_get_sessionid(current);
security_task_getsecid(current, &sid);
@@ -974,6 +964,10 @@ static int __init audit_init(void)
else
audit_sock->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
 
+   kauditd_task = kthread_run(kauditd_thread, NULL, "kauditd");
+   if (IS_ERR(kauditd_task))
+   return PTR_ERR(kauditd_task);
+
skb_queue_head_init(&audit_skb_queue);
skb_queue_head_init(&audit_skb_hold_queue);
audit_initialized = AUDIT_INITIALIZED;
-- 
1.8.0.2



Re: [PATCH 0/4] staging: zsmalloc: various cleanups/improvments

2013-01-27 Thread Minchan Kim
Hi Seth,

On Fri, Jan 25, 2013 at 11:46:14AM -0600, Seth Jennings wrote:
> These patches are the first 4 patches of the zswap patchset I
> sent out previously.  Some recent commits to zsmalloc and
> zcache in staging-next forced a rebase. While I was at it, Nitin
> (zsmalloc maintainer) requested I break these 4 patches out from
> the zswap patchset, since they stand on their own.

[2/4] and [4/4] are okay to merge into the current zsmalloc in staging, but
[1/4] and [3/4] depend on zswap, so they should be part of the
zswap patchset.

> 
> All are already Acked-by Nitin.
> 
> Based on staging-next as of today.
> 
> Seth Jennings (4):
>   staging: zsmalloc: add gfp flags to zs_create_pool
>   staging: zsmalloc: remove unused pool name
>   staging: zsmalloc: add page alloc/free callbacks
>   staging: zsmalloc: make CLASS_DELTA relative to PAGE_SIZE
> 
>  drivers/staging/zram/zram_drv.c  |4 +-
>  drivers/staging/zsmalloc/zsmalloc-main.c |   60 ++
>  drivers/staging/zsmalloc/zsmalloc.h  |   10 -
>  3 files changed, 47 insertions(+), 27 deletions(-)
> 
> -- 
> 1.7.9.5
> 

-- 
Kind regards,
Minchan Kim


[PATCH 5/6] audit: add restricted capability read-only netlink multicast socket

2013-01-27 Thread Richard Guy Briggs
From: Richard Guy Briggs 

Add a netlink multicast socket with one group to kaudit for "best-effort"
delivery to read-only userspace clients such as systemd, in addition to the
existing bidirectional unicast auditd userspace client.

Currently, auditd is intended to use the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
capabilities, but actually uses CAP_NET_ADMIN.  The CAP_AUDIT_READ capability
is added for use by read-only AUDIT_NLGRP_READLOG netlink multicast group
clients to the kaudit subsystem.

This safely gives services such as systemd access to consume audit logs
while ensuring write access remains restricted for integrity.

Signed-off-by: Richard Guy Briggs 
---

(The seemingly wasteful skb_copy() is necessary because the original kaudit
unicast socket sends up messages with nlmsg_len set to the payload length
rather than the entire message length.  This breaks the convention used by
netlink.  The existing auditd daemon assumes this breakage.  Fixing this would
require co-ordinating a change in the established protocol between kaudit
kernel code and auditd userspace code.  There is no reason for new multicast
clients to continue with this breakage.)
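
The copy-and-fix step can be shown with a plain buffer instead of an skb.
The header layout below is a hypothetical, minimal stand-in (only the
length field matters) and copy_with_full_len() is a name invented for this
sketch; it illustrates the idea, not the kernel code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal message header for this sketch. */
struct msg_hdr_sketch {
	uint32_t len;	/* by convention: header plus payload */
	uint16_t type;
	uint16_t flags;
};

/*
 * Copy a message whose header length covers only the payload and fix
 * the copy so that its length covers the whole message. The original
 * buffer is left untouched for the legacy unicast reader. Returns NULL
 * if the message is shorter than a header or allocation fails; the
 * caller frees the result.
 */
unsigned char *copy_with_full_len(const unsigned char *msg, size_t total_len)
{
	struct msg_hdr_sketch hdr;
	unsigned char *copy;

	if (total_len < sizeof(hdr))
		return NULL;

	copy = malloc(total_len);
	if (!copy)
		return NULL;

	memcpy(copy, msg, total_len);
	memcpy(&hdr, copy, sizeof(hdr));	/* avoids unaligned access */
	hdr.len = (uint32_t)total_len;
	memcpy(copy, &hdr, sizeof(hdr));
	return copy;
}
```

Working on a copy is what lets the two kinds of reader disagree about the
length field: the legacy reader keeps the payload-only value, new readers
get the conventional one.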

 include/uapi/linux/audit.h  |  8 
 include/uapi/linux/capability.h |  5 -
 kernel/audit.c  | 40 +
 security/selinux/include/classmap.h |  2 +-
 4 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 9f096f1..6296e5d9 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -357,6 +357,14 @@ enum {
 #define AUDIT_PERM_READ4
 #define AUDIT_PERM_ATTR8
 
+/* Multicast Netlink socket groups (default up to 32) */
+enum audit_nlgrps {
+   AUDIT_NLGRP_NONE,   /* Group 0 not used */
+   AUDIT_NLGRP_READLOG,/* "best effort" read only socket */
+   __AUDIT_NLGRP_MAX
+};
+#define AUDIT_NLGRP_MAX(__AUDIT_NLGRP_MAX - 1)
+
 struct audit_status {
__u32   mask;   /* Bit mask for valid entries */
__u32   enabled;/* 1 = enabled, 0 = disabled */
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index ba478fa..f579a06 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -343,7 +343,10 @@ struct vfs_cap_data {
 
 #define CAP_BLOCK_SUSPEND36
 
-#define CAP_LAST_CAP CAP_BLOCK_SUSPEND
+/* Allowed to read the audit log */
+#define CAP_AUDIT_READ 37
+
+#define CAP_LAST_CAP CAP_AUDIT_READ
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/kernel/audit.c b/kernel/audit.c
index 02a5d9e..9eef05b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -418,6 +418,37 @@ static void kauditd_send_skb(struct sk_buff *skb)
 }
 
 /*
+ * kauditd_send_multicast_skb - send the skb to multicast userspace listeners
+ *
+ * This function doesn't consume an skb as might be expected since it has to
+ * copy it anyways.
+ */
+static void kauditd_send_multicast_skb(struct sk_buff *skb)
+{
+   struct sk_buff *copy;
+   struct nlmsghdr *nlh;
+
+   /*
+* The seemingly wasteful skb_copy() is necessary because the original
+* kaudit unicast socket sends up messages with nlmsg_len set to the
+* payload length rather than the entire message length.  This breaks
+* the standard set by netlink.  The existing auditd daemon assumes
+* this breakage.  Fixing this would require co-ordinating a change in
+* the established protocol between the kaudit kernel subsystem and
+* the auditd userspace code.  There is no reason for new multicast
+* clients to continue with this non-compliance.
+*/
+   copy = skb_copy(skb, GFP_KERNEL);
+   if (!copy)
+   return;
+
+   nlh = nlmsg_hdr(copy);
+   nlh->nlmsg_len = copy->len;
+
+   nlmsg_multicast(audit_sock, copy, 0, AUDIT_NLGRP_READLOG, GFP_KERNEL);
+}
+
+/*
  * flush_hold_queue - empty the hold queue if auditd appears
  *
  * If auditd just started, drain the queue of messages already
@@ -468,6 +499,12 @@ static int kauditd_thread(void *dummy)
skb = skb_dequeue(&audit_skb_queue);
wake_up(&audit_backlog_wait);
if (skb) {
+   /* Don't bump skb refcount for multicast send since
+* kauditd_send_multicast_skb() copies the skb anyway
+* due to audit unicast netlink protocol
+* non-compliance.
+*/
+   kauditd_send_multicast_skb(skb);
if (audit_pid)
kauditd_send_skb(skb);
else
@@ -951,6 +988,9 @@ static int __init audit_init(void)
int i;
struct netlink_kernel_cfg cfg = {
.input  = audit_receive,
+   

[PATCH 6/6] audit: send multicast messages only if there are listeners

2013-01-27 Thread Richard Guy Briggs
From: Richard Guy Briggs 

Test first to see if there are any userspace multicast listeners bound to the
socket before starting the multicast send work.

Signed-off-by: Richard Guy Briggs 
---
 kernel/audit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 9eef05b..d153a6b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -428,6 +428,8 @@ static void kauditd_send_multicast_skb(struct sk_buff *skb)
struct sk_buff *copy;
struct nlmsghdr *nlh;
 
+   if (!netlink_has_listeners(audit_sock, AUDIT_NLGRP_READLOG))
+   return;
/*
 * The seemingly wasteful skb_copy() is necessary because the original
 * kaudit unicast socket sends up messages with nlmsg_len set to the
-- 
1.8.0.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/6] audit: refactor hold queue flush

2013-01-27 Thread Richard Guy Briggs
From: Richard Guy Briggs 

The hold queue flush code is an autonomous chunk of code that can be
refactored, removed from kauditd_thread() into flush_hold_queue() and
flattened for better legibility.

Signed-off-by: Richard Guy Briggs 
---

This is a code clean up in preparation to add a multicast netlink socket to
kaudit for read-only userspace clients such as systemd, in addition to the
bidirectional audit userspace client.

 kernel/audit.c | 62 +-
 1 file changed, 40 insertions(+), 22 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index d596e53..4bf486c 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -417,34 +417,52 @@ static void kauditd_send_skb(struct sk_buff *skb)
consume_skb(skb);
 }
 
+/*
+ * flush_hold_queue - empty the hold queue if auditd appears
+ *
+ * If auditd just started, drain the queue of messages already
+ * sent to syslog/printk.  Remember loss here is ok.  We already
+ * called audit_log_lost() if it didn't go out normally.  so the
+ * race between the skb_dequeue and the next check for audit_pid
+ * doesn't matter.
+ *
+ * If you ever find kauditd to be too slow we can get a perf win
+ * by doing our own locking and keeping better track if there
+ * are messages in this queue.  I don't see the need now, but
+ * in 5 years when I want to play with this again I'll see this
+ * note and still have no friggin idea what i'm thinking today.
+ */
+static void flush_hold_queue(void)
+{
+   struct sk_buff *skb;
+
+   if (!audit_default || !audit_pid)
+   return;
+
+   skb = skb_dequeue(&audit_skb_hold_queue);
+   if (likely(!skb))
+   return;
+
+   while (skb && audit_pid) {
+   kauditd_send_skb(skb);
+   skb = skb_dequeue(&audit_skb_hold_queue);
+   }
+
+   /*
+* if auditd just disappeared but we
+* dequeued an skb we need to drop ref
+*/
+   if (skb)
+   consume_skb(skb);
+}
+
 static int kauditd_thread(void *dummy)
 {
struct sk_buff *skb;
 
set_freezable();
while (!kthread_should_stop()) {
-   /*
-* if auditd just started drain the queue of messages already
-* sent to syslog/printk.  remember loss here is ok.  we already
-* called audit_log_lost() if it didn't go out normally.  so the
-* race between the skb_dequeue and the next check for audit_pid
-* doesn't matter.
-*
-* if you ever find kauditd to be too slow we can get a perf win
-* by doing our own locking and keeping better track if there
-* are messages in this queue.  I don't see the need now, but
-* in 5 years when I want to play with this again I'll see this
-* note and still have no friggin idea what i'm thinking today.
-*/
-   if (audit_default && audit_pid) {
-   skb = skb_dequeue(&audit_skb_hold_queue);
-   if (unlikely(skb)) {
-   while (skb && audit_pid) {
-   kauditd_send_skb(skb);
-   skb = skb_dequeue(&audit_skb_hold_queue);
-   }
-   }
-   }
+   flush_hold_queue();
 
skb = skb_dequeue(&audit_skb_queue);
wake_up(&audit_backlog_wait);
-- 
1.8.0.2



[PATCH 4/6] netlink: add send and receive capability requirement and capability flags

2013-01-27 Thread Richard Guy Briggs
From: Richard Guy Briggs 

Currently netlink socket permissions are controlled by the
NL_CFG_F_NONROOT_{RECV,SEND} flags in the kernel socket configuration or by the
CAP_NET_ADMIN capability of the client.  The former allows non-root users
access to the socket.  The latter allows all network admin clients access to
the socket.  It would be useful to be able to further restrict this access to
send or receive capabilities individually within specific subsystems with a
more targeted capability.  Two more flags, NL_CFG_F_CAPABILITY_{RECV,SEND},
have been added to specifically require a named capability should the subsystem
request it, allowing the client to drop other broad unneeded capabilities.

Signed-off-by: Richard Guy Briggs 
---

This is a feature addition in preparation to add a multicast netlink socket to
kaudit for read-only userspace clients such as systemd, in addition to the
bidirectional unicast auditd userspace client.

Currently, auditd has the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE capabilities
(but uses CAP_NET_ADMIN).  The CAP_AUDIT_READ capability will be added for use
by read-only AUDIT_NLGRP_READLOG multicast group clients to the kaudit
subsystem.
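
The permission decision described above can be sketched in user space. This
is only a rough model of the netlink_capable() hunk in this patch, not kernel
code: the demo_* names, the capability numbers and the bitmask standing in
for a client's credentials are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NL_CFG_F_NONROOT_RECV    (1 << 0)
#define NL_CFG_F_NONROOT_SEND    (1 << 1)
#define NL_CFG_F_CAPABILITY_RECV (1 << 2)
#define NL_CFG_F_CAPABILITY_SEND (1 << 3)

/* Hypothetical capability identifiers, used only in this sketch. */
enum demo_cap { DEMO_CAP_NET_ADMIN = 1, DEMO_CAP_AUDIT_READ = 2 };

/* Per-protocol configuration, modelled on the fields the patch adds. */
struct demo_proto_cfg {
	unsigned int flags;
	int cap_send_requires;
	int cap_recv_requires;
};

/* A client is modelled as a bitmask of the capabilities it holds. */
static bool demo_has_cap(unsigned int held, int cap)
{
	return (held & (1u << cap)) != 0;
}

/* Same switch structure as the patched netlink_capable(). */
static bool demo_netlink_capable(const struct demo_proto_cfg *cfg,
				 unsigned int held, unsigned int flag)
{
	switch (flag & cfg->flags) {
	case NL_CFG_F_NONROOT_RECV:
	case NL_CFG_F_NONROOT_SEND:
		return true;	/* open to everyone */
	case NL_CFG_F_CAPABILITY_RECV:
		return demo_has_cap(held, cfg->cap_recv_requires);
	case NL_CFG_F_CAPABILITY_SEND:
		return demo_has_cap(held, cfg->cap_send_requires);
	default:
		return demo_has_cap(held, DEMO_CAP_NET_ADMIN);
	}
}
```

In this model, a protocol that sets only NL_CFG_F_CAPABILITY_RECV admits a
client holding the named capability, and a client holding only the
network-admin capability is refused for receive but still accepted for send.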

 include/linux/netlink.h  |  4 
 net/netlink/af_netlink.c | 35 +--
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index e0f746b..30daf11 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -31,6 +31,8 @@ extern void netlink_table_ungrab(void);
 
 #define NL_CFG_F_NONROOT_RECV  (1 << 0)
 #define NL_CFG_F_NONROOT_SEND  (1 << 1)
+#define NL_CFG_F_CAPABILITY_RECV (1 << 2)
+#define NL_CFG_F_CAPABILITY_SEND (1 << 3)
 
 /* optional Netlink kernel configuration parameters */
 struct netlink_kernel_cfg {
@@ -39,6 +41,8 @@ struct netlink_kernel_cfg {
void(*input)(struct sk_buff *skb);
struct mutex*cb_mutex;
void(*bind)(int group);
+   int cap_send_requires;
+   int cap_recv_requires;
 };
 
 extern struct sock *__netlink_kernel_create(struct net *net, int unit,
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c0353d5..cce1fe3 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -127,6 +127,8 @@ struct netlink_table {
struct module   *module;
void(*bind)(int group);
int registered;
+   int cap_send_requires;
+   int cap_recv_requires;
 };
 
 static struct netlink_table *nl_table;
@@ -552,6 +554,8 @@ static int netlink_release(struct socket *sock)
nl_table[sk->sk_protocol].bind = NULL;
nl_table[sk->sk_protocol].flags = 0;
nl_table[sk->sk_protocol].registered = 0;
+   nl_table[sk->sk_protocol].cap_send_requires = 0;
+   nl_table[sk->sk_protocol].cap_recv_requires = 0;
}
} else if (nlk->subscriptions) {
netlink_update_listeners(sk);
@@ -611,8 +615,20 @@ retry:
 
 static inline int netlink_capable(const struct socket *sock, unsigned int flag)
 {
-   return (nl_table[sock->sk->sk_protocol].flags & flag) ||
-   ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
+   struct netlink_table *nlt = &nl_table[sock->sk->sk_protocol];
+
+   switch (flag & nlt->flags) {
+   case NL_CFG_F_NONROOT_RECV:
+   case NL_CFG_F_NONROOT_SEND:
+   return true;
+   case NL_CFG_F_CAPABILITY_RECV:
+   return capable(nlt->cap_recv_requires);
+   case NL_CFG_F_CAPABILITY_SEND:
+   return capable(nlt->cap_send_requires);
+   default:
+   return capable(CAP_NET_ADMIN);
+   }
+   return false;
 }
 
 static void
@@ -677,7 +693,8 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 
/* Only superuser is allowed to listen multicasts */
if (nladdr->nl_groups) {
-   if (!netlink_capable(sock, NL_CFG_F_NONROOT_RECV))
+   if (!netlink_capable(sock, NL_CFG_F_NONROOT_RECV |
+  NL_CFG_F_CAPABILITY_RECV))
return -EPERM;
err = netlink_realloc_groups(sk);
if (err)
@@ -739,7 +756,9 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
return -EINVAL;
 
/* Only superuser is allowed to send multicasts */
-   if (nladdr->nl_groups && !netlink_capable(sock, NL_CFG_F_NONROOT_SEND))
+   if (nladdr->nl_groups &&
+   !netlink_capable(sock, NL_CFG_F_NONROOT_SEND |
+  NL_CFG_F_CAPABILITY_SEND))
return -EPERM;
 
if (!nlk->portid)
@@ -1262,7 +1281,8 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,

[PATCH 0/6] audit: add restricted capability read-only netlink multicast socket

2013-01-27 Thread Richard Guy Briggs
From: Richard Guy Briggs 

Hi,

This is a patch set Eric Paris and I have been working on to add a restricted
capability read-only netlink multicast socket to kaudit to enable 
userspace clients such as systemd to consume audit logs, in addition to the
bidirectional auditd userspace client.

Currently, auditd has the CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE capabilities
(but uses CAP_NET_ADMIN).  The CAP_AUDIT_READ capability will be added for use
by read-only AUDIT_NLGRP_READLOG multicast group clients to the kaudit
subsystem.

https://bugzilla.redhat.com/show_bug.cgi?id=887992

Feedback please!

Richard Guy Briggs (6):
  audit: refactor hold queue flush
  audit: flatten kauditd_thread wait queue code
  audit: move kaudit thread start from auditd registration to kaudit
init
  netlink: add send and receive capability requirement and capability
flags
  audit: add the first netlink multicast socket group
  audit: send multicast messages only if there are listeners

 include/linux/netlink.h |   4 +
 include/uapi/linux/audit.h  |   8 ++
 include/uapi/linux/capability.h |   5 +-
 kernel/audit.c  | 142 +---
 net/netlink/af_netlink.c|  35 +++--
 security/selinux/include/classmap.h |   2 +-
 6 files changed, 144 insertions(+), 52 deletions(-)

-- 
1.8.0.2



Re: [PATCH 7/11] ksm: make KSM page migration possible

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Sun, 2013-01-27 at 15:12 -0800, Hugh Dickins wrote:
> > On Sat, 26 Jan 2013, Simon Jeons wrote:
> > > 
> > > Could you explain why page->mapping needs to be checked twice after getting the page?
> > 
> > Once for the !locked case, which should not return page if mapping changed.
> > Once for the locked case, which should not return page if mapping changed.
> > We could use "else", but that wouldn't be an improvement.
> 
> But for the locked case, page->mapping will be checked twice.

Thrice.

I'm beginning to wonder: you do realize that page->mapping is volatile,
from the point of view of get_ksm_page()?  That is the whole point of
why get_ksm_page() exists.

I can see that the word "volatile" is not obviously used here - it's
tucked away inside the ACCESS_ONCE() - but I thought the descriptions
of races and barriers made that obvious.

If the comments here haven't helped enough, please take a look at
git commit 4035c07a8959 "ksm: take keyhole reference to page".
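
A rough user-space sketch of that pattern: read the field once through a
volatile cast, do the step that may race, then check again. The macro follows
the shape of the kernel's ACCESS_ONCE() (it relies on the GNU __typeof__
extension); struct demo_page, demo_get_stable() and demo_clear_mapping() are
hypothetical names for illustration, and this is not the get_ksm_page() code
itself. It is single-threaded, so it only shows the shape of the checks.

```c
#include <assert.h>
#include <stddef.h>

/* Force exactly one load of x by going through a volatile-qualified pointer. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct page; only the field of interest. */
struct demo_page {
	void *mapping;
};

/*
 * Return the page only if its mapping equals the expected value both
 * before and after the caller-supplied step, which stands in for
 * whatever may race with a concurrent change (taking a reference,
 * taking the page lock, ...).
 */
static struct demo_page *demo_get_stable(struct demo_page *page, void *expected,
					 void (*step)(struct demo_page *))
{
	if (ACCESS_ONCE(page->mapping) != expected)
		return NULL;		/* changed before we started */
	if (step)
		step(page);
	if (ACCESS_ONCE(page->mapping) != expected)
		return NULL;		/* changed while we were busy */
	return page;
}

/* Helper that simulates a concurrent update happening during the step. */
static void demo_clear_mapping(struct demo_page *page)
{
	page->mapping = NULL;
}
```

Each extra check costs one load, which is why repeating it on every path is
cheaper than reasoning about which path might have skipped it.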


Re: [PATCH 8/11] ksm: make !merge_across_nodes migration safe

2013-01-27 Thread Simon Jeons
On Fri, 2013-01-25 at 18:05 -0800, Hugh Dickins wrote:
> The new KSM NUMA merge_across_nodes knob introduces a problem, when it's
> set to non-default 0: if a KSM page is migrated to a different NUMA node,
> how do we migrate its stable node to the right tree?  And what if that
> collides with an existing stable node?
> 
> ksm_migrate_page() can do no more than it's already doing, updating
> stable_node->kpfn: the stable tree itself cannot be manipulated without
> holding ksm_thread_mutex.  So accept that a stable tree may temporarily
> indicate a page belonging to the wrong NUMA node, leave updating until
> the next pass of ksmd, just be careful not to merge other pages on to a

How do you avoid merging other pages on to a misplaced page? I don't see it.

> misplaced page.  Note nid of holding tree in stable_node, and recognize
> that it will not always match nid of kpfn.
> 
> A misplaced KSM page is discovered, either when ksm_do_scan() next comes
> around to one of its rmap_items (we now have to go to cmp_and_merge_page
> even on pages in a stable tree), or when stable_tree_search() arrives at
> a matching node for another page, and this node page is found misplaced.
> 
> In each case, move the misplaced stable_node to a list of migrate_nodes
> (and use the address of migrate_nodes as magic by which to identify them):
> we don't need them in a tree.  If stable_tree_search() finds no match for
> a page, but it's currently exiled to this list, then slot its stable_node
> right there into the tree, bringing all of its mappings with it; otherwise
> they get migrated one by one to the original page of the colliding node.
> stable_tree_search() is now modelled more like stable_tree_insert(),
> in order to handle these insertions of migrated nodes.

When will a node be removed from the migrate_nodes list and inserted into
the stable tree?

> 
> remove_node_from_stable_tree(), remove_all_stable_nodes() and
> ksm_check_stable_tree() have to handle the migrate_nodes list as well as
> the stable tree itself.  Less obviously, we do need to prune the list of
> stale entries from time to time (scan_get_next_rmap_item() does it once
> each full scan):

>  whereas stale nodes in the stable tree get naturally
> pruned as searches try to brush past them, these migrate_nodes may get
> forgotten and accumulate.

This description is hard to understand. Could you explain it? :)

> Signed-off-by: Hugh Dickins 

What will happen if the page of an unstable tree node migrates to a new NUMA
node? Do collisions need to be handled there as well?

> ---
>  mm/ksm.c |  164 +++--
>  1 file changed, 134 insertions(+), 30 deletions(-)
> 
> --- mmotm.orig/mm/ksm.c   2013-01-25 14:37:03.832206218 -0800
> +++ mmotm/mm/ksm.c2013-01-25 14:37:06.880206290 -0800
> @@ -122,13 +122,25 @@ struct ksm_scan {
>  /**
>   * struct stable_node - node of the stable rbtree
>   * @node: rb node of this ksm page in the stable tree
> + * @head: (overlaying parent) &migrate_nodes indicates temporarily on that list
> + * @list: linked into migrate_nodes, pending placement in the proper node tree
>   * @hlist: hlist head of rmap_items using this ksm page
> - * @kpfn: page frame number of this ksm page
> + * @kpfn: page frame number of this ksm page (perhaps temporarily on wrong nid)
> + * @nid: NUMA node id of stable tree in which linked (may not match kpfn)
>   */
>  struct stable_node {
> - struct rb_node node;
> + union {
> + struct rb_node node;/* when node of stable tree */
> + struct {/* when listed for migration */
> + struct list_head *head;
> + struct list_head list;
> + };
> + };
>   struct hlist_head hlist;
>   unsigned long kpfn;
> +#ifdef CONFIG_NUMA
> + int nid;
> +#endif
>  };
>  
>  /**
> @@ -169,6 +181,9 @@ struct rmap_item {
>  static struct rb_root root_unstable_tree[MAX_NUMNODES];
>  static struct rb_root root_stable_tree[MAX_NUMNODES];
>  
> +/* Recently migrated nodes of stable tree, pending proper placement */
> +static LIST_HEAD(migrate_nodes);
> +
>  #define MM_SLOTS_HASH_BITS 10
>  static DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
>  
> @@ -311,11 +326,6 @@ static void insert_to_mm_slots_hash(stru
>   hash_add(mm_slots_hash, &mm_slot->link, (unsigned long)mm);
>  }
>  
> -static inline int in_stable_tree(struct rmap_item *rmap_item)
> -{
> - return rmap_item->address & STABLE_FLAG;
> -}
> -
>  /*
>   * ksmd, and unmerge_and_remove_all_rmap_items(), must not touch an mm's
>   * page tables after it has passed through ksm_exit() - which, if necessary,
> @@ -476,7 +486,6 @@ static void remove_node_from_stable_tree
>  {
>   struct rmap_item *rmap_item;
>   struct hlist_node *hlist;
> - int nid;
>  
>   hlist_for_each_entry(rmap_item, hlist, &stable_node->hlist, hlist) {
>   if (rmap_item->hlist.next)
> @@ -488,8 +497,11 @@ static void remove_node_from_stable_tree
>  

Re: [PATCH 4/4] staging: zsmalloc: make CLASS_DELTA relative to PAGE_SIZE

2013-01-27 Thread Minchan Kim
On Fri, Jan 25, 2013 at 11:46:18AM -0600, Seth Jennings wrote:
> Right now ZS_SIZE_CLASS_DELTA is hardcoded to be 16.  This
> creates 254 classes for systems with 4k pages. However, on
> PPC64 with 64k pages, it creates 4095 classes which is far
> too many.
> 
> This patch makes ZS_SIZE_CLASS_DELTA relative to PAGE_SIZE
> so that regardless of the page size, there will be the same
> number of classes.
> 
> Acked-by: Nitin Gupta 
> Signed-off-by: Seth Jennings 
Acked-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim


Re: [PATCH] userns: Allow the unprivileged users to mount mqueue fs

2013-01-27 Thread Eric W. Biederman
Gao feng  writes:

> This patch allows an unprivileged user to mount mqueuefs in
> a user ns.
>
> If two userns share the same ipcns, the files in mqueue fs
> should be seen in both of these userns.
>
> If the userns has its own ipcns, it has its own mqueue fs too.
> ipcns has already done this job well.

I am a little dense.  When does userspace actually mount a mqueuefs?
My impression was that user space never needed to mount and actually
never could mount a mqueuefs.  MS_NOUSER isn't set so mounting a
mqueuefs is possible but when does it happen and why?

I am trying to think through the logic here and I think this is safe
but since I don't understand why we would mount an mqueue fs I am
having trouble verifying that there are no silly reasons why this might
be a bad idea.

But from what I can tell so far this seems like a good patch.

Eric


> Signed-off-by: Gao feng 
> ---
>  ipc/mqueue.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 71a3ca1..023c986 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -1383,6 +1383,7 @@ static struct file_system_type mqueue_fs_type = {
>   .name = "mqueue",
>   .mount = mqueue_mount,
>   .kill_sb = kill_litter_super,
> + .fs_flags = FS_USERNS_MOUNT,
>  };
>  
>  int mq_init_ns(struct ipc_namespace *ns)


Re: [PATCH 1/1] mfd: wm8994: Use devm_regulator_bulk_get API

2013-01-27 Thread Sachin Kamat
On 27 January 2013 05:52, Samuel Ortiz  wrote:
> Hi Sachin,
>
> On Thu, Jan 24, 2013 at 09:13:20AM +0530, Sachin Kamat wrote:
>> Hi Samuel,
>>
>> On 8 January 2013 16:06, Mark Brown wrote:
>> > On Tue, Jan 08, 2013 at 02:01:22PM +0530, Sachin Kamat wrote:
>> >> devm_regulator_bulk_get is device managed and saves some cleanup
>> >> and exit code.
>> >
>> > Acked-by: Mark Brown 
>>
>> Would you be picking this patch up?
> I will, yes.

Thanks Samuel.

There is also another patch that needs your attention (pending for a while).
https://lkml.org/lkml/2012/12/7/65

-- 
With warm regards,
Sachin


Re: [PATCH 3/4] staging: zsmalloc: add page alloc/free callbacks

2013-01-27 Thread Minchan Kim
On Sat, Jan 26, 2013 at 2:46 AM, Seth Jennings
 wrote:
> This patch allows users of zsmalloc to register the
> allocation and free routines used by zsmalloc to obtain
> more pages for the memory pool.  This allows the user
> more control over zsmalloc pool policy and behavior.
>
> If the user does not wish to control this, alloc_page() and
> __free_page() are used by default.
>
> Acked-by: Nitin Gupta 
> Signed-off-by: Seth Jennings 
Acked-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim


Re: [PATCH 2/4] staging: zsmalloc: remove unused pool name

2013-01-27 Thread Minchan Kim
On Fri, Jan 25, 2013 at 11:46:16AM -0600, Seth Jennings wrote:
> zs_create_pool() currently takes a name argument which is
> never used in any useful way.
> 
> This patch removes it.
> 
> Acked-by: Nitin Gupta 
> Signed-off-by: Seth Jennings 
Acked-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim


Re: [PATCH 1/4] staging: zsmalloc: add gfp flags to zs_create_pool

2013-01-27 Thread Minchan Kim
Hi Seth,

On Fri, Jan 25, 2013 at 11:46:15AM -0600, Seth Jennings wrote:
> zs_create_pool() currently takes a gfp flags argument
> that is used when growing the memory pool.  However
> it is not used in allocating the metadata for the pool
> itself.  That is currently hardcoded to GFP_KERNEL.
> 
> zswap calls zs_create_pool() at swapon time which is done
> in atomic context, resulting in a "might sleep" warning.
> 
> This patch changes the meaning of the flags argument in
> zs_create_pool() to mean the flags for the metadata allocation,
> and adds a flags argument to zs_malloc that will be used for
> memory pool growth if required.

As I mentioned, I'm not strongly against this patch, but it
should be a last resort in case we cannot address the dependency
of frontswap's init routine on swap_lock.

I sent a patch and am waiting for a reply from Konrad or Dan.
If we can fix frontswap, that would be better than
changing zsmalloc.

> 
> Acked-by: Nitin Gupta 
> Signed-off-by: Seth Jennings 
> ---
>  drivers/staging/zram/zram_drv.c  |4 ++--
>  drivers/staging/zsmalloc/zsmalloc-main.c |9 +++--
>  drivers/staging/zsmalloc/zsmalloc.h  |2 +-
>  3 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index 6762b99..836dccf 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -325,7 +325,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>   clen = PAGE_SIZE;
>   }
>  
> - handle = zs_malloc(zram->mem_pool, clen);
> + handle = zs_malloc(zram->mem_pool, clen, GFP_NOIO | __GFP_HIGHMEM);
>   if (!handle) {
>   pr_info("Error allocating memory for compressed "
>   "page: %u, size=%zu\n", index, clen);
> @@ -565,7 +565,7 @@ int zram_init_device(struct zram *zram)
>   /* zram devices sort of resembles non-rotational disks */
>   queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
>  
> - zram->mem_pool = zs_create_pool("zram", GFP_NOIO | __GFP_HIGHMEM);
> + zram->mem_pool = zs_create_pool("zram", GFP_KERNEL);
>   if (!zram->mem_pool) {
>   pr_err("Error creating memory pool\n");
>   ret = -ENOMEM;
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index eb00772..f29f170 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -205,8 +205,6 @@ struct link_free {
>  
>  struct zs_pool {
>   struct size_class size_class[ZS_SIZE_CLASSES];
> -
> - gfp_t flags;/* allocation flags used when growing pool */
>   const char *name;
>  };
>  
> @@ -818,7 +816,7 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
>   return NULL;
>  
>   ovhd_size = roundup(sizeof(*pool), PAGE_SIZE);
> - pool = kzalloc(ovhd_size, GFP_KERNEL);
> + pool = kzalloc(ovhd_size, flags);
>   if (!pool)
>   return NULL;
>  
> @@ -838,7 +836,6 @@ struct zs_pool *zs_create_pool(const char *name, gfp_t flags)
>  
>   }
>  
> - pool->flags = flags;
>   pool->name = name;
>  
>   return pool;
> @@ -874,7 +871,7 @@ EXPORT_SYMBOL_GPL(zs_destroy_pool);
>   * otherwise 0.
>   * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
>   */
> -unsigned long zs_malloc(struct zs_pool *pool, size_t size)
> +unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags)
>  {
>   unsigned long obj;
>   struct link_free *link;
> @@ -896,7 +893,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
>  
>   if (!first_page) {
>   spin_unlock(&class->lock);
> - first_page = alloc_zspage(class, pool->flags);
> + first_page = alloc_zspage(class, flags);
>   if (unlikely(!first_page))
>   return 0;
>  
> diff --git a/drivers/staging/zsmalloc/zsmalloc.h b/drivers/staging/zsmalloc/zsmalloc.h
> index de2e8bf..907ff03 100644
> --- a/drivers/staging/zsmalloc/zsmalloc.h
> +++ b/drivers/staging/zsmalloc/zsmalloc.h
> @@ -31,7 +31,7 @@ struct zs_pool;
>  struct zs_pool *zs_create_pool(const char *name, gfp_t flags);
>  void zs_destroy_pool(struct zs_pool *pool);
>  
> -unsigned long zs_malloc(struct zs_pool *pool, size_t size);
> +unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags);
>  void zs_free(struct zs_pool *pool, unsigned long obj);
>  
>  void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org 

-- 
Kind regards,
Minchan Kim

[PATCH V2 1/2] thermal: sysfs: Add a new sysfs node emul_temp for thermal emulation

2013-01-27 Thread Amit Daniel Kachhap
This patch adds support for setting an emulated temperature in a
thermal zone (sensor). Once it is set, the thermal zone may report this
temperature instead of the actual temperature. The emulation may be
implemented by the sensor through a platform-specific handler, or purely
in software if no platform handler is defined.

This is useful for debugging different temperature thresholds and their
associated cooling actions. Critical thresholds cannot be emulated.
Writing 0 to this node should disable emulation.
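
The software fallback can be sketched in user space as follows. This is a
simplified model of the thermal_zone_get_temp() hunk in this patch, not the
kernel code: struct demo_zone and demo_zone_temp() are hypothetical names,
and the sketch assumes the zone has a critical trip point.

```c
#include <assert.h>

/* Hypothetical stand-in for the parts of a thermal zone used here. */
struct demo_zone {
	unsigned long sensor_temp;	/* what the real sensor reports */
	unsigned long crit_temp;	/* critical trip temperature */
	unsigned long emul_temp;	/* 0 means emulation is disabled */
};

/* Temperature the zone would report; all values in millidegree Celsius. */
static unsigned long demo_zone_temp(const struct demo_zone *z)
{
	if (!z->emul_temp)
		return z->sensor_temp;	/* emulation off */
	if (z->sensor_temp >= z->crit_temp)
		return z->sensor_temp;	/* never hide a real critical reading */
	return z->emul_temp;
}
```

With a sensor at 40000, a critical trip at 100000 and emul_temp set to 85000,
the sketch reports 85000. If the sensor itself reaches the critical trip, the
real reading is reported.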

Signed-off-by: Amit Daniel Kachhap 
Acked-by: Kukjin Kim 
---

Changes in V2:
* Added config option for enabling emulation support.
* Added s/w emulation if no platform handler registered.
* Skipped emulation of the critical trip point.

This patchset is based on thermal maintainer next tree.
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next 

 Documentation/thermal/sysfs-api.txt |   13 ++
 drivers/thermal/Kconfig |8 +++
 drivers/thermal/thermal_sys.c   |   82 ++-
 include/linux/thermal.h |2 +
 4 files changed, 94 insertions(+), 11 deletions(-)

diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index 526d4b9..6859661 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -55,6 +55,8 @@ temperature) and throttle appropriate devices.
.get_trip_type: get the type of certain trip point.
.get_trip_temp: get the temperature above which the certain trip point
will be fired.
+   .set_emul_temp: set the emulation temperature which helps in debugging
+   different threshold temperature points.
 
 1.1.2 void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 
@@ -153,6 +155,7 @@ Thermal zone device sys I/F, created once it's registered:
 |---trip_point_[0-*]_temp: Trip point temperature
 |---trip_point_[0-*]_type: Trip point type
 |---trip_point_[0-*]_hyst: Hysteresis value for this trip point
+|---emul_temp: Emulated temperature set node
 
 Thermal cooling device sys I/F, created once it's registered:
 /sys/class/thermal/cooling_device[0-*]:
@@ -252,6 +255,16 @@ passive
Valid values: 0 (disabled) or greater than 1000
RW, Optional
 
+emul_temp
+   Interface to set the emulated temperature method in thermal zone
+   (sensor). After setting this temperature, the thermal zone may pass
+   this temperature to platform emulation function if registered or
+   cache it locally. This is useful in debugging different temperature
+   threshold and its associated cooling action. This is write only node
+   and writing 0 on this node should disable emulation.
+   Unit: millidegree Celsius
+   WO, Optional
+
 *
 * Cooling device attributes *
 *
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index faf38c5..e4cf7fb 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -78,6 +78,14 @@ config CPU_THERMAL
  and not the ACPI interface.
  If you want this support, you should say Y here.
 
+config THERMAL_EMULATION
+   bool "Thermal emulation mode support"
+   help
+ Enable this option to make a emul_temp sysfs node in thermal zone
+ directory to support temperature emulation. With emulation sysfs node,
+ user can manually input temperature and test the different trip
+ threshold behaviour for simulation purpose.
+
 config SPEAR_THERMAL
bool "SPEAr thermal sensor driver"
depends on PLAT_SPEAR
diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
index 0a1bf6b..59ba709 100644
--- a/drivers/thermal/thermal_sys.c
+++ b/drivers/thermal/thermal_sys.c
@@ -378,24 +378,57 @@ static void handle_thermal_trip(struct thermal_zone_device *tz, int trip)
monitor_thermal_zone(tz);
 }
 
+static int thermal_zone_get_temp(struct thermal_zone_device *tz,
+   unsigned long *temp)
+{
+   int ret = 0, count;
+   unsigned long crit_temp = -1UL;
+   enum thermal_trip_type type;
+
+   mutex_lock(&tz->lock);
+
+   if (tz->ops->get_temp)
+   ret = tz->ops->get_temp(tz, temp);
+   else
+   ret = -EPERM;
+
+   if (!tz->emul_temperature)
+   goto skip_emul;
+
+   for (count = 0; count < tz->trips; count++) {
+   ret = tz->ops->get_trip_type(tz, count, &type);
+   if (!ret && type == THERMAL_TRIP_CRITICAL) {
+   ret = tz->ops->get_trip_temp(tz, count, &crit_temp);
+   break;
+   }
+   }
+
+   if (ret)
+   goto skip_emul;
+
+   if (*temp < crit_temp)
+   *temp = tz->emul_temperature;
+
+skip_emul:
+   mutex_unlock(&tz->lock);
+   return ret;
+}
+
 static void 

Re: bzImage 2.12

2013-01-27 Thread H. Peter Anvin
Thanks.

Yinghai Lu  wrote:

>On Sun, Jan 27, 2013 at 11:39 AM, H. Peter Anvin wrote:
>> I'm planning to sort it out... I'll let you know if I run out of bandwidth.
>>
>> Yinghai Lu wrote:
>>
>>>On Sun, Jan 27, 2013 at 11:19 AM, H. Peter Anvin wrote:
>>>>
>>>> I think we can probably do that, since it doesn't affect anything non-broken
>>>> at this point.  I'm sorting out what can be done for 3.8 vs 3.9 at this
>>>> point.
>>>>
>>>> Anyway, as you can tell I'm spending this weekend working for a reason.
>>>>
>>>> It turns out the patch I sent out doesn't actually build.  Here is an
>>>> updated patch.  Can I get your ack for this so I can do the appropriate
>>>> hacks to your and Yinghai's patchsets?
>>>
>>>Acked-by: Yinghai Lu 
>>>
>>>Do you want me to update my patchset on top of it and resend?
>>>
>>>Or are you going to sort it out yourself?
>>>
>
>To save you some time, please check the attached patch.
>
>It would take the place of
>
>https://patchwork.kernel.org/patch/2035731/
>
>[25/35] x86, boot: Add fields to support load bzImage and ramdisk above 4G
>Thanks
>
>Yinghai

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Panic during interrupt handling while terminating hostapd

2013-01-27 Thread Mihai Moldovan
Hi,

I've found yet another problem with (at least) 3.7.4 and 3.8-rc4.

When terminating hostapd via SIGINT, this bug and panic came up:


BUG: unable to handle kernel paging request at 001d8000
IP: [<-ADDRESS>] kmem_cache_alloc+0x43/0xb0
PGD 21c3db067 PUD 0
Oops:  [#1] SMP
Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211
kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill
CPU 2
Pid: 6972, comm: modprobe Tainted: G W 3.7.4-OSS4.2 #3  /DQ45CB
RIP: 0010:[<-ADDRESS>]  [<-ADDRESS>] kmem_cache_alloc+0x43/0xb0
RSP: 0018:-ADDRESS  EFLAGS: 00010206
RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS
RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS
RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS
R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS
FS:  -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS
CS:  0010 DS:  ES:  CR0: -ADDRESS
CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS
DR0: -ADDRESS DR1: -ADDRESS DR2: -ADDRESS
DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS
Process modprobe (pid: 6972, threadinfo -ADDRESS, task -ADDRESS)
Stack:
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
Call Trace:
 [<-ADDRESS>] __d_alloc+0x2f/0x180
 [<-ADDRESS>] d_alloc+0x13/0x70
 [<-ADDRESS>] lookup_dcache+0xa3/0xd0
 [<-ADDRESS>] ? path_get+0x26/0x40
 [<-ADDRESS>] lookup_open+0x54/0x1c0
 [<-ADDRESS>] do_last+0x319/0x830
 [<-ADDRESS>] path_openat+0xae/0x4c0
 [<-ADDRESS>] ? handle_mm_fault+0x210/0x2d0
 [<-ADDRESS>] do_filp_open+0x3d/0xa0
 [<-ADDRESS>] ? __alloc_fd+0x45/0x120
 [<-ADDRESS>] do_sys_open+0xf9/0x1e0
 [<-ADDRESS>] sys_openat+0xf/0x20
 [<-ADDRESS>] system_call_fastpath+0x16/0x1b
Code: 5d e0 4c 89 65 e8 49 8b 4d 00 65 48 03 0c 25 28 cd 00 00 48 8b 51 08 4c 8b
21 4d 85 e4 74 62 49 63 45 20 48 8d 4a 01 49 8b 7d 00 <49> 8b 1c
 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 c8 49 63
RIP  [<-ADDRESS>] kmem_cache_alloc+0x43/0xb0
 RSP <-ADDRESS>
CR2: -ADDRESS
general protection fault:  [#2] SMP
Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211
kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill
CPU 2
Pid: 0, comm: swapper/2 Tainted: G  D W 3.7.4-OSS4.2 #3  /DQ45CB
RIP: 0010:[<-ADDRESS>]  [<-ADDRESS>] rcu_do_batch.isra.37+0x131/0x290
RSP: 0018:-ADDRESS  EFLAGS: 00010212
RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS
RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS
RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS
R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS
R13: -ADDRESS R14: -ADDRESS R15: -ADDRESS
FS:  -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS
CS:  0010 DS:  ES:  CR0: -ADDRESS
CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS
DR0: -ADDRESS DR1: -ADDRESS DR2: -ADDRESS
DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS
Process swapper/2 (pid: 0, threadinfo -ADDRESS, task -ADDRESS)
Stack:
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
Call Trace:
 
 [<-ADDRESS>] ? tick_program_event+0x1f/0x30
 [<-ADDRESS>] __rcu_process_callbacks+0xaa/0x140
 [<-ADDRESS>] rcu_process_callbacks+0x48/0x70
 [<-ADDRESS>] __do_softirq+0xa8/0x150
 [<-ADDRESS>] call_softirq+0x1c/0x30
 [<-ADDRESS>] do_softirq+0x4d/0x80
 [<-ADDRESS>] irq_exit+0x8e/0xb0
 [<-ADDRESS>] do_IRQ+0x5e/0xd0
 [<-ADDRESS>] common_interrupt+0x67/0x67
 
 [<-ADDRESS>] ? acpi_idle_enter_simple+0xbd/0xf4
 [<-ADDRESS>] ? acpi_idle_enter_simple+0xb8/0xf4
 [<-ADDRESS>] acpi_idle_enter_bm+0xe1/0x24b
 [<-ADDRESS>] ? menu_select+0xe4/0x300
 [<-ADDRESS>] cpuidle_enter+0x19/0x20
 [<-ADDRESS>] cpuidle_idle_call+0x8b/0xf0
 [<-ADDRESS>] cpu_idle+0xbf/0x110
 [<-ADDRESS>] start_secondary+0xb3/0xb5
Code: b8 8b 92 ac 01 00 00 85 d2 75 2f 4d 85 ff 74 2a 4c 89 ff 48 8b 57 08 4c 8b
3f 48 81 fa ff 0f 00 00 41 0f 18 0f 76 ab 48 89 45 a8  d2 48
 8b 45 a8 eb b4 0f 1f 80 00 00 00 00 48 89 c1 9c 41 5d
RIP [<-ADDRESS>] 

Re: [PATCH 5/11] ksm: get_ksm_page locked

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Sun, 2013-01-27 at 14:08 -0800, Hugh Dickins wrote:
> > On Sat, 26 Jan 2013, Simon Jeons wrote:
> > > 
> > > Why is the lock parameter passed from stable_tree_search/insert true,
> > > but the one passed from remove_rmap_item_from_tree false?
> > 
> > The other way round?  remove_rmap_item_from_tree needs the page locked,
> > because it's about to modify the list: that's secured (e.g. against
> > concurrent KSM page reclaim) by the page lock.
> 
> How can the KSM page reclaim path call remove_rmap_item_from_tree? I have
> already tracked every call site but can't find it.

It doesn't.  Please read what I said above again.

> BTW, I'm curious about KSM page reclaim. There seems to be no special
> handling in vmscan.c for it; will a KSM page be reclaimed in a way
> similar to a normal page?

Look for PageKsm in mm/rmap.c.


Re: [PATCH 1/2] thermal: sysfs: Add a new sysfs node emul_temp

2013-01-27 Thread amit kachhap
On Mon, Jan 21, 2013 at 7:20 PM, Zhang Rui  wrote:
> On Wed, 2013-01-16 at 11:30 -0800, amit kachhap wrote:
>> Hi Rui,
>>
>> Thanks for the review comments,
>> On Tue, Jan 15, 2013 at 11:33 PM, Zhang Rui  wrote:
>> > Hi, Amit,
>> >
>> > On Sun, 2013-01-06 at 16:08 -0800, Amit Daniel Kachhap wrote:
>> >> This patch adds support to set the emulated temperature method in
>> >> thermal zone (sensor). After setting this feature thermal zone must
>> >> report this temperature and not the actual temperature. The actual
>> >> implementation of this emulated temperature is based on sensor
>> >> capability or platform specific. This is useful in debugging different
>> >> temperature threshold and its associated cooling action. Writing 0 on
>> >> this node should disable emulation.
>> >
>> > Question:
>> > will this bring hardware issue? Say, critical temperature reached while
>> > in emulation mode?
>> No, emulation does not cause any h/w issue.
>> >
>> > As this is for debug purpose, I'd prefer to have a separate Kconfig
>> > option for this feature.
>> Yes agreed. Will re-submit with kconfig option.
>> >
>> >> Signed-off-by: Amit Daniel Kachhap 
>> >> ---
>> >>  Documentation/thermal/sysfs-api.txt |   14 ++
>> >>  drivers/thermal/thermal_sys.c   |   26 ++
>> >>  include/linux/thermal.h |1 +
>> >>  3 files changed, 41 insertions(+), 0 deletions(-)
>> >>
>> >> diff --git a/Documentation/thermal/sysfs-api.txt 
>> >> b/Documentation/thermal/sysfs-api.txt
>> >> index 88c0233..e8f2ee4 100644
>> >> --- a/Documentation/thermal/sysfs-api.txt
>> >> +++ b/Documentation/thermal/sysfs-api.txt
>> >> @@ -55,6 +55,8 @@ temperature) and throttle appropriate devices.
>> >>   .get_trip_type: get the type of certain trip point.
>> >>   .get_trip_temp: get the temperature above which the certain trip 
>> >> point
>> >>   will be fired.
>> >> + .set_emul_temp: set the emulation temperature which helps in 
>> >> debugging
>> >> + different threshold temperature points.
>> >>
>> >>  1.1.2 void thermal_zone_device_unregister(struct thermal_zone_device *tz)
>> >>
>> >> @@ -153,6 +155,7 @@ Thermal zone device sys I/F, created once it's 
>> >> registered:
>> >>  |---trip_point_[0-*]_temp:   Trip point temperature
>> >>  |---trip_point_[0-*]_type:   Trip point type
>> >>  |---trip_point_[0-*]_hyst:   Hysteresis value for this trip point
>> >> +|---emul_temp:   Emulated temperature set node
>> >>
>> >>  Thermal cooling device sys I/F, created once it's registered:
>> >>  /sys/class/thermal/cooling_device[0-*]:
>> >> @@ -252,6 +255,17 @@ passive
>> >>   Valid values: 0 (disabled) or greater than 1000
>> >>   RW, Optional
>> >>
>> >> +emul_temp
>> >> + Interface to set the emulated temperature method in thermal zone
>> >> + (sensor). After setting this feature thermal zone must report
>> >> + this temperature and not the actual temperature. The actual
>> >> + implementation of this emulated temperature is platform specific.
>> >
>> > can we have a pure software temperature emulation method?
>> > say, the generic thermal layer caches the emulated temperature value,
>> > and hook it in update_temperature()?
>> > This is also useful for testing in polling mode, and it does not require
>> > platform specific callback support. I mean thermal_ops->set_emul_temp is
>> > optional, but thermal emulation is always available for all platforms.
>> Yes, it makes sense: we can have pure software emulation and use the
>> cached temperature when no platform call is registered. In my case I
>> needed this in h/w so as to get the same sensor-triggered interrupt
>> behaviour.
>>
>> So the code flow can be like this,
>>
>> #ifdef CONFIG_THERMAL_EMULATION
>> if (thermal_ops->set_emul_temp)
>> then pass emul_temp to platform and use the normal platform
>> thermal_ops->get_temp
>> else
>> Store it locally and use emul_temp  instead of calling platform
>> thermal_ops->get_temp
>> #endif
>>
> No.
> We should not support emulation if CONFIG_THERMAL_EMULATION is cleared.
> And furthermore, for pure software emulation, we should check if the
> real temperature reaches the critical trip point.
Yes, agreed. Submitted the V2 version with your suggestion.

Thanks,
Amit Daniel
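
A minimal userspace sketch of the pure-software path discussed in this
thread, not the kernel code itself: the zone reports the cached emulated
value unless the real reading has already reached the critical trip
point. The names zone_model and zone_report_temp are made up for
illustration, and an emulated value of 0 is assumed to mean "emulation
off", as the quoted documentation says.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a thermal zone; not the kernel's struct. */
struct zone_model {
	unsigned long real_temp;	/* sensor reading, millidegree Celsius */
	unsigned long emul_temp;	/* 0 means emulation is disabled */
	unsigned long crit_temp;	/* critical trip point, millidegree Celsius */
};

/* Return the temperature the zone should report to its users. */
unsigned long zone_report_temp(const struct zone_model *z)
{
	assert(z != NULL);

	if (z->emul_temp == 0)
		return z->real_temp;	/* emulation off */
	if (z->real_temp >= z->crit_temp)
		return z->real_temp;	/* never hide a real critical reading */
	return z->emul_temp;
}
```

The second check is the point of Rui's last comment: an emulated value
must not mask a real temperature that has reached the critical trip.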

>
> thanks,
> rui
>> I will re-submit with this change.
>>
>> Thanks,
>> Amit
>> >
>> > thanks,
>> > rui
>> >> + This is useful in debugging different temperature threshold and its
>> >> + associated cooling action. Writing 0 on this node should disable
>> >> + emulation.
>> >> + Unit: millidegree Celsius
>> >> + WO, Optional
>> >> +
>> >>  *
>> >>  * Cooling device attributes *
>> >>  *
>> >> diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
>> >> index 8c8ce80..ecdfc7d 100644
>> >> --- a/drivers/thermal/thermal_sys.c
>> >> +++ 

[PATCH] smp:Fix use un-initialized cpumask_ipi

2013-01-27 Thread Wang YanQing
Commit c7b798525b50256c8084215a139fa40b0114bfcc
[smp: Fix SMP function call empty cpu mask race]
uses the uninitialized variable cpumask_ipi when
CONFIG_CPUMASK_OFFSTACK is enabled.

Signed-off-by: Wang YanQing 
---
 I am sorry for missing it; I only thought of it when I
 was lying in bed last night. :)

 kernel/smp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 7c56aba..a38aa18 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -57,6 +57,9 @@ hotplug_cfd(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL,
 				cpu_to_node(cpu)))
 			return notifier_from_errno(-ENOMEM);
+		if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, GFP_KERNEL,
+				cpu_to_node(cpu)))
+			return notifier_from_errno(-ENOMEM);
 		break;
 
 #ifdef CONFIG_HOTPLUG_CPU
-- 
1.7.11.1.116.g8228a23
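
The underlying problem, a second off-stack mask that is used without
ever being allocated, can be modelled in plain userspace C. This is only
a sketch under stated assumptions: cfd_model, cfd_model_prepare and
cfd_model_release are hypothetical names, calloc stands in for
zalloc_cpumask_var_node, and freeing the first mask when the second
allocation fails is an addition of this sketch, not something the posted
patch does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of per-CPU data holding two masks. */
struct cfd_model {
	unsigned long *cpumask;
	unsigned long *cpumask_ipi;
};

/* Allocate both zeroed masks or neither. Returns 0 on success, -1 on failure. */
int cfd_model_prepare(struct cfd_model *cfd, size_t words)
{
	assert(cfd != NULL && words > 0);

	cfd->cpumask = calloc(words, sizeof(*cfd->cpumask));
	if (!cfd->cpumask)
		return -1;

	cfd->cpumask_ipi = calloc(words, sizeof(*cfd->cpumask_ipi));
	if (!cfd->cpumask_ipi) {
		/* Unwind the first allocation (sketch-only addition). */
		free(cfd->cpumask);
		cfd->cpumask = NULL;
		return -1;
	}
	return 0;
}

/* Release both masks and leave the pointers in a known state. */
void cfd_model_release(struct cfd_model *cfd)
{
	assert(cfd != NULL);
	free(cfd->cpumask);
	free(cfd->cpumask_ipi);
	cfd->cpumask = NULL;
	cfd->cpumask_ipi = NULL;
}
```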


[PATCH V2 2/2] thermal: exynos: Use the framework for temperature emulation support

2013-01-27 Thread Amit Daniel Kachhap
This removes the driver specific sysfs support of the temperature
emulation and uses the newly added core thermal framework for thermal
emulation. A platform specific handler is added to support this.

Signed-off-by: Amit Daniel Kachhap 
Acked-by: Kukjin Kim 
---
Changes in V2:
* Added config option CONFIG_THERMAL_EMULATION instead of
 CONFIG_EXYNOS_THERMAL_EMUL 

This patchset is based on thermal maintainer next tree.
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git next 

 Documentation/thermal/exynos_thermal_emulation |8 +-
 drivers/thermal/Kconfig|9 --
 drivers/thermal/exynos_thermal.c   |  158 ++--
 3 files changed, 67 insertions(+), 108 deletions(-)

diff --git a/Documentation/thermal/exynos_thermal_emulation b/Documentation/thermal/exynos_thermal_emulation
index b73bbfb..36a3e79 100644
--- a/Documentation/thermal/exynos_thermal_emulation
+++ b/Documentation/thermal/exynos_thermal_emulation
@@ -13,11 +13,11 @@ Thermal emulation mode supports software debug for TMU's operation. User can set
 manually with software code and TMU will read current temperature from user value not from
 sensor's value.
 
-Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support in available.
-When it's enabled, sysfs node will be created under
-/sys/bus/platform/devices/'exynos device name'/ with name of 'emulation'.
+Enabling CONFIG_THERMAL_EMULATION option will make this support available.
+When it's enabled, sysfs node will be created as
+/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp.
 
-The sysfs node, 'emulation', will contain value 0 for the initial state. When you input any
+The sysfs node, 'emul_node', will contain value 0 for the initial state. When you input any
 temperature you want to update to sysfs node, it automatically enable emulation mode and
 current temperature will be changed into it.
 (Exynos also supports user changable delay time which would be used to delay of
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index e4cf7fb..2a79510 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -109,15 +109,6 @@ config EXYNOS_THERMAL
  If you say yes here you get support for TMU (Thermal Management
  Unit) on SAMSUNG EXYNOS series of SoC.
 
-config EXYNOS_THERMAL_EMUL
-   bool "EXYNOS TMU emulation mode support"
-   depends on EXYNOS_THERMAL
-   help
- Exynos 4412 and 4414 and 5 series has emulation mode on TMU.
- Enable this option will be make sysfs node in exynos thermal platform
- device directory to support emulation mode. With emulation mode sysfs
- node, you can manually input temperature to TMU for simulation purpose.
-
 config DB8500_THERMAL
bool "DB8500 thermal management"
depends on ARCH_U8500
diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
index 327102a..afe9c2a 100644
--- a/drivers/thermal/exynos_thermal.c
+++ b/drivers/thermal/exynos_thermal.c
@@ -99,13 +99,13 @@
 #define IDLE_INTERVAL 1
 #define MCELSIUS   1000
 
-#ifdef CONFIG_EXYNOS_THERMAL_EMUL
+#ifdef CONFIG_THERMAL_EMULATION
 #define EXYNOS_EMUL_TIME   0x57F0
 #define EXYNOS_EMUL_TIME_SHIFT 16
 #define EXYNOS_EMUL_DATA_SHIFT 8
 #define EXYNOS_EMUL_DATA_MASK  0xFF
 #define EXYNOS_EMUL_ENABLE 0x1
-#endif /* CONFIG_EXYNOS_THERMAL_EMUL */
+#endif /* CONFIG_THERMAL_EMULATION */
 
 /* CPU Zone information */
 #define PANIC_ZONE  4
@@ -143,6 +143,7 @@ struct  thermal_cooling_conf {
 struct thermal_sensor_conf {
char name[SENSOR_NAME_LEN];
int (*read_temperature)(void *data);
+   int (*write_emul_temp)(void *data, unsigned long temp);
struct thermal_trip_point_conf trip_data;
struct thermal_cooling_conf cooling_data;
void *private_data;
@@ -366,6 +367,23 @@ static int exynos_get_temp(struct thermal_zone_device *thermal,
return 0;
 }
 
+/* Get temperature callback functions for thermal zone */
+static int exynos_set_emul_temp(struct thermal_zone_device *thermal,
+   unsigned long temp)
+{
+   void *data;
+   int ret = -EINVAL;
+
+   if (!th_zone->sensor_conf) {
+   pr_info("Temperature sensor not initialised\n");
+   return -EINVAL;
+   }
+   data = th_zone->sensor_conf->private_data;
+   if (th_zone->sensor_conf->write_emul_temp)
+   ret = th_zone->sensor_conf->write_emul_temp(data, temp);
+   return ret;
+}
+
 /* Get the temperature trend */
 static int exynos_get_trend(struct thermal_zone_device *thermal,
int trip, enum thermal_trend *trend)
@@ -382,6 +400,7 @@ static struct thermal_zone_device_ops const exynos_dev_ops = {
.bind = exynos_bind,
.unbind = exynos_unbind,
.get_temp = exynos_get_temp,
+   .set_emul_temp = exynos_set_emul_temp,
.get_trend = 

Re: [PATCH] vhost-net: fall back to vmalloc if high-order allocation fails

2013-01-27 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Wed, 23 Jan 2013 23:04:11 +0200

> Maybe we should try and reduce our memory usage,
> I will look into this.

As has been pointed out, 32K of the size is from those iovecs in
the queues.

The size of this structure is frankly offensive, and even if you add
some levels of indirection even just one iovec chunk is 16K on 64-bit
which is in my opinion still unacceptably large.

TCP sockets aren't even 25% the size of this beast. :-)

I'm not going to apply this vmalloc patch, because if I apply it the
fundamental problem here just gets swept under the carpet even longer.

Sorry.
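
The 16K figure is easy to check with a few lines of C. This is a
back-of-the-envelope sketch: MODEL_IOV_ENTRIES is an assumption (1024
entries, which is what the 16K number implies when struct iovec is 16
bytes) and iovec_array_bytes is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Assumption: one iovec chunk holds 1024 entries. */
#define MODEL_IOV_ENTRIES 1024

/* Bytes needed for an array of `entries` struct iovec elements. */
size_t iovec_array_bytes(size_t entries)
{
	assert(entries > 0);
	return entries * sizeof(struct iovec);
}
```

On a 64-bit (LP64) build struct iovec is a pointer plus a size_t, so
iovec_array_bytes(MODEL_IOV_ENTRIES) is 16384 bytes, and two such arrays
account for the 32K mentioned above.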




[PATCH] userns: Allow the unprivileged users to mount mqueue fs

2013-01-27 Thread Gao feng
This patch allows unprivileged users to mount the mqueue fs in a
user namespace.

If two user namespaces share the same ipcns, the files in the mqueue fs
should be visible in both of them.

If a user namespace has its own ipcns, it has its own mqueue fs too;
ipcns already handles this.

Signed-off-by: Gao feng 
---
 ipc/mqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 71a3ca1..023c986 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1383,6 +1383,7 @@ static struct file_system_type mqueue_fs_type = {
.name = "mqueue",
.mount = mqueue_mount,
.kill_sb = kill_litter_super,
+   .fs_flags = FS_USERNS_MOUNT,
 };
 
 int mq_init_ns(struct ipc_namespace *ns)
-- 
1.7.11.7



Re: [PATCH] checkpatch.pl: Fix warnings on code comments

2013-01-27 Thread David Miller
From: Jeff Kirsher 
Date: Sun, 27 Jan 2013 18:56:45 -0800

> So will you be fine with cleanup patches which go through and
> convert all the existing code comments to the desired format?

Sure.


Re: [PATCHv2 1/9] staging: zsmalloc: add gfp flags to zs_create_pool

2013-01-27 Thread Minchan Kim
On Fri, Jan 25, 2013 at 07:56:29AM -0800, Dan Magenheimer wrote:
> > From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com]
> > Subject: Re: [PATCHv2 1/9] staging: zsmalloc: add gfp flags to 
> > zs_create_pool
> > 
> > On 01/24/2013 07:33 PM, Minchan Kim wrote:
> > > Hi Seth, frontswap guys
> > >
> > > On Tue, Jan 8, 2013 at 5:24 AM, Seth Jennings
> > >  wrote:
> > >> zs_create_pool() currently takes a gfp flags argument
> > >> that is used when growing the memory pool.  However
> > >> it is not used in allocating the metadata for the pool
> > >> itself.  That is currently hardcoded to GFP_KERNEL.
> > >>
> > >> zswap calls zs_create_pool() at swapon time which is done
> > >> in atomic context, resulting in a "might sleep" warning.
> > >
> > > I haven't reviewed the whole series, really sorry, but today I saw Nitin
> > > added Acked-by so I'm afraid Greg might pick it up under my radar. I'm not
> > > strongly against it, but I would like to know why we should call
> > > frontswap_init under swap_lock. Is there a special reason?
> > 
> > The call stack is:
> > 
> > SYSCALL_DEFINE2(swapon.. <-- swapon_mutex taken here
> > enable_swap_info() <-- swap_lock taken here
> > frontswap_init()
> > __frontswap_init()
> > zswap_frontswap_init()
> > zs_create_pool()
> > 
> > It isn't entirely clear to me why frontswap_init() is called under
> > lock.  Then again, I'm not entirely sure what the swap_lock protects.
> >  There are no comments near the swap_lock definition to tell me.
> > 
> > I would guess that the intent is to block any writes to the swap
> > device until frontswap_init() has completed.
> > 
> > Dan care to weigh in?
> 
> I think frontswap's first appearance needs to be atomic, i.e.
> the transition from (a) frontswap is not present and will fail
> all calls, to (b) frontswap is fully functional... that transition
> must be atomic.  And, once Konrad's module patches are in, the
> opposite transition must be atomic also.  But there are most
> likely other ways to do those transitions atomically that
> don't need to hold swap_lock.

It could race once swap_info is registered.
But what's the problem if we call frontswap_init outside the lock, before
calling _enable_swap_info?
The swap subsystem never does I/O before it registers the new swap_info_struct.

And IMHO, if frontswap initialization is to be atomic, it would be better for
it to have its own scheme without depending on swap_lock, if that's possible.
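
The ordering argument can be illustrated with a single-threaded model in
C. It is only a sketch: swap_model and the model_* functions are
hypothetical names, the lock is a plain flag rather than a real
spinlock, and "might sleep" is reduced to a counter that is bumped when
the init routine runs with the lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state touched during swapon. */
struct swap_model {
	bool lock_held;		/* stands in for swap_lock */
	bool frontswap_ready;
	bool registered;	/* swap_info visible, so I/O may start */
	int sleep_under_lock;	/* counts "might sleep" violations */
};

/* Stands in for an init routine whose allocation may sleep. */
static void model_frontswap_init(struct swap_model *m)
{
	if (m->lock_held)
		m->sleep_under_lock++;
	m->frontswap_ready = true;
}

/* Ordering described in the call stack: init runs with the lock held. */
void model_swapon_old(struct swap_model *m)
{
	assert(m != NULL);
	m->lock_held = true;
	model_frontswap_init(m);
	m->registered = true;
	m->lock_held = false;
}

/* Proposed ordering: init first, then register under the lock. */
void model_swapon_new(struct swap_model *m)
{
	assert(m != NULL);
	model_frontswap_init(m);	/* nothing is registered yet, so no I/O can race */
	m->lock_held = true;
	m->registered = true;
	m->lock_held = false;
}
```

In both orderings frontswap is ready by the time the device is
registered; only the old one runs the sleeping init with the lock held.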
> 
> Honestly, I never really focused on the initialization code
> so I am very open to improvements as long as they work for
> all the various frontswap backends.

How about this?

From 157a3edf49feb93be0595574beb153b322ddf7d2 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Mon, 28 Jan 2013 11:34:00 +0900
Subject: [PATCH] frontswap: Get rid of swap_lock dependency

The frontswap initialization routine depends on swap_lock so that
frontswap's first appearance is atomic.
IOW, either frontswap is not present and fails all calls, or frontswap is
fully functional. But until the new swap_info_struct is registered
by enable_swap_info, the swap subsystem doesn't start I/O, so there is no race
between the init procedure and page I/O working on frontswap.

So let's remove the unnecessary swap_lock dependency.

Cc: Dan Magenheimer 
Cc: Konrad Rzeszutek Wilk 
Signed-off-by: Minchan Kim 
---
 include/linux/frontswap.h |6 +++---
 mm/frontswap.c|7 ---
 mm/swapfile.c |   11 +--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
index 3044254..b7e238e 100644
--- a/include/linux/frontswap.h
+++ b/include/linux/frontswap.h
@@ -22,7 +22,7 @@ extern void frontswap_writethrough(bool);
 #define FRONTSWAP_HAS_EXCLUSIVE_GETS
 extern void frontswap_tmem_exclusive_gets(bool);
 
-extern void __frontswap_init(unsigned type);
+extern void __frontswap_init(unsigned type, unsigned long *map);
 extern int __frontswap_store(struct page *page);
 extern int __frontswap_load(struct page *page);
 extern void __frontswap_invalidate_page(unsigned, pgoff_t);
@@ -120,10 +120,10 @@ static inline void frontswap_invalidate_area(unsigned type)
__frontswap_invalidate_area(type);
 }
 
-static inline void frontswap_init(unsigned type)
+static inline void frontswap_init(unsigned type, unsigned long *map)
 {
if (frontswap_enabled)
-   __frontswap_init(type);
+   __frontswap_init(type, map);
 }
 
 #endif /* _LINUX_FRONTSWAP_H */
diff --git a/mm/frontswap.c b/mm/frontswap.c
index 2890e67..bad21b0 100644
--- a/mm/frontswap.c
+++ b/mm/frontswap.c
@@ -115,13 +115,14 @@ EXPORT_SYMBOL(frontswap_tmem_exclusive_gets);
 /*
  * Called when a swap device is swapon'd.
  */
-void __frontswap_init(unsigned type)
+void __frontswap_init(unsigned type, unsigned long *map)
 {
struct swap_info_struct *sis = swap_info[type];
 
BUG_ON(sis == NULL);
-   if (sis->frontswap_map == NULL)
-   return;
+   BUG_ON(sis->frontswap_map);
+
+   

Re: [PATCH] checkpatch.pl: Fix warnings on code comments

2013-01-27 Thread Jeff Kirsher
On Sun, 2013-01-27 at 18:59 -0500, David Miller wrote:
> From: Jeff Kirsher 
> Date: Sun, 27 Jan 2013 03:35:39 -0800
> 
> > Produces warnings on code comments which follow the Linux coding style
> > guide.  While the desired code comment style for networking may differ
> > from the rest of the kernel, both styles should be permitted.
> 
> I was actually going to mention to you guys that I've been lackadaisical
> about enforcing the comment style I want with the Intel drivers.
> 
> That was a mistake, I should have enforced it strictly, as I do for
> the other drivers and the core networking code, from the beginning.
> 
> And it's clearly a mistake if you feel the need to take out the very
> checkpatch warning that's meant to enforce this comment style in all
> of the networking drivers and core.
> 
> Do not revert this, follow its advice instead.

Ok, I am fine with that.  I just had not seen any emails/responses that
this was the direction you wanted to go.

So will you be fine with cleanup patches which go through and convert
all the existing code comments to the desired format?  If so, I will get
started on patches to clean up and convert the Intel drivers to the desired
code comment style.
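
For readers who have not seen the two comment styles side by side, a
small sketch follows. The function names and bodies are placeholders
invented for this example; only the layout of the multi-line comments
matters.

```c
#include <assert.h>

/*
 * General kernel style: the opening line of a multi-line comment
 * is left empty and the text starts on the next line.
 */
int backlog_limit_generic(int requested, int max)
{
	assert(max >= 0);
	return requested > max ? max : requested;
}

/* Networking style (net/ and drivers/net/): the text starts on the
 * opening line of the comment.
 */
int backlog_limit_net(int requested, int max)
{
	assert(max >= 0);
	return requested > max ? max : requested;
}
```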




Re: Doubts about listen backlog and tcp_max_syn_backlog

2013-01-27 Thread Nivedita Singhvi
On 01/25/2013 02:05 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:12:46PM -0800, Nivedita SInghvi wrote:
> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
> this stat anymore, or the name was changed). I still don't know if we
> are talking about the same thing.

>> [snip]
>>>> I will sometimes be tripped-up by netstat's not showing a statistic
>>>> with a zero value...
>>
>> Leandro, you should be able to do an nstat -z, it will print all
>> counters even if zero. You should see something like so:
>>
>> ipv4]> nstat -z
>> #kernel
>> IpInReceives                    2135               0.0
>> IpInHdrErrors                   0                  0.0
>> IpInAddrErrors                  202                0.0
>> ...
>>
>> You might want to take a look at those (your pkts may not even be
>> making it to tcp) and these in particular:
>>
>> TcpExtSyncookiesSent            0                  0.0
>> TcpExtSyncookiesRecv            0                  0.0
>> TcpExtSyncookiesFailed          0                  0.0
>> TcpExtListenOverflows           0                  0.0
>> TcpExtListenDrops               0                  0.0
>> TcpExtTCPBacklogDrop            0                  0.0
>> TcpExtTCPMinTTLDrop             0                  0.0
>> TcpExtTCPDeferAcceptDrop        0                  0.0
>>
>> If you don't have nstat on that version for some reason, download the
>> latest iproute pkg. Looking at the counter names is a lot more helpful
>> and precise than the netstat conversion to human consumption.
> 
> Thanks, but what about this?
> 
> pc2 $ nstat -z | grep -i drop
> TcpExtLockDroppedIcmps          0                  0.0
> TcpExtListenDrops               0                  0.0
> TcpExtTCPPrequeueDropped        0                  0.0
> TcpExtTCPBacklogDrop            0                  0.0
> TcpExtTCPMinTTLDrop             0                  0.0
> TcpExtTCPDeferAcceptDrop        0                  0.0

That seems bogus. 


> pc2 $ netstat -s | grep -i drop
> 470 outgoing packets dropped
> 5659740 SYNs to LISTEN sockets dropped
> 
> Is this normal?

That's a lot of connect requests dropped, but it depends on how
long you've been up and how much traffic you've seen.

Hmm...you were on an older Ubuntu, right? The netstat source 
was patched to translate it as follows:

+{ "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },

(see the file debian/patches/CVS-20081003-statistics.c_sync.patch 
 in the net-tools src)

i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
that's counting TCPExtListenDrops. 

Theoretically, that number should be the same as that printed by nstat,
as they are getting it from the same kernel stats counter. I have not
looked at the nstat code (I actually almost always dump the counters from
/proc/net/{netstat + snmp} via a simple prettyprint script; I will send
you that offline).

If the nstat and netstat counters don't match, something is fishy.
That nstat output is broken.
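
One way to cross-check the two tools is to read the counter straight
from /proc/net/netstat, which stores each group as a line of names
followed by a line of values with the same prefix. A small sketch in C;
the function name netstat_counter is made up here, and the caller is
assumed to have already read the two "TcpExt:" lines into strings. Note
that in the /proc file the counter is called ListenDrops, without the
TcpExt prefix that nstat prints.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in a name/value line pair such as
 *   "TcpExt: SyncookiesSent ListenOverflows ListenDrops"
 *   "TcpExt: 0 12 5659740"
 * Returns 0 and stores the value on success, -1 if the key is absent
 * or the value column is not a number.
 */
int netstat_counter(const char *names, const char *values,
		    const char *key, unsigned long long *out)
{
	size_t keylen;

	assert(names && values && key && out);
	keylen = strlen(key);

	for (;;) {
		size_t nlen, vlen;

		/* Skip separators, then measure the next column on each line. */
		names += strspn(names, " \n");
		values += strspn(values, " \n");
		nlen = strcspn(names, " \n");
		vlen = strcspn(values, " \n");
		if (nlen == 0 || vlen == 0)
			return -1;	/* ran out of columns */

		if (nlen == keylen && memcmp(names, key, keylen) == 0) {
			char *end;
			unsigned long long v = strtoull(values, &end, 10);

			if (end == values)
				return -1;	/* not a number */
			*out = v;
			return 0;
		}
		names += nlen;
		values += vlen;
	}
}
```

If the value found this way matches netstat -s but not nstat -z, the
discrepancy is in nstat rather than in the kernel counter.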

>>> Yes, I already did captures and we are definitely losing packets
>>> (including SYNs), but it looks like the amount of SYNs I'm losing is
>>> lower than the amount of long connect() times I observe. This is not
>>> confirmed yet, I'm still investigating.
>>
>> Where did you narrow down the drop to? There are quite a few places in
>> the networking stack we silently drop packets (such as the one pointed
>> out earlier in this thread), although they should almost all be
>> extremely low probability/NEVER type events. Do you want a patch to
>> gap the most likely scenario? (I'll post that to netdev separately). 
> 
> Even when that would be awesome, unfortunately there is no way I could
> get permission to run a patched kernel (or even restart the servers for
> that matter).
> 
> And I don't know how could I narrow down the drops in any way. What I
> know is capturing traffic with tcpdump, I see some packets leaving one
> server but never arriving to the new one.

Hmm..do you have a switch between your two end points dropping pkts? 
Could be.. Basically, by looking at the statistics kept by each layer, you 
should be able to narrow it down a little bit at least. 

It does still sound like some drops are occurring in TCP due to the accept
backlog being full and you're overrunning TCP incoming processing (or
at least this is contributing), going by that ListenDrops count.

> Also, the hardware is not great either, I'm not sure is not responsible
> for the loss. There are some errors reported by ethtool, but I don't
> know exactly what they mean:
> 
> # ethtool -S eth0
> NIC statistics:
>  tx_packets: 336978308273
>  rx_packets: 384108075585
>  tx_errors: 0
>  rx_errors: 194
>  rx_missed: 1119
>  align_errors: 31731
>  tx_single_collisions: 0
>  tx_multi_collisions: 0
>  unicast: 384108023754
>  broadcast: 51825
>  

Re: Doubts about listen backlog and tcp_max_syn_backlog

2013-01-27 Thread Nivedita Singhvi
On 01/25/2013 02:05 AM, Leandro Lucarella wrote:
> On Thu, Jan 24, 2013 at 10:12:46PM -0800, Nivedita SInghvi wrote:
> I was just kind of quoting the name given by netstat: "SYNs to LISTEN
> sockets dropped" (for kernel 3.0, I noticed newer kernels don't have
> this stat anymore, or the name was changed). I still don't know if we
> are talking about the same thing.

>> [snip]
 I will sometimes be tripped-up by netstat's not showing a statistic
 with a zero value...
>>
>> Leandro, you should be able to do an nstat -z, it will print all
>> counters even if zero. You should see something like so:
>>
>> ipv4]> nstat -z
>> #kernel
>> IpInReceives2135   0.0
>> IpInHdrErrors   0  0.0
>> IpInAddrErrors  2020.0
>> ...
>>
>> You might want to take a look at those (your pkts may not even be
>> making it to tcp) and these in particular:
>>
>> TcpExtSyncookiesSent0  0.0
>> TcpExtSyncookiesRecv0  0.0
>> TcpExtSyncookiesFailed  0  0.0
>> TcpExtListenOverflows   0  0.0
>> TcpExtListenDrops   0  0.0
>> TcpExtTCPBacklogDrop0  0.0
>> TcpExtTCPMinTTLDrop 0  0.0
>> TcpExtTCPDeferAcceptDrop0  0.0
>>
>> If you don't have nstat on that version for some reason, download the
>> latest iproute pkg. Looking at the counter names is a lot more helpful
>> and precise than the netstat converstion to human consumption. 
> 
> Thanks, but what about this?
> 
> pc2 $ nstat -z | grep -i drop
> TcpExtLockDroppedIcmps  0  0.0
> TcpExtListenDrops   0  0.0
> TcpExtTCPPrequeueDropped0  0.0
> TcpExtTCPBacklogDrop0  0.0
> TcpExtTCPMinTTLDrop 0  0.0
> TcpExtTCPDeferAcceptDrop0  0.0

That seems bogus. 


> pc2 $ netstat -s | grep -i drop
> 470 outgoing packets dropped
> 5659740 SYNs to LISTEN sockets dropped
> 
> Is this normal?

That's a lot ofconnect requests dropped, but it depends on how 
long you've been up and how much traffic you've seen. 

Hmm...you were on an older Ubuntu, right? The netstat source 
was patched to translate it as follows:

+{ "ListenDrops", N_("%u SYNs to LISTEN sockets dropped"), opt_number },

(see the file debian/patches/CVS-20081003-statistics.c_sync.patch 
 in the net-tools src)

i.e., the netstat pkg is printing the value of the TCPEXT MIB counter
that's counting TCPExtListenDrops. 

Theoretically, that number should be the same as that printed by nstat,
as they are getting it from the same kernel stats counter. I have not
looked at nstat code (I actually almost always dump the counters from
/proc/net/{netstat + snmp} via a simple prettyprint script (will send
you that offline).  

If the nstat and netstat counters don't match, something is fishy.
That nstat output is broken.  

>>> Yes, I already did captures and we are definitely losing packets
>>> (including SYNs), but it looks like the amount of SYNs I'm losing is
>>> lower than the amount of long connect() times I observe. This is not
>>> confirmed yet, I'm still investigating.
>>
>> Where did you narrow down the drop to? There are quite a few places in
>> the networking stack we silently drop packets (such as the one pointed
>> out earlier in this thread), although they should almost all be
>> extremely low probability/NEVER type events. Do you want a patch to
>> gap the most likely scenario? (I'll post that to netdev separately). 
> 
> Even though that would be awesome, unfortunately there is no way I could
> get permission to run a patched kernel (or even restart the servers for
> that matter).
> 
> And I don't know how I could narrow down the drops in any way. What I
> know from capturing traffic with tcpdump is that I see some packets
> leaving one server but never arriving at the new one.

Hmm..do you have a switch between your two end points dropping pkts? 
Could be.. Basically, by looking at the statistics kept by each layer, you 
should be able to narrow it down a little bit at least. 

It does still sound like some drops are occurring in TCP due to the accept 
backlog being full and you're overrunning TCP incoming processing (or 
at least this is contributing), going by that ListenDrops count. 
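
Since the absolute count depends on uptime and traffic, the change over a
fixed interval says more than the total. A minimal sketch of that
comparison, assuming two snapshots taken as plain name-to-value
dictionaries (the helper name and the sample numbers are hypothetical):

```python
def counter_deltas(before, after,
                   names=("TcpExtListenOverflows", "TcpExtListenDrops")):
    """Return how much each named counter grew between two snapshots."""
    # A counter missing from a snapshot is treated as 0.
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


# Hypothetical snapshots taken some seconds apart.
before = {"TcpExtListenOverflows": 100, "TcpExtListenDrops": 5659740}
after = {"TcpExtListenOverflows": 130, "TcpExtListenDrops": 5659790}
deltas = counter_deltas(before, after)
```

If the deltas stay at zero while connect() is still slow, the accept
backlog is probably not where the SYNs are going.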

> Also, the hardware is not great either; I'm not sure it is not responsible
> for the loss. There are some errors reported by ethtool, but I don't
> know exactly what they mean:
> 
> # ethtool -S eth0
> NIC statistics:
>  tx_packets: 336978308273
>  rx_packets: 384108075585
>  tx_errors: 0
>  rx_errors: 194
>  rx_missed: 1119
>  align_errors: 31731
>  tx_single_collisions: 0
>  tx_multi_collisions: 0
>  unicast: 384108023754
>  broadcast: 51825
>  

Re: [patch] NTB: fix pointer math issues

2013-01-27 Thread Jon Mason
On Wed, Jan 23, 2013 at 10:26:05PM +0300, Dan Carpenter wrote:
> ->remote_rx_info and ->rx_info are struct ntb_rx_info pointers.  If we
> add sizeof(struct ntb_rx_info) then it goes too far.

Good catch, I'll add it to my pending patch queue.

Thanks,
Jon

> 
> Signed-off-by: Dan Carpenter 
> ---
> 
> diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> index e0bdfd7..676ee16 100644
> --- a/drivers/ntb/ntb_transport.c
> +++ b/drivers/ntb/ntb_transport.c
> @@ -486,7 +486,7 @@ static void ntb_transport_setup_qp_mw(struct 
> ntb_transport *nt,
>(qp_num / NTB_NUM_MW * rx_size);
>   rx_size -= sizeof(struct ntb_rx_info);
>  
> - qp->rx_buff = qp->remote_rx_info + sizeof(struct ntb_rx_info);
> + qp->rx_buff = qp->remote_rx_info + 1;
>   qp->rx_max_frame = min(transport_mtu, rx_size);
>   qp->rx_max_entry = rx_size / qp->rx_max_frame;
>   qp->rx_index = 0;
> @@ -780,7 +780,7 @@ static void ntb_transport_init_queue(struct ntb_transport 
> *nt,
> (qp_num / NTB_NUM_MW * tx_size);
>   tx_size -= sizeof(struct ntb_rx_info);
>  
> - qp->tx_mw = qp->rx_info + sizeof(struct ntb_rx_info);
> + qp->tx_mw = qp->rx_info + 1;
>   qp->tx_max_frame = min(transport_mtu, tx_size);
>   qp->tx_max_entry = tx_size / qp->tx_max_frame;
>   qp->tx_index = 0;
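
To spell out why "+ sizeof(struct ntb_rx_info)" goes too far: in C, adding
n to a typed pointer advances it by n elements, i.e. n * sizeof(element)
bytes. The sketch below shows the same scaling with Python's ctypes. The
RxInfo layout is a hypothetical stand-in, since the real struct is not
shown in this thread.

```python
import ctypes


class RxInfo(ctypes.Structure):
    # Hypothetical stand-in for struct ntb_rx_info (layout assumed).
    _fields_ = [("entry", ctypes.c_uint32)]


def byte_offset(elements):
    """Bytes a 'struct ntb_rx_info *' moves when 'elements' is added to it in C."""
    arr = (RxInfo * (elements + 1))()
    # Element 'elements' of the array sits where 'ptr + elements' would point.
    return ctypes.addressof(arr[elements]) - ctypes.addressof(arr)


size = ctypes.sizeof(RxInfo)
buggy = byte_offset(size)  # ptr + sizeof(struct): skips size * size bytes
fixed = byte_offset(1)     # ptr + 1: skips exactly one struct
```

With this assumed 4-byte struct the old expression lands 16 bytes past the
header instead of 4, which is the overshoot the patch removes.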
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] arch: avr32: add dummy syscalls

2013-01-27 Thread Håvard Skinnemoen
On Sun, Jan 27, 2013 at 7:50 PM, Matthias Brugger
 wrote:
> This patch adds dummy syscalls so that compiling
> for this architecture does not provoke warnings when
> checksyscalls.sh is called.

Nice, but...

> --- a/arch/avr32/kernel/syscall_table.S
> +++ b/arch/avr32/kernel/syscall_table.S
> @@ -298,3 +298,32 @@ sys_call_table:
> .long   sys_recvmmsg
> .long   sys_setns
> .long   sys_ni_syscall  /* r8 is saturated at nr_syscalls */

This terminator needs to stay at the end. Its only purpose is to allow
us to save a cycle or two when saturating the system call number.
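
A rough model of that trick, as I read the "r8 is saturated at
nr_syscalls" comment (the real avr32 entry code is not shown here, and the
names below are made up for illustration): the number is clamped rather
than range-checked, so every out-of-range call lands on the last table
slot, which therefore has to be the not-implemented stub.

```python
import errno


def sys_ni_syscall(*args):
    """Stand-in for the 'not implemented' stub."""
    return -errno.ENOSYS


def make_dispatch(handlers):
    """Build a dispatcher over 'handlers' plus a terminating stub."""
    table = list(handlers) + [sys_ni_syscall]  # terminator must be the last entry
    nr_syscalls = len(handlers)

    def dispatch(nr, *args):
        # Saturate instead of branching: any nr >= nr_syscalls hits the terminator.
        # Assumes nr is non-negative (treated as unsigned).
        index = min(nr, nr_syscalls)
        return table[index](*args)

    return dispatch
```

Appending new entries after the terminator would leave the stub in the
middle of the table, so clamped numbers would no longer reach it.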

Also, Al's suggestion sounds good to me.

Havard


Re: 3.8.0-rc4+ - Oops on removing WinTV-HVR-1400 expresscard TV Tuner

2013-01-27 Thread Yijing Wang
Hi Chris,
   Sorry for the delayed reply. It seems my reply last night was missed.

From the sysinfo you provided, there are no PCIe port devices under
/sys/bus/pci_express/devices.
Maybe there are some problems with _OSC on your laptop, so the PCIe port
driver won't create PCIe port devices for hotplug, AER and so on.

Maybe you can add the boot parameter "pcie_ports=native" and reboot your laptop.
Then use #modprobe pciehp pciehp_force=1 pciehp_debug=1 to load the pciehp module.
After that, check the /sys/bus/pci_express/devices/ and /sys/bus/pci/slots/
directories. Some slots and PCIe port devices should be there now.

/sys/bus/pci_express/devices:
total 0

/sys/bus/pci_express/drivers:
total 0
drwxr-xr-x 2 root root 0 Jan 27 13:17 pciehp/
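
If it is easier than pasting ls output, the check can be scripted. This is
only a sketch: the function names are hypothetical, and the sysfs root is
a parameter so it can be pointed at a copy of the tree.

```python
from pathlib import Path


def list_entries(directory):
    """Return the sorted entry names in 'directory', or None if it does not exist."""
    d = Path(directory)
    if not d.is_dir():
        return None
    return sorted(p.name for p in d.iterdir())


def hotplug_report(sysfs_root="/sys"):
    """Collect the PCIe port devices and PCI slots currently visible in sysfs."""
    root = Path(sysfs_root)
    return {
        "pcie_port_devices": list_entries(root / "bus/pci_express/devices"),
        "pci_slots": list_entries(root / "bus/pci/slots"),
    }
```

An empty list for pcie_port_devices matches the listing above; after
booting with pcie_ports=native it should no longer be empty.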


On 2013/1/28 6:53, Chris Clayton wrote:
> Thanks again, Martin.
> 
> Firstly, maybe we should remove the linux-media list from the copy list. I 
> imagine this hotplug stuff is just noise to them.
> 
> [snip]
>> Do you have any other express card around to try if it works at all? Try 
>> that always after a cold boot.
>>
> Not at the moment, but I ordered a USB3 expresscard yesterday, so I will 
> have one soon.
> 
>> Posting a diff result of the below procedure might help:
>>
>> # lspci -vvvxxx > lspci.before_insertion.txt
>>
>> [plug your card into the slot]
>>
>> # lspci -vvvxxx > lspci.after_insertion.txt
>>
>> [ unplug your card]
>>
>> # lspci -vvvxxx > lspci.after_1st_removal.txt
>>
>> [re-plug your card into the slot]
>>
>> # lspci -vvvxxx > lspci.after_1st_re-insertion.txt
>>
>> [ unplug your card]
>>
>> # lspci -vvvxxx > lspci.after_2nd_removal.txt
>>
> 
> OK, I've been using kernel 3.8.0-rc kernels so far, but given that is still 
> under development, I've switched to 3.7.4, mainly because you are having 
> success with 3.7.x, acpiphp and pcie_aspm=off. I verified the environment as 
> follows:
> 
> [chris:~]$ cat /proc/cmdline
> root=/dev/sda5 pcie_aspm=off ro resume=/dev/sda6
> [chris:~]$ dmesg | grep ASPM
> [0.00] PCIe ASPM is disabled
> [0.348959]  pci:00: ACPI _OSC support notification failed, disabling 
> PCIe ASPM
> [chris:~]$ dmesg | grep acpiphp
> [0.400846] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [chris:~]$ dmesg | grep pciehp
> [chris:~]$ uname -a
> Linux laptop 3.7.4 #13 SMP PREEMPT Sun Jan 27 18:39:39 GMT 2013 i686 GNU/Linux
> 
> 
>> Then compare them using diff. These should have no difference:
>>
>> diff lspci.after_insertion.txt lspci.after_1st_re-insertion.txt
>> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>>
> Correct, there were no differences.
> 
>>
>> These may have only little difference, or none:
>>
>> diff lspci.before_insertion.txt lspci.after_1st_removal.txt
> 
> 263c263
> <   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <1us, L1 <16us
> ---
>  >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <512ns, L1 <16us
> 265c265
> <   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- 
> CommClk-
> ---
>  >   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- 
> CommClk+
> 267c267
> <   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ 
> DLActive- BWMgmt- ABWMgmt-
> ---
>  >   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ 
> DLActive- BWMgmt+ ABWMgmt-
> 273c273
> <   Changed: MRL- PresDet- LinkState-
> ---
>  >   Changed: MRL- PresDet- LinkState+
> 295,296c295,296
> < 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 4c 12 04
> < 50: 03 00 01 10 60 b2 1c 00 08 00 00 00 00 00 00 00
> ---
>  > 40: 10 80 42 01 00 80 00 00 00 00 10 00 12 3c 12 04
>  > 50: 40 00 11 50 60 b2 1c 00 08 00 00 01 00 00 00 00
> 
>> diff lspci.after_1st_removal.txt lspci.after_2nd_removal.txt
>>
> No difference.
>>
>>
>> Finally, these should confirm whether the PresDet works for you (for me NOT 
>> with pciehp but does work with acpiphp).
>> You should see PresDet- to PresDet+ changes in:
>>
> Yes, I do see the PresDet- to PresDet+ changes
> 
>> diff lspci.before_insertion.txt lspci.after_insertion.txt
> 
> 263c263
> <   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <1us, L1 <16us
> ---
>  >   LnkCap: Port #4, Speed 5GT/s, Width x1, ASPM L0s L1, Latency 
> L0 <512ns, L1 <16us
> 265c265
> <   LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- 
> CommClk-
> ---
>  >   LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- 
> CommClk+
> 267c267
> <   LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ 
> DLActive- BWMgmt- ABWMgmt-
> ---
>  >   LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ 
> DLActive+ BWMgmt+ ABWMgmt-
> 272,273c272,273
> <   SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- 
> Interlock-
> <   Changed: MRL- PresDet- LinkState-
> ---
>  >   SltSta: 
