from:"Steven Noonan"

Re: [PATCH 1/4] x86/cpufeatures: Add Intel feature bits for Speculation Control

2018-01-20 Thread Steven Noonan

On Sat, Jan 20, 2018 at 4:03 AM, David Woodhouse  wrote:
> Add three feature bits exposed by new microcode on Intel CPUs for
> speculation control. We would now be up to five bits in CPUID(7).RDX
> so take them out of the 'scattered' features and make a proper word
> for them instead.
>
> Signed-off-by: David Woodhouse 
> ---
>  arch/x86/include/asm/cpufeature.h|  7 +--
>  arch/x86/include/asm/cpufeatures.h   | 12 +---
>  arch/x86/include/asm/disabled-features.h |  3 ++-
>  arch/x86/include/asm/required-features.h |  3 ++-
>  arch/x86/kernel/cpu/common.c |  1 +
>  arch/x86/kernel/cpu/scattered.c  |  2 --
>  6 files changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index ea9a7dd..70eddb3 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -29,6 +29,7 @@ enum cpuid_leafs
> CPUID_8000_000A_EDX,
> CPUID_7_ECX,
> CPUID_8000_0007_EBX,
> +   CPUID_7_EDX,
>  };
>
>  #ifdef CONFIG_X86_FEATURE_NAMES
> @@ -79,8 +80,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 15, feature_bit) ||\
>CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) ||\
>CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) ||\
> +  CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) ||\
>REQUIRED_MASK_CHECK||\
> -  BUILD_BUG_ON_ZERO(NCAPINTS != 18))
> +  BUILD_BUG_ON_ZERO(NCAPINTS != 19))
>
>  #define DISABLED_MASK_BIT_SET(feature_bit) \
>  ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK,  0, feature_bit) ||\
> @@ -101,8 +103,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 15, feature_bit) ||\
>CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) ||\
>CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) ||\
> +  CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) ||\
>DISABLED_MASK_CHECK||\
> -  BUILD_BUG_ON_ZERO(NCAPINTS != 18))
> +  BUILD_BUG_ON_ZERO(NCAPINTS != 19))
>
>  #define cpu_has(c, bit) \
> (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 :  \
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 25b9375..adebdaa 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -13,7 +13,7 @@
>  /*
>   * Defines x86 CPU feature bits
>   */
> -#define NCAPINTS   18 /* N 32-bit words worth of info */
> +#define NCAPINTS   19 /* N 32-bit words worth of info */
>  #define NBUGINTS   1  /* N 32-bit bug flags */
>
>  /*
> @@ -206,8 +206,6 @@
>  #define X86_FEATURE_RETPOLINE  ( 7*32+12) /* Generic Retpoline mitigation for Spectre variant 2 */
>  #define X86_FEATURE_RETPOLINE_AMD  ( 7*32+13) /* AMD Retpoline mitigation for Spectre variant 2 */
>  #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */
> -#define X86_FEATURE_AVX512_4VNNIW  ( 7*32+16) /* AVX-512 Neural Network Instructions */
> -#define X86_FEATURE_AVX512_4FMAPS  ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */
>
>  #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */
>  #define X86_FEATURE_RSB_CTXSW  ( 7*32+19) /* Fill RSB on context switches */
> @@ -319,6 +317,14 @@
>  #define X86_FEATURE_SUCCOR (17*32+ 1) /* Uncorrectable error containment and recovery */
>  #define X86_FEATURE_SMCA   (17*32+ 3) /* Scalable MCA */
>
> +/* Intel-defined CPU features, CPUID level 0x0007:0 (EDX), word 18 */
> +#define X86_FEATURE_AVX512_4VNNIW  (18*32+ 2) /* AVX-512 Neural Network Instructions */
> +#define X86_FEATURE_AVX512_4FMAPS  (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
> +#define X86_FEATURE_SPEC_CTRL  (18*32+26) /* Speculation Control (IBRS + IBPB) */
> +#define X86_FEATURE_STIPB  (18*32+27) /* Speculation Control with STIPB (Intel) */

Is this correct? I thought the acronym was "STIBP", i.e.
"Single Thread Indirect Branch Predictors"? If so, then you've got the
B and P swapped.

> +#define X86_FEATURE_ARCH_CAPABILITIES  (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
> +
> +
>  /*
>   * BUG word(s)
>   */
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index e428e16..c6a3af1 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-featur

Re: perf object code reading test crashes

2016-02-17 Thread Steven Noonan

On Wed, Feb 17, 2016 at 6:27 AM, Arnaldo Carvalho de Melo
 wrote:
> Em Tue, Feb 16, 2016 at 10:42:19PM -0800, Steven Noonan escreveu:
>> Oddly, I didn't run into this issue on every machine I tried, but
>> there are some issues here:
>>
>> $ sudo perf test 21
>> 21: Test object code reading :***
>> Error in `perf': corrupted double-linked list: 0x023ffcd0 ***
>
>>  FAILED!
>>
>> Valgrind seems to suggest that the cpu map is getting freed too early:
>>
>> ==11450==  Address 0x875b8a0 is 0 bytes inside a block of size 136 free'd
>> ==11450==at 0x4C29D2A: free (in
>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==11450==by 0x4CBD49: cpu_map__delete (cpumap.c:228)
>> ==11450==by 0x4CC690: cpu_map__put (cpumap.c:242)
>> ==11450==by 0x484DE3: __perf_evlist__propagate_maps (evlist.c:136)
>
>
>> I tried this, and the problem goes away:
>
>
>> +++ b/tools/perf/tests/code-reading.c
>> @@ -514,6 +514,7 @@ static int do_test_code_reading(bool try_kcore)
>> }
>> +   cpu_map__get(cpus);
>>
>> while (1) {
>
> Yeah, we forgot to grab refcounts in perf_evlist__set_maps(), can you
> try this instead, if it works please let me know so that I can add a:
>
> Reported-and-Tested-by: you to this patch,
>
> Thanks for the nice report!
>
> - Arnaldo

That did the trick for the refcounting as far as valgrind/libc are concerned.

Reported-and-Tested-by: Steven Noonan 


Now to figure out why the test is failing. This same test works fine
on another system running the same kernel build:

$ ./perf test -v -v 21
21: Test object code reading :
--- start ---
test child forked, pid 19527
Looking at the vmlinux_path (7 entries long)
Using /usr/lib/debug/lib/modules/4.4.1-1-ec2/vmlinux for symbols
Parsing event 'cycles'

perf_event_attr:
  size                             112
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1

sys_perf_event_open: pid 19527  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -22

perf_event_attr:
  size                             112
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1

sys_perf_event_open: pid 19527  cpu 0  group_fd -1  flags 0
sys_perf_event_open failed, error -22

perf_event_attr:
  size                             112
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  comm_exec                        1

sys_perf_event_open: pid 19527  cpu 0  group_fd -1  flags 0
sys_perf_event_open failed, error -22

perf_event_attr:
  size                             112
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  task                             1
  sample_id_all                    1
  comm_exec                        1

sys_perf_event_open: pid 19527  cpu 0  group_fd -1  flags 0
sys_perf_event_open failed, error -22

perf_event_attr:
  size

perf object code reading test crashes

2016-02-16 Thread Steven Noonan

Oddly, I didn't run into this issue on every machine I tried, but
there are some issues here:

$ sudo perf test 21
21: Test object code reading :***
Error in `perf': corrupted double-linked list: 0x023ffcd0 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72055)[0x7f25be0f3055]
/usr/lib/libc.so.6(+0x779b6)[0x7f25be0f89b6]
/usr/lib/libc.so.6(+0x7a0ed)[0x7f25be0fb0ed]
/usr/lib/libc.so.6(__libc_calloc+0xba)[0x7f25be0fceda]
perf(parse_events_lex_init_extra+0x38)[0x4cfff8]
perf(parse_events+0x55)[0x4a0615]
perf(perf_evlist__config+0xcf)[0x4eeb2f]
perf[0x479f82]
perf(test__code_reading+0x1e)[0x47ad4e]
perf(cmd_test+0x5dd)[0x46452d]
perf[0x47f4e3]
perf(main+0x603)[0x42c723]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f25be0a1610]
perf(_start+0x29)[0x42c859]
======= Memory map: ========
0040-0068d000 r-xp  08:03 2384296
  /usr/bin/perf
0088d000-008a1000 r--p 0028d000 08:03 2384296
  /usr/bin/perf
008a1000-008c2000 rw-p 002a1000 08:03 2384296
  /usr/bin/perf
008c2000-0194f000 rw-p  00:00 0
02193000-021b4000 rw-p  00:00 0  [heap]
021b4000-0254a000 rw-p  00:00 0  [heap]
7f25b800-7f25b8021000 rw-p  00:00 0
7f25b8021000-7f25bc00 ---p  00:00 0
7f25bcdff000-7f25bce15000 r-xp  08:03 2378588
  /usr/lib/libgcc_s.so.1
7f25bce15000-7f25bd014000 ---p 00016000 08:03 2378588
  /usr/lib/libgcc_s.so.1
7f25bd014000-7f25bd015000 rw-p 00015000 08:03 2378588
  /usr/lib/libgcc_s.so.1
7f25bd015000-7f25bd017000 r-xp  08:03 2361423
  /usr/lib/libutil-2.22.so
7f25bd017000-7f25bd216000 ---p 2000 08:03 2361423
  /usr/lib/libutil-2.22.so
7f25bd216000-7f25bd217000 r--p 1000 08:03 2361423
  /usr/lib/libutil-2.22.so
7f25bd217000-7f25bd218000 rw-p 2000 08:03 2361423
  /usr/lib/libutil-2.22.so
7f25bd218000-7f25bd22 r-xp  08:03 2361358
  /usr/lib/libcrypt-2.22.so
7f25bd22-7f25bd42 ---p 8000 08:03 2361358
  /usr/lib/libcrypt-2.22.so
7f25bd42-7f25bd421000 r--p 8000 08:03 2361358
  /usr/lib/libcrypt-2.22.so
7f25bd421000-7f25bd422000 rw-p 9000 08:03 2361358
  /usr/lib/libcrypt-2.22.so
7f25bd422000-7f25bd45 rw-p  00:00 0
7f25bd45-7f25bd45f000 r-xp  08:03 2365752
  /usr/lib/libbz2.so.1.0.6
7f25bd45f000-7f25bd65e000 ---p f000 08:03 2365752
  /usr/lib/libbz2.so.1.0.6
7f25bd65e000-7f25bd66 rw-p e000 08:03 2365752
  /usr/lib/libbz2.so.1.0.6
7f25bd66-7f25bd66a000 r-xp  08:03 2379940
  /usr/lib/libnuma.so.1.0.0
7f25bd66a000-7f25bd86a000 ---p a000 08:03 2379940
  /usr/lib/libnuma.so.1.0.0
7f25bd86a000-7f25bd86b000 r--p a000 08:03 2379940
  /usr/lib/libnuma.so.1.0.0
7f25bd86b000-7f25bd86c000 rw-p b000 08:03 2379940
  /usr/lib/libnuma.so.1.0.0
7f25bd86c000-7f25bd891000 r-xp  08:03 2365772
  /usr/lib/liblzma.so.5.2.2
7f25bd891000-7f25bda9 ---p 00025000 08:03 2365772
  /usr/lib/liblzma.so.5.2.2
7f25bda9-7f25bda91000 r--p 00024000 08:03 2365772
  /usr/lib/liblzma.so.5.2.2
7f25bda91000-7f25bda92000 rw-p 00025000 08:03 2365772
  /usr/lib/liblzma.so.5.2.2
7f25bda92000-7f25bdaa7000 r-xp  08:03 2365728
  /usr/lib/libz.so.1.2.8
7f25bdaa7000-7f25bdca6000 ---p 00015000 08:03 2365728
  /usr/lib/libz.so.1.2.8
7f25bdca6000-7f25bdca7000 r--p 00014000 08:03 2365728
  /usr/lib/libz.so.1.2.8
7f25bdca7000-7f25bdca8000 rw-p 00015000 08:03 2365728
  /usr/lib/libz.so.1.2.8
7f25bdca8000-7f25bde33000 r-xp  08:03 2380612
  /usr/lib/libpython2.7.so.1.0
7f25bde33000-7f25be032000 ---p 0018b000 08:03 2380612
  /usr/lib/libpython2.7.so.1.0
7f25be032000-7f25be034000 r--p 0018a000 08:03 2380612
  /usr/lib/libpython2.7.so.1.0
7f25be034000-7f25be072000 rw-p 0018c000 08:03 2380612
  /usr/lib/libpython2.7.so.1.0
7f25be072000-7f25be081000 rw-p  00:00 0
7f25be081000-7f25be21b000 r-xp  08:03 2361437
  /usr/lib/libc-2.22.so
7f25be21b000-7f25be41b000 ---p 0019a000 08:03 2361437
  /usr/lib/libc-2.22.so
7f25be41b000-7f25be41f000 r--p 0019a000 08:03 2361437
  /usr/lib/libc-2.22.so
7f25be41f000-7f25be421000 rw-p 0019e000 08:03 2361437
  /usr/lib/libc-2.22.so
7f25be421000-7f25be425000 rw-p  00:00 0
7f25be425000-7f25be5ff000 r-xp  08:03 2367396
  /usr/lib/perl5/core_perl/CORE/libperl.so
7f25be5ff000-7f25be7ff000 ---p 001da000 08:03 2367396
  /usr/lib/perl5/core_perl/CORE/libperl.so
7f25be7ff000-7f25be804000 r--p 001da000 08:03 2367396
  /usr/lib/perl5/core_perl/CORE/libperl.so
7f25be804000-7f25be808000 rw-p 001df000 08:03 2367396
  /usr/lib/perl5/core_perl/CORE/libperl.so
7f25be808000-7f25be905000 r-xp  08:03 2392491
  /usr/lib/libslang.so.2.3.0
7f25be905000-7f25beb04000 ---p 000fd000 08:03 2392491
  /usr/lib/libslang.so.2.3.0
7f25beb04000-7f25beb09000 r--p 000fc000 08:03 2392491
  /usr/lib/libslang.so.2.3.0
7f25beb09000-7f25beb22000 rw-p 00101000 08:03 2392491
  /usr/lib/libslang.so.2.3.0
7f25beb22000-7f25beb77000 rw-p  00:00 0
7f25beb77000-7

Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-23 Thread Steven Noonan

On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov  wrote:
> On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
>> No it's not. O(256) equals O(1).
>
> Ok, you're right. Maybe O() was not the right thing to use when trying
> to point out that iterating over 256 hash buckets and then following the
> chain in each bucket per packet broadcast looks like a lot.
>

Heh. I guess you could call it an "expensive O(1)". While big-O
notation is useful for describing algorithm scalability with respect
to input size, it falls flat on its face when trying to articulate
impact in measurable units.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

4.0.0-rc4 NVMe NULL pointer dereference and hang

2015-03-22 Thread Steven Noonan

This happens on boot, and then eventually results in an RCU stall.

[8.047533] nvme :05:00.0: Device not ready; aborting initialisation

Note that the above is expected with this hardware (long story).
Although 3.19.x prints the above and then continues gracefully, 4.0-rc
breaks immediately after the above message:

[8.054306] BUG: unable to handle kernel NULL pointer dereference
at 0008
[8.062155] IP: [] nvme_dev_list_remove+0x24/0xa0 [nvme]
[8.069043] PGD 0
[8.071067] Oops: 0002 [#1] SMP
[8.074332] Modules linked in: ahci libahci libata ehci_pci
ehci_hcd scsi_mod usbcore usb_common nvme i915 intel_gtt i2c_algo_bit
video drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff
ipmi_msghandler button
[8.094244] CPU: 4 PID: 632 Comm: kworker/u288:1 Not tainted
4.0.0-rc4-00347-gb87444a2 #5
[8.109878] Workqueue: nvme nvme_reset_workfn [nvme]
[8.114852] task: 881f98271d70 ti: 881f982b8000 task.ti:
881f982b8000
[8.122321] RIP: 0010:[]  []
nvme_dev_list_remove+0x24/0xa0 [nvme]
[8.131624] RSP: :881f982bbd18  EFLAGS: 00010246
[8.136930] RAX:  RBX: 883f63f84800 RCX: 88bf66e6a418
[8.144052] RDX:  RSI: 0120 RDI: a0269848
[8.151171] RBP: 881f982bbd28 R08: 881f982b8000 R09: 0001
[8.158288] R10: 0086 R11: 0020 R12: 883f63f84800
[8.165411] R13: 88bf66e6a400 R14: 88df627ff900 R15: 1000
[8.172530] FS:  () GS:883f7f88()
knlGS:
[8.180600] CS:  0010 DS:  ES:  CR0: 80050033
[8.186337] CR2: 0008 CR3: 01007ea0c000 CR4: 001406e0
[8.193458] DR0:  DR1:  DR2: 
[8.200574] DR3:  DR6: fffe0ff0 DR7: 0400
[8.207693] Stack:
[8.209705]  881f982bbd28 883f63f84978 881f982bbdc8
a026005e
[8.217150]  883f7f894300 8de0 881f982bbd98
810a65e1
[8.224600]  881f982bbdd8 810a9943 881f982bbd98
881f982bbdd0
[8.232049] Call Trace:
[8.234500]  [] nvme_dev_shutdown+0x1e/0x430 [nvme]
[8.240943]  [] ? put_prev_entity+0x31/0x350
[8.246772]  [] ? pick_next_task_fair+0x103/0x4e0
[8.253046]  [] ? __switch_to+0x175/0x5c0
[8.258607]  [] nvme_reset_failed_dev+0x1e/0x100 [nvme]
[8.265378]  [] nvme_reset_workfn+0xf/0x20 [nvme]
[8.271649]  [] process_one_work+0x14e/0x400
[8.277472]  [] worker_thread+0x5b/0x530
[8.282943]  [] ? rescuer_thread+0x3a0/0x3a0
[8.288778]  [] kthread+0xc9/0xe0
[8.293649]  [] ? kthread_stop+0x100/0x100
[8.299322]  [] ret_from_fork+0x58/0x90
[8.304711]  [] ? kthread_stop+0x100/0x100
[8.310357] Code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5
53 48 89 fb 48 c7 c7 48 98 26 a0 48 83 ec 08 e8 c3 23 2e e1 48 8b 13
48 8b 43 08 <48> 89 42 08 48 89 10 48 89 1b 48 81 3d 77 ae 00 00 a0
94 26 a0
[8.330295] RIP  [] nvme_dev_list_remove+0x24/0xa0 [nvme]
[8.337258]  RSP 
[8.340739] CR2: 0008
[8.344056] ---[ end trace 70831a936042aa41 ]---
[8.348727] BUG: unable to handle kernel paging request at ffd8
[8.355708] IP: [] kthread_data+0x11/0x20
[8.361286] PGD 1007ea0f067 PUD 1007ea11067 PMD 0
[8.366120] Oops:  [#2]
[8.412683] SMP
[8.414727] Modules linked in: ahci libahci libata ehci_pci
ehci_hcd scsi_mod usbcore usb_common nvme i915 intel_gtt i2c_algo_bit
video drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff
ipmi_msghandler button
[8.434632] CPU: 4 PID: 632 Comm: kworker/u288:1 Tainted: G  D
   4.0.0-rc4-00347-gb87444a2 #5
[8.451480] task: 881f98271d70 ti: 881f982b8000 task.ti:
881f982b8000
[8.458942] RIP: 0010:[]  []
kthread_data+0x11/0x20
[8.466946] RSP: :881f982bb9d0  EFLAGS: 00010096
[8.472247] RAX:  RBX: 0004 RCX: 000f
[8.479367] RDX:  RSI: 0004 RDI: 881f98271d70
[8.486486] RBP: 881f982bb9e8 R08:  R09: 8106d480
[8.493605] R10: 883f7f897f20 R11: ea007e60a800 R12: 00014300
[8.500726] R13: 883f7f894300 R14: 881f98271d70 R15: 0004
[8.507846] FS:  () GS:883f7f88()
knlGS:
[8.515918] CS:  0010 DS:  ES:  CR0: 80050033
[8.521651] CR2: 0028 CR3: 01007ea0c000 CR4: 001406e0
[8.528772] DR0:  DR1:  DR2: 
[8.535890] DR3:  DR6: fffe0ff0 DR7: 0400
[8.543011] Stack:
[8.545020]  81088785 881f982bb9e8 883f7f894300
881f982bba38
[8.552467]  8153cfae  881f98271d70
0001
[8.5

Re: [PATCH v2 0/7] New Lenovos 2015 touchpads: party time!

2015-03-10 Thread Steven Noonan

On Tue, Mar 10, 2015 at 12:23 AM, Hans de Goede  wrote:
> Hi,
>
> On 10-03-15 07:17, Steven Noonan wrote:
>>
>> Hi Benjamin,
>>
>> I just got a ThinkPad X250 in today and have tried out your patches on
>> 3.19.1. Before the patches, the top TrackPoint buttons weren't working
>> at all, but the clickpad was working fine. For the most part, your
>> patches fixed the TrackPoint.
>>
>> There's something weird going on though. If I control the mouse cursor
>> with the trackpoint nub, it feels "slow". At first I thought it was
>> running the video mode at half the normal refresh rate, because the
>> pointer was only moving at what felt like a 30Hz refresh rate. But
>> then I tried the trackpad, and it behaves as expected (snappy and
>> responsive). Note that this is a definite difference between the BDW
>> generation and the HSW generation, as my HSW ThinkPad Yoga feels fine.
>>
>> Is there something in the driver that controls the TrackPoint nub's
>> sampling rate?
>
>
> Actually the trackpoint sensitivity is off (less sensitive) on the t440 /
> x240 generation too. There it seems slower than with previous thinkpads
> as well. I was hoping this would be fixed with the t450, but given that
> they've recycled the keyboard it makes sense that it is not fixed.
>
> I still have writing a kernel patch for this on my todo list. In the
> mean time you can change the sensitivity as documented here:
>
> http://www.thinkwiki.org/wiki/How_to_configure_the_TrackPoint#Sensitivity_.26_Speed

Nice! Super useful.

> I plan to write a kernel patch to set a different sensitivity by default
> on these newer models to fix this ootb. If you can let me know what seems
> to be a good sensitivity that would be useful.

Original values:

# head speed sensitivity
==> speed <==
97

==> sensitivity <==
128


These are comfortable for me, but are a little bit faster than even
the X230 TrackPoint was:

# head speed sensitivity
==> speed <==
105

==> sensitivity <==
160


These feel pretty close to the X230 behavior:

# head speed sensitivity
==> speed <==
105

==> sensitivity <==
140

> Regards,
>
> Hans
>
>
>
>
>>
>> - Steven
>>
>> On Mon, Mar 9, 2015 at 12:36 PM, Benjamin Tissoires
>>  wrote:
>>>
>>> On Mon, Mar 9, 2015 at 4:24 AM, Hans de Goede 
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On 09-03-15 07:46, Dmitry Torokhov wrote:
>>>>>
>>>>>
>>>>> On Wed, Feb 25, 2015 at 03:58:20PM +0100, Hans de Goede wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 25-02-15 15:36, Benjamin Tissoires wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 16, 2015 at 10:23 PM, Benjamin Tissoires
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Feb 6, 2015 at 3:04 PM, Benjamin Tissoires
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This is the second episode of the Lenovo 2015 party :)
>>>>>>>>>
>>>>>>>>> Thanks to Andrew, we now have an idea within the driver of what are
>>>>>>>>> the extra buttons aimed for, and the patch series looks cleaner.
>>>>>>>>> Many thanks for your help.
>>>>>>>>>
>>>>>>>>> I marked only patches 1/7, 2/7 and 3/7 as stable because they are
>>>>>>>>> really stable fixes. Without the rest of the series, user-space can
>>>>>>>>> cope with the kernel result, and so there is IMO no need to backport
>>>>>>>>> too many patches in stable. I bet distributions will cherry-pick the
>>>>>>>>> rest of the series however.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Guys,
>>>>>>>>
>>>>>>>> any chances we consider this for 3.20 (or whatever it will be
>>>>>>>> numbered)?
>>>>>>>> I'd really like to see this accepted upstream in o

Re: [PATCH v2 0/7] New Lenovos 2015 touchpads: party time!

2015-03-09 Thread Steven Noonan

Hi Benjamin,

I just got a ThinkPad X250 in today and have tried out your patches on
3.19.1. Before the patches, the top TrackPoint buttons weren't working
at all, but the clickpad was working fine. For the most part, your
patches fixed the TrackPoint.

There's something weird going on though. If I control the mouse cursor
with the trackpoint nub, it feels "slow". At first I thought it was
running the video mode at half the normal refresh rate, because the
pointer was only moving at what felt like a 30Hz refresh rate. But
then I tried the trackpad, and it behaves as expected (snappy and
responsive). Note that this is a definite difference between the BDW
generation and the HSW generation, as my HSW ThinkPad Yoga feels fine.

Is there something in the driver that controls the TrackPoint nub's
sampling rate?

- Steven

On Mon, Mar 9, 2015 at 12:36 PM, Benjamin Tissoires
 wrote:
> On Mon, Mar 9, 2015 at 4:24 AM, Hans de Goede  wrote:
>> Hi,
>>
>>
>> On 09-03-15 07:46, Dmitry Torokhov wrote:
>>>
>>> On Wed, Feb 25, 2015 at 03:58:20PM +0100, Hans de Goede wrote:

>>>> Hi,
>>>>
>>>> On 25-02-15 15:36, Benjamin Tissoires wrote:
>>>>>
>>>>> On Mon, Feb 16, 2015 at 10:23 PM, Benjamin Tissoires
>>>>>  wrote:
>>>>>>
>>>>>> On Fri, Feb 6, 2015 at 3:04 PM, Benjamin Tissoires
>>>>>>  wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This is the second episode of the Lenovo 2015 party :)
>>>>>>>
>>>>>>> Thanks to Andrew, we now have an idea within the driver of what are
>>>>>>> the extra buttons aimed for, and the patch series looks cleaner.
>>>>>>> Many thanks for your help.
>>>>>>>
>>>>>>> I marked only patches 1/7, 2/7 and 3/7 as stable because they are
>>>>>>> really stable fixes. Without the rest of the series, user-space can
>>>>>>> cope with the kernel result, and so there is IMO no need to backport
>>>>>>> too many patches in stable. I bet distributions will cherry-pick the
>>>>>>> rest of the series however.
>>>>>>>
>>>>>>
>>>>>> Guys,
>>>>>>
>>>>>> any chances we consider this for 3.20 (or whatever it will be
>>>>>> numbered)?
>>>>>> I'd really like to see this accepted upstream in one way or another
>>>>>> so we will prevent the mess we had to deal with last year.
>>>>>>
>>>>>
>>>>> Hans, Dmitry,
>>>>>
>>>>> well, it's been 3 weeks since I received the loaner I have to support
>>>>> these touchpads. I will have to return it next week or the week after
>>>>> at most. That means that I will not be able to conduct more tests at
>>>>> that point.
>>>>> Can I ask you to please review the series?
>>>>
>>>>
>>>> Ah, sorry I missed you did a v2 (I did review v1).
>>>>
>>>> Series looks good to me and is:
>>>>
>>>> Acked-by: Hans de Goede
>>>
>>>
>>> I did a few edits of the patches in the 2 series so I created a separate
>>> branch "synaptics" based on 3.19. I'd appreciate if you could give it a
>>> quick spin before I will send it for 4.0.
>>
>>
>> I don't have access to the hardware in question, but Benjamin does, so
>> we'll have to wait (a bit) for him to wake up :)
>>
>
> It took me a little bit of time to retrieve the laptop and get it tested.
> So far, so good:
> - t440s (2013) shows the correct behavior
> - x1 carbon 3 has the buttons properly forwarded through the
> trackstick interface and are reacting as expected.
>
> Thanks Dmitry!
>
> I've added Daniel to the thread and asked him this morning if he could
> also give the series a try.
>
> Cheers,
> Benjamin

Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism

2015-02-09 Thread Steven Noonan

On Mon, Feb 9, 2015 at 9:56 AM, Steven Noonan  wrote:
> On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy
>  wrote:
>> Hi Steven,
>>
>> On 02/09/2015 01:02 PM, Steven Noonan wrote:
>>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
>>>  wrote:
>>>> The powerclamp driver injects idle periods to stay within the thermal
>>>> constraints. The driver does a fake idle by spawning per-cpu threads
>>>> that call the mwait instruction. This behavior of fake idle can confuse
>>>> the other kernel subsystems. For instance it calls into the nohz tick
>>>> handlers, which are meant to be called only by the idle thread. It sets
>>>> the state of the tick in each cpu to idle and stops the tick, when
>>>> there are tasks on the runqueue. As a result the callers of
>>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu;
>>>> while the former thinks that the cpu is busy, the latter thinks that it
>>>> is idle. The outcome may be inconsistency in the scheduler/nohz states
>>>> which can lead to serious consequences. One of them was reported on
>>>> this thread:
>>>> https://lkml.org/lkml/2014/12/11/365.
>>>>
>>>> Thomas posted out a patch to disable the powerclamp driver from calling
>>>> into the tick nohz code which has taken care of the above regression
>>>> for the moment. However powerclamp driver as a result, will not be able
>>>> to inject idle periods due to the presence of periodic ticks. With the
>>>> current design of fake idle, we cannot move towards a better solution.
>>>> https://lkml.org/lkml/2014/12/18/169
>>>>
>>>> This patch aims at removing the concept of fake idle and instead makes
>>>> the cpus truly idle by throttling the runqueues during the idle
>>>> injection periods. The situation is in fact very similar to throttling
>>>> of cfs_rqs when they exceed their bandwidths. The idle injection
>>>> metrics can be mapped to the bandwidth control metrics 'quota' and
>>>> 'period' to achieve the same result. When the powerclamping is begun or
>>>> when the clamping controls have been modified, the bandwidth for the
>>>> root task group is set. The 'quota' will be the amount of time that the
>>>> system needs to be busy and 'period' will be the sum of this busy
>>>> duration and the idle duration as calculated by the driver. This gets
>>>> rid of per-cpu kthreads, control cpu, hotplug notifiers and clamping
>>>> mask since the thread starting powerclamping will set the bandwidth and
>>>> throttling of all cpus will automatically fall in place. None of the
>>>> other cpus need be bothered about this. This simplifies the design of
>>>> the driver.
>>>>
>>>> Of course this is only if the idle injection metrics can be
>>>> conveniently transformed into bandwidth control metrics. There are a
>>>> couple of other primary concerns around if doing the below two in this
>>>> patch is valid.
>>>> a. This patch exports the functions to set the quota and period of task
>>>>    groups.
>>>> b. This patch removes the constraint of not being able to set the root
>>>>    task grp's bandwidth.
>>>>
>>>> Signed-off-by: Preeti U Murthy 
>>>
>>> This doesn't compile.
>>
>> Thanks for reporting this! I realized that I had not compiled in the
>> powerclamp driver as a module while compile testing it. I was focusing
>> on the issues with the design and failed to cross verify this.
>> Apologies for the inconvenience.
>>
>> Find the diff compile tested below.
>>
>> I also realized that clamp_cpus() that sets the bandwidth cannot be
>> called from multiple places. Currently I am calling it from
>> end_powerclamp(), when the user changes the idle clamping duration and
>> from a queued timer. This will require synchronization between callers
>> which is not really called for. The queued wakeup_timer alone can
>> re-evaluate the clamping metrics after every throttle-unthrottle
>> period a

Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism

2015-02-09 Thread Steven Noonan

On Mon, Feb 9, 2015 at 3:51 AM, Preeti U Murthy
 wrote:
> Hi Steven,
>
> On 02/09/2015 01:02 PM, Steven Noonan wrote:
>> On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
>>  wrote:
>>> The powerclamp driver injects idle periods to stay within the thermal
>>> constraints. The driver does a fake idle by spawning per-cpu threads
>>> that call the mwait instruction. This behavior of fake idle can confuse
>>> the other kernel subsystems. For instance it calls into the nohz tick
>>> handlers, which are meant to be called only by the idle thread. It sets
>>> the state of the tick in each cpu to idle and stops the tick, when
>>> there are tasks on the runqueue. As a result the callers of
>>> idle_cpu()/ tick_nohz_tick_stopped() see different states of the cpu;
>>> while the former thinks that the cpu is busy, the latter thinks that it
>>> is idle. The outcome may be inconsistency in the scheduler/nohz states
>>> which can lead to serious consequences. One of them was reported on
>>> this thread:
>>> https://lkml.org/lkml/2014/12/11/365.
>>>
>>> Thomas posted out a patch to disable the powerclamp driver from calling
>>> into the tick nohz code which has taken care of the above regression
>>> for the moment. However powerclamp driver as a result, will not be able
>>> to inject idle periods due to the presence of periodic ticks. With the
>>> current design of fake idle, we cannot move towards a better solution.
>>> https://lkml.org/lkml/2014/12/18/169
>>>
>>> This patch aims at removing the concept of fake idle and instead makes
>>> the cpus truly idle by throttling the runqueues during the idle
>>> injection periods. The situation is in fact very similar to throttling
>>> of cfs_rqs when they exceed their bandwidths. The idle injection
>>> metrics can be mapped to the bandwidth control metrics 'quota' and
>>> 'period' to achieve the same result. When the powerclamping is begun or
>>> when the clamping controls have been modified, the bandwidth for the
>>> root task group is set. The 'quota' will be the amount of time that the
>>> system needs to be busy and 'period' will be the sum of this busy
>>> duration and the idle duration as calculated by the driver. This gets
>>> rid of per-cpu kthreads, control cpu, hotplug notifiers and clamping
>>> mask since the thread starting powerclamping will set the bandwidth and
>>> throttling of all cpus will automatically fall in place. None of the
>>> other cpus need be bothered about this. This simplifies the design of
>>> the driver.
>>>
>>> Of course this is only if the idle injection metrics can be
>>> conveniently transformed into bandwidth control metrics. There are a
>>> couple of other primary concerns around if doing the below two in this
>>> patch is valid.
>>> a. This patch exports the functions to set the quota and period of task
>>>    groups.
>>> b. This patch removes the constraint of not being able to set the root
>>>    task grp's bandwidth.
>>>
>>> Signed-off-by: Preeti U Murthy 
>>
>> This doesn't compile.
>
> Thanks for reporting this! I realized that I had not compiled in the
> powerclamp driver as a module while compile testing it. I was focusing on
> the issues with the design and failed to cross verify this. Apologies for
> the inconvenience.
>
> Find the diff compile tested below.
>
> I also realized that clamp_cpus() that sets the bandwidth cannot be called
> from multiple places. Currently I am calling it from end_powerclamp(), when
> the user changes the idle clamping duration and from a queued timer. This
> will require synchronization between callers which is not really called
> for. The queued wakeup_timer alone can re-evaluate the clamping metrics
> after every throttle-unthrottle period and this should suffice as far as I
> can see. Thoughts ?

Hmm, I've had two system lockups so far while running a kernel with
intel_powerclamp loaded. Both times it slowly ground to a halt and
processes piled up...

> --
>
> V2 of intel_powerclamp driver
>
> From: Preeti U Murthy 
>
>
> ---
>  drivers/thermal/Kconfig|

Re: [PATCH V2] idle/intel_powerclamp: Redesign idle injection to use bandwidth control mechanism

2015-02-08 Thread Steven Noonan

On Sun, Feb 8, 2015 at 8:49 PM, Preeti U Murthy
 wrote:
> The powerclamp driver injects idle periods to stay within the thermal
> constraints. The driver does a fake idle by spawning per-cpu threads that
> call the mwait instruction. This behavior of fake idle can confuse the
> other kernel subsystems. For instance it calls into the nohz tick handlers,
> which are meant to be called only by the idle thread. It sets the state of
> the tick in each cpu to idle and stops the tick, when there are tasks on
> the runqueue. As a result the callers of idle_cpu()/
> tick_nohz_tick_stopped() see different states of the cpu; while the former
> thinks that the cpu is busy, the latter thinks that it is idle. The outcome
> may be inconsistency in the scheduler/nohz states which can lead to serious
> consequences. One of them was reported on this thread:
> https://lkml.org/lkml/2014/12/11/365.
>
> Thomas posted out a patch to disable the powerclamp driver from calling
> into the tick nohz code which has taken care of the above regression for
> the moment. However powerclamp driver as a result, will not be able to
> inject idle periods due to the presence of periodic ticks. With the current
> design of fake idle, we cannot move towards a better solution.
> https://lkml.org/lkml/2014/12/18/169
>
> This patch aims at removing the concept of fake idle and instead makes the
> cpus truly idle by throttling the runqueues during the idle injection
> periods. The situation is in fact very similar to throttling of cfs_rqs
> when they exceed their bandwidths. The idle injection metrics can be mapped
> to the bandwidth control metrics 'quota' and 'period' to achieve the same
> result. When the powerclamping is begun or when the clamping controls have
> been modified, the bandwidth for the root task group is set. The 'quota'
> will be the amount of time that the system needs to be busy and 'period'
> will be the sum of this busy duration and the idle duration as calculated
> by the driver. This gets rid of per-cpu kthreads, control cpu, hotplug
> notifiers and clamping mask since the thread starting powerclamping will
> set the bandwidth and throttling of all cpus will automatically fall in
> place. None of the other cpus need be bothered about this. This simplifies
> the design of the driver.
>
> Of course this is only if the idle injection metrics can be conveniently
> transformed into bandwidth control metrics. There are a couple of other
> primary concerns around if doing the below two in this patch is valid.
> a. This patch exports the functions to set the quota and period of task
> groups.
> b. This patch removes the constraint of not being able to set the root task
> grp's bandwidth.
>
> Signed-off-by: Preeti U Murthy 
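
For readers following along, the quota/period mapping described in the quoted text can be sketched in plain user-space C. This is not code from the patch: the struct, the function name and the way the busy time is derived from an idle percentage are all assumptions made for illustration. The only parts taken from the description are that 'quota' is the busy time and 'period' is busy time plus idle time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two bandwidth-control values. */
struct bw_sketch {
	uint64_t quota_us;	/* time the system may stay busy */
	uint64_t period_us;	/* busy time plus injected idle time */
};

/*
 * Assumption: the driver knows the idle duration per cycle and the target
 * idle percentage, and the busy time is chosen so that
 * idle / (busy + idle) == idle_pct / 100.
 */
static struct bw_sketch idle_to_bandwidth_sketch(uint64_t idle_us,
						 unsigned int idle_pct)
{
	struct bw_sketch bw = { 0, 0 };
	uint64_t busy_us;

	/* 0% or >= 100% idle cannot be expressed as a quota/period pair. */
	if (idle_pct == 0 || idle_pct >= 100)
		return bw;

	busy_us = idle_us * (100 - idle_pct) / idle_pct;
	bw.quota_us = busy_us;
	bw.period_us = busy_us + idle_us;
	return bw;
}
```

With 6000 us of idle per cycle at a 25% idle target, this sketch gives an 18000 us quota in a 24000 us period. Whether the real driver's metrics convert this cleanly is exactly the open question raised in the patch description.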

This doesn't compile.

Missing forward declaration:

drivers/thermal/intel_powerclamp.c: In function ‘window_size_set’:
drivers/thermal/intel_powerclamp.c:160:2: error: implicit declaration
of function ‘clamp_cpus’ [-Werror=implicit-function-declaration]
  clamp_cpus();
  ^
drivers/thermal/intel_powerclamp.c: At top level:
drivers/thermal/intel_powerclamp.c:355:12: error: static declaration
of ‘clamp_cpus’ follows non-static declaration
 static int clamp_cpus(void)
^
drivers/thermal/intel_powerclamp.c:160:2: note: previous implicit
declaration of ‘clamp_cpus’ was here
  clamp_cpus();
  ^
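
A minimal sketch of the usual fix for this class of error, assuming the function is meant to stay static and be defined later in the file. The names below are stand-ins with a _sketch suffix, and the range check is illustrative rather than copied from the driver:

```c
#include <assert.h>

/*
 * Forward declaration: gives the compiler the signature and the static
 * linkage before the first call site, so no implicit (non-static)
 * declaration is ever assumed.
 */
static int clamp_cpus_sketch(void);

/* Stand-in for a caller that appears earlier in the file. */
static int window_size_set_sketch(int new_window)
{
	if (new_window < 2 || new_window > 10)
		return -1;	/* illustrative range check only */

	return clamp_cpus_sketch();
}

/* The definition can now follow its first use without a conflict. */
static int clamp_cpus_sketch(void)
{
	return 0;
}
```

Moving the definition above its first caller would work equally well; a forward declaration is just the smaller diff.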


Missing RUNTIME_INF definition (only appears in kernel/sched/sched.h
from what I can see?):

drivers/thermal/intel_powerclamp.c: In function ‘clamp_cpus’:
drivers/thermal/intel_powerclamp.c:358:14: error: ‘RUNTIME_INF’
undeclared (first use in this function)
  u64 quota = RUNTIME_INF, period;
  ^
drivers/thermal/intel_powerclamp.c:358:14: note: each undeclared
identifier is reported only once for each function it appears in


Weird label placement that doesn't make sense:

drivers/thermal/intel_powerclamp.c:364:2: error: a label can only be
part of a statement and a declaration is not a statement
  int sleeptime;
  ^
drivers/thermal/intel_powerclamp.c:365:2: error: expected expression
before ‘unsigned’
  unsigned long target_jiffies;
  ^
drivers/thermal/intel_powerclamp.c:366:2: warning: ISO C90 forbids
mixed declarations and code [-Wdeclaration-after-statement]
  unsigned int guard;
  ^
drivers/thermal/intel_powerclamp.c:390:2: error: ‘target_jiffies’
undeclared (first use in this function)
  target_jiffies = roundup(jiffies, interval);
  ^
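
For context on the first error: before C23, a label has to be followed by a statement, and a declaration is not a statement. A small user-space sketch of the conventional layout, with all declarations at the top of the function (which also keeps -Wdeclaration-after-statement quiet). The function and its arithmetic are made up for illustration and are unrelated to the driver's logic:

```c
#include <assert.h>

static int retry_sketch(int attempts)
{
	/* Declarations first, before any label or statement. */
	int sleeptime;
	int tries = 0;

again:
	/* The label now precedes an assignment, which is a statement. */
	sleeptime = 10 * (tries + 1);
	tries++;
	if (tries < attempts)
		goto again;

	return sleeptime;
}
```

If a declaration really must follow a label, an empty statement (`again: ;`) after the label also compiles, but hoisting the declarations matches kernel style.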


Missing debug variable:

drivers/thermal/intel_powerclamp.c: In function ‘powerclamp_debug_show’:
drivers/thermal/intel_powerclamp.c:598:41: error: ‘control_cpu’
undeclared (first use in this function)
  seq_printf(m, "controlling cpu: %d\n", control_cpu);
 ^

Missing label:

drivers/thermal/intel_powerclamp.c: In function ‘powerclamp_init’:
drivers/thermal/intel_powerclamp.c:649:3: error: label ‘exit_free’
used but not defined
   goto exit_free;
   ^
drivers

Re: [PATCH] x86: change cachemode symbols to non-gpl to avoid breaking out-of-tree modules

2015-01-22 Thread Steven Noonan

On Thu, Jan 22, 2015 at 3:43 AM, Juergen Gross  wrote:
> Commit 281d4078bec366d60990add9d91a952953bd0d72 ("x86: Make page
> cache mode a real type") introduced the symbols __cachemode2pte_tbl
> and __pte2cachemode_tbl and exported them via EXPORT_SYMBOL_GPL.
> This will break building out-of-tree non-gpl modules which need to
> set caching mode in ptes.
>
> Change EXPORT_SYMBOL_GPL to EXPORT_SYMBOL for these two symbols.
>
> Reported-by: Stevan Noonan 

Steven, not "Stevan". :)

Otherwise, patch looks good. Already tested an identical change
locally and it unbroke the out-of-tree module builds.

> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/mm/init.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 08a7d31..079c3b6 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -43,7 +43,7 @@ uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
> [_PAGE_CACHE_MODE_WT]   = _PAGE_PCD,
> [_PAGE_CACHE_MODE_WP]   = _PAGE_PCD,
>  };
> -EXPORT_SYMBOL_GPL(__cachemode2pte_tbl);
> +EXPORT_SYMBOL(__cachemode2pte_tbl);
>  uint8_t __pte2cachemode_tbl[8] = {
> [__pte2cm_idx(0)] = _PAGE_CACHE_MODE_WB,
> [__pte2cm_idx(_PAGE_PWT)] = _PAGE_CACHE_MODE_WC,
> @@ -54,7 +54,7 @@ uint8_t __pte2cachemode_tbl[8] = {
> [__pte2cm_idx(_PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC_MINUS,
> [__pte2cm_idx(_PAGE_PWT | _PAGE_PCD | _PAGE_PAT)] = _PAGE_CACHE_MODE_UC,
>  };
> -EXPORT_SYMBOL_GPL(__pte2cachemode_tbl);
> +EXPORT_SYMBOL(__pte2cachemode_tbl);
>
>  static unsigned long __initdata pgt_buf_start;
>  static unsigned long __initdata pgt_buf_end;
> --
> 2.1.2
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V6 01/18] x86: Make page cache mode a real type

2015-01-21 Thread Steven Noonan

On Mon, Nov 3, 2014 at 5:01 AM, Juergen Gross  wrote:
> At the moment there are a lot of places that handle setting or getting
> the page cache mode by treating the pgprot bits equal to the cache mode.
> This is only true because there are a lot of assumptions about the setup
> of the PAT MSR. Otherwise the cache type needs to get translated into
> pgprot bits and vice versa.
>
> This patch tries to prepare for that by introducing a separate type
> for the cache mode and adding functions to translate between those and
> pgprot values.
>
> To avoid too much performance penalty the translation between cache mode
> and pgprot values is done via tables which contain the relevant
> information.  Write-back cache mode is hard-wired to be 0, all other
> modes are configurable via those tables. For large pages there are
> translation functions as the PAT bit is located at different positions
> in the ptes of 4k and large pages.
>
> Based-on-patch-by: Stefan Bader 
> Signed-off-by: Juergen Gross 
> Reviewed-by: Thomas Gleixner 
> ---
>  arch/x86/include/asm/pgtable_types.h | 73 +++-
>  arch/x86/mm/init.c   | 29 ++
>  2 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 0778964..5124642 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -128,12 +128,34 @@
>  _PAGE_SOFT_DIRTY | _PAGE_NUMA)
>  #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE | _PAGE_NUMA)
>
> -#define _PAGE_CACHE_MASK   (_PAGE_PCD | _PAGE_PWT)
>  #define _PAGE_CACHE_WB (0)
>  #define _PAGE_CACHE_WC (_PAGE_PWT)
>  #define _PAGE_CACHE_UC_MINUS   (_PAGE_PCD)
>  #define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT)
>
> +/*
> + * The cache modes defined here are used to translate between pure SW usage
> + * and the HW defined cache mode bits and/or PAT entries.
> + *
> + * The resulting bits for PWT, PCD and PAT should be chosen in a way
> + * to have the WB mode at index 0 (all bits clear). This is the default
> + * right now and likely would break too much if changed.
> + */
> +#ifndef __ASSEMBLY__
> +enum page_cache_mode {
> +   _PAGE_CACHE_MODE_WB = 0,
> +   _PAGE_CACHE_MODE_WC = 1,
> +   _PAGE_CACHE_MODE_UC_MINUS = 2,
> +   _PAGE_CACHE_MODE_UC = 3,
> +   _PAGE_CACHE_MODE_WT = 4,
> +   _PAGE_CACHE_MODE_WP = 5,
> +   _PAGE_CACHE_MODE_NUM = 8
> +};
> +#endif
> +
> +#define _PAGE_CACHE_MASK   (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
> +#define _PAGE_NOCACHE  (cachemode2protval(_PAGE_CACHE_MODE_UC))
> +
>  #define PAGE_NONE  __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
>  #define PAGE_SHARED__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
>  _PAGE_ACCESSED | _PAGE_NX)
> @@ -341,6 +363,55 @@ static inline pmdval_t pmdnuma_flags(pmd_t pmd)
>  #define pgprot_val(x)  ((x).pgprot)
>  #define __pgprot(x)((pgprot_t) { (x) } )
>
> +extern uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM];
> +extern uint8_t __pte2cachemode_tbl[8];
> +
> +#define __pte2cm_idx(cb)   \
> +   ((((cb) >> (_PAGE_BIT_PAT - 2)) & 4) |  \
> +(((cb) >> (_PAGE_BIT_PCD - 1)) & 2) |  \
> +(((cb) >> _PAGE_BIT_PWT) & 1))
> +
> +static inline unsigned long cachemode2protval(enum page_cache_mode pcm)
> +{
> +   if (likely(pcm == 0))
> +   return 0;
> +   return __cachemode2pte_tbl[pcm];
> +}
> +static inline pgprot_t cachemode2pgprot(enum page_cache_mode pcm)
> +{
> +   return __pgprot(cachemode2protval(pcm));
> +}
> +static inline enum page_cache_mode pgprot2cachemode(pgprot_t pgprot)
> +{
> +   unsigned long masked;
> +
> +   masked = pgprot_val(pgprot) & _PAGE_CACHE_MASK;
> +   if (likely(masked == 0))
> +   return 0;
> +   return __pte2cachemode_tbl[__pte2cm_idx(masked)];
> +}
> +static inline pgprot_t pgprot_4k_2_large(pgprot_t pgprot)
> +{
> +   pgprot_t new;
> +   unsigned long val;
> +
> +   val = pgprot_val(pgprot);
> +   pgprot_val(new) = (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) |
> +   ((val & _PAGE_PAT) << (_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT));
> +   return new;
> +}
> +static inline pgprot_t pgprot_large_2_4k(pgprot_t pgprot)
> +{
> +   pgprot_t new;
> +   unsigned long val;
> +
> +   val = pgprot_val(pgprot);
> +   pgprot_val(new) = (val & ~(_PAGE_PAT | _PAGE_PAT_LARGE)) |
> + ((val & _PAGE_PAT_LARGE) >>
> +  (_PAGE_BIT_PAT_LARGE - _PAGE_BIT_PAT));
> +   return new;
> +}
> +
>
>  typedef struct page *pgtable_t;
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 66dba36..a9776ba 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -27,6 +27,35 @@
>
>  #include "mm_internal.h"
>
> +/*
> + * Tables translating betw
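
To make the index computation in the quoted __pte2cm_idx() macro easier to follow, here is a user-space sketch of the same bit shuffling. The SKETCH_ names are local stand-ins; the bit positions (PWT = 3, PCD = 4, PAT = 7 for 4k pages) are the usual x86 values and are stated here as an assumption rather than taken from the quoted hunk:

```c
#include <assert.h>

/* Assumed x86 4k-page PTE bit positions. */
#define SKETCH_BIT_PWT	3
#define SKETCH_BIT_PCD	4
#define SKETCH_BIT_PAT	7

#define SKETCH_PWT	(1UL << SKETCH_BIT_PWT)
#define SKETCH_PCD	(1UL << SKETCH_BIT_PCD)
#define SKETCH_PAT	(1UL << SKETCH_BIT_PAT)

/*
 * Pack the three scattered cache bits into a dense 3-bit table index:
 * PAT lands in bit 2, PCD in bit 1, PWT in bit 0.
 */
static unsigned int pte2cm_idx_sketch(unsigned long cb)
{
	return (((cb) >> (SKETCH_BIT_PAT - 2)) & 4) |
	       (((cb) >> (SKETCH_BIT_PCD - 1)) & 2) |
	       (((cb) >> SKETCH_BIT_PWT) & 1);
}
```

With those positions, PWT alone maps to index 1, PCD alone to 2, PAT alone to 4, and all three together to 7, which is why the reverse table in the patch has eight entries.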

[PATCH] nf_log_ipv6: correct typo in module description

2014-11-27 Thread Steven Noonan

It incorrectly identifies itself as "IPv4" packet logging.

Signed-off-by: Steven Noonan 
---
 net/ipv6/netfilter/nf_log_ipv6.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_log_ipv6.c b/net/ipv6/netfilter/nf_log_ipv6.c
index 7b17a0b..41b9ade 100644
--- a/net/ipv6/netfilter/nf_log_ipv6.c
+++ b/net/ipv6/netfilter/nf_log_ipv6.c
@@ -412,6 +412,6 @@ module_init(nf_log_ipv6_init);
 module_exit(nf_log_ipv6_exit);
 
 MODULE_AUTHOR("Netfilter Core Team ");
-MODULE_DESCRIPTION("Netfilter IPv4 packet logging");
+MODULE_DESCRIPTION("Netfilter IPv6 packet logging");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_NF_LOGGER(AF_INET6, 0);
-- 
2.1.3


Re: EFI-related general protection faults

2014-11-01 Thread Steven Noonan


On Sat, Nov 1, 2014 at 2:43 PM, Steven Noonan  wrote:
> On Sat, Nov 1, 2014 at 6:17 AM, Steven Noonan  wrote:
>> On Sat, Nov 1, 2014 at 6:00 AM, Steven Noonan  wrote:
>>> I've been getting general protection faults in EFI modules at boot time
>>> across several machines. I originally thought it was just an EFI quirk
>>> on one machine so I blacklisted the rtc-efi module (which was the
>>> offender at the time), but I've seen it elsewhere since. Once this
>>> happens, the system is only half-usable and needs to reboot. It's also
>>> sadly not 100% reproducible at every boot.
>>>
>>> From what I've observed, it only occurs at boot time when the various
>>> EFI modules are initializing. I haven't yet tested whether I can
>>> trigger it just by unloading/reloading EFI modules repeatedly, but seems
>>> like it'd be worth a shot.
>>>
>>> In two of the three traces below, it seems to happen while two EFI
>>> modules are loading at the same time (rtc_efi and efivars), so perhaps
>>> there's some common data initialization that's racy?
>>
>> Neat. If I do these in two separate shells simultaneously,
>>
>> # while true; do rmmod rtc_efi; modprobe rtc_efi; done
>> # while true; do rmmod efivars; modprobe efivars; done
>>
>> then it faults:
>>
>> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
>> as rtc1
>> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
>> as rtc1
>> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
>> as rtc1
>> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
>> as rtc1
>> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
>> as rtc1
>> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
>> as rtc1
>> Nov 01 06:10:04 osprey kernel: EFI Variables Facility v0.08 2004-May-17
>> Nov 01 06:10:04 osprey kernel: general protection fault:  [#1] SMP
>> Nov 01 06:10:04 osprey kernel: Modules linked in: rtc_efi(+) efivars(+) 
>> sch_sfq bridge stp llc it87 hwmon_vid joydev hid_generic ecb btusb 
>> sch_fq_codel bluetooth usbhid hid nls_cp437 vfat fat iTCO_wdt 
>> iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp 
>> crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
>> glue_helper ablk_helper i2c_i801 r8169 cryptd lpc_ich mfd_core mii fan 
>> thermal battery tpm_tis tpm evdev snd_hda_codec_realtek 
>> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_controller 
>> snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore acpi_cpufreq 
>> processor usbip_host usbip_core msr vhost_scsi target_core_mod 
>> crct10dif_generic crct10dif_pclmul configfs vhost_net tun vhost macvtap 
>> macvlan kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif 
>> crct10dif_common
>> Nov 01 06:10:04 osprey kernel:  ahci libahci libata crc32c_intel ehci_pci 
>> xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915 intel_gtt i2c_algo_bit 
>> video drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff 
>> ipmi_msghandler button [last unloaded: rtc_efi]
>> Nov 01 06:10:04 osprey kernel: CPU: 4 PID: 13264 Comm: modprobe Not tainted 
>> 3.17.2-1-ec2 #1
>> Nov 01 06:10:04 osprey kernel: Hardware name: GIGABYTE 
>> M4HM87P-00/M4HM87P-00, BIOS F5 06/23/2014
>> Nov 01 06:10:04 osprey kernel: task: 880401729d60 ti: 8803f869c000 
>> task.ti: 8803f869c000
>> Nov 01 06:10:04 osprey kernel: RIP: 0010:[]  
>> [] efi_call+0x8e/0x100
>> Nov 01 06:10:04 osprey kernel: RSP: 0018:8803f869f9b0  EFLAGS: 00010002
>> Nov 01 06:10:04 osprey kernel: RAX:  RBX: 8803f869fb60 
>> RCX: 
>> Nov 01 06:10:04 osprey kernel: RDX: 80020020 RSI: 8803f869fb60 
>> RDI: fffef0fe3990
>> Nov 01 06:10:04 osprey kernel: RBP: 8803f869fa80 R08: 8803f869fa90 
>> R09: 001e
>> Nov 01 06:10:04 osprey kernel: R10: fffef0ff7f58 R11: 8803f869f8c0 
>> R12: 0286
>> Nov 01 06:10:04 osprey kernel: R13: 8803f869fb61 R14: 8803f869fa90 
>> R15: a40cafd8
>> Nov 01 06:10:04 osprey kernel: FS:  7fdd75904700() 
>> GS:88041eb0() knlGS:
>> Nov 01 06:10:04 osprey kernel: CS:  0010 DS:  ES:  CR0: 
>> 80050033
>> Nov 01 06:10:04 osprey kernel: CR2: 7fdd7593a4e1 CR3: 0009a000 
>> CR4: 001407e0
>>

Re: EFI-related general protection faults

2014-11-01 Thread Steven Noonan

On Sat, Nov 1, 2014 at 6:17 AM, Steven Noonan  wrote:
> On Sat, Nov 1, 2014 at 6:00 AM, Steven Noonan  wrote:
>> I've been getting general protection faults in EFI modules at boot time
>> across several machines. I originally thought it was just an EFI quirk
>> on one machine so I blacklisted the rtc-efi module (which was the
>> offender at the time), but I've seen it elsewhere since. Once this
>> happens, the system is only half-usable and needs to reboot. It's also
>> sadly not 100% reproducible at every boot.
>>
>> From what I've observed, it only occurs at boot time when the various
>> EFI modules are initializing. I haven't yet tested whether I can
>> trigger it just by unloading/reloading EFI modules repeatedly, but seems
>> like it'd be worth a shot.
>>
>> In two of the three traces below, it seems to happen while two EFI
>> modules are loading at the same time (rtc_efi and efivars), so perhaps
>> there's some common data initialization that's racy?
>
> Neat. If I do these in two separate shells simultaneously,
>
> # while true; do rmmod rtc_efi; modprobe rtc_efi; done
> # while true; do rmmod efivars; modprobe efivars; done
>
> then it faults:
>
> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
> as rtc1
> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
> as rtc1
> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
> as rtc1
> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
> as rtc1
> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
> as rtc1
> Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi 
> as rtc1
> Nov 01 06:10:04 osprey kernel: EFI Variables Facility v0.08 2004-May-17
> Nov 01 06:10:04 osprey kernel: general protection fault:  [#1] SMP
> Nov 01 06:10:04 osprey kernel: Modules linked in: rtc_efi(+) efivars(+) 
> sch_sfq bridge stp llc it87 hwmon_vid joydev hid_generic ecb btusb 
> sch_fq_codel bluetooth usbhid hid nls_cp437 vfat fat iTCO_wdt 
> iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp 
> crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
> glue_helper ablk_helper i2c_i801 r8169 cryptd lpc_ich mfd_core mii fan 
> thermal battery tpm_tis tpm evdev snd_hda_codec_realtek snd_hda_codec_generic 
> snd_hda_codec_hdmi snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep 
> snd_pcm snd_timer snd soundcore acpi_cpufreq processor usbip_host usbip_core 
> msr vhost_scsi target_core_mod crct10dif_generic crct10dif_pclmul configfs 
> vhost_net tun vhost macvtap macvlan kvm_intel kvm efivarfs ext4 crc16 jbd2 
> mbcache sd_mod crc_t10dif crct10dif_common
> Nov 01 06:10:04 osprey kernel:  ahci libahci libata crc32c_intel ehci_pci 
> xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915 intel_gtt i2c_algo_bit 
> video drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff 
> ipmi_msghandler button [last unloaded: rtc_efi]
> Nov 01 06:10:04 osprey kernel: CPU: 4 PID: 13264 Comm: modprobe Not tainted 
> 3.17.2-1-ec2 #1
> Nov 01 06:10:04 osprey kernel: Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, 
> BIOS F5 06/23/2014
> Nov 01 06:10:04 osprey kernel: task: 880401729d60 ti: 8803f869c000 
> task.ti: 8803f869c000
> Nov 01 06:10:04 osprey kernel: RIP: 0010:[]  
> [] efi_call+0x8e/0x100
> Nov 01 06:10:04 osprey kernel: RSP: 0018:8803f869f9b0  EFLAGS: 00010002
> Nov 01 06:10:04 osprey kernel: RAX:  RBX: 8803f869fb60 
> RCX: 
> Nov 01 06:10:04 osprey kernel: RDX: 80020020 RSI: 8803f869fb60 
> RDI: fffef0fe3990
> Nov 01 06:10:04 osprey kernel: RBP: 8803f869fa80 R08: 8803f869fa90 
> R09: 001e
> Nov 01 06:10:04 osprey kernel: R10: fffef0ff7f58 R11: 8803f869f8c0 
> R12: 0286
> Nov 01 06:10:04 osprey kernel: R13: 8803f869fb61 R14: 8803f869fa90 
> R15: a40cafd8
> Nov 01 06:10:04 osprey kernel: FS:  7fdd75904700() 
> GS:88041eb0() knlGS:
> Nov 01 06:10:04 osprey kernel: CS:  0010 DS:  ES:  CR0: 
> 80050033
> Nov 01 06:10:04 osprey kernel: CR2: 7fdd7593a4e1 CR3: 0009a000 
> CR4: 001407e0
> Nov 01 06:10:04 osprey kernel: Stack:
> Nov 01 06:10:04 osprey kernel:  8803f869fb60 8803f869fa80 
> 8803f869fb60 fffef0fe3990
> Nov 01 06:10:04 osprey kernel:  0286 8803f869fb60 
> 8803f869fa58 80050033
> Nov 01 06:10:04 osprey kernel:    
>  00ff
> Nov 01 06:10:04 osprey ke

Re: EFI-related general protection faults

2014-11-01 Thread Steven Noonan

On Sat, Nov 1, 2014 at 6:00 AM, Steven Noonan  wrote:
> I've been getting general protection faults in EFI modules at boot time
> across several machines. I originally thought it was just an EFI quirk
> on one machine so I blacklisted the rtc-efi module (which was the
> offender at the time), but I've seen it elsewhere since. Once this
> happens, the system is only half-usable and needs to reboot. It's also
> sadly not 100% reproducible at every boot.
>
> From what I've observed, it only occurs at boot time when the various
> EFI modules are initializing. I haven't yet tested whether I can
> trigger it just by unloading/reloading EFI modules repeatedly, but seems
> like it'd be worth a shot.
>
> In two of the three traces below, it seems to happen while two EFI
> modules are loading at the same time (rtc_efi and efivars), so perhaps
> there's some common data initialization that's racy?

Neat. If I do these in two separate shells simultaneously,

# while true; do rmmod rtc_efi; modprobe rtc_efi; done
# while true; do rmmod efivars; modprobe efivars; done

then it faults:

Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as 
rtc1
Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as 
rtc1
Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as 
rtc1
Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as 
rtc1
Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as 
rtc1
Nov 01 06:10:04 osprey kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as 
rtc1
Nov 01 06:10:04 osprey kernel: EFI Variables Facility v0.08 2004-May-17
Nov 01 06:10:04 osprey kernel: general protection fault:  [#1] SMP
Nov 01 06:10:04 osprey kernel: Modules linked in: rtc_efi(+) efivars(+) sch_sfq 
bridge stp llc it87 hwmon_vid joydev hid_generic ecb btusb sch_fq_codel 
bluetooth usbhid hid nls_cp437 vfat fat iTCO_wdt iTCO_vendor_support 
x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul ghash_clmulni_intel 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper i2c_i801 r8169 
cryptd lpc_ich mfd_core mii fan thermal battery tpm_tis tpm evdev 
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel 
snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore 
acpi_cpufreq processor usbip_host usbip_core msr vhost_scsi target_core_mod 
crct10dif_generic crct10dif_pclmul configfs vhost_net tun vhost macvtap macvlan 
kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif 
crct10dif_common
Nov 01 06:10:04 osprey kernel:  ahci libahci libata crc32c_intel ehci_pci 
xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915 intel_gtt i2c_algo_bit video 
drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff ipmi_msghandler 
button [last unloaded: rtc_efi]
Nov 01 06:10:04 osprey kernel: CPU: 4 PID: 13264 Comm: modprobe Not tainted 
3.17.2-1-ec2 #1
Nov 01 06:10:04 osprey kernel: Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, 
BIOS F5 06/23/2014
Nov 01 06:10:04 osprey kernel: task: 880401729d60 ti: 8803f869c000 
task.ti: 8803f869c000
Nov 01 06:10:04 osprey kernel: RIP: 0010:[]  
[] efi_call+0x8e/0x100
Nov 01 06:10:04 osprey kernel: RSP: 0018:8803f869f9b0  EFLAGS: 00010002
Nov 01 06:10:04 osprey kernel: RAX:  RBX: 8803f869fb60 RCX: 

Nov 01 06:10:04 osprey kernel: RDX: 80020020 RSI: 8803f869fb60 RDI: 
fffef0fe3990
Nov 01 06:10:04 osprey kernel: RBP: 8803f869fa80 R08: 8803f869fa90 R09: 
001e
Nov 01 06:10:04 osprey kernel: R10: fffef0ff7f58 R11: 8803f869f8c0 R12: 
0286
Nov 01 06:10:04 osprey kernel: R13: 8803f869fb61 R14: 8803f869fa90 R15: 
a40cafd8
Nov 01 06:10:04 osprey kernel: FS:  7fdd75904700() 
GS:88041eb0() knlGS:
Nov 01 06:10:04 osprey kernel: CS:  0010 DS:  ES:  CR0: 80050033
Nov 01 06:10:04 osprey kernel: CR2: 7fdd7593a4e1 CR3: 0009a000 CR4: 
001407e0
Nov 01 06:10:04 osprey kernel: Stack:
Nov 01 06:10:04 osprey kernel:  8803f869fb60 8803f869fa80 
8803f869fb60 fffef0fe3990
Nov 01 06:10:04 osprey kernel:  0286 8803f869fb60 
8803f869fa58 80050033
Nov 01 06:10:04 osprey kernel:    
 00ff
Nov 01 06:10:04 osprey kernel: Call Trace:
Nov 01 06:10:04 osprey kernel:  [] ? 
virt_efi_get_wakeup_time+0x51/0x80
Nov 01 06:10:04 osprey kernel:  [] 0xa40cf302
Nov 01 06:10:04 osprey kernel:  [] ? 
mutex_lock_interruptible+0x12/0x50
Nov 01 06:10:04 osprey kernel:  [] __rtc_read_alarm+0x96/0x3d0
Nov 01 06:10:04 osprey kernel:  [] ? ida_pre_get+0x54/0xf0
Nov 01 06:10:04 osprey kernel:  [] ? 
kmem_cache_alloc_trace+0x1d2/0x200
Nov 01 06:10:04 osprey kernel:  [] ? 
rtc_device_register+

EFI-related general protection faults

2014-11-01 Thread Steven Noonan

I've been getting general protection faults in EFI modules at boot time
across several machines. I originally thought it was just an EFI quirk
on one machine so I blacklisted the rtc-efi module (which was the
offender at the time), but I've seen it elsewhere since. Once this
happens, the system is only half-usable and needs to reboot. It's also
sadly not 100% reproducible at every boot.

From what I've observed, it only occurs at boot time when the various
EFI modules are initializing. I haven't yet tested whether I can
trigger it just by unloading/reloading EFI modules repeatedly, but seems
like it'd be worth a shot.

In two of the three traces below, it seems to happen while two EFI
modules are loading at the same time (rtc_efi and efivars), so perhaps
there's some common data initialization that's racy?

From the logs I've dug up so far, only 3.17 and later seem to have this
issue. But I can't be certain when the problem was introduced, as I
haven't done a bisection yet.

Hopefully someone has some ideas before I dive deeper.


I've seen this one across two machines now:

general protection fault:  [#1] SMP
Modules linked in: rtc_efi(+) efivars serio_raw iwldvm(+) mac80211 wmi 
tpm_tis(+) tpm thinkpad_acpi(+) battery nvram ac iwlwifi snd_hda_intel(+) 
i2c_i801(+) snd_hda_controller btusb(+) snd_hda_codec snd_hwdep bluetooth 
snd_pcm cfg80211 e1000e(+) snd_timer snd soundcore ptp lpc_ich mfd_core 
pps_core thermal evdev processor sch_fq_codel usbip_host usbip_core msr 
efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif crct10dif_common 
crc32c_intel ahci libahci libata scsi_mod ehci_pci sdhci_pci xhci_hcd ehci_hcd 
sdhci mmc_core usbcore usb_common i915 button intel_gtt i2c_algo_bit video 
drm_kms_helper drm i2c_core
CPU: 0 PID: 195 Comm: systemd-udevd Not tainted 3.17.2-1-ec2 #1
Hardware name: LENOVO 2306CTO/2306CTO, BIOS G2ET95WW (2.55 ) 07/09/2013
task: 880406823ac0 ti: 880407ed8000 task.ti: 880407ed8000
RIP: 0010:[]  [] efi_call+0x8e/0x100
RSP: 0018:880407edb970  EFLAGS: 00010002
RAX:  RBX: 880407edba50 RCX: 
RDX: 880407edba44 RSI: 880407edba50 RDI: fffefa23dad8
RBP: 880407edba30 R08:  R09: 880407edba4f
R10: 880407edba50 R11: 880407edb908 R12: 0282
R13: 880407edba44 R14: a07285c0 R15: a0723fd8
FS:  7f07716577c0() GS:88041e20() 
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f0772d91fc0 CR3: 00053000 CR4: 001407f0
Stack:
 880407edba50 880407edba50 880487edbae3 818035cc
 0282 880407edbad8 880407edba10 80050033
    00ff
Call Trace:
 [] ? virt_efi_get_time+0x49/0x70
 [] 0xa0728364
 [] __rtc_read_time.isra.3+0x4a/0x60
 [] rtc_read_time+0x39/0x50
 [] __rtc_read_alarm+0x25/0x3d0
 [] ? ida_pre_get+0xca/0xf0
 [] ? kmem_cache_alloc_trace+0x1d2/0x200
 [] ? rtc_device_register+0x58/0x2e0
 [] rtc_device_register+0x19d/0x2e0
 [] ? devm_rtc_device_register+0x34/0x90
 [] ? rtc_device_unregister+0x70/0x70
 [] devm_rtc_device_register+0x54/0x90
 [] __this_module+0x1a66/0x1a7a [rtc_efi]
 [] platform_drv_probe+0x2d/0x80
 [] driver_probe_device+0x8e/0x270
 [] __driver_attach+0x8b/0x90
 [] ? __device_attach+0x40/0x40
 [] bus_for_each_dev+0x6b/0xb0
 [] driver_attach+0x1e/0x20
 [] bus_add_driver+0x178/0x230
 [] ? __this_module+0x1a7a/0x1a7a [rtc_efi]
 [] driver_register+0x64/0xf0
 [] __platform_driver_register+0x4a/0x50
 [] platform_driver_probe+0x24/0xc0
 [] init_module+0x17/0x19 [rtc_efi]
 [] do_one_initcall+0x8c/0x1c0
 [] ? __vunmap+0xa2/0x100
 [] load_module+0x1c5c/0x2330
 [] ? store_uevent+0x40/0x40
 [] ? copy_module_from_fd.isra.39+0x111/0x170
 [] SyS_finit_module+0x7e/0x80
 [] system_call_fastpath+0x1a/0x1f
Code: b7 9d 00 41 0f 20 df 4c 89 3d 97 b7 9d 00 4c 8b 3d 98 b7 9d 00 41 
0f 22 df ff d7 80 3d 93 b7 9d 00 00 74 41 4c 8b 3d 7a b7 9d 00 <41> 0f 22 df 4c 
8b 3d 67 b7 9d 00 4c 89 3d 60 b7 9d 00 4c 89 35
RIP  [] efi_call+0x8e/0x100
 RSP 
---[ end trace 6aba1dee290210d8 ]---


Another machine, same fault location:

general protection fault:  [#1] SMP 
Modules linked in: rtc_efi(+) efivars(+) r8169(+) lpc_ich mfd_core mii 
thermal fan tpm_tis battery tpm evdev snd_hda_codec_realtek 
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_controller 
snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore acpi_cpufreq processor 
usbip_ho

[tip:core/urgent] compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles

2014-10-28 Thread tip-bot for Steven Noonan

Commit-ID:  5631b8fba640a4ab2f8a954f63a603fa34eda96b
Gitweb: http://git.kernel.org/tip/5631b8fba640a4ab2f8a954f63a603fa34eda96b
Author: Steven Noonan 
AuthorDate: Sat, 25 Oct 2014 15:09:42 -0700
Committer:  Ingo Molnar 
CommitDate: Tue, 28 Oct 2014 11:03:40 +0100

compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles

The bug referenced by the comment in this commit was not
completely fixed in GCC 4.8.2, as I mentioned in a thread back
in February:

   https://lkml.org/lkml/2014/2/12/797

The conclusion at that time was to make the quirk unconditional
until the bug could be found and fixed in GCC. Unfortunately,
when I submitted the patch (commit a9f18034) I left in a comment
that claimed the bug was fixed in GCC 4.8.2+.

This comment is inaccurate, and should be removed.

Signed-off-by: Steven Noonan 
Signed-off-by: Ingo Molnar 
Cc: Jakub Jelinek 
Cc: Richard Henderson 
Cc: Linus Torvalds 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-ste...@uplinklabs.net
Cc: Ingo Molnar 
---
 include/linux/compiler-gcc4.h | 1 -
 include/linux/compiler-gcc5.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 2507fd2..d1a5582 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -71,7 +71,6 @@
  *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
  *
  * Work it around via a compiler barrier quirk suggested by Jakub Jelinek.
- * Fixed in GCC 4.8.2 and later versions.
  *
  * (asm goto is automatically volatile - the naming reflects this.)
  */
diff --git a/include/linux/compiler-gcc5.h b/include/linux/compiler-gcc5.h
index cdd1cc2..c8c5659 100644
--- a/include/linux/compiler-gcc5.h
+++ b/include/linux/compiler-gcc5.h
@@ -53,7 +53,6 @@
  *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
  *
  * Work it around via a compiler barrier quirk suggested by Jakub Jelinek.
- * Fixed in GCC 4.8.2 and later versions.
  *
  * (asm goto is automatically volatile - the naming reflects this.)
  */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] compiler-gcc*.h: remove inaccurate comment about 'asm goto' miscompiles

2014-10-25 Thread Steven Noonan

The bug referenced by the comment in this commit was not completely fixed in
GCC 4.8.2, as I mentioned in a thread back in February[1]. The conclusion at
that time was to make the quirk unconditional until the bug could be found and
fixed in GCC. Unfortunately, when I submitted the patch (commit a9f18034)
I left in a comment that claimed the bug was fixed in GCC 4.8.2+.  This comment
is inaccurate, and should be removed.

[1] https://lkml.org/lkml/2014/2/12/797
Signed-off-by: Steven Noonan 
Cc: Ingo Molnar 
---
 include/linux/compiler-gcc4.h | 1 -
 include/linux/compiler-gcc5.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 2507fd2..d1a5582 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -71,7 +71,6 @@
  *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
  *
  * Work it around via a compiler barrier quirk suggested by Jakub Jelinek.
- * Fixed in GCC 4.8.2 and later versions.
  *
  * (asm goto is automatically volatile - the naming reflects this.)
  */
diff --git a/include/linux/compiler-gcc5.h b/include/linux/compiler-gcc5.h
index cdd1cc2..c8c5659 100644
--- a/include/linux/compiler-gcc5.h
+++ b/include/linux/compiler-gcc5.h
@@ -53,7 +53,6 @@
  *   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
  *
  * Work it around via a compiler barrier quirk suggested by Jakub Jelinek.
- * Fixed in GCC 4.8.2 and later versions.
  *
  * (asm goto is automatically volatile - the naming reflects this.)
  */
-- 
2.1.2
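
For anyone reading this without the header open, here is a minimal sketch of
the quirk the comment describes. The macro has the same shape as the kernel's
asm_volatile_goto() (the kernel spells it with plain asm); toy_branch() is a
hypothetical helper added only for illustration, and the empty asm template is
an assumption chosen so the sketch builds on any architecture with a
GCC-compatible compiler.

```c
/*
 * Same shape as the kernel quirk: an empty asm statement placed right
 * after the asm goto acts as a compiler barrier.
 */
#define asm_volatile_goto(x...) do { __asm__ goto(x); __asm__ (""); } while (0)

/*
 * Hypothetical helper. The empty template never jumps, so control always
 * falls through and the function returns 0. The label is listed only so
 * the compiler has to treat it as a possible branch target.
 */
static int toy_branch(void)
{
	asm_volatile_goto("" : : : : taken);
	return 0;
taken:
	return 1;
}
```

The empty asm after the asm goto is the compiler barrier suggested by Jakub
Jelinek; the patch above only drops the claim about which GCC versions no
longer need it.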


Re: [compiler/gcc4] a9f180345f5: -100.0% last_state.is_incomplete_run

2014-10-02 Thread Steven Noonan

On Tue, Sep 30, 2014 at 1:14 AM, Fengguang Wu  wrote:
> Hi Steven,
>
> FYI, we noticed that your commit a9f180345f5378ac87d80ed0bea55ba421d83859
> ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional") fixed
> a number of machine boot failures in our LKP test farm. This is really 
> helpful!
> Our gcc version is 4.9.1 (Debian 4.9.1-11).

Hey cool, that's good news!

I rather wish we could find the root cause of the miscompiles, though,
so we could conditionalize the quirk on something again. I'm terrible
at debugging GCC behavior, though, so I'm not the right person for it.

- Steven

>
> 569d6557ab957d6  a9f180345f5378ac87d80ed0b
> ---------------  -------------------------
>      %stddev      %change      %stddev
>          \            |            /
>       1 ± 0%      -100.0%       0 ± 0%  lkp-st02/dd-write/5m-11HDD-RAID5-cfq-btrfs-100dd
>       1 ± 0%      -100.0%       0 ± 0%  TOTAL dmesg.kernel_BUG_at_fs/nfs/pagelist.c
>
> 569d6557ab957d6  a9f180345f5378ac87d80ed0b
> ---------------  -------------------------
>       1 ± 0%      -100.0%       0 ± 0%  lkp-st02/dd-write/5m-11HDD-RAID5-cfq-btrfs-100dd
>       1 ± 0%      -100.0%       0 ± 0%  TOTAL dmesg.Kernel_panic-not_syncing:Fatal_exception
>
> 569d6557ab957d6  a9f180345f5378ac87d80ed0b
> ---------------  -------------------------
>       1 ± 0%      -100.0%       0 ± 0%  lkp-st02/dd-write/5m-11HDD-RAID5-cfq-btrfs-100dd
>       1 ± 0%      -100.0%       0 ± 0%  TOTAL dmesg.invalid_opcode
>
> 569d6557ab957d6  a9f180345f5378ac87d80ed0b
> ---------------  -------------------------
>       1 ± 0%      -100.0%       0 ± 0%  ivb42/will-it-scale/futex4
>       1 ± 0%      -100.0%       0 ± 0%  ivb44/fsmark/1x-1t-1HDD-xfs-4M-60G-NoSync
>       1 ± 0%      -100.0%       0 ± 0%  ivb44/fsmark/1x-64t-1BRD_48G-btrfs-4M-40G-fsyncBeforeClose
>       1 ± 0%      -100.0%       0 ± 0%  lkp-bdw01/blogbench/1SSD-btrfs
>       1 ± 0%      -100.0%       0 ± 0%  lkp-hsw01/vm-scalability/300s-anon-rx-rand-mt
>       1 ± 0%      -100.0%       0 ± 0%  lkp-sbx04/will-it-scale/futex3
>       1 ± 0%      -100.0%       0 ± 0%  lkp-sbx04/will-it-scale/page_fault3
>       1 ± 0%      -100.0%       0 ± 0%  lkp-st02/dd-write/5m-11HDD-RAID5-cfq-btrfs-100dd
>       8 ± 0%      -100.0%       0 ± 0%  TOTAL last_state.is_incomplete_run
>
> 569d6557ab957d6  a9f180345f5378ac87d80ed0b
> ---------------  -------------------------
>       1 ± 0%      -100.0%       0 ± 0%  ivb42/will-it-scale/futex4
>       1 ± 0%      -100.0%       0 ± 0%  ivb44/fsmark/1x-1t-1HDD-xfs-4M-60G-NoSync
>       1 ± 0%      -100.0%       0 ± 0%  ivb44/fsmark/1x-64t-1BRD_48G-btrfs-4M-40G-fsyncBeforeClose
>       1 ± 0%      -100.0%       0 ± 0%  lkp-bdw01/blogbench/1SSD-btrfs
>       1 ± 0%      -100.0%       0 ± 0%  lkp-hsw01/vm-scalability/300s-anon-rx-rand-mt
>       1 ± 0%      -100.0%       0 ± 0%  lkp-sbx04/will-it-scale/futex3
>       1 ± 0%      -100.0%       0 ± 0%  lkp-sbx04/will-it-scale/page_fault3
>       1 ± 0%      -100.0%       0 ± 0%  lkp-st02/dd-write/5m-11HDD-RAID5-cfq-btrfs-100dd
>       8 ± 0%      -100.0%       0 ± 0%  TOTAL last_state.booting
>
> Thanks,
> Fengguang

Re: Thunderbolt driver hotplug not working correctly

2014-08-15 Thread Steven Noonan

On Wed, Aug 13, 2014 at 4:05 PM, Andreas Noever
 wrote:
> Hello Steven,
>
> I think that there are two problems:
>  - The Kernel does not notice that the device is gone.
>  - The first hotplug operation, after removing a coldplugged device fails.
>
> For the first one could you check whether the pciehp (sub)-driver is loaded?
> (dmesg | grep pciehp should show something, the config option is
> CONFIG_HOTPLUG_PCI_PCIE).
>
> I was able to reproduce the second problem on my machine. Could you test whether
> this patch fixes the problem?
>

With the patch I see that PCI bridge 09:00.0 survives the hotplug
events, but the bridge at 0a:00.0 and the Ethernet controller don't
survive.

>
> ---
>  drivers/thunderbolt/path.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thunderbolt/path.c b/drivers/thunderbolt/path.c
> index 8fcf8a7..9562cd0 100644
> --- a/drivers/thunderbolt/path.c
> +++ b/drivers/thunderbolt/path.c
> @@ -150,7 +150,26 @@ int tb_path_activate(struct tb_path *path)
>
> /* Activate hops. */
> for (i = path->path_length - 1; i >= 0; i--) {
> -   struct tb_regs_hop hop;
> +   struct tb_regs_hop hop = { 0 };
> +
> +   /*
> +* We do (currently) not tear down paths set up by the firmware.
> +* If a firmware device is unplugged and plugged in again then
> +* it can happen that we reuse some of the hops from the (now
> +* defunct) firmware path. This causes the hotplug operation to
> +* fail (the pci device does not show up). Clearing the hop
> +* before overwriting it fixes the problem.
> +*
> +* Should be removed once we discover and tear down firmware
> +* paths.
> +*/
> +   res = tb_port_write(path->hops[i].in_port, &hop, TB_CFG_HOPS,
> +   2 * path->hops[i].in_hop_index, 2);
> +   if (res) {
> +   __tb_path_deactivate_hops(path, i);
> +   __tb_path_deallocate_nfc(path, 0);
> +   goto err;
> +   }
>
> /* dword 0 */
> hop.next_hop = path->hops[i].next_hop_index;
> --
> 2.0.4
>

Re: Thunderbolt driver hotplug not working correctly

2014-08-15 Thread Steven Noonan

On Tue, Aug 12, 2014 at 3:55 PM, Andreas Noever
 wrote:
> On Tue, Aug 12, 2014 at 11:01 AM, Steven Noonan  wrote:
>> Hello Andreas,
>>
>> I'm trying out the new Thunderbolt driver in Linus' tree and I'm
>> noticing device hotplug isn't quite working correctly.
>>
>> I've got a Haswell 2013 MacBook Pro Retina and the Apple-vended
>> Thunderbolt Ethernet adapter, which uses the tg3 driver.
>>
>> Here's what dmesg says when I unplug:
>>
>> [ 1866.359118] thunderbolt :08:00.0: resetting error on 0:3.
>> [ 1866.359150] thunderbolt :08:00.0: 0:3: unplugged
>>
>> When I re-plug the device:
>>
>> [ 1872.481675] thunderbolt :08:00.0: resetting error on 0:3.
>> [ 1872.481695] thunderbolt :08:00.0: 0:3: hotplug: scanning
>> [ 1872.481764] thunderbolt :08:00.0: 0:3: is connected, link is up 
>> (state: 2)
>> [ 1872.482078] thunderbolt :08:00.0: initializing Switch at 0x3 (depth: 
>> 1, up port: 1)
>> [ 1872.482084] thunderbolt :08:00.0: old switch config:
>> [ 1872.482089] thunderbolt :08:00.0:  Switch: 8086:1549 (Revision: 0, TB 
>> Version: 1)
>> [ 1872.482092] thunderbolt :08:00.0:   Max Port Number: 2
>> [ 1872.482094] thunderbolt :08:00.0:   Config:
>> [ 1872.482098] thunderbolt :08:00.0:Upstream Port Number: 0 Depth: 0 
>> Route String: 0x0 Enabled: 0, PlugEventsDelay: 10ms
>> [ 1872.482102] thunderbolt :08:00.0:unknown1: 0x0 unknown4: 0x0
>> [ 1872.496213] thunderbolt :08:00.0: 3: reading drom (length: 0x7b)
>> [ 1872.821969] thunderbolt :08:00.0: 3: uid: 0x10002014eed70
>> [ 1872.822067] thunderbolt :08:00.0:  Port 0: 8086:1549 (Revision: 0, TB 
>> Version: 1, Type: Port (0x1))
>> [ 1872.822069] thunderbolt :08:00.0:   Max hop id (in/out): 7/7
>> [ 1872.822071] thunderbolt :08:00.0:   Max counters: 8
>> [ 1872.822072] thunderbolt :08:00.0:   NFC Credits: 0x70
>> [ 1872.822566] thunderbolt :08:00.0:  Port 1: 8086:1549 (Revision: 0, TB 
>> Version: 1, Type: Port (0x1))
>> [ 1872.822567] thunderbolt :08:00.0:   Max hop id (in/out): 8/8
>> [ 1872.822569] thunderbolt :08:00.0:   Max counters: 4
>> [ 1872.822570] thunderbolt :08:00.0:   NFC Credits: 0x3c0
>> [ 1872.822666] thunderbolt :08:00.0:  Port 2: 8086:1549 (Revision: 0, TB 
>> Version: 1, Type: PCIe (0x100102))
>> [ 1872.822667] thunderbolt :08:00.0:   Max hop id (in/out): 8/8
>> [ 1872.822668] thunderbolt :08:00.0:   Max counters: 2
>> [ 1872.822670] thunderbolt :08:00.0:   NFC Credits: 0x70
>> [ 1872.822982] thunderbolt :08:00.0: 3: hotplug: activating pcie devices
>> [ 1872.823576] thunderbolt :08:00.0: 0:6 <-> 3:2 (PCI): activating
>> [ 1872.823580] thunderbolt :08:00.0: activating path from 0:6 to 3:2
>> [ 1872.823582] thunderbolt :08:00.0: 3:1: Writing hop 1, index 8
>> [ 1872.823583] thunderbolt :08:00.0: 3:1:  Hop through port 2 to hop 8 
>> (enabled)
>> [ 1872.823585] thunderbolt :08:00.0: 3:1:   Weight: 1 Priority: 3 
>> Credits: 16 Drop: 0
>> [ 1872.823587] thunderbolt :08:00.0: 3:1:Counter enabled: 0 Counter 
>> index: 2047
>> [ 1872.823589] thunderbolt :08:00.0: 3:1:   Flow Control (In/Eg): 1/0 
>> Shared Buffer (In/Eg): 0/0
>> [ 1872.823591] thunderbolt :08:00.0: 3:1:   Unknown1: 0x0 Unknown2: 0x0 
>> Unknown3: 0x0
>> [ 1872.823664] thunderbolt :08:00.0: 0:6: Writing hop 0, index 8
>> [ 1872.823666] thunderbolt :08:00.0: 0:6:  Hop through port 3 to hop 8 
>> (enabled)
>> [ 1872.823667] thunderbolt :08:00.0: 0:6:   Weight: 1 Priority: 3 
>> Credits: 7 Drop: 0
>> [ 1872.823669] thunderbolt :08:00.0: 0:6:Counter enabled: 0 Counter 
>> index: 2047
>> [ 1872.823670] thunderbolt :08:00.0: 0:6:   Flow Control (In/Eg): 1/1 
>> Shared Buffer (In/Eg): 0/0
>> [ 1872.823672] thunderbolt :08:00.0: 0:6:   Unknown1: 0x0 Unknown2: 0x0 
>> Unknown3: 0x0
>> [ 1872.823764] thunderbolt :08:00.0: path activation complete
>> [ 1872.823765] thunderbolt :08:00.0: activating path from 3:2 to 0:6
>> [ 1872.823767] thunderbolt :08:00.0: 0:3: Writing hop 1, index 8
>> [ 1872.823768] thunderbolt :08:00.0: 0:3:  Hop through port 6 to hop 8 
>> (enabled)
>> [ 1872.823770] thunderbolt :08:00.0: 0:3:   Weight: 1 Priority: 3 
>> Credits: 16 Drop: 0
>> [ 1872.823771] thunderbolt :08:00.0: 0:3:Counter enabled: 0 Counter 
>> index: 2047
>> [ 1872.823773] thunderbolt :08:00.0: 0:3:   Flow Control (In/Eg): 1/0 
>> Shared Buffer (In/Eg): 0

Re: wlwifi - Microcode SW error detected.

2014-08-12 Thread Steven Noonan

I'm seeing this on linus/master right now...

$ uname -r
3.16.0-ec2-10567-gc7a19c7

$ dmesg
[...]
[   32.450641] ieee80211 phy0: Hardware restart was requested
[   32.451124] iwlwifi :04:00.0: L1 Enabled; Disabling L0S
[   32.451457] iwlwifi :04:00.0: L1 Enabled; Disabling L0S
[   32.726567] iwlwifi :04:00.0: Microcode SW error detected.  Restarting 
0x200.
[   32.726572] iwlwifi :04:00.0: CSR values:
[   32.726574] iwlwifi :04:00.0: (2nd byte of CSR_INT_COALESCING is 
CSR_INT_PERIODIC_REG)
[   32.726586] iwlwifi :04:00.0:CSR_HW_IF_CONFIG_REG: 0X40489204
[   32.726597] iwlwifi :04:00.0:  CSR_INT_COALESCING: 0X8040
[   32.726608] iwlwifi :04:00.0: CSR_INT: 0X
[   32.726618] iwlwifi :04:00.0:CSR_INT_MASK: 0X
[   32.726629] iwlwifi :04:00.0:   CSR_FH_INT_STATUS: 0X
[   32.726639] iwlwifi :04:00.0: CSR_GPIO_IN: 0X
[   32.726650] iwlwifi :04:00.0:   CSR_RESET: 0X
[   32.726661] iwlwifi :04:00.0:CSR_GP_CNTRL: 0X080403c5
[   32.726671] iwlwifi :04:00.0:  CSR_HW_REV: 0X0144
[   32.726682] iwlwifi :04:00.0:  CSR_EEPROM_REG: 0X
[   32.726692] iwlwifi :04:00.0:   CSR_EEPROM_GP: 0X8000
[   32.726703] iwlwifi :04:00.0:  CSR_OTP_GP_REG: 0X803a
[   32.726713] iwlwifi :04:00.0: CSR_GIO_REG: 0X00080042
[   32.726724] iwlwifi :04:00.0:CSR_GP_UCODE_REG: 0X
[   32.726734] iwlwifi :04:00.0:   CSR_GP_DRIVER_REG: 0X
[   32.726745] iwlwifi :04:00.0:   CSR_UCODE_DRV_GP1: 0X
[   32.726755] iwlwifi :04:00.0:   CSR_UCODE_DRV_GP2: 0X
[   32.726766] iwlwifi :04:00.0: CSR_LED_REG: 0X0060
[   32.726777] iwlwifi :04:00.0:CSR_DRAM_INT_TBL_REG: 0X880d2aef
[   32.726787] iwlwifi :04:00.0:CSR_GIO_CHICKEN_BITS: 0X27800200
[   32.726798] iwlwifi :04:00.0: CSR_ANA_PLL_CFG: 0Xd5d5
[   32.726808] iwlwifi :04:00.0:  CSR_MONITOR_STATUS_REG: 0X3c08019d
[   32.726819] iwlwifi :04:00.0:   CSR_HW_REV_WA_REG: 0X0001001a
[   32.726830] iwlwifi :04:00.0:CSR_DBG_HPET_MEM_REG: 0X
[   32.726831] iwlwifi :04:00.0: FH register values:
[   32.726851] iwlwifi :04:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 
0X2135fa00
[   32.726863] iwlwifi :04:00.0:FH_RSCSR_CHNL0_RBDCB_BASE_REG: 
0X02135fb0
[   32.726875] iwlwifi :04:00.0:  FH_RSCSR_CHNL0_WPTR: 
0X0020
[   32.726886] iwlwifi :04:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 
0X80801114
[   32.726898] iwlwifi :04:00.0:  FH_MEM_RSSR_SHARED_CTRL_REG: 
0X00fc
[   32.726909] iwlwifi :04:00.0:FH_MEM_RSSR_RX_STATUS_REG: 
0X0703
[   32.726921] iwlwifi :04:00.0:FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 
0X
[   32.726933] iwlwifi :04:00.0:FH_TSSR_TX_STATUS_REG: 
0X07ff0001
[   32.726945] iwlwifi :04:00.0: FH_TSSR_TX_ERROR_REG: 
0X
[   32.727049] iwlwifi :04:00.0: Start IWL Error Log Dump:
[   32.727051] iwlwifi :04:00.0: Status: 0x, count: 6
[   32.727053] iwlwifi :04:00.0: Loaded firmware version: 23.214.9.0
[   32.727055] iwlwifi :04:00.0: 0x277B | ADVANCED_SYSASSERT
[   32.727056] iwlwifi :04:00.0: 0x00A002B0 | uPc
[   32.727058] iwlwifi :04:00.0: 0x | branchlink1
[   32.727059] iwlwifi :04:00.0: 0x0BA4 | branchlink2
[   32.727061] iwlwifi :04:00.0: 0x000166A4 | interruptlink1
[   32.727062] iwlwifi :04:00.0: 0x00173B03 | interruptlink2
[   32.727064] iwlwifi :04:00.0: 0x0024 | data1
[   32.727065] iwlwifi :04:00.0: 0x0018 | data2
[   32.727066] iwlwifi :04:00.0: 0xDEADBEEF | data3
[   32.727068] iwlwifi :04:00.0: 0x003BE59F | beacon time
[   32.727069] iwlwifi :04:00.0: 0x00041A80 | tsf low
[   32.727070] iwlwifi :04:00.0: 0x | tsf hi
[   32.727072] iwlwifi :04:00.0: 0x | time gp1
[   32.727073] iwlwifi :04:00.0: 0x00041A80 | time gp2
[   32.727075] iwlwifi :04:00.0: 0x | time gp3
[   32.727076] iwlwifi :04:00.0: 0x000417D6 | uCode version
[   32.727077] iwlwifi :04:00.0: 0x0144 | hw version
[   32.727079] iwlwifi :04:00.0: 0x40489204 | board version
[   32.727080] iwlwifi :04:00.0: 0x001C | hcmd
[   32.727081] iwlwifi :04:00.0: 0x00022000 | isr0
[   32.727083] iwlwifi :04:00.0: 0x0100 | isr1
[   32.727084] iwlwifi :04:00.0: 0x0002 | isr2
[   32.727085] iwlwifi :04:00.0: 0x004040C0 | isr3
[   32.727087] iwlwifi :04:00.0: 0x | isr4
[   32.727088] iwlwifi :04:00.0: 0x0110 | isr_pref
[   32.727090] iwlwifi :04:00.0: 0x | wait_event
[   32.727091] iwlwifi :04:00.0: 0x0080 | l

Thunderbolt driver hotplug not working correctly

2014-08-12 Thread Steven Noonan

Hello Andreas,

I'm trying out the new Thunderbolt driver in Linus' tree and I'm
noticing device hotplug isn't quite working correctly.

I've got a Haswell 2013 MacBook Pro Retina and the Apple-vended
Thunderbolt Ethernet adapter, which uses the tg3 driver.

Here's what dmesg says when I unplug:

[ 1866.359118] thunderbolt :08:00.0: resetting error on 0:3.
[ 1866.359150] thunderbolt :08:00.0: 0:3: unplugged

When I re-plug the device:

[ 1872.481675] thunderbolt :08:00.0: resetting error on 0:3.
[ 1872.481695] thunderbolt :08:00.0: 0:3: hotplug: scanning
[ 1872.481764] thunderbolt :08:00.0: 0:3: is connected, link is up (state: 
2)
[ 1872.482078] thunderbolt :08:00.0: initializing Switch at 0x3 (depth: 1, 
up port: 1)
[ 1872.482084] thunderbolt :08:00.0: old switch config:
[ 1872.482089] thunderbolt :08:00.0:  Switch: 8086:1549 (Revision: 0, TB 
Version: 1)
[ 1872.482092] thunderbolt :08:00.0:   Max Port Number: 2
[ 1872.482094] thunderbolt :08:00.0:   Config:
[ 1872.482098] thunderbolt :08:00.0:Upstream Port Number: 0 Depth: 0 
Route String: 0x0 Enabled: 0, PlugEventsDelay: 10ms
[ 1872.482102] thunderbolt :08:00.0:unknown1: 0x0 unknown4: 0x0
[ 1872.496213] thunderbolt :08:00.0: 3: reading drom (length: 0x7b)
[ 1872.821969] thunderbolt :08:00.0: 3: uid: 0x10002014eed70
[ 1872.822067] thunderbolt :08:00.0:  Port 0: 8086:1549 (Revision: 0, TB 
Version: 1, Type: Port (0x1))
[ 1872.822069] thunderbolt :08:00.0:   Max hop id (in/out): 7/7
[ 1872.822071] thunderbolt :08:00.0:   Max counters: 8
[ 1872.822072] thunderbolt :08:00.0:   NFC Credits: 0x70
[ 1872.822566] thunderbolt :08:00.0:  Port 1: 8086:1549 (Revision: 0, TB 
Version: 1, Type: Port (0x1))
[ 1872.822567] thunderbolt :08:00.0:   Max hop id (in/out): 8/8
[ 1872.822569] thunderbolt :08:00.0:   Max counters: 4
[ 1872.822570] thunderbolt :08:00.0:   NFC Credits: 0x3c0
[ 1872.822666] thunderbolt :08:00.0:  Port 2: 8086:1549 (Revision: 0, TB 
Version: 1, Type: PCIe (0x100102))
[ 1872.822667] thunderbolt :08:00.0:   Max hop id (in/out): 8/8
[ 1872.822668] thunderbolt :08:00.0:   Max counters: 2
[ 1872.822670] thunderbolt :08:00.0:   NFC Credits: 0x70
[ 1872.822982] thunderbolt :08:00.0: 3: hotplug: activating pcie devices
[ 1872.823576] thunderbolt :08:00.0: 0:6 <-> 3:2 (PCI): activating
[ 1872.823580] thunderbolt :08:00.0: activating path from 0:6 to 3:2
[ 1872.823582] thunderbolt :08:00.0: 3:1: Writing hop 1, index 8
[ 1872.823583] thunderbolt :08:00.0: 3:1:  Hop through port 2 to hop 8 
(enabled)
[ 1872.823585] thunderbolt :08:00.0: 3:1:   Weight: 1 Priority: 3 Credits: 
16 Drop: 0
[ 1872.823587] thunderbolt :08:00.0: 3:1:Counter enabled: 0 Counter 
index: 2047
[ 1872.823589] thunderbolt :08:00.0: 3:1:   Flow Control (In/Eg): 1/0 
Shared Buffer (In/Eg): 0/0
[ 1872.823591] thunderbolt :08:00.0: 3:1:   Unknown1: 0x0 Unknown2: 0x0 
Unknown3: 0x0
[ 1872.823664] thunderbolt :08:00.0: 0:6: Writing hop 0, index 8
[ 1872.823666] thunderbolt :08:00.0: 0:6:  Hop through port 3 to hop 8 
(enabled)
[ 1872.823667] thunderbolt :08:00.0: 0:6:   Weight: 1 Priority: 3 Credits: 
7 Drop: 0
[ 1872.823669] thunderbolt :08:00.0: 0:6:Counter enabled: 0 Counter 
index: 2047
[ 1872.823670] thunderbolt :08:00.0: 0:6:   Flow Control (In/Eg): 1/1 
Shared Buffer (In/Eg): 0/0
[ 1872.823672] thunderbolt :08:00.0: 0:6:   Unknown1: 0x0 Unknown2: 0x0 
Unknown3: 0x0
[ 1872.823764] thunderbolt :08:00.0: path activation complete
[ 1872.823765] thunderbolt :08:00.0: activating path from 3:2 to 0:6
[ 1872.823767] thunderbolt :08:00.0: 0:3: Writing hop 1, index 8
[ 1872.823768] thunderbolt :08:00.0: 0:3:  Hop through port 6 to hop 8 
(enabled)
[ 1872.823770] thunderbolt :08:00.0: 0:3:   Weight: 1 Priority: 3 Credits: 
16 Drop: 0
[ 1872.823771] thunderbolt :08:00.0: 0:3:Counter enabled: 0 Counter 
index: 2047
[ 1872.823773] thunderbolt :08:00.0: 0:3:   Flow Control (In/Eg): 1/0 
Shared Buffer (In/Eg): 0/0
[ 1872.823774] thunderbolt :08:00.0: 0:3:   Unknown1: 0x0 Unknown2: 0x0 
Unknown3: 0x0
[ 1872.823864] thunderbolt :08:00.0: 3:2: Writing hop 0, index 8
[ 1872.823865] thunderbolt :08:00.0: 3:2:  Hop through port 1 to hop 8 
(enabled)
[ 1872.823867] thunderbolt :08:00.0: 3:2:   Weight: 1 Priority: 3 Credits: 
7 Drop: 0
[ 1872.823868] thunderbolt :08:00.0: 3:2:Counter enabled: 0 Counter 
index: 2047
[ 1872.823870] thunderbolt :08:00.0: 3:2:   Flow Control (In/Eg): 1/1 
Shared Buffer (In/Eg): 0/0
[ 1872.823871] thunderbolt :08:00.0: 3:2:   Unknown1: 0x0 Unknown2: 0x0 
Unknown3: 0x0
[ 1872.823963] thunderbolt :08:00.0: path activation complete


And the tg3 driver didn't notice anything happened at all during that
process, so I did an rmmod/modprobe:

[ 1894.412903] tg3 :0b:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will 
not clear MAC_T

[PATCH 1/2 v2] lockref: make lockref count signed

2014-08-05 Thread Steven Noonan

There are numerous places where this is cast to a signed value anyway, for
comparisons checking that the value hasn't been set to the 'dead' value of
-128. This change turns the count value into a signed integer, which is how
it's already being treated anyway. This reduces the chance for developer errors
when making those comparisons.

Suggested-by: Linus Torvalds 
Cc: NeilBrown 
Cc: Al Viro 
Signed-off-by: Steven Noonan 
---

v2: d_count() function was unsigned and there was another cast inside autofs4.
Fixed those as well.

 fs/autofs4/root.c   | 2 +-
 fs/dcache.c | 6 +++---
 include/linux/dcache.h  | 2 +-
 include/linux/lockref.h | 2 +-
 lib/lockref.c   | 4 ++--
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index cc87c1a..c4583c8 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -179,7 +179,7 @@ static struct dentry *autofs4_lookup_active(struct dentry 
*dentry)
spin_lock(&active->d_lock);
 
/* Already gone? */
-   if ((int) d_count(active) <= 0)
+   if (d_count(active) <= 0)
goto next;
 
qstr = &active->d_name;
diff --git a/fs/dcache.c b/fs/dcache.c
index 06f6585..f7a592e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -479,7 +479,7 @@ static void __dentry_kill(struct dentry *dentry)
 * dentry_iput drops the locks, at which point nobody (except
 * transient RCU lookups) can reach this dentry.
 */
-   BUG_ON((int)dentry->d_lockref.count > 0);
+   BUG_ON(dentry->d_lockref.count > 0);
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
@@ -532,7 +532,7 @@ static inline struct dentry *lock_parent(struct dentry 
*dentry)
struct dentry *parent = dentry->d_parent;
if (IS_ROOT(dentry))
return NULL;
-   if (unlikely((int)dentry->d_lockref.count < 0))
+   if (unlikely(dentry->d_lockref.count < 0))
return NULL;
if (likely(spin_trylock(&parent->d_lock)))
return parent;
@@ -848,7 +848,7 @@ static void shrink_dentry_list(struct list_head *list)
 * We found an inuse dentry which was not removed from
 * the LRU because of laziness during lookup. Do not free it.
 */
-   if ((int)dentry->d_lockref.count > 0) {
+   if (dentry->d_lockref.count > 0) {
spin_unlock(&dentry->d_lock);
if (parent)
spin_unlock(&parent->d_lock);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3c7ec32..7531470 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -320,7 +320,7 @@ extern struct dentry *__d_lookup(const struct dentry *, 
const struct qstr *);
 extern struct dentry *__d_lookup_rcu(const struct dentry *parent,
const struct qstr *name, unsigned *seq);
 
-static inline unsigned d_count(const struct dentry *dentry)
+static inline int d_count(const struct dentry *dentry)
 {
return dentry->d_lockref.count;
 }
diff --git a/include/linux/lockref.h b/include/linux/lockref.h
index 4bfde0e..8558ff1 100644
--- a/include/linux/lockref.h
+++ b/include/linux/lockref.h
@@ -28,7 +28,7 @@ struct lockref {
 #endif
struct {
spinlock_t lock;
-   unsigned int count;
+   int count;
};
};
 };
diff --git a/lib/lockref.c b/lib/lockref.c
index d2233de..e4c4255 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -158,7 +158,7 @@ int lockref_get_not_dead(struct lockref *lockref)
 
CMPXCHG_LOOP(
new.count++;
-   if ((int)old.count < 0)
+   if (old.count < 0)
return 0;
,
return 1;
@@ -166,7 +166,7 @@ int lockref_get_not_dead(struct lockref *lockref)
 
spin_lock(&lockref->lock);
retval = 0;
-   if ((int) lockref->count >= 0) {
+   if (lockref->count >= 0) {
lockref->count++;
retval = 1;
}
-- 
2.0.4
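
To make the motivation concrete, here is a small userspace sketch of the
pitfall the casts were guarding against. Everything prefixed toy_ and the
in_use_*() helpers are hypothetical names for illustration, not kernel code;
the -128 'dead' value is taken from the commit message above.

```c
#include <stdbool.h>

#define TOY_DEAD (-128)	/* 'dead' marker, as described in the commit message */

struct toy_ref_unsigned { unsigned int count; };	/* field type before the patch */
struct toy_ref_signed   { int count; };			/* field type after the patch */

/* Forgetting the cast: the dead marker wraps to a huge positive value. */
static bool in_use_nocast(const struct toy_ref_unsigned *r)
{
	return r->count > 0;
}

/*
 * Old idiom: cast at every comparison. Converting the out-of-range value
 * back to int is implementation-defined; it yields -128 on the usual
 * two's-complement targets.
 */
static bool in_use_cast(const struct toy_ref_unsigned *r)
{
	return (int)r->count > 0;
}

/* With a signed field the plain comparison is already correct. */
static bool in_use_signed(const struct toy_ref_signed *r)
{
	return r->count > 0;
}
```

With count set to the dead marker, in_use_nocast() reports the reference as
in use, while in_use_cast() and in_use_signed() both report it as not in use,
which is why a signed field removes a whole class of easy mistakes.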


Re: [PATCH 1/2] lockref: make lockref count signed

2014-08-05 Thread Steven Noonan

On Tue, Aug 5, 2014 at 2:44 PM, NeilBrown  wrote:
> On Tue,  5 Aug 2014 12:52:27 -0700 Steven Noonan 
> wrote:
>
>> There are numerous places where this is cast to a signed value anyway, for
>> comparisons checking that the value hasn't been set to the 'dead' value of
>> -128. This change turns the count value into a signed integer, which is how
>> it's already being treated anyway. This reduces the chance for developer 
>> errors
>> when making those comparisons.
>>
>> Suggested-by: Linus Torvalds 
>> Cc: NeilBrown 
>> Cc: Al Viro 
>> Signed-off-by: Steven Noonan 
>
> Thanks!  But you missed one "(int)" removal :-)
>
> fs/autofs4/root.c:  if ((int) d_count(active) <= 0)

Ahh, yeah. The return type of d_count() also needs to be fixed up.
I'll send a new version in a few minutes...

> NeilBrown
>
>> ---
>>  fs/dcache.c | 6 +++---
>>  include/linux/lockref.h | 2 +-
>>  lib/lockref.c   | 4 ++--
>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/dcache.c b/fs/dcache.c
>> index 06f6585..f7a592e 100644
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -479,7 +479,7 @@ static void __dentry_kill(struct dentry *dentry)
>>* dentry_iput drops the locks, at which point nobody (except
>>* transient RCU lookups) can reach this dentry.
>>*/
>> - BUG_ON((int)dentry->d_lockref.count > 0);
>> + BUG_ON(dentry->d_lockref.count > 0);
>>   this_cpu_dec(nr_dentry);
>>   if (dentry->d_op && dentry->d_op->d_release)
>>   dentry->d_op->d_release(dentry);
>> @@ -532,7 +532,7 @@ static inline struct dentry *lock_parent(struct dentry 
>> *dentry)
>>   struct dentry *parent = dentry->d_parent;
>>   if (IS_ROOT(dentry))
>>   return NULL;
>> - if (unlikely((int)dentry->d_lockref.count < 0))
>> + if (unlikely(dentry->d_lockref.count < 0))
>>   return NULL;
>>   if (likely(spin_trylock(&parent->d_lock)))
>>   return parent;
>> @@ -848,7 +848,7 @@ static void shrink_dentry_list(struct list_head *list)
>>* We found an inuse dentry which was not removed from
>>* the LRU because of laziness during lookup. Do not free it.
>>*/
>> - if ((int)dentry->d_lockref.count > 0) {
>> + if (dentry->d_lockref.count > 0) {
>>   spin_unlock(&dentry->d_lock);
>>   if (parent)
>>   spin_unlock(&parent->d_lock);
>> diff --git a/include/linux/lockref.h b/include/linux/lockref.h
>> index 4bfde0e..8558ff1 100644
>> --- a/include/linux/lockref.h
>> +++ b/include/linux/lockref.h
>> @@ -28,7 +28,7 @@ struct lockref {
>>  #endif
>>   struct {
>>   spinlock_t lock;
>> - unsigned int count;
>> + int count;
>>   };
>>   };
>>  };
>> diff --git a/lib/lockref.c b/lib/lockref.c
>> index d2233de..e4c4255 100644
>> --- a/lib/lockref.c
>> +++ b/lib/lockref.c
>> @@ -158,7 +158,7 @@ int lockref_get_not_dead(struct lockref *lockref)
>>
>>   CMPXCHG_LOOP(
>>   new.count++;
>> - if ((int)old.count < 0)
>> + if (old.count < 0)
>>   return 0;
>>   ,
>>   return 1;
>> @@ -166,7 +166,7 @@ int lockref_get_not_dead(struct lockref *lockref)
>>
>>   spin_lock(&lockref->lock);
>>   retval = 0;
>> - if ((int) lockref->count >= 0) {
>> + if (lockref->count >= 0) {
>>   lockref->count++;
>>   retval = 1;
>>   }
>

[PATCH 2/2] lockref: replace lockref_get_not_zero with lockref_get_active

2014-08-05 Thread Steven Noonan

The new lockref_get_active ensures the count is nonzero and that the lockref is
not dead (i.e., count > 0). Simply comparing to zero was risky for the only
caller of this function (dget_parent), as it wasn't holding the lockref->lock.

Suggested-by: Linus Torvalds 
Cc: NeilBrown 
Cc: Al Viro 
Signed-off-by: Steven Noonan 
---
 fs/dcache.c |  2 +-
 include/linux/lockref.h |  2 +-
 lib/lockref.c   | 13 +++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f7a592e..66ee98e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -699,7 +699,7 @@ struct dentry *dget_parent(struct dentry *dentry)
 */
rcu_read_lock();
ret = ACCESS_ONCE(dentry->d_parent);
-   gotref = lockref_get_not_zero(&ret->d_lockref);
+   gotref = lockref_get_active(&ret->d_lockref);
rcu_read_unlock();
if (likely(gotref)) {
if (likely(ret == ACCESS_ONCE(dentry->d_parent)))
diff --git a/include/linux/lockref.h b/include/linux/lockref.h
index 8558ff1..1a9827e 100644
--- a/include/linux/lockref.h
+++ b/include/linux/lockref.h
@@ -34,7 +34,7 @@ struct lockref {
 };
 
 extern void lockref_get(struct lockref *);
-extern int lockref_get_not_zero(struct lockref *);
+extern int lockref_get_active(struct lockref *);
 extern int lockref_get_or_lock(struct lockref *);
 extern int lockref_put_or_lock(struct lockref *);
 
diff --git a/lib/lockref.c b/lib/lockref.c
index e4c4255..318bef6 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -60,17 +60,18 @@ void lockref_get(struct lockref *lockref)
 EXPORT_SYMBOL(lockref_get);
 
 /**
- * lockref_get_not_zero - Increments count unless the count is 0
+ * lockref_get_active - Increments count unless the count is 0 or ref is dead
  * @lockref: pointer to lockref structure
- * Return: 1 if count updated successfully or 0 if count was zero
+ * Return: 1 if count updated successfully or 0 if count was zero or lockref
+ * was dead
  */
-int lockref_get_not_zero(struct lockref *lockref)
+int lockref_get_active(struct lockref *lockref)
 {
int retval;
 
CMPXCHG_LOOP(
new.count++;
-   if (!old.count)
+   if (old.count < 1)
return 0;
,
return 1;
@@ -78,14 +79,14 @@ int lockref_get_not_zero(struct lockref *lockref)
 
spin_lock(&lockref->lock);
retval = 0;
-   if (lockref->count) {
+   if (lockref->count >= 1) {
lockref->count++;
retval = 1;
}
spin_unlock(&lockref->lock);
return retval;
 }
-EXPORT_SYMBOL(lockref_get_not_zero);
+EXPORT_SYMBOL(lockref_get_active);
 
 /**
  * lockref_get_or_lock - Increments count unless the count is 0
-- 
2.0.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
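
For readers following the patch above, the difference between the "not zero" and "active" tests can be sketched in userspace C11. This is only an illustration under stated assumptions: toy_lockref, TOY_DEAD and both helpers are hypothetical names, the embedded spinlock is omitted, and a plain atomic compare-and-swap stands in for the kernel's CMPXCHG_LOOP macro.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in: only the count, no embedded spinlock. */
struct toy_lockref {
    _Atomic int count;    /* > 0: active, 0: unreferenced, < 0: dead */
};

#define TOY_DEAD (-128)   /* the 'dead' marker value discussed in the thread */

/* Take a reference only if the object is active (count >= 1). */
static bool toy_lockref_get_active(struct toy_lockref *ref)
{
    int old = atomic_load(&ref->count);

    do {
        if (old < 1)
            return false;   /* zero or dead: leave the count untouched */
        /* On CAS failure 'old' is refreshed with the current value; retry. */
    } while (!atomic_compare_exchange_weak(&ref->count, &old, old + 1));

    return true;
}

/* The older "not zero" test, for comparison: it also bumps a dead count. */
static bool toy_lockref_get_not_zero(struct toy_lockref *ref)
{
    int old = atomic_load(&ref->count);

    do {
        if (old == 0)
            return false;
    } while (!atomic_compare_exchange_weak(&ref->count, &old, old + 1));

    return true;
}
```

In this sketch a dead count of -128 is rejected by toy_lockref_get_active() but incremented to -127 by toy_lockref_get_not_zero(), which is the behaviour the rename is meant to rule out.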

[PATCH 1/2] lockref: make lockref count signed

2014-08-05 Thread Steven Noonan

There are numerous places where this is cast to a signed value anyway, for
comparisons checking that the value hasn't been set to the 'dead' value of
-128. This change turns the count value into a signed integer, which is how
it's already being treated anyway. This reduces the chance of developer errors
when making those comparisons.

Suggested-by: Linus Torvalds 
Cc: NeilBrown 
Cc: Al Viro 
Signed-off-by: Steven Noonan 
---
 fs/dcache.c | 6 +++---
 include/linux/lockref.h | 2 +-
 lib/lockref.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 06f6585..f7a592e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -479,7 +479,7 @@ static void __dentry_kill(struct dentry *dentry)
 * dentry_iput drops the locks, at which point nobody (except
 * transient RCU lookups) can reach this dentry.
 */
-   BUG_ON((int)dentry->d_lockref.count > 0);
+   BUG_ON(dentry->d_lockref.count > 0);
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
@@ -532,7 +532,7 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
struct dentry *parent = dentry->d_parent;
if (IS_ROOT(dentry))
return NULL;
-   if (unlikely((int)dentry->d_lockref.count < 0))
+   if (unlikely(dentry->d_lockref.count < 0))
return NULL;
if (likely(spin_trylock(&parent->d_lock)))
return parent;
@@ -848,7 +848,7 @@ static void shrink_dentry_list(struct list_head *list)
 * We found an inuse dentry which was not removed from
 * the LRU because of laziness during lookup. Do not free it.
 */
-   if ((int)dentry->d_lockref.count > 0) {
+   if (dentry->d_lockref.count > 0) {
spin_unlock(&dentry->d_lock);
if (parent)
spin_unlock(&parent->d_lock);
diff --git a/include/linux/lockref.h b/include/linux/lockref.h
index 4bfde0e..8558ff1 100644
--- a/include/linux/lockref.h
+++ b/include/linux/lockref.h
@@ -28,7 +28,7 @@ struct lockref {
 #endif
struct {
spinlock_t lock;
-   unsigned int count;
+   int count;
};
};
 };
diff --git a/lib/lockref.c b/lib/lockref.c
index d2233de..e4c4255 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -158,7 +158,7 @@ int lockref_get_not_dead(struct lockref *lockref)
 
CMPXCHG_LOOP(
new.count++;
-   if ((int)old.count < 0)
+   if (old.count < 0)
return 0;
,
return 1;
@@ -166,7 +166,7 @@ int lockref_get_not_dead(struct lockref *lockref)
 
spin_lock(&lockref->lock);
retval = 0;
-   if ((int) lockref->count >= 0) {
+   if (lockref->count >= 0) {
lockref->count++;
retval = 1;
}
-- 
2.0.4
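
The pitfall this patch removes can be shown with a small standalone C sketch. The helper names and DEAD_COUNT are hypothetical and chosen only for illustration; the point is that a "dead" count of -128 stored in an unsigned field compares as a very large positive number unless every caller remembers the cast.

```c
#include <stdbool.h>

#define DEAD_COUNT (-128)   /* illustrative stand-in for the 'dead' marker */

/* Count stored as unsigned, as before the patch, and compared directly. */
static bool in_use_unsigned(unsigned int count)
{
    return count > 0;       /* a dead count wraps to a huge value: true */
}

/*
 * Same test with the (int) cast that callers had to remember.
 * Converting an out-of-range unsigned value to int is implementation-defined
 * in C; GCC and Clang wrap it, which is the behaviour the kernel code assumed.
 */
static bool in_use_cast(unsigned int count)
{
    return (int)count > 0;
}

/* Count stored as signed, as after the patch: no cast needed. */
static bool in_use_signed(int count)
{
    return count > 0;
}
```

Passing (unsigned int)DEAD_COUNT to in_use_unsigned() reports the object as in use, while the cast and signed variants both report it as not in use.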


Re: Is lockref_get_not_zero() really correct in dget_parent()

2014-08-04 Thread Steven Noonan

On Mon, Aug 4, 2014 at 9:54 PM, Steven Noonan  wrote:
> On Mon, Aug 04, 2014 at 09:07:32PM -0700, Linus Torvalds wrote:
>> On Mon, Aug 4, 2014 at 8:17 PM, NeilBrown  wrote:
>> >
>> >  I've been looking at last year's change to dentry refcounting which sets the
>> >  refcount to -128 (mark_dead()) when the dentry is gone.
>> >
>> >  As this is an "unsigned long" and there are several places where
>> >  d_lockref.count is compared e.g. "> 1", I start feeling uncomfortable, as
>> >  "-128" is greater than "1".
>>
>> Anybody who checks the lockref count without holding the lock is
>> pretty much buggy by definition. And if you hold the lock, you had
>> better not ever see a negative (== large positive) number, because
>> that would be all kinds of buggy too.
>>
>> So I don't *think* that people who compare with "> 1" kind of things
>> should be problematic. I wouldn't necessarily disagree with the notion
>> of making a lockref be a signed entity, though. It started out
>> unsigned, but it started out without that dead state too, so that
>> unsigned thing can be considered a historical artifact rather than any
>> real design decision.
>>
>> Anyway, I think my argument is that anybody who actually looks at
>> d_count() and might see that magic dead value is so fundamentally
>> broken in other ways that I wouldn't worry too much about *that* part.
>>
>> But your "lockref_get_not_zero()" thing is a different thing:
>>
>> >  That brings me to dget_parent().  It only has rcu_read_lock() protection, and
>> >  yet uses lockref_get_not_zero().  This doesn't seem safe.
>>
>> Yes, agreed, it's ugly and wrong, and smells bad.
>>
>> But I think it happens to be safe (because the re-checking of d_parent
>> will fail if a rename and dput could have triggered it, and even the
>> extraneous "dput()" is actually safe, because it won't cause the value
>> to become zero, so nothing bad happens. But it *is* kind of subtle,
>> and I do agree that it's *needlessly* so.
>>
>> So it might be a good idea to get rid of the "not zero" version
>> entirely, and make the check be about being *active* (ie not zero, and
>> not dead).
>>
>> The only user of lockref_get_not_zero() is that dget_parent() thing,
>> so that should be easy.
>>
>> So renaming it to "lockref_get_active()", and changing the "not zero"
>> test to check for "positive" and change the rtype of "count" to be
>> signed, all sound like good things to me.
>>
>> But I don't actually think it's an active bug, it's just an "active
>> horribly ugly and subtly working code". I guess in theory if you can
>> get lots of CPU's triggering the race at the same time, the magic
>> negative number could become zero and positive, but at that point I
>> don't think we're really talking reality any more.
>>
>> Can somebody pick holes in that? Does somebody want to send in the
>> cleanup patch?
>
> How does this look?
>
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index e99c6f5..1e7dc31 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -479,7 +479,7 @@ static void __dentry_kill(struct dentry *dentry)
>  * dentry_iput drops the locks, at which point nobody (except
>  * transient RCU lookups) can reach this dentry.
>  */
> -   BUG_ON((int)dentry->d_lockref.count > 0);
> +   BUG_ON(dentry->d_lockref.count > 0);
> this_cpu_dec(nr_dentry);
> if (dentry->d_op && dentry->d_op->d_release)
> dentry->d_op->d_release(dentry);
> @@ -532,7 +532,7 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
> struct dentry *parent = dentry->d_parent;
> if (IS_ROOT(dentry))
> return NULL;
> -   if (unlikely((int)dentry->d_lockref.count < 0))
> +   if (unlikely(dentry->d_lockref.count < 0))
> return NULL;
> if (likely(spin_trylock(&parent->d_lock)))
> return parent;
> @@ -699,7 +699,7 @@ struct dentry *dget_parent(struct dentry *dentry)
>  */
> rcu_read_lock();
> ret = ACCESS_ONCE(dentry->d_parent);
> -   gotref = lockref_get_not_zero(&ret->d_lockref);
> +   gotref = lockref_get_active(&ret->d_lockref);
> rcu_read_unlock();
> if (

Re: Is lockref_get_not_zero() really correct in dget_parent()

2014-08-04 Thread Steven Noonan

On Mon, Aug 04, 2014 at 09:07:32PM -0700, Linus Torvalds wrote:
> On Mon, Aug 4, 2014 at 8:17 PM, NeilBrown  wrote:
> >
> >  I've been looking at last year's change to dentry refcounting which sets the
> >  refcount to -128 (mark_dead()) when the dentry is gone.
> >
> >  As this is an "unsigned long" and there are several places where
> >  d_lockref.count is compared e.g. "> 1", I start feeling uncomfortable, as
> >  "-128" is greater than "1".
> 
> Anybody who checks the lockref count without holding the lock is
> pretty much buggy by definition. And if you hold the lock, you had
> better not ever see a negative (== large positive) number, because
> that would be all kinds of buggy too.
> 
> So I don't *think* that people who compare with "> 1" kind of things
> should be problematic. I wouldn't necessarily disagree with the notion
> of making a lockref be a signed entity, though. It started out
> unsigned, but it started out without that dead state too, so that
> unsigned thing can be considered a historical artifact rather than any
> real design decision.
> 
> Anyway, I think my argument is that anybody who actually looks at
> d_count() and might see that magic dead value is so fundamentally
> broken in other ways that I wouldn't worry too much about *that* part.
> 
> But your "lockref_get_not_zero()" thing is a different thing:
> 
> >  That brings me to dget_parent().  It only has rcu_read_lock() protection, and
> >  yet uses lockref_get_not_zero().  This doesn't seem safe.
> 
> Yes, agreed, it's ugly and wrong, and smells bad.
> 
> But I think it happens to be safe (because the re-checking of d_parent
> will fail if a rename and dput could have triggered it, and even the
> extraneous "dput()" is actually safe, because it won't cause the value
> to become zero, so nothing bad happens. But it *is* kind of subtle,
> and I do agree that it's *needlessly* so.
> 
> So it might be a good idea to get rid of the "not zero" version
> entirely, and make the check be about being *active* (ie not zero, and
> not dead).
> 
> The only user of lockref_get_not_zero() is that dget_parent() thing,
> so that should be easy.
> 
> So renaming it to "lockref_get_active()", and changing the "not zero"
> test to check for "positive" and change the rtype of "count" to be
> signed, all sound like good things to me.
> 
> But I don't actually think it's an active bug, it's just an "active
> horribly ugly and subtly working code". I guess in theory if you can
> get lots of CPU's triggering the race at the same time, the magic
> negative number could become zero and positive, but at that point I
> don't think we're really talking reality any more.
> 
> Can somebody pick holes in that? Does somebody want to send in the
> cleanup patch?

How does this look?


diff --git a/fs/dcache.c b/fs/dcache.c
index e99c6f5..1e7dc31 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -479,7 +479,7 @@ static void __dentry_kill(struct dentry *dentry)
 * dentry_iput drops the locks, at which point nobody (except
 * transient RCU lookups) can reach this dentry.
 */
-   BUG_ON((int)dentry->d_lockref.count > 0);
+   BUG_ON(dentry->d_lockref.count > 0);
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
@@ -532,7 +532,7 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
struct dentry *parent = dentry->d_parent;
if (IS_ROOT(dentry))
return NULL;
-   if (unlikely((int)dentry->d_lockref.count < 0))
+   if (unlikely(dentry->d_lockref.count < 0))
return NULL;
if (likely(spin_trylock(&parent->d_lock)))
return parent;
@@ -699,7 +699,7 @@ struct dentry *dget_parent(struct dentry *dentry)
 */
rcu_read_lock();
ret = ACCESS_ONCE(dentry->d_parent);
-   gotref = lockref_get_not_zero(&ret->d_lockref);
+   gotref = lockref_get_active(&ret->d_lockref);
rcu_read_unlock();
if (likely(gotref)) {
if (likely(ret == ACCESS_ONCE(dentry->d_parent)))
@@ -848,7 +848,7 @@ static void shrink_dentry_list(struct list_head *list)
 * We found an inuse dentry which was not removed from
 * the LRU because of laziness during lookup. Do not free it.
 */
-   if ((int)dentry->d_lockref.count > 0) {
+   if (dentry->d_lockref.count > 0) {
spin_unlock(&dentry->d_lock);
if (parent)
spin_unlock(&parent->d_lock);
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index aec7f73..d492f0e 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1745,7 +1745,7 @@ void gfs2_dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
  state2str(gl->gl_demote_state), dtime,
  atomic_read(&gl->gl_ail_count),
  atomic_read(&gl->gl_revok

Re: [PATCH] x86, efi: print debug values in Kib not MB

2014-07-29 Thread Steven Noonan

On Tue, Jul 29, 2014 at 3:54 PM, Prarit Bhargava  wrote:
>
>
> On 07/29/2014 06:36 PM, Borislav Petkov wrote:
>> On Tue, Jul 29, 2014 at 06:32:56PM -0400, Prarit Bhargava wrote:
>>> and it was best to keep the code simple with a KiB.
>>
>> You're missing the point - the output doesn't get simple with KiB. Read
>> the example I just gave you!
>>
>> (13893632kiB) is actively *not* helping when one looks at it!
>
> I did get your point and I'm (politely) disagreeing with it.  Your case ONLY
> works if the number is _exactly_ 13GB.  What if it is 13.5?  Then we're still
> rounding off and reporting 14GB.  Which, if this code is meant for debug, makes
> no sense to me.  Why round it off?
>

I think if it was being represented in procfs or sysfs, we'd need to
be extremely specific to make it machine-readable, but for a
human-readable printk, it makes sense to me to print it in the smaller
unit size until the value is in tens of the next higher unit size
(e.g. print in KiB until 10+MiB, print in MiB until 10+GiB).
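
The rule suggested above could look roughly like the following. This is a sketch only, not the code from the patch under discussion: format_size is a hypothetical helper, it takes a size in bytes (an assumption), and it truncates rather than rounds.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: stay in the smaller unit until the value reaches
 * ten of the next larger unit (KiB until 10 MiB, MiB until 10 GiB).
 * Integer division truncates; nothing is rounded up.
 */
static int format_size(unsigned long long bytes, char *buf, size_t len)
{
    const unsigned long long KiB = 1024ULL;
    const unsigned long long MiB = KiB * 1024;
    const unsigned long long GiB = MiB * 1024;

    if (bytes >= 10 * GiB)
        return snprintf(buf, len, "%lluGiB", bytes / GiB);
    if (bytes >= 10 * MiB)
        return snprintf(buf, len, "%lluMiB", bytes / MiB);
    return snprintf(buf, len, "%lluKiB", bytes / KiB);
}
```

Under this rule the 13893632 KiB value quoted above would print as 13GiB, while a 5 GiB region would print as 5120MiB and so keep its finer granularity.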

Re: general protection fault on 3.15.6

2014-07-25 Thread Steven Noonan

On Fri, Jul 25, 2014 at 9:42 PM, Steven Noonan  wrote:
> On Thu, Jul 24, 2014 at 12:06 AM, Alexander Holler  wrote:
>> Am 23.07.2014 19:50, schrieb Steven Noonan:
>>
>>> (Oops, LKML doesn't like rich text, resending. Was trying to avoid
>>> GMail's bad line wrapping. Going to use Mutt instead.)
>>>
>>> I'm starting to wonder if it's bad RAM or something. Just got a couple of
>>> worrying warnings on boot from the same system (after it spontaneously
>>> rebooted, with nothing revealing in the previous boot's logs).
>
> So the spontaneous reboot was apparently caused by a power outage. All
> my boxes had identical uptimes of less than a couple days when I checked
> them.
>
>>
>>
>> I once had such too and since then I'm using memtest=3 in my kernel command
>> line on x86* machines. Depending on the amount of RAM it will slow down boot
>> by a few seconds, but if you don't care if your machine comes up in 5 or 10
>> seconds, it is a no-brainer.
>>
>
> However, I got another general protection fault. This time it happened
> when doing 'find' on an NFS mount point. Tried booting with 'memtest=16'
> to see if that would catch anything, but it passed without finding any
> bad regions. I'm running memtest86 right now to be a bit more thorough
> and ensure it's not just bad hardware, but so far it's not found
> anything (1 full pass done so far).
>
> Here's the latest backtraces. I only managed to copy/paste this before
> the system hung and I had to reboot it, but there should be a more
> complete kernel log in the systemd journal that I can grab once it's
> done with memtest86.
>
> [212326.408380] general protection fault:  [#1] SMP
> [212326.409183] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry 
> nfsv4 nfs lockd fscache sunrpc macvlan xt_nat sit tunnel4 ip_tunnel sch_sfq 
> ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT 
> xt_limit 8021q nf_conntrack_ipv4 nf_defrag_ipv4 xt_LOG xt_tcpudp bridge 
> ip6t_rt nf_conntrack_ipv6 stp llc nf_defrag_ipv6 xt_conntrack nf_conntrack 
> iptable_filter ip6table_filter ip6_tables ip_tables x_tables it87 hwmon_vid 
> nls_cp437 vfat fat x86_pkg_temp_thermal iTCO_wdt intel_powerclamp raid1 
> iTCO_vendor_support raid0 coretemp crct10dif_pclmul md_mod snd_hda_codec_hdmi 
> crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
> snd_hda_codec_realtek glue_helper ablk_helper cryptd snd_hda_codec_generic 
> snd_hda_intel snd_hda_controller microcode i2c_i801 r8169 snd_hda_codec
> [212326.411879]  snd_hwdep mii snd_pcm snd_timer thermal fan snd acpi_cpufreq 
> battery soundcore lpc_ich mfd_core evdev processor zfs(PO) zunicode(PO) 
> zavl(PO) zcommon(PO) znvpair(PO) spl(O) tun usbip_host(C) usbip_core(C) msr 
> loop kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif 
> crct10dif_common hid_generic usbhid hid ahci libahci crc32c_intel ehci_pci 
> libata xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915 video intel_gtt 
> i2c_algo_bit drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff 
> ipmi_msghandler button
> [212326.414577] CPU: 5 PID: 30360 Comm: find Tainted: PWC O  
> 3.15.6-1-ec2 #1
> [212326.415457] Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
> [212326.416352] task: 8801275bbb00 ti: 88030f80c000 task.ti: 
> 88030f80c000
> [212326.417261] RIP: 0010:[]  [] 
> __kmalloc_track_caller+0x86/0x260
> [212326.418194] RSP: 0018:88030f80fb78  EFLAGS: 00010282
> [212326.419130] RAX:  RBX: 0004 RCX: 
> 35ee
> [212326.420081] RDX: 35ed RSI:  RDI: 
> 
> [212326.421021] RBP: 88030f80fbb0 R08: 000173c0 R09: 
> 8801eb6ae160
> [212326.421958] R10: 88040e803e00 R11: 0004 R12: 
> ff0074726f707262
> [212326.422887] R13: 00d0 R14: 0004 R15: 
> 88040e803e00
> [212326.423808] FS:  7f3b98919700() GS:88041f34() 
> knlGS:
> [212326.424752] CS:  0010 DS:  ES:  CR0: 80050033
> [212326.425698] CR2: 00ef0010 CR3: 0003ffd3c000 CR4: 
> 001407e0
> [212326.426659] Stack:
> [212326.427620]  88040e803e00 a0211d75 0004 
> 8803607f0558
> [212326.428609]  0009 8801eb6ae000 8801eb6ae140 
> 88030f80fbd0
> [212326.429630]  8116fb60 88030f80fd40 88030f80fe58 
> 88030f80fcc8
> [212326.430640] Call Trace:
> [212326.431651]  [] ? nfs_permission+0x405/0xfb0 [nfs]
> [212326.432681]  [] kmemdup+0x20/0x50
> [21232

Re: general protection fault on 3.15.6

2014-07-25 Thread Steven Noonan

On Thu, Jul 24, 2014 at 12:06 AM, Alexander Holler  wrote:
> Am 23.07.2014 19:50, schrieb Steven Noonan:
>
>> (Oops, LKML doesn't like rich text, resending. Was trying to avoid
>> GMail's bad line wrapping. Going to use Mutt instead.)
>>
>> I'm starting to wonder if it's bad RAM or something. Just got a couple of
>> worrying warnings on boot from the same system (after it spontaneously
>> rebooted, with nothing revealing in the previous boot's logs).

So the spontaneous reboot was apparently caused by a power outage. All
my boxes had identical uptimes of less than a couple days when I checked
them.

>
>
> I once had such too and since then I'm using memtest=3 in my kernel command
> line on x86* machines. Depending on the amount of RAM it will slow down boot
> by a few seconds, but if you don't care if your machine comes up in 5 or 10
> seconds, it is a no-brainer.
>

However, I got another general protection fault. This time it happened
when doing 'find' on an NFS mount point. Tried booting with 'memtest=16'
to see if that would catch anything, but it passed without finding any
bad regions. I'm running memtest86 right now to be a bit more thorough
and ensure it's not just bad hardware, but so far it's not found
anything (1 full pass done so far).

Here are the latest backtraces. I only managed to copy/paste this before
the system hung and I had to reboot it, but there should be a more
complete kernel log in the systemd journal that I can grab once it's
done with memtest86.

[212326.408380] general protection fault:  [#1] SMP
[212326.409183] Modules linked in: rpcsec_gss_krb5 auth_rpcgss oid_registry 
nfsv4 nfs lockd fscache sunrpc macvlan xt_nat sit tunnel4 ip_tunnel sch_sfq 
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT 
xt_limit 8021q nf_conntrack_ipv4 nf_defrag_ipv4 xt_LOG xt_tcpudp bridge ip6t_rt 
nf_conntrack_ipv6 stp llc nf_defrag_ipv6 xt_conntrack nf_conntrack 
iptable_filter ip6table_filter ip6_tables ip_tables x_tables it87 hwmon_vid 
nls_cp437 vfat fat x86_pkg_temp_thermal iTCO_wdt intel_powerclamp raid1 
iTCO_vendor_support raid0 coretemp crct10dif_pclmul md_mod snd_hda_codec_hdmi 
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
snd_hda_codec_realtek glue_helper ablk_helper cryptd snd_hda_codec_generic 
snd_hda_intel snd_hda_controller microcode i2c_i801 r8169 snd_hda_codec
[212326.411879]  snd_hwdep mii snd_pcm snd_timer thermal fan snd acpi_cpufreq 
battery soundcore lpc_ich mfd_core evdev processor zfs(PO) zunicode(PO) 
zavl(PO) zcommon(PO) znvpair(PO) spl(O) tun usbip_host(C) usbip_core(C) msr 
loop kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif 
crct10dif_common hid_generic usbhid hid ahci libahci crc32c_intel ehci_pci 
libata xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915 video intel_gtt 
i2c_algo_bit drm_kms_helper drm i2c_core e1000e ptp pps_core ipmi_poweroff 
ipmi_msghandler button
[212326.414577] CPU: 5 PID: 30360 Comm: find Tainted: PWC O  
3.15.6-1-ec2 #1
[212326.415457] Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
[212326.416352] task: 8801275bbb00 ti: 88030f80c000 task.ti: 
88030f80c000
[212326.417261] RIP: 0010:[]  [] 
__kmalloc_track_caller+0x86/0x260
[212326.418194] RSP: 0018:88030f80fb78  EFLAGS: 00010282
[212326.419130] RAX:  RBX: 0004 RCX: 
35ee
[212326.420081] RDX: 35ed RSI:  RDI: 

[212326.421021] RBP: 88030f80fbb0 R08: 000173c0 R09: 
8801eb6ae160
[212326.421958] R10: 88040e803e00 R11: 0004 R12: 
ff0074726f707262
[212326.422887] R13: 00d0 R14: 0004 R15: 
88040e803e00
[212326.423808] FS:  7f3b98919700() GS:88041f34() 
knlGS:
[212326.424752] CS:  0010 DS:  ES:  CR0: 80050033
[212326.425698] CR2: 00ef0010 CR3: 0003ffd3c000 CR4: 
001407e0
[212326.426659] Stack:
[212326.427620]  88040e803e00 a0211d75 0004 
8803607f0558
[212326.428609]  0009 8801eb6ae000 8801eb6ae140 
88030f80fbd0
[212326.429630]  8116fb60 88030f80fd40 88030f80fe58 
88030f80fcc8
[212326.430640] Call Trace:
[212326.431651]  [] ? nfs_permission+0x405/0xfb0 [nfs]
[212326.432681]  [] kmemdup+0x20/0x50
[212326.433717]  [] nfs_permission+0x405/0xfb0 [nfs]
[212326.434760]  [] nfs_permission+0x907/0xfb0 [nfs]
[212326.435810]  [] ? nfs_permission+0x9e0/0xfb0 [nfs]
[212326.436863]  [] nfs_permission+0xa02/0xfb0 [nfs]
[212326.437924]  [] do_read_cache_page+0x7e/0x1a0
[212326.438990]  [] read_cache_page+0x1c/0x20
[212326.440078]  [] nfs_permission+0xbbb/0xfb0 [nfs]
[212326.441159]  [] ? nfs4_proc_secinfo+0x63a0/0x63a0 [nfsv4]
[212326.442251]  [] iterate_dir+0xa6/0xe0
[212326.44334

Re: general protection fault on 3.15.6

2014-07-23 Thread Steven Noonan

5cfff
Jul 23 09:43:20 orcus kernel:  [] 
acpi_bus_register_driver+0x40/0x42
Jul 23 09:43:20 orcus kernel:  [] init_module+0x67/0x81 
[thermal]
Jul 23 09:43:20 orcus kernel:  [] do_one_initcall+0xfa/0x160
Jul 23 09:43:20 orcus kernel:  [] ? 
__blocking_notifier_call_chain+0x52/0x60
Jul 23 09:43:20 orcus kernel:  [] load_module+0x1a11/0x2300
Jul 23 09:43:20 orcus kernel:  [] ? store_uevent+0x40/0x40
Jul 23 09:43:20 orcus kernel:  [] ? 
copy_module_from_fd.isra.39+0x111/0x170
Jul 23 09:43:20 orcus kernel:  [] SyS_finit_module+0x7e/0x80
Jul 23 09:43:20 orcus kernel:  [] 
system_call_fastpath+0x1a/0x1f
Jul 23 09:43:20 orcus kernel: ---[ end trace 71a1e508f45dbd1c ]---
Jul 23 09:43:20 orcus kernel: [ cut here ]
Jul 23 09:43:20 orcus kernel: WARNING: CPU: 4 PID: 270 at lib/kobject.c:670 
kobject_put+0x58/0x60()
Jul 23 09:43:20 orcus kernel: kobject: '(null)' (880405d80cb0): is not 
initialized, yet kobject_put() is being called.
Jul 23 09:43:20 orcus kernel: Modules linked in: snd_hda_intel 
snd_hda_controller microcode(+) i2c_i801 r8169(+) snd_hda_codec snd_hwdep mii 
snd_pcm snd_timer thermal(+) fan snd acpi_cpufreq(+) battery soundcore lpc_ich 
mfd_core evdev processor zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) 
spl(O) tun usbip_host(C) usbip_core(C) msr loop kvm_intel kvm efivarfs ext4 
crc16 jbd2 mbcache sd_mod crc_t10dif crct10dif_common hid_generic usbhid hid 
ahci libahci crc32c_intel ehci_pci libata xhci_hcd ehci_hcd scsi_mod usbcore 
usb_common i915 video intel_gtt i2c_algo_bit drm_kms_helper drm i2c_core e1000e 
ptp pps_core ipmi_poweroff ipmi_msghandler button
Jul 23 09:43:20 orcus kernel: CPU: 4 PID: 270 Comm: systemd-udevd Tainted: P
WC O  3.15.6-1-ec2 #1
Jul 23 09:43:20 orcus kernel: Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 
2.04 04/10/2013
Jul 23 09:43:20 orcus kernel:  0009 8804067b7820 
81505dd6 8804067b7868
Jul 23 09:43:20 orcus kernel:  8804067b7858 81066a3d 
880405d80cb0 
Jul 23 09:43:20 orcus kernel:  0007 880402b15018 
 8804067b78b8
Jul 23 09:43:20 orcus kernel: Call Trace:
Jul 23 09:43:20 orcus kernel:  [] dump_stack+0x45/0x56
Jul 23 09:43:20 orcus kernel:  [] 
warn_slowpath_common+0x7d/0xa0
Jul 23 09:43:20 orcus kernel:  [] warn_slowpath_fmt+0x4c/0x50
Jul 23 09:43:20 orcus kernel:  [] kobject_put+0x58/0x60
Jul 23 09:43:20 orcus kernel:  [] cpufreq_cpu_put+0x20/0x30
Jul 23 09:43:20 orcus kernel:  [] cpufreq_get_policy+0x59/0x70
Jul 23 09:43:20 orcus kernel:  [] 
acpi_processor_power_exit+0x164/0x3c6 [processor]
Jul 23 09:43:20 orcus kernel:  [] ? 
cpufreq_update_policy+0x140/0x140
Jul 23 09:43:20 orcus kernel:  [] 
acpi_processor_power_exit+0x20e/0x3c6 [processor]
Jul 23 09:43:20 orcus kernel:  [] 
acpi_processor_power_exit+0x399/0x3c6 [processor]
Jul 23 09:43:20 orcus kernel:  [] 
thermal_cdev_update+0x8d/0xa0
Jul 23 09:43:20 orcus kernel:  [] step_wise_throttle+0x59/0x90
Jul 23 09:43:20 orcus kernel:  [] 
handle_thermal_trip+0x4c/0x150
Jul 23 09:43:20 orcus kernel:  [] 
thermal_zone_device_update+0x7d/0xc0
Jul 23 09:43:20 orcus kernel:  [] 
thermal_zone_device_register+0x7ad/0x8b0
Jul 23 09:43:20 orcus kernel:  [] 0xa085b163
Jul 23 09:43:20 orcus kernel:  [] ? 
sysfs_do_create_link_sd.isra.2+0x71/0xe0
Jul 23 09:43:20 orcus kernel:  [] acpi_device_probe+0x43/0xe9
Jul 23 09:43:20 orcus kernel:  [] 
driver_probe_device+0x8e/0x270
Jul 23 09:43:20 orcus kernel:  [] __driver_attach+0x8b/0x90
Jul 23 09:43:20 orcus kernel:  [] ? __device_attach+0x40/0x40
Jul 23 09:43:20 orcus kernel:  [] bus_for_each_dev+0x6b/0xb0
Jul 23 09:43:20 orcus kernel:  [] driver_attach+0x1e/0x20
Jul 23 09:43:20 orcus kernel:  [] bus_add_driver+0x178/0x230
Jul 23 09:43:20 orcus kernel:  [] ? 0xa085cfff
Jul 23 09:43:20 orcus kernel:  [] driver_register+0x64/0xf0
Jul 23 09:43:20 orcus kernel:  [] ? 0xa085cfff
Jul 23 09:43:20 orcus kernel:  [] 
acpi_bus_register_driver+0x40/0x42
Jul 23 09:43:20 orcus kernel:  [] init_module+0x67/0x81 
[thermal]
Jul 23 09:43:20 orcus kernel:  [] do_one_initcall+0xfa/0x160
Jul 23 09:43:20 orcus kernel:  [] ? 
__blocking_notifier_call_chain+0x52/0x60
Jul 23 09:43:20 orcus kernel:  [] load_module+0x1a11/0x2300
Jul 23 09:43:20 orcus kernel:  [] ? store_uevent+0x40/0x40
Jul 23 09:43:20 orcus kernel:  [] ? 
copy_module_from_fd.isra.39+0x111/0x170
Jul 23 09:43:20 orcus kernel:  [] SyS_finit_module+0x7e/0x80
Jul 23 09:43:20 orcus kernel:  [] 
system_call_fastpath+0x1a/0x1f
Jul 23 09:43:20 orcus kernel: ---[ end trace 71a1e508f45dbd1d ]---

On Mon, Jul 21, 2014 at 10:41:45AM -0700, Steven Noonan wrote:
> On Mon, Jul 21, 2014 at 6:29 AM, Tejun Heo  wrote:
> > Hello, Steven.
> >
> > On Sun, Jul 20, 2014 at 09:27:42PM -0700, Steven Noonan wrote:
> >> My router/storage box suddenly stopped responding (originally noticed
> >> because dnsmasq wasn't responding) and I had to reboot it.

Re: general protection fault on 3.15.6

2014-07-21 Thread Steven Noonan

On Mon, Jul 21, 2014 at 6:29 AM, Tejun Heo  wrote:
> Hello, Steven.
>
> On Sun, Jul 20, 2014 at 09:27:42PM -0700, Steven Noonan wrote:
>> My router/storage box suddenly stopped responding (originally noticed
>> because dnsmasq wasn't responding) and I had to reboot it. I checked
>> the systemd journal when it came back and these were the last thing in
>> there for the previous boot. Any ideas about pinning down the cause?
>>
>> general protection fault:  [#1] SMP
> ...
>> CPU: 3 PID: 8881 Comm: systemd Tainted: PWC O  3.15.6 #1
>> Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
>> task: 8802f473d880 ti: 8802f0abc000 task.ti: 8802f0abc000
>> RIP: 0010:[]  []
>> __kmalloc_track_caller+0x86/0x260
>
> So, GFP in kmalloc,
>
>> Call Trace:
>>  [] kstrdup+0x31/0x60
>
> called from kstrdup()
>
>>  [] __kernfs_new_node+0x34/0xf0
>>  [] kernfs_new_node+0x26/0x50
>
> which was invoked to copy the node name while creating a new kernfs
> node.
>
>>  [] __kernfs_create_file+0x39/0xa0
>>  [] cgroup_addrm_files+0x110/0x250
>>  [] cgroup_mkdir+0x21b/0x540
>>  [] ? security_inode_notifysecctx+0x16/0x20
>>  [] kernfs_iop_mkdir+0x5a/0x90
>>  [] vfs_mkdir+0xe0/0x180
>>  [] SyS_mkdirat+0xaa/0xe0
>>  [] SyS_mkdir+0x19/0x20
>>  [] system_call_fastpath+0x1a/0x1f
>> Code: 25 88 dd 00 00 49 8b 50 08 4d 8b 20 4d 85 e4 0f 84 50 01 00 00
>> 49 83 78 10 00 0f 84 45 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49>
>> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 bb 49 63
>> RIP  [] __kmalloc_track_caller+0x86/0x260
>>  RSP 
>
> followed by another GPF
>
>> general protection fault:  [#2] SMP
> ...
>> RIP: 0010:[]  [] __kmalloc+0x8a/0x280
>
> in __kmalloc()
>
>>  [] acpi_ns_internalize_name+0x68/0xad
>
> called from acpi to copy a different name.
>
> I don't think the problem is anything cgroup / kernfs specific.  The
> allocator is GPFing inside it from multiple callers and it's not even
> using a caller-provided cache.  It looks like something screwed up the
> memory allocator and it's now faulting on unrelated callers.  Most
> likely illegal free or use-after-free.
>
> Steven, can you please post the full kernel log from boot till reboot?
> It usually is a good idea to include full log when reporting bugs as
> it's very easy to exclude the actually relevant part.
>

I would if I could, but I've had to set up some rather draconian
limits on my systemd journal sizes because of some incessant kernel
messages filling up the logs (related to 6to4 SIT tunnels) -- this has
unfortunately truncated most of the log. Are there any particular
kernel config options I should enable to make tracking this down
easier if it comes up again?

- Steven
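
One quick check that fits the "something corrupted the allocator" reading quoted above is to see whether a register that should hold a pointer contains printable bytes instead. The sketch below is an illustration only: reg_to_ascii is a hypothetical helper, and the inputs are the R12 values exactly as they appear in the traces in this thread (00736b736174 and ff0074726f707262), which this archive may have abbreviated.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the bytes of 'value' into 'out' in little-endian (x86 memory) order,
 * replacing anything non-printable with '.'. 'out' must hold 9 bytes.
 */
static void reg_to_ascii(uint64_t value, char out[9])
{
    for (size_t i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)(value >> (8 * i));

        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[8] = '\0';
}
```

Both values decode to readable text ("tasks" and "brport") rather than to anything resembling a kernel address. That would be consistent with string data having been written over allocator metadata, but it is an inference from the logs and not something the thread itself establishes.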

general protection fault on 3.15.6

2014-07-20 Thread Steven Noonan

My router/storage box suddenly stopped responding (originally noticed
because dnsmasq wasn't responding) and I had to reboot it. I checked
the systemd journal when it came back and these were the last things in
there for the previous boot. Any ideas about pinning down the cause?

general protection fault:  [#1] SMP
Modules linked in: zfs(PO) zunicode(PO) zavl(PO) zcommon(PO)
znvpair(PO) spl(O) xt_nat sit tunnel4 ip_tunnel sch_sfq 8021q
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_mangle
ipt_REJECT xt_tcpudp xt_LOG ip6t_rt xt_limit nf_conntrack_ipv6
nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc
xt_conntrack nf_conntrack iptable_filter ip6table_filter ip6_tables
ip_tables x_tables it87 hwmon_vid nls_cp437 vfat fat
snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt
snd_hda_codec_generic iTCO_vendor_support raid0 raid1 md_mod
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper cryptd snd_hda_intel snd_hda_controller
snd_hda_codec microcode r8169 i2c_i801 snd_hwdep acpi_cpufreq
 snd_pcm mii snd_timer thermal snd fan soundcore lpc_ich battery
mfd_core evdev processor tun usbip_host(C) usbip_core(C) msr loop
kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif
crct10dif_common hid_generic usbhid hid ahci libahci crc32c_intel
libata ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915
video intel_gtt i2c_algo_bit drm_kms_helper drm i2c_core e1000e ptp
pps_core ipmi_poweroff ipmi_msghandler button
CPU: 3 PID: 8881 Comm: systemd Tainted: PWC O  3.15.6 #1
Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
task: 8802f473d880 ti: 8802f0abc000 task.ti: 8802f0abc000
RIP: 0010:[]  [] __kmalloc_track_caller+0x86/0x260
RSP: 0018:8802f0abfc88  EFLAGS: 00010286
RAX:  RBX: 8802f0abfdae RCX: 16e8
RDX: 16e7 RSI:  RDI: 
RBP: 8802f0abfcc0 R08: 000173c0 R09: 81a8e058
R10: 88040e803e00 R11: ea00101e4c00 R12: 00736b736174
R13: 00d0 R14: 0006 R15: 88040e803e00
FS:  7f47b62ac780() GS:88041f2c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f47b6ac3568 CR3: 000359fb CR4: 001407e0
Stack:
 88040e803e00 8123a4f4 8802f0abfdae 0006
 00d0 81a4 01a4 8802f0abfce8
 8116fb11 8802f0abfdae 880406631100 0002
Call Trace:
 [] ? __kernfs_new_node+0x34/0xf0
 [] kstrdup+0x31/0x60
 [] __kernfs_new_node+0x34/0xf0
 [] kernfs_new_node+0x26/0x50
 [] __kernfs_create_file+0x39/0xa0
 [] cgroup_addrm_files+0x110/0x250
 [] cgroup_mkdir+0x21b/0x540
 [] ? security_inode_notifysecctx+0x16/0x20
 [] kernfs_iop_mkdir+0x5a/0x90
 [] vfs_mkdir+0xe0/0x180
 [] SyS_mkdirat+0xaa/0xe0
 [] SyS_mkdir+0x19/0x20
 [] system_call_fastpath+0x1a/0x1f
Code: 25 88 dd 00 00 49 8b 50 08 4d 8b 20 4d 85 e4 0f 84 50 01 00 00
49 83 78 10 00 0f 84 45 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49>
8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 bb 49 63
RIP  [] __kmalloc_track_caller+0x86/0x260
 RSP 
general protection fault:  [#2] SMP
Modules linked in: zfs(PO) zunicode(PO) zavl(PO) zcommon(PO)
znvpair(PO) spl(O) xt_nat sit tunnel4 ip_tunnel sch_sfq 8021q
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat iptable_mangle
ipt_REJECT xt_tcpudp xt_LOG ip6t_rt xt_limit nf_conntrack_ipv6
nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc
xt_conntrack nf_conntrack iptable_filter ip6table_filter ip6_tables
ip_tables x_tables it87 hwmon_vid nls_cp437 vfat fat
snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt
snd_hda_codec_generic iTCO_vendor_support raid0 raid1 md_mod
x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper cryptd snd_hda_intel snd_hda_controller
snd_hda_codec microcode r8169 i2c_i801 snd_hwdep acpi_cpufreq
 snd_pcm mii snd_timer thermal snd fan soundcore lpc_ich battery
mfd_core evdev processor tun usbip_host(C) usbip_core(C) msr loop
kvm_intel kvm efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif
crct10dif_common hid_generic usbhid hid ahci libahci crc32c_intel
libata ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common i915
video intel_gtt i2c_algo_bit drm_kms_helper drm i2c_core e1000e ptp
pps_core ipmi_poweroff ipmi_msghandler button
CPU: 3 PID: 8881 Comm: systemd Tainted: PWC O  3.15.6 #1
Hardware name: Shuttle Inc. SH67H/FH67H, BIOS 2.04 04/10/2013
task: 8802f473d880 ti: 8802f0abc000 task.ti: 8802f0abc000
RIP: 0010:[]  [] __kmalloc+0x8a/0x280
RSP: 0018:8802f0abf718  EFLAGS: 00010086
RAX:  RBX: 8800373cb3c0 RCX: 16e8
RDX: 16e7 RSI:  RDI: 
RBP: 8802f0abf750 R08: 000173c0 R09
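
One detail in the first register dump may help with pinning this down. The
faulting instruction bytes (<49> 8b 1c 04) decode to a load from R12+RAX, and
R12 reads 00736b736174 (this archive has dropped some leading digits, so the
upper bits are unknown). Read in little-endian byte order, the low bytes are
the ASCII string "tasks". A pointer that decodes as text often suggests that
a string was written over freed memory, which would fit the kstrdup() of a
cgroup file name in the call trace, although the trace alone does not prove
it. A minimal sketch of the decoding follows; reg_to_ascii is a hypothetical
helper name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a 64-bit register value as little-endian ASCII.
 * Printable bytes are copied as-is; every other byte becomes '.'.
 * out must have room for 9 bytes (8 characters plus the NUL terminator).
 * Hypothetical helper for reading oops register dumps by hand.
 */
void reg_to_ascii(uint64_t reg, char out[9])
{
    for (size_t i = 0; i < 8; i++) {
        /* Byte i of the value is the i-th byte in memory on x86. */
        unsigned char c = (unsigned char)(reg >> (8 * i));

        out[i] = (c >= 0x20 && c <= 0x7e) ? (char)c : '.';
    }
    out[8] = '\0';
}
```

Calling reg_to_ascii(0x736b736174ULL, buf) with only the visible digits of
R12 fills buf with "tasks...", the three dots standing for the zero bytes.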

Re: Linux 3.15.2

2014-06-26 Thread Steven Noonan

On Thu, Jun 26, 2014 at 5:23 PM, Josh Boyer  wrote:
> On Thu, Jun 26, 2014 at 3:26 PM, Greg KH  wrote:
>> I'm announcing the release of the 3.15.2 kernel.
>>
>> All users of the 3.15 kernel series must upgrade.
>>
>> The updated 3.15.y git tree can be found at:
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git 
>> linux-3.15.y
>> and can be browsed at the normal kernel.org git web browser:
>> 
>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
>
> Is the patch for this stuck somewhere on kernel.org?  I see
> patch-3.14.9.xz, but patch-3.15.2.xz is missing on both the kernel.org
> front page and the 3.x http directory where we'd normally find it.
>

Indeed, the patch for 3.10.45 is also missing.

Re: [Intel-gfx] REGRESSION 3.14 i915 warning & mouse cursor vanishing

2014-06-10 Thread Steven Noonan

On Wed, Apr 16, 2014 at 3:03 PM, Steven Noonan  wrote:
> On Wed, Apr 16, 2014 at 2:46 PM, Jani Nikula
>  wrote:
>> On Tue, 15 Apr 2014, Imre Deak  wrote:
>>> On Tue, 2014-04-15 at 21:43 +0200, Daniel Vetter wrote:
>>>> On Mon, Apr 14, 2014 at 11:56:03AM -0700, Steven Noonan wrote:
>>>> > On Mon, Apr 14, 2014 at 11:35:05AM -0700, Keith Packard wrote:
>>>> > > Steven Noonan  writes:
>>>> > >
>>>> > > > Was using my machine normally, then my mouse cursor vanished.
>>>> > > > After switching to a VT and back to X11, my cursor came back.
>>>> > > > But I did notice a nasty trace in dmesg (below).
>>>> > >
>>>> > > I don't think the trace below is related to the cursor disappearing.
>>>> >
>>>> > Any idea what the trace is all about then? Seems it has something to do
>>>> > with runtime power management (maybe my aggressive kernel command-line
>>>> > options are triggering it).
>>>>
>>>> Please test without them. Currently runtime pm should be disabled still on
>>>> vlv (since it's incomplete in 3.14). If you've force-enabled that then you
>>>> get to keep all pieces ;-)
>>>>
>>>> In general don't set any i915 options if you're not a developer or someone
>>>> else who _really_ knows what's going on.
>>>
>>> Note that the lspci output and the
>>>
>>> [ 1795.275026] [drm:hsw_unclaimed_reg_clear] *ERROR* Unknown unclaimed
>>> register before writing to 70084
>>>
>>> line suggests HSW and the specs for ThinkPad Yoga suggest the same. But
>>> I don't know how the vlv_* functions can possibly end up in those traces
>>> then, perhaps just a coincidence, random data on stack?
>>
>> I'm wondering the same. Perhaps double check your kernel build and
>> modules are all right and matching?
>>
>
> It was a clean build (built in a clean chroot, no ccache or anything
> fancy), so those stack traces are as "right" as they could be under
> those conditions.
>
> The "good" news (or perhaps scary news) is that I've been running
> 3.14.1 for the past 36 hours and haven't been able to reproduce either
> problem since then (warnings or ninja mouse cursor). Nothing in the
> changelog for v3.14..v3.14.1 really stands out as a clear fix though.
> The only changes that appear to directly affect my configuration would
> be the futex changes, iwlwifi change, efi change, and ipv6 change.

This issue is haunting me again. This time I'm running 3.14.6. My mouse
cursor vanished, and I have a bunch of warnings in dmesg:

[ 5622.922652] [ cut here ]
[ 5622.922707] WARNING: CPU: 0 PID: 312 at 
drivers/gpu/drm/i915/intel_uncore.c:455 vlv_flisdsi_write+0x1d69/0x2cc0 [i915]()
[ 5622.922710] Device suspended
[ 5622.922714] Modules linked in: cpufreq_stats fuse ctr ccm hid_generic 
hid_sensor_magn_3d hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d 
hid_sensor_trigger industrialio_triggered_buffer kfifo_buf 
hid_sensor_iio_common industrialio hid_sensor_hub hid_multitouch usbhid wacom 
hid btusb joydev bnep bluetooth 6lowpan_iphc tun arc4 nls_cp437 vfat fat iwlmvm 
mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi 
iTCO_wdt kvm_intel iTCO_vendor_support kvm crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper iwlwifi 
ablk_helper cryptd microcode psmouse cfg80211 serio_raw snd_hda_codec_conexant 
i2c_i801 lpc_ich snd_hda_codec_generic mfd_core tpm_tis snd_hda_intel thermal 
tpm wmi thinkpad_acpi snd_hda_codec snd_hwdep nvram battery snd_pcm
[ 5622.922796]  ac evdev snd_timer snd soundcore intel_smartconnect 
acpi_cpufreq processor vmnet(O) vmblock(O) vmci(O) vmmon(O) msr ax88179_178a(O) 
usbnet mii efivarfs ext4 crc16 jbd2 mbcache sd_mod crc_t10dif crct10dif_common 
crc32c_intel ahci libahci libata xhci_hcd scsi_mod usbcore usb_common i915 
video intel_gtt i2c_algo_bit drm_kms_helper drm i2c_core e1000e ptp pps_core 
ipmi_poweroff ipmi_msghandler button
[ 5622.922853] CPU: 0 PID: 312 Comm: X Tainted: G   O 3.14.6-1-ec2 #1
[ 5622.922856] Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET35WW (1.15 
) 04/21/2014
[ 5622.922860]  0009 880212319c08 814fd4c3 
880212319c50
[ 5622.922867]  880212319c40 810664ad 00070088 
8802119b8000
[ 5622.922873]  8802119b8028 8802119b8020 03bb0611 
880212319ca0
[ 5622.922880] Call Trace:
[ 5622.922891]  [] dump_stack+0x45/

Re: [PATCH] tell gcc optimizer to never introduce new data races

2014-06-10 Thread Steven Noonan

On Tue, Jun 10, 2014 at 10:46 AM, Linus Torvalds
 wrote:
> On Tue, Jun 10, 2014 at 6:23 AM, Jiri Kosina  wrote:
>> We have been chasing a memory corruption bug, which turned out to be
>> caused by very old gcc (4.3.4), which happily turned conditional load into
>> a non-conditional one, and that broke correctness (the condition was met
>> only if lock was held) and corrupted memory.
>
> Just out of interest, can you point to the particular kernel code that
> caused this? I think that's more interesting than the example program
> you show - which I'm sure is really nice for gcc developers as an
> example, but from a kernel standpoint I think it's more important to
> show the particular problems this caused for the kernel?
>

Jiri, is there a workaround for compilers that don't support '--param
allow-store-data-races=0'? For example:

$ gcc-4.5 -O2 -o cond_store cond_store.c && ./cond_store
Segmentation fault (core dumped)

$ gcc-4.5 -O2 --param allow-store-data-races=0 -o cond_store cond_store.c && ./cond_store
cc1: error: invalid parameter ‘allow-store-data-races’

$ gcc-4.5 -v
Using built-in specs.
COLLECT_GCC=gcc-4.5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.4/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --libdir=/usr/lib --libexecdir=/usr/lib
--program-suffix=-4.5 --enable-shared
--enable-languages=c,c++,fortran,objc,obj-c++ --enable-__cxa_atexit
--disable-libstdcxx-pch --disable-multilib --disable-libgomp
--disable-libmudflap --disable-libssp --enable-clocale=gnu
--with-tune=generic --with-cloog --with-ppl --with-system-zlib
Thread model: posix
gcc version 4.5.4 (GCC)
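
For readers without the example program at hand, the transformation under
discussion can be modelled by hand. The sketch below is not Jiri's
cond_store.c (which is not reproduced in this message); count_matches and
count_matches_hoisted are hypothetical names, and the second function only
imitates what an optimizer that is allowed to introduce store data races
might emit.

```c
#include <stddef.h>

/* Source form: *hits is written only on iterations where the test passes. */
void count_matches(const int *v, size_t n, int key, int *hits)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i] == key)
            *hits += 1;
    }
}

/*
 * Hand-written model of the transformed loop: the counter lives in a
 * temporary and is written back unconditionally, so *hits is stored to
 * even when no element matches.
 */
void count_matches_hoisted(const int *v, size_t n, int key, int *hits)
{
    int tmp = *hits;    /* loaded even if nothing matches */

    for (size_t i = 0; i < n; i++) {
        if (v[i] == key)
            tmp += 1;
    }
    *hits = tmp;        /* stored even if nothing changed */
}
```

Single-threaded, both functions produce the same count. They differ when
*hits may only be written under a condition, for example when it is
protected by a lock that is taken only if a match exists, or when it lives
in read-only memory.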

Re: [Intel-gfx] REGRESSION 3.14 i915 warning & mouse cursor vanishing

2014-04-16 Thread Steven Noonan

On Wed, Apr 16, 2014 at 2:46 PM, Jani Nikula
 wrote:
> On Tue, 15 Apr 2014, Imre Deak  wrote:
>> On Tue, 2014-04-15 at 21:43 +0200, Daniel Vetter wrote:
>>> On Mon, Apr 14, 2014 at 11:56:03AM -0700, Steven Noonan wrote:
>>> > On Mon, Apr 14, 2014 at 11:35:05AM -0700, Keith Packard wrote:
>>> > > Steven Noonan  writes:
>>> > >
>>> > > > Was using my machine normally, then my mouse cursor vanished.
>>> > > > After switching to a VT and back to X11, my cursor came back.
>>> > > > But I did notice a nasty trace in dmesg (below).
>>> > >
>>> > > I don't think the trace below is related to the cursor disappearing.
>>> >
>>> > Any idea what the trace is all about then? Seems it has something to do
>>> > with runtime power management (maybe my aggressive kernel command-line
>>> > options are triggering it).
>>>
>>> Please test without them. Currently runtime pm should be disabled still on
>>> vlv (since it's incomplete in 3.14). If you've force-enabled that then you
>>> get to keep all pieces ;-)
>>>
>>> In general don't set any i915 options if you're not a developer or someone
>>> else who _really_ knows what's going on.
>>
>> Note that the lspci output and the
>>
>> [ 1795.275026] [drm:hsw_unclaimed_reg_clear] *ERROR* Unknown unclaimed
>> register before writing to 70084
>>
>> line suggests HSW and the specs for ThinkPad Yoga suggest the same. But
>> I don't know how the vlv_* functions can possibly end up in those traces
>> then, perhaps just a coincidence, random data on stack?
>
> I'm wondering the same. Perhaps double check your kernel build and
> modules are all right and matching?
>

It was a clean build (built in a clean chroot, no ccache or anything
fancy), so those stack traces are as "right" as they could be under
those conditions.

The "good" news (or perhaps scary news) is that I've been running
3.14.1 for the past 36 hours and haven't been able to reproduce either
problem since then (warnings or ninja mouse cursor). Nothing in the
changelog for v3.14..v3.14.1 really stands out as a clear fix though.
The only changes that appear to directly affect my configuration would
be the futex changes, iwlwifi change, efi change, and ipv6 change.

Re: [Intel-gfx] REGRESSION 3.14 i915 warning & mouse cursor vanishing

2014-04-15 Thread Steven Noonan

On Tue, Apr 15, 2014 at 12:59 PM, Imre Deak  wrote:
> On Tue, 2014-04-15 at 21:43 +0200, Daniel Vetter wrote:
>> On Mon, Apr 14, 2014 at 11:56:03AM -0700, Steven Noonan wrote:
>> > On Mon, Apr 14, 2014 at 11:35:05AM -0700, Keith Packard wrote:
>> > > Steven Noonan  writes:
>> > >
>> > > > Was using my machine normally, then my mouse cursor vanished.
>> > > > After switching to a VT and back to X11, my cursor came back.
>> > > > But I did notice a nasty trace in dmesg (below).
>> > >
>> > > I don't think the trace below is related to the cursor disappearing.
>> >
>> > Any idea what the trace is all about then? Seems it has something to do
>> > with runtime power management (maybe my aggressive kernel command-line
>> > options are triggering it).
>>
>> Please test without them. Currently runtime pm should be disabled still on
>> vlv (since it's incomplete in 3.14). If you've force-enabled that then you
>> get to keep all pieces ;-)
>>
>> In general don't set any i915 options if you're not a developer or someone
>> else who _really_ knows what's going on.
>
> Note that the lspci output and the
>
> [ 1795.275026] [drm:hsw_unclaimed_reg_clear] *ERROR* Unknown unclaimed
> register before writing to 70084
>
> line suggests HSW and the specs for ThinkPad Yoga suggest the same. But
> I don't know how the vlv_* functions can possibly end up in those traces
> then, perhaps just a coincidence, random data on stack?
>
> For HSW the rc6 kernel option shouldn't make a difference.

Correct, it's Haswell.

Re: REGRESSION 3.14 i915 warning & mouse cursor vanishing

2014-04-14 Thread Steven Noonan

On Mon, Apr 14, 2014 at 11:35:05AM -0700, Keith Packard wrote:
> Steven Noonan  writes:
> 
> > Was using my machine normally, then my mouse cursor vanished.
> > After switching to a VT and back to X11, my cursor came back.
> > But I did notice a nasty trace in dmesg (below).
> 
> I don't think the trace below is related to the cursor disappearing.

Any idea what the trace is all about then? Seems it has something to do
with runtime power management (maybe my aggressive kernel command-line
options are triggering it).

> I found a pair of bugs (one in the intel driver, one in the X server)
> which can cause cursor disappearances. I just sent an intel driver patch
> to the intel-gfx list with the subject:
> 
> [PATCH] load_cursor_argb is supposed to return a Bool, not void
> 
> I've posted the X server patch once, and will respond to some review
> comments. Either is sufficient to get a cursor back, the intel driver
> one means you get a working hardware cursor again, rather than using a
> software cursor by mistake.

OK, good to know. Thanks for pointing those out!

REGRESSION 3.14 i915 warning & mouse cursor vanishing

2014-04-14 Thread Steven Noonan

Was using my machine normally, then my mouse cursor vanished. After switching
to a VT and back to X11, my cursor came back. But I did notice a nasty trace in
dmesg (below).

I have a few options specified on my command line related to i915, but these
worked fine through 3.13.9:

i915.lvds_downclock=1 i915.i915_enable_rc6=7 i915.i915_enable_fbc=1

System is a Lenovo ThinkPad S1 Yoga.

lspci:

$ sudo lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Haswell-ULT DRAM Controller [8086:0a04] (rev 0b)
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell-ULT Integrated Graphics Controller [8086:0a16] (rev 0b)
00:03.0 Audio device [0403]: Intel Corporation Device [8086:0a0c] (rev 0b)
00:14.0 USB controller [0c03]: Intel Corporation Lynx Point-LP USB xHCI HC [8086:9c31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation Lynx Point-LP HECI #0 [8086:9c3a] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation Lynx Point-LP HD Audio Controller [8086:9c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 1 [8086:9c10] (rev e4)
00:1c.2 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 3 [8086:9c14] (rev e4)
00:1c.3 PCI bridge [0604]: Intel Corporation Lynx Point-LP PCI Express Root Port 4 [8086:9c16] (rev e4)
00:1d.0 USB controller [0c03]: Intel Corporation Lynx Point-LP USB EHCI #1 [8086:9c26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation Lynx Point-LP LPC Controller [8086:9c43] (rev 04)
00:1f.2 SATA controller [0106]: Intel Corporation Lynx Point-LP SATA Controller 1 [AHCI mode] [8086:9c03] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation Lynx Point-LP SMBus Controller [8086:9c22] (rev 04)
04:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b2] (rev 73)
05:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device [10ec:5227] (rev 01)


Let me know what other information would be helpful. I could do a bisection
if needed, but I'm hoping the stack traces below trigger an "aha" moment in
somebody.

Here's the trace:

[ 1703.305960] [ cut here ]
[ 1703.305982] WARNING: CPU: 2 PID: 351 at 
drivers/gpu/drm/i915/intel_uncore.c:453 vlv_flisdsi_write+0x1ea9/0x2ed0 [i915]()
[ 1703.305983] Device suspended
[ 1703.305984] Modules linked in: usb_storage fuse ctr ccm bnep tun wacom 
hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_als 
hid_sensor_trigger industrialio_triggered_buffer kfifo_buf 
hid_sensor_iio_common industrialio hid_sensor_hub joydev hid_multitouch usbhid 
btusb hid bluetooth 6lowpan_iphc arc4 nls_cp437 vfat fat iwlmvm 
snd_hda_codec_hdmi x86_pkg_temp_thermal mac80211 intel_powerclamp coretemp 
iTCO_wdt kvm_intel iTCO_vendor_support kvm crct10dif_pclmul crc32_pclmul 
crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper cryptd microcode snd_hda_codec_conexant 
snd_hda_codec_generic wmi iwlwifi psmouse serio_raw i2c_i801 cfg80211 thermal 
snd_hda_intel tpm_tis tpm snd_hda_codec snd_hwdep snd_pcm thinkpad_acpi nvram 
battery ac snd_timer intel_smartconnect
[ 1703.306011]  evdev snd lpc_ich soundcore mfd_core processor usbip_host(C) 
usbip_core(C) efivarfs ext4 crc16 mbcache jbd2 sd_mod crc_t10dif 
crct10dif_common ahci libahci xhci_hcd libata ehci_pci ehci_hcd scsi_mod 
usbcore usb_common i915 video intel_gtt i2c_algo_bit drm_kms_helper drm 
i2c_core e1000e ptp pps_core ipmi_poweroff ipmi_msghandler button
[ 1703.306028] CPU: 2 PID: 351 Comm: X Tainted: G C   3.14.0-1-ec2 #1
[ 1703.306029] Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET34WW (1.14 
) 02/26/2014
[ 1703.306030]  0009 88021f283d50 814ea57f 
88021f283d98
[ 1703.306032]  88021f283d88 810653dd 8800d2748000 
0004400c
[ 1703.306034]  8800d2748028 8800d2748020 74002529 
88021f283de8
[ 1703.306036] Call Trace:
[ 1703.306037][] dump_stack+0x45/0x56
[ 1703.306044]  [] warn_slowpath_common+0x7d/0xa0
[ 1703.306046]  [] warn_slowpath_fmt+0x4c/0x50
[ 1703.306050]  [] ? enqueue_task_fair+0x43d/0x550
[ 1703.306055]  [] vlv_flisdsi_write+0x1ea9/0x2ed0 [i915]
[ 1703.306060]  [] vlv_flisdsi_write+0x2680/0x2ed0 [i915]
[ 1703.306065]  [] i915_queue_hangcheck+0x129a/0x2f20 [i915]
[ 1703.306068]  [] ? try_to_wake_up+0x28c/0x2a0
[ 1703.306070]  [] handle_irq_event_percpu+0x62/0x1f0
[ 1703.306072]  [] handle_irq_event+0x3d/0x60
[ 1703.306074]  [] handle_edge_irq+0x108/0x140
[ 1703.306078]  [] handle_irq+0x1e/0x40
[ 1703.306081]  [] do_IRQ+0x4f/0xe0
[ 1703.306084]  [] common_interrupt+0x6d/0x6d
[ 1703.306085][] ? arp_ignore+0x8c/0x8c
[ 1703.306089]  [] ? msleep+0x2f/0x40
[ 1703.306092]  [] pci_raw_set_power_state+0x142/0x230
[ 1703.306095]  [] pci_set_power_state+0xd3/0x150
[ 1703.306097]  [] pci_restore_standard_co

Re: [PATCH 3.13 08/22] Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

2014-04-10 Thread Steven Noonan

I realize it's late to protest on this given that 3.13.9 is out, but
what is the path forward for those experiencing the original issue
that the reverted commit was intended to correct?

http://marc.info/?l=linux-kernel&m=139034684731087&w=2

On Mon, Mar 31, 2014 at 9:08 PM, Greg Kroah-Hartman
 wrote:
> 3.13-stable review patch.  If anyone has any objections, please let me know.
>
> --
>
> From: David Vrabel 
>
> commit 5926f87fdaad4be3ed10cec563bf357915e55a86 upstream.
>
> This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.
>
> PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
> is set and pseudo-physical addresses if _PAGE_PRESENT is clear.
>
> This is because during a domain save/restore (migration) the page
> table entries are "canonicalised" and "uncanonicalised". i.e., MFNs are
> converted to PFNs during domain save so that on a restore the page
> table entries may be rewritten with the new MFNs on the destination.
> This canonicalisation is only done for PTEs that are present.
>
> This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
> _PAGE_NUMA) was set but _PAGE_PRESENT was clear.  These PTEs would be
> migrated as-is which would result in unexpected behaviour in the
> destination domain.  Either a) the MFN would be translated to the
> wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
> because the MFN is no longer owned by the domain; or c) the present
> bit would not get set.
>
> Symptoms include "Bad page" reports when munmapping after migrating a
> domain.
>
> Signed-off-by: David Vrabel 
> Acked-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Greg Kroah-Hartman 
>
> ---
>  arch/x86/include/asm/pgtable.h |   14 ++------------
>  arch/x86/xen/mmu.c             |    4 ++--
>  2 files changed, 4 insertions(+), 14 deletions(-)
>
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -445,20 +445,10 @@ static inline int pte_same(pte_t a, pte_
>     return a.pte == b.pte;
>  }
>
> -static inline int pteval_present(pteval_t pteval)
> -{
> -   /*
> -    * Yes Linus, _PAGE_PROTNONE == _PAGE_NUMA. Expressing it this
> -    * way clearly states that the intent is that protnone and numa
> -    * hinting ptes are considered present for the purposes of
> -    * pagetable operations like zapping, protection changes, gup etc.
> -    */
> -   return pteval & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA);
> -}
> -
>  static inline int pte_present(pte_t a)
>  {
> -   return pteval_present(pte_flags(a));
> +   return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
> +                          _PAGE_NUMA);
>  }
>
>  #define pte_accessible pte_accessible
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -365,7 +365,7 @@ void xen_ptep_modify_prot_commit(struct
>  /* Assume pteval_t is equivalent to all the other *val_t types. */
>  static pteval_t pte_mfn_to_pfn(pteval_t val)
>  {
> -   if (pteval_present(val)) {
> +   if (val & _PAGE_PRESENT) {
>         unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
>         unsigned long pfn = mfn_to_pfn(mfn);
>
> @@ -381,7 +381,7 @@ static pteval_t pte_mfn_to_pfn(pteval_t
>
>  static pteval_t pte_pfn_to_mfn(pteval_t val)
>  {
> -   if (pteval_present(val)) {
> +   if (val & _PAGE_PRESENT) {
>         unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
>         pteval_t flags = val & PTE_FLAGS_MASK;
>         unsigned long mfn;
>
>
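
The practical effect of the revert quoted above is easier to see in
isolation. The sketch below models only the two predicates that decide
whether a PTE value goes through MFN/PFN translation. The MODEL_* bit
positions and the function names are assumptions made for illustration;
the one relationship taken from the patch itself is that _PAGE_NUMA and
_PAGE_PROTNONE are the same bit.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Illustrative bit positions, not copied from the kernel headers. */
#define MODEL_PAGE_PRESENT   ((pteval_t)1 << 0)
#define MODEL_PAGE_PROTNONE  ((pteval_t)1 << 8)
#define MODEL_PAGE_NUMA      MODEL_PAGE_PROTNONE  /* same bit, per the patch comment */

/* Before the revert: protnone/NUMA-hinting PTEs were translated as well. */
int translated_before_revert(pteval_t val)
{
    return (val & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE |
                   MODEL_PAGE_NUMA)) != 0;
}

/* After the revert: only PTEs with the present bit set are translated. */
int translated_after_revert(pteval_t val)
{
    return (val & MODEL_PAGE_PRESENT) != 0;
}
```

A NUMA-hinting PTE (hinting bit set, present bit clear) is therefore
translated before the revert and left alone after it, which is the case
the original commit had been written to handle.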

Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels

2014-04-08 Thread Steven Noonan

On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin  wrote:
> 
>
> Of course, it would also be preferable if Amazon (or anything else) didn't 
> need Xen PV :(

Well Amazon doesn't expose NUMA on PV, only on HVM guests.

> On April 7, 2014 9:04:53 PM PDT, Steven Noonan  wrote:
>>On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman  wrote:
>>> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote:
>>>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote:
>>>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
>>>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote:
>>>> >>>
>>>> >>> I had considered the soft-dirty tracking usage of the same bit.
>>>> >>> I thought I'd be able to swizzle around it or a further worst
>>>> >>> case of having soft-dirty and automatic NUMA balancing mutually
>>>> >>> exclusive. Unfortunately upon examination it's not obvious how
>>>> >>> to have both of them share a bit and I suspect any attempt to
>>>> >>> will break CRIU.  In my current tree, NUMA_BALANCING cannot be
>>>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory.
>>>> >>> Next on the list is examining if _PAGE_BIT_IOMAP can be used.
>>>> >>
>>>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>>>> >
>>>> > Seems so, at least for non-kernel pages (not considering this bit
>>>> > references in xen code, which i simply don't know but i guess it's
>>>> > used for kernel pages only).
>>>> >
>>>>
>>>> David Vrabel has a patchset which I presumed would be pulled through
>>>> the Xen tree this merge window:
>>>>
>>>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and
>>>> remove _PAGE_IOMAP)
>>>>
>>>> That frees up this bit.
>>>>
>>>
>>> Thanks, I was not aware of that patch.  Based on it, I intend to
>>> force automatic NUMA balancing to depend on !XEN and see what the
>>> reaction is. If support for Xen is really required then it
>>> potentially be re-enabled if/when that series is merged assuming
>>> they do not need the bit for something else.
>>>
>>
>>Amazon EC2 does have large memory instance types with NUMA exposed to
>>the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
>>(to me anyway) if we didn't require !XEN.
>
> --
> Sent from my mobile phone.  Please pardon brevity and lack of formatting.
>

Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels

2014-04-07 Thread Steven Noonan

On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman  wrote:
> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote:
>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote:
>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote:
>> >>>
>> >>> I had considered the soft-dirty tracking usage of the same bit. I
>> >>> thought I'd be able to swizzle around it or a further worst case of
>> >>> having soft-dirty and automatic NUMA balancing mutually exclusive.
>> >>> Unfortunately upon examination it's not obvious how to have both of
>> >>> them share a bit and I suspect any attempt to will break CRIU.  In my
>> >>> current tree, NUMA_BALANCING cannot be set if MEM_SOFT_DIRTY which is
>> >>> not particularly satisfactory. Next on the list is examining if
>> >>> _PAGE_BIT_IOMAP can be used.
>> >>
>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>> >
>> > Seems so, at least for non-kernel pages (not considering this bit
>> > references in xen code, which i simply don't know but i guess it's used
>> > for kernel pages only).
>> >
>>
>> David Vrabel has a patchset which I presumed would be pulled through the
>> Xen tree this merge window:
>>
>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove
>> _PAGE_IOMAP)
>>
>> That frees up this bit.
>>
>
> Thanks, I was not aware of that patch.  Based on it, I intend to force
> automatic NUMA balancing to depend on !XEN and see what the reaction is. If
> support for Xen is really required then it potentially be re-enabled if/when
> that series is merged assuming they do not need the bit for something else.
>

Amazon EC2 does have large memory instance types with NUMA exposed to
the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
(to me anyway) if we didn't require !XEN.

Re: [BISECTED] Xen HVM guest hangs since 3.12-rc5

2014-02-22 Thread Steven Noonan

On Fri, Feb 21, 2014 at 12:07 PM, Konrad Rzeszutek Wilk
 wrote:
> On Thu, Feb 20, 2014 at 12:44:15PM -0800, Steven Noonan wrote:
>> On Wed, Feb 19, 2014 at 1:01 PM, Steven Noonan  wrote:
>> > On Wed, Feb 19, 2014 at 9:41 AM, Konrad Rzeszutek Wilk
>> >  wrote:
>> >> On Tue, Feb 18, 2014 at 11:16:05PM -0800, Steven Noonan wrote:
>> >>> I've been running into problems on a Xen HVM domU. I've got a guest
>> >>> with NUMA enabled, 60GB of RAM, and 3 disks attached (including root
>> >>> volume). 2 of the disks are in an MD RAID0 in the guest, with an ext4
>> >>> filesystem on top of that. I was running the fio
>> >>> 'iometer-file-access-server.fio' example config against that fs.
>> >>> During this workload, it would eventually cause a soft lockup, like
>> >>> the below:
>> >>
>> >> I presume since you mention NUMA and Mel is CC-ed that if you boot
>> >> without NUMA enabled (either via the toolstack or via Linux command
>> >> line) - the issue is not present?
>> >
>> > I mentioned NUMA because the bisected commit is sched/numa, and the
>> > guest is NUMA-enabled. I hadn't attempted booting with NUMA off. I
>> > just tried with numa=off, and the workload has run in a loop for 20
>> > minutes so far with no issues (normally the issue would repro in less
>> > than 5).
>>
>> The subject line is actually incorrect -- I did a 'git describe' on
>> the result of the bisection when writing the subject line, but the
>> '3.12-rc5' tag was just the base on which the code was originally
>> developed. As far as what tags actually contain the commit:
>>
>> $ git tag --contains b795854b1fa70f6aee923ae5df74ff7afeaddcaa
>> v3.13
>> v3.13-rc1
>> v3.13-rc2
>> v3.13-rc3
>> v3.13-rc4
>> v3.13-rc5
>> v3.13-rc6
>> v3.13-rc7
>> v3.13-rc8
>> v3.13.1
>> v3.13.2
>> v3.13.3
>> v3.14-rc1
>> v3.14-rc2
>>
>> So it's more accurate to say it was introduced in the v3.13 merge window.
>>
>> In any case, does anyone have any ideas?
>
> There is nothing in that git commit that gives that 'AHA' feeling.
>
> If you revert that patch on top of the latest Linux kernel does the problem
> go away? This is more of a double-check to see if the commit
> is really the fault or if it exposed some latent issue.

I just tried out 3.13.5 and the problem went away. Looking through the
commit logs, it appears this commit (added as part of 3.13.4) resolved
the issue:

commit 27b4328e523b3de854229e6b505f94aa9708dde6
Author: KOSAKI Motohiro 
Date:   Thu Feb 6 12:04:24 2014 -0800

mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead
of spin_lock_irq()

commit a85d9df1ea1d23682a0ed1e100e6965006595d06 upstream.

During aio stress test, we observed the following lockdep warning.  This
mean AIO+numa_balancing is currently deadlockable.

The problem is, aio_migratepage disable interrupt, but
__set_page_dirty_nobuffers unintentionally enable it again.

Generally, all helper function should use spin_lock_irqsave() instead of
spin_lock_irq() because they don't know caller at all.

   other info that might help us debug this:
Possible unsafe locking scenario:

  CPU0
  
 lock(&(&ctx->completion_lock)->rlock);
 
   lock(&(&ctx->completion_lock)->rlock);

*** DEADLOCK ***

  dump_stack+0x19/0x1b
  print_usage_bug+0x1f7/0x208
  mark_lock+0x21d/0x2a0
  mark_held_locks+0xb9/0x140
  trace_hardirqs_on_caller+0x105/0x1d0
  trace_hardirqs_on+0xd/0x10
  _raw_spin_unlock_irq+0x2c/0x50
  __set_page_dirty_nobuffers+0x8c/0xf0
  migrate_page_copy+0x434/0x540
  aio_migratepage+0xb1/0x140
  move_to_new_page+0x7d/0x230
  migrate_pages+0x5e5/0x700
  migrate_misplaced_page+0xbc/0xf0
  do_numa_page+0x102/0x190
  handle_pte_fault+0x241/0x970
  handle_mm_fault+0x265/0x370
  __do_page_fault+0x172/0x5a0
  do_page_fault+0x1a/0x70
  page_fault+0x28/0x30

Signed-off-by: KOSAKI Motohiro 
Cc: Larry Woodman 
Cc: Rik van Riel 
Cc: Johannes Weiner 
Acked-by: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 
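
The locking rule in the commit message above can be illustrated with a small user-space simulation. This is only a sketch: the `sim_*` helpers and the `irqs_enabled` flag are hypothetical stand-ins for the real spin_lock_irq()/spin_lock_irqsave() primitives and the CPU interrupt flag, and no actual locking takes place. It shows why a helper that uses the plain `_irq` variant re-enables interrupts behind the back of a caller that had disabled them, while the `_irqsave` variant restores whatever state it found.

```c
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag (assumption: a single CPU, no real IRQs). */
static bool irqs_enabled = true;

/* Stand-ins for spin_lock_irq()/spin_unlock_irq(): the unlock side
 * unconditionally re-enables interrupts. */
static void sim_lock_irq(void)   { irqs_enabled = false; }
static void sim_unlock_irq(void) { irqs_enabled = true; }

/* Stand-ins for spin_lock_irqsave()/spin_unlock_irqrestore(): the previous
 * state is saved on lock and put back on unlock. */
static void sim_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

static void sim_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* A helper written with the _irq variant, like the old
 * __set_page_dirty_nobuffers(). */
void helper_irq(void)
{
    sim_lock_irq();
    sim_unlock_irq();
}

/* The same helper written with the _irqsave variant. */
void helper_irqsave(void)
{
    bool flags;

    sim_lock_irqsave(&flags);
    sim_unlock_irqrestore(flags);
}

/* A caller that disables interrupts first (as aio_migratepage does) and then
 * calls a helper. Returns true if interrupts are still disabled afterwards. */
bool irqs_still_disabled_after(void (*helper)(void))
{
    bool flags;
    bool still_disabled;

    sim_lock_irqsave(&flags);
    helper();
    still_disabled = !irqs_enabled;
    sim_unlock_irqrestore(flags);
    return still_disabled;
}
```

Calling irqs_still_disabled_after(helper_irq) reports that interrupts were turned back on inside the caller's critical section, which is the situation lockdep complained about; with helper_irqsave the caller's state is preserved.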

>>
>> >>>
>> >>> [ 2536.250054] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u257:0:7]

Re: [BISECTED] Xen HVM guest hangs since 3.12-rc5

2014-02-20 Thread Steven Noonan

On Wed, Feb 19, 2014 at 1:01 PM, Steven Noonan  wrote:
> On Wed, Feb 19, 2014 at 9:41 AM, Konrad Rzeszutek Wilk
>  wrote:
>> On Tue, Feb 18, 2014 at 11:16:05PM -0800, Steven Noonan wrote:
>>> I've been running into problems on an Xen HVM domU. I've got a guest with NUMA
>>> enabled, 60GB of RAM, and 3 disks attached (including root volume). 2 of the
>>> disks are in an MD RAID0 in the guest, with an ext4 filesystem on top of that.
>>> I was running the fio 'iometer-file-access-server.fio' example config against
>>> that fs. During this workload, it would eventually cause a soft lockup, like
>>> the below:
>>
>> I presume since you mention NUMA and Mel is CC-ed that if you boot without
>> NUMA enabled (either via the toolstack or via Linux command line) - the issue
>> is not present?
>
> I mentioned NUMA because the bisected commit is sched/numa, and the
> guest is NUMA-enabled. I hadn't attempted booting with NUMA off. I
> just tried with numa=off, and the workload has run in a loop for 20
> minutes so far with no issues (normally the issue would repro in less
> than 5).

The subject line is actually incorrect -- I did a 'git describe' on
the result of the bisection when writing the subject line, but the
'3.12-rc5' tag was just the base on which the code was originally
developed. As far as what tags actually contain the commit:

$ git tag --contains b795854b1fa70f6aee923ae5df74ff7afeaddcaa
v3.13
v3.13-rc1
v3.13-rc2
v3.13-rc3
v3.13-rc4
v3.13-rc5
v3.13-rc6
v3.13-rc7
v3.13-rc8
v3.13.1
v3.13.2
v3.13.3
v3.14-rc1
v3.14-rc2

So it's more accurate to say it was introduced in the v3.13 merge window.

In any case, does anyone have any ideas?

>>>
>>> [ 2536.250054] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u257:0:7]
>>> [ 2536.250054] Modules linked in: isofs crct10dif_pclmul crct10dif_common 
>>> crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw 
>>> gf128mul glue_helper ablk_helper cryptd raid0 md_mod acpi_cpufreq psmouse 
>>> i2c_piix4 intel_agp intel_gtt i2c_core processor serio_raw evdev microcode 
>>> ext4 crc16 mbcache jbd2 ata_generic pata_acpi ata_piix libata scsi_mod 
>>> floppy ixgbevf xen_privcmd xen_netfront xen_kbdfront syscopyarea 
>>> sysfillrect sysimgblt fb_sys_fops xen_blkfront virtio_pci virtio_net 
>>> virtio_blk virtio_ring virtio ipmi_poweroff ipmi_msghandler button
>>> [ 2536.250054] CPU: 0 PID: 7 Comm: kworker/u257:0 Tainted: GW
>>> 3.12.0-rc4-bisect-00073-g6fe6b2d #26
>>> [ 2536.250054] Hardware name: Xen HVM domU, BIOS 4.2.amazon 01/14/2014
>>> [ 2536.250054] Workqueue: writeback bdi_writeback_workfn (flush-202:0)
>>> [ 2536.250054] task: 880766533400 ti: 88076652e000 task.ti: 
>>> 88076652e000
>>> [ 2536.250054] RIP: 0010:[]  [] 
>>> smp_call_function_many+0x258/0x2b0
>>> [ 2536.250054] RSP: 0018:88076652f878  EFLAGS: 0202
>>> [ 2536.250054] RAX: 000f RBX: 88076652f808 RCX: 
>>> 880ef0ef74a8
>>> [ 2536.250054] RDX: 000f RSI: 0080 RDI: 
>>> 
>>> [ 2536.250054] RBP: 88076652f8c0 R08: 880771046c00 R09: 
>>> 880770c008e0
>>> [ 2536.250054] R10: 003e R11: 0210 R12: 
>>> 88076652f7f0
>>> [ 2536.250054] R13: 810b859e R14: 88076652f7e0 R15: 
>>> 810b50e7
>>> [ 2536.250054] FS:  () GS:88077160() 
>>> knlGS:
>>> [ 2536.250054] CS:  0010 DS:  ES:  CR0: 80050033
>>> [ 2536.250054] CR2: 7f8752bea000 CR3: 01a0d000 CR4: 
>>> 001406f0
>>> [ 2536.250054] Stack:
>>> [ 2536.250054]  000181275231 00014d00 88076652f8d0 
>>> 810564e0
>>> [ 2536.250054]  88076530b180 7f0c8826a000 880ed4d57700 
>>> 88076530b180
>>> [ 2536.250054]  880ed4cc6350 88076652f8e8 81056637 
>>> 88076530b180
>>> [ 2536.250054] Call Trace:
>>> [ 2536.250054]  [] ? leave_mm+0x70/0x70
>>> [ 2536.250054]  [] native_flush_tlb_others+0x37/0x40
>>> [ 2536.250054]  [] flush_tlb_page+0x88/0x90
>>> [ 2536.250054]  [] ptep_clear_flush+0x34/0x40
>>> [ 2536.250054]  [] page_mkclean+0x12e/0x1d0
>>> [ 2536.250054]  [] clear_page_dirty_for_io+0x3b/0xe0
>>> [ 2536.250054]  [] mpage_submit_page+0x52/0x80 [ext4]
>>> [ 2536.250054]  [] mpage_process_page_bufs+0x109/0x140 
>>> [ext4]

Re: [BISECTED] Xen HVM guest hangs since 3.12-rc5

2014-02-19 Thread Steven Noonan

On Wed, Feb 19, 2014 at 9:41 AM, Konrad Rzeszutek Wilk
 wrote:
> On Tue, Feb 18, 2014 at 11:16:05PM -0800, Steven Noonan wrote:
>> I've been running into problems on an Xen HVM domU. I've got a guest with NUMA
>> enabled, 60GB of RAM, and 3 disks attached (including root volume). 2 of the
>> disks are in an MD RAID0 in the guest, with an ext4 filesystem on top of that.
>> I was running the fio 'iometer-file-access-server.fio' example config against
>> that fs. During this workload, it would eventually cause a soft lockup, like
>> the below:
>
> I presume since you mention NUMA and Mel is CC-ed that if you boot without
> NUMA enabled (either via the toolstack or via Linux command line) - the issue
> is not present?

I mentioned NUMA because the bisected commit is sched/numa, and the
guest is NUMA-enabled. I hadn't attempted booting with NUMA off. I
just tried with numa=off, and the workload has run in a loop for 20
minutes so far with no issues (normally the issue would repro in less
than 5).


>>
>> [ 2536.250054] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u257:0:7]
>> [ 2536.250054] Modules linked in: isofs crct10dif_pclmul crct10dif_common 
>> crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw 
>> gf128mul glue_helper ablk_helper cryptd raid0 md_mod acpi_cpufreq psmouse 
>> i2c_piix4 intel_agp intel_gtt i2c_core processor serio_raw evdev microcode 
>> ext4 crc16 mbcache jbd2 ata_generic pata_acpi ata_piix libata scsi_mod 
>> floppy ixgbevf xen_privcmd xen_netfront xen_kbdfront syscopyarea sysfillrect 
>> sysimgblt fb_sys_fops xen_blkfront virtio_pci virtio_net virtio_blk 
>> virtio_ring virtio ipmi_poweroff ipmi_msghandler button
>> [ 2536.250054] CPU: 0 PID: 7 Comm: kworker/u257:0 Tainted: GW
>> 3.12.0-rc4-bisect-00073-g6fe6b2d #26
>> [ 2536.250054] Hardware name: Xen HVM domU, BIOS 4.2.amazon 01/14/2014
>> [ 2536.250054] Workqueue: writeback bdi_writeback_workfn (flush-202:0)
>> [ 2536.250054] task: 880766533400 ti: 88076652e000 task.ti: 
>> 88076652e000
>> [ 2536.250054] RIP: 0010:[]  [] 
>> smp_call_function_many+0x258/0x2b0
>> [ 2536.250054] RSP: 0018:88076652f878  EFLAGS: 0202
>> [ 2536.250054] RAX: 000f RBX: 88076652f808 RCX: 
>> 880ef0ef74a8
>> [ 2536.250054] RDX: 000f RSI: 0080 RDI: 
>> 
>> [ 2536.250054] RBP: 88076652f8c0 R08: 880771046c00 R09: 
>> 880770c008e0
>> [ 2536.250054] R10: 003e R11: 0210 R12: 
>> 88076652f7f0
>> [ 2536.250054] R13: 810b859e R14: 88076652f7e0 R15: 
>> 810b50e7
>> [ 2536.250054] FS:  () GS:88077160() 
>> knlGS:
>> [ 2536.250054] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 2536.250054] CR2: 7f8752bea000 CR3: 01a0d000 CR4: 
>> 001406f0
>> [ 2536.250054] Stack:
>> [ 2536.250054]  000181275231 00014d00 88076652f8d0 
>> 810564e0
>> [ 2536.250054]  88076530b180 7f0c8826a000 880ed4d57700 
>> 88076530b180
>> [ 2536.250054]  880ed4cc6350 88076652f8e8 81056637 
>> 88076530b180
>> [ 2536.250054] Call Trace:
>> [ 2536.250054]  [] ? leave_mm+0x70/0x70
>> [ 2536.250054]  [] native_flush_tlb_others+0x37/0x40
>> [ 2536.250054]  [] flush_tlb_page+0x88/0x90
>> [ 2536.250054]  [] ptep_clear_flush+0x34/0x40
>> [ 2536.250054]  [] page_mkclean+0x12e/0x1d0
>> [ 2536.250054]  [] clear_page_dirty_for_io+0x3b/0xe0
>> [ 2536.250054]  [] mpage_submit_page+0x52/0x80 [ext4]
>> [ 2536.250054]  [] mpage_process_page_bufs+0x109/0x140 
>> [ext4]
>> [ 2536.250054]  [] mpage_prepare_extent_to_map+0x217/0x2d0 
>> [ext4]
>> [ 2536.250054]  [] ext4_writepages+0x469/0xca0 [ext4]
>> [ 2536.250054]  [] do_writepages+0x1e/0x50
>> [ 2536.250054]  [] __writeback_single_inode+0x76/0x240
>> [ 2536.250054]  [] writeback_sb_inodes+0x282/0x420
>> [ 2536.250054]  [] __writeback_inodes_wb+0x7f/0xd0
>> [ 2536.250054]  [] wb_writeback+0x15b/0x2a0
>> [ 2536.250054]  [] bdi_writeback_workfn+0x1d7/0x450
>> [ 2536.250054]  [] process_one_work+0x25d/0x460
>> [ 2536.250054]  [] worker_thread+0x266/0x480
>> [ 2536.250054]  [] ? manage_workers.isra.18+0x3f0/0x3f0
>> [ 2536.250054]  [] kthread+0xbb/0xd0
>> [ 2536.250054]  [] ? kthread_stop+0xf0/0xf0
>> [ 2536.250054]  [] ret_from_fork+0x7c/0xb0
>> [ 2536.250054]  [] ? kthread_stop+0xf0/0xf0
>> [ 2536.250054] Code: 00 74 70 48 63 35 d1 1f a1 00 ba ff ff ff ff eb 29 66 

[BISECTED] Xen HVM guest hangs since 3.12-rc5

2014-02-18 Thread Steven Noonan

I've been running into problems on an Xen HVM domU. I've got a guest with NUMA
enabled, 60GB of RAM, and 3 disks attached (including root volume). 2 of the
disks are in an MD RAID0 in the guest, with an ext4 filesystem on top of that.
I was running the fio 'iometer-file-access-server.fio' example config against
that fs. During this workload, it would eventually cause a soft lockup, like
the below:

[ 2536.250054] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u257:0:7]
[ 2536.250054] Modules linked in: isofs crct10dif_pclmul crct10dif_common 
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd raid0 md_mod acpi_cpufreq psmouse 
i2c_piix4 intel_agp intel_gtt i2c_core processor serio_raw evdev microcode ext4 
crc16 mbcache jbd2 ata_generic pata_acpi ata_piix libata scsi_mod floppy 
ixgbevf xen_privcmd xen_netfront xen_kbdfront syscopyarea sysfillrect sysimgblt 
fb_sys_fops xen_blkfront virtio_pci virtio_net virtio_blk virtio_ring virtio 
ipmi_poweroff ipmi_msghandler button
[ 2536.250054] CPU: 0 PID: 7 Comm: kworker/u257:0 Tainted: GW
3.12.0-rc4-bisect-00073-g6fe6b2d #26
[ 2536.250054] Hardware name: Xen HVM domU, BIOS 4.2.amazon 01/14/2014
[ 2536.250054] Workqueue: writeback bdi_writeback_workfn (flush-202:0)
[ 2536.250054] task: 880766533400 ti: 88076652e000 task.ti: 
88076652e000
[ 2536.250054] RIP: 0010:[]  [] 
smp_call_function_many+0x258/0x2b0
[ 2536.250054] RSP: 0018:88076652f878  EFLAGS: 0202
[ 2536.250054] RAX: 000f RBX: 88076652f808 RCX: 880ef0ef74a8
[ 2536.250054] RDX: 000f RSI: 0080 RDI: 
[ 2536.250054] RBP: 88076652f8c0 R08: 880771046c00 R09: 880770c008e0
[ 2536.250054] R10: 003e R11: 0210 R12: 88076652f7f0
[ 2536.250054] R13: 810b859e R14: 88076652f7e0 R15: 810b50e7
[ 2536.250054] FS:  () GS:88077160() 
knlGS:
[ 2536.250054] CS:  0010 DS:  ES:  CR0: 80050033
[ 2536.250054] CR2: 7f8752bea000 CR3: 01a0d000 CR4: 001406f0
[ 2536.250054] Stack:
[ 2536.250054]  000181275231 00014d00 88076652f8d0 
810564e0
[ 2536.250054]  88076530b180 7f0c8826a000 880ed4d57700 
88076530b180
[ 2536.250054]  880ed4cc6350 88076652f8e8 81056637 
88076530b180
[ 2536.250054] Call Trace:
[ 2536.250054]  [] ? leave_mm+0x70/0x70
[ 2536.250054]  [] native_flush_tlb_others+0x37/0x40
[ 2536.250054]  [] flush_tlb_page+0x88/0x90
[ 2536.250054]  [] ptep_clear_flush+0x34/0x40
[ 2536.250054]  [] page_mkclean+0x12e/0x1d0
[ 2536.250054]  [] clear_page_dirty_for_io+0x3b/0xe0
[ 2536.250054]  [] mpage_submit_page+0x52/0x80 [ext4]
[ 2536.250054]  [] mpage_process_page_bufs+0x109/0x140 [ext4]
[ 2536.250054]  [] mpage_prepare_extent_to_map+0x217/0x2d0 
[ext4]
[ 2536.250054]  [] ext4_writepages+0x469/0xca0 [ext4]
[ 2536.250054]  [] do_writepages+0x1e/0x50
[ 2536.250054]  [] __writeback_single_inode+0x76/0x240
[ 2536.250054]  [] writeback_sb_inodes+0x282/0x420
[ 2536.250054]  [] __writeback_inodes_wb+0x7f/0xd0
[ 2536.250054]  [] wb_writeback+0x15b/0x2a0
[ 2536.250054]  [] bdi_writeback_workfn+0x1d7/0x450
[ 2536.250054]  [] process_one_work+0x25d/0x460
[ 2536.250054]  [] worker_thread+0x266/0x480
[ 2536.250054]  [] ? manage_workers.isra.18+0x3f0/0x3f0
[ 2536.250054]  [] kthread+0xbb/0xd0
[ 2536.250054]  [] ? kthread_stop+0xf0/0xf0
[ 2536.250054]  [] ret_from_fork+0x7c/0xb0
[ 2536.250054]  [] ? kthread_stop+0xf0/0xf0
[ 2536.250054] Code: 00 74 70 48 63 35 d1 1f a1 00 ba ff ff ff ff eb 29 66 90 
48 98 48 8b 0b 48 03 0c c5 00 27 ad 81 f6 41 20 01 74 14 0f 1f 44 00 00  90 
f6 41 20 01 75 f8 48 63 35 a1 1f a1 00 48 8b 7b 08 83 c2 
[ 2544.900055] BUG: soft lockup - CPU#31 stuck for 24s! [systemd-journal:304]
[ 2544.900055] Modules linked in: isofs crct10dif_pclmul crct10dif_common 
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd raid0 md_mod acpi_cpufreq psmouse 
i2c_piix4 intel_agp intel_gtt i2c_core processor serio_raw evdev microcode ext4 
crc16 mbcache jbd2 ata_generic pata_acpi ata_piix libata scsi_mod floppy 
ixgbevf xen_privcmd xen_netfront xen_kbdfront syscopyarea sysfillrect sysimgblt 
fb_sys_fops xen_blkfront virtio_pci virtio_net virtio_blk virtio_ring virtio 
ipmi_poweroff ipmi_msghandler button
[ 2544.900055] CPU: 31 PID: 304 Comm: systemd-journal Tainted: GW
3.12.0-rc4-bisect-00073-g6fe6b2d #26
[ 2544.900055] Hardware name: Xen HVM domU, BIOS 4.2.amazon 01/14/2014
[ 2544.900055] task: 880764bcb400 ti: 8807653f6000 task.ti: 
8807653f6000
[ 2544.900055] RIP: 0010:[]  [] 
generic_exec_single+0x80/0xa0
[ 2544.900055] RSP: 0018:8807653f7c80  EFLAGS: 0202
[ 2544.900055] RAX: 0080 RBX: 813207fd RCX: 0080
[ 2544.900055] RDX: 0001 RSI:  R

Re: Lenovo X240 (haswell) suspend-to-ram hangs on 3-14.0-rc2

2014-02-14 Thread Steven Noonan

On Fri, Feb 14, 2014 at 11:01 AM, Jeff Chua  wrote:
> On Fri, Feb 14, 2014 at 9:57 PM, Takashi Iwai  wrote:
>> The other possible change in hda_intel.c is the enablement of runtime
>> PM for Panther Point.  But it's been working for other chips, so
>> wondering why it hits anything.  In anyway, please give the full
>> Oops messages not only the stack trace.
>
>> Any difference in the sound hardware, i.e. PCI controller and codec
>> chips?
>
> # X230 reported the sound card as:
> 00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset
> Family High Definition Audio Controller (rev 04)
> == HDA Intel PCH, ALC269VC Analog
>
> # X240 reported the sound card as:
> 00:1b.0 Audio device: Intel Corporation Lynx Point-LP HD Audio
> Controller (rev 04)
> ==  HDA Intel PCH, ALC292 Analog
>
> Now I managed to make suspend-to-ram work by using sound as module
> instead of build-in.
>
> Here's the difference ...
>
> # bad
> CONFIG_SND_HDA_CODEC_HDMI =y
> CONFIG_SND_HDA_INTEL=y
> CONFIG_SND_HDA_INPUT_BEEP=y
> CONFIG_SND_HDA_CODEC_HDMI=y
> CONFIG_SND_HDA_GENERIC=y
>
>
> # good
> CONFIG_SND_HDA_CODEC_HDMI =m
> CONFIG_SND_HDA_INTEL=m
> CONFIG_SND_HDA_INPUT_BEEP=m
> CONFIG_SND_HDA_CODEC_HDMI=m
> CONFIG_SND_HDA_GENERIC=m
>
>
> Strange?
>
> Jeff

Of those modules, which are loaded? If you load all of them and then
try to suspend, does it still work?

Re: [tip:x86/urgent] compiler/gcc4: Make quirk for asm_volatile_goto( ) unconditional

2014-02-13 Thread Steven Noonan

It's the current Arch Linux GCC package:

$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-4.8-20140206/configure
--prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://bugs.archlinux.org/
 --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++
--enable-shared --enable-threads=posix --with-system-zlib
--enable-__cxa_atexit --disable-libunwind-exceptions
--enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp
--enable-gnu-unique-object --enable-linker-build-id
--enable-cloog-backend=isl --disable-cloog-version-check --enable-lto
--enable-plugin --enable-install-libiberty
--with-linker-hash-style=gnu --disable-multilib --disable-werror
--enable-checking=release
Thread model: posix
gcc version 4.8.2 20140206 (prerelease) (GCC)

https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/gcc&id=53e7ca8e616ebf6530f5dc43e86381dff92f136c

(Resending this message because gmail keeps defaulting to HTML email
and vger.kernel.org keeps rejecting those... Thanks Google.)

On Thu, Feb 13, 2014 at 3:55 AM, Jakub Jelinek  wrote:
> On Thu, Feb 13, 2014 at 03:37:08AM -0800, tip-bot for Steven Noonan wrote:
>> Commit-ID:  a9f180345f5378ac87d80ed0bea55ba421d83859
>> Gitweb:     http://git.kernel.org/tip/a9f180345f5378ac87d80ed0bea55ba421d83859
>> Author: Steven Noonan 
>> AuthorDate: Wed, 12 Feb 2014 23:01:07 -0800
>> Committer:  Ingo Molnar 
>> CommitDate: Thu, 13 Feb 2014 12:34:05 +0100
>>
>> compiler/gcc4: Make quirk for asm_volatile_goto() unconditional
>>
>> I started noticing problems with KVM guest destruction on Linux
>> 3.12+, where guest memory wasn't being cleaned up. I bisected it
>> down to the commit introducing the new 'asm goto'-based atomics,
>> and found this quirk was later applied to those.
>>
>> Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the
>> known 'asm goto' bug) I am still getting some kind of
>> miscompilation. If I enable the asm_volatile_goto quirk for my
>> compiler, KVM guests are destroyed correctly and the memory is
>> cleaned up.
>
> BTW, which exact 4.8.2 were you using?
> The last known asm goto bug has been fixed on October, 10th, 2013:
> http://gcc.gnu.org/PR58670
> so before the October, 16th, 2013 4.8.2 release.  But already since
> May 31th, 2013 the tip of the 4.8 GCC branch has been announcing itself
> as 4.8.2 prerelease.  While some distribution versions of GCC announce
> themselves as the new version only starting from the release date,
> i.e. snapshots in between 4.8.1 release and 4.8.2 release announce
> themselves as 4.8.1, in other distributions or upstream it announces itself
> as 4.8.2.  So, if you are using the latter and a snapshot in between May
> 31th, 2013 and October, 10th, 2013, then you could see gcc patchlevel 2,
> yet have a gcc with that bug unfixed.
> So, if the kernel doesn't use a runtime test/configure test to check for
> this issue, but instead just relies on the patchlevel version, the only
> safe way would be to look for GCC >= 4.9 or GCC 4.8 with patchlevel > 2
> rather than > 1.
>
> Jakub
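
Jakub's point about patchlevel-based checks can be made concrete with a small sketch. The encoding below matches how the kernel computes GCC_VERSION (major * 10000 + minor * 100 + patchlevel); the function names are hypothetical and only illustrate the two predicates being compared: the check as merged (quirk for <= 4.8.1) and the more conservative one suggested above (treat only GCC >= 4.9, or 4.8 with patchlevel > 2, as safe).

```c
/* Same encoding as the kernel's GCC_VERSION macro. */
int gcc_version_code(int major, int minor, int patchlevel)
{
    return major * 10000 + minor * 100 + patchlevel;
}

/* Check as originally merged: apply the quirk for GCC <= 4.8.1. */
int quirk_needed_original(int version)
{
    return version <= 40801;
}

/* Conservative check: a compiler announcing itself as 4.8.2 may be a
 * prerelease snapshot without the fix, so only versions above 4.8.2
 * are treated as safe. */
int quirk_needed_conservative(int version)
{
    return version <= 40802;
}
```

With these definitions a compiler reporting 4.8.2 skips the quirk under the original check but still gets it under the conservative one, while 4.8.3 and 4.9.0 skip it under both.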

[tip:x86/urgent] compiler/gcc4: Make quirk for asm_volatile_goto( ) unconditional

2014-02-13 Thread tip-bot for Steven Noonan

Commit-ID:  a9f180345f5378ac87d80ed0bea55ba421d83859
Gitweb: http://git.kernel.org/tip/a9f180345f5378ac87d80ed0bea55ba421d83859
Author: Steven Noonan 
AuthorDate: Wed, 12 Feb 2014 23:01:07 -0800
Committer:  Ingo Molnar 
CommitDate: Thu, 13 Feb 2014 12:34:05 +0100

compiler/gcc4: Make quirk for asm_volatile_goto() unconditional

I started noticing problems with KVM guest destruction on Linux
3.12+, where guest memory wasn't being cleaned up. I bisected it
down to the commit introducing the new 'asm goto'-based atomics,
and found this quirk was later applied to those.

Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the
known 'asm goto' bug) I am still getting some kind of
miscompilation. If I enable the asm_volatile_goto quirk for my
compiler, KVM guests are destroyed correctly and the memory is
cleaned up.

So make the quirk unconditional for now, until bug is found
and fixed.

Suggested-by: Linus Torvalds 
Signed-off-by: Steven Noonan 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Jakub Jelinek 
Cc: Richard Henderson 
Cc: Andrew Morton 
Cc: Oleg Nesterov 
Cc: 
Link: http://lkml.kernel.org/r/1392274867-15236-1-git-send-email-ste...@uplinklabs.net
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
Signed-off-by: Ingo Molnar 
---
 include/linux/compiler-gcc4.h | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index ded4299..2507fd2 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -75,11 +75,7 @@
  *
  * (asm goto is automatically volatile - the naming reflects this.)
  */
-#if GCC_VERSION <= 40801
-# define asm_volatile_goto(x...)   do { asm goto(x); asm (""); } while (0)
-#else
-# define asm_volatile_goto(x...)   do { asm goto(x); } while (0)
-#endif
+#define asm_volatile_goto(x...)	do { asm goto(x); asm (""); } while (0)
 
 #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
 #if GCC_VERSION >= 40400

[PATCH v2] compiler/gcc4: make quirk for asm_volatile_goto unconditional

2014-02-12 Thread Steven Noonan

I started noticing problems with KVM guest destruction on Linux 3.12+, where
guest memory wasn't being cleaned up. I bisected it down to the commit
introducing the new 'asm goto'-based atomics, and found this quirk was later
applied to those.

Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the known 'asm goto'
bug) I am still getting some kind of miscompilation. If I enable the
asm_volatile_goto quirk for my compiler, KVM guests are destroyed correctly and
the memory is cleaned up.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
Cc: Ingo Molnar 
Cc: Linus Torvalds 
Cc: sta...@vger.kernel.org
Signed-off-by: Steven Noonan 
---

v2: Adding sta...@vger.kernel.org to Cc.

 include/linux/compiler-gcc4.h | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index ded4299..2507fd2 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -75,11 +75,7 @@
  *
  * (asm goto is automatically volatile - the naming reflects this.)
  */
-#if GCC_VERSION <= 40801
-# define asm_volatile_goto(x...)   do { asm goto(x); asm (""); } while (0)
-#else
-# define asm_volatile_goto(x...)   do { asm goto(x); } while (0)
-#endif
+#define asm_volatile_goto(x...)	do { asm goto(x); asm (""); } while (0)
 
 #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
 #if GCC_VERSION >= 40400
-- 
1.8.5.4


[PATCH] compiler/gcc4: make quirk for asm_volatile_goto unconditional

2014-02-12 Thread Steven Noonan

I started noticing problems with KVM guest destruction on Linux 3.12+, where
guest memory wasn't being cleaned up. I bisected it down to the commit
introducing the new 'asm goto'-based atomics, and found this quirk was later
applied to those.

Unfortunately, even with GCC 4.8.2 (which ostensibly fixed the known 'asm goto'
bug) I am still getting some kind of miscompilation. If I enable the
asm_volatile_goto quirk for my compiler, KVM guests are destroyed correctly and
the memory is cleaned up.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
Cc: Ingo Molnar 
Cc: Linus Torvalds 
Signed-off-by: Steven Noonan 
---
 include/linux/compiler-gcc4.h | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index ded4299..2507fd2 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -75,11 +75,7 @@
  *
  * (asm goto is automatically volatile - the naming reflects this.)
  */
-#if GCC_VERSION <= 40801
-# define asm_volatile_goto(x...)   do { asm goto(x); asm (""); } while (0)
-#else
-# define asm_volatile_goto(x...)   do { asm goto(x); } while (0)
-#endif
+#define asm_volatile_goto(x...)	do { asm goto(x); asm (""); } while (0)
 
 #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
 #if GCC_VERSION >= 40400
-- 
1.8.5.4


Re: [GIT PULL] compiler/gcc4: Add quirk for 'asm goto' miscompilation bug

2014-02-12 Thread Steven Noonan

Resurrecting this thread, as I'm running with GCC 4.8.2 and am
encountering miscompiles without this quirk being enabled for my
compiler version. I'm having trouble pinning down the miscompilation
itself, but I have a problem that seems reliably reproducible in my
environment.

I noticed that when I launch/destroy a KVM guest, the guest memory is
staying mapped. i.e. host has 200MB used, guest launches, ~4300MB
used, guest terminates, still 4300MB used and no qemu-system-x86_64
process hanging around. Thus depending on the guest memory size I can
exhaust host memory after a few guest reboots.

If I change the GCC_VERSION check for the asm_volatile_goto quirk to
include 4.8.2, then KVM guests are properly cleaned up.
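
For readers unfamiliar with the quirk, here is a minimal user-space sketch of what the macro looks like when it is always enabled. The macro body follows the one in compiler-gcc4.h, except that `__asm__` is used in place of `asm` so it also builds in strict ISO modes; static_branch_sketch() is a hypothetical stand-in for arch_static_branch(), with an empty asm template instead of the real NOP and __jump_table entry. It needs a compiler with 'asm goto' support.

```c
#include <stdbool.h>

/*
 * Quirk form of the macro: an empty asm statement is placed directly after
 * the asm goto, acting as a compiler barrier.
 */
#define asm_volatile_goto(x...) do { __asm__ goto(x); __asm__ (""); } while (0)

/*
 * Hypothetical stand-in for arch_static_branch(). The asm template is empty,
 * so nothing ever jumps to l_yes and control always falls through, just as
 * an unpatched NOP would behave.
 */
bool static_branch_sketch(void)
{
    asm_volatile_goto("" : : : : l_yes);
    return false;
l_yes:
    return true;
}
```

Since the template never jumps, static_branch_sketch() returns false; the example only shows the shape of the macro and how a caller uses it, not the miscompilation itself.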

The test case provided in the 'asm goto' bugzilla entry doesn't fail
on my compiler: gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

So is there some other 'asm goto' bug we haven't yet fully uncovered
and reported to GCC upstream? Anyone have any idea where to look for
the miscompilation? I started by looking at the
mmu_notifier_unregister() function, since that seems like the obvious
place for a guest memory unmap problem. mmu_notifier_unregister()
calls mmdrop(), which uses the atomic_dec_and_test macro to determine
whether or not to call __mmdrop(). The generated code looks correct to
me, but it's possibly not this callsite that's broken:
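
As a point of reference only, the decrement-and-test pattern described above can be sketched in user space with C11 atomics. This is not the kernel's implementation and not compiler output: `struct mm_sketch`, mmdrop_sketch() and drops_until_freed() are hypothetical names, and the real atomic_dec_and_test() on x86 is built on 'asm goto' rather than on <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct mm_struct. */
struct mm_sketch {
    atomic_int count;   /* reference count, like mm_count */
    bool freed;         /* set when the last reference is dropped */
};

/* Decrement the counter and report whether it reached zero.
 * atomic_fetch_sub() returns the value held before the subtraction. */
static bool dec_and_test_sketch(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) == 1;
}

/* Shape of mmdrop(): only the final drop triggers the teardown. */
void mmdrop_sketch(struct mm_sketch *mm)
{
    if (dec_and_test_sketch(&mm->count))
        mm->freed = true;   /* the kernel would call __mmdrop() here */
}

/* Count how many drops it takes to free an object that starts with
 * 'initial' references (initial must be at least 1). */
int drops_until_freed(int initial)
{
    struct mm_sketch mm;
    int drops = 0;

    atomic_init(&mm.count, initial);
    mm.freed = false;
    while (!mm.freed) {
        mmdrop_sketch(&mm);
        drops++;
    }
    return drops;
}
```

If the test in dec_and_test_sketch() were miscompiled so that it never reported zero, the teardown branch would never run, which would match the symptom of guest memory staying allocated.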

On Sat, Oct 12, 2013 at 10:10 AM, Ingo Molnar  wrote:
> Linus,
>
> Please pull the latest core-urgent-for-linus git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-urgent-for-linus
>
>    HEAD: 3f0116c3238a96bc18ad4b4acefe4e7be32fa861 compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
>
> This is the fix for the GCC miscompilation discussed in the following lkml
> thread:
>
>[x86] BUG: unable to handle kernel paging request at 00740060
>
> The bug in GCC has been fixed by Jakub and the fix will be part of the GCC
> 4.8.2 release expected to be released next week - so the quirk's version
> test checks for <= 4.8.1.
>
> The quirk is only added to compiler-gcc4.h and not to the higher level
> compiler.h because all asm goto uses are behind a feature check.
>
>  Thanks,
>
> Ingo
>
> -->
> Ingo Molnar (1):
>   compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
>
>
>  arch/arm/include/asm/jump_label.h     |  2 +-
>  arch/mips/include/asm/jump_label.h    |  2 +-
>  arch/powerpc/include/asm/jump_label.h |  2 +-
>  arch/s390/include/asm/jump_label.h    |  2 +-
>  arch/sparc/include/asm/jump_label.h   |  2 +-
>  arch/x86/include/asm/cpufeature.h     |  6 +++---
>  arch/x86/include/asm/jump_label.h     |  2 +-
>  arch/x86/include/asm/mutex_64.h       |  4 ++--
>  include/linux/compiler-gcc4.h         | 15 +++
>  9 files changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/include/asm/jump_label.h 
> b/arch/arm/include/asm/jump_label.h
> index bfc198c..863c892 100644
> --- a/arch/arm/include/asm/jump_label.h
> +++ b/arch/arm/include/asm/jump_label.h
> @@ -16,7 +16,7 @@
>
>  static __always_inline bool arch_static_branch(struct static_key *key)
>  {
> -   asm goto("1:\n\t"
> +   asm_volatile_goto("1:\n\t"
>  JUMP_LABEL_NOP "\n\t"
>  ".pushsection __jump_table,  \"aw\"\n\t"
>  ".word 1b, %l[l_yes], %c0\n\t"
> diff --git a/arch/mips/include/asm/jump_label.h 
> b/arch/mips/include/asm/jump_label.h
> index 4d6d77e..e194f95 100644
> --- a/arch/mips/include/asm/jump_label.h
> +++ b/arch/mips/include/asm/jump_label.h
> @@ -22,7 +22,7 @@
>
>  static __always_inline bool arch_static_branch(struct static_key *key)
>  {
> -   asm goto("1:\tnop\n\t"
> +   asm_volatile_goto("1:\tnop\n\t"
> "nop\n\t"
> ".pushsection __jump_table,  \"aw\"\n\t"
> WORD_INSN " 1b, %l[l_yes], %0\n\t"
> diff --git a/arch/powerpc/include/asm/jump_label.h 
> b/arch/powerpc/include/asm/jump_label.h
> index ae098c4..f016bb6 100644
> --- a/arch/powerpc/include/asm/jump_label.h
> +++ b/arch/powerpc/include/asm/jump_label.h
> @@ -19,7 +19,7 @@
>
>  static __always_inline bool arch_static_branch(struct static_key *key)
>  {
> -   asm goto("1:\n\t"
> +   asm_volatile_goto("1:\n\t"
>  "nop\n\t"
>  ".pushsection __jump_table,  \"aw\"\n\t"
>  JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
> diff --git a/arch/s390/include/asm/jump_label.h 
> b/arch/s390/include/asm/jump_label.h
> index 6c32190..346b1c8 100644
> --- a/arch/s390/include/asm/jump_label.h
> +++ b/arch/s390/include/asm/jump_label.h
> @@ -15,7 +15,7 @@
>
>  static __always_inline bool arch_static_branch(struct static_key *key)
>  {
> -   asm goto("0:brcl 0,0\n"
> +   asm_volatile_goto("0:   brcl 0,0\n"
> ".pushsection __jump_table, \"aw\"\n"
> ASM_ALIGN "\n"
> ASM_PTR " 0b, %l[label], %0\n"
> diff --git a/arch/sparc/include/asm/

Re: [BISECTED] Linux 3.12.7 introduces page map handling regression

2014-01-23 Thread Steven Noonan

On Thu, Jan 23, 2014 at 11:23:37AM -0500, Elena Ufimtseva wrote:
> On Wed, Jan 22, 2014 at 3:33 PM, Steven Noonan  wrote:
> > On Wed, Jan 22, 2014 at 03:18:50PM -0500, Elena Ufimtseva wrote:
> >> On Wed, Jan 22, 2014 at 9:29 AM, Daniel Borkmann  
> >> wrote:
> >> > On 01/22/2014 08:29 AM, Steven Noonan wrote:
> >> >>
> >> >> On Wed, Jan 22, 2014 at 12:02:15AM -0500, Konrad Rzeszutek Wilk wrote:
> >> >>>
> >> >>> On Tue, Jan 21, 2014 at 07:20:45PM -0800, Steven Noonan wrote:
> >> >>>>
> >> >>>> On Tue, Jan 21, 2014 at 06:47:07PM -0800, Linus Torvalds wrote:
> >> >>>>>
> >> >>>>> On Tue, Jan 21, 2014 at 5:49 PM, Greg Kroah-Hartman
> >> >>>>>  wrote:
> >> >>>
> >> >>>
> >> >>> Adding extra folks to the party.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Odds are this also shows up in 3.13, right?
> >> >>>>
> >> >>>>
> >> >>>> Reproduced using 3.13 on the PV guest:
> >> >>>>
> >> >>>> [  368.756763] BUG: Bad page map in process mp
> >> >>>> pte:8004a67c6165 pmd:e9b706067
> >> >>>> [  368.756777] page:ea001299f180 count:0 mapcount:-1
> >> >>>> mapping:  (null) index:0x0
> >> >>>> [  368.756781] page flags: 0x2f8014(referenced|dirty)
> >> >>>> [  368.756786] addr:7fd1388b7000 vm_flags:00100071
> >> >>>> anon_vma:880e9ba15f80 mapping:  (null) index:7fd1388b7
> >> >>>> [  368.756792] CPU: 29 PID: 618 Comm: mp Not tainted 
> >> >>>> 3.13.0-ec2
> >> >>>> #1
> >> >>>> [  368.756795]  880e9b718958 880e9eaf3cc0
> >> >>>> 814d8748 7fd1388b7000
> >> >>>> [  368.756803]  880e9eaf3d08 8116d289
> >> >>>>  
> >> >>>> [  368.756809]  880e9b7065b8 ea001299f180
> >> >>>> 7fd1388b8000 880e9eaf3e30
> >> >>>> [  368.756815] Call Trace:
> >> >>>> [  368.756825]  [] dump_stack+0x45/0x56
> >> >>>> [  368.756833]  [] print_bad_pte+0x229/0x250
> >> >>>> [  368.756837]  []
> >> >>>> unmap_single_vma+0x583/0x890
> >> >>>> [  368.756842]  [] unmap_vmas+0x65/0x90
> >> >>>> [  368.756847]  [] unmap_region+0xac/0x120
> >> >>>> [  368.756852]  [] ? 
> >> >>>> vma_rb_erase+0x1c9/0x210
> >> >>>> [  368.756856]  [] do_munmap+0x280/0x370
> >> >>>> [  368.756860]  [] vm_munmap+0x41/0x60
> >> >>>> [  368.756864]  [] SyS_munmap+0x22/0x30
> >> >>>> [  368.756869]  []
> >> >>>> system_call_fastpath+0x1a/0x1f
> >> >>>> [  368.756872] Disabling lock debugging due to kernel taint
> >> >>>> [  368.760084] BUG: Bad rss-counter state mm:880e9d079680
> >> >>>> idx:0 val:-1
> >> >>>> [  368.760091] BUG: Bad rss-counter state mm:880e9d079680
> >> >>>> idx:1 val:1
> >> >>>>
> >> >>>>>
> >> >>>>> Probably. I don't have a Xen PV setup to test with (and very little
> >> >>>>> interest in setting one up).. And I have a suspicion that it might 
> >> >>>>> not
> >> >>>>> be so much about Xen PV, as perhaps about the kind of hardware.
> >> >>>>>
> >> >>>>> I suspect the issue has something to do with the magic _PAGE_NUMA
> >> >>>>> tie-in with _PAGE_PRESENT. And then mprotect(PROT_NONE) ends up
> >> >>>>> removing the _PAGE_PRESENT bit, and now the crazy numa code is
> >> >>>>> confused.
> >> >>>>>
> >> >>>>> The whole _PAGE_NUMA thing is a f*cking horrible hack, and shares the
> >> >>>>> bit with _PAGE_PROTNONE, which is why it then has that tie-in to
> >> >>>>> _PAGE

Re: [BISECTED] Linux 3.12.7 introduces page map handling regression

2014-01-22 Thread Steven Noonan

On Wed, Jan 22, 2014 at 03:18:50PM -0500, Elena Ufimtseva wrote:
> On Wed, Jan 22, 2014 at 9:29 AM, Daniel Borkmann  
> wrote:
> > On 01/22/2014 08:29 AM, Steven Noonan wrote:
> >>
> >> On Wed, Jan 22, 2014 at 12:02:15AM -0500, Konrad Rzeszutek Wilk wrote:
> >>>
> >>> On Tue, Jan 21, 2014 at 07:20:45PM -0800, Steven Noonan wrote:
> >>>>
> >>>> On Tue, Jan 21, 2014 at 06:47:07PM -0800, Linus Torvalds wrote:
> >>>>>
> >>>>> On Tue, Jan 21, 2014 at 5:49 PM, Greg Kroah-Hartman
> >>>>>  wrote:
> >>>
> >>>
> >>> Adding extra folks to the party.
> >>>>>>
> >>>>>>
> >>>>>> Odds are this also shows up in 3.13, right?
> >>>>
> >>>>
> >>>> Reproduced using 3.13 on the PV guest:
> >>>>
> >>>> [  368.756763] BUG: Bad page map in process mp
> >>>> pte:8004a67c6165 pmd:e9b706067
> >>>> [  368.756777] page:ea001299f180 count:0 mapcount:-1
> >>>> mapping:  (null) index:0x0
> >>>> [  368.756781] page flags: 0x2f8014(referenced|dirty)
> >>>> [  368.756786] addr:7fd1388b7000 vm_flags:00100071
> >>>> anon_vma:880e9ba15f80 mapping:  (null) index:7fd1388b7
> >>>> [  368.756792] CPU: 29 PID: 618 Comm: mp Not tainted 3.13.0-ec2
> >>>> #1
> >>>> [  368.756795]  880e9b718958 880e9eaf3cc0
> >>>> 814d8748 7fd1388b7000
> >>>> [  368.756803]  880e9eaf3d08 8116d289
> >>>>  
> >>>> [  368.756809]  880e9b7065b8 ea001299f180
> >>>> 7fd1388b8000 880e9eaf3e30
> >>>> [  368.756815] Call Trace:
> >>>> [  368.756825]  [] dump_stack+0x45/0x56
> >>>> [  368.756833]  [] print_bad_pte+0x229/0x250
> >>>> [  368.756837]  []
> >>>> unmap_single_vma+0x583/0x890
> >>>> [  368.756842]  [] unmap_vmas+0x65/0x90
> >>>> [  368.756847]  [] unmap_region+0xac/0x120
> >>>> [  368.756852]  [] ? vma_rb_erase+0x1c9/0x210
> >>>> [  368.756856]  [] do_munmap+0x280/0x370
> >>>> [  368.756860]  [] vm_munmap+0x41/0x60
> >>>> [  368.756864]  [] SyS_munmap+0x22/0x30
> >>>> [  368.756869]  []
> >>>> system_call_fastpath+0x1a/0x1f
> >>>> [  368.756872] Disabling lock debugging due to kernel taint
> >>>> [  368.760084] BUG: Bad rss-counter state mm:880e9d079680
> >>>> idx:0 val:-1
> >>>> [  368.760091] BUG: Bad rss-counter state mm:880e9d079680
> >>>> idx:1 val:1
> >>>>
> >>>>>
> >>>>> Probably. I don't have a Xen PV setup to test with (and very little
> >>>>> interest in setting one up).. And I have a suspicion that it might not
> >>>>> be so much about Xen PV, as perhaps about the kind of hardware.
> >>>>>
> >>>>> I suspect the issue has something to do with the magic _PAGE_NUMA
> >>>>> tie-in with _PAGE_PRESENT. And then mprotect(PROT_NONE) ends up
> >>>>> removing the _PAGE_PRESENT bit, and now the crazy numa code is
> >>>>> confused.
> >>>>>
> >>>>> The whole _PAGE_NUMA thing is a f*cking horrible hack, and shares the
> >>>>> bit with _PAGE_PROTNONE, which is why it then has that tie-in to
> >>>>> _PAGE_PRESENT.
> >>>>>
> >>>>> Adding Andrea to the Cc, because he's the author of that horridness.
> >>>>> Putting Steven's test-case here as an attachment for Andrea, maybe
> >>>>> that makes him go "Ahh, yes, silly case".
> >>>>>
> >>>>> Also added Kirill, because he was involved in the last _PAGE_NUMA debacle.
> >>>>>
> >>>>> Andrea, you can find the thread on lkml, but it boils down to commit
> >>>>> 1667918b6483 (backported to 3.12.7 as 3d792d616ba4) breaking the
> >>>>> attached test-case (but apparently only under Xen PV). There it
> >>>>> apparently causes a "BUG: Bad page map .." error.
> >>>
> >>>

Re: [BISECTED] Linux 3.12.7 introduces page map handling regression

2014-01-21 Thread Steven Noonan

On Wed, Jan 22, 2014 at 12:02:15AM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 21, 2014 at 07:20:45PM -0800, Steven Noonan wrote:
> > On Tue, Jan 21, 2014 at 06:47:07PM -0800, Linus Torvalds wrote:
> > > On Tue, Jan 21, 2014 at 5:49 PM, Greg Kroah-Hartman
> > >  wrote:
> 
> Adding extra folks to the party.
> > > >
> > > > Odds are this also shows up in 3.13, right?
> > 
> > Reproduced using 3.13 on the PV guest:
> > 
> > [  368.756763] BUG: Bad page map in process mp  pte:8004a67c6165 
> > pmd:e9b706067
> > [  368.756777] page:ea001299f180 count:0 mapcount:-1 mapping:   
> >(null) index:0x0
> > [  368.756781] page flags: 0x2f8014(referenced|dirty)
> > [  368.756786] addr:7fd1388b7000 vm_flags:00100071 
> > anon_vma:880e9ba15f80 mapping:  (null) index:7fd1388b7
> > [  368.756792] CPU: 29 PID: 618 Comm: mp Not tainted 3.13.0-ec2 #1
> > [  368.756795]  880e9b718958 880e9eaf3cc0 814d8748 
> > 7fd1388b7000
> > [  368.756803]  880e9eaf3d08 8116d289  
> > 
> > [  368.756809]  880e9b7065b8 ea001299f180 7fd1388b8000 
> > 880e9eaf3e30
> > [  368.756815] Call Trace:
> > [  368.756825]  [] dump_stack+0x45/0x56
> > [  368.756833]  [] print_bad_pte+0x229/0x250
> > [  368.756837]  [] unmap_single_vma+0x583/0x890
> > [  368.756842]  [] unmap_vmas+0x65/0x90
> > [  368.756847]  [] unmap_region+0xac/0x120
> > [  368.756852]  [] ? vma_rb_erase+0x1c9/0x210
> > [  368.756856]  [] do_munmap+0x280/0x370
> > [  368.756860]  [] vm_munmap+0x41/0x60
> > [  368.756864]  [] SyS_munmap+0x22/0x30
> > [  368.756869]  [] system_call_fastpath+0x1a/0x1f
> > [  368.756872] Disabling lock debugging due to kernel taint
> > [  368.760084] BUG: Bad rss-counter state mm:880e9d079680 idx:0 
> > val:-1
> > [  368.760091] BUG: Bad rss-counter state mm:880e9d079680 idx:1 
> > val:1
> > 
> > > 
> > > Probably. I don't have a Xen PV setup to test with (and very little
> > > interest in setting one up).. And I have a suspicion that it might not
> > > be so much about Xen PV, as perhaps about the kind of hardware.
> > > 
> > > I suspect the issue has something to do with the magic _PAGE_NUMA
> > > tie-in with _PAGE_PRESENT. And then mprotect(PROT_NONE) ends up
> > > removing the _PAGE_PRESENT bit, and now the crazy numa code is
> > > confused.
> > > 
> > > The whole _PAGE_NUMA thing is a f*cking horrible hack, and shares the
> > > bit with _PAGE_PROTNONE, which is why it then has that tie-in to
> > > _PAGE_PRESENT.
> > > 
> > > Adding Andrea to the Cc, because he's the author of that horridness.
> > > Putting Steven's test-case here as an attachment for Andrea, maybe
> > > that makes him go "Ahh, yes, silly case".
> > >
> > > Also added Kirill, because he was involved in the last _PAGE_NUMA debacle.
> > > 
> > > Andrea, you can find the thread on lkml, but it boils down to commit
> > > 1667918b6483 (backported to 3.12.7 as 3d792d616ba4) breaking the
> > > attached test-case (but apparently only under Xen PV). There it
> > > apparently causes a "BUG: Bad page map .." error.
> 
> I *think* it is due to the fact that pmd_numa and pte_numa are getting the
> _raw_ value of PMDs and PTEs. That is - it does not use the pvops interface
> and instead reads the values directly from the page-table. Since the
> page-table is also manipulated by the hypervisor - there are certain
> flags it also sets to do its business. It might be that it uses
> _PAGE_GLOBAL as well - and Linux picks up on that. If it was using
> pte_flags that would invoke the pvops interface.
> 
> Elena, Dariof and George, you guys had been looking at this a bit deeper
> than I have. Does the Xen hypervisor use the _PAGE_GLOBAL for PV guests?
> 
> This not-compiled-totally-bad-patch might shed some light on what I was
> thinking _could_ fix this issue - and IS NOT A FIX - JUST A HACK.
> It does not fix it for PMDs naturally (as there are no PMD paravirt ops
> for that).

Unfortunately the Totally Bad Patch seems to make no difference. I am
still able to repro the issue:

[  346.374929] BUG: Bad page map in process mp  pte:8004ae928065 
pmd:e993f9067
[  346.374942] page:ea0012ba4a00 count:0 mapcount:-1 mapping:   
   (null) index:0x0
[  346.374946] page flags: 0x2f8014(referenced|di

Re: [BISECTED] Linux 3.12.7 introduces page map handling regression

2014-01-21 Thread Steven Noonan

On Tue, Jan 21, 2014 at 06:47:07PM -0800, Linus Torvalds wrote:
> On Tue, Jan 21, 2014 at 5:49 PM, Greg Kroah-Hartman
>  wrote:
> >
> > Odds are this also shows up in 3.13, right?

Reproduced using 3.13 on the PV guest:

[  368.756763] BUG: Bad page map in process mp  pte:8004a67c6165 pmd:e9b706067
[  368.756777] page:ea001299f180 count:0 mapcount:-1 mapping:      (null) index:0x0
[  368.756781] page flags: 0x2f8014(referenced|dirty)
[  368.756786] addr:7fd1388b7000 vm_flags:00100071 anon_vma:880e9ba15f80 mapping:  (null) index:7fd1388b7
[  368.756792] CPU: 29 PID: 618 Comm: mp Not tainted 3.13.0-ec2 #1
[  368.756795]  880e9b718958 880e9eaf3cc0 814d8748 7fd1388b7000
[  368.756803]  880e9eaf3d08 8116d289
[  368.756809]  880e9b7065b8 ea001299f180 7fd1388b8000 880e9eaf3e30
[  368.756815] Call Trace:
[  368.756825]  [] dump_stack+0x45/0x56
[  368.756833]  [] print_bad_pte+0x229/0x250
[  368.756837]  [] unmap_single_vma+0x583/0x890
[  368.756842]  [] unmap_vmas+0x65/0x90
[  368.756847]  [] unmap_region+0xac/0x120
[  368.756852]  [] ? vma_rb_erase+0x1c9/0x210
[  368.756856]  [] do_munmap+0x280/0x370
[  368.756860]  [] vm_munmap+0x41/0x60
[  368.756864]  [] SyS_munmap+0x22/0x30
[  368.756869]  [] system_call_fastpath+0x1a/0x1f
[  368.756872] Disabling lock debugging due to kernel taint
[  368.760084] BUG: Bad rss-counter state mm:880e9d079680 idx:0 val:-1
[  368.760091] BUG: Bad rss-counter state mm:880e9d079680 idx:1 val:1

> 
> Probably. I don't have a Xen PV setup to test with (and very little
> interest in setting one up).. And I have a suspicion that it might not
> be so much about Xen PV, as perhaps about the kind of hardware.
> 
> I suspect the issue has something to do with the magic _PAGE_NUMA
> tie-in with _PAGE_PRESENT. And then mprotect(PROT_NONE) ends up
> removing the _PAGE_PRESENT bit, and now the crazy numa code is
> confused.
> 
> The whole _PAGE_NUMA thing is a f*cking horrible hack, and shares the
> bit with _PAGE_PROTNONE, which is why it then has that tie-in to
> _PAGE_PRESENT.
> 
> Adding Andrea to the Cc, because he's the author of that horridness.
> Putting Steven's test-case here as an attachment for Andrea, maybe
> that makes him go "Ahh, yes, silly case".
>
> Also added Kirill, because he was involved in the last _PAGE_NUMA debacle.
> 
> Andrea, you can find the thread on lkml, but it boils down to commit
> 1667918b6483 (backported to 3.12.7 as 3d792d616ba4) breaking the
> attached test-case (but apparently only under Xen PV). There it
> apparently causes a "BUG: Bad page map .." error.
> 
> And I suspect this is another of those "this bug is only visible on
> real numa machines, because _PAGE_NUMA isn't actually ever set
> otherwise". That has pretty much guaranteed that it gets basically
> zero testing, which is not a great idea when coupled with that subtle
> sharing of the _PAGE_PROTNONE bit..
> 
> It may be that the whole "Xen PV" thing is a red herring, and that
> Steven only sees it on that one machine because the one he runs as a
> PV guest under is a real NUMA machine, and all the other machines he
> has tried it on haven't been numa. So it *may* be that that "only
> under Xen PV" is a red herring. But that's just a possible guess.

The PV and HVM guests are both on NUMA hosts, but we don't expose NUMA to the
PV guest, so it fakes a NUMA node at startup.

I've also tried running a PV guest on a dual socket host with interleaved
memory:

# dmesg | grep -i -e numa -e node
[0.00] NUMA turned off
[0.00] Faking a node at [mem 
0x-0x0005607f]
[0.00] Initmem setup node 0 [mem 0x-0x5607f]
[0.00]   NODE_DATA [mem 0x55d4f2000-0x55d518fff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x1000-0x0009]
[0.00]   node   0: [mem 0x0010-0x5607f]
[0.00] On node 0 totalpages: 5638047
[0.00] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:16 
nr_cpu_ids:16 nr_node_ids:1
[0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=16, 
Nodes=1
[0.010697] Inode-cache hash table entries: 2097152 (order: 12, 
16777216 bytes)
# dmesg | tail -n 21
[  348.467265] BUG: Bad page map in process t  pte:80008a6ef165 
pmd:53aa39067
[  348.467280] page:ea000229bbc0 count:0 mapcount:-1 mapping:   
   (null) index:0x0
[  348.467286] page flags: 0x1ffc14(referenced|dirty)
[  348.467293] addr:7f8c9fca vm_flags:00100071 
anon_vma:88053aff19c0 mapping:  (

[BISECTED] Linux 3.12.7 introduces page map handling regression

2014-01-21 Thread Steven Noonan

A user reported a problem starting vsftpd on a Xen paravirtualized
guest, with this in dmesg:

[   60.654862] BUG: Bad page map in process vsftpd  pte:800493b88165 pmd:e9cc01067
[   60.654876] page:ea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0
[   60.654879] page flags: 0x2ffc14(referenced|dirty)
[   60.654885] addr:7f97eea74000 vm_flags:00100071 anon_vma:880e98f80380 mapping:  (null) index:7f97eea74
[   60.654890] CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
[   60.654893]  880e9cc6ec38 880e9cc61ca0 814c763b 7f97eea74000
[   60.654900]  880e9cc61ce8 8116784e
[   60.654906]  880e9cc013a0 ea00124ee200 7f97eea75000 880e9cc61e10
[   60.654912] Call Trace:
[   60.654921]  [] dump_stack+0x45/0x56
[   60.654928]  [] print_bad_pte+0x22e/0x250
[   60.654933]  [] unmap_single_vma+0x583/0x890
[   60.654938]  [] unmap_vmas+0x65/0x90
[   60.654942]  [] exit_mmap+0xc5/0x170
[   60.654948]  [] mmput+0x65/0x100
[   60.654952]  [] do_exit+0x393/0x9e0
[   60.654955]  [] do_group_exit+0xcc/0x140
[   60.654959]  [] SyS_exit_group+0x14/0x20
[   60.654965]  [] system_call_fastpath+0x1a/0x1f
[   60.654968] Disabling lock debugging due to kernel taint
[   60.655191] BUG: Bad rss-counter state mm:880e9ca60580 idx:0 val:-1
[   60.655196] BUG: Bad rss-counter state mm:880e9ca60580 idx:1 val:1


The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.

I noted that it wasn't present in 3.10.27, but was present in 3.12.7 and
3.12.8. I ran through a bisection to find the root cause:

 # start: 'v3.12.7' 'v3.10.27'
 # bad:  [4301b7a8] Linux 3.12.7
 # good: [1071ea6e] Linux 3.10.27
 # good: [8bb495e3] Linux 3.10
 # good: [8fe73691] staging: comedi: comedi_bond: change return value
 # good: [22e04f6b] Merge branch 'for-linus' of git://git.kernel.org/p
 # good: [b7c09ad4] Merge branch 'for-linus' of git://git.kernel.org/p
 # good: [13caa8ed] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
 # good: [13caa8ed] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
 # good: [f5fa9283] ipv6: reset dst.expires value when clearing expire
 # good: [4af9d888] bridge: flush br's address entry in fdb when remov
 # good: [8c13daf6] dm delay: fix a possible deadlock due to shared wo
 # good: [93c02d70] firewire: sbp2: bring back WRITE SAME support
 # good: [18065245] ACPI / PCI / hotplug: Avoid warning when _ADR not
 # bad:  [8807a436] mm/memory-failure.c: transfer page count from head
 # bad:  [fd5df800] mm: numa: avoid unnecessary disruption of NUMA hin
 # good: [c18e3316] mm: numa: do not clear PMD during PTE update scan
 # good: [f3b578d9] mm: numa: avoid unnecessary work on the failure pa
 # bad:  [3d792d61] mm: numa: clear numa hinting information on mprote
 # good: [cefeb279] sched: numa: skip inaccessible VMAs
 # first bad:  [3d792d61] mm: numa: clear numa hinting information on mprote

If only I'd tested v3.12.0, that bisection would have been a lot shorter!


It looks like this is the change implicated (introduced in v3.12.7):

commit 3d792d616ba408ab55a54c1bb75a9367d997acfa
Author: Mel Gorman 
Date:   Tue Jan 7 14:00:44 2014 +

mm: numa: clear numa hinting information on mprotect

commit 1667918b6483b12a6496bf54151b827b8235d7b1 upstream.

On a protection change it is no longer clear if the page should be still
accessible.  This patch clears the NUMA hinting fault bits on a
protection change.

Signed-off-by: Mel Gorman 
Reviewed-by: Rik van Riel 
Cc: Alex Thorlton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 


This clearly points to breakage of mprotect() in particular. Checking
what vsftpd was doing via strace, I was able to come up with a simple
test case which triggers the issue:

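#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Print the failing call's errno message and exit. */
void die(const char *what)
{
    perror(what);
    exit(1);
}

int main(int arg, char **argv)
{
    /* Map one anonymous, private, writable page. */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        die("mmap");

    /* Tickle the page. */
    ((char *)p)[0] = 0;

    /* Drop all access, then restore read access. */
    if (mprotect(p, 4096, PROT_NONE) != 0)
        die("mprotect");

    if (mprotect(p, 4096, PROT_READ) != 0)
        die("mprotect");

    /* The "Bad page map" BUG is reported from the unmap path. */
    if (munmap(p, 4096) != 0)
        die("munmap");

    return 0;
}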

This could probably be reduced further. I didn't spend much time on it.

Adding people cited in the patch to CC, as well as Konrad since this is
a Xen issue (I haven't been able to repro on HVM or bare metal so far).

Any ideas what's causing the BUG, and how we can fix it?

- Steven

Re: [PATCH] menuconfig: Allow j/k to move down/up the menu

2013-07-09 Thread Steven Noonan

On Tue, Jul 9, 2013 at 9:55 AM, Heimo Stranner  wrote:
>
>> Well, one advantage of these key bindings is, that you don't have to
>> move your hands away from the base line. And as vim user I find myself
>> using vim key sequences in all tools. I cannot even count how many
>> processes I accidentally killed because I tried to scroll using 'k' in
>> htop where it does 'kill'.
>>
>> In a nutshell: I'm all for vim key bindings everywhere!
>>
> I can't agree more!
>
> Because vim key bindings are rather common I can't see a problem if this
> patch is applied.

The biggest problem I see is the conflict between j/k and the letter
shortcuts used to jump to specific menu items (the letter highlighted
in blue on each item). But I haven't ever used that functionality in
practice, so who knows.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Portege Z830 and Tecra R840 failing to resume from suspend

2013-04-07 Thread Steven Noonan

Bugzilla bug: https://bugzilla.kernel.org/show_bug.cgi?id=42977

Can someone please give this bug some attention? Aaron Lu from Intel
has been adding people to the bug CC list but nobody seems to be
responding, and per his suggestion I'm mailing the list.

The Z830 and R840 suspend successfully but do not resume without the
extra udelay added by acpi_sleep=s3_beep.

We haven't found any 64-bit kernel builds so far which resume
successfully. As for 32-bit kernels, I haven't tested them myself, but
Artur in the bugzilla (CC'd) has only successfully resumed his R840
with a 32-bit kernel.

- Steven

Re: Supporting SYSRQ on broken laptops like the thinkpad T530

2013-03-30 Thread Steven Noonan

On Sat, Mar 30, 2013 at 10:56 AM, Pavel Machek  wrote:
> On Fri 2013-03-22 15:31:41, Marc MERLIN wrote:
>> On Wed, Jan 09, 2013 at 03:36:44AM +0100, Roland Eggner wrote:
>> > On 2013-01-08 Tuesday at 15:09 -0800 Marc MERLIN wrote:
>> > > In its infinite wisdom, lenovo has removed the sysrq key on the latest
>> > > thinkpads, and replaced it with a stupid ALT+FN+S key combination, which
>> > > doesn't really work for doing sysrq from the console (nor do I know how 
>> > > the
>> > > genius who did that intended for SYSRQ-S to work).
>> > > http://forums.lenovo.com/t5/T400-T500-and-newer-T-series/T430-s-T530-Where-are-the-shortcut-function-keys-break-Pause-etc/ta-p/781749
>> > >
>> > > I realize that one solution is to throw my laptop window at a suitable 
>> > > high
>> > > floor and replace it with one from a vendor that doesn't randomly remove 
>> > > keys
>> > > from the keyboard.
>> > > That said, I was wondering if there were other solutions, especially
>> > > considering that thinkpads used to be the better linux laptops.
>> >
>> > My Dell “Precision M4500” notebook suffers similar (same?) problem.  So far
>> > I could not find a solution better than this:  e.g. Alt-Fn-SysRq-s
>> >
>> > press and hold Alt
>> > press and hold Fn
>> > press and leave F10|SysRq
>> > leave Fn
>> > press and leave s
>> > leave Alt
>>
>> Just for the sake of the archives, turns out that on the lenovo T430 and T530
>> you should ignore the Lenovo documentation I quoted above, and you can
>> indeed use the PrtSc key between Right Alt and Right Ctrl, that key works
>> just fine for Sysrq.
>>
>> I have no idea why Lenovo felt they had to document some complicated
>> alternate software sysrq with Fn+S
>
> Well... I feel responsible for sysrq stuff... I wanted it to be
> non-intrusive. But it might have been a bad choice - sysrq was not meant
> to be used as a shift.
>
> Sometimes it works, sometimes it does not. Don't blame lenovo for
> that.
>
> Maybe it should be modified to take sysrq and _then_ key?
>
> Or maybe we should use something like lshift+rshift+lalt+ralt+key?
>

I vote for "something like". Some keyboards (such as my ThinkPad X230
with UK layout) have lshift, lalt, and rshift, but no ralt; the ralt is
replaced by AltGr.

[PATCH] s3c24xx: fix link failure if CONFIG_CONSOLE_POLL but !CONFIG_SERIAL_SAMSUNG_CONSOLE

2013-03-01 Thread Steven Noonan

Resolves this link failure:
ERROR: "s3c24xx_serial_get_poll_char" [drivers/tty/serial/samsung.ko] 
undefined!
ERROR: "s3c24xx_serial_put_poll_char" [drivers/tty/serial/samsung.ko] 
undefined!

Signed-off-by: Steven Noonan 
---
 drivers/tty/serial/samsung.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
index 2769a38..1290646 100644
--- a/drivers/tty/serial/samsung.c
+++ b/drivers/tty/serial/samsung.c
@@ -889,17 +889,17 @@ static int __init s3c24xx_serial_console_init(void)
 }
 console_initcall(s3c24xx_serial_console_init);
 
-#define S3C24XX_SERIAL_CONSOLE &s3c24xx_serial_console
-#else
-#define S3C24XX_SERIAL_CONSOLE NULL
-#endif
-
 #ifdef CONFIG_CONSOLE_POLL
 static int s3c24xx_serial_get_poll_char(struct uart_port *port);
 static void s3c24xx_serial_put_poll_char(struct uart_port *port,
 unsigned char c);
 #endif
 
+#define S3C24XX_SERIAL_CONSOLE &s3c24xx_serial_console
+#else
+#define S3C24XX_SERIAL_CONSOLE NULL
+#endif
+
 static struct uart_ops s3c24xx_serial_ops = {
.pm = s3c24xx_serial_pm,
.tx_empty   = s3c24xx_serial_tx_empty,
@@ -918,7 +918,7 @@ static struct uart_ops s3c24xx_serial_ops = {
.request_port   = s3c24xx_serial_request_port,
.config_port= s3c24xx_serial_config_port,
.verify_port= s3c24xx_serial_verify_port,
-#ifdef CONFIG_CONSOLE_POLL
+#if defined(CONFIG_CONSOLE_POLL) && defined(CONFIG_SERIAL_SAMSUNG_CONSOLE)
.poll_get_char = s3c24xx_serial_get_poll_char,
.poll_put_char = s3c24xx_serial_put_poll_char,
 #endif
-- 
1.8.1.2


[PATCH] xenbus: fix compile failure on ARM with Xen enabled

2013-03-01 Thread Steven Noonan

Adding an include of linux/mm.h resolves this:
drivers/xen/xenbus/xenbus_client.c: In function 
‘xenbus_map_ring_valloc_hvm’:
drivers/xen/xenbus/xenbus_client.c:532:66: error: implicit declaration 
of function ‘page_to_section’ [-Werror=implicit-function-declaration]

Signed-off-by: Steven Noonan 
---
 drivers/xen/xenbus/xenbus_client.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/xen/xenbus/xenbus_client.c 
b/drivers/xen/xenbus/xenbus_client.c
index bcf3ba4..61786be 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include <linux/mm.h>
 #include 
 #include 
 #include 
-- 
1.8.1.2


[BISECTED] snd-hda-intel audio distortion in Linus' current tree

2012-09-26 Thread Steven Noonan

Started having audio problems when trying out the latest tree
(v3.6-rc7-10-g56d27ad). When playing any kind of audio, there was
significant distortion, mostly crackling noise. I'm using a Lenovo
ThinkPad X230 (Panther Point).

I did a git-bisect to locate the problem, and it seems this commit is to
blame:

c20c5a841cbe47f5b7812b57bd25397497e5fbc0 is the first bad commit
commit c20c5a841cbe47f5b7812b57bd25397497e5fbc0
Author: Seth Heasley 
Date:   Thu Jun 14 14:23:53 2012 -0700

ALSA: hda_intel: activate COMBO mode for Intel client chipsets

This patch activates the COMBO position_fix for recent Intel client chipsets.
COMBO mode is the recommended setting for Intel chipsets and eliminates HD
audio warnings in dmesg.  This patch has been tested on Lynx Point, Panther
Point, and Cougar Point.

Signed-off-by: Seth Heasley 
Signed-off-by: Takashi Iwai 

It's pretty clear-cut. If I revert this patch, my sound starts
functioning normally again.

Any thoughts on how to proceed here? Can someone revert this, or is
there some testing that I can do?

Here's a pretty-printed bisection log, if needed:

 # good: [28a33cbc] Linux 3.5
 # bad:  [b13bc8dd] Merge tag 'staging-3.6-rc1' of git://git.kernel.or
 # good: [3c4cfade] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
 # bad:  [9fc37779] Merge tag 'usb-3.6-rc1' of git://git.kernel.org/pu
 # bad:  [f14121ab] Merge tag 'dt-for-3.6' of git://sources.calxeda.co
 # good: [d14b7a41] Merge branch 'for-linus' of git://git.kernel.org/p
 # good: [15d47763] Merge branch 'for-3.5' into for-3.6
 # bad:  [dbf7b591] Merge tag 'sound-3.6' of git://git.kernel.org/pub/
 # bad:  [1c76684d] ALSA: hda - add Haswell HDMI codec id
 # bad:  [8b8d654b] ALSA: hda - Move one-time init codes from generic_
 # good: [80c8bfbe] ALSA: HDA: Create phantom jacks for fixed inputs a
 # bad:  [ceaa86ba] ALSA: hda - Remove invalid init verbs for Nvidia 2
 # bad:  [4b6ace9e] ALSA: hda - Add the support for VIA HDMI pin detec
 # bad:  [c20c5a84] ALSA: hda_intel: activate COMBO mode for Intel cli


- Steven
