from:"Mikael Pettersson"

[REGRESSION][BISECTED] 5.9-rc4 disables console on radeon

2020-09-08 Thread Mikael Pettersson

Starting with linux-5.9-rc4, the Dell monitor on my desktop PC goes black
during boot when the kernel activates the framebuffer console, except for
this error message shown in the center of the screen:

"Dell U2412M
 The current input timing is not supported by the monitor display. Please
 change your input timing to 1920x1200@60Hz or any other monitor
 listed timing as per the monitor specifications.
 "

The monitor remains black until I reboot.

All kernels up to and including 5.9-rc3 were OK.  A git bisect identified

# first bad commit: [fc8c70526bd30733ea8667adb8b8ffebea30a8ed]
drm/radeon: Prefer lower feedback dividers

as the culprit, and reverting that from -rc4 makes the console work again.

Adding a bit of debugging code to that function shows:

avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137
avivo_get_fb_ref_div: fb_div_new 142 fb_div_old 143
avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137
avivo_get_fb_ref_div: fb_div_new 119 fb_div_old 120
avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137

during boot, where "new" is what the commit above changed the code to compute,
and "old" is the value computed by the working code from rc3.
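A quick way to check the pattern in that debug output, as a minimal sketch: the log lines are copied from this message, while the helper name `parse_fb_div` is made up for illustration. It only shows that each value computed by the new code is exactly one lower than what the rc3 code computed; it says nothing about why the monitor rejects the resulting timing.

```python
import re

# Debug lines copied verbatim from the report above.
LOG = """\
avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137
avivo_get_fb_ref_div: fb_div_new 142 fb_div_old 143
avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137
avivo_get_fb_ref_div: fb_div_new 119 fb_div_old 120
avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137
"""

def parse_fb_div(log):
    """Return a list of (new, old) feedback divider pairs found in the log."""
    pattern = re.compile(r"fb_div_new (\d+) fb_div_old (\d+)")
    return [(int(m.group(1)), int(m.group(2))) for m in pattern.finditer(log)]

pairs = parse_fb_div(LOG)
# Set of all observed differences between the old and new divider.
deltas = {old - new for new, old in pairs}
for new, old in pairs:
    print(f"new={new} old={old} delta={old - new}")
```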

The graphics card is a Radeon HD6450 silent model:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] (prog-if 00 [VGA controller])

Re: [PATCH 09/16] sparc64: use the generic get_user_pages_fast code

2019-08-10 Thread Mikael Pettersson

For the record the futex test case OOPSes a 5.3-rc3 kernel running on
a Sun Blade 2500 (2 x USIIIi).  This system runs a custom distro with
a custom toolchain (gcc-8.3 based), so I doubt it's a distro problem.

On Sat, Aug 10, 2019 at 9:17 AM Christoph Hellwig wrote:
>
> There isn't really a way to use an arch-specific get_user_pages_fast
> in mainline, you'd need to revert the whole series.  As a relatively
> quick workaround you can just remove the
>
> select HAVE_FAST_GUP if SPARC64
>
> line from arch/sparc/Kconfig

Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-14 Thread Mikael Pettersson

On Sun, Feb 10, 2019 at 5:05 PM Jens Axboe wrote:
>
> On 2/10/19 8:44 AM, James Bottomley wrote:
> > On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
> >> On Sat, Feb 9, 2019 at 7:19 PM James Bottomley wrote:
> > [...]
> >>> I think the reason for this is that the block mq path doesn't feed
> >>> the kernel entropy pool correctly, hence the need to install an
> >>> entropy gatherer for systems that don't have other good random
> >>> number sources.
> >>
> >> That does sound plausible, I admit I didn't even consider the
> >> possibility that the old block I/O path also was an entropy source.
> >
> > In theory, the new one should be as well since the rotational entropy
> > collector is on the SCSI completion path.   I'd seen the same problem
> > but had assumed it was something someone had done to our internal
> > entropy pool and thus hadn't bisected it.
>
> The difference is that the old stack included ADD_RANDOM by default,
> so this check:
>
> 	if (blk_queue_add_random(q))
> 		add_disk_randomness(req->rq_disk);
>
> in scsi_end_request() would be true, and we'd add the randomness. For
> sd, it seems to set it just fine for non-rotational drives. Could this
> be because other devices don't? Maybe the below makes a difference.
>
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6d65ac584eba..60e029911755 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1881,6 +1881,7 @@ struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
>  	sdev->request_queue->queuedata = sdev;
>  	__scsi_init_queue(sdev->host, sdev->request_queue);
>  	blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
> +	blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, sdev->request_queue);
>  	return sdev->request_queue;
>  }

This patch eliminates my 5 minute boot-up delay problem.

/Mikael
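For anyone wanting to check whether their own disks contribute to the entropy pool, the queue flag discussed above is visible from user space as the `queue/add_random` attribute under `/sys/block/<dev>/`. A minimal sketch follows; the helper name `add_random_flags` is made up for illustration, and the directory argument exists only so the function can be pointed at a test tree.

```python
import os

def add_random_flags(sys_block="/sys/block"):
    """Map each block device to its queue/add_random setting (True/False),
    or None when the attribute is missing or unreadable."""
    flags = {}
    if not os.path.isdir(sys_block):
        return flags
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "queue", "add_random")
        try:
            with open(path) as f:
                flags[dev] = f.read().strip() == "1"
        except OSError:
            flags[dev] = None
    return flags

# On a Linux box this lists the real devices; elsewhere it prints nothing.
for dev, flag in add_random_flags().items():
    print(dev, flag)
```

A device reporting False here does not feed disk completion timings into the pool, which would be consistent with the slow crng initialisation described in this thread.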

Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-10 Thread Mikael Pettersson

On Sat, Feb 9, 2019 at 7:19 PM James Bottomley wrote:
>
> On Sat, 2019-02-09 at 18:04 +0100, Mikael Pettersson wrote:
> > 4.20 and earlier kernels boot fine on my Sun Blade 2500 (UltraSPARC
> > IIIi), but the 5.0-rc kernels consistently experience a 5 minute delay
> > late during boot, after enabling networking but before allowing user
> > logins.  E.g. 5.0-rc5 dmesg has:
> >
> > [Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom read (12 bytes read)
> > [Fri Feb  8 17:18:14 2019] random: crng init done
>
> I've had the same problem on several of my test systems.  Are you sure
> it's not this bug report:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912087
>
> ?
>
> The solution for me was to install the haveged package which does
> active entropy gathering during boot and can make /dev/urandom
> available much earlier.

Thanks for the hint, I'll look into using haveged on this machine.

>
> > During this interval the machine answers pings but won't allow user
> > logins either on the console or over the network.
> >
> > A git bisect identified commit
> > f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
> > Author: Jens Axboe 
> > Date:   Thu Nov 1 16:36:27 2018 -0600
> >
> > scsi: kill off the legacy IO path
> >
> > as the point where this 5m delay was introduced.
>
> I think the reason for this is that the block mq path doesn't feed the
> kernel entropy pool correctly, hence the need to install an entropy
> gatherer for systems that don't have other good random number sources.

That does sound plausible, I admit I didn't even consider the possibility that
the old block I/O path also was an entropy source.

/Mikael

[5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-09 Thread Mikael Pettersson

4.20 and earlier kernels boot fine on my Sun Blade 2500 (UltraSPARC
IIIi), but the 5.0-rc kernels consistently experience a 5 minute delay
late during boot, after enabling networking but before allowing user
logins.  E.g. 5.0-rc5 dmesg has:

[Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[Fri Feb  8 17:18:14 2019] random: crng init done

During this interval the machine answers pings but won't allow user
logins either on the console or over the network.
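The length of the stall can be read straight off the two timestamps. A small sketch that does the subtraction; the two log lines are copied from above and the helper name `stamp` is made up for illustration:

```python
from datetime import datetime

# The two dmesg lines quoted above (human-readable timestamps, as logged).
START = "[Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom read (12 bytes read)"
END = "[Fri Feb  8 17:18:14 2019] random: crng init done"

def stamp(line):
    """Parse the bracketed timestamp at the start of a dmesg line."""
    text = line[1:line.index("]")]
    # Collapse the padding in front of single-digit days before parsing.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")

delay = stamp(END) - stamp(START)
print(delay.total_seconds())  # 297.0, i.e. just under five minutes
```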

A git bisect identified commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
Author: Jens Axboe 
Date:   Thu Nov 1 16:36:27 2018 -0600

scsi: kill off the legacy IO path

as the point where this 5m delay was introduced.

My older kernels all have CONFIG_SCSI_MQ_DEFAULT=N, which the above commit
effectively forces to Y.  Rebuilding 4.20 with CONFIG_SCSI_MQ_DEFAULT=Y also
triggers the 5m delay behaviour.

I haven't seen this behaviour on my x86-64 boxes, so presumably it's
related to the sparc64 kernel or this machine's SCSI adapter.

.config and dmesg below.

/Mikael

#
# Automatically generated file; DO NOT EDIT.
# Linux/sparc64 4.20.0 Kernel Configuration
#

#
# Compiler: gcc (GCC) 7.4.1 20181227
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70401
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-blkmq"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_PREFLOW_FASTEOI=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_SPARSE_IRQ=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_NAMESPACES is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
# CONFIG_AIO is not set
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_MEMBARRIER is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
# CONFIG_BPF_SYSCALL is not set
# CONFIG_USERFAULTFD is not set
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
# CONFIG_PERF_EVENTS is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_PROFILING is not set
CONFIG_64BIT=y
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_ARCH_DEFCONFIG="arch/sparc/configs/sparc64_defconfig"
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_ARCH_ATU=y
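
When comparing two such configurations (for example a working and a non-working kernel), it can help to load them into dictionaries instead of reading them by eye. A minimal sketch, assuming the usual .config conventions of `CONFIG_X=value` and `# CONFIG_X is not set`; the function name `parse_kconfig` and the four-line sample are illustrative only:

```python
def parse_kconfig(text):
    """Parse .config text into a dict; options that are 'not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            name = line[2:-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')
    return options

# A few lines taken from the excerpt above.
SAMPLE = """\
CONFIG_LOCALVERSION="-blkmq"
# CONFIG_PREEMPT is not set
CONFIG_SPARC64=y
CONFIG_LOG_BUF_SHIFT=17
"""
opts = parse_kconfig(SAMPLE)
# The excerpt stops before the SCSI section, so this option is not present.
print(opts.get("CONFIG_SCSI_MQ_DEFAULT", "absent from this excerpt"))
```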

Re: [PATCH] mm: disable `vm.max_map_count' sysctl limit

2017-11-27 Thread Mikael Pettersson

On Mon, Nov 27, 2017 at 6:25 PM, Andi Kleen wrote:
> It's an arbitrary scaling limit on how many mappings the process
> has. The more memory you have the bigger a problem it is. We've
> run into this problem too on larger systems.
>
> The limit was there originally because without it a process can mount a
> DoS attack against the kernel by filling all unswappable memory up with
> VMAs.
>
> The old limit was designed for much smaller systems than we have
> today.
>
> There needs to be some limit, but it should be on the amount of memory
> pinned by the VMAs, and needs to scale with the available memory,
> so that large systems are not penalized.

Fully agreed.  One problem with the current limit is that the number of VMAs
is only weakly related to the amount of memory one has mapped, and is
also prone to grow due to memory fragmentation.  I've seen processes
differ by 3X in number of VMAs, even though they ran the same code and
had similar memory sizes; they only differed in how long they had been
running and which servers they ran on (and how long those had been up).

> Unfortunately just making it part of the existing mlock limit could
> break some existing setups which max out the mlock limit with something
> else. Maybe we need a new rlimit for this?
>
> -Andi
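
To make the "weakly related" point concrete, here is a small sketch that counts mappings and sums their sizes from text in the `/proc/<pid>/maps` format. The two layouts are made up for illustration (they are not from a real process), and `summarize_maps` is a hypothetical helper name; both layouts map the same 4 MiB but differ by a factor of 1024 in VMA count.

```python
def summarize_maps(maps_text):
    """Return (number_of_mappings, total_mapped_bytes) for /proc/<pid>/maps text."""
    count = 0
    total = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # First field is "start-end" in hex.
        start, end = (int(x, 16) for x in fields[0].split("-"))
        count += 1
        total += end - start
    return count, total

# Two made-up layouts with the same mapped size but different VMA counts.
ONE_BIG = "00400000-00800000 rw-p 00000000 00:00 0\n"
FRAGMENTED = "".join(
    f"{0x400000 + i * 0x1000:08x}-{0x400000 + (i + 1) * 0x1000:08x} "
    f"{'rw-p' if i % 2 else 'r--p'} 00000000 00:00 0\n"
    for i in range(1024)
)
print(summarize_maps(ONE_BIG), summarize_maps(FRAGMENTED))
```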

Re: [PATCH] mm: disable `vm.max_map_count' sysctl limit

2017-11-27 Thread Mikael Pettersson

On Mon, Nov 27, 2017 at 5:22 PM, Matthew Wilcox wrote:
>> Could you be more explicit about _why_ we need to remove this tunable?
>> I am not saying I disagree, the removal simplifies the code but I do not
>> really see any justification here.
>
> I imagine he started seeing random syscalls failing with ENOMEM and
> eventually tracked it down to this stupid limit we used to need.

Exactly, except the origin (mmap() failing) was hidden behind layers upon
layers of user-space memory management code (not ours), which just said
"failed to allocate N bytes" (with N about 0.001% of the free RAM).  And it
wasn't reproducible.

Re: [PATCH] mm: disable `vm.max_map_count' sysctl limit

2017-11-27 Thread Mikael Pettersson

On Mon, Nov 27, 2017 at 11:12 AM, Michal Hocko wrote:
> > I've kept the kernel tunable to not break the API towards user-space,
> > but it's a no-op now.  Also the distinction between split_vma() and
> > __split_vma() disappears, so they are merged.
>
> Could you be more explicit about _why_ we need to remove this tunable?
> I am not saying I disagree, the removal simplifies the code but I do not
> really see any justification here.

In principle you don't "need" to, as those that know about it can bump it
to some insanely high value and get on with life.  Meanwhile those that don't
(and I was one of them until fairly recently, and I'm no newcomer to Unix or
Linux) get to scratch their heads and wonder why the kernel says ENOMEM
when one has loads of free RAM.

But what _is_ the justification for having this arbitrary limit?  There
might have been historical reasons, but at least ELF core dumps are no
longer a problem.

/Mikael
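
For those who do know about the limit, a rough check of how close a process is to it could look like the sketch below. The paths `/proc/sys/vm/max_map_count` and `/proc/self/maps` are the standard procfs locations; the function names are made up for illustration, and the one-line-per-mapping count is an approximation taken from user space, not the kernel's own accounting.

```python
def map_count_headroom(maps_text, max_map_count):
    """Return how many more mappings fit before the limit is reached."""
    used = sum(1 for line in maps_text.splitlines() if line.strip())
    return max_map_count - used

def read_text(path):
    """Read a small procfs file in one go."""
    with open(path) as f:
        return f.read()

if __name__ == "__main__":
    try:
        limit = int(read_text("/proc/sys/vm/max_map_count"))
        print(map_count_headroom(read_text("/proc/self/maps"), limit))
    except OSError:
        print("procfs not available on this system")
```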

[PATCH] mm: disable `vm.max_map_count' sysctl limit

2017-11-26 Thread Mikael Pettersson

The `vm.max_map_count' sysctl limit is IMO useless and confusing, so
this patch disables it.

- Old ELF had a limit of 64K segments, making core dumps from processes
  with more mappings than that problematic, but that was fixed in March
  2010 ("elf coredump: add extended numbering support").

- There are no internal data structures sized by this limit, making it
  entirely artificial.

- When viewed as a limit on memory consumption, it is ineffective since
  the number of mappings does not correspond directly to the amount of
  memory consumed, since each mapping is variable-length.

- Reaching the limit causes various memory management system calls to
  fail with ENOMEM, which is a lie.  Combined with the unpredictability
  of the number of mappings in a process, especially when non-trivial
  memory management or heavy file mapping is used, it can be difficult
  to reproduce these events and debug them.  It's also confusing to get
  ENOMEM when you know you have lots of free RAM.
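
As an illustration of how the number of mappings can grow without any extra memory being used, here is a toy model of VMA splitting. It is a sketch only: a plain list of (start, end, prot) tuples and a made-up `mprotect` function that mimics the split-and-merge behaviour, not the kernel's actual data structures or code.

```python
def mprotect(vmas, start, end, prot):
    """Apply prot to [start, end) in a sorted list of (start, end, prot) VMAs,
    splitting partially covered VMAs and merging equal neighbours."""
    pieces = []
    for s, e, p in vmas:
        if e <= start or s >= end:
            pieces.append((s, e, p))          # untouched by this call
            continue
        if s < start:
            pieces.append((s, start, p))      # part before the range
        pieces.append((max(s, start), min(e, end), prot))
        if e > end:
            pieces.append((end, e, p))        # part after the range
    merged = []
    for s, e, p in pieces:
        if merged and merged[-1][1] == s and merged[-1][2] == p:
            merged[-1] = (merged[-1][0], e, p)
        else:
            merged.append((s, e, p))
    return merged

PAGE = 4096
vmas = [(0, 64 * PAGE, "rw")]                 # one 256 KiB mapping
for page in range(0, 64, 2):                  # make every other page read-only
    vmas = mprotect(vmas, page * PAGE, (page + 1) * PAGE, "r")
mapped = sum(e - s for s, e, _ in vmas)
print(len(vmas), mapped)                      # 64 mappings, still 262144 bytes
```

In this model a single mapping turns into 64 while the mapped size stays at 256 KiB, which is the kind of growth that makes the mapping count a poor proxy for memory consumption.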

This limit was apparently introduced in the 2.1.80 kernel (first as a
compile-time constant, later changed to a sysctl), but I haven't been
able to find any description for it in Git or the LKML archives, so I
don't know what the original motivation was.

I've kept the kernel tunable to not break the API towards user-space,
but it's a no-op now.  Also the distinction between split_vma() and
__split_vma() disappears, so they are merged.

Tested on x86_64 with Fedora 26 user-space.  Also built an ARM NOMMU
kernel to make sure NOMMU compiles and links cleanly.

Signed-off-by: Mikael Pettersson <mikpeli...@gmail.com>
---
 Documentation/sysctl/vm.txt           | 17 +-
 Documentation/vm/ksm.txt              |  4 
 Documentation/vm/remap_file_pages.txt |  4 
 fs/binfmt_elf.c                       |  4 
 include/linux/mm.h                    | 23 ---
 kernel/sysctl.c                       |  3 +++
 mm/madvise.c                          | 12 ++
 mm/mmap.c                             | 42 ++-
 mm/mremap.c                           |  7 --
 mm/nommu.c                            |  3 ---
 mm/util.c                             |  1 -
 11 files changed, 13 insertions(+), 107 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index b920423f88cb..0fcb511d07e6 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -35,7 +35,7 @@ Currently, these files are in /proc/sys/vm:
 - laptop_mode
 - legacy_va_layout
 - lowmem_reserve_ratio
-- max_map_count
+- max_map_count (unused, kept for backwards compatibility)
 - memory_failure_early_kill
 - memory_failure_recovery
 - min_free_kbytes
@@ -400,21 +400,6 @@ The minimum value is 1 (1/1 -> 100%).
 
 ==
 
-max_map_count:
-
-This file contains the maximum number of memory map areas a process
-may have. Memory map areas are used as a side-effect of calling
-malloc, directly by mmap, mprotect, and madvise, and also when loading
-shared libraries.
-
-While most applications need less than a thousand maps, certain
-programs, particularly malloc debuggers, may consume lots of them,
-e.g., up to one or two maps per allocation.
-
-The default value is 65536.
-
-=
-
 memory_failure_early_kill:
 
 Control how to kill processes when uncorrected memory error (typically
diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt
index 6686bd267dc9..4a917f88cb11 100644
--- a/Documentation/vm/ksm.txt
+++ b/Documentation/vm/ksm.txt
@@ -38,10 +38,6 @@ the range for whenever the KSM daemon is started; even if the range
 cannot contain any pages which KSM could actually merge; even if
 MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
 
-If a region of memory must be split into at least one new MADV_MERGEABLE
-or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
-will exceed vm.max_map_count (see Documentation/sysctl/vm.txt).
-
 Like other madvise calls, they are intended for use on mapped areas of
 the user address space: they will report ENOMEM if the specified range
 includes unmapped gaps (though working on the intervening mapped areas),
diff --git a/Documentation/vm/remap_file_pages.txt b/Documentation/vm/remap_file_pages.txt
index f609142f406a..85985a89f05d 100644
--- a/Documentation/vm/remap_file_pages.txt
+++ b/Documentation/vm/remap_file_pages.txt
@@ -21,7 +21,3 @@ systems are widely available.
 The syscall is deprecated and replaced it with an emulation now. The
 emulation creates new VMAs instead of nonlinear mappings. It's going to
 work slower for rare users of remap_file_pages() but ABI is preserved.
-
-One side effect of emulation (apart from performance) is that user can hit
-vm.max_map_count limit more easily due to additional VMAs. See comment for
-DEFAULT_MAX_MAP_COUNT for more details on the limit.

Re: Possible gcc 4.8.5 bug about RELOC_HIDE marcro

2017-09-21 Thread Mikael Pettersson

Jia He writes:
 > I tried to build kernel 4.14-rc1 on a arm64 server in distro centos 7.3. 
 > The gcc version is 4.8.5

I have no input on the specifics of the issue, but please note that gcc-4.8
is no longer supported or maintained by upstream.  Even gcc-4.9 is EOL, and
gcc-5 will be EOL'd in a week or so.  Also I'll note that gcc-4.8 is still
fairly early wrt arm64 support, so bugs may be expected.

In short, please retry with gcc-6.4 or gcc-7.2.

Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-08-04 Thread Mikael Pettersson

David Miller writes:
 > From: Mikael Pettersson <mikpeli...@gmail.com>
 > Date: Thu, 3 Aug 2017 22:02:57 +0200
 > 
 > > With that in place the kernel booted fine.
 > > When I then ran the `poll' strace test binary, the OOPS was replaced by:
 > > 
 > > [  140.589913] _copy_from_user(fff000123c8dfa7c,   (null), 240) res 240
 > > [  140.753162] _copy_from_user(fff000123c8dfa7c, f7e4a000, 8) res 8
 > > [  140.824155] _copy_from_user(fff000123c8dfa7c, f7e49ff8, 16) res 18442240552407530112
 > > 
 > > That last `res' doesn't look good.
 > 
 > Please test this patch:
 > 
 > diff --git a/arch/sparc/lib/U3memcpy.S b/arch/sparc/lib/U3memcpy.S
 > index 54f9870..5a8cb37 100644
 > --- a/arch/sparc/lib/U3memcpy.S
 > +++ b/arch/sparc/lib/U3memcpy.S
 > @@ -145,13 +145,13 @@ ENDPROC(U3_retl_o2_plus_GS_plus_0x08)
 >  ENTRY(U3_retl_o2_and_7_plus_GS)
 >  and %o2, 7, %o2
 >  retl
 > - add %o2, GLOBAL_SPARE, %o2
 > + add %o2, GLOBAL_SPARE, %o0
 >  ENDPROC(U3_retl_o2_and_7_plus_GS)
 >  ENTRY(U3_retl_o2_and_7_plus_GS_plus_8)
 >  add GLOBAL_SPARE, 8, GLOBAL_SPARE
 >  and %o2, 7, %o2
 >  retl
 > - add %o2, GLOBAL_SPARE, %o2
 > + add %o2, GLOBAL_SPARE, %o0
 >  ENDPROC(U3_retl_o2_and_7_plus_GS_plus_8)
 >  #endif
 >  

Backing out my debugging patch and adding this one instead
gave me a working kernel that doesn't OOPS.  Thanks.

Tested-by: Mikael Pettersson <mikpeli...@gmail.com>

Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-08-03 Thread Mikael Pettersson

Sam Ravnborg writes:
 > On Tue, Aug 01, 2017 at 10:58:29PM +0200, Sam Ravnborg wrote:
 > > Hi Mikael.
 > > 
 > > I think this translates to the following code
 > > from linux/uaccess.h
 > > 
 > > first part is the inlined _copy_from_user()
 > > 
 > > > 
 > > > (gdb) x/10i do_sys_poll+0x80-16
 > > >0x516ed0 :  brz  %o0, 0x5170fc 
 > > if (unlikely(res))
 > > 
 > > >0x516ed4 :  mov  %o0, %o2
 > > >0x516ed8 :  sub  %i4, %o0, %i4
 > > >0x516edc :  clr  %o1
 > > >0x516ee0 :  call  0x7570b8 
 > > >0x516ee4 :  add  %l3, %i4, %o0
 > > memset(to + (n - res), 0, res);
 > 
 > And memset calls down to bzero, where %o0=buf, %o1=len
 > 
 > %o0 = 0xc
 > %o1 = 0xfff000123c897a80
 > %o2 = 0x0
 > %o3 = 0xc
 > 
 > So from this we know that:
 > res = 0xfff000123c897a80
 > to + (n - 0xfff000123c897a80) = 0xc
 > 
 > The value "fff000123c897a80" really looks like a constructed address
 > from somewhere in the strace code, and where this constructed address
 > is used to provoke some unusual behaviour.
 > The "fff0" part may be a sparc thing.
 > 
 > So far the analysis seems to match the initial conclusion that
 > we in this special case try to zero out the remaining memory
 > based on the return value of raw_copy_from_user.
 > And therefore we use the return value (res) which triggers the oops.
 > 
 > So rather than manipulating with the assembler code as suggested
 > in the previous mail this simpler patch could be tested:
 > 
 > diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
 > index acdd6f915a8d..13d299ff1f21 100644
 > --- a/include/linux/uaccess.h
 > +++ b/include/linux/uaccess.h
 > @@ -115,7 +115,7 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
 >  		res = raw_copy_from_user(to, from, n);
 >  	}
 >  	if (unlikely(res))
 > -		memset(to + (n - res), 0, res);
 > +		; /* memset(to + (n - res), 0, res); */
 >  	return res;
 >  }
 >  #else
 > 
 > 
 > It would be good to know if this makes the oops go away.
 > 
 > And maybe you could try to print the parameters
 > supplied to _copy_from_user in case memset would be called,
 > so we have an idea what error path is taken.
 > 
 > I have tried to decipher U3memcpy.S - but that is non-trivial.
 > So it would be good with a bit more data to verify the theory.

I applied the following:

--- linux-4.13-rc3/include/linux/uaccess.h.~1~  2017-08-01 08:49:48.397819726 +0200
+++ linux-4.13-rc3/include/linux/uaccess.h  2017-08-03 21:33:11.009634421 +0200
@@ -4,6 +4,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define VERIFY_READ 0
 #define VERIFY_WRITE 1
@@ -115,7 +117,9 @@ _copy_from_user(void *to, const void __u
 		res = raw_copy_from_user(to, from, n);
 	}
 	if (unlikely(res))
-		memset(to + (n - res), 0, res);
+	{
+		printk_ratelimited("_copy_from_user(%p, %p, %lu) res %lu\n", to, from, n, res);
+	}
 	return res;
 }
 #else

With that in place the kernel booted fine.
When I then ran the `poll' strace test binary, the OOPS was replaced by:

[  140.589913] _copy_from_user(fff000123c8dfa7c,   (null), 240) res 240
[  140.753162] _copy_from_user(fff000123c8dfa7c, f7e4a000, 8) res 8
[  140.824155] _copy_from_user(fff000123c8dfa7c, f7e49ff8, 16) res 18442240552407530112

That last `res' doesn't look good.
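
A quick way to see what that number is: written in hexadecimal it is
0xfff000123c8dfa80, which is the destination pointer from the same log line
plus 4, so the value looks like an address rather than a count of bytes left
to copy. The small C sketch below redoes that arithmetic using only the
three numbers printed above. The function names are made up for this
illustration and are not kernel APIs.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the third printk line above. */
#define LOGGED_TO  UINT64_C(0xfff000123c8dfa7c)
#define LOGGED_N   UINT64_C(16)
#define LOGGED_RES UINT64_C(18442240552407530112)

/* The return value is the number of bytes NOT copied,
 * so a sane result can never exceed the requested length. */
int res_is_plausible(uint64_t n, uint64_t res)
{
	return res <= n;
}

/* Signed distance from the destination pointer to the returned value. */
int64_t res_minus_to(uint64_t to, uint64_t res)
{
	return (int64_t)(res - to);
}

/* Print the decoded values; call this from main() or a test harness. */
void print_decoded_res(void)
{
	printf("res       = 0x%016" PRIx64 "\n", LOGGED_RES);
	printf("plausible = %d\n", res_is_plausible(LOGGED_N, LOGGED_RES));
	printf("res - to  = %" PRId64 "\n", res_minus_to(LOGGED_TO, LOGGED_RES));
}
```

The first two log lines pass the same check (240 <= 240 and 8 <= 8); only
the third one fails it.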

/Mikael

Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-08-01 Thread Mikael Pettersson

David Miller writes:
 > From: Anatoly Pugachev 
 > Date: Tue, 1 Aug 2017 01:01:47 +0300
 > 
 > > I don't know how to run on a running kernel , but as I understood:
 > > 
 > > root@v215:strace# gzip -dc /boot/vmlinuz-4.12.0 > vmlinux
 > > root@v215:strace# gdb -q vmlinux
 > > Reading symbols from vmlinux...(no debugging symbols found)...done.
 > > (gdb) x/20i 0x49b294 - 16
 > 
 > Unfortunately you need to do this on the built kernel image before it
 > has been stripped of all of its symbols.
 > 
 > Mikael, you built your kernels right?
 > 
 > Go into one of your OOPS's and extract the "RPC: " hex value, and run
 > the gdb command:
 > 
 > bash$ cd src/linux
 > bash$ gdb ./vmlinux
 > (gdb) x/10i 0x${RPC_HEX_VALUE} - 16
 > 
 > Thanks.

Ok, with 4.13-rc3 I got

[  240.085153] Unable to handle kernel NULL pointer dereference
[  240.142397] tsk->{mm,active_mm}->context = 044a
[  240.198531] tsk->{mm,active_mm}->pgd = fff23c784000
[  240.250112]   \|/  \|/
 "@'/ .. \`@"
 /_| \__/ |_\
\__U_/
[  240.374879] poll(724): Oops [#1]
[  240.400132] CPU: 0 PID: 724 Comm: poll Not tainted 4.13.0-rc3 #1
[  240.462002] task: fff000123cc71e00 task.stack: fff000123c894000
[  240.522717] TSTATE: 004411001605 TPC: 007570fc TNPC: 00757110 Y: Not tainted
[  240.634921] TPC: <__bzero+0x20/0xc0>
[  240.664747] g0: fff000123c897081 g1:  g2:  g3: 008ca100
[  240.762068] g4: fff000123cc71e00 g5: fff23ef44000 g6: fff000123c894000 g7: 0008
[  240.859389] o0: 000c o1: fff000123c897a80 o2:  o3: 000c
[  240.956718] o4: fff000123c897a7c o5: 00fb sp: fff000123c897181 ret_pc: 00516ee0
[  241.058627] RPC: 
[  241.094166] l0: 0002 l1: 014000c0 l2: 03fe l3: fff000123c897a7c
[  241.191506] l4:  l5:  l6: 006d l7: ffea
[  241.288822] i0: f7d93ff8 i1: 0002 i2: fff000123c897e90 i3: fff000123c897a70
[  241.386141] i4: 000fffedc3768590 i5: fff000123c897a70 i6: fff000123c8975e1 i7: 005177f8
[  241.483468] I7: 
[  241.513292] Call Trace:
[  241.528265]  [005177f8] SyS_poll+0x74/0xd0
[  241.574140]  [004061b4] linux_sparc_syscall32+0x34/0x60
[  241.634847] Disabling lock debugging due to kernel taint
[  241.687555] Caller[005177f8]: SyS_poll+0x74/0xd0
[  241.740276] Caller[004061b4]: linux_sparc_syscall32+0x34/0x60
[  241.807855] Caller[00010a20]: 0x10a20
[  241.847983] Instruction DUMP:
[  241.847987]  c56a2000 
[  241.869824]  808a2003 
[  241.883651]  02480006 
[  241.897475] 
[  241.911207]  90022001 
[  241.925032]  808a2003 
[  241.938755]  1247fffd 
[  241.952484]  92226001 
[  241.966310]  808a2007 

so the RPC should be do_sys_poll+0x80, right?  Then gdb on the original vmlinux said:

(gdb) x/10i do_sys_poll+0x80-16
   0x516ed0 :  brz  %o0, 0x5170fc 
   0x516ed4 :  mov  %o0, %o2
   0x516ed8 :  sub  %i4, %o0, %i4
   0x516edc :  clr  %o1
   0x516ee0 :  call  0x7570b8 
   0x516ee4 :  add  %l3, %i4, %o0
   0x516ee8 :  b  %xcc, 0x5170b0 
   0x516eec :  mov  -14, %l7
   0x516ef0 :  mov  %l2, %o0
   0x516ef4 :  movleu  %xcc, %l0, %o0
(gdb)
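
As a cross-check, the register dump and the disassembly agree with each
other. The C sketch below (helper names are made up for illustration)
replays the "sub %i4, %o0, %i4" and the delay-slot "add %l3, %i4, %o0" on
the values from the oops. It assumes that %o1 in __bzero holds the length
that was passed on from memset, and that the short register values in the
dump, such as "000c", are simply small numbers. Under those assumptions the
requested length n comes out as 16, the bogus length is "to" plus 4, and the
destination handed to memset wraps around to 0xc, which matches %o0 in the
dump.

```c
#include <stdint.h>

/* Register values copied from the oops above. */
#define OOPS_L3 UINT64_C(0xfff000123c897a7c) /* %l3: the "to" buffer */
#define OOPS_O1 UINT64_C(0xfff000123c897a80) /* %o1 in __bzero: length, i.e. res */
#define OOPS_I4 UINT64_C(0x000fffedc3768590) /* %i4 after the sub: n - res */

/* Undo "sub %i4, %o0, %i4" to recover n (arithmetic is modulo 2^64). */
uint64_t recover_n(uint64_t n_minus_res, uint64_t res)
{
	return n_minus_res + res;
}

/* Delay-slot "add %l3, %i4, %o0": the destination passed to memset. */
uint64_t memset_dest(uint64_t to, uint64_t n_minus_res)
{
	return to + n_minus_res;
}
```

An n of 16 is also what two 8-byte pollfd entries would need, which fits
i1 = 2 in the dump if that register still holds the nfds argument.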

/Mikael

Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-07-31 Thread Mikael Pettersson

Mikael Pettersson writes:
 > Anatoly Pugachev writes:
 >  > On Fri, Jul 28, 2017 at 11:45 AM, Mikael Pettersson
 >  > <mikpeli...@gmail.com> wrote:
 >  > > It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but 
 > according to the
 >  > > build log the following should do it:
 >  > >
 >  > > export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security 
 > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
 > --param=ssp-buffer-size=4 -grecord-gcc-switches  -m32 -mcpu=ultrasparc'
 >  > > ./configure --build=sparcv9-unknown-linux-gnu 
 > --host=sparcv9-unknown-linux-gnu --program-prefix= 
 > --disable-dependency-tracking --prefix=/usr --exec-prefix=/u
 >  > > sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc 
 > --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib 
 > --libexecdir=/usr/libexec --local
 >  > > statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man 
 > --infodir=/usr/share/info
 >  > > make -j2
 >  > > make -j2 -k check VERBOSE=1
 >  > 
 >  > can't reproduce it here on debian sparc64 LDOM:
 > 
 > DaveM was also unable to reproduce it.
 > 
 > I'll be investigating a possible kernel miscompile next.

I don't think it's a miscompile.

First I recompiled 4.13-rc2 with each of gcc-7, gcc-6, and gcc-5, each
bootstrapped and regtested from the head of the respective FSF GCC branch:
no change, kernel 4.11 works while kernels >= 4.12 OOPS.  So a miscompile
seems unlikely.

Then I ran a git bisect between v4.11 (good) and v4.12 (bad), booting
each kernel and trying the problematic strace test binaries.  That
identified the following as the first bad commit:

commit 31af2f36d50e3b9b2fb7f17aa430c11c91f946c4
Author: Al Viro <v...@zeniv.linux.org.uk>
Date:   Tue Mar 21 17:04:45 2017 -0400

    sparc: switch to RAW_COPY_USER

    ... and drop zeroing in sparc32.

    Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>

That touches the CPU model specific assembly code in arch/sparc/lib/ for
copy_{to,from}_user and changes how it's wired into the rest of the kernel.
There's different code for different UltraSPARC and Niagara generations,
so if there is a bug in e.g. the USIII code, you won't see it on Niagara.

Unfortunately I don't see anything obviously wrong in Al's patch...

/Mikael

Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-07-29 Thread Mikael Pettersson

Anatoly Pugachev writes:
 > On Fri, Jul 28, 2017 at 11:45 AM, Mikael Pettersson
 > <mikpeli...@gmail.com> wrote:
 > > It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but 
 > > according to the
 > > build log the following should do it:
 > >
 > > export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security 
 > > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
 > > --param=ssp-buffer-size=4 -grecord-gcc-switches  -m32 -mcpu=ultrasparc'
 > > ./configure --build=sparcv9-unknown-linux-gnu 
 > > --host=sparcv9-unknown-linux-gnu --program-prefix= 
 > > --disable-dependency-tracking --prefix=/usr --exec-prefix=/u
 > > sr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc 
 > > --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib 
 > > --libexecdir=/usr/libexec --local
 > > statedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man 
 > > --infodir=/usr/share/info
 > > make -j2
 > > make -j2 -k check VERBOSE=1
 > 
 > can't reproduce it here on debian sparc64 LDOM:

DaveM was also unable to reproduce it.

I'll be investigating a possible kernel miscompile next.


 > 
 > used git version of strace ( https://github.com/strace/strace )
 > 
 > strace$ make -j24 check VERBOSE=1
 > ...
 > 
 > Testsuite summary for strace 4.18.0.134.805d
 > 
 > # TOTAL: 443
 > # PASS:  389
 > # SKIP:  40
 > # XFAIL: 0
 > # FAIL:  14
 > # XPASS: 0
 > # ERROR: 0
 > 
 > 
 > while in kernel logs (journalctl -k -f):
 > 
 > Jul 29 12:49:22 ttip kernel: mmap: remap_file_page (77341) uses
 > deprecated remap_file_pages() syscall. See
 > Documentation/vm/remap_file_pages.txt.
 > Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses
 > deprecated v2 capabilities in a way that may be insecure
 > Jul 29 12:49:22 ttip kernel: capability: warning: `caps' uses 32-bit
 > capabilities (legacy support in use)
 > Jul 29 12:49:22 ttip kernel: [ cut here ]
 > Jul 29 12:49:22 ttip kernel: WARNING: CPU: 18 PID: 78388 at
 > arch/sparc/kernel/sys_sparc32.c:150
 > compat_SyS_sparc_sigaction+0x3c/0x60
 > Jul 29 12:49:22 ttip kernel: Modules linked in: tcp_diag inet_diag
 > xfrm_user xfrm_algo nfnetlink netlink_diag xt_tcpudp xt_multiport
 > xt_conntrack tun iptable_filter iptable_nat nf_conntrack_ipv4
 > nf_defrag_ipv4 nf_nat_ipv4 xfs camellia_sparc64 des_sparc64
 > des_generic aes_sparc64 n2_rng md5_sparc64 rng_core flash
 > sha512_sparc64 sha256_sparc64 sha1_sparc64 nf_nat_pptp
 > nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat
 > nf_conntrack libcrc32c crc32c_generic ip_tables x_tables autofs4 ext4
 > crc16 mbcache jbd2 crc32c_sparc64
 > Jul 29 12:49:22 ttip kernel: CPU: 18 PID: 78388 Comm: sigaction Not
 > tainted 4.13.0-rc2-00220-g0a07b238e5f4 #376
 > Jul 29 12:49:22 ttip kernel: Call Trace:
 > Jul 29 12:49:22 ttip kernel:  [0046c074] __warn+0xb4/0xe0
 > Jul 29 12:49:22 ttip kernel:  [0046c120] warn_slowpath_null+0x20/0x40
 > Jul 29 12:49:22 ttip kernel:  [0044b7bc]
 > compat_SyS_sparc_sigaction+0x3c/0x60
 > Jul 29 12:49:22 ttip kernel:  [004061d4] 
 > linux_sparc_syscall32+0x34/0x60
 > Jul 29 12:49:22 ttip kernel: ---[ end trace 1ad5184278304e6d ]---
 > Jul 29 12:49:25 ttip kernel: pc[83378]: segfault at 7974 ip
 > 7974 (rpc 796c) sp ffdd9438 error
 > 30001 in pc[7001+2000]

--

Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-07-28 Thread Mikael Pettersson

David Miller writes:
 > From: Mikael Pettersson <mikpeli...@gmail.com>
 > Date: Thu, 27 Jul 2017 21:45:25 +0200
 > 
 > > Attempting to build strace-4.18 as sparcv9 code and run its test suite
 > > on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails
 > > reliably in three test cases (sched.gen, sched_xetattr.gen, and poll)
 > > because two test binaries (sched_xetattr and poll) OOPS the kernel and
 > > get killed.  Sample dmesg from 4.13-rc2:
 > > 
 > > [42912.270398] Unable to handle kernel NULL pointer dereference
 > > [42912.327717] tsk->{mm,active_mm}->context = 136a
 > > [42912.383789] tsk->{mm,active_mm}->pgd = fff227db4000
 > > [42912.435247]   \|/  \|/
 > >  "@'/ .. \`@"
 > >  /_| \__/ |_\
 > > \__U_/
 > > [42912.559982] sched_xetattr(21866): Oops [#1]
 > > [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 
 > > 4.13.0-rc2 #1
 > > [42912.672138] task: fff229a5c380 task.stack: fff227dec000
 > > [42912.732876] TSTATE: 004411001603 TPC: 007570fc TNPC: 
 > > 00757110 Y: Not tainted
 > > [42912.845079] TPC: <__bzero+0x20/0xc0>
 > > [42912.874870] g0:  g1:  g2: 
 > > 0030 g3: 008ca100
 > > [42912.972120] g4: fff229a5c380 g5: fff23ef44000 g6: 
 > > fff227dec000 g7: 0030
 > > [42913.069446] o0: 0030 o1: fff227defe70 o2: 
 > >  o3: 0030
 > > [42913.166765] o4: fff227defe70 o5:  sp: 
 > > fff227def5c1 ret_pc: 00474fa4
 > > [42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150>
 > 
 > This looks really strange.  It is a memset() call with the buffer pointer
 > and length arguments reversed.
 > 
 > What exact command did you give to configure and build strace-4.18 so that
 > I can try to reproduce this?

It's an rpmbuild --rebuild of Fedora's strace-4.18-1.fc24.src.rpm, but
according to the build log the following should do it:

export CFLAGS='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches  -m32 -mcpu=ultrasparc'
./configure --build=sparcv9-unknown-linux-gnu --host=sparcv9-unknown-linux-gnu \
    --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr \
    --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share \
    --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec \
    --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man \
    --infodir=/usr/share/info
make -j2
make -j2 -k check VERBOSE=1


/Mikael

strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-07-27 Thread Mikael Pettersson

Attempting to build strace-4.18 as sparcv9 code and run its test suite
on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails
reliably in three test cases (sched.gen, sched_xetattr.gen, and poll)
because two test binaries (sched_xetattr and poll) OOPS the kernel and
get killed.  Sample dmesg from 4.13-rc2:

[42912.270398] Unable to handle kernel NULL pointer dereference
[42912.327717] tsk->{mm,active_mm}->context = 136a
[42912.383789] tsk->{mm,active_mm}->pgd = fff227db4000
[42912.435247]   \|/  \|/
 "@'/ .. \`@"
 /_| \__/ |_\
\__U_/
[42912.559982] sched_xetattr(21866): Oops [#1]
[42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1
[42912.672138] task: fff229a5c380 task.stack: fff227dec000
[42912.732876] TSTATE: 004411001603 TPC: 007570fc TNPC: 00757110 Y: Not tainted
[42912.845079] TPC: <__bzero+0x20/0xc0>
[42912.874870] g0:  g1:  g2: 0030 g3: 008ca100
[42912.972120] g4: fff229a5c380 g5: fff23ef44000 g6: fff227dec000 g7: 0030
[42913.069446] o0: 0030 o1: fff227defe70 o2:  o3: 0030
[42913.166765] o4: fff227defe70 o5:  sp: fff227def5c1 ret_pc: 00474fa4
[42913.268664] RPC: <SyS_sched_setattr+0xb0/0x150>
[42913.311046] l0: f7b6caa8 l1: cccd l2: ffc2e7d4 l3: f7b6c888
[42913.408293] l4:  l5:  l6:  l7: f7ba2000
[42913.505627] i0:  i1: f79f1ffc i2:  i3: 
[42913.602940] i4: fff227defe70 i5: fff227defe70 i6: fff227def6a1 i7: 004061b4
[42913.700268] I7: 
[42913.744966] Call Trace:
[42913.759938]  [004061b4] linux_sparc_syscall32+0x34/0x60
[42913.820656] Disabling lock debugging due to kernel taint
[42913.873374] Caller[004061b4]: linux_sparc_syscall32+0x34/0x60
[42913.940953] Caller[00010ed0]: 0x10ed0
[42913.981081] Instruction DUMP:
[42913.981085]  c56a2000 
[42914.002817]  808a2003 
[42914.016643]  02480006 
[42914.030363] 
[42914.044094]  90022001 
[42914.057816]  808a2003 
[42914.071539]  1247fffd 
[42914.085269]  92226001 
[42914.098992]  808a2007 

[42914.471525] Unable to handle kernel NULL pointer dereference
[42914.528830] tsk->{mm,active_mm}->context = 17cd
[42914.584862] tsk->{mm,active_mm}->pgd = fff227b78000
[42914.636319]   \|/  \|/
 "@'/ .. \`@"
 /_| \__/ |_\
\__U_/
[42914.761013] sched_xetattr(22483): Oops [#2]
[42914.798837] CPU: 0 PID: 22483 Comm: sched_xetattr Tainted: G  D 4.13.0-rc2 #1
[42914.889222] task: fff000123c73bc00 task.stack: fff0001238998000
[42914.949915] TSTATE: 004411001603 TPC: 007570fc TNPC: 00757110 Y: Tainted: G  D
[42915.078076] TPC: <__bzero+0x20/0xc0>
[42915.107875] g0:  g1:  g2: 0030 g3: 008ca100
[42915.205205] g4: fff000123c73bc00 g5: fff23ef44000 g6: fff0001238998000 g7: 0030
[42915.302532] o0: 0030 o1: fff000123899be70 o2:  o3: 0030
[42915.399851] o4: fff000123899be70 o5:  sp: fff000123899b5c1 ret_pc: 00474fa4
[42915.501731] RPC: 
[42915.544033] l0: f784caa8 l1: cccd l2: ff91c7d4 l3: f784c888
[42915.641289] l4:  l5:  l6:  l7: f7882000
[42915.738582] i0:  i1: f76d1ffc i2:  i3: 
[42915.835827] i4: fff000123899be70 i5: fff000123899be70 i6: fff000123899b6a1 i7: 004061b4
[42915.933160] I7: 
[42915.977822] Call Trace:
[42915.992698]  [004061b4] linux_sparc_syscall32+0x34/0x60
[42916.053335] Caller[004061b4]: linux_sparc_syscall32+0x34/0x60
[42916.120934] Caller[00010ed0]: 0x10ed0
[42916.161052] Instruction DUMP:
[42916.161056]  c56a2000 
[42916.182878]  808a2003 
[42916.196607]  02480006 
[42916.210330] 
[42916.224052]  90022001 
[42916.237781]  808a2003 
[42916.251502]  1247fffd 
[42916.265224]  92226001 
[42916.278955]  808a2007 

[42918.071476] [ cut here ]
[42918.115146] WARNING: CPU: 0 PID: 23177 at arch/sparc/kernel/sys_sparc32.c:150 compat_SyS_sparc_sigaction+0x34/0x4c
[42918.234167] Modules linked in: af_packet ipv6 hid_generic tg3 hwmon i2c_ali1535 ohci_pci ptp ohci_hcd evdev i2c_core pps_core flash sr_mod cdrom pata_ali libata
[42918.405845] CPU: 0 PID: 23177 Comm: sigaction Tainted: G  D 4.13.0-rc2 #1

[42914.798837] CPU: 0 PID: 22483 Comm: sched_xetattr Tainted: G  D 
4.13.0-rc2 #1
[42914.889222] task: fff000123c73bc00 task.stack: fff0001238998000
[42914.949915] TSTATE: 004411001603 TPC: 007570fc TNPC: 
00757110 Y: Tainted: G  D
[42915.078076] TPC: <__bzero+0x20/0xc0>
[42915.107875] g0:  g1:  g2: 0030 
g3: 008ca100
[42915.205205] g4: fff000123c73bc00 g5: fff23ef44000 g6: fff0001238998000 
g7: 0030
[42915.302532] o0: 0030 o1: fff000123899be70 o2:  
o3: 0030
[42915.399851] o4: fff000123899be70 o5:  sp: fff000123899b5c1 
ret_pc: 00474fa4
[42915.501731] RPC: 
[42915.544033] l0: f784caa8 l1: cccd l2: ff91c7d4 
l3: f784c888
[42915.641289] l4:  l5:  l6:  
l7: f7882000
[42915.738582] i0:  i1: f76d1ffc i2:  
i3: 
[42915.835827] i4: fff000123899be70 i5: fff000123899be70 i6: fff000123899b6a1 
i7: 004061b4
[42915.933160] I7: 
[42915.977822] Call Trace:
[42915.992698]  [004061b4] linux_sparc_syscall32+0x34/0x60
[42916.053335] Caller[004061b4]: linux_sparc_syscall32+0x34/0x60
[42916.120934] Caller[00010ed0]: 0x10ed0
[42916.161052] Instruction DUMP:
[42916.161056]  c56a2000 
[42916.182878]  808a2003 
[42916.196607]  02480006 
[42916.210330] 
[42916.224052]  90022001 
[42916.237781]  808a2003 
[42916.251502]  1247fffd 
[42916.265224]  92226001 
[42916.278955]  808a2007 

[42918.071476] [ cut here ]
[42918.115146] WARNING: CPU: 0 PID: 23177 at arch/sparc/kernel/sys_sparc32.c:150 compat_SyS_sparc_sigaction+0x34/0x4c
[42918.234167] Modules linked in: af_packet ipv6 hid_generic tg3 hwmon i2c_ali1535 ohci_pci ptp ohci_hcd evdev i2c_core pps_core flash sr_mod cdrom pata_ali libata
[42918.405845] CPU: 0 PID: 23177 Comm: sigaction Tainted: G  D 4.13.0-rc2 #1
[42918.491645] Call Trace:
[42918.506518]  [00455b18] __warn+0xb4/0xc4
[42918.549976]  [004449e4]
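
When a test suite triggers several oopses in a row, as in the log above, it helps to tally them per test binary. The following is a minimal sketch, assuming the dmesg capture is available as text and that each oops is announced by a line of the form `comm(pid): Oops [#N]`, as in this log; the helper name `tally_oopses` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "[42912.559982] sched_xetattr(21866): Oops [#1]".
OOPS_RE = re.compile(r"(?P<comm>\S+)\((?P<pid>\d+)\): Oops \[#(?P<seq>\d+)\]")


def tally_oopses(dmesg_text):
    """Count oopses per command name in a dmesg capture (hypothetical helper)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = OOPS_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


# Two oops header lines copied from the log above.
sample = """\
[42912.559982] sched_xetattr(21866): Oops [#1]
[42914.761013] sched_xetattr(22483): Oops [#2]
"""
print(tally_oopses(sample))
```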

Re: v4.13-rc2: usb mouse stopped working?

2017-07-25 Thread Mikael Pettersson

Jiri Kosina writes:
 > On Mon, 24 Jul 2017, Pavel Machek wrote:
 > 
 > > On thinkpad x220, USB mouse stopped working in v4.13-rc2. v4.12 was
 > > ok, iirc.
 > > 
 > > Now, USB mouse is so common hw that I may have something wrong in my
 > > config...? But I did not change anything there.
 > 
 > Well, your particular USB mouse requires an always-poll quirk, so it's not 
 > *that* common; therefore ...
 > 
 > > In v4.9:
 > > 
 > > [   87.064408] input: Logitech USB Optical Mouse as
 > > /devices/pci:00/:00:1d.0/usb2/2-1/2-1.1/2-1.1.1/2-1.1.1.1/2-1.1.1.1:1.0/0003:046D:C05A.0005/input/input18
 > > [   87.065951] hid-generic 0003:046D:C05A.0005: input,hidraw0: USB HID
 > > v1.11 Mouse [Logitech USB Optical Mouse] on
 > > usb-:00:1d.0-1.1.1.1/input0
 > > pavel@duo:~$
 > > 
 > > v4.13-rc2:
 > > 
 > > [   87.886810] usb 2-1.1.1.1: new low-speed USB device number 10 using 
 > > ehci-pci
 > > [   88.019543] usb 2-1.1.1.1: New USB device found, idVendor=046d, 
 > > idProduct=c05a
 > > [   88.019561] usb 2-1.1.1.1: New USB device strings: Mfr=1, Product=2, 
 > > SerialNumber=0
 > > [   88.019572] usb 2-1.1.1.1: Product: USB Optical Mouse
 > > [   88.019581] usb 2-1.1.1.1: Manufacturer: Logitech
 > > [   88.027276] input: Logitech USB Optical Mouse as 
 > > /devices/pci:00/:00:1d.0/usb2/2-1/2-1.1/2-1.1.1/2-1.1.1.1/2-1.1.1.1:1.0/0003:046D:C05A.0005/input/input18
 > > [   88.027825] hid-generic 0003:046D:C05A.0005: input,hidraw1: USB HID
 > > v1.11 Mouse [Logitech USB Optical Mouse] on
 > > usb-:00:1d.0-1.1.1.1/input0
 > 
 > ... this is most likely caused by e399396a6b0, and a fix for that is queued 
 > in hid.git as cf601774c9f ("HID: usbhid: fix "always poll" quirk"); I'm 
 > planning to send it to Linus tomorrow, but if you can double-check that 
 > it does fix your issue as well, that'd be appreciated.

I had a similar problem (a Logitech USB Optical Mouse being completely dead
with 4.13-rc[12]), and cf601774c9f fixed it.

/Mikael

Re: 4.7-rc6, ext4, sparc64: Unable to handle kernel paging request at ...

2016-07-10 Thread Mikael Pettersson

Meelis Roos writes:
 > > > Just got this on bootup of my Sun T2000:
 > > >...
 > > > I have not seen it before, this includes 4.6.0 4.6.0-08907-g7639dad
 > > > 4.7.0-rc1-00094-g6b15d66 4.7.0-rc4-00014-g67016f6.
 > > >
 > > > It is not reproducible, did not appear on next reboot of the same
 > > > kernel.
 > > 
 > > My T5120 boots OK with 4.7.0-rc6, rootfs being on ext4.
 > 
 > My T5120 and many other sparc64 machines also boot fine, most of them 
 > using ext4, others ext3 with ext4 driver.
 > 
 > However, I also got a very similar oops from T1000:
 > 
 > [   55.251101] Unable to handle kernel paging request at virtual address 
 > fe42a000
 > [   55.251348] tsk->{mm,active_mm}->context = 0083
 > [   55.251533] tsk->{mm,active_mm}->pgd = 8001f6224000
 > [   55.251719]   \|/  \|/
 >  "@'/ .. \`@"
 >  /_| \__/ |_\
 > \__U_/
 > [   55.252038] systemd-udevd(268): Oops [#1]
 > [   55.252274] CPU: 9 PID: 268 Comm: systemd-udevd Not tainted 4.7.0-rc6 #26
 > [   55.252367] task: 8001f6064380 ti: 8001f620c000 task.ti: 
 > 8001f620c000
 > [   55.252497] TSTATE: 000811001604 TPC: 00649380 TNPC: 
 > 00649384 Y: Not tainted
 > [   55.252651] TPC: <__radix_tree_lookup+0x60/0x1a0>
...

A few weeks ago I got a similar oops with 4.7.0-rc2 on a Sun Blade 2500 (dual 
USIIIi):

Jun 12 18:40:26 lauter kernel: Unable to handle kernel paging request at 
virtual address a000
Jun 12 18:40:26 lauter kernel: tsk->{mm,active_mm}->context = 17e3
Jun 12 18:40:26 lauter kernel: tsk->{mm,active_mm}->pgd = fff23edb8000
Jun 12 18:40:26 lauter kernel:   \|/  \|/
Jun 12 18:40:26 lauter kernel:   "@'/ .. \`@"
Jun 12 18:40:26 lauter kernel:   /_| \__/ |_\
Jun 12 18:40:26 lauter kernel:  \__U_/
Jun 12 18:40:26 lauter kernel: gnat1(19464): Oops [#1]
Jun 12 18:40:26 lauter kernel: CPU: 0 PID: 19464 Comm: gnat1 Not tainted 
4.7.0-rc2 #1
Jun 12 18:40:26 lauter kernel: task: fff23ebd1440 ti: fff000123c36 
task.ti: fff000123c36
Jun 12 18:40:27 lauter kernel: TSTATE: 11001604 TPC: 005db288 
TNPC: 005db28c Y: Not tainted
Jun 12 18:40:27 lauter kernel: TPC: <__radix_tree_lookup+0x44/0xd4>
Jun 12 18:40:27 lauter kernel: g0: 3000 g1: a6d9 g2: 
0001 g3: 
Jun 12 18:40:27 lauter kernel: g4: fff23ebd1440 g5: fff23ef7a000 g6: 
fff000123c36 g7: 
Jun 12 18:40:27 lauter kernel: o0: 000c o1: fff000123c363980 o2: 
fff000123c363988 o3: fff000123c363968
Jun 12 18:40:27 lauter kernel: o4: 0020 o5: fff23fffefc0 sp: 
fff000123c3630d1 ret_pc: fff232e42540
Jun 12 18:40:27 lauter kernel: RPC: <0xfff232e42540>
Jun 12 18:40:27 lauter kernel: l0: 024213ca l1:  l2: 
 l3: 
Jun 12 18:40:27 lauter kernel: l4:  l5:  l6: 
 l7: 
Jun 12 18:40:27 lauter kernel: i0: fff0001225e56900 i1: 0441 i2: 
 i3: 
Jun 12 18:40:27 lauter kernel: i4: a6d8 i5: fff232e42540 i6: 
fff000123c363191 i7: 004bf680
Jun 12 18:40:27 lauter kernel: I7: <__do_page_cache_readahead+0x78/0x200>
Jun 12 18:40:27 lauter kernel: Call Trace:
Jun 12 18:40:27 lauter kernel:  [004bf680] 
__do_page_cache_readahead+0x78/0x200
Jun 12 18:40:27 lauter kernel:  [004b5990] filemap_fault+0x164/0x4c4
Jun 12 18:40:27 lauter kernel:  [00562a84] ext4_filemap_fault+0x1c/0x38
Jun 12 18:40:27 lauter kernel:  [004d2c38] __do_fault+0x58/0xdc
Jun 12 18:40:27 lauter kernel:  [004d611c] handle_mm_fault+0x604/0xe5c
Jun 12 18:40:27 lauter kernel:  [00448288] do_sparc64_fault+0x228/0x684
Jun 12 18:40:27 lauter kernel:  [00407bcc] 
sparc64_realfault_common+0x10/0x20
Jun 12 18:40:28 lauter kernel: Disabling lock debugging due to kernel taint
Jun 12 18:40:28 lauter kernel: Caller[004bf680]: 
__do_page_cache_readahead+0x78/0x200
Jun 12 18:40:28 lauter kernel: Caller[004b5990]: 
filemap_fault+0x164/0x4c4
Jun 12 18:40:28 lauter kernel: Caller[00562a84]: 
ext4_filemap_fault+0x1c/0x38
Jun 12 18:40:28 lauter kernel: Caller[004d2c38]: __do_fault+0x58/0xdc
Jun 12 18:40:28 lauter kernel: Caller[004d611c]: 
handle_mm_fault+0x604/0xe5c
Jun 12 18:40:28 lauter kernel: Caller[00448288]: 
do_sparc64_fault+0x228/0x684
Jun 12 18:40:28 lauter kernel: Caller[00407bcc]: 
sparc64_realfault_common+0x10/0x20
Jun 12 18:40:28 lauter kernel: Caller[006ee248]: 
ip_options_compile+0x288/0x60c
Jun 12 18:40:28 lauter kernel: Instruction DUMP: 80a06001  0267fff2  b8087ffe 
 83365001  8208603f  84006004  83287003  8528b003

It has only happened that one time, so far.

Re: SIGSYS annoyance

2016-06-10 Thread Mikael Pettersson

Andy Lutomirski writes:
 > On Mon, Jun 6, 2016 at 9:03 AM, Kees Cook wrote:
 > > On Fri, Jun 3, 2016 at 10:16 PM, Andy Lutomirski wrote:
 > >> https://bugzilla.mozilla.org/show_bug.cgi?id=1176099
 > >>
 > >> Should SIGSYS be delivered to the handler even if blocked?  What, if
 > >> anything, does POSIX say?  All I can find is in pthread_sigmask(3p):
 > >>
 > >> If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated
 > >> while they are blocked, the result is undefined, unless the signal was
 > >> generated by the action of another process, or by one of the functions
 > >> kill(), pthread_kill(), raise(), or sigqueue().
 > >>
 > >> It would be easy enough to change our behavior so that we deliver the
 > >> signal even if it's blocked or to at least add a flag so that users
 > >> can request that behavior.
 > >
 > > I had trouble following that bug. It sounded like glib just needed a
 > > way to define its signal mask, and that's what they ended up
 > > implementing?
 > >
 > > I think the current behavior is correct. SIGSYS is being generated by
 > > the running process (i.e. the seccomp filter) and if it has a handler
 > > but the signal is blocked, we should treat it as uncaught and kill. On
 > > the other hand, it could be seen like "raise", in which case the
 > > blocking should be ignored? Is there an active problem somewhere here?
 > > It seems like the referenced bug has been fixed already.
 > 
 > Agreed.
 > 
 > It could make sense to have a new sigaction flag SA_FORCE: when set,
 > if a non-default handler is installed, the signal is blocked, and the
 > signal is triggered synchronously (forced), then the handler will be
 > called.  But that isn't specific to seccomp.

Blocking a signal is a very deliberate act.  If some piece of code wants
to force-deliver it, it can unblock it first.  IOW, I don't see the need
for this SA_FORCE thing.
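
The "unblock it first" approach can be sketched in user space. This is only an illustration under stated assumptions: it uses SIGUSR1 and an ordinary raise as stand-ins (a seccomp-forced SIGSYS cannot be reproduced this simply), it relies on Python 3.8+ for `signal.raise_signal`, and it must run in the main thread. The function name is made up.

```python
import signal


def deliver_after_unblock(sig=signal.SIGUSR1):
    """Block sig, raise it, then unblock it so that the handler runs."""
    seen = []
    signal.signal(sig, lambda signum, frame: seen.append(signum))

    # While the signal is blocked, raising it only marks it as pending.
    signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    signal.raise_signal(sig)
    pending_while_blocked = sig in signal.sigpending()
    ran_while_blocked = bool(seen)

    # Code that wants the handler to run unblocks the signal itself;
    # the pending signal is delivered as part of this call.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {sig})
    return pending_while_blocked, ran_while_blocked, seen


print(deliver_after_unblock())
```

The handler only runs once the caller has explicitly lifted the block, so no extra sigaction flag is involved.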

Re: fork on processes with lots of memory

2016-01-26 Thread Mikael Pettersson

Felix von Leitner writes:
 > > Dear Linux kernel devs,
 > 
 > > I talked to someone who uses large Linux based hardware to run a
 > > process with huge memory requirements (think 4 GB), and he told me that
 > > if they do a fork() syscall on that process, the whole system comes to
 > > standstill. And not just for a second or two. He said they measured a 45
 > > minute (!) delay before the system became responsive again.
 > 
 > I'm sorry, I meant 4 TB not 4 GB.
 > I'm not used to working with memory sizes of that kind.

Make sure you have >>4TB physical if you're going to fork from a process
with a 4TB virtual address space.  (I'm assuming it's not sparse, but all
actually being used.)

Disable transparent hugepages (THP).  The internal book-keeping mechanisms
have been known to run amok with large RAM sizes causing severe performance
issues.  Maybe 4.x kernels are better, I haven't checked.

If you're using explicit hugepages and these kinds of RAM sizes, don't
bother with RHEL 6 or 7 kernels -- they're broken.  Vanilla 4.x kernels work.
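
For a sense of scale, here is a rough back-of-the-envelope sketch of how many mappings a fully populated 4TB address space involves. The page sizes (4 KiB base pages, 2 MiB hugepages) and the 8-byte page-table entry are assumptions typical of x86-64, not figures from this thread.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 ** 4

# Assumed sizes (x86-64 style); other architectures differ.
BASE_PAGE = 4 * KIB
HUGE_PAGE = 2 * MIB
PTE_BYTES = 8


def mapping_count(address_space_bytes, page_bytes):
    """Number of pages needed to cover a fully populated address space."""
    return address_space_bytes // page_bytes


base_pages = mapping_count(4 * TIB, BASE_PAGE)
huge_pages = mapping_count(4 * TIB, HUGE_PAGE)
pte_gib = base_pages * PTE_BYTES // (1024 ** 3)

print(f"4 KiB pages: {base_pages:,}")
print(f"2 MiB pages: {huge_pages:,}")
print(f"last-level page-table entries for 4 KiB pages: {pte_gib} GiB")
```

Under these assumptions, 4 KiB pages give about a billion entries and explicit 2 MiB hugepages give about two million, which is a plausible reason why hugepages make such processes more manageable.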

We're also in the TB range, though not quite 4TB, and fork()ing from inside
such processes definitely works for us.  We do disable THP since it kills us
otherwise.
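
To check whether THP is currently active before testing fork() behaviour, the sysfs file `/sys/kernel/mm/transparent_hugepage/enabled` can be read; the active mode is the word in square brackets. A minimal sketch (the helper names are made up for illustration):

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the bracketed (active) word from e.g. 'always madvise [never]'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path=THP_ENABLED):
    """Return the active THP mode, or None if the sysfs file does not exist."""
    try:
        return parse_thp_mode(path.read_text())
    except FileNotFoundError:
        return None


print(current_thp_mode())
```

Writing `never` to the same file as root, or booting with `transparent_hugepage=never`, disables THP system-wide.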

 > 
 > > Their working theory is that all the pages need to be marked copy-on-write
 > > in both processes, and if you touch one page, a copy needs to be made,
 > > and that just takes a while if you have a billion pages.
 > 
 > > I was wondering if there is any advice for such situations from the
 > > memory management people on this list.
 > 
 > > In this case the fork was for an execve afterwards, but I was going to
 > > recommend fork to them for something else that can not be tricked around
 > > with vfork.
 > 
 > > Can anyone comment on whether the 45 minute number sounds like it could
 > > be real? When I heard it, I was flabbergasted. But the other person
 > > swore it was real. Can a fork cause this much of a delay? Is there a way
 > > to work around it?
 > 
 > > I was going to recommend the fork to create a boundary between the
 > > processes, so that you can recover from memory corruption in one
 > > process. In fact, after the fork I would want to munmap almost all of
 > > the shared pages anyway, but there is no way to tell fork that.
 > 
 > > Thanks,
 > 
 > > Felix
 > 
 > > PS: Please put me on Cc if you reply, I'm not subscribed to this mailing
 > > list.

--

Re: [4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson

Greg Kroah-Hartman writes:
 > On Mon, Sep 14, 2015 at 02:12:43PM -0700, Greg Kroah-Hartman wrote:
 > > On Mon, Sep 14, 2015 at 10:42:24PM +0200, Mikael Pettersson wrote:
 > > > Greg Kroah-Hartman writes:
 > > >  > On Mon, Sep 14, 2015 at 08:06:21PM +0200, Mikael Pettersson wrote:
 > > >  > > Greg Kroah-Hartman writes:
 > > >  > >  > On Mon, Sep 14, 2015 at 07:08:10PM +0200, Mikael Pettersson wrote:
 > > >  > >  > > I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
 > > >  > >  > > and resulted in:
 > > >  > >  > > 
 > > >  > >  > > [   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
 > > >  > >  > > [   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
 > > >  > >  > > 
 > > >  > >  > > With 4.3-rc1 however the command fails and logs the following:
 > > >  > >  > > 
 > > >  > >  > > [   34.140300] 8250_base: module license 'unspecified' taints kernel.
 > > >  > >  > 
 > > >  > >  > Oops, need to fix that.
 > > >  > >  > 
 > > >  > >  > > [   34.141846] Disabling lock debugging due to kernel taint
 > > >  > >  > > [   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
 > > >  > >  > > [   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
 > > >  > >  > > [   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
 > > >  > >  > > [   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 0)
 > > >  > >  > > [   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
 > > >  > >  > > [   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
 > > >  > >  > > [   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
 > > >  > >  > 
 > > >  > >  > Are you sure you did 'modprobe' and not 'insmod'?
 > > >  > > 
 > > >  > > Yes, I used modprobe.  I double-checked.
 > > >  > 
 > > >  > Then your build should have failed if these functions are not being
 > > >  > exported properly by your .config.  Most of these are in the serial_core
 > > >  > module, is that present/loaded?
 > > > 
 > > > Yes, serial_core is loaded.
 > > > 
 > > > uart_insert_char is EXPORT_SYMBOL_GPL, so could the missing license tag be preventing
 > > > 8250_core from binding to it?  (I haven't checked the other symbols but I assume they
 > > > are also _GPL.)
 > > 
 > > Ah, crap, yes, you are right.  You can test this with a simple:
 > >MODULE_LICENSE("GPL");
 > > line added to the 8250_base file.
 > 
 > Wait, 8250_base.c has a module license line.
 > 
 > Can you provide a full .config file?
 > 
 > thanks,
 > 
 > greg k-h

Here it is:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
C

Re: [4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson

Greg Kroah-Hartman writes:
 > On Mon, Sep 14, 2015 at 08:06:21PM +0200, Mikael Pettersson wrote:
 > > Greg Kroah-Hartman writes:
 > >  > On Mon, Sep 14, 2015 at 07:08:10PM +0200, Mikael Pettersson wrote:
 > >  > > I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
 > >  > > and resulted in:
 > >  > > 
 > >  > > [   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
 > >  > > [   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
 > >  > > 
 > >  > > With 4.3-rc1 however the command fails and logs the following:
 > >  > > 
 > >  > > [   34.140300] 8250_base: module license 'unspecified' taints kernel.
 > >  > 
 > >  > Oops, need to fix that.
 > >  > 
 > >  > > [   34.141846] Disabling lock debugging due to kernel taint
 > >  > > [   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
 > >  > > [   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
 > >  > > [   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
 > >  > > [   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 0)
 > >  > > [   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
 > >  > > [   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
 > >  > > [   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
 > >  > 
 > >  > Are you sure you did 'modprobe' and not 'insmod'?
 > > 
 > > Yes, I used modprobe.  I double-checked.
 > 
 > Then your build should have failed if these functions are not being
 > exported properly by your .config.  Most of these are in the serial_core
 > module, is that present/loaded?

Yes, serial_core is loaded.

uart_insert_char is EXPORT_SYMBOL_GPL, so could the missing license tag be
preventing 8250_core from binding to it?  (I haven't checked the other symbols
but I assume they are also _GPL.)
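
The rule behind this hypothesis can be sketched as follows: a symbol exported with EXPORT_SYMBOL_GPL is only resolvable by a module whose license string is GPL-compatible. The list of accepted strings below mirrors the kernel's license check as best recalled for kernels of this era and should be treated as an assumption; `can_bind` is a made-up name, not a kernel function.

```python
# License strings treated as GPL-compatible (assumed list, see above).
GPL_COMPATIBLE = {
    "GPL",
    "GPL v2",
    "GPL and additional rights",
    "Dual BSD/GPL",
    "Dual MIT/GPL",
    "Dual MPL/GPL",
}


def can_bind(symbol_is_gpl_only, module_license):
    """Return True if a module with this license may resolve the symbol."""
    if not symbol_is_gpl_only:
        return True  # plain EXPORT_SYMBOL: any module may use it
    return module_license in GPL_COMPATIBLE


# A module logged as having license 'unspecified' versus a GPL one.
print(can_bind(True, "unspecified"))
print(can_bind(True, "GPL"))
```

If the hypothesis holds, a module with an 'unspecified' license would fail on every GPL-only symbol it imports, which would match the string of "Unknown symbol" messages.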
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson

Greg Kroah-Hartman writes:
 > On Mon, Sep 14, 2015 at 07:08:10PM +0200, Mikael Pettersson wrote:
 > > I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
 > > and resulted in:
 > > 
 > > [   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
 > > [   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
 > > 
 > > With 4.3-rc1 however the command fails and logs the following:
 > > 
 > > [   34.140300] 8250_base: module license 'unspecified' taints kernel.
 > 
 > Oops, need to fix that.
 > 
 > > [   34.141846] Disabling lock debugging due to kernel taint
 > > [   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
 > > [   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
 > > [   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
 > > [   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 0)
 > > [   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
 > > [   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
 > > [   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
 > 
 > Are you sure you did 'modprobe' and not 'insmod'?

Yes, I used modprobe.  I double-checked.

 > Peter, care to send a module license fix for this new module you
 > created?

/Mikael

[4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson

I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
and resulted in:

[   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A

With 4.3-rc1 however the command fails and logs the following:

[   34.140300] 8250_base: module license 'unspecified' taints kernel.
[   34.141846] Disabling lock debugging due to kernel taint
[   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
[   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
[   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
[   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 0)
[   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
[   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
[   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
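
A small helper can pull the unresolved symbol names out of a log like this, for example to look each one up in the source that exports it. The sketch below is in Python; the function name `unknown_symbols` and the regular expression are assumptions made for illustration, and the input is the log quoted above.

```python
import re

LOG = """\
[   34.140300] 8250_base: module license 'unspecified' taints kernel.
[   34.141846] Disabling lock debugging due to kernel taint
[   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
[   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
[   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
[   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 0)
[   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
[   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
[   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
"""

# "[ timestamp] module: Unknown symbol name (err N)"
PATTERN = re.compile(
    r"^\[\s*[\d.]+\]\s+(\S+): Unknown symbol (\S+) \(err (-?\d+)\)$"
)


def unknown_symbols(log_text):
    """Map each module name to the list of symbols it failed to resolve."""
    result = {}
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            module, symbol = match.group(1), match.group(2)
            result.setdefault(module, []).append(symbol)
    return result
```

Lines that do not match the pattern, such as the taint messages, are simply skipped.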

Relevant .config fragments:

CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y

# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_SERIAL_8250=m
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
# CONFIG_SERIAL_8250_PNP is not set
# CONFIG_SERIAL_8250_PCI is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_DW is not set
# CONFIG_SERIAL_8250_FINTEK is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=m
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set

/Mikael

Re: [4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson

Greg Kroah-Hartman writes:
 > On Mon, Sep 14, 2015 at 02:12:43PM -0700, Greg Kroah-Hartman wrote:
 > > On Mon, Sep 14, 2015 at 10:42:24PM +0200, Mikael Pettersson wrote:
 > > > Greg Kroah-Hartman writes:
 > > >  > On Mon, Sep 14, 2015 at 08:06:21PM +0200, Mikael Pettersson wrote:
 > > >  > > Greg Kroah-Hartman writes:
 > > >  > >  > On Mon, Sep 14, 2015 at 07:08:10PM +0200, Mikael Pettersson wrote:
 > > >  > >  > > I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
 > > >  > >  > > and resulted in:
 > > >  > >  > >
 > > >  > >  > > [   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
 > > >  > >  > > [   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
 > > >  > >  > >
 > > >  > >  > > With 4.3-rc1 however the command fails and logs the following:
 > > >  > >  > >
 > > >  > >  > > [   34.140300] 8250_base: module license 'unspecified' taints kernel.
 > > >  > >  >
 > > >  > >  > Oops, need to fix that.
 > > >  > >  >
 > > >  > >  > > [   34.141846] Disabling lock debugging due to kernel taint
 > > >  > >  > > [   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
 > > >  > >  > > [   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
 > > >  > >  > > [   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
 > > >  > >  > > [   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 0)
 > > >  > >  > > [   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
 > > >  > >  > > [   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
 > > >  > >  > > [   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
 > > >  > >  >
 > > >  > >  > Are you sure you did 'modprobe' and not 'insmod'?
 > > >  > >
 > > >  > > Yes, I used modprobe.  I double-checked.
 > > >  >
 > > >  > Then your build should have failed if these functions are not being
 > > >  > exported properly by your .config.  Most of these are in the serial_core
 > > >  > module, is that present/loaded?
 > > >
 > > > Yes, serial_core is loaded.
 > > >
 > > > uart_insert_char is EXPORT_SYMBOL_GPL, so could the missing license tag be
 > > > preventing 8250_core from binding to it?  (I haven't checked the other symbols
 > > > but I assume they are also _GPL.)
 > > 
 > > Ah, crap, yes, you are right.  You can test this with a simple:
 > >     MODULE_LICENSE("GPL");
 > > line added to the 8250_base file.
 > 
 > Wait, 8250_base.c has a module license line.
 > 
 > Can you provide a full .config file?
 > 
 > thanks,
 > 
 > greg k-h

Here it is:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
C

RE: [REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the kernel hard

2015-05-04 Thread Mikael Pettersson

Deucher, Alexander writes:
 > > -----Original Message-----
 > > From: Mikael Pettersson [mailto:mikpeli...@gmail.com]
 > > Sent: Monday, May 04, 2015 11:53 AM
 > > To: linux-kernel@vger.kernel.org
 > > Cc: Deucher, Alexander
 > > Subject: [REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the
 > > kernel hard
 > > 
 > > On my Ivy Bridge i7 mobo w/ Radeon graphics, the 4.1-rc2 kernel oopses hard,
 > > requiring a hard reset:
 > > 
 > > BUG: unable to handle kernel NULL pointer dereference at 0010
 > > IP: [] radeon_audio_detect+0x5b/0x150 [radeon]
 > > PGD 0
 > > Oops:  [#1] SMP
 > > Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel
 > > snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq
 > > snd_seq_device snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea
 > > i2c_algo_bit backlight r8169 mii coretemp snd_timer drm_kms_helper ttm
 > > snd drm i2c_core xhci_pci xhci_hcd soundcore evdev firmware_class hwmon
 > > hid_generic usbhid hid ehci_pci ehci_hcd sr_mod cdrom usbcore
 > > usb_common ipv6
 > > CPU: 0 PID: 163 Comm: kworker/0:2 Not tainted 4.1.0-rc2 #1
 > > Hardware name: System manufacturer System Product Name/P8Z77-V LE
 > > PLUS, BIOS 0403 05/08/2012
 > > Workqueue: events output_poll_execute [drm_kms_helper]
 > > task: 8806012b1590 ti: 88003796 task.ti: 88003796
 > > RIP: 0010:[]  []
 > > radeon_audio_detect+0x5b/0x150 [radeon]
 > > RSP: 0018:880037963c78  EFLAGS: 00010246
 > > RAX: 880600c92da0 RBX: 880600cbb000 RCX: 0001
 > > RDX:  RSI:  RDI: 880037a3f600
 > > RBP: 880600c92da0 R08: 0001 R09: 0050
 > > R10: 0001 R11: 880603001a80 R12: 0001
 > > R13: 880600c924e0 R14: 880601f84000 R15: 0001
 > > FS:  () GS:88061ec0()
 > > knlGS:
 > > CS:  0010 DS:  ES:  CR0: 80050033
 > > CR2: 0010 CR3: 01478000 CR4: 001407f0
 > > Stack:
 > >  880600cbb000 0001 0001 880601f84000
 > >  a03e7d70 a03157ea 880601f84000 0002
 > >  880600baa200 880600cbb050 880600cbb000 880600e33800
 > > Call Trace:
 > >  [] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
 > >  [] ?
 > > drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490
 > > [drm_kms_helper]
 > >  [] ?
 > > drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70
 > > [drm_kms_helper]
 > >  [] ? drm_fb_helper_hotplug_event+0x55/0xe0
 > > [drm_kms_helper]
 > >  [] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
 > >  [] ? process_one_work+0x130/0x360
 > >  [] ? worker_thread+0x114/0x460
 > >  [] ? __schedule+0x20d/0x660
 > >  [] ? rescuer_thread+0x2f0/0x2f0
 > >  [] ? kthread+0xbc/0xe0
 > >  [] ? kthread_create_on_node+0x170/0x170
 > >  [] ? ret_from_fork+0x42/0x70
 > >  [] ? kthread_create_on_node+0x170/0x170
 > > Code: 8b 45 00 4c 8b ad 58 01 00 00 4c 8b 70 28 49 8b 85 00 01 00 00 48 85 
 > > c0 74
 > > 30 41 83 fc 01 74 38 48 8b 70 10 49 8b 96 c8 24 00 00 <48> 8b 4a 10 48 85 
 > > c9 74
 > > 0e 31 d2 4c 89 f7 ff d1 49 8b 85 00 01
 > > RIP  [] radeon_audio_detect+0x5b/0x150 [radeon]
 > >  RSP 
 > > CR2: 0010
 > > ---[ end trace 5b99e3870bfc7a92 ]---
 > > BUG: unable to handle kernel paging request at ffd8
 > > IP: [] kthread_data+0x7/0x10
 > > PGD 1479067 PUD 147b067 PMD 0
 > > Oops:  [#2] SMP
 > > Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel
 > > snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq
 > > snd_seq_device snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea
 > > i2c_algo_bit backlight r8169 mii coretemp snd_timer drm_kms_helper ttm
 > > snd drm i2c_core xhci_pci xhci_hcd soundcore evdev firmware_class hwmon
 > > hid_generic usbhid hid ehci_pci ehci_hcd sr_mod cdrom usbcore
 > > usb_common ipv6
 > > CPU: 0 PID: 163 Comm: kworker/0:2 Tainted: G  D 4.1.0-rc2 #1
 > > Hardware name: System manufacturer System Product Name/P8Z77-V LE
 > > PLUS, BIOS 0403 05/08/2012
 > > task: 8806012b1590 ti: 88003796 task.ti: 88003796
 > > RIP: 0010:[]  [] kthread_data+0x7/0x10
 > > RSP: 0018:880037963a60  EFLAGS: 00010002
 > > RAX:  RBX:  RCX: 73c2bc6e
 > > RDX:  RSI:  RDI: 880601

[REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the kernel hard

2015-05-04 Thread Mikael Pettersson

On my Ivy Bridge i7 mobo w/ Radeon graphics, the 4.1-rc2 kernel oopses hard,
requiring a hard reset:

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: [] radeon_audio_detect+0x5b/0x150 [radeon]
PGD 0 
Oops:  [#1] SMP 
Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel 
snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 
snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit backlight r8169 
mii coretemp snd_timer drm_kms_helper ttm snd drm i2c_core xhci_pci xhci_hcd 
soundcore evdev firmware_class hwmon hid_generic usbhid hid ehci_pci ehci_hcd 
sr_mod cdrom usbcore usb_common ipv6
CPU: 0 PID: 163 Comm: kworker/0:2 Not tainted 4.1.0-rc2 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS 0403 05/08/2012
Workqueue: events output_poll_execute [drm_kms_helper]
task: 8806012b1590 ti: 88003796 task.ti: 88003796
RIP: 0010:[]  [] radeon_audio_detect+0x5b/0x150 [radeon]
RSP: 0018:880037963c78  EFLAGS: 00010246
RAX: 880600c92da0 RBX: 880600cbb000 RCX: 0001
RDX:  RSI:  RDI: 880037a3f600
RBP: 880600c92da0 R08: 0001 R09: 0050
R10: 0001 R11: 880603001a80 R12: 0001
R13: 880600c924e0 R14: 880601f84000 R15: 0001
FS:  () GS:88061ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0010 CR3: 01478000 CR4: 001407f0
Stack:
 880600cbb000 0001 0001 880601f84000
 a03e7d70 a03157ea 880601f84000 0002
 880600baa200 880600cbb050 880600cbb000 880600e33800
Call Trace:
 [] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
 [] ? drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490 [drm_kms_helper]
 [] ? drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70 [drm_kms_helper]
 [] ? drm_fb_helper_hotplug_event+0x55/0xe0 [drm_kms_helper]
 [] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
 [] ? process_one_work+0x130/0x360
 [] ? worker_thread+0x114/0x460
 [] ? __schedule+0x20d/0x660
 [] ? rescuer_thread+0x2f0/0x2f0
 [] ? kthread+0xbc/0xe0
 [] ? kthread_create_on_node+0x170/0x170
 [] ? ret_from_fork+0x42/0x70
 [] ? kthread_create_on_node+0x170/0x170
Code: 8b 45 00 4c 8b ad 58 01 00 00 4c 8b 70 28 49 8b 85 00 01 00 00 48 85 c0 74 30 41 83 fc 01 74 38 48 8b 70 10 49 8b 96 c8 24 00 00 <48> 8b 4a 10 48 85 c9 74 0e 31 d2 4c 89 f7 ff d1 49 8b 85 00 01
RIP  [] radeon_audio_detect+0x5b/0x150 [radeon]
 RSP 
CR2: 0010
---[ end trace 5b99e3870bfc7a92 ]---
BUG: unable to handle kernel paging request at ffd8
IP: [] kthread_data+0x7/0x10
PGD 1479067 PUD 147b067 PMD 0 
Oops:  [#2] SMP 
Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel 
snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 
snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit backlight r8169 
mii coretemp snd_timer drm_kms_helper ttm snd drm i2c_core xhci_pci xhci_hcd 
soundcore evdev firmware_class hwmon hid_generic usbhid hid ehci_pci ehci_hcd 
sr_mod cdrom usbcore usb_common ipv6
CPU: 0 PID: 163 Comm: kworker/0:2 Tainted: G  D 4.1.0-rc2 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS 0403 05/08/2012
task: 8806012b1590 ti: 88003796 task.ti: 88003796
RIP: 0010:[]  [] kthread_data+0x7/0x10
RSP: 0018:880037963a60  EFLAGS: 00010002
RAX:  RBX:  RCX: 73c2bc6e
RDX:  RSI:  RDI: 8806012b1590
RBP: 8806012b1590 R08: 0001 R09: 0001
R10: ea001804b800 R11: 001a R12: 8806012b1980
R13:  R14: 00014300 R15: 
FS:  () GS:88061ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0028 CR3: 01478000 CR4: 001407f0
Stack:
 81051068 88061ec14300 8134c203 
 880037964000 8806012b1878 880037963af8 
 880603188000 8806012b1590 8134c4aa 8800379637d8
Call Trace:
 [] ? wq_worker_sleeping+0x8/0x90
 [] ? __schedule+0x3e3/0x660
 [] ? schedule+0x2a/0x80
 [] ? do_exit+0x61e/0xa20
 [] ? oops_end+0x66/0xa0
 [] ? no_context+0x236/0x286
 [] ? page_fault+0x1f/0x30
 [] ? radeon_audio_detect+0x5b/0x150 [radeon]
 [] ? radeon_audio_detect+0xe2/0x150 [radeon]
 [] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
 [] ? drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490 [drm_kms_helper]
 [] ? drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70 [drm_kms_helper]
 [] ? drm_fb_helper_hotplug_event+0x55/0xe0 [drm_kms_helper]
 [] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
 [] ? process_one_work+0x130/0x360
 [] ?
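
The Code: line of the first oops marks the faulting instruction byte with <48>. The Python sketch below extracts the bytes from that marker onward and decodes them. The helper names are hypothetical, and the decoder deliberately handles only the one encoding seen here (48 8b /r disp8, a 64-bit load with an 8-bit displacement and no SIB byte); it is not a general disassembler.

```python
CODE_LINE = (
    "Code: 8b 45 00 4c 8b ad 58 01 00 00 4c 8b 70 28 49 8b 85 00 01 00 00 "
    "48 85 c0 74 30 41 83 fc 01 74 38 48 8b 70 10 49 8b 96 c8 24 00 00 "
    "<48> 8b 4a 10 48 85 c9 74 0e 31 d2 4c 89 f7 ff d1 49 8b 85 00 01"
)

# Register numbering for ModRM fields when no REX.R/REX.B extension is set.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the opcode bytes starting at the <..>-marked faulting byte."""
    tokens = code_line.split()[1:]  # drop the leading "Code:"
    for index, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            first = int(token[1:-1], 16)
            return [first] + [int(t, 16) for t in tokens[index + 1:]]
    raise ValueError("no <..> marker found")


def decode_mov_load_disp8(data):
    """Decode only 48 8B /r disp8; return (dest_reg, base_reg, displacement)."""
    if data[0] != 0x48 or data[1] != 0x8B:
        raise ValueError("not REX.W + 8B")
    modrm = data[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("form not handled by this sketch")
    disp = data[3] - 256 if data[3] >= 128 else data[3]
    return REG64[reg], REG64[rm], disp
```

For this dump the result is a load from 0x10(%rdx) into %rcx. RDX prints as blank in this archived copy, which would be consistent with zero if the archive dropped the zero digits, and a zero base plus 0x10 matches the reported CR2 of 0010.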

[REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the kernel hard

2015-05-04 Thread Mikael Pettersson

On my Ivy Bridge i7 mobo w/ Radeon graphics, the 4.1-rc2 kernel oopses hard,
requiring a hard reset:

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: [a03d0e1b] radeon_audio_detect+0x5b/0x150 [radeon]
PGD 0 
Oops:  [#1] SMP 
Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel 
snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 
snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit backlight r8169 
mii coretemp snd_timer drm_kms_helper ttm snd drm i2c_core xhci_pci xhci_hcd 
soundcore evdev firmware_class hwmon hid_generic usbhid hid ehci_pci ehci_hcd 
sr_mod cdrom usbcore usb_common ipv6
CPU: 0 PID: 163 Comm: kworker/0:2 Not tainted 4.1.0-rc2 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS 0403 05/08/2012
Workqueue: events output_poll_execute [drm_kms_helper]
task: 8806012b1590 ti: 88003796 task.ti: 88003796
RIP: 0010:[a03d0e1b]  [a03d0e1b] radeon_audio_detect+0x5b/0x150 [radeon]
RSP: 0018:880037963c78  EFLAGS: 00010246
RAX: 880600c92da0 RBX: 880600cbb000 RCX: 0001
RDX:  RSI:  RDI: 880037a3f600
RBP: 880600c92da0 R08: 0001 R09: 0050
R10: 0001 R11: 880603001a80 R12: 0001
R13: 880600c924e0 R14: 880601f84000 R15: 0001
FS:  () GS:88061ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0010 CR3: 01478000 CR4: 001407f0
Stack:
 880600cbb000 0001 0001 880601f84000
 a03e7d70 a03157ea 880601f84000 0002
 880600baa200 880600cbb050 880600cbb000 880600e33800
Call Trace:
 [a03157ea] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
 [a0262b06] ? drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490 [drm_kms_helper]
 [a026bde8] ? drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70 [drm_kms_helper]
 [a026cf55] ? drm_fb_helper_hotplug_event+0x55/0xe0 [drm_kms_helper]
 [a026267c] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
 [81050680] ? process_one_work+0x130/0x360
 [81050cb4] ? worker_thread+0x114/0x460
 [8134c02d] ? __schedule+0x20d/0x660
 [81050ba0] ? rescuer_thread+0x2f0/0x2f0
 [81054e4c] ? kthread+0xbc/0xe0
 [81054d90] ? kthread_create_on_node+0x170/0x170
 [8134f9e2] ? ret_from_fork+0x42/0x70
 [81054d90] ? kthread_create_on_node+0x170/0x170
Code: 8b 45 00 4c 8b ad 58 01 00 00 4c 8b 70 28 49 8b 85 00 01 00 00 48 85 c0 74 30 41 83 fc 01 74 38 48 8b 70 10 49 8b 96 c8 24 00 00 <48> 8b 4a 10 48 85 c9 74 0e 31 d2 4c 89 f7 ff d1 49 8b 85 00 01
RIP  [a03d0e1b] radeon_audio_detect+0x5b/0x150 [radeon]
 RSP 880037963c78
CR2: 0010
---[ end trace 5b99e3870bfc7a92 ]---
BUG: unable to handle kernel paging request at ffd8
IP: [810552d7] kthread_data+0x7/0x10
PGD 1479067 PUD 147b067 PMD 0 
Oops:  [#2] SMP 
Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel 
snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 
snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit backlight r8169 
mii coretemp snd_timer drm_kms_helper ttm snd drm i2c_core xhci_pci xhci_hcd 
soundcore evdev firmware_class hwmon hid_generic usbhid hid ehci_pci ehci_hcd 
sr_mod cdrom usbcore usb_common ipv6
CPU: 0 PID: 163 Comm: kworker/0:2 Tainted: G  D 4.1.0-rc2 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS 0403 05/08/2012
task: 8806012b1590 ti: 88003796 task.ti: 88003796
RIP: 0010:[810552d7]  [810552d7] kthread_data+0x7/0x10
RSP: 0018:880037963a60  EFLAGS: 00010002
RAX:  RBX:  RCX: 73c2bc6e
RDX:  RSI:  RDI: 8806012b1590
RBP: 8806012b1590 R08: 0001 R09: 0001
R10: ea001804b800 R11: 001a R12: 8806012b1980
R13:  R14: 00014300 R15: 
FS:  () GS:88061ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0028 CR3: 01478000 CR4: 001407f0
Stack:
 81051068 88061ec14300 8134c203 
 880037964000 8806012b1878 880037963af8 
 880603188000 8806012b1590 8134c4aa 8800379637d8
Call Trace:
 [81051068] ? wq_worker_sleeping+0x8/0x90
 [8134c203] ? __schedule+0x3e3/0x660
 [8134c4aa] ? schedule+0x2a/0x80
 [8103eb7e] ? do_exit+0x61e/0xa20
 [810059f6] ? oops_end+0x66/0xa0
 [813487df] ? no_context+0x236/0x286
 [81350bbf] ? page_fault+0x1f/0x30

RE: [REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the kernel hard

2015-05-04 Thread Mikael Pettersson

Deucher, Alexander writes:
   -Original Message-
   From: Mikael Pettersson [mailto:mikpeli...@gmail.com]
   Sent: Monday, May 04, 2015 11:53 AM
   To: linux-kernel@vger.kernel.org
   Cc: Deucher, Alexander
   Subject: [REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the
   kernel hard
   
   On my Ivy Bridge i7 mobo w/ Radeon graphics, the 4.1-rc2 kernel oopses
   hard,
   requiring a hard reset:
   
   BUG: unable to handle kernel NULL pointer dereference at
   0010
   IP: [a03d0e1b] radeon_audio_detect+0x5b/0x150 [radeon]
   PGD 0
   Oops:  [#1] SMP
   Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel
   snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq
   snd_seq_device snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea
   i2c_algo_bit backlight r8169 mii coretemp snd_timer drm_kms_helper ttm
   snd drm i2c_core xhci_pci xhci_hcd soundcore evdev firmware_class hwmon
   hid_generic usbhid hid ehci_pci ehci_hcd sr_mod cdrom usbcore
   usb_common ipv6
   CPU: 0 PID: 163 Comm: kworker/0:2 Not tainted 4.1.0-rc2 #1
   Hardware name: System manufacturer System Product Name/P8Z77-V LE
   PLUS, BIOS 0403 05/08/2012
   Workqueue: events output_poll_execute [drm_kms_helper]
   task: 8806012b1590 ti: 88003796 task.ti: 88003796
   RIP: 0010:[a03d0e1b]  [a03d0e1b]
   radeon_audio_detect+0x5b/0x150 [radeon]
   RSP: 0018:880037963c78  EFLAGS: 00010246
   RAX: 880600c92da0 RBX: 880600cbb000 RCX: 0001
   RDX:  RSI:  RDI: 880037a3f600
   RBP: 880600c92da0 R08: 0001 R09: 0050
   R10: 0001 R11: 880603001a80 R12: 0001
   R13: 880600c924e0 R14: 880601f84000 R15: 0001
   FS:  () GS:88061ec0()
   knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 0010 CR3: 01478000 CR4: 001407f0
   Stack:
880600cbb000 0001 0001 880601f84000
a03e7d70 a03157ea 880601f84000 0002
880600baa200 880600cbb050 880600cbb000 880600e33800
   Call Trace:
[a03157ea] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
[a0262b06] ?
   drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490
   [drm_kms_helper]
[a026bde8] ?
   drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70
   [drm_kms_helper]
[a026cf55] ? drm_fb_helper_hotplug_event+0x55/0xe0
   [drm_kms_helper]
[a026267c] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
[81050680] ? process_one_work+0x130/0x360
[81050cb4] ? worker_thread+0x114/0x460
[8134c02d] ? __schedule+0x20d/0x660
[81050ba0] ? rescuer_thread+0x2f0/0x2f0
[81054e4c] ? kthread+0xbc/0xe0
[81054d90] ? kthread_create_on_node+0x170/0x170
[8134f9e2] ? ret_from_fork+0x42/0x70
[81054d90] ? kthread_create_on_node+0x170/0x170
   Code: 8b 45 00 4c 8b ad 58 01 00 00 4c 8b 70 28 49 8b 85 00 01 00 00 48 85 
   c0 74
   30 41 83 fc 01 74 38 48 8b 70 10 49 8b 96 c8 24 00 00 48 8b 4a 10 48 85 
   c9 74
   0e 31 d2 4c 89 f7 ff d1 49 8b 85 00 01
   RIP  [a03d0e1b] radeon_audio_detect+0x5b/0x150 [radeon]
RSP 880037963c78
   CR2: 0010
   ---[ end trace 5b99e3870bfc7a92 ]---
   BUG: unable to handle kernel paging request at ffd8
   IP: [810552d7] kthread_data+0x7/0x10
   PGD 1479067 PUD 147b067 PMD 0
   Oops:  [#2] SMP
   Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel
   snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq
   snd_seq_device snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea
   i2c_algo_bit backlight r8169 mii coretemp snd_timer drm_kms_helper ttm
   snd drm i2c_core xhci_pci xhci_hcd soundcore evdev firmware_class hwmon
   hid_generic usbhid hid ehci_pci ehci_hcd sr_mod cdrom usbcore
   usb_common ipv6
   CPU: 0 PID: 163 Comm: kworker/0:2 Tainted: G  D 4.1.0-rc2 #1
   Hardware name: System manufacturer System Product Name/P8Z77-V LE
   PLUS, BIOS 0403 05/08/2012
   task: 8806012b1590 ti: 88003796 task.ti: 88003796
   RIP: 0010:[810552d7]  [810552d7] kthread_data+0x7/0x10
   RSP: 0018:880037963a60  EFLAGS: 00010002
   RAX:  RBX:  RCX: 73c2bc6e
   RDX:  RSI:  RDI: 8806012b1590
   RBP: 8806012b1590 R08: 0001 R09: 0001
   R10: ea001804b800 R11: 001a R12: 8806012b1980
   R13:  R14: 00014300 R15: 
   FS:  () GS:88061ec0()
   knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 0028 CR3: 01478000 CR4

Re: [PATCH] seccomp.2: Add note about alarm(2) not being sufficient to limit runtime

2015-03-12 Thread Mikael Pettersson

Jann Horn writes:
 > On Wed, Mar 11, 2015 at 10:43:50PM +0100, Mikael Pettersson wrote:
 > > Jann Horn writes:
 > >  > Or should I throw this patch away and write a patch
 > >  > for the prctl() manpage instead that documents that
 > >  > being able to call sigreturn() implies being able to
 > >  > effectively call sigprocmask(), at least on some
 > >  > architectures like X86?
 > > 
 > > Well, that is the semantics of sigreturn().  It is essentially
 > > setcontext() [which includes the actions of sigprocmask()], but
 > > with restrictions on parameter placement (at least on x86).
 > > 
 > > You could introduce some setting to restrict that aspect for
 > > seccomp processes, but you can't change this for normal processes
 > > without breaking things.
 > 
 > Then I think it's probably better and easier to just document the existing
 > behavior? If a new setting would have to be introduced and developers would
 > need to be aware of that, it's probably easier to just tell everyone to use
 > SIGKILL.
 > 
 > Does this manpage patch look good?

LGTM

Acked-by: Mikael Pettersson 

 > 
 > ---
 >  man2/seccomp.2 | 25 +
 >  1 file changed, 25 insertions(+)
 > 
 > diff --git a/man2/seccomp.2 b/man2/seccomp.2
 > index 702ceb8..f762d07 100644
 > --- a/man2/seccomp.2
 > +++ b/man2/seccomp.2
 > @@ -64,6 +64,31 @@ Strict secure computing mode is useful for number-crunching
 >  applications that may need to execute untrusted byte code, perhaps
 >  obtained by reading from a pipe or socket.
 >  
 > +Note that although the calling thread can no longer call
 > +.BR sigprocmask (2),
 > +it can use
 > +.BR sigreturn (2)
 > +to block all signals apart from
 > +.BR SIGKILL
 > +and
 > +.BR SIGSTOP .
 > +Therefore, to reliably terminate it,
 > +.BR SIGKILL
 > +has to be used, meaning that e.g.
 > +.BR alarm (2)
 > +is not sufficient for restricting its runtime. Instead, use
 > +.BR timer_create (2)
 > +with
 > +.BR SIGEV_SIGNAL
 > +and
 > +.BR sigev_signo
 > +set to
 > +.BR SIGKILL
 > +or use
 > +.BR setrlimit (2)
 > +to set the hard limit for
 > +.BR RLIMIT_CPU .
 > +
 >  This operation is available only if the kernel is configured with
 >  .BR CONFIG_SECCOMP
 >  enabled.
 > -- 
 > 2.1.4
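
For reference, the setrlimit(2) route mentioned in the patch could be
sketched roughly as below.  This is only an illustration, not part of
the patch: limit_cpu_seconds() is a made-up helper name, and it assumes
the limit is installed before the process enters strict seccomp mode,
since setrlimit(2) is no longer callable after that point.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (not from the patch): cap the CPU time of the
 * calling process.  Soft and hard limits are set to the same value so
 * that the kernel's reaction at the hard limit is SIGKILL, which the
 * process cannot block.  Returns 0 on success, -1 on error.
 */
int limit_cpu_seconds(rlim_t seconds)
{
    struct rlimit rl;

    rl.rlim_cur = seconds;  /* soft limit */
    rl.rlim_max = seconds;  /* hard limit, cannot be raised again without privilege */
    return setrlimit(RLIMIT_CPU, &rl);
}
```

A caller would typically invoke limit_cpu_seconds() right before
prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT).  Note that RLIMIT_CPU counts
consumed CPU time, not wall-clock time, so a process that only sleeps is
not caught by it.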

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Don't allow blocking of signals using sigreturn.

2015-03-12 Thread Mikael Pettersson

Andy Lutomirski writes:
 > On Wed, Mar 11, 2015 at 2:43 PM, Mikael Pettersson wrote:
 > > Jann Horn writes:
 > >  > Or should I throw this patch away and write a patch
 > >  > for the prctl() manpage instead that documents that
 > >  > being able to call sigreturn() implies being able to
 > >  > effectively call sigprocmask(), at least on some
 > >  > architectures like X86?
 > >
 > > Well, that is the semantics of sigreturn().  It is essentially
 > > setcontext() [which includes the actions of sigprocmask()], but
 > > with restrictions on parameter placement (at least on x86).
 > >
 > > You could introduce some setting to restrict that aspect for
 > > seccomp processes, but you can't change this for normal processes
 > > without breaking things.
 > 
 > Which leads to the interesting question: does anyone ever call
 > sigreturn with a different signal mask than the kernel put there
 > during signal delivery

Yes.  Either a sigfillset();sigdelset(SIGSEGV), or a copy of the
thread's sigmask from a previous sigframe.

 > or, even more strangely, with a totally made up
 > context?

Not "totally made up", but certainly with adjustments(*) made to
both GPRs and PC.  In a different piece of SW: FPU controls.

(*) Rolling back or force-committing a micro-transaction until
PC+GPRs represent the state at an original instruction boundary.
This was in a product using dynamic binary instrumentation.
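
To make the first case concrete: a handler installed with SA_SIGINFO
gets a pointer to the saved context, and whatever signal mask is stored
there when the handler returns is what sigreturn() installs.  The
following is a rough sketch only, not code from the product mentioned
above; the function names are invented for illustration and it assumes
Linux with glibc.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose sigaction/siginfo_t under strict -std= modes */
#endif
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Handler: edit the signal mask saved in the signal frame.  On return,
 * sigreturn() restores the (now modified) mask from that frame. */
static void block_usr2_on_return(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;

    (void)sig;
    (void)info;
    sigaddset(&uc->uc_sigmask, SIGUSR2);
}

/* Illustrative check: returns 1 if SIGUSR2 ends up blocked after the
 * handler has returned, 0 if it does not, -1 on error. */
int sigreturn_changes_mask(void)
{
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = block_usr2_on_return;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Start with both signals unblocked so the effect is visible. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGUSR2);
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;

    /* In a single-threaded process the handler runs before raise() returns. */
    if (raise(SIGUSR1) != 0)
        return -1;

    /* Query the current mask without changing it. */
    if (sigprocmask(SIG_SETMASK, NULL, &set) != 0)
        return -1;
    return sigismember(&set, SIGUSR2);
}
```

No sigprocmask() call blocks SIGUSR2 here; the mask change comes
entirely from the handler's return path, which is why forbidding
sigprocmask() alone does not stop a process from blocking signals.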

Re: [PATCH] Don't allow blocking of signals using sigreturn.

2015-03-11 Thread Mikael Pettersson

Jann Horn writes:
 > Or should I throw this patch away and write a patch
 > for the prctl() manpage instead that documents that
 > being able to call sigreturn() implies being able to
 > effectively call sigprocmask(), at least on some
 > architectures like X86?

Well, that is the semantics of sigreturn().  It is essentially
setcontext() [which includes the actions of sigprocmask()], but
with restrictions on parameter placement (at least on x86).

You could introduce some setting to restrict that aspect for
seccomp processes, but you can't change this for normal processes
without breaking things.

Re: status of ia64 / hpsim

2015-01-05 Thread Mikael Pettersson

Tony Luck writes:
 > On Tue, Dec 30, 2014 at 7:50 AM, Christoph Hellwig wrote:
 > > Is the ia64 hpsim architecture still in use?  I noticed it because it
 > > has a fairly rudimentary SCSI driver under arch/ia64, which doesn't
 > > look very maintained.
 > 
 > Mikael was doing something with hpsim on the ski simulator back in Jan'14.
 > Was that something real, or just playing because it was there?

I was trying to set up an emulated platform for continuous GCC bootstrap and
regression testsuite runs, but something broke the ia64 kernel causing EXT4
file system errors in the emulated platform, so I had to scrap that idea.

I tried various ia64 kernel versions, compiling the ia64 kernel with older
GCCs, and compiling SKI with older host (x86_64) GCCs, but nothing worked.
With no known-good starting point there was no reasonable way for me to
debug the problem.

/Mikael

Re: [PATCH] arm: Blacklist gcc 4.8.[012] and 4.9.0 with CONFIG_FRAME_POINTER

2014-10-12 Thread Mikael Pettersson

Peter Hurley writes:
 > On 10/11/2014 12:33 PM, Mikael Pettersson wrote:
 > > Peter Hurley writes:
 > >  > On 10/10/2014 12:36 PM, Russell King - ARM Linux wrote:
 > >  > > On Fri, Oct 10, 2014 at 12:26:14PM -0400, Peter Hurley wrote:
 > >  > >> gcc versions 4.8.[012] and 4.9.0 generates code that prematurely
 > >  > >> adjusts the stack pointer such that still-to-be-referenced locals
 > >  > >> are below the stack pointer, which allows them to be overwritten
 > >  > >> by interrupts.
 > >  > > 
 > >  > > I would much rather do this in asm-offsets.c, along side the other ARM
 > >  > > specific buggy compiler test(s).  I'm presently putting together such
 > >  > > a patch.
 > >  > > 
 > >  > > The information in the thread on linux-omap says only GCC 4.8.1 and
 > >  > > GCC 4.8.2.  Where do you get the other versions from?
 > >  > 
 > >  > The gcc PR linked in the commit message; see the "Known to fail" field.
 > > 
 > > The 4.8.0 release is broken, but the 4.9.0 one is not.  It's unfortunate,
 > > but "4.9.0" may refer to "the 4.9.0 release" or to "some point after trunk
 > > forked 4.8 branch up to and including the 4.9.0 release point".  In this
 > > case, it's the latter -- this can be inferred from the fact that the
 > > fix went into trunk in October 2013 while 4.9.0 was branched and released
 > > during the first half of 2014.
 > 
 > Is there a reasonably quick way to determine if a particular commit is
 > in a particular release of gcc?

If you want the process to be fully automatic and the answer to be
absolutely precise, then "no".

If you're willing to manually map GCC PR fixes to release versions,
and to have some false negatives (some GCCs having a certain fix
will be flagged as not having it), then "yes".

For this ARM bug, PR58854, we know that 4.8.[0-2] have the bug, but
4.7 and older, 4.8.3 and newer, and 4.9 and newer are Ok.

A problem is that a GCC that identifies itself as 4.8.3 may be
(a) a 4.8.3 pre-release (i.e., close to 4.8.2),
(b) a 4.8.3 release, or
(c) a 4.8.4 pre-release that's been patched to say 4.8.3 (Red Hat does this).

Case (a) may or may not have the fix (we can't easily(*) tell), but
cases (b) and (c) are Ok.  If you're willing to classify all three
as not having the fix (false negatives), then you want to test

#if (__GNUC__ == 4 && __GNUC_MINOR__ == 8 && __GNUC_PATCHLEVEL__ < 4)

for possibly broken versions.

A complication is that a bug has both starting and ending commits.
It's not uncommon for distros and others to backport changes, so a
compiler that claims to be e.g. 4.7.4 may include a backport of the
4.8 change that caused the bug you're trying to avoid.  There is no
easy way to detect this, unless you have a runtime test case for the
bug.  I'd ignore this case as "unfixable".

So I'd write the tests for vanilla upstream GCC only, and tell distro
users to complain to their distros if their kernels get miscompiled.

(*) __VERSION__ is defined like "4.8.3 20140515 (prerelease)" in
pre-releases but like "4.8.3" in ordinary releases, but this is not
something you can test for in the C preprocessor.  A configure-time
check could extract the date and compare that with the date the fix
went into that particular branch, but case (c) above makes detecting
pre-releases a bit more complicated.
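
A rough sketch of both checks, assuming vanilla upstream GCC as argued
above.  GCC_PR58854_SUSPECT and parse_gcc_version() are names made up
for this illustration, and the parser only understands the __VERSION__
shapes shown in the footnote.

```c
#include <stdio.h>
#include <string.h>

/* Compile-time part: treat every 4.8.x below 4.8.4 as possibly broken,
 * accepting false negatives for 4.8.3 releases that do have the fix. */
#if defined(__GNUC__) && \
    (__GNUC__ == 4 && __GNUC_MINOR__ == 8 && __GNUC_PATCHLEVEL__ < 4)
#define GCC_PR58854_SUSPECT 1
#else
#define GCC_PR58854_SUSPECT 0
#endif

/*
 * Configure-time part (hypothetical helper): split a __VERSION__ string
 * such as "4.8.3 20140515 (prerelease)" or "4.8.3" into its fields.
 * *date is set to 0 when the string carries no date, *prerelease to 1
 * when the "(prerelease)" tag is present.  Returns 0 on success and -1
 * if not even major.minor.patch could be parsed.
 */
int parse_gcc_version(const char *s, int *major, int *minor, int *patch,
                      long *date, int *prerelease)
{
    int fields;

    *date = 0;
    *prerelease = strstr(s, "(prerelease)") != NULL;
    fields = sscanf(s, "%d.%d.%d %ld", major, minor, patch, date);
    return fields >= 3 ? 0 : -1;
}
```

The extracted date would then have to be compared against the date the
fix landed on the relevant branch, which is not filled in here.  A
vendor compiler as in case (c) carries no "(prerelease)" tag, so it
would look like an ordinary release to this parser.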

 > Starting from the mainline viewcvs revision page for this fix here,
 > https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=204203
 > (which is the link from the PR for the fix), navigation to anywhere
 > else in the gcc tree is impossible. I can't even look at the Changelog.

https://gcc.gnu.org/viewcvs/gcc/
then descend trunk or branches as needed.

/Mikael

Re: [PATCH] arm: Blacklist gcc 4.8.[012] and 4.9.0 with CONFIG_FRAME_POINTER

2014-10-11 Thread Mikael Pettersson

Peter Hurley writes:
 > On 10/10/2014 12:36 PM, Russell King - ARM Linux wrote:
 > > On Fri, Oct 10, 2014 at 12:26:14PM -0400, Peter Hurley wrote:
 > >> gcc versions 4.8.[012] and 4.9.0 generates code that prematurely
 > >> adjusts the stack pointer such that still-to-be-referenced locals
 > >> are below the stack pointer, which allows them to be overwritten
 > >> by interrupts.
 > > 
 > > I would much rather do this in asm-offsets.c, along side the other ARM
 > > specific buggy compiler test(s).  I'm presently putting together such
 > > a patch.
 > > 
 > > The information in the thread on linux-omap says only GCC 4.8.1 and
 > > GCC 4.8.2.  Where do you get the other versions from?
 > 
 > The gcc PR linked in the commit message; see the "Known to fail" field.

The 4.8.0 release is broken, but the 4.9.0 one is not.  It's unfortunate,
but "4.9.0" may refer to "the 4.9.0 release" or to "some point after trunk
forked 4.8 branch up to and including the 4.9.0 release point".  In this
case, it's the latter -- this can be inferred from the fact that the
fix went into trunk in October 2013 while 4.9.0 was branched and released
during the first half of 2014.

Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-14 Thread Mikael Pettersson

Michel Dänzer writes:
 > On 06.09.2014 01:49, Mikael Pettersson wrote:
 > > Michel Dänzer writes:
 > >   > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > >   > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > >   > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > >   > > of Linus' repo.  3.16 and earlier kernels are fine.
 > >   > >
 > >   > > I ran a bisect, which identified:
 > >   > >
 > >   > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > >   > > Author: Michel Dänzer
 > >   > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > >   > >
 > >   > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > >   > >
 > >   > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > >   > > (which requires manual intervention due to subsequent changes in
 > >   > > radeon_ring_commit()) eliminates the screen corruption.
 > >   >
 > >   > Does the patch below help?
 > > 
 > > Tested, sorry no joy.  I first reconfirmed the screen corruption with 3.17-rc3.
 > > I then applied this and rebuilt/rebooted, and after a few minutes X had a hiccup
 > > (screen went black, came back after a few seconds, but then no cursor or
 > > reaction to mouse events), but I was able to kill it via my Terminate_Server
 > > key binding.
 > 
 > I was afraid so, thanks for testing it.
 > 
 > 
 > I can't see any other option than the patch below then. Can you confirm that this
 > fixes the screen corruption?

It does, thanks.

Tested-by: Mikael Pettersson 

 > 
 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..b0098e7 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -821,6 +821,20 @@ u32 r100_get_vblank_counter(struct radeon_device *rdev, int crtc)
 >  return RREG32(RADEON_CRTC2_CRNT_FRAME);
 >  }
 >  
 > +/**
 > + * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > + * rdev: radeon device structure
 > + * ring: ring buffer struct for emitting packets
 > + */
 > +static void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > +{
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > +RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > +}
 > +
 >  /* Who ever call radeon_fence_emit should call ring_lock and ask
 >   * for enough space (today caller are ib schedule and buffer move) */
 >  void r100_fence_ring_emit(struct radeon_device *rdev,
 > @@ -1056,20 +1070,6 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 >  (void)RREG32(RADEON_CP_RB_WPTR);
 >  }
 >  
 > -/**
 > - * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > - * rdev: radeon device structure
 > - * ring: ring buffer struct for emitting packets
 > - */
 > -void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > -{
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > -RADEON_HDP_READ_BUFFER_INVALIDATE);
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > -}
 > -
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..2dd5847 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -185,7 +185,6 @@ static struct radeon_asic_ring r100_gfx_ring = {
 >  .get_rptr = &r100_gfx_get_rptr,
 >  .get_wptr = &r100_gfx_get_wptr,
 >  .set_wptr = &r100_gfx_set_wptr,
 > -.hdp_flush = &r100_ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r100_asic = {
 > @@ -332,7 +331,6 @@ static struct radeon_asic_ring r300_gfx_ring = {
 >  .get_rptr = &r100_gfx_get_rptr,
 >  .get_wptr = &r100_gfx_get_wptr,
 >  .set_wptr = &r100_gfx_set_wptr,
 > -.hdp_flush = &r100_ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r300_asic = {
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.h 
 > b/drivers/gpu/drm/radeon/radeon_as

Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-08 Thread Mikael Pettersson

Michel Dänzer writes:
 > On 06.09.2014 01:49, Mikael Pettersson wrote:
 > > Michel Dänzer writes:
 > >   > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > >   > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > >   > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > >   > > of Linus' repo.  3.16 and earlier kernels are fine.
 > >   > >
 > >   > > I ran a bisect, which identified:
 > >   > >
 > >   > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > >   > > Author: Michel Dänzer
 > >   > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > >   > >
 > >   > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > >   > >
 > >   > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > >   > > (which requires manual intervention due to subsequent changes in
 > >   > > radeon_ring_commit()) eliminates the screen corruption.
 > >   >
 > >   > Does the patch below help?
 > > 
 > > Tested, sorry no joy.  I first reconfirmed the screen corruption with 3.17-rc3.
 > > I then applied this and rebuilt/rebooted, and after a few minutes X had a hiccup
 > > (screen went black, came back after a few seconds, but then no cursor or
 > > reaction to mouse events), but I was able to kill it via my Terminate_Server
 > > key binding.
 > 
 > I was afraid so, thanks for testing it.
 > 
 > 
 > I can't see any other option than the patch below then. Can you confirm that this
 > fixes the screen corruption?

I'll test this on Friday evening when I'm back home and have access to the
affected machine.

/Mikael


 > 
 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..b0098e7 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -821,6 +821,20 @@ u32 r100_get_vblank_counter(struct radeon_device *rdev, int crtc)
 >  return RREG32(RADEON_CRTC2_CRNT_FRAME);
 >  }
 >  
 > +/**
 > + * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > + * rdev: radeon device structure
 > + * ring: ring buffer struct for emitting packets
 > + */
 > +static void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > +{
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > +RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > +}
 > +
 >  /* Who ever call radeon_fence_emit should call ring_lock and ask
 >   * for enough space (today caller are ib schedule and buffer move) */
 >  void r100_fence_ring_emit(struct radeon_device *rdev,
 > @@ -1056,20 +1070,6 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 >  (void)RREG32(RADEON_CP_RB_WPTR);
 >  }
 >  
 > -/**
 > - * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > - * rdev: radeon device structure
 > - * ring: ring buffer struct for emitting packets
 > - */
 > -void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > -{
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > -RADEON_HDP_READ_BUFFER_INVALIDATE);
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > -}
 > -
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..2dd5847 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -185,7 +185,6 @@ static struct radeon_asic_ring r100_gfx_ring = {
 >  .get_rptr = &r100_gfx_get_rptr,
 >  .get_wptr = &r100_gfx_get_wptr,
 >  .set_wptr = &r100_gfx_set_wptr,
 > -.hdp_flush = &r100_ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r100_asic = {
 > @@ -332,7 +331,6 @@ static struct radeon_asic_ring r300_gfx_ring = {
 >  .get_rptr = &r100_gfx_get_rptr,
 >  .get_wptr = &r100_gfx_get_wptr,
 >  .set_wptr = &r100_gfx_set_wptr,
 > -.hdp_flush = &r100_ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r300_asic = {
 > diff --git

Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-05 Thread Mikael Pettersson

Michel Dänzer writes:
 > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > > of Linus' repo.  3.16 and earlier kernels are fine.
 > > 
 > > I ran a bisect, which identified:
 > > 
 > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > > Author: Michel Dänzer
 > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > > 
 > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > > 
 > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > > (which requires manual intervention due to subsequent changes in
 > > radeon_ring_commit()) eliminates the screen corruption.
 > 
 > Does the patch below help?

Tested, sorry no joy.  I first reconfirmed the screen corruption with 3.17-rc3.
I then applied this and rebuilt/rebooted, and after a few minutes X had a hiccup
(screen went black, came back after a few seconds, but then no cursor or
reaction to mouse events), but I was able to kill it via my Terminate_Server
key binding.  The kernel log showed:

[ 1641.247760] radeon :01:00.0: ring 0 stalled for more than 1msec
[ 1641.247765] radeon :01:00.0: GPU lockup (waiting for 0x6241 last fence id 0x6240 on ring 0)
[ 1641.247768] radeon :01:00.0: failed to get a new IB (-35)
[ 1641.247770] [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
[ 1641.404052] Failed to wait GUI idle while programming pipes. Bad things might happen.
[ 1641.405075] radeon :01:00.0: Saved 859 dwords of commands on ring 0.
[ 1641.405084] radeon :01:00.0: (r300_asic_reset:394) RBBM_STATUS=0x80010140
[ 1641.910649] radeon :01:00.0: (r300_asic_reset:413) RBBM_STATUS=0x80010140
[ 1642.412182] radeon :01:00.0: (r300_asic_reset:425) RBBM_STATUS=0x0140
[ 1642.412218] radeon :01:00.0: GPU reset succeed
[ 1642.412220] radeon :01:00.0: GPU reset succeeded, trying to resume
[ 1642.412224] radeon :01:00.0: 88060274f800 unpin not necessary
[ 1642.626303] [drm] radeon: 1 quad pipes, 1 Z pipes initialized.
[ 1642.626325] [drm] PCIE GART of 512M enabled (table at 0xE004).
[ 1642.626328] radeon :01:00.0: WB enabled
[ 1642.626331] radeon :01:00.0: fence driver on ring 0 use gpu addr 0xc000 and cpu addr 0x8800d9b9f000
[ 1642.626375] [drm] radeon: ring at 0xC0001000
[ 1642.783220] [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E8)=0xCAFEDEAD)
[ 1642.783222] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
[ 1642.783224] radeon :01:00.0: failed initializing CP (-22).

With a revert of the HDP flush patch things are stable.

/Mikael

 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..3ff9c53 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -1070,6 +1070,20 @@ void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 >  radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 >  }
 >  
 > +/**
 > + * r100_mmio_hdp_flush - flush Host Data Path via MMIO
 > + * rdev: radeon device structure
 > + */
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev)
 > +{
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl | RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +}
 > +
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..c23a123 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -408,7 +408,7 @@ static struct radeon_asic r300_asic_pcie = {
 >  .resume = &r300_resume,
 >  .vga_set_state = &r100_vga_set_state,
 >  .asic_reset = &r300_asic_reset,
 > -.mmio_hdp_flush = NULL,
 > +.mmio_hdp_flush = r100_mmio_hdp_flush,
 >  .gui_idle = &r100_gui_idle,
 >  .mc_wait_for_idle = &r300_mc_wait_for_idle,
 >  .gart = {
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
 > index 275a5dc..e9b1c35 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.h
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.h
 > @@ -150,6 +150,8 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 > struct radeon_ring *ring);
 >  void r100_ring_hdp_flush(struct radeon_device *rdev,
 >   struct radeon_ring

Re: bit fields && data tearing

2014-09-04 Thread Mikael Pettersson

Benjamin Herrenschmidt writes:
 > On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote:
 > 
 > > Apologies for hijacking this thread but I need to extend this discussion
 > > somewhat regarding what a compiler might do with adjacent fields in a structure.
 > > 
 > > The tty subsystem defines a large aggregate structure, struct tty_struct.
 > > Importantly, several different locks apply to different fields within that
 > > structure; i.e., a specific spinlock will be claimed before updating or accessing
 > > certain fields while a different spinlock will be claimed before updating or
 > > accessing certain _adjacent_ fields.
 > > 
 > > What is necessary and sufficient to prevent accidental false-sharing?
 > > The patch below was flagged as insufficient on ia64, and possibly ARM.
 > 
 > We expect native aligned scalar types to be accessed atomically (the
 > read/modify/write of a larger quantity that gcc does on some bitfield
 > cases has been flagged as a gcc bug, but shouldn't happen on normal
 > scalar types).
 > 
 > I am not 100% certain of "bool" here, I assume it's treated as a normal
 > scalar and thus atomic but if unsure, you can always use int.

Please use an aligned int or long.  Some machines cannot do atomic
accesses on sub-int/long quantities, so 'bool' may cause unexpected
rmw cycles on adjacent fields.

/Mikael

 > 
 > Another option is to use the atomic bitops and make these bits in a
 > bitmask but that is probably unnecessary if you have locks already.
 > 
 > Cheers,
 > Ben.
 > 
 > 
 > > Regards,
 > > Peter Hurley
 > > 
 > > --- >% ---
 > > Subject: [PATCH 21/26] tty: Convert tty_struct bitfield to bools
 > > 
 > > The stopped, hw_stopped, flow_stopped and packet bits are smp-unsafe
 > > and interrupt-unsafe. For example,
 > > 
 > > CPU 0                   | CPU 1
 > >                         |
 > > tty->flow_stopped = 1   | tty->hw_stopped = 0
 > > 
 > > One of these updates will be corrupted, as the bitwise operation
 > > on the bitfield is non-atomic.
 > > 
 > > Ensure each flag has a separate memory location, so concurrent
 > > updates do not corrupt orthogonal states.
 > > 
 > > Signed-off-by: Peter Hurley 
 > > ---
 > >  include/linux/tty.h | 5 ++++-
 > >  1 file changed, 4 insertions(+), 1 deletion(-)
 > > 
 > > diff --git a/include/linux/tty.h b/include/linux/tty.h
 > > index 1c3316a..7cf61cb 100644
 > > --- a/include/linux/tty.h
 > > +++ b/include/linux/tty.h
 > > @@ -261,7 +261,10 @@ struct tty_struct {
 > >unsigned long flags;
 > >int count;
 > >struct winsize winsize; /* winsize_mutex */
 > > -  unsigned char stopped:1, hw_stopped:1, flow_stopped:1, packet:1;
 > > +  bool stopped;
 > > +  bool hw_stopped;
 > > +  bool flow_stopped;
 > > +  bool packet;
 > >unsigned char ctrl_status;  /* ctrl_lock */
 > >unsigned int receive_room;  /* Bytes free for queue */
 > >int flow_change;
 > 
 > 

Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-02 Thread Mikael Pettersson

Michel Dänzer writes:
 > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > > of Linus' repo.  3.16 and earlier kernels are fine.
 > > 
 > > I ran a bisect, which identified:
 > > 
 > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > > Author: Michel Dänzer
 > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > > 
 > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > > 
 > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > > (which requires manual intervention due to subsequent changes in
 > > radeon_ring_commit()) eliminates the screen corruption.
 > 
 > Does the patch below help?

Thanks for the patch, I'll test it on Friday evening when I'm
back home and have access to the affected machine.


 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..3ff9c53 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -1070,6 +1070,20 @@ void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 >  radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 >  }
 >  
 > +/**
 > + * r100_mmio_hdp_flush - flush Host Data Path via MMIO
 > + * rdev: radeon device structure
 > + */
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev)
 > +{
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl | RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +}
 > +
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..c23a123 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -408,7 +408,7 @@ static struct radeon_asic r300_asic_pcie = {
 >  .resume = &r300_resume,
 >  .vga_set_state = &r100_vga_set_state,
 >  .asic_reset = &r300_asic_reset,
 > -.mmio_hdp_flush = NULL,
 > +.mmio_hdp_flush = r100_mmio_hdp_flush,
 >  .gui_idle = &r100_gui_idle,
 >  .mc_wait_for_idle = &r300_mc_wait_for_idle,
 >  .gart = {
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
 > index 275a5dc..e9b1c35 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.h
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.h
 > @@ -150,6 +150,8 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 > struct radeon_ring *ring);
 >  void r100_ring_hdp_flush(struct radeon_device *rdev,
 >   struct radeon_ring *ring);
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev);
 > +
 >  /*
 >   * r200,rv250,rs300,rv280
 >   */
 > diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
 > index bfd7e1b..3d0f564 100644
 > --- a/drivers/gpu/drm/radeon/radeon_gem.c
 > +++ b/drivers/gpu/drm/radeon/radeon_gem.c
 > @@ -368,6 +368,7 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
 >  r = radeon_bo_wait(robj, &cur_placement, false);
 >  /* Flush HDP cache via MMIO if necessary */
 >  if (rdev->asic->mmio_hdp_flush &&
 > +!rdev->asic->ring[RADEON_RING_TYPE_GFX_INDEX]->hdp_flush &&
 >  radeon_mem_type_to_domain(cur_placement) == RADEON_GEM_DOMAIN_VRAM)
 >  robj->rdev->asic->mmio_hdp_flush(rdev);
 >  drm_gem_object_unreference_unlocked(gobj);
 > diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
 > index d656079..b82843b 100644
 > --- a/drivers/gpu/drm/radeon/radeon_ring.c
 > +++ b/drivers/gpu/drm/radeon/radeon_ring.c
 > @@ -188,7 +188,8 @@ void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring,
 >  /* If we are emitting the HDP flush via the ring buffer, we need to
 >   * do it before padding.
 >   */
 > -if (hdp_flush && rdev->asic->ring[ring->idx]->hdp_flush)
 > +if (hdp_flush && rdev->asic->ring[ring->idx]->hdp_flush &&
 > +!rdev->asic->mmio_hdp_flush)
 >  rdev->asic->ring[ring->idx]->hdp_flush(rdev, ring);
 >  /* We pad to match fetch size */
 >  while (ring->wptr & ring->align_mask) {
 > 
 > 
 > 
 > -- 
 > Earthling Michel Dänzer|  http://www.amd.com
 > Libre software enthusiast  |Mesa and X developer


Re: [BISECTED] 3.17-rc1 radeon screen corruption due to Always flush the HDP cache before submitting a CS to the GPU

2014-09-02 Thread Mikael Pettersson

Michel Dänzer writes:
  On 30.08.2014 22:59, Mikael Pettersson wrote:
   Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
   after a while in X + firefox.  This still occurs with yesterday's HEAD
   of Linus' repo.  3.16 and ealier kernels are fine.
   
   I ran a bisect, which identified:
   
   commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
   Author: Michel DÃ¤nzer michel.daen...@amd.com
   Date:   Thu Jul 31 18:43:49 2014 +0900
   
drm/radeon: Always flush the HDP cache before submitting a CS to the 
   GPU
   
   as the cause of my screen corruption.  Reverting this from 3.17-rc2
   (which requires manual intervention due to subsequent changes in
   radeon_ring_commit()) eliminates the screen corruption.
  
  Does the patch below help?

Thanks for the patch, I'll test it on Friday evening when I'm
back home and have access to the affected machine.


  
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..3ff9c53 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -1070,6 +1070,20 @@ void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 >  	radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 >  }
 >  
 > +/**
 > + * r100_mmio_hdp_flush - flush Host Data Path via MMIO
 > + * rdev: radeon device structure
 > + */
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev)
 > +{
 > +	WREG32(RADEON_HOST_PATH_CNTL,
 > +	       rdev->config.r100.hdp_cntl | RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +	(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +	WREG32(RADEON_HOST_PATH_CNTL,
 > +	       rdev->config.r100.hdp_cntl);
 > +	(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +}
 > +
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  	const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..c23a123 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -408,7 +408,7 @@ static struct radeon_asic r300_asic_pcie = {
 >  	.resume = r300_resume,
 >  	.vga_set_state = r100_vga_set_state,
 >  	.asic_reset = r300_asic_reset,
 > -	.mmio_hdp_flush = NULL,
 > +	.mmio_hdp_flush = r100_mmio_hdp_flush,
 >  	.gui_idle = r100_gui_idle,
 >  	.mc_wait_for_idle = r300_mc_wait_for_idle,
 >  	.gart = {
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
 > index 275a5dc..e9b1c35 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.h
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.h
 > @@ -150,6 +150,8 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 >  		       struct radeon_ring *ring);
 >  void r100_ring_hdp_flush(struct radeon_device *rdev,
 >  			 struct radeon_ring *ring);
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev);
 > +
 >  /*
 >   * r200,rv250,rs300,rv280
 >   */
 > diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
 > index bfd7e1b..3d0f564 100644
 > --- a/drivers/gpu/drm/radeon/radeon_gem.c
 > +++ b/drivers/gpu/drm/radeon/radeon_gem.c
 > @@ -368,6 +368,7 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
 >  	r = radeon_bo_wait(robj, &cur_placement, false);
 >  	/* Flush HDP cache via MMIO if necessary */
 >  	if (rdev->asic->mmio_hdp_flush &&
 > +	    !rdev->asic->ring[RADEON_RING_TYPE_GFX_INDEX]->hdp_flush &&
 >  	    radeon_mem_type_to_domain(cur_placement) == RADEON_GEM_DOMAIN_VRAM)
 >  		robj->rdev->asic->mmio_hdp_flush(rdev);
 >  	drm_gem_object_unreference_unlocked(gobj);
 > diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
 > index d656079..b82843b 100644
 > --- a/drivers/gpu/drm/radeon/radeon_ring.c
 > +++ b/drivers/gpu/drm/radeon/radeon_ring.c
 > @@ -188,7 +188,8 @@ void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring,
 >  	/* If we are emitting the HDP flush via the ring buffer, we need to
 >  	 * do it before padding.
 >  	 */
 > -	if (hdp_flush && rdev->asic->ring[ring->idx]->hdp_flush)
 > +	if (hdp_flush && rdev->asic->ring[ring->idx]->hdp_flush &&
 > +	    !rdev->asic->mmio_hdp_flush)
 >  		rdev->asic->ring[ring->idx]->hdp_flush(rdev, ring);
 >  	/* We pad to match fetch size */
 >  	while (ring->wptr & ring->align_mask) {
 > 
 > 
 > 
 > -- 
 > Earthling Michel Dänzer            |  http://www.amd.com
 > Libre software enthusiast          |  Mesa and X developer
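
The two conditions changed by the quoted patch can be modelled in isolation. The following is only a sketch under stated assumptions: `model_asic`, `model_ring`, `noop_flush` and both predicate names are hypothetical stand-ins, not radeon driver symbols, and whatever the rest of radeon_ring_commit() does with the MMIO callback is not part of the quoted hunks and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's callback tables. */
struct model_ring {
	void (*hdp_flush)(void);       /* flush emitted through the ring */
};

struct model_asic {
	void (*mmio_hdp_flush)(void);  /* flush done via register writes */
	struct model_ring gfx_ring;
};

/* Placeholder callback so that the pointers can be non-NULL. */
void noop_flush(void)
{
}

/* Condition from the radeon_ring_commit() hunk: emit the flush through
 * the ring only when the ASIC has no MMIO flush callback. */
bool ring_flush_wanted(const struct model_asic *asic, bool hdp_flush)
{
	return hdp_flush && asic->gfx_ring.hdp_flush && !asic->mmio_hdp_flush;
}

/* Condition from the radeon_gem_wait_idle_ioctl() hunk: flush via MMIO
 * only when the GFX ring has no flush callback and the buffer is in VRAM. */
bool wait_idle_mmio_flush_wanted(const struct model_asic *asic, bool in_vram)
{
	return asic->mmio_hdp_flush && !asic->gfx_ring.hdp_flush && in_vram;
}

/* Walk through the three interesting configurations. */
void model_selfcheck(void)
{
	struct model_asic both      = { noop_flush, { noop_flush } };
	struct model_asic ring_only = { NULL,       { noop_flush } };
	struct model_asic mmio_only = { noop_flush, { NULL } };

	assert(!ring_flush_wanted(&both, true));
	assert(!wait_idle_mmio_flush_wanted(&both, true));
	assert(ring_flush_wanted(&ring_only, true));
	assert(wait_idle_mmio_flush_wanted(&mmio_only, true));
}
```

In this model, a configuration with both callbacks set (which is what the patch creates for r300_asic_pcie) satisfies neither condition, so any flush for it would have to come from code outside the quoted hunks.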

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-08-30 Thread Mikael Pettersson

Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
after a while in X + firefox.  This still occurs with yesterday's HEAD
of Linus' repo.  3.16 and earlier kernels are fine.

I ran a bisect, which identified:

commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
Author: Michel Dänzer michel.daen...@amd.com
Date:   Thu Jul 31 18:43:49 2014 +0900

drm/radeon: Always flush the HDP cache before submitting a CS to the GPU

as the cause of my screen corruption.  Reverting this from 3.17-rc2
(which requires manual intervention due to subsequent changes in
radeon_ring_commit()) eliminates the screen corruption.

User-space is vanilla Fedora 19 / x86_64 with updates.  radeon_drv.so says:

[62.574] (II) LoadModule: "radeon"
[62.574] (II) Loading /usr/lib64/xorg/modules/drivers/radeon_drv.so
[62.574] (II) Module radeon: vendor="X.Org Foundation"
[62.574]     compiled for 1.14.0, module version = 7.1.99
[62.574]     Module class: X.Org Video Driver
[62.574]     ABI class: X.Org Video Driver, version 14.1
...
[62.585] (--) RADEON(0): Chipset: "ATI Radeon X550 (RV370) 5B63 (PCIE)" (ChipID = 0x5b63)

See also my original report to LKML:
http://marc.info/?l=linux-kernel&m=140829066726743&w=2

/Mikael

Re: rt_sigreturn rejects a substitute stack frame as invalid.

2014-08-18 Thread Mikael Pettersson

Steven Stewart-Gallus writes:
 > Hello,
 > 
 > I'm not totally sure that GLibc's setcontext is safe to use in a
 > signal handler. So, I decided I was going to play things safe and let
 > rt_sigreturn switch stacks for me instead. However, rt_sigreturn seems
 > to reject my substitute stack frame as invalid and I'm not sure why.

I did similar things at my previous work (doing dynamic binary
instrumentation and virtualization of user-space binaries; can't
share the code alas, it's proprietary), but my code operated
directly on top of the kernel/user-space API, using the actual
kernel/user-space data structures rather than glibc's fake ones.

If you're sure that it's the kernel's rt_sigreturn and not whatever
glibc runs before it that complains, then a simple way of debugging
this is to modify your kernel to printk some diagnostics whenever
rt_sigreturn decides to error out.

You may also want to check out the 'pth' package.

[BUG] 3.17-rc1 radeon screen corruption and hang

2014-08-17 Thread Mikael Pettersson

On one of my machines, an Ivy Bridge desktop with a Radeon X550 (RV370)
graphics card, 3.17-rc1 causes random screen corruption with X running.
Eventually X stopped honoring mouse or keyboard clicks, and when I tried
to reboot it via an ssh login from another machine, it hung hard.
3.16 and earlier kernels work perfectly.

After rebooting there was nothing interesting in /var/log/messages, but
Xorg.0.log.old showed the following:

(EE) [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
(EE) 
(EE) Backtrace:
(EE) 0: X (mieqEnqueue+0x22b) [0x57728b]
(EE) 1: X (QueuePointerEvents+0x52) [0x44db12]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x2c4d) [0x7f287fbf7b1d]
(EE) 3: X (DPMSSupported+0xe8) [0x486498]
(EE) 4: X (xf86SerialModemClearBits+0x230) [0x4aea80]
(EE) 5: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x7f28859f2f8f]
(EE) 6: /lib64/libc.so.6 (ioctl+0x7) [0x7f28846f5eb7]
(EE) 7: /lib64/libdrm.so.2 (drmIoctl+0x34) [0x7f28857db6c4]
(EE) 8: /lib64/libdrm.so.2 (drmCommandWrite+0x1e) [0x7f28857dd84e]
(EE) 9: /lib64/libdrm_radeon.so.1 (_init+0x4c1) [0x7f288032e511]
(EE) 10: /lib64/libdrm_radeon.so.1 (_init+0x6c4) [0x7f288032e924]
(EE) 11: /usr/lib64/xorg/modules/drivers/radeon_drv.so (_init+0x1af29) [0x7f2880577f09]
(EE) 12: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x1287) [0x7f287f8fe9a7]
(EE) 13: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x3a12) [0x7f287f903852]
(EE) 14: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x481a) [0x7f287f90deda]
(EE) 15: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x15b4) [0x7f287f907cb4]
(EE) 16: X (DamageRegionAppend+0x592) [0x521a92]
(EE) 17: X (AddTraps+0x3b8a) [0x51e99a]
(EE) 18: X (SendErrorToClient+0x3f7) [0x437217]
(EE) 19: X (_init+0x3aa2) [0x429db2]
(EE) 20: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7f288462ab45]
(EE) 21: X (_start+0x29) [0x426a21]
(EE) 22: ? (?+0x29) [0x29]

This repeated a number of times, and the last entry was prefixed by:

(EE) [mi] EQ overflow continuing.  1000 events have been dropped.
(EE) [mi] No further overflow reports will be reported until the clog is cleared.

User-space is Fedora 19 / x86_64 with latest updates.  No non-Fedora X drivers.

/Mikael

Re: [RFC 0/2] __vdso_findsym

2014-06-15 Thread Mikael Pettersson

Andy Lutomirski writes:
 > The idea is to add AT_VDSO_FINDSYM pointing at __vdso_findsym.  This
 > implements __vdso_findsym.
 > 
 > This would make it easier for runtimes that don't otherwise implement
 > ELF loaders to use the vdso.
 > 
 > Thoughts?

I'm opposed to this based on the principle that the kernel should NOT
be a dumping ground for random code that user-space can and should
implement for itself.  As long as the vdso is correctly formatted ELF,
then parsing it is easy, and the kernel should not care at all if or
how user-space accesses it.
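
As an illustration of how little parsing is involved, here is a minimal user-space sketch. It assumes Linux with getauxval() and AT_SYSINFO_EHDR, a vDSO that carries a classic DT_HASH table with 32-bit entries (a vDSO with only DT_GNU_HASH makes this sketch return NULL), and it ignores symbol versioning. The name vdso_lookup is hypothetical; the kernel tree's parse_vdso.c is the more complete reference.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>       /* ElfW() */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

/* Return the address of a defined dynamic symbol in the vDSO, or NULL. */
void *vdso_lookup(const char *name)
{
	assert(name != NULL);

	uintptr_t base = getauxval(AT_SYSINFO_EHDR);
	if (!base)
		return NULL;                    /* no vDSO mapped */

	const ElfW(Ehdr) *eh = (const void *)base;
	if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
		return NULL;                    /* not an ELF image */

	/* Find the load bias and the dynamic section via the program headers. */
	const ElfW(Phdr) *ph = (const void *)(base + eh->e_phoff);
	const ElfW(Dyn) *dyn = NULL;
	uintptr_t bias = 0;
	int have_load = 0;

	for (int i = 0; i < eh->e_phnum; i++) {
		if (ph[i].p_type == PT_LOAD && !have_load) {
			bias = base + ph[i].p_offset - ph[i].p_vaddr;
			have_load = 1;
		} else if (ph[i].p_type == PT_DYNAMIC) {
			dyn = (const void *)(base + ph[i].p_offset);
		}
	}
	if (!have_load || !dyn)
		return NULL;

	/* The vDSO's dynamic entries are not relocated, so add the bias. */
	const ElfW(Sym) *symtab = NULL;
	const char *strtab = NULL;
	const uint32_t *hash = NULL;

	for (; dyn->d_tag != DT_NULL; dyn++) {
		if (dyn->d_tag == DT_SYMTAB)
			symtab = (const void *)(bias + dyn->d_un.d_ptr);
		else if (dyn->d_tag == DT_STRTAB)
			strtab = (const void *)(bias + dyn->d_un.d_ptr);
		else if (dyn->d_tag == DT_HASH)
			hash = (const void *)(bias + dyn->d_un.d_ptr);
	}
	if (!symtab || !strtab || !hash)
		return NULL;

	/* hash[1] (nchain) equals the number of symbol table entries. */
	for (uint32_t i = 0; i < hash[1]; i++) {
		const ElfW(Sym) *s = &symtab[i];

		if (s->st_shndx == SHN_UNDEF)
			continue;               /* skip undefined symbols */
		if (strcmp(strtab + s->st_name, name) == 0)
			return (void *)(bias + s->st_value);
	}
	return NULL;
}
```

A linear scan is used instead of walking the hash buckets because the vDSO exports only a handful of symbols. Which names exist is architecture-specific, so the caller has to handle a NULL result.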

The fact that golang got it wrong is not an argument for moving this
functionality into the kernel.

/Mikael

Re: rcu alignment warning tripping on m68k

2014-06-07 Thread Mikael Pettersson

Paul E. McKenney writes:
 > On Fri, May 30, 2014 at 11:29:41AM +1000, Greg Ungerer wrote:
 > > On 29/05/14 23:11, One Thousand Gnomes wrote:
 > > > On Thu, 29 May 2014 12:08:32 +1000
 > > > Greg Ungerer  wrote:
 > > > 
 > > >> Hi All,
 > > >>
 > > >> Inside kernel/rcu/tree.c in __call_rcu() it does an alignment check on
 > > >> the head pointer passed in. This trips on m68k systems, because they only
 > > >> need alignment of 32bit quantities to 16bit boundaries.
 > > > 
 > > > __alignof perhaps ?
 > > 
 > > That might do. Change then becomes something like:
 > > 
 > > --- a/kernel/rcu/tree.c
 > > +++ b/kernel/rcu/tree.c
 > > @@ -2467,7 +2467,7 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_
 > > unsigned long flags;
 > > struct rcu_data *rdp;
 > > 
 > > -   WARN_ON_ONCE((unsigned long)head & 0x3); /* Misaligned rcu_head! */
 > > +   WARN_ON_ONCE((unsigned long)head & (__alignof__(head) - 1)); /* Misaligned rcu_head! */
 > 
 > Hmmm...  The purpose of the check is to reserve the low-order bits to
 > allow RCU to classify callbacks as being time-critical or not.  RCU
 > can probably live with a single bit, but if there is some architecture
 > out there that simply refuses to do alignment, I need to know about it.
 > 
 > (See "git show 0bb7b59d6e2b8" for more info.)
 > 
 > So how about this instead?
 > 
 >  -   WARN_ON_ONCE((unsigned long)head & 0x1); /* Misaligned rcu_head! */
 > 
 > (Trying to remember if I have seen Linux kernel code that uses both
 > the lower bits...)

As stated above, m68k-linux aligns to 16-bit boundaries by default, so you'd
get one bit but not necessarily more.  If you want more free low bits, why
not attach an explicit attribute aligned to the rcu_head type declaration?
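
A minimal sketch of that suggestion, assuming GCC or Clang attribute syntax. `struct cb_head` and the helper names are hypothetical stand-ins for the kernel's rcu_head, and pointer-size alignment is just one possible choice. Note that `__alignof__(head)` in the quoted diff measures the pointer type, whereas the sketch takes the alignment of the structure itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rcu_head, with the alignment forced
 * on the type rather than left to the ABI default (2 bytes on m68k). */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
} __attribute__((aligned(sizeof(void *))));

/* Low-order address bits guaranteed to be zero by the type's alignment. */
#define CB_TAG_MASK ((uintptr_t)(_Alignof(struct cb_head) - 1))

/* Same idea as the WARN_ON_ONCE() check, derived from the type. */
static inline int cb_head_misaligned(const struct cb_head *head)
{
	return ((uintptr_t)head & CB_TAG_MASK) != 0;
}

/* Store a small tag in the free low bits of the pointer value. */
static inline uintptr_t cb_pack(struct cb_head *head, unsigned int tag)
{
	assert(!cb_head_misaligned(head));
	assert(tag <= CB_TAG_MASK);
	return (uintptr_t)head | tag;
}

static inline struct cb_head *cb_ptr(uintptr_t packed)
{
	return (struct cb_head *)(packed & ~CB_TAG_MASK);
}

static inline unsigned int cb_tag(uintptr_t packed)
{
	return (unsigned int)(packed & CB_TAG_MASK);
}
```

With the attribute on the type, every object of that type, whether static, on the stack, or embedded in another structure, gets the stronger alignment, so the number of usable low bits no longer depends on the architecture's default.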

Re: Q: uniqueness of pthread_t

2014-03-24 Thread Mikael Pettersson

Ulrich Windl writes:
 > Hi!
 > 
 > I'm programming a little bit with pthreads in Linux. As I understand 
 > pthread_t is an opaque type (a pointer address?) that cannot be mapped to 
 > the kernel's TID easily. Anyway: Is it expected that when one thread 
 > terminates and another thread is created (in fact the same thread again), 
 > that the pthread_t may be reused?

Wrong list, please try libc-h...@sourceware.org instead.

keyboard problems on Dell Latitude E6230

2014-02-06 Thread Mikael Pettersson

I'm having problems with the built-in keyboard on Dell Latitude E6230
laptops and Linux 3.12/3.13 kernels:

- sometimes the keyboard just keeps sending the same key code, as if
  the key was held down permanently; sometimes that can be cured by
  pressing ^C or something, but often only a reboot fixes it

- sometimes the keyboard just stops sending key codes; only a reboot
  fixes it

This issue has plagued me since Nov last year when I got my first E6230.
Last week I got a replacement machine, but it too has the same problems.

Since I haven't seen this problem with any other laptop under Linux for
the last 15 years or so, I have to conclude that there's some HW issue
with these machines that the Linux kernel doesn't handle.

Any ideas?

Re: [PATCH 0/5] ia64 ski emulator patches

2014-01-28 Thread Mikael Pettersson

Mikael Pettersson writes:
 > Mikulas Patocka writes:
 >  > 
 >  > 
 >  > On Sat, 25 Jan 2014, Mikael Pettersson wrote:
 >  > 
 >  > > My ski patches are in
 >  > > <http://user.it.uu.se/~mikpe/linux/patches/ia64/ski-1.3.2/>
 >  > > for now.  I'll post the kernel patches to linux-ia64 @ vger in a few
 >  > > minutes.
 >  > > 
 >  > > /Mikael
 >  > 
 >  > Thanks for the patches.
 >  > 
 >  > Isn't this subject to races? - could it lock up if the signal happens just
 >  > before the pause syscall?
 >  > 
 >  > +case SSC_HALT_LIGHT:
 >  > +  /* Sleep until SIGIO or SIGALRM is received; this relies on
 >  > +keyboard/ethernet input being detected via SIGIO, and the
 >  > +ITC now being emulated via setitimer() and SIGALRM.  */
 >  > +  pause ();
 >  > +  break;
 >  > +
 > 
 > Thanks for the review. You're right, the pause mustn't happen if
 > itc_itimer_fired == 1.  Let me ponder this for a while...

Ok, I've fixed this in two different ways: one patch which uses pselect,
and one patch which uses plain select + the self-pipe trick.  Both work
in limited testing, but the pselect one is much nicer and appears to have
a little less host CPU overhead, so that's the one I'm stress-testing now.

Both patches have been uploaded to the same place as before.
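
For readers unfamiliar with the pattern, the pselect approach looks roughly like the sketch below. It is not the uploaded patch: `timer_fired`, `halt_light_init` and `halt_light` are hypothetical names, and only SIGALRM is handled, whereas the ski code also has to wake on SIGIO.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>

/* Set by the SIGALRM handler; plays the role of itc_itimer_fired. */
static volatile sig_atomic_t timer_fired;

static void on_sigalrm(int sig)
{
	(void)sig;
	timer_fired = 1;
}

/* Install the handler and keep SIGALRM blocked during normal execution. */
int halt_light_init(void)
{
	struct sigaction sa = { .sa_handler = on_sigalrm };
	sigset_t block;

	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return -1;

	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	return sigprocmask(SIG_BLOCK, &block, NULL);
}

/* Sleep until the timer has fired, without the lost-wakeup race. */
void halt_light(void)
{
	sigset_t wait_mask;

	/* Query the current mask; halt_light_init() must have run. */
	sigprocmask(SIG_BLOCK, NULL, &wait_mask);
	assert(sigismember(&wait_mask, SIGALRM) == 1);
	sigdelset(&wait_mask, SIGALRM);

	/* SIGALRM is blocked here, so the flag cannot change between the
	 * test and the sleep.  pselect() installs wait_mask and sleeps as
	 * one atomic step; a pending SIGALRM is delivered right away and
	 * pselect() then fails with EINTR. */
	while (!timer_fired)
		pselect(0, NULL, NULL, NULL, NULL, &wait_mask);

	timer_fired = 0;
}
```

With plain pause(), a signal arriving between the flag test and the system call is handled before the sleep starts, and the process then sleeps until the next signal. Here the signal can only be delivered inside pselect(), so that window does not exist.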

/Mikael

Re: [PATCH 0/5] ia64 ski emulator patches

2014-01-25 Thread Mikael Pettersson

Mikulas Patocka writes:
 > 
 > 
 > On Sat, 25 Jan 2014, Mikael Pettersson wrote:
 > 
 > > My ski patches are in 
 > > <http://user.it.uu.se/~mikpe/linux/patches/ia64/ski-1.3.2/>
 > > for now.  I'll post the kernel patches to linux-ia64 @ vger in a few 
 > > minutes.
 > > 
 > > /Mikael
 > 
 > Thanks for the patches.
 > 
 > Isn't this subject to races? - could it lock up if the signal happens just 
 > before the pause syscall?
 > 
 > +case SSC_HALT_LIGHT:
 > +  /* Sleep until SIGIO or SIGALRM is received; this relies on
 > +keyboard/ethernet input being detected via SIGIO, and the
 > +ITC now being emulated via setitimer() and SIGALRM.  */
 > +  pause ();
 > +  break;
 > +

Thanks for the review. You're right, the pause mustn't happen if
itc_itimer_fired == 1.  Let me ponder this for a while...

/Mikael

RE: [PATCH 0/5] ia64 ski emulator patches

2014-01-25 Thread Mikael Pettersson

Luck, Tony writes:
 > Mikulas:
 > >> Here I'm sending some ia64 patches to make it work in the ski emulator. 
 > >> This has been broken for a long time.
 > 
 > Thanks - There are questions from time to time on how to test ia64
 > for those people who do not have hardware.
 > 
 > Mikael:
 > > Thanks.  I've recently started running 3.x kernels on ia64 via ski,
 > > but I'm getting random kernel crashes with 3.13.  I'll give your
 > > patches a try shortly.
 > 
 > Let me know how that goes - I haven't used ski in a decade and
 > have quite forgotten how to set it up.

It seems my kernel crashes were due to using gcc-4.8.2 to compile ski.
Going back to gcc-4.7.3 for ski, and gcc-4.3.6 for cross-compiling
the kernel, gives me a solid emulated system.

 > > I've written a few patches to improve other aspects of running the
 > > kernel on ski:
 > > - kernel patch to turn PAL_HALT_LIGHT into a new SSC_HALT_LIGHT,
 > >   and a corresponding ski patch to pause() on SSC_HALT_LIGHT; this
 > >   together with the fixed-frequency ITC patch allows ski to idle
 > >   with very low host CPU overhead when the guest kernel idles
 > > - kernel patch to bump the RAM size from 130MB to 2GB
 > >
 > > I'd be happy to share these patches if there's interest in them.
 > 
 > It seems that there are at least two of you out there - so I'm happy
 > to take kernel patches that make things better.  Not sure where the
 > ski patches go - is someone maintaining that?

The ski project on sourceforge seems dead: nothing has been posted
on its mailing list since 2008, and the web interface for browsing
the source repo is broken.  Red Hat, Gentoo, and Debian package ski
with a few common build fixes, but that's it.

/Mikael

Re: [PATCH 0/5] ia64 ski emulator patches

2014-01-25 Thread Mikael Pettersson

Mikulas Patocka writes:
 > 
 > 
 > On Fri, 24 Jan 2014, Mikael Pettersson wrote:
 > 
 > > Mikulas Patocka writes:
 > >  > Hi
 > >  > 
 > >  > Here I'm sending some ia64 patches to make it work in the ski emulator. 
 > >  > This has been broken for a long time.
 > > 
 > > Thanks.  I've recently started running 3.x kernels on ia64 via ski,
 > > but I'm getting random kernel crashes with 3.13.  I'll give your
 > > patches a try shortly.
 > 
 > I also had some random page-table corruption when running recent kernels 
 > in ski. The problems occurred when upgrading the whole Debian distribution. 
 > Kernel 2.6.8 was solid, new kernels caused problems, I don't know why.

What I've seen so far seems to indicate that gcc-4.8.2 miscompiles ski,
resulting in kernel oopses, and _possibly_ that gcc-4.7.3 miscompiles
the kernel.  Ski compiled by gcc-4.7.3 (on x86_64) running a kernel
compiled by gcc-4.3.6 seems to be a solid combination for me.

 > > I've written a few patches to improve other aspects of running the
 > > kernel on ski:
 > > - ski patch to use tun/tap networking (no need to run ski as root)
 > > - ski patch to implement a fixed-frequency ITC (the ITC is currently
 > >   highly variable, completely breaking basic timekeeping)
 > > - kernel patch to turn PAL_HALT_LIGHT into a new SSC_HALT_LIGHT,
 > >   and a corresponding ski patch to pause() on SSC_HALT_LIGHT; this
 > >   together with the fixed-frequency ITC patch allows ski to idle
 > >   with very low host CPU overhead when the guest kernel idles
 > > - kernel patch to bump the RAM size from 130MB to 2GB
 > > 
 > > I'd be happy to share these patches if there's interest in them.
 > > 
 > > /Mikael
 > 
 > I would be interested in them. I also patched that timekeeping issue in 
 > ski.

My ski patches are in 
<http://user.it.uu.se/~mikpe/linux/patches/ia64/ski-1.3.2/>
for now.  I'll post the kernel patches to linux-ia64 @ vger in a few minutes.

/Mikael

Re: [PATCH 0/5] ia64 ski emulator patches

2014-01-25 Thread Mikael Pettersson

Mikulas Patocka writes:
 > 
 > 
 > On Fri, 24 Jan 2014, Mikael Pettersson wrote:
 > 
 > > Mikulas Patocka writes:
 > >  > Hi
 > >  > 
 > >  > Here I'm sending some ia64 patches to make it work in the ski emulator. 
 > >  > This has been broken for a long time.
 > > 
 > > Thanks.  I've recently started running 3.x kernels on ia64 via ski,
 > > but I'm getting random kernel crashes with 3.13.  I'll give your
 > > patches a try shortly.
 > 
 > BTW. does the kernel boot for you without that _STK_LIM_MAX change? For 
 > me, _STK_LIM_MAX was a showstopper, it wasn't able to spawn any userspace 
 > process without this patch.

Yes, everything works fine for me without your _STK_LIM_MAX change.
My ski VMs currently run 3.12 kernels with Fedora 9 user-space.

/Mikael

RE: [PATCH 0/5] ia64 ski emulator patches

2014-01-25 Thread Mikael Pettersson

Luck, Tony writes:
 > Mikulas:
 > > Here I'm sending some ia64 patches to make it work in the ski emulator.
 > > This has been broken for a long time.
 > 
 > Thanks - There are questions from time to time on how to test ia64
 > for those people who do not have hardware.
 > 
 > Mikael:
 > > Thanks.  I've recently started running 3.x kernels on ia64 via ski,
 > > but I'm getting random kernel crashes with 3.13.  I'll give your
 > > patches a try shortly.
 > 
 > Let me know how that goes - I haven't used ski in a decade and
 > have quite forgotten how to set it up.

It seems my kernel crashes were due to using gcc-4.8.2 to compile ski.
Going back to gcc-4.7.3 for ski, and gcc-4.3.6 for cross-compiling
the kernel, gives me a solid emulated system.

 > > I've written a few patches to improve other aspects of running the
 > > kernel on ski:
 > > - kernel patch to turn PAL_HALT_LIGHT into a new SSC_HALT_LIGHT,
 > >   and a corresponding ski patch to pause() on SSC_HALT_LIGHT; this
 > >   together with the fixed-frequency ITC patch allows ski to idle
 > >   with very low host CPU overhead when the guest kernel idles
 > > - kernel patch to bump the RAM size from 130MB to 2GB
 > 
 > > I'd be happy to share these patches if there's interest in them.
 > 
 > It seems that there are at least two of you out there - so I'm happy
 > to take kernel patches that make things better.  Not sure where the
 > ski patches go - is someone maintaining that?

The ski project on SourceForge seems dead: nothing has been posted
on its mailing list since 2008, and the web interface for browsing
the source repo is broken.  Red Hat, Gentoo, and Debian package ski
with a few common build fixes, but that's it.

/Mikael

Re: [PATCH 0/5] ia64 ski emulator patches

2014-01-25 Thread Mikael Pettersson

Mikulas Patocka writes:
  
  
 > On Sat, 25 Jan 2014, Mikael Pettersson wrote:
 > 
 > > My ski patches are in
 > > http://user.it.uu.se/~mikpe/linux/patches/ia64/ski-1.3.2/
 > > for now.  I'll post the kernel patches to linux-ia64 @ vger in a few
 > > minutes.
 > > 
 > > /Mikael
 > 
 > Thanks for the patches.
 > 
 > Isn't this subject to races? - could it lock up if the signal happens just
 > before the pause syscall?
 > 
 > +case SSC_HALT_LIGHT:
 > +  /* Sleep until SIGIO or SIGALRM is received; this relies on
 > +     keyboard/ethernet input being detected via SIGIO, and the
 > +     ITC now being emulated via setitimer() and SIGALRM.  */
 > +  pause ();
 > +  break;
 > +

Thanks for the review. You're right, the pause mustn't happen if
itc_itimer_fired == 1.  Let me ponder this for a while...
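
[Editorial sketch, not part of the original mail.] One common way to close
this kind of window is to block the wake-up signals, test the flag, and then
sleep with sigsuspend(), which unblocks and waits atomically. The code below
is a minimal sketch under that assumption; it is not the fix that went into
the ski patch. The handler on_sigalrm() and the helper halt_light_wait() are
hypothetical names, and itc_itimer_fired is only a stand-in for the flag
mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Stand-in for the flag the emulator sets when the ITC timer fires. */
static volatile sig_atomic_t itc_itimer_fired;

/* Hypothetical SIGALRM handler: only records that the timer fired. */
static void on_sigalrm(int signo)
{
    (void)signo;
    itc_itimer_fired = 1;
}

/* Sleep until SIGALRM or SIGIO arrives, unless a timer tick is already
   pending.  Blocking both signals before testing the flag means a signal
   cannot slip in between the test and the sleep. */
static void halt_light_wait(void)
{
    sigset_t block, old, wait_mask;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigaddset(&block, SIGIO);
    sigprocmask(SIG_BLOCK, &block, &old);

    if (!itc_itimer_fired) {
        /* Wait with the caller's mask, but with both signals unblocked. */
        wait_mask = old;
        sigdelset(&wait_mask, SIGALRM);
        sigdelset(&wait_mask, SIGIO);
        sigsuspend(&wait_mask);
    }

    /* Restore the caller's signal mask. */
    sigprocmask(SIG_SETMASK, &old, NULL);
}
```

In this sketch a SIGIO that arrives before the signals are blocked is not
tracked by a flag, so the wait would then last until the next timer tick;
a real fix would have to decide whether that delay is acceptable.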

/Mikael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5] ia64 ski emulator patches

2014-01-24 Thread Mikael Pettersson

Mikulas Patocka writes:
 > Hi
 > 
 > Here I'm sending some ia64 patches to make it work in the ski emulator. 
 > This has been broken for a long time.

Thanks.  I've recently started running 3.x kernels on ia64 via ski,
but I'm getting random kernel crashes with 3.13.  I'll give your
patches a try shortly.

I've written a few patches to improve other aspects of running the
kernel on ski:
- ski patch to use tun/tap networking (no need to run ski as root)
- ski patch to implement a fixed-frequency ITC (the ITC is currently
  highly variable, completely breaking basic timekeeping)
- kernel patch to turn PAL_HALT_LIGHT into a new SSC_HALT_LIGHT,
  and a corresponding ski patch to pause() on SSC_HALT_LIGHT; this
  together with the fixed-frequency ITC patch allows ski to idle
  with very low host CPU overhead when the guest kernel idles
- kernel patch to bump the RAM size from 130MB to 2GB
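
[Editorial sketch, not part of the original mail.] To illustrate what a
fixed-frequency ITC means for timekeeping, the code below derives a counter
value from elapsed host time at a constant rate, so equal host intervals
always yield equal tick counts. This is only an illustration of the idea;
the actual ski patch is not shown in this thread and may work differently.
ITC_HZ, itc_ticks() and itc_read() are hypothetical names and values.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical guest ITC frequency; not taken from the ski patch. */
#define ITC_HZ 100000000ULL

/* Convert an elapsed host time (nsec < 1e9) to ticks at the fixed rate. */
static uint64_t itc_ticks(uint64_t sec, uint64_t nsec)
{
    return sec * ITC_HZ + nsec * ITC_HZ / 1000000000ULL;
}

/* Emulated ITC read: ticks elapsed since `start` on the monotonic clock. */
static uint64_t itc_read(const struct timespec *start)
{
    struct timespec now;
    int64_t sec, nsec;

    clock_gettime(CLOCK_MONOTONIC, &now);
    sec = (int64_t)now.tv_sec - (int64_t)start->tv_sec;
    nsec = (int64_t)now.tv_nsec - (int64_t)start->tv_nsec;
    if (nsec < 0) {          /* borrow one second */
        sec -= 1;
        nsec += 1000000000LL;
    }
    return itc_ticks((uint64_t)sec, (uint64_t)nsec);
}
```

With a counter like this the guest sees time advance at a steady rate
regardless of how fast the emulator happens to execute instructions.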

I'd be happy to share these patches if there's interest in them.

/Mikael

Re: [BUG?] mtrr sanitizer fails on Latitude E6230

2013-11-08 Thread Mikael Pettersson

Yinghai Lu writes:
 > On Thu, Nov 7, 2013 at 12:25 AM, Mikael Pettersson wrote:
 > > Yinghai Lu writes:
 > >  > On Wed, Nov 6, 2013 at 1:16 AM, Mikael Pettersson wrote:
 > >  > > I recently got a Dell Latitude E6230 (Ivy Bridge i7-3540M) and noticed that
 > >  > > the mtrr sanitizer failed on it:
 > >  > >
 > >  > > === snip ===
 > >  > > Linux version 3.12.0 (mikpe@barley) (gcc version 4.8.3 20131017 (prerelease) (GCC) ) #1 SMP Wed Nov 6 09:46:02 CET 2013
 > >  > > Command line: ro root=LABEL=/ resume=/dev/sda2 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1
 > >  > ...
 > >  > gran_size: 8M  chunk_size: 64M num_reg: 9  lose cover RAM: 6M
 > >  > ...
 > >  > > mtrr_cleanup: can not find optimal value
 > >  > > please specify mtrr_gran_size/mtrr_chunk_size
 > >  > > === snip ===
 > >  > >
 > >  > > For now I'm disabling the mtrr sanitizer in this machine's kernel.
 > >  >
 > >  > Can you try to boot with "mtrr_gran_size=8m mtrr_chunk_size=64m" ?
 > >
 > > That results in:
 > >
 > > reg 0, base: 0GB, range: 8GB, type WB
 > > reg 1, base: 8GB, range: 512MB, type WB
 > > reg 2, base: 3584MB, range: 512MB, type UC
 > > reg 3, base: 3520MB, range: 64MB, type UC
 > > reg 4, base: 3512MB, range: 8MB, type UC
 > > reg 5, base: 8688MB, range: 16MB, type UC
 > > reg 6, base: 8680MB, range: 8MB, type UC
 > > reg 7, base: 8678MB, range: 2MB, type UC
 > > total RAM covered: 8094M
 > >  gran_size: 8M  chunk_size: 64M num_reg: 9  lose cover RAM: 6M
 > > New variable MTRRs
 > > reg 0, base: 0GB, range: 2GB, type WB
 > > reg 1, base: 2GB, range: 1GB, type WB
 > > reg 2, base: 3GB, range: 256MB, type WB
 > > reg 3, base: 3328MB, range: 128MB, type WB
 > > reg 4, base: 3456MB, range: 64MB, type WB
 > > reg 5, base: 3512MB, range: 8MB, type UC
 > > reg 6, base: 4GB, range: 4GB, type WB
 > > reg 7, base: 8GB, range: 512MB, type WB
 > > reg 8, base: 8672MB, range: 32MB, type UC
 > > e820: update [mem 0xdb80-0x] usable ==> reserved
 > > e820: update [mem 0x21e00-0x21e5f] usable ==> reserved
 > ...
 > >> modified: [mem 0x00021e00-0x00021e5f] reserved
 > 
 > That is right, it throws 6M away.
 > 
 > Did you notice any slowness or speeding for x window?

I'm not noticing any change.  i915 kernel driver + xorg's intel driver.

 > What does /proc/mtrr look like after xwindow is started?

reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
reg01: base=0x08000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c000 ( 3072MB), size=  256MB, count=1: write-back
reg03: base=0x0d000 ( 3328MB), size=  128MB, count=1: write-back
reg04: base=0x0d800 ( 3456MB), size=   64MB, count=1: write-back
reg05: base=0x0db80 ( 3512MB), size=8MB, count=1: uncachable
reg06: base=0x1 ( 4096MB), size= 4096MB, count=1: write-back
reg07: base=0x2 ( 8192MB), size=  512MB, count=1: write-back
reg08: base=0x21e00 ( 8672MB), size=   32MB, count=1: uncachable
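
[Editorial sketch, not part of the original mail.] The "total RAM covered:
8094M" and "lose cover RAM: 6M" figures can be reproduced from the two
register listings quoted above. The sketch assumes that every UC range lies
inside a WB range and that the UC ranges do not overlap each other, which
holds for these listings, so write-back coverage is the sum of WB sizes
minus the sum of UC sizes. The struct and function names are illustrative
and are not taken from the kernel's mtrr cleanup code.

```c
#include <stddef.h>
#include <stdio.h>

enum mtrr_type { MTRR_WB, MTRR_UC };

struct mtrr_reg {
    unsigned long base_mb;   /* start of the range, in MB */
    unsigned long size_mb;   /* length of the range, in MB */
    enum mtrr_type type;
};

/* The eight original variable MTRRs from the log, converted to MB. */
static const struct mtrr_reg original_mtrrs[] = {
    {    0, 8192, MTRR_WB }, { 8192,  512, MTRR_WB },
    { 3584,  512, MTRR_UC }, { 3520,   64, MTRR_UC },
    { 3512,    8, MTRR_UC }, { 8688,   16, MTRR_UC },
    { 8680,    8, MTRR_UC }, { 8678,    2, MTRR_UC },
};

/* "New variable MTRRs" produced with gran_size 8M, chunk_size 64M. */
static const struct mtrr_reg sanitized_mtrrs[] = {
    {    0, 2048, MTRR_WB }, { 2048, 1024, MTRR_WB },
    { 3072,  256, MTRR_WB }, { 3328,  128, MTRR_WB },
    { 3456,   64, MTRR_WB }, { 3512,    8, MTRR_UC },
    { 4096, 4096, MTRR_WB }, { 8192,  512, MTRR_WB },
    { 8672,   32, MTRR_UC },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Write-back coverage in MB: WB sizes minus the UC holes punched in them. */
static unsigned long wb_covered_mb(const struct mtrr_reg *regs, size_t n)
{
    unsigned long wb = 0, uc = 0;

    for (size_t i = 0; i < n; i++) {
        if (regs[i].type == MTRR_WB)
            wb += regs[i].size_mb;
        else
            uc += regs[i].size_mb;
    }
    return wb - uc;
}

/* Print the coverage of both layouts and the difference between them. */
static void print_mtrr_coverage(void)
{
    unsigned long before = wb_covered_mb(original_mtrrs, ARRAY_LEN(original_mtrrs));
    unsigned long after = wb_covered_mb(sanitized_mtrrs, ARRAY_LEN(sanitized_mtrrs));

    printf("original: %luM  sanitized: %luM  lost: %luM\n",
           before, after, before - after);
}
```

The original layout sums to 8704M of WB minus 610M of UC, and the new
layout to 8128M of WB minus 40M of UC, which accounts for the 6M that the
sanitizer gives up.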

/Mikael

Re: [BUG?] mtrr sanitizer fails on Latitude E6230

2013-11-07 Thread Mikael Pettersson

Yinghai Lu writes:
 > On Wed, Nov 6, 2013 at 1:16 AM, Mikael Pettersson wrote:
 > > I recently got a Dell Latitude E6230 (Ivy Bridge i7-3540M) and noticed that
 > > the mtrr sanitizer failed on it:
 > >
 > > === snip ===
 > > Linux version 3.12.0 (mikpe@barley) (gcc version 4.8.3 20131017 (prerelease) (GCC) ) #1 SMP Wed Nov 6 09:46:02 CET 2013
 > > Command line: ro root=LABEL=/ resume=/dev/sda2 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1
 > ...
 > gran_size: 8M  chunk_size: 64M num_reg: 9  lose cover RAM: 6M
 > ...
 > > mtrr_cleanup: can not find optimal value
 > > please specify mtrr_gran_size/mtrr_chunk_size
 > > === snip ===
 > >
 > > For now I'm disabling the mtrr sanitizer in this machine's kernel.
 > 
 > Can you try to boot with "mtrr_gran_size=8m mtrr_chunk_size=64m" ?

That results in:

Linux version 3.12.0-test (mikpe@barley) (gcc version 4.8.3 20131017 (prerelease) (GCC) ) #1 SMP Thu Nov 7 09:12:14 CET 2013
Command line: mtrr_gran_size=8m mtrr_chunk_size=64m ro root=LABEL=/ resume=/dev/sda2 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=sv-latin1
KERNEL supported cpus:
  Intel GenuineIntel
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009d3ff] usable
BIOS-e820: [mem 0x0009d400-0x0009] reserved
BIOS-e820: [mem 0x000e-0x000f] reserved
BIOS-e820: [mem 0x0010-0x1fff] usable
BIOS-e820: [mem 0x2000-0x201f] reserved
BIOS-e820: [mem 0x2020-0x40003fff] usable
BIOS-e820: [mem 0x40004000-0x40004fff] reserved
BIOS-e820: [mem 0x40005000-0xd5f09fff] usable
BIOS-e820: [mem 0xd5f0a000-0xd5ff] reserved
BIOS-e820: [mem 0xd600-0xd6751fff] usable
BIOS-e820: [mem 0xd6752000-0xd67f] reserved
BIOS-e820: [mem 0xd680-0xd6fb2fff] usable
BIOS-e820: [mem 0xd6fb3000-0xd6ff] ACPI data
BIOS-e820: [mem 0xd700-0xd86fbfff] usable
BIOS-e820: [mem 0xd86fc000-0xd87f] ACPI NVS
BIOS-e820: [mem 0xd880-0xd96d4fff] usable
BIOS-e820: [mem 0xd96d5000-0xda054fff] reserved
BIOS-e820: [mem 0xda055000-0xda097fff] ACPI NVS
BIOS-e820: [mem 0xda098000-0xdaadcfff] usable
BIOS-e820: [mem 0xdaadd000-0xdafe] reserved
BIOS-e820: [mem 0xdaff-0xdaff] usable
BIOS-e820: [mem 0xdb80-0xdf9f] reserved
BIOS-e820: [mem 0xf800-0xfbff] reserved
BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
BIOS-e820: [mem 0xfed0-0xfed03fff] reserved
BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
BIOS-e820: [mem 0xff00-0x] reserved
BIOS-e820: [mem 0x0001-0x00021e5f] usable
NX (Execute Disable) protection: active
SMBIOS 2.7 present.
DMI: Dell Inc. Latitude E6230/0Y47PX, BIOS A11 08/27/2013
e820: update [mem 0x-0x0fff] usable ==> reserved
e820: remove [mem 0x000a-0x000f] usable
e820: last_pfn = 0x21e600 max_arch_pfn = 0x4
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-B uncachable
  C-C write-protect
  D-E7FFF uncachable
  E8000-F write-protect
MTRR variable ranges enabled:
  0 base 0 mask E write-back
  1 base 2 mask FE000 write-back
  2 base 0E000 mask FE000 uncachable
  3 base 0DC00 mask FFC00 uncachable
  4 base 0DB80 mask FFF80 uncachable
  5 base 21F00 mask FFF00 uncachable
  6 base 21E80 mask FFF80 uncachable
  7 base 21E60 mask FFFE0 uncachable
  8 disabled
  9 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
original variable MTRRs
reg 0, base: 0GB, range: 8GB, type WB
reg 1, base: 8GB, range: 512MB, type WB
reg 2, base: 3584MB, range: 512MB, type UC
reg 3, base: 3520MB, range: 64MB, type UC
reg 4, base: 3512MB, range: 8MB, type UC
reg 5, base: 8688MB, range: 16MB, type UC
reg 6, base: 8680MB, range: 8MB, type UC
reg 7, base: 8678MB, range: 2MB, type UC
total RAM covered: 8094M
 gran_size: 8M  chunk_size: 64M num_reg: 9  lose cover RAM: 6M
New variable MTRRs
reg 0, base: 0GB, range: 2GB, type WB
reg 1, base: 2GB, range: 1GB, type WB
reg 2, base: 3GB, range: 256MB, type WB
reg 3, base: 3328MB, range: 128MB, type WB
reg 4, base: 3456MB, range: 64MB, type WB
reg 5, base: 3512MB, range: 8MB, type UC
reg 6, base: 4GB, range: 4GB, type WB
reg 7, base: 8GB, range: 512MB, type WB
re
