Re: Kernel version numbers after 4.9.255 and 4.4.255

2021-02-04 Thread Christoph Biedl
David Laight wrote...

> A full wrap might catch checks for less than (say) 4.4.2 which
> might be present to avoid very early versions.
> So sticking at 255 or wrapping onto (say) 128 to 255 might be better.

Hitting such version checks still might happen, though.
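
For illustration, a minimal sketch of why the sublevel matters, assuming
the version code packs major, minor and sublevel into 16, 8 and 8 bits as
the traditional KERNEL_VERSION(a,b,c) macro does. The Python function
names are made up for this example and are not kernel API:

```python
def kernel_version(major, minor, sublevel):
    """Pack a version like the traditional KERNEL_VERSION(a, b, c) macro."""
    return (major << 16) + (minor << 8) + sublevel


def kernel_version_wrapped(major, minor, sublevel):
    """Hypothetical 'full wrap': keep only the low 8 bits of the sublevel."""
    return (major << 16) + (minor << 8) + (sublevel & 0xFF)


def kernel_version_clamped(major, minor, sublevel):
    """Hypothetical 'stick at 255': never let the sublevel exceed 255."""
    return (major << 16) + (minor << 8) + min(sublevel, 255)


# Unmodified packing: 4.4.256 carries into the minor field and equals 4.5.0.
print(kernel_version(4, 4, 256) == kernel_version(4, 5, 0))
# Full wrap: 4.4.257 looks like 4.4.1 and trips a "< 4.4.2" check.
print(kernel_version_wrapped(4, 4, 257) < kernel_version(4, 4, 2))
# Clamping: everything from 4.4.255 on compares equal.
print(kernel_version_clamped(4, 4, 300) == kernel_version(4, 4, 255))
```

All three comparisons print True, so each variant trades one kind of
confusion for another.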

Also, any wrapping introduces a real risk that package managers will see
version numbers running backwards and therefore refuse to install
an actually newer version.

For scripts/package/builddeb (I don't use that, though), you could work
around by setting an epoch, i.e. (untested)

-$sourcename ($packageversion) $distribution; urgency=low
+$sourcename (1:$packageversion) $distribution; urgency=low

but every packaging mechanism, in-tree and outside, would have to adopt
such a change, if that is even possible. Which is why this feels bad.
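
The effect of an epoch can be sketched as follows. This is a
deliberately simplified comparison, assuming versions of the form
[epoch:]a.b.c with purely numeric parts; it is not dpkg's full version
ordering, and the function names are invented for the example:

```python
def parse_version(text):
    """Split '[epoch:]a.b.c' into (epoch, (a, b, c)); a missing epoch is 0."""
    epoch, sep, upstream = text.partition(":")
    if not sep:
        # No colon present, so the whole string is the upstream version.
        epoch, upstream = "0", text
    return int(epoch), tuple(int(part) for part in upstream.split("."))


def is_newer(candidate, installed):
    """True if candidate sorts after installed; the epoch is compared first."""
    return parse_version(candidate) > parse_version(installed)


# A wrapped sublevel looks older than what is installed ...
print(is_newer("4.4.0", "4.4.255"))
# ... unless an epoch is added, which outranks the upstream part.
print(is_newer("1:4.4.0", "4.4.255"))
```

The first comparison prints False and the second True. The catch is that
once an epoch is introduced it has to be carried in every later version.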

Possibly I am missing something: What's the reason not to use
EXTRAVERSION as back in the old 2.6.x.y days, i.e. change to 4.4.255.1
and so on? Well, unless there are still installations that treat 4.4.255
as 2.6.64.255.

Christoph


Re: [PATCH 4.20 41/65] Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing"

2019-01-12 Thread Christoph Biedl
Greg Kroah-Hartman wrote...

> 4.20-stable review patch.  If anyone has any objections, please let me know.
>
> --
>
> From: Greg Kroah-Hartman 
>
> This reverts commit d412deb85a4aada382352a8202beb7af8921cd53 which is
> commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
>
> It breaks the powerpc build, so drop it from the tree until a fix goes
> upstream.

Is this necessary on 4.20? The build failures I reported were on 4.19
only. The 4.20.2-rc1 kernel for my Powermac G5 builds with and without
that patch, both boot fine, no visible differences. Again however, Breno
is authoritative here.

Aside, I also checked 4.19.15-rc1, builds and runs without any
noticeable problems.

Christoph


Re: [PATCH 4.19 000/170] 4.19.14-stable review

2019-01-10 Thread Christoph Biedl
Greg Kroah-Hartman wrote...

> Ok, Breno and Christoph, what should I do here?  Should I drop this
> patch in the tree or add this new one?  Ideally I can fix this soon as I
> don't like having broken trees out there...

Whatever Breno says - I don't feel competent enough to decide what's
right here.

Christoph


Re: [PATCH 4.19 000/170] 4.19.14-stable review

2019-01-10 Thread Christoph Biedl
Christoph Biedl wrote...

> Sorry for not getting back to you earlier. Building yesterday's
> release (v4.19.14) *failed*, bisect led to
>
> | commit a9935a12768851762089fda8e5a9daaf0231808e (HEAD)
> | Author: Breno Leitao 
> | Date:   Mon Nov 26 18:12:00 2018 -0200
> |
> | powerpc/tm: Unset MSR[TS] if not recheckpointing
>
> Reverting that commit seems to be sufficient, build passes then.
>
> Additionally, neither 4.20 nor 5.0-rc1 show this problem. The
> | commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
> builds as well, so I'll try to find the missing prerequisite next.

Cherry-picking

| commit 5c784c8414fba11b62e12439f11e109fb5751f38
| Author: Breno Leitao 
| Date:   Thu Aug 16 14:21:07 2018 -0300
|
| powerpc/tm: Remove msr_tm_active()

makes the build pass. Breno, does this make sense?

Christoph


Re: [PATCH 4.19 000/170] 4.19.14-stable review

2019-01-10 Thread Christoph Biedl
Greg Kroah-Hartman wrote...

> I dropped e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
> from the stable trees, which is what I was told was the commit that was
> causing the problems by Christoph and Breno (on to: now).
>
> Was that not the offending commit?  If so, what one was?
>
> totally confused,

Sorry for not getting back to you earlier. Building yesterday's
release (v4.19.14) *failed*, bisect led to

| commit a9935a12768851762089fda8e5a9daaf0231808e (HEAD)
| Author: Breno Leitao 
| Date:   Mon Nov 26 18:12:00 2018 -0200
|
| powerpc/tm: Unset MSR[TS] if not recheckpointing

Reverting that commit seems to be sufficient, build passes then.

Additionally, neither 4.20 nor 5.0-rc1 show this problem. The
| commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream.
builds as well, so I'll try to find the missing prerequisite next.

An example .config is attached.

Christoph
#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.19.14 Kernel Configuration
#

#
# Compiler: powerpc64-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80200
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_XZ=y
# CONFIG_KERNEL_GZIP is not set
CONFIG_KERNEL_XZ=y
CONFIG_DEFAULT_HOSTNAME=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_ARCH_HAS_TICK_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
# CONFIG_MEMCG_SWAP is not set
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
# CONFIG_DEBUG_BLK_CGROUP is not set
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_RDMA is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
CONFIG_RD_XZ=y
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
# CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
CONFIG_MEMBARRIER=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_BPF_SYSCALL=y
# CONFIG_USERFAULTFD is not set
CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS=y
CONFIG_RSEQ=y
# CONFIG_DEBUG_RSEQ is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
CONFIG_VM_EVENT_COUNTERS=y
# 

Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-17 Thread Christoph Biedl
Greg Kroah-Hartman wrote...

> On Thu, Mar 15, 2018 at 07:55:42PM +0100, Christoph Biedl wrote:

> > > commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.

> > On big-endian systems, this change introduces severe corruption,
> > resulting in complete loss of the data on the used block device.

> That sucks.  Can you test Linus's tree to verify the problem is there?
> I'll gladly revert this if Linus's tree also gets the revert, I don't
> want you to hit this when you upgrade to a newer kernel.

Confirmed: The problem is, err ... was in Linus' tree as well. The
rather recent commit 8f5fd927c3a7 reverted the change, after that
everything is as expected again.

Looking at the original commit, I don't have a clue why things go wrong
so horribly. Otherwise, don't worry about my data. I took this as a
chance to verify my data recovery procedure, with success.

Christoph


Re: [PATCH 4.14 024/110] btrfs: use proper endianness accessors for super_copy

2018-03-15 Thread Christoph Biedl
Greg Kroah-Hartman wrote...

> 4.14-stable review patch.  If anyone has any objections, please let me know.

> commit 3c181c12c431fe33b669410d663beb9cceefcd1b upstream.
(...)

> If the filesystem is always used on a same endian host, this will not
> be a problem.

From my observations I cannot quite subscribe to that.

On big-endian systems, this change introduces severe corruption,
resulting in complete loss of the data on the used block device.

Steps to reproduce (tested on ppc/powerpc and parisc/hppa):

# mkfs.btrfs $DEV
# mount $DEV /mnt/tmp/
# umount /mnt/tmp/

This simple umount corrupts the file system:

# mount $DEV /mnt/tmp/
mount: /mnt/tmp: wrong fs type, bad option, bad superblock on $DEV, missing codepage or helper program, or other error.

# dmesg:
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 4294967296 length 4096
BTRFS critical (device ): unable to find logical 18102363734671360 length 16384
BTRFS error (device ): failed to read chunk root
BTRFS error (device ): open_ctree failed

Also fsck is of no help:

# btrfsck $DEV
Couldn't map the block 18102363734671360
No mapping for 18102363734671360-18102363734687744
Couldn't map the block 18102363734671360
bytenr mismatch, want=18102363734671360, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system


Trying mount or fsck on a little-endian system does not help either. So
I consider the data on that device lost - luckily I use btrfs only for
files where a backup exists all the time.
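
A side observation on the numbers above, based on plain arithmetic and
not verified against the on-disk structures: reversing the byte order of
the 64-bit value 18102363734671360 gives 22036480. That would be
consistent with a little-endian field being handled without conversion on
a big-endian host, though that interpretation is an assumption. A small
Python check (the helper name is made up):

```python
def byteswap64(value):
    """Reverse the byte order of an unsigned 64-bit integer."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


# bytenr reported by btrfsck in the output above
reported = 18102363734671360

print(hex(reported))         # the value as seen on the big-endian host
print(byteswap64(reported))  # the same eight bytes read in the other order
```

Swapping twice returns the original value, so the check is symmetric.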


Reverting that change restored the previous error-free behaviour. I
didn't check HEAD, i.e. v4.16-rc5, since the upstream commit was the last
that affected these files. Still I could give this a try if anybody
wishes so.

Cheers,

Christoph


Re: [PATCH] crypto: sun4i-ss: add missing statesize

2015-11-08 Thread Christoph Biedl
Maxime Ripard wrote...

> > > > This patch specifies statesize for sha1 and md5.
> > > > 
> > > > Signed-off-by: LABBE Corentin 
> > > > Cc: sta...@vger.kernel.org
> > > 
> > > Please also add a Fixes tag (and the stable version it applies to).
> > 
> > I don't see the point for a fixes tag as it would simply refer
> > to the original patch-set that added the driver.
> 
> What's the problem with that?

In my opinion, Fixes: should rather point to the commit that caused the
breakage. Which did this intentionally:

| commit 8996eafdcbad149ac0f772fb1649fbb75c482a6a
| Author: Russell King 
| Date:   Fri Oct 9 20:43:33 2015 +0100
| 
| crypto: ahash - ensure statesize is non-zero
(...)
+ This patch adds a check to prevent these drivers from registering
+ ahash algorithms until they are fixed.

Another crypto subsystem (mv_cesa) suffers from the same problem. I
have a patch ready but would prefer a consensus on these formalities
before submitting.

Aside from this, if you really need another Tested-by:, add me. And
also Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107281

Christoph


Re: Soft lockup issue in Linux 4.1.9

2015-10-08 Thread Christoph Biedl
Eric Dumazet wrote...

[ commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af ]

> It definitely should help !

Yesterday I experienced issues somewhat similar to this, but I'm
not entirely sure:

Four of five systems running 4.1.9 stopped working. No reaction on
network, keyboard, serial console. In one case, the stack trace as
below made it to the loghost.

Two things are quite different. First, the systems had a reasonable
uptime, about a week.

And second, the scary part: All incidents happened within a rather
short time span of three minutes at most, beginning after 16:41:28 and
before 16:41:54 UTC. So I assumed a brownout first - until I realized
the systems faded away at slightly different times, and one is at a
different location. Meanwhile, other systems using different kernel
versions continued to operate on both sites.

So, I'd be glad for answers to:

- Is this the same issue or should I be even more afraid?
- What might be the reason for this temporal coincidence? I have no
  plausible idea.

Confused,
Christoph


 INFO: rcu_sched self-detected stall on CPU { 3}  (t=6000 jiffies g=8932806 
c=8932805 q=58491)
 rcu_sched kthread starved for 5999 jiffies!
 Task dump for CPU 3:
 swapper/3   R  running task0 0  1 0x0008
  81e396c0 88042dcc3b20 810807da 0003
  81e396c0 88042dcc3b40 81083b78 88042dcc3b80
  0003 88042dcc3b70 810a945c 88042dcd5740
 Call Trace:
[] sched_show_task+0xaa/0x110
  [] dump_cpu_task+0x38/0x40
  [] rcu_dump_cpu_stacks+0x8c/0xc0
  [] rcu_check_callbacks+0x3b1/0x680
  [] ? acct_account_cputime+0x17/0x20
  [] ? account_system_time+0x8e/0x180
  [] update_process_times+0x33/0x60
  [] tick_sched_handle.isra.14+0x30/0x40
  [] tick_sched_timer+0x43/0x80
  [] __run_hrtimer.isra.32+0x4a/0xd0
  [] hrtimer_interrupt+0xd5/0x1f0
  [] local_apic_timer_interrupt+0x34/0x60
 INFO: rcu_sched self-detected stall on CPU { 3}  (t=6000 jiffies g=8932806 
c=8932805 q=58491)
 rcu_sched kthread starved for 5999 jiffies!
 Task dump for CPU 3:
 swapper/3   R  running task0 0  1 0x0008
  81e396c0 88042dcc3b20 810807da 0003
  81e396c0 88042dcc3b40 81083b78 88042dcc3b80
  0003 88042dcc3b70 810a945c 88042dcd5740
 Call Trace:
[] sched_show_task+0xaa/0x110
  [] dump_cpu_task+0x38/0x40
  [] smp_apic_timer_interrupt+0x3c/0x60
  [] apic_timer_interrupt+0x6b/0x70
  [] ? _raw_spin_unlock_irqrestore+0x9/0x10
  [] try_to_del_timer_sync+0x48/0x60
  [] ? del_timer_sync+0x42/0x60
  [] del_timer_sync+0x4a/0x60
  [] inet_csk_reqsk_queue_drop+0x7a/0x1f0
  [] reqsk_timer_handler+0x12f/0x290
  [] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
  [] call_timer_fn.isra.26+0x26/0x80
  [] rcu_dump_cpu_stacks+0x8c/0xc0
  [] rcu_check_callbacks+0x3b1/0x680
  [] ? acct_account_cputime+0x17/0x20
  [] ? account_system_time+0x8e/0x180
  [] update_process_times+0x33/0x60
  [] tick_sched_handle.isra.14+0x30/0x40
  [] tick_sched_timer+0x43/0x80
  [] __run_hrtimer.isra.32+0x4a/0xd0
  [] hrtimer_interrupt+0xd5/0x1f0
  [] local_apic_timer_interrupt+0x34/0x60
  [] run_timer_softirq+0x18e/0x220
  [] __do_softirq+0xda/0x1f0
  [] irq_exit+0x76/0xa0
  [] smp_apic_timer_interrupt+0x45/0x60
  [] apic_timer_interrupt+0x6b/0x70
[] ? sched_clock_cpu+0x9e/0xb0
  [] ? amd_e400_idle+0x35/0xd0
  [] ? amd_e400_idle+0x33/0xd0
  [] arch_cpu_idle+0xa/0x10
  [] cpu_startup_entry+0x2c3/0x330
  [] smp_apic_timer_interrupt+0x3c/0x60
  [] apic_timer_interrupt+0x6b/0x70
  [] ? _raw_spin_unlock_irqrestore+0x9/0x10
  [] try_to_del_timer_sync+0x48/0x60
  [] ? del_timer_sync+0x42/0x60
  [] del_timer_sync+0x4a/0x60
  [] inet_csk_reqsk_queue_drop+0x7a/0x1f0
  [] reqsk_timer_handler+0x12f/0x290
  [] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0
  [] call_timer_fn.isra.26+0x26/0x80
  [] start_secondary+0x17c/0x1a0
  [] run_timer_softirq+0x18e/0x220
  [] __do_softirq+0xda/0x1f0
  [] irq_exit+0x76/0xa0
  [] smp_apic_timer_interrupt+0x45/0x60
  [] apic_timer_interrupt+0x6b/0x70
[] ? sched_clock_cpu+0x9e/0xb0
  [] ? amd_e400_idle+0x35/0xd0
  [] ? amd_e400_idle+0x33/0xd0
  [] arch_cpu_idle+0xa/0x10
  [] cpu_startup_entry+0x2c3/0x330
  [] start_secondary+0x17c/0x1a0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ 030/143] proc connector: fix info leaks

2014-05-12 Thread Christoph Biedl
Willy Tarreau wrote...

> Initialize event_data for all possible message types to prevent leaking
> kernel stack contents to userland (up to 20 bytes). Also set the flags
> member of the connector message to 0 to prevent leaking two more stack
> bytes this way.

There are build errors as shown below and I guess that one is the
culprit. I can do detailed checks tonight, I'm a bit in a hurry right
now.

(Using gcc-4.7 as provided by Debian wheezy)

Christoph

drivers/connector/cn_proc.c:286:9: error: expected declaration specifiers or '...' before '&' token
drivers/connector/cn_proc.c:286:26: error: expected declaration specifiers or '...' before numeric constant
drivers/connector/cn_proc.c:286:29: error: expected declaration specifiers or '...' before 'sizeof'
drivers/connector/cn_proc.c:287:5: error: expected '=', ',', ';', 'asm' or '__attribute__' before '->' token
make[5]: *** [drivers/connector/cn_proc.o] Error 1
make[4]: *** [drivers/connector] Error 2
make[4]: *** Waiting for unfinished jobs


Re: [ 00/13] 3.0.99-stable review

2013-10-03 Thread Christoph Biedl
Khalid Aziz wrote...

> Better yet, just pull this patch from stable from now. I will redo
> the patch and send another one for the next round.

FYI, after patching mm/swap.c accordingly, all the 3.0 and 3.4
configurations I use do build. Some boot tests will follow, I'll
follow up only if I see unusual behaviour.

Christoph


Re: [ 00/13] 3.0.99-stable review

2013-10-03 Thread Christoph Biedl
Khalid Aziz wrote...

> Thanks for tracking this down. I had not tried a configuration with
> CONFIG_HUGETLB_PAGE not set. In my config, I was getting many
> multiple definition errors for bunch of other defines from
> linux/hugetlb.h. I will look at my config again but chances are I
> had something else screwed up in my build since you did not see
> those errors. Did you compile with CONFIG_HUGETLB_PAGE set after
> including linux/hugetlb.h? If you did, including linux/hugetlb.h
> instead of importing just the definition of PageHuge in mm/swap.c
> would be the right thing to do.

Yes, one of my configurations has CONFIG_HUGETLB_PAGE, also
CONFIG_NUMA=y, and the kernel built. Could not test it, though.

There still might be other configuration settings that caused the
error messages you've seen. Manually picking both PageHuge definitions
from linux/hugetlb.h should be a safe alternative then, but that's
ugly.

Christoph


Re: [ 00/13] 3.0.99-stable review

2013-10-03 Thread Christoph Biedl
Guenter Roeck wrote...

> On 10/02/2013 09:04 PM, Greg Kroah-Hartman wrote:
> >This is the start of the stable review cycle for the 3.0.99 release.

> Heads up: I am getting lots of build failures in 3.0 and 3.4 builds.
> 
> mm/built-in.o: In function `__put_compound_page':
> slab.c:(.text+0xaa3c): undefined reference to `PageHuge'
> mm/built-in.o: In function `put_compound_page':
> slab.c:(.text+0xaab0): undefined reference to `PageHuge'
> mm/built-in.o: In function `__get_page_tail':
> slab.c:(.text+0xb178): undefined reference to `PageHuge'
> make: *** [.tmp_vmlinux1] Error 1

This is obviously due to

| [ 11/13] mm: fix aio performance regression for database caused by THP

and happens if CONFIG_HUGETLB_PAGE is not set.

Looking closer, upstream commit 7cb2ef56 included linux/hugetlb.h
while the backport for 3.0 just defines PageHuge. Reverting that like
in the patch below causes the build to complete, and the resulting
kernel shows no anomalies here.

Whoever did that backport, why was it done that way? Or did I miss an
important point?

Christoph

--- a/mm/swap.c
+++ b/mm/swap.c
@@ -31,6 +31,7 @@
 #include <linux/backing-dev.h>
 #include <linux/memcontrol.h>
 #include <linux/gfp.h>
+#include <linux/hugetlb.h>

 #include "internal.h"

@@ -41,8 +42,6 @@
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);

-int PageHuge(struct page *page);
-
 /*
  * This path almost never happens for VM activity - pages are normally
  * freed via pagevecs.  But it gets used by networking.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ 00/13] 3.0.99-stable review

2013-10-03 Thread Christoph Biedl
Khalid Aziz wrote...

> Thanks for tracking this down. I had not tried a configuration with
> CONFIG_HUGETLB_PAGE not set. In my config, I was getting many
> multiple definition errors for bunch of other defines from
> linux/hugetlb.h. I will look at my config again but chances are I
> had something else screwed up in my build since you did not see
> those errors. Did you compile with CONFIG_HUGETLB_PAGE set after
> including linux/hugetlb.h? If you did, including linux/hugetlb.h
> instead of importing just the definition of PageHuge in mm/swap.c
> would be the right thing to do.

Yes, one of my configurations has CONFIG_HUGETLB_PAGE, also
CONFIG_NUMA=y, and the kernel built. Could not test it, though.

There still might be other configuration settings that caused the
error messages you've seen. Manually picking both PageHuge definitions
from linux/hugetlb.h should be a safe alternative then, but that's
ugly.

Christoph


Re: [ 00/13] 3.0.99-stable review

2013-10-03 Thread Christoph Biedl
Khalid Aziz wrote...

> Better yet, just pull this patch from stable from now. I will redo
> the patch and send another one for the next round.

FYI, after patching mm/swap.c accordingly, all the 3.0 and 3.4
configurations I use do build. Some boot tests will follow, I'll
follow up only if I see unusual behaviour.

Christoph


Re: Linux 2.6.32.61 - x86/ptrace/gcc 4.7 build error

2013-06-14 Thread Christoph Biedl
Willy Tarreau wrote...

> I'm attaching the two patches here to be applied on top of 2.6.32.61, I would
> like it if you could try in your environment to confirm that they correctly
> fix the issue.

Confirmation: Kernel builds and runs for both Debian squeeze and
wheezy (gcc 4.4 and gcc 4.7) on i386.

There are still other issues that need investigation but they might be
older and/or related to changes on my end. virtio-net doesn't seem to
work at all (but does so in the Debian squeeze 2.6.32 kernel), and the
virtualbox guest module (4.1.18) fails to load (known issue on i386 if
built using gcc 4.7, but now this also happens with gcc 4.4).

Unfortunately my time resources are very limited at the moment, and
there's also something in 3.4.49 which has higher priority. Stay
tuned.

Christoph


Re: [ 81/83] ACPI video: Ignore errors after _DOD evaluation.

2012-11-22 Thread Christoph Biedl
Greg Kroah-Hartman wrote...

> 3.6-stable review patch.  If anyone has any objections, please let me know.

Only a small set of the (guessed) 300 e-mails arrived here, they were:

 - 21/11 Greg Kroah-Hartman   ┬─>[ 81/83] ACPI video: Ignore errors after _DOD evaluation.
 - 21/11 Greg Kroah-Hartman   └─>[ 29/83] ASoC: core: Double control update err for snd_soc_put_volsw_sx
 - 21/11 Greg Kroah-Hartman   [ 05/38] ptp: update adjfreq callback description
 - 21/11 Greg Kroah-Hartman   ┬─>[ 105/171] libceph: fully initialize connection in con_init()
 - 21/11 Greg Kroah-Hartman   ├─>[ 001/171] mm: bugfix: set current->reclaim_state to NULL while returning from kswapd()
 - 21/11 Greg Kroah-Hartman   ├─>[ 075/171] ceph: ensure auth ops are defined before use
 - 21/11 Greg Kroah-Hartman   ├─>[ 086/171] libceph: flush msgr queue during mon_client shutdown
 - 21/11 Greg Kroah-Hartman   ├─>[ 005/171] mac80211: call skb_dequeue/ieee80211_free_txskb instead of __skb_queue_purge
 - 21/11 Greg Kroah-Hartman   ├─>[ 065/171] libceph: dont reset kvec in prepare_write_banner()
 - 21/11 Greg Kroah-Hartman   ├─>[ 049/171] drm/i915: fix overlay on i830M
 - 21/11 Greg Kroah-Hartman   └─>[ 039/171] r8169: Fix WoL on RTL8168d/8111d.


And they had a huge delay of eight hours at kernel.org:

| Received: from mail.kernel.org ([198.145.19.201]:49019 "EHLO mail.kernel.org"
| rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
| id S1753592Ab2KVSdp (ORCPT <rfc822;sta...@vger.kernel.org>);
| Thu, 22 Nov 2012 13:33:45 -0500
| Received: from mail.kernel.org (localhost [127.0.0.1])
| by mail.kernel.org (Postfix) with ESMTP id 87EA220434;
| Thu, 22 Nov 2012 00:46:51 + (UTC)

Since there was nothing else in the two hours since then, care to
check what went wrong?

Nevertheless, thanks to the predictable patch download URLs, I could
start my tests, everything looks good so far.

Christoph


Re: [PATCH 00/11] 3.2-stable: Fix for leapsecond caused hrtimer/futex issue

2012-07-19 Thread Christoph Biedl
John Stultz wrote...

> Attached is the test case I used to reproduce and test the solution
> to the hard-hang deadlock.

I was wondering whether anybody managed to crash a virtualbox guest
using your program. I could not: with version 4.1.18 on the host and the
guest running several 3.0.x (x < 38) kernels on both x32 and x64, and
with the guest utilities stopped, nothing crashed. Rather a fun fact I
guess but I wanted to let you know.

All real hardware tested, including a dockstar on armel, crashed as
predicted, while 3.0.38-rc1 was immune.

Christoph


it821x trouble since 2.6.18

2007-01-14 Thread Christoph Biedl
Hi,

There are ITE 8212-based controllers installed in some of my
computers. I had always skipped the built-in RAID things and used them
as plain controllers.

1. Beginning with 2.6.18 there was some trouble, basically slowed data
transfer, sometimes even a totally stalled system (I cannot reproduce
the latter though).

Diffing dmesg of 2.6.17.x and 2.6.18.x (same for 2.6.19):

 Probing IDE interface ide2...
 hde: SAMSUNG VA34324A, ATA DISK drive
 hde: Performing identify fixups.
 ide2 at 0xd000-0xd007,0xd402 on irq 10
 hde: max request size: 128KiB
-hde: 8446032 sectors (4324 MB) w/478KiB Cache, CHS=14896/9/63, BUG
+hde: 8446032 sectors (4324 MB) w/478KiB Cache, CHS=14896/9/63, BUG DMA OFF
  hde:hde: recal_intr: status=0x51 { DriveReady SeekComplete Error }
 hde: recal_intr: error=0x04 { DriveStatusError }
 ide: failed opcode was: unknown
+hde: irq timeout: status=0xd0 { Busy }
+ide: failed opcode was: unknown
+ide2: reset: master: ECC circuitry error
+hde: recal_intr: status=0x51 { DriveReady SeekComplete Error }
+hde: recal_intr: error=0x04 { DriveStatusError }
+ide: failed opcode was: unknown
  hde1 hde2 < hde5 hde6 >

OK, there was "BUG" up to and including 2.6.17 but the disk(s) worked
without any problem. 

Not surprisingly the disk throughput was affected by that. Comparing
"hdparm -tT /dev/hde":

2.6.17
/dev/hde:
 Timing cached reads:   88 MB in  2.01 seconds =  43.72 MB/sec
 Timing buffered disk reads:   26 MB in  3.21 seconds =   8.10 MB/sec

2.6.18
/dev/hde:
 Timing cached reads:   90 MB in  2.04 seconds =  44.01 MB/sec
 Timing buffered disk reads:   16 MB in  3.14 seconds =   5.10 MB/sec
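
To put a number on the slowdown, the rates can be recomputed from the
raw hdparm figures. The sketch below is only a helper for that
arithmetic. The function names are invented here and have nothing to
do with hdparm itself.

```c
/* Rate in MB/sec from the amount read and the elapsed time. */
static double rate_mb_per_sec(double megabytes, double seconds)
{
	return megabytes / seconds;
}

/* Relative change from an old to a new rate, in percent. */
static double percent_change(double old_rate, double new_rate)
{
	return (new_rate - old_rate) / old_rate * 100.0;
}
```

With the buffered read figures above, percent_change(8.10, 5.10) comes
to roughly -37, i.e. the rate dropped by more than a third going from
2.6.17 to 2.6.18.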


2. Trying to understand what has happened I found the main difference
is not in the driver but in ide-dma.c:

--- linux-2.6.17.14/drivers/ide/ide-dma.c   2006-06-18 01:49:35.0 +
+++ linux-2.6.18.6/drivers/ide/ide-dma.c   2006-09-20 03:42:06.0 +
@@ -752,7 +750,7 @@
 			goto bug_dma_off;
 		printk(", DMA");
 	} else if (id->field_valid & 1) {
-		printk(", BUG");
+		goto bug_dma_off;
 	}
 	return;
 bug_dma_off:


and reverting that change returns the old transfer rates. But that is
probably not a good idea.


3. To increase confusion: I had learned the ite8212 chip is close to
the cmd/sil680, and patching siimage.c had been a solution to get
support for 8212 before it showed up in the kernel.

So I tried this again:

--- ORG/linux-2.6.19.2/drivers/ide/pci/siimage.c	2006-11-29 21:57:37.0 +
+++ NEW/linux-2.6.19.2/drivers/ide/pci/siimage.c	2007-01-14 17:59:03.0 +
@@ -52,6 +52,7 @@
 		case PCI_DEVICE_ID_SII_1210SA:
 			return 1;
 		case PCI_DEVICE_ID_SII_680:
+		case PCI_DEVICE_ID_ITE_8212:
 			return 0;
 	}
 	BUG();
@@ -1082,6 +1083,7 @@

 static struct pci_device_id siimage_pci_tbl[] = {
 	{ PCI_VENDOR_ID_CMD, PCI_DEVICE_ID_SII_680,  PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+	{ PCI_VENDOR_ID_ITE, PCI_DEVICE_ID_ITE_8212,  PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
 #ifdef CONFIG_BLK_DEV_IDE_SATA
 	{ PCI_VENDOR_ID_CMD, PCI_DEVICE_ID_SII_3112, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 1},
 	{ PCI_VENDOR_ID_CMD, PCI_DEVICE_ID_SII_1210SA, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 2},


and disabled CONFIG_BLK_DEV_IT821X - and now the disk is even faster
than before:

/dev/hde:
 Timing cached reads:   88 MB in  2.01 seconds =  43.71 MB/sec
 Timing buffered disk reads:   34 MB in  3.02 seconds =  11.25 MB/sec


So it seems the problem is it821x.c, at least for my controllers.
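
For readers unfamiliar with how such a table entry makes a driver
claim a device, here is a simplified userspace model of the matching.
It is a sketch under assumptions, not the kernel's implementation: the
struct, DEMO_ANY_ID and demo_match() are invented for illustration,
class/class_mask are left out, and the numeric IDs are what the
PCI_VENDOR_ID_* / PCI_DEVICE_ID_* macros are believed to expand to.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ANY_ID 0xffffffffu	/* plays the role of PCI_ANY_ID */

/* Reduced stand-in for struct pci_device_id (class fields omitted). */
struct demo_pci_id {
	uint32_t vendor;
	uint32_t device;
	uint32_t subvendor;
	uint32_t subdevice;
	unsigned long driver_data;
};

/* Assumed IDs: CMD 0x1095 / SiI 680 0x0680, ITE 0x1283 / 8212 0x8212. */
static const struct demo_pci_id demo_tbl[] = {
	{ 0x1095, 0x0680, DEMO_ANY_ID, DEMO_ANY_ID, 0 },
	{ 0x1283, 0x8212, DEMO_ANY_ID, DEMO_ANY_ID, 0 },	/* added entry */
	{ 0, 0, 0, 0, 0 }					/* terminator */
};

/* A table field matches if it is the wildcard or equals the device's. */
static int demo_field_matches(uint32_t want, uint32_t have)
{
	return want == DEMO_ANY_ID || want == have;
}

/* Return the first matching entry, or NULL if the driver does not bind. */
static const struct demo_pci_id *demo_match(const struct demo_pci_id *tbl,
					     uint32_t vendor, uint32_t device,
					     uint32_t subvendor, uint32_t subdevice)
{
	for (; tbl->vendor != 0; tbl++) {
		if (demo_field_matches(tbl->vendor, vendor) &&
		    demo_field_matches(tbl->device, device) &&
		    demo_field_matches(tbl->subvendor, subvendor) &&
		    demo_field_matches(tbl->subdevice, subdevice))
			return tbl;
	}
	return NULL;
}
```

In this model an ITE 8212 with any subsystem IDs matches the added
entry and gets driver_data 0, the same value as the SiI 680. The extra
case label in the first hunk is presumably needed so that the switch
returns 0 for the new ID as well, instead of falling through to BUG().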


4. Now there is 2.6.19 and the new ata driver model. So I tried that
one, too:

Booting with 
| CONFIG_PATA_IT821X=m
stalls the system upon module load. The last kernel messages were

ata1.00: tag 0 cmd 0xc4 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: tag 0 cmd 0xc4 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: tag 0 cmd 0xc4 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: tag 0 cmd 0xc4 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: tag 0 cmd 0xc4 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port failed to respond (30 secs, Status 0xd0)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: tag 0 cmd 0xc4 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port failed to respond (30 secs, Status 0xd0)
Buffer I/O error on device sda, logical block 0


Patching the sil driver again:

--- ORG/linux-2.6.19.2/drivers/ata/pata_sil680.c2006-11-29 
22:57:37.0 +0100