[REGRESSION][BISECTED] 5.9-rc4 disables console on radeon

2020-09-08 Thread Mikael Pettersson
Starting with linux-5.9-rc4, the Dell monitor on my desktop PC goes black
during boot when the kernel activates the framebuffer console, except for
this error message shown in the center of the screen:

"Dell U2412M
 The current input timing is not supported by the monitor display. Please
 change your input timing to 1920x1200@60Hz or any other monitor
 listed timing as per the monitor specifications.
 "

The monitor remains black until I reboot.

All kernels up to and including 5.9-rc3 were Ok.  A git bisect identified

# first bad commit: [fc8c70526bd30733ea8667adb8b8ffebea30a8ed]
drm/radeon: Prefer lower feedback dividers

as the culprit, and reverting that from -rc4 makes the console work again.

Adding a bit of debugging code to that function shows:

avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137
avivo_get_fb_ref_div: fb_div_new 142 fb_div_old 143
avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137
avivo_get_fb_ref_div: fb_div_new 119 fb_div_old 120
avivo_get_fb_ref_div: fb_div_new 136 fb_div_old 137

during boot, where "new" is what the commit above changed the code to compute,
and "old" is the value computed by the working code from rc3.

The graphics card is a Radeon HD6450 silent model:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] (prog-if 00 [VGA controller])


Re: [BUG] 2.6.25-rc2-git8 fails to boot on 486 due to TSC breakage

2008-02-25 Thread Mikael Pettersson
Ingo Molnar writes:
 > 
 > * H. Peter Anvin <[EMAIL PROTECTED]> wrote:
 > 
 > > Please fix it in both places.  Using XOR instead of AND-NOT is a bug, 
 > > plain and simple.
 > 
 > yes, i already fixed that when i added Mikael's patch and it's all 
 > queued up.

Ok. For reference and for LKML viewers, this is what
the final patch should be:

diff -rupN linux-2.6.25-rc3/arch/x86/kernel/cpu/common.c linux-2.6.25-rc3.x86-apply-cleared_cpu_caps-correctly/arch/x86/kernel/cpu/common.c
--- linux-2.6.25-rc3/arch/x86/kernel/cpu/common.c	2008-02-25 09:29:03.0 +0100
+++ linux-2.6.25-rc3.x86-apply-cleared_cpu_caps-correctly/arch/x86/kernel/cpu/common.c	2008-02-25 09:44:11.0 +0100
@@ -504,7 +504,7 @@ void __cpuinit identify_cpu(struct cpuin
 
 	/* Clear all flags overriden by options */
 	for (i = 0; i < NCAPINTS; i++)
-		c->x86_capability[i] ^= cleared_cpu_caps[i];
+		c->x86_capability[i] &= ~cleared_cpu_caps[i];
 
 	/* Init Machine Check Exception if available. */
 	mcheck_init(c);
diff -rupN linux-2.6.25-rc3/arch/x86/kernel/setup_64.c linux-2.6.25-rc3.x86-apply-cleared_cpu_caps-correctly/arch/x86/kernel/setup_64.c
--- linux-2.6.25-rc3/arch/x86/kernel/setup_64.c	2008-02-25 09:29:03.0 +0100
+++ linux-2.6.25-rc3.x86-apply-cleared_cpu_caps-correctly/arch/x86/kernel/setup_64.c	2008-02-25 09:44:57.0 +0100
@@ -1021,7 +1021,7 @@ void __cpuinit identify_cpu(struct cpuin
 
 	/* Clear all flags overriden by options */
 	for (i = 0; i < NCAPINTS; i++)
-		c->x86_capability[i] ^= cleared_cpu_caps[i];
+		c->x86_capability[i] &= ~cleared_cpu_caps[i];
 
 #ifdef CONFIG_X86_MCE
 	mcheck_init(c);
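
To illustrate why XOR is the wrong operator here, the stand-alone sketch
below (not kernel code; NCAPINTS is shrunk to 2 and the function names and
bit values are made up for the example) applies both operators to plain
arrays. XOR toggles bits, so a "cleared" bit that the CPU never had gets
switched on; AND-NOT can only ever switch bits off.

```c
#include <stdint.h>

#define NCAPINTS 2	/* hypothetical size, for this example only */

/* Buggy variant: XOR toggles, so it can turn on a bit that was off. */
static void clear_caps_xor(uint32_t *caps, const uint32_t *cleared)
{
	for (int i = 0; i < NCAPINTS; i++)
		caps[i] ^= cleared[i];
}

/* Fixed variant: AND-NOT can only clear bits, never set them. */
static void clear_caps_andnot(uint32_t *caps, const uint32_t *cleared)
{
	for (int i = 0; i < NCAPINTS; i++)
		caps[i] &= ~cleared[i];
}
```

With caps[0] == 0x01 and cleared[0] == 0x10 (a flag that was never set),
the XOR variant leaves caps[0] == 0x11 while the AND-NOT variant leaves it
at 0x01.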
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[BUG] 2.6.25-rc3 hangs in early boot on Sun Ultra5

2008-02-25 Thread Mikael Pettersson
Booting 2.6.25-rc3 on my Ultra5 causes a hang before or as
the console is switched over to the framebuffer. The console
output is (extrapolated from dmesg in -rc2 and handwritten
notes, as I don't have a serial cable to my U5):

PROMLIB: Sun IEEE Boot Prom 'OBP 3.25.3 2000/06/29 14:12'
PROMLIB: Root node compatible: 
*** the following line can't be seen in dmesg after rc2 has booted
console [earlyprom0] enabled
Linux version 2.6.25-rc3 ([EMAIL PROTECTED]) (gcc version 4.2.3) #1 Mon Feb 25 18:49:41 CET 2008
ARCH: SUN4U
Ethernet address: 08:00:20:fd:ec:1f
[0002-f840] page_structs=262144 node=0 entry=0/0
[0002-f880] page_structs=262144 node=0 entry=1/0
[0002-f8c0] page_structs=262144 node=0 entry=2/0
[0002-f8000100] page_structs=262144 node=0 entry=3/0
OF stdout device is: /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/SUNW,[EMAIL PROTECTED]
PROM: Built device tree with 46617 bytes of memory.
On node 0 totalpages: 32299
  Normal zone: 335 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 31964 pages, LIFO batch:7
  Movable zone: 0 pages used for memmap
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 31964
Kernel command line: ro root=/dev/sda5
PID hash table entries: 1024 (order: 10, 8192 bytes)
clocksource: mult[28000] shift[16]
clockevent: mult[] shift[32]
Console: colour dummy device 80x25
*** the following line can't be seen in dmesg after rc2 has booted
console handover: boot [earlyprom0] -> real [tty0]

At this point rc3 hangs hard and won't even respond to sysrq.

Another difference is that with rc2 the first few lines of kernel
output while the console is still in OF mode either aren't shown
or disappear quickly since the switch to the framebuffer occurs
within a fraction of a second after the kernel has been loaded.
With rc3 the kernel output (the text shown above) in the OF-mode
console is very very slow.

(I should have quoted my .config here but I forgot to bring it.
It's exactly the same in rc2 and rc3, however.)

I'll try some rc2->rc3 bisecting later.

/Mikael


Re: [BUG] 2.6.25-rc3 hangs in early boot on Sun Ultra5

2008-02-26 Thread Mikael Pettersson
Mikael Pettersson writes:
 > Booting 2.6.25-rc3 on my Ultra5 causes a hang before or as
 > the console is switched over to the framebuffer. The console
 > output is (extrapolated from dmesg in -rc2 and handwritten
 > notes, as I don't have a serial cable to my U5):
 > 
 > PROMLIB: Sun IEEE Boot Prom 'OBP 3.25.3 2000/06/29 14:12'
 > PROMLIB: Root node compatible: 
 > *** the following line can't be seen in dmesg after rc2 has booted
 > console [earlyprom0] enabled
 > Linux version 2.6.25-rc3 ([EMAIL PROTECTED]) (gcc version 4.2.3) #1 Mon Feb 25 18:49:41 CET 2008
 > ARCH: SUN4U
 > Ethernet address: 08:00:20:fd:ec:1f
 > [0002-f840] page_structs=262144 node=0 entry=0/0
 > [0002-f880] page_structs=262144 node=0 entry=1/0
 > [0002-f8c0] page_structs=262144 node=0 entry=2/0
 > [0002-f8000100] page_structs=262144 node=0 entry=3/0
 > OF stdout device is: /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/SUNW,[EMAIL PROTECTED]
 > PROM: Built device tree with 46617 bytes of memory.
 > On node 0 totalpages: 32299
 >   Normal zone: 335 pages used for memmap
 >   Normal zone: 0 pages reserved
 >   Normal zone: 31964 pages, LIFO batch:7
 >   Movable zone: 0 pages used for memmap
 > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 31964
 > Kernel command line: ro root=/dev/sda5
 > PID hash table entries: 1024 (order: 10, 8192 bytes)
 > clocksource: mult[28000] shift[16]
 > clockevent: mult[] shift[32]
 > Console: colour dummy device 80x25
 > *** the following line can't be seen in dmesg after rc2 has booted
 > console handover: boot [earlyprom0] -> real [tty0]
 > 
 > At this point rc3 hangs hard and won't even respond to sysrq.
 > 
 > Another difference is that with rc2 the first few lines of kernel
 > output while the console is still in OF mode either aren't shown
 > or disappear quickly since the switch to the framebuffer occurs
 > within a fraction of a second after the kernel has been loaded.
 > With rc3 the kernel output (the text shown above) in the OF-mode
 > console is very very slow.
 > 
 > (I should have quoted my .config here but I forgot to bring it.
 > It's exactly the same in rc2 and rc3, however.)
 > 
 > I'll try some rc2->rc3 bisecting later.

Minor update: rc2-git7 has the slow initial console behaviour,
but successfully switches to the framebuffer. rc2-git8 however
hangs in the console handover. So I'll bisect git7->git8 next.

/Mikael


Re: [PATCH 00/36] AArch64 Linux kernel port

2012-07-07 Thread Mikael Pettersson
Catalin Marinas writes:
 > Compilation requires a new aarch64-none-linux-gnu-
 > toolchain (http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01694.html).

Where are the corresponding binutils patches?  Without those it's
impossible for people outside ARM to build the toolchain and kernel.

/Mikael


Re: SATA kernel-buffered read VERY slow (not raid, Promise TX300 card); 2.6.23.1(vanilla)

2008-01-03 Thread Mikael Pettersson
Linda Walsh writes:
 > Robert Hancock wrote:
 > > Linda Walsh wrote:
 > >> Alan Cox wrote:
 > >>>> rate began falling; at 128k block-reads-at-a-time or larger, it
 > >>>> drops below 20MB/s (only on buffered SATA).
 > >>> Try disabling NCQ - see if you've got a drive with the 'NCQ = no
 > >>> readahead' flaw.
 > > http://linux-ata.org/faq.html#ncq
 > ---
 > When drive initializes, dmesg says it has NCQ (depth 0/32)
 > Reading the queue_depth under /sys, shows a queuedepth of "1".
 > 
 > But more importantly -- I notice a chronic error message associate
 > with this drive that may be causing some or all of the problem:
 > ---
 > Jan  2 20:06:10 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
 > Jan  2 20:06:10 Ishtar kernel: ata1.00: port_status 0x2008
 > Jan  2 20:06:10 Ishtar kernel: ata1.00: cmd c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
 > Jan  2 20:06:10 Ishtar kernel:  res 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
 > Jan  2 20:06:13 Ishtar kernel: ata1: limiting SATA link speed to 1.5 Gbps
 > Jan  2 20:06:13 Ishtar kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
 > Jan  2 20:06:13 Ishtar kernel: ata1.00: port_status 0x2008
 > Jan  2 20:06:13 Ishtar kernel: ata1.00: cmd c8/00:10:00:8b:04/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
 > Jan  2 20:06:13 Ishtar kernel:  res 50/00:00:0f:8b:04/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
 > Jan  2 20:06:14 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x3
 > Jan  2 20:06:14 Ishtar kernel: ata1: hotplug_status 0x80
 > Jan  2 20:06:15 Ishtar kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x3
 > Jan  2 20:06:15 Ishtar kernel: ata1: hotplug_status 0x80
 > ---
 > What da heck?

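Looks like the Promise ASIC SG bug. Apply
<http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
and let us know if things improve.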

/Mikael


Re: Re:Believed resolved: SATA kern-buffRd read slow: based on promise driver bug

2008-01-04 Thread Mikael Pettersson
Linda Walsh writes:
 > Mikael Pettersson wrote:
 > > Linda Walsh writes:
 > >  > Robert Hancock wrote:
 > >  > > Linda Walsh wrote:
 > >  > >>>> read rate began falling; at 128k block-reads-at-a-time or larger, it
 > >  > >>>> drops below 20MB/s (only on buffered SATA).
 > >  > 
 > >  > But more importantly -- I notice a chronic error message associate
 > >  > with this drive that may be causing some or all of the problem:
 > >  > ---
 > >  > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
 > >  > ata1.00: port_status 0x2008
 > >  > ata1.00: cmd c8/00:10:30:06:03/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 in
 > >  >  res 50/00:00:3f:06:03/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
 > >  > ata1: limiting SATA link speed to 1.5 Gbps
 > >
 > >
 > > Looks like the Promise ASIC SG bug. Apply
 > > <http://user.it.uu.se/~mikpe/linux/patches/sata_promise/patch-sata_promise-1-asic-sg-bug-fix-v3-2.6.23>
 > > and let us know if things improve.
 > >
 > > /Mikael
 > >   
 > ---
 > Yep!  Hope that's making it into a patch soon or, at least 2.6.24.
 > Kernel buffered

Good to hear that it solved this problem.
The patch is in 2.6.24-rc2 and newer kernels, and will be sent
to -stable for the 2.6.23 and 2.6.22 series.

 > I seem to remember reading about some problems with Promise SATA & ACPI.
 > Does this address that or is that a separate issue?  (Am using no-acpi for

sata_promise does nothing ACPI-related. It doesn't need to.
(Drives may be a different story.)

 > Is the above bug mentioned/discussed in the linux-ide archives?

Yes.

 >  That
 > and I'd like to find out why TCQ/NCQ doesn't work with the Seagate drives --

The driver doesn't yet support NCQ.


Re: BUG: unable to handle kernel paging request at virtual address

2008-01-05 Thread Mikael Pettersson
Alexander Shaduri writes:
 > On Sat, 5 Jan 2008 09:10:12 +
 > Al Viro <[EMAIL PROTECTED]> wrote:
 > 
 > > and we have q->page == 0x48464443.  Seeing how we assign that sucker, that
 > > smells like we've got a page on quicklist with {0x43, 0x44, 0x46, 0x48}
 > > in its first 4 bytes.  Instead of having address of the next page stored
 > > in there...
 > > 
 > > Do other oopsen of the same kind give the same value?
 > 
 > I've got another oops here with a different value. This time a bttv message
 > preceded it. Note that the oops happened shortly *after* I stopped capturing
 > (watching the tv through mplayer).
 > 
 > Output of dmesg:
 > 
 > bttv0: OCERR @ 375e2014,bits: HSYNC OFLOW FDSR OCERR*
 > (two pages of the same message here)
 > 
 > bttv0: OCERR @ 375e2014,bits: HSYNC OFLOW FDSR OCERR*
 > bttv0: OCERR @ 375e2014,bits: HSYNC OFLOW FBUS FDSR OCERR*
 > BUG: unable to handle kernel paging request at virtual address 23232323
 >  printing eip:
 > c011d6f8
 > *pde = 
 > Oops:  [#1]
 > PREEMPT SMP
 > Modules linked in: ppp_generic slhc iptable_filter ip_tables ip6table_filter 
 > ip6_tables x_tables ipv6 cpufreq_conservative cpufreq_ondemand 
 > cpufreq_userspace cpufreq_powersave powernow_k8 freq_table snd_pcm_oss 
 > snd_mixer_oss snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi 
 > snd_seq_midi_event snd_seq_midi_emul snd_seq capability commoncap fuse 
 > nls_koi8_r nls_cp866 loop dm_mod binfmt_misc uhci_hcd it87 hwmon_vid eeprom 
 > nvidia(P) tuner tvaudio snd_emu10k1 bttv snd_rawmidi snd_ac97_codec 
 > video_buf firmware_class ir_common ac97_bus snd_pcm snd_seq_device 
 > compat_ioctl32 i2c_algo_bit snd_timer snd_page_alloc emu10k1_gp snd_util_mem 
 > btcx_risc tveeprom videodev gameport ohci1394 ieee1394 ide_cd snd_hwdep snd 
 > v4l2_common v4l1_compat agpgart soundcore i2c_nforce2 thermal button 
 > rtc_cmos rtc_core rtc_lib forcedeth k8temp i2c_core hwmon cdrom sg ohci_hcd 
 > ehci_hcd usbcore edd fan processor pata_amd
 > CPU:0
 > EIP:0060:[]Tainted: PVLI

This kernel is tainted by the nvidia module...


[BUG] 2.6.24-rc8 broke warm reboots on ASUS P5B-E Plus

2008-01-17 Thread Mikael Pettersson
The problematic machine has an Intel P965/ICH8R based ASUS P5B-E Plus
mainboard with a Core2Duo 6600 processor. Kernels up to and including
2.6.24-rc7 work fine on it.

Kernel 2.6.24-rc8 boots Ok, but if I try to do a warm reboot after
having run 2.6.24-rc8, the BIOS hangs. The initial BIOS screen shows:






[here it always hangs after having run 2.6.24-rc8]



Pressing reset will not fix the hang, nor will powering the machine
down using the mainboard's power button. The only thing that works is
to switch the PSU off, wait a few seconds, switch the PSU on again,
and then press the mainboard's power button.

Both the 32-bit and 64-bit 2.6.24-rc8 x86 kernels cause this problem,
and it's 100% repeatable.

I'll try to do some rc7->rc8 bisecting tomorrow. Meanwhile, I'm including
lspci and .config below.

/Mikael

00:00.0 Host bridge: Intel Corporation Unknown device 29a0 (rev 02)
00:01.0 PCI bridge: Intel Corporation Unknown device 29a1 (rev 02)
00:1a.0 USB Controller: Intel Corporation Unknown device 2834 (rev 02)
00:1a.1 USB Controller: Intel Corporation Unknown device 2835 (rev 02)
00:1a.7 USB Controller: Intel Corporation Unknown device 283a (rev 02)
00:1b.0 Audio device: Intel Corporation Unknown device 284b (rev 02)
00:1c.0 PCI bridge: Intel Corporation Unknown device 283f (rev 02)
00:1c.5 PCI bridge: Intel Corporation Unknown device 2849 (rev 02)
00:1d.0 USB Controller: Intel Corporation Unknown device 2830 (rev 02)
00:1d.1 USB Controller: Intel Corporation Unknown device 2831 (rev 02)
00:1d.2 USB Controller: Intel Corporation Unknown device 2832 (rev 02)
00:1d.7 USB Controller: Intel Corporation Unknown device 2836 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation Unknown device 2810 (rev 02)
00:1f.2 SATA controller: Intel Corporation Unknown device 2821 (rev 02)
00:1f.3 SMBus: Intel Corporation Unknown device 283e (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV370 [ATI Sapphire X550 Silent]
01:00.1 Display controller: ATI Technologies Inc RV370 secondary [ATI Sapphire X550 Silent]
02:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc8
# Thu Jan 17 22:36:54 2008
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_SUPPORTS_OPROFILE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_AUDIT is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=15
# CONFIG_CGROUPS is not set
# CONFIG_FAIR_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
# CONFIG_SIGNALFD is not set
# CONFIG_EVENTFD is not set
CONFIG_SHMEM=y
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y

Re: NFSv2/3 broken exporting/mounting (permission denied) in 2.6.24-rc4

2007-12-07 Thread Mikael Pettersson
On Thu, 6 Dec 2007 21:20:41 -0500, Erez Zadok wrote:
> I get a "permission denied" when trying to mount a localhost nfsv2/3
> exported volume, on v2.6.24-rc4-124-gf194d13.  It works w/ nfsv4 mounting.
> It worked fine in 2.6.24-rc3.  Here's a sequence of ops I tried:
> 
> # mount -t ext2 /dev/hdb1 /n/lower/b0
> # exportfs -o no_root_squash,rw localhost:/n/lower/b0
> # mount -t nfs -o nfsvers=3 localhost:/n/lower/b0 /mnt

I'm seeing something similar too. NFSv3 export of an ext3 partition
to another machine in my lan fails (client gets permission denied)
when the server runs 2.6.24-rc4. It worked fine in 2.6.24-rc3.

There's no NFSv4 of any kind on either client or server.


Re: acpi ->video_device_list corruption

2007-12-12 Thread Mikael Pettersson
William Lee Irwin III writes:
 > The ->cap fields of struct acpi_video_device and struct acpi_video_bus
 > are 1B each, not 4B. The oversized memset()'s corrupted the subsequent
 > list_head fields. This resulted in silent corruption without
 > CONFIG_DEBUG_LIST and BUG's with it. This patch uses sizeof() to pass
 > the proper bounds to the memset() calls and thereby correct the bugs.
 > 
 > Included as a MIME attachment is a compressed dmesg from an affected
 > system. The patch was seen to resolve the issue on the affected system.
 > 
 > vs. 2.6.24-rc5
 > 
 > Signed-off-by: William Irwin <[EMAIL PROTECTED]>
 > 
 > 
 > -- wli
 > 
 > diff --git a/drivers/acpi/video.c b/drivers/acpi/video.c
 > index 44a0d9b..7895d57 100644
 > --- a/drivers/acpi/video.c
 > +++ b/drivers/acpi/video.c
 > @@ -577,7 +577,7 @@ static void acpi_video_device_find_cap(struct acpi_video_device *device)
 >  	struct acpi_video_device_brightness *br = NULL;
 >  
 >  
 > -	memset(&device->cap, 0, 4);
 > +	memset(&device->cap, 0, sizeof(struct acpi_video_device_cap));

IMO the memset(ptr, 0, sizeof(*ptr)) idiom is both safer
and avoids having to write an uninteresting type name.
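
A minimal sketch of that idiom, using made-up stand-in structures (the
struct and function names below are hypothetical, not the ones in
drivers/acpi/video.c): the size is taken from the object the pointer
refers to, so it stays correct if the field's type ever changes, and the
neighbouring member is left alone.

```c
#include <string.h>

/* Hypothetical stand-in for a 1-byte capability structure. */
struct video_cap {
	unsigned char flags;
};

/* Hypothetical stand-in for a device with a member right after ->cap. */
struct video_device {
	struct video_cap cap;
	unsigned char next_field;	/* plays the role of the list_head */
};

/* Zero only ->cap; the length comes from the pointee, not a type name. */
static void video_device_clear_cap(struct video_device *dev)
{
	struct video_cap *cap = &dev->cap;

	memset(cap, 0, sizeof(*cap));
}
```

After video_device_clear_cap(), cap.flags is 0 and next_field keeps its
previous value, whereas a hard-coded length of 4 would have run past the
1-byte field.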


Re: Soft lockup 2.6.23.14-uc0

2008-02-05 Thread Mikael Pettersson
Doug Kehn writes:
 > Hi All,
 > 
 > I am observing kernel soft lockups when running
 > network throughput tests with NUTTCP.  The kernel is a
 > stock 2.6.23 kernel with patches from uClinux.org.  I
 > have applied the incremental 2.6.23 patches to produce
 > the resulting 2.6.23.14-uc0 kernel.  This kernel is
 > executing on a 266MHz Intel XScale IXP420 processor
 > with 16MB flash (JFFS2) and 64MB RAM.  I am also using
 > the Intel Access Library v2.4 with patches from
 > snapgear.org.  (The Intel Access Library is the reason
 > for the tainted kernel.)  The toolchain to build the
 > kernel and all applications is comprised of:
 > 
 > binutils-2.16.tar.gz
 > gcc-3.4.4.tar.gz
 > glibc-2.3.3.tar.gz
 > glibc-linuxthreads-2.3.3.tar.gz
 > 
 > All applications are compiled against uClibc-0.9.27.
 > 
 > A soft lockup dump is provided below.  Any help in
 > determining the cause of the soft lock will be
 > appreciated.
 > 
 > Regards,
 > ...doug
 > 
 > 
 > # BUG: soft lockup - CPU#0 stuck for 11s! [awk:2960]
 > 
 > Pid: 2960, comm:  awk
 > CPU: 0Tainted: P (2.6.23.14-uc0 #1)
 > PC is at handle_IRQ_event+0x34/0x80
 > LR is at handle_level_irq+0x98/0xec
 > pc : []lr : []psr: 4013
 > sp : c353deb0  ip : c353ded0  fp : c353decc
 > r10: 4000d090  r9 : c353c000  r8 : 4000515c
 > r7 : 0012  r6 :   r5 :   r4 : c3f68a60
 > r3 : 4013  r2 : c025151c  r1 : c3f68a60  r0 : 0012
 > Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
 > Control: 39ff  Table: 0350  DAC: 0015
 > [] (show_regs+0x0/0x4c) from [] (softlockup_tick+0xe8/0x114)
 >  r4:1e13
 > [] (softlockup_tick+0x0/0x114) from [] (run_local_timers+0x18/0x1c)

Is this a new ixp4xx platform or one of the existing
ones in arch/arm/mach-ixp4xx?

Anyway, I can think of two things:

1. There were some very recent patches by Peter Zijlstra
   addressing hrtimer breakage on arm and some other
   archs in 2.6.24-git. If uclinux has backported some
   of that stuff then it might explain this issue.

2. There is a new native Linux driver for ixp4xx
   ethernet. Patches for the 2.6.23.14 kernel can
   be found in the nslu2-linux group's subversion
   repository. (You'll need new firmware files though.)
   Replacing Intel's IXP400 drivers with this driver
   should at least tell you if the lockups are
   related to your use of the Intel drivers.

   FWIW, I've never seen these lockups on my ixp4xx
   boxes, with the Intel IXP400 drivers or with
   the new native Linux drivers.

You should also Cc: the linux arm kernel mailing list,
as the issue probably is platform specific.


Re: [BUG] 2.6.24-rc8 broke warm reboots on ASUS P5B-E Plus

2008-01-18 Thread Mikael Pettersson
On Thu, 17 Jan 2008 23:13:50 +0100, Mikael Pettersson wrote:
 > The problematic machine has an Intel P965/ICH8R based ASUS P5B-E Plus
 > mainboard with a Core2Duo 6600 processor. Kernels up to and including
 > 2.6.24-rc7 work fine on it.
 > 
 > Kernel 2.6.24-rc8 boots Ok, but if I try to do a warm reboot after
 > having run 2.6.24-rc8, the BIOS hangs. The initial BIOS screen shows:
 > 
 > 
 > 
 > 
 > 
 > 
 > [here it always hangs after having run 2.6.24-rc8]
 > 
 > 
 > 
 > Pressing reset will not fix the hang, nor will powering the machine
 > down using the mainboard's power button. The only thing that works is
 > to switch the PSU off, wait a few seconds, switch the PSU on again,
 > and then press the mainboard's power button.
 > 
 > Both the 32-bit and 64-bit 2.6.24-rc8 x86 kernels cause this problem,
 > and it's 100% repeatable.
 > 
 > I'll try to do some rc7->rc8 bisecting tomorrow.

I've now narrowed it down to the following change in 2.6.24-rc7-git5:
84cd2dfb04d23a961c5f537baa243fa54d0987ac

>sky2: remove check for PCI wakeup setting from BIOS
>
>The driver checks status of PCI power management to mark
>default setting of Wake On Lan. On some systems this works, but often
>it reports a that WOL is disabled when it isn't.
>
>This patch gets rid of that check and just reports the wake on
>lan status based on the hardware capablity.
>
>Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

Reverting this eliminates the above-mentioned BIOS hang.

I added a debug printk to sky2_init_netdev(), and it showed:

sky2_init_netdev: wol == 0x0, (sky2_wol_supported(hw) & WAKE_MAGIC) == 0x20

That is, 2.6.24-rc7-git4 and earlier drivers would set sky2->wol to 0,
while 2.6.24-rc7-git5 and newer will set it to 0x20. A quick look through
sky2.c shows that this will affect HW programming in several places.
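
As a rough model of the difference (this is not the sky2 source; the two
helper names are made up, and only the WAKE_* bit values are taken from
<linux/ethtool.h>), the default changed from "whatever was derived from
the PCI power management state" to "magic-packet wake whenever the
hardware supports it":

```c
#include <stdint.h>

/* Wake-on-LAN flag bits; values match <linux/ethtool.h>. */
#define WAKE_PHY	(1u << 0)
#define WAKE_MAGIC	(1u << 5)	/* 0x20, as in the printk above */

/*
 * Hypothetical model of the old default: the value handed in by the
 * caller, derived from the PCI power management state.
 */
static uint8_t wol_default_before(uint8_t wol_from_pci_pm)
{
	return wol_from_pci_pm;
}

/*
 * Hypothetical model of the new default: magic-packet wake is enabled
 * whenever the hardware reports it as supported.
 */
static uint8_t wol_default_after(uint8_t wol_supported)
{
	return wol_supported & WAKE_MAGIC;
}
```

On this board the first helper would be called with 0x0 and return 0x0,
while the second returns 0x20 for any supported mask that includes
WAKE_MAGIC, matching the two values in the printk.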

Please revert or fix before 2.6.24 final.

lspci -vvxxx included below.

/Mikael

02:00.0 Ethernet controller: Marvell Technology Group Ltd. Unknown device 4364 (rev 12)
Subsystem: ASUSTeK Computer Inc. Unknown device 81f8
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
[remainder of the lspci output is truncated in the archive]


Re: The SX4 challenge

2008-01-20 Thread Mikael Pettersson
Jeff Garzik writes:
 > 
 > Promise just gave permission to post the docs for their PDC20621 (i.e. 
 > SX4) hardware:
 > http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-1.2.pdf.bz2
 > 
 > joining the existing PDC20621 DIMM and PLL docs:
 > http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-dimm-1.6.pdf.bz2
 > http://gkernel.sourceforge.net/specs/promise/pdc20621-pguide-pll-ata-timing-1.2.pdf.bz2
 > 
 > 
 > So, the SX4 is now open.  Yay :)  I am hoping to talk Mikael into 
 > becoming the sata_sx4 maintainer, and finally integrating my 'new-eh' 
 > conversion in libata-dev.git.

The best solution would be if some storage driver person would
take on the SX4 challenge and work towards integrating the SX4
into Linux' RAID framework.

If no-one steps forward I'll take over Jeff's SX4 card and just
maintain sata_sx4 as a plain non-RAID driver. Unfortunately I
don't have the time needed to turn it into a decent RAID or
RAID-offload driver myself.

/Mikael

 > 
 > But now is a good time to remind people how lame the sata_sx4 driver 
 > software really is -- and I should know, I wrote it.
 > 
 > The SX4 hardware, simplified, is three pieces:  XOR engine (for raid5), 
 > host<->board memcpy engine, and several ATA engines (and some helpful 
 > transaction sequencing features).  Data for each WRITE command is first 
 > copied to the board RAM, then the ATA engines DMA to/from the board RAM. 
 >   Data for each READ command is copied to board RAM via the ATA engines, 
 > then DMA'd across PCI to your host memory.
 > 
 > Therefore, while it is not hardware RAID, the SX4 provides all the 
 > pieces necessary to offload RAID1 and RAID5, and handle other RAID 
 > levels optimally.  RAID1 and 5 copies can be offloaded (provided all 
 > copies go to SX4-attached devices of course).  RAID5 XOR gen and 
 > checking can be offloaded, allowing the OS to see a single request, 
 > while the hardware processes a sequence of low-level requests sent in a 
 > batch.
 > 
 > This hardware presents an interesting challenge:  it does not really fit 
 > into software RAID (i.e. no RAID) /or/ hardware RAID categories.  The 
 > sata_sx4 driver presents the no-RAID configuration, which is terribly 
 > inefficient:
 > 
 >  WRITE:
 >  submit host DMA (copy to board)
 >  host DMA completion via interrupt
 >  submit ATA command
 >  ATA command completion via interrupt
 >  READ:
 >  submit ATA command
 >  ATA command completion via interrupt
 >  submit host DMA (copy from board)
 >  host DMA completion via interrupt
 > 
 > Thus, the "SX4 challenge" is a challenge to developers to figure out the 
 > most optimal configuration for this hardware, given the existing MD and 
 > DM work going on.
 > 
 > Now, it must be noted that the SX4 is not current-gen technology.  Most 
 > vendors have moved towards an "IOP" model, where the hw vendor puts most 
 > of their hard work into an ARM/MIPS firmware, running on an embedded 
 > chip specially tuned for storage purposes.  (ref "hptiop" and "stex" 
 > drivers, very very small SCSI drivers)
 > 
 > I know Dan Williams @ Intel is working on very similar issues on the IOP 
 > -- async memcpy, XOR offload, etc. -- and I am hoping that, due to that 
 > current work, some of the good ideas can be reused with the SX4.
 > 
 > Anyway...  it's open, it's interesting, even if it's not current-gen 
 > tech anymore.  You can probably find them on Ebay or in an 
 > out-of-the-way computer shop somewhere.
 > 
 >  Jeff
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: modprobing ipmi_si on Dell Power Edge 2600 make the terminal hang kernel ver. >= 2.6.20

2008-01-21 Thread Mikael Pettersson
william cheng writes:
 > Dear all,
 > 
 > We have a problem modprobing the ipmi_si module on a Dell
 > Power Edge 2600.
 > On modprobing ipmi_si the terminal hangs and the process
 > cannot be terminated by control-C.
 > We got these messages in dmesg
 > 
 > ipmi message handler version 39.1
 > IPMI System Interface driver.
 > ipmi_si: Trying SMBIOS-specified bt state machine at i/o
 > address 0xe4, slave address 0x20, irq 10
 >   Using irq 10
 > IPMI BT: req2rsp=10 secs retries=3
 > ipmi_si: Error clearing flags: c6
 > 
 > 
 > We have tested a few other Dell machines (Power Edge 2500,
 > 2900), and this problem only occurs on the Power Edge 2600.
 > 
 > 
 > The problem occurs in the following tests
 > Debian testing lenny
 > Ubuntu 7.10
 > Ubuntu 7.04
 > Fedora 8
 > Kernel 2.6.20
 > Kernel 2.6.21
 > Kernel 2.6.22
 > Kernel 2.6.22.5
 > Kernel 2.6.22.14
 > Kernel 2.6.22.15
 > Kernel 2.6.23.12
 > 
 > While in the following tests we can successfully modprobe the 
 > ipmi_si
 > Ubuntu 6.10
 > Debian 4.0r2
 > Kernel 2.6.18.3
 > Kernel 2.6.18.5
 > Kernel 2.6.18.8
 > Kernel 2.6.19
 > Kernel 2.6.19.7
 > 
 > It looks like the problem only occurs when using kernel after
 > 2.6.20 with Power Edge 2600.

I saw the same loop when upgrading a PE2600 from an RHEL4 2.6.9
kernel to an RHEL5 2.6.18 kernel. Since I had no interest in ipmi_si
I just de-configured it and wrote it off as yet another RHEL bug.
Seems now that RedHat backported an upstream bug :-(


[PATCH 2.6.24] sym53c8xx_2 modpost section mismatch fix

2008-01-26 Thread Mikael Pettersson
Building 2.6.24 with

# CONFIG_HOTPLUG is not set
CONFIG_SCSI_SYM53C8XX_2=y

results in the following during modpost:

WARNING: vmlinux.o(.text+0x14b36c): Section mismatch: reference to 
.exit.text:sym2_remove (between 'sym2_io_error_detected' and 
'sym_set_cam_result_error')

because sym2_io_error_detected() calls sym2_remove(), which is marked __devexit.

Fixed by removing the __devexit from sym2_remove().

Signed-off-by: Mikael Pettersson <[EMAIL PROTECTED]>
---
Resend. Previously reported against 2.6.24-rc6 on 2007-12-15.

--- linux-2.6.24-rc5/drivers/scsi/sym53c8xx_2/sym_glue.c.~1~	2007-12-15 15:37:04.0 +0100
+++ linux-2.6.24-rc5/drivers/scsi/sym53c8xx_2/sym_glue.c	2007-12-15 16:22:08.0 +0100
@@ -1744,7 +1744,7 @@ static int __devinit sym2_probe(struct p
 	return -ENODEV;
 }
 
-static void __devexit sym2_remove(struct pci_dev *pdev)
+static void sym2_remove(struct pci_dev *pdev)
 {
 	struct Scsi_Host *shost = pci_get_drvdata(pdev);
 
@@ -2056,7 +2056,7 @@ static struct pci_driver sym2_driver = {
 	.name		= NAME53C8XX,
 	.id_table	= sym2_id_table,
 	.probe		= sym2_probe,
-	.remove		= __devexit_p(sym2_remove),
+	.remove		= sym2_remove,
 	.err_handler	= &sym2_err_handler,
 };
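
For readers unfamiliar with the warning: with CONFIG_HOTPLUG unset and the
driver built in, __devexit places a function in .exit.text and
__devexit_p(fn) evaluates to NULL, so a direct call from ordinary .text ends
up referencing code that may be discarded. The following is only a sketch of
that situation; every demo_ name is hypothetical, the macros are simplified
stand-ins rather than the kernel's definitions, and no real section placement
takes place.

```c
#include <stddef.h>

/*
 * Simplified stand-ins for __devexit / __devexit_p as they behave with
 * CONFIG_HOTPLUG unset in a built-in driver. Assumption: in the kernel,
 * __devexit then moves the function to .exit.text; here it is a no-op.
 */
#define demo_devexit
#define demo_devexit_p(fn) NULL

struct demo_driver {
	void (*remove)(int *state);
};

/* "Before": marked as exit-only code. */
static void demo_devexit demo_remove_old(int *state)
{
	*state = 0;
}

/* "After": an ordinary function that always stays in .text. */
static void demo_remove_new(int *state)
{
	*state = 0;
}

/* Before the fix the driver slot is NULL ... */
static const struct demo_driver demo_before = {
	.remove = demo_devexit_p(demo_remove_old),
};

/* ... and after the fix it points at the plain function. */
static const struct demo_driver demo_after = {
	.remove = demo_remove_new,
};

/*
 * The error handler lives in ordinary .text and calls the remove routine
 * directly. In the kernel, this direct reference to an exit-only function
 * is what modpost reports as a section mismatch.
 */
static void demo_io_error_detected(int *state)
{
	demo_remove_old(state);
}
```

In this sketch demo_before.remove is NULL while demo_io_error_detected()
still needs the function body, which is the inconsistency the patch removes
by dropping the annotation.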
 


Re: [PATCH 2.6.24] sym53c8xx_2 modpost section mismatch fix

2008-01-26 Thread Mikael Pettersson
Sam Ravnborg writes:
 > On Sat, Jan 26, 2008 at 07:03:15PM +0100, Mikael Pettersson wrote:
 > > Building 2.6.24 with
 > > 
 > > # CONFIG_HOTPLUG is not set
 > > CONFIG_SCSI_SYM53C8XX_2=y
 > > 
 > > results in the following during modpost:
 > > 
 > > WARNING: vmlinux.o(.text+0x14b36c): Section mismatch: reference to 
 > > .exit.text:sym2_remove (between 'sym2_io_error_detected' and 
 > > 'sym_set_cam_result_error')
 > > 
 > > because sym2_io_error_detected() calls sym2_remove(), which is marked 
 > > __devexit.
 > > 
 > > Fixed by removing the __devexit from sym2_remove().
 > > 
 > > Signed-off-by: Mikael Pettersson <[EMAIL PROTECTED]>
 > > ---
 > > Resend. Previously reported against 2.6.24-rc6 on 2007-12-15.
 > 
 > Fixed in upstream kernel by
 > commit: 864473cbe99e95a57ad496894768cd77a567

Great, thanks.


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mikael Pettersson
Gene Heskett writes:
 > Greeting;
 > 
 > I had to reboot early this morning due to a freezeup, and I had a 
 > bunch of these in the messages log:
 > ==
 > Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 
 > SAct 0x0 SErr 0x0 action 0x2 frozen
 > Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd 
 > ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
 > Jan 27 19:42:11 coyote kernel: [42461.915974]  res 
 > 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 > Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
 > Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
 > Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for 
 > UDMA/100
 > Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
 > Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 
 > 512-byte hardware sectors (200050 MB)
 > Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write 
 > Protect is off
 > Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: 
 > enabled, read cache: enabled, doesn't 
 > support DPO or FUA
 > ===
 > That one showed up about 2 hours ago, so I expect I'll be locked 
 > up again before I've managed a 24 hour uptime.  This drive passed
 > a 'smartctl -t long /dev/sda' with flying colors after the reboot
 > this morning.
 > 
 > Two instances were logged after I had rebooted to 2.6.24 from 2.6.24-rc8:
 > 
 > Jan 24 20:46:33 coyote kernel: [0.00] Linux version 2.6.24 ([EMAIL 
 > PROTECTED]) (gcc version 4.1.2 20070925 
 > (Red Hat 4.1.2-33)) #1 SMP Thu Jan 24 20:17:55 EST 2008
 > 
 > Jan 27 02:28:29 coyote kernel: [193207.445158] ata1.00: exception Emask 0x0 
 > SAct 0x0 SErr 0x0 action 0x2 frozen
 > Jan 27 02:28:29 coyote kernel: [193207.445170] ata1.00: cmd 
 > 35/00:08:f9:24:0a/00:00:17:00:00/e0 tag 0 dma 4096 out
 > Jan 27 02:28:29 coyote kernel: [193207.445172]  res 
 > 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 > Jan 27 02:28:29 coyote kernel: [193207.445175] ata1.00: status: { DRDY }
 > Jan 27 02:28:29 coyote kernel: [193207.445202] ata1: soft resetting link
 > Jan 27 02:28:29 coyote kernel: [193207.607384] ata1.00: configured for 
 > UDMA/100
 > Jan 27 02:28:29 coyote kernel: [193207.607399] ata1: EH complete
 > Jan 27 02:28:29 coyote kernel: [193207.609681] sd 0:0:0:0: [sda] 390721968 
 > 512-byte hardware sectors (200050 MB)
 > Jan 27 02:28:29 coyote kernel: [193207.619277] sd 0:0:0:0: [sda] Write 
 > Protect is off
 > Jan 27 02:28:29 coyote kernel: [193207.649041] sd 0:0:0:0: [sda] Write 
 > cache: enabled, read cache: enabled, doesn't 
 > support DPO or FUA
 > Jan 27 02:30:06 coyote kernel: [193304.336929] ata1.00: exception Emask 0x0 
 > SAct 0x0 SErr 0x0 action 0x2 frozen
 > Jan 27 02:30:06 coyote kernel: [193304.336940] ata1.00: cmd 
 > ca/00:20:69:22:a6/00:00:00:00:00/e7 tag 0 dma 16384 out
 > Jan 27 02:30:06 coyote kernel: [193304.336942]  res 
 > 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 > Jan 27 02:30:06 coyote kernel: [193304.336945] ata1.00: status: { DRDY }
 > Jan 27 02:30:06 coyote kernel: [193304.336972] ata1: soft resetting link
 > Jan 27 02:30:06 coyote kernel: [193304.499210] ata1.00: configured for 
 > UDMA/100
 > Jan 27 02:30:06 coyote kernel: [193304.499226] ata1: EH complete
 > Jan 27 02:30:06 coyote kernel: [193304.499714] sd 0:0:0:0: [sda] 390721968 
 > 512-byte hardware sectors (200050 MB)
 > Jan 27 02:30:06 coyote kernel: [193304.499857] sd 0:0:0:0: [sda] Write 
 > Protect is off
 > Jan 27 02:30:06 coyote kernel: [193304.502315] sd 0:0:0:0: [sda] Write 
 > cache: enabled, read cache: enabled, doesn't 
 > support DPO or FUA
 > 
 > None were logged during the time I was running an -rc7 or -rc8.
 > 
 > The previous hits on this resulted in the udma speed being downgraded 
 > till it was actually running in pio just before the freeze that 
 > required the hardware reset button.
 > 
 > I'll reboot to -rc8 right now and resume.  If it's the drive, I should see it.
 > If not, then 2.6.24 is where I'll point the finger.
 > 
 > Ideas, anyone?

1. Wrong mailing list; use linux-ide (@vger) instead.
2. Incomplete dmesg, in particular, we can't see what your hardware is.
   Just post the complete dmesg.


Re: 2.6.24 regression: Wake On Lan in sky2 broken on Mac mini

2008-01-28 Thread Mikael Pettersson
Tino Keitel writes:
 > Hi folks,
 > 
 > with 2.6.24-rc8, Wake On LAN doesn't work anymore as it used to with
 > 2.6.23 on my Mac mini Core Duo. I saw that this was reported in
 > http://bugzilla.kernel.org/show_bug.cgi?id=9721 and on netdev a patch
 > for the sky2 driver was sent by Stephen Hemminger. This patch fixed WOL
 > for me after applying it to 2.6.24-rc8.
 > 
 > However, it seems as the patch never made it into the kernel. Instead,
 > the commit that was suspected to break WOL
 > (84cd2dfb04d23a961c5f537baa243fa54d0987ac) was reverted
 > (be63a21c9573fbf88106ff0f030da5974551257b).
 > 
 > Now I tried the 2.6.24 release and noticed that WOL is still broken.
 > I'll be happy to test any patches that can make it into 2.6.24.1.

1. Wrong mailing list; use netdev (@vger) instead.
2. The reverted commit had much much more serious consequences than
   "wol doesn't work", it actually caused BIOS hangs and failed reboots.
--


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mikael Pettersson
Peter Zijlstra writes:
 > 
 > On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
 > 
 > > 1. Wrong mailing list; use linux-ide (@vger) instead.
 > 
 > What, and keep all us other interested people in the dark?

MAINTAINERS clearly lists linux-ide as the primary mailing
list for all things IDE/ATA.

The original report only went to LKML, thus it has a high
chance of being missed or ignored by those most capable of
dealing with it.

If a topic is of general interest a simple Cc: lkml will
keep other parties in the loop.


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mikael Pettersson
Gene Heskett writes:
 > On Monday 28 January 2008, Peter Zijlstra wrote:
 > >On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
 > >> 1. Wrong mailing list; use linux-ide (@vger) instead.
 > >
 > >What, and keep all us other interested people in the dark?
 > 
 > As a test, I tried rebooting to the latest fedora kernel and found it kills X,
 > so I'm back to the second to last fedora version ATM, and the third
 > 'smartctl -t long /dev/sda' in 24 hours is running now.  The first two
 > completed with no errors.
 > 
 > I've added the linux-ide list to refresh those people of the problem, 
 > the logs are being spammed by this message stanza:
 > 
 >  Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 
 > SAct 0x0 SErr 0x0 action 0x2 frozen
 > Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 
 > 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
 > Jan 28 04:46:25 coyote kernel: [26550.290029]  res 
 > 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
 > Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
 > Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
 > Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for 
 > UDMA/100
 > Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
 > Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 
 > 512-byte hardware sectors (200050 MB)
 > Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write 
 > Protect is off
 > Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: 
 > enabled, read cache: enabled, doesn't 
 > support DPO or FUA

It's not obvious from this incomplete dmesg log what HW or driver
is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one,
it should be pata_amd driving a WDC disk:

 > [   30.702887] pata_amd 0000:00:09.0: version 0.3.10
 > [   30.703052] PCI: Setting latency timer of device 0000:00:09.0 to 64
 > [   30.703188] scsi0 : pata_amd
 > [   30.709313] scsi1 : pata_amd
 > [   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
 > [   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
 > [   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
 > [   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
 > [   30.871629] ata1.00: configured for UDMA/100

Unfortunately we also see:

 > [   48.285456] nvidia: module license 'NVIDIA' taints kernel.
 > [   48.549725] ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [APC4] -> GSI 19 (level, high) -> IRQ 20
 > [   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007

We have no way of debugging that module, so please try 2.6.24 without it.
If the problems persist, please try to capture a complete log from the
failing kernel -- the interesting bits are everything from initial boot
up to and including the first few errors. You may need to increase the
kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).

There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.
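
As a rough guide to picking a value for the log buffer mentioned above:
CONFIG_LOG_BUF_SHIFT is the base-2 logarithm of the buffer size in bytes.
The helpers below are a hypothetical illustration of that relationship (the
demo_ names are not kernel functions), nothing more.

```c
#include <stddef.h>

/* Buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT value. */
static size_t demo_log_buf_bytes(unsigned int shift)
{
	return (size_t)1 << shift;
}

/*
 * Smallest shift whose buffer holds at least `bytes` bytes.
 * The loop is capped so the shift never reaches the width of size_t.
 */
static unsigned int demo_shift_for(size_t bytes)
{
	unsigned int shift = 0;
	const unsigned int max_shift = (unsigned int)(sizeof(size_t) * 8 - 1);

	while (shift < max_shift && ((size_t)1 << shift) < bytes)
		shift++;
	return shift;
}
```

For example, a boot log of roughly 200000 bytes needs a shift of 18, since
2^17 bytes is too small.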


Re: 2.6.24 regression: Wake On Lan in sky2 broken on Mac mini

2008-01-28 Thread Mikael Pettersson
Ingo Molnar writes:
 > 
 > * Mikael Pettersson <[EMAIL PROTECTED]> wrote:
 > 
 > >  > Now I tried the 2.6.24 release and noticed that WOL is still 
 > >  > broken. I'll be happy to test any patches that can make it into 
 > >  > 2.6.24.1.
 > > 
 > > 1. Wrong mailing list; use netdev (@vger) instead.
 > 
 > lkml is the right mailing list for reporting Linux bugs.
 > 
 > this is an extremely harmful trend i've seen lately: some kernel hackers 
 > going out on a limb directing the flow of bugreports _away_ from lkml, 
 > by suggesting to testers that lkml is somehow inappropriate for 
 > reporting Linux kernel bugs.
 > 
 > It's not even the standard "I Cc:-ed netdev, maybe they are interested 
 > in this" message but the above, plain incorrect: "this is the wrong 
 > mailing list" message.
 > 
 > Mikael, what you do is as harmful to Linux as if you were intentionally 
 > putting bugs into the kernel source. In fact it's more harmful because 
 > it is irreversible: bugs you put into Linux i can fix and i can review 
 > all past patches you did to undo the damage - tester attention and 
 > feedback you redirect we cannot direct back.

Ok, I can see how my overly terse statement could be interpreted
in this way, and I apologize for that.

However, it _is_ a fact that there is a proliferation of specialized
mailing lists, and it is also a fact that many developers _only_ read
those lists. I'm in no way defending this behaviour, on the contrary
I probably dislike it as much as you do. But we can't ignore it.

I should of course have written something like "please cc: "
instead of the stupid "wrong mailing list" comment.

 > Stop it!

Gladly.


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Mikael Pettersson
Gene Heskett writes:
 > On Tuesday 29 January 2008, Alan Cox wrote:
 > >> As slight change here, I was going to use the same .config as 2.6.24-rc8,
 > >> but just discovered that neither rc8 nor final is finding the drivers for
 > >> my
 > >
 > >If it is not finding a driver that is nothing to do with libata. It means
 > >it's not being loaded by the distribution, or the distribution kernel is
 > >too old (2.6.22) for the hardware - in which case see the Fedora respins
 > >which are on 2.6.23.something right now.
 > >
 > >Alan
 > 
 > Home built kernel Alan.  But you are as good as anyone to tell me what I 
 > need to turn on in order for this dvdwriter to be enabled:
 > [   28.862478] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
 > 
 > [   28.908647] ata2.00: limited to UDMA/33 due to 40-wire cable
 > [   29.081253] ata2.00: configured for UDMA/33
 > 
 > it has had several 80 wire cables tried, hasn't fixed this, and does not
 > seem to affect its operation when it does work.
 > 
 > [   29.132405] scsi 1:0:0:0: CD-ROMLITE-ON  DVDRW SHM-165H6S 
 > HS06 PQ: 0 ANSI: 5
 > 
 > [   43.450795] scsi 1:0:0:0: Attached scsi generic sg1 type 5
 > ---
 > No further mention of it in dmesg, and k3b cannot find the drive at any 
 > /dev/sgX address.
 > 
 > .config attached, what else do I need to turn on?

...

 > # CONFIG_BLK_DEV_SR is not set

For starters, enable CONFIG_BLK_DEV_SR.


Re: futex local DoS on most architectures

2008-02-11 Thread Mikael Pettersson
Adrian Bunk writes:
 > The issue described in [1] is still present and unfixed (and even the 
 > fix there wasn't complete since it didn't cover SMP).
 > 
 > Thanks to Riku Voipio for noting that it is still unfixed.
 > 
 > cu
 > Adrian
 > 
 > [1] http://lkml.org/lkml/2007/8/1/474

I think calling it a local DoS may make people take it
less seriously.

The problem is not related to attacks or malice.
It's NORMAL futex usage on the affected architectures
that's broken and will throw the kernel into a loop.

/Mikael


Re: Small System Paging Problem - OOM-killer goes nuts

2007-11-25 Thread Mikael Pettersson
On Sun, 25 Nov 2007 15:02:15 -0700, Josh Goldsmith wrote:
>   I have a Linksys NSLU2 running 2.6.21 (I can replicate the problem on 
> 2.6.23 but it isn't fully supported on SlugOS).  It is a armv5teb device 
> with 32MB of RAM, 400+ MB swap on its 160GB USB2 root disk.  The machine is 
> used as a fileserver and to build packages for other ARM devices.  It may be 
> underpowered by today's standard but is a whole lot faster than my first 
> Linux system (386sx20 with 4MB RAM) but the whole system with disk uses <8 
> watts and is silent.
> 
>   The problem comes when I try to untar a large file (in this case 
> linux-2.6.23.tar.bz2).  Regardless if I kill off every other process, 
> eventually the oom-killer will appear and kill either the tar or the shell. 
> I've tried every tuning option I and my buddy Google could find including 
> (/proc/sys/vm/overcommit*) with no success.  I'm not worried about paging 
> impacting performance.
> 
>   I'd appreciate any help, pointers, or gentle taps with the cluebat.

I'm no VM tuning expert, but I have done, and still do, heavy compile
jobs on similarly configured machines, with no OOM problems:

I regularly build 2.6 kernels and occasionally also gcc on a
100MHz 486 with 28MB of RAM and perhaps 500MB of swap. It runs
a standard but stripped down Fedora Core 4 user-space, with ext3
file systems and a kernel that doesn't include anything non-essential. 
The machine will swap madly, but the OOM killer never triggers.
(All system settings are FC4 defaults. I haven't touched them.)

In the past I did a fair amount of package rebuilds and test suite
runs on an NSLU2 myself, with a 2.4 Linksys/Openslug kernel, ext3,
and a 1GB or perhaps 2GB swap partition on a disk attached via a
USB2-to-PATA enclosure. Even when swapping heavily the OOM killer
wouldn't trigger.


Re: Small System Paging Problem - OOM-killer goes nuts

2007-11-26 Thread Mikael Pettersson
On Sun, 25 Nov 2007 22:28:03 -0700, Josh Goldsmith wrote:
> Is your 486 running a IDE disk on a normal interface or via USB?  I wonder 
> if the NSLU2 only having I/O via USB might be significant.

My 486 has neither PCI nor USB, the disk is attached to a
plain ancient IDE port.

>  Also, this is a 
> 2.6 kernel and I've seen spurious reports across the internet about similar 
> oom-killer problems since about 2.6.7.

If it is, I don't think it's ARM-specific. The last two years
I've done a lot of work with 2.6 kernels on a DS101 ARM box.
It's similar to the NSLU2 except it has 64MB of RAM and a built-in
PCI PATA controller. I've stressed it quite a bit, but never
seen the OOM killer trigger on it.

So your use of USB storage might be relevant.


Re: acpi ->video_device_list corruption

2007-12-12 Thread Mikael Pettersson
William Lee Irwin III writes:
 > On Wed, Dec 12, 2007 at 12:48:09PM +0100, Mikael Pettersson wrote:
 > > IMO the memset(ptr, 0, sizeof(*ptr)) idiom is both safer
 > > and avoids having to write an uninteresting type name.
 > 
 > How about this, then?

Looks good.

Acked-by: Mikael Pettersson <[EMAIL PROTECTED]>

 > 
 > The ->cap fields of struct acpi_video_device and struct acpi_video_bus
 > are 1B each, not 4B. The oversized memset()'s corrupted the subsequent
 > list_head fields. This resulted in silent corruption without
 > CONFIG_DEBUG_LIST and BUG's with it. This patch uses sizeof() to pass
 > the proper bounds to the memset() calls and thereby correct the bugs.
 > 
 > The patch was seen to resolve the issue on the affected system.
 > 
 > vs. 2.6.24-rc5
 > 
 > Signed-off-by: William Irwin <[EMAIL PROTECTED]>
 > 
 > diff --git a/drivers/acpi/video.c b/drivers/acpi/video.c
 > index 44a0d9b..bd77e81 100644
 > --- a/drivers/acpi/video.c
 > +++ b/drivers/acpi/video.c
 > @@ -577,7 +577,7 @@ static void acpi_video_device_find_cap(struct acpi_video_device *device)
 >  	struct acpi_video_device_brightness *br = NULL;
 >  
 >  
 > -	memset(&device->cap, 0, 4);
 > +	memset(&device->cap, 0, sizeof(device->cap));
 >  
 >  	if (ACPI_SUCCESS(acpi_get_handle(device->dev->handle, "_ADR", &h_dummy1))) {
 >  		device->cap._ADR = 1;
 > @@ -697,7 +697,7 @@ static void acpi_video_bus_find_cap(struct acpi_video_bus *video)
 >  {
 >  	acpi_handle h_dummy1;
 >  
 > -	memset(&video->cap, 0, 4);
 > +	memset(&video->cap, 0, sizeof(video->cap));
 >  	if (ACPI_SUCCESS(acpi_get_handle(video->device->handle, "_DOS", &h_dummy1))) {
 >  		video->cap._DOS = 1;
 >  	}
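
A minimal sketch of the memset(ptr, 0, sizeof(*ptr)) idiom discussed in this
thread, assuming a one-byte capability field followed by other data. The
demo_ structures are hypothetical stand-ins, not the real acpi_video
layouts; the padding array merely plays the role of the list_head that the
oversized memset() overwrote.

```c
#include <string.h>

/* Clear exactly the object that ptr points to, whatever its type. */
#define DEMO_ZERO(ptr) memset((ptr), 0, sizeof(*(ptr)))

struct demo_cap {
	unsigned char flags;		/* 1 byte, like the ->cap bitfields */
};

struct demo_device {
	struct demo_cap cap;
	unsigned char neighbour[7];	/* stands in for the following list_head */
};

static void demo_find_cap(struct demo_device *dev)
{
	/*
	 * A hard-coded length of 4 here would also wipe neighbour[0..2].
	 * Taking the size from the object keeps the two in step.
	 */
	DEMO_ZERO(&dev->cap);
}
```

Because the length is derived from the pointer's target, changing the type
of the field later cannot silently reintroduce the overrun.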


[PATCH 2.6.24-rc5] sym53c8xx_2 modpost section mismatch fix

2007-12-15 Thread Mikael Pettersson
Building 2.6.24-rc5 with

# CONFIG_HOTPLUG is not set
CONFIG_SCSI_SYM53C8XX_2=y

results in

WARNING: vmlinux.o(.text+0x14b36c): Section mismatch: reference to 
.exit.text:sym2_remove (between 'sym2_io_error_detected' and 
'sym_set_cam_result_error')

because sym2_io_error_detected() calls sym2_remove(), which is marked __devexit.

Fixed by removing the __devexit from sym2_remove().

Signed-off-by: Mikael Pettersson <[EMAIL PROTECTED]>

--- linux-2.6.24-rc5/drivers/scsi/sym53c8xx_2/sym_glue.c.~1~	2007-12-15 15:37:04.0 +0100
+++ linux-2.6.24-rc5/drivers/scsi/sym53c8xx_2/sym_glue.c	2007-12-15 16:22:08.0 +0100
@@ -1744,7 +1744,7 @@ static int __devinit sym2_probe(struct p
 	return -ENODEV;
 }
 
-static void __devexit sym2_remove(struct pci_dev *pdev)
+static void sym2_remove(struct pci_dev *pdev)
 {
 	struct Scsi_Host *shost = pci_get_drvdata(pdev);
 
@@ -2056,7 +2056,7 @@ static struct pci_driver sym2_driver = {
 	.name		= NAME53C8XX,
 	.id_table	= sym2_id_table,
 	.probe		= sym2_probe,
-	.remove		= __devexit_p(sym2_remove),
+	.remove		= sym2_remove,
 	.err_handler	= &sym2_err_handler,
 };
 


Re: [PATCH] Make sysctl a separate filesystem

2008-02-15 Thread Mikael Pettersson
Andi Kleen writes:
 > Pavel Emelyanov <[EMAIL PROTECTED]> writes:
 > >this subdir;
 > > 3. sysctl inodes are now smaller than the procfs ones.
 > 
 > That's always a good thing.
 > 
 > > Note: update your initscripts to mount sysctl filesystem 
 > > right after the proc is mounted in order not to lose your
 > > /etc/sysctl.conf configuration (and optionally fstab).
 > 
 > That will break about everybody's init scripts I suspect.
 > 
 > I think you would need to go through some deprecation
 > period for this at least, with printks warning people
 > to fix their init scripts.
 > 
 > Or better find some way to do the mount automatically.

Doing it automatically is the only acceptable way, IMO.


Re: Hardcoded instruction causes certain features to fail on ARM platfrom due to endianness

2012-10-15 Thread Mikael Pettersson
Yangfei (Felix) writes:
 > Hi all,
 > 
 > I found that hardcoded instructions in inline asm can cause
 > certain features to fail to work on the ARM platform due to endianness.
 > As an example, consider the following code snippet of 
 > platform_do_lowpower function from arch/arm/mach-realview/hotplug.c:
 > /*
 >  * here's the WFI
 >  */
 > asm(".word  0xe320f003\n"
 >     :
 >     :
 >     : "memory", "cc");
 > 
 > The instruction generated from this inline asm will not work on 
 > big-endian ARM platform, such as ARM BE-8 format. Instead, an exception will 
 > be generated.
 > 
 > Here the code should be:
 > /*
 >  * here's the WFI
 >  */
 > asm("WFI\n"
 >     :
 >     :
 >     : "memory", "cc");
 > 
 > Seems the kernel doesn't support ARM BE-8 well. I don't know why this 
 > problem happens.
 > Can anyone tell me who owns this part? I can prepare a patch then. 
 > Thanks.

Questions regarding the ARM kernel should go to the linux-arm-kernel mailing
list (see the MAINTAINERS file), with an optional cc: to the regular LKML.

BE-8 is, if I recall correctly, ARMv7's broken format where code and data have
different endianness.  GAS supports an ".inst" directive which is like ".word"
except the data is assumed to be code.  This matters for disassembly, and may
also be required for BE-8.

That is, just s/.word/.inst/g above and report back if that works or not.
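
To see why the distinction could matter, here is a small host-side sketch. It
assumes the usual description of BE-8, where data words are stored big-endian
but instructions are fetched as little-endian words; the demo_ helpers are
hypothetical and only model the byte order, they do not execute any ARM code.

```c
#include <stdint.h>

#define DEMO_WFI_OPCODE 0xe320f003u	/* the encoding quoted above */

/* Store a word the way big-endian data is laid out in memory. */
static void demo_store_be(uint32_t v, uint8_t out[4])
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Store a word the way little-endian code is laid out in memory. */
static void demo_store_le(uint32_t v, uint8_t out[4])
{
	out[0] = (uint8_t)v;
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/* Model an instruction fetch that reads memory as a little-endian word. */
static uint32_t demo_fetch_le(const uint8_t in[4])
{
	return (uint32_t)in[0] |
	       ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) |
	       ((uint32_t)in[3] << 24);
}
```

Under that assumption, an opcode emitted as big-endian data is fetched as
0x03f020e3 rather than 0xe320f003, which would explain the exception in the
original report; an opcode laid out as code is fetched unchanged.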


[3.6-rc3 regression] sata_mv cannot get optional clkdev breaking boot on QNAP TS-119P+

2012-08-25 Thread Mikael Pettersson
My Kirkwood-based QNAP TS-119P+ boots fine with the 3.5 kernel.  With
3.6-rc2 and 3.6-rc3 however sata_mv complains:

sata_mv sata_mv.0: cannot get optional clkdev
sata_mv sata_mv.0: slots 32 ports 2

and then the kernel grinds to a halt with no further messages.

Full boot log from 3.6-rc3 appended below.  .config available upon
request, but this is a non-DT kernel.

/Mikael

Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0
Linux version 3.6.0-rc3 (mikpe@hallertau) (gcc version 4.6.4 20120706 
(prerelease) (GCC) ) #1 Sat Aug 25 14:43:24 CEST 2012
CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
CPU: VIVT data cache, VIVT instruction cache
Machine: QNAP TS-119/TS-219
Ignoring unrecognised tag 0x41000403
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048
Kernel command line: console=ttyS0,115200n8 ro root=/dev/sda1
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 512MB = 512MB total
Memory: 515884k/515884k available, 8404k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xe0800000 - 0xff000000   ( 488 MB)
    lowmem  : 0xc0000000 - 0xe0000000   ( 512 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
  .text : 0xc0008000 - 0xc0366000   (3448 kB)
  .init : 0xc0366000 - 0xc03844b8   ( 122 kB)
  .data : 0xc0386000 - 0xc03b32a0   ( 181 kB)
   .bss : 0xc03b32c4 - 0xc03c67c0   (  78 kB)
NR_IRQS:114
sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 21474ms
Console: colour dummy device 80x30
Calibrating delay loop... 1587.60 BogoMIPS (lpj=7938048)
pid_max: default: 32768 minimum: 301
Security Framework initialized
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
Setting up static identity map for 0x2961e8 - 0x296224
devtmpfs: initialized
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
Kirkwood: MV88F6282-Rev-A0, TCLK=2.
Feroceon L2: Enabling L2
Feroceon L2: Cache support initialised.
Kirkwood PCIe port 0: link down, ignoring
bio: create slab  at 0
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Switching to clocksource orion_clocksource
NET: Registered protocol family 2
TCP established hash table entries: 16384 (order: 5, 131072 bytes)
TCP bind hash table entries: 16384 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
TCP: reno registered
UDP hash table entries: 256 (order: 0, 4096 bytes)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Installing knfsd (copyright (C) 1996 o...@monad.swb.de).
msgmni has been set to 1007
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
mv_xor_shared mv_xor_shared.0: Marvell shared XOR driver
mv_xor_shared mv_xor_shared.1: Marvell shared XOR driver
mv_xor mv_xor.0: Marvell XOR: ( xor cpy )
mv_xor mv_xor.1: Marvell XOR: ( xor fill cpy )
mv_xor mv_xor.2: Marvell XOR: ( xor cpy )
mv_xor mv_xor.3: Marvell XOR: ( xor fill cpy )
Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
serial8250.0: ttyS0 at MMIO 0xf1012000 (irq = 33) is a 16550A
console [ttyS0] enabled
serial8250.1: ttyS1 at MMIO 0xf1012100 (irq = 34) is a 16550A
loop: module loaded
sata_mv sata_mv.0: cannot get optional clkdev
sata_mv sata_mv.0: slots 32 ports 2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [3.6-rc3 regression] sata_mv cannot get optional clkdev breaking boot on QNAP TS-119P+

2012-08-25 Thread Mikael Pettersson
Andrew Lunn writes:
 > On Sat, Aug 25, 2012 at 03:01:33PM +0200, Mikael Pettersson wrote:
 > > My Kirkwood-based QNAP TS-119P+ boots fine with the 3.5 kernel.  With
 > > 3.6-rc2 and 3.6-rc3 however sata_mv complains:
 > > 
 > > sata_mv sata_mv.0: cannot get optional clkdev
 > > sata_mv sata_mv.0: slots 32 ports 2
 > > 
 > > and then the kernel grinds to a halt with no further messages.
 > > 
 > > Full boot log from 3.6-rc3 appended below.  .config available upon
 > > request, but this is a non-DT kernel.
 > 
 > Hi Mikael
 > 
 > This is a known issue. See:
 > 
 > http://comments.gmane.org/gmane.linux.ports.arm.kernel/181989
 > 
 > There are patches being developed to fix this. I hope we can push them
 > to an RC soon.

Thanks for the pointer.  Aaro Koskinen's suggestion of passing
coherent_pool=1M to the kernel allowed it to boot successfully.
I'll use that workaround for now.

/Mikael


Re: Updated: [PATCH] hardening: add PROT_FINAL prot flag to mmap/mprotect

2012-10-04 Thread Mikael Pettersson
Ard Biesheuvel writes:
 > This patch adds support for the PROT_FINAL flag to
 > the mmap() and mprotect() syscalls.
 > 
 > The PROT_FINAL flag indicates that the requested set
 > of protection bits should be final, i.e., it shall
 > not be allowed for a subsequent mprotect call to
 > set protection bits that were not set already.
 > 
 > This is mainly intended for the dynamic linker,
 > which sets up the address space on behalf of
 > dynamic binaries. By using this flag, it can
 > prevent exploited code from remapping read-only
 > executable code or data sections read-write.

I can see why you might think this is a good idea, but I don't
like it for several reasons:

- If .text is mapped non-writable and final, how would a debugger
  (or any ptrace-using monitor-like application) plant a large
  number of breakpoints in a target process? Breakpoint registers
  aren't enough because (a) they're few in number, and (b) not
  all CPUs have them.

- You're proposing to give one component (the dynamic linker/
  loader) absolute power to impose new policies on all
  applications. How would an application that _deliberately_
  does something the new policies don't allow tell the dynamic
  linker or kernel to get out of its way?
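
To make the first objection concrete, here is a minimal user-space sketch,
assuming only the existing Linux/POSIX mmap() and mprotect() calls and nothing
from the proposed patch. It is an in-process analogue of what a
breakpoint-planting tool depends on: a read-only page is upgraded to writable
just long enough to patch one byte. If the page had been mapped with the
proposed PROT_FINAL flag, the mprotect() upgrade in the middle is the call
that would presumably be refused. The function name patch_readonly_page() and
the 0xCC byte (the x86 int3 opcode) are illustrative choices, not taken from
the thread.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustrative only: map one anonymous page read-only, temporarily upgrade
 * it to read-write to plant a one-byte "breakpoint" at offset 0, then drop
 * write permission again.
 * Returns the byte read back from the page, or -1 on any failure.
 */
int patch_readonly_page(unsigned char opcode)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char *page;
	int result;

	if (pagesz <= 0)
		return -1;

	page = mmap(NULL, (size_t)pagesz, PROT_READ,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	/* This upgrade adds PROT_WRITE, which was not set at mmap() time. */
	if (mprotect(page, (size_t)pagesz, PROT_READ | PROT_WRITE) != 0) {
		munmap(page, (size_t)pagesz);
		return -1;
	}
	page[0] = opcode;

	/* Restore the original read-only protection. */
	if (mprotect(page, (size_t)pagesz, PROT_READ) != 0) {
		munmap(page, (size_t)pagesz);
		return -1;
	}

	result = page[0];
	munmap(page, (size_t)pagesz);
	return result;
}
```

On a current kernel patch_readonly_page(0xCC) is expected to succeed; any
tool relying on this pattern would need some opt-out if the dynamic linker
started marking mappings final.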

This clearly changes the de-facto ABIs, and as such I think
it needs much more detailed analysis than what you've done
here.

At the very least I think this change should be opt-in, but
that would require a kernel option or sysctl, or some config
file for the user-space dynamic linker/loader.


Re: HDD problem, software bug, bios bug, or hardware ?

2012-09-08 Thread Mikael Pettersson
Adko Branil writes:
 > After updating bios no more crashes happened, i tested it many times
 > on heavy HDD IO loads, with many kernels (including CONFIG_PREEMPT
 > kernels). But now if enable "Cool'n' Quiet" option in bios, 
 > CONFIG_PREEMPT_VOLUNTARY kernel with passed "nosmp" at boot time,
 > crashes during boot process with kernel panic, while  CONFIG_PREEMPT
 > kernel without "nosmp" works fine  - but it is another story i think,
 > should not be related with the crashes when it was old bios, and i
 > think it is probably "nosmp" the reason. (i have never changed cpu
 > frequency of this cpu at all) When "Cool'n' Quiet" is disabled, the
 > system works perfectly adequately with all kind of kernels i tried.
 > Except that this warning message in dmesg still appears (if it is
 > problem at all). I put here this message for "nosmp" case as well,
 > kernel is 3.5.2:
 > 
 > [    1.912494] =================================
 > [    1.912494] [ INFO: inconsistent lock state ]
 > [    1.912494] 3.5.2 #4 Not tainted
 > [    1.912494] ---------------------------------
 > [    1.912494] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 > [    1.912494] swapper/0/1 [HC1[1]:SC1[1]:HE0:SE0] takes:
 > [    1.912494]  (&(&host->lock)->rlock){?.+...}, at: [] 
 > ata_bmdma_interrupt+0x27/0x1d0
 > [    1.912494] {HARDIRQ-ON-W} state was registered at:
 > [    1.912494]   [] __lock_acquire+0x61b/0x1af0
 > [    1.912494]   [] lock_acquire+0x8a/0x110
 > [    1.912494]   [] _raw_spin_lock+0x31/0x40
 > [    1.912494]   [] pdc_sata_hardreset+0x85/0x100

Please try the patch below, which implements the fix I described a
week ago. It's for 3.6-rc4 but should work in any recent kernel.
Without this patch one of my test machines always throws a lockdep
warning involving pdc_sata_hardreset and pdc_interrupt during bootup,
but with the patch the warning is gone, as expected.

If it works for you I'll add your Tested-by: and submit it properly.

/Mikael

--- linux-3.6-rc4/drivers/ata/sata_promise.c.~1~	2012-09-08 12:18:24.000000000 +0200
+++ linux-3.6-rc4/drivers/ata/sata_promise.c	2012-09-08 17:55:49.000000000 +0200
@@ -147,6 +147,10 @@ struct pdc_port_priv {
dma_addr_t  pkt_dma;
 };
 
+struct pdc_host_priv {
+   spinlock_t hard_reset_lock;
+};
+
 static int pdc_sata_scr_read(struct ata_link *link, unsigned int sc_reg, u32 
*val);
 static int pdc_sata_scr_write(struct ata_link *link, unsigned int sc_reg, u32 
val);
 static int pdc_ata_init_one(struct pci_dev *pdev, const struct pci_device_id 
*ent);
@@ -801,9 +805,10 @@ static void pdc_hard_reset_port(struct a
void __iomem *host_mmio = ap->host->iomap[PDC_MMIO_BAR];
void __iomem *pcictl_b1_mmio = host_mmio + PDC_PCI_CTL + 1;
unsigned int ata_no = pdc_ata_port_to_ata_no(ap);
+   struct pdc_host_priv *hpriv = ap->host->private_data;
u8 tmp;
 
-   spin_lock(&ap->host->lock);
+   spin_lock(&hpriv->hard_reset_lock);
 
tmp = readb(pcictl_b1_mmio);
tmp &= ~(0x10 << ata_no);
@@ -814,7 +819,7 @@ static void pdc_hard_reset_port(struct a
writeb(tmp, pcictl_b1_mmio);
readb(pcictl_b1_mmio); /* flush */
 
-   spin_unlock(&ap->host->lock);
+   spin_unlock(&hpriv->hard_reset_lock);
 }
 
 static int pdc_sata_hardreset(struct ata_link *link, unsigned int *class,
@@ -1182,6 +1187,7 @@ static int pdc_ata_init_one(struct pci_d
const struct ata_port_info *pi = &pdc_port_info[ent->driver_data];
const struct ata_port_info *ppi[PDC_MAX_PORTS];
struct ata_host *host;
+   struct pdc_host_priv *hpriv;
void __iomem *host_mmio;
int n_ports, i, rc;
int is_sataii_tx4;
@@ -1218,6 +1224,11 @@ static int pdc_ata_init_one(struct pci_d
dev_err(&pdev->dev, "failed to allocate host\n");
return -ENOMEM;
}
+   hpriv = devm_kzalloc(&pdev->dev, sizeof *hpriv, GFP_KERNEL);
+   if (!hpriv)
+   return -ENOMEM;
+   spin_lock_init(&hpriv->hard_reset_lock);
+   host->private_data = hpriv;
host->iomap = pcim_iomap_table(pdev);
 
is_sataii_tx4 = pdc_is_sataii_tx4(pi->flags);


Re: [PATCH] Make sysctl a separate filesystem

2008-02-22 Thread Mikael Pettersson
Al Viro writes:
 > On Fri, Feb 15, 2008 at 12:35:23PM +0100, Mikael Pettersson wrote:
 > > Andi Kleen writes:
 > >  > Pavel Emelyanov <[EMAIL PROTECTED]> writes:
 > >  > >this subdir;
 > >  > > 3. sysctl inodes are now smaller than the procfs ones.
 > >  > 
 > >  > That's always a good thing.
 > >  > 
 > >  > > Note: update your initscripts to mount sysctl filesystem 
 > >  > > right after the proc is mounted in order not to lose your
 > >  > > /etc/sysctl.conf configuration (and optionally fstab).
 > >  > 
 > >  > That will break about everybody's init scripts I suspect.
 > >  > 
 > >  > I think you would need to go through some deprecation
 > >  > period for this at least, with printks warning people
 > >  > to fix their init scripts.
 > >  > 
 > >  > Or better find some way to do the mount automatically.
 > > 
 > > Doing it automatically is the only acceptable way, IMO.
 > 
 > When and where?

I don't really care, as long as existing user-space doesn't get broken.


[BUG] 2.6.25-rc2-git8 fails to boot on 486 due to TSC breakage

2008-02-24 Thread Mikael Pettersson
The kernel for this 486 has CONFIG_M486=y and CONFIG_M586TSC=n,
but the 2.6.25 kernels still try to access the TSC. Here's the
oops from 2.6.25-rc2-git8:

Pid: 0, comm: swapper Not tainted (2.6.25-rc2-git8 #1)
EIP: 0060:[] EFLAGS: 00010002 CPU: 0
EIP is at native_read_tsc+0x6/0x10
EAX: 8ce6 EBX: c19f8620 ECX: c19f8620 EDX: 00300100
ESI: 00300100 EDI: 0001 EBP: c19f7578 ESP: c02a7eec
 DS: 007b ES: 007b FS:  GS:  SS: 0068
Process swapper (pid: 0, ti=c02a6000 task=c028d300 task.ti=c02a6000)
Stack: c01b37e5  1000 8ce6  c19f7578  c019a2e7 
   c02e05cc c02e05cc c01da229 0016 c01da302 1000 0001 c029d4e0 
   c02e05cc 0050 c01dee35  c02e05cc c01dfc18 c02e0580 c1834ca0 
Call Trace:
 [] add_timer_randomness+0x115/0x170
 [] __blk_end_request+0x17/0x50
 [] __ide_end_request+0x39/0xe0
 [] ide_end_request+0x32/0x50
 [] task_end_request+0x25/0x70
 [] task_in_intr+0xd8/0xe0
 [] ide_intr+0x7a/0x1a0
 [] task_in_intr+0x0/0xe0
 [] run_timer_softirq+0x12/0x150
 [] handle_IRQ_event+0x30/0x70
 [] handle_level_irq+0x42/0x90
 [] do_IRQ+0x41/0x70
 [] common_interrupt+0x23/0x30
 [] arm_timer+0xa0/0x2b0
 [] default_idle+0x3d/0x60
 [] default_idle+0x0/0x60
 [] cpu_idle+0x20/0x70
 [] start_kernel+0x1e3/0x260
 ===
Code: 90 90 90 90 90 b8 8e 21 00 00 e9 a6 28 0a 00 8d b6 00 00 00 00 e6 ed c3 
90 90 90 90 90 90 90 90 9 
EIP: [] native_read_tsc+0x6/0x10 SS:ESP 0068:c02a7eec
Kernel panic - not syncing: Fatal exception in interrupt

This bug is also seen with 2.6.25-rc1.
Kernels up to and including 2.6.24 did not have this bug.


Re: [BUG] 2.6.25-rc2-git8 fails to boot on 486 due to TSC breakage

2008-02-24 Thread Mikael Pettersson
Ingo Molnar writes:
 > 
 > * Mikael Pettersson <[EMAIL PROTECTED]> wrote:
 > 
 > > The kernel for this 486 has CONFIG_M486=y and CONFIG_M586TSC=n, but 
 > > the 2.6.25 kernels still try to access the TSC. Here's the oops from 
 > > 2.6.25-rc2-git8:
 > 
 > hm, could you send me the full .config you used?

I've put it here:
<http://user.it.uu.se/~mikpe/linux/tmp/config-2.6.24-git8>

Meanwhile, I've traced the breakage to 2.6.24-git8.

2.6.24-git8 changed include/asm-x86/tsc.h:get_cycles() to call
rdtscll() even if CONFIG_X86_TSC isn't set. The call is protected
by a cpu_has_tsc test, but starting with 2.6.24-git8 cpu_has_tsc
is non-zero on this machine, which is very very wrong.

Diffing dmesg between git7 and git8 doesn't shed any light since
git8 also removed the printouts of the x86 caps as they were being
initialised and updated. I'm currently adding those printouts back
in the hope of seeing where and when the caps get broken.


Re: [BUG] 2.6.25-rc2-git8 fails to boot on 486 due to TSC breakage

2008-02-24 Thread Mikael Pettersson
Mikael Pettersson writes:
 > Ingo Molnar writes:
 >  > 
 >  > * Mikael Pettersson <[EMAIL PROTECTED]> wrote:
 >  > 
 >  > > The kernel for this 486 has CONFIG_M486=y and CONFIG_M586TSC=n, but 
 >  > > the 2.6.25 kernels still try to access the TSC. Here's the oops from 
 >  > > 2.6.25-rc2-git8:
 >  > 
 >  > hm, could you send me the full .config you used?
 > 
 > I've put it here:
 > <http://user.it.uu.se/~mikpe/linux/tmp/config-2.6.24-git8>
 > 
 > Meanwhile, I've traced the breakage to 2.6.24-git8.
 > 
 > 2.6.24-git8 changed include/asm-x86/tsc.h:get_cycles() to call
 > rdtscll() even if CONFIG_X86_TSC isn't set. The call is protected
 > by a cpu_has_tsc test, but starting with 2.6.24-git8 cpu_has_tsc
 > is non-zero on this machine, which is very very wrong.
 > 
 > Diffing dmesg between git7 and git8 doesn't shed any light since
 > git8 also removed the printouts of the x86 caps as they were being
 > initialised and updated. I'm currently adding those printouts back
 > in the hope of seeing where and when the caps get broken.

That turned out to be very illuminating:

--- dmesg-2.6.24-git7   2008-02-24 18:01:25.295851000 +0100
+++ dmesg-2.6.24-git8   2008-02-24 18:01:25.530358000 +0100
...
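 CPU: After generic identify, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 CPU: After all inits, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+CPU: After applying cleared_cpu_caps, caps: 00000013 00000000 00000000 00000000 00000000 00000000 00000000 00000000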

Notice how the TSC cap bit goes from Off to On.

(The first two lines are printout loops from -git7 forward-ported
to -git8, the third line is the same printout loop added just after
the xor-with-cleared_cpu_caps[] loop.)

Here's how the breakage occurs:
1. arch/x86/kernel/tsc_32.c:tsc_init() sees !cpu_has_tsc,
   so bails and calls setup_clear_cpu_cap(X86_FEATURE_TSC).
2. include/asm-x86/cpufeature.h:setup_clear_cpu_cap(bit) clears
   the bit in boot_cpu_data and sets it in cleared_cpu_caps
3. arch/x86/kernel/cpu/common.c:identify_cpu() XORs all caps
   in with cleared_cpu_caps
   HOWEVER, at this point c->x86_capability correctly has TSC
   Off, cleared_cpu_caps has TSC On, so the XOR incorrectly
   sets TSC to On in c->x86_capability, with disastrous results.

The real bug is that clearing bits with XOR only works if the
bits are known to be 1 prior to the XOR, and that's not true here.
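
A small stand-alone illustration of that point, in ordinary user-space C
rather than kernel code. The helper names clear_with_xor() and
clear_with_andnot(), and the use of a single 32-bit word instead of the
x86_capability[] array, are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_TSC_BIT 4	/* TSC is bit 4 of the first capability word */

/* Buggy variant: XOR toggles the masked bits instead of clearing them. */
uint32_t clear_with_xor(uint32_t caps, uint32_t cleared)
{
	return caps ^ cleared;
}

/* Fixed variant: AND-NOT forces the masked bits to 0 whatever they were. */
uint32_t clear_with_andnot(uint32_t caps, uint32_t cleared)
{
	return caps & ~cleared;
}
```

With caps = 0x00000003 (TSC already off, as on the 486) and
cleared = 1 << FEATURE_TSC_BIT, the XOR variant yields 0x00000013, matching
the bad caps word in the dmesg diff, while the AND-NOT variant leaves
0x00000003 unchanged. When the bit starts out set, both variants clear it,
which is why the XOR only misbehaves on hardware lacking the feature.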

A simple fix is to convert the XOR to AND-NOT instead. The following
patch does that, and allows my 486 to boot 2.6.25-rc kernels again.

Signed-off-by: Mikael Pettersson <[EMAIL PROTECTED]>
---
There's a similar XOR loop in arch/x86/kernel/setup_64.c.
I haven't seen it fail yet, but perhaps it should be changed
too, for robustness and symmetry.

--- linux-2.6.25-rc2-git8/arch/x86/kernel/cpu/common.c.~1~	2008-02-24 17:42:56.000000000 +0100
+++ linux-2.6.25-rc2-git8/arch/x86/kernel/cpu/common.c	2008-02-24 17:44:06.000000000 +0100
@@ -504,7 +504,7 @@ void __cpuinit identify_cpu(struct cpuin
 
/* Clear all flags overriden by options */
for (i = 0; i < NCAPINTS; i++)
-   c->x86_capability[i] ^= cleared_cpu_caps[i];
+   c->x86_capability[i] &= ~cleared_cpu_caps[i];
 
/* Init Machine Check Exception if available. */
mcheck_init(c);


Re: HDD problem, software bug, bios bug, or hardware ?

2012-09-02 Thread Mikael Pettersson
Adko Branil writes:
 > >Right near the end there's a lockdep warning about a deadlock
 > 
 > >between sata_promise's hardreset thing and the machine getting a
 > >ata_bmdma_interrupt.
 > 
 > >But since I don't know this code, it would be nice if you could take a
 > >look at it.
 > 
 > I picked up 3 more dmesg after rebooting, and 2 more oopses.
 >  I will put here just pieces from dmesgs about these locks, they differs 
 > slightly each-other:
 > 
 > ***
 > 1.
 > 
 > 
 > [    1.859215] input: AT Translated Set 2 keyboard as 
 > /devices/platform/i8042/serio0/input/input1
 > [    1.943678] 
 > [    1.943679] =================================
 > [    1.943680] [ INFO: inconsistent lock state ]
 > [    1.943682] 3.5.2 #4 Not tainted
 > [    1.943683] ---------------------------------
 > [    1.943684] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 > [    1.943686] swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
 > [    1.943687]  (&(&host->lock)->rlock){?.+...}, at: [] 
 > ata_bmdma_interrupt+0x27/0x1d0
 > [    1.943695] {HARDIRQ-ON-W} state was registered at:
 > [    1.943696]   [] __lock_acquire+0x61b/0x1af0
 > [    1.943701]   [] lock_acquire+0x8a/0x110
 > [    1.943703]   [] _raw_spin_lock+0x31/0x40
 > [    1.943708]   [] pdc_sata_hardreset+0x85/0x100
 > [    1.943711]   [] ata_do_reset+0x3a/0x90
 > [    1.943713]   [] ata_eh_reset+0x372/0xe00
 > [    1.943716]   [] ata_eh_recover+0x2a5/0x13d0
 > [    1.943718]   [] ata_do_eh+0x4d/0xb0
 > [    1.943721]   [] ata_sff_error_handler+0xca/0x120
 > [    1.943723]   [] pdc_error_handler+0x24/0x30
 > [    1.943725]   [] ata_scsi_port_error_handler+0x47c/0x800
 > [    1.943728]   [] ata_scsi_error+0x9e/0xd0
 > [    1.943730]   [] scsi_error_handler+0xf8/0x500
 > [    1.943734]   [] kthread+0xae/0xc0
 > [    1.943737]   [] kernel_thread_helper+0x4/0x10
 > [    1.943740] irq event stamp: 51304
 > [    1.943741] hardirqs last  enabled at (51301): [] 
 > default_idle+0x5d/0x1b0
 > [    1.943745] hardirqs last disabled at (51302): [] 
 > common_interrupt+0x67/0x6c
 > [    1.943748] softirqs last  enabled at (51304): [] 
 > _local_bh_enable+0x13/0x20
 > [    1.943752] softirqs last disabled at (51303): [] 
 > irq_enter+0x75/0x90
 > [    1.943754] 
 > [    1.943754] other info that might help us debug this:
 > [    1.943755]  Possible unsafe locking scenario:
 > [    1.943755] 
 > [    1.943755]        CPU0
 > [    1.943755]        ----
 > [    1.943756]   lock(&(&host->lock)->rlock);
 > [    1.943757]   <Interrupt>
 > [    1.943758]     lock(&(&host->lock)->rlock);

I was initially able to reproduce the lockdep warning, and wrote
a crude test patch, but now I can't seem to reproduce the warning
with or without that patch, so I'm not sure what to make of it.

pdc_hard_reset_port needs to serialize because hard reset has to flip
a port-specific bit in a controller register that's shared by all ports,
so it takes the host lock. But now an interrupt occurs during the hard
reset, and pdc_interrupt also has to take the host lock. (I don't know
why the interrupt occurs, hotplug events are supposed to have been masked
by ->freeze before ->hardreset. It might come from a different device,
my test machine has multiple ATA controllers from different vendors,
and some of them do share IRQ.)

Jeff: ->hardreset is called with the host lock NOT held, right?

I think I'll have to introduce a new private lock just for serializing
pdc_hard_reset_port. Expect a patch next weekend (I'll be away from
my Promise test equipment until then.)
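
As a rough user-space sketch of that idea (not driver code): the hard reset
path serializes its read-modify-write of the shared register on its own
private lock and never touches the lock that the interrupt path needs. All
names here (struct fake_host, fake_hard_reset_port(), and so on) are
hypothetical, the locks are simple C11 atomic_flag spinlocks, and pulsing the
per-port bit low and then high is an assumption about the reset sequence made
for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller state shared by all ports. */
struct fake_host {
	atomic_flag host_lock;		/* models the ata host lock (irq path) */
	atomic_flag hard_reset_lock;	/* private lock, hard reset path only */
	uint8_t pcictl_b1;		/* shared register, one reset bit per port */
};

static void fake_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void fake_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

void fake_host_init(struct fake_host *h, uint8_t reg)
{
	atomic_flag_clear(&h->host_lock);
	atomic_flag_clear(&h->hard_reset_lock);
	h->pcictl_b1 = reg;
}

/* Pulse one port's reset bit low, then high, under the private lock only. */
void fake_hard_reset_port(struct fake_host *h, unsigned int port_no)
{
	uint8_t tmp;

	fake_lock(&h->hard_reset_lock);

	tmp = h->pcictl_b1;
	tmp &= (uint8_t)~(0x10u << port_no);
	h->pcictl_b1 = tmp;

	tmp |= (uint8_t)(0x10u << port_no);
	h->pcictl_b1 = tmp;

	fake_unlock(&h->hard_reset_lock);
}

/*
 * Models an interrupt arriving: it needs only the host lock.
 * Returns 1 if the host lock was free, 0 if something already held it.
 */
int fake_interrupt_can_run(struct fake_host *h)
{
	if (atomic_flag_test_and_set_explicit(&h->host_lock,
					      memory_order_acquire))
		return 0;
	fake_unlock(&h->host_lock);
	return 1;
}
```

User space cannot reproduce a hard interrupt preempting the lock holder on
the same CPU, so this only shows the lock separation: the reset path touches
hard_reset_lock, other ports' bits in the shared register are preserved, and
the host lock stays free for the interrupt path throughout.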

/Mikael


[PATCH] sata_promise: fix hardreset lockdep error

2012-09-16 Thread Mikael Pettersson
sata_promise's pdc_hard_reset_port() needs to serialize because it
flips a port-specific bit in a controller register that's shared by
all ports. The code takes the ata host lock for this, but that's
broken because an interrupt may arrive on our irq during the hard
reset sequence, and that too will take the ata host lock. With
lockdep enabled a big nasty warning is seen.

Fixed by adding private state to the ata host structure, containing
a second lock used only for serializing the hard reset sequences.
This eliminated the lockdep warnings both on my test rig and on
the original reporter's machine.

Signed-off-by: Mikael Pettersson 
Tested-by: Adko Branil 
Cc: sta...@vger.kernel.org
---
This bug affects 2.6.32 and newer kernels.

 drivers/ata/sata_promise.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff -rupN linux-3.6-rc5/drivers/ata/sata_promise.c linux-3.6-rc5.sata_promise-hardreset-lockdep-fix/drivers/ata/sata_promise.c
--- linux-3.6-rc5/drivers/ata/sata_promise.c	2012-09-09 16:31:11.000000000 +0200
+++ linux-3.6-rc5.sata_promise-hardreset-lockdep-fix/drivers/ata/sata_promise.c	2012-09-09 16:36:38.000000000 +0200
@@ -147,6 +147,10 @@ struct pdc_port_priv {
dma_addr_t  pkt_dma;
 };
 
+struct pdc_host_priv {
+   spinlock_t hard_reset_lock;
+};
+
 static int pdc_sata_scr_read(struct ata_link *link, unsigned int sc_reg, u32 
*val);
 static int pdc_sata_scr_write(struct ata_link *link, unsigned int sc_reg, u32 
val);
 static int pdc_ata_init_one(struct pci_dev *pdev, const struct pci_device_id 
*ent);
@@ -801,9 +805,10 @@ static void pdc_hard_reset_port(struct a
void __iomem *host_mmio = ap->host->iomap[PDC_MMIO_BAR];
void __iomem *pcictl_b1_mmio = host_mmio + PDC_PCI_CTL + 1;
unsigned int ata_no = pdc_ata_port_to_ata_no(ap);
+   struct pdc_host_priv *hpriv = ap->host->private_data;
u8 tmp;
 
-   spin_lock(&ap->host->lock);
+   spin_lock(&hpriv->hard_reset_lock);
 
tmp = readb(pcictl_b1_mmio);
tmp &= ~(0x10 << ata_no);
@@ -814,7 +819,7 @@ static void pdc_hard_reset_port(struct a
writeb(tmp, pcictl_b1_mmio);
readb(pcictl_b1_mmio); /* flush */
 
-   spin_unlock(&ap->host->lock);
+   spin_unlock(&hpriv->hard_reset_lock);
 }
 
 static int pdc_sata_hardreset(struct ata_link *link, unsigned int *class,
@@ -1182,6 +1187,7 @@ static int pdc_ata_init_one(struct pci_d
const struct ata_port_info *pi = &pdc_port_info[ent->driver_data];
const struct ata_port_info *ppi[PDC_MAX_PORTS];
struct ata_host *host;
+   struct pdc_host_priv *hpriv;
void __iomem *host_mmio;
int n_ports, i, rc;
int is_sataii_tx4;
@@ -1218,6 +1224,11 @@ static int pdc_ata_init_one(struct pci_d
dev_err(&pdev->dev, "failed to allocate host\n");
return -ENOMEM;
}
+   hpriv = devm_kzalloc(&pdev->dev, sizeof *hpriv, GFP_KERNEL);
+   if (!hpriv)
+   return -ENOMEM;
+   spin_lock_init(&hpriv->hard_reset_lock);
+   host->private_data = hpriv;
host->iomap = pcim_iomap_table(pdev);
 
is_sataii_tx4 = pdc_is_sataii_tx4(pi->flags);


Re: [PATCH 3/3] kcmp: enable the kcmp syscall when C/R is enabled

2012-12-19 Thread Mikael Pettersson
Alexander Kartashov writes:
 > --- a/arch/arm/kernel/calls.S
 > +++ b/arch/arm/kernel/calls.S
 > @@ -387,7 +387,7 @@
 >  /* 375 */   CALL(sys_setns)
 >  CALL(sys_process_vm_readv)
 >  CALL(sys_process_vm_writev)
 > -CALL(sys_ni_syscall)/* reserved for sys_kcmp */
 > +CALL(sys_kcmp)  /* reserved for sys_kcmp */

The /* reserved for sys_kcmp */ comment is now obsolete and should be removed.


[3.7.0 regression] rt2x00lib_probe_dev: Error - Failed to initialize hw

2012-12-17 Thread Mikael Pettersson
I just updated an old 1st gen AMD64 laptop from kernel 3.6.0 to 3.7.0,
Fedora 15 user-space, and was greeted by the following kernel warning:

WARNING: at net/wireless/core.c:389 wiphy_register+0x5c3/0x600 [cfg80211]()
Hardware name: SAM#451B
Modules linked in: rt2500pci(+) snd_mpu401_uart rt2x00pci snd_rawmidi rt2x00lib 
sg snd_seq_device mac80211 cfg80211 amd64_agp agpgart snd eeprom_93cx6 
soundcore via_rhine mii evdev ipv6 ehci_hcd uhci_hcd sr_mod cdrom usbcore 
usb_common
Pid: 378, comm: modprobe Not tainted 3.7.0 #1
Call Trace:
 [] ? warn_slowpath_common+0x79/0xc0
 [] ? wiphy_register+0x5c3/0x600 [cfg80211]
 [] ? __kmalloc+0xef/0x170
 [] ? ieee80211_register_hw+0x347/0x6d0 [mac80211]
 [] ? rt2x00lib_probe_dev+0x53d/0x720 [rt2x00lib]
 [] ? rt2x00pci_probe+0x168/0x2e4 [rt2x00pci]
 [] ? pci_device_probe+0xd0/0x170
 [] ? driver_probe_device+0x64/0x210
 [] ? __driver_attach+0x93/0xa0
 [] ? driver_probe_device+0x210/0x210
 [] ? bus_for_each_dev+0x45/0x80
 [] ? bus_add_driver+0x178/0x250
 [] ? driver_register+0x6e/0x170
 [] ? notifier_call_chain+0x44/0x60
 [] ? 0xa0195fff
 [] ? do_one_initcall+0x11a/0x160
 [] ? sys_init_module+0x76/0x1c0
 [] ? system_call_fastpath+0x16/0x1b
---[ end trace f8413f27de31929d ]---
phy0 -> rt2x00lib_probe_dev: Error - Failed to initialize hw.
rt2500pci: probe of 0000:00:0c.0 failed with error -22

Rebooting into 3.6.0 shows no such error, booting 3.7.0 again shows the error,
so it's completely reproducible.

lspci -v for the device in question:

00:0c.0 Network controller: Ralink corp. RT2500 802.11g (rev 01)
	Subsystem: Micro-Star International Co., Ltd. Unknown 802.11g mini-PCI Adapter
	Flags: bus master, slow devsel, latency 64, IRQ 18
	Memory at d0000000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 2
	Kernel driver in use: rt2500pci
	Kernel modules: rt2500pci

I've never actually used or configured this device (the laptop is always
on wired ethernet), but thought you should know about this possible driver
regression.

/Mikael


Re: [3.7.0 regression] rt2x00lib_probe_dev: Error - Failed to initialize hw

2012-12-17 Thread Mikael Pettersson
devendra.aaru writes:
 > On Mon, Dec 17, 2012 at 5:51 AM, Mikael Pettersson  wrote:
 > > I just updated an old 1st gen AMD64 laptop from kernel 3.6.0 to 3.7.0,
 > > Fedora 15 user-space, and was greeted by the following kernel warning:
 > >
 > > WARNING: at net/wireless/core.c:389 wiphy_register+0x5c3/0x600 [cfg80211]()
 > 
 > I am seeing this line actually when i do vim +389 net/wireless/core.c.
 > 
 > u16 all_iftypes = 0;
 > 
 > its good if you tell us the top sha1 of yours?

It's the plain linux-3.7.tar.xz from kernel.org, no git involved.
Lines 389-390 of net/wireless/core.c are:

if (WARN_ON(c->max_interfaces < 2))
return -EINVAL;

/Mikael

 > 
 > 
 > > Hardware name: SAM#451B
 > > Modules linked in: rt2500pci(+) snd_mpu401_uart rt2x00pci snd_rawmidi 
 > > rt2x00lib sg snd_seq_device mac80211 cfg80211 amd64_agp agpgart snd 
 > > eeprom_93cx6 soundcore via_rhine mii evdev ipv6 ehci_hcd uhci_hcd sr_mod 
 > > cdrom usbcore usb_common
 > > Pid: 378, comm: modprobe Not tainted 3.7.0 #1
 > > Call Trace:
 > >  [] ? warn_slowpath_common+0x79/0xc0
 > >  [] ? wiphy_register+0x5c3/0x600 [cfg80211]
 > >  [] ? __kmalloc+0xef/0x170
 > >  [] ? ieee80211_register_hw+0x347/0x6d0 [mac80211]
 > >  [] ? rt2x00lib_probe_dev+0x53d/0x720 [rt2x00lib]
 > >  [] ? rt2x00pci_probe+0x168/0x2e4 [rt2x00pci]
 > >  [] ? pci_device_probe+0xd0/0x170
 > >  [] ? driver_probe_device+0x64/0x210
 > >  [] ? __driver_attach+0x93/0xa0
 > >  [] ? driver_probe_device+0x210/0x210
 > >  [] ? bus_for_each_dev+0x45/0x80
 > >  [] ? bus_add_driver+0x178/0x250
 > >  [] ? driver_register+0x6e/0x170
 > >  [] ? notifier_call_chain+0x44/0x60
 > >  [] ? 0xa0195fff
 > >  [] ? do_one_initcall+0x11a/0x160
 > >  [] ? sys_init_module+0x76/0x1c0
 > >  [] ? system_call_fastpath+0x16/0x1b
 > > ---[ end trace f8413f27de31929d ]---
 > > phy0 -> rt2x00lib_probe_dev: Error - Failed to initialize hw.
 > > rt2500pci: probe of 0000:00:0c.0 failed with error -22
 > >
 > > Rebooting into 3.6.0 shows no such error, booting 3.7.0 again shows the 
 > > error,
 > > so it's completely reproducible.
 > >
 > 
 > there are lots of WARN_ON's btw, :) may be one of them triggered, but
 > please tell us the top sha1 of yours, so that the wireless dev's can
 > easily see whats' happening with the rt2x00 to trigger this :)
 > 
 > 
 > > lspci -v for the device in question:
 > >
 > > 00:0c.0 Network controller: Ralink corp. RT2500 802.11g (rev 01)
 > > Subsystem: Micro-Star International Co., Ltd. Unknown 802.11g 
 > > mini-PCI Adapter
 > > Flags: bus master, slow devsel, latency 64, IRQ 18
 > > Memory at d0000000 (32-bit, non-prefetchable) [size=8K]
 > > Capabilities: [40] Power Management version 2
 > > Kernel driver in use: rt2500pci
 > > Kernel modules: rt2500pci
 > >
 > > I've never actually used or configured this device (the laptop is always
 > > on wired ethernet), but thought you should know about this possible driver
 > > regression.
 > >
 > > /Mikael
 > 


Re: [3.7.0 regression] rt2x00lib_probe_dev: Error - Failed to initialize hw

2012-12-17 Thread Mikael Pettersson
Gertjan van Wingerde writes:
 > Mikael, Devendra,
 > 
 > On Mon, Dec 17, 2012 at 1:59 PM, devendra.aaru  
 > wrote:
 > > On Mon, Dec 17, 2012 at 7:22 AM, Mikael Pettersson  wrote:
 > >> devendra.aaru writes:
 > >>  > On Mon, Dec 17, 2012 at 5:51 AM, Mikael Pettersson  
 > >> wrote:
 > >>  > > I just updated an old 1st gen AMD64 laptop from kernel 3.6.0 to 
 > >> 3.7.0,
 > >>  > > Fedora 15 user-space, and was greeted by the following kernel 
 > >> warning:
 > >>  > >
 > >>  > > WARNING: at net/wireless/core.c:389 wiphy_register+0x5c3/0x600 
 > >> [cfg80211]()
...
 > This is caused by the introduction of interface combinations. Helmut
 > Schaa has already submitted a patch to fix this, but this has
 > unfortunately not ended up in 3.7. I'm confident it will end up in one
 > of the upcoming 3.7.x stable releases.
 > 
 > See http://marc.info/?l=linux-wireless&m=135478723823922&w=2 for the
 > patch submitted by Helmut.

That patch fixes the problem.  Thanks.

/Mikael


Re: container-of Implementation

2013-01-14 Thread Mikael Pettersson
Schrober writes:
 > Hi,
 > 
 > I wondered why the container_of implementation is so complicated.
 > 
 > #define container_of(ptr, type, member) ({   \
 >  const typeof( ((type *)0)->member ) *__mptr = (ptr);\
 >  (type *)( (char *)__mptr - offsetof(type,member) );})
 > 
 > isn't the __mptr unnecessary? Why not the following version?
 > 
 > #define container_of(ptr, type, member) \
 > ((type *)((char *)(ptr) - offsetof(type, member)))

Compile-time type checking.  The first version requires ptr to be
assignment-compatible with the type of the struct member, the second
version accepts random junk for ptr.
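
To make the difference concrete, here is a minimal user-space sketch of the two
variants. It assumes GCC or Clang (statement expressions and __typeof__, the
spelling that also works in strict ISO modes); struct item and the helper names
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Checked variant: the __mptr initialisation forces a type check on ptr. */
#define container_of(ptr, type, member) ({                        \
	const __typeof__(((type *)0)->member) *__mptr = (ptr);    \
	(type *)((char *)__mptr - offsetof(type, member)); })

/* Unchecked variant: the cast to char * hides any type mismatch. */
#define container_of_unchecked(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structure, for illustration only. */
struct item {
	int id;
	long payload;
};

/* Recover the enclosing struct item from a pointer to its payload member. */
static struct item *item_of(long *p)
{
	return container_of(p, struct item, payload);
}

/* Both variants agree when ptr really points at the named member. */
static void container_of_demo(void)
{
	struct item it = { .id = 7, .payload = 42 };

	assert(item_of(&it.payload) == &it);
	assert(container_of_unchecked(&it.payload, struct item, payload) == &it);

	/*
	 * Passing a pointer to the wrong member compiles silently with the
	 * unchecked variant and yields a bogus struct pointer:
	 *
	 *     container_of_unchecked(&it.id, struct item, payload);
	 *
	 * With the checked variant the compiler diagnoses the same mistake,
	 * because "int *" is not assignment-compatible with "const long *":
	 *
	 *     container_of(&it.id, struct item, payload);
	 */
}
```

Uncommenting the last call in the comment is the quickest way to see the
diagnostic; the unchecked call just above it builds without complaint.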


[3.7-rc5/rc6 regression] "drm/nvc0/disp: fix regression in vblank semaphore release" broke nouveau driver and mplayer

2012-11-17 Thread Mikael Pettersson
mplayer worked fine on my Dell Latitude E6510 (nVidia GT218 [NVS 3100M] graphics)
up to and including kernel 3.7-rc4. However, with 3.7-rc5 or -rc6, any attempt to
run mplayer just blanks the screen, shows some stray white pixels in the upper left
corner, kills the X server, and spews the following errors from the kernel:

nouveau E[  PGRAPH][:01:00.0] TRAP_M2MF NOTIFY
nouveau E[  PGRAPH][:01:00.0] TRAP_M2MF 00304041 43e0  06000434
nouveau  [  PGRAPH][:01:00.0]  TRAP
nouveau E[  PGRAPH][:01:00.0] ch 2 [0x001fb44000] subc 3 class 0x5039 mthd 0x0328 data 0x
nouveau E[ PFB][:01:00.0] trapped read at 0x002001a020 on channel 0x0001fb44 SEMAPHORE_BG/PFIFO_READ/00 reason: PAGE_NOT_PRESENT
nouveau  [   PFIFO][:01:00.0] CACHE_ERROR - Ch 2/4 Mthd 0x0068 Data 0x
nouveau E[  PGRAPH][:01:00.0] TRAP_DISPATCH (unknown 0x0004)
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS
nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 3 class 0x mthd 0x0860 data 0x
nouveau E[ PFB][:01:00.0] trapped read at 0x002001a024 on channel 0x0001fb44 PFIFO/PFIFO_READ/SEMAPHORE reason: PAGE_NOT_PRESENT
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS
nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 3 class 0x mthd 0x0860 data 0x
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS
nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 3 class 0x mthd 0x0860 data 0x
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS

(this bit repeats itself for 700+ lines)

nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 3 class 0x mthd 0x0860 data 0x
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS
nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 5 class 0x mthd 0x0860 data 0x
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS

(this bit repeats itself for 30+ lines)

nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 5 class 0x mthd 0x03c4 data 0x4000
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS
nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 5 class 0x mthd 0x03c8 data 0x
nouveau E[ PFB][:01:00.0] trapped write at 0x00 on channel 0x0001fb44 PGRAPH/DISPATCH/GRCTX reason: DMAOBJ_LIMIT
nouveau E[  PGRAPH][:01:00.0] TRAP_DISPATCH (unknown 0x0004)
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS
nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 5 class 0x mthd 0x03cc data 0x4000
nouveau E[ PFB][:01:00.0] trapped write at 0x000420 on channel 0x0001fb44 PGRAPH/DISPATCH/GRCTX reason: DMAOBJ_LIMIT
nouveau  [  PGRAPH][:01:00.0]  ILLEGAL_MTHD ILLEGAL_CLASS
nouveau E[  PGRAPH][:01:00.0] ch -1 [0x001fb44000] subc 3 class 0x502d mthd 0x0860 data 0x
nouveau E[ PFB][:01:00.0] trapped write at 0x02b000 on channel 0x0001fcb0 PGRAPH/DISPATCH/GRCTX reason: DMAOBJ_LIMIT

The error is 100% repeatable.

git bisect identified the following culprit:

11d92561c81be2f4a7af37f035e1af294b960abe is the first bad commit
commit 11d92561c81be2f4a7af37f035e1af294b960abe
Author: Kelly Doran 
Date:   Wed Nov 7 10:02:04 2012 +1000

drm/nvc0/disp: fix regression in vblank semaphore release

Signed-off-by: Kelly Doran 
Reviewed-by: Maarten Lankhorst 
Signed-off-by: Ben Skeggs 

:04 04 e539bc754b029da133f89f3bcf5bf31495cb07c5 4e779444a976c40cb07aafefc8e6e7b1e64f092c M  drivers

I've confirmed that reverting this from -rc5 and -rc6 allows mplayer to work again.

User-space is Fedora 15 x86_64 w/ final updates, plus mplayer-1.0-0.129.20110917svn
from rpmfusion.  There are no binary-only or otherwise out-of-tree kernel or X
drivers anywhere on the machine.

Please revert or fix this breakage before kernel 3.7.0 final.

/Mikael


Re: [3.7-rc5/rc6 regression] "drm/nvc0/disp: fix regression in vblank semaphore release" broke nouveau driver and mplayer

2012-11-18 Thread Mikael Pettersson
Marcin Slusarz writes:
 > On Sat, Nov 17, 2012 at 08:35:18PM +0100, Mikael Pettersson wrote:
 > > mplayer worked fine on my Dell Latitude E6510 (nVidia GT218 [NVS 3100M] graphics)
 > > up to and including kernel 3.7-rc4. However, with 3.7-rc5 or -rc6, any attempt to
 > > run mplayer just blanks the screen, shows some stray white pixels in the upper left
 > > corner, kills the X server, and spews the following errors from the kernel:
 > 
 > Fix was already posted and should be merged soon.
 > 
 > http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=7a259e65569bd7593ad541c84982027969ec9c45

That patch fixes the bug I reported. Thanks.


Re: CONFIG_EXPERT is a booby trap

2012-10-01 Thread Mikael Pettersson
Tim Shepard writes:
 > This weekend I finally figured out why the keyboard in my MacBook Pro
 > stopped working between 3.4 and 3.5.
 > 
 > When I turned on CONFIG_EXPERT it turned off CONFIG_HID_APPLE.  There
 > was no warning that selecting "Configure standard kernel features" will
 > invisibly turn off needed things elsewhere in the configuration tree.
 > 
 > Something needs to be fixed, but it's not obvious that any simple change
 > will work without causing other troubles.

"diff" the before and after .config files.  That's alerted me to unexpected
changes (not just "where did CONFIG_${foo} go?" but also "wtf is CONFIG_${bar}
doing there?") on numerous occasions.
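
For readers who want to script that comparison, a rough sketch follows. It is
not a real diff(1) replacement: it only reports lines present in one config
text and absent from the other, ignoring order. The function names are made up
for this example, and the sample config lines in the usage note are an
assumption based on Tim's description, not taken from his actual files.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the text cfg contains the len-byte string line as a whole line. */
static int has_line(const char *cfg, const char *line, size_t len)
{
	const char *p = cfg;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		if (n == len && memcmp(p, line, len) == 0)
			return 1;
		p += n;
		if (*p == '\n')
			p++;
	}
	return 0;
}

/*
 * Print every non-empty line of `from` that does not occur in `other`,
 * prefixed with `mark`, and return how many such lines were found.
 */
static int report_missing(const char *from, const char *other, char mark)
{
	const char *p = from;
	int count = 0;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		if (n > 0 && !has_line(other, p, n)) {
			printf("%c%.*s\n", mark, (int)n, p);
			count++;
		}
		p += n;
		if (*p == '\n')
			p++;
	}
	return count;
}
```

Calling report_missing(before, after, '-') followed by
report_missing(after, before, '+') on the two .config contents lists the
symbols that vanished and the ones that appeared, e.g. a "CONFIG_HID_APPLE=y"
line on the '-' side and "# CONFIG_HID_APPLE is not set" on the '+' side.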


Re: [PATCH 09/16] sparc64: use the generic get_user_pages_fast code

2019-08-10 Thread Mikael Pettersson
For the record the futex test case OOPSes a 5.3-rc3 kernel running on
a Sun Blade 2500 (2 x USIIIi).  This system runs a custom distro with
a custom toolchain (gcc-8.3 based), so I doubt it's a distro problem.

On Sat, Aug 10, 2019 at 9:17 AM Christoph Hellwig  wrote:
>
> There isn't really a way to use an arch-specific get_user_pages_fast
> in mainline, you'd need to revert the whole series.  As a relatively
> quick workaround you can just remove the
>
> select HAVE_FAST_GUP if SPARC64
>
> line from arch/sparc/Kconfig


Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-14 Thread Mikael Pettersson
On Sun, Feb 10, 2019 at 5:05 PM Jens Axboe  wrote:
>
> On 2/10/19 8:44 AM, James Bottomley wrote:
> > On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
> >> On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
> >>  wrote:
> > [...]
> >>> I think the reason for this is that the block mq path doesn't feed
> >>> the kernel entropy pool correctly, hence the need to install an
> >>> entropy gatherer for systems that don't have other good random
> >>> number sources.
> >>
> >> That does sound plausible, I admit I didn't even consider the
> >> possibility that the old block I/O path also was an entropy source.
> >
> > In theory, the new one should be as well since the rotational entropy
> > collector is on the SCSI completion path.   I'd seen the same problem
> > but had assumed it was something someone had done to our internal
> > entropy pool and thus hadn't bisected it.
>
> The difference is that the old stack included ADD_RANDOM by default,
> so this check:
>
> if (blk_queue_add_random(q))
> add_disk_randomness(req->rq_disk);
>
> in scsi_end_request() would be true, and we'd add the randomness. For
> sd, it seems to set it just fine for non-rotational drives. Could this
> be because other devices don't? Maybe the below makes a difference.
>
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6d65ac584eba..60e029911755 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1881,6 +1881,7 @@ struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
> sdev->request_queue->queuedata = sdev;
> __scsi_init_queue(sdev->host, sdev->request_queue);
> blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
> +   blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, sdev->request_queue);
> return sdev->request_queue;
>  }

This patch eliminates my 5 minute boot-up delay problem.

/Mikael


[5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-09 Thread Mikael Pettersson
4.20 and earlier kernels boot fine on my Sun Blade 2500 (UltraSPARC
IIIi), but the 5.0-rc kernels consistently experience a 5 minute delay
late during boot, after enabling networking but before allowing user
logins.  E.g. 5.0-rc5 dmesg has:

[Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[Fri Feb  8 17:18:14 2019] random: crng init done

During this interval the machine answers pings but won't allow user
logins either on the console or over the network.

A git bisect identified commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
Author: Jens Axboe 
Date:   Thu Nov 1 16:36:27 2018 -0600

scsi: kill off the legacy IO path

as the point where this 5m delay was introduced.

My older kernels all have CONFIG_SCSI_MQ_DEFAULT=N, which the above
commit effectively forces to Y.
Rebuilding 4.20 with CONFIG_SCSI_MQ_DEFAULT=Y also triggers the 5m
delay behaviour.

I haven't seen this behaviour on my x86-64 boxes, so presumably it's
related to the sparc64 kernel or this machine's SCSI adapter.

.config and dmesg below.

/Mikael

#
# Automatically generated file; DO NOT EDIT.
# Linux/sparc64 4.20.0 Kernel Configuration
#

#
# Compiler: gcc (GCC) 7.4.1 20181227
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70401
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-blkmq"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_PREFLOW_FASTEOI=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_SPARSE_IRQ=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_NAMESPACES is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
# CONFIG_AIO is not set
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_MEMBARRIER is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
# CONFIG_BPF_SYSCALL is not set
# CONFIG_USERFAULTFD is not set
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
# CONFIG_PERF_EVENTS is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_PROFILING is not set
CONFIG_64BIT=y
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_ARCH_DEFCONFIG="arch/sparc/configs/sparc64_defconfig"
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_ARCH_ATU=y

Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-10 Thread Mikael Pettersson
On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
 wrote:
>
> On Sat, 2019-02-09 at 18:04 +0100, Mikael Pettersson wrote:
> > 4.20 and earlier kernels boot fine on my Sun Blade 2500 (UltraSPARC
> > IIIi), but the 5.0-rc kernels consistently experience a 5 minute delay
> > late during boot, after enabling networking but before allowing user
> > logins.  E.g. 5.0-rc5 dmesg has:
> >
> > [Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom read (12 bytes read)
> > [Fri Feb  8 17:18:14 2019] random: crng init done
>
> I've had the same problem on several of my test systems.  Are you sure
> it's not this bug report:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912087
>
> ?
>
> The solution for me was to install the haveged package which does
> active entropy gathering during boot and can make /dev/urandom
> available much earlier.

Thanks for the hint, I'll look into using haveged on this machine.
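
A quick way to watch the pool while the machine sits in that stall is to poll
the kernel's entropy estimate from procfs. The sketch below is only an
illustration: the helper name is made up, it assumes a Linux system with
/proc mounted, and the meaning of the reported number has changed across
kernel versions, so treat it as a rough indicator rather than a diagnosis.

```c
#include <stdio.h>

/*
 * Return the kernel's current entropy estimate in bits, as exposed in
 * /proc/sys/kernel/random/entropy_avail, or -1 if it cannot be read
 * (for example on a non-Linux system or without /proc mounted).
 */
static int read_entropy_avail(void)
{
	FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
	int bits = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &bits) != 1)
		bits = -1;
	fclose(f);
	return bits;
}
```

Printing the return value once a second from an early boot script, before and
after starting an entropy gatherer such as haveged, should show whether the
estimate stays low during the interval in question.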

>
> > During this interval the machine answers pings but won't allow user
> > logins either on the console or over the network.
> >
> > A git bisect identified commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
> > Author: Jens Axboe 
> > Date:   Thu Nov 1 16:36:27 2018 -0600
> >
> > scsi: kill off the legacy IO path
> >
> > as the point where this 5m delay was introduced.
>
> I think the reason for this is that the block mq path doesn't feed the
> kernel entropy pool correctly, hence the need to install an entropy
> gatherer for systems that don't have other good random number sources.

That does sound plausible, I admit I didn't even consider the possibility that
the old block I/O path also was an entropy source.

/Mikael


Re: bit fields && data tearing

2014-09-04 Thread Mikael Pettersson
Benjamin Herrenschmidt writes:
 > On Wed, 2014-09-03 at 18:51 -0400, Peter Hurley wrote:
 > 
 > > Apologies for hijacking this thread but I need to extend this discussion
 > > somewhat regarding what a compiler might do with adjacent fields in a structure.
 > > 
 > > The tty subsystem defines a large aggregate structure, struct tty_struct.
 > > Importantly, several different locks apply to different fields within that
 > > structure; i.e., a specific spinlock will be claimed before updating or accessing
 > > certain fields while a different spinlock will be claimed before updating or
 > > accessing certain _adjacent_ fields.
 > > 
 > > What is necessary and sufficient to prevent accidental false-sharing?
 > > The patch below was flagged as insufficient on ia64, and possibly ARM.
 > 
 > We expect native aligned scalar types to be accessed atomically (the
 > read/modify/write of a larger quantity that gcc does on some bitfield
 > cases has been flagged as a gcc bug, but shouldn't happen on normal
 > scalar types).
 > 
 > I am not 100% certain of "bool" here, I assume it's treated as a normal
 > scalar and thus atomic but if unsure, you can always use int.

Please use an aligned int or long.  Some machines cannot do atomic
accesses on sub-int/long quantities, so 'bool' may cause unexpected
rmw cycles on adjacent fields.
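
As a sketch of that advice (hypothetical structures, not the real struct
tty_struct): give each independently locked flag its own naturally aligned
int, so that a store to one flag never has to read and rewrite a neighbour.
The helper below only checks the layout property; it says nothing about what
a particular CPU does with sub-word stores.

```c
#include <assert.h>
#include <stddef.h>

/* Bitfield layout: the four flags share a storage unit, so updating one
 * flag is a read-modify-write that also rewrites its neighbours. */
struct flags_packed {
	unsigned char stopped:1, hw_stopped:1, flow_stopped:1, packet:1;
};

/* One aligned int per flag: each store touches only its own word on any
 * machine that can store an aligned int atomically. */
struct flags_split {
	int stopped;
	int hw_stopped;
	int flow_stopped;
	int packet;
};

/* The split layout costs space: at least one int per flag. */
static_assert(sizeof(struct flags_split) >= 4 * sizeof(int),
	      "each flag occupies a full int");
static_assert(sizeof(struct flags_packed) < sizeof(struct flags_split),
	      "the bitfield layout is the smaller one");

/* Return 1 if every flag in struct flags_split starts on an int-aligned
 * offset that is at least sizeof(int) past the previous flag. */
static int flags_split_is_word_separated(void)
{
	size_t off[] = {
		offsetof(struct flags_split, stopped),
		offsetof(struct flags_split, hw_stopped),
		offsetof(struct flags_split, flow_stopped),
		offsetof(struct flags_split, packet),
	};
	size_t i;

	for (i = 0; i < sizeof(off) / sizeof(off[0]); i++) {
		if (off[i] % _Alignof(int) != 0)
			return 0;
		if (i > 0 && off[i] - off[i - 1] < sizeof(int))
			return 0;
	}
	return 1;
}
```

The same check applied to four adjacent bool members would fail the
sizeof(int) spacing test on common ABIs, which is the situation the paragraph
above warns about.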

/Mikael

 > 
 > Another option is to use the atomic bitops and make these bits in a
 > bitmask but that is probably unnecessary if you have locks already.
 > 
 > Cheers,
 > Ben.
 > 
 > 
 > > Regards,
 > > Peter Hurley
 > > 
 > > --- >% ---
 > > Subject: [PATCH 21/26] tty: Convert tty_struct bitfield to bools
 > > 
 > > The stopped, hw_stopped, flow_stopped and packet bits are smp-unsafe
 > > and interrupt-unsafe. For example,
 > > 
 > > CPU 0                 | CPU 1
 > >                       |
 > > tty->flow_stopped = 1 | tty->hw_stopped = 0
 > > 
 > > One of these updates will be corrupted, as the bitwise operation
 > > on the bitfield is non-atomic.
 > > 
 > > Ensure each flag has a separate memory location, so concurrent
 > > updates do not corrupt orthogonal states.
 > > 
 > > Signed-off-by: Peter Hurley 
 > > ---
 > >  include/linux/tty.h | 5 ++++-
 > >  1 file changed, 4 insertions(+), 1 deletion(-)
 > > 
 > > diff --git a/include/linux/tty.h b/include/linux/tty.h
 > > index 1c3316a..7cf61cb 100644
 > > --- a/include/linux/tty.h
 > > +++ b/include/linux/tty.h
 > > @@ -261,7 +261,10 @@ struct tty_struct {
 > >unsigned long flags;
 > >int count;
 > >struct winsize winsize; /* winsize_mutex */
 > > -  unsigned char stopped:1, hw_stopped:1, flow_stopped:1, packet:1;
 > > +  bool stopped;
 > > +  bool hw_stopped;
 > > +  bool flow_stopped;
 > > +  bool packet;
 > >unsigned char ctrl_status;  /* ctrl_lock */
 > >unsigned int receive_room;  /* Bytes free for queue */
 > >int flow_change;
 > 
 > 



Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-05 Thread Mikael Pettersson
Michel Dänzer writes:
 > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > > of Linus' repo.  3.16 and earlier kernels are fine.
 > > 
 > > I ran a bisect, which identified:
 > > 
 > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > > Author: Michel Dänzer 
 > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > > 
 > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > > 
 > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > > (which requires manual intervention due to subsequent changes in
 > > radeon_ring_commit()) eliminates the screen corruption.
 > 
 > Does the patch below help?

Tested, sorry no joy.  I first reconfirmed the screen corruption with 3.17-rc3.
I then applied this and rebuilt/rebooted, and after a few minutes X had a hiccup
(screen went black, came back after a few seconds, but then no cursor or
reaction to mouse events), but I was able to kill it via my Terminate_Server
key binding.  The kernel log showed:

[ 1641.247760] radeon :01:00.0: ring 0 stalled for more than 1msec
[ 1641.247765] radeon :01:00.0: GPU lockup (waiting for 0x6241 last fence id 0x6240 on ring 0)
[ 1641.247768] radeon :01:00.0: failed to get a new IB (-35)
[ 1641.247770] [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
[ 1641.404052] Failed to wait GUI idle while programming pipes. Bad things might happen.
[ 1641.405075] radeon :01:00.0: Saved 859 dwords of commands on ring 0.
[ 1641.405084] radeon :01:00.0: (r300_asic_reset:394) RBBM_STATUS=0x80010140
[ 1641.910649] radeon :01:00.0: (r300_asic_reset:413) RBBM_STATUS=0x80010140
[ 1642.412182] radeon :01:00.0: (r300_asic_reset:425) RBBM_STATUS=0x0140
[ 1642.412218] radeon :01:00.0: GPU reset succeed
[ 1642.412220] radeon :01:00.0: GPU reset succeeded, trying to resume
[ 1642.412224] radeon :01:00.0: 88060274f800 unpin not necessary
[ 1642.626303] [drm] radeon: 1 quad pipes, 1 Z pipes initialized.
[ 1642.626325] [drm] PCIE GART of 512M enabled (table at 0xE004).
[ 1642.626328] radeon :01:00.0: WB enabled
[ 1642.626331] radeon :01:00.0: fence driver on ring 0 use gpu addr 0xc000 and cpu addr 0x8800d9b9f000
[ 1642.626375] [drm] radeon: ring at 0xC0001000
[ 1642.783220] [drm:r100_ring_test] *ERROR* radeon: ring test failed (scratch(0x15E8)=0xCAFEDEAD)
[ 1642.783222] [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
[ 1642.783224] radeon :01:00.0: failed initializing CP (-22).

With a revert of the HDP flush patch things are stable.

/Mikael

 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..3ff9c53 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -1070,6 +1070,20 @@ void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 >  radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 >  }
 >  
 > +/**
 > + * r100_mmio_hdp_flush - flush Host Data Path via MMIO
 > + * rdev: radeon device structure
 > + */
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev)
 > +{
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl | RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +}
 > +
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..c23a123 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -408,7 +408,7 @@ static struct radeon_asic r300_asic_pcie = {
 >  .resume = _resume,
 >  .vga_set_state = _vga_set_state,
 >  .asic_reset = _asic_reset,
 > -.mmio_hdp_flush = NULL,
 > +.mmio_hdp_flush = r100_mmio_hdp_flush,
 >  .gui_idle = _gui_idle,
 >  .mc_wait_for_idle = _mc_wait_for_idle,
 >  .gart = {
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
 > index 275a5dc..e9b1c35 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.h
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.h
 > @@ -150,6 +150,8 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 > struct radeon_ring *ring);
 >  void r100_ring_hdp_flush(struct radeon_device *rdev,
 >   struct radeon_ring

Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-14 Thread Mikael Pettersson
Michel Dänzer writes:
 > On 06.09.2014 01:49, Mikael Pettersson wrote:
 > > Michel Dänzer writes:
 > >   > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > >   > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > >   > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > >   > > of Linus' repo.  3.16 and earlier kernels are fine.
 > >   > >
 > >   > > I ran a bisect, which identified:
 > >   > >
 > >   > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > >   > > Author: Michel Dänzer 
 > >   > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > >   > >
 > >   > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > >   > >
 > >   > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > >   > > (which requires manual intervention due to subsequent changes in
 > >   > > radeon_ring_commit()) eliminates the screen corruption.
 > >   >
 > >   > Does the patch below help?
 > > 
 > > Tested, sorry no joy.  I first reconfirmed the screen corruption with 3.17-rc3.
 > > I then applied this and rebuilt/rebooted, and after a few minutes X had a hiccup
 > > (screen went black, came back after a few seconds, but then no cursor or
 > > reaction to mouse events), but I was able to kill it via my Terminate_Server
 > > key binding.
 > 
 > I was afraid so, thanks for testing it.
 > 
 > 
 > I can't see any other option than the patch below then. Can you confirm that this
 > fixes the screen corruption?

It does, thanks.

Tested-by: Mikael Pettersson 

 > 
 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..b0098e7 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -821,6 +821,20 @@ u32 r100_get_vblank_counter(struct radeon_device *rdev, int crtc)
 >  return RREG32(RADEON_CRTC2_CRNT_FRAME);
 >  }
 >  
 > +/**
 > + * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > + * rdev: radeon device structure
 > + * ring: ring buffer struct for emitting packets
 > + */
 > +static void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > +{
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > +RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > +}
 > +
 >  /* Who ever call radeon_fence_emit should call ring_lock and ask
 >   * for enough space (today caller are ib schedule and buffer move) */
 >  void r100_fence_ring_emit(struct radeon_device *rdev,
 > @@ -1056,20 +1070,6 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 >  (void)RREG32(RADEON_CP_RB_WPTR);
 >  }
 >  
 > -/**
 > - * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > - * rdev: radeon device structure
 > - * ring: ring buffer struct for emitting packets
 > - */
 > -void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > -{
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > -RADEON_HDP_READ_BUFFER_INVALIDATE);
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > -}
 > -
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..2dd5847 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -185,7 +185,6 @@ static struct radeon_asic_ring r100_gfx_ring = {
 >  .get_rptr = _gfx_get_rptr,
 >  .get_wptr = _gfx_get_wptr,
 >  .set_wptr = _gfx_set_wptr,
 > -.hdp_flush = _ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r100_asic = {
 > @@ -332,7 +331,6 @@ static struct radeon_asic_ring r300_gfx_ring = {
 >  .get_rptr = _gfx_get_rptr,
 >  .get_wptr = _gfx_get_wptr,
 >  .set_wptr = _gfx_set_wptr,
 > -.hdp_flush = _ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r300_asic = {
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.h 
 > b/drivers/gpu/drm/radeon/radeon_as

[BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-08-30 Thread Mikael Pettersson
Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
after a while in X + firefox.  This still occurs with yesterday's HEAD
of Linus' repo.  3.16 and earlier kernels are fine.

I ran a bisect, which identified:

commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
Author: Michel Dänzer 
Date:   Thu Jul 31 18:43:49 2014 +0900

drm/radeon: Always flush the HDP cache before submitting a CS to the GPU

as the cause of my screen corruption.  Reverting this from 3.17-rc2
(which requires manual intervention due to subsequent changes in
radeon_ring_commit()) eliminates the screen corruption.

User-space is vanilla Fedora 19 / x86_64 with updates.  radeon_drv.so says:

[62.574] (II) LoadModule: "radeon"
[62.574] (II) Loading /usr/lib64/xorg/modules/drivers/radeon_drv.so
[62.574] (II) Module radeon: vendor="X.Org Foundation"
[62.574]compiled for 1.14.0, module version = 7.1.99
[62.574]Module class: X.Org Video Driver
[62.574]ABI class: X.Org Video Driver, version 14.1
...
[62.585] (--) RADEON(0): Chipset: "ATI Radeon X550 (RV370) 5B63 (PCIE)" (ChipID = 0x5b63)

See also my original report to LKML:


/Mikael


Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-02 Thread Mikael Pettersson
Michel Dänzer writes:
 > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > > of Linus' repo.  3.16 and earlier kernels are fine.
 > > 
 > > I ran a bisect, which identified:
 > > 
 > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > > Author: Michel Dänzer 
 > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > > 
 > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > > 
 > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > > (which requires manual intervention due to subsequent changes in
 > > radeon_ring_commit()) eliminates the screen corruption.
 > 
 > Does the patch below help?

Thanks for the patch, I'll test it on Friday evening when I'm
back home and have access to the affected machine.


 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..3ff9c53 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -1070,6 +1070,20 @@ void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 >  radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 >  }
 >  
 > +/**
 > + * r100_mmio_hdp_flush - flush Host Data Path via MMIO
 > + * rdev: radeon device structure
 > + */
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev)
 > +{
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl | RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +WREG32(RADEON_HOST_PATH_CNTL,
 > +   rdev->config.r100.hdp_cntl);
 > +(void)RREG32(RADEON_HOST_PATH_CNTL);
 > +}
 > +
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..c23a123 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -408,7 +408,7 @@ static struct radeon_asic r300_asic_pcie = {
 >  .resume = _resume,
 >  .vga_set_state = _vga_set_state,
 >  .asic_reset = _asic_reset,
 > -.mmio_hdp_flush = NULL,
 > +.mmio_hdp_flush = r100_mmio_hdp_flush,
 >  .gui_idle = _gui_idle,
 >  .mc_wait_for_idle = _mc_wait_for_idle,
 >  .gart = {
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
 > index 275a5dc..e9b1c35 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.h
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.h
 > @@ -150,6 +150,8 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 > struct radeon_ring *ring);
 >  void r100_ring_hdp_flush(struct radeon_device *rdev,
 >   struct radeon_ring *ring);
 > +void r100_mmio_hdp_flush(struct radeon_device *rdev);
 > +
 >  /*
 >   * r200,rv250,rs300,rv280
 >   */
 > diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
 > index bfd7e1b..3d0f564 100644
 > --- a/drivers/gpu/drm/radeon/radeon_gem.c
 > +++ b/drivers/gpu/drm/radeon/radeon_gem.c
 > @@ -368,6 +368,7 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
 >  r = radeon_bo_wait(robj, &cur_placement, false);
 >  /* Flush HDP cache via MMIO if necessary */
 >  if (rdev->asic->mmio_hdp_flush &&
 > +!rdev->asic->ring[RADEON_RING_TYPE_GFX_INDEX]->hdp_flush &&
 >  radeon_mem_type_to_domain(cur_placement) == RADEON_GEM_DOMAIN_VRAM)
 >  robj->rdev->asic->mmio_hdp_flush(rdev);
 >  drm_gem_object_unreference_unlocked(gobj);
 > diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
 > index d656079..b82843b 100644
 > --- a/drivers/gpu/drm/radeon/radeon_ring.c
 > +++ b/drivers/gpu/drm/radeon/radeon_ring.c
 > @@ -188,7 +188,8 @@ void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring,
 >  /* If we are emitting the HDP flush via the ring buffer, we need to
 >   * do it before padding.
 >   */
 > -if (hdp_flush && rdev->asic->ring[ring->idx]->hdp_flush)
 > +if (hdp_flush && rdev->asic->ring[ring->idx]->hdp_flush &&
 > +!rdev->asic->mmio_hdp_flush)
 >  rdev->asic->ring[ring->idx]->hdp_flush(rdev, ring);
 >  /* We pad to match fetch size */
 >  while (ring->wptr & ring->align_mask) {
 > 
 > 
 > 
 > -- 
 > Earthling Michel Dänzer|  http://www.amd.com
 > Libre software enthusiast  |Mesa and X developer



Re: [BISECTED] 3.17-rc1 radeon screen corruption due to "Always flush the HDP cache before submitting a CS to the GPU"

2014-09-08 Thread Mikael Pettersson
Michel Dänzer writes:
 > On 06.09.2014 01:49, Mikael Pettersson wrote:
 > > Michel Dänzer writes:
 > >   > On 30.08.2014 22:59, Mikael Pettersson wrote:
 > >   > > Since 3.17-rc1 my radeon card (RV370 / X1050 card) causes screen corruption
 > >   > > after a while in X + firefox.  This still occurs with yesterday's HEAD
 > >   > > of Linus' repo.  3.16 and earlier kernels are fine.
 > >   > >
 > >   > > I ran a bisect, which identified:
 > >   > >
 > >   > > commit 72a9987edcedb89db988079a03c9b9c65b6ec9ac
 > >   > > Author: Michel Dänzer 
 > >   > > Date:   Thu Jul 31 18:43:49 2014 +0900
 > >   > >
 > >   > >  drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
 > >   > >
 > >   > > as the cause of my screen corruption.  Reverting this from 3.17-rc2
 > >   > > (which requires manual intervention due to subsequent changes in
 > >   > > radeon_ring_commit()) eliminates the screen corruption.
 > >   >
 > >   > Does the patch below help?
 > > 
 > > Tested, sorry no joy.  I first reconfirmed the screen corruption with 3.17-rc3.
 > > I then applied this and rebuilt/rebooted, and after a few minutes X had a hiccup
 > > (screen went black, came back after a few seconds, but then no cursor or
 > > reaction to mouse events), but I was able to kill it via my Terminate_Server
 > > key binding.
 > 
 > I was afraid so, thanks for testing it.
 > 
 > 
 > I can't see any other option than the patch below then. Can you confirm that
 > this fixes the screen corruption?

I'll test this on Friday evening when I'm back home and have access to the
affected machine.

/Mikael


 > 
 > 
 > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
 > index 4c5ec44..b0098e7 100644
 > --- a/drivers/gpu/drm/radeon/r100.c
 > +++ b/drivers/gpu/drm/radeon/r100.c
 > @@ -821,6 +821,20 @@ u32 r100_get_vblank_counter(struct radeon_device *rdev, int crtc)
 >  return RREG32(RADEON_CRTC2_CRNT_FRAME);
 >  }
 >  
 > +/**
 > + * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > + * rdev: radeon device structure
 > + * ring: ring buffer struct for emitting packets
 > + */
 > +static void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > +{
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > +RADEON_HDP_READ_BUFFER_INVALIDATE);
 > +radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > +radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > +}
 > +
 >  /* Who ever call radeon_fence_emit should call ring_lock and ask
 >   * for enough space (today caller are ib schedule and buffer move) */
 >  void r100_fence_ring_emit(struct radeon_device *rdev,
 > @@ -1056,20 +1070,6 @@ void r100_gfx_set_wptr(struct radeon_device *rdev,
 >  (void)RREG32(RADEON_CP_RB_WPTR);
 >  }
 >  
 > -/**
 > - * r100_ring_hdp_flush - flush Host Data Path via the ring buffer
 > - * rdev: radeon device structure
 > - * ring: ring buffer struct for emitting packets
 > - */
 > -void r100_ring_hdp_flush(struct radeon_device *rdev, struct radeon_ring *ring)
 > -{
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl |
 > -RADEON_HDP_READ_BUFFER_INVALIDATE);
 > -radeon_ring_write(ring, PACKET0(RADEON_HOST_PATH_CNTL, 0));
 > -radeon_ring_write(ring, rdev->config.r100.hdp_cntl);
 > -}
 > -
 >  static void r100_cp_load_microcode(struct radeon_device *rdev)
 >  {
 >  const __be32 *fw_data;
 > diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
 > index abe..2dd5847 100644
 > --- a/drivers/gpu/drm/radeon/radeon_asic.c
 > +++ b/drivers/gpu/drm/radeon/radeon_asic.c
 > @@ -185,7 +185,6 @@ static struct radeon_asic_ring r100_gfx_ring = {
 >  .get_rptr = &r100_gfx_get_rptr,
 >  .get_wptr = &r100_gfx_get_wptr,
 >  .set_wptr = &r100_gfx_set_wptr,
 > -.hdp_flush = &r100_ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r100_asic = {
 > @@ -332,7 +331,6 @@ static struct radeon_asic_ring r300_gfx_ring = {
 >  .get_rptr = &r100_gfx_get_rptr,
 >  .get_wptr = &r100_gfx_get_wptr,
 >  .set_wptr = &r100_gfx_set_wptr,
 > -.hdp_flush = &r100_ring_hdp_flush,
 >  };
 >  
 >  static struct radeon_asic r300_asic = {
 > diff --git 

Re: [PATCH] Don't allow blocking of signals using sigreturn.

2015-03-11 Thread Mikael Pettersson
Jann Horn writes:
 > Or should I throw this patch away and write a patch
 > for the prctl() manpage instead that documents that
 > being able to call sigreturn() implies being able to
 > effectively call sigprocmask(), at least on some
 > architectures like X86?

Well, that is the semantics of sigreturn().  It is essentially
setcontext() [which includes the actions of sigprocmask()], but
with restrictions on parameter placement (at least on x86).

You could introduce some setting to restrict that aspect for
seccomp processes, but you can't change this for normal processes
without breaking things.


Re: [PATCH] Don't allow blocking of signals using sigreturn.

2015-03-12 Thread Mikael Pettersson
Andy Lutomirski writes:
 > On Wed, Mar 11, 2015 at 2:43 PM, Mikael Pettersson wrote:
 > > Jann Horn writes:
 > >  > Or should I throw this patch away and write a patch
 > >  > for the prctl() manpage instead that documents that
 > >  > being able to call sigreturn() implies being able to
 > >  > effectively call sigprocmask(), at least on some
 > >  > architectures like X86?
 > >
 > > Well, that is the semantics of sigreturn().  It is essentially
 > > setcontext() [which includes the actions of sigprocmask()], but
 > > with restrictions on parameter placement (at least on x86).
 > >
 > > You could introduce some setting to restrict that aspect for
 > > seccomp processes, but you can't change this for normal processes
 > > without breaking things.
 > 
 > Which leads to the interesting question: does anyone ever call
 > sigreturn with a different signal mask than the kernel put there
 > during signal delivery

Yes.  Either a sigfillset();sigdelset(SIGSEGV), or a copy of the
thread's sigmask from a previous sigframe.

 > or, even more strangely, with a totally made up
 > context?

Not "totally made up", but certainly with adjustments(*) made to
both GPRs and PC.  In a different piece of SW: FPU controls.

(*) Rolling back or force-committing a micro-transaction until
PC+GPRs represent the state at an original instruction boundary.
This was in a product using dynamic binary instrumentation.
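
To illustrate how a handler can return with a different signal mask than
the kernel saved, here is a minimal sketch, assuming Linux with glibc.  An
SA_SIGINFO handler edits uc_sigmask in the ucontext it is handed, and the
edited mask is what sigreturn() installs when the handler returns.  The
function names and the choice of SIGUSR1/SIGUSR2 are illustrative
assumptions, not something taken from the software mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Handler: edit the saved signal mask in the signal frame.  The kernel
 * installs this mask when the handler returns via sigreturn(). */
static void add_usr2_to_saved_mask(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;

    (void)sig;
    (void)info;
    sigaddset(&uc->uc_sigmask, SIGUSR2);
}

/* Returns 1 if SIGUSR2 ended up blocked after the handler returned,
 * 0 if not, -1 on error.  Hypothetical helper name. */
int block_usr2_via_sigreturn(void)
{
    struct sigaction sa;
    sigset_t now;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = add_usr2_to_saved_mask;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* raise() returns only after the handler has run. */
    if (raise(SIGUSR1) != 0)
        return -1;

    /* Query the current mask without changing it. */
    if (sigprocmask(SIG_BLOCK, NULL, &now) != 0)
        return -1;
    return sigismember(&now, SIGUSR2);
}
```

No call to sigprocmask() changes the mask here; the only change comes from
the edited signal frame, which is the behaviour discussed in this thread.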


Re: [PATCH] seccomp.2: Add note about alarm(2) not being sufficient to limit runtime

2015-03-12 Thread Mikael Pettersson
Jann Horn writes:
 > On Wed, Mar 11, 2015 at 10:43:50PM +0100, Mikael Pettersson wrote:
 > > Jann Horn writes:
 > >  > Or should I throw this patch away and write a patch
 > >  > for the prctl() manpage instead that documents that
 > >  > being able to call sigreturn() implies being able to
 > >  > effectively call sigprocmask(), at least on some
 > >  > architectures like X86?
 > > 
 > > Well, that is the semantics of sigreturn().  It is essentially
 > > setcontext() [which includes the actions of sigprocmask()], but
 > > with restrictions on parameter placement (at least on x86).
 > > 
 > > You could introduce some setting to restrict that aspect for
 > > seccomp processes, but you can't change this for normal processes
 > > without breaking things.
 > 
 > Then I think it's probably better and easier to just document the existing
 > behavior? If a new setting would have to be introduced and developers would
 > need to be aware of that, it's probably easier to just tell everyone to use
 > SIGKILL.
 > 
 > Does this manpage patch look good?

LGTM

Acked-by: Mikael Pettersson 

 > 
 > ---
 >  man2/seccomp.2 | 25 +
 >  1 file changed, 25 insertions(+)
 > 
 > diff --git a/man2/seccomp.2 b/man2/seccomp.2
 > index 702ceb8..f762d07 100644
 > --- a/man2/seccomp.2
 > +++ b/man2/seccomp.2
 > @@ -64,6 +64,31 @@ Strict secure computing mode is useful for number-crunching
 >  applications that may need to execute untrusted byte code, perhaps
 >  obtained by reading from a pipe or socket.
 >  
 > +Note that although the calling thread can no longer call
 > +.BR sigprocmask (2),
 > +it can use
 > +.BR sigreturn (2)
 > +to block all signals apart from
 > +.BR SIGKILL
 > +and
 > +.BR SIGSTOP .
 > +Therefore, to reliably terminate it,
 > +.BR SIGKILL
 > +has to be used, meaning that e.g.
 > +.BR alarm (2)
 > +is not sufficient for restricting its runtime. Instead, use
 > +.BR timer_create (2)
 > +with
 > +.BR SIGEV_SIGNAL
 > +and
 > +.BR sigev_signo
 > +set to
 > +.BR SIGKILL
 > +or use
 > +.BR setrlimit (2)
 > +to set the hard limit for
 > +.BR RLIMIT_CPU .
 > +
 >  This operation is available only if the kernel is configured with
 >  .BR CONFIG_SECCOMP
 >  enabled.
 > -- 
 > 2.1.4
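
The RLIMIT_CPU option named in the quoted manpage text can be sketched as
follows.  This is a minimal example, assuming it is called before the
process enters strict seccomp mode; the helper name limit_cpu_seconds is
hypothetical.

```c
#include <sys/resource.h>

/* Cap this process's CPU time.  Soft and hard limits are set to the same
 * value; once the hard limit is reached the kernel sends SIGKILL, which
 * cannot be blocked, not even through sigreturn().  Returns 0 on success
 * and -1 on error, like setrlimit(). */
int limit_cpu_seconds(rlim_t seconds)
{
    struct rlimit rl;

    rl.rlim_cur = seconds;
    rl.rlim_max = seconds;
    return setrlimit(RLIMIT_CPU, &rl);
}
```

Note that an unprivileged process cannot raise the hard limit again after
lowering it, so the value should be chosen with some margin.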



[REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the kernel hard

2015-05-04 Thread Mikael Pettersson
On my Ivy Bridge i7 mobo w/ Radeon graphics, the 4.1-rc2 kernel oopses hard,
requiring a hard reset:

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: [] radeon_audio_detect+0x5b/0x150 [radeon]
PGD 0 
Oops:  [#1] SMP 
Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel 
snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 
snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit backlight r8169 
mii coretemp snd_timer drm_kms_helper ttm snd drm i2c_core xhci_pci xhci_hcd 
soundcore evdev firmware_class hwmon hid_generic usbhid hid ehci_pci ehci_hcd 
sr_mod cdrom usbcore usb_common ipv6
CPU: 0 PID: 163 Comm: kworker/0:2 Not tainted 4.1.0-rc2 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS 
0403 05/08/2012
Workqueue: events output_poll_execute [drm_kms_helper]
task: 8806012b1590 ti: 88003796 task.ti: 88003796
RIP: 0010:[]  [] 
radeon_audio_detect+0x5b/0x150 [radeon]
RSP: 0018:880037963c78  EFLAGS: 00010246
RAX: 880600c92da0 RBX: 880600cbb000 RCX: 0001
RDX:  RSI:  RDI: 880037a3f600
RBP: 880600c92da0 R08: 0001 R09: 0050
R10: 0001 R11: 880603001a80 R12: 0001
R13: 880600c924e0 R14: 880601f84000 R15: 0001
FS:  () GS:88061ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0010 CR3: 01478000 CR4: 001407f0
Stack:
 880600cbb000 0001 0001 880601f84000
 a03e7d70 a03157ea 880601f84000 0002
 880600baa200 880600cbb050 880600cbb000 880600e33800
Call Trace:
 [] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
 [] ? 
drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490 [drm_kms_helper]
 [] ? drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70 
[drm_kms_helper]
 [] ? drm_fb_helper_hotplug_event+0x55/0xe0 [drm_kms_helper]
 [] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
 [] ? process_one_work+0x130/0x360
 [] ? worker_thread+0x114/0x460
 [] ? __schedule+0x20d/0x660
 [] ? rescuer_thread+0x2f0/0x2f0
 [] ? kthread+0xbc/0xe0
 [] ? kthread_create_on_node+0x170/0x170
 [] ? ret_from_fork+0x42/0x70
 [] ? kthread_create_on_node+0x170/0x170
Code: 8b 45 00 4c 8b ad 58 01 00 00 4c 8b 70 28 49 8b 85 00 01 00 00 48 85 c0 
74 30 41 83 fc 01 74 38 48 8b 70 10 49 8b 96 c8 24 00 00 <48> 8b 4a 10 48 85 c9 
74 0e 31 d2 4c 89 f7 ff d1 49 8b 85 00 01 
RIP  [] radeon_audio_detect+0x5b/0x150 [radeon]
 RSP 
CR2: 0010
---[ end trace 5b99e3870bfc7a92 ]---
BUG: unable to handle kernel paging request at ffd8
IP: [] kthread_data+0x7/0x10
PGD 1479067 PUD 147b067 PMD 0 
Oops:  [#2] SMP 
Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel 
snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 
snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit backlight r8169 
mii coretemp snd_timer drm_kms_helper ttm snd drm i2c_core xhci_pci xhci_hcd 
soundcore evdev firmware_class hwmon hid_generic usbhid hid ehci_pci ehci_hcd 
sr_mod cdrom usbcore usb_common ipv6
CPU: 0 PID: 163 Comm: kworker/0:2 Tainted: G  D 4.1.0-rc2 #1
Hardware name: System manufacturer System Product Name/P8Z77-V LE PLUS, BIOS 
0403 05/08/2012
task: 8806012b1590 ti: 88003796 task.ti: 88003796
RIP: 0010:[]  [] kthread_data+0x7/0x10
RSP: 0018:880037963a60  EFLAGS: 00010002
RAX:  RBX:  RCX: 73c2bc6e
RDX:  RSI:  RDI: 8806012b1590
RBP: 8806012b1590 R08: 0001 R09: 0001
R10: ea001804b800 R11: 001a R12: 8806012b1980
R13:  R14: 00014300 R15: 
FS:  () GS:88061ec0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0028 CR3: 01478000 CR4: 001407f0
Stack:
 81051068 88061ec14300 8134c203 
 880037964000 8806012b1878 880037963af8 
 880603188000 8806012b1590 8134c4aa 8800379637d8
Call Trace:
 [] ? wq_worker_sleeping+0x8/0x90
 [] ? __schedule+0x3e3/0x660
 [] ? schedule+0x2a/0x80
 [] ? do_exit+0x61e/0xa20
 [] ? oops_end+0x66/0xa0
 [] ? no_context+0x236/0x286
 [] ? page_fault+0x1f/0x30
 [] ? radeon_audio_detect+0x5b/0x150 [radeon]
 [] ? radeon_audio_detect+0xe2/0x150 [radeon]
 [] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
 [] ? 
drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490 [drm_kms_helper]
 [] ? drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70 
[drm_kms_helper]
 [] ? drm_fb_helper_hotplug_event+0x55/0xe0 [drm_kms_helper]
 [] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
 [] ? process_one_work+0x130/0x360
 [] ? 

RE: [REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the kernel hard

2015-05-04 Thread Mikael Pettersson
Deucher, Alexander writes:
 > > -Original Message-
 > > From: Mikael Pettersson [mailto:mikpeli...@gmail.com]
 > > Sent: Monday, May 04, 2015 11:53 AM
 > > To: linux-kernel@vger.kernel.org
 > > Cc: Deucher, Alexander
 > > Subject: [REGRESSION,BISECTED] 4.1-rc2 radeon audio changes oops the
 > > kernel hard
 > > 
 > > On my Ivy Bridge i7 mobo w/ Radeon graphics, the 4.1-rc2 kernel oopses
 > > hard,
 > > requiring a hard reset:
 > > 
 > > BUG: unable to handle kernel NULL pointer dereference at
 > > 0010
 > > IP: [] radeon_audio_detect+0x5b/0x150 [radeon]
 > > PGD 0
 > > Oops:  [#1] SMP
 > > Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel
 > > snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq
 > > snd_seq_device snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea
 > > i2c_algo_bit backlight r8169 mii coretemp snd_timer drm_kms_helper ttm
 > > snd drm i2c_core xhci_pci xhci_hcd soundcore evdev firmware_class hwmon
 > > hid_generic usbhid hid ehci_pci ehci_hcd sr_mod cdrom usbcore
 > > usb_common ipv6
 > > CPU: 0 PID: 163 Comm: kworker/0:2 Not tainted 4.1.0-rc2 #1
 > > Hardware name: System manufacturer System Product Name/P8Z77-V LE
 > > PLUS, BIOS 0403 05/08/2012
 > > Workqueue: events output_poll_execute [drm_kms_helper]
 > > task: 8806012b1590 ti: 88003796 task.ti: 88003796
 > > RIP: 0010:[]  []
 > > radeon_audio_detect+0x5b/0x150 [radeon]
 > > RSP: 0018:880037963c78  EFLAGS: 00010246
 > > RAX: 880600c92da0 RBX: 880600cbb000 RCX: 0001
 > > RDX:  RSI:  RDI: 880037a3f600
 > > RBP: 880600c92da0 R08: 0001 R09: 0050
 > > R10: 0001 R11: 880603001a80 R12: 0001
 > > R13: 880600c924e0 R14: 880601f84000 R15: 0001
 > > FS:  () GS:88061ec0()
 > > knlGS:
 > > CS:  0010 DS:  ES:  CR0: 80050033
 > > CR2: 0010 CR3: 01478000 CR4: 001407f0
 > > Stack:
 > >  880600cbb000 0001 0001 880601f84000
 > >  a03e7d70 a03157ea 880601f84000 0002
 > >  880600baa200 880600cbb050 880600cbb000 880600e33800
 > > Call Trace:
 > >  [] ? radeon_dvi_detect+0x35a/0x4d0 [radeon]
 > >  [] ?
 > > drm_helper_probe_single_connector_modes_merge_bits+0x2e6/0x490
 > > [drm_kms_helper]
 > >  [] ?
 > > drm_fb_helper_probe_connector_modes.isra.5+0x48/0x70
 > > [drm_kms_helper]
 > >  [] ? drm_fb_helper_hotplug_event+0x55/0xe0
 > > [drm_kms_helper]
 > >  [] ? output_poll_execute+0x7c/0x1a0 [drm_kms_helper]
 > >  [] ? process_one_work+0x130/0x360
 > >  [] ? worker_thread+0x114/0x460
 > >  [] ? __schedule+0x20d/0x660
 > >  [] ? rescuer_thread+0x2f0/0x2f0
 > >  [] ? kthread+0xbc/0xe0
 > >  [] ? kthread_create_on_node+0x170/0x170
 > >  [] ? ret_from_fork+0x42/0x70
 > >  [] ? kthread_create_on_node+0x170/0x170
 > > Code: 8b 45 00 4c 8b ad 58 01 00 00 4c 8b 70 28 49 8b 85 00 01 00 00 48 85 
 > > c0 74
 > > 30 41 83 fc 01 74 38 48 8b 70 10 49 8b 96 c8 24 00 00 <48> 8b 4a 10 48 85 
 > > c9 74
 > > 0e 31 d2 4c 89 f7 ff d1 49 8b 85 00 01
 > > RIP  [] radeon_audio_detect+0x5b/0x150 [radeon]
 > >  RSP 
 > > CR2: 0010
 > > ---[ end trace 5b99e3870bfc7a92 ]---
 > > BUG: unable to handle kernel paging request at ffd8
 > > IP: [] kthread_data+0x7/0x10
 > > PGD 1479067 PUD 147b067 PMD 0
 > > Oops:  [#2] SMP
 > > Modules linked in: af_packet snd_hda_codec_generic snd_hda_intel
 > > snd_hda_controller snd_hda_codec snd_hwdep snd_hda_core snd_seq
 > > snd_seq_device snd_pcm radeon cfbfillrect cfbimgblt cfbcopyarea
 > > i2c_algo_bit backlight r8169 mii coretemp snd_timer drm_kms_helper ttm
 > > snd drm i2c_core xhci_pci xhci_hcd soundcore evdev firmware_class hwmon
 > > hid_generic usbhid hid ehci_pci ehci_hcd sr_mod cdrom usbcore
 > > usb_common ipv6
 > > CPU: 0 PID: 163 Comm: kworker/0:2 Tainted: G  D 4.1.0-rc2 #1
 > > Hardware name: System manufacturer System Product Name/P8Z77-V LE
 > > PLUS, BIOS 0403 05/08/2012
 > > task: 8806012b1590 ti: 88003796 task.ti: 88003796
 > > RIP: 0010:[]  [] kthread_data+0x7/0x10
 > > RSP: 0018:880037963a60  EFLAGS: 00010002
 > > RAX:  RBX:  RCX: 73c2bc6e
 > > RDX:  RSI:  RDI: 880601

Re: [PATCH] arm: Blacklist gcc 4.8.[012] and 4.9.0 with CONFIG_FRAME_POINTER

2014-10-11 Thread Mikael Pettersson
Peter Hurley writes:
 > On 10/10/2014 12:36 PM, Russell King - ARM Linux wrote:
 > > On Fri, Oct 10, 2014 at 12:26:14PM -0400, Peter Hurley wrote:
 > >> gcc versions 4.8.[012] and 4.9.0 generate code that prematurely
 > >> adjusts the stack pointer such that still-to-be-referenced locals
 > >> are below the stack pointer, which allows them to be overwritten
 > >> by interrupts.
 > > 
 > > I would much rather do this in asm-offsets.c, along side the other ARM
 > > specific buggy compiler test(s).  I'm presently putting together such
 > > a patch.
 > > 
 > > The information in the thread on linux-omap says only GCC 4.8.1 and
 > > GCC 4.8.2.  Where do you get the other versions from?
 > 
 > The gcc PR linked in the commit message; see the "Known to fail" field.

The 4.8.0 release is broken, but the 4.9.0 one is not.  It's unfortunate,
but "4.9.0" may refer to "the 4.9.0 release" or to "some point after trunk
forked 4.8 branch up to and including the 4.9.0 release point".  In this
case, it's the latter -- this can be inferred from the fact that the
fix went into trunk in October 2013 while 4.9.0 was branched and released
during the first half of 2014.


Re: [PATCH] arm: Blacklist gcc 4.8.[012] and 4.9.0 with CONFIG_FRAME_POINTER

2014-10-12 Thread Mikael Pettersson
Peter Hurley writes:
 > On 10/11/2014 12:33 PM, Mikael Pettersson wrote:
 > > Peter Hurley writes:
 > >  > On 10/10/2014 12:36 PM, Russell King - ARM Linux wrote:
 > >  > > On Fri, Oct 10, 2014 at 12:26:14PM -0400, Peter Hurley wrote:
 > >  > >> gcc versions 4.8.[012] and 4.9.0 generate code that prematurely
 > >  > >> adjusts the stack pointer such that still-to-be-referenced locals
 > >  > >> are below the stack pointer, which allows them to be overwritten
 > >  > >> by interrupts.
 > >  > > 
 > >  > > I would much rather do this in asm-offsets.c, along side the other ARM
 > >  > > specific buggy compiler test(s).  I'm presently putting together such
 > >  > > a patch.
 > >  > > 
 > >  > > The information in the thread on linux-omap says only GCC 4.8.1 and
 > >  > > GCC 4.8.2.  Where do you get the other versions from?
 > >  > 
 > >  > The gcc PR linked in the commit message; see the "Known to fail" field.
 > > 
 > > The 4.8.0 release is broken, but the 4.9.0 one is not.  It's unfortunate,
 > > but "4.9.0" may refer to "the 4.9.0 release" or to "some point after trunk
 > > forked 4.8 branch up to and including the 4.9.0 release point".  In this
 > > case, it's the latter -- this can be inferred from the fact that the
 > > fix went into trunk in October 2013 while 4.9.0 was branched and released
 > > during the first half of 2014.
 > 
 > Is there a reasonably quick way to determine if a particular commit is
 > in a particular release of gcc?

If you want the process to be fully automatic and the answer to be
absolutely precise, then "no".

If you're willing to manually map GCC PR fixes to release versions,
and to have some false negatives (some GCCs having a certain fix
will be flagged as not having it), then "yes".

For this ARM bug, PR58854, we know that 4.8.[0-2] have the bug, but
4.7 and older, 4.8.3 and newer, and 4.9 and newer are Ok.

A problem is that a GCC that identifies itself as 4.8.3 may be
(a) a 4.8.3 pre-release (i.e., close to 4.8.2),
(b) a 4.8.3 release, or
(c) a 4.8.4 pre-release that's been patched to say 4.8.3 (Red Hat does this).

Case (a) may or may not have the fix (we can't easily(*) tell), but
cases (b) and (c) are Ok.  If you're willing to classify all three
as not having the fix (false negatives), then you want to test

#if (__GNUC__ == 4 && __GNUC_MINOR__ == 8 && __GNUC_PATCHLEVEL__ < 4)

for possibly broken versions.
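
As a sketch of how that test could be packaged (the macro and function
names here are hypothetical, and this is not the patch under discussion):

```c
/* Compile-time flag: 1 if the compiler building this file identifies
 * itself as GCC 4.8.x with a patchlevel below 4, i.e. the range treated
 * above as possibly affected by PR58854 (4.8.3 is included on purpose,
 * as a false negative). */
#if defined(__GNUC__) && \
    (__GNUC__ == 4 && __GNUC_MINOR__ == 8 && __GNUC_PATCHLEVEL__ < 4)
#define GCC_MAYBE_HAS_PR58854 1
#else
#define GCC_MAYBE_HAS_PR58854 0
#endif

/* Same predicate applied to explicit version numbers, which makes the
 * boundaries easy to check. */
int gcc_version_is_suspect(int major, int minor, int patchlevel)
{
    return major == 4 && minor == 8 && patchlevel < 4;
}

/* Nonzero if the compiler that built this file is in the suspect range. */
int this_compiler_is_suspect(void)
{
    return GCC_MAYBE_HAS_PR58854;
}
```

A build could then refuse to continue with something like
"#if GCC_MAYBE_HAS_PR58854 && defined(CONFIG_FRAME_POINTER)" followed by
an #error directive.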

A complication is that a bug has both starting and ending commits.
It's not uncommon for distros and others to backport changes, so a
compiler that claims to be e.g. 4.7.4 may include a backport of the
4.8 change that caused the bug you're trying to avoid.  There is no
easy way to detect this, unless you have a runtime test case for the
bug.  I'd ignore this case as "unfixable".

So I'd write the tests for vanilla upstream GCC only, and tell distro
users to complain to their distros if their kernels get miscompiled.

(*) __VERSION__ is defined like "4.8.3 20140515 (prerelease)" in
pre-releases but like "4.8.3" in ordinary releases, but this is not
something you can test for in the C preprocessor.  A configure-time
check could extract the date and compare that with the date the fix
went into that particular branch, but case (c) above makes detecting
pre-releases a bit more complicated.
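
The date extraction mentioned in the footnote could look roughly like
this.  It is only a sketch of a host-side configure helper: the function
names are hypothetical, and the cut-off date to compare against has to be
looked up per branch and is not given here.

```c
#include <ctype.h>
#include <string.h>

/* Return the first run of exactly 8 digits in a __VERSION__-style string
 * as a number (YYYYMMDD), or 0 if there is no such run. */
long version_snapshot_date(const char *v)
{
    while (*v != '\0') {
        long value = 0;
        int digits = 0;

        if (!isdigit((unsigned char)*v)) {
            v++;
            continue;
        }
        /* Consume the whole digit run, keeping at most 8 digits. */
        while (isdigit((unsigned char)*v)) {
            if (digits < 8)
                value = value * 10 + (*v - '0');
            digits++;
            v++;
        }
        if (digits == 8)
            return value;
    }
    return 0;
}

/* Nonzero if the string carries the "(prerelease)" marker. */
int version_is_prerelease(const char *v)
{
    return strstr(v, "(prerelease)") != NULL;
}
```

For "4.8.3 20140515 (prerelease)" the first function returns 20140515 and
the second returns nonzero; for a plain "4.8.3" both return 0.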

 > Starting from the mainline viewcvs revision page for this fix here,
 > https://gcc.gnu.org/viewcvs/gcc?view=revision=204203
 > (which is the link from the PR for the fix), navigation to anywhere
 > else in the gcc tree is impossible. I can't even look at the Changelog.

https://gcc.gnu.org/viewcvs/gcc/
then descend trunk or branches as needed.

/Mikael


Re: status of ia64 / hpsim

2015-01-05 Thread Mikael Pettersson
Tony Luck writes:
 > On Tue, Dec 30, 2014 at 7:50 AM, Christoph Hellwig wrote:
 > > Is the ia64 hpsim architecture still in use?  I noticed it because it
 > > has a fairly rudimentary SCSI driver under arch/ia64, which doesn't
 > > look very maintained.
 > 
 > Mikael was doing something with hpsim on the ski simulator back in Jan'14.
 > Was that something real, or just playing because it was there?

I was trying to set up an emulated platform for continuous GCC bootstrap and
regression testsuite runs, but something broke the ia64 kernel causing EXT4
file system errors in the emulated platform, so I had to scrap that idea.

I tried various ia64 kernel versions, compiling the ia64 kernel with older
GCCs, and compiling SKI with older host (x86_64) GCCs, but nothing worked.
With no known-good starting point there was no reasonable way for me to
debug the problem.

/Mikael


Re: fork on processes with lots of memory

2016-01-26 Thread Mikael Pettersson
Felix von Leitner writes:
 > > Dear Linux kernel devs,
 > 
 > > I talked to someone who uses large Linux based hardware to run a
 > > process with huge memory requirements (think 4 GB), and he told me that
 > > if they do a fork() syscall on that process, the whole system comes to
 > > standstill. And not just for a second or two. He said they measured a 45
 > > minute (!) delay before the system became responsive again.
 > 
 > I'm sorry, I meant 4 TB not 4 GB.
 > I'm not used to working with that kind of memory sizes.

Make sure you have >>4TB physical if you're going to fork from a process
with a 4TB virtual address space.  (I'm assuming it's not sparse, but all
actually being used.)

Disable transparent hugepages (THP).  The internal book-keeping mechanisms
have been known to run amok with large RAM sizes causing severe performance
issues.  Maybe 4.x kernels are better, I haven't checked.

If you're using explicit hugepages and these kinds of RAM sizes, don't
bother with RHEL 6 or 7 kernels -- they're broken.  Vanilla 4.x kernels work.

We're also in the TB range, though not quite 4TB, and fork()ing from inside
such processes definitely works for us.  We do disable THP since it kills us
otherwise.
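
The advice above does not say how THP gets disabled; system-wide it is
normally done through /sys/kernel/mm/transparent_hugepage/enabled.  As a
per-process alternative, and only as a sketch assuming Linux 3.15 or newer,
prctl(PR_SET_THP_DISABLE) can be used.  The helper names are hypothetical.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Ask the kernel not to use transparent hugepages for this process.
 * Returns 0 on success, -1 with errno set on failure. */
int disable_thp_for_this_process(void)
{
#ifdef PR_SET_THP_DISABLE
    return (int)prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL);
#else
    errno = ENOSYS;   /* headers too old to know the option */
    return -1;
#endif
}

/* Returns 1 if the "THP disable" flag is set, 0 if not, -1 on error. */
int thp_disabled_for_this_process(void)
{
#ifdef PR_GET_THP_DISABLE
    return (int)prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

According to prctl(2) the flag is inherited by children created with
fork() and preserved across execve(), so setting it early in the large
process also covers what it spawns.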

 > 
 > > Their working theory is that all the pages need to be marked copy-on-write
 > > in both processes, and if you touch one page, a copy needs to be made,
 > > and that just takes a while if you have a billion pages.
 > 
 > > I was wondering if there is any advice for such situations from the
 > > memory management people on this list.
 > 
 > > In this case the fork was for an execve afterwards, but I was going to
 > > recommend fork to them for something else that can not be tricked around
 > > with vfork.
 > 
 > > Can anyone comment on whether the 45 minute number sounds like it could
 > > be real? When I heard it, I was flabbergasted. But the other person
 > > swore it was real. Can a fork cause this much of a delay? Is there a way
 > > to work around it?
 > 
 > > I was going to recommend the fork to create a boundary between the
 > > processes, so that you can recover from memory corruption in one
 > > process. In fact, after the fork I would want to munmap almost all of
 > > the shared pages anyway, but there is no way to tell fork that.
 > 
 > > Thanks,
 > 
 > > Felix
 > 
 > > PS: Please put me on Cc if you reply, I'm not subscribed to this mailing
 > > list.

-- 


Re: 4.7-rc6, ext4, sparc64: Unable to handle kernel paging request at ...

2016-07-10 Thread Mikael Pettersson
Meelis Roos writes:
 > > > Just got this on bootup of my Sun T2000:
 > > >...
 > > > I have not seen it before, this includes 4.6.0 4.6.0-08907-g7639dad
 > > > 4.7.0-rc1-00094-g6b15d66 4.7.0-rc4-00014-g67016f6.
 > > >
 > > > It is not reproducible, did not appear on next reboot of the same
 > > > kernel.
 > > 
 > > mine T5120 boots ok 4.7.0-rc6, rootfs being on ext4 .
 > 
 > My T5120 and many other sparc64 machines also boot fine, most of them 
 > using ext4, others ext3 with ext4 driver.
 > 
 > However, I also got a very similar oops from T1000:
 > 
 > [   55.251101] Unable to handle kernel paging request at virtual address fe42a000
 > [   55.251348] tsk->{mm,active_mm}->context = 0083
 > [   55.251533] tsk->{mm,active_mm}->pgd = 8001f6224000
 > [   55.251719]   \|/  \|/
 >  "@'/ .. \`@"
 >  /_| \__/ |_\
 > \__U_/
 > [   55.252038] systemd-udevd(268): Oops [#1]
 > [   55.252274] CPU: 9 PID: 268 Comm: systemd-udevd Not tainted 4.7.0-rc6 #26
 > [   55.252367] task: 8001f6064380 ti: 8001f620c000 task.ti: 8001f620c000
 > [   55.252497] TSTATE: 000811001604 TPC: 00649380 TNPC: 00649384 Y: Not tainted
 > [   55.252651] TPC: <__radix_tree_lookup+0x60/0x1a0>
...

A few weeks ago I got a similar oops with 4.7.0-rc2 on a Sun Blade 2500
(dual USIIIi):

Jun 12 18:40:26 lauter kernel: Unable to handle kernel paging request at virtual address a000
Jun 12 18:40:26 lauter kernel: tsk->{mm,active_mm}->context = 17e3
Jun 12 18:40:26 lauter kernel: tsk->{mm,active_mm}->pgd = fff23edb8000
Jun 12 18:40:26 lauter kernel:   \|/  \|/
Jun 12 18:40:26 lauter kernel:   "@'/ .. \`@"
Jun 12 18:40:26 lauter kernel:   /_| \__/ |_\
Jun 12 18:40:26 lauter kernel:  \__U_/
Jun 12 18:40:26 lauter kernel: gnat1(19464): Oops [#1]
Jun 12 18:40:26 lauter kernel: CPU: 0 PID: 19464 Comm: gnat1 Not tainted 4.7.0-rc2 #1
Jun 12 18:40:26 lauter kernel: task: fff23ebd1440 ti: fff000123c36 task.ti: fff000123c36
Jun 12 18:40:27 lauter kernel: TSTATE: 11001604 TPC: 005db288 TNPC: 005db28c Y: Not tainted
Jun 12 18:40:27 lauter kernel: TPC: <__radix_tree_lookup+0x44/0xd4>
Jun 12 18:40:27 lauter kernel: g0: 3000 g1: a6d9 g2: 0001 g3:
Jun 12 18:40:27 lauter kernel: g4: fff23ebd1440 g5: fff23ef7a000 g6: fff000123c36 g7:
Jun 12 18:40:27 lauter kernel: o0: 000c o1: fff000123c363980 o2: fff000123c363988 o3: fff000123c363968
Jun 12 18:40:27 lauter kernel: o4: 0020 o5: fff23fffefc0 sp: fff000123c3630d1 ret_pc: fff232e42540
Jun 12 18:40:27 lauter kernel: RPC: <0xfff232e42540>
Jun 12 18:40:27 lauter kernel: l0: 024213ca l1:  l2:  l3:
Jun 12 18:40:27 lauter kernel: l4:  l5:  l6:  l7:
Jun 12 18:40:27 lauter kernel: i0: fff0001225e56900 i1: 0441 i2:  i3:
Jun 12 18:40:27 lauter kernel: i4: a6d8 i5: fff232e42540 i6: fff000123c363191 i7: 004bf680
Jun 12 18:40:27 lauter kernel: I7: <__do_page_cache_readahead+0x78/0x200>
Jun 12 18:40:27 lauter kernel: Call Trace:
Jun 12 18:40:27 lauter kernel:  [004bf680] __do_page_cache_readahead+0x78/0x200
Jun 12 18:40:27 lauter kernel:  [004b5990] filemap_fault+0x164/0x4c4
Jun 12 18:40:27 lauter kernel:  [00562a84] ext4_filemap_fault+0x1c/0x38
Jun 12 18:40:27 lauter kernel:  [004d2c38] __do_fault+0x58/0xdc
Jun 12 18:40:27 lauter kernel:  [004d611c] handle_mm_fault+0x604/0xe5c
Jun 12 18:40:27 lauter kernel:  [00448288] do_sparc64_fault+0x228/0x684
Jun 12 18:40:27 lauter kernel:  [00407bcc] sparc64_realfault_common+0x10/0x20
Jun 12 18:40:28 lauter kernel: Disabling lock debugging due to kernel taint
Jun 12 18:40:28 lauter kernel: Caller[004bf680]: __do_page_cache_readahead+0x78/0x200
Jun 12 18:40:28 lauter kernel: Caller[004b5990]: filemap_fault+0x164/0x4c4
Jun 12 18:40:28 lauter kernel: Caller[00562a84]: ext4_filemap_fault+0x1c/0x38
Jun 12 18:40:28 lauter kernel: Caller[004d2c38]: __do_fault+0x58/0xdc
Jun 12 18:40:28 lauter kernel: Caller[004d611c]: handle_mm_fault+0x604/0xe5c
Jun 12 18:40:28 lauter kernel: Caller[00448288]: do_sparc64_fault+0x228/0x684
Jun 12 18:40:28 lauter kernel: Caller[00407bcc]: sparc64_realfault_common+0x10/0x20
Jun 12 18:40:28 lauter kernel: Caller[006ee248]: ip_options_compile+0x288/0x60c
Jun 12 18:40:28 lauter kernel: Instruction DUMP: 80a06001  0267fff2  b8087ffe  83365001  8208603f  84006004  83287003  8528b003

It has only happened that one time, so far.


Re: SIGSYS annoyance

2016-06-10 Thread Mikael Pettersson
Andy Lutomirski writes:
 > On Mon, Jun 6, 2016 at 9:03 AM, Kees Cook wrote:
 > > On Fri, Jun 3, 2016 at 10:16 PM, Andy Lutomirski wrote:
 > >> https://bugzilla.mozilla.org/show_bug.cgi?id=1176099
 > >>
 > >> Should SIGSYS be delivered to the handler even if blocked?  What, if
 > >> anything, does POSIX say?  All I can find is in pthread_sigmask(3p):
 > >>
 > >> If any of the SIGFPE, SIGILL, SIGSEGV, or SIGBUS signals are generated
 > >> while they are blocked, the result is undefined, unless the signal was
 > >> generated by the action of another process, or by one of the functions
 > >> kill(), pthread_kill(), raise(), or sigqueue().
 > >>
 > >> It would be easy enough to change our behavior so that we deliver the
 > >> signal even if it's blocked or to at least add a flag so that users
 > >> can request that behavior.
 > >
 > > I had trouble following that bug. It sounded like glib just needed a
 > > way to define its signal mask, and that's what they ended up
 > > implementing?
 > >
 > > I think the current behavior is correct. SIGSYS is being generated by
 > > the running process (i.e. the seccomp filter) and if it has a handler
 > > but the signal is blocked, we should treat it as uncaught and kill. On
 > > the other hand, it could be seen like "raise", in which case the
 > > blocking should be ignored? Is there an active problem somewhere here?
 > > It seems like the referenced bug has been fixed already.
 > 
 > Agreed.
 > 
 > It could make sense to have a new sigaction flag SA_FORCE: when set,
 > if a non-default handler is installed, the signal is blocked, and the
 > signal is triggered synchronously (forced), then the handler will be
 > called.  But that isn't specific to seccomp.

Blocking a signal is a very deliberate act.  If some piece of code wants
to force-deliver it, it can unblock it first.  IOW, I don't see the need
for this SA_FORCE thing.


[4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson
I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
and resulted in:

[   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A

With 4.3-rc1 however the command fails and logs the following:

[   34.140300] 8250_base: module license 'unspecified' taints kernel.
[   34.141846] Disabling lock debugging due to kernel taint
[   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
[   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
[   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
[   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 0)
[   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
[   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
[   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)

Relevant .config fragments:

CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y

# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_SERIAL_8250=m
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
# CONFIG_SERIAL_8250_PNP is not set
# CONFIG_SERIAL_8250_PCI is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set
# CONFIG_SERIAL_8250_DW is not set
# CONFIG_SERIAL_8250_FINTEK is not set
# CONFIG_SERIAL_UARTLITE is not set
CONFIG_SERIAL_CORE=m
# CONFIG_SERIAL_JSM is not set
# CONFIG_SERIAL_SCCNXP is not set
# CONFIG_SERIAL_SC16IS7XX is not set
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
# CONFIG_SERIAL_ALTERA_UART is not set
# CONFIG_SERIAL_ARC is not set
# CONFIG_SERIAL_RP2 is not set
# CONFIG_SERIAL_FSL_LPUART is not set

/Mikael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson
Greg Kroah-Hartman writes:
 > On Mon, Sep 14, 2015 at 07:08:10PM +0200, Mikael Pettersson wrote:
 > > I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
 > > and resulted in:
 > > 
 > > [   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
 > > [   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 
 > > 115200) is a 16550A
 > > 
 > > With 4.3-rc1 however the command fails and logs the following:
 > > 
 > > [   34.140300] 8250_base: module license 'unspecified' taints kernel.
 > 
 > Oops, need to fix that.
 > 
 > > [   34.141846] Disabling lock debugging due to kernel taint
 > > [   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
 > > [   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 0)
 > > [   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
 > > [   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate (err 
 > > 0)
 > > [   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 0)
 > > [   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
 > > [   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
 > 
 > Are you sure you did 'modprobe' and not 'insmod'?

Yes, I used modprobe.  I double-checked.

 > Peter, care to send a module license fix for this new module you
 > created?

/Mikael


Re: [4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson
Greg Kroah-Hartman writes:
 > On Mon, Sep 14, 2015 at 08:06:21PM +0200, Mikael Pettersson wrote:
 > > Greg Kroah-Hartman writes:
 > >  > On Mon, Sep 14, 2015 at 07:08:10PM +0200, Mikael Pettersson wrote:
 > >  > > I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' worked
 > >  > > and resulted in:
 > >  > > 
 > >  > > [   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing 
 > > disabled
 > >  > > [   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 
 > > 115200) is a 16550A
 > >  > > 
 > >  > > With 4.3-rc1 however the command fails and logs the following:
 > >  > > 
 > >  > > [   34.140300] 8250_base: module license 'unspecified' taints kernel.
 > >  > 
 > >  > Oops, need to fix that.
 > >  > 
 > >  > > [   34.141846] Disabling lock debugging due to kernel taint
 > >  > > [   34.143388] 8250_base: Unknown symbol uart_insert_char (err 0)
 > >  > > [   34.144908] 8250_base: Unknown symbol uart_handle_dcd_change (err 
 > > 0)
 > >  > > [   34.146439] 8250_base: Unknown symbol __pm_runtime_resume (err 0)
 > >  > > [   34.147901] 8250_base: Unknown symbol tty_termios_encode_baud_rate 
 > > (err 0)
 > >  > > [   34.149354] 8250_base: Unknown symbol uart_handle_cts_change (err 
 > > 0)
 > >  > > [   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend (err 0)
 > >  > > [   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
 > >  > 
 > >  > Are you sure you did 'modprobe' and not 'insmod'?
 > > 
 > > Yes, I used modprobe.  I double-checked.
 > 
 > Then your build should have failed if these functions are not being
 > exported properly by your .config.  Most of these are in the serial_core
 > module, is that present/loaded?

Yes, serial_core is loaded.

uart_insert_char is EXPORT_SYMBOL_GPL, so could the missing license tag be
preventing 8250_base from binding to it?  (I haven't checked the other symbols
but I assume they are also _GPL.)
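
To make the hypothesis concrete, here is a toy user-space model of the rule
being suspected: a GPL-only export is treated as not found for a module that
does not declare a GPL-compatible license, which would show up as "Unknown
symbol". The struct, table and function names (toy_export, toy_exports,
toy_resolve, toy_plain_symbol) are invented for this sketch; it is not the
kernel's module loader code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified export entry; not the kernel's data structure. */
struct toy_export {
	const char *name;
	bool gpl_only;	/* true for an EXPORT_SYMBOL_GPL-style export */
};

static const struct toy_export toy_exports[] = {
	{ "uart_insert_char", true },	/* GPL-only, as stated in the mail */
	{ "toy_plain_symbol", false },	/* made-up non-GPL-only export */
};

/*
 * Return true if a module may bind to 'name'.  A GPL-only export is
 * reported as unresolved unless the module declares a GPL-compatible
 * license; a name that is not exported at all is always unresolved.
 */
static bool toy_resolve(const char *name, bool module_is_gpl)
{
	size_t i;

	for (i = 0; i < sizeof(toy_exports) / sizeof(toy_exports[0]); i++) {
		if (strcmp(toy_exports[i].name, name) != 0)
			continue;
		return !toy_exports[i].gpl_only || module_is_gpl;
	}
	return false;
}
```

Under this model toy_resolve("uart_insert_char", false) fails while the same
lookup with module_is_gpl set succeeds, which is the behaviour the question
above is asking about.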


Re: [4.3-rc1 regression] modular 8250 doesn't load

2015-09-14 Thread Mikael Pettersson
Greg Kroah-Hartman writes:
 > On Mon, Sep 14, 2015 at 02:12:43PM -0700, Greg Kroah-Hartman wrote:
 > > On Mon, Sep 14, 2015 at 10:42:24PM +0200, Mikael Pettersson wrote:
 > > > Greg Kroah-Hartman writes:
 > > >  > On Mon, Sep 14, 2015 at 08:06:21PM +0200, Mikael Pettersson wrote:
 > > >  > > Greg Kroah-Hartman writes:
 > > >  > >  > On Mon, Sep 14, 2015 at 07:08:10PM +0200, Mikael Pettersson 
 > > > wrote:
 > > >  > >  > > I have CONFIG_SERIAL_8250=m.  With 4.2 '/sbin/modprobe 8250' 
 > > > worked
 > > >  > >  > > and resulted in:
 > > >  > >  > > 
 > > >  > >  > > [   41.354550] Serial: 8250/16550 driver, 4 ports, IRQ sharing 
 > > > disabled
 > > >  > >  > > [   41.375156] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, 
 > > > base_baud = 115200) is a 16550A
 > > >  > >  > > 
 > > >  > >  > > With 4.3-rc1 however the command fails and logs the following:
 > > >  > >  > > 
 > > >  > >  > > [   34.140300] 8250_base: module license 'unspecified' taints 
 > > > kernel.
 > > >  > >  > 
 > > >  > >  > Oops, need to fix that.
 > > >  > >  > 
 > > >  > >  > > [   34.141846] Disabling lock debugging due to kernel taint
 > > >  > >  > > [   34.143388] 8250_base: Unknown symbol uart_insert_char (err 
 > > > 0)
 > > >  > >  > > [   34.144908] 8250_base: Unknown symbol 
 > > > uart_handle_dcd_change (err 0)
 > > >  > >  > > [   34.146439] 8250_base: Unknown symbol __pm_runtime_resume 
 > > > (err 0)
 > > >  > >  > > [   34.147901] 8250_base: Unknown symbol 
 > > > tty_termios_encode_baud_rate (err 0)
 > > >  > >  > > [   34.149354] 8250_base: Unknown symbol 
 > > > uart_handle_cts_change (err 0)
 > > >  > >  > > [   34.150798] 8250_base: Unknown symbol __pm_runtime_suspend 
 > > > (err 0)
 > > >  > >  > > [   34.152240] 8250_base: Unknown symbol nr_irqs (err 0)
 > > >  > >  > 
 > > >  > >  > Are you sure you did 'modprobe' and not 'insmod'?
 > > >  > > 
 > > >  > > Yes, I used modprobe.  I double-checked.
 > > >  > 
 > > >  > Then your build should have failed if these functions are not being
 > > >  > exported properly by your .config.  Most of these are in the 
 > > > serial_core
 > > >  > module, is that present/loaded?
 > > > 
 > > > Yes, serial_core is loaded.
 > > > 
 > > > uart_insert_char is EXPORT_SYMBOL_GPL, so could the missing license tag 
 > > > be preventing
 > > > 8250_core from binding to it?  (I haven't checked the other symbols but 
 > > > I assume they
 > > > are also _GPL.)
 > > 
 > > Ah, crap, yes, you are right.  You can test this with a simple:
 > >MODULE_LICENSE("GPL");
 > > line added to the 8250_base file.
 > 
 > Wait, 8250_base.c has a module license line.
 > 
 > Can you provide a full .config file?
 > 
 > thanks,
 > 
 > greg k-h

Here it is:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx 
-fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 
-fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
C

Re: [PATCH] lib: One less subtraction in binary search iterations.

2013-07-09 Thread Mikael Pettersson
Vineet Gupta writes:
 > On 07/09/2013 09:21 AM, Wedson Almeida Filho wrote:
 > > On Sat, Jul 6, 2013 at 9:59 PM, Joe Perches wrote:
 > >>
 > >> Not correct.
 > >>
 > >>>   while (start < end) {
 > >>> - size_t mid = start + (end - start) / 2;
 > >>> + size_t mid = (start + end) / 2;
 > >>
 > >> size_t start = 0x8000;
 > >> size_t end   = 0x8001;
 > > 
 > > Good point, they aren't equivalent in all cases.
 > > 
 > > For the overflow to happen though, we need an array with at least
 > > N/2+1 entries, where N is the address space size. The array wouldn't
 > > fit in addressable memory if the element size is greater than 1, so
 > > this can only really happen when the element size is 1. Even then, it
 > > would require the kernel range to be greater than half of all
 > > addressable memory, and allow an allocation taking that much memory. I
 > > don't know all architectures where linux runs, but I don't think such
 > > configuration is likely to exist.
 > > 
 > 
 > It does. In the ARC port (arch/arc), the untranslated address space starts at
 > 0x8000_ and this is where the kernel is linked. So all ARC kernel addresses
 > (code/data) lie in that range. This means you don't need a special corner
 > case for this to trip on ARC - it will break right away - unless I'm missing
 > something.

start and end aren't addresses but array indices relative to 'base'.
So even on ARC you should be safe, as long as no array has SIZE_MAX/2
or more elements.

I'm however far from convinced this micro-optimization is worth the
obvious source code quality reduction.  Surely the eliminated subtraction
is in the noise compared to the multiplies, indirect function calls,
and memory dereferences (in the cmp functions)?

It should be possible to eliminate the multiplies, since no array can
cross the -1/0 address boundary.  But even that is questionable: does
anyone have perf data showing that bsearch performance is a problem?

