Re: sun4v oops

2007-08-22 Thread David Miller
From: Tom "spot" Callaway [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 16:31:58 -0400

> Got this oops with the attached config very quickly after init, threw me
> back to the prom. Firmware is fully updated on the box, latest SILO too.
>
> Starting udev: OOPS: Bogus kernel PC [] in fault handler
> OOPS: RPC [00477aa4]
> OOPS: Fault was to vaddr[f7f22000]

Something jumped to address zero.

This happened at address 0x477aa4, find out where that is.

Also, if you're using gcc-4.2.x for kernel builds, don't.
There is a miscompile, that others have run into, which
I have tracked down and am trying to build a test case
for so the gcc folks can look at and hopefully fix it.

I don't anticipate really making a lot of progress on the
gcc-4.2.x bug until way after Kernel Summit in September.
-
To unsubscribe from this list: send the line unsubscribe sparclinux in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sun4v oops

2007-08-22 Thread Tom "spot" Callaway

On Wed, 2007-08-22 at 13:39 -0700, David Miller wrote:
> From: Tom "spot" Callaway [EMAIL PROTECTED]
> Date: Wed, 22 Aug 2007 16:31:58 -0400
>
> > Got this oops with the attached config very quickly after init, threw me
> > back to the prom. Firmware is fully updated on the box, latest SILO too.
> >
> > Starting udev: OOPS: Bogus kernel PC [] in fault handler
> > OOPS: RPC [00477aa4]
> > OOPS: Fault was to vaddr[f7f22000]
>
> Something jumped to address zero.
>
> This happened at address 0x477aa4, find out where that is.
>
> Also, if you're using gcc-4.2.x for kernel builds, don't.
> There is a miscompile, that others have run into, which
> I have tracked down and am trying to build a test case
> for so the gcc folks can look at and hopefully fix it.

I'm not. Fedora is still on gcc 4.1.

System.map says:

004778bc t rcu_start_batch
00477928 t __rcu_process_callbacks
00477b98 t rcu_process_callbacks
00477bd4 t rcu_barrier_callback
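
For reference, the RPC value from the oops can be matched against these entries by taking the last symbol whose address is at or below it. The following is only a minimal sketch in Python, assuming the four lines above are adjacent entries in the full System.map; the helper names `parse_system_map` and `resolve` are made up for illustration and are not part of any kernel tool.

```python
import bisect

# Entries copied from the System.map excerpt above.
SYSTEM_MAP = """\
004778bc t rcu_start_batch
00477928 t __rcu_process_callbacks
00477b98 t rcu_process_callbacks
00477bd4 t rcu_barrier_callback
"""


def parse_system_map(text):
    """Return a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = fields
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Find the last symbol at or below addr; return (name, offset) or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr lies below the first known symbol
    base, name = symbols[i]
    return name, addr - base


symbols = parse_system_map(SYSTEM_MAP)
name, offset = resolve(symbols, 0x477aa4)  # RPC value from the oops
print(f"{name}+{offset:#x}")  # prints "__rcu_process_callbacks+0x17c"
```

With these entries, 0x477aa4 lands 0x17c bytes into __rcu_process_callbacks, in the same symbol+offset form the kernel uses in its call traces.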

Not sure how useful that is. I accidentally rebooted into this kernel,
and it booted all the way to login this time, but I noticed the
following in dmesg:

ldc.c:v1.0 (June 25, 2007)
ldc: Domaining disabled.
NET: Registered protocol family 16
VIO: Adding device channel-devices
VIO: Adding device vldc-port-0-0
VIO: Adding device vldc-port-0-1
VIO: Adding device vldc-port-0-2
VIO: Adding device vldc-port-1-0
VIO: Adding device vldc-port-3-0
VIO: Adding device vldc-port-3-8
VIO: Adding device ds-0
VIO: Adding device ds-1
VIO: Adding device ds-0
kobject_add failed for ds-0 with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
 [005536e8] kobject_shadow_add+0x1a4/0x1e8
 [0055373c] kobject_add+0x10/0x1c
 [005b60a8] device_add+0x88/0x588
 [005b65bc] device_register+0x14/0x20
 [0044fd58] vio_create_one+0x408/0x45c
 [0044fdc8] vio_add+0x1c/0x24
 [0043c340] mdesc_register_notifier+0x40/0x78
 [007e2c00] vio_init+0x17c/0x19c
 [007d6390] kernel_init+0x228/0x3c8
 [0042783c] kernel_thread+0x38/0x48
 [00685e74] rest_init+0x18/0x64
VIO: Could not register device ds-0, err=-17
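
The err=-17 here is the same -EEXIST that kobject_add complained about. As a quick cross-check, a sketch in Python (assuming it is run on a Linux host, since errno numbering is platform-specific):

```python
import errno
import os

err = -17  # value from the "VIO: Could not register device ds-0" line
code = -err  # the kernel reports errors as negated errno values

# errno.errorcode maps the number back to its symbolic name;
# os.strerror gives the human-readable description.
print(errno.errorcode[code], "-", os.strerror(code))
```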

One last item: This kernel loads the e1000 driver fine, but while it
detects the devices, it doesn't think there is link on the eth0 port (or
any of eth0-3 for that matter). There is definitely a good link there,
but it won't pull an IP over dhcp.

bash-3.1# dmesg |grep e1000
e1000: 0001:07:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:14:4f:1d:9f:4e
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: 0001:07:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:14:4f:1d:9f:4f
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: :04:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:14:4f:1d:9f:4c
e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: :04:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:14:4f:1d:9f:4d
e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection

lspci says:

:04:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
:04:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0001:07:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0001:07:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)

Is this the correct place to report that regression, or should I go to
LKML or Intel?

Thanks,

~spot



Re: sun4v oops

2007-08-22 Thread David Miller
From: Tom "spot" Callaway [EMAIL PROTECTED]
Date: Wed, 22 Aug 2007 17:44:41 -0400

> I'm not. Fedora is still on gcc 4.1.
>
> System.map says:
>
> 004778bc t rcu_start_batch
> 00477928 t __rcu_process_callbacks
> 00477b98 t rcu_process_callbacks
> 00477bd4 t rcu_barrier_callback
>
> Not sure how useful that is. I accidentally rebooted into this kernel,
> and it booted all the way to login this time,

Strange.

> but I noticed the following in dmesg:

Those kobject_add messages are known and I'm working on a
solution to that issue.

> One last item: This kernel loads the e1000 driver fine, but while it
> detects the devices, it doesn't think there is link on the eth0 port (or
> any of eth0-3 for that matter). There is definitely a good link there,
> but it won't pull an IP over dhcp.

Works perfectly fine here on every Niagara I own.

You'll need to dig deep into this yourself since I suspect
you're going to be the only person who can reproduce this
and it's going to be inefficient to debug something like
this remotely.