We have two OpenAFS servers running 1.4.14, and many clients that we just
switched over to 1.6.1pre1. Earlier today, the clients started hitting kernel
NULL pointer dereferences, which have been completely hosing them. The
client machines hang on any call that touches AFS, whether it's "ls /",
"ls /afs", "klist", etc. A "vos changeaddr" was also done earlier today,
whereby a large collection (4000) of volumes was mistakenly assigned to
another server. This was corrected with "vos syncvldb" followed by "vos
syncserv". I mention it here because it's the only change we've made to the
AFS cluster today.

Here's what we found in the syslog:

Apr 20 01:30:43 SERVER kernel: [12861236.027818] BUG: unable to handle
kernel NULL pointer dereference at 0000000000000028
Apr 20 01:30:43 SERVER kernel: [12861236.027836] IP: [<ffffffffa0048087>]
afs_Conn+0x1e7/0x260 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.027868] PGD 0
Apr 20 01:30:43 SERVER kernel: [12861236.027874] Oops: 0000 [#1] SMP
Apr 20 01:30:43 SERVER kernel: [12861236.027882] CPU 6
Apr 20 01:30:43 SERVER kernel: [12861236.027885] Modules linked in:
openafs(P) isofs acpiphp
Apr 20 01:30:43 SERVER kernel: [12861236.027897]
Apr 20 01:30:43 SERVER kernel: [12861236.027902] Pid: 1568, comm: apache2
Tainted: P           O 3.2.0-23-virtual #36-Ubuntu
Apr 20 01:30:43 SERVER kernel: [12861236.027912] RIP:
e030:[<ffffffffa0048087>]  [<ffffffffa0048087>] afs_Conn+0x1e7/0x260
[openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.027936] RSP: e02b:ffff88017f417808
 EFLAGS: 00010282
Apr 20 01:30:43 SERVER kernel: [12861236.027942] RAX: ffffc9000188dbe0 RBX:
0000000000000000 RCX: 000000000000581b
Apr 20 01:30:43 SERVER kernel: [12861236.027950] RDX: ffff8801b112a000 RSI:
0000000000000001 RDI: ffff88017f761680
Apr 20 01:30:43 SERVER kernel: [12861236.027957] RBP: ffff88017f417858 R08:
0000000000000000 R09: 0000000000000000
Apr 20 01:30:43 SERVER kernel: [12861236.027964] R10: 0000000000000002 R11:
0000000000000000 R12: ffff880184756f48
Apr 20 01:30:43 SERVER kernel: [12861236.027971] R13: ffff88017f417a20 R14:
0000000000000004 R15: ffff88017f4178f0
Apr 20 01:30:43 SERVER kernel: [12861236.027983] FS:
 00007f1f6ae2f700(0000) GS:ffff8801bff73000(0000) knlGS:0000000000000000
Apr 20 01:30:43 SERVER kernel: [12861236.027991] CS:  e033 DS: 0000 ES:
0000 CR0: 000000008005003b
Apr 20 01:30:43 SERVER kernel: [12861236.027998] CR2: 0000000000000028 CR3:
0000000181465000 CR4: 0000000000002660
Apr 20 01:30:43 SERVER kernel: [12861236.028006] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Apr 20 01:30:43 SERVER kernel: [12861236.028013] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Apr 20 01:30:43 SERVER kernel: [12861236.028021] Process apache2 (pid:
1568, threadinfo ffff88017f416000, task ffff88017f41adc0)
Apr 20 01:30:43 SERVER kernel: [12861236.028028] Stack:
Apr 20 01:30:43 SERVER kernel: [12861236.028032]  000000004e2a6741
0000000000000000 0000000000000000 000000004f90bc43
Apr 20 01:30:43 SERVER kernel: [12861236.028046]  000000000001584a
ffff880184756cc0 ffff88017f41adc0 ffff88017f417a20
Apr 20 01:30:43 SERVER kernel: [12861236.028059]  ffff880184756f48
ffff880184756cc0 ffff88017f417928 ffffffffa0068658
Apr 20 01:30:43 SERVER kernel: [12861236.028072] Call Trace:
Apr 20 01:30:43 SERVER kernel: [12861236.028092]  [<ffffffffa0068658>]
afs_FetchStatus+0x58/0x450 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028113]  [<ffffffffa004672b>] ?
afs_GetCellStale+0x3b/0x60 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028134]  [<ffffffffa0046a25>] ?
afs_IsPrimaryCell+0x25/0x40 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028157]  [<ffffffffa0082b80>] ?
afs_GetVolume+0x40/0x1d0 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028179]  [<ffffffffa006ae8d>]
afs_GetVCache+0x26d/0x5d0 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028200]  [<ffffffffa006b343>]
afs_VerifyVCache2+0x153/0x200 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028222]  [<ffffffffa006ccec>]
afs_getattr+0x29c/0x350 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028242]  [<ffffffffa009340f>]
afs_linux_dentry_revalidate+0x39f/0x470 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028265]  [<ffffffffa006bf43>] ?
afs_AccessOK+0x113/0x1e0 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028279]  [<ffffffff816552de>] ?
_raw_spin_lock+0xe/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028290]  [<ffffffff811818eb>]
do_lookup+0x18b/0x310
Apr 20 01:30:43 SERVER kernel: [12861236.028298]  [<ffffffff8129885c>] ?
security_inode_permission+0x1c/0x30
Apr 20 01:30:43 SERVER kernel: [12861236.028306]  [<ffffffff81182268>]
link_path_walk+0x138/0x870
Apr 20 01:30:43 SERVER kernel: [12861236.028313]  [<ffffffff811834ad>] ?
path_init+0x2ed/0x3c0
Apr 20 01:30:43 SERVER kernel: [12861236.028319]  [<ffffffff811835d8>]
path_lookupat+0x58/0x750
Apr 20 01:30:43 SERVER kernel: [12861236.028339]  [<ffffffffa006cb3c>] ?
afs_getattr+0xec/0x350 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028348]  [<ffffffff810067be>] ?
xen_pmd_val+0xe/0x10
Apr 20 01:30:43 SERVER kernel: [12861236.028355]  [<ffffffff81183d01>]
do_path_lookup+0x31/0xc0
Apr 20 01:30:43 SERVER kernel: [12861236.028362]  [<ffffffff81184809>]
user_path_at_empty+0x59/0xa0
Apr 20 01:30:43 SERVER kernel: [12861236.028369]  [<ffffffff8100aa32>] ?
check_events+0x12/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028377]  [<ffffffff8100a25d>] ?
xen_force_evtchn_callback+0xd/0x10
Apr 20 01:30:43 SERVER kernel: [12861236.028384]  [<ffffffff81184861>]
user_path_at+0x11/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028391]  [<ffffffff8117995a>]
vfs_fstatat+0x3a/0x70
Apr 20 01:30:43 SERVER kernel: [12861236.028398]  [<ffffffff8100aa1f>] ?
xen_restore_fl_direct_reloc+0x4/0x4
Apr 20 01:30:43 SERVER kernel: [12861236.028405]  [<ffffffff8100465d>] ?
xen_clts+0x8d/0x190
Apr 20 01:30:43 SERVER kernel: [12861236.028412]  [<ffffffff811799ae>]
vfs_lstat+0x1e/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028418]  [<ffffffff81179b4a>]
sys_newlstat+0x1a/0x40
Apr 20 01:30:43 SERVER kernel: [12861236.028427]  [<ffffffff810146e1>] ?
math_state_restore+0x51/0x80
Apr 20 01:30:43 SERVER kernel: [12861236.028435]  [<ffffffff816562fe>] ?
do_device_not_available+0xe/0x10
Apr 20 01:30:43 SERVER kernel: [12861236.028445]  [<ffffffff8165f8cb>] ?
device_not_available+0x1b/0x20
Apr 20 01:30:43 SERVER kernel: [12861236.028452]  [<ffffffff8165d8c2>]
system_call_fastpath+0x16/0x1b
Apr 20 01:30:43 SERVER kernel: [12861236.028458] Code: 89 ef 48 89 45 c8 e8
39 c4 01 00 48 8b 45 c8 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 85
ff 0f 84 95 fe ff ff 48 8b 5f 58 <f6> 43 28 20 0f 85 87 fe ff ff
41 80 7d 12 00 7e 29 41 80 7d 13
Apr 20 01:30:43 SERVER kernel: [12861236.028543] RIP  [<ffffffffa0048087>]
afs_Conn+0x1e7/0x260 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.028563]  RSP <ffff88017f417808>
Apr 20 01:30:43 SERVER kernel: [12861236.028568] CR2: 0000000000000028
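
For anyone comparing traces across several affected clients, here is a rough
sketch of how the key fields of an oops like the one above could be pulled out
of syslog. It is only an illustration: the function name summarize_oops and
the SAMPLE string are made up for this example, and the sample lines are
copied from the trace above with the email line wrapping undone. In this
trace, a fault address of 0x28 together with RBX being zero is consistent
with a read at offset 0x28 through a NULL structure pointer inside afs_Conn,
though that reading is an interpretation rather than a confirmed diagnosis.

```python
import re

# A few lines copied from the trace above, with the email wrapping undone.
SAMPLE = """\
Apr 20 01:30:43 SERVER kernel: [12861236.027818] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Apr 20 01:30:43 SERVER kernel: [12861236.027836] IP: [<ffffffffa0048087>] afs_Conn+0x1e7/0x260 [openafs]
Apr 20 01:30:43 SERVER kernel: [12861236.027902] Pid: 1568, comm: apache2 Tainted: P           O 3.2.0-23-virtual #36-Ubuntu
Apr 20 01:30:43 SERVER kernel: [12861236.027942] RAX: ffffc9000188dbe0 RBX: 0000000000000000 RCX: 000000000000581b
"""


def summarize_oops(text):
    """Extract a few headline fields from an unwrapped kernel oops (hypothetical helper)."""
    summary = {}

    # Address the kernel failed to access (also reported as CR2).
    m = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    if m:
        summary["fault_addr"] = int(m.group(1), 16)

    # Faulting instruction: symbol+offset/size, plus the module name if present.
    m = re.search(
        r"IP: \[<[0-9a-f]+>\]\s+(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)(?: \[(\w+)\])?",
        text,
    )
    if m:
        summary["function"] = m.group(1)
        summary["offset"] = int(m.group(2), 16)
        summary["module"] = m.group(4)

    # Process that was running when the oops fired.
    m = re.search(r"comm: (\S+)", text)
    if m:
        summary["comm"] = m.group(1)

    # RBX is the register dereferenced by the faulting instruction in this trace.
    m = re.search(r"\bRBX: ([0-9a-f]+)", text)
    if m:
        summary["rbx"] = int(m.group(1), 16)

    return summary


summary = summarize_oops(SAMPLE)
for key, value in summary.items():
    print(key, value)
```

Running the same function over the syslog of each affected client would show
quickly whether they all fault at the same offset in afs_Conn.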
