Hello,

we are running OpenAFS 1.4.5 on a zLinux system (Novell SLES-9
distribution) and we see afs_cachetrim failing about once a week, at
night, like this:

Feb 18 01:02:49 mclinx kernel: openafs: dcache hvkernel BUG at
/usr/src/packages/BUILD/openafs-1.4.5/obj/s390/src/libafs/MODLOAD-2.6
.5-7.287.3-s390x-MP/afs_dcache.c:719!
Feb 18 01:02:49 mclinx kernel: illegal operation: 0001 [#1]
Feb 18 01:02:49 mclinx kernel: CPU:    1    Tainted: PF  U
(2.6.5-7.287.3-s390x SLES9_SP3_BRANCH-20071002073136)
Feb 18 01:02:49 mclinx kernel: Process afs_cachetrim (pid: 6092, task:
00000003d4bf9000, ksp: 00000003e1d437b0)
Feb 18 01:02:49 mclinx kernel: Krnl PSW : 0700000180000000 00000003fe45a3c2
(afs_HashOutDCache+0x25a/0x284 [libafs])
Feb 18 01:02:49 mclinx kernel: Krnl GPRS: 00000000000000f0 0000000000507fc0
0000000000000079 00000003e1d43970
Feb 18 01:02:49 mclinx kernel:            00000003fe45a3c0 000000000002740c
00000003fe508670 0000000000000000
Feb 18 01:02:49 mclinx kernel:            0000000300000001 00000003fe4f1ff8
00000000000000c8 00000003feaae9c0
Feb 18 01:02:49 mclinx kernel:            00000003fe445000 00000003fe4c8e50
00000003fe45a3c0 00000003e1d43a70
Feb 18 01:02:49 mclinx kernel: Krnl Code: 00 00 e3 10 b0 a6 00 90 a5 1b 00
02 42 10 b0 a6 a7 18 00 00
Feb 18 01:02:49 mclinx kernel: Call Trace:
Feb 18 01:02:49 mclinx kernel:  [<00000003fe45ac9e>]
afs_GetDownD+0x79a/0x960 [libafs]
Feb 18 01:02:49 mclinx kernel:  [<00000003fe45ea02>]
afs_CacheTruncateDaemon+0x196/0x5e8 [libafs]
Feb 18 01:02:49 mclinx kernel:  [<00000003fe4bf7a4>]
afsd_thread+0x3c8/0x8e8 [libafs]
Feb 18 01:02:49 mclinx kernel:  [<0000000000108b60>]
kernel_thread_starter+0x14/0x1c
Feb 18 01:02:49 mclinx kernel:

Unfortunately, I have no idea how to reproduce this situation and we can't
predict when it will happen next.
Looking at the source code we can spot the problem here:

        /* remove entry from *other* hash chain */
        i = DVHash(&adc->f.fid);
        us = afs_dvhashTbl[i];
        if (us == adc->index) {
            /* first dude in the list */
            afs_dvhashTbl[i] = afs_dvnextTbl[adc->index];
        } else {
            /* somewhere on the chain */
            while (us != NULLIDX) {
                if (afs_dvnextTbl[us] == adc->index) {
                    /* found item pointing at the one to delete */
                    afs_dvnextTbl[us] = afs_dvnextTbl[adc->index];
                    break;
                }
                us = afs_dvnextTbl[us];
            }
            if (us == NULLIDX)
                osi_Panic("dcache hv");  /* line 719, as shown in the panic message */
        }

Well, I don't know what DVHash is, but it looks like the chain that
starts at afs_dvhashTbl[DVHash(&adc->f.fid)] is either empty or is
walked to its end (NULLIDX) without finding an entry that points at
adc->index, and afs_HashOutDCache() doesn't like this at all.
My question is: under which circumstances does the cachetrim thread get
into this situation? Is there really no other way to handle it than
calling osi_Panic()?
The cachetrim crash is painful for us because we have to restart
the whole system whenever we hit it.
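
To make the failure condition concrete, here is a minimal user-space
sketch of the same unlink logic. It is not OpenAFS code: the table sizes
and the names hashTbl, nextTbl, init_tables, hash_in and hash_out are
made up for illustration, and hash_out simply returns -1 at the point
where the excerpt above calls osi_Panic(). Whether returning an error
there would be safe inside the real cache manager is something I cannot
judge.

```c
#include <assert.h>

#define NULLIDX  (-1)   /* end-of-chain marker, as in the excerpt */
#define NBUCKETS 4      /* hypothetical table sizes for this sketch */
#define NENTRIES 8

static int hashTbl[NBUCKETS];  /* stands in for afs_dvhashTbl */
static int nextTbl[NENTRIES];  /* stands in for afs_dvnextTbl */

/* Mark every bucket and every next-pointer as empty. */
static void init_tables(void)
{
    int i;
    for (i = 0; i < NBUCKETS; i++)
        hashTbl[i] = NULLIDX;
    for (i = 0; i < NENTRIES; i++)
        nextTbl[i] = NULLIDX;
}

/* Push entry `index` onto the front of the chain for `bucket`. */
static void hash_in(int bucket, int index)
{
    assert(bucket >= 0 && bucket < NBUCKETS);
    assert(index >= 0 && index < NENTRIES);
    nextTbl[index] = hashTbl[bucket];
    hashTbl[bucket] = index;
}

/*
 * Unlink entry `index` from the chain for `bucket`.
 * Returns 0 on success, -1 if the entry is not on that chain
 * (the case in which the excerpt panics with "dcache hv").
 */
static int hash_out(int bucket, int index)
{
    int us;

    assert(bucket >= 0 && bucket < NBUCKETS);
    assert(index >= 0 && index < NENTRIES);

    us = hashTbl[bucket];
    if (us == index) {
        /* first entry in the list */
        hashTbl[bucket] = nextTbl[index];
        return 0;
    }
    /* somewhere further down the chain, or not there at all */
    while (us != NULLIDX) {
        if (nextTbl[us] == index) {
            /* found the entry pointing at the one to delete */
            nextTbl[us] = nextTbl[index];
            return 0;
        }
        us = nextTbl[us];
    }
    return -1;  /* chain was empty or exhausted without a match */
}
```

In this sketch the -1 case is reached when an entry is removed twice,
when the bucket is empty, or when the bucket computed for the entry is
not the one it was originally inserted into. I do not know which of
these, if any, corresponds to what happens on our system.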

With kind regards,

Carsten Jacobi (*120-4468)
Firmware Development in Böblingen

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
