Hi everyone-
   After further investigation of this problem, I tracked this to an assignment 
in the function sym_flex in src/export/symtab.c.  Here is the function:


/*
 * sym_flex -   convert a symbol so that there is no distinction between
 *              flex-string and non flex-string format.
 *
 * Input:
 *      sym     -       ^ to symbol table entry
 *
 * Returns:
 *      ^ to static location containing modified symbol.
 */
static sym_t *
sym_flex(sym_t *sym)
{
    static sym_t symbol;
    static char name[48];

    strncpy(name, sym_str(sym), sizeof(name) - 1);

#ifndef __XCOFF64__
    if (sym->n_zeroes != 0)
        name[8] = 0;            /* make sure that we truncate correctly */
    symbol.n_zeroes = 0;
#endif

    symbol = *sym;
    symbol.n_nptr = name;

    return &symbol;
}

I'm not entirely sure why the backtrace doesn't show sym_flex as the culprit, 
but I added printf statements all over the place and found that we panic on 
this line:

   symbol = *sym;

   A few things make me scratch my head.  Why are symbol and name declared 
static?  Why is symbol declared as an object (sym_t symbol) when every other 
use of sym_t is a pointer (sym_t *foo)?  This function gets called by symsrch, 
but what is "flex-string format", and why do we need sym_flex at all, as 
opposed to having symsrch return the pointer directly?  And why does this 
panic when compiled with ibm-clang but not with XLC?
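
   For what it's worth, here is a minimal sketch of what the static 
declarations appear to be doing, as far as I can tell.  demo_sym_t and 
demo_flex are hypothetical stand-ins I made up so that this compiles outside 
the kernel; the real sym_t layout comes from the XCOFF headers and is not 
reproduced here.  A static local lives for the whole program, so returning its 
address is valid, whereas returning the address of an ordinary local would 
leave a dangling pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for sym_t (assumption: NOT the real XCOFF layout). */
typedef struct demo_sym {
    const char *name;
    long value;
} demo_sym_t;

/*
 * Same shape as sym_flex: copy the caller's entry into a single static
 * object and return the address of that object.
 */
demo_sym_t *
demo_flex(const demo_sym_t *sym)
{
    static demo_sym_t symbol;   /* an object, not a pointer: this IS the storage */
    static char name[48];       /* static, so it starts out zero-filled */

    assert(sym != NULL && sym->name != NULL);

    /* Copy at most 47 bytes and force termination of the private buffer. */
    strncpy(name, sym->name, sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';

    symbol = *sym;              /* struct assignment copies every member */
    symbol.name = name;         /* point the copy at the private name buffer */

    return &symbol;             /* valid after return because symbol is static */
}
```

   One consequence of this pattern is that every call returns the same 
address, so a second call overwrites the result of the first, and the function 
is not reentrant.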

   I'm a little fuzzy on all the pointer stuff so I would appreciate a second 
set of eyes if anything jumps out to anyone.

Thank you!

-Ben

________________________________
From: Ben Huntsman
Sent: Monday, January 6, 2025 12:04 PM
To: [email protected] <[email protected]>
Subject: AIX kernel panic when using XL C 17.1+

Hi all-
   I made a number of changes in a local branch and got the kernel extension to 
compile under the new OpenXL C 17.1 on AIX 7.3.  This new compiler is based on 
Clang 13.

   On AIX, the kernel part is in two extensions, the "regular" libafs stuff, 
and also src/export.  The src/export code builds export64.ext.nonfs, which is 
what usually gets loaded before the "main" extension.  Loading 
export64.ext.nonfs works, but then we get a kernel panic right away when 
loading the afs extension.  The kernel trace seems to implicate the function 
symsrch in src/export/symtab.c:

CRASH INFORMATION:
CPU 0 CSA F00000002FF47600 at time of crash, error code for LEDs: 70000000
pvthread+098800 STACK:
[F10009D5B0650F34]symsrch+000108 (F10009D5B06521D0)                     In src/export/symtab.c
[F10009D5B0650CB0]sym_lookup+000024 (000000000017E734, 0000000000000000)      In src/export/symtab.c
[F10009D5B065087C]import_kfunc+000034 (0000000110000FDB)                In src/export/export.c
[F10009D5B0654A74]kluge_init+000084 ()                                  In src/afs/AIX/osi_config.c
[F10009D5B06544F8]afs_config+000088 (0000000000000001, FFFF3FFFFFFFFFF3)
[00014D70].hkey_legacy_gate+00004C ()
[008797C8]IPRA.$config_kmod+000128 (??, ??, ??)
[00879FD4]sysconfig+000194 (??, ??, ??)
[00003984]syscall+000244 ()
[kdb_get_virtual_memory] no real storage @ FFFFFFFFFFFD870
[100000A44]0000000100000A44 ()

When compiling this extension, ibm-clang emits quite a few warnings but no 
errors.  The code looks correct to me, but can anyone else spot an issue with 
it?

One thing I did notice is that clang and XLC 16.1 and earlier seem to differ 
in how uninitialized variables behave.  For example, in cfgexport.c, before we 
make the call to sysconfig, we set the cload struct member "path" to the path 
to the extension.  The struct also has a member "libpath", and if we do not 
explicitly set that to NULL, the extension load fails when compiled under 
ibm-clang, but works fine when built by XLC 16.1.  Could something similar be 
happening with the symsrch function?
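
To illustrate the libpath workaround, here is a minimal sketch.  struct 
demo_cload and demo_cload_init are hypothetical names of my own, standing in 
for the real structure that cfgexport.c passes to sysconfig.  The idea is to 
clear the whole structure before filling in any member, since a local struct 
that is never initialized holds indeterminate values, and whatever it happened 
to contain under one compiler is not guaranteed under another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the load structure handed to sysconfig. */
struct demo_cload {
    char *path;
    char *libpath;
};

/*
 * Clear every byte first so no member is left holding stack garbage,
 * then set the members explicitly.
 */
void
demo_cload_init(struct demo_cload *cload, char *path)
{
    assert(cload != NULL);

    memset(cload, 0, sizeof(*cload));
    cload->path = path;
    cload->libpath = NULL;      /* explicit, rather than relying on the memset */
}
```

With this, the load no longer depends on what the compiler happened to leave 
on the stack.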

Many thanks!

-Ben

