Hi everyone-
After further investigation of this problem, I tracked this to an assignment
in the function sym_flex in src/export/symtab.c. Here is the function:
/*
 * sym_flex - convert a symbol so that there is no distinction between
 *            flex-string and non flex-string format.
 *
 * Input:
 *      sym - ^ to symbol table entry
 *
 * Returns:
 *      ^ to static location containing modified symbol.
 */
static sym_t *
sym_flex(sym_t *sym)
{
    static sym_t symbol;
    static char name[48];

    strncpy(name, sym_str(sym), sizeof(name) - 1);

#ifndef __XCOFF64__
    if (sym->n_zeroes != 0)
        name[8] = 0;    /* make sure that we truncate correctly */
    symbol.n_zeroes = 0;
#endif

    symbol = *sym;
    symbol.n_nptr = name;

    return &symbol;
}
I'm not entirely sure why the backtrace doesn't show sym_flex as the culprit,
but I added printf statements all over the place and found that we panic on
this line:
symbol = *sym;
A few things puzzle me: why the definitions are static, and why symbol is
declared as an object (sym_t symbol) when every other use of sym_t is a
pointer (sym_t *foo). Also, this function gets called by symsrch. But what is
"flex-string format", and why do we need this at all, as opposed to having
symsrch return the pointer directly instead of sym_flex'ing it? And why does
this panic when compiled with ibm-clang and not XLC?
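To make the static question concrete, here is a minimal sketch of the same
pattern outside the kernel. demo_sym_t and demo_flex are hypothetical
stand-ins that I made up for illustration; the real sym_t layout comes from
the AIX headers and is not reproduced here. As far as I understand it, the
statics are what allow the function to return a pointer that is still valid
after it returns, and the struct assignment is a copy of the whole object,
which a compiler is free to implement with a memcpy-style call. I am not
claiming that is the cause of the panic.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for sym_t (not the real AIX layout). */
typedef struct demo_sym {
    long        value;
    const char *name_ptr;
} demo_sym_t;

/*
 * Returns a pointer to static storage. The result stays valid after the
 * function returns, but it is overwritten by the next call and is not
 * safe to share between threads.
 */
static demo_sym_t *
demo_flex(const demo_sym_t *sym)
{
    static demo_sym_t copy;   /* an object, not a pointer: it holds the copy */
    static char name[48];     /* static storage starts out zero-filled */

    /* Copy at most 47 bytes, then terminate explicitly. */
    strncpy(name, sym->name_ptr, sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';

    copy = *sym;              /* whole-struct copy of the caller's entry */
    copy.name_ptr = name;     /* point the copy at our private name buffer */

    return &copy;
}
```

Every call returns the same address, so a caller that needs to keep a result
across a second call has to copy it out first.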
I'm a little fuzzy on all the pointer stuff so I would appreciate a second
set of eyes if anything jumps out to anyone.
Thank you!
-Ben
________________________________
From: Ben Huntsman
Sent: Monday, January 6, 2025 12:04 PM
To: [email protected] <[email protected]>
Subject: AIX kernel panic when using XL C 17.1+
Hi all-
I made a number of changes in a local branch and got the kernel extension to
compile under the new OpenXL C 17.1 on AIX 7.3. This new compiler is based on
Clang 13.
On AIX, the kernel part is in two extensions, the "regular" libafs stuff,
and also src/export. The src/export code builds export64.ext.nonfs, which is
what usually gets loaded before the "main" extension. Loading
export64.ext.nonfs works, but then we get a kernel panic right away when
loading the afs extension. The kernel trace seems to implicate the function
symsrch in src/export/symtab.c:
CRASH INFORMATION:
CPU 0 CSA F00000002FF47600 at time of crash, error code for LEDs: 70000000
pvthread+098800 STACK:
[F10009D5B0650F34]symsrch+000108 (F10009D5B06521D0) In
src/export/symtab.c
[F10009D5B0650CB0]sym_lookup+000024 (000000000017E734, 0000000000000000)
In src/export/symtab.c
[F10009D5B065087C]import_kfunc+000034 (0000000110000FDB) In
src/export/export.c
[F10009D5B0654A74]kluge_init+000084 () In
src/afs/AIX/osi_config.c
[F10009D5B06544F8]afs_config+000088 (0000000000000001, FFFF3FFFFFFFFFF3)
[00014D70].hkey_legacy_gate+00004C ()
[008797C8]IPRA.$config_kmod+000128 (??, ??, ??)
[00879FD4]sysconfig+000194 (??, ??, ??)
[00003984]syscall+000244 ()
[kdb_get_virtual_memory] no real storage @ FFFFFFFFFFFD870
[100000A44]0000000100000A44 ()
When compiling this extension, ibm-clang flags quite a few warnings, but not
any errors. The code looks pretty correct to me, but can anyone else spot an
issue with it?
One thing I did notice that clang and XLC 16.1 and earlier must do
differently is how they handle uninitialized variables. For example, in cfgexport.c,
before we make the call to sysconfig, we set the cload struct member "path" to
the path to the extension. The struct also has a member "libpath", and if we
do not explicitly set that to NULL, the extension load fails when compiled
under ibm-clang, but works fine when built by XLC 16.1. Could something
similar be happening with the symsrch function?
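For reference, the libpath fix amounts to something like the sketch below.
struct demo_load and demo_load_init are hypothetical names I am using for
illustration, not the actual structure or code in cfgexport.c. The point is
only that an automatic (stack) struct has indeterminate contents until every
member is set, so which garbage you get can differ from compiler to compiler.

```c
#include <string.h>

/* Hypothetical stand-in for the load structure handed to sysconfig. */
struct demo_load {
    char *path;      /* path to the extension */
    char *libpath;   /* must be NULL when unused */
    long  kmid;      /* filled in by the loader */
};

/* Put every member into a known state before setting the one we need. */
static void
demo_load_init(struct demo_load *cl, char *path)
{
    memset(cl, 0, sizeof(*cl));   /* clear everything, including padding */
    cl->libpath = NULL;           /* explicit, rather than relying on all-zero bits */
    cl->path = path;
}
```

With this, the result no longer depends on whatever happened to be on the
stack before the call.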
Many thanks!
-Ben