On 04/03/2021 05:38, Hajime Tazaki wrote:

On Thu, 04 Mar 2021 07:40:00 +0900,
Johannes Berg wrote:

I think the problem is here:

#24 0x000000006080f234 in ipc_init_ids (ids=0x60c60de8 <init_ipc_ns+8>) at ipc/util.c:119
#25 0x0000000060813c6d in sem_init_ns (ns=0x60d895bb <textbuf+91>) at ipc/sem.c:254
#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2

You're in the init of libcom_err.so.2, which is loaded by

"libnss_nis.so.2"

which is loaded by normal NSS code (getgrnam):

#40 0x00007f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at nsswitch.c:359
#41 0x00007f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0, fct_name=<optimized out>, fct_name@entry=0x7f899089b020 "setgrent") at nsswitch.c:467
#42 0x00007f899089554b in init_nss_interface () at nss_compat/compat-grp.c:83
#43 init_nss_interface () at nss_compat/compat-grp.c:79
#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
#45 0x00007f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0 "tty", resbuf=resbuf@entry=0x7ffe3e7a2910, buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024, result=result@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315


You have a strange nsswitch configuration that causes all of this
(libnss_nis.so.2 -> libcom_err.so.2) to get loaded.

Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
... Linux's sem_init() instead of libpthread's.

And then the crash.

Now, I don't know how to fix it (short of changing your nsswitch
configuration). Maybe we could somehow rename sem_init()? Or maybe we
can somehow give the kernel binary's symbols lower resolution
precedence than libc/libpthread's.

objcopy (from binutils) can localize symbols (i.e., objcopy -L
sem_init $orig_file $new_file).  It can also rename symbols.  But I'm
not sure this is the ideal solution.

How does UML handle symbol conflicts between userspace code and the
Linux kernel (like sem_init in this case)?  AFAIK, libnl has a symbol
with the same name as one in the Linux kernel (genlmsg_put), and
others possibly do as well.

It used to handle them. I do not think it does now - something broke and it's fairly recent.

I actually have something which confirms this.

I worked on a patch around 5.8-5.9 which added the option to pick up the libc equivalents of the functions from string.h, and there was a clear performance difference of ~20%+. This is because UML has no means of optimizing them and picks up the worst-case-scenario x86 versions.

I parked that for a while, because I had to look at other stuff at work.

I restarted working on it after 5.10. My first observation was that, despite not changing anything in the patches, the gain was no longer there. The performance was the same as if it had picked up the libc equivalents.

I can either try to reproduce the nss config which causes the sem_init issue, or use my own libc patchset to try to bisect it. The problem commit will be roughly around the point where the performance difference from applying the "switch to libc" patches goes away.

Brgds,

A.


-- Hajime

_______________________________________________
linux-um mailing list
linux...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/