Bug#983379: linux uml segfault

2021-03-07 Thread Johannes Berg
On Sun, 2021-03-07 at 21:22 +0900, Hajime Tazaki wrote:
> Sorry that this email is going to be long.  In summary, what Johannes
> said is right: what objcopy does is not sufficient, while with ld the
> symbols are transformed as we expected.
> 
> More details below.

[snip]

Interesting, thanks for looking into that!

johannes



Bug#983379: linux uml segfault

2021-03-07 Thread Hajime Tazaki


Sorry that this email is going to be long.  In summary, what Johannes
said is right: what objcopy does is not sufficient, while with ld the
symbols are transformed as we expected.

More details below.

On Sat, 06 Mar 2021 05:22:19 +0900,
Johannes Berg wrote:
> 
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> > 
> > objcopy (from binutils) can localize symbols (i.e., objcopy -L
> > sem_init $orig_file $new_file).
> 
> This doesn't seem to be sufficient.
> 
> > It also does renaming symbols.  But
> > not sure this is the ideal solution.
> 
> Even that doesn't seem to actually work/help? I still get libcom_err
> trying to call UML's sem_init, even after doing
>  objcopy --redefine-sym sem_init=uml_sem_init
> 
> 
> > How does UML handle symbol conflicts between userspace code and Linux
> > kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> > Linux kernel (genlmsg_put) and others can possibly do as well.
> 
> I think like I said it just doesn't but since you don't have much
> userspace code linked with UML it never really mattered?
> 
> We only link a 'linux' binary, after all. How does LKL handle this
> though? It should be far more affected?
> 
> 
> Despite the objcopy *not* fixing it, this does seem to:

I tested with a slightly old version:
 - objcopy/ld version 2.29.1-23.fc28

I confirmed that objcopy (both --redefine-sym and --localize-symbol)
only changes symbols in the .symtab table.  But there is another table,
.dynsym, which is the one used for dynamic symbol resolution.
The original file looks like this:


1) before objcopy (vmlinux)
% readelf -s obj-x86-um/vmlinux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 179 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   129: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28515: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
 37798: 00000000601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns
 
and the resulting object looks like this:

2) after objcopy (linux)
% readelf -s obj-x86-um/linux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 179 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   129: 0000000060011d38    72 FUNC    GLOBAL DEFAULT    2 sem_init
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28455: 0000000060011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37798: 00000000601e30d5    62 FUNC    GLOBAL DEFAULT   13 sem_init_ns

Only the .symtab entry is changed to local; the .dynsym entry is left
unchanged.  So a sem_init call from libcom_err.so can still resolve to
the Linux symbol.


On the other hand, the ld version script solution does what we want.

3) localized with ld
% readelf -s obj-x86-um/linux |grep -E "sem_init|Symbol table|Num:"
Symbol table '.dynsym' contains 142 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
Symbol table '.symtab' contains 38474 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
 28512: 0000000060011d38    72 FUNC    LOCAL  DEFAULT    2 sem_init
 37669: 00000000601e2b45    62 FUNC    LOCAL  DEFAULT   13 sem_init_ns

The sem_init symbol now appears only in .symtab, and it is local there;
it has no .dynsym entry at all.
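
The three listings boil down to one question: does the name still have a
defined, non-local entry in .dynsym?  Below is a minimal C sketch of that
check, assuming a trusted ELF64 file on Linux (it does no bounds
checking).  The helper name count_visible_defs is made up for
illustration; it is not part of binutils or the kernel tree.

```c
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the defined, non-local symbols called `name`
 * in every section of type `table_type` (SHT_SYMTAB for .symtab,
 * SHT_DYNSYM for .dynsym) of an ELF64 file.
 * Returns -1 if the file cannot be mapped or is not ELF64.
 * No bounds checking: only use it on trusted files.
 */
int count_visible_defs(const char *path, uint32_t table_type, const char *name)
{
	struct stat st;
	int count = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
		close(fd);
		return -1;
	}

	uint8_t *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (base == MAP_FAILED)
		return -1;

	const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
	if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
	    eh->e_ident[EI_CLASS] == ELFCLASS64) {
		const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);

		count = 0;
		for (unsigned int i = 0; i < eh->e_shnum; i++) {
			if (sh[i].sh_type != table_type)
				continue;
			/* sh_link of a symbol table names its string table. */
			const char *str = (const char *)base + sh[sh[i].sh_link].sh_offset;
			const Elf64_Sym *sym = (const Elf64_Sym *)(base + sh[i].sh_offset);
			size_t n = sh[i].sh_size / sizeof(Elf64_Sym);

			for (size_t j = 0; j < n; j++) {
				if (sym[j].st_shndx == SHN_UNDEF)
					continue;	/* a reference, not a definition */
				if (ELF64_ST_BIND(sym[j].st_info) == STB_LOCAL)
					continue;	/* localized: cannot preempt */
				if (strcmp(str + sym[j].st_name, name) == 0)
					count++;
			}
		}
	}
	munmap(base, st.st_size);
	return count;
}
```

Going by the listings, one would expect
count_visible_defs(path, SHT_DYNSYM, "sem_init") to give 1 for vmlinux
and for the objcopy output, and 0 for the ld-localized binary.  Only the
last of these keeps a library's sem_init call from binding to the kernel
symbol.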


From a quick check, LKL (and UML binaries) do not have this issue,
because they are built differently from what UML currently does.

LKL applies objcopy before generating the intermediate file (linux.o),
so the symbols of the final binary (linux) are localized and have no
.dynsym entries; thus there is no issue in this case.

refs:
https://stackoverflow.com/questions/54332797/binding-failure-with-objcopy-redefine-syms
https://sourceware.org/legacy-ml/binutils/2019-01/msg00254.html


-- Hajime



Bug#983379: linux uml segfault

2021-03-05 Thread Hajime Tazaki


This might be late, but I'll give your dlopen reproducer a try.

-- Hajime

On Sat, 06 Mar 2021 05:22:19 +0900,
Johannes Berg wrote:
> 
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> > 
> > objcopy (from binutils) can localize symbols (i.e., objcopy -L
> > sem_init $orig_file $new_file).
> 
> This doesn't seem to be sufficient.
> 
> > It also does renaming symbols.  But
> > not sure this is the ideal solution.
> 
> Even that doesn't seem to actually work/help? I still get libcom_err
> trying to call UML's sem_init, even after doing
>  objcopy --redefine-sym sem_init=uml_sem_init



Bug#983379: linux uml segfault

2021-03-05 Thread Johannes Berg
On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
> 
> objcopy (from binutils) can localize symbols (i.e., objcopy -L
> sem_init $orig_file $new_file).

This doesn't seem to be sufficient.

> It also does renaming symbols.  But
> not sure this is the ideal solution.

Even that doesn't seem to actually work/help? I still get libcom_err
trying to call UML's sem_init, even after doing
 objcopy --redefine-sym sem_init=uml_sem_init


> How does UML handle symbol conflicts between userspace code and Linux
> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> Linux kernel (genlmsg_put) and others can possibly do as well.

I think, like I said, it just doesn't, but since you don't have much
userspace code linked with UML it never really mattered?

We only link a 'linux' binary, after all. How does LKL handle this
though? It should be far more affected?


Despite the objcopy *not* fixing it, this does seem to:

diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index dacbfabf66d8..2f2a8ce92f1e 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -6,6 +6,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   PROVIDE (__executable_start = START);
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 45d957d7004c..7a8e2b123e29 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -7,6 +7,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   /* This must contain the right address - not quite the default ELF one.*/

johannes



Bug#983379: linux uml segfault

2021-03-05 Thread Johannes Berg


> Ritesh, can you give the following a spin - it renames sem_init as 
> um_sem_init for UML only?

FWIW, this fixes the issue in my reproducer, so should work here too:

diff --git a/ipc/util.h b/ipc/util.h
index 5766c61aed0e..cfed40ba983c 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#define sem_init uml_sem_init
 
 /*
  * The IPC ID contains 2 separate numbers - index and sequence number.

johannes
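
The effect of that one-line #define can be seen in a standalone sketch.
It uses nothing from the kernel tree: ipc_init_sketch, sem_init_calls
and emitted_name are made-up names, and the macro sits directly in the
file instead of in ipc/util.h.  Because the macro is object-like, it
rewrites the definition and every call site alike, so no symbol called
sem_init is emitted at all.

```c
/*
 * Stand-in for the #define added to ipc/util.h: from here on the
 * preprocessor rewrites every `sem_init` token to `uml_sem_init`.
 */
#define sem_init uml_sem_init

/* Illustrative counter so the effect of a call can be observed. */
int sem_init_calls;

/* Written as sem_init, but the compiler sees and emits uml_sem_init. */
void sem_init(void)
{
	sem_init_calls++;
}

/* Stand-in for the call site in ipc/util.c; also rewritten by the macro. */
void ipc_init_sketch(void)
{
	sem_init();
}

/* Two-step stringification, so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns the name the compiler actually sees for sem_init. */
const char *emitted_name(void)
{
	return STR(sem_init);
}
```

A function-like form such as "#define sem_init() um_sem_init()" only
matches uses written with an empty argument list, so a definition
spelled sem_init(void) has to be renamed separately.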



Bug#983379: linux uml segfault

2021-03-05 Thread Johannes Berg
On Fri, 2021-03-05 at 19:03 +0000, Anton Ivanov wrote:
> 
> I thought of that, but surrendered to the "dark side" of the quick and ugly 
> fix.

:)

> We can do that for the ipc/sem.c - it brings in uaccess.h which
> ultimately pulls uaccess from our asm tree. So if we do it there, it
> will end up in sem.c

Well, most easily you could do it in ipc/util.h, where it's declared. Or
any place that is pulled in by it, e.g. even asm/errno.h.

All ugly though.

johannes



Bug#983379: linux uml segfault

2021-03-05 Thread Johannes Berg
On Wed, 2021-03-03 at 23:40 +0100, Johannes Berg wrote:

> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
> ... Linux's sem_init() instead of libpthread's.
> 
> And then the crash.

FWIW, I can trivially reproduce this by simply force-loading
libcom_err.so:


diff --git a/arch/um/Makefile b/arch/um/Makefile
index 1cea46ff9bb7..a16b411154fb 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -134,7 +134,7 @@ LINK_WRAPS = -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc
 LD_FLAGS_CMDLINE = $(foreach opt,$(KBUILD_LDFLAGS),-Wl,$(opt))
 
 # Used by link-vmlinux.sh which has special support for um link
-export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE)
+export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) -ldl
 
 # When cleaning we don't include .config, so we don't include
 # TT or skas makefiles and don't clean skas_ptregs.h.
diff --git a/arch/um/os-Linux/main.c b/arch/um/os-Linux/main.c
index c8a42ecbd7a2..873dc4c40cb7 100644
--- a/arch/um/os-Linux/main.c
+++ b/arch/um/os-Linux/main.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define PGD_BOUND (4 * 1024 * 1024)
 #define STACKSIZE (8 * 1024 * 1024)
@@ -115,6 +116,8 @@ int __init main(int argc, char **argv, char **envp)
 
 	setsid();
 
+	dlopen("/usr/lib64/libcom_err.so.2", RTLD_NOW|RTLD_GLOBAL);
+
 	new_argv = malloc((argc + 1) * sizeof(char *));
 	if (new_argv == NULL) {
 		perror("Mallocing argv");


johannes



Bug#983379: linux uml segfault

2021-03-05 Thread Anton Ivanov




On 05/03/2021 18:32, Johannes Berg wrote:
> On 5 March 2021 18:39:42 CET, Anton Ivanov wrote:
>> On 04/03/2021 07:47, Johannes Berg wrote:
>>> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>>>>> Now, I don't know how to fix it (short of changing your nsswitch
>>>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>>>> can somehow give the kernel binary a lower symbol resolution than the
>>>>> libc/libpthread.
>>>>
>>>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>>>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>>>> not sure this is the ideal solution.
>>>
>>> Yes, we started thinking about it but it was too late at night when I
>>> replied ...
>>>
>>> I think there's basically a way to have an external list of symbols to
>>> export, for symbol versioning, that we could/should use to basically not
>>> export any of the kernel symbols out to libs.
>>>
>>>> How does UML handle symbol conflicts between userspace code and Linux
>>>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>>>> Linux kernel (genlmsg_put) and others can possibly do as well.
>>>
>>> I fear it doesn't?
>>
>> Let's assume it does not, and try to fix this by de-conflicting the
>> symbol.
>> For the time being, also, let's aim for a Debian specific patch just to
>> go into their "patches" dir for build so that UML is not dropped out of
>> the release.
>>
>> This should make all internal uses of sem_init be um_sem_init in the
>> actual object files. I will chase the issue of it picking up glibc
>> memcpy separately.
>> Upon close inspection it looks like a different issue - it is in the
>> other direction (picking a dynamic symbol instead of the one from the
>> tree). I spent all day chasing it today and I cannot reproduce it. At
>> the same time it was reproducible yesterday without any problems :(
>>
>> +#ifdef CONFIG_UML
>> +void __init um_sem_init(void)
>> +#else
>>   void __init sem_init(void)
>> +#endif
>
> Might be easier to just
>
> #define sem_init um_sem_init
>
> in an appropriate header file, perhaps even in arch/um/?

I thought of that, but surrendered to the "dark side" of the quick and ugly fix.

We can do that for the ipc/sem.c - it brings in uaccess.h which ultimately
pulls uaccess from our asm tree. So if we do it there, it will end up in sem.c

However, that function is also referenced and is invoked out of ipc/util.c
which does not pull that include.

I am going to dig through the rest of our includes to see if we can find a suitable one
which will be picked up by both sem.c and util.c. I hope there is a place which we can
use for a "proper" fix.

By the way, I actually remember seeing a couple of includes like that somewhere
dealing with other um symbol conflicts, just can't remember where I saw it.

> johannes



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-05 Thread Johannes Berg



On 5 March 2021 18:39:42 CET, Anton Ivanov wrote:
>On 04/03/2021 07:47, Johannes Berg wrote:
>> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>>
>>>> Now, I don't know how to fix it (short of changing your nsswitch
>>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>>> can somehow give the kernel binary a lower symbol resolution than the
>>>> libc/libpthread.
>>>
>>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>>> not sure this is the ideal solution.
>>
>> Yes, we started thinking about it but it was too late at night when I
>> replied ...
>>
>> I think there's basically a way to have an external list of symbols to
>> export, for symbol versioning, that we could/should use to basically not
>> export any of the kernel symbols out to libs.
>>
>>> How does UML handle symbol conflicts between userspace code and Linux
>>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>>> Linux kernel (genlmsg_put) and others can possibly do as well.
>>
>> I fear it doesn't?
>
>Let's assume it does not, and try to fix this by de-conflicting the
>symbol.
>For the time being, also, let's aim for a Debian specific patch just to
>go into their "patches" dir for build so that UML is not dropped out of
>the release.
>
>This should make all internal uses of sem_init be um_sem_init in the
>actual object files. I will chase the issue of it picking up glibc
>memcpy separately.
>Upon close inspection it looks like a different issue - it is in the
>other direction (picking a dynamic symbol instead of the one from the
>tree). I spent all day chasing it today and I cannot reproduce it. At
>the same time it was reproducible yesterday without any problems :(

>+#ifdef CONFIG_UML
>+void __init um_sem_init(void)
>+#else
>  void __init sem_init(void)
>+#endif

Might be easier to just

#define sem_init um_sem_init

in an appropriate header file, perhaps even in arch/um/? 


johannes
-- 
Sent from my phone.



Bug#983379: linux uml segfault

2021-03-05 Thread Anton Ivanov




On 04/03/2021 07:47, Johannes Berg wrote:
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>>> Now, I don't know how to fix it (short of changing your nsswitch
>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>> can somehow give the kernel binary a lower symbol resolution than the
>>> libc/libpthread.
>>
>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>> not sure this is the ideal solution.
>
> Yes, we started thinking about it but it was too late at night when I
> replied ...
>
> I think there's basically a way to have an external list of symbols to
> export, for symbol versioning, that we could/should use to basically not
> export any of the kernel symbols out to libs.
>
>> How does UML handle symbol conflicts between userspace code and Linux
>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>> Linux kernel (genlmsg_put) and others can possibly do as well.
>
> I fear it doesn't?


Let's assume it does not, and try to fix this by de-conflicting the symbol.
For the time being, also, let's aim for a Debian specific patch just to go into their 
"patches" dir for build so that UML is not dropped out of the release.

This should make all internal uses of sem_init be um_sem_init in the actual 
object files. I will chase the issue of it picking up glibc memcpy separately.
Upon close inspection it looks like a different issue - it is in the other 
direction (picking a dynamic symbol instead of the one from the tree). I spent 
all day chasing it today and I cannot reproduce it. At the same time it was 
reproducible yesterday without any problems :(

Ritesh, can you give the following a spin - it renames sem_init as um_sem_init 
for UML only?

diff --git a/ipc/sem.c b/ipc/sem.c
index f6c30a85dadf..5157796daf54 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -263,7 +263,11 @@ void sem_exit_ns(struct ipc_namespace *ns)
 }
 #endif

+#ifdef CONFIG_UML
+void __init um_sem_init(void)
+#else
 void __init sem_init(void)
+#endif
 {
 	sem_init_ns(&init_ipc_ns);
 	ipc_init_proc_interface("sysvipc/sem",
diff --git a/ipc/util.h b/ipc/util.h
index 5766c61aed0e..b3356efb3c96 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -47,7 +47,12 @@ extern int ipc_min_cycle;
 #define IPCMNI_IDX_MASK	((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */

+#ifdef CONFIG_UML
+void um_sem_init(void);
+#define sem_init() um_sem_init()
+#else
 void sem_init(void);
+#endif
 void msg_init(void);
 void shm_init(void);





> johannes




--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-05 Thread Johannes Berg
On Fri, 2021-03-05 at 09:59 +0000, Anton Ivanov wrote:
> 
> This is proving very "interesting" to try to chase down, because the
> "picking the wrong library" does not happen every time.
> 
> F.E. yesterday my 5.10 builds were picking glibc memcpy and friends.
> Today with the same config and everything else the same it is picking
> built-ins.

Ouch.

> I need to finds some better way to reproduce this.

Maybe something like the original report? That caused sem_init() to be
called, so we know libc will/may call something there.

You and me probably don't have the nss setup to cause sem_init() to get
called, but maybe simply putting

void init_nss_interface(void)
{
  panic("how did we get here");
}

somewhere in the kernel image might already reproduce it?

johannes



Bug#983379: linux uml segfault

2021-03-05 Thread Anton Ivanov



On 04/03/2021 18:41, Anton Ivanov wrote:
> On 04/03/2021 08:05, Benjamin Berg wrote:
>> On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote:
>>> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>>>>> Now, I don't know how to fix it (short of changing your nsswitch
>>>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>>>> can somehow give the kernel binary a lower symbol resolution than the
>>>>> libc/libpthread.
>>>>
>>>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>>>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>>>> not sure this is the ideal solution.
>>>
>>> Yes, we started thinking about it but it was too late at night when I
>>> replied ...
>>>
>>> I think there's basically a way to have an external list of symbols to
>>> export, for symbol versioning, that we could/should use to basically not
>>> export any of the kernel symbols out to libs.
>>
>> Maybe using the ld --version-script= option here works to mark all
>> kernel symbols as being "local" and prevent them from being picked up
>> by libraries.
>>
>> Benjamin
>>
>>>> How does UML handle symbol conflicts between userspace code and Linux
>>>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>>>> Linux kernel (genlmsg_put) and others can possibly do as well.
>>>
>>> I fear it doesn't?
>
> I can confirm that it did and this bug is bisect-able.
>
> with 5.7
>
> # dd if=/dev/ubda of=/dev/null bs=1M
> 16384+1 records in
> 16384+1 records out
> 17179869696 bytes (17 GB, 16 GiB) copied, 10.6973 s, 1.6 GB/s
>
> with 5.10 the speed is 2.2
> 5.7 with "strings from glibc" patch speed is 2.2
>
> As we did not do anything else in this timeframe to jack up the speed from 1.6GB/s to
> 2.2GB/s and as it is identical to the speed you get with the "use glibc
> strings.h" this looks like a good criteria to bisect on.
>
> I am going to do a bisect with 5.7 "good" and 5.10 "bad" using the speed test
> as a working hypothesis.

This is proving very "interesting" to try to chase down, because the "picking the
wrong library" does not happen every time.

E.g. yesterday my 5.10 builds were picking glibc memcpy and friends. Today with
the same config and everything else the same it is picking built-ins.

I need to find a better way to reproduce this.

A.

> A.
>
>>> johannes


___
linux-um mailing list
linux...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um







--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-04 Thread Anton Ivanov




On 04/03/2021 08:05, Benjamin Berg wrote:
> On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote:
>> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>>>> Now, I don't know how to fix it (short of changing your nsswitch
>>>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>>>> can somehow give the kernel binary a lower symbol resolution than the
>>>> libc/libpthread.
>>>
>>> objcopy (from binutils) can localize symbols (i.e., objcopy -L
>>> sem_init $orig_file $new_file).  It also does renaming symbols.  But
>>> not sure this is the ideal solution.
>>
>> Yes, we started thinking about it but it was too late at night when I
>> replied ...
>>
>> I think there's basically a way to have an external list of symbols to
>> export, for symbol versioning, that we could/should use to basically not
>> export any of the kernel symbols out to libs.
>
> Maybe using the ld --version-script= option here works to mark all
> kernel symbols as being "local" and prevent them from being picked up
> by libraries.
>
> Benjamin
>
>>> How does UML handle symbol conflicts between userspace code and Linux
>>> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
>>> Linux kernel (genlmsg_put) and others can possibly do as well.
>>
>> I fear it doesn't?

I can confirm that it did and this bug is bisect-able.

With 5.7:

# dd if=/dev/ubda of=/dev/null bs=1M
16384+1 records in
16384+1 records out
17179869696 bytes (17 GB, 16 GiB) copied, 10.6973 s, 1.6 GB/s

With 5.10 the speed is 2.2 GB/s.
5.7 with the "strings from glibc" patch also runs at 2.2 GB/s.

As we did not do anything else in this timeframe to jack up the speed from 1.6GB/s to
2.2GB/s, and as it is identical to the speed you get with the "use glibc
strings.h" patch, this looks like a good criterion to bisect on.

I am going to do a bisect with 5.7 "good" and 5.10 "bad" using the speed test
as a working hypothesis.
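
To make the size of that gap concrete, here is the arithmetic behind the
figures as a small C sketch.  The function names are illustrative only,
and the 2.2 GB/s value is taken as quoted, since no byte count or
elapsed time was given for that run.

```c
/* Throughput in decimal gigabytes per second, the unit dd reports. */
double throughput_gb_per_s(unsigned long long bytes, double seconds)
{
	return (double)bytes / seconds / 1e9;
}

/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

throughput_gb_per_s(17179869696ULL, 10.6973) gives about 1.606, which
dd rounds to 1.6 GB/s, and percent_change(1.6, 2.2) gives about 37.5,
so the step from 1.6 to 2.2 GB/s is far outside what run-to-run noise
would normally explain.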

A.




>> johannes





--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-04 Thread Benjamin Berg
On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote:
> On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:
>
> > > Now, I don't know how to fix it (short of changing your nsswitch
> > > configuration) - maybe we could somehow rename sem_init()? Or maybe we
> > > can somehow give the kernel binary a lower symbol resolution than the
> > > libc/libpthread.
> >
> > objcopy (from binutils) can localize symbols (i.e., objcopy -L
> > sem_init $orig_file $new_file).  It also does renaming symbols.  But
> > not sure this is the ideal solution.
>
> Yes, we started thinking about it but it was too late at night when I
> replied ...
>
> I think there's basically a way to have an external list of symbols to
> export, for symbol versioning, that we could/should use to basically not
> export any of the kernel symbols out to libs.

Maybe using the ld --version-script= option here works to mark all
kernel symbols as being "local" and prevent them from being picked up
by libraries.

Benjamin

> > How does UML handle symbol conflicts between userspace code and Linux
> > kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> > Linux kernel (genlmsg_put) and others can possibly do as well.
>
> I fear it doesn't?
>
> johannes




Bug#983379: linux uml segfault

2021-03-03 Thread Johannes Berg
On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote:

> > Now, I don't know how to fix it (short of changing your nsswitch
> > configuration) - maybe we could somehow rename sem_init()? Or maybe we
> > can somehow give the kernel binary a lower symbol resolution than the
> > libc/libpthread.
> 
> objcopy (from binutils) can localize symbols (i.e., objcopy -L
> sem_init $orig_file $new_file).  It also does renaming symbols.  But
> not sure this is the ideal solution.

Yes, we started thinking about it but it was too late at night when I
replied ...

I think there's basically a way to have an external list of symbols to
export, for symbol versioning, that we could/should use to basically not
export any of the kernel symbols out to libs.

> How does UML handle symbol conflicts between userspace code and Linux
> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> Linux kernel (genlmsg_put) and others can possibly do as well.

I fear it doesn't?

johannes



Bug#983379: linux uml segfault

2021-03-03 Thread Anton Ivanov

On 04/03/2021 05:38, Hajime Tazaki wrote:
> On Thu, 04 Mar 2021 07:40:00 +0900,
> Johannes Berg wrote:
>> I think the problem is here:
>>
>>> #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 ) at ipc/util.c:119
>>> #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at ipc/sem.c:254
>>> #26 0x60015b5d in sem_init () at ipc/sem.c:268
>>> #27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
>>
>> You're in the init of libcom_err.so.2, which is loaded by
>>
>>> "libnss_nis.so.2"
>>
>> which is loaded by normal NSS code (getgrnam):
>>
>>> #40 0x7f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at nsswitch.c:359
>>> #41 0x7f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0, fct_name=, fct_name@entry=0x7f899089b020 "setgrent") at nsswitch.c:467
>>> #42 0x7f899089554b in init_nss_interface () at nss_compat/compat-grp.c:83
>>> #43 init_nss_interface () at nss_compat/compat-grp.c:79
>>> #44 0x7f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
>>> #45 0x7f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0 "tty", resbuf=resbuf@entry=0x7ffe3e7a2910, buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024, result=result@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315
>>
>> You have a strange nsswitch configuration that causes all of this
>> (libnss_nis.so.2 -> libcom_err.so.2) to get loaded.
>>
>> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
>> ... Linux's sem_init() instead of libpthread's.
>>
>> And then the crash.
>>
>> Now, I don't know how to fix it (short of changing your nsswitch
>> configuration) - maybe we could somehow rename sem_init()? Or maybe we
>> can somehow give the kernel binary a lower symbol resolution than the
>> libc/libpthread.
>
> objcopy (from binutils) can localize symbols (i.e., objcopy -L
> sem_init $orig_file $new_file).  It also does renaming symbols.  But
> not sure this is the ideal solution.
>
> How does UML handle symbol conflicts between userspace code and Linux
> kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
> Linux kernel (genlmsg_put) and others can possibly do as well.

It used to handle them. I do not think it does now - something broke and
it's fairly recent.

I actually have something which confirms this.

I worked on a patch around 5.8-5.9 which would give the option to pick
up libc equivalents for the functions from string.h, and there was a
clear performance difference of ~20%+. This is because UML has no means
of optimizing them and picks up the worst case scenario x86 version.

I parked that for a while, because I had to look at other stuff at work.

I restarted working on it after 5.10. My first observation was that
despite not changing anything in the patches, the gain was no longer
there. The performance was the same as if it picked up libc equivalents.

I can either try to reproduce the nss config which causes the sem_init
issue or use my own libc patchset to try to bisect. The problem commit
will be roughly around the time the performance difference from applying
the "switch to libc" patch goes away.

Brgds,

A.

> -- Hajime





--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-03 Thread Johannes Berg
On Thu, 2021-03-04 at 07:28 +0000, Anton Ivanov wrote:
> 
> > Now, I don't know how to fix it (short of changing your nsswitch
> > configuration) - maybe we could somehow rename sem_init()? Or maybe we
> > can somehow give the kernel binary a lower symbol resolution than the
> > libc/libpthread.
> 
> I have not looked in depth in how the linking process works, but it 
> should have picked up the sem_init from the kernel library, not libc.

Well, no, other way around? libnss/libcom_err should have gotten (should
get) the one from libpthread, not the one from the kernel.

> We are already supposed to do that regarding kernel vs libc string.h 
> functions - memcpy, etc.
> 
> Though for all of them the libc does the same so invoking the wrong one 
> does not kill you so this may have been broken for a while and we were 
> simply not noticing it.

Indeed.

johannes



Bug#983379: linux uml segfault

2021-03-03 Thread Anton Ivanov

On 03/03/2021 22:40, Johannes Berg wrote:
> I think the problem is here:
>
>> #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 ) at ipc/util.c:119
>> #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at ipc/sem.c:254
>> #26 0x60015b5d in sem_init () at ipc/sem.c:268
>> #27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
>
> You're in the init of libcom_err.so.2, which is loaded by
>
>> "libnss_nis.so.2"
>
> which is loaded by normal NSS code (getgrnam):
>
>> #40 0x7f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at nsswitch.c:359
>> #41 0x7f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0, fct_name=, fct_name@entry=0x7f899089b020 "setgrent") at nsswitch.c:467
>> #42 0x7f899089554b in init_nss_interface () at nss_compat/compat-grp.c:83
>> #43 init_nss_interface () at nss_compat/compat-grp.c:79
>> #44 0x7f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
>> #45 0x7f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0 "tty", resbuf=resbuf@entry=0x7ffe3e7a2910, buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024, result=result@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315
>
> You have a strange nsswitch configuration that causes all of this
> (libnss_nis.so.2 -> libcom_err.so.2) to get loaded.
>
> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
> ... Linux's sem_init() instead of libpthread's.
>
> And then the crash.
>
> Now, I don't know how to fix it (short of changing your nsswitch
> configuration) - maybe we could somehow rename sem_init()? Or maybe we
> can somehow give the kernel binary a lower symbol resolution than the
> libc/libpthread.

I have not looked in depth in how the linking process works, but it
should have picked up the sem_init from the kernel library, not libc.

We are already supposed to do that regarding kernel vs libc string.h
functions - memcpy, etc.

Though for all of them the libc does the same so invoking the wrong one
does not kill you so this may have been broken for a while and we were
simply not noticing it.

> johannes





--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-03 Thread Hajime Tazaki


On Thu, 04 Mar 2021 07:40:00 +0900,
Johannes Berg wrote:
> 
> I think the problem is here:
> 
> > #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 )
> > at ipc/util.c:119
> > #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at
> > ipc/sem.c:254
> > #26 0x60015b5d in sem_init () at ipc/sem.c:268
> > #27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-
> > gnu/libcom_err.so.2
> 
> You're in the init of libcom_err.so.2, which is loaded by
> 
> > "libnss_nis.so.2"
> 
> which is loaded by normal NSS code (getgrnam):
> 
> > #40 0x7f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at
> > nsswitch.c:359
> > #41 0x7f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
> > fct_name=, fct_name@entry=0x7f899089b020 "setgrent") at
> > nsswitch.c:467
> > #42 0x7f899089554b in init_nss_interface () at nss_compat/compat-
> > grp.c:83
> > #43 init_nss_interface () at nss_compat/compat-grp.c:79
> > #44 0x7f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0
> > "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
> > errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
> > #45 0x7f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0
> > "tty", resbuf=resbuf@entry=0x7ffe3e7a2910,
> > buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
> > result=result@entry=0x7ffe3e7a2908)
> > at ../nss/getXXbyYY_r.c:315
> 
> 
> You have a strange nsswitch configuration that causes all of this
> (libnss_nis.so.2 -> libcom_err.so.2) to get loaded.
> 
> Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
> ... Linux's sem_init() instead of libpthread's.
> 
> And then the crash.
> 
> Now, I don't know how to fix it (short of changing your nsswitch
> configuration) - maybe we could somehow rename sem_init()? Or maybe we
> can somehow give the kernel binary a lower symbol resolution than the
> libc/libpthread.

objcopy (from binutils) can localize symbols (i.e., objcopy -L
sem_init $orig_file $new_file).  It also does renaming symbols.  But
not sure this is the ideal solution.

How does UML handle symbol conflicts between userspace code and Linux
kernel (like this case sem_init) ?  AFAIK, libnl has a same symbol as
Linux kernel (genlmsg_put) and others can possibly do as well.
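
For comparison, the same two ideas (renaming and localizing) can also be
expressed at the source level instead of with objcopy. This is a sketch of the
general technique only, not what the UML build does; uml_sem_init and
ipc_setup are hypothetical names, and the visibility attribute assumes GCC or
Clang.

```c
/*
 * Renaming: the preprocessor rewrites the identifier before compilation,
 * so the object file defines uml_sem_init and never emits a symbol
 * called sem_init. (uml_sem_init is a hypothetical name.)
 */
#define sem_init uml_sem_init
int sem_init(void)
{
	return 1;	/* placeholder for the kernel-side initialisation */
}
#undef sem_init

/*
 * Localizing: hidden visibility keeps a function callable from the
 * program's own object files but out of the dynamic symbol table, so
 * shared libraries cannot bind to it. (ipc_setup is hypothetical.)
 */
__attribute__((visibility("hidden"))) int ipc_setup(void)
{
	return uml_sem_init() + 1;
}
```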


-- Hajime



Bug#983379: linux uml segfault

2021-03-03 Thread Johannes Berg
I think the problem is here:

> #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 )
> at ipc/util.c:119
> #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at
> ipc/sem.c:254
> #26 0x60015b5d in sem_init () at ipc/sem.c:268
> #27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-
> gnu/libcom_err.so.2

You're in the init of libcom_err.so.2, which is loaded by

> "libnss_nis.so.2"

which is loaded by normal NSS code (getgrnam):

> #40 0x7f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at
> nsswitch.c:359
> #41 0x7f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0,
> fct_name=, fct_name@entry=0x7f899089b020 "setgrent") at
> nsswitch.c:467
> #42 0x7f899089554b in init_nss_interface () at nss_compat/compat-
> grp.c:83
> #43 init_nss_interface () at nss_compat/compat-grp.c:79
> #44 0x7f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0
> "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024,
> errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486
> #45 0x7f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0
> "tty", resbuf=resbuf@entry=0x7ffe3e7a2910,
> buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024,
> result=result@entry=0x7ffe3e7a2908)
> at ../nss/getXXbyYY_r.c:315


You have a strange nsswitch configuration that causes all of this
(libnss_nis.so.2 -> libcom_err.so.2) to get loaded.

Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada
... Linux's sem_init() instead of libpthread's.

And then the crash.

Now, I don't know how to fix it (short of changing your nsswitch
configuration) - maybe we could somehow rename sem_init()? Or maybe we
can somehow give the kernel binary a lower symbol resolution than the
libc/libpthread.
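
A simplified model of why the kernel's sem_init() wins: the dynamic linker
walks the loaded objects in order, starting with the executable, and takes
the first exported definition it finds. The sketch below ignores symbol
versioning, weak symbols and dlopen() scopes, and the object and export lists
are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One loaded ELF object and the names it exports dynamically. */
struct object {
	const char *name;
	const char *exports[4];	/* NULL-terminated list */
};

/* Return the first object in load order that exports sym, or NULL. */
const char *resolve(const struct object *objs, size_t n, const char *sym)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < 4 && objs[i].exports[j]; j++)
			if (strcmp(objs[i].exports[j], sym) == 0)
				return objs[i].name;
	return NULL;
}

/* Hypothetical search scope: executable first, then its dependencies. */
const struct object scope[] = {
	{ "linux (UML)",     { "sem_init", "memcpy", NULL } },
	{ "libpthread.so.0", { "sem_init", NULL } },
	{ "libc.so.6",       { "memcpy", "getgrnam_r", NULL } },
};
```

With this scope, resolve(scope, 3, "sem_init") returns "linux (UML)", which
matches frame #26 in the backtrace. Renaming the kernel symbol or keeping it
out of the executable's exported list would let the lookup fall through to
libpthread.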


johannes



Bug#983379: linux uml segfault

2021-03-03 Thread Anton Ivanov



On 03/03/2021 10:45, Ritesh Raj Sarraf wrote:
> Hi Anton,
>
> On Wed, 2021-03-03 at 09:30 +0000, Anton Ivanov wrote:
> > > OTOH, I have one more user (other than you) who's not been able to
> > > reproduce the issue.
> > >
> > > > I will do a dissect the moment I figure out how to reproduce it.
> > > > I will try to do some more experiments on that tomorrow.
> >
> > I tried to alter the userspace a bit, but it makes no difference.
> >
> > Out of curiosity, what are you running it on?
>
> Bare-metal machines. 3 different machines, all Intel processors.
> And it fails on all 3 of them.

Hmmm...

All mine are AMD. I can try to boot up an Intel later today with Bullseye to
see if it makes a difference.

> On the distribution side, all 3 of them run Debian Unstable, with Linux
> 5.10.13
>
> > The code here is:
> >
> > static inline u32 printk_caller_id(void)
> > {
> >     return in_task() ? task_pid_nr(current) :
> >         0x80000000 + raw_smp_processor_id();
> > }
> >
> > That is something which should not bomb out unless we have memory
> > corruption or something along those lines - current being invalid.
>
> Must be something different. Not all machines could have bad memory at
> the same time.

I did not mean bad memory. I meant memory corruption as a result of a race, a
buffer overrun or anything else like that.
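
For reference, a userspace sketch of what printk_caller_id() computes, with
the kernel helpers replaced by plain parameters. It assumes the high-bit
constant is 0x80000000 as in the mainline kernel; caller_id,
caller_id_is_cpu and CALLER_ID_CPU_FLAG are names made up for this example.

```c
#include <stdint.h>

/* High bit marks "not in task context"; the macro name is ours. */
#define CALLER_ID_CPU_FLAG 0x80000000u

/* Same selection logic as printk_caller_id(), inputs passed explicitly. */
uint32_t caller_id(int in_task, uint32_t pid, uint32_t cpu)
{
	return in_task ? pid : CALLER_ID_CPU_FLAG + cpu;
}

/* Nonzero if the id names a CPU rather than a task PID. */
int caller_id_is_cpu(uint32_t id)
{
	return (id & CALLER_ID_CPU_FLAG) != 0;
}
```

Only the task-context branch reads current, which is why an invalid current
is the suspect here.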





--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-03 Thread Ritesh Raj Sarraf
Hi Anton,

On Wed, 2021-03-03 at 09:30 +0000, Anton Ivanov wrote:
> 
> > 
> > OTOH, I have one more user (other than you) who's not been able to
> > reproduce the issue.
> > 
> > > I will do a dissect the moment I figure out how to reproduce it.
> > > I
> > > will try to do some more experiments on that tomorrow.
> 
> I tried to alter the userspace a bit, but it makes no difference.
> 
> Out of curiosity, what are you running it on?
> 

Bare-metal machines. 3 different machines, all Intel processors.
And it fails on all 3 of them.

On the distribution side, all 3 of them run Debian Unstable, with Linux
5.10.13

> > 
> 
> The code here is:
> 
> static inline u32 printk_caller_id(void)
> {
> return in_task() ? task_pid_nr(current) :
> 0x80000000 + raw_smp_processor_id();
> }
> 
> 
> That is something which should not bomb out unless we have memory
> corruption or something along those lines - current being invalid.
> 

Must be something different. Not all machines could have bad memory at
the same time.


-- 
Given the large number of mailing lists I follow, I request you to CC
me in replies for quicker response




Bug#983379: linux uml segfault

2021-03-03 Thread Anton Ivanov




On 02/03/2021 17:27, Ritesh Raj Sarraf wrote:
> On Tue, 2021-03-02 at 17:05 +0000, Anton Ivanov wrote:
> > > So the best I can extract for you is to compile the kernel with as
> > > much information as possible.
> >
> > Can you try using one of the older kernels so we can verify if this
> > is indeed a 5.10 thing.
>
> That was the first thing I tried. I tested it with 5.10, 5.9 and 5.4.
> All 3 crashed. That's when I knew this one was going to be painful one
> to conclude.
>
> The only other input I have is that I have one more user who's reported
> to be able to reproduce the issue.
>
> OTOH, I have one more user (other than you) who's not been able to
> reproduce the issue.
>
> > I will do a dissect the moment I figure out how to reproduce it. I
> > will try to do some more experiments on that tomorrow.

I tried to alter the userspace a bit, but it makes no difference.

Out of curiosity, what are you running it on?

> Meanwhile, I enabled some debug info in the kernel. Here's what I have
> got so far:

```
(gdb) bt
#0  0x7f89908dc087 in kill () at ../sysdeps/unix/syscall-
template.S:120
#1  0x604a3514 in uml_abort () at arch/um/os-Linux/util.c:94
#2  0x604a3791 in os_dump_core () at arch/um/os-
Linux/util.c:149
#3  0x6048d126 in panic_exit (self=0x2e66d5, unused1=6,
unused2=0x0) at arch/um/kernel/um_arch.c:217
#4  0x604c725a in notifier_call_chain (nl=0x2e66d5, val=0,
v=0x60d82f40 , nr_to_call=-1, nr_calls=0x0) at
kernel/notifier.c:83
#5  0x604c72f6 in atomic_notifier_call_chain (nh=0x2e66d5,
val=6, v=0x0) at kernel/notifier.c:217
#6  0x60a54607 in panic (fmt=0x60a55225 
"UH\211\345H\201\354", ) at
kernel/panic.c:272
#7  0x6048cca3 in segv (fi=, ip=1615717312,
is_user=0, regs=0x60c2ee58 ) at
arch/um/kernel/trap.c:246
#8  0x6048ce64 in segv_handler (sig=3040981, unused_si=0x6,
regs=0x60c2ee58 ) at arch/um/kernel/trap.c:190
#9  0x604a2556 in sig_handler_common (sig=11, si=0x60c2fbf0
, mc=0x60c2fae8 ) at
arch/um/os-Linux/signal.c:48
#10 0x604a2aa2 in sig_handler (sig=3040981, si=0x6, mc=0x0) at
arch/um/os-Linux/signal.c:81
#11 0x604a265f in hard_handler (sig=3040981, si=0x60c2fbf0
, p=0x0) at arch/um/os-Linux/signal.c:180
#12 <signal handler called>


The code here is:

static inline u32 printk_caller_id(void)
{
return in_task() ? task_pid_nr(current) :
0x80000000 + raw_smp_processor_id();
}


That is something which should not bomb out unless we have memory corruption or 
something along those lines - current being invalid.

A.


#13 0x604de3c0 in printk_caller_id () at
kernel/printk/printk.c:1924
#14 log_output (text_len=, text=,
dev_info=, lflags=, level=, facility=) at kernel/printk/printk.c:1932
#15 vprintk_store (facility=1624806843, level=5, dev_info=0x0, fmt=0x35
, args=0x1) at
kernel/printk/printk.c:2004
#16 0x604de8b7 in vprintk_emit (facility=1624806843,
level=1622768673, dev_info=0x35, fmt=0x1 , args=0x60b97c22) at kernel/printk/printk.c:2029
#17 0x604debad in vprintk_deferred (fmt=0x1 , args=0x60b97c21) at
kernel/printk/printk.c:3079
#18 0x60a554de in printk_deferred (fmt=0x60d895bb 
"\n") at kernel/printk/printk.c:3091
#19 0x6092680f in _warn_unseeded_randomness
(previous=, caller=, func_name=) at drivers/char/random.c:1534
#20 _warn_unseeded_randomness (func_name=0x60abf380 <__func__.38>
"get_random_u32", caller=0x608b5f25 ,
previous=0x35) at drivers/char/random.c:1516
#21 0x60927d47 in get_random_u32 () at
drivers/char/random.c:2221
#22 0x608b5f25 in bucket_table_alloc (nbuckets=64, gfp=3264,
ht=) at lib/rhashtable.c:203
#23 0x608b6733 in rhashtable_init (ht=0x60c60e30
, params=0x608b5e06 ) at
lib/rhashtable.c:1061
#24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 )
at ipc/util.c:119
#25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at
ipc/sem.c:254
#26 0x60015b5d in sem_init () at ipc/sem.c:268
#27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-
gnu/libcom_err.so.2
#28 0x7f8990ab8fb2 in call_init (l=,
argc=argc@entry=5, argv=argv@entry=0x7ffe3e7a4c98,
env=env@entry=0x7ffe3e7a4cc8) at dl-init.c:72
#29 0x7f8990ab90b9 in call_init (env=0x7ffe3e7a4cc8,
argv=0x7ffe3e7a4c98, argc=5, l=) at dl-init.c:30
#30 _dl_init (main_map=0x61497ea0, argc=5, argv=0x7ffe3e7a4c98,
env=0x7ffe3e7a4cc8) at dl-init.c:119
#31 0x7f89909d82bd in __GI__dl_catch_exception
(exception=exception@entry=0x0, operate=operate@entry=0x7f8990abc5a0
, args=args@entry=0x7ffe3e7a1e80) at dl-error-
skeleton.c:182
#32 0x7f8990abd028 in dl_open_worker (a=a@entry=0x7ffe3e7a2020) at
dl-open.c:758
#33 0x7f89909d8260 in __GI__dl_catch_exception
(exception=exception@entry=0x7ffe3e7a2000,
operate=operate@entry=0x7f8990abcc70 ,
args=args@entry=0x7ffe3e7a2020) at dl-error-skeleton.c:208
#34 0x7f8990abc8ca in _dl_open (file=0x7ffe3e7a22a0
"libnss_nis.so.2", mode=-2147483646, caller_dlopen=0x7f89909bf3a6
, nsid=-2, argc=5, argv=0x7ffe3e7a2000,
env=0x7ffe3e7a4cc8)
 at dl-open.c:837

Bug#983379: linux uml segfault

2021-03-02 Thread Ritesh Raj Sarraf
On Tue, 2021-03-02 at 17:05 +0000, Anton Ivanov wrote:
> > So the best I can extract for you is to compile the kernel with as
> > much
> > information as possible.
> 
> Can you try using one of the older kernels so we can verify if this
> is indeed a 5.10 thing.
> 

That was the first thing I tried. I tested it with 5.10, 5.9 and 5.4.
All 3 crashed. That's when I knew this one was going to be painful one
to conclude.

The only other input I have is that I have one more user who's reported
to be able to reproduce the issue.

OTOH, I have one more user (other than you) who's not been able to
reproduce the issue.

> I will do a dissect the moment I figure out how to reproduce it. I
> will try to do some more experiments on that tomorrow.


Meanwhile, I enabled some debug info in the kernel. Here's what I have
got so far:

```
(gdb) bt
#0  0x7f89908dc087 in kill () at ../sysdeps/unix/syscall-
template.S:120
#1  0x604a3514 in uml_abort () at arch/um/os-Linux/util.c:94
#2  0x604a3791 in os_dump_core () at arch/um/os-
Linux/util.c:149
#3  0x6048d126 in panic_exit (self=0x2e66d5, unused1=6,
unused2=0x0) at arch/um/kernel/um_arch.c:217
#4  0x604c725a in notifier_call_chain (nl=0x2e66d5, val=0,
v=0x60d82f40 , nr_to_call=-1, nr_calls=0x0) at
kernel/notifier.c:83
#5  0x604c72f6 in atomic_notifier_call_chain (nh=0x2e66d5,
val=6, v=0x0) at kernel/notifier.c:217
#6  0x60a54607 in panic (fmt=0x60a55225 
"UH\211\345H\201\354", ) at
kernel/panic.c:272
#7  0x6048cca3 in segv (fi=, ip=1615717312,
is_user=0, regs=0x60c2ee58 ) at
arch/um/kernel/trap.c:246
#8  0x6048ce64 in segv_handler (sig=3040981, unused_si=0x6,
regs=0x60c2ee58 ) at arch/um/kernel/trap.c:190
#9  0x604a2556 in sig_handler_common (sig=11, si=0x60c2fbf0
, mc=0x60c2fae8 ) at
arch/um/os-Linux/signal.c:48
#10 0x604a2aa2 in sig_handler (sig=3040981, si=0x6, mc=0x0) at
arch/um/os-Linux/signal.c:81
#11 0x604a265f in hard_handler (sig=3040981, si=0x60c2fbf0
, p=0x0) at arch/um/os-Linux/signal.c:180
#12 <signal handler called>
#13 0x604de3c0 in printk_caller_id () at
kernel/printk/printk.c:1924
#14 log_output (text_len=, text=,
dev_info=, lflags=, level=, facility=) at kernel/printk/printk.c:1932
#15 vprintk_store (facility=1624806843, level=5, dev_info=0x0, fmt=0x35
, args=0x1) at
kernel/printk/printk.c:2004
#16 0x604de8b7 in vprintk_emit (facility=1624806843,
level=1622768673, dev_info=0x35, fmt=0x1 , args=0x60b97c22) at kernel/printk/printk.c:2029
#17 0x604debad in vprintk_deferred (fmt=0x1 , args=0x60b97c21) at
kernel/printk/printk.c:3079
#18 0x60a554de in printk_deferred (fmt=0x60d895bb 
"\n") at kernel/printk/printk.c:3091
#19 0x6092680f in _warn_unseeded_randomness
(previous=, caller=, func_name=) at drivers/char/random.c:1534
#20 _warn_unseeded_randomness (func_name=0x60abf380 <__func__.38>
"get_random_u32", caller=0x608b5f25 ,
previous=0x35) at drivers/char/random.c:1516
#21 0x60927d47 in get_random_u32 () at
drivers/char/random.c:2221
#22 0x608b5f25 in bucket_table_alloc (nbuckets=64, gfp=3264,
ht=) at lib/rhashtable.c:203
#23 0x608b6733 in rhashtable_init (ht=0x60c60e30
, params=0x608b5e06 ) at
lib/rhashtable.c:1061
#24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 )
at ipc/util.c:119
#25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at
ipc/sem.c:254
#26 0x60015b5d in sem_init () at ipc/sem.c:268
#27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-
gnu/libcom_err.so.2
#28 0x7f8990ab8fb2 in call_init (l=,
argc=argc@entry=5, argv=argv@entry=0x7ffe3e7a4c98,
env=env@entry=0x7ffe3e7a4cc8) at dl-init.c:72
#29 0x7f8990ab90b9 in call_init (env=0x7ffe3e7a4cc8,
argv=0x7ffe3e7a4c98, argc=5, l=) at dl-init.c:30
#30 _dl_init (main_map=0x61497ea0, argc=5, argv=0x7ffe3e7a4c98,
env=0x7ffe3e7a4cc8) at dl-init.c:119
#31 0x7f89909d82bd in __GI__dl_catch_exception
(exception=exception@entry=0x0, operate=operate@entry=0x7f8990abc5a0
, args=args@entry=0x7ffe3e7a1e80) at dl-error-
skeleton.c:182
#32 0x7f8990abd028 in dl_open_worker (a=a@entry=0x7ffe3e7a2020) at
dl-open.c:758
#33 0x7f89909d8260 in __GI__dl_catch_exception
(exception=exception@entry=0x7ffe3e7a2000,
operate=operate@entry=0x7f8990abcc70 ,
args=args@entry=0x7ffe3e7a2020) at dl-error-skeleton.c:208
#34 0x7f8990abc8ca in _dl_open (file=0x7ffe3e7a22a0
"libnss_nis.so.2", mode=-2147483646, caller_dlopen=0x7f89909bf3a6
, nsid=-2, argc=5, argv=0x7ffe3e7a2000,
env=0x7ffe3e7a4cc8)
at dl-open.c:837
#35 0x7f89909d76dd in do_dlopen (ptr=ptr@entry=0x7ffe3e7a2260) at
dl-libc.c:96
#36 0x7f89909d8260 in __GI__dl_catch_exception
(exception=exception@entry=0x7ffe3e7a21e0,
operate=operate@entry=0x7f89909d76a0 ,
args=args@entry=0x7ffe3e7a2260) at dl-error-skeleton.c:208
#37 0x7f89909d831f in __GI__dl_catch_error
(objname=objname@entry=0x7ffe3e7a2238,
errstring=errstring@entry=0x7ffe3e7a2240,
mallocedp=mallocedp@entry=0x7ffe3e7a2237, 

Bug#983379: linux uml segfault

2021-03-02 Thread Anton Ivanov




On 02/03/2021 14:23, Ritesh Raj Sarraf wrote:
> On Tue, 2021-03-02 at 11:34 +0000, Anton Ivanov wrote:
> > If gdb gives you the exact lines, that may be helpful.
>
> It doesn't. But it does show drawbacks in my packaging. The debug
> symbols packaged are not read/honored by gdb at all.

```
Reading symbols from /usr/bin/linux.uml...
Reading symbols from /usr/lib/debug/.build-
id/6f/ea141539149074c72e80fb8004de124fda115b.debug...
(No debugging symbols found in /usr/lib/debug/.build-
id/6f/ea141539149074c72e80fb8004de124fda115b.debug)

warning: Can't open file /dev/shm/#20817 (deleted) during file-backed
mapping note processing
[New LWP 18788]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-
gnu/libthread_db.so.1".
Core was generated by `linux ubd0=qemu-linux-image.img'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f51842c0087 in kill () at ../sysdeps/unix/syscall-
template.S:120
120 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x7f51842c0087 in kill () at ../sysdeps/unix/syscall-
template.S:120
#1  0x6049dc20 in uml_abort ()
#2  0x6049de7a in os_dump_core ()
#3  0x60486e47 in panic_exit ()
#4  0x604c0a03 in notifier_call_chain ()
#5  0x604c0a98 in atomic_notifier_call_chain ()
#6  0x60a26b85 in panic ()
#7  0x604869e1 in segv ()
#8  0x60486ba9 in segv_handler ()
#9  0x6049ccc0 in sig_handler_common ()
#10 0x6049d1ec in sig_handler ()
#11 0x6049cdc6 in hard_handler ()
#12 <signal handler called>
#13 0x604d45b4 in vprintk_store ()
#14 0x604d4aa8 in vprintk_emit ()
#15 0x604d4d86 in vprintk_deferred ()
#16 0x60a27a02 in printk_deferred ()
#17 0x609031b2 in get_random_u32 ()
#18 0x6088ff65 in bucket_table_alloc.isra ()
#19 0x60890740 in rhashtable_init ()
#20 0x607efaa2 in ipc_init_ids ()
#21 0x600153c9 in sem_init ()
```

> So the best I can extract for you is to compile the kernel with as much
> information as possible.

Can you try using one of the older kernels so we can verify if this is indeed
a 5.10 thing?

I will do a dissect the moment I figure out how to reproduce it. I will try to
do some more experiments on that tomorrow.

> Thanks,
> Ritesh



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-02 Thread Ritesh Raj Sarraf
On Tue, 2021-03-02 at 11:34 +0000, Anton Ivanov wrote:
> If gdb gives you the exact lines, that may be helpful.

It doesn't. But it does show drawbacks in my packaging. The debug
symbols packaged are not read/honored by gdb at all.

```
Reading symbols from /usr/bin/linux.uml...
Reading symbols from /usr/lib/debug/.build-
id/6f/ea141539149074c72e80fb8004de124fda115b.debug...
(No debugging symbols found in /usr/lib/debug/.build-
id/6f/ea141539149074c72e80fb8004de124fda115b.debug)

warning: Can't open file /dev/shm/#20817 (deleted) during file-backed
mapping note processing
[New LWP 18788]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-
gnu/libthread_db.so.1".
Core was generated by `linux ubd0=qemu-linux-image.img'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f51842c0087 in kill () at ../sysdeps/unix/syscall-
template.S:120
120 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x7f51842c0087 in kill () at ../sysdeps/unix/syscall-
template.S:120
#1  0x6049dc20 in uml_abort ()
#2  0x6049de7a in os_dump_core ()
#3  0x60486e47 in panic_exit ()
#4  0x604c0a03 in notifier_call_chain ()
#5  0x604c0a98 in atomic_notifier_call_chain ()
#6  0x60a26b85 in panic ()
#7  0x604869e1 in segv ()
#8  0x60486ba9 in segv_handler ()
#9  0x6049ccc0 in sig_handler_common ()
#10 0x6049d1ec in sig_handler ()
#11 0x6049cdc6 in hard_handler ()
#12 <signal handler called>
#13 0x604d45b4 in vprintk_store ()
#14 0x604d4aa8 in vprintk_emit ()
#15 0x604d4d86 in vprintk_deferred ()
#16 0x60a27a02 in printk_deferred ()
#17 0x609031b2 in get_random_u32 ()
#18 0x6088ff65 in bucket_table_alloc.isra ()
#19 0x60890740 in rhashtable_init ()
#20 0x607efaa2 in ipc_init_ids ()
#21 0x600153c9 in sem_init ()
```

So the best I can extract for you is to compile the kernel with as much
information as possible.
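
For what it's worth, the path gdb printed above follows the usual build-ID
layout for separate debug files: the first two hex digits of the build ID
become a directory and the remainder becomes the file name. A small sketch of
that mapping; build_id_debug_path is a made-up helper, not part of gdb.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the separate-debug-file path for a hex build ID into out.
 * Returns 0 on success, -1 if the ID is too short or out is too small.
 */
int build_id_debug_path(const char *debug_dir, const char *build_id,
			char *out, size_t out_len)
{
	int n;

	if (strlen(build_id) < 3)
		return -1;
	/* First two hex digits name the directory, the rest the file. */
	n = snprintf(out, out_len, "%s/.build-id/%.2s/%s.debug",
		     debug_dir, build_id, build_id + 2);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

So gdb did find the file that matches the binary's build ID; the message says
that file itself carries no debugging symbols.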

Thanks,
Ritesh

-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System




Bug#983379: linux uml segfault

2021-03-02 Thread Anton Ivanov




On 02/03/2021 09:09, Ritesh Raj Sarraf wrote:
> On Wed, 2021-02-24 at 11:44 +0000, Anton Ivanov wrote:
> > In all cases it boots cleanly and there are no segfaults.
> >
> > So, frankly, no idea what is causing it to crash - I have run most
> > combinations of 5.10 on a 5.10, all work fine here.
>
> Is there any other way I can help you with this issue ?
> I do have the core dump available on my local machine.

If gdb gives you the exact lines, that may be helpful.

I have looked through the bt several times; it is something my set-up
cruises through.

The actual moment you see in the backtrace is this one:

[0.08] random: get_random_u32 called from bucket_table_alloc.isra.0+0x115/0x13d with crng_init=0

However, in your case, instead of getting this printk warning out it blows up.

Why - I don't know.

A.





___
linux-um mailing list
linux...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-03-02 Thread Ritesh Raj Sarraf
On Wed, 2021-02-24 at 11:44 +0000, Anton Ivanov wrote:
> In all cases it boots cleanly and there are no segfaults.
> 
> So, frankly, no idea what is causing it to crash - I have run most
> combinations of 5.10 on a 5.10, all work fine here.

Is there any other way I can help you with this issue ?
I do have the core dump available on my local machine.


-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System




Bug#983379: linux uml segfault

2021-02-24 Thread Anton Ivanov




On 23/02/2021 17:26, Ritesh Raj Sarraf wrote:
> Added the debian bug report in CC.
>
> On Tue, 2021-02-23 at 17:19 +0000, Anton Ivanov wrote:
> > > The current Debian user-mode-linux package in unstable is based on
> > > the 5.10.5 stable source which includes the mentioned patch, but is
> > > still causing an error for some users.
> >
> > After updating the tree to 5.10.5 and applying all Debian patches
> > from the package, I cannot reproduce the bug.
> >
> > I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
> > without issues. Hosts are all up to date Debian 10.8 and so is the
> > UML userspace.
>
> Did you mean 5.10, 5.2 and 4.19 (UML) guests ?
>
> We've seen this happen on Debian Testing and Unstable Host (of which
> the former would soon be the next stable i.e. Debian Bullseye).
>
> In our tests, when running the same linux uml binary (5.10) on a Debian
> Stable Host, it is working fine.

I cannot reproduce it on a physical Bullseye host using the Debian
user-mode-linux package compiled from source.

Environment - Bullseye minimal install and build deps. 6 cores/12 threads Ryzen

I cannot reproduce it using the upstream source and the patches from the
user-mode-linux package.

Environment - same as above.

I cannot reproduce it using the upstream source + patches and compiling on
Buster using the following:

1. Bullseye physical host, minimal install, same hardware

2. Bullseye VM, minimal install, running with 4 vCPUs on the same host

3. Bullseye LXC container running on a Debian Buster host, minimal install,
same hardware

In all cases it boots cleanly and there are no segfaults.

So, frankly, no idea what is causing it to crash - I have run most combinations
of 5.10 on a 5.10, all work fine here.

--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-02-23 Thread Anton Ivanov

On 23/02/2021 17:26, Ritesh Raj Sarraf wrote:
> Added the debian bug report in CC.
>
> On Tue, 2021-02-23 at 17:19 +0000, Anton Ivanov wrote:
> > > The current Debian user-mode-linux package in unstable is based on
> > > the 5.10.5 stable source which includes the mentioned patch, but is
> > > still causing an error for some users.
> >
> > After updating the tree to 5.10.5 and applying all Debian patches
> > from the package, I cannot reproduce the bug.
> >
> > I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
> > without issues. Hosts are all up to date Debian 10.8 and so is the
> > UML userspace.
>
> Did you mean 5.10, 5.2 and 4.19 (UML) guests ?

No. Hosts.

I have several 6core/12thread Ryzens which are used for development
testing.

They all use identical userspace with the sole difference being the
kernel. They all use a selection of 5.x because 4.19 does not support
the hardware properly.

The 4.19 testing is done on my old "test farm" which is all A8s and
Athlon X760.

> We've seen this happen on Debian Testing and Unstable Host (of which
> the former would soon be the next stable i.e. Debian Bullseye).
>
> In our tests, when running the same linux uml binary (5.10) on a Debian
> Stable Host, it is working fine.

OK. I will upgrade one of my systems to Debian testing to try to
reproduce this.



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/



Bug#983379: linux uml segfault

2021-02-23 Thread Ritesh Raj Sarraf
Added the debian bug report in CC.

On Tue, 2021-02-23 at 17:19 +0000, Anton Ivanov wrote:
> > The current Debian user-mode-linux package in unstable is based on
> > the 5.10.5 stable source which includes the mentioned patch, but is
> > still causing an error for some users.
> 
> After updating the tree to 5.10.5 and applying all Debian patches
> from the package, I cannot reproduce the bug.
> 
> I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters
> without issues. Hosts are all up to date Debian 10.8 and so is the
> UML userspace.
> 

Did you mean 5.10, 5.2 and 4.19 (UML) guests ?

We've seen this happen on Debian Testing and Unstable Host (of which
the former would soon be the next stable i.e. Debian Bullseye).

In our tests, when running the same linux uml binary (5.10) on a Debian
Stable Host, it is working fine.


-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System

