Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-02-01 Thread Sergio Paracuellos
Hi all,

On Fri, Jan 31, 2020 at 9:47 AM Sergio Paracuellos
 wrote:
>
> Hi Stefan,
>
> On Fri, Jan 31, 2020 at 8:41 AM Stefan Seyfried
>  wrote:
[snip]
>
> After looking to different core files it seems I am able to reproduce
> issues and seems to be related with a fork() -> execl pattern when
> running different programs. That seems the way the shell exec
> processes. I think I should also try to reproduce this behaviour with
> a more accurate C program doing the same :-). I don't want to make
> busybox the real guilty here, but until now all my problems are
> related with it, and that's why I am asking here :)). I'll try it and
> report here afterwards.

I was also able to reproduce this bug with C programs using the
fork() -> execl() pattern mentioned above, and after digging into what
was happening I found the problem. It was related to the remoteproc
nodes in the device tree. I use remoteproc to start, stop and load the
firmware on the ARM R5 cores from the Linux side. These cores use
their own Tightly-Coupled Memory for the stack, interrupts, etc., but
the acquired data is shared with the Linux side through a shared
memory region managed with libmetal. The shared memory is located in
DDR and, on the Linux side, it is handled through the kernel's uio
APIs and also managed from user space with libmetal. For this to work
properly, the whole shared memory range must also be declared as
reserved memory for remoteproc and marked 'no-map'; after that, the
uio kernel nodes just need to be defined with the memory ranges. The
'no-map' part was the problem here. With that fixed, it is working
properly now. There was no error in busybox at all, so I apologize; I
am really sorry for the noise.
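
A minimal sketch of such a fork() -> execl() loop, for illustration
only. This is not the actual test program from this thread, and
run_once() and stress() are hypothetical names. It reports the first
child that is killed by a signal, which is the symptom the audit
messages showed for sleep:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, exec `path` with one argument and wait for it.
 * Returns the signal number if the child was killed by a signal,
 * 0 if it exited on its own (with any exit code), -1 if fork() or
 * waitpid() failed.
 */
static int run_once(const char *path, const char *arg)
{
    assert(path != NULL && arg != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl(path, path, arg, (char *)NULL);
        _exit(127); /* only reached if execl() itself failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/*
 * Repeat run_once() up to `iterations` times. Returns the index of
 * the first iteration whose child was killed by a signal, or -1 if
 * that never happened (including when fork()/waitpid() failed).
 */
static long stress(const char *path, const char *arg, long iterations)
{
    for (long i = 0; i < iterations; i++) {
        int sig = run_once(path, arg);
        if (sig < 0) {
            perror("fork/waitpid");
            return -1;
        }
        if (sig > 0) {
            fprintf(stderr, "iteration %ld: child killed by signal %d\n",
                    i, sig);
            return i;
        }
    }
    return -1;
}
```

Calling stress("/bin/sleep", "1", 100000) from main() roughly mimics
the shell loop from the original report; the path and the iteration
count are only example values.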

> >
> > BTDT.
> >
>
> Thanks for your help. Very appreciated.

Thanks again to all for your effort and advice.

Best regards,
Sergio Paracuellos
>
> Best regards,
>  Sergio Paracuellos
>
> > >> I just had a USB thumb drive corrupt a file with no warning, so it was
> > >> fresh on my mind.
> >
> > --
> > Stefan Seyfried
> >
> > "For a successful technology, reality must take precedence over
> >  public relations, for nature cannot be fooled." -- Richard Feynman
> > ___
> > busybox mailing list
> > busybox@busybox.net
> > http://lists.busybox.net/mailman/listinfo/busybox
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-31 Thread Sergio Paracuellos
Hi Stefan,

On Fri, Jan 31, 2020 at 8:41 AM Stefan Seyfried
 wrote:
>
> On 30.01.20 at 19:23, Sergio Paracuellos wrote:
>
> >> First things first, do you trust the hardware?  If the system swapped
> >> out some pages and then read them back in and the controller or storage
> >> device gave back bad data, this could cause all sorts of crashes.  Maybe
> >> run a memtest, too, unless you can reproduce the issue on multiple
> >> different machines.
> >
> > I am able to reproduce the issue using different SOMs with its own RAM
> > chip on it, and its own mmc storage device where the system is
> > installed so I
> > am pretty sure the hardware is not the problem here.
>
> Then look at the kernel.
> Use an older / newer one.
> Use a different compiler to build everything.

I have been using the stable 4.9 branch since its beginning, merging
in its updates. I have not had any notable regression in more than a
year, so I don't think the kernel is the problem either. If the kernel
were the problem, I think I would see an oops or something weird on
its side... But who knows :-)

>
> Even nowadays, arm64 is still an exotic architecture (outside of some 
> embedded niches and phones, but they all use their
> own, often old and patched-to-hell toolchain).

That's true. I am using a Xilinx UltraScale MPSoC, running Linux on
its A53 cores and doing some data acquisition on its ARM R5 real-time
cores. I need some boot binaries to be compiled with Xilinx's
toolchain, but I did not notice this kind of problem when I was using
another Yocto version (the sumo branch, for example). That one used a
gcc 7.x based compiler, and the one we are using now is based on gcc
8.3.0.

>
> Try building busybox with another toolchain / distribution (use the crap that 
> came with the board from the vendor
> instead of yocto, chances are the vendor has patched the hell out and into 
> the old crap they give you to make it run on
> their broken hardware ;-))

After looking at different core files, it seems I am able to reproduce
the issues, and they seem to be related to a fork() -> execl() pattern
when running different programs. That seems to be the way the shell
executes processes. I think I should also try to reproduce this
behaviour with a more accurate C program doing the same :-). I don't
want to blame busybox here, but until now all my problems have been
related to it, and that's why I am asking here :)). I'll try it and
report back here afterwards.
>
> BTDT.
>

Thanks for your help. Much appreciated.

Best regards,
 Sergio Paracuellos

> >> I just had a USB thumb drive corrupt a file with no warning, so it was
> >> fresh on my mind.
>
> --
> Stefan Seyfried
>
> "For a successful technology, reality must take precedence over
>  public relations, for nature cannot be fooled." -- Richard Feynman


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Stefan Seyfried
On 30.01.20 at 19:23, Sergio Paracuellos wrote:

>> First things first, do you trust the hardware?  If the system swapped
>> out some pages and then read them back in and the controller or storage
>> device gave back bad data, this could cause all sorts of crashes.  Maybe
>> run a memtest, too, unless you can reproduce the issue on multiple
>> different machines.
> 
> I am able to reproduce the issue using different SOMs with its own RAM
> chip on it, and its own mmc storage device where the system is
> installed so I
> am pretty sure the hardware is not the problem here.

Then look at the kernel.
Use an older / newer one.
Use a different compiler to build everything.

Even nowadays, arm64 is still an exotic architecture (outside of some embedded 
niches and phones, but they all use their
own, often old and patched-to-hell toolchain).

Try building busybox with another toolchain / distribution (use the crap that 
came with the board from the vendor
instead of yocto, chances are the vendor has patched the hell out and into the 
old crap they give you to make it run on
their broken hardware ;-))

BTDT.

>> I just had a USB thumb drive corrupt a file with no warning, so it was
>> fresh on my mind.

-- 
Stefan Seyfried

"For a successful technology, reality must take precedence over
 public relations, for nature cannot be fooled." -- Richard Feynman


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Sergio Paracuellos
Hi Michael,

On Thu, Jan 30, 2020 at 5:38 PM Michael Conrad  wrote:
>
> On 1/30/2020 12:53 AM, Sergio Paracuellos wrote:
> >  warning: core file may not match specified executable file.
> >  [New LWP 23217]
> >
> >  warning: Could not load shared library symbols for 3 libraries,
> > e.g. /lib/libm.so.6.
> >  Use the "info sharedlibrary" command to see the complete listing.
> >  Do you need "set solib-search-path" or "set sysroot"?
> >  Core was generated by `sleep 1'.
> >  Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0xfcd7fc60 in ?? ()
> >  (gdb) bt
> > #0  0xfcd7fc60 in ?? ()
> > #1  0x00555668af74 in ?? ()
> >  Backtrace stopped: previous frame identical to this frame (corrupt 
> > stack?)
> >  (gdb)
> >
> > Look at the PC (0xfcd7fc60) at this point.. It has not sense
> > at all. For me it looks like a stack corruption.
>
> First things first, do you trust the hardware?  If the system swapped
> out some pages and then read them back in and the controller or storage
> device gave back bad data, this could cause all sorts of crashes.  Maybe
> run a memtest, too, unless you can reproduce the issue on multiple
> different machines.

I am able to reproduce the issue on different SOMs, each with its own
RAM chip and its own MMC storage device where the system is installed,
so I am pretty sure the hardware is not the problem here.

> I just had a USB thumb drive corrupt a file with no warning, so it was
> fresh on my mind.
>
> -Mike

Thanks for the advice and your time.

Best regards,
Sergio Paracuellos



Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Michael Conrad

On 1/30/2020 12:53 AM, Sergio Paracuellos wrote:

 warning: core file may not match specified executable file.
 [New LWP 23217]

 warning: Could not load shared library symbols for 3 libraries,
e.g. /lib/libm.so.6.
 Use the "info sharedlibrary" command to see the complete listing.
 Do you need "set solib-search-path" or "set sysroot"?
 Core was generated by `sleep 1'.
 Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xfcd7fc60 in ?? ()
 (gdb) bt
#0  0xfcd7fc60 in ?? ()
#1  0x00555668af74 in ?? ()
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 (gdb)

Look at the PC (0xfcd7fc60) at this point.. It has not sense
at all. For me it looks like a stack corruption.


First things first, do you trust the hardware?  If the system swapped 
out some pages and then read them back in and the controller or storage 
device gave back bad data, this could cause all sorts of crashes.  Maybe 
run a memtest, too, unless you can reproduce the issue on multiple 
different machines.


I just had a USB thumb drive corrupt a file with no warning, so it was 
fresh on my mind.


-Mike



Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Sergio Paracuellos
Hi,

On Thu, Jan 30, 2020 at 1:42 PM Bernd Petrovitsch
 wrote:
>
> Hi all!
>
> On 30/01/2020 11:09, Sergio Paracuellos wrote:
> [...]
> > On Thu, Jan 30, 2020 at 11:37 AM Bernd Petrovitsch
> >  wrote:
> [...]
> >> "bt f" (short for "backtrace full") delivers more information.
> >
> > I tried also this command and the backtrace was the same, no extra info :-(.
>
> Ah, OK - I use only "bt f" (and don't remember why;-).
>
> [...]
> >> Another idea is - similar to valgrind - run the script
> >> with `strace -o strace.tst -F -F --" and see if the (last few)
> Ooops, should have been "-F -f" (yes, newer versions of strace warn
> on both).

Thanks, will try this also.

>
> >> called sys-calls and their parameters make sense.
> >> You may need "bash -c" and/or "sh -c" or similar ...
> >>
> >>> There are no oops and anything but the audit trace in the kernel side,
> >>> and also I am not successful trying to reproduce this bug running a
> >>
> >> Hmm, so it's
> > ??
>
> Ooops: Hmm, so it happens for busybox-sh and GNU-bash.
> Then it's not really busybox-specific but since it's
> reproducible we should hunt it down.

I do think that bash receives a SIGSEGV because the busybox sleep
command is launched from it. The first SIGSEGV is always in a command
linked to busybox.nosuid (normally sleep).

>
> [...]
> > Thanks for your effort in this.
>
> You're welcome - I'm doing such stuff for money;-)

:-)

Best regards,
Sergio Paracuellos

>
> Kind regards,
>  Bernd
> --
> "I dislike type abstraction if it has no real reason. And saving
> on typing is not a good reason - if your typing speed is the main
> issue when you're coding, you're doing something seriously wrong."
> - Linus Torvalds


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Bernd Petrovitsch
Hi all!

On 30/01/2020 11:09, Sergio Paracuellos wrote:
[...]
> On Thu, Jan 30, 2020 at 11:37 AM Bernd Petrovitsch
>  wrote:
[...]
>> "bt f" (short for "backtrace full") delivers more information.
> 
> I tried also this command and the backtrace was the same, no extra info :-(.

Ah, OK - I use only "bt f" (and don't remember why;-).

[...]
>> Another idea is - similar to valgrind - run the script
>> with `strace -o strace.tst -F -F --" and see if the (last few)
Ooops, should have been "-F -f" (yes, newer versions of strace warn
on both).

>> called sys-calls and their parameters make sense.
>> You may need "bash -c" and/or "sh -c" or similar ...
>>
>>> There are no oops and anything but the audit trace in the kernel side,
>>> and also I am not successful trying to reproduce this bug running a
>>
>> Hmm, so it's
> ??

Ooops: Hmm, so it happens for busybox-sh and GNU-bash.
Then it's not really busybox-specific but since it's
reproducible we should hunt it down.

[...]
> Thanks for your effort in this.

You're welcome - I'm doing such stuff for money;-)

Kind regards,
 Bernd
-- 
"I dislike type abstraction if it has no real reason. And saving
on typing is not a good reason - if your typing speed is the main
issue when you're coding, you're doing something seriously wrong."
- Linus Torvalds




Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Sergio Paracuellos
Hi,

On Thu, Jan 30, 2020 at 12:09 PM Sergio Paracuellos
 wrote:

[snip]

> >
> > "bt f" (short for "backtrace full") delivers more information.
>
> I tried also this command and the backtrace was the same, no extra info :-(.
>
> > It's perhaps also more helpful if gdb can resolve the shared
> > libraries and have the symbols somewhere (so that we get the
> > somewhat exact location ine th source code).
> > And for the 2nd core file (from bash) too.
> > Perhaps there are similarities ...
>
> I'll try to add debug symbols for these two and see if we can get more
> info about this, thanks.

Resolving all debug symbols I got the following:

For the sleep core:

Core was generated by `sleep 1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xfcd7fc60 in ?? ()
(gdb) set sysroot /opt/poky/2.7.2/sysroots/aarch64-poky-linux/
Reading symbols from
/opt/poky/2.7.2/sysroots/aarch64-poky-linux/lib/libm.so.6...Reading
symbols from 
/opt/poky/2.7.2/sysroots/aarch64-poky-linux/lib/.debug/libm-2.29.so...done.
done.
Reading symbols from
/opt/poky/2.7.2/sysroots/aarch64-poky-linux/lib/libc.so.6...Reading
symbols from 
/opt/poky/2.7.2/sysroots/aarch64-poky-linux/lib/.debug/libc-2.29.so...done.
done.
Reading symbols from
/opt/poky/2.7.2/sysroots/aarch64-poky-linux/lib/ld-linux-aarch64.so.1...Reading
symbols from 
/opt/poky/2.7.2/sysroots/aarch64-poky-linux/lib/.debug/ld-2.29.so...done.
done.
(gdb) bt
#0  0xfcd7fc60 in ?? ()
#1  0x00555668af74 in xfunc_die () at libbb/xfunc_die.c:19
#2  0x00555668a0d8 in run_applet_no_and_exit (applet_no=130, name=<optimized out>, argv=0x7fd2369798) at libbb/appletlib.c:998
#3  0x00555668a28c in run_applet_and_exit (name=0x7fd2369f00 "sleep", argv=0x7fd2369798) at libbb/appletlib.c:1014
#4  0x00555668a410 in main (argc=<optimized out>, argv=0x7fd2369798) at libbb/appletlib.c:1122

For the bash one:

Core was generated by `-sh'.
Program terminated with signal SIGSEGV, Segmentation fault.

(gdb) bt
#0  hash_search (string=0x555a858e60 "_", table=table@entry=0x55984c9270, flags=flags@entry=0) at ../bash-4.4.18/hashlib.c:183
#1  0x00555a7d2720 in hash_lookup (hashed_vars=0x55984c9270, name=0x555a858e60 "_") at ../bash-4.4.18/variables.c:1821
#2  bind_variable_internal (name=0x555a858e60 "_", value=0x55984ed960 "]", table=0x55984c9270, hflags=<optimized out>, aflags=0) at ../bash-4.4.18/variables.c:2695
#3  0x00555a7c6fd0 in bind_lastarg (arg=arg@entry=0x55984ed960 "]") at ../bash-4.4.18/execute_cmd.c:3838
#4  0x00555a7c9d30 in execute_simple_command (simple_command=<optimized out>, pipe_in=<optimized out>, pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, async=async@entry=0, fds_to_close=fds_to_close@entry=0x55984ed900) at ../bash-4.4.18/execute_cmd.c:4401
#5  0x00555a7cb474 in execute_command_internal (command=command@entry=0x55984ed2a0, asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x55984ed900) at ../bash-4.4.18/execute_cmd.c:807
#6  0x00555a7ccbe0 in execute_command (command=0x55984ed2a0) at ../bash-4.4.18/execute_cmd.c:405
#7  0x00555a7ccdb0 in execute_while_or_until (while_command=0x55984ecc10, type=type@entry=0) at ../bash-4.4.18/execute_cmd.c:3492
#8  0x00555a7cb344 in execute_while_command (while_command=<optimized out>) at ../bash-4.4.18/execute_cmd.c:3460
#9  execute_command_internal (command=command@entry=0x55984ec630, asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x55984edd80) at ../bash-4.4.18/execute_cmd.c:916
#10 0x00555a7ccbe0 in execute_command (command=0x55984ec630) at ../bash-4.4.18/execute_cmd.c:405
#11 0x00555a7b4f90 in reader_loop () at ../bash-4.4.18/eval.c:180
#12 0x00555a7b3570 in main (argc=1, argv=0x7ffe3f4668, env=<optimized out>) at ../bash-4.4.18/shell.c:792

Best regards,
 Sergio Paracuellos


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Sergio Paracuellos
Hi Bernd,

On Thu, Jan 30, 2020 at 11:37 AM Bernd Petrovitsch
 wrote:
>
> Hi all!
>
> On Thu, 2020-01-30 at 06:53 +0100, Sergio Paracuellos wrote:
> [...]
> > So I tried to get a backtrace of those two using the cores and this
> > two binaries:
> >
> > $ 
> > /opt/poky/2.7.2/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gdb
> > /home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/busybox.nosuid
> > core.23217
> > GNU gdb (GDB) 8.2.1
> > Copyright (C) 2018 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later 
> > 
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> > Type "show copying" and "show warranty" for details.
> > This GDB was configured as "--host=x86_64-pokysdk-linux
> > --target=aarch64-poky-linux".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > ;.
> > Find the GDB manual and other documentation resources online at:
> > ;.
> >
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > /home/sergio/.gdbinit:1: Error in sourced command file:
> > Undefined command: "layout".  Try "help".
> > Reading symbols from
> > /home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/busybox.nosuid...(no
> > debugging symbols found)...done.
> >
> > warning: core file may not match specified executable file.
> > [New LWP 23217]
> >
> > warning: Could not load shared library symbols for 3 libraries,
> > e.g. /lib/libm.so.6.
> > Use the "info sharedlibrary" command to see the complete listing.
> > Do you need "set solib-search-path" or "set sysroot"?
>
> I don't know yocto (apart from reading about it;-) but perhaps
> they have a template .gdbinit file (or docs) somewhere for
> this.

Not really, there are only a few instructions for using gdb for local
and remote debugging (I add the links in case you are interested in
looking into them):

https://www.yoctoproject.org/docs/latest/dev-manual/dev-manual.html#platdev-gdb-remotedebug
https://www.yoctoproject.org/docs/latest/dev-manual/dev-manual.html#debugging-with-the-gnu-project-debugger-gdb-on-the-target

>
> > Core was generated by `sleep 1'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0xfcd7fc60 in ?? ()
> > (gdb) bt
>
> "bt f" (short for "backtrace full") delivers more information.

I tried also this command and the backtrace was the same, no extra info :-(.

> It's perhaps also more helpful if gdb can resolve the shared
> libraries and have the symbols somewhere (so that we get the
> somewhat exact location ine th source code).
> And for the 2nd core file (from bash) too.
> Perhaps there are similarities ...

I'll try to add debug symbols for these two and see if we can get more
info about this, thanks.

>
> > #0  0xfcd7fc60 in ?? ()
> > #1  0x00555668af74 in ?? ()
> > Backtrace stopped: previous frame identical to this frame (corrupt 
> > stack?)
> > (gdb)
> [...]
>
> Another idea is - similar to valgrind - run the script
> with `strace -o strace.tst -F -F --" and see if the (last few)
> called sys-calls and their parameters make sense.
> You may need "bash -c" and/or "sh -c" or similar ...
>
> > There are no oops and anything but the audit trace in the kernel side,
> > and also I am not successful trying to reproduce this bug running a
>
> Hmm, so it's

??

>
> > similar program in C like the following:
> >
> > #include <signal.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > static void sig_handler(int signo)
> > {
> > printf("Sygnal catched: %d\n", signo);
> Typo;-)

Good catch ;)

> > exit(1);
> > }
> >
> > int main(void)
> > {
> > signal(SIGINT, sig_handler);
> > signal(SIGSTOP, sig_handler);
>
> That fails (at least according to the manual page).

Well, that may be true, and I am not checking the return value of
these calls, but it is enough for me to be able to Ctrl+C the program
with the SIGINT one :)

>
> > while (1) {
> > printf("Testing stuff\n");
> > sleep (1);
> > }
> >
> > return 0;
> > }
> >
> > So it seems to be a possible stack corruption with busybox.
>
> To state the obvious: If it is, the question is where
> it comes from...

That's what I want to know also :-).

>
> [...]
>
> Kind regards,
> Bernd

Thanks for your effort in this.

Best regards,
Sergio Paracuellos
> --
> Bernd Petrovitsch  Email : be...@petrovitsch.priv.at
>  LUGA : http://www.luga.at
>


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Bernd Petrovitsch
Hi all!

On Thu, 2020-01-30 at 06:53 +0100, Sergio Paracuellos wrote:
[...]
> So I tried to get a backtrace of those two using the cores and this
> two binaries:
> 
> $ 
> /opt/poky/2.7.2/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gdb
> /home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/busybox.nosuid
> core.23217
> GNU gdb (GDB) 8.2.1
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "--host=x86_64-pokysdk-linux
> --target=aarch64-poky-linux".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> ;.
> Find the GDB manual and other documentation resources online at:
> ;.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> /home/sergio/.gdbinit:1: Error in sourced command file:
> Undefined command: "layout".  Try "help".
> Reading symbols from
> /home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/busybox.nosuid...(no
> debugging symbols found)...done.
> 
> warning: core file may not match specified executable file.
> [New LWP 23217]
> 
> warning: Could not load shared library symbols for 3 libraries,
> e.g. /lib/libm.so.6.
> Use the "info sharedlibrary" command to see the complete listing.
> Do you need "set solib-search-path" or "set sysroot"?

I don't know yocto (apart from reading about it;-) but perhaps
they have a template .gdbinit file (or docs) somewhere for
this.

> Core was generated by `sleep 1'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0xfcd7fc60 in ?? ()
> (gdb) bt

"bt f" (short for "backtrace full") delivers more information.
It's perhaps also more helpful if gdb can resolve the shared
libraries and have the symbols somewhere (so that we get the
somewhat exact location in the source code).
And for the 2nd core file (from bash) too.
Perhaps there are similarities ...

> #0  0xfcd7fc60 in ?? ()
> #1  0x00555668af74 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb)
[...]

Another idea is - similar to valgrind - run the script
with `strace -o strace.tst -F -F --" and see if the (last few)
called sys-calls and their parameters make sense.
You may need "bash -c" and/or "sh -c" or similar ...

> There are no oops and anything but the audit trace in the kernel side,
> and also I am not successful trying to reproduce this bug running a

Hmm, so it's

> similar program in C like the following:
> 
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> static void sig_handler(int signo)
> {
> printf("Sygnal catched: %d\n", signo);
Typo;-)
> exit(1);
> }
> 
> int main(void)
> {
> signal(SIGINT, sig_handler);
> signal(SIGSTOP, sig_handler);

That fails (at least according to the manual page).

> while (1) {
> printf("Testing stuff\n");
> sleep (1);
> }
> 
> return 0;
> }
> 
> So it seems to be a possible stack corruption with busybox.

To state the obvious: If it is, the question is where
it comes from...
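
On the signal(SIGSTOP, ...) remark above: POSIX does not allow SIGSTOP
or SIGKILL to be caught, so signal() returns SIG_ERR and sets errno to
EINVAL for them. A small sketch that checks the return value
(try_install() and on_signal() are hypothetical helper names, not part
of the test program quoted in this mail):

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>

/* Empty handler; only used so that there is something to install. */
static void on_signal(int signo)
{
    (void)signo;
}

/*
 * Try to install on_signal() for `signo`.
 * Returns 0 on success, or the errno value reported by signal().
 */
static int try_install(int signo)
{
    assert(signo > 0);

    errno = 0;
    if (signal(signo, on_signal) == SIG_ERR)
        return errno;
    return 0;
}
```

With this helper, try_install(SIGINT) is expected to return 0, while
try_install(SIGSTOP) and try_install(SIGKILL) are expected to return
EINVAL.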

[...]

Kind regards,
Bernd
-- 
Bernd Petrovitsch  Email : be...@petrovitsch.priv.at
 LUGA : http://www.luga.at



Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Sergio Paracuellos
Hi again,

On Thu, Jan 30, 2020 at 9:36 AM Sergio Paracuellos
 wrote:
>
> Hi Lauri,
>
> On Thu, Jan 30, 2020 at 8:58 AM Lauri Kasanen  wrote:
> >
> > Hi,
>
> Thanks for your kick response.
>
> >
> > Try running your uptime loop script under valgrind?
>
> I will try to do during today and post results in this thread.

OK, so I put my code in a script and ran valgrind as follows:

$ cat foo.sh
#!/bin/sh

while [ 1 ]; do uptime && free && sleep 1; done

$ valgrind --tool=memcheck --error-exitcode=1 ./foo.sh

It took me about 1 hour to reproduce it, and I got this:

 10:05:54 up 55 min,  load average: 1.25, 1.09, 1.02
              total        used        free      shared  buff/cache   available
Mem:        4053004      316728     1063464        9692     2672812     3639564
Swap:             0           0           0
[ 3336.156816] audit: type=1701 audit(1580378755.931:2):
auid=4294967295 uid=0 gid=0 ses=4294967295 pid=11559 comm="sleep"
exe="/bin/busybox.nosuid" sig=11

valgrind: ../../valgrind-3.14.0/coregrind/m_signals.c:2736 (sync_[
3336.171581] audit: type=1701 audit(1580378755.947:3): auid=4294967295
uid=0 gid=0 ses=4294967295 pid=3809 comm="memcheck-1
signalhandler_from_kernel): Assertion 'VG_(in_generated_code) == False' failed.

host stacktrace:
==3809==at 0x58044F00: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==3809==by 0x58045043: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==3809==by 0x58045193: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==3809==by 0x58058B4B: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==3809==by 0x5805624F: ??? (in /usr/lib/valgrind/memcheck-arm64-linux)
==3809==by 0x5102FFF: ??? (in /usr/lib/locale/locale-archive)

sched status:
  running_tid=4147314932

Thread 1: status = VgTs_WaitSys syscall 33181156378935378 (lwpid 3809)
Segmentation fault

This does not seem very useful, but the SIGSEGV seems to be in sleep first again?

Should I run valgrind with any other parameters?

Any ideas on this?

Thanks in advance for your time.

Best regards,
Sergio Paracuellos

>
> >
> > - Lauri
>
> Best regards,
> Sergio Paracuellos


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-30 Thread Sergio Paracuellos
Hi Lauri,

On Thu, Jan 30, 2020 at 8:58 AM Lauri Kasanen  wrote:
>
> Hi,

Thanks for your quick response.

>
> Try running your uptime loop script under valgrind?

I will try to do that today and post the results in this thread.

>
> - Lauri

Best regards,
Sergio Paracuellos


Re: Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-29 Thread Lauri Kasanen
Hi,

Try running your uptime loop script under valgrind?

- Lauri


Weird signal issues using yocto's warrior busybox (aarch64) when memory is cached

2020-01-29 Thread Sergio Paracuellos
Hi all,

I am using the busybox version included in Yocto's warrior release,
which is version 1.30.1, on the aarch64 architecture.

I log in to the system and run the following script directly at the prompt:

while [ 1 ]; do uptime && free && sleep 1; done

When the machine has all of its memory free, this script does not
cause any problem. The problem seems to appear when the memory is
cached and free memory is down to about 80 MB (although practically
all of the memory is still available, because it is just cached):

State of the system without processes writing to disk (when the system is OK):

              total        used        free      shared  buff/cache   available
Mem:        4053004      164284     3808712       10164       80008     3776824
Swap:             0           0           0

State of the system with processes writing to disk and the kernel
properly caching memory (system becomes unstable?):

              total        used        free      shared  buff/cache   available
Mem:        4053004      273072       85472       10164     3694460     3701456
Swap:             0           0           0

(NOTE: all values in both free outputs are in KB.)
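
On Linux these figures can typically also be read from /proc/meminfo,
where MemFree is the memory that is really unused and MemAvailable is
an estimate that also counts reclaimable cache. A minimal sketch of a
parser that pulls one field out of meminfo-formatted text (meminfo_kb()
is a hypothetical helper name, and reading the file into a buffer is
left to the caller):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of the "Key:" line inside meminfo-formatted
 * text, or -1 if the key is not present or has no numeric value.
 */
static long meminfo_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);

    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* A match needs the full key followed directly by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long val;
            if (sscanf(line + klen + 1, "%ld", &val) == 1)
                return val;
            return -1;
        }
        /* Advance to the start of the next line, if there is one */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Comparing meminfo_kb(buf, "MemFree") with meminfo_kb(buf,
"MemAvailable") while the loop script runs would show the same gap as
the second free output above: little free memory but most of it still
available.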

When the system gets into this state, random signals seem to be
raised in the system. The signals I normally see are SIGSEGV, but
sometimes I have seen SIGABRT and, more rarely, SIGBUS delivered to
any other periodic process (for example watchdog scripts, all of
which use bash and busybox).

In this state I can ALWAYS reproduce the issue just by executing the
above script and waiting (the time it takes to reproduce is more or
less random).

When the bug appears I can see this kind of message from the audit daemon:

[49595.751038] audit: type=1701 audit(1580196182.291:4):
auid=4294967295 uid=0 gid=0 ses=4294967295 pid=9931 comm="sleep"
exe="/bin/busybox.nosuid" sig=11
[49605.793534] audit: type=1701 audit(1580196192.331:5):
auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2747 comm="sh"
exe="/bin/bash.bash" sig=11

I don't really expect bash to get SIGSEGV signals and this is kind of weird...

I got two core files of this script receiving SIGSEGV signals (first
sleep 1 command and after that the shell itself dies):

$ file core.23217
core.23217: ELF 64-bit LSB core file ARM aarch64, version 1 (SYSV),
SVR4-style, from 'sleep 1'

$ file core.2739
core.2739: ELF 64-bit LSB core file ARM aarch64, version 1 (SYSV),
SVR4-style, from '-sh'

These files in my rootfs are links to the following files:

/home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/sh
-> /bin/bash.bash

/home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/sleep
-> /bin/busybox.nosuid

So I tried to get a backtrace of those two using the cores and these
two binaries:

$ 
/opt/poky/2.7.2/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gdb
/home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/busybox.nosuid
core.23217
GNU gdb (GDB) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pokysdk-linux
--target=aarch64-poky-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
/home/sergio/.gdbinit:1: Error in sourced command file:
Undefined command: "layout".  Try "help".
Reading symbols from
/home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-linux/oberonx-image/1.0-r0/rootfs/bin/busybox.nosuid...(no
debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 23217]

warning: Could not load shared library symbols for 3 libraries,
e.g. /lib/libm.so.6.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `sleep 1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xfcd7fc60 in ?? ()
(gdb) bt
#0  0xfcd7fc60 in ?? ()
#1  0x00555668af74 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Look at the PC (0xfcd7fc60) at this point... It makes no sense at
all. To me it looks like stack corruption.

$ 
/opt/poky/2.7.2/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gdb
/home/sergio/YOCTO/tools/yocto/workspace/build/tmp/work/oberonx-poky-lin