Re: strange failures with gcc-9.0.1-0.11.fc31.x86_64

2019-03-28 Thread Mark Wielaard
Hi,

On Thu, 2019-03-28 at 14:28 +, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Mar 28, 2019 at 02:14:31PM +0100, Jakub Jelinek wrote:
> > On Thu, Mar 28, 2019 at 08:52:18AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > On Wed, Mar 27, 2019 at 01:55:44PM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > > I'm trying to compile systemd in koji and mock, and I'm getting suspicious
> > > > crashes...
> > > > 
> > > > $ valgrind x86_64-redhat-linux-gnu/test-terminal-util
> > > > /* test_default_term_for_tty */
> > > > ...
> > > > /* test_read_one_char */
> > > > ==21== Invalid read of size 4
> > > > ==21==    at 0x48C09EC: fputs (in /usr/lib64/libc-2.29.9000.so)
> > > > ==21==    by 0x109301: UnknownInlinedFun (test-terminal-util.c:43)
> > > > ==21==    by 0x109301: main (test-terminal-util.c:80)
> > > > ==21==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > > > ==21== 
> > > > ==21== 
> > > > ==21== Process terminating with default action of signal 11 (SIGSEGV)
> > > > 
> > > > The problem is that at the reported line there is just a call to (a
> > > > function which transitively calls) mkostemp(). It seems like the inlining
> > > > is somehow going wrong.
> > > 
> > > It turns out that our test case was wrong. I was confused because the
> > > inlining causes the backtrace to report an unrelated spot.
> > 
> > So do you still need anything from me to debug?
> 
> Thanks. I need some advice mostly. There's still the question of bogus
> backtrace returned by valgrind. Is this a valgrind issue or the debug
> data produced by gdb or something else? If we cannot rely on
> backtraces with LTO, this would be a big drawback.

The above backtrace is produced by valgrind. The addresses should be
correct, but as "UnknownInlinedFun" shows it has some trouble resolving
the associated function/symbol names.

I don't know if LTO makes that valgrind bug worse.

If gdb works then you can also use gdb and valgrind together:
https://tromey.com/blog/?p=731

http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver

gdb probably can produce a better backtrace than valgrind.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: strange failures with gcc-9.0.1-0.11.fc31.x86_64

2019-03-28 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Mar 28, 2019 at 02:14:31PM +0100, Jakub Jelinek wrote:
> On Thu, Mar 28, 2019 at 08:52:18AM +, Zbigniew Jędrzejewski-Szmek wrote:
> > On Wed, Mar 27, 2019 at 01:55:44PM +, Zbigniew Jędrzejewski-Szmek wrote:
> > > I'm trying to compile systemd in koji and mock, and I'm getting suspicious
> > > crashes...
> > > 
> > > $ valgrind x86_64-redhat-linux-gnu/test-terminal-util
> > > /* test_default_term_for_tty */
> > > ...
> > > /* test_read_one_char */
> > > ==21== Invalid read of size 4
> > > ==21==    at 0x48C09EC: fputs (in /usr/lib64/libc-2.29.9000.so)
> > > ==21==    by 0x109301: UnknownInlinedFun (test-terminal-util.c:43)
> > > ==21==    by 0x109301: main (test-terminal-util.c:80)
> > > ==21==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > > ==21== 
> > > ==21== 
> > > ==21== Process terminating with default action of signal 11 (SIGSEGV)
> > > 
> > > The problem is that at the reported line there is just a call to (a
> > > function which transitively calls) mkostemp(). It seems like the inlining
> > > is somehow going wrong.
> > 
> > It turns out that our test case was wrong. I was confused because the
> > inlining causes the backtrace to report an unrelated spot.
> 
> So do you still need anything from me to debug?

Thanks. I need some advice mostly. There's still the question of bogus
backtrace returned by valgrind. Is this a valgrind issue or the debug
data produced by gdb or something else? If we cannot rely on
backtraces with LTO, this would be a big drawback.

> gdb crashes I'll defer to the gdb team.  Is that with LTO only btw?

No, LTO doesn't seem to be relevant, despite what I said earlier.
gdb crashes with some programs (I tried a few; some crash, some don't.
I have no idea what the rule is, but the very simple ones seem to be fine):

In mock buildroot of systemd:
$ ninja -C x86_64-redhat-linux-gnu systemd
$ gdb x86_64-redhat-linux-gnu/systemd
GNU gdb (GDB) Fedora 8.3.50.20190321-3.fc31
...
(gdb) r
...
Trying to run as user instance, but the system has not been booted with systemd.
[Inferior 1 (process 2466) exited with code 01]
Segmentation fault (core dumped)

So the crash seems to happen when gdb returns to its prompt, whether because
the debuggee exited, crashed, or hit a breakpoint (all three end the same way).

Zbyszek


Re: strange failures with gcc-9.0.1-0.11.fc31.x86_64

2019-03-28 Thread Jakub Jelinek
On Thu, Mar 28, 2019 at 08:52:18AM +, Zbigniew Jędrzejewski-Szmek wrote:
> On Wed, Mar 27, 2019 at 01:55:44PM +, Zbigniew Jędrzejewski-Szmek wrote:
> > I'm trying to compile systemd in koji and mock, and I'm getting suspicious
> > crashes...
> > 
> > $ valgrind x86_64-redhat-linux-gnu/test-terminal-util
> > /* test_default_term_for_tty */
> > ...
> > /* test_read_one_char */
> > ==21== Invalid read of size 4
> > ==21==    at 0x48C09EC: fputs (in /usr/lib64/libc-2.29.9000.so)
> > ==21==    by 0x109301: UnknownInlinedFun (test-terminal-util.c:43)
> > ==21==    by 0x109301: main (test-terminal-util.c:80)
> > ==21==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > ==21== 
> > ==21== 
> > ==21== Process terminating with default action of signal 11 (SIGSEGV)
> > 
> > The problem is that at the reported line there is just a call to (a
> > function which transitively calls) mkostemp(). It seems like the inlining
> > is somehow going wrong.
> 
> It turns out that our test case was wrong. I was confused because the
> inlining causes the backtrace to report an unrelated spot.

So do you still need anything from me to debug?
gdb crashes I'll defer to the gdb team.  Is that with LTO only btw?

Jakub


Re: strange failures with gcc-9.0.1-0.11.fc31.x86_64

2019-03-28 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Mar 27, 2019 at 01:55:44PM +, Zbigniew Jędrzejewski-Szmek wrote:
> Hi,
> 
> I'm trying to compile systemd in koji and mock, and I'm getting suspicious
> crashes...
> 
> $ valgrind x86_64-redhat-linux-gnu/test-terminal-util
> /* test_default_term_for_tty */
> ...
> /* test_read_one_char */
> ==21== Invalid read of size 4
> ==21==    at 0x48C09EC: fputs (in /usr/lib64/libc-2.29.9000.so)
> ==21==    by 0x109301: UnknownInlinedFun (test-terminal-util.c:43)
> ==21==    by 0x109301: main (test-terminal-util.c:80)
> ==21==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==21== 
> ==21== 
> ==21== Process terminating with default action of signal 11 (SIGSEGV)
> 
> The problem is that at the reported line there is just a call to (a
> function which transitively calls) mkostemp(). It seems like the inlining
> is somehow going wrong.

It turns out that our test case was wrong. I was confused because the
inlining causes the backtrace to report an unrelated spot.
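
For readers hitting a similar report: an invalid read at address 0x0 inside
fputs() is what one would typically expect when fputs() is handed a NULL
stream. The thread does not say what exactly was wrong in the test, so the
following is only a sketch under that assumption; write_line_checked() is a
hypothetical helper, not a function from systemd.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper (not from systemd): write a line to a stream that may
 * have failed to open. Returns 0 on success or a negative errno-style value. */
static int write_line_checked(FILE *f, const char *line) {
    /* Refuse NULL inputs instead of letting fputs() dereference them. */
    if (!f || !line)
        return -EINVAL;

    /* fputs() returns EOF on a write error. */
    if (fputs(line, f) == EOF)
        return -EIO;

    return 0;
}
```

With a check like this, a failed open shows up as an error return at the call
site rather than as a SIGSEGV attributed to whichever line the inlined code
happens to map to.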

> Strangely, gdb also crashes:
> $ gdb x86_64-redhat-linux-gnu/test-terminal-util
> GNU gdb (GDB) Fedora 8.3.50.20190321-3.fc31
> ...
> Reading symbols from x86_64-redhat-linux-gnu/test-terminal-util...
> (gdb) r
> Starting program: /builddir/build/BUILD/systemd-49bd196d693efe0acfc8d56c4e3d8f7ba9f91b5d/x86_64-redhat-linux-gnu/test-terminal-util
> Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29.9000-8.fc31.x86_64
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> /* test_default_term_for_tty */
> ...
> /* test_read_one_char */
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x77e759ec in fputs () from /lib64/libc.so.6
> Segmentation fault (core dumped)

This is still a problem. gdb crashes on any program in rawhide mock
for me right now. But gcc seems to be fine.

Zbyszek


strange failures with gcc-9.0.1-0.11.fc31.x86_64

2019-03-27 Thread Zbigniew Jędrzejewski-Szmek
Hi,

I'm trying to compile systemd in koji and mock, and I'm getting suspicious
crashes...

$ valgrind x86_64-redhat-linux-gnu/test-terminal-util
/* test_default_term_for_tty */
...
/* test_read_one_char */
==21== Invalid read of size 4
==21==    at 0x48C09EC: fputs (in /usr/lib64/libc-2.29.9000.so)
==21==    by 0x109301: UnknownInlinedFun (test-terminal-util.c:43)
==21==    by 0x109301: main (test-terminal-util.c:80)
==21==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==21== 
==21== 
==21== Process terminating with default action of signal 11 (SIGSEGV)

The problem is that at the reported line there is just a call to (a
function which transitively calls) mkostemp(). It seems like the inlining
is somehow going wrong.


Strangely, gdb also crashes:
$ gdb x86_64-redhat-linux-gnu/test-terminal-util
GNU gdb (GDB) Fedora 8.3.50.20190321-3.fc31
...
Reading symbols from x86_64-redhat-linux-gnu/test-terminal-util...
(gdb) r
Starting program: /builddir/build/BUILD/systemd-49bd196d693efe0acfc8d56c4e3d8f7ba9f91b5d/x86_64-redhat-linux-gnu/test-terminal-util
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29.9000-8.fc31.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
/* test_default_term_for_tty */
...
/* test_read_one_char */

Program received signal SIGSEGV, Segmentation fault.
0x77e759ec in fputs () from /lib64/libc.so.6
Segmentation fault (core dumped)


There are also compilation failures related to inlining when I disable LTO:

In file included from ../src/basic/macro.h:549,
                 from ../src/basic/alloc-util.h:9,
                 from ../src/network/networkd-link.c:9:
In function ‘link_enable_ipv6’,
    inlined from ‘link_set_mtu’ at ../src/network/networkd-link.c:1483:16:
../src/basic/log.h:104:9: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
  104 |         log_internal_realm(LOG_REALM_PLUS_LEVEL(LOG_REALM, (level)), __VA_ARGS__)
      |         ^
../src/shared/log-link.h:21:25: note: in expansion of macro ‘log_internal’
   21 |                         log_internal(level, error, __FILE__, __LINE__, __func__, ##__VA_ARGS__); \
      |                         ^~~~
../src/shared/log-link.h:33:50: note: in expansion of macro ‘log_link_full’
   33 | #define log_link_warning_errno(link, error, ...) log_link_full(link, LOG_WARNING, error, ##__VA_ARGS__)
      |                                                  ^
../src/network/networkd-link.c:324:17: note: in expansion of macro ‘log_link_warning_errno’
  324 |                 log_link_warning_errno(link, r, "Cannot %s IPv6 for interface %s: %m",
      |                 ^~
../src/network/networkd-link.c: In function ‘link_set_mtu’:
../src/network/networkd-link.c:324:79: note: format string is defined here
  324 |                 log_link_warning_errno(link, r, "Cannot %s IPv6 for interface %s: %m",
      |                                                                               ^~

The argument is a field in a structure, and when the structure is
created, it is always set. It's hard to say for sure that it's never
null, but I think gcc must be confused when it says it's *always* null.
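
To make the diagnostic concrete: as far as I understand it, gcc emits this
format-overflow error when it can prove that a %s argument is NULL, and after
inlining that proof may hold for only one call path into the callee. Below is
a minimal sketch of a common defensive pattern, not systemd's actual code;
struct link_sketch, str_or_na() and format_link_warning() are hypothetical
names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a structure with a name field
 * (not systemd's real Link type). */
struct link_sketch {
    const char *ifname;   /* expected to be set at creation; nothing enforces it */
};

/* Return a printable placeholder instead of passing NULL to a %s directive. */
static const char *str_or_na(const char *s) {
    return s ? s : "n/a";
}

/* Format the warning into buf and return snprintf()'s result. Because the
 * %s argument goes through str_or_na(), it is never NULL on any path. */
static int format_link_warning(char *buf, size_t size,
                               const struct link_sketch *link, int enable) {
    assert(buf != NULL && size > 0);

    return snprintf(buf, size, "Cannot %s IPv6 for interface %s",
                    enable ? "enable" : "disable",
                    str_or_na(link ? link->ifname : NULL));
}
```

Whether such a guard is the right fix here, or whether the compiler's analysis
is simply wrong, is exactly the open question of this message; the sketch only
shows how the argument can be made provably non-null.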

The same rpm compiles fine with gcc-9.0.1-0.8.fc30 and gcc-9.0.1-0.8.fc31.
I'm writing to the mailing list instead of opening a bug because I'm not
really sure if gcc is at fault, or if systemd code is somehow buggy in
a non-obvious way... Has anyone else seen similar failures with the
latest gcc build?

Zbyszek

Example failed koji scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=33792874