Problems with dwarf-getmacros test
Hi, I frequently get failures from the run-dwarf-getmacros.sh test, on testfile-macros:0xb. The test repeatedly outputs "(null)" instead of the actual macros and then runs into the assert at dwarf-getmacros.c:50. The failure is very nondeterministic, though. I haven't found a reliable way to trigger it. Further examination reveals that the __libdw_in_section check in READ_AND_RELOCATE (libdwP.h:656), when called from __libdw_read_offset seems to be bogus. The "return -1" in there is what produces the null results and ultimately the assert. Experiments show that the address is frequently not in the section we're checking there, but still valid. Just dropping the check makes the test succeed. I'm currently at a loss about why this happens. One thing that strikes me was that the additional dbg_ret mechamism was added in 2012 with commit 775375e3, but the check in READ_AND_RELOCATE was not adapted then. However, the address is also not necessarily in dbg_ret at that point. Checking dbg_ret in addition to dbg still fails sometimes, and also that wouldn't explain the nondeterminism. Ulf
Re: Problems with dwarf-getmacros test
On Mon, 2017-05-08 at 18:22 +0200, Ulf Hermann wrote: > I frequently get failures from the run-dwarf-getmacros.sh test, on > testfile-macros:0xb. The test repeatedly outputs "(null)" instead of the > actual macros and then runs into the assert at dwarf-getmacros.c:50. The > failure is very nondeterministic, though. I haven't found a reliable way > to trigger it. Is it only with testfile-macros? All other testfiles always run correctly? > Further examination reveals that the __libdw_in_section check in > READ_AND_RELOCATE (libdwP.h:656), when called from __libdw_read_offset > seems to be bogus. The "return -1" in there is what produces the null > results and ultimately the assert. Do you have the whole call stack of that failed __libdw_read_offset call? Which source line in tests/dwarf-getmacros.c prints the "(null)"? > Experiments show that the address is > frequently not in the section we're checking there, but still valid. > Just dropping the check makes the test succeed. I think this might be related to our "fake" CU and attributes we invent in libdw/dwarf_getmacros.c (read_macros). See around this comment: /* We pretend this is a DW_AT_GNU_macros attribute so that DW_FORM_sec_offset forms get correctly interpreted as offset into .debug_macro. */ If that is the issue then we might need to somehow make READ_AND_RELOCATE and/or __libdw_in_section aware that the CU is fake and the check isn't needed. In which case we probably need to add some flag "fake" to the CU and pass the CU to the various __libdw_read_* functions to disable that sanity check in READ_AND_RELOCATE. > I'm currently at a loss about why this happens. One thing that strikes > me was that the additional dbg_ret mechamism was added in 2012 with > commit 775375e3, but the check in READ_AND_RELOCATE was not adapted then. The READ_AND_RELOCATE macro is hard to read because it captures the names of some of the variables it uses instead of getting them passed as arguments. It took me a couple of times to double check what it does seems correct. The check in READ_AND_RELOCATE is against (dbg, sec_index, addr) checking that the address is inside the section for the dbg Dwarf. While the __libdw_read_offset (dbg_ret, sec_ret, *ret) is for the returned valued in the returned section in the returned Dwarf. > However, the address is also not necessarily in dbg_ret at that point. > Checking dbg_ret in addition to dbg still fails sometimes, and also that > wouldn't explain the nondeterminism. The nondeterminism is weird indeed. Cheers, Mark
Re: Problems with dwarf-getmacros test
> Is it only with testfile-macros? > All other testfiles always run correctly? Yes, it only happens with the testfile-macros check at 0xb. > Do you have the whole call stack of that failed __libdw_read_offset > call? Which source line in tests/dwarf-getmacros.c prints the "(null)"? I'm having a hard time triggering the problem in a controlled environment. So far I didn't manage to create a stack trace. The (null) is the result of passing a value of null to dwarf-getmacros.c:88. I get an errno of DWARF_E_INVALID_OFFSET at the assert failure. So I put some debug messages into all places where that is generated and this is how I found the source of this. It seems I can't trigger the effect if I put the debug output directly into __libdw_in_section (but maybe I was just very unlucky). If I put it in READ_AND_RELOCATE I can see that the check always fails before it outputs (null) or hits the assertion. >> Experiments show that the address is >> frequently not in the section we're checking there, but still valid. >> Just dropping the check makes the test succeed. > > I think this might be related to our "fake" CU and attributes we invent > in libdw/dwarf_getmacros.c (read_macros). See around this comment: > > /* We pretend this is a DW_AT_GNU_macros attribute so that > DW_FORM_sec_offset forms get correctly interpreted as > offset into .debug_macro. */ > > If that is the issue then we might need to somehow make > READ_AND_RELOCATE and/or __libdw_in_section aware that the CU is fake > and the check isn't needed. In which case we probably need to add some > flag "fake" to the CU and pass the CU to the various __libdw_read_* > functions to disable that sanity check in READ_AND_RELOCATE. We don't set type_offset on the fake_cu, but we read it via cu_sec_idx when calling __libdw_read_offset from dwarf_formstring. This seems wrong. Ulf
Re: Problems with dwarf-getmacros test
Do you have the whole call stack of that failed __libdw_read_offset call? Which source line in tests/dwarf-getmacros.c prints the "(null)"? Actually I just managed to catch a backtrace by inserting an infinite loop at the point where it would normally return -1. It turns out the address we are looking for is not in the IDX_debug_info section, but in the IDX_debug_macro section (which by itself isn't all that surprising, but if that is a rule, how can this code ever work??). Thread 1 (Thread 220048.0x35af0): #0 0x70ec311e in __libdw_read_offset (sec_ret=8, size=1, ret=, width=4, addr=0xe90089 ".\027", sec_index=0, dbg_ret=, dbg=) at D:/qtcreator-super-master/elfutils/libdw/libdwP.h:712 orig_addr = #1 dwarf_formstring (attrp=attrp@entry=0x6af950) at D:/qtcreator-super-master/elfutils/libdw/dwarf_formstring.c:69 dbg = dbg_ret = off = #2 0x70ecc709 in dwarf_macro_param2 (macro=macro@entry=0x6afaa0, paramp=paramp@entry=0x0, strp=strp@entry=0x6af9f0) at D:/qtcreator-super-master/elfutils/libdw/dwarf_macro_param2.c:50 param = {code = 8473, form = 14, valp = 0xe90089 ".\027", cu = 0x6afac0} #3 0x00401784 in mac (macro=0x6afaa0, dbg=0xf926f0) at D:/qtcreator-super-master/elfutils/tests/dwarf-getmacros.c:87 value = 0xf927b8 "p?ù" level = 0 opcode = 5 #4 0x70ecbf1d in read_macros (dbg=dbg@entry=0xf926f0, sec_index=sec_index@entry=10, macoff=macoff@entry=0, callback=out>, callback@entry=0x4015e0 , arg=, offset=, offset@entry=0, accept_0xff=, accept_0xff@entry=false, cudie=, cudie@entry=0x6afe10) at D:/qtcreator-super-master/elfutils/libdw/dwarf_getmacros.c:409 opcode = idx = attributesp = nattributes = {{code = 8473, form = 15, valp = 0xe90088 "\001.\027", cu = 0x6afac0}, {code = 8473, form = 14, valp = 0xe90089 ".\027", cu = 0x6afac0}, {code = 0, form = 0, valp = 0x0, cu = 0x0}, {code = 0, form = 0, valp = 0x0, cu = 0x0}, {code = 0, form = 0, valp = 0x0, cu = 0x0}, {code = 0, form = 0, valp = 0x0, cu = 0x0}, {code = 0, form = 0, valp = 0x0, cu = 0x0}, {code = 0, form = 0, valp = 0x0, cu = 0x0}} fake_cu = {dbg = 0xf926f0, start = 0, end = 0, address_size = 0 '\000', offset_size = 4 '\004', version = 4, type_offset = 0, type_sig8 = 0, abbrev_hash = {size = 0, filled = 0, table = 0x0}, orig_abbrev_offset = 0, last_abbrev_offset = 0, lines = 0x0, files = 0x0, locs = 0x0, startp = 0xe90087, endp = 0xe90ac4} attributes = macro = {table = 0xf928b0, attributes = 0x6afb40, opcode = 5 '\005'} res = d = startp = endp = readp = 0xe9008d "\005\001î!" #5 0x70ecc302 in gnu_macros_getmacros_off (dbg=0xf926f0, macoff=0, callback=callback@entry=0x4015e0 , arg=arg@entry=0xf926f0, offset=offset@entry=0, accept_0xff=accept_0xff@entry=false, cudie=cudie@entry=0x6afe10) at D:/qtcreator-super-master/elfutils/libdw/dwarf_getmacros.c:478 No locals. #6 0x70ecc502 in dwarf_getmacros (cudie=cudie@entry=0x6afe10, callback=callback@entry=0x4015e0 , arg=arg@entry=0xf926f0, token=0) at D:/qtcreator-super-master/elfutils/libdw/dwarf_getmacros.c:564 macoff = 0 accept_0xff = false offset = #7 0x00409a36 in main (argc=3, argv=) at D:/qtcreator-super-master/elfutils/tests/dwarf-getmacros.c:133 off = name = cuoff = new_style = fd = dbg = 0xf926f0 cudie_mem = {addr = 0xf93a5b, cu = 0xf92810, abbrev = 0xf92890, padding__ = 0} cudie = 0x6afe10
Re: Problems with dwarf-getmacros test
Do you have the whole call stack of that failed __libdw_read_offset call? Which source line in tests/dwarf-getmacros.c prints the "(null)"? Actually I just managed to catch a backtrace by inserting an infinite loop at the point where it would normally return -1. It turns out the address we are looking for is not in the IDX_debug_info section, but in the IDX_debug_macro section (which by itself isn't all that surprising, but if that is a rule, how can this code ever work??). The likely solution of the mystery is this piece of code in __libdw_in_section: if (unlikely (addr < data->d_buf) || unlikely (data->d_size - (addr - data->d_buf) < size)) { __libdw_seterrno (DWARF_E_INVALID_OFFSET); return false; } If addr < data->d_buf we fail immediately. That's what's happening to me sometimes. However, if addr > data->d_buf, then it's likely much bigger, as the allocations of the different sections are unrelated. data->d_size is always 159 in this test case. Therefore, data->d_size - (addr - data->d_buf) will produce something negative, which then gets compared to a size_t, forcing it to overflow. size is 4, so this case never gets detected. Ulf