[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #7 from Jonny Weir ---
I made the following change:

diff --git a/bfd/merge.c b/bfd/merge.c
index f21154dcd45..3b4ccfb86df 100644
--- a/bfd/merge.c
+++ b/bfd/merge.c
@@ -175,26 +175,43 @@ sec_merge_maybe_resize (struct sec_merge_hash *table, unsigned added)
   uint64_t *newl;
   unsigned long alloc;
 
+  printf ("XXX resize 1: count=%u added=%u newnb=%lu\n", bfdtab->count, added, newnb);
   while (bfdtab->count + added > newnb * 2 / 3)
     {
       newnb *= 2;
       if (!newnb)
-        return false;
+        {
+          printf ("false1: newnb=%lu\n", newnb);
+          return false;
+        }
     }
+  printf ("XXX resize 2: newnb=%lu\n", newnb);
 
   alloc = newnb * sizeof (newl[0]);
   if (alloc / sizeof (newl[0]) != newnb)
-    return false;
+    {
+      printf ("false2: alloc=%lu, newnb=%lu, sizeof(newl[0])=%lu, alloc/sizeof(newl[0])=%lu\n", alloc, newnb, sizeof(newl[0]), alloc/sizeof(newl[0]));;
+      return false;
+    }
   newl = objalloc_alloc ((struct objalloc *) table->table.memory, alloc);
   if (newl == NULL)
-    return false;
+    {
+      printf ("false3: newl == NULL");
+      return false;
+    }
   memset (newl, 0, alloc);
   alloc = newnb * sizeof (newv[0]);
   if (alloc / sizeof (newv[0]) != newnb)
-    return false;
+    {
+      printf ("false4: alloc=%lu, newnb=%lu, sizeof(newv[0])=%lu, alloc/sizeof(newv[0])=%lu\n", alloc, newnb, sizeof(newv[0]), alloc/sizeof(newv[0]));
+      return false;
+    }
   newv = objalloc_alloc ((struct objalloc *) table->table.memory, alloc);
   if (newv == NULL)
-    return false;
+    {
+      printf ("false5: newv == NULL");
+      return false;
+    }
   memset (newv, 0, alloc);

The output of which:

XXX resize 1: count=609 added=5699 newnb=16384
XXX resize 2: newnb=16384
XXX resize 1: count=1133 added=28709 newnb=32768
XXX resize 2: newnb=65536
XXX resize 1: count=114 added=9809 newnb=16384
XXX resize 2: newnb=16384
XXX resize 1: count=266 added=10658 newnb=32768
XXX resize 2: newnb=32768
XXX resize 1: count=447 added=30092 newnb=65536
XXX resize 2: newnb=65536
XXX resize 1: count=683 added=123892 newnb=131072
XXX resize 2: newnb=262144
XXX resize 1: count=1048 added=212455 newnb=524288
XXX resize 2: newnb=524288
XXX resize 1: count=1598 added=1086327410 newnb=1048576
XXX resize 2: newnb=2147483648
XXX resize 1: count=755248 added=68141 newnb=0
false1: newnb=0

--
You are receiving this mail because:
You are on the CC list for the bug.
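
The last two resize calls in this log can be replayed in isolation. What follows is a minimal sketch, not the actual bfd code: model_resize is a hypothetical stand-in for the sizing loop in sec_merge_maybe_resize. It assumes, as the logged values suggest but the thread does not state, that table->nbuckets is a 32-bit unsigned field, that newnb starts out as nbuckets * 2, and that unsigned long is 64 bits wide (modelled here with uint64_t).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the sizing loop in sec_merge_maybe_resize.
   Assumption: nbuckets is 32 bits wide, newnb is 64 bits wide.  */
bool
model_resize (uint32_t nbuckets, uint32_t count, uint32_t added,
              uint64_t *result)
{
  /* Assumed starting point: the doubling is done in 32-bit arithmetic
     and only widened afterwards, so 2^31 * 2 wraps to 0.  */
  uint64_t newnb = (uint32_t) (nbuckets * 2u);

  /* Same loop shape as in the patch above.  */
  while ((uint32_t) (count + added) > newnb * 2 / 3)
    {
      newnb *= 2;
      if (!newnb)
        return false;   /* corresponds to "false1" in the log */
    }

  *result = newnb;
  return true;
}

/* Replays the last two "XXX resize 1" lines of the log.  */
void
replay_logged_calls (void)
{
  uint64_t newnb = 0;

  if (model_resize (524288u, 1598u, 1086327410u, &newnb))
    printf ("resize 2: newnb=%llu\n", (unsigned long long) newnb);

  /* The new bucket count is stored back into the 32-bit field.  */
  uint32_t stored = (uint32_t) newnb;

  if (!model_resize (stored, 755248u, 68141u, &newnb))
    printf ("false1: resize refused\n");
}
```

Under these assumptions the first call settles on 2147483648 buckets, which still fits in 32 bits, and the next call starts from a doubled value of 0 and returns false straight away, which matches the final lines of the log. Call replay_logged_calls from a main of your own to see both steps.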
[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #6 from Michael Matz ---
(In reply to Jonny Weir from comment #5)
> Ignore that last message, it is misleading, this is a more accurate
> representation of what is happening with the values:

Ah, yes. I was suspecting already that you were printing the value*2/3.
Anyway:

> bfdtab->count + 1 = 1598 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
> bfdtab->count + 1 = 1599 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0

Yeez! One of the input sections is projected to possibly add 2 billion
strings. Can you perhaps add some printfs to sec_merge_maybe_resize (the
only place that does increase nbuckets)? Similar to below, maybe also add
printf's for each early-out (all the 'return false' in there).

And then we need to trace why the overflow isn't detected earlier (I tried
to make it so that it does, obviously I failed, that's what the
'return false' are for, after all) and isn't gracefully handled.

diff --git a/bfd/merge.c b/bfd/merge.c
index 722e6659486..b36cee49b3a 100644
--- a/bfd/merge.c
+++ b/bfd/merge.c
@@ -175,12 +175,14 @@ sec_merge_maybe_resize (struct sec_merge_hash *table, unsigned added)
   uint64_t *newl;
   unsigned long alloc;
 
+  printf ("XXX resize 1: count=%u added=%u newnb=%lu\n", bfdtab->count, added, newnb);
   while (bfdtab->count + added > newnb * 2 / 3)
     {
       newnb *= 2;
       if (!newnb)
         return false;
     }
+  printf ("XXX resize 2: newnb=%lu\n", newnb);
 
   alloc = newnb * sizeof (newl[0]);
   if (alloc / sizeof (newl[0]) != newnb)
[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #5 from Jonny Weir ---
Ignore that last message, it is misleading, this is a more accurate
representation of what is happening with the values:

bfdtab->count + 1 = 1 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 2 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 3 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 4 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 5 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 6 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
...
bfdtab->count + 1 = 1937 | table->nbuckets = 65536 | table->nbuckets * 2 / 3 = 43690
bfdtab->count + 1 = 1938 | table->nbuckets = 65536 | table->nbuckets * 2 / 3 = 43690
bfdtab->count + 1 = 1 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 2 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
...
bfdtab->count + 1 = 998 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 999 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1000 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1001 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1002 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1003 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
...
bfdtab->count + 1 = 1047 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1048 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1049 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1050 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
...
bfdtab->count + 1 = 1594 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1595 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1596 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1597 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1598 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1599 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 1600 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 1601 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 1602 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
...
bfdtab->count + 1 = 755246 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 755247 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 755248 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
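
The third column of this table can be reproduced with plain 32-bit unsigned arithmetic. The sketch below is an illustration under an assumption, namely that table->nbuckets is a 32-bit unsigned value so that nbuckets * 2 wraps before the division by 3. The function names threshold_32bit, threshold_widened and assertion_holds are hypothetical and do not exist in bfd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold as the assertion would compute it if nbuckets is 32 bits
   wide (an assumption): the multiplication wraps before the division.  */
uint32_t
threshold_32bit (uint32_t nbuckets)
{
  return (uint32_t) (nbuckets * 2u) / 3u;
}

/* Same formula with the operand widened first.  Shown for comparison
   only, not as a proposed fix.  */
uint64_t
threshold_widened (uint32_t nbuckets)
{
  return (uint64_t) nbuckets * 2u / 3u;
}

/* The condition tested by the BFD_ASSERT quoted in the thread:
   bfdtab->count + 1 <= table->nbuckets * 2 / 3.  */
bool
assertion_holds (uint32_t count, uint32_t nbuckets)
{
  return count + 1u <= threshold_32bit (nbuckets);
}
```

With nbuckets = 524288 the threshold is 349525 and the condition holds for count + 1 = 1598. With nbuckets = 2147483648 the 32-bit product wraps to 0, so the threshold is 0 and the condition fails for every count, whereas the widened form would give 1431655765.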
[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #4 from Jonny Weir ---
I was able to add some logging to get the values out and it seems that
things are overflowing at some point. Some example output:

bfdtab->count + 1 = 1 | table->nbuckets = 5461
bfdtab->count + 1 = 2 | table->nbuckets = 5461
bfdtab->count + 1 = 3 | table->nbuckets = 5461
bfdtab->count + 1 = 4 | table->nbuckets = 5461
bfdtab->count + 1 = 5 | table->nbuckets = 5461
bfdtab->count + 1 = 6 | table->nbuckets = 5461
...
bfdtab->count + 1 = 1937 | table->nbuckets = 43690
bfdtab->count + 1 = 1938 | table->nbuckets = 43690
bfdtab->count + 1 = 1 | table->nbuckets = 5461
bfdtab->count + 1 = 2 | table->nbuckets = 5461
...
bfdtab->count + 1 = 998 | table->nbuckets = 174762
bfdtab->count + 1 = 999 | table->nbuckets = 174762
bfdtab->count + 1 = 1000 | table->nbuckets = 174762
bfdtab->count + 1 = 1001 | table->nbuckets = 174762
bfdtab->count + 1 = 1002 | table->nbuckets = 174762
bfdtab->count + 1 = 1003 | table->nbuckets = 174762
...
bfdtab->count + 1 = 1047 | table->nbuckets = 174762
bfdtab->count + 1 = 1048 | table->nbuckets = 174762
bfdtab->count + 1 = 1049 | table->nbuckets = 349525
bfdtab->count + 1 = 1050 | table->nbuckets = 349525
...
bfdtab->count + 1 = 1594 | table->nbuckets = 349525
bfdtab->count + 1 = 1595 | table->nbuckets = 349525
bfdtab->count + 1 = 1596 | table->nbuckets = 349525
bfdtab->count + 1 = 1597 | table->nbuckets = 349525
bfdtab->count + 1 = 1598 | table->nbuckets = 349525
bfdtab->count + 1 = 1599 | table->nbuckets = 0
bfdtab->count + 1 = 1600 | table->nbuckets = 0
bfdtab->count + 1 = 1601 | table->nbuckets = 0
bfdtab->count + 1 = 1602 | table->nbuckets = 0
...
bfdtab->count + 1 = 755246 | table->nbuckets = 0
bfdtab->count + 1 = 755247 | table->nbuckets = 0
bfdtab->count + 1 = 755248 | table->nbuckets = 0

Not sure how helpful this is, but it does at least show why the assertion
fail is raised.
[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #3 from Jonny Weir ---
Hi Nick,

(In reply to Nick Clifton from comment #1)
> Hi Jonny,
>
> (In reply to Jonny Weir from comment #0)
>
> > linking stage when -O3 is used (-O2 builds and links correctly). To be
> > clear, the only difference between success and failure is the optimisation
> > level that is used.
>
> And to be even more clear, you are talking about the compiler's optimization
> level and not the linker's, correct ?

Correct.

> > /bin/ld: BFD (GNU Binutils for Debian) 2.41 assertion fail
> > ../../bfd/merge.c:243
>
> Are you able to attach a debugger to the linker and discover the values that
> are triggering this assertion ? The code looks like this:

I attached gdb to the ld process, but I think a few things happen that make
it more difficult to get a stack trace. I believe it forks another process,
and that process is the one that spits out the assertion messages, so when
it returns there is no stack to examine. Unless there is a way to do this
that I don't see, I can't see how to get the required stack trace.

>
>   // We must not need resizing, otherwise _index is wrong
>   BFD_ASSERT (bfdtab->count + 1 <= table->nbuckets * 2 / 3);
>
> So it would be interesting to know the values of bfdtab->count and
> table->nbuckets.
>
> Given that you are linking a very large project, I do wonder if the problem
> is that one of these fields is overflowing. Are you able to build a version
> of the linker with undefined behaviour sanitization enabled and then find
> out if that catches something ?
>
>
> > I appreciate that this description is quite vague without an example piece
> > of code to illustrate the problem, but something appears to have been
> > changed that causes this recursive output of messages upon failure.
>
> The change was (almost certainly) commit 1a528d3ef07f, which reworked the
> string merge code to greatly improve its speed. So far the changes have
> proved to be very robust, but it may be that this is the first time that
> they have been asked to handle an extremely large project.
>

On the plus side, I have been able to check out this commit, build it
successfully, and attempt my builds using the newly built version of ld.
As expected, it spits out the same lines:

/bin/ld: BFD (GNU Binutils) 2.40.50.20230120 assertion fail merge.c:243

>
> > Unfortunately, due to the nature and complexity of the project, I have been
> > unable
> > to provide a code example that generates the above output.
>
> As an alternative, if we are able to offer you patches (to the linker) to
> try out, are you able to apply and build your own linker to use for testing ?

Yes, this would work, seeing as I can build ld from the specific commit you
mention above.

>
> Cheers
>   Nick

Many thanks for the prompt response,

Jonny