[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243

2023-10-31 Thread jonny.weir at clearpool dot io
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #7 from Jonny Weir  ---
I made the following change:

diff --git a/bfd/merge.c b/bfd/merge.c
index f21154dcd45..3b4ccfb86df 100644
--- a/bfd/merge.c
+++ b/bfd/merge.c
@@ -175,26 +175,43 @@ sec_merge_maybe_resize (struct sec_merge_hash *table, unsigned added)
       uint64_t *newl;
       unsigned long alloc;

+      printf ("XXX resize 1: count=%u added=%u newnb=%lu\n", bfdtab->count, added, newnb);
       while (bfdtab->count + added > newnb * 2 / 3)
         {
           newnb *= 2;
           if (!newnb)
-            return false;
+            {
+              printf ("false1: newnb=%lu\n", newnb);
+              return false;
+            }
         }
+      printf ("XXX resize 2: newnb=%lu\n", newnb);

       alloc = newnb * sizeof (newl[0]);
       if (alloc / sizeof (newl[0]) != newnb)
-        return false;
+        {
+          printf ("false2: alloc=%lu, newnb=%lu, sizeof(newl[0])=%lu, alloc/sizeof(newl[0])=%lu\n", alloc, newnb, sizeof(newl[0]), alloc/sizeof(newl[0]));
+          return false;
+        }
       newl = objalloc_alloc ((struct objalloc *) table->table.memory, alloc);
       if (newl == NULL)
-        return false;
+        {
+          printf ("false3: newl == NULL");
+          return false;
+        }
       memset (newl, 0, alloc);
       alloc = newnb * sizeof (newv[0]);
       if (alloc / sizeof (newv[0]) != newnb)
-        return false;
+        {
+          printf ("false4: alloc=%lu, newnb=%lu, sizeof(newv[0])=%lu, alloc/sizeof(newv[0])=%lu\n", alloc, newnb, sizeof(newv[0]), alloc/sizeof(newv[0]));
+          return false;
+        }
       newv = objalloc_alloc ((struct objalloc *) table->table.memory, alloc);
       if (newv == NULL)
-        return false;
+        {
+          printf ("false5: newv == NULL");
+          return false;
+        }
       memset (newv, 0, alloc);


This produced the following output:

XXX resize 1: count=609 added=5699 newnb=16384
XXX resize 2: newnb=16384
XXX resize 1: count=1133 added=28709 newnb=32768
XXX resize 2: newnb=65536
XXX resize 1: count=114 added=9809 newnb=16384
XXX resize 2: newnb=16384
XXX resize 1: count=266 added=10658 newnb=32768
XXX resize 2: newnb=32768
XXX resize 1: count=447 added=30092 newnb=65536
XXX resize 2: newnb=65536
XXX resize 1: count=683 added=123892 newnb=131072
XXX resize 2: newnb=262144
XXX resize 1: count=1048 added=212455 newnb=524288
XXX resize 2: newnb=524288
XXX resize 1: count=1598 added=1086327410 newnb=1048576
XXX resize 2: newnb=2147483648
XXX resize 1: count=755248 added=68141 newnb=0
false1: newnb=0
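
The last two resize calls in this log can be replayed outside the linker. The sketch below is a minimal model, not the code in bfd/merge.c. It assumes that table->nbuckets is a 32-bit unsigned int and that newnb is a 64-bit unsigned long initialised from nbuckets * 2, which is what the logged values suggest. It also leaves out the outer check that decides whether a resize is needed at all. The name replay_resize is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the growth loop in sec_merge_maybe_resize.
   Assumption: nbuckets is 32 bits wide and newnb is 64 bits wide.
   Returns the new bucket count, or 0 where the real code would
   'return false'.  */
static uint64_t
replay_resize (uint32_t nbuckets, uint32_t count, uint32_t added)
{
  /* The first doubling is done in 32-bit arithmetic before the value
     is widened, so 2147483648 * 2 wraps to 0 here.  */
  uint64_t newnb = (uint32_t) (nbuckets * 2u);

  /* count + added is a 32-bit sum, as in the logged expression.  */
  while ((uint32_t) (count + added) > newnb * 2 / 3)
    {
      newnb *= 2;
      if (!newnb)
        return 0;   /* corresponds to the "false1" early-out */
    }
  return newnb;
}
```

Under these assumptions, replay_resize (524288, 1598, 1086327410) gives 2147483648, and replay_resize (2147483648, 755248, 68141) gives 0, which matches the final "false1: newnb=0" line.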

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243

2023-10-31 Thread matz at suse dot de
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #6 from Michael Matz  ---
(In reply to Jonny Weir from comment #5)
> Ignore that last message, it is misleading, this is a more accurate
> representation of what is happening with the values:

Ah, yes.  I was suspecting already that you were printing the value*2/3.
Anyway:

> bfdtab->count + 1 = 1598 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
> bfdtab->count + 1 = 1599 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0

Yeez!  One of the input sections is projected to possibly add 2 billion
strings.
Can you perhaps add some printfs to sec_merge_maybe_resize (the only place
that does increase nbuckets)?  Similar to below, maybe also add printf's for
each early-out (all the 'return false' in there).

And then we need to trace why the overflow isn't detected earlier and handled
gracefully.  I tried to make sure that it is (that's what the 'return false'
statements are for, after all), but obviously I failed.

diff --git a/bfd/merge.c b/bfd/merge.c
index 722e6659486..b36cee49b3a 100644
--- a/bfd/merge.c
+++ b/bfd/merge.c
@@ -175,12 +175,14 @@ sec_merge_maybe_resize (struct sec_merge_hash *table, unsigned added)
       uint64_t *newl;
       unsigned long alloc;

+      printf ("XXX resize 1: count=%u added=%u newnb=%lu\n", bfdtab->count, added, newnb);
       while (bfdtab->count + added > newnb * 2 / 3)
         {
           newnb *= 2;
           if (!newnb)
             return false;
         }
+      printf ("XXX resize 2: newnb=%lu\n", newnb);

       alloc = newnb * sizeof (newl[0]);
       if (alloc / sizeof (newl[0]) != newnb)
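
For illustration only, here is one way a bucket-count wrap could be caught before it happens. This is a sketch under the assumption that the bucket count is stored in a 32-bit unsigned value. It is not the fix that went into binutils, and the helper name grow_checked is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: double *nbuckets until count + added fits
   within two thirds of it.  Returns false instead of wrapping, and
   leaves *nbuckets untouched on failure.  */
static bool
grow_checked (uint32_t *nbuckets, uint32_t count, uint32_t added)
{
  uint64_t need = (uint64_t) count + added;   /* cannot wrap in 64 bits */
  uint64_t nb = *nbuckets;

  if (nb == 0)
    return false;   /* doubling zero would never terminate */

  while (need > nb * 2 / 3)
    {
      if (nb > UINT32_MAX / 2)
        return false;   /* the doubled value would not fit in 32 bits */
      nb *= 2;
    }
  *nbuckets = (uint32_t) nb;
  return true;
}
```

The sketch only addresses the arithmetic. A table with two billion buckets would still be far too large to allocate sensibly, so the projected string count itself would also need looking at.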



[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243

2023-10-31 Thread jonny.weir at clearpool dot io
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #5 from Jonny Weir  ---
Ignore that last message, it is misleading, this is a more accurate
representation of what is happening with the values:

bfdtab->count + 1 = 1 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 2 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 3 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 4 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 5 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 6 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
...
bfdtab->count + 1 = 1937 | table->nbuckets = 65536 | table->nbuckets * 2 / 3 = 43690
bfdtab->count + 1 = 1938 | table->nbuckets = 65536 | table->nbuckets * 2 / 3 = 43690
bfdtab->count + 1 = 1 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
bfdtab->count + 1 = 2 | table->nbuckets = 8192 | table->nbuckets * 2 / 3 = 5461
...
bfdtab->count + 1 = 998 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 999 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1000 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1001 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1002 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1003 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
...
bfdtab->count + 1 = 1047 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1048 | table->nbuckets = 262144 | table->nbuckets * 2 / 3 = 174762
bfdtab->count + 1 = 1049 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1050 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
...
bfdtab->count + 1 = 1594 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1595 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1596 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1597 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1598 | table->nbuckets = 524288 | table->nbuckets * 2 / 3 = 349525
bfdtab->count + 1 = 1599 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 1600 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 1601 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 1602 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
...
bfdtab->count + 1 = 755246 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 755247 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
bfdtab->count + 1 = 755248 | table->nbuckets = 2147483648 | table->nbuckets * 2 / 3 = 0
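
The zero in the last column is what the assertion's right-hand side evaluates to if the multiplication is carried out in 32-bit unsigned arithmetic: 2147483648 * 2 is 2^32, which wraps to 0 before the division by 3. The sketch below assumes nbuckets is a 32-bit unsigned value (the log suggests this, but it is an assumption here). The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: nbuckets is 32 bits wide, so the multiplication wraps
   modulo 2^32 before the division.  */
static uint32_t
threshold_32 (uint32_t nbuckets)
{
  return (uint32_t) (nbuckets * 2u) / 3;
}

/* The same expression with the operand widened to 64 bits first.  */
static uint64_t
threshold_64 (uint32_t nbuckets)
{
  return (uint64_t) nbuckets * 2 / 3;
}
```

For nbuckets = 524288 both functions give 349525, as logged. For nbuckets = 2147483648 the 32-bit version gives 0, while the widened version gives 1431655765, so "bfdtab->count + 1 <= table->nbuckets * 2 / 3" cannot hold in the 32-bit case for any count.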



[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243

2023-10-31 Thread jonny.weir at clearpool dot io
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #4 from Jonny Weir  ---
I was able to add some logging to get the values out and it seems that things
are overflowing at some point. Some example output:

bfdtab->count + 1 =  1 | table->nbuckets =  5461
bfdtab->count + 1 =  2 | table->nbuckets =  5461
bfdtab->count + 1 =  3 | table->nbuckets =  5461
bfdtab->count + 1 =  4 | table->nbuckets =  5461
bfdtab->count + 1 =  5 | table->nbuckets =  5461
bfdtab->count + 1 =  6 | table->nbuckets =  5461
...
bfdtab->count + 1 =  1937 | table->nbuckets =  43690
bfdtab->count + 1 =  1938 | table->nbuckets =  43690
bfdtab->count + 1 =  1 | table->nbuckets =  5461
bfdtab->count + 1 =  2 | table->nbuckets =  5461
...
bfdtab->count + 1 =  998 | table->nbuckets =  174762
bfdtab->count + 1 =  999 | table->nbuckets =  174762
bfdtab->count + 1 =  1000 | table->nbuckets =  174762
bfdtab->count + 1 =  1001 | table->nbuckets =  174762
bfdtab->count + 1 =  1002 | table->nbuckets =  174762
bfdtab->count + 1 =  1003 | table->nbuckets =  174762
...
bfdtab->count + 1 =  1047 | table->nbuckets =  174762
bfdtab->count + 1 =  1048 | table->nbuckets =  174762
bfdtab->count + 1 =  1049 | table->nbuckets =  349525
bfdtab->count + 1 =  1050 | table->nbuckets =  349525
...
bfdtab->count + 1 =  1594 | table->nbuckets =  349525
bfdtab->count + 1 =  1595 | table->nbuckets =  349525
bfdtab->count + 1 =  1596 | table->nbuckets =  349525
bfdtab->count + 1 =  1597 | table->nbuckets =  349525
bfdtab->count + 1 =  1598 | table->nbuckets =  349525
bfdtab->count + 1 =  1599 | table->nbuckets =  0
bfdtab->count + 1 =  1600 | table->nbuckets =  0
bfdtab->count + 1 =  1601 | table->nbuckets =  0
bfdtab->count + 1 =  1602 | table->nbuckets =  0
...
bfdtab->count + 1 =  755246 | table->nbuckets =  0
bfdtab->count + 1 =  755247 | table->nbuckets =  0
bfdtab->count + 1 =  755248 | table->nbuckets =  0

Not sure how helpful this is, but it does at least show why the assertion
failure is raised.



[Bug ld/31009] regression: assertion fail ../../bfd/merge.c:243

2023-10-31 Thread jonny.weir at clearpool dot io
https://sourceware.org/bugzilla/show_bug.cgi?id=31009

--- Comment #3 from Jonny Weir  ---
Hi Nick,

(In reply to Nick Clifton from comment #1)
> Hi Jonny,
> 
> (In reply to Jonny Weir from comment #0)
> 
> > linking stage when -O3 is used (-O2 builds and links correctly). To be
> > clear, the only difference between success and failure is the optimisation
> > level that is used.
> 
> And to be even more clear, you are talking about the compiler's optimization
> level and not the linker's, correct ?

Correct.


> > /bin/ld: BFD (GNU Binutils for Debian) 2.41 assertion fail
> > ../../bfd/merge.c:243
> 
> Are you able to attach a debugger to the linker and discover the values that
> are triggering this assertion ?  The code looks like this:

I attached gdb to the ld process, but a few things make it difficult to get a
stack trace. I believe ld forks another process, and it is that process that
prints the assertion messages, so by the time it returns there is no stack left
to examine. Unless there is a way to do this that I am missing, I can't see how
to get the required stack trace.

> 
>   // We must not need resizing, otherwise _index is wrong
>   BFD_ASSERT (bfdtab->count + 1 <= table->nbuckets * 2 / 3);
> 
> So it would be interesting to know the values of bfdtab->count and
> table->nbuckets.
> 
> Given that you are linking a very large project, I do wonder if the problem
> is that one of these fields is overflowing.  Are you able to build a version
> of the linker with undefined behaviour sanitization enabled and then find
> out if that catches something ?
> 
> 
> > I appreciate that this description is quite vague without an example piece
> > of code to illustrate the problem, but something appears to have been
> > changed that causes this recursive output of messages upon failure.
> 
> The change was (almost certainly) commit 1a528d3ef07f, which reworked the
> string merge code to greatly improve its speed.  So far the changes have
> proved to be very robust, but it may be that this is the first time that
> they have been asked to handle an extremely large project.
> 

On the plus side, I have been able to check out this commit, build it
successfully, and run my builds with the newly built version of ld. As
expected, it prints the same lines:

/bin/ld: BFD (GNU Binutils) 2.40.50.20230120 assertion fail merge.c:243

> 
> > Unfortunately, due to the nature and complexity of the project, I have been
> > unable to provide a code example that generates the above output.
> 
> As an alternative, if we are able to offer you patches (to the linker) to
> try out, are you able to apply and build your own linker to use for testing ?

Yes, this would work, seeing as I can build ld from the specific commit you
mention above.

> 
> Cheers
>   Nick

Many thanks for the prompt response,

Jonny
