https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #20 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(note that if you uninclude the testcase and compile with -fno-exceptions it's
much faster)

On the smaller testcase from comment 14, prune_unused_phi_nodes invokes
gcc_qsort 53386 times. There are two distinct phases.

In the first phase, the count of struct dom_dfsnum to sort grows in a roughly
linear fashion up to 23437 on the 12294'th invocation. Hence this first phase
is quadratic in the overall number of processed dom_dfsnum-s.

In the second phase, it almost always sorts exactly 7 elements for the
remaining ~41000 invocations.

The number of pairwise comparisons performed by gcc_qsort is approximately
(n+1)*(log_2(n)-1), which results in 1.8e9 comparisons overall for the 53386
invocations. perf shows 10.9e9 cycles spent under gcc_qsort, i.e. 6 cycles per
comparison, which looks about right. It's possible to reduce that further by
switching from classic to bidirectional merge, and using cmovs instead of
bitwise arithmetic for branchless selects.

> I'll note the swapping of 8 bytes is a bit odd and it seems to be
> if-converted, thus always doing a write.

That is not a swap. That's the merge step of a mergesort, we are taking the
smaller element from the heads of two arrays and moving it to the tail of a
third array.

Basically there's quadratic behavior in tree-into-ssa, gcc_qsort shows
relatively higher on the profile because the log_2(n) factor becomes
noticeable.

Hope that helps!

Reply via email to