On 23/07/2025 17:05, Jan Hubicka wrote:
Am 23.07.2025 um 16:42 schrieb Andrew Stubbs <a...@baylibre.com>:
On 23/07/2025 15:22, Michael Matz wrote:
Hello,
On Tue, 22 Jul 2025, Andrew Stubbs wrote:
Don’t you need to instrument more (or at least different) edges when
only having visited bits ? With counters you can derive coverage of
related edges by diffing counters (consider a simple diamond).
I have not yet got this deep into the analysis (apparently). In my
head, if I just reduced every counter to a single bit (using saturated
add) then gcov would just present a report with "1" or "#####" all down
the left side.
Are you saying that it might actually report "2" (or more) for some
cases where the result is derived from multiple counters? If so, this
is probably acceptable for a first implementation, but we might want to
do something about it.
No, it will have to say "unknown" for some of the non-instrumented edges.
For instance, in a branch:
-- e1 --> if (cond) -- e2 --> ...
\-- e3 --> ...
We will only instrument a subset of all edges, as c(e1)==c(e2)+c(e3). The
subset we select will be a minimal-cost edge graph cover of the CFG.
Depending on surrounding code we may select (for demostrations sake) to
instrument e1 and e2 (or perhaps we know c(e1) from still other edges).
So we know c(e1) and c(e2), and can compute c(e3)=c(e1)-c(e2) correctly.
With visited flags you loose that because the meeting operation
(logical or) isn't invertible: if you know that e1 was visited, and you
know that e2 was visited, you still don't know anything about e3. It may,
or may not, have been visited. You won't know unless you do instrument
e3. Or select a different subset of edge (here, v(e1) could be computed
from v(e2)|v(e3)).
So, that need to instrument more edges offsets the storage savings. I
have no gut feeling for how bad that is, though. You'd have to try :)
Perhaps it still comes out positive.
Thank you for the careful explanation. It seems like all the "innermost" blocks
would need to be instrumented, and in most cases the rest can be assumed. Unfortunately,
that probably means instrumenting the hottest blocks, but hopefully/maybe/possibly
there's no so many of them. I'll take a look at the selection algorithm and figure out
how it's working, but I assume I will end up needing a matching gcov to go with the new
instrumentation.
I think you can work on the instrumentation part by keeping the counter format
and just changing the update to set bit zero.
we build minimum spaning tree of the CFG and then instrument all
non-spanning edges. Since typical out-degree of a basic block is 2 you
get approximately as many counters as basic blocks. The counts on the
spaning tree are then determined by Kirhoff law.
https://dl.acm.org/doi/pdf/10.1145/183432.183527
Thank you.
If you want to track bits, you may simply instrument all edges and the
data size would be still approx 32times smaller at the expense of more
work in code segment. You can optimize this. If BB has only one
sucessor S you know that if BB is executed, S is also executed. Not sure
if you can do much better.
I was also playing with idea of not streaming all the zeros as 64bit
values or possibly one can allocate counters on demand...
As for atomic operations, I think you will still need locking unless you
can set a given bit in memory without loading+storing the remaining bits
of the byte.
I am leaning toward using 8-bit flags and getting most of the
size-reduction from simplifying the instrumentation to a single
non-atomic blind write.
If an architecture has an atomic or instruction then 1-bit flags might
be an option, but otherwise I think it's probably more expensive in
instrumentation code.
Andrew
Honza