Re: [Valgrind-users] Cache conflict detection support in cachegrind

2023-01-29 Thread Eliot Moss

On 1/30/2023 7:08 AM, Ivica B wrote:

> Can you please share the instructions on how to do it?
>
> On Sun, Jan 29, 2023, 9:07 PM Eliot Moss <m...@cs.umass.edu> wrote:
>
>> I have used lackey to get traces, which I have fed into
>> a cache model to detect conflicts and such.  You could
>> also start with the lackey code and build the cache model
>> into the tool (which a student of mine did at one point).


Lackey is one of the built-in Valgrind tools, and it is documented in
the Valgrind manual.  It produces a trace giving one memory access per
line, indicating whether the access is an instruction fetch, a memory
read, a memory write, or both a read and a write, along with the
address and size.
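
As a rough illustration of the trace format described above, here is a
minimal parsing sketch in Python.  It assumes the trace was produced
with something like "valgrind --tool=lackey --trace-mem=yes ./prog" and
that each access line looks like "I  04000000,3", " L 04a2b3c4,4",
" S 7ff000350,8" or " M 0421d6c0,4" (fetch, load, store, modify), with
the address in hex and the size in decimal.  The names Access and
parse_trace are made up for this example; check the Lackey section of
the manual for the exact format of your Valgrind version.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Access(NamedTuple):
    kind: str  # 'I' fetch, 'L' load, 'S' store, 'M' modify (read + write)
    addr: int  # address of the access
    size: int  # size of the access in bytes


# Assumed line shape: optional spaces, kind letter, spaces, hex address, ',', size.
_LINE = re.compile(r"^\s*([ILSM])\s+([0-9a-fA-F]+),(\d+)\s*$")


def parse_trace(lines: Iterable[str]) -> Iterator[Access]:
    """Yield one Access per trace line; skip anything that does not match
    (for example Valgrind's own '==pid==' messages)."""
    for line in lines:
        m = _LINE.match(line)
        if m:
            yield Access(m.group(1), int(m.group(2), 16), int(m.group(3)))
```

Since Valgrind writes the trace to its log output, one way to use this
is to redirect that output to a file and pass the open file object to
parse_trace().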

You write a program to parse that and run your own model of whatever
cache you're concerned with.  Doing that part is for you to figure
out.  You do need to know the details of the cache you're going to
model.  There may be programs or libraries out there for analyzing
address traces, but this would not be the list to find them.
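
To give an idea of what "your own model" might look like, here is a
minimal sketch of a set-associative cache with LRU replacement, in
Python.  It is not what I or my student used; the class name
SetAssocCache and the default geometry (32 KiB, 64-byte lines, 8 ways)
are assumptions for illustration, and a real cache may differ in
indexing, replacement policy, and write handling.

```python
from collections import OrderedDict


class SetAssocCache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes=32 * 1024, line_bytes=64, ways=8):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr, size=1):
        """Touch every line covered by [addr, addr + size).
        Returns the number of misses this access caused."""
        first = addr // self.line_bytes
        last = (addr + max(size, 1) - 1) // self.line_bytes
        missed = 0
        for line in range(first, last + 1):
            s = self.sets[line % self.num_sets]
            if line in s:
                s.move_to_end(line)        # mark as most recently used
                self.hits += 1
            else:
                if len(s) >= self.ways:
                    s.popitem(last=False)  # evict the LRU line of this set
                s[line] = True
                self.misses += 1
                missed += 1
        return missed
```

You would call access(addr, size) once per trace record and read off
hits and misses at the end; anything about conflicts specifically has
to be layered on top of a model like this.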

Sorry, but I'm not prepared to go through how to code a cache model ...

EM


___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] Cache conflict detection support in cachegrind

2023-01-29 Thread Eliot Moss

I have used lackey to get traces, which I have fed into
a cache model to detect conflicts and such.  You could
also start with the lackey code and build the cache model
into the tool (which a student of mine did at one point).

Regards - Eliot Moss




Re: [Valgrind-users] Cache conflict detection support in cachegrind

2023-01-29 Thread Ivica B
Can you please share the instructions on how to do it?

On Sun, Jan 29, 2023, 9:07 PM Eliot Moss wrote:

> I have used lackey to get traces, which I have fed into
> a cache model to detect conflicts and such.  You could
> also start with the lackey code and build the cache model
> into the tool (which a student of mine did at one point).
>
> Regards - Eliot Moss
>


Re: [Valgrind-users] Cache conflict detection support in cachegrind

2023-01-29 Thread Ivica B
Hi Paul!

I read the info you provided, but none of the programs actually
support detecting cache conflicts.

Performance counters can detect cache misses, similar to cachegrind,
but they cannot distinguish between cache misses related to cache
conflicts and other cache misses.

pahole serves a completely different purpose: it detects padding in
data structures. This isn't related to cache conflicts in any way.

DHAT provides useful information, by allowing you to assess which data
is accessed more frequently, but you need additional data to verify
that the hot data is not evicted from the cache too soon.

On Sun, Jan 29, 2023 at 4:25 PM Paul Floyd wrote:
>
>
>
> On 29-01-23 14:31, Ivica B wrote:
> > Hi!
> >
> > I am looking for a tool that can detect cache conflicts, but I am not
> > finding any. There are a few that are mostly academic, and thus not
> > maintained. I think it is important for the performance analysis
> > community to have a tool that to some extent can detect cache
> > conflicts. Is it possible to implement support for detecting source
> > code lines where cache conflicts occur? More info on cache conflicts
> > below.
>
> [snip]
>
> I agree that this is an interesting topic. If anyone else has ideas I'm
> all ears.
>
> My recommendations for this are:
>
> 1/ PMU/PMC (performance monitoring unit/counter) event counting tools
> (perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on
> Solaris, don't know for macOS). These can record events such as cache
> misses with the associated callstacks. You can then use tools such as
> HotSpot and perfgrind/kcachegrind (I have used HotSpot but not perfgrind).
>
> The big advantage of this is that the PMCs are part of the hardware and
> the overhead of doing this is minor. The only slight limitation is that
> the number of counters is limited.
>
> 2/ pahole
> https://github.com/acmel/dwarves
> A really nice binary analysis tool. It will analyze your binary (with
> debuginfo) and generate a report for all structures showing holes,
> padding and cache lines. It can even generate modified source with
> members reordered to improve the packing. However as this is a static
> tool working only on the data structures it knows nothing about your
> access patterns.
>
> 3/ DHAT
> One of the Valgrind tools. This profiles heap memory. If the block is
> less than 1k it will also generate a kind of ascii-html heat map. That
> map is an aggregate, but you can usually guess which offsets get hit the
> most together.
>
> Cachegrind doesn't really do this with the kind of accuracy that PMCs
> do. It has a reduced model of the cache and has a basic branch
> predictor. I don't know if or how speculative execution affects the
> cache hit rate, but Valgrind doesn't do any of that.
>
> A+
> Paul




Re: [Valgrind-users] Cache conflict detection support in cachegrind

2023-01-29 Thread John Reiser

On 2023-01-29, Paul Floyd wrote:

> My recommendations for this are:
>
> 1/ PMU/PMC (performance monitoring unit/counter) event counting tools
> (perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on
> Solaris, don't know for macOS). These can record events such as cache
> misses with the associated callstacks. You can then use tools such as
> HotSpot and perfgrind/kcachegrind (I have used HotSpot but not perfgrind).
>
> The big advantage of this is that the PMCs are part of the hardware and
> the overhead of doing this is minor. The only slight limitation is that
> the number of counters is limited.


Another disadvantage: the hardware does not know which accesses
belong to the target code versus which accesses belong to
the code of valgrind itself.

Even if the hardware could separate accesses on that basis, it does not know
about stack frames.  Allocating a stack frame shortly after CALL, and
discarding it shortly before RETURN, can be significant reasons for
cache misses, either immediately or in the near future.

Then there are system calls, which might significantly alter cache contents.
Sometimes the resulting cache misses should be included (they most certainly
do affect wall clock time), but in some other cases you may wish that the
operating system was ignored.

If the target program uses threads, then using memory for inter-thread
communication (semaphore, mutex, pipeline, etc.) becomes another factor.





Re: [Valgrind-users] Cache conflict detection support in cachegrind

2023-01-29 Thread Paul Floyd




On 29-01-23 14:31, Ivica B wrote:

> Hi!
>
> I am looking for a tool that can detect cache conflicts, but I am not
> finding any. There are a few that are mostly academic, and thus not
> maintained. I think it is important for the performance analysis
> community to have a tool that to some extent can detect cache
> conflicts. Is it possible to implement support for detecting source
> code lines where cache conflicts occur? More info on cache conflicts
> below.


[snip]

I agree that this is an interesting topic. If anyone else has ideas I'm 
all ears.


My recommendations for this are:

1/ PMU/PMC (performance monitoring unit/counter) event counting tools 
(perf record on Linux, pmcstat on FreeBSD, Oracle Studio collect on 
Solaris, don't know for macOS). These can record events such as cache 
misses with the associated callstacks. You can then use tools such as
HotSpot and perfgrind/kcachegrind (I have used HotSpot but not perfgrind).


The big advantage of this is that the PMCs are part of the hardware and 
the overhead of doing this is minor. The only slight limitation is that 
the number of counters is limited.


2/ pahole
https://github.com/acmel/dwarves
A really nice binary analysis tool. It will analyze your binary (with 
debuginfo) and generate a report for all structures showing holes, 
padding and cache lines. It can even generate modified source with 
members reordered to improve the packing. However as this is a static 
tool working only on the data structures it knows nothing about your 
access patterns.


3/ DHAT
One of the Valgrind tools. This profiles heap memory. If the block is 
less than 1k it will also generate a kind of ascii-html heat map. That 
map is an aggregate, but you can usually guess which offsets get hit the 
most together.


Cachegrind doesn't really do this with the kind of accuracy that PMCs 
do. It has a reduced model of the cache and has a basic branch 
predictor. I don't know if or how speculative execution affects the 
cache hit rate, but Valgrind doesn't do any of that.


A+
Paul




[Valgrind-users] Cache conflict detection support in cachegrind

2023-01-29 Thread Ivica B
Hi!

I am looking for a tool that can detect cache conflicts, but I am not
finding any. There are a few that are mostly academic, and thus not
maintained. I think it is important for the performance analysis
community to have a tool that to some extent can detect cache
conflicts. Is it possible to implement support for detecting source
code lines where cache conflicts occur? More info on cache conflicts
below.

=== What are cache conflicts? ===
A cache conflict happens when a cache line is brought from memory into
the cache, but very soon has to be evicted because another cache line
is mapped to the same entry.
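
To make "mapped to the same entry" concrete, here is a small Python
sketch of the usual index computation.  The geometry (64-byte lines,
64 sets) and the simple modulo indexing are assumptions for
illustration; real hardware may index on physical addresses or hash
the index bits.

```python
def set_index(addr, line_bytes=64, num_sets=64):
    """Set (entry) that an address maps to under plain modulo indexing."""
    return (addr // line_bytes) % num_sets


def may_conflict(a, b, line_bytes=64, num_sets=64):
    """True if a and b are on different cache lines that share a set."""
    same_line = a // line_bytes == b // line_bytes
    return (not same_line) and (
        set_index(a, line_bytes, num_sets) == set_index(b, line_bytes, num_sets)
    )


# Two addresses exactly line_bytes * num_sets = 4096 bytes apart
# land in the same set; neighbouring lines do not.
base = 0x10000
print(may_conflict(base, base + 4096))  # True
print(may_conflict(base, base + 64))    # False
```

In a direct-mapped cache any two such addresses evict each other; with
N ways it takes more than N competing lines in the same set.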

The problem with detecting cache conflicts is that it is normal that
one cache line gets evicted because it is replaced by another cache
line. Therefore, a cache conflict is an outlier: the cache line spent
very little time in the cache before it got evicted.

=== How to detect cache conflicts? ===
As I said, there are a few research papers that discuss it, and there
are probably several different approaches to doing it.

One approach is to measure how long each cache line sits in the cache
before it gets evicted. For each instruction that causes an eviction,
we record how long the evicted cache line spent in the cache, and then
build statistics per instruction. Instructions that mostly evict
short-lived cache lines are the ones where cache conflicts are most
likely to happen.
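
The approach above could be sketched roughly as follows in Python.
This is only an illustration under simplifying assumptions: time is
counted in number of memory accesses rather than cycles, the cache is
a simulated one with LRU replacement, and the input is a list of
(pc, addr) pairs where pc is the address of the instruction making the
access.  The function names, the cache geometry, and the threshold are
all made up for this example.

```python
from collections import OrderedDict, defaultdict
from statistics import median


def eviction_lifetimes(accesses, line_bytes=64, num_sets=64, ways=1):
    """accesses: iterable of (pc, addr) pairs.
    Returns {pc: [lifetime of each line that pc evicted]}, where a
    lifetime is the number of accesses between insertion and eviction."""
    sets = [OrderedDict() for _ in range(num_sets)]  # line -> insertion time
    lifetimes = defaultdict(list)
    for now, (pc, addr) in enumerate(accesses):
        line = addr // line_bytes
        s = sets[line % num_sets]
        if line in s:
            s.move_to_end(line)  # hit: refresh LRU order, keep insertion time
            continue
        if len(s) >= ways:
            _, inserted = s.popitem(last=False)  # evict the LRU line
            lifetimes[pc].append(now - inserted)
        s[line] = now
    return dict(lifetimes)


def suspects(lifetimes, threshold=16):
    """Instructions whose evicted lines had a median lifetime at or
    below the (arbitrary) threshold."""
    return sorted(pc for pc, ts in lifetimes.items() if median(ts) <= threshold)
```

What counts as "short-lived" depends on the cache and the workload, so
the threshold would need tuning, and a median is just one possible
statistic.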

=

Please comment!

Ivica

