Hi Qiong,
It's good that we've agreed on the hardware issues; now we can discuss the
method itself in detail.
I read Chilimbi's paper and think it can really work. But there are some
points in the article that are unclear to me and that I want to discuss here.

1) Chilimbi's algorithm does not use hardware counters at all. You said
that you want to "use the hardware performance counter
to get the data reference trace". I agree that PMU counters could help to
optimize the algorithm (e.g. we can do trace profiling only for hot methods
with a high cache-miss rate), but I do not understand how you want to
collect a trace with PMU counters. The only uses of PMU counters I know of
are time-based or event-based sampling. Sampling means that you get only 1
event out of thousands: how do you get the trace in this case? In any case,
the PMU could be useful to avoid profiling methods with a low cache-miss rate.

2) Chilimbi does not investigate how his algorithm behaves when the GC moves
objects in memory. But since profiling stays active for the whole time of
program execution, I think the trace cache will adapt very quickly to the
new object locations.

3) I do not understand how Chilimbi's algorithm deals with memory accesses in
loops (an example is iterating over a LinkedList). The SEQUITUR examples work
nicely when you have an alphabet with a limited number of letters. When you
access 10000 or more memory addresses from inside a loop, what will the trace
look like?
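
To make question 3 concrete, here is a small sketch of what I mean. It is
not SEQUITUR and not Chilimbi's implementation; it only counts repeated
adjacent pairs (digrams) in a simulated address trace, since repeated digrams
are what SEQUITUR builds its grammar rules from. The class name, the base
address and the 32-byte stride are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class LoopTraceSketch {
    // Simulates the data reference trace of 'passes' traversals over a
    // list of 'nodes' elements. The addresses are hypothetical: one
    // distinct address per node, base + i * stride.
    static long[] simulateTrace(int nodes, int passes) {
        long base = 0x10000L;   // assumed start address
        long stride = 32L;      // assumed node size in bytes
        long[] trace = new long[nodes * passes];
        for (int p = 0; p < passes; p++) {
            for (int i = 0; i < nodes; i++) {
                trace[p * nodes + i] = base + i * stride;
            }
        }
        return trace;
    }

    // Counts trace positions whose digram (this address followed by the
    // next one) has already been seen earlier in the trace.
    static int countRepeatedDigrams(long[] trace) {
        Set<String> seen = new HashSet<>();
        int repeats = 0;
        for (int i = 0; i + 1 < trace.length; i++) {
            String digram = trace[i] + ">" + trace[i + 1];
            if (!seen.add(digram)) {
                repeats++;
            }
        }
        return repeats;
    }

    public static void main(String[] args) {
        // One traversal: every address is new, so no digram repeats.
        System.out.println(countRepeatedDigrams(simulateTrace(10000, 1)));
        // Two traversals: the second pass repeats the digrams of the first.
        System.out.println(countRepeatedDigrams(simulateTrace(10000, 2)));
    }
}
```

With these assumptions a single pass over 10000 nodes has no repeated
digram at all, so there would be nothing to compress; repetition shows up
only when the same list is traversed again. That is the case I would like
to understand better.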

--
Mikhail Fursov