Hi zouqiong,

I looked through the Chilimbi article and I think it makes some good points on tracing memory accesses. I am going to read it more carefully this weekend and will be ready to discuss their algorithm in detail next week.
For now, I propose that we define the goals we want to achieve more precisely. If you decide that PMU profiling must be used, we face the following problems:

1) Neither Linux nor Windows currently allows a developer to access the PMU without a kernel patch or a driver installation. So even if our implementation is successful and is included in the Harmony distribution, it won't be enabled by default.

2) It is hard to support all CPU families. I think we should target P4 and newer architectures, because the implementation will be easier if the CPU has PEBS support.

3) I do not know of any open-source library that supports P4 PMU events on both Linux and Windows. I'll try to find one, but I'm afraid we will have to limit our implementation to Linux for now.

If these limitations are OK, we can create a separate thread with the subject prefix [drlvm][dpgo] to discuss our plans and algorithms in detail. If the limitations are too severe and you expect to write an optimization that works on any platform, maybe instrumentation-based value profiling would be the right choice?

-- Mikhail Fursov