On 6/16/2023 11:39 AM, Khan Shaikhul Hadi via gem5-users wrote:
Hi,

I'm trying to figure out how the "clflush" instruction works in gem5. Specifically, how does it signal the cache controller to evict the block from the cache hierarchy throughout the system, and how does it receive confirmation that the store buffer is clean so that the next fence lets the following instructions proceed? Does anyone have any idea how this works, or where I should look for a better understanding?


I have tried to trace clflush execution and found some confusing facts. It would be great if anyone could clarify this.


1. Executing a "clflush" instruction eventually calls Clflushopt::initiateAcc() (build/X86/arch/x86/generated/exec-ns.cc.inc), because the macroop definition of CLFLUSH uses clflushopt. So is there no dedicated clflush operation in gem5, with all flush operations treated as clflushopt?

clflush is a clflushopt followed by a microop that waits for the store queues
to be empty.  This is what causes the stronger ordering of clflush vs.
clflushopt.
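
A toy sketch of that relationship, in case it helps to see it written down. This
is not gem5 code: ToyCore, Entry, Op and drainStoreQueue are names made up for
the illustration, and the "wait" is modelled as simply emptying the queue.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Toy model only; none of these names are gem5 classes.
enum class Op { Store, Flush };

struct Entry {
    Op op;
    std::uint64_t addr;
};

struct ToyCore {
    std::deque<Entry> storeQueue;  // pending stores and flushes, program order
    std::vector<std::string> log;  // record of what happened, in order

    void store(std::uint64_t addr) {
        storeQueue.push_back({Op::Store, addr});
        log.push_back("enqueue store");
    }

    // clflushopt: queue the flush and return without waiting for anything.
    void clflushopt(std::uint64_t addr) {
        storeQueue.push_back({Op::Flush, addr});
        log.push_back("enqueue flush");
    }

    // Stand-in for the extra microop: do not proceed until the queue is empty.
    void drainStoreQueue() {
        while (!storeQueue.empty()) {
            const Entry e = storeQueue.front();
            storeQueue.pop_front();
            log.push_back(e.op == Op::Store ? "complete store"
                                            : "complete flush");
        }
    }

    // clflush: the same flush, followed by the wait.
    void clflush(std::uint64_t addr) {
        clflushopt(addr);
        drainStoreQueue();
        assert(storeQueue.empty());
    }
};
```

In this model, calling clflushopt() leaves entries pending in storeQueue, while
clflush() returns only once everything queued before it (and the flush itself)
has completed.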

2. When Clflushopt::initiateAcc() executes in timing simulation (CPUType::TIMING), it eventually calls the TimingSimpleCPU::writeMem() function in src/cpu/simple/timing.cc. There you have:

    if (data == NULL) {
        assert(flags & Request::STORE_NO_DATA);
        // This must be a cache block cleaning request
        memset(newData, 0, size);
    } else {
        memcpy(newData, data, size);
    }


I assumed data would be NULL here, so that memset() would execute, but memcpy() is what actually executes. This seems weird. Am I missing something?

Some processors have an operation to zero a cache block (line).  That's what
the memset is for.  Otherwise the flushed data have been sent to the memory
and need to be stored (memcpy).
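
To experiment with that branch outside of gem5, here is a minimal standalone
sketch of the same data / no-data decision. makePayload and kStoreNoData are
hypothetical names for this example only; kStoreNoData merely stands in for
Request::STORE_NO_DATA and its value is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for gem5's Request::STORE_NO_DATA (arbitrary value).
constexpr unsigned kStoreNoData = 1u << 0;

// Build the payload for a write-like request, mirroring the branch quoted
// above: no source data means a zero-filled buffer, otherwise copy the data.
std::vector<std::uint8_t>
makePayload(const std::uint8_t *data, std::size_t size, unsigned flags)
{
    std::vector<std::uint8_t> newData(size);
    if (size == 0)
        return newData;  // nothing to fill or copy

    if (data == nullptr) {
        // A request with no data must be flagged as such.
        assert(flags & kStoreNoData);
        std::memset(newData.data(), 0, size);
    } else {
        std::memcpy(newData.data(), data, size);
    }
    return newData;
}
```

Passing a real buffer takes the memcpy path regardless of the flags; only a
null data pointer reaches the memset, and then the flag has to be set or the
assert fires.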

3. For out-of-order simulation (CPUType.O3), Clflush::initiateAcc() is called twice as many times as there are clflush instructions in my workload. For example, if my workload has 6 clflush instructions, a gdb breakpoint at Clflush::initiateAcc shows that this function is called 12 times (in timing simulation it is called 6 times, as it should be). Can anyone explain what happens here?

I'd have to go dig into the code, but maybe what you're seeing is that the
instruction must first do a virtual address translation, and only after the
result of that is available (some number of cycles later) can it send the
actual request (which is put into a store queue and acted on in due course).

Note further that an operation like clflush may travel all the way out to the
coherent xbar closest to the memory, and then snoops will be sent down to *all*
the caches (since the line in question may be in some other processor's L1
cache, for example).  Whichever cache has the data will respond.  If none
respond, then the cache line is not resident anyway (or was not dirty and is
now dropped by all the caches), so there is no further work to do.
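
The snoop behaviour just described can be sketched as a toy model. Again, this
is not gem5's cache or xbar code: ToyCache, State and broadcastFlush are
invented for the illustration, and the model assumes at most one cache holds a
dirty copy of a line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Toy model only; none of these names are gem5 classes.
enum class State { Clean, Dirty };

struct ToyCache {
    std::map<std::uint64_t, State> lines;  // resident lines, keyed by address

    // Handle a flush snoop: drop the line if present, and report whether
    // this cache held it dirty (i.e. it must respond with the data).
    bool snoopFlush(std::uint64_t addr) {
        auto it = lines.find(addr);
        if (it == lines.end())
            return false;
        const bool dirty = (it->second == State::Dirty);
        lines.erase(it);
        return dirty;
    }
};

// Send the flush snoop to *every* cache. Returns the index of the cache that
// responded with dirty data, or -1 if none did (nothing left to write back).
int broadcastFlush(std::vector<ToyCache> &caches, std::uint64_t addr)
{
    int responder = -1;
    for (std::size_t i = 0; i < caches.size(); ++i) {
        if (caches[i].snoopFlush(addr)) {
            assert(responder == -1);  // assumption: a single dirty owner
            responder = static_cast<int>(i);
        }
    }
    return responder;
}
```

The loop deliberately does not stop at the first responder, since clean copies
in the other caches still have to be dropped. A return value of -1 corresponds
to the "no further work to do" case.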

There are some aspects of this where gem5 does not follow what x86 processors
do ... in particular, gem5 handles all x86 memory store operations (clflush is
in this category) in order (Intel TSO - total store order), even though Intel
ordering of clflushopt and clwb is weaker.  I coded up something more like
actual Intel behavior, but have not submitted it back to gem5 :-( ...  It made
the store queue processing rather more subtle, since the existing code counted
on things proceeding in order.

HTH