In my configuration I used CPUTypes.O3 and PrivateL1SharedL2CacheHierarchy
to check how clflush and fence affect the timing of a workload. In my
workload I run 10,000 iterations that update an array, 200 updates per
thread. In the workload I have:
for (; index < end_index - 1; index++) {
    ARR[index] = thread_ID;
    ARR[index + 1] = thread_ID;
    FENCE;
}
to simulate two consecutive, highly localized write operations and see the
impact of the fence. Inserting FENCE (a macro that issues mfence) increases
execution time by 24%. In the second scenario, I have:

for (; index < end_index - 1; index++) {
    ARR[index] = thread_ID;
    FLUSH(&ARR[index]);
    FENCE;
}

Here FLUSH (a macro for _mm_clflush) should take more time to complete than
the plain write ARR[index+1] = thread_ID, because that write is highly
localized while the flush needs to be acknowledged by every cache level
before it completes. So FENCE should carry a much larger penalty after a
flush than after a write, and I was expecting a large increase in execution
time when inserting fences in the second scenario. But inserting the fence
only increases execution time by 2%, which is counter-intuitive.
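
For reference, the two macros are essentially wrappers like the following
(a minimal sketch; the exact definitions in my code may differ slightly,
but FENCE inserts an mfence and FLUSH calls _mm_clflush as described):

#include <immintrin.h>   /* _mm_clflush */

/* FENCE: insert an mfence instruction (sketch). */
#define FENCE   __asm__ __volatile__("mfence" ::: "memory")

/* FLUSH: flush the cache line containing addr from all cache levels. */
#define FLUSH(addr)   _mm_clflush((const void *)(addr))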
Can anyone explain why I'm seeing this behaviour? As far as I understand, a
memory fence should only let the following instructions execute after all
previous instructions have completed and the stores have been drained from
the store buffer, in which case the clflush should take more time to
complete before the fence than a regular write operation would.
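
To make my expectation concrete, here is one iteration of the second loop
annotated with the ordering I have in mind (using the FLUSH/FENCE macros
sketched above; the comments reflect my understanding, not verified gem5
behaviour):

/* One iteration of the second loop, as I reason about it. */
ARR[index] = thread_ID;    /* the store enters the store buffer           */
FLUSH(&ARR[index]);        /* clflush must invalidate the line in the     */
                           /* private L1 and shared L2, writing it back   */
                           /* if dirty                                    */
FENCE;                     /* mfence should not let later instructions    */
                           /* proceed until the store has drained and the */
                           /* clflush has completed, so I expected this   */
                           /* fence to stall much longer than the fence   */
                           /* after the plain second write in scenario 1  */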

Best
Shaikhul