FrozenGene commented on pull request #5914: URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-650738036
> I agree that the cache flush mechanism is useful in getting preciser measurement. It would be great if @FrozenGene can provide some experimental data to further assure.
>
> I vote for folding cache flushing factor into time_evaluator for succinctness. And making it more configurable and generic sounds good to me.

@yidawang The previous experimental data comes almost entirely from Ansor, especially x86 winograd. For example, for a winograd of 1x7x7x512x512, the single-op time measured during tuning can reach 0.11 ms (on one Skylake 512 machine), but when executed end to end, this op can cost several ms (sorry, I lost this number and only recorded the 0.11 ms). The issue is the constant matrices and the weight (for example, 3x3x512x512 becomes 6x6x512x512 if the tile size is 4): without a cache flush, they stay in cache across repeated measurement runs, which is not the case in an end-to-end run.

Another benefit of adding clflush is that we no longer need min_repeat_ms (e.g. 1000), because we can measure very precisely. In this PR we only set repeat to 10, so we can also reduce our tuning time.

I am collecting AutoTVM resnet18 data on one Skylake machine and will share it as soon as it completes.
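
To make the weight size concrete, here is a minimal sketch of the arithmetic for a 3x3 kernel with 512 input and 512 output channels and tile size 4. It assumes float32 weights and the usual winograd relation alpha = tile + kernel - 1. The helper names are hypothetical and not TVM APIs, and the dimension order is only for illustration.

```python
# Hypothetical helpers for illustration only; these are not TVM functions.

def winograd_weight_shape(kernel, tile, in_channels, out_channels):
    """Shape of the transformed weight, assuming alpha = tile + kernel - 1."""
    alpha = tile + kernel - 1
    return (alpha, alpha, in_channels, out_channels)


def num_bytes(shape, dtype_bytes=4):
    """Total size in bytes; dtype_bytes=4 assumes float32."""
    total = dtype_bytes
    for dim in shape:
        total *= dim
    return total


original = (3, 3, 512, 512)
transformed = winograd_weight_shape(kernel=3, tile=4, in_channels=512, out_channels=512)

original_mib = num_bytes(original) / 2**20
transformed_mib = num_bytes(transformed) / 2**20

print("original weight:   ", original, original_mib, "MiB")
print("transformed weight:", transformed, transformed_mib, "MiB")
```

Under these assumptions the transformed weight is four times the size of the original one, so whether it is already resident in cache when the timer starts can plausibly dominate a measurement in the 0.1 ms range.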
