tqchen commented on pull request #5914:
URL: https://github.com/apache/incubator-tvm/pull/5914#issuecomment-648903111
I like the overall utility for cache flushing. However, it would be great to
discuss the interface for cache eviction. In terms of the API choices:
- While it is understandable that we would like to keep the first
argument (activation) and flush the rest of the arguments, this is still a
very specific setup (ideally it should be configurable).
- Right now things are configured through an environment variable; is that
the best way to configure the API?
- The current logic does not check for contexts other than CPU, and will
result in undefined behavior when we use OpenCL or CUDA (because the opaque
data pointer does not correspond to a CPU address). It may also cause
problems when an argument is not a DLTensor.
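To make the CPU-only concern concrete, here is a minimal sketch (plain Python, no real TVM APIs; `FakeTensor`, `flush_cpu_args`, and the `begin` parameter are hypothetical stand-ins) of how the flush path could skip non-CPU and non-tensor arguments:

```python
# Hedged sketch: guard cache flushing so it only touches CPU-resident
# tensor arguments. All names here are illustrative, not TVM's API.
kDLCPU = 1  # DLPack device type code for CPU

class FakeTensor:
    """Minimal stand-in for a DLTensor, carrying only a device type."""
    def __init__(self, device_type):
        self.device_type = device_type

def flush_cpu_args(args, begin=1):
    """Return the arguments that are safe to cache-flush: tensors on CPU,
    from position `begin` onward. GPU/OpenCL tensors are skipped because
    their opaque data pointers are not valid CPU addresses."""
    safe = []
    for arg in args[begin:]:
        if isinstance(arg, FakeTensor) and arg.device_type == kDLCPU:
            safe.append(arg)
    return safe
```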
Here are a few alternative API choices for configuring the cache flushing
behavior.
### A0: Fold cache flushing factor into time_evaluator
```python
mod = load_module()
# flush the CPU cache of arguments starting from index 1
f = mod.time_evaluator("myfunc", repeat=10, cache_flush_cpu_args_begin=1)
```
### A1: Decoupled Composite style
```python
mod = load_module()
# cache_flush_packed is a packed func that performs the CPU cache flush,
# configured to flush arguments starting from index 1
cache_flush_packed = remote.get_function("cpu_cache_flush")(args_begin=1)
# fprepare is a callback that will be called before the evaluation;
# it takes the args as arguments.
f = mod.time_evaluator("myfunc", repeat=10, fprepare=cache_flush_packed)
```
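For illustration, the `fprepare` hook in A1 could be wired into a timing loop roughly like this. This is a pure-Python sketch of the idea, not the actual `time_evaluator` implementation; `make_time_evaluator` and its parameters are hypothetical:

```python
import time

# Hedged sketch: an analogue of a time_evaluator that accepts an fprepare
# callback invoked before every timed run (e.g. to flush CPU caches).
def make_time_evaluator(func, repeat=10, fprepare=None):
    def evaluator(*args):
        costs = []
        for _ in range(repeat):
            if fprepare is not None:
                fprepare(*args)  # e.g. evict the args from the CPU cache
            start = time.perf_counter()
            func(*args)
            costs.append(time.perf_counter() - start)
        return costs
    return evaluator
```

Decoupling the preparation hook this way keeps the evaluator agnostic to what the callback does, so cache flushing becomes just one possible `fprepare`.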