comaniac opened a new pull request #6686:
URL: https://github.com/apache/incubator-tvm/pull/6686


   This PR improves the cache read sketch generation rule in auto-scheduler to 
support multiple cache reads. Previously, the rule simply gave up when a tensor 
had multiple consumers. As a result, all consumers read that tensor directly 
from the host memory, which significantly hurts performance. This PR removes 
that limitation.
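
   As a rough illustration of the behavior change (a toy model only, not the actual rule implementation), the sketch below contrasts giving up on multiple consumers with emitting one cache read per consumer. The function names, the dict-based DAG, and the `"shared"` scope are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy DAG: tensor name -> names of the stages that read it.
Consumers = Dict[str, List[str]]
CacheRead = Tuple[str, str, str]  # (tensor, memory scope, consumer)


def plan_cache_reads_old(consumers: Consumers, tensor: str,
                         scope: str = "shared") -> List[CacheRead]:
    """Old behavior as described above: give up unless there is exactly one consumer."""
    readers = consumers.get(tensor, [])
    if len(readers) != 1:
        return []
    return [(tensor, scope, readers[0])]


def plan_cache_reads_new(consumers: Consumers, tensor: str,
                         scope: str = "shared") -> List[CacheRead]:
    """New behavior (assumed shape): one cache read step per consumer."""
    return [(tensor, scope, reader) for reader in consumers.get(tensor, [])]


dag = {"A": ["B", "C"]}  # tensor A is read by both B and C
old_plan = plan_cache_reads_old(dag, "A")
new_plan = plan_cache_reads_new(dag, "A")
```

   In this toy model, `old_plan` is empty while `new_plan` holds one entry per consumer.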
   
   In addition, this PR also fixes the following bugs/issues:
   - Consider `max_threads_per_block` in the condition of the cross-thread 
reduction rule, so that we can guarantee compute-intensive ops won't get a 
cross-thread reduction sketch, which usually doesn't help.
   - Add a `.d` suffix to the names of redundant cache read ops to avoid name 
conflicts. Without this change, the Python schedule APIs that auto-scheduler 
outputs won't work for multiple cache reads.
   - Add a new `ComputeDAG` constructor that takes a schedule, so that we can 
enforce stage order consistency.
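
   To show why the `.d` suffix matters, here is a minimal sketch of one way to keep cache read op names unique. It assumes names of the form `<tensor>.<scope>` and appends `.d` until the name is free; the helper `unique_cache_name` is hypothetical and the exact scheme in this PR may differ.

```python
from typing import Set


def unique_cache_name(tensor: str, scope: str, taken: Set[str]) -> str:
    """Return a cache read op name that does not collide with names in `taken`.

    Assumption: the base name is "<tensor>.<scope>", and each collision
    appends one ".d" suffix. This only illustrates the idea.
    """
    name = f"{tensor}.{scope}"
    while name in taken:
        name += ".d"
    taken.add(name)  # remember the name so later calls avoid it
    return name


taken_names: Set[str] = set()
# Three cache reads of the same tensor into the same scope.
names = [unique_cache_name("A", "shared", taken_names) for _ in range(3)]
```

   Without such a suffix, all three ops in this sketch would share one name, and a generated schedule could not tell them apart.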
   
   cc @merrymercy @jcf94 @tqchen 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

