Lunderberg commented on PR #16569:
URL: https://github.com/apache/tvm/pull/16569#issuecomment-1994301290

   Taking a look at that commit, there's a couple of things that stand out to 
me.
   
   * Related to `MergeSharedMemoryAllocations`
     * The `MergeSharedMemoryAllocations` is a copy/paste of `StorageRewrite` 
with modifications.  The `StorageRewrite` pass is applied earlier and already 
handled merging of shared memory allocations, so I'm not sure why the separate 
pass is needed.
       * It looks like there's some different descriptions of `StorageRewrite`. 
 [This 2021 
comment](https://github.com/apache/tvm/pull/9341#issuecomment-949269726) states 
that `StorageRewrite` cannot merge allocations of different dtypes.  However, 
[this 2018 PR](https://github.com/apache/tvm/pull/9341#issuecomment-949269726) 
added memory-sharing of allocations of different dtypes for `StorageRewrite`.  
This may be related to static vs dynamic shapes.
     * The `StorageRewrite` pass only merged allocations that were within the 
same `attr::thread_extent` annotation, and would be part of the same kernel.  
Assuming that `MergeSharedMemoryAllocations` kept this behavior from 
`StorageRewrite`, moving it after `SplitHostDevice` shouldn't be necessary.
     * I'd have to dig into it to be sure, but using 
`MergeSharedMemoryAllocations` after `ThreadSync("warp")` seems like it could 
cause an issue.  Applying `MergeSharedMemoryAllocations` (merge two buffers 
into one) feels like it would undo the benefit of `ThreadSync("warp")` 
(synchronize to avoid contention of a single buffer).
   
   * Related to `HoistIfThenElse`
     * The comment about requiring `HoistIfThenElse` to be before `UnrollLoop` 
doesn't make sense to me.  Any condition that could be statically proven for 
all indices prior to `UnrollLoop` could also be statically proven for a 
specific loop index.  The end result of `UnrollLoop` and `HoistIfThenElse` 
should be the same regardless of the ordering.
       * Slight caveat: Unless the `HoistIfThenElse` causes a different 
decision to be made by the automatic unroll heuristics in `UnrollLoop`.  
However, those are disabled by default, and are not enabled by the unit test.
     * Moving `HoistIfThenElse` changed it's ordering relative to 
user-specified passes in the `"tir.add_lower_pass"` configuration.  While I 
don't know of any usage of this functionality, it's a bit worrying that it 
occurred without discussion.
     * The new function attribute `"tir.HoistIfThenElseExprWithBlock"` 
overrides the user's configuration for `"tir.HoistIfThenElse"`.  If the user 
had selected a stronger hoisting to be performed (e.g. hoisting 
`tir::IfThenElse` statements), it would no longer be applied.
     * The configuration used when the new function attribute 
`"tir.HoistIfThenElseExprWithBlock"` is set doesn't make sense for this pass's 
location in the lowering flow.  The `HoistIfThenElse` pass occurs after 
`ConvertBlocksToOpaque` has removed all block variables, so the 
`HoistedConditionals::kUsingBlockVar` flag shouldn't be required.
   
   * Related to unit tests
     * The unit tests added in `test_gpu_low_batch_gemv.py` only exercise the 
meta-schedule functionality, and do not exercise any of the changes made to 
`driver_api.cc` or to `HoistIfThenElse`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tvm.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to