Lunderberg commented on PR #77:
URL: https://github.com/apache/tvm-rfcs/pull/77#issuecomment-1162392893

   > Introducing changes to TIR would needs some additional thoughts that 
deserves some extra consideration. Due to the N*M complexity (where N is the 
TIR possibilities and M is the number of primitives to be supported) that needs 
to be handled in implementation (by backend implementers and primitive 
implementers)
   
   This was part of the design consideration, to minimize the impact of the 
proposed changes to primitives, lowering transformations, and backends.
   
   * The `BufferConstraint` annotations do not need specific handling at the 
codegen level, as they are only present to enable compile-time optimizations.
     
   * Use of the `BufferConstraint` hints would occur within existing utilities, 
primarily as additional information available in `arith::Analyzer` utilities.  
This minimizes the need for other primitives/transforms to be aware of the 
buffer constraints, while still benefiting from them.
     
   * The `T.undef()` built-in does not need specific handling at the codegen 
level, as it is removed during lowering.
     
   * The `T.undef()` built-in does not require specific handling from other 
primitives, as stores of `T.undef()` can be treated the same as stores of any 
other value.
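
As a rough illustration of the four points above, here is a toy sketch in plain Python. It does not use the TVM API: `ToyConstraint`, `ToyAnalyzer`, `UNDEF`, `simplify_load`, and `lower` are all hypothetical names made up for this example, and the tuple representation of a store is an assumption. The idea is only that constraints are consumed inside an analyzer-like utility, and that stores of an undefined value are dropped before codegen would ever see them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical sentinel standing in for T.undef(); this is not the TVM built-in.
UNDEF = object()


@dataclass(frozen=True)
class ToyConstraint:
    """Hypothetical constraint: buffer[i] == value wherever predicate(i) holds."""
    buffer: str
    predicate: Callable[[int], bool]
    value: int


class ToyAnalyzer:
    """Hypothetical analyzer-like utility that accumulates known constraints."""

    def __init__(self) -> None:
        self._constraints: List[ToyConstraint] = []

    def bind(self, constraint: ToyConstraint) -> None:
        self._constraints.append(constraint)

    def known_value(self, buffer: str, index: int) -> Optional[int]:
        # Return the value promised by the first matching constraint, if any.
        for c in self._constraints:
            if c.buffer == buffer and c.predicate(index):
                return c.value
        return None


def simplify_load(analyzer: ToyAnalyzer, buffer: str, index: int) -> Union[int, str]:
    """Replace a load with a constant when a constraint covers it."""
    known = analyzer.known_value(buffer, index)
    return known if known is not None else f"{buffer}[{index}]"


Store = Tuple[str, int, object]  # (buffer, index, value)


def lower(stores: List[Store]) -> List[Store]:
    """Drop stores of UNDEF; every other store passes through unchanged."""
    return [s for s in stores if s[2] is not UNDEF]


analyzer = ToyAnalyzer()
analyzer.bind(ToyConstraint("A", lambda i: i >= 14, 0))  # padding is zero
padded_load = simplify_load(analyzer, "A", 15)    # covered by the constraint
interior_load = simplify_load(analyzer, "A", 3)   # nothing known, left as a load
lowered = lower([("A", 3, 7), ("A", 15, UNDEF)])  # the UNDEF store is removed
```

In this sketch, only `simplify_load` and `lower` know about constraints or `UNDEF`; anything that merely moves stores around can treat them like any other store.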
     
   > Right now it is possible to do non-local constraint rewriting flowings as 
part of the graph pass. Note that while E1 is indeed less "compact" on one 
hand, we can use it to reconstruct the desirable compact data 
structure(something like BufferConstraint that represents the layout mapping) 
that we can use to flow the decisions across the graph node during the pass.
     
   I definitely agree that graph-level transforms are where the layouts and 
constraints should be decided.  The `BufferConstraint` annotations are not 
intended as a way to override in TIR what was already decided at the graph 
level, but rather a way to communicate to TIR transformations what has been 
decided at the graph level.
   
   > E1: Composing a stage that transforms the layout(a loop that represents 
the mapping)
   
   I'm still a bit confused with this approach, specifically how one would 
avoid having a separate compute definition for each workload on a new target 
(Initially brought up by @csullivan 
[here](https://github.com/apache/tvm-rfcs/pull/77#discussion_r893701372).) In 
my mind, if I'm going to compose a layout transformation stage, it would need 
to be followed by a compute stage that takes a transformed layout as input.  So 
rather than having a single conv2d that can be generalized over layouts, each 
transformed layout would still need to have a compute stage for it.
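
To make the concern concrete, here is a toy sketch in plain Python (no TVM). The function names, the row width of 4, and the padding value of 0 are assumptions chosen for illustration. The reduction over the packed layout has to be written separately from the reduction over the flat layout, which is the kind of per-layout compute definition I'd like to avoid.

```python
from typing import List


def pack_rows(flat: List[int], width: int = 4, pad_value: int = 0) -> List[List[int]]:
    """Layout transformation stage: pad to a multiple of `width`, then pack into rows."""
    padded_len = -(-len(flat) // width) * width  # round up to a multiple of width
    padded = list(flat) + [pad_value] * (padded_len - len(flat))
    return [padded[i:i + width] for i in range(0, padded_len, width)]


def sum_flat(flat: List[int]) -> int:
    """Compute stage written against the original flat layout."""
    return sum(flat)


def sum_packed(packed: List[List[int]]) -> int:
    """Compute stage written against the packed layout.

    This only matches sum_flat because the padding value is 0.
    """
    return sum(sum(row) for row in packed)


data = list(range(14))
packed = pack_rows(data)
```

Both compute functions give the same result here, but only because `sum_packed` was written for this specific layout and this specific padding value.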
   
   > Note that initially such data structure do not need to live beyond the 
life of a pass, because they can be reconstructed at anytime from the other 
representation.
   
   How would this be represented while optimizing the performance of a 
subgraph?  My concern would be how to express the non-local constraints while 
keeping a small search space for optimization.
   
   * Ensure that the producer and consumer stages are within the same subgraph. 
 The constraints provided to a consumer depend not only on the producer, 
but also on the constraints provided to the producer, so this might require 
fusing the entire end-to-end model into a single monolithic kernel.
     
     My understanding is that this would result in a search space that is too 
large to effectively optimize, though I haven't explicitly tested it.
     
   * Insert a transformation stage into the subgraph, in which the constraint 
is written.  Later portions of the subgraph could then rely on the constraint 
without examining other subgraphs.
     
     This would need some way to indicate that the transformation stage 
shouldn't be altered during optimization, and that it shouldn't be part of the 
performance timing.
     
   * Express the graph-level constraints to a subgraph, so that it can optimize 
using those constraints.
     
     This was my intent with the `BufferConstraint` annotations, since then the 
subgraphs could take advantage of
     
   > E1 also enables some additional capabilities (e.g.) expressing future 
memory remappings that do not necessarily fit into padding/packing.
   
   Is there an existing annotation to indicate that a stage should be removed 
entirely during lowering?  That might be an effective way to allow more general 
usage by annotating a stage that can be assumed to have been performed prior to 
the subgraph.  This would be a way to express the second option of an extra 
transformation stage, while still providing enough information to remove the 
transformation stage during lowering.
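
A toy sketch of what I have in mind, in plain Python. The `Stage` class, the `assumed_done` flag, and `lower_subgraph` are hypothetical names for this example only; whether TVM has an equivalent annotation is exactly the question above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """Hypothetical stage record; `assumed_done` marks work performed before the subgraph."""
    name: str
    assumed_done: bool = False


def lower_subgraph(stages: List[Stage]) -> List[Stage]:
    """Remove stages that are only present to document an assumption."""
    return [s for s in stages if not s.assumed_done]


subgraph = [
    Stage("pad_and_pack_input", assumed_done=True),  # states the constraint
    Stage("conv2d"),                                 # relies on the constraint
]
lowered_names = [s.name for s in lower_subgraph(subgraph)]
```

During optimization the annotated stage would still be visible, so later stages can rely on what it writes, but it would not survive lowering.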


