mbs-octoml opened a new pull request, #11631:
URL: https://github.com/apache/tvm/pull/11631

   See 
https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6
 for context, which in turn is part of Collage 
(https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md). 
   
   Currently CUTLASS has four entry points:
     - The usual 'partition_for_cutlass' partitioning function, using the 
standard pattern table and pass machinery (see cutlass/build.py).
     - A 'tune_cutlass_kernels' function which augments CUTLASS partition 
functions with the results of building and running test kernels (see 
cutlass/build.py).
     - A 'relay.ext.cutlass' external codegen function which inspects the 
tuning results and generates a CSourceModule for each partition (see 
cutlass/codegen.cc).
     - A 'build_cutlass_kernels_vm' function which runs 'export_library' with 
all the nvcc compiler options needed to build all the CSourceModules (see 
cutlass/build.py).
     
   For Collage we'd like CUTLASS to have only two entry points: 
'partition_for_cutlass', and 'relay.ext.cutlass' or equivalent. This makes the 
CUTLASS external codegen integration composable with other integrations, which 
in turn helps Collage avoid having to understand any external codegen APIs 
other than the global pattern table and the custom compilation function/pass.  
   
   Collage also tends to end up requiring multiple partitions for the same 
backend since it is more aggressive at mixing-and-matching smaller sub-graphs 
between backends. Thus we'd also like to make sure all tuning, generated code 
and compilation overhead is shared between all such CUTLASS partitions. 
   
   So, in this PR:
   - We add all the CUTLASS-specific tuning and compilation options as new 
Target attributes for the 'external codegen' "cutlass" TargetKind 
(cutlass/target.cc). The user now has one place to provide those settings, and 
we've already done the legwork to plumb the target instance.
   - We replace 'relay.ext.cutlass' with a 'RelayToTIR' custom pass hook 
'CompileForCutlass' (see cutlass/codegen.cc). This pass obviously can see all 
the CUTLASS partitions in the IRModule, so we can now share tuning results 
between them all and can be sure to generate a single CSourceModule. The pass 
can also invoke the compiler to yield a StaticModule, which we've also already 
done the legwork to support. In this way all CUTLASS-specific steps are handled 
at once.
   - For convenience we supply 'finalize_modules' and 'finalize_modules_vm' 
which invoke nvcc for final linking (using export_library as usual). However, 
there's now nothing CUTLASS specific in those helpers other than their 
overriding of the 'compiler' to be nvcc.
   - test_cutlass.py is updated to use the new API.
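
   To illustrate why a single module-level pass helps, here is a minimal 
Python sketch. It is not TVM code: 'Partition', 'profile' and 'compile_all' 
are hypothetical names made up for this example, and the "tuning" is a 
stand-in. It only models the idea that a pass which sees every CUTLASS 
partition can tune each distinct workload once and emit one combined source.

```python
# Hypothetical model only; none of these names exist in TVM.
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str        # global symbol of the partitioned function
    workload: tuple  # e.g. ("gemm", M, N, K); used as the tuning key


def profile(workload, profiled):
    """Stand-in for building and timing test kernels for one workload."""
    profiled.append(workload)  # record that an (expensive) tuning run happened
    return "kernel_for_" + "_".join(str(x) for x in workload)


def compile_all(partitions):
    """Visit all partitions together, sharing tuning results between them."""
    profiled = []  # workloads that were actually tuned
    cache = {}     # workload -> chosen kernel, shared across partitions
    chunks = []    # one entry per partition, joined into a single source
    for p in partitions:
        if p.workload not in cache:
            cache[p.workload] = profile(p.workload, profiled)
        chunks.append(f"// {p.name} -> {cache[p.workload]}")
    return "\n".join(chunks), profiled


parts = [
    Partition("cutlass_0", ("gemm", 64, 64, 32)),
    Partition("cutlass_1", ("gemm", 64, 64, 32)),  # same workload as cutlass_0
    Partition("cutlass_2", ("conv2d", 3, 3, 16)),
]
source, profiled = compile_all(parts)
```

   In this model three partitions trigger only two tuning runs, and the 
result is one source string rather than one per partition. A per-partition 
hook would have no shared cache to consult.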
   
   Though this is a breaking change for existing users of the CUTLASS 
integration, the change is pretty minor, as shown in test_cutlass.py.


