mbs-octoml commented on a change in pull request #48:
URL: https://github.com/apache/tvm-rfcs/pull/48#discussion_r787217701
##########
File path: rfcs/0048-BYOC-Marvell-ML-accelerator-integration.md
##########
@@ -0,0 +1,547 @@
+- Feature Name: (fill me in with a unique identifier, `my_awesome_feature`)
+- Start Date: (fill me in with today's date, YYYY-MM-DD)
+- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/0000)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+- GitHub pre-RFC PR: [apache/tvm-PR-9730](https://github.com/apache/tvm/pull/9730)
+- GitHub pre-RFC discussion: [BYOC-Marvell](https://discuss.tvm.apache.org/t/pre-rfc-byoc-marvell-ml-ai-accelerator-integration/11691)
+
+# Summary
+[summary]: #summary
+
+Integrate Marvell’s ML/AI accelerator with TVM BYOC framework in order to bring the TVM ecosystem to Marvell customers.
+
+# Motivation
+[motivation]: #motivation
+
+Marvell MLIP is an ML/AI inference accelerator and is embedded on our ARM Neoverse N2-based OCTEON 10 processor.
+We are building an easy-to-use, open, software suite for our customers by integrating and utilizing TVM so that
+we can bring TVM capability and experience to our customers.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+Based on what Marvell ML/AI inference accelerator does the best, a given pre-trained network model
+will be applied to a TVM-Mrvl-BYOC AOT compilation and code-gen flow as illustrated in steps below.
+
+STEP (1) Run TVM-Mrvl-BYOC AOT ML Frontend Compilation and Mrvl-BYOC code-gen. The steps involved in this are:
+
+* Load pre-trained network into TVM IR graph
+
+* Do Marvell-specific layout conversions to transform IR graph in order to meet requirements of the accelerator
+
+* Do Marvell-specific composite-merging/fusing to transform IR graph in order to utilize available HW capability

Review comment:

Hi, thanks for the RFC.
My team at OctoML is looking at bringing some training features to the BYOC world (a la https://arxiv.org/pdf/2111.00655.pdf), so I'm looking at this RFC with that future in mind. Can you expand on the following?

- Is the fusion using the existing MergeComposite / AnnotateTarget / MergeCompilerRegions (maybe) / PartitionGraph sequence?
- Other than the global layout xform, which necessarily must be done before any fusion etc., are there any other xforms before the above partitioning takes place?
- Can you explain the need to limit to one kernel each for your BYOC and the default TVM? Perhaps it's an artifact of how you're later trying to capture the BYOC output in JSON graph form? Ideally the BYOC target.ext.name function could be run multiple times, the resulting runtime::Module would be accumulated in the IRModule, and the runtime::Modules later merged. Perhaps supporting that would actually be easier and would remove the at-most-one-kernel limit?
- Ideally there'd be a single entry point for 'partition for marvel', after which the regular TVM build would deal with fusion, lowering and codegen for everything that's left (i.e. the overall model minus the kernels you already partitioned out). I may not be following the explanation, but it seems you're proposing that the driver splits things more explicitly.
- Like @areusch, I'm a bit confused by the special handling of the graph. Perhaps it would be worth going through the TensorRT BYOC integration as a reference example, since it too collects a JSON representation of the to-be-compiled fused sub-graph (we invoke the TensorRT build function at runtime, not compile time), but it does so on top of existing machinery.

Let me know if it would be easier to discuss this on a PR rather than here, then we could come back to here.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
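For readers unfamiliar with the MergeComposite / AnnotateTarget / MergeCompilerRegions / PartitionGraph sequence mentioned above, the sketch below models its ordering on a toy "module" represented as a flat list of operator names. This is an illustration only, not real TVM API: the op names, the `mrvl` target string, and the pattern table are invented, and the contiguous grouping inside `partition_graph` stands in for the region-merging that MergeCompilerRegions performs in actual TVM.

```python
# Illustrative sketch only -- NOT real TVM code. Models the pass ordering:
# MergeComposite -> AnnotateTarget -> (MergeCompilerRegions) -> PartitionGraph.

def merge_composite(ops, patterns):
    """Fuse adjacent op pairs matching a pattern into one composite op."""
    out, i = [], 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in patterns:
            out.append(patterns[pair])  # e.g. "mrvl.conv2d_relu"
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out

def annotate_target(ops, target, supported):
    """Tag each op with the compiler that should handle it."""
    return [(op, target if op in supported else "default") for op in ops]

def partition_graph(annotated):
    """Group contiguous same-target ops into partitions (kernels).

    The contiguous grouping here loosely plays the role of
    MergeCompilerRegions + PartitionGraph in real TVM.
    """
    parts = []
    for op, tgt in annotated:
        if parts and parts[-1][0] == tgt:
            parts[-1][1].append(op)
        else:
            parts.append((tgt, [op]))
    return parts

# Toy model: conv2d+relu is a Marvell composite; softmax falls back to TVM.
ops = ["conv2d", "relu", "softmax"]
patterns = {("conv2d", "relu"): "mrvl.conv2d_relu"}
fused = merge_composite(ops, patterns)
annotated = annotate_target(fused, "mrvl", supported={"mrvl.conv2d_relu"})
partitions = partition_graph(annotated)
print(partitions)  # [('mrvl', ['mrvl.conv2d_relu']), ('default', ['softmax'])]
```

In real TVM these passes operate on Relay IRModules and are composed with a pass sequential; a single "partition for Marvell" entry point, as suggested above, would presumably wrap exactly such a sequence so the regular TVM build handles whatever is left over.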