areusch commented on PR #12087:
URL: https://github.com/apache/tvm/pull/12087#issuecomment-1191960637

   we discussed this in the [Community Meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-july-20/13148/2) yesterday. here are notes on the discussion:
   - when the design references "Accelerator A" and "Accelerator B," does this 
mean we're using both simultaneously?
     - not in this v1, though the architecture supports it. at present they can 
simply coexist as options.
   - should we integrate this with TVMC?
     - @areusch: it should be fairly easy to integrate the UMA targets with the 
`tvmc run` command
     - @manupa-arm : this should be pretty straightforward to add to tvmc. the bigger concern here was around `uma_cli.py`, which is supposed to generate a starter implementation for new accelerators in UMA.
     - @areusch : we should have either tvmc or some other developer-facing entry point to house tools like this. probably not bad to add dev tools to tvmc now--we can always migrate them out if we need to.
     - @MichaelJKlaiber : the intention of uma_cli is just to make the tutorial easier to replicate on your own, so there are two steps there--create the accelerator flow, then run inference.
     - @manupa-arm : do we expect the CLI to work in an environment where only the tvm wheel is present? e.g. what about the C sources included with an accelerator? should those go in the wheel?
     - @MichaelJKlaiber: those sources are copied into the generated dir by 
uma_cli.
     - @areusch : what's the include path folks are expected to set on their downstream C compiler? seems like the C files included with the accelerator template should really make it into the Model Library Format. Could produce another CSourceModule, which would create another e.g. `default_lib3.cc` in the MLF. Could also use the `import_c` pragma, [similar](https://github.com/apache/tvm/blob/main/python/tvm/topi/arm_cpu/mprofile/dsp/micro_kernel/max_pool.py#L87) to how we do it for microTVM.
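       a minimal sketch of the `import_c` route, assuming a plain TE schedule built for the `c` target (the schedule and the C helper are illustrative, not UMA API):

       ```python
       import tvm
       from tvm import te

       # hypothetical accelerator helper, standing in for the template's C sources
       ACCEL_C_SOURCE = """
       static int my_accel_helper() { return 0; }
       """

       a = te.placeholder((16,), name="a")
       out = te.compute((16,), lambda i: a[i], name="out")
       s = te.create_schedule(out.op)
       # the pragma injects ACCEL_C_SOURCE into the generated C module,
       # the same mechanism the microTVM DSP kernels use
       s[out].pragma(s[out].op.axis[0], "import_c", ACCEL_C_SOURCE)
       mod = tvm.build(s, [a, out], target="c")  # emitted source now carries the helper
       ```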
   - where should the template live?
     - @areusch : could go either way, or both. how do we expect people to package their accelerator flow? if merging into mainline, perhaps we want it in the Python import path. if keeping the accelerator flow private, `apps` is closer to carrying that code alongside the tvm wheel.
     - @manupa-arm : deciding the intended location based on whether a flow will get upstreamed makes sense. `_template` is an example rather than a target, so maybe `apps` makes more sense for it.
   - @manupa-arm : also suggests breaking the CLI changes out into a separate PR.
   
   - @MichaelJKlaiber : only the Vanilla accelerator was implemented; do folks have suggestions for Chocolate and Strawberry? feel free to post in the discuss thread or get in touch.
     - @areusch : would be cool to see something that leverages USMP to model physical accelerator memories. could also be cool to see an example where buffers are marked to live on-device.
   - Slava: are the optimizations provided in the default TVM pipeline also part of the UMA pipeline?
     - @areusch : you can classify the optimizations in terms of Relay passes, scheduling, and post-scheduling passes. TVM tries to operate on an IRModule-to-IRModule principle, where each optimization or step takes an IRModule and returns an IRModule. when you mark a subgraph as offloaded to a UMA pipeline, some optimizations aren't enabled--for example, Relay-level operator fusion. others, e.g. those which operate post-scheduling (USMP, for example), will run on UMA operators.
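       a minimal sketch of that principle (the pass is a no-op placeholder; `tvm.transform.module_pass` is the existing decorator):

       ```python
       import tvm

       @tvm.transform.module_pass(opt_level=0, name="MyUmaPass")
       class MyUmaPass:
           """every pass, including those a UMA backend registers, has this shape"""

           def transform_module(self, mod, ctx):
               # inspect or rewrite functions here; must hand back an IRModule
               return mod
       ```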
     - Slava: if I have a conv2d followed by batch norm, and only the conv2d is 
offloaded, then the batch norm is not fused by default?
     - @areusch: the right way to do that would be to mark both as offloaded 
and do the fusion yourself. there are also some efforts to enable 
post-scheduling fusion via Relax, but those haven't landed yet.
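       a hedged sketch of matching conv2d -> batch_norm as a single offloaded subgraph (the pattern helpers are existing Relay API; how you register the result is backend-specific):

       ```python
       from tvm.relay.dataflow_pattern import is_op, is_tuple_get_item, wildcard

       def conv2d_bn_pattern():
           conv = is_op("nn.conv2d")(wildcard(), wildcard())
           # nn.batch_norm takes data, gamma, beta, moving_mean, moving_var
           bn = is_op("nn.batch_norm")(conv, wildcard(), wildcard(), wildcard(), wildcard())
           # batch_norm returns a tuple; the transformed data is element 0
           return is_tuple_get_item(bn, 0)
       ```

       with both operators inside one matched subgraph, the backend decides how (or whether) to fuse them.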
   - Slava: what's the best way to leverage UMA if, e.g., we have two different implementations of conv2d depending on kernel size?
     - @areusch : you'd need to give your pattern matcher enough fidelity to 
differentiate those two workloads. you can also inspect the matched subgraph 
after using a looser pattern.
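       a hedged sketch of splitting conv2d by kernel size with two separate patterns (the attribute values are illustrative):

       ```python
       from tvm.relay.dataflow_pattern import is_op, wildcard

       def conv2d_1x1_pattern():
           conv = is_op("nn.conv2d")(wildcard(), wildcard())
           return conv.has_attr({"kernel_size": [1, 1]})

       def conv2d_3x3_pattern():
           conv = is_op("nn.conv2d")(wildcard(), wildcard())
           return conv.has_attr({"kernel_size": [3, 3]})
       ```

       registering each pattern under its own name routes each workload to its own implementation.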
   - Slava: what's the rough timeline?
     - not really a timeline, but see https://github.com/apache/tvm/issues/11260
   - @MichaelJKlaiber : can also discuss further questions in a high-bandwidth setting with folks.
     - suggest folks post on the discuss forum. we can also use this meeting for further discussion.

