areusch commented on PR #12087:
URL: https://github.com/apache/tvm/pull/12087#issuecomment-1191960637

   we discussed this in the [Community Meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-july-20/13148/2) yesterday. here are notes on the discussion:
   - when the design references "Accelerator A" and "Accelerator B," does this 
mean we're using both simultaneously?
     - not in this v1, though the architecture supports it. at present they can 
simply coexist as options.
   - should we integrate this with TVMC?
     - @areusch: it should be fairly easy to integrate the UMA targets with the 
`tvmc run` command
     - @manupa-arm : this should be pretty straightforward to add to tvmc. the bigger concern here was around `uma_cli.py`, which is supposed to generate a starter implementation for new accelerators in UMA.
     - @areusch : we should have either tvmc or some other developer-facing entry point to house tools like this. probably not bad to add dev tools to tvmc now--we can always migrate them out if we need to.
     - @MichaelJKlaiber : the intention of uma_cli is just to make the tutorial easier to replicate on your own, so there are two steps there--create the accelerator flow, then run inference.
     - @manupa-arm : do we expect the CLI to work in an environment where only the tvm wheel is present? e.g. what about the C sources included with an accelerator? should those go in the wheel?
     - @MichaelJKlaiber: those sources are copied into the generated dir by 
uma_cli.
     - @areusch : what's the include path folks are expected to set on their downstream C compiler? seems like the C files included with the accelerator template should really make it into the Model Library Format. Could produce another CSourceModule, which would create another e.g. `default_lib3.cc` in the MLF. Could also use the `import_c` pragma, [similar](https://github.com/apache/tvm/blob/main/python/tvm/topi/arm_cpu/mprofile/dsp/micro_kernel/max_pool.py#L87) to how we do it for microTVM.
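       a minimal sketch of the `import_c` route, assuming a plain TE schedule built for the `c` target (the schedule and the C helper are illustrative, not UMA API):

       ```python
       import tvm
       from tvm import te

       # hypothetical accelerator helper, standing in for the template's C sources
       ACCEL_C_SOURCE = """
       static int my_accel_helper() { return 0; }
       """

       a = te.placeholder((16,), name="a")
       out = te.compute((16,), lambda i: a[i], name="out")
       s = te.create_schedule(out.op)
       # the pragma injects ACCEL_C_SOURCE into the generated C module,
       # the same mechanism the microTVM DSP kernels use
       s[out].pragma(s[out].op.axis[0], "import_c", ACCEL_C_SOURCE)
       mod = tvm.build(s, [a, out], target="c")  # emitted source now carries the helper
       ```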
   - where should the template live?
     - @areusch : could go either way, or both. how do we expect people to package their accelerator flow? if merging into mainline, perhaps we want it in the Python import path. if keeping the accelerator flow private, `apps` is closer to carrying that code alongside the tvm wheel.
     - @manupa-arm : deciding the intended location based on whether a flow will get upstreamed makes sense. `_template` is an example rather than a target, so maybe `apps` makes more sense for it.
   - @manupa-arm : also suggests breaking the CLI changes out into a separate PR.
   
   - @MichaelJKlaiber : only the Vanilla accelerator was implemented; do folks have suggestions for Chocolate and Strawberry? feel free to post in the discuss thread or get in touch.
     - @areusch : would be cool to see something that leverages USMP to model physical accelerator memories. could also be cool to see an example where buffers are marked to live on-device.
   - Slava: are the optimizations provided in the default TVM pipeline also part of the UMA pipeline?
     - @areusch : you can classify the optimizations in terms of Relay passes, scheduling, and post-scheduling passes. TVM tries to operate on an IRModule-to-IRModule principle, where each optimization or step takes an IRModule and returns an IRModule. when you mark a subgraph as offloaded to a UMA pipeline, some optimizations aren't enabled--for example, Relay-level operator fusion. others, e.g. those which operate post-scheduling (USMP, for example), will run on UMA operators.
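       a minimal sketch of that principle (the pass is a no-op placeholder; `tvm.transform.module_pass` is the existing decorator):

       ```python
       import tvm

       @tvm.transform.module_pass(opt_level=0, name="MyUmaPass")
       class MyUmaPass:
           """every pass, including those a UMA backend registers, has this shape"""

           def transform_module(self, mod, ctx):
               # inspect or rewrite functions here; must hand back an IRModule
               return mod
       ```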
     - Slava: if I have a conv2d followed by batch norm, and only the conv2d is 
offloaded, then the batch norm is not fused by default?
     - @areusch: the right way to do that would be to mark both as offloaded 
and do the fusion yourself. there are also some efforts to enable 
post-scheduling fusion via Relax, but those haven't landed yet.
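       a hedged sketch of matching conv2d -> batch_norm as a single offloaded subgraph (the pattern helpers are existing Relay API; how you register the result is backend-specific):

       ```python
       from tvm.relay.dataflow_pattern import is_op, is_tuple_get_item, wildcard

       def conv2d_bn_pattern():
           conv = is_op("nn.conv2d")(wildcard(), wildcard())
           # nn.batch_norm takes data, gamma, beta, moving_mean, moving_var
           bn = is_op("nn.batch_norm")(conv, wildcard(), wildcard(), wildcard(), wildcard())
           # batch_norm returns a tuple; the transformed data is element 0
           return is_tuple_get_item(bn, 0)
       ```

       with both operators inside one matched subgraph, the backend decides how (or whether) to fuse them.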
   - Slava: what's the best way to leverage UMA if, e.g., we have two different implementations of conv2d depending on kernel size?
     - @areusch : you'd need to give your pattern matcher enough fidelity to 
differentiate those two workloads. you can also inspect the matched subgraph 
after using a looser pattern.
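       a hedged sketch of splitting conv2d by kernel size with two separate patterns (the attribute values are illustrative):

       ```python
       from tvm.relay.dataflow_pattern import is_op, wildcard

       def conv2d_1x1_pattern():
           conv = is_op("nn.conv2d")(wildcard(), wildcard())
           return conv.has_attr({"kernel_size": [1, 1]})

       def conv2d_3x3_pattern():
           conv = is_op("nn.conv2d")(wildcard(), wildcard())
           return conv.has_attr({"kernel_size": [3, 3]})
       ```

       registering each pattern under its own name routes each workload to its own implementation.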
   - Slava: what's the rough timeline?
     - not really a timeline, but see https://github.com/apache/tvm/issues/11260
   - @MichaelJKlaiber : can also discuss further questions in a high-bandwidth setting with folks.
     - suggest folks post on the discuss forum. we can also use this meeting for further discussion.

