elvin-n opened a new pull request, #12537: URL: https://github.com/apache/tvm/pull/12537
The current support of mixed precision in the Adreno schedules was implemented as standalone schedules carrying an "fp32" suffix. Such kernels could be selected during compilation for two reasons:
1. The fp16_acc32 schedules had higher priority than the pure fp16 schedules.
2. AutoTVM tuning statistics could reference the schedule by name.

The tuning flow, in turn, was not able to target only fp16 or only fp16_acc32: both schedules were tuned, and during compilation the schedule with the best time was selected. In other words, without artificial workarounds we cannot tune and compile pure fp16 or pure fp16_acc32; currently, only manual selection of the tuning statistics forces one of these modes. In addition, the conversion to fp16 was done by a custom function in the user's script that is not available to the public TVM user.

To address the above issues, we propose to use the `ToMixedPrecision()` pass, which also supports mixed precision (fp16 compute with fp32 accumulation).

This PR changes:
1. The Adreno strategy, removing the extra fp16_acc32 schedules.
2. topi/adreno/*, leaving a single schedule per convolution instead of three.
3. topi/adreno/conv2d_alter_op.py, to address a performance issue in the new flow: a different order of casts caused, in some cases, more data to be passed between OpenCL kernels. We addressed the case where the number of input channels is not divisible by 4 while the number of output channels is divisible by 4. Previously we generated four kernels for repacking data at runtime; now we generate only two kernels for this case, we do not repack weights at runtime, and we do not repack the output back to NCHW.
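Not part of the PR itself, but as a quick illustration of why the fp16_acc32 mode (fp16 compute with fp32 accumulation) exists, here is a minimal plain-NumPy sketch (the array size and values are arbitrary, chosen only to make the rounding effect visible) contrasting an fp16 accumulator with an fp32 accumulator over the same fp16 inputs:

```python
import numpy as np

# fp16 inputs, as produced by fp16 compute; each element is ~0.09998 in fp16.
x = np.full(10000, 0.1, dtype=np.float16)

# Pure fp16: accumulate in float16. Once the running sum reaches 256,
# adding ~0.1 falls below half an ulp (0.125) and is rounded away,
# so the sum gets stuck far below the true value.
acc16 = np.float16(0.0)
for v in x:
    acc16 = np.float16(acc16 + v)

# fp16_acc32 style: same fp16 inputs, but a float32 accumulator.
acc32 = np.float32(0.0)
for v in x:
    acc32 += np.float32(v)

print(acc16)  # stuck well below the true sum
print(acc32)  # close to 1000
```

This is why simply picking whichever of the fp16/fp16_acc32 schedules tuned fastest is not enough: the two modes trade accuracy differently, so the user needs an explicit way to choose, which `ToMixedPrecision()` provides.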
