aasorokiin opened a new pull request #32: URL: https://github.com/apache/tvm-vta/pull/32
VTA modification for parametrizable AXI data size.

Performance change: numbers are based on test_benchmark_topi_conv2d.py limited to the resnet-18.C2 workload.
* Baseline 64-bit data tsim run: 192M cycles
* New 64-bit data transfer: 97M cycles
* 128-bit data: 58M cycles
* 256-bit data: 39M cycles
* 512-bit data: 29M cycles

Code changes:
* AXI 64/128/256/512 data-bit support, selected via AXIParams->dataBits.
* TensorLoad is modified to replace all VME load operations. Multiple simultaneous requests can be generated, and the load pipeline is separated from request generation. A "wide" implementation of load/store is used when the AXI interface data width is larger than the number of bits in a tensor (a minimal sketch of the wide-beat idea is given after this list).
* TensorStore -> TensorStoreNarrowVME and TensorStoreWideVME; the narrow variant is the original implementation.
* TensorLoad -> TensorLoadSimple (original), TensorLoadWideVME, and TensorLoadNarrowVME.
* LoadUop -> LoadUopSimple is the original implementation; the new one is based on TensorLoad.
* Fetch -> FetchVME64 and FetchWideVME, reusing the communication part of TensorLoad. Fetch is implemented as a 64-bit tensor with a double tensor read to allow 64-bit address alignment.
* The DPI interface is changed to transfer more than 64 bits; svOpenArrayHandle is used, so tsim library compilation now requires the Verilator includes.
* Compute is changed to use the TensorLoad style of uop loading.
* VME is changed to generate, queue, and respond to multiple simultaneous load requests (see the outstanding-request sketch after this list).
* Added SyncQueue with tests; the implementation uses sync memory to build larger queues.

Code contributions to this PR were made by the following individuals (in alphabetical order): @suvadeep89, @stevenmburns, @pasqoc, @adavare, @sjain12intel, @aasorokiin, and @zhenkuny.
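For illustration only (not code from this PR), here is a minimal Chisel sketch of the wide-load idea mentioned above: when the AXI data width is a multiple of the tensor width, each read beat is buffered and split into tensor-sized slices. `WideBeatSplitter`, `axiDataBits`, and `tensorBits` are hypothetical names, not identifiers from tvm-vta.

```scala
import chisel3._
import chisel3.util._

// Hypothetical sketch of the "wide" load path: when the AXI data width is a
// multiple of the tensor width, each AXI read beat is split into tensor-sized
// slices. Module and parameter names are illustrative only.
class WideBeatSplitter(axiDataBits: Int = 128, tensorBits: Int = 64) extends Module {
  require(axiDataBits >= tensorBits && axiDataBits % tensorBits == 0)
  private val slices = axiDataBits / tensorBits

  val io = IO(new Bundle {
    val beat   = Flipped(Decoupled(UInt(axiDataBits.W))) // one AXI read-data beat
    val tensor = Decoupled(UInt(tensorBits.W))           // tensor-sized slices, in order
  })

  val buf  = Reg(Vec(slices, UInt(tensorBits.W)))
  val idx  = RegInit(0.U(log2Ceil(slices + 1).W))
  val busy = RegInit(false.B)

  // Accept a new beat only after the previous one has been fully drained.
  io.beat.ready := !busy
  when(io.beat.valid && io.beat.ready) {
    buf := VecInit(Seq.tabulate(slices) { i =>
      io.beat.bits((i + 1) * tensorBits - 1, i * tensorBits)
    })
    idx  := 0.U
    busy := true.B
  }

  // Stream out one tensor-sized slice per cycle while a beat is buffered.
  io.tensor.valid := busy
  io.tensor.bits  := buf(idx)
  when(io.tensor.valid && io.tensor.ready) {
    idx := idx + 1.U
    when(idx === (slices - 1).U) { busy := false.B }
  }
}
```

The narrow case would work the other way around, accumulating several AXI beats before presenting one tensor.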
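Similarly, a rough sketch of the bookkeeping needed for multiple simultaneous load requests, assuming in-order responses and a single response beat per request; `OutstandingLoadTracker` is a hypothetical name and this is only the general idea, not the PR's VME implementation.

```scala
import chisel3._
import chisel3.util._

// Hypothetical sketch: with in-order read responses, a FIFO of client ids is
// enough to route each response back to the client that issued the request.
class OutstandingLoadTracker(numClients: Int = 4, maxInflight: Int = 8) extends Module {
  val io = IO(new Bundle {
    val reqClient  = Flipped(Decoupled(UInt(log2Ceil(numClients).W))) // client issuing a request
    val respValid  = Input(Bool())                                    // a read response arrived
    val respClient = Valid(UInt(log2Ceil(numClients).W))              // client the response belongs to
  })

  // FIFO of in-flight request owners; back-pressures new requests when full.
  val inflight = Module(new Queue(UInt(log2Ceil(numClients).W), maxInflight))
  inflight.io.enq <> io.reqClient
  inflight.io.deq.ready := io.respValid

  io.respClient.valid := io.respValid && inflight.io.deq.valid
  io.respClient.bits  := inflight.io.deq.bits
}
```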