Hello, I would like to submit abstracts for two talks:
Title: The History and the Future of RunTime Compilation in MXNet
Categories: Framework architecture and compiler technology, Optimization and performance

As the computational capabilities of deep learning hardware grow, a performance gap is widening between different types of DL operations. Some operations, like convolutions and fully connected layers, are in the spotlight for being compute intensive and therefore benefit greatly from those hardware advancements. Others, like normalization layers or even simple ReLU activations, fly under most people's radar when thinking about model performance optimization. However, because hardware advancements focus on the central, well-known model operations, those traditionally small parts of the model now take a significantly larger share of the training time. For example, when we started optimizing the ResNet-50 model for the first round of the MLPerf benchmark, ReLU activations took about 10% of the training time. In this talk we will present how RTC (RunTime Compilation) lets us tackle this problem in MXNet today. We will also discuss the work being done to expand the use of RTC in the upcoming MXNet 2.0 to achieve the best efficiency.

Title: Towards the ultimate framework efficiency
Categories: Framework architecture and compiler technology, Optimization and performance

The dependency engine inside MXNet has a very elegant design that enables efficient multithreaded execution by bypassing Python's Global Interpreter Lock (GIL). While this design works very well for hybridized execution, imperative execution exposes small inefficiencies in the approach. Furthermore, as GPUs get faster and the number of GPUs used in training jobs grows, those inefficiencies become a problem even for hybridized models. In this talk we will explore the current design of the MXNet dependency engine and look at examples of inefficient execution. We will also propose ways to improve the design to achieve the fastest possible training and inference times for both imperative and hybridized models.

Thank you,
Przemyslaw Tredak
