Hello,

I would like to submit abstracts for 2 talks:

Title: The History and the Future of RunTime Compilation in MXNet
Categories: Framework architecture and compiler technology, Optimization and 
performance

As the computational capabilities of Deep Learning hardware grow, the performance discrepancy between different types of DL operations widens. Some operations, like convolutions and fully connected layers, are in the spotlight for being compute intensive and therefore benefit greatly from those hardware advancements. Other operations, like normalization layers or even simple ReLU activations, fly under most people's radar when thinking about model performance optimization.
However, because hardware advancements focus on the central, well-known model operations, those traditionally small parts of the model now take a significantly greater portion of the training time. For example, when we started optimizing the ResNet-50 model for the first round of the MLPerf benchmark, ReLU activations took about 10% of the training time.
In this talk we will present how RTC (RunTime Compilation) currently lets us tackle this problem in MXNet. We will also discuss the work being done to expand the use of RTC in the upcoming MXNet 2.0 to achieve the best possible efficiency.


Title: Towards the ultimate framework efficiency
Categories: Framework architecture and compiler technology, Optimization and 
performance

The dependency engine inside MXNet has a very elegant design that enables efficient multithreaded execution by bypassing Python's Global Interpreter Lock (GIL). While this design works very well for hybridized execution, imperative execution exposes small inefficiencies in the approach. Furthermore, as GPUs get faster and the number of GPUs used in training jobs grows, those inefficiencies become a problem even for hybridized models.
In this talk we will explore the current design of the MXNet dependency engine and look at examples of inefficient execution. We will also propose ways to improve the design to achieve the fastest training and inference times for both imperative and hybridized models.



Thank you,
Przemyslaw Tredak
