From: Brenton Chu <[email protected]>
Date: Thursday, November 12, 2020 at 2:16 PM
To: apachemxnetday <[email protected]>
Subject: Submission for Apache MXNet Day: Optimizing Inference for Neural 
Machine Translation using Sockeye 2

Title:
Optimizing Inference for Neural Machine Translation using Sockeye 2

Abstract:
Transformer networks have revolutionized the field of Machine Translation and 
have been shown to produce better translations than traditional recurrent 
neural networks, especially for long input sentences.
However, inference with such models can become computationally intensive as 
output sentences grow longer. In this session, we will explore a 
Transformer-based model using Sockeye, the open-source NMT implementation that 
powers Amazon Translate. We will discuss how to profile deep learning 
workloads using NVIDIA Nsight Systems and how to identify areas for improving 
performance. We will also cover specific optimizations, including faster 
multi-head attention, support for lower-precision inference, and beam search 
updates; together, these optimizations can provide up to a 15x speed-up over a 
comparable CPU instance. All of the relevant changes are available in the 
latest releases of Apache MXNet and the Amazon Sockeye framework.
Finally, we will demonstrate the impact of these optimization techniques by 
showing the most cost-effective inference to date on an Amazon EC2 G4 instance 
with NVIDIA T4 GPUs.
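For readers unfamiliar with the beam search step mentioned above, here is a 
minimal, self-contained sketch of the basic algorithm in plain Python. This is 
a toy illustration only, not Sockeye's implementation: the `step_probs` table 
and function names are hypothetical stand-ins for a real decoder's per-step 
token distributions.

```python
import math

def beam_search(step_probs, beam_size):
    """Toy beam search over a fixed table of per-step token
    probabilities (hypothetical data, not Sockeye's API).
    step_probs[t][tok] is the probability of token `tok` at step t."""
    beams = [([], 0.0)]  # list of (token sequence, log-probability)
    for probs in step_probs:
        candidates = []
        for seq, score in beams:
            for tok, p in enumerate(probs):
                # extend each hypothesis by one token, accumulating log-prob
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the beam_size highest-scoring hypotheses
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

# toy example: 2 decoding steps over a 3-token vocabulary
probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
best_seq, best_score = beam_search(probs, beam_size=2)[0]
print(best_seq)  # [0, 1]
```

In a real NMT decoder the per-step distributions depend on previously emitted 
tokens, and the pruning to `beam_size` hypotheses at each step is where the 
batching and GPU-side optimizations discussed in the talk come into play.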

Speaker:
Brenton Chu
