Zha0q1 opened a new issue #19747:
URL: https://github.com/apache/incubator-mxnet/issues/19747


   I am trying to enable the path mxnet/gluon-nlp --> onnx --> tensorrt.
   There is a bug where, if I use a pretrained BERT model, running inference 
with TensorRT in fp16 mode produces `nan`s.
   
   Using pretrained weights:
   ```
   import mxnet as mx
   import gluonnlp as nlp

   ctx = mx.gpu(0)
   model_name = 'bert_12_768_12'                # example; any BERT variant works here
   dataset = 'book_corpus_wiki_en_uncased'      # example dataset name

   bert, _ = nlp.model.get_model(
       name=model_name,
       ctx=ctx,
       dataset_name=dataset,
       pretrained=True,
       use_pooler=True,
       use_decoder=False,
       num_layers=3,  # hard-code 3 layers since this is what the customer uses
       use_classifier=False,
       hparam_allow_override=True)
   model = bert
   ```
   Not using pretrained weights:
   ```
   # same imports, ctx, model_name, and dataset as above
   bert, _ = nlp.model.get_model(
       name=model_name,
       ctx=ctx,
       dataset_name=dataset,
       pretrained=False,
       use_pooler=True,
       use_decoder=False,
       num_layers=3,  # hard-code 3 layers since this is what the customer uses
       use_classifier=False,
       hparam_allow_override=True)
   model = bert
   model.initialize(ctx=ctx)
   ```
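   
   For context, the rest of the path after building the model looks roughly like the sketch below: hybridize and export the Gluon block, convert the exported symbol/params to ONNX, then build a TensorRT engine with the fp16 flag enabled. This is only a minimal sketch (assuming MXNet 1.9's `mx.onnx.export_model` and the TensorRT Python builder API; the file names and the `batch`/`seq_length` values are illustrative); the exact steps are in the PR linked below.
   ```
   import numpy as np
   import mxnet as mx
   import tensorrt as trt

   batch, seq_length = 1, 32   # illustrative values

   # 1) Hybridize, run one forward pass, and export the Gluon block
   model.hybridize(static_alloc=True)
   inputs = mx.nd.zeros((batch, seq_length), ctx=ctx)
   token_types = mx.nd.zeros((batch, seq_length), ctx=ctx)
   valid_length = mx.nd.full((batch,), seq_length, ctx=ctx)
   model(inputs, token_types, valid_length)
   model.export('bert')        # writes bert-symbol.json / bert-0000.params

   # 2) Convert the exported symbol/params to ONNX (MXNet 1.9 export API)
   onnx_path = mx.onnx.export_model(
       'bert-symbol.json', 'bert-0000.params',
       in_shapes=[(batch, seq_length), (batch, seq_length), (batch,)],
       in_types=[np.float32, np.float32, np.float32],
       onnx_file_path='bert.onnx')

   # 3) Parse the ONNX file and build a TensorRT engine with fp16 enabled
   logger = trt.Logger(trt.Logger.WARNING)
   builder = trt.Builder(logger)
   network = builder.create_network(
       1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
   parser = trt.OnnxParser(network, logger)
   with open(onnx_path, 'rb') as f:
       if not parser.parse(f.read()):
           for i in range(parser.num_errors):
               print(parser.get_error(i))
   config = builder.create_builder_config()
   config.max_workspace_size = 1 << 30
   config.set_flag(trt.BuilderFlag.FP16)   # removing this flag (plain fp32) gives sane outputs
   engine = builder.build_engine(network, config)
   ```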
   
   More specifically, WITHOUT pretrained weights, TensorRT produces 
reasonable outputs in both fp16 mode and regular fp32 mode. However, WITH 
pretrained weights, TensorRT produces `nan` outputs in fp16 mode, while fp32 
mode seems to work fine. Furthermore, the nan issue appears to be triggered 
by the size of `seq_length`: when `seq_length <= 16`, even fp16 mode produces 
reasonable outputs; when `seq_length > 17`, fp16 mode starts to produce 
`nan`s. The batch size does not seem to affect the nan behavior.
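   
   One hypothesis I want to rule out is that some pretrained weight or intermediate activation simply overflows the fp16 range (max ~65504) once the sequence gets long enough, which would explain why the issue only shows up with pretrained weights and larger `seq_length`. A rough sketch of that check against the fp32 MXNet model (reusing the dummy `inputs`/`token_types`/`valid_length` from the sketch above):
   ```
   import numpy as np

   FP16_MAX = float(np.finfo(np.float16).max)   # 65504.0

   # Do any pretrained parameters already exceed the fp16 range?
   for name, param in model.collect_params().items():
       w = param.data(ctx).asnumpy()
       if np.abs(w).max() > FP16_MAX:
           print(name, 'max |w| =', np.abs(w).max(), 'overflows fp16')

   # Do the fp32 outputs themselves exceed the fp16 range?
   seq_encoding, pooled_output = model(inputs, token_types, valid_length)
   for name, out in [('seq_encoding', seq_encoding), ('pooled_output', pooled_output)]:
       m = np.abs(out.asnumpy()).max()
       print(name, 'max |x| =', m, '(overflows fp16)' if m > FP16_MAX else '(fits in fp16)')
   ```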
   
   Reproducible code and steps can be found in 
https://github.com/apache/incubator-mxnet/pull/19746. Because we have a 
customer requesting this feature, it would be great if friends at Nvidia could 
help look into this issue. Please let me know how I can provide further 
info/help.
   
   @sandeep-krishnamurthy @MoisesHer @Kh4L 
   

