yzh119 opened a new pull request, #14814:
URL: https://github.com/apache/tvm/pull/14814

   # Motivation
   The gelu operator has several implementations; in 
[PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.GELU.html), they 
are distinguished by `approximate='none'` or `approximate='tanh'`. Relax 
currently only implements the `approximate="none"` case, which depends on the 
erf function; this PR adds the gelu operator approximated by tanh. In Hugging 
Face Transformers, the tanh-approximated gelu is called 
[gelu_new](https://github.com/huggingface/transformers/blob/366a8ca09e8dd92dfb1956c8be3118b5a2b13639/src/transformers/activations.py#L49)
 and is used in many models.
   
   ## Difference from erf gelu
   I observed only a very slight numerical difference between the two 
implementations, so I'm not sure whether this variant is worth including.
   
   Below is a figure showing the numerical difference between the two, 
together with the script that produced it:
   
![image](https://github.com/apache/tvm/assets/11773619/e002b8a5-9e0a-49ed-9e61-eebdfb7c0eba)
   
   ```python
   import numpy as np
   import torch
   import torch.nn.functional as F
   import matplotlib.pyplot as plt
   
   # Evaluate both gelu variants on the same grid of points.
   x = np.linspace(-6, 6, 200)
   x = torch.from_numpy(x).float()
   y_none = F.gelu(x, approximate='none')  # exact, erf-based
   y_tanh = F.gelu(x, approximate='tanh')  # tanh approximation
   
   # Plot the pointwise difference between the two variants.
   plt.plot(x.numpy(), (y_tanh - y_none).numpy(), label='difference')
   plt.legend()
   plt.show()
   ```
   
   The largest absolute difference is about 5e-4.
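
   For reference, the tanh variant replaces the erf-based Gaussian CDF with the well-known approximation `0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))` (the formula PyTorch documents for `approximate='tanh'`). A minimal NumPy-only sketch, without the torch dependency, that reproduces the comparison above:

   ```python
   import numpy as np
   from math import erf, sqrt, pi

   def gelu_erf(v):
       # Exact gelu: v * Phi(v), with Phi computed via the error function.
       return 0.5 * v * (1.0 + erf(v / sqrt(2.0)))

   def gelu_tanh(v):
       # Tanh approximation, as used by gelu_new / approximate='tanh'.
       return 0.5 * v * (1.0 + np.tanh(sqrt(2.0 / pi) * (v + 0.044715 * v ** 3)))

   # Same grid as the plotting script above.
   xs = np.linspace(-6, 6, 200)
   max_diff = max(abs(gelu_tanh(v) - gelu_erf(v)) for v in xs)
   ```

   On this grid `max_diff` stays well below 1e-3, consistent with the figure.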

