yzh119 opened a new pull request, #14814: URL: https://github.com/apache/tvm/pull/14814
# Motivation

The gelu operator has different implementations; in [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.GELU.html) they are distinguished by `approximate='none'` or `approximate='tanh'`. Currently, Relax only implements the `approximate="none"` case, which depends on the erf function; this PR implements the gelu operator approximated by tanh. In Hugging Face transformers, the tanh-approximated gelu is called [gelu_new](https://github.com/huggingface/transformers/blob/366a8ca09e8dd92dfb1956c8be3118b5a2b13639/src/transformers/activations.py#L49) and is used in many models.

## Difference from erf gelu

I observed only a very slight numerical difference between the two implementations, so I'm still not sure whether this variant should be included. Below is a figure showing the numerical difference between the two, along with the script that produced it:

```python
import numpy as np
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

x = np.linspace(-6, 6, 200)
x = torch.from_numpy(x).float()
y_none = F.gelu(x, approximate='none')
y_tanh = F.gelu(x, approximate='tanh')
plt.plot(x.numpy(), (y_tanh - y_none).numpy(), label='difference')
plt.legend()
plt.show()
```

The largest difference is about 5e-4.
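For reference, the two formulas being compared can be sketched directly in NumPy, without PyTorch. This follows the definitions in the PyTorch `torch.nn.GELU` docs: the exact gelu is `0.5 * x * (1 + erf(x / sqrt(2)))`, and the tanh approximation is `0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))`. The function names here are illustrative, not the ones used in the PR:

```python
import math
import numpy as np

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def gelu_erf(x):
    # Exact gelu: x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def gelu_tanh(x):
    # Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x**3)))


# Reproduce the comparison over the same range as the script above.
x = np.linspace(-6, 6, 200)
max_diff = np.abs(gelu_tanh(x) - gelu_erf(x)).max()
```

Evaluated in float64 over [-6, 6], `max_diff` comes out on the order of 1e-4, consistent with the ~5e-4 figure observed above in float32.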
