damccorm commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1097844820
##########
website/www/site/layouts/partials/section-menu/en/documentation.html:
##########
@@ -225,6 +225,7 @@
<li><a href="/documentation/ml/anomaly-detection/">Anomaly
Detection</a></li>
<li><a href="/documentation/ml/large-language-modeling">Large Language
Model Inference in Beam</a></li>
<li><a href="/documentation/ml/per-entity-training">Per Entity Training in
Beam</a></li>
+ <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Text
Classification Inference</a></li>
Review Comment:
```suggestion
<li><a href="/documentation/ml/tensorrt-runinference">TensorRT
Inference</a></li>
```
This will help the entry render more naturally and better illustrate the main
point of the document
##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that
facilitates high-performance machine learning inference. It is designed to work
with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It
focuses specifically on optimizing a trained neural network so that it runs
inference efficiently on NVIDIA GPUs. TensorRT can maximize inference
throughput with multiple optimizations, including model quantization, layer and
tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor
memory usage, while preserving model accuracy.
+
+- In Apache Beam 2.43.0, Beam introduced the
[TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy),
which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference
transform simplifies the ML inference pipeline creation process by allowing
developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in
production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the
RunInference API, using a BERT-based text classification model in a Beam
pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file
from a trained model. We take a trained BERT based text classification model
that does sentiment analysis, that is, it classifies any text into two classes:
positive or negative. The trained model is available [from
HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To
convert the PyTorch Model to TensorRT engine, you need to first convert the
model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+You can use the HuggingFace `transformers` library to convert a PyTorch model
to ONNX. For details, see the blog post [Convert Transformers to ONNX with
Hugging Face
Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog
post explains which required packages to install. The following code that is
used for the conversion.
Review Comment:
```suggestion
You can use the HuggingFace `transformers` library to convert a PyTorch
model to ONNX. For details, see the blog post [Convert Transformers to ONNX
with Hugging Face
Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog
post explains which required packages to install. The following code is used
for the conversion.
```
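For context, the conversion that the suggested sentence refers to is a standard
PyTorch-to-ONNX export. The excerpt above does not include the snippet itself,
so the following is only a minimal sketch of what such an export can look like,
using `torch.onnx.export` directly rather than the Optimum route the linked
blog post describes; the output file name, sequence length, and opset version
are illustrative assumptions, not values taken from the PR.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The trained sentiment model referenced in the page.
model_name = "textattack/bert-base-uncased-SST-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Dummy inputs are used to trace the graph; padding to a fixed length keeps
# the exported shapes static (illustrative choice, not from the PR).
dummy = tokenizer(
    "sample text", padding="max_length", max_length=128, return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"], dummy["token_type_ids"]),
    "bert_sst2.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    opset_version=13,
)
```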
##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that
facilitates high-performance machine learning inference. It is designed to work
with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It
focuses specifically on optimizing a trained neural network so that it runs
inference efficiently on NVIDIA GPUs. TensorRT can maximize inference
throughput with multiple optimizations, including model quantization, layer and
tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor
memory usage, while preserving model accuracy.
+
+- In Apache Beam 2.43.0, Beam introduced the
[TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy),
which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference
transform simplifies the ML inference pipeline creation process by allowing
developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in
production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the
RunInference API, using a BERT-based text classification model in a Beam
pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file
from a trained model. We take a trained BERT based text classification model
that does sentiment analysis, that is, it classifies any text into two classes:
positive or negative. The trained model is available [from
HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To
convert the PyTorch Model to TensorRT engine, you need to first convert the
model to ONNX and then from ONNX to TensorRT.
Review Comment:
```suggestion
To use TensorRT with Apache Beam, you need a converted TensorRT engine file
from a trained model. We take a trained BERT based text classification model
that does sentiment analysis and classifies any text into two classes: positive
or negative. The trained model is available [from
HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To
convert the PyTorch Model to TensorRT engine, you need to first convert the
model to ONNX and then from ONNX to TensorRT.
```
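As a companion note to the paragraph quoted above, the second half of the
conversion (ONNX to a TensorRT engine) can be done with the TensorRT Python
builder API. This is only a sketch under the assumption of a TensorRT 8.x
install and an ONNX file exported with fixed input shapes; the file names are
placeholders rather than paths from the PR.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

# Parse the ONNX file produced by the export step.
with open("bert_sst2.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model.")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # Optional; requires GPU FP16 support.

# Build and serialize the engine to disk so a Beam pipeline can load it.
serialized_engine = builder.build_serialized_network(network, config)
with open("bert_sst2.trt", "wb") as f:
    f.write(serialized_engine)
```

The `trtexec` command-line tool that ships with TensorRT can perform the same
ONNX-to-engine conversion; models exported with dynamic input shapes
additionally need an optimization profile.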
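Once an engine file is available, the `TensorRTEngineHandlerNumPy` mentioned in
the page plugs into RunInference like any other model handler. Below is a rough
sketch of the pipeline shape, assuming for simplicity an engine with a single
fixed-size input tensor and glossing over the tokenization that the real
example performs; the engine path and element shape are illustrative.

```python
import numpy as np
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy

# Placeholder engine path; the element dtype and shape must match the
# engine's input bindings (assumption, not taken from the PR).
model_handler = TensorRTEngineHandlerNumPy(
    min_batch_size=1,
    max_batch_size=1,
    engine_path="bert_sst2.trt")

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateInputs" >> beam.Create([np.zeros((128,), dtype=np.int32)])
        | "RunInference" >> RunInference(model_handler)
        | "Print" >> beam.Map(print))
```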