damccorm commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1097844820
##########
website/www/site/layouts/partials/section-menu/en/documentation.html:
##########
@@ -225,6 +225,7 @@
<li><a href="/documentation/ml/anomaly-detection/">Anomaly
Detection</a></li>
<li><a href="/documentation/ml/large-language-modeling">Large Language
Model Inference in Beam</a></li>
<li><a href="/documentation/ml/per-entity-training">Per Entity Training in
Beam</a></li>
+ <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Text
Classification Inference</a></li>
Review Comment:
```suggestion
<li><a href="/documentation/ml/tensorrt-runinference">TensorRT
Inference</a></li>
```
This will help the entry render more naturally and better illustrate the main
point of the document
##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that
facilitates high-performance machine learning inference. It is designed to work
with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It
focuses specifically on optimizing a trained neural network so that it runs
inference efficiently on NVIDIA GPUs. TensorRT can maximize inference
throughput with multiple optimizations, including model quantization, layer and
tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor
memory usage, while preserving model accuracy.
+
+- In Apache Beam 2.43.0, Beam introduced the
[TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy),
which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference
transform simplifies the ML inference pipeline creation process by allowing
developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in
production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the
RunInference API, using a BERT-based text classification model in a Beam
pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file
from a trained model. We take a trained BERT based text classification model
that does sentiment analysis, that is, it classifies any text into two classes:
positive or negative. The trained model is available [from
HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To
convert the PyTorch Model to TensorRT engine, you need to first convert the
model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+You can use the HuggingFace `transformers` library to convert a PyTorch model
to ONNX. For details, see the blog post [Convert Transformers to ONNX with
Hugging Face
Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog
post explains which required packages to install. The following code that is
used for the conversion.
Review Comment:
```suggestion
You can use the HuggingFace `transformers` library to convert a PyTorch
model to ONNX. For details, see the blog post [Convert Transformers to ONNX
with Hugging Face
Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog
post explains which required packages to install. The following code is used
for the conversion.
```
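For context, the conversion that the suggested sentence refers to is a standard
PyTorch-to-ONNX export. The excerpt above does not include the snippet itself,
so the following is only a minimal sketch of what such an export can look like,
using `torch.onnx.export` directly rather than the Optimum route the linked
blog post describes; the output file name, sequence length, and opset version
are illustrative assumptions, not values taken from the PR.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The trained sentiment model referenced in the page.
model_name = "textattack/bert-base-uncased-SST-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Dummy inputs are used to trace the graph; padding to a fixed length keeps
# the exported shapes static (illustrative choice, not from the PR).
dummy = tokenizer(
    "sample text", padding="max_length", max_length=128, return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"], dummy["token_type_ids"]),
    "bert_sst2.onnx",
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["logits"],
    opset_version=13,
)
```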
##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that
facilitates high-performance machine learning inference. It is designed to work
with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It
focuses specifically on optimizing a trained neural network so that it runs
inference efficiently on NVIDIA GPUs. TensorRT can maximize inference
throughput with multiple optimizations, including model quantization, layer and
tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor
memory usage, while preserving model accuracy.
+
+- In Apache Beam 2.43.0, Beam introduced the
[TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy),
which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference
transform simplifies the ML inference pipeline creation process by allowing
developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in
production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the
RunInference API, using a BERT-based text classification model in a Beam
pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file
from a trained model. We take a trained BERT based text classification model
that does sentiment analysis, that is, it classifies any text into two classes:
positive or negative. The trained model is available [from
HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To
convert the PyTorch Model to TensorRT engine, you need to first convert the
model to ONNX and then from ONNX to TensorRT.
Review Comment:
```suggestion
To use TensorRT with Apache Beam, you need a converted TensorRT engine file
from a trained model. We take a trained BERT based text classification model
that does sentiment analysis and classifies any text into two classes: positive
or negative. The trained model is available [from
HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To
convert the PyTorch Model to TensorRT engine, you need to first convert the
model to ONNX and then from ONNX to TensorRT.
```
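As a companion note to the paragraph quoted above, the second half of the
conversion (ONNX to a TensorRT engine) can be done with the TensorRT Python
builder API. This is only a sketch under the assumption of a TensorRT 8.x
install and an ONNX file exported with fixed input shapes; the file names are
placeholders rather than paths from the PR.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

# Parse the ONNX file produced by the export step.
with open("bert_sst2.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model.")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # Optional; requires GPU FP16 support.

# Build and serialize the engine to disk so a Beam pipeline can load it.
serialized_engine = builder.build_serialized_network(network, config)
with open("bert_sst2.trt", "wb") as f:
    f.write(serialized_engine)
```

The `trtexec` command-line tool that ships with TensorRT can perform the same
ONNX-to-engine conversion; models exported with dynamic input shapes
additionally need an optimization profile.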
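Once an engine file is available, the `TensorRTEngineHandlerNumPy` mentioned in
the page plugs into RunInference like any other model handler. Below is a rough
sketch of the pipeline shape, assuming for simplicity an engine with a single
fixed-size input tensor and glossing over the tokenization that the real
example performs; the engine path and element shape are illustrative.

```python
import numpy as np
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy

# Placeholder engine path; the element dtype and shape must match the
# engine's input bindings (assumption, not taken from the PR).
model_handler = TensorRTEngineHandlerNumPy(
    min_batch_size=1,
    max_batch_size=1,
    engine_path="bert_sst2.trt")

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateInputs" >> beam.Create([np.zeros((128,), dtype=np.int32)])
        | "RunInference" >> RunInference(model_handler)
        | "Print" >> beam.Map(print))
```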