[GitHub] [beam] pcoet commented on a diff in pull request #21871: Modify README for 3 pytorch examples

GitBox Tue, 14 Jun 2022 16:50:06 -0700


pcoet commented on code in PR #21871:
URL: https://github.com/apache/beam/pull/21871#discussion_r897404144



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   It looks like this should be deleted: "that contains the images you want to feed into your model. "
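
   For anyone following this hunk, the `IMAGE_FILE_NAMES` input it describes (one absolute image path per line) could be generated with something like the sketch below. This is only an illustration: the helper name `write_image_file_names` and the extension filter are assumptions, not part of the PR.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_image_file_names(images_dir, output_file):
    """Hypothetical helper: write one absolute image path per line."""
    # Collect the image files first, so the output file itself is never listed.
    images = sorted(
        p.resolve()
        for p in Path(images_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    Path(output_file).write_text("".join(f"{p}\n" for p in images))
    return [str(p) for p in images]
```

   The resulting file can then be passed as `--input`; because every path is absolute, `--images_dir` would not be needed.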



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Looks like this should be deleted: "that contains the images you want to feed into your model. "



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Also, "absolute path" -> "absolute paths"



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.

Review Comment:
   "item are" -> "item is"
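
   As an aside for readers of this hunk, the semicolon-separated output it describes could be read back with a short parser like the one below. This is a minimal sketch, assuming file names contain no semicolons and that the second field is always a Python list literal as in the sample output; `parse_prediction_line` is a hypothetical name, not something the PR provides.

```python
import ast


def parse_prediction_line(line):
    """Hypothetical helper: split one output line into (file_name, labels)."""
    # Split on the first semicolon only: file name before, predictions after.
    file_name, _, labels_text = line.rstrip("\n").partition(";")
    # The second field looks like a Python list literal, so literal_eval can
    # parse it without executing arbitrary code.
    labels = ast.literal_eval(labels_text)
    return file_name, labels
```

   For the classification pipeline, whose second field is a plain integer in the sample output, the same function would return an `int` instead of a list.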



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last
+word into a `[MASK]` token, passes the masked sentence to PyTorch
+implementation of RunInference, and then writes the predictions to a text file.
+
+### Dataset and model for language modeling
+
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM)
+model from Hugging Face's repository of pretrained models.
+Make sure you have installed `transformers` too.
+```
+import torch
+from transformers import BertForMaskedLM
+model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
+torch.save(model.state_dict(), 'BertForMaskedLM.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: A path to a file called `SENTENCES` that contains sentences to
+feed into the model. It should look something like this:
+```
+The capital of France is Paris .
+It is raining cats and dogs .
+...
+```
+### Running `pytorch_language_modeling.py`
+
+To run the language modeling pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_language_modeling \
+  --input SENTENCES \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_language_modeling \
+  --input sentences.txt \
+  --output predictions.csv \
+  --model_state_dict_path BertForMaskedLM.pth
+```
+If you don't provide a sentences file, it will run the pipeline with some
+example sentences we created.

Review Comment:
   You can probably remove "we created".
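
   For context on the preprocessing step this section describes (turning the
   last word of a sentence into a `[MASK]` token), here is a minimal sketch.
   It assumes sentences are whitespace-tokenized and end with a separate
   punctuation token, as in the sample `SENTENCES` file above. The helper name
   `mask_last_word` is hypothetical and is not taken from the PR.

```python
from typing import Tuple

MASK_TOKEN = "[MASK]"  # BERT's mask token, as named in the README text


def mask_last_word(sentence: str) -> Tuple[str, str]:
    """Return (original sentence, sentence with its last word masked).

    Assumption: the final token is punctuation (for example "."), so the
    "last word" is the second-to-last whitespace-separated token.
    """
    tokens = sentence.split()
    if len(tokens) < 2:
        raise ValueError("expected at least one word followed by punctuation")
    masked_tokens = tokens[:-2] + [MASK_TOKEN, tokens[-1]]
    return sentence, " ".join(masked_tokens)


original, masked = mask_last_word("The capital of France is Paris .")
print(masked)
```

   Returning the original sentence next to the masked one makes it easy to
   pair each prediction with its input when writing the output file.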



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last

Review Comment:
   "conver" -> "convert"



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Also, "absolute path" -> "absolute paths"
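
   For context on the `IMAGE_FILE_NAMES` file that this bullet refers to, here
   is a minimal sketch of one way to generate it with absolute paths, so that
   `--images_dir` can be omitted. The helper name `write_image_file_names` and
   the list of image suffixes are assumptions for illustration and are not
   part of the PR.

```python
from pathlib import Path
from typing import Sequence


def write_image_file_names(
    images_dir: str,
    output_file: str,
    suffixes: Sequence[str] = (".jpg", ".jpeg", ".png"),  # assumed image types
) -> int:
    """Write one absolute image path per line and return how many were written."""
    image_paths = sorted(
        str(path.resolve())  # resolve() yields an absolute path
        for path in Path(images_dir).iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )
    Path(output_file).write_text("".join(p + "\n" for p in image_paths))
    return len(image_paths)
```

   The resulting file could then be passed as the `--input` argument of the
   commands shown in the README.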



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last
+word into a `[MASK]` token, passes the masked sentence to PyTorch

Review Comment:
   "to PyTorch" -> "to the PyTorch"



