damccorm commented on code in PR #28243:
URL: https://github.com/apache/beam/pull/28243#discussion_r1321574614


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler, KeyModelMapping
+keyed_model_handler = KeyedModelHandler([
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+   data = p | beam.Create([
+      ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once:
+
+```
+mhs = [
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+  KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+  KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't

Review Comment:
   I haven't yet, but it's a good idea I can follow up with.
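
For reference, here is a self-contained sketch of what the quoted snippet looks like once the placeholders are filled in. The `LinearRegression` model class, the `gs://my-bucket/...` state-dict paths, and the tensor shapes are hypothetical stand-ins for `<config1>`/`<config2>`, not part of the PR:

```
# Hypothetical model class and paths; substitute your own trained models.
import torch
import apache_beam as beam
from apache_beam.ml.inference.base import (
    KeyedModelHandler, KeyModelMapping, RunInference)
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor


class LinearRegression(torch.nn.Module):
  """Placeholder architecture; any torch.nn.Module works here."""
  def __init__(self, input_dim: int = 3, output_dim: int = 1):
    super().__init__()
    self.linear = torch.nn.Linear(input_dim, output_dim)

  def forward(self, x):
    return self.linear(x)


keyed_model_handler = KeyedModelHandler([
    # Examples keyed with 'key1' are routed to the model from model_a.pt.
    KeyModelMapping(['key1'], PytorchModelHandlerTensor(
        state_dict_path='gs://my-bucket/model_a.pt',  # hypothetical path
        model_class=LinearRegression,
        model_params={'input_dim': 3, 'output_dim': 1})),
    # 'key2' and 'key3' share a second model from model_b.pt.
    KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(
        state_dict_path='gs://my-bucket/model_b.pt',  # hypothetical path
        model_class=LinearRegression,
        model_params={'input_dim': 3, 'output_dim': 1})),
])

with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create([
          ('key1', torch.tensor([1.0, 2.0, 3.0])),
          ('key2', torch.tensor([4.0, 5.0, 6.0])),
          ('key3', torch.tensor([7.0, 8.0, 9.0])),
      ])
      | RunInference(keyed_model_handler)
      | beam.Map(print))  # each output is a (key, PredictionResult) tuple
```

Note that the output retains the key, so downstream transforms can still distinguish which model produced each `PredictionResult`.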



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:

Review Comment:
   Yes, I thought I had already done that. Updated



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler, KeyModelMapping
+keyed_model_handler = KeyedModelHandler([
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+   data = p | beam.Create([
+      ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once:
+
+```
+mhs = [
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+  KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+  KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't
+currently being used, as needed. Runners that have multiple workers on a given machine will load at most
+`max_models_per_worker_hint*<num workers>` models onto the machine. Make sure you leave enough space for the models

Review Comment:
   Updated references to `SDK worker process`
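
As a concrete illustration of the memory arithmetic in this hunk, here is a hedged sketch of the hint in use. It reuses the hypothetical `LinearRegression` class from the earlier sketch, and the paths and the four-SDK-worker machine are likewise assumptions, not values from the PR:

```
from apache_beam.ml.inference.base import KeyedModelHandler, KeyModelMapping
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor


def handler_for(path):
  # Hypothetical helper: one handler per saved state dict, using the
  # LinearRegression placeholder class defined in the sketch above.
  return PytorchModelHandlerTensor(
      state_dict_path=path,
      model_class=LinearRegression,
      model_params={'input_dim': 3, 'output_dim': 1})


mhs = [
    KeyModelMapping(['key1'], handler_for('gs://my-bucket/model_a.pt')),
    KeyModelMapping(['key2', 'key3'], handler_for('gs://my-bucket/model_b.pt')),
    KeyModelMapping(['key4'], handler_for('gs://my-bucket/model_c.pt')),
    KeyModelMapping(['key5', 'key6', 'key7'],
                    handler_for('gs://my-bucket/model_d.pt')),
]

# Hold at most 2 of the 4 models in memory per SDK worker process; a model
# that isn't currently in use is evicted (after some delay) when a third is
# needed. On a machine running, say, 4 SDK worker processes, budget for up
# to 2 * 4 = 8 resident models at once.
keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
```

Picking the hint is a trade-off: a lower value caps memory but causes more model reloads when keys for evicted models arrive.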



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler, KeyModelMapping
+keyed_model_handler = KeyedModelHandler([
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+   data = p | beam.Create([
+      ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once:
+
+```
+mhs = [
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+  KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+  KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't

Review Comment:
   I mentioned that there's some delay; let me know if you think the wording is ok.
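
Separately, the unchanged context line in each hunk cross-references `MaybeKeyedModelHandler`. For completeness, a minimal sketch of that wrapper, reusing the hypothetical `handler_for` helper from the sketch above:

```
from apache_beam.ml.inference.base import MaybeKeyedModelHandler

# Accepts either bare examples or (key, example) tuples at runtime, so the
# same RunInference transform works whether or not the input is keyed.
maybe_handler = MaybeKeyedModelHandler(handler_for('gs://my-bucket/model_a.pt'))
```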


