tvalentyn commented on code in PR #28243:
URL: https://github.com/apache/beam/pull/28243#discussion_r1320479937


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+   data = p | beam.Create([
+      ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:

Review Comment:
   extra chars at EOL



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+   data = p | beam.Create([
+      ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:
+
+```
+mhs = [
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+  KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+  KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't

Review Comment:
    > load at most 2 models per worker 
   
   Users might perceive it as a guarantee, and come to us if/when they see a single OOM error. If this cannot be guaranteed, we can phrase it so that the upper limit is enforced on a best-effort basis, or mention that there may be some delay between when the model is unloaded and when the memory is freed.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+   data = p | beam.Create([
+      ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:
+
+```
+mhs = [
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+  KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+  KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't
+currently being used as needed. Runners that have multiple workers on a given machine will load at most
+`max_models_per_worker_hint*<num workers>` models onto the machine. Make sure you leave enough space for the models

Review Comment:
   Since worker is such an overloaded term, users might be confused. In the Dataflow context, `worker` often refers to the VM, as in the `max_num_workers` pipeline option.
   
   Here by 'worker' you refer to the SDK worker process. In https://cloud.google.com/dataflow/docs/guides/troubleshoot-oom#sdk-process-memory we call these 'Apache Beam SDK process'. How about you use that or `SDK worker process` instead of `worker`?
   



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+   data = p | beam.Create([
+      ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:
+
+```
+mhs = [
+  KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+  KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+  KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+  KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't

Review Comment:
   > The previous example will load at most 2 models per worker at any given time
   
   Have we tried running multi-key inference under load, for example 10 models, many examples, but only 1 model can fit in memory? We could try that with and without GBK.
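   
   A minimal sketch of how such a stress test could look, as an assumption rather than an existing test: placeholder model paths and a trivial placeholder model class, with `KeyModelMapping` and `max_models_per_worker_hint` used as in the snippets above. A `GroupByKey` could be inserted before `RunInference` to try the grouped variant.
   
   ```
   import apache_beam as beam
   import torch
   from apache_beam.ml.inference.base import KeyedModelHandler, KeyModelMapping, RunInference
   from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
   
   # Placeholder model class, assumed only for illustration.
   class TinyModel(torch.nn.Module):
     def __init__(self):
       super().__init__()
       self.linear = torch.nn.Linear(3, 1)
   
     def forward(self, x):
       return self.linear(x)
   
   # Hypothetical setup: 10 per-key models; the state_dict paths are placeholders.
   mappings = [
     KeyModelMapping(
       [f'key{i}'],
       PytorchModelHandlerTensor(
         state_dict_path=f'gs://my-bucket/models/model_{i}.pt',
         model_class=TinyModel,
         model_params={}))
     for i in range(10)
   ]
   
   # Hint that only 1 model should be held in memory per SDK worker process at a time.
   keyed_model_handler = KeyedModelHandler(mappings, max_models_per_worker_hint=1)
   
   with beam.Pipeline() as p:
     # Many examples spread across all 10 keys, to force repeated model swapping.
     examples = p | beam.Create(
       [(f'key{i % 10}', torch.tensor([[1.0, 2.0, 3.0]])) for i in range(100000)])
     predictions = examples | RunInference(keyed_model_handler)
   ```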



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
 
 If you are unsure if your data is keyed, you can also use `MaybeKeyedModelHandler`.
 
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:

Review Comment:
   should this content also be linked from the docstring?
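   
   A possible sketch of such a cross-link, with hypothetical docstring text rather than the actual `KeyedModelHandler` source; the URL is the rendered page for the file edited here:
   
   ```
   # Sketch only: add a pointer from the KeyedModelHandler docstring to this page.
   class KeyedModelHandler:
     """Placeholder for the existing KeyedModelHandler docstring.
   
     For a walkthrough of loading several models by key (KeyModelMapping,
     max_models_per_worker_hint), see
     https://beam.apache.org/documentation/sdks/python-machine-learning/.
     """
   ```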


