tvalentyn commented on code in PR #28243:
URL: https://github.com/apache/beam/pull/28243#discussion_r1320479937
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
If you are unsure if your data is keyed, you can also use
`MaybeKeyedModelHandler`.
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+ KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+ KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+ data = p | beam.Create([
+ ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+ ])
+ predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:
Review Comment:
extra chars at EOL
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
If you are unsure if your data is keyed, you can also use
`MaybeKeyedModelHandler`.
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+ KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+ KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+ data = p | beam.Create([
+ ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+ ])
+ predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:
+
+```
+mhs = [
+ KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+ KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+ KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+ KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't
Review Comment:
> load at most 2 models per worker
Users might perceive it as a guarantee, and come to us if/when they see a
single OOM error. If this cannot be guaranteed, we can phrase that the upper
ceiling is enforced as best effort. Or mention that there may be some delay
between when the model is unloaded and the memory is freed.
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
If you are unsure if your data is keyed, you can also use
`MaybeKeyedModelHandler`.
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+ KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+ KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+ data = p | beam.Create([
+ ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+ ])
+ predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:
+
+```
+mhs = [
+ KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+ KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+ KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+ KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't
+currently being used as needed. Runners that have multiple workers on a given machine will load at most
+`max_models_per_worker_hint*<num workers>` models onto the machine. Make sure you leave enough space for the models
Review Comment:
Since worker is such an overloaded term, users might be confused. In the Dataflow context,
`worker` often refers to the VM, as in the `max_num_workers` pipeline option.
Here, by 'worker' you refer to the SDK worker process. In
https://cloud.google.com/dataflow/docs/guides/troubleshoot-oom#sdk-process-memory
we call these 'Apache Beam SDK processes'. How about you use that or `SDK
worker process` instead of `worker`?
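
As a concrete (hypothetical) illustration of why the distinction matters: on
Dataflow, the number of Apache Beam SDK processes per VM is itself
configurable, and that is what multiplies the per-process hint on a machine.
The flag values below are made up; the experiment is the one described in the
linked guide.

```
from apache_beam.options.pipeline_options import PipelineOptions

# Illustrative only: cap the number of VMs and run a single SDK process per
# VM, so the per-process model hint is not multiplied on each machine.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--max_num_workers=10',  # VMs, i.e. "workers" in the Dataflow sense
    '--experiments=no_use_multiple_sdk_containers',  # one SDK process per VM
])
```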
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
If you are unsure if your data is keyed, you can also use
`MaybeKeyedModelHandler`.
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler([
+ KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+ KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>))
+])
+with pipeline as p:
+ data = p | beam.Create([
+ ('key1', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key2', torch.tensor([[1,2,3],[4,5,6],...])),
+ ('key3', torch.tensor([[1,2,3],[4,5,6],...])),
+ ])
+ predictions = data | RunInference(keyed_model_handler)
+```
+
+The previous example will load a model using `config1` and use that for inference for all examples associated
+with `key1`, and will load a model using `config2` and use that for all examples associated with `key2` and `key3`.
+
+There are memory risks associated with loading multiple models at once. By default, `KeyedModelHandler` will not
+limit the number of models loaded into memory at once. This means that if not all models fit into memory at once,
+your pipeline will likely fail with an Out of Memory exception. To avoid this, you can provide a hint about the
+maximum number of models loaded at once.=:
+
+```
+mhs = [
+ KeyModelMapping(['key1'], PytorchModelHandlerTensor(<config1>)),
+ KeyModelMapping(['key2', 'key3'], PytorchModelHandlerTensor(<config2>)),
+ KeyModelMapping(['key4'], PytorchModelHandlerTensor(<config3>)),
+ KeyModelMapping(['key5', 'key6', 'key7'], PytorchModelHandlerTensor(<config4>)),
+]
+keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=2)
+```
+
+The previous example will load at most 2 models per worker at any given time, and will unload models that aren't
Review Comment:
> The previous example will load at most 2 models per worker at any given time

Have we tried running multi-key inference under load, for example with 10
models and many examples, but only 1 model fitting in memory at a time? We
could try that with and without GBK.
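
For concreteness, a rough sketch of such a test (model paths, the model class,
and data sizes below are placeholders, not part of this PR) could look like:

```
import apache_beam as beam
import torch
from apache_beam.ml.inference.base import (KeyedModelHandler, KeyModelMapping,
                                            RunInference)
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor


class TinyModel(torch.nn.Module):
  # Stand-in model class; a real test would point state_dict_path at models
  # large enough that only one fits in memory at a time.
  def forward(self, x):
    return x * 2


# 10 per-key models with a hint of 1, so models must be swapped in and out.
mhs = [
    KeyModelMapping(
        [f'key{i}'],
        PytorchModelHandlerTensor(
            state_dict_path=f'gs://my-bucket/model_{i}.pt',  # placeholder paths
            model_class=TinyModel,
            model_params={}))
    for i in range(10)
]
keyed_model_handler = KeyedModelHandler(mhs, max_models_per_worker_hint=1)

with beam.Pipeline() as p:
  examples = p | beam.Create(
      [(f'key{i % 10}', torch.tensor([1.0, 2.0, 3.0])) for i in range(100000)])

  # Without GBK: keys arrive interleaved, forcing frequent model swaps.
  _ = examples | 'Interleaved' >> RunInference(keyed_model_handler)

  # With GBK: group per key first so each model's examples arrive together.
  _ = (examples
       | beam.GroupByKey()
       | beam.FlatMap(lambda kv: [(kv[0], v) for v in kv[1]])
       | 'Grouped' >> RunInference(keyed_model_handler))
```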
##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -215,6 +215,53 @@ with pipeline as p:
If you are unsure if your data is keyed, you can also use
`MaybeKeyedModelHandler`.
+You can also use a `KeyedModelHandler` to load several different models based on their associated key:
Review Comment:
should this content also be linked from the docstring?