jrmccluskey commented on code in PR #31536:
URL: https://github.com/apache/beam/pull/31536#discussion_r1631269292
##########
sdks/python/apache_beam/ml/transforms/embeddings/huggingface.py:
##########
@@ -153,6 +154,45 @@ def get_ptransform_for_processing(self, **kwargs) -> beam.PTransform:
         ))
 
 
+class SentenceTransformerImageEmbeddings(EmbeddingsManager):
+  def __init__(self, model_name: str, columns: List[str], **kwargs):
+    """
+    Embedding config for sentence-transformers. This config can be used with
+    MLTransform to embed image data. Models are loaded using the RunInference
+    PTransform with the help of ModelHandler.
+
+    Args:
+      model_name: Name of the model to use. The model should be hosted on
+        HuggingFace Hub or compatible with sentence_transformers. See
+        https://www.sbert.net/docs/sentence_transformer/pretrained_models.html#image-text-models # pylint: disable=line-too-long
+        for a list of sentence_transformers models.
+      columns: List of columns to be embedded.
+      min_batch_size: The minimum batch size to be used for inference.

Review Comment:
   This is a bit of a weird case: those parameters are passed up as kwargs and handled by the `EmbeddingsManager`. I'd be okay with having them explicitly in the constructor, though.
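
For illustration, the two options discussed in the comment could look roughly like the sketch below. It is a minimal, self-contained approximation, not the PR's code: `EmbeddingsManagerStub` is a hypothetical stand-in for the base class (it is assumed here to accept `min_batch_size` and `max_batch_size` as keyword arguments), the two subclass names are invented for the comparison, and the model name is only an example value.

```python
from typing import List, Optional


class EmbeddingsManagerStub:
  """Hypothetical stand-in for the base class; not Beam's EmbeddingsManager."""
  def __init__(
      self,
      columns: List[str],
      *,
      min_batch_size: Optional[int] = None,
      max_batch_size: Optional[int] = None,
      **kwargs):
    self.columns = columns
    self.min_batch_size = min_batch_size
    self.max_batch_size = max_batch_size
    # Anything the base class does not recognize is kept as-is.
    self.extra_kwargs = kwargs


class ImageEmbeddingsViaKwargs(EmbeddingsManagerStub):
  """Shape used in the diff: batch sizes travel up inside **kwargs."""
  def __init__(self, model_name: str, columns: List[str], **kwargs):
    self.model_name = model_name
    super().__init__(columns, **kwargs)


class ImageEmbeddingsExplicit(EmbeddingsManagerStub):
  """Alternative: batch sizes are named in the constructor signature."""
  def __init__(
      self,
      model_name: str,
      columns: List[str],
      min_batch_size: Optional[int] = None,
      max_batch_size: Optional[int] = None,
      **kwargs):
    self.model_name = model_name
    super().__init__(
        columns,
        min_batch_size=min_batch_size,
        max_batch_size=max_batch_size,
        **kwargs)


# Both variants accept the same call; only the visible signature differs.
via_kwargs = ImageEmbeddingsViaKwargs(
    "clip-ViT-B-32", ["image"], min_batch_size=4)
explicit = ImageEmbeddingsExplicit(
    "clip-ViT-B-32", ["image"], min_batch_size=4)
```

Under these assumptions both variants behave the same at runtime. The explicit version mainly makes the documented arguments discoverable from the signature, at the cost of repeating them in each subclass.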