jrmccluskey commented on code in PR #35677:
URL: https://github.com/apache/beam/pull/35677#discussion_r2285680282
##########
sdks/python/apache_beam/ml/transforms/embeddings/vertex_ai.py:
##########
@@ -281,3 +294,194 @@ def get_ptransform_for_processing(self, **kwargs) -> beam.PTransform:
     return RunInference(
         model_handler=_ImageEmbeddingHandler(self),
         inference_args=self.inference_args)
+
+
+@dataclass
+class VertexAIMultiModalInput:
+  image: Optional[Image] = None
+  video: Optional[Video] = None
+  contextual_text: Optional[str] = None
Review Comment:
So for something like making the mimeType specifiable, that would likely be composed in the Image object itself (if you look at get_embeddings() [from Vertex](https://cloud.google.com/vertex-ai/generative-ai/docs/reference/python/1.71.0/vertexai.vision_models.MultiModalEmbeddingModel#vertexai_vision_models_MultiModalEmbeddingModel_get_embeddings), those objects are how we construct the request), not something we'd have to explicitly support. If we really wanted to support it now, we'd be rolling our own request code, which I don't think is in our best interest.
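
To make that concrete, here's a rough sketch of what request composition looks like on the Vertex SDK side (the file path and prompt text are illustrative; see the linked get_embeddings() docs for the full surface):

```python
# Rough sketch of Vertex SDK request composition; the file path and
# contextual_text below are made-up values, not anything from this PR.
from vertexai.vision_models import Image, MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# The Image object owns the raw bytes (and with them, effectively the
# mime type), so anything mimeType-related is handled here rather than
# in a field we'd expose on the Beam side.
image = Image(image_bytes=open("example.jpg", "rb").read())

response = model.get_embeddings(
    image=image,
    contextual_text="a photo of something",
)
print(response.image_embedding, response.text_embedding)
```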
I think my main hangup with adding a bunch of wrappers around each field is the ease of use of the transform. The whole point of adding the input adapters was so users wouldn't have to compose the input dataclass themselves; they just pass the fields to us and we do the rest. I do worry about overcomplicating this in anticipation of _maybe_ having additional configuration options added to Vertex later. Admittedly, I'm not as familiar with multimodal embedding frameworks beyond Vertex, so maybe that's a reasonable expectation.
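
For reference, the adapter shape I have in mind is roughly this (the function name is hypothetical, just to show the user-facing surface; it assumes the VertexAIMultiModalInput dataclass lands where this diff puts it):

```python
# Hypothetical sketch of the input-adapter idea: the user hands us plain
# fields and we compose the dataclass (and the Vertex objects) for them.
from typing import Optional

from vertexai.vision_models import Image

# Assumes the dataclass added in this diff.
from apache_beam.ml.transforms.embeddings.vertex_ai import (
    VertexAIMultiModalInput)


def to_multimodal_input(
    image_bytes: Optional[bytes] = None,
    contextual_text: Optional[str] = None) -> VertexAIMultiModalInput:
  """Builds the input dataclass from raw fields on the user's behalf."""
  image = Image(image_bytes=image_bytes) if image_bytes is not None else None
  return VertexAIMultiModalInput(
      image=image, contextual_text=contextual_text)
```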
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]