jrmccluskey commented on code in PR #35677:
URL: https://github.com/apache/beam/pull/35677#discussion_r2298373705


##########
sdks/python/apache_beam/ml/transforms/embeddings/vertex_ai.py:
##########
@@ -281,3 +294,194 @@ def get_ptransform_for_processing(self, **kwargs) -> beam.PTransform:
     return RunInference(
         model_handler=_ImageEmbeddingHandler(self),
         inference_args=self.inference_args)
+
+
+@dataclass
+class VertexAIMultiModalInput:
+  image: Optional[Image] = None
+  video: Optional[Video] = None
+  contextual_text: Optional[str] = None

Review Comment:
   It is __not__ part of the image object today, but given how get_embeddings() 
works, there would be no way to specify it otherwise unless they significantly 
expanded the function signature. 
   
   There was some conversation with @claudevdm about expanding Chunks to be 
capable of containing multimodal data and writing a separate input/output 
adapter pair for them, but I suppose there's separate merit in simply 
accepting text Chunks here. 
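   
   A minimal sketch of what accepting text-only Chunks could look like, not 
the PR's implementation. `TextChunk`, `MultiModalInput`, `from_text_chunk` and 
`to_embedding_kwargs` are hypothetical names used only for illustration; 
`MultiModalInput` mirrors the fields of `VertexAIMultiModalInput` in the diff 
above, with `Any` standing in for the Vertex AI `Image` and `Video` types, and 
the sketch assumes get_embeddings() takes `image`, `video` and 
`contextual_text` as separate keyword arguments. 

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TextChunk:
  """Hypothetical stand-in for a text-only Chunk (not Beam's Chunk type)."""
  text: str


@dataclass
class MultiModalInput:
  """Mirrors the fields of VertexAIMultiModalInput from the diff.

  Any stands in for the Vertex AI Image and Video types so that the
  sketch runs without the SDK installed.
  """
  image: Optional[Any] = None
  video: Optional[Any] = None
  contextual_text: Optional[str] = None


def from_text_chunk(chunk: TextChunk) -> MultiModalInput:
  # A text-only chunk has no image or video, so only contextual_text is set.
  return MultiModalInput(contextual_text=chunk.text)


def to_embedding_kwargs(item: MultiModalInput) -> dict:
  # Collect only the populated fields as keyword arguments for a
  # get_embeddings()-style call (assumed signature, see the note above).
  fields = {
      'image': item.image,
      'video': item.video,
      'contextual_text': item.contextual_text,
  }
  return {name: value for name, value in fields.items() if value is not None}
```

   Keeping the text as its own field, rather than attaching it to the image, 
matches the point above: the text travels as a separate argument, so a 
text-only input needs no image at all. 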



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
