jlaneve commented on code in PR #34891:
URL: https://github.com/apache/airflow/pull/34891#discussion_r1363762415


##########
airflow/providers/postgres/operators/postgres.py:
##########
@@ -80,3 +86,60 @@ def __init__(
             AirflowProviderDeprecationWarning,
             stacklevel=2,
         )
+
+
+class PgVectorIngestOperator(BaseOperator):
+    """
+    Operator for ingesting text and embeddings into a PostgreSQL database using the pgvector library.
+
+    :param conn_id: The connection ID for the postgresql database.
+    :param input_data: Tuple containing the string input content and corresponding list of float vector
+        embeddings.
+    :param input_callable: A callable that returns a tuple containing the string input content and
+        corresponding list of float vector embeddings, if ``input_data`` is not provided.
+    :param input_callable_args: Positional arguments for the 'input_callable'.
+    :param input_callable_kwargs: Keyword arguments for the 'input_callable'.
+    :param kwargs: Additional keyword arguments for the BaseOperator.
+    """
+
+    def __init__(
+        self,
+        conn_id: str,
+        input_data: tuple[str, list[float]] | None = None,
+        input_callable: Callable[[Any], Any] | None = None,
+        input_callable_args: Collection[Any] | None = None,
+        input_callable_kwargs: Mapping[str, Any] | None = None,

Review Comment:
   yes, the purpose here is to offer an alternative to storing things in XComs. 
depending on how much data you're working with, passing a large number of 
vectors through XComs can be far from ideal (especially if you don't have a 
custom XCom backend). letting the user run the same data-fetching code inside 
the task instead means we don't pollute XComs.
   
   this has disadvantages though, particularly around retries. that's why there 
are two input methods, so the user can decide which is right for them: 
(1) from XComs and (2) with a callable.
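
   a minimal sketch of how the two input methods could be resolved at execute 
time. this is illustrative only, not the code in this PR: `resolve_input` and 
`fetch_embedding` are hypothetical names, and it assumes that `input_data` 
takes precedence when both are supplied.

```python
from __future__ import annotations

from collections.abc import Callable, Collection, Mapping
from typing import Any


def resolve_input(
    input_data: tuple[str, list[float]] | None = None,
    input_callable: Callable[..., Any] | None = None,
    input_callable_args: Collection[Any] | None = None,
    input_callable_kwargs: Mapping[str, Any] | None = None,
) -> tuple[str, list[float]]:
    """Hypothetical helper: return the (text, embedding) pair to ingest."""
    # (1) Data handed in directly, e.g. templated from an upstream XCom.
    if input_data is not None:
        return input_data
    # (2) Otherwise fetch the data inside the task, so nothing large
    # is pushed through XComs.
    if input_callable is None:
        raise ValueError("Provide either input_data or input_callable.")
    args = input_callable_args or ()
    kwargs = input_callable_kwargs or {}
    return input_callable(*args, **kwargs)


def fetch_embedding(text: str) -> tuple[str, list[float]]:
    """Hypothetical stand-in for real data-fetching and embedding code."""
    return text, [0.1, 0.2, 0.3]


if __name__ == "__main__":
    print(resolve_input(input_data=("hello", [1.0, 2.0])))
    print(resolve_input(input_callable=fetch_embedding, input_callable_args=["hello"]))
```

   with option (2), the callable is invoked on every task attempt, so the 
fetch is repeated if the task is retried.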


