uranusjr commented on code in PR #34891:
URL: https://github.com/apache/airflow/pull/34891#discussion_r1363227895


##########
airflow/providers/postgres/hooks/postgres.py:
##########
@@ -320,6 +320,34 @@ def _generate_insert_sql(
 
         return sql
 
+    def ingest_embedding(self, table: str, input_data: Iterable[tuple[str, list[float]]], vector_size: int) -> None:
+        """
+        Store embedding vector in Postgres table.
+
+        :param table: The Name of the table
+        :param input_data: Iterable containing tuples of input data and corresponding embedding vectors.
+        :param vector_size: The size of vector. The maximum dimensions can be 2,000
+        """
+        from pgvector.psycopg import register_vector
+        from psycopg2 import sql
+
+        self.conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
+        register_vector(self.conn)
+
+        create_table_query = sql.SQL(
+            "CREATE TABLE IF NOT EXISTS {} (id bigserial PRIMARY KEY, content text, embedding vector({}))"
+        ).format(sql.Identifier(table), sql.Literal(vector_size))
+
+        self.conn.execute(create_table_query)
+
+        for data_item in input_data:
+            insert_query = sql.SQL("INSERT INTO {} (content, embedding) VALUES (%s, %s)").format(
+                sql.Identifier(table)
+            )
+            content = data_item[0]
+            embedding = data_item[1]
+            self.conn.execute(insert_query, (content, embedding))

Review Comment:
   ```suggestion
               self.conn.execute(insert_query, data_item)
   ```
   
   The unpacking seems unnecessary, since the two values are immediately packed back into a tuple?
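
A minimal sketch of why the two forms should behave the same, assuming every item in `input_data` is exactly a `(content, embedding)` 2-tuple as the type hint states. `RecordingConn`, `ingest_unpacked`, and `ingest_direct` are hypothetical names used only for illustration; they are not part of the hook or of any database driver.

```python
class RecordingConn:
    """Hypothetical stand-in for a connection; it only records execute() calls."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


def ingest_unpacked(conn, query, rows):
    # Shape of the loop in the diff: unpack, then rebuild the same tuple.
    for data_item in rows:
        content = data_item[0]
        embedding = data_item[1]
        conn.execute(query, (content, embedding))


def ingest_direct(conn, query, rows):
    # Shape of the suggested change: pass the tuple through untouched.
    for data_item in rows:
        conn.execute(query, data_item)


rows = [("hello", [0.1, 0.2]), ("world", [0.3, 0.4])]
query = "INSERT INTO t (content, embedding) VALUES (%s, %s)"

unpacked_conn, direct_conn = RecordingConn(), RecordingConn()
ingest_unpacked(unpacked_conn, query, rows)
ingest_direct(direct_conn, query, rows)

# Both loops hand execute() identical arguments for 2-tuple input.
same = unpacked_conn.calls == direct_conn.calls
```

The equivalence depends on the input shape: if an item were a list or carried extra elements, the direct form would pass it through as-is, whereas the unpacked form always builds a fresh 2-tuple.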



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
