uranusjr commented on code in PR #36085: URL: https://github.com/apache/airflow/pull/36085#discussion_r1418707392
##########
airflow/providers/weaviate/hooks/weaviate.py:
##########
@@ -135,22 +141,52 @@ def create_schema(self, schema_json: dict[str, Any]) -> None:
         client = self.conn
         client.schema.create(schema_json)
 
+    @staticmethod
+    def check_http_error_should_retry(exc: BaseException):
+        return isinstance(exc, requests.HTTPError) and not exc.response.ok
+
     def batch_data(
-        self, class_name: str, data: list[dict[str, Any]], batch_config_params: dict[str, Any] | None = None
+        self,
+        class_name: str,
+        data: list[dict[str, Any]] | pd.DataFrame,
+        batch_config_params: dict[str, Any] | None = None,
+        vector_col: str = "Vector",
+        retry_attempts_per_object: int = 5,
     ) -> None:
+        """
+        Add multiple objects or object references at once into weaviate.
+
+        :param class_name: The name of the class that objects belongs to.
+        :param data: list or dataframe of objects we want to add.
+        :param batch_config_params: dict of batch configuration option.
+            .. seealso:: `batch_config_params options <https://weaviate-python-client.readthedocs.io/en/v3.25.3/weaviate.batch.html#weaviate.batch.Batch.configure>`__
+        :param vector_col: name of the column containing the vector.
+        :param retry_attempts_per_object: number of time to try in case of failure before giving up.
+        """
+        import pandas as pd

Review Comment:
   Since Weaviate does not strictly require Pandas to function, it would be better to do something like

   ```python
   with contextlib.suppress(ImportError):
       import pandas

       if isinstance(data, pandas.DataFrame):
           ...
   ```

   If the import fails, `data` can never be a DataFrame (it’s impossible to create without Pandas installed), so we can safely guard the check in a try-except.
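For illustration, the guard suggested above might be wrapped in a small helper along these lines. This is only a sketch: the name `to_records` and the choice to normalize everything to a list of dicts are assumptions made for the example, not code from the PR or the Weaviate hook.

```python
from __future__ import annotations

import contextlib
from typing import Any


def to_records(data: Any) -> list[dict[str, Any]]:
    """Return ``data`` as a list of dicts, converting a DataFrame if given one.

    Hypothetical helper for illustration only; not part of the PR.
    """
    # If pandas is not installed, the import raises ImportError, the rest of
    # the block is skipped, and execution continues after the ``with``.
    with contextlib.suppress(ImportError):
        import pandas

        if isinstance(data, pandas.DataFrame):
            # One dict per row, keyed by column name.
            return data.to_dict(orient="records")

    # Reached when pandas is missing or ``data`` is not a DataFrame.
    return list(data)
```

One caveat with this pattern: `contextlib.suppress(ImportError)` also swallows any `ImportError` raised by other statements inside the block, so it seems safest to keep the guarded body as small as it is here.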