SameerMesiah97 commented on code in PR #66929:
URL: https://github.com/apache/airflow/pull/66929#discussion_r3251292312
##########
providers/google/src/airflow/providers/google/suite/operators/sheets.py:
##########
@@ -33,6 +34,8 @@ class GoogleSheetsCreateSpreadsheetOperator(BaseOperator):
:param spreadsheet: an instance of Spreadsheet
https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet
+ :param drive_id: Shared drive id where user want to store spreadsheet. In
case when user want to use the operator
+ with service account. Modern Google Security Policy restricts having
storage space on SA.
Review Comment:
This could be phrased better. Refer to the below for guidance:
```
:param drive_id: Shared Drive ID where the spreadsheet should be created.
This is useful when using a service account, since service accounts
do not have personal Drive storage.
```
##########
providers/google/src/airflow/providers/google/suite/operators/sheets.py:
##########
@@ -65,14 +70,42 @@ def __init__(
self.spreadsheet = spreadsheet
self.impersonation_chain = impersonation_chain
self.api_endpoint = api_endpoint
+ self.drive_id = drive_id
def execute(self, context: Any) -> dict[str, Any]:
+ if self.drive_id:
+ spreadsheet = self._create_spreadsheet_via_google_drive_api()
+ else:
+ spreadsheet = self._create_spreadsheet_via_sheets_api()
+ context["task_instance"].xcom_push(key="spreadsheet_id",
value=spreadsheet["spreadsheetId"])
+ context["task_instance"].xcom_push(key="spreadsheet_url",
value=spreadsheet["spreadsheetUrl"])
+ return spreadsheet
+
+ def _construct_spreadsheet_metadata(self, spreadsheet) -> dict:
+ return {
+ "name": spreadsheet["properties"]["title"],
+ "mimeType": "application/vnd.google-apps.spreadsheet",
+ "parents": [self.drive_id],
Review Comment:
Don't see why this helper is needed. I would roll this behavior into
`_create_spreadsheet_via_google_drive_api` where it is being invoked. Also,
won't `"parents": [self.drive_id]` be `"parents": [None]` if `drive_id` is not
set? Will this result in a bug?
##########
providers/google/tests/system/google/cloud/gcs/example_gcs_to_sheets.py:
##########
@@ -88,7 +99,10 @@ def create_connection(connection_id: str):
create_connection_task = create_connection(connection_id=CONNECTION_ID)
create_spreadsheet = GoogleSheetsCreateSpreadsheetOperator(
- task_id="create_spreadsheet", spreadsheet=SPREADSHEET,
gcp_conn_id=CONNECTION_ID
+ task_id="create_spreadsheet",
+ spreadsheet=SPREADSHEET,
+ gcp_conn_id=CONNECTION_ID,
+ drive_id=GDRIVE_ID,
Review Comment:
It seems like you are changing the codepath for an existing task. I think
you should remove the GDRIVE_ID argument. If you want to illustrate the new
`drive_id` behavior, it would be better to add a new
`GoogleSheetsCreateSpreadsheetOperator` task and maybe call it
`create_spreadsheet_shared_drive`. You can rename the existing one to
`create_spreadsheet_default`. But then this complicates the dependency chain
for the tasks. Maybe it is better to leave this unchanged.
##########
providers/google/tests/system/google/cloud/gcs/example_gcs_to_sheets.py:
##########
@@ -65,15 +68,23 @@
schedule="@once", # Override to match your needs
catchup=False,
tags=["example", "gcs"],
+ render_template_as_native_obj=True,
) as dag:
+
+ @task
+ def get_shared_drive_id() -> str:
+ return get_secret(secret_id=GDRIVE_SECRET_ID).strip()
+
+ get_shared_drive_id_task = get_shared_drive_id()
Review Comment:
Why is this here?
##########
providers/google/tests/system/google/cloud/sql_to_sheets/example_sql_to_sheets.py:
##########
@@ -248,7 +259,10 @@ def setup_sheets_connection():
setup_sheets_connection_task = setup_sheets_connection()
create_spreadsheet = GoogleSheetsCreateSpreadsheetOperator(
- task_id="create_spreadsheet", spreadsheet=SPREADSHEET,
gcp_conn_id=SHEETS_CONNECTION_ID
+ task_id="create_spreadsheet",
+ spreadsheet=SPREADSHEET,
+ gcp_conn_id=SHEETS_CONNECTION_ID,
+ drive_id=GDRIVE_ID,
Review Comment:
And this file.
##########
providers/google/src/airflow/providers/google/suite/hooks/drive.py:
##########
@@ -322,3 +322,19 @@ def download_file(self, file_id: str, file_handle: IO,
chunk_size: int = 100 * 1
"""
request = self.get_media_request(file_id=file_id)
self.download_content_from_request(file_handle=file_handle,
request=request, chunk_size=chunk_size)
+
+ def create_file(self, file_metadata) -> dict[str, Any]:
Review Comment:
You have introduced a new public hook API called `create_file`, but most
request parameters are still hardcoded internally (`fields`,
`supportsAllDrives`, etc.). We should probably expose additional request
parameters (or accept **kwargs) to avoid locking the public API into a very
narrow behavior.
##########
providers/google/tests/system/google/cloud/gcs/example_sheets.py:
##########
@@ -101,7 +112,10 @@ def create_connection(connection_id: str):
# [START create_spreadsheet]
create_spreadsheet = GoogleSheetsCreateSpreadsheetOperator(
- task_id="create_spreadsheet", spreadsheet=SPREADSHEET,
gcp_conn_id=CONNECTION_ID
+ task_id="create_spreadsheet",
+ spreadsheet=SPREADSHEET,
+ gcp_conn_id=CONNECTION_ID,
+ drive_id=GDRIVE_ID,
Review Comment:
All of the comments above apply to this file as well.
##########
providers/google/tests/system/google/cloud/gcs/example_sheets_to_gcs.py:
##########
@@ -88,7 +99,10 @@ def create_connection(connection_id: str):
create_connection_task = create_connection(connection_id=CONNECTION_ID)
create_spreadsheet = GoogleSheetsCreateSpreadsheetOperator(
- task_id="create_spreadsheet", spreadsheet=SPREADSHEET,
gcp_conn_id=CONNECTION_ID
+ task_id="create_spreadsheet",
+ spreadsheet=SPREADSHEET,
+ gcp_conn_id=CONNECTION_ID,
+ drive_id=GDRIVE_ID,
Review Comment:
This file too.
##########
providers/google/src/airflow/providers/google/suite/operators/sheets.py:
##########
@@ -65,14 +70,42 @@ def __init__(
self.spreadsheet = spreadsheet
self.impersonation_chain = impersonation_chain
self.api_endpoint = api_endpoint
+ self.drive_id = drive_id
def execute(self, context: Any) -> dict[str, Any]:
+ if self.drive_id:
+ spreadsheet = self._create_spreadsheet_via_google_drive_api()
+ else:
+ spreadsheet = self._create_spreadsheet_via_sheets_api()
+ context["task_instance"].xcom_push(key="spreadsheet_id",
value=spreadsheet["spreadsheetId"])
+ context["task_instance"].xcom_push(key="spreadsheet_url",
value=spreadsheet["spreadsheetUrl"])
+ return spreadsheet
+
+ def _construct_spreadsheet_metadata(self, spreadsheet) -> dict:
+ return {
+ "name": spreadsheet["properties"]["title"],
+ "mimeType": "application/vnd.google-apps.spreadsheet",
+ "parents": [self.drive_id],
+ }
+
+ def _create_spreadsheet_via_google_drive_api(self) -> dict:
Review Comment:
nit: this could be renamed to `_create_spreadsheet_via_drive_api` for
brevity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]