[ https://issues.apache.org/jira/browse/AIRFLOW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895291#comment-16895291 ]
Elad commented on AIRFLOW-3503:
-------------------------------

I don't think this example code can ever work. hook.delete() can only delete a single file; you can't just pass /* and expect it to delete everything under that path. The proper way to achieve such functionality is something like:

{code:python}
def delete_folder(path_to_delete):
    """Delete files from Google Cloud Storage under the given prefix."""
    hook = GoogleCloudStorageHook(
        google_cloud_storage_conn_id=CONNECTION_ID)
    files = hook.list(
        bucket=GCS_BUCKET_ID,
        prefix=path_to_delete)
    for file in files:
        hook.delete(
            bucket=GCS_BUCKET_ID,
            object=file)
{code}

Maybe the best approach to resolve this is to do what delete_objects of [S3Hook|https://github.com/apache/airflow/blob/master/airflow/hooks/S3_hook.py#L520] does: delete_objects treats keys as a single file if it is a string and as multiple files if it is a list. With that approach you could pass the output of list() directly to delete(), which I think would simplify the process significantly.

> GoogleCloudStorageHook delete return success when nothing was done
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-3503
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3503
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: 1.10.1
>            Reporter: lot
>            Assignee: Yohei Onishi
>            Priority: Major
>              Labels: gcp, gcs, hooks
>
> I'm loading files into BigQuery from Storage using:
>
> {code:python}
> gcs_export_uri = BQ_TABLE_NAME + '/' + EXEC_TIMESTAMP_PATH + '/*'
> gcs_to_bigquery_op = GoogleCloudStorageToBigQueryOperator(
>     dag=dag,
>     task_id='load_products_to_BigQuery',
>     bucket=GCS_BUCKET_ID,
>     destination_project_dataset_table=table_name_template,
>     source_format='NEWLINE_DELIMITED_JSON',
>     source_objects=[gcs_export_uri],
>     src_fmt_configs={'ignoreUnknownValues': True},
>     create_disposition='CREATE_IF_NEEDED',
>     write_disposition='WRITE_TRUNCATE',
>     skip_leading_rows=1,
>     google_cloud_storage_conn_id=CONNECTION_ID,
>     bigquery_conn_id=CONNECTION_ID)
> {code}
>
> After that I want to delete the files, so I do:
>
> {code:python}
> def delete_folder():
>     """Delete files from Google Cloud Storage."""
>     hook = GoogleCloudStorageHook(
>         google_cloud_storage_conn_id=CONNECTION_ID)
>     hook.delete(
>         bucket=GCS_BUCKET_ID,
>         object=gcs_export_uri)
> {code}
>
> This runs with PythonOperator.
> The task is marked as Success even though nothing was deleted.
>
> Log:
> [2018-12-12 11:31:29,247] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,247] {transport.py:151} INFO - Attempting refresh to obtain initial access_token
> [2018-12-12 11:31:29,249] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,249] {client.py:795} INFO - Refreshing access_token
> [2018-12-12 11:31:29,584] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,583] {python_operator.py:90} INFO - Done. Returned value was: None
>
> I expect the function to fail and return something like "file was not found" if there is nothing to delete, or let the user decide with a specific flag whether the function should fail or succeed when no files are found.

-- 
This message was sent by Atlassian JIRA
(v7.6.14#76016)
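For reference, the string-or-list dispatch that the comment proposes (mirroring S3Hook.delete_objects) could be sketched in isolation like this. This is not Airflow code: delete_objects and _RecordingClient are hypothetical names, and the client is any object with a delete(bucket, object) method standing in for the real GCS hook.

{code:python}
def delete_objects(client, bucket, keys):
    """Delete one object if keys is a str, or many if it is a list.

    Sketch of the string-or-list dispatch used by S3Hook.delete_objects,
    applied to a hypothetical GCS client. Raises instead of silently
    succeeding when there is nothing to delete.
    """
    if isinstance(keys, str):
        keys = [keys]  # normalize a single key to a one-element list
    if not keys:
        raise ValueError("nothing to delete: key list is empty")
    for key in keys:
        client.delete(bucket=bucket, object=key)
    return keys  # report what was actually deleted


class _RecordingClient:
    """Minimal fake client that records delete calls, for illustration."""

    def __init__(self):
        self.deleted = []

    def delete(self, bucket, object):
        self.deleted.append((bucket, object))
{code}

With this shape, the output of hook.list() can be fed straight into the delete call, and an empty listing raises an error rather than returning success, which is the behavior the bug report asks for.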