[ 
https://issues.apache.org/jira/browse/AIRFLOW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895291#comment-16895291
 ] 

Elad commented on AIRFLOW-3503:
-------------------------------

I don't think this example code can ever work.
hook.delete() deletes a single object only; you can't pass a wildcard such as
/* and expect it to delete everything under that path.
The proper way to achieve this is something like:
{code:python}
def delete_folder(path_to_delete):
    """Delete all objects under a given prefix in Google Cloud Storage."""
    hook = GoogleCloudStorageHook(
        google_cloud_storage_conn_id=CONNECTION_ID)
    # list() expands the prefix to the individual object names...
    files = hook.list(
        bucket=GCS_BUCKET_ID,
        prefix=path_to_delete)
    # ...and each object has to be deleted one by one.
    for file in files:
        hook.delete(
            bucket=GCS_BUCKET_ID,
            object=file)
{code}
Maybe the best way to resolve this is to do what delete_objects of
[S3Hook|https://github.com/apache/airflow/blob/master/airflow/hooks/S3_hook.py#L520]
does: delete_objects treats keys as a single file when it is a string and as
multiple files when it is a list.

With that approach you can use the output of list() directly as the input to
delete().

I think this simplifies the process significantly.
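A rough sketch of that idea (FakeGCSHook, its in-memory store, and the exact str-vs-list dispatch are illustrative assumptions, not the real GoogleCloudStorageHook API):

```python
class FakeGCSHook:
    """Illustrative stand-in for a GCS hook whose delete() accepts str or list."""

    def __init__(self, objects):
        # In-memory set of object names standing in for a bucket's contents.
        self._objects = set(objects)

    def list(self, bucket, prefix=""):
        # Return all object names under the given prefix, like hook.list().
        return [o for o in sorted(self._objects) if o.startswith(prefix)]

    def delete(self, bucket, objects):
        # Mirror S3Hook.delete_objects: a str means one object, a list means many.
        if isinstance(objects, str):
            objects = [objects]
        missing = [o for o in objects if o not in self._objects]
        if missing:
            # Fail loudly instead of silently "succeeding" on nothing.
            raise FileNotFoundError("objects not found: %s" % missing)
        for o in objects:
            self._objects.remove(o)


hook = FakeGCSHook(["exports/a.json", "exports/b.json", "other/c.json"])
# The output of list() feeds straight into delete():
hook.delete("my-bucket", hook.list("my-bucket", prefix="exports/"))
print(hook.list("my-bucket"))  # only "other/c.json" remains
```

With this shape, delete_folder collapses to a single list()-into-delete() call, and an empty or nonexistent prefix raises instead of returning success.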

> GoogleCloudStorageHook  delete return success when nothing was done
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-3503
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3503
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: 1.10.1
>            Reporter: lot
>            Assignee: Yohei Onishi
>            Priority: Major
>              Labels: gcp, gcs, hooks
>
> I'm loading files to BigQuery from Storage using:
>  
> {code:python}
> gcs_export_uri = BQ_TABLE_NAME + '/' + EXEC_TIMESTAMP_PATH + '/*'
> gcs_to_bigquery_op = GoogleCloudStorageToBigQueryOperator(
>     dag=dag,
>     task_id='load_products_to_BigQuery',
>     bucket=GCS_BUCKET_ID,
>     destination_project_dataset_table=table_name_template,
>     source_format='NEWLINE_DELIMITED_JSON',
>     source_objects=[gcs_export_uri],
>     src_fmt_configs={'ignoreUnknownValues': True},
>     create_disposition='CREATE_IF_NEEDED',
>     write_disposition='WRITE_TRUNCATE',
>     skip_leading_rows=1,
>     google_cloud_storage_conn_id=CONNECTION_ID,
>     bigquery_conn_id=CONNECTION_ID)
> {code}
>  
> After that I want to delete the files so I do:
> {code:python}
> def delete_folder():
>     """
>     Delete files Google cloud storage
>     """
>     hook = GoogleCloudStorageHook(
>             google_cloud_storage_conn_id=CONNECTION_ID)
>     hook.delete(
>         bucket=GCS_BUCKET_ID,
>         object=gcs_export_uri)
> {code}
>  
>  
> This runs with a PythonOperator.
> The task is marked as Success even though nothing was deleted.
> Log:
> {noformat}
> [2018-12-12 11:31:29,247] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,247] {transport.py:151} INFO - Attempting refresh to obtain initial access_token
> [2018-12-12 11:31:29,249] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,249] {client.py:795} INFO - Refreshing access_token
> [2018-12-12 11:31:29,584] {base_task_runner.py:98} INFO - Subtask: [2018-12-12 11:31:29,583] {python_operator.py:90} INFO - Done. Returned value was: None
> {noformat}
>  
>  
> I expect the function to fail with something like "file was not found" when
> there is nothing to delete, or to let the user decide with a flag whether the
> task should fail or succeed when no files are found.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
