ferruzzi commented on code in PR #67144:
URL: https://github.com/apache/airflow/pull/67144#discussion_r3283454009


##########
providers/amazon/src/airflow/providers/amazon/aws/log/s3_task_handler.py:
##########
@@ -62,6 +62,8 @@ def upload(self, path: os.PathLike | str, ti: RuntimeTI):
             has_uploaded = self.write(log, remote_loc)
             if has_uploaded and self.delete_local_copy:
                 shutil.rmtree(os.path.dirname(local_loc))
+            elif has_uploaded:
+                local_loc.write_text("")

Review Comment:
   Non-blocking idea:    This is definitely an improvement in the happy case, 
but if you want to take it further in another PR, consider what happens if the 
upload to S3 fails for some reason.   I haven't thought this through all the 
way so I may be wrong.  I think has_uploaded will remain false, so the local 
copy doesn't get truncated, then on the next pass we still write with 
duplication.  If that's the case, you could tail the s3 log before uploading to 
check for duplicates and trim those before writing?  
   
   
   Just an idea.  Thanks for fixing this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to