This is an automated email from the ASF dual-hosted git repository.

eladkal pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new f9c952bb7f Fix broken log streaming from #35677 (#36127)
f9c952bb7f is described below

commit f9c952bb7f5b8dd321bd33697d3fcddc16c39d8c
Author: stavdav <58971745+stavdav...@users.noreply.github.com>
AuthorDate: Sat Dec 9 11:09:56 2023 +0100

    Fix broken log streaming from #35677 (#36127)
    
    * Fixes #28452: "TaskInstances do not succeed when using 
enable_logging=True option in DockerSwarmOperator"
    It introduces logging of Docker Swarm services which was previously not
    working.
    
    * tty=True/False to be chosen by the user as it was the case before this 
fix (#28452)
    
    * Follow=true for logs will always result in tasks not ending. This is  
standard and provided upstream by the Docker API.
    Therefore in DockerSwarmOperator follow is always false.
    
    * service_logs called multiple times as we continuously poll the Docker API 
for logs. As we indicated in the previous commmit,
    the docker client malfunctions when we try to get the logs with 
follow=True. Therefore we make multiple calls to the API (every 2 seconds), to 
fetch the new logs.
    
    * service_logs called multiple times. In this test the tasks increase (6 
instead of 5) as we check if the service has terminated (+1). As this assertion 
makes less sense in a situation where we do multiple calls to the Docker API 
(polling), we might think of removing it or replacing it with something more 
suitable.
    
    * Final commit of this PR marking the test case that validates logging in 
the Docker Swarm Operator. We log two times a different message and we assert 
that the two lines are given back in the logs in the expected sequence.
    
    * Formatting ruff
    
    * Reverting as Github actions don't run this test as a swarm node:
    
    docker.errors.APIError: 503 Server Error for 
http+docker://localhost/v1.43/services/create: Service Unavailable ("This node 
is not a swarm manager. Use "docker swarm init" or "docker swarm join" to 
connect this node to swarm and try again.")
    
    Revert "Final commit of this PR marking the test case that validates 
logging in the Docker Swarm Operator. We log two times a different message and 
we assert that the two lines are given back in the logs in the expected 
sequence."
    
    This reverts commit 048ba1ece25fcd02c5ccc752d63759c76b098e44.
    
    * Logging "since" timestamp to avoid memory issues
    Fix for #28452
    
    * Formatting - Fix for #28452
    
    * Fixes #28452: "TaskInstances do not succeed when using 
enable_logging=True option in DockerSwarmOperator"
    It introduces logging of Docker Swarm services which was previously not
    working.
    
    * tty=True/False to be chosen by the user as it was the case before this 
fix (#28452)
    
    * Follow=true for logs will always result in tasks not ending. This is  
standard and provided upstream by the Docker API.
    Therefore in DockerSwarmOperator follow is always false.
    
    * service_logs called multiple times as we continuously poll the Docker API 
for logs. As we indicated in the previous commmit,
    the docker client malfunctions when we try to get the logs with 
follow=True. Therefore we make multiple calls to the API (every 2 seconds), to 
fetch the new logs.
    
    * service_logs called multiple times. In this test the tasks increase (6 
instead of 5) as we check if the service has terminated (+1). As this assertion 
makes less sense in a situation where we do multiple calls to the Docker API 
(polling), we might think of removing it or replacing it with something more 
suitable.
    
    * Final commit of this PR marking the test case that validates logging in 
the Docker Swarm Operator. We log two times a different message and we assert 
that the two lines are given back in the logs in the expected sequence.
    
    * Formatting ruff
    
    * Reverting as Github actions don't run this test as a swarm node:
    
    docker.errors.APIError: 503 Server Error for 
http+docker://localhost/v1.43/services/create: Service Unavailable ("This node 
is not a swarm manager. Use "docker swarm init" or "docker swarm join" to 
connect this node to swarm and try again.")
    
    Revert "Final commit of this PR marking the test case that validates 
logging in the Docker Swarm Operator. We log two times a different message and 
we assert that the two lines are given back in the logs in the expected 
sequence."
    
    This reverts commit 048ba1ece25fcd02c5ccc752d63759c76b098e44.
    
    * Logging "since" timestamp to avoid memory issues
    Fix for #28452
    
    * Formatting - Fix for #28452
    
    * Fix bugs (#28452): Correctly assign last_line_logged, last_timestamp
---
 airflow/providers/docker/operators/docker_swarm.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/airflow/providers/docker/operators/docker_swarm.py 
b/airflow/providers/docker/operators/docker_swarm.py
index 5622d87437..54158db0ff 100644
--- a/airflow/providers/docker/operators/docker_swarm.py
+++ b/airflow/providers/docker/operators/docker_swarm.py
@@ -201,6 +201,12 @@ class DockerSwarmOperator(DockerOperator):
                 match = 
re.match(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{6,}Z) (.*)", line)
                 timestamp, message = match.groups()
                 self.log.info(message)
+
+            if len(logs) == 0:
+                return last_line_logged, since
+
+            last_line_logged = line
+
             # Floor nanoseconds to microseconds
             last_timestamp = re.sub(r"(\.\d{6})\d+Z", r"\1Z", timestamp)
             last_timestamp = datetime.strptime(last_timestamp, 
"%Y-%m-%dT%H:%M:%S.%fZ")

Reply via email to