asaf400 opened a new issue #18874: URL: https://github.com/apache/airflow/issues/18874
### Apache Airflow version

2.1.3

### Operating System

ubuntu 20.04

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==1.3.0
apache-airflow-providers-celery==1.0.1
apache-airflow-providers-databricks==1.0.1
apache-airflow-providers-docker==1.2.0
apache-airflow-providers-ftp==1.0.1
apache-airflow-providers-http==1.1.1
apache-airflow-providers-imap==1.0.1
apache-airflow-providers-oracle==1.1.0
apache-airflow-providers-postgres==1.0.2
apache-airflow-providers-presto==1.0.2
apache-airflow-providers-sftp==1.2.0
apache-airflow-providers-sqlite==1.0.2
apache-airflow-providers-ssh==1.3.0
apache-airflow-providers-tableau==1.0.0

### Deployment

Virtualenv installation

### Deployment details

_No response_

### What happened

When using DockerOperator with `do_xcom_push=True`, the Airflow documentation states that unless `xcom_all` is also True, the default behavior is to push only the last line of the docker output into the `return_value` XCom. This statement is false: as far as I can tell, `do_xcom_push` actually pushes the entire chunked docker output as a single merged string containing all the lines.

See this code line: https://github.com/apache/airflow/blob/10023fdd65fa78033e7125d3d8103b63c127056e/airflow/providers/docker/operators/docker.py#L258

`lines` is the generator that `cli.attach` returns earlier in the code; the generator comes from docker-py, see here: https://github.com/docker/docker-py/blob/7172269b067271911a9e643ebdcdca8318f2ded3/docker/api/client.py#L418

Following the docker-py API code into the generator logic (socket reads and so on), we can conclude that the assumption that `cli.attach` yields lines is wrong: it yields raw chunks. Therefore `do_xcom_push` breaks the contract set by the parameter's documentation, and it never actually returns the last line.
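Since the `attach` generator yields raw socket chunks rather than lines, recovering the documented "last line only" behavior would mean joining the chunks and splitting on newlines. A minimal sketch of that idea (the `last_line` helper is hypothetical, not existing Airflow or docker-py code):

```python
def last_line(chunks):
    """Join raw byte chunks from a streaming API (such as docker-py's
    attach generator), split the result into lines, and return the
    last non-empty line decoded as UTF-8. Hypothetical sketch only."""
    data = b"".join(chunks)
    lines = [line for line in data.splitlines() if line]
    return lines[-1].decode("utf-8") if lines else None

# The attach generator may yield arbitrary chunk boundaries,
# so a single "line" can be split across several chunks:
chunks = [b"tes", b"t\ntes", b"t2\n"]
print(last_line(chunks))  # test2
```

Something like this would let the operator honor the documented contract regardless of how the socket happens to chunk the output.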
Until this issue is fixed, it is probably better to also set `xcom_all=True`, because that path at least uses `cli.logs`, which is a safer interface. We probably need to open a matching issue in the docker-py repo, but maybe Airflow should just handle the stream itself, split it into lines, and return the last line. See this example, modified from another issue on XComs (thank you @nullhack):

```python
from docker import APIClient

d = APIClient()
c = d.create_container(
    image='ubuntu:20.04',
    name='TEST',
    command="""bash -c "echo 'test' && echo 'test2'" """,
    host_config=d.create_host_config(auto_remove=False, network_mode='bridge'),
)
gen = d.attach(c['Id'], stderr=True, stdout=True, stream=True)
d.start(c['Id'])
print([i for i in gen])
print([i for i in d.logs(c['Id'], stream=True, stdout=True, stderr=True)])
d.remove_container(c['Id'])
```

Related issues (this is a real nasty issue; I'm also hitting it):

- https://github.com/apache/airflow/issues/15952
- https://github.com/apache/airflow/issues/9164
- https://github.com/apache/airflow/issues/14809

### What you expected to happen

_No response_

### How to reproduce

_No response_

### Anything else

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)