asaf400 opened a new issue #18874:
URL: https://github.com/apache/airflow/issues/18874


   ### Apache Airflow version
   
   2.1.3
   
   ### Operating System
   
   ubuntu 20.04
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==1.3.0
   apache-airflow-providers-celery==1.0.1
   apache-airflow-providers-databricks==1.0.1
   apache-airflow-providers-docker==1.2.0
   apache-airflow-providers-ftp==1.0.1
   apache-airflow-providers-http==1.1.1
   apache-airflow-providers-imap==1.0.1
   apache-airflow-providers-oracle==1.1.0
   apache-airflow-providers-postgres==1.0.2
   apache-airflow-providers-presto==1.0.2
   apache-airflow-providers-sftp==1.2.0
   apache-airflow-providers-sqlite==1.0.2
   apache-airflow-providers-ssh==1.3.0
   apache-airflow-providers-tableau==1.0.0
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When using DockerOperator with `do_xcom_push` set to True, the Airflow documentation states that, when `xcom_all` is left at its default of False, Airflow pushes only the last line of the Docker output into the XCom `return_value`. This statement is false.
   
   As far as I can tell, `do_xcom_push` just returns the entire Docker output, or a chunk of it, as a single merged string containing all the lines.
   
   See code lines:
   
https://github.com/apache/airflow/blob/10023fdd65fa78033e7125d3d8103b63c127056e/airflow/providers/docker/operators/docker.py#L258
   
   `lines` is the generator that `cli.attach` returns earlier in the code. The generator comes from docker-py, see here:
   
https://github.com/docker/docker-py/blob/7172269b067271911a9e643ebdcdca8318f2ded3/docker/api/client.py#L418
   
   Following the docker-py API code into the generator logic (socket reads and so on), we can conclude that the assumption that `cli.attach` returns lines is wrong. Therefore `do_xcom_push` is breaking the contract set by the documentation of the parameter: it never actually returns the last line.
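
To illustrate the mismatch without a running Docker daemon, here is a minimal sketch. The `chunks` list is a hypothetical stand-in for what a streaming attach call might yield (one chunk carrying two lines); it is not captured docker-py output, and the variable names are illustrative only.

```python
# Hypothetical data: a single chunk that happens to contain two lines.
chunks = [b"test\ntest2\n"]

# Treating each yielded item as a "line": the last item is the whole chunk.
last_item = chunks[-1].decode("utf-8").strip()

# Splitting the combined output on newlines recovers the real last line.
last_line = b"".join(chunks).decode("utf-8").splitlines()[-1]

print(repr(last_item))
print(repr(last_line))
```

Under this assumption, keeping the last yielded item gives both lines merged together, while only the newline split gives `test2`.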
   
   Until this issue is fixed, it is probably better to also set `xcom_all`, because that at least uses `cli.logs`, which is a safer interface. We probably need to create a matching issue in the docker-py repo, but maybe Airflow should just handle the stream itself and split it into lines, returning the last line. See this example, modified from another issue on XComs:
   
   `from docker import APIClient

   d = APIClient()

   c = d.create_container(
       image='ubuntu:20.04',
       name='TEST',
       command="""bash -c "echo 'test' && echo 'test2'" """,
       host_config=d.create_host_config(auto_remove=False, network_mode='bridge'),
   )
   gen = d.attach(c['Id'], stderr=True, stdout=True, stream=True)

   d.start(c['Id'])

   print([i for i in gen])

   print([i for i in d.logs(c['Id'], stream=True, stdout=True, stderr=True)])

   d.remove_container(c['Id'])`
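
The suggestion that Airflow split the stream into lines itself could look roughly like the sketch below. It assumes the stream is an iterable of `bytes` chunks with arbitrary boundaries; `iter_lines` and `last_line` are hypothetical helper names, not part of Airflow or docker-py, and the chunk lists in the usage lines are made up for illustration.

```python
from typing import Iterable, Iterator, Optional


def iter_lines(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Yield complete lines from byte chunks whose boundaries are arbitrary."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the final newline is complete; keep the rest buffered.
        *complete, buffer = buffer.split(b"\n")
        for raw in complete:
            yield raw.decode(encoding, errors="replace").rstrip("\r")
    if buffer:
        # Trailing output that was not terminated by a newline.
        yield buffer.decode(encoding, errors="replace").rstrip("\r")


def last_line(chunks: Iterable[bytes]) -> Optional[str]:
    """Return the last line of the stream, or None if there was no output."""
    result = None
    for line in iter_lines(chunks):
        result = line
    return result


# Hypothetical chunks: a line split across chunks, and two lines in one chunk.
print(last_line([b"te", b"st\ntes", b"t2\n"]))
print(last_line([b"test\ntest2\n"]))
```

Buffering across chunks means the result does not depend on where the socket reads happen to split the output. Blank lines are kept, so output ending in an empty line yields an empty string.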
   
   Thank you @nullhack
   Related issues: 
   https://github.com/apache/airflow/issues/15952
   https://github.com/apache/airflow/issues/9164
   https://github.com/apache/airflow/issues/14809 - This is a really nasty issue that I'm also having
   
   ### What you expected to happen
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

