aaronluo created AIRFLOW-4150:
---------------------------------

             Summary: Modify the docker operator implementation
                 Key: AIRFLOW-4150
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4150
             Project: Apache Airflow
          Issue Type: Improvement
          Components: docker
            Reporter: aaronluo


1. I create a test Python script, testpython.py, to run inside Docker:
{quote}import time

time.sleep(1000)
{quote}
 

2. I create a DAG with a task that runs the script through a DockerOperator:
{quote}docker_ls = DockerOperator(
    task_id='docker_ls',
    image='python',
    working_dir='/data/wf/',
    command='python testpython.py',
    docker_url='http://192.168.1.215:2375',
    start_date=datetime(2015, 6, 1),
    volumes=['/data/wf:/data/wf/'],
    dag=dag,
)
{quote}
 

3. When I run this DAG, the Celery worker is obviously occupied for a very long time, and the Docker container also runs for a long time. The operator blocks here:
{quote}for line in self.cli.logs(container=self.container['Id'], stream=True):
    line = line.strip()
    if hasattr(line, 'decode'):
        line = line.decode('utf-8')
    self.log.info(line)

result = self.cli.wait(self.container['Id'])
if result['StatusCode'] != 0:
    raise AirflowException('docker container failed: ' + repr(result)){quote}
 

My suggestion is that after submitting the task to Docker, the corresponding Celery worker should monitor the container's events rather than blocking on the log stream and on wait() for the whole run, because a long-running container ties up the worker for its entire duration.
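A minimal sketch of the polling idea, kept independent of the docker SDK so it is easy to test: here inspect_fn is a hypothetical stand-in for docker APIClient.inspect_container, and the function name and signature are assumptions for illustration, not an existing Airflow API.

```python
import time


def wait_without_blocking(inspect_fn, container_id,
                          poll_interval=1.0, timeout=None):
    """Poll container state periodically instead of blocking on cli.wait().

    inspect_fn is a stand-in for docker's APIClient.inspect_container; it is
    expected to return a dict shaped like
    {'State': {'Running': bool, 'ExitCode': int}}.
    """
    start = time.monotonic()
    while True:
        state = inspect_fn(container_id)['State']
        if not state['Running']:
            # Container finished; hand back its exit code.
            return state['ExitCode']
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError('container %s still running after %ss'
                               % (container_id, timeout))
        # Sleep between checks so the worker can yield instead of
        # sitting in a blocking wait() call.
        time.sleep(poll_interval)
```

Between polls the worker could heartbeat or pick up other bookkeeping; the same loop could equally be driven by docker's events stream instead of inspect calls.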



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
