[jira] [Updated] (AIRFLOW-1148) Airflow cannot handle datetime(6) column values(execution_time, start_date, end_date)

2017-04-25 Thread Maoya Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maoya Sato updated AIRFLOW-1148:

Description: 
Airflow cannot handle datetime(6) column values (execution_date, start_date, end_date, etc.).
{code}
mysql> select dag_id, execution_date from dag_run;
+----------+----------------------------+
| dag_id   | execution_date             |
+----------+----------------------------+
| test_dag | 2017-04-26 13:15:00.000000 |
+----------+----------------------------+
{code}
{code}
>>> from airflow import settings
>>> session = settings.Session()
>>> from airflow.models import DagRun
>>> dag = session.query(DagRun).filter_by(dag_id='test_dag').first()
>>> dag.execution_date
>>>
{code}
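To rule out the stored data itself, one can select the value as a plain string, bypassing the driver's datetime conversion; a minimal sketch (same connection settings, illustrative only):
{code}
# Read the raw value straight through the session, cast to a string, to
# confirm the row itself is intact and only the datetime parsing fails.
from airflow import settings

session = settings.Session()
row = session.execute(
    "SELECT dag_id, CAST(execution_date AS CHAR) FROM dag_run "
    "WHERE dag_id = 'test_dag'"
).first()
print(row)  # expected something like ('test_dag', '2017-04-26 13:15:00.000000')
{code}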
execution_date comes back as None, though it should be something like datetime(2017, 4, 26, 13, 15).
As far as I can tell, datetime(6) is the cause: if I try with a datetime column without fractional seconds precision, it works.
It has something to do with this migration (which adds fsp to the datetime columns):
https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/4addfa1236f1_add_fractional_seconds_to_mysql_tables.py
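For context, a minimal sketch of the kind of change that migration makes (an assumption based on its name and effect, not a copy of the linked file, which likely touches more tables and columns than shown here):
{code}
# Assumed shape of the migration (illustrative; only dag_run shown).
from alembic import op
from sqlalchemy.dialects import mysql


def upgrade():
    # fsp=6 makes MySQL store microseconds, so values render as
    # '2017-04-26 13:15:00.000000'; a driver that cannot parse that
    # string can end up handing the ORM a None instead of a datetime.
    op.alter_column('dag_run', 'execution_date', type_=mysql.DATETIME(fsp=6))


def downgrade():
    op.alter_column('dag_run', 'execution_date', type_=mysql.DATETIME())
{code}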

I've created a simple DAG (Python 2):
{code}
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta, datetime

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 4, 26, 13, 15),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'queue': 'airflow-dev',
    'end_date': datetime(2017, 4, 27, 0, 0)
}

dag = DAG(
    'test_dag',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(minutes=1))

t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)
{code}

The error below occurs:
{code}
{jobs.py:354} DagFileProcessor3 ERROR - Got an exception! Propagating...
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 346, in helper
    pickle_dags)
  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 1583, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 1173, in _process_dags
    dag_run = self.create_dag_run(dag)
  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 803, in create_dag_run
    while next_run_date <= last_run.execution_date:
TypeError: can't compare datetime.datetime to NoneType
{code}
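The comparison failure is easy to reproduce in isolation; a minimal sketch (plain Python 2, matching the traceback above):
{code}
from datetime import datetime

next_run_date = datetime(2017, 4, 26, 13, 16)
execution_date = None  # what the DagRun column comes back as here

# Raises: TypeError: can't compare datetime.datetime to NoneType
next_run_date <= execution_date
{code}
(On Python 3 the message differs, but the comparison fails the same way.)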

[jira] [Created] (AIRFLOW-1148) Airflow cannot handle datetime(6) column values(execution_time, start_date, end_date)

2017-04-25 Thread Maoya Sato (JIRA)
Maoya Sato created AIRFLOW-1148:
---

 Summary: Airflow cannot handle datetime(6) column 
values(execution_time, start_date, end_date)
 Key: AIRFLOW-1148
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1148
 Project: Apache Airflow
  Issue Type: Bug
  Components: DagRun
Affects Versions: 1.8.0
 Environment: sql_alchemy_conn: cloudSQL via cloud_sql_proxy
celery broker: amazon SQS
Reporter: Maoya Sato


Airflow cannot handle datetime(6) column values

{code}
>>> from airflow import settings
>>> session = settings.Session()
>>> from airflow.models import DagRun
>>> dag = session.query(DagRun).filter_by(dag_id='test_dag').first()
>>> dag.execution_date
>>>
{code}
execution_date comes back as None, though it should be something like datetime(2017, 4, 26, 13, 10).
As far as I can tell, datetime(6) is the cause: if I try with a datetime column without fractional seconds precision, it works.
It has something to do with this migration (which adds fsp to the datetime columns):
https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/4addfa1236f1_add_fractional_seconds_to_mysql_tables.py

I've created a simple DAG (Python 2):
{code}
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import timedelta, datetime

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 4, 26, 13, 15),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'queue': 'airflow-dev',
    'end_date': datetime(2017, 4, 27, 0, 0)
}

dag = DAG(
    'test_dag',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(minutes=1))

t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)
{code}

The error below occurs:
{code}
{jobs.py:354} DagFileProcessor3 ERROR - Got an exception! Propagating...
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 346, in helper
    pickle_dags)
  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 1583, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 1173, in _process_dags
    dag_run = self.create_dag_run(dag)
  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line 803, in create_dag_run
    while next_run_date <= last_run.execution_date:
TypeError: can't compare datetime.datetime to NoneType
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-1142) SubDAG Tasks Not Executed Even Though All Dependencies Met

2017-04-25 Thread Joe Schmid (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Schmid updated AIRFLOW-1142:

Attachment: SubDAGOperatorTaskLog-DEBUG.txt

It took a bunch of attempts, but on the 8th manually triggered dagrun I 
observed the same issue while I had DEBUG logging enabled. Here's the log of 
the subdag operator task that fails to run one of the tasks. You'll see the 
same issue as before -- it logs "Dependencies all met for" the task that never runs.

> SubDAG Tasks Not Executed Even Though All Dependencies Met
> ----------------------------------------------------------
>
> Key: AIRFLOW-1142
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1142
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag
>Affects Versions: 1.8.1
> Environment: 1.8.1rc1+incubating, Celery
>Reporter: Joe Schmid
>Priority: Blocker
> Attachments: 2017-04-24T23-20-38-776547, 
> SubDAGOperatorTaskLog-DEBUG.txt, Test_Nested_SubDAG_0.png, 
> Test_Nested_SubDAG_1-Zoomed.png, test_nested_subdag.py
>
>
> Testing on 1.8.1rc1, we noticed that tasks in subdags were not getting 
> executed even though all dependencies had been met.
> We were able to create a simple test DAG that re-creates the issue. Attached 
> is a test DAG, the log file of the subdag operator that shows it fails to run 
> even though dependencies are met, and screenshots of what the UI looks like.
> This is definitely a regression as we have many similarly constructed DAGs 
> that have been running successfully on a pre-v1.8 version (a fork of 
> 1.7.1.3+master) for some time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-1147) airflow scheduler not working

2017-04-25 Thread Mubin Khalid (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mubin Khalid updated AIRFLOW-1147:
--
Description: 
I've created some DAGs and tried to put them on the scheduler. I want to run all the tasks in the DAG after exactly 24 hours. I tried something like this:
{code}
from datetime import datetime, timedelta

from airflow import DAG

DEFAULT_ARGS = {
    'owner'           : 'mubin',
    'depends_on_past' : False,
    'start_date'      : datetime(2017, 4, 24, 14, 30),
    'retries'         : 5,
    'retry_delay'     : timedelta(1),
}
SCHEDULE_INTERVAL = timedelta(minutes=1440)
# SCHEDULE_INTERVAL = timedelta(hours=24)
# SCHEDULE_INTERVAL = timedelta(days=1)
dag = DAG('StandardizeDataDag',
          default_args      = DEFAULT_ARGS,
          schedule_interval = SCHEDULE_INTERVAL
          )
{code}
I tried different intervals, but none of them work. However, if I reset the db with {{airflow resetdb -y}} and then run {{airflow initdb}}, it works once; after that, the scheduler isn't able to run it again.

PS. {{airflow scheduler}} is executed as {{root}}.


> airflow scheduler not working
> -
>
> Key: AIRFLOW-1147
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1147
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
> Environment: CentOS running on 128 GB ram
>Reporter: Mubin Khalid
>Priority: Critical
>  Labels: documentation, newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I've created some `DAG`s, and I tried to put it on scheduler. I want to run 
> all the tasks in the DAG after exact 24 hours.
> I tried to do something like this.
> {code}
> DEFAULT_ARGS= {
> 'owner'   : 'mubin',
> 'depends_on_past' : False,
> 'start_date'  : datetime(2017, 4, 24, 14, 30),
> 'retries' : 5,
> 'retry_delay' : timedelta(1),
> }
> SCHEDULE_INTERVAL  = timedelta(minutes=1440)
> # SCHEDULE_INTERVAL= timedelta(hours=24)
> # SCHEDULE_INTERVAL= timedelta(days=1)
> dag = DAG('StandardizeDataDag',
> default_args   = DEFAULT_ARGS,
> schedule_interval  = SCHEDULE_INTERVAL
> )
>  {code}   
> I tried to put different intervals, but not any working. However if I try to 
> reset db  {code} airflow resetdb -y {code}  and then run  {code} airflow 
> initdb {code} , it works for once. then after that, scheduler isn't able to 
> run it.
> PS.  {code} airflow scheduler {code}  executed from  {code} root {code} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Created] (AIRFLOW-1147) airflow scheduler not working

2017-04-25 Thread Mubin Khalid (JIRA)
Mubin Khalid created AIRFLOW-1147:
-

 Summary: airflow scheduler not working
 Key: AIRFLOW-1147
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1147
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Affects Versions: Airflow 1.8
 Environment: CentOS running on 128 GB ram
Reporter: Mubin Khalid
Priority: Critical


I've created some DAGs and tried to put them on the scheduler. I want to run all the tasks in the DAG after exactly 24 hours. I tried something like this:

{code}
from datetime import datetime, timedelta

from airflow import DAG

DEFAULT_ARGS = {
    'owner'           : 'mubin',
    'depends_on_past' : False,
    'start_date'      : datetime(2017, 4, 24, 14, 30),
    'retries'         : 5,
    'retry_delay'     : timedelta(1),
}
SCHEDULE_INTERVAL = timedelta(minutes=1440)
# SCHEDULE_INTERVAL = timedelta(hours=24)
# SCHEDULE_INTERVAL = timedelta(days=1)
dag = DAG('StandardizeDataDag',
          default_args      = DEFAULT_ARGS,
          schedule_interval = SCHEDULE_INTERVAL
          )
{code}

I tried different intervals, but none of them work. However, if I reset the db with {{airflow resetdb -y}} and then run {{airflow initdb}}, it works once; after that, the scheduler isn't able to run it again.

PS. {{airflow scheduler}} is executed as {{root}}.

Can anybody point out what I'm doing wrong?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1146) izip use in Python 3.4

2017-04-25 Thread Alexander Panzhin (JIRA)
Alexander Panzhin created AIRFLOW-1146:
--

 Summary: izip use in Python 3.4
 Key: AIRFLOW-1146
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1146
 Project: Apache Airflow
  Issue Type: Bug
  Components: hive_hooks
Affects Versions: Airflow 1.8
Reporter: Alexander Panzhin


Python 3 no longer has itertools.izip, but it is still used in airflow/hooks/hive_hooks.py.

This causes all kinds of havoc.

This needs to be fixed if Airflow is to be used on Python 3+.
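A common compatibility shim for this, sketched here as a suggestion rather than the actual patch:
{code}
try:
    # Python 2: lazy zip lives in itertools
    from itertools import izip
except ImportError:
    # Python 3: the built-in zip is already lazy
    izip = zip

# usage is unchanged on either version
for a, b in izip([1, 2], ['x', 'y']):
    print(a, b)
{code}
If six is available, six.moves.zip serves the same purpose.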



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-1145) Closest_date_partition not working with before = True

2017-04-25 Thread Julien GRAND-MOURCEL (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1145 started by Julien GRAND-MOURCEL.
-
> Closest_date_partition not working with before = True
> -
>
> Key: AIRFLOW-1145
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1145
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hive_hooks, plugins
>Affects Versions: Airflow 2.0, Airflow 1.8
>Reporter: Julien GRAND-MOURCEL
>Assignee: Julien GRAND-MOURCEL
>Priority: Minor
>  Labels: easyfix, features, newbie
> Fix For: Airflow 2.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When using the closest_date_partition with the parameter "before" set to 
> True, the function always returns the oldest date for this partition.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-1142) SubDAG Tasks Not Executed Even Though All Dependencies Met

2017-04-25 Thread Joe Schmid (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982876#comment-15982876
 ] 

Joe Schmid commented on AIRFLOW-1142:
-

Bolke, thanks for checking this out. Answers to your questions:

* The two successful runs shown in the UI had a different number of tasks (2) 
in the Level2 subdag. The failure happened when I added a third task to the 
Level2 subdag. However, the issue does seem to be intermittent and you might 
have to trigger that test DAG a few times to observe the issue.
* I can definitely rerun with debug logging on and capture logs

The item that sticks out from the existing log is that dependencies are all met 
for the task that never runs and it just logs that over and over:

{models.py:1126} INFO - Dependencies all met for 
{models.py:4061} INFO - Updating state for  considering 3 task(s)
{jobs.py:1994} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 1 | succeeded: 2 | kicked_off: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
{models.py:1126} INFO - Dependencies all met for 
{models.py:4061} INFO - Updating state for  considering 3 task(s)
{jobs.py:1994} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 1 | succeeded: 2 | kicked_off: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0


> SubDAG Tasks Not Executed Even Though All Dependencies Met
> --
>
> Key: AIRFLOW-1142
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1142
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag
>Affects Versions: 1.8.1
> Environment: 1.8.1rc1+incubating, Celery
>Reporter: Joe Schmid
>Priority: Blocker
> Attachments: 2017-04-24T23-20-38-776547, Test_Nested_SubDAG_0.png, 
> Test_Nested_SubDAG_1-Zoomed.png, test_nested_subdag.py
>
>
> Testing on 1.8.1rc1, we noticed that tasks in subdags were not getting 
> executed even though all dependencies had been met.
> We were able to create a simple test DAG that re-creates the issue. Attached 
> is a test DAG, the log file of the subdag operator that shows it fails to run 
> even though dependencies are met, and screenshots of what the UI looks like.
> This is definitely a regression as we have many similarly constructed DAGs 
> that have been running successfully on a pre-v1.8 version (a fork of 
> 1.7.1.3+master) for some time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1145) Closest_date_partition not working with before = True

2017-04-25 Thread Julien GRAND-MOURCEL (JIRA)
Julien GRAND-MOURCEL created AIRFLOW-1145:
-

 Summary: Closest_date_partition not working with before = True
 Key: AIRFLOW-1145
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1145
 Project: Apache Airflow
  Issue Type: Bug
  Components: hive_hooks, plugins
Affects Versions: Airflow 2.0, Airflow 1.8
Reporter: Julien GRAND-MOURCEL
Assignee: Julien GRAND-MOURCEL
Priority: Minor
 Fix For: Airflow 2.0


When using the closest_date_partition with the parameter "before" set to True, 
the function always returns the oldest date for this partition.
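For reference, with "before" set to True the expected result is the latest date that is not after the target, not the oldest one. An illustrative sketch of that selection rule (not the actual hive_hooks code):
{code}
def closest_before(dates, target):
    """Return the latest date <= target, or None if none qualifies."""
    earlier = [d for d in dates if d <= target]
    return max(earlier) if earlier else None
{code}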



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-1086) Fail to execute task with upstream dependency in subdag

2017-04-25 Thread Laurent Bonafons (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Bonafons updated AIRFLOW-1086:
--
Affects Version/s: 1.8.0

> Fail to execute task with upstream dependency in subdag
> ---
>
> Key: AIRFLOW-1086
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1086
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, subdag
>Affects Versions: Airflow 1.8, 1.8.0
>Reporter: Laurent Bonafons
> Attachments: test_bubdag_task_instances.png, test_subdag_graph.png, 
> test_subdag.py
>
>
> Hello,
> We have been migrating from Airflow v1.7.1.3 to v1.8.0 and we can't run 
> subdags anymore. We use CeleryExecutor with RabbitMQ for backend.
> I tested on more and more simplified cases to finish up with the great 
> example "test_subdag" from Joe Schmid (cf attachment).
> And it still doesn't work. In a subdag only the first tasks, the ones without 
> upstream dependencies, run.
> When a task is successful in a subdag, downstream tasks are not executed at 
> all even if in the log of the subdag we can see that "Dependencies all met" 
> for the task.
> This looks similar to AIRFLOW-955 ("job failed to execute tasks") reported by 
> Jeff Liu
> but here we're not on level 2, it's just a subdag containing tasks.
> Here is an example of a subdag log in v1.7.1.3:
> {noformat}
> [2017-04-06 12:11:33,648] {models.py:154} INFO - Filling up the 
> DagBag from /usr/local/airflow/dags/tricky_test_3.py
> [2017-04-06 12:11:35,052] {models.py:154} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/tricky_test_3.py
> [2017-04-06 12:11:35,125] {models.py:1196} INFO - 
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2017-04-06 12:11:35,136] {models.py:1219} INFO - Executing 
>  on 2017-04-03 00:00:00
> [2017-04-06 12:11:35,165] {base_executor.py:36} INFO - Adding to queue: 
> airflow run Test_SubDAG.SubDagOp SubDAG_Task1 2017-04-03T00:00:00 --local -sd 
> DAGS_FOLDER/tricky_test_3.py 
> [2017-04-06 12:11:40,014] {sequential_executor.py:26} INFO - Executing 
> command: airflow run Test_SubDAG.SubDagOp SubDAG_Task1 2017-04-03T00:00:00 
> --local -sd DAGS_FOLDER/tricky_test_3.py 
> [2017-04-06 12:11:46,176] {jobs.py:934} INFO - Task instance 
> ('Test_SubDAG.SubDagOp', 'SubDAG_Task1', datetime.datetime(2017, 4, 3, 0, 0)) 
> succeeded
> [2017-04-06 12:11:46,176] {jobs.py:997} INFO - [backfill progress] | waiting: 
> 1 | succeeded: 1 | kicked_off: 1 | failed: 0 | skipped: 0 | deadlocked: 0
> [2017-04-06 12:11:46,185] {base_executor.py:36} INFO - Adding to queue: 
> airflow run Test_SubDAG.SubDagOp SubDAG_Task2 2017-04-03T00:00:00 --local -sd 
> DAGS_FOLDER/tricky_test_3.py 
> [2017-04-06 12:11:46,195] {sequential_executor.py:26} INFO - Executing 
> command: airflow run Test_SubDAG.SubDagOp SubDAG_Task2 2017-04-03T00:00:00 
> --local -sd DAGS_FOLDER/tricky_test_3.py 
> [2017-04-06 12:11:52,177] {jobs.py:934} INFO - Task instance 
> ('Test_SubDAG.SubDagOp', 'SubDAG_Task2', datetime.datetime(2017, 4, 3, 0, 0)) 
> succeeded
> [2017-04-06 12:11:52,177] {jobs.py:997} INFO - [backfill progress] | waiting: 
> 0 | succeeded: 2 | kicked_off: 2 | failed: 0 | skipped: 0 | deadlocked: 0
> [2017-04-06 12:11:52,178] {jobs.py:1026} INFO - Backfill done. Exiting.
> {noformat}
> And here is the corresponding log in v1.8.0:
> {noformat}
> [2017-04-05 16:17:51,854] {models.py:167} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/tricky_test_3.py
> [2017-04-05 16:17:51,996] {base_task_runner.py:112} INFO - Running: ['bash', 
> '-c', u'airflow run Test_SubDAG SubDagOp 2017-04-04T00:00:00 --job_id 9987 
> --raw -sd DAGS_FOLDER/tricky_test_3.py']
> [2017-04-05 16:17:52,803] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-05 16:17:52,803] {__init__.py:57} INFO - Using executor 
> CeleryExecutor
> [2017-04-05 16:17:52,917] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-05 16:17:52,917] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2017-04-05 16:17:52,957] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-05 16:17:52,956] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> [2017-04-05 16:17:53,262] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-05 16:17:53,262] {models.py:167} INFO - Filling up the DagBag from 
> /usr/local/airflow/dags/tricky_test_3.py
> [2017-04-05 16:17:53,401] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-05 16:17:53,400] {models.py:1126} INFO - Dependencies all met for 
> 
> [2017-04-05 16:17:53,409] {base_task_runner.py:95} INFO - Subtask: 
> [2017-04-05 16:17:53,409] {models.py:1126} INFO - Dependencies all met for 
> 
>

[jira] [Commented] (AIRFLOW-1119) Redshift to S3 operator - headers not on first row

2017-04-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982633#comment-15982633
 ] 

ASF subversion and git services commented on AIRFLOW-1119:
--

Commit 4147d6b8091309cbf373d8f24d40da2e4b549473 in incubator-airflow's branch 
refs/heads/master from Thomas Hofer
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=4147d6b ]

[AIRFLOW-1119] Fix unload query so headers are on first row[]

Closes #2245 from th11/airflow-1119-fix


> Redshift to S3 operator - headers not on first row
> --
>
> Key: AIRFLOW-1119
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1119
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Thomas H
> Fix For: 1.8.2
>
> Attachments: airflow-bug.png
>
>
> The RedshiftToS3 operator adds headers when unloading data from Redshift to 
> S3. However, there is a bug where the headers can appear in any row (see 
> screenshot). `ORDER BY 1 DESC` needs to be added to the query below to 
> ensure the headers are on the first row.
> https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L93:L102
> More info regarding unloading data with headers:
> http://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers
> https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7
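
As an illustration, a hedged sketch (not the operator's actual code) of the corrected query shape; the column, schema, table, and bucket names are hypothetical, and it assumes the header string in column 1 sorts after its data values so that ORDER BY 1 DESC puts the header row first:

{code}
# Hedged sketch of the corrected UNLOAD shape; column, schema, table,
# and bucket names are made up.
columns = ['id', 'name']
schema, table = 'public', 'my_table'

# Header literals are UNION ALLed with the data; ORDER BY 1 DESC is
# assumed to sort the header row first because the header string in
# column 1 sorts after its data values.
header = ", ".join("\\'{0}\\'".format(c) for c in columns)
select = ", ".join("CAST({0} AS text) AS {0}".format(c) for c in columns)

unload_query = """
    UNLOAD ('SELECT {0}
             UNION ALL
             SELECT {1} FROM {2}.{3}
             ORDER BY 1 DESC')
    TO 's3://my-bucket/{3}_'
""".format(header, select, schema, table)
print(unload_query)
{code}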





[jira] [Resolved] (AIRFLOW-1119) Redshift to S3 operator - headers not on first row

2017-04-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1119.
-
   Resolution: Fixed
Fix Version/s: 1.8.2

Issue resolved by pull request #2245
[https://github.com/apache/incubator-airflow/pull/2245]

> Redshift to S3 operator - headers not on first row
> --
>
> Key: AIRFLOW-1119
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1119
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Thomas H
> Fix For: 1.8.2
>
> Attachments: airflow-bug.png
>
>
> The RedshiftToS3 operator adds headers when unloading data from Redshift to 
> S3. However, there is a bug where the headers can appear in any row (see 
> screenshot). `ORDER BY 1 DESC` needs to be added to the query below to 
> ensure the headers are on the first row.
> https://github.com/apache/incubator-airflow/blob/master/airflow/operators/redshift_to_s3_operator.py#L93:L102
> More info regarding unloading data with headers:
> http://stackoverflow.com/questions/24681214/unloading-from-redshift-to-s3-with-headers
> https://medium.com/carwow-product-engineering/unloading-a-file-from-redshift-to-s3-with-headers-fb707f5480f7





incubator-airflow git commit: [AIRFLOW-1119] Fix unload query so headers are on first row[]

2017-04-25 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master e5b914789 -> 4147d6b80


[AIRFLOW-1119] Fix unload query so headers are on first row[]

Closes #2245 from th11/airflow-1119-fix


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/4147d6b8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/4147d6b8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/4147d6b8

Branch: refs/heads/master
Commit: 4147d6b8091309cbf373d8f24d40da2e4b549473
Parents: e5b9147
Author: Thomas Hofer 
Authored: Tue Apr 25 11:31:31 2017 +0200
Committer: Bolke de Bruin 
Committed: Tue Apr 25 11:31:31 2017 +0200

--
 airflow/operators/redshift_to_s3_operator.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/4147d6b8/airflow/operators/redshift_to_s3_operator.py
--
diff --git a/airflow/operators/redshift_to_s3_operator.py 
b/airflow/operators/redshift_to_s3_operator.py
index d9ef59d..fda88d9 100644
--- a/airflow/operators/redshift_to_s3_operator.py
+++ b/airflow/operators/redshift_to_s3_operator.py
@@ -93,7 +93,8 @@ class RedshiftToS3Transfer(BaseOperator):
 unload_query = """
 UNLOAD ('SELECT {0}
 UNION ALL
-SELECT {1} FROM {2}.{3}')
+SELECT {1} FROM {2}.{3}
+ORDER BY 1 DESC')
 TO 's3://{4}/{5}/{3}_'
 with
 credentials 
'aws_access_key_id={6};aws_secret_access_key={7}'



[jira] [Resolved] (AIRFLOW-1089) Add Spark application arguments to SparkSubmitOperator

2017-04-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1089.
-
   Resolution: Fixed
Fix Version/s: 1.9.0
   1.8.2

Issue resolved by pull request #2229
[https://github.com/apache/incubator-airflow/pull/2229]

> Add Spark application arguments to SparkSubmitOperator
> --
>
> Key: AIRFLOW-1089
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1089
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 1.8
>Reporter: Stephan Werges
> Fix For: 1.8.2, 1.9.0
>
>
> Pass Spark application arguments to SparkSubmitOperator. For example:
> spark-submit --class com.Foo.Bar foobar.jar arg1 arg2
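
For illustration, a hedged usage sketch of the new application_args parameter; the DAG, application file, and argument values below are hypothetical, mirroring the "spark-submit app.py arg1 arg2" case from the commit message:

{code}
# Hedged usage sketch of application_args; the DAG id, application
# file, and arguments are made up.
from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

dag = DAG('spark_args_example',
          start_date=datetime(2017, 4, 25),
          schedule_interval=None)

submit = SparkSubmitOperator(
    task_id='submit_app',
    application='app.py',               # script handed to spark-submit
    application_args=['arg1', 'arg2'],  # appended after the application
    conn_id='spark_default',
    dag=dag)
{code}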





[jira] [Commented] (AIRFLOW-1089) Add Spark application arguments to SparkSubmitOperator

2017-04-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982629#comment-15982629
 ] 

ASF subversion and git services commented on AIRFLOW-1089:
--

Commit e5b9147894b0d47bf36f1c2570d765b16c1c2506 in incubator-airflow's branch 
refs/heads/master from [~camshrun]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=e5b9147 ]

[AIRFLOW-1089] Add Spark application arguments

Allows arguments to be passed to the Spark application
being submitted. For example:

- spark-submit --class foo.Bar foobar.jar arg1 arg2
- spark-submit app.py arg1 arg2

Closes #2229 from camshrun/sparkSubmitAppArgs


> Add Spark application arguments to SparkSubmitOperator
> --
>
> Key: AIRFLOW-1089
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1089
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: Airflow 1.8
>Reporter: Stephan Werges
>
> Pass Spark application arguments to SparkSubmitOperator. For example:
> spark-submit --class com.Foo.Bar foobar.jar arg1 arg2





incubator-airflow git commit: [AIRFLOW-1089] Add Spark application arguments

2017-04-25 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 831f8d504 -> e5b914789


[AIRFLOW-1089] Add Spark application arguments

Allows arguments to be passed to the Spark application
being submitted. For example:

- spark-submit --class foo.Bar foobar.jar arg1 arg2
- spark-submit app.py arg1 arg2

Closes #2229 from camshrun/sparkSubmitAppArgs


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/e5b91478
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/e5b91478
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/e5b91478

Branch: refs/heads/master
Commit: e5b9147894b0d47bf36f1c2570d765b16c1c2506
Parents: 831f8d5
Author: Stephan Werges 
Authored: Tue Apr 25 11:28:31 2017 +0200
Committer: Bolke de Bruin 
Committed: Tue Apr 25 11:28:31 2017 +0200

--
 airflow/contrib/hooks/spark_submit_hook.py  |  9 +
 airflow/contrib/operators/spark_submit_operator.py  |  5 +
 tests/contrib/hooks/test_spark_submit_hook.py   | 16 +++-
 .../contrib/operators/test_spark_submit_operator.py |  7 ++-
 4 files changed, 35 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e5b91478/airflow/contrib/hooks/spark_submit_hook.py
--
diff --git a/airflow/contrib/hooks/spark_submit_hook.py 
b/airflow/contrib/hooks/spark_submit_hook.py
index 59d28b5..e4ce797 100644
--- a/airflow/contrib/hooks/spark_submit_hook.py
+++ b/airflow/contrib/hooks/spark_submit_hook.py
@@ -56,6 +56,8 @@ class SparkSubmitHook(BaseHook):
 :type name: str
 :param num_executors: Number of executors to launch
 :type num_executors: int
+:param application_args: Arguments for the application being submitted
+:type application_args: list
 :param verbose: Whether to pass the verbose flag to spark-submit process 
for debugging
 :type verbose: bool
 """
@@ -74,6 +76,7 @@ class SparkSubmitHook(BaseHook):
  principal=None,
  name='default-name',
  num_executors=None,
+ application_args=None,
  verbose=False):
 self._conf = conf
 self._conn_id = conn_id
@@ -88,6 +91,7 @@ class SparkSubmitHook(BaseHook):
 self._principal = principal
 self._name = name
 self._num_executors = num_executors
+self._application_args = application_args
 self._verbose = verbose
 self._sp = None
 self._yarn_application_id = None
@@ -183,6 +187,11 @@ class SparkSubmitHook(BaseHook):
 # The actual script to execute
 connection_cmd += [application]
 
+# Append any application arguments
+if self._application_args:
+for arg in self._application_args:
+connection_cmd += [arg]
+
 logging.debug("Spark-Submit cmd: {}".format(connection_cmd))
 
 return connection_cmd

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e5b91478/airflow/contrib/operators/spark_submit_operator.py
--
diff --git a/airflow/contrib/operators/spark_submit_operator.py 
b/airflow/contrib/operators/spark_submit_operator.py
index f62c395..2a7e3cf 100644
--- a/airflow/contrib/operators/spark_submit_operator.py
+++ b/airflow/contrib/operators/spark_submit_operator.py
@@ -56,6 +56,8 @@ class SparkSubmitOperator(BaseOperator):
 :type name: str
 :param num_executors: Number of executors to launch
 :type num_executors: int
+:param application_args: Arguments for the application being submitted
+:type application_args: list
 :param verbose: Whether to pass the verbose flag to spark-submit process 
for debugging
 :type verbose: bool
 """
@@ -76,6 +78,7 @@ class SparkSubmitOperator(BaseOperator):
  principal=None,
  name='airflow-spark',
  num_executors=None,
+ application_args=None,
  verbose=False,
  *args,
  **kwargs):
@@ -93,6 +96,7 @@ class SparkSubmitOperator(BaseOperator):
 self._principal = principal
 self._name = name
 self._num_executors = num_executors
+self._application_args = application_args
 self._verbose = verbose
 self._hook = None
 self._conn_id = conn_id
@@ -115,6 +119,7 @@ class SparkSubmitOperator(BaseOperator):
 principal=self._principal,
 name=self._name,
 num_executors=self._num_executors,
+application_args=self._application_args,
 verbose=self._verbose
 )
 self._hook.submit(self._application)

http://git

[jira] [Commented] (AIRFLOW-1125) Clarify documentation regarding fernet_key

2017-04-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982627#comment-15982627
 ] 

ASF subversion and git services commented on AIRFLOW-1125:
--

Commit 831f8d504f8c7a1511dab61a560b7ec72dc95c4d in incubator-airflow's branch 
refs/heads/master from [~boristyukin]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=831f8d5 ]

[AIRFLOW-1125] Document encrypted connections

Clarify documentation regarding fernet_key and how to
enable encryption if it was not enabled during install.

Closes #2251 from boristyukin/airflow-1125


> Clarify documentation regarding fernet_key
> --
>
> Key: AIRFLOW-1125
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1125
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Boris Tyukin
>Assignee: Boris Tyukin
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.9.0
>
>
> The steps for setting up connection encryption are not clear if 
> airflow[crypto] was not installed initially.
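
As a quick aid, a self-contained sketch of generating and sanity-checking a Fernet key with the cryptography package (the package airflow[crypto] pulls in); the sample plaintext is made up:

{code}
# Generate a key and round-trip a value to confirm the key is valid.
from cryptography.fernet import Fernet

fernet_key = Fernet.generate_key()  # base64-encoded 32-byte key
print(fernet_key)                   # keep it in a secure place

# A valid key must encrypt and decrypt cleanly.
f = Fernet(fernet_key)
token = f.encrypt(b'my connection password')
assert f.decrypt(token) == b'my connection password'
{code}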





[jira] [Resolved] (AIRFLOW-1125) Clarify documentation regarding fernet_key

2017-04-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1125.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2251
[https://github.com/apache/incubator-airflow/pull/2251]

> Clarify documentation regarding fernet_key
> --
>
> Key: AIRFLOW-1125
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1125
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Boris Tyukin
>Assignee: Boris Tyukin
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.9.0
>
>
> The steps for setting up connection encryption are not clear if 
> airflow[crypto] was not installed initially.





incubator-airflow git commit: [AIRFLOW-1125] Document encrypted connections

2017-04-25 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master a08761a39 -> 831f8d504


[AIRFLOW-1125] Document encrypted connections

Clarify documentation regarding fernet_key and how to
enable encryption if it was not enabled during install.

Closes #2251 from boristyukin/airflow-1125


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/831f8d50
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/831f8d50
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/831f8d50

Branch: refs/heads/master
Commit: 831f8d504f8c7a1511dab61a560b7ec72dc95c4d
Parents: a08761a
Author: Boris Tyukin 
Authored: Tue Apr 25 11:27:11 2017 +0200
Committer: Bolke de Bruin 
Committed: Tue Apr 25 11:27:11 2017 +0200

--
 docs/configuration.rst | 25 +
 docs/faq.rst   |  4 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/831f8d50/docs/configuration.rst
--
diff --git a/docs/configuration.rst b/docs/configuration.rst
index 5ff4284..ccafb71 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -83,6 +83,31 @@ within the metadata database. The ``crypto`` package is 
highly recommended
 during installation. The ``crypto`` package does require that your operating
 system have libffi-dev installed.
 
+If the ``crypto`` package was not installed initially, you can still enable
+encryption for connections by following the steps below:
+
+1. Install the crypto package: ``pip install airflow[crypto]``
+2. Generate a fernet_key using the code snippet below. The fernet_key must
+be a base64-encoded 32-byte key.
+
+.. code:: python
+
+    from cryptography.fernet import Fernet
+    fernet_key = Fernet.generate_key()
+    print(fernet_key)  # your fernet_key; keep it in a secure place!
+
+3. Replace the ``airflow.cfg`` fernet_key value with the one from step 2.
+Alternatively, you can store your fernet_key in an OS environment variable;
+in that case you do not need to change ``airflow.cfg``, as Airflow prefers
+the environment variable over the value in ``airflow.cfg``:
+
+.. code-block:: bash
+  
+  # Note the double underscores
+  export AIRFLOW__CORE__FERNET_KEY=your_fernet_key
+ 
+4. Restart the Airflow webserver.
+5. For existing connections (the ones you had defined before installing
+``airflow[crypto]`` and creating a Fernet key), you need to open each
+connection in the connection admin UI, re-type the password, and save it.
+
 Connections in Airflow pipelines can be created using environment variables.
 The environment variable needs to have a prefix of ``AIRFLOW_CONN_`` for
 Airflow with the value in a URI format to use the connection properly. Please

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/831f8d50/docs/faq.rst
--
diff --git a/docs/faq.rst b/docs/faq.rst
index 1e4c038..2e6417b 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -66,8 +66,8 @@ documentation
 Why are connection passwords still not encrypted in the metadata db after I 
installed airflow[crypto]?
 
--
 
-- Verify that the ``fernet_key`` defined in ``$AIRFLOW_HOME/airflow.cfg`` is a
-valid Fernet key. It must be a base64-encoded 32-byte key. You need to restart
-the webserver after you update the key
-- For existing connections (the ones that you had defined before installing 
``airflow[crypto]`` and creating a Fernet key), you need to open each 
connection in the connection admin UI, re-type the password, and save it
+Check out the ``Connections`` section in the Configuration section of the
+documentation
 
 What's the deal with ``start_date``?
 



[jira] [Resolved] (AIRFLOW-1122) Node strokes are too thin for people with color vision deficiency

2017-04-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-1122.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2246
[https://github.com/apache/incubator-airflow/pull/2246]

> Node strokes are too thin for people with color vision deficiency
> -
>
> Key: AIRFLOW-1122
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1122
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Michael O.
>Assignee: Michael O.
>Priority: Trivial
> Fix For: 1.9.0
>
>
> The thickness of the node strokes in the graph view makes it hard to 
> distinguish between failed (yellow) and successful (lime) tasks.
> Increasing the stroke-width to 3px makes it much more accessible.





[jira] [Commented] (AIRFLOW-1122) Node strokes are too thin for people with color vision deficiency

2017-04-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982622#comment-15982622
 ] 

ASF subversion and git services commented on AIRFLOW-1122:
--

Commit a08761a39b9a1db785e7233692eb1eaa9e2892eb in incubator-airflow's branch 
refs/heads/master from michaelosthege
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a08761a ]

[AIRFLOW-1122] Increase stroke width in UI

A stroke width of 2px is too narrow for people with
color vision deficiency to determine the color.
A 3px stroke is much more accessible.

Closes #2246 from michaelosthege/master


> Node strokes are too thin for people with color vision deficiency
> -
>
> Key: AIRFLOW-1122
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1122
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Michael O.
>Assignee: Michael O.
>Priority: Trivial
> Fix For: 1.9.0
>
>
> The thickness of the node strokes in the graph view makes it hard to 
> distinguish between failed (yellow) and successful (lime) tasks.
> Increasing the stroke-width to 3px makes it much more accessible.





incubator-airflow git commit: [AIRFLOW-1122] Increase stroke width in UI

2017-04-25 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 94f9822ff -> a08761a39


[AIRFLOW-1122] Increase stroke width in UI

A stroke width of 2px is too narrow for people with
color vision deficiency to determine the color.
A 3px stroke is much more accessible.

Closes #2246 from michaelosthege/master


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a08761a3
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a08761a3
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a08761a3

Branch: refs/heads/master
Commit: a08761a39b9a1db785e7233692eb1eaa9e2892eb
Parents: 94f9822
Author: michaelosthege 
Authored: Tue Apr 25 11:24:57 2017 +0200
Committer: Bolke de Bruin 
Committed: Tue Apr 25 11:25:10 2017 +0200

--
 airflow/www/static/graph.css | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a08761a3/airflow/www/static/graph.css
--
diff --git a/airflow/www/static/graph.css b/airflow/www/static/graph.css
index e724b7a..f1d3480 100644
--- a/airflow/www/static/graph.css
+++ b/airflow/www/static/graph.css
@@ -18,7 +18,7 @@
 */
 
 g.node rect {
-stroke-width: 2;
+stroke-width: 3;
 stroke: white;
 cursor: pointer;
 }



[jira] [Commented] (AIRFLOW-1142) SubDAG Tasks Not Executed Even Though All Dependencies Met

2017-04-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982596#comment-15982596
 ] 

Bolke de Bruin commented on AIRFLOW-1142:
-

- Your UI seems to indicate two successful runs. Can you explain why those 
two runs seem to succeed?
- Can you rerun with debug logging enabled? Too much information is missing.

At the moment I cannot confirm this with LocalExecutor by triggering a 
single dag.

> SubDAG Tasks Not Executed Even Though All Dependencies Met
> --
>
> Key: AIRFLOW-1142
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1142
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag
>Affects Versions: 1.8.1
> Environment: 1.8.1rc1+incubating, Celery
>Reporter: Joe Schmid
>Priority: Blocker
> Attachments: 2017-04-24T23-20-38-776547, Test_Nested_SubDAG_0.png, 
> Test_Nested_SubDAG_1-Zoomed.png, test_nested_subdag.py
>
>
> Testing on 1.8.1rc1, we noticed that tasks in subdags were not getting 
> executed even though all dependencies had been met.
> We were able to create a simple test DAG that re-creates the issue. Attached 
> is a test DAG, the log file of the subdag operator that shows it fails to run 
> even though dependencies are met, and screenshots of what the UI looks like.
> This is definitely a regression as we have many similarly constructed DAGs 
> that have been running successfully on a pre-v1.8 version (a fork of 
> 1.7.1.3+master) for some time.





[jira] [Updated] (AIRFLOW-1144) Logging causes UnicodeEncodeError when using Japanese characters

2017-04-25 Thread Sushant Karki (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushant Karki updated AIRFLOW-1144:
---
Description: 
I am using the bash operator to pipe a SQL dump to my database. Since the 
encoding of my psql client is Japanese, the output displays some Japanese 
characters. Whenever the logger tries to log the output, it raises a 
UnicodeEncodeError.
Here are the details of the error:

{code}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
  File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
  File 
"/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
 line 95, in _read_task_logs
self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in position 
58: ordinal not in range(128)
{code}

  was:
I am using the bash operator to pipe a SQL dump to my database. Since the 
encoding of my psql client is Japanese, the output displays some Japanese 
characters. Whenever the logger tries to log the output, it raises a 
UnicodeEncodeError.
Here are the details of the error:

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
  File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
  File 
"/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
 line 95, in _read_task_logs
self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in position 
58: ordinal not in range(128)
```


> Logging causes UnicodeEncodeError when using Japanese characters
> 
>
> Key: AIRFLOW-1144
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1144
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging, worker
>Affects Versions: 1.8.0
>Reporter: Sushant Karki
>
> I am using the bash operator to pipe a SQL dump to my database. Since the 
> encoding of my psql client is Japanese, the output displays some Japanese 
> characters. Whenever the logger tries to log the output, it raises a 
> UnicodeEncodeError.
> Here are the details of the error:
> {code}
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
> self.run()
>   File "/usr/lib64/python2.7/threading.py", line 766, in run
> self.__target(*self.__args, **self.__kwargs)
>   File 
> "/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
>  line 95, in _read_task_logs
> self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in 
> position 58: ordinal not in range(128)
> {code}
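
For what it's worth, a hedged Python 2 sketch, independent of Airflow, of the failure mode and the usual fix of keeping the format string unicode; stream and handler encoding caveats still apply:

{code}
# -*- coding: utf-8 -*-
# Hedged Python 2 sketch of the failure and one common fix: use a
# unicode format string so no implicit ascii encode happens.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

line = u'\u884c\n'  # the character from the traceback, plus a newline

# Raises UnicodeEncodeError: a byte-str format string forces the
# unicode argument through the ascii codec.
#   logger.info('Subtask: {}'.format(line.rstrip('\n')))

# Works: a unicode format string keeps the result unicode end to end.
logger.info(u'Subtask: {}'.format(line.rstrip('\n')))
{code}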





[jira] [Updated] (AIRFLOW-1144) Logging causes UnicodeEncodeError when using Japanese characters

2017-04-25 Thread Sushant Karki (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushant Karki updated AIRFLOW-1144:
---
Description: 
I am using the bash operator to pipe a SQL dump to my database. Since the 
encoding of my psql client is Japanese, the output displays some Japanese 
characters. Whenever the logger tries to log the output, it raises a 
UnicodeEncodeError.
Here are the details of the error:
```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
  File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
  File 
"/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
 line 95, in _read_task_logs
self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in position 
58: ordinal not in range(128)
```

> Logging causes UnicodeEncodeError when using Japanese characters
> 
>
> Key: AIRFLOW-1144
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1144
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging, worker
>Affects Versions: 1.8.0
>Reporter: Sushant Karki
>
> I am using the bash operator to pipe a SQL dump to my database. Since the 
> encoding of my psql client is Japanese, the output displays some Japanese 
> characters. Whenever the logger tries to log the output, it raises a 
> UnicodeEncodeError.
> Here are the details of the error:
> ```
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
> self.run()
>   File "/usr/lib64/python2.7/threading.py", line 766, in run
> self.__target(*self.__args, **self.__kwargs)
>   File 
> "/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
>  line 95, in _read_task_logs
> self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in 
> position 58: ordinal not in range(128)
> ```





[jira] [Updated] (AIRFLOW-1144) Logging causes UnicodeEncodeError when using Japanese characters

2017-04-25 Thread Sushant Karki (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushant Karki updated AIRFLOW-1144:
---
Description: 
I am using the bash operator to pipe a SQL dump to my database. Since the 
encoding of my psql client is Japanese, the output displays some Japanese 
characters. Whenever the logger tries to log the output, it raises a 
UnicodeEncodeError.
Here are the details of the error:

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
  File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
  File 
"/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
 line 95, in _read_task_logs
self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in position 
58: ordinal not in range(128)
```

  was:
I am using the bash operator to pipe a SQL dump to my database. Since the 
encoding of my psql client is Japanese, the output displays some Japanese 
characters. Whenever the logger tries to log the output, it raises a 
UnicodeEncodeError.
Here are the details of the error:
```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
self.run()
  File "/usr/lib64/python2.7/threading.py", line 766, in run
self.__target(*self.__args, **self.__kwargs)
  File 
"/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
 line 95, in _read_task_logs
self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in position 
58: ordinal not in range(128)
```


> Logging causes UnicodeEncodeError when using Japanese characters
> 
>
> Key: AIRFLOW-1144
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1144
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: logging, worker
>Affects Versions: 1.8.0
>Reporter: Sushant Karki
>
> I am using the bash operator to pipe a SQL dump to my database. Since the 
> encoding of my psql client is Japanese, the output displays some Japanese 
> characters. Whenever the logger tries to log the output, it raises a 
> UnicodeEncodeError.
> Here are the details of the error:
> ```
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
> self.run()
>   File "/usr/lib64/python2.7/threading.py", line 766, in run
> self.__target(*self.__args, **self.__kwargs)
>   File 
> "/home/karki/virtualenv/master/local/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py",
>  line 95, in _read_task_logs
> self.logger.info('Subtask: {}'.format(line.rstrip('\n')))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u884c' in 
> position 58: ordinal not in range(128)
> ```





[jira] [Created] (AIRFLOW-1144) Logging causes UnicodeEncodeError when using Japanese characters

2017-04-25 Thread Sushant Karki (JIRA)
Sushant Karki created AIRFLOW-1144:
--

 Summary: Logging causes UnicodeEncodeError when using Japanese 
characters
 Key: AIRFLOW-1144
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1144
 Project: Apache Airflow
  Issue Type: Bug
  Components: logging, worker
Affects Versions: 1.8.0
Reporter: Sushant Karki





