[jira] [Created] (AIRFLOW-2150) Use get_partition_names() instead of get_partitions() in HiveMetastoreHook().max_partition()

2018-02-26 Thread Kevin Yang (JIRA)
Kevin Yang created AIRFLOW-2150:
---

 Summary: Use get_partition_names() instead of get_partitions() in 
HiveMetastoreHook().max_partition()
 Key: AIRFLOW-2150
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2150
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Kevin Yang
Assignee: Kevin Yang


get_partitions() is extremely expensive for large tables; max_partition() 
should use get_partition_names() instead.
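To illustrate the proposed direction: get_partition_names() returns only lightweight partition-name strings such as "ds=2018-02-26/hour=03", so the max can be computed over those strings instead of fetching full partition objects. The sketch below is illustrative only (the function name and signature are hypothetical, not the actual Airflow implementation):

```python
def max_partition_from_names(partition_names, partition_key):
    """Pick the maximum value of `partition_key` from Hive partition
    name strings such as 'ds=2018-02-26/hour=03'.

    Hypothetical sketch of what max_partition() could do with the
    output of get_partition_names(); not Airflow's actual code.
    """
    values = []
    for name in partition_names:
        # A partition name is a '/'-joined list of key=value pairs.
        for part in name.split("/"):
            key, _, value = part.partition("=")
            if key == partition_key:
                values.append(value)
    # Hive date-style partition values sort correctly as strings.
    return max(values) if values else None
```

Because only strings cross the metastore Thrift boundary, this avoids deserializing full Partition objects for every partition of a large table.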



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2149) Add documentation link to create a Cloud DataFlow self executing jar

2018-02-26 Thread Lorenzo Caggioni (JIRA)
Lorenzo Caggioni created AIRFLOW-2149:
-

 Summary: Add documentation link to create a Cloud DataFlow self 
executing jar
 Key: AIRFLOW-2149
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2149
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Reporter: Lorenzo Caggioni


Add documentation link to create a Cloud DataFlow self executing jar





[jira] [Assigned] (AIRFLOW-2149) Add documentation link to create a Cloud DataFlow self executing jar

2018-02-26 Thread Lorenzo Caggioni (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorenzo Caggioni reassigned AIRFLOW-2149:
-

Assignee: Lorenzo Caggioni

> Add documentation link to create a Cloud DataFlow self executing jar
> 
>
> Key: AIRFLOW-2149
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2149
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Lorenzo Caggioni
>Assignee: Lorenzo Caggioni
>Priority: Trivial
>
> Add documentation link to create a Cloud DataFlow self executing jar





[jira] [Commented] (AIRFLOW-401) scheduler gets stuck without a trace

2018-02-26 Thread Hongchang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376911#comment-16376911
 ] 

Hongchang Li commented on AIRFLOW-401:
--

Same issue with Airflow 1.8.2 on Python 2.7: with _parallelism_ = X child 
processes spawned, one got "defunct" status.

Adding more logs of the stuck workflow; it seems the hang was related to a DB deadlock. 
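As an aside, MySQL error 1213 explicitly says "try restarting transaction", so one common mitigation (not something Airflow did here, just a generic pattern) is to wrap the commit in a retry loop. A minimal sketch, with a hypothetical helper name:

```python
import time


def retry_on_deadlock(func, retries=3, delay=0.01):
    """Retry `func` when it raises an exception mentioning a MySQL
    deadlock (error 1213). Generic illustrative sketch, not Airflow code.
    """
    for attempt in range(retries):
        try:
            return func()
        except Exception as exc:
            # Re-raise anything that is not a deadlock, and give up
            # after the last allowed attempt.
            if "Deadlock found" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear backoff
```

In real code the `func` would re-run the whole transaction (e.g. a session.commit() plus the statements it flushes), since MySQL rolls the losing transaction back.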
{code:java}
[2018-02-26 19:13:53,071] {base_task_runner.py:95} INFO - Subtask: Starting 
attempt 1 of 2
[2018-02-26 19:13:53,071] {base_task_runner.py:95} INFO - Subtask: 

[2018-02-26 19:13:53,071] {base_task_runner.py:95} INFO - Subtask: 
[2018-02-26 19:13:53,082] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 
19:13:53,081] {models.py:1358} INFO - Executing  on 2018-02-26 18:10:00
[2018-02-26 19:13:53,294] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 
19:13:53,293] {models.py:1128} INFO - Dependencies all met for 
[2018-02-26 19:13:53,301] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 
19:13:53,301] {base_executor.py:50} INFO - Adding to queue: airflow run 
x.HourLoginInfo HourLoginInfo-7001 2018-02-26T18:10:00 --local --pool 
cp_backfill -sd DAGS_FOLDER/x.py
[2018-02-26 19:13:53,320] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 
19:13:53,319] {models.py:1128} INFO - Dependencies all met for 
[2018-02-26 19:13:53,330] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 
19:13:53,330] {base_executor.py:50} INFO - Adding to queue: airflow run 
x.HourLoginInfo HourLoginInfo-7003 2018-02-26T18:10:00 --local --pool 
cp_backfill -sd DAGS_FOLDER/x.py
[2018-02-26 19:13:53,373] {base_task_runner.py:95} INFO - Subtask: [2018-02-26 
19:13:53,336] {models.py:1433} ERROR - (_mysql_exceptions.OperationalError) 
(1213, 'Deadlock found when trying to get lock; try restarting transaction') 
[SQL: u'UPDATE task_instance SET state=%s WHERE task_instance.task_id = %s AND 
task_instance.dag_id = %s AND task_instance.execution_date = %s'] [parameters: 
(u'queued', 'HourLoginInfo-7003', 'x.HourLoginInfo', 
datetime.datetime(2018, 2, 26, 18, 10))] (Background on this error at: 
http://sqlalche.me/e/e3q8)
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: Traceback 
(most recent call last):
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py", line 
1390, in run
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: result = 
task_copy.execute(context=context)
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/operators/subdag_operator.py",
 line 88, in execute
[2018-02-26 19:13:53,374] {base_task_runner.py:95} INFO - Subtask: 
executor=self.executor)
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py", line 
3414, in run
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: job.run()
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/jobs.py", line 
201, in run
[2018-02-26 19:13:53,375] {base_task_runner.py:95} INFO - Subtask: 
self._execute()
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/jobs.py", line 
1944, in _execute
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: 
session.commit()
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
 line 943, in commit
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: 
self.transaction.commit()
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
 line 467, in commit
[2018-02-26 19:13:53,376] {base_task_runner.py:95} INFO - Subtask: 
self._prepare_impl()
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
 line 447, in _prepare_impl
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: 
self.session.flush()
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
 line 2243, in flush
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: 
self._flush(objects)
[2018-02-26 19:13:53,377] {base_task_runner.py:95} INFO - Subtask: File 
"/opt/rh/python27/root/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py",
 line 2369,