[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-06-03 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r434626388



##
File path: airflow/jobs/scheduler_job.py
##
@@ -26,7 +26,7 @@
 import time
 from collections import defaultdict
 from contextlib import redirect_stderr, redirect_stdout, suppress
-from datetime import timedelta
+from datetime import datetime, timedelta

Review comment:
   ```suggestion
   from datetime import timedelta
   ```
   
   (Unused import error from tests.)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-06-03 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r434621791



##
File path: airflow/www/views.py
##
@@ -2493,8 +2492,8 @@ class DagRunModelView(AirflowModelView):
 base_permissions = ['can_list', 'can_add']
 
 add_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'external_trigger', 'conf']
-list_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'external_trigger', 'conf']
-search_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'external_trigger', 'conf']
+list_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'external_trigger', 'conf', 'run_type']
+search_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'external_trigger', 'conf', 'run_type']

Review comment:
   Since this is the order in which the columns are displayed in the UI, I think this would be a more useful order:
   
   ```suggestion
   list_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'run_type', 'external_trigger', 'conf']
   search_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'run_type', 'external_trigger', 'conf']
   ```
   
   (specifically before the possibly quite unwieldy conf.)









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-06-01 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r433173638



##
File path: airflow/models/dag.py
##
@@ -1468,14 +1472,28 @@ def create_dagrun(self,
 :param session: database session
 :type session: sqlalchemy.orm.session.Session
 """
+if run_id:

Review comment:
   Missing this change

##
File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
##
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+
+from airflow.models import DagRun
+from airflow.utils.types import DagRunType
+
+# revision identifiers, used by Alembic.
+revision = "3c20cacc0044"
+down_revision = "952da73b5eff"
+branch_labels = None
+depends_on = None
+
+
+def upgrade():
+"""Apply Add DagRun run_type"""
+op.add_column("dag_run", sa.Column("run_type", sa.String(length=50), nullable=True))
+
+connection = op.get_bind()
+sessionmaker = sa.orm.sessionmaker()
+session = sessionmaker(bind=connection)
+
+for run_type in DagRunType:
+session.query(DagRun).filter(DagRun.run_id.like(f"{run_type.value}__%")).update(
+{DagRun.run_type: run_type.value}, synchronize_session=False
+)

Review comment:
   This needs doing









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-06-01 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r433172534



##
File path: airflow/api/common/experimental/trigger_dag.py
##
@@ -68,15 +68,17 @@ def _trigger_dag(
 execution_date.isoformat(),
 min_dag_start_date.isoformat()))
 
+run_type: Optional[DagRunType] = None
 if not run_id:
-run_id = f"{DagRunType.MANUAL.value}__{execution_date.isoformat()}"
-
-dag_run_id = dag_run.find(dag_id=dag_id, run_id=run_id)
-if dag_run_id:
-raise DagRunAlreadyExists("Run id {} already exists for dag id {}".format(
-run_id,
-dag_id
-))
+run_type = DagRunType.MANUAL
+dag_run = dag_run.find(dag_id=dag_id, run_type=run_type, execution_date=execution_date)

Review comment:
   If there is already a scheduled dag_run for this exact time, this `find()` will fail to find anything, but the unique constraint (on execution_date, dag_id) would still be violated.









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-06-01 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r433172534



##
File path: airflow/api/common/experimental/trigger_dag.py
##
@@ -68,15 +68,17 @@ def _trigger_dag(
 execution_date.isoformat(),
 min_dag_start_date.isoformat()))
 
+run_type: Optional[DagRunType] = None
 if not run_id:
-run_id = f"{DagRunType.MANUAL.value}__{execution_date.isoformat()}"
-
-dag_run_id = dag_run.find(dag_id=dag_id, run_id=run_id)
-if dag_run_id:
-raise DagRunAlreadyExists("Run id {} already exists for dag id {}".format(
-run_id,
-dag_id
-))
+run_type = DagRunType.MANUAL
+dag_run = dag_run.find(dag_id=dag_id, run_type=run_type, execution_date=execution_date)

Review comment:
   If there is already a scheduled dag_run for this exact time but of a different run_type, this `find()` will fail to find anything, but the unique constraint (on execution_date, dag_id) would still be violated.
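
To illustrate the point, here is a minimal sketch using an in-memory SQLite table rather than Airflow's models. The table layout, the `find` helper, and the sample values are assumptions made for the example; only the unique constraint on (dag_id, execution_date) comes from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT,
        execution_date TEXT,
        run_type TEXT,
        UNIQUE (dag_id, execution_date)
    )
    """
)
# An existing scheduled run for this DAG and execution date.
conn.execute(
    "INSERT INTO dag_run (dag_id, execution_date, run_type) VALUES (?, ?, ?)",
    ("example_dag", "2020-06-01T00:00:00", "scheduled"),
)


def find(dag_id, execution_date, run_type=None):
    """Hypothetical stand-in for DagRun.find(); run_type narrows the match."""
    sql = "SELECT id FROM dag_run WHERE dag_id = ? AND execution_date = ?"
    params = [dag_id, execution_date]
    if run_type is not None:
        sql += " AND run_type = ?"
        params.append(run_type)
    return conn.execute(sql, params).fetchall()


# Filtering by run_type misses the existing scheduled run...
found_with_type = find("example_dag", "2020-06-01T00:00:00", run_type="manual")
# ...while the same lookup without run_type sees the conflict.
found_without_type = find("example_dag", "2020-06-01T00:00:00")

# The insert of a manual run still collides with the unique constraint.
try:
    conn.execute(
        "INSERT INTO dag_run (dag_id, execution_date, run_type) VALUES (?, ?, ?)",
        ("example_dag", "2020-06-01T00:00:00", "manual"),
    )
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True
```

In this sketch the pre-check only protects against the constraint when it filters on the same columns the constraint covers.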









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-28 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r432152810



##
File path: airflow/models/dag.py
##
@@ -1468,14 +1472,28 @@ def create_dagrun(self,
 :param session: database session
 :type session: sqlalchemy.orm.session.Session
 """
+if run_id:

Review comment:
   ```suggestion
   if run_id and not run_type:
   ```
   
   We should be able to specify a type and a run_id, don't you think? (Right now this would blindly overwrite the provided run_type if run_id is specified.)
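
A minimal sketch of the suggested condition, outside Airflow. `pick_run_type` is a hypothetical helper, `resolve_run_type` only mirrors the prefix check discussed elsewhere in this PR, and falling back to MANUAL for unrecognized run_ids is an assumption.

```python
import enum
from typing import Optional


class DagRunType(enum.Enum):
    BACKFILL_JOB = "backfill"
    SCHEDULED = "scheduled"
    MANUAL = "manual"


def resolve_run_type(run_id: str) -> DagRunType:
    """Infer the type from the run_id prefix (assumed fallback: MANUAL)."""
    for run_type in DagRunType:
        if run_id.startswith(run_type.value + "__"):
            return run_type
    return DagRunType.MANUAL


def pick_run_type(
    run_id: Optional[str], run_type: Optional[DagRunType]
) -> Optional[DagRunType]:
    """Hypothetical helper showing the `if run_id and not run_type` guard."""
    # Only infer from run_id when the caller did not pass a run_type,
    # so an explicitly provided run_type is never overwritten.
    if run_id and not run_type:
        return resolve_run_type(run_id)
    return run_type
```

With the guard in place, a caller can pass a custom run_id together with an explicit run_type and keep both.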









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-28 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r432151494



##
File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
##
@@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+
+from airflow.models import DagRun
+from airflow.utils.types import DagRunType
+
+# revision identifiers, used by Alembic.
+revision = "3c20cacc0044"
+down_revision = "952da73b5eff"
+branch_labels = None
+depends_on = None
+
+
+def upgrade():
+"""Apply Add DagRun run_type"""
+op.add_column("dag_run", sa.Column("run_type", sa.String(length=50), nullable=True))
+
+connection = op.get_bind()
+sessionmaker = sa.orm.sessionmaker()
+session = sessionmaker(bind=connection)
+
+for run_type in DagRunType:
+session.query(DagRun).filter(DagRun.run_id.like(f"{run_type.value}__%")).update(
+{DagRun.run_type: run_type.value}, synchronize_session=False
+)

Review comment:
   I don't see this done anywhere ^^









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-28 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r432151089



##
File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
##
@@ -0,0 +1,90 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy import Boolean, Column, Integer, PickleType, String
+from sqlalchemy.ext.declarative import declarative_base
+
+from airflow.models.base import ID_LEN
+from airflow.utils import timezone
+from airflow.utils.sqlalchemy import UtcDateTime
+from airflow.utils.state import State
+from airflow.utils.types import DagRunType
+
+# revision identifiers, used by Alembic.
+revision = "3c20cacc0044"
+down_revision = "952da73b5eff"
+branch_labels = None
+depends_on = None
+
+Base = declarative_base()
+
+
+class DagRun(Base):
+"""
+DagRun describes an instance of a Dag. It can be created
+by the scheduler (for regular runs) or by an external trigger
+"""
+__tablename__ = "dag_run"
+
+id = Column(Integer, primary_key=True)
+dag_id = Column(String(ID_LEN))
+execution_date = Column(UtcDateTime, default=timezone.utcnow)
+start_date = Column(UtcDateTime, default=timezone.utcnow)
+end_date = Column(UtcDateTime)
+_state = Column('state', String(50), default=State.RUNNING)
+run_id = Column(String(ID_LEN))
+external_trigger = Column(Boolean, default=True)
+run_type = Column(String(50), nullable=False)
+conf = Column(PickleType)
+
+
+def upgrade():
+"""Apply Add DagRun run_type"""
+op.add_column("dag_run", sa.Column("run_type", sa.String(length=50)))
+op.drop_index("dag_id_state", table_name="dag_run")

Review comment:
   An index on (dag_id, state) is probably used by this code:
   
   ```
   active_runs = DagRun.find(
   dag_id=dag.dag_id,
   state=State.RUNNING,
   external_trigger=False,
   session=session
   )
   ```
   
   The question is whether we think DBs are smart enough to use 2 out of the 3 columns of the new dag_id_state_type index, or whether we should keep this one.
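
One way to check this empirically is to ask the database for its query plan. The sketch below does that for SQLite only, with a simplified, assumed table definition; MySQL and PostgreSQL planners make their own decisions, so the result does not carry over automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified dag_run table (an assumption for the example, not the real schema).
conn.execute(
    "CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT, state TEXT, "
    "run_type TEXT, external_trigger INTEGER)"
)
# The three-column index proposed in the PR.
conn.execute("CREATE INDEX dag_id_state_type ON dag_run (dag_id, state, run_type)")

# Same filter shape as the DagRun.find() call quoted above:
# dag_id and state are constrained, run_type is not.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM dag_run WHERE dag_id = ? AND state = ? AND external_trigger = 0",
    ("example_dag", "running"),
).fetchall()

# The last column of each row is the human-readable plan step.
plan = " | ".join(row[-1] for row in plan_rows)
print(plan)
```

If the printed plan names `dag_id_state_type`, the planner is using the leading (dag_id, state) columns of the composite index for this query.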









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-27 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r431190655



##
File path: airflow/models/dagrun.py
##
@@ -54,25 +54,27 @@ class DagRun(Base, LoggingMixin):
 _state = Column('state', String(50), default=State.RUNNING)
 run_id = Column(String(ID_LEN))
 external_trigger = Column(Boolean, default=True)
+run_type = Column(String(50), nullable=True)
 conf = Column(PickleType)
 
 dag = None
 
 __table_args__ = (
-Index('dag_id_state', dag_id, _state),
+Index('dag_id_state_type', dag_id, _state, run_type),

Review comment:
   When do we filter by Dag id and run type?









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-19 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r427449286



##
File path: airflow/models/dagrun.py
##
@@ -65,14 +66,15 @@ class DagRun(Base, LoggingMixin):
 )
 
 def __init__(self, dag_id=None, run_id=None, execution_date=None, start_date=None, external_trigger=None,
- conf=None, state=None):
+ conf=None, state=None, run_type=None):
 self.dag_id = dag_id
 self.run_id = run_id
 self.execution_date = execution_date
 self.start_date = start_date
 self.external_trigger = external_trigger
 self.conf = conf
 self.state = state
+self.run_type = run_type

Review comment:
   (As Kamil already commented. Missed that comment somehow)









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-19 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r427448548



##
File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
##
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy import Boolean, Column, Integer, PickleType, String
+from sqlalchemy.engine.reflection import Inspector
+from sqlalchemy.ext.declarative import declarative_base
+
+from airflow.models.base import ID_LEN
+from airflow.utils import timezone
+from airflow.utils.sqlalchemy import UtcDateTime
+from airflow.utils.state import State
+from airflow.utils.types import DagRunType
+
+# revision identifiers, used by Alembic.
+revision = "3c20cacc0044"
+down_revision = "952da73b5eff"
+branch_labels = None
+depends_on = None
+
+Base = declarative_base()
+
+
+class DagRun(Base):
+"""
+DagRun describes an instance of a Dag. It can be created
+by the scheduler (for regular runs) or by an external trigger
+"""
+__tablename__ = "dag_run"
+
+id = Column(Integer, primary_key=True)
+dag_id = Column(String(ID_LEN))
+execution_date = Column(UtcDateTime, default=timezone.utcnow)
+start_date = Column(UtcDateTime, default=timezone.utcnow)
+end_date = Column(UtcDateTime)
+_state = Column('state', String(50), default=State.RUNNING)
+run_id = Column(String(ID_LEN))
+external_trigger = Column(Boolean, default=True)
+run_type = Column(String(50))
+conf = Column(PickleType)
+
+
+def upgrade():
+"""Apply Add DagRun run_type"""
+op.add_column("dag_run", sa.Column("run_type", sa.String(length=50), nullable=True))
+
+connection = op.get_bind()
+sessionmaker = sa.orm.sessionmaker()
+session = sessionmaker(bind=connection)
+inspector = Inspector.from_engine(connection)
+tables = inspector.get_table_names()
+
+if 'dag_run' in tables:
+for run_type in DagRunType:
+session.query(DagRun).filter(DagRun.run_id.like(f"{run_type.value}__%")).update(
+{DagRun.run_type: run_type.value}, synchronize_session=False
+)
+session.commit()
+
+session.query(DagRun).filter(DagRun.run_type.is_(None)).update(
+{DagRun.run_type: DagRunType.MANUAL.value}, synchronize_session=False
+)
+session.commit()

Review comment:
   Yeah, otherwise the tests would have failed cos the model and DB 
wouldn't match.
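
For reference, the data backfill in this migration boils down to two kinds of UPDATE statements. The sketch below replays them with plain SQL against an in-memory SQLite table; the table is trimmed to the two relevant columns and the sample run_ids are made up.

```python
import sqlite3

# The values of DagRunType as shown in the diff for airflow/utils/types.py.
RUN_TYPES = ["backfill", "scheduled", "manual"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (id INTEGER PRIMARY KEY, run_id TEXT, run_type TEXT)")
conn.executemany(
    "INSERT INTO dag_run (run_id) VALUES (?)",
    [
        ("scheduled__2020-05-19T00:00:00",),
        ("backfill__2020-05-18T00:00:00",),
        ("my_custom_run",),  # hypothetical user-supplied run_id
    ],
)

# Step 1: derive run_type from the run_id prefix, one UPDATE per type.
for value in RUN_TYPES:
    conn.execute(
        "UPDATE dag_run SET run_type = ? WHERE run_id LIKE ?",
        (value, f"{value}__%"),
    )

# Step 2: anything still unset is treated as a manual run.
conn.execute("UPDATE dag_run SET run_type = 'manual' WHERE run_type IS NULL")

result = dict(conn.execute("SELECT run_id, run_type FROM dag_run"))
```

Note that `_` is a single-character wildcard in SQL `LIKE`, so the pattern `manual__%` is slightly looser than a literal `manual__` prefix.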









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-19 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r427447731



##
File path: airflow/models/dagrun.py
##
@@ -65,14 +66,15 @@ class DagRun(Base, LoggingMixin):
 )
 
 def __init__(self, dag_id=None, run_id=None, execution_date=None, start_date=None, external_trigger=None,
- conf=None, state=None):
+ conf=None, state=None, run_type=None):
 self.dag_id = dag_id
 self.run_id = run_id
 self.execution_date = execution_date
 self.start_date = start_date
 self.external_trigger = external_trigger
 self.conf = conf
 self.state = state
+self.run_type = run_type

Review comment:
   
https://github.com/apache/airflow/blob/bae5cc2f5ca32e0f61c3b92008fbd484184448ef/airflow/jobs/scheduler_job.py#L1158-L1188









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-05-19 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r427415675



##
File path: airflow/models/dagrun.py
##
@@ -65,14 +66,15 @@ class DagRun(Base, LoggingMixin):
 )
 
 def __init__(self, dag_id=None, run_id=None, execution_date=None, start_date=None, external_trigger=None,
- conf=None, state=None):
+ conf=None, state=None, run_type=None):
 self.dag_id = dag_id
 self.run_id = run_id
 self.execution_date = execution_date
 self.start_date = start_date
 self.external_trigger = external_trigger
 self.conf = conf
 self.state = state
+self.run_type = run_type

Review comment:
   The scheduler (well dag parsing process) looks at DagRuns where type is 
not Backfill.









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-23 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r414098939



##
File path: airflow/utils/types.py
##
@@ -22,3 +22,13 @@ class DagRunType(enum.Enum):
 BACKFILL_JOB = "backfill"
 SCHEDULED = "scheduled"
 MANUAL = "manual"
+
+@staticmethod
+def resolve_run_type(run_id: str) -> "DagRunType":
+"""
+Resolved DagRun type from run_id.
+"""
+for run_type in DagRunType:
+if run_id.startswith(run_type.value):

Review comment:
   ```suggestion
   if run_id.startswith(run_type.value + "__"):
   ```
   
   no?
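
A small sketch of why the separator matters. The `matches_loose` and `matches_strict` helpers and the sample run_ids are hypothetical; the enum values are the ones in the diff above.

```python
import enum


class DagRunType(enum.Enum):
    BACKFILL_JOB = "backfill"
    SCHEDULED = "scheduled"
    MANUAL = "manual"


def matches_loose(run_id: str, run_type: DagRunType) -> bool:
    """Prefix check as written in the PR at this point."""
    return run_id.startswith(run_type.value)


def matches_strict(run_id: str, run_type: DagRunType) -> bool:
    """Prefix check with the "__" separator from the suggestion."""
    return run_id.startswith(run_type.value + "__")


# A user-chosen run_id that merely begins with the word "scheduled".
custom_run_id = "scheduled_maintenance_2020"
# A run_id following the "<type>__<date>" naming pattern.
generated_run_id = "scheduled__2020-04-23T00:00:00"

loose_on_custom = matches_loose(custom_run_id, DagRunType.SCHEDULED)
strict_on_custom = matches_strict(custom_run_id, DagRunType.SCHEDULED)
strict_on_generated = matches_strict(generated_run_id, DagRunType.SCHEDULED)
```

The loose check classifies the custom run_id as scheduled, while the strict check only accepts the generated pattern.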

##
File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
##
@@ -0,0 +1,101 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy import Boolean, Column, Integer, PickleType, String
+from sqlalchemy.engine.reflection import Inspector
+from sqlalchemy.ext.declarative import declarative_base
+
+from airflow.models.base import ID_LEN
+from airflow.utils import timezone
+from airflow.utils.sqlalchemy import UtcDateTime
+from airflow.utils.state import State
+from airflow.utils.types import DagRunType
+
+# revision identifiers, used by Alembic.
+revision = "3c20cacc0044"
+down_revision = "952da73b5eff"
+branch_labels = None
+depends_on = None
+
+Base = declarative_base()
+
+
+class DagRun(Base):
+"""
+DagRun describes an instance of a Dag. It can be created
+by the scheduler (for regular runs) or by an external trigger
+"""
+__tablename__ = "dag_run"
+
+id = Column(Integer, primary_key=True)
+dag_id = Column(String(ID_LEN))
+execution_date = Column(UtcDateTime, default=timezone.utcnow)
+start_date = Column(UtcDateTime, default=timezone.utcnow)
+end_date = Column(UtcDateTime)
+_state = Column('state', String(50), default=State.RUNNING)
+run_id = Column(String(ID_LEN))
+external_trigger = Column(Boolean, default=True)
+run_type = Column(String(50))
+conf = Column(PickleType)
+
+
+def upgrade():
+"""Apply Add DagRun run_type"""
+op.add_column("dag_run", sa.Column("run_type", sa.String(length=50), nullable=True))
+op.drop_constraint("dag_run_dag_id_execution_date_key", "dag_run", "unique")
+op.create_unique_constraint(None, "dag_run", ('dag_id', 'execution_date', 'run_type'))

Review comment:
   What's going on here? Adding a type column shouldn't need to change the 
uniqueness constraints on the table.









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-23 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r414096043



##
File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
##
@@ -0,0 +1,101 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+from sqlalchemy import Boolean, Column, Integer, PickleType, String
+from sqlalchemy.engine.reflection import Inspector
+from sqlalchemy.ext.declarative import declarative_base
+
+from airflow.models.base import ID_LEN
+from airflow.utils import timezone
+from airflow.utils.sqlalchemy import UtcDateTime
+from airflow.utils.state import State
+from airflow.utils.types import DagRunType
+
+# revision identifiers, used by Alembic.
+revision = "3c20cacc0044"
+down_revision = "952da73b5eff"
+branch_labels = None
+depends_on = None
+
+Base = declarative_base()
+
+
+class DagRun(Base):
+"""
+DagRun describes an instance of a Dag. It can be created
+by the scheduler (for regular runs) or by an external trigger
+"""
+__tablename__ = "dag_run"
+
+id = Column(Integer, primary_key=True)
+dag_id = Column(String(ID_LEN))
+execution_date = Column(UtcDateTime, default=timezone.utcnow)
+start_date = Column(UtcDateTime, default=timezone.utcnow)
+end_date = Column(UtcDateTime)
+_state = Column('state', String(50), default=State.RUNNING)
+run_id = Column(String(ID_LEN))
+external_trigger = Column(Boolean, default=True)
+run_type = Column(String(50))
+conf = Column(PickleType)
+
+
+def upgrade():
+"""Apply Add DagRun run_type"""
+op.add_column("dag_run", sa.Column("run_type", sa.String(length=50), nullable=True))
+op.drop_constraint("dag_run_dag_id_execution_date_key", "dag_run", "unique")
+op.create_unique_constraint(None, "dag_run", ('dag_id', 'execution_date', 'run_type'))
+op.drop_index('dag_id_state', table_name='dag_run')
+op.create_index('dag_id_state_run_type', 'dag_run', ['dag_id', 'state', 'run_type'], unique=False)
+
+connection = op.get_bind()
+sessionmaker = sa.orm.sessionmaker()
+session = sessionmaker(bind=connection)
+inspector = Inspector.from_engine(connection)
+tables = inspector.get_table_names()
+
+if 'dag_run' in tables:

Review comment:
   This shouldn't ever be needed -- when this migration is run the dag_run 
table has to exist.









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-23 Thread GitBox


ashb commented on a change in pull request #8227:
URL: https://github.com/apache/airflow/pull/8227#discussion_r414094936



##
File path: UPDATING.md
##
@@ -62,6 +62,41 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### DAG.create_dagrun accepts run_type and does not require run_id
+This change is caused by adding `run_type` column to `DagRun`.
+
+Previous signature:
+```python
+def create_dagrun(self,
+  run_id,
+  state,
+  execution_date=None,
+  start_date=None,
+  external_trigger=False,
+  conf=None,
+  session=None):
+```
+current:
+```python
+def create_dagrun(self,
+  state,
+  execution_date=None,
+  run_id=None,
+  start_date=None,
+  external_trigger=False,
+  conf=None,
+  run_type=None,

Review comment:
   Run_type should probably be required, no?









[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-09 Thread GitBox
ashb commented on a change in pull request #8227: Add run_type to DagRun
URL: https://github.com/apache/airflow/pull/8227#discussion_r406122863
 
 

 ##
 File path: airflow/api/common/experimental/mark_tasks.py
 ##
 @@ -48,11 +48,11 @@ def _create_dagruns(dag, execution_dates, state, run_type):
 
 for date in dates_to_create:
 dag_run = dag.create_dagrun(
-run_id=f"{run_type}__{date.isoformat()}",
 
 Review comment:
   I think run_type should be non-null in the db, yes. So perhaps make run_type 
required, but run_id can be overridden.
   
   We could detect run_type based on run_id, it would make it easier for anyone 
to upgrade, sure.
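
A sketch of what "run_type required, run_id overridable" could look like. `build_run_id` is a hypothetical helper, not part of the PR; the `<type>__<isoformat date>` pattern is taken from the removed line in the diff above.

```python
import enum
from datetime import datetime
from typing import Optional


class DagRunType(enum.Enum):
    BACKFILL_JOB = "backfill"
    SCHEDULED = "scheduled"
    MANUAL = "manual"


def build_run_id(
    run_type: DagRunType,
    execution_date: datetime,
    run_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: keep an explicit run_id, otherwise derive one."""
    if run_id:
        return run_id
    # Same "<type>__<isoformat date>" pattern the callers used to build by hand.
    return f"{run_type.value}__{execution_date.isoformat()}"


derived = build_run_id(DagRunType.MANUAL, datetime(2020, 4, 9))
overridden = build_run_id(DagRunType.MANUAL, datetime(2020, 4, 9), run_id="custom_id")
```

Callers then always state the type, and the run_id becomes an optional label rather than the place where the type is encoded.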




With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-09 Thread GitBox
ashb commented on a change in pull request #8227: Add run_type to DagRun
URL: https://github.com/apache/airflow/pull/8227#discussion_r406108915
 
 

 ##
 File path: airflow/models/dagrun.py
 ##
 @@ -65,14 +66,15 @@ class DagRun(Base, LoggingMixin):
 )
 
 def __init__(self, dag_id=None, run_id=None, execution_date=None, start_date=None, external_trigger=None,
- conf=None, state=None):
+ conf=None, state=None, run_type=None):
 self.dag_id = dag_id
 self.run_id = run_id
 self.execution_date = execution_date
 self.start_date = start_date
 self.external_trigger = external_trigger
 self.conf = conf
 self.state = state
+self.run_type = run_type
 
 Review comment:
   Yeah, we should look at how we query the dag_run table, as I'd guess we want a multi-column index on run_type + something else (execution date? dag id?), since we don't (I think) ever select just by run_type.
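
   To make the multi-column index idea concrete, here is a rough sketch using the standard-library `sqlite3` module rather than Airflow's real schema or Alembic. The table layout, the index name, and the choice of `dag_id` as the leading column are all assumptions; which columns belong in the index depends on the actual queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the dag_run table (not the real schema).
conn.execute(
    "CREATE TABLE dag_run ("
    " id INTEGER PRIMARY KEY, dag_id TEXT, execution_date TEXT, run_type TEXT)"
)
# Hypothetical index: dag_id leads because lookups are assumed to be
# scoped to a single DAG and then narrowed by run_type.
conn.execute(
    "CREATE INDEX idx_dag_run_dag_id_run_type ON dag_run (dag_id, run_type)"
)

# Ask SQLite how it would execute a query filtering on both columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM dag_run WHERE dag_id = ? AND run_type = ?",
    ("example_dag", "scheduled"),
).fetchall()
# The fourth column of each plan row is the human-readable detail.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

   The printed plan should name the composite index, which is the kind of check worth repeating against the real backends before settling on the column order.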




[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-09 Thread GitBox
ashb commented on a change in pull request #8227: Add run_type to DagRun
URL: https://github.com/apache/airflow/pull/8227#discussion_r406108194
 
 

 ##
 File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
 ##
 @@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+
+from airflow.models import DagRun
+from airflow.utils.types import DagRunType
+
+# revision identifiers, used by Alembic.
+revision = "3c20cacc0044"
+down_revision = "952da73b5eff"
+branch_labels = None
+depends_on = None
+
+
+def upgrade():
+    """Apply Add DagRun run_type"""
+    op.add_column("dag_run", sa.Column("run_type", sa.String(length=50), nullable=True))
+
+    connection = op.get_bind()
+    sessionmaker = sa.orm.sessionmaker()
+    session = sessionmaker(bind=connection)
+
+    for run_type in DagRunType:
+        session.query(DagRun).filter(DagRun.run_id.like(f"{run_type.value}__%")).update(
+            {DagRun.run_type: run_type.value}, synchronize_session=False
+        )
 
 Review comment:
   run_id can be anything in the case of a manual run, so I think we also need a `SET run_type = 'manual' WHERE run_type IS NULL` afterwards.
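
   A small self-contained sketch of that two-step backfill, run against an in-memory SQLite table instead of the real migration. The table layout, the `RUN_TYPES` tuple, and the `backfill_run_type` helper are assumptions made for illustration.

```python
import sqlite3

# Assumed run type values; the migration iterates over DagRunType instead.
RUN_TYPES = ("backfill", "scheduled", "manual")


def backfill_run_type(conn: sqlite3.Connection) -> None:
    """Populate run_type from the run_id prefix, then default the rest."""
    for run_type in RUN_TYPES:
        # Note: '_' is a single-character wildcard in LIKE, so this pattern
        # is slightly looser than a literal '<type>__' prefix match.
        conn.execute(
            "UPDATE dag_run SET run_type = ? WHERE run_id LIKE ?",
            (run_type, f"{run_type}__%"),
        )
    # Anything still NULL had a free-form run_id, so treat it as manual.
    conn.execute("UPDATE dag_run SET run_type = 'manual' WHERE run_type IS NULL")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (run_id TEXT, run_type TEXT)")
conn.executemany(
    "INSERT INTO dag_run (run_id) VALUES (?)",
    [
        ("scheduled__2020-04-08T00:00:00",),
        ("backfill__2020-04-07T00:00:00",),
        ("my_adhoc_run",),
    ],
)
backfill_run_type(conn)
rows = dict(conn.execute("SELECT run_id, run_type FROM dag_run"))
```

   After the second UPDATE no row is left with a NULL run_type, which is what would allow the column to be made non-nullable later.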




[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-09 Thread GitBox
ashb commented on a change in pull request #8227: Add run_type to DagRun
URL: https://github.com/apache/airflow/pull/8227#discussion_r406107134
 
 

 ##
 File path: airflow/api/common/experimental/mark_tasks.py
 ##
 @@ -48,11 +48,11 @@ def _create_dagruns(dag, execution_dates, state, run_type):
 
 for date in dates_to_create:
 dag_run = dag.create_dagrun(
-run_id=f"{run_type}__{date.isoformat()}",
 
 Review comment:
   Oh I see, defaulted/handled in create_dagrun, nice.
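
   For readers following along, the defaulting presumably looks something like the sketch below. This is not the actual `create_dagrun` code: `default_run_id` is a hypothetical helper that only reuses the `f"{run_type}__{date.isoformat()}"` format from the removed line.

```python
from datetime import datetime
from typing import Optional


def default_run_id(
    run_type: str, execution_date: datetime, run_id: Optional[str] = None
) -> str:
    """Return the caller's run_id if given, else build one from run_type."""
    if run_id is not None:
        # An explicit run_id always wins.
        return run_id
    return f"{run_type}__{execution_date.isoformat()}"


generated = default_run_id("scheduled", datetime(2020, 4, 9))
overridden = default_run_id("manual", datetime(2020, 4, 9), run_id="my_adhoc_run")
```

   Callers such as `_create_dagruns` would then only need to pass run_type and the date.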




[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-09 Thread GitBox
ashb commented on a change in pull request #8227: Add run_type to DagRun
URL: https://github.com/apache/airflow/pull/8227#discussion_r406106635
 
 

 ##
 File path: airflow/migrations/versions/3c20cacc0044_add_dagrun_run_type.py
 ##
 @@ -0,0 +1,58 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Add DagRun run_type
+
+Revision ID: 3c20cacc0044
+Revises: 952da73b5eff
+Create Date: 2020-04-08 13:35:25.671327
+
+"""
+
+import sqlalchemy as sa
+from alembic import op
+
+from airflow.models import DagRun
 
 Review comment:
   Yup, see #8176 for an example of how to do this.




[GitHub] [airflow] ashb commented on a change in pull request #8227: Add run_type to DagRun

2020-04-09 Thread GitBox
ashb commented on a change in pull request #8227: Add run_type to DagRun
URL: https://github.com/apache/airflow/pull/8227#discussion_r406106150
 
 

 ##
 File path: airflow/api/common/experimental/mark_tasks.py
 ##
 @@ -48,11 +48,11 @@ def _create_dagruns(dag, execution_dates, state, run_type):
 
 for date in dates_to_create:
 dag_run = dag.create_dagrun(
-run_id=f"{run_type}__{date.isoformat()}",
 
 Review comment:
   What happened to run_id? I think we still need this, even if we don't use the prefix for detecting the type.

