[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362053242 ## File path: UPDATING.md ## @@ -57,6 +57,31 @@ https://developers.google.com/style/inclusive-documentation --> +### Chain and cross_downstream moved from helpers to BaseOperator + +The chain and cross_downstream methods are now moved to airflow.models.baseoperator module from Review comment: It's already solved as well :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362014321 ## File path: airflow/utils/helpers.py ## @@ -142,83 +135,6 @@ def as_flattened_list(iterable): return [e for i in iterable for e in i] -def chain(*tasks): Review comment: Agree. Static was bad idea. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362014068 ## File path: UPDATING.md ## @@ -57,6 +57,31 @@ https://developers.google.com/style/inclusive-documentation --> +### Chain and cross_downstream moved from helpers to BaseOperator + +The chain and cross_downstream methods are now moved to airflow.models.baseoperator module from +airflow.utils.helpers module where they belong. + +Helpers module is supposed to contain standalone helper methods that can be imported by all classes. +The `chain` method and `cross_downstream` method both use BaseOperator and if any other package imports +any of the helpers classes it automatically creates an implicit dependency to BaseOperator which Review comment: Yep. I also reworded it slightly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362014133 ## File path: UPDATING.md ## @@ -57,6 +57,31 @@ https://developers.google.com/style/inclusive-documentation --> +### Chain and cross_downstream moved from helpers to BaseOperator + +The chain and cross_downstream methods are now moved to airflow.models.baseoperator module from +airflow.utils.helpers module where they belong. + +Helpers module is supposed to contain standalone helper methods that can be imported by all classes. +The `chain` method and `cross_downstream` method both use BaseOperator and if any other package imports +any of the helpers classes it automatically creates an implicit dependency to BaseOperator which +can often lead to cyclic dependencies. + +More infomration in [AIFLOW-6392](https://issues.apache.org/jira/browse/AIRFLOW-6392) + +In Airflow 1.10 you imported those two methods like this: Review comment: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362013565 ## File path: UPDATING.md ## @@ -57,6 +57,31 @@ https://developers.google.com/style/inclusive-documentation --> +### Chain and cross_downstream moved from helpers to BaseOperator + +The chain and cross_downstream methods are now moved to airflow.models.baseoperator module from +airflow.utils.helpers module where they belong. Review comment: Yep. I will also update the description slightly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362013100 ## File path: airflow/models/baseoperator.py ## @@ -670,6 +672,7 @@ def __deepcopy__(self, memo): for k, v in self.__dict__.items(): if k not in shallow_copy: +# noinspection PyArgumentList Review comment: I am afraid so. The check is super useful because it detects spelling mistakes in parameters or removed parameters. This is actually a known problem because the memo parameter is by mistake missing in skeleton definition for python skeletons (from typeshed I believe). The issue is described here: https://intellij-support.jetbrains.com/hc/en-us/community/posts/36181224-deepcopy-unexpected-memo-argument and reported here (unresolved): https://youtrack.jetbrains.com/issue/PY-29664 I have not found a good way of silencing it only for deepcopy, so I propose we can leave it as it is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362011152 ## File path: airflow/models/baseoperator.py ## @@ -862,6 +865,7 @@ def clear(self, tasks += [ t.task_id for t in self.get_flat_relatives(upstream=False)] +# noinspection PyUnresolvedReferences Review comment: It's the in_ operator on task_id. I will try to find another way of excluding those SQLAlchemy magic operators as this is a problem that repeats often. For now I remove it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362010644 ## File path: tests/models/test_baseoperator.py ## @@ -241,7 +243,46 @@ def test_default_resources(self): task = DummyOperator(task_id="default-resources") self.assertIsNone(task.resources) +# noinspection PyUnresolvedReferences Review comment: https://issues.apache.org/jira/browse/AIRFLOW-6406 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362009498 ## File path: tests/models/test_baseoperator.py ## @@ -241,7 +243,46 @@ def test_default_resources(self): task = DummyOperator(task_id="default-resources") self.assertIsNone(task.resources) +# noinspection PyUnresolvedReferences Review comment: It's the qty field. The type hints are missing for Resource type clases and Intellij derives them from __init__ (names of __init__ parameters are the same even though they are different type long <> Resource. I will remove it and create an issue to fix type hints in the Resource classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r362007720 ## File path: airflow/utils/helpers.py ## @@ -28,15 +28,8 @@ import psutil from jinja2 import Template +from airflow import AirflowException Review comment: Yeah. That is not yet enforced :). The change is in a separate PR to do so. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r361961106 ## File path: airflow/utils/helpers.py ## @@ -142,83 +135,6 @@ def as_flattened_list(iterable): return [e for i in iterable for e in i] -def chain(*tasks): Review comment: > The existence of the idea of creating an automatic tool does not affect whether we should or should not make changes. I don't agree. By having automated tools we might decide that we deliberately not keep backwards compatibility in some cases more easily - especially if those changes can easily and automatically be migrated in the existing DAGs. "Backwards compatibility" is simply not as important for 2.0 as it was in 1.10. This decision has already been made. We have a number of backwards incompatible changes, there is no going back. > Others look much less used, although there may be someone who finds these functions. But in my opinion, this is the choice of the lesser evil. I am not sure which one is lesser. If we automate migration for chain method it's not evil at all for migration. Especially that finally it makes sense for the chain method to be in baseoperator package (or in some form of operator_helpers but I prefer not to use helpers/managers/etc. if not neccessary). It makes much more sense for the other helpers method to be there - they were there originally, simply adding chains and the other method here later was a mistake that we want to correct now. For me more evil is to have bad naming in future version of airflow. >> It is not possible to import a single static method into a file. You must always import the whole class. A similar opinion was expressed in "Google Python Style Guide": Agree - will make it module function in baseoperator. it makes more sense. It comes from the old Java background where I prefer to have functions in classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r361908129 ## File path: airflow/utils/helpers.py ## @@ -142,83 +135,6 @@ def as_flattened_list(iterable): return [e for i in iterable for e in i] -def chain(*tasks): Review comment: I agree it's a bit sad (that's why I added UPDATING.md) but we already have a number of incompatibilities and we plan to have a migration tool to handle this https://issues.apache.org/jira/browse/AIRFLOW-6390 (and such automated conversion will be easy to add). There are other functions that look useful - validate_key, reduce_in_chunks, as_tuple, as_flattened_list. I see them as being used by the users - they are not base-operator dependent but still useful either in custom operators or dags. I could of course move the other methods elsewhere, but I think there is an advantage of moving it (and any future similar methods) to BaseOperator. I wonder what others think about it ? Is it worth to introduce such incompatibility or not? I am happy to do both but I think it's quite a bit cleaner as it is now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers
potiuk commented on a change in pull request #6950: [AIRFLOW-6392] Remove cyclic dependency baseoperator <-> helpers URL: https://github.com/apache/airflow/pull/6950#discussion_r361906152 ## File path: airflow/utils/helpers.py ## @@ -142,83 +135,6 @@ def as_flattened_list(iterable): return [e for i in iterable for e in i] -def chain(*tasks): -r""" -Given a number of tasks, builds a dependency chain. -Support mix airflow.models.BaseOperator and List[airflow.models.BaseOperator]. -If you want to chain between two List[airflow.models.BaseOperator], have to -make sure they have same length. - -chain(t1, [t2, t3], [t4, t5], t6) - -is equivalent to - - / -> t2 -> t4 \ -t1 -> t6 - \ -> t3 -> t5 / - -t1.set_downstream(t2) -t1.set_downstream(t3) -t2.set_downstream(t4) -t3.set_downstream(t5) -t4.set_downstream(t6) -t5.set_downstream(t6) - -:param tasks: List of tasks or List[airflow.models.BaseOperator] to set dependencies -:type tasks: List[airflow.models.BaseOperator] or airflow.models.BaseOperator -""" -from airflow.models.baseoperator import BaseOperator -for index, up_task in enumerate(tasks[:-1]): -down_task = tasks[index + 1] -if isinstance(up_task, BaseOperator): -up_task.set_downstream(down_task) -elif isinstance(down_task, BaseOperator): -down_task.set_upstream(up_task) -else: -if not isinstance(up_task, Iterable) or not isinstance(down_task, Iterable): -raise TypeError( -'Chain not supported between instances of {up_type} and {down_type}'.format( -up_type=type(up_task), down_type=type(down_task))) -elif len(up_task) != len(down_task): -raise AirflowException( -'Chain not supported different length Iterable but get {up_len} and {down_len}'.format( -up_len=len(up_task), down_len=len(down_task))) -else: -for up, down in zip(up_task, down_task): -up.set_downstream(down) - - -def cross_downstream(from_tasks, to_tasks): -r""" -Set downstream dependencies for all tasks in from_tasks to all tasks in to_tasks. -E.g.: cross_downstream(from_tasks=[t1, t2, t3], to_tasks=[t4, t5, t6]) -Is equivalent to: - -t1 --> t4 - \ / -t2 -X> t5 - / \ -t3 --> t6 - -t1.set_downstream(t4) -t1.set_downstream(t5) -t1.set_downstream(t6) -t2.set_downstream(t4) -t2.set_downstream(t5) -t2.set_downstream(t6) -t3.set_downstream(t4) -t3.set_downstream(t5) -t3.set_downstream(t6) - -:param from_tasks: List of tasks to start from. -:type from_tasks: List[airflow.models.BaseOperator] -:param to_tasks: List of tasks to set as downstream dependencies. -:type to_tasks: List[airflow.models.BaseOperator] -""" -for task in from_tasks: -task.set_downstream(to_tasks) - - def pprinttable(rows): Review comment: It's unrelated :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services