potiuk commented on a change in pull request #18718: URL: https://github.com/apache/airflow/pull/18718#discussion_r723957739
########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility problem with that approach. While this change onm it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator and using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that either the provider should have the "additional_dependency" added for Airflow >version X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid so maybe we should figure a bit different way here. Thoughts? ########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator and using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that either the provider should have the "additional_dependency" added for Airflow >version X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid so maybe we should figure a bit different way here. Thoughts? ########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator and using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >version X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid so maybe we should figure a bit different way here. Thoughts? ########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator and using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >= X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid so maybe we should figure a bit different way here. Thoughts? ########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator and using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >= X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid, and difficult to detect , so maybe we should figure a bit different way here. Thoughts? ########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator and using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >= X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid, and difficult to detect , so maybe we should figure a bit different way of passing parameters here - one that will be easy to detect and flag by pre-commit. Thoughts? ########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator or using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >= X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid, and difficult to detect , so maybe we should figure a bit different way of passing parameters here - one that will be easy to detect and flag by pre-commit. Thoughts? ########## File path: airflow/models/connection.py ########## @@ -289,8 +289,11 @@ def rotate_fernet_key(self): if self._extra and self.is_extra_encrypted: self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() - def get_hook(self): - """Return hook based on conn_type.""" + def get_hook(self, **kwargs): Review comment: There is a backwards-compatibility (or rather future-compatibility) problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to `get_hook`, they will not be usable with already released versions of Airflow. We have A LOT of operators derived from SQLOperator or using `get_hook()` directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing. I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption. I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (`get_hook with params()` rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >= X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes `hook_params`. The second part will be much more "brittle" I am afraid, and difficult to detect , so maybe we should figure a bit different way of passing parameters here - one that will be easy to detect and flag by pre-commit. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org