baryluk opened a new issue, #45812:
URL: https://github.com/apache/airflow/issues/45812

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-cncf-kubernetes 4.3.0
   
   
   ### Apache Airflow version
   
   2.3.4
   
   ### Operating System
   
   Linux
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   n/a
   
   ### What happened
   
   We are upgrading from apache-airflow-providers-cncf-kubernetes 3.0.0 to 
4.3.0 (going slowly through releases).
   
   We have a custom script that, during the Docker image build of our Airflow 
image, tests all DAGs and all DAG tasks in dry_run mode, mostly to detect 
Python syntax errors, DAG cycles, duplicate tasks, wrong imports, templating 
errors, etc.
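   The validation pass is roughly the following pattern (a self-contained sketch with stand-in classes; the real script loads DAGs via Airflow's `DagBag` and calls `dry_run()` on each task instance):

```python
# Hypothetical sketch of the image-build validation pass. The Task class
# below is a stand-in; in the real script these are Airflow task instances
# whose dry_run() renders templates without contacting external services.

class Task:
    def __init__(self, task_id, fail=False):
        self.task_id = task_id
        self._fail = fail

    def dry_run(self):
        # Simulates template rendering / request-object building.
        if self._fail:
            raise RuntimeError(f"templating error in {self.task_id}")


def validate(tasks):
    """Dry-run every task and collect failures instead of aborting."""
    errors = []
    for task in tasks:
        try:
            task.dry_run()
        except Exception as exc:
            errors.append((task.task_id, str(exc)))
    return errors


print(validate([Task("ok"), Task("bad", fail=True)]))
```

The build fails if the returned error list is non-empty, so broken DAGs never make it into the image.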
   
   This was all working fine with our existing Airflow, but we decided to 
upgrade Airflow to a newer version, which also means updating the Airflow 
providers. After fixing a bunch of other issues, I found this issue with the 
KubernetesPodOperator dry run.
   
   The new dry_run added in 
https://github.com/apache/airflow/commit/d56ff765e15f9fcd582bc6d1ec0e83b0fedf476a
 invokes the `KubernetesPodOperator.build_pod_request_obj()` method, which 
calls the property `self.hook.is_in_cluster`:
   
   ```python
           pod.metadata.labels.update(
               {
                   'airflow_version': airflow_version.replace('+', '-'),
                   'airflow_kpo_in_cluster': str(self.hook.is_in_cluster),
               }
           )
   ```
   
   Unfortunately, this property constructs a Kubernetes API client object, 
which requires a kube config / credentials to work:
   
   ```python
       @property
       def is_in_cluster(self):
           """Expose whether the hook is configured with 
``load_incluster_config`` or not"""
           if self._is_in_cluster is not None:
               return self._is_in_cluster
           self.api_client  # so we can determine if we are in_cluster or not
        return self._is_in_cluster
    ```
   
   This makes dry_run unable to execute in an isolated test environment:
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 97, in <module>
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/dag.py", line 
2307, in cli
       args.func(args, self)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", 
line 51, in command
       return func(*args, **kwargs)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/cli.py", line 
99, in wrapper
       return f(*args, **kwargs)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py",
 line 545, in task_test
       ti.dry_run()
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py",
 line 1815, in dry_run
       self.task.dry_run()
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py",
 line 607, in dry_run
       pod = self.build_pod_request_obj()
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py",
 line 595, in build_pod_request_obj
       'airflow_kpo_in_cluster': str(self.hook.is_in_cluster),
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/hooks/kubernetes.py",
 line 283, in is_in_cluster
       self.api_client  # so we can determine if we are in_cluster or not
     File "/usr/local/lib/python3.9/functools.py", line 993, in __get__
       val = self.func(instance)
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/hooks/kubernetes.py",
 line 291, in api_client
       return self.get_conn()
     File 
"/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/hooks/kubernetes.py",
 line 239, in get_conn
       config.load_kube_config(
     File 
"/home/airflow/.local/lib/python3.9/site-packages/kubernetes/config/kube_config.py",
 line 808, in load_kube_config
       loader = _get_kube_config_loader(
     File 
"/home/airflow/.local/lib/python3.9/site-packages/kubernetes/config/kube_config.py",
 line 767, in _get_kube_config_loader
       raise ConfigException(
   kubernetes.config.config_exception.ConfigException: Invalid kube-config 
file. No configuration found.
   ```
   
   
   We would like to continue using dry_run, but be able to run it without 
providing credentials or kube config. It does not need to be 100% accurate.
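   In the meantime, the only workaround we see for isolated environments is patching the property during the validation run. The pattern (shown here on a stand-in class, not the real `KubernetesHook`) would be:

```python
from unittest import mock

# Stand-in mirroring the failing KubernetesHook.is_in_cluster property
# (the real one lives in airflow.providers.cncf.kubernetes.hooks.kubernetes);
# names here are illustrative only.
class FakeHook:
    @property
    def is_in_cluster(self):
        raise RuntimeError("Invalid kube-config file. No configuration found.")


# Patching the property on the class lets dry_run-style code read it
# without any kube config being present.
with mock.patch.object(
    FakeHook, "is_in_cluster",
    new_callable=mock.PropertyMock, return_value=False,
):
    hook = FakeHook()
    print(hook.is_in_cluster)  # False, no kube config consulted
```

This is fragile, though: it ties our build tooling to a provider-internal property, which is why a supported escape hatch would be preferable.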
   
   Two options:
   
   * an env var to bypass setting the `airflow_kpo_in_cluster` label in 
dry_run mode, if the user requests to do so.
   * never populate it in dry_run mode (change the signature of 
`build_pod_request_obj` to take a `dry_run: bool = False` kwarg and invoke it 
with `dry_run=True` in the `KubernetesPodOperator.dry_run()` method).
   
   (or both) 
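   The second option could look roughly like this (a simplified stand-in, not the real operator code; class and label names are only loosely mirrored from the provider):

```python
# Sketch of option 2: thread a dry_run flag through build_pod_request_obj
# so the in-cluster label is skipped when no kube config is available.

class PodOperatorSketch:
    def __init__(self, hook):
        self.hook = hook

    def build_pod_request_obj(self, dry_run: bool = False):
        labels = {"airflow_version": "2.3.4"}
        if not dry_run:
            # Only this branch needs credentials / kube config.
            labels["airflow_kpo_in_cluster"] = str(self.hook.is_in_cluster)
        return labels

    def dry_run(self):
        return self.build_pod_request_obj(dry_run=True)


class NoConfigHook:
    @property
    def is_in_cluster(self):
        raise RuntimeError("Invalid kube-config file. No configuration found.")


# Succeeds without any kube config; the label is simply omitted.
print(PodOperatorSketch(NoConfigHook()).dry_run())
```

A pod built this way is slightly less accurate than a real one (the label is missing), but as noted above, dry_run does not need to be 100% accurate.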
   
   ### What you think should happen instead
   
   n/a
   
   ### How to reproduce
   
   n/a
   
   ### Anything else
   
   n/a
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org