[ https://issues.apache.org/jira/browse/AIRFLOW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong closed AIRFLOW-2966.
-------------------------------------
       Resolution: Fixed
         Assignee: John Hofman
    Fix Version/s: 2.0.0

> KubernetesExecutor + namespace quotas kills scheduler if the pod can't be launched
> ----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2966
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2966
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.0.0
>         Environment: Kubernetes 1.9.8
>            Reporter: John Hofman
>            Assignee: John Hofman
>            Priority: Major
>             Fix For: 2.0.0
>
>
> When Airflow runs in Kubernetes with the KubernetesExecutor and resource
> quotas are set on the namespace Airflow is deployed in, a pod launch that
> would exceed the namespace limits fails with an ApiException, and that
> exception crashes the scheduler.
> This stack trace is an example of the ApiException from the kubernetes client:
> {code:java}
> [2018-08-27 09:51:08,516] {pod_launcher.py:58} ERROR - Exception when attempting to create Namespaced Pod.
> Traceback (most recent call last):
>   File "/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line 55, in run_pod_async
>     resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6057, in create_namespaced_pod
>     (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6142, in create_namespaced_pod_with_http_info
>     collection_formats=collection_formats)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
>     _return_http_data_only, collection_formats, _preload_content, _request_timeout)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
>     _request_timeout=_request_timeout)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 364, in request
>     body=body)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 266, in POST
>     body=body)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
>     raise ApiException(http_resp=r)
> kubernetes.client.rest.ApiException: (403)
> Reason: Forbidden
> HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b00e2cbb-bdb2-41f3-8090-824aee79448c', 'Content-Type': 'application/json', 'Date': 'Mon, 27 Aug 2018 09:51:08 GMT', 'Content-Length': '410'})
> HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"podname-ec366e89ef934d91b2d3ffe96234a725\" is forbidden: exceeded quota: compute-resources, requested: limits.memory=4Gi, used: limits.memory=6508Mi, limited: limits.memory=10Gi","reason":"Forbidden","details":{"name":"podname-ec366e89ef934d91b2d3ffe96234a725","kind":"pods"},"code":403}
> {code}
>
> I would expect the scheduler to catch the Exception and at least mark the
> task as failed, or better yet retry the task later.
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
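
For illustration only, the behaviour the reporter asks for (catch the exception and retry the task later) could be sketched roughly as below. This is not the change that was merged for this issue. `ApiException` here is a local stand-in for `kubernetes.client.rest.ApiException` so the snippet runs without a cluster, and `QuotaLimitedClient` and `launch_pending` are hypothetical names invented for the example.

```python
from collections import deque


class ApiException(Exception):
    """Local stand-in for kubernetes.client.rest.ApiException (assumption)."""

    def __init__(self, status, reason=""):
        super().__init__(f"({status}) {reason}")
        self.status = status
        self.reason = reason


class QuotaLimitedClient:
    """Hypothetical fake client that rejects pods once the quota is used up."""

    def __init__(self, free_slots):
        self.free_slots = free_slots

    def create_namespaced_pod(self, body, namespace):
        if self.free_slots <= 0:
            raise ApiException(403, "Forbidden: exceeded quota")
        self.free_slots -= 1
        return {"name": body, "namespace": namespace}


def launch_pending(client, pending, namespace="airflow"):
    """Try each queued pod once; re-queue it on ApiException instead of crashing."""
    launched = []
    for _ in range(len(pending)):
        pod = pending.popleft()
        try:
            client.create_namespaced_pod(body=pod, namespace=namespace)
            launched.append(pod)
        except ApiException as exc:
            # The launch was rejected: keep the task for a later scheduler loop.
            print(f"launch of {pod} failed with status {exc.status}, will retry later")
            pending.append(pod)
    return launched


client = QuotaLimitedClient(free_slots=1)
pending = deque(["task-a", "task-b"])
launched = launch_pending(client, pending)
```

With one free slot, the first task launches and the second stays in `pending` for the next pass rather than taking the loop down. A real scheduler would presumably also want to cap the number of retries and mark the task as failed once the cap is reached.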