nicnguyen3103 opened a new issue #20394: URL: https://github.com/apache/airflow/issues/20394
### Apache Airflow version

2.2.2 (latest released)

### What happened

I tried to upgrade to Airflow 2.2.2, and the Celery worker dies immediately on startup when launched with `airflow celery worker`. Celery is at 5.1.2; I did not encounter this issue on Airflow 2.1.4 with Celery 4.4.5. The full log can be found in this file: [airflow_celery_error_log.txt](https://github.com/apache/airflow/files/7738675/airflow_celery_error_log.txt)

Both MySQL and Redis are hosted on the internal network of a Docker Swarm. Since the documentation does not mention any security setting for Celery message deserialization, could you help me look into this issue? Many thanks.

### What you expected to happen

Celery/kombu refuses to accept pickle content. Celery's default `accept_content` is JSON (https://docs.celeryproject.org/en/stable/userguide/configuration.html#std-setting-accept_content). I tried to change it to pickle via the environment variable `CELERY_ACCEPT_CONTENT=pickle` in docker-compose, but the Airflow Celery worker doesn't pick it up.

```
[tasks]
  . airflow.executors.celery_executor.execute_command

[2021-12-18 03:31:55,136: INFO/MainProcess] Connected to redis://redis:6379/0
[2021-12-18 03:31:55,145: INFO/MainProcess] mingle: searching for neighbors
[2021-12-18 03:31:56,163: INFO/MainProcess] mingle: all alone
[2021-12-18 03:31:56,177: INFO/MainProcess] celery@stg-worker1 ready.
[2021-12-18 03:31:56,179: CRITICAL/MainProcess] Can't decode message body: ContentDisallowed('Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)') [type:'application/x-python-serialize' encoding:'binary' headers:{'sentry-trace': '99d3c3760460417ebdd83936d93084c4-856283284c7dc7c9-', 'headers': {'sentry-trace': '99d3c3760460417ebdd83936d93084c4-856283284c7dc7c9-'}}] body: b'\x80\x02}q\x00(X\x04\x00\x00\x00taskq\x01X\x16\x00\x00\x00sentry.tasks.send_pingq\x02X\x02\x00\x00\x00idq\x03X$\x00\x00\x00655800b5-421a-4715-87cb-14d46b9b7013q\x04X\x04\x00\x00\x00argsq\x05)X\x06\x00\x00\x00kwargsq\x06}q\x07X\x05\x00\x00\x00groupq\x08NX\x0b\x00\x00\x00group_indexq\tNX\x07\x00\x00\x00retriesq\nK\x00X\x03\x00\x00\x00etaq\x0bNX\x07\x00\x00\x00expiresq\x0cX \x00\x00\x002021-12-18T03:32:44.624948+00:00q\rX\x03\x00\x00\x00utcq\x0e\x88X\t\x00\x00\x00callbacksq\x0fNX\x08\x00\x00\x00errbacksq\x10NX\t\x00\x00\x00timelimitq\x11NN\x86q\x12X\x07\x00\x00\x00tasksetq\x13NX\x05\x00\x00\x00chordq\x14Nu.' 
(333b)
Traceback (most recent call last):
  File "/home/nonroot/airflow/airflowvirtual/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 568, in on_task_received
    type_ = message.headers['task']  # protocol v2
KeyError: 'task'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nonroot/airflow/airflowvirtual/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 573, in on_task_received
    payload = message.decode()
  File "/home/nonroot/airflow/airflowvirtual/lib/python3.8/site-packages/kombu/message.py", line 194, in decode
    self._decoded_cache = self._decode()
  File "/home/nonroot/airflow/airflowvirtual/lib/python3.8/site-packages/kombu/message.py", line 198, in _decode
    return loads(self.body, self.content_type,
  File "/home/nonroot/airflow/airflowvirtual/lib/python3.8/site-packages/kombu/serialization.py", line 242, in loads
    raise self._for_untrusted_content(content_type, 'untrusted')
kombu.exceptions.ContentDisallowed: Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)

[2021-12-18 03:31:56,181: CRITICAL/MainProcess] Can't decode message body: ContentDisallowed('Refusing to deserialize untrusted content of type pickle (application/x-python-serialize)') [type:'application/x-python-serialize' encoding:'binary' headers:{'sentry-trace': 'df0cfb36633b4d34b0ebf2a293f13e25-9b0212d02c6be824-', 'headers': {'sentry-trace': 'df0cfb36633b4d34b0ebf2a293f13e25-9b0212d02c6be824-'}}] body: b'\x80\x02}q\x00(X\x04\x00\x00\x00taskq\x01X#\x00\x00\x00sentry.tasks.enqueue_scheduled_jobsq\x02X\x02\x00\x00\x00idq\x03X$\x00\x00\x007d3c5bc2-06bd-42d5-aa3d-ea36f44b2360q\x04X\x04\x00\x00\x00argsq\x05)X\x06\x00\x00\x00kwargsq\x06}q\x07X\x05\x00\x00\x00groupq\x08NX\x0b\x00\x00\x00group_indexq\tNX\x07\x00\x00\x00retriesq\nK\x00X\x03\x00\x00\x00etaq\x0bNX\x07\x00\x00\x00expiresq\x0cX \x00\x00\x002021-12-18T03:32:44.670615+00:00q\rX\x03\x00\x00\x00utcq\x0e\x88X\t\x00\x00\x00callbacksq\x0fNX\x08\x00\x00\x00errbacksq\x10NX\t\x00\x00\x00timelimitq\x11NN\x86q\x12X\x07\x00\x00\x00tasksetq\x13NX\x05\x00\x00\x00chordq\x14Nu.' (346b)
Traceback (most recent call last):
  File "/home/nonroot/airflow/airflowvirtual/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 568, in on_task_received
    type_ = message.headers['task']  # protocol v2
KeyError: 'task'
```

### How to reproduce

1. Install a non-root Airflow container. The command I used in the Dockerfile was: `RUN . $AIRFLOW_VENV/bin/activate && pip install 'apache-airflow[mysql,amazon,celery,papermill]==2.2.2' --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.8.txt"`
2. Install `apache-airflow[statsd]` and `apache-airflow[sentry]`, as there is no separate provider for them.
3. Set up both Redis and MySQL on the same Docker network. The Celery worker connects to Redis as the broker and MySQL as the result backend.
4. Set up Sentry on the same network and connect using `SENTRY_DSN`.
5. Start the Celery worker.

### Operating System

Ubuntu 20.04.2 LTS (Focal Fossa)

### Versions of Apache Airflow Providers

`pip install 'apache-airflow[mysql,amazon,celery,papermill]==2.2.2' --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.8.txt"`

### Deployment

Docker-Compose

### Deployment details

All servers are running on Docker Swarm with engine version 20.10.7.

### Anything else

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
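For reference, Airflow does not read Celery's native `CELERY_*` environment variables; its Celery settings come from the `[celery]` section of `airflow.cfg` (or `AIRFLOW__CELERY__*` environment variables), and `celery_config_options` can point at a custom config dict. Below is a minimal sketch of such a module; the module name `custom_celery_config` is hypothetical, and a plain dict stands in for Airflow's real `airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG` so the example is self-contained:

```python
# custom_celery_config.py -- hypothetical module, placed somewhere on PYTHONPATH.
# Sketch only: in a real deployment you would import and extend
# airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG instead of
# this stand-in dict.
DEFAULT_CELERY_CONFIG = {
    "accept_content": ["json"],  # Celery 5's default: JSON only
    # ... broker_url, result_backend, etc. come from airflow.cfg in reality
}

# Extend rather than replace: keep every default key and only widen
# accept_content so the worker also tolerates pickle-encoded messages
# (e.g. Sentry's own tasks sharing the same broker).
CELERY_CONFIG = {
    **DEFAULT_CELERY_CONFIG,
    "accept_content": ["json", "pickle"],
}
```

The worker would then be pointed at it with something like `AIRFLOW__CELERY__CELERY_CONFIG_OPTIONS=custom_celery_config.CELERY_CONFIG` in docker-compose, which may be why `CELERY_ACCEPT_CONTENT=pickle` alone was ignored. Note that accepting pickle from a broker reachable by untrusted parties has security implications, which is exactly why kombu refuses it by default.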
