url54 opened a new issue, #45718: URL: https://github.com/apache/airflow/issues/45718
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.10.3

### What happened?

The DAG processing manager attempts to parse PPTX files, which causes it to place corrupted data in the metadata database.

### What you think should happen instead?

Ideally it should not parse any non-Python files from the `/usr/local/airflow/dags` directory.

### How to reproduce

Copy text from an Airflow DAG's code and place it inside a PPTX slide. Upload the PPTX to `/usr/local/airflow/dags` and wait for errors to appear in the DAG processing manager logs (this may take a little while). For reference, I have attached the file I used and a screenshot:

[Testing AIrflow Bug.pptx](https://github.com/user-attachments/files/18444241/Testing.AIrflow.Bug.pptx)

### Operating System

Amazon Linux 2023

### Versions of Apache Airflow Providers

| Connection type | Package |
|--|--|
| AWS Connection | [apache-airflow-providers-amazon[aiobotocore]==9.0.0](https://airflow.apache.org/docs/apache-airflow-providers-amazon/9.0.0/index.html) |
| Postgres Connection | [apache-airflow-providers-postgres==5.13.1](https://airflow.apache.org/docs/apache-airflow-providers-postgres/5.13.1/index.html) |
| FTP Connection | [apache-airflow-providers-ftp==3.11.1](https://airflow.apache.org/docs/apache-airflow-providers-ftp/3.11.1/index.html) |
| Fab Connection | [apache-airflow-providers-fab==1.5.0](https://airflow.apache.org/docs/apache-airflow-providers-fab/1.5.0/index.html) |
| Celery Connection | [apache-airflow-providers-celery==3.8.3](https://airflow.apache.org/docs/apache-airflow-providers-celery/3.8.3/index.html) |
| HTTP Connection | [apache-airflow-providers-http==4.13.2](https://airflow.apache.org/docs/apache-airflow-providers-http/4.13.2/index.html) |
| IMAP Connection | [apache-airflow-providers-imap==3.7.0](https://airflow.apache.org/docs/apache-airflow-providers-imap/3.7.0/index.html) |
| Common SQL | [apache-airflow-providers-common-sql==1.19.0](https://airflow.apache.org/docs/apache-airflow-providers-common-sql/1.19.0/index.html) |
| SQLite Connection | [apache-airflow-providers-sqlite==3.9.0](https://airflow.apache.org/docs/apache-airflow-providers-sqlite/3.9.0/index.html) |
| SMTP Connection | [apache-airflow-providers-smtp==1.8.0](https://airflow.apache.org/docs/apache-airflow-providers-smtp/1.8.0/index.html) |

### Deployment

Amazon (AWS) MWAA

### Deployment details

Nothing special: a default MWAA deployment with no additional configuration or requirements.

### Anything else?

Based on how `zipfile` works (see https://github.com/apache/airflow/blob/main/airflow/utils/file.py#L276), it appears that the list of file types that could be parsed extends beyond PPTX [[1]](https://en.wikipedia.org/wiki/List_of_file_signatures):

| Hex signature | ISO 8859-1 | Offset | Extension | Description |
|--|--|--|--|--|
| 50 4B 03 04, 50 4B 05 06 (empty archive), 50 4B 07 08 (spanned archive) | PK␃␄, PK␅␆, PK␇␈ | 0 | zip, aar, apk, docx, epub, [ipa](https://en.wikipedia.org/wiki/.ipa), jar, kmz, [maff](https://en.wikipedia.org/wiki/Mozilla_Archive_Format), msix, odp, ods, odt, pk3, pk4, pptx, usdz, vsdx, xlsx, [xpi](https://en.wikipedia.org/wiki/XPInstall) | [zip file format](https://en.wikipedia.org/wiki/ZIP_(file_format)) and formats based on it, such as [EPUB](https://en.wikipedia.org/wiki/EPUB), [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)), [ODF](https://en.wikipedia.org/wiki/OpenDocument), [OOXML](https://en.wikipedia.org/wiki/Office_Open_XML) |

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
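For anyone who wants to check the "Anything else?" observation locally, here is a minimal sketch using only the Python standard library. It assumes the DAG file discovery relies on a content-based check such as `zipfile.is_zipfile`, as the linked line suggests; it does not reproduce Airflow's actual code. The helpers `make_fake_pptx` and `looks_like_dag_archive` are hypothetical names made up for this illustration, and the extension guard is only one possible mitigation, not a proposed patch.

```python
import os
import tempfile
import zipfile


def make_fake_pptx(path: str) -> None:
    """Write a tiny ZIP container to `path`, mimicking a PPTX (OOXML) layout.

    The member name and contents are invented for illustration only.
    """
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("ppt/slides/slide1.xml", "<a:t>from airflow import DAG</a:t>")


def looks_like_dag_archive(path: str) -> bool:
    """Hypothetical guard: require a .zip suffix in addition to ZIP content."""
    return path.lower().endswith(".zip") and zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    pptx = os.path.join(tmp, "slides.pptx")
    make_fake_pptx(pptx)

    # The first four bytes are the ZIP local file header signature (PK\x03\x04).
    with open(pptx, "rb") as fh:
        magic = fh.read(4)

    # is_zipfile inspects the file's content and ignores its name.
    content_check = zipfile.is_zipfile(pptx)

    # The stricter, hypothetical guard also looks at the file suffix.
    guarded_check = looks_like_dag_archive(pptx)

print(magic, content_check, guarded_check)
```

The content-based check accepts the `.pptx` file because it is a valid ZIP container, while the guard that also looks at the suffix rejects it. The same would apply to any of the ZIP-based formats listed in the signature table.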