AjayKumbham opened a new pull request, #55645: URL: https://github.com/apache/airflow/pull/55645
Fix Airflow 3 Spark k8s pod JSON payload corruption when using KubernetesExecutor

This PR fixes an issue where Airflow 3 Spark worker k8s pod entrypoints corrupt task JSON payloads by stripping double quotes when arguments are processed through the container's shell.

## Problem

When running SparkSubmitOperator tasks with Airflow 3 and KubernetesExecutor, the JSON payload passed to Spark containers gets corrupted because:

- The `workload_to_command_args()` function uses `--json-string` to pass JSON directly as a command argument
- Spark container entrypoints process these arguments through a shell
- The shell strips double quotes from the JSON string, corrupting the payload
- This causes Spark tasks to fail with malformed JSON errors

Users previously had to manually add escaping slashes as a workaround.

## Solution

- Change `workload_to_command_args()` to use `--json-path` instead of `--json-string`
- Write JSON payload to a temporary file in `/tmp/` to avoid shell escaping issues
- Use unique filenames based on task instance details to prevent conflicts
- Maintain backward compatibility since Task SDK already supports both `--json-path` and `--json-string`

## Changes Made

- **pod_generator.py**: Modified `workload_to_command_args()` function to write JSON to temp file
- **test_pod_generator.py**: Added comprehensive tests for regular and mapped tasks
- **test_template_rendering.py**: Updated existing test to use new approach with proper cleanup

## Testing

- ✅ Verified JSON with problematic double quotes (the original issue case)
- ✅ Tested mapped task filename generation and uniqueness
- ✅ Validated file creation, content integrity, and cleanup
- ✅ Ensured integration with existing test suite

This fix eliminates the need for manual escaping workarounds and allows SparkSubmitOperator to work correctly with Airflow 3 in Kubernetes environments.
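For reviewers who want to see the failure mode in isolation, here is a minimal sketch that is not taken from the PR diff. It uses `shlex.split` as a stand-in for the quote removal a shell-based entrypoint performs, and a hypothetical helper `write_payload_and_build_args` to illustrate the `--json-path` idea. The helper name, the payload fields, and the filename scheme are assumptions for illustration only; the real `workload_to_command_args()` may differ.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_payload_and_build_args(payload: dict, directory: str) -> list[str]:
    """Hypothetical helper: persist the payload and return ``--json-path`` args."""
    # Illustrative unique name built from task instance details (an assumption,
    # not the naming scheme used in the PR).
    name = f"{payload['dag_id']}_{payload['task_id']}_{payload['map_index']}.json"
    path = Path(directory) / name
    path.write_text(json.dumps(payload))
    return ["--json-path", str(path)]


def shell_reparse(args: list[str]) -> list[str]:
    """Join args unquoted and re-split them with POSIX rules, as a naive entrypoint might."""
    return shlex.split(" ".join(args))


payload = {"dag_id": "spark_example", "task_id": "submit", "map_index": -1}

# Inline JSON: quote removal strips every double quote from the payload.
inline_args = shell_reparse(["--json-string", json.dumps(payload)])
mangled = " ".join(inline_args[1:])
try:
    json.loads(mangled)
    inline_ok = True
except json.JSONDecodeError:
    inline_ok = False

# File-based: only a plain path crosses the shell boundary, so the JSON is untouched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path_args = shell_reparse(write_payload_and_build_args(payload, tmp_dir))
    restored = json.loads(Path(path_args[1]).read_text())

print(mangled)
print(inline_ok, restored == payload)
```

The sketch writes into a `tempfile.TemporaryDirectory` rather than `/tmp/` directly so that it cleans up after itself; in the executor the file has to be readable from wherever the task command actually runs.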
closes: #55561
