AjayKumbham opened a new pull request, #55645:
URL: https://github.com/apache/airflow/pull/55645

   Fix Airflow 3 Spark k8s pod JSON payload corruption when using 
KubernetesExecutor
   
   This PR fixes an issue in Airflow 3 where the entrypoints of Spark worker 
k8s pods corrupt task JSON payloads: double quotes are stripped when the 
arguments are processed through the container's shell.
   
   ## Problem
   When running SparkSubmitOperator tasks with Airflow 3 and 
KubernetesExecutor, the JSON payload passed to Spark containers gets corrupted 
because:
   - The `workload_to_command_args()` function uses `--json-string` to pass 
JSON directly as a command argument
   - Spark container entrypoints process these arguments through a shell
   - The shell strips double quotes from the JSON string, corrupting the payload
   - This causes Spark tasks to fail with malformed JSON errors
   
   Until now, users had to work around this by manually escaping the quotes with backslashes.
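
The quote stripping described above can be reproduced without a cluster. The following is a minimal sketch, assuming the entrypoint re-parses its arguments as shell words; `shlex.split` only approximates POSIX shell parsing, and the payload shown is a made-up example, not a real Airflow workload.

```python
import json
import shlex

# Hypothetical payload; a real workload JSON carries many more fields.
payload = json.dumps({"dag_id": "example", "conf": {"key": "value"}})

# Approximate what happens when the JSON text is re-parsed as shell words:
# quoting characters are consumed and only the bare words survive.
reparsed = shlex.split(payload)
mangled = " ".join(reparsed)


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(payload)   # original JSON, double quotes intact
print(mangled)   # same words with the double quotes removed
print(is_valid_json(payload), is_valid_json(mangled))
```

Once the quotes are gone the text can no longer be parsed, which matches the malformed JSON errors listed in the problem description.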
   
   ## Solution
   - Change `workload_to_command_args()` to use `--json-path` instead of 
`--json-string`
   - Write JSON payload to a temporary file in `/tmp/` to avoid shell escaping 
issues
   - Use unique filenames based on task instance details to prevent conflicts
   - Maintain backward compatibility, since the Task SDK already supports both 
`--json-path` and `--json-string`
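
The file-based approach in the list above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the function name `build_command_args`, its parameters, and the filename scheme are hypothetical, and the module path in the returned command is assumed rather than taken from the diff.

```python
import json
import re
from pathlib import Path


def build_command_args(
    payload: dict,
    *,
    dag_id: str,
    task_id: str,
    run_id: str,
    map_index: int = -1,
    directory: str = "/tmp",
) -> list[str]:
    """Write the payload to a JSON file and return args that reference it."""
    # Hypothetical naming scheme: combine task instance details so that
    # different tasks (and different mapped indexes) get different files.
    raw_name = f"workload_{dag_id}_{task_id}_{run_id}_{map_index}"
    # Replace characters that are awkward in filenames (e.g. ':' in run ids).
    safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", raw_name)
    path = Path(directory) / f"{safe_name}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    # Only a plain file path reaches the command line, so there are no
    # double quotes for a shell to strip. The module path is an assumption.
    return [
        "python",
        "-m",
        "airflow.sdk.execution_time.execute_workload",
        "--json-path",
        str(path),
    ]
```

Because the argument list then contains only a path, the JSON content itself never passes through shell word parsing.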
   
   ## Changes Made
   - **pod_generator.py**: Modified `workload_to_command_args()` function to 
write JSON to temp file
   - **test_pod_generator.py**: Added comprehensive tests for regular and 
mapped tasks
   - **test_template_rendering.py**: Updated existing test to use new approach 
with proper cleanup
   
   ## Testing
   - ✅ Verified JSON payloads containing double quotes (the case from the original issue)
   - ✅ Tested mapped task filename generation and uniqueness
   - ✅ Validated file creation, content integrity, and cleanup
   - ✅ Ensured integration with existing test suite
   
   This fix eliminates the need for manual escaping workarounds and allows 
SparkSubmitOperator to work correctly with Airflow 3 in Kubernetes environments.
   
   closes: #55561
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
