AjayKumbham commented on PR #55645: URL: https://github.com/apache/airflow/pull/55645#issuecomment-3289467834
Hi @eladkal, totally get your concern. Let me break this down.

The actual problem: Airflow 3 passes task data as JSON strings to containers. Spark containers (and others) have entrypoints that run these arguments through a shell, which strips the quotes and breaks the JSON.

Why I'm fixing it here: I checked the other executors. AWS Lambda/ECS/Batch all use the same --json-string approach but don't hit this issue, because they execute directly without shell processing. This is specifically a Kubernetes container problem. The workload_to_command_args() function is Kubernetes-only, so fixing it here doesn't affect other executors. By switching to --json-path (writing the payload to a temp file and passing the path), we sidestep shell processing entirely.

Why it's broader than Spark: any operator with complex JSON configs would hit this with shell-based container entrypoints. SparkSubmitOperator just happened to be where users first noticed it, because Spark configs are JSON-heavy.

The fix is surgical: it only changes how KubernetesExecutor passes data to containers, which is exactly where this container-specific workaround belongs.

I did use AI assistance for cleaner code and the PR description, but the debugging, root-cause analysis, and solution approach are mine. Happy to clarify any technical details!
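For anyone who wants to see the failure mode concretely, here is a minimal sketch reproducing it outside Airflow. The payload contents and the printf/cat commands are illustrative stand-ins, not Airflow's actual workload format or container entrypoint:

```python
import json
import os
import subprocess
import tempfile

payload = json.dumps({"task": "spark_job", "conf": {"spark.executor.memory": "2g"}})

# Direct exec (how Lambda/ECS/Batch invoke the command): the JSON string is a
# single argv element and arrives intact.
direct = subprocess.run(["printf", "%s", payload],
                        capture_output=True, text=True).stdout
assert json.loads(direct) == json.loads(payload)

# Shell-based entrypoint: the shell word-splits the unquoted JSON and performs
# quote removal, so the argument no longer parses as JSON.
shell = subprocess.run(["sh", "-c", f"printf '%s' {payload}"],
                       capture_output=True, text=True).stdout
try:
    json.loads(shell)
    still_valid = True
except json.JSONDecodeError:
    still_valid = False  # quotes and spaces were eaten by the shell

# The --json-path idea: write the payload to a file and pass only the path.
# A temp-file path contains no shell metacharacters, so it survives the
# entrypoint unchanged and the JSON is read back intact.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(payload)
    path = f.name
via_file = subprocess.run(["sh", "-c", f"cat {path}"],
                          capture_output=True, text=True).stdout
os.unlink(path)
assert json.loads(via_file) == json.loads(payload)
```

The file-based handoff is the whole trick: the shell only ever sees a plain path, so there is nothing for word splitting or quote removal to mangle.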
