Should OutputCommitCoordinator fail stages for authorized committer failures when using s3a optimized committers?
In https://issues.apache.org/jira/browse/SPARK-39195, OutputCommitCoordinator was modified to fail a stage if an authorized committer task fails. We run our Spark jobs on a k8s cluster managed by Karpenter and built mostly from spot instances, so our executors are frequently killed. With the above change, that leads to expensive stage failures at the final write stage.

I think I understand why the above is needed when using FileOutputCommitter, but it seems like we could handle committers like the magic s3a committer differently. For those, we could instead abort the task attempt, which will delete the data files that are awaiting the final PUT operation and remove them from the list of files to be completed during the job commit phase.

Does this seem reasonable? I think the change could go in OutputCommitCoordinator (as a case in the taskCompleted block), but there are other options as well. Any other ideas on how to stop individual failures of authorized committer tasks from failing the whole job?
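To make the proposal concrete, here is a minimal, hypothetical sketch of the decision I have in mind. The names are made up for illustration; this is not Spark's actual OutputCommitCoordinator API, just the branch I'd like taskCompleted to take:

```java
// Hypothetical sketch, not actual Spark internals: when an authorized
// committer task fails, choose between failing the whole stage and
// aborting only that task attempt.
public class CommitDecision {
    public enum Outcome { FAIL_STAGE, ABORT_TASK_ATTEMPT_ONLY }

    // supportsTaskAbortCleanup would be true for committers like the s3a
    // magic committer, where aborting a task attempt deletes the data files
    // awaiting the final PUT and removes them from the job-commit list.
    public static Outcome onAuthorizedCommitterFailure(boolean supportsTaskAbortCleanup) {
        return supportsTaskAbortCleanup
                ? Outcome.ABORT_TASK_ATTEMPT_ONLY
                : Outcome.FAIL_STAGE;
    }
}
```

With FileOutputCommitter the first branch is the current (and, as I understand it, necessary) behavior; the question is whether the second branch is safe for committers whose task abort fully cleans up the pending output.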
[Spark SQL][How-To] Remove builtin function support from Spark
Hello, I'm very new to the Spark ecosystem, so apologies if this question is a bit simple. I want to modify a custom fork of Spark to remove function support. For example, I want to remove the query runner's ability to call reflect and java_method. I saw that there is a data structure in spark-sql called FunctionRegistry that seems to act as an allowlist on what functions Spark can execute. If I remove a function from the registry, is that enough to guarantee that the function can "never" be invoked in Spark, or are there other areas that would need to be changed as well? Thanks, Matthew McMillian
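To illustrate the allowlist behavior being asked about, here is a toy sketch. The class and method names are hypothetical; Spark's real FunctionRegistry maps function names to expression builders, but the lookup principle is the same: a name that is not registered fails resolution at analysis time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Toy allowlist-style function registry (a sketch, not Spark's actual API).
public class ToyFunctionRegistry {
    private final Map<String, Function<String, String>> builders = new HashMap<>();

    public ToyFunctionRegistry() {
        // A couple of registered "functions" standing in for builtins.
        builders.put("upper", s -> s.toUpperCase());
        builders.put("reflect", s -> "reflect(" + s + ")");
    }

    // Dropping a name makes lookup fail; in Spark this surfaces as an
    // "undefined function" analysis error for SQL queries using that name.
    public void drop(String name) {
        builders.remove(name);
    }

    public Optional<Function<String, String>> lookup(String name) {
        return Optional.ofNullable(builders.get(name));
    }
}
```

Note the caveat implicit in the question: this only gates name-based resolution. Code paths that construct expressions directly, or that register new functions at runtime, would need to be reviewed separately in a fork.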