kecookier opened a new issue, #5074:
URL: https://github.com/apache/incubator-gluten/issues/5074
### Backend
VL (Velox)
### Bug description
Based on my tests, two bugs cause this issue:
1. Currently, `VeloxBackend::initUdf()` is called before
`UDFResolver.loadAndGetFunctionDescriptions()`, so the `udflibPath` cannot be
resolved on the driver.
2. `SparkFiles.get(f)` fails to retrieve the local file on the driver in
`yarn-client` or `yarn-cluster` mode.
This is because the `--files`/`--archives` options set different Spark
configuration keys, depending on the cluster manager and the deploy mode.
With `--master=yarn`, the keys
`spark.yarn.dist.files`/`spark.yarn.dist.archives` are set, and the files
are distributed to the working directory on all nodes (both driver and
executors). Otherwise, `spark.files`/`spark.archives` are set, and the
files are added through `SparkContext.addFile`, which makes them accessible via
`SparkFiles.get()`.
The Spark code that processes the `--files` and `--archives` arguments can be
found at:
https://github.com/apache/spark/blob/25ecde948bebf01d2cb1e160516238e1d949ffdb/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L653
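
To make the key selection described above concrete, here is a minimal sketch in plain Python with no Spark dependency. The helper names `dist_conf_keys` and `resolve_local_path`, and the `spark_files_get` callback standing in for `SparkFiles.get`, are hypothetical and are not part of Spark or Gluten. The YARN branch assumes, as described in this issue, that the file lands in the process working directory.

```python
import os


def dist_conf_keys(master: str) -> dict:
    """Return the config keys that --files/--archives populate for a master URL."""
    if master.startswith("yarn"):
        # On YARN the files are shipped through the YARN distributed cache.
        return {"files": "spark.yarn.dist.files",
                "archives": "spark.yarn.dist.archives"}
    # Other cluster managers register the files via SparkContext.addFile.
    return {"files": "spark.files", "archives": "spark.archives"}


def resolve_local_path(master, file_name, spark_files_get, cwd=None):
    """Pick a lookup strategy for a distributed file (hypothetical helper).

    spark_files_get is a stand-in for SparkFiles.get, passed in so the
    sketch runs without Spark installed.
    """
    base = os.path.basename(file_name)
    if master.startswith("yarn"):
        # Assumption from the issue text: the file is in the working directory.
        return os.path.join(cwd if cwd is not None else os.getcwd(), base)
    return spark_files_get(base)
```

A resolver along these lines would branch on the master before calling `SparkFiles.get`, instead of calling it unconditionally.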
### Spark version
Spark-3.2.x
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org