Re: [I] [VL] UDF load failed in yarn-cluster mode [incubator-gluten]

2024-03-25 Thread via GitHub


marin-ma closed issue #5074: [VL] UDF load failed in yarn-cluster mode
URL: https://github.com/apache/incubator-gluten/issues/5074


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org
For additional commands, e-mail: commits-h...@gluten.apache.org



[I] [VL] UDF load failed in yarn-cluster mode [incubator-gluten]

2024-03-21 Thread via GitHub


kecookier opened a new issue, #5074:
URL: https://github.com/apache/incubator-gluten/issues/5074

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   Based on my tests, two bugs cause this issue:
   1. `VeloxBackend::initUdf()` is currently called before `UDFResolver.loadAndGetFunctionDescriptions()`, so `udflibPath` cannot be resolved on the driver.
   2. `SparkFiles.get(f)` cannot retrieve a local file on the driver in `yarn-client` or `yarn-cluster` mode.
   This happens because the `--files`/`--archives` options set different Spark configuration keys depending on the cluster manager and deploy mode. With `--master=yarn`, Spark sets `spark.yarn.dist.files`/`spark.yarn.dist.archives`, and the files are distributed to the working directory on all nodes (both driver and executors). Otherwise, Spark sets `spark.files`/`spark.archives`, and the files are added via `SparkContext.addFile`, which makes them accessible through `SparkFiles.get()`.
   
   The Spark code that processes the `--files` and `--archives` arguments can be found at:
   https://github.com/apache/spark/blob/25ecde948bebf01d2cb1e160516238e1d949ffdb/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L653
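   The key-selection behavior described above can be sketched as follows. This is a minimal illustrative sketch (not the actual Spark source, which is Scala); the function name `dist_config_keys` is hypothetical, but the configuration keys match those named in the description.

   ```python
   # Hypothetical sketch: which Spark configuration keys the
   # --files/--archives submit options map to, depending on the
   # cluster manager, as described above.

   def dist_config_keys(master: str) -> dict:
       """Return the config keys that --files and --archives are written to."""
       if master == "yarn":
           # On YARN, the files are shipped to every container's
           # working directory (driver and executors alike).
           return {"--files": "spark.yarn.dist.files",
                   "--archives": "spark.yarn.dist.archives"}
       # With other masters, the files go through SparkContext.addFile
       # and are retrieved with SparkFiles.get().
       return {"--files": "spark.files",
               "--archives": "spark.archives"}

   print(dist_config_keys("yarn")["--files"])      # spark.yarn.dist.files
   print(dist_config_keys("local[*]")["--files"])  # spark.files
   ```

   This is why a UDF library passed via `--files` ends up in the YARN working directory rather than in the location that `SparkFiles.get()` resolves on the driver.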
   
   ### Spark version
   
   Spark-3.2.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   _No response_

