AngersZhuuuu commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401427353


##########
core/src/main/scala/org/apache/spark/SparkContext.scala:
##########
@@ -1822,7 +1822,7 @@ class SparkContext(config: SparkConf) extends Logging {
       logInfo(s"Added file $path at $key with timestamp $timestamp")
       // Fetch the file locally so that closures which are run on the driver can still use the
       // SparkFiles API to access files.
-      Utils.fetchFile(uri.toString, root, conf, hadoopConfiguration, timestamp, useCache = false)
+      Utils.fetchFile(uri.toString, root, conf, hadoopConfiguration, timestamp, useCache = true)

Review Comment:
   Executor log when `updateDependencies` runs:
   ```
   23/11/21 17:44:55 INFO Utils: Fetching hdfs://path/feature_map.txt to /mnt/ssd/2/yarn/nm-local-dir/usercache/user/appcache/application_1698132018785_8173703/spark-e5d383fd-0064-44e8-850b-c2c1934a0ddf/fetchFileTemp5380393885914736245.tmp
   23/11/21 17:44:55 INFO Utils: Copying /mnt/ssd/2/yarn/nm-local-dir/usercache/user/appcache/application_1698132018785_8173703/spark-e5d383fd-0064-44e8-850b-c2c1934a0ddf/-17061381181700559593903_cache to /mnt/ssd/1/yarn/nm-local-dir/usercache/user/appcache/application_1698132018785_8173703/container_e59_1698132018785_8173703_01_000683/./feature_map.txt
   ```
   
   On the executor side, `useCache = true` is passed when not in local mode, so the executor fetches the file into the cache and then copies the cached file to the root dir under its file name.
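
   The cache-then-copy flow visible in the executor log above can be sketched roughly as follows. This is a simplified model, not Spark's actual `Utils.fetchFile()`: `CachedFetchSketch`, `downloadToTemp`, and `fetchWithCache` are hypothetical names, the remote download is replaced by writing a string to a temp file, and the cache file name only mirrors the `<hash><timestamp>_cache` pattern that appears in the log.

   ```scala
   import java.nio.charset.StandardCharsets
   import java.nio.file.{Files, Path, StandardCopyOption}

   object CachedFetchSketch {
     // Hypothetical stand-in for the remote download: writes `content`
     // to a fetchFileTemp*.tmp file inside `dir` and returns its path.
     def downloadToTemp(content: String, dir: Path): Path = {
       val tmp = Files.createTempFile(dir, "fetchFileTemp", ".tmp")
       Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
     }

     // Assumed model of the useCache = true path: download once into a
     // cache file, then copy that file into targetDir under its file name.
     def fetchWithCache(
         url: String,
         content: String,
         timestamp: Long,
         cacheDir: Path,
         targetDir: Path): Path = {
       val fileName = url.substring(url.lastIndexOf('/') + 1)
       val cached = cacheDir.resolve(s"${url.hashCode}${timestamp}_cache")
       if (!Files.exists(cached)) {
         val tmp = downloadToTemp(content, cacheDir)
         Files.move(tmp, cached)
       }
       Files.copy(cached, targetDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING)
     }
   }
   ```

   In this sketch, a second call with the same `url` and `timestamp` skips the download and only repeats the copy, which is the point of routing the fetch through the cache.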
   
   For the SparkContext driver, the current code passes `useCache=false`, so it only fetches the file as a temp file:
   ```
   23/11/21 17:39:53 INFO [pool-3-thread-2] SparkContext: Added file hdfs://path/feature_map.txt at hdfs://path/feature_map.txt with timestamp 1700559593903
   23/11/21 17:39:54 INFO [pool-3-thread-2] Utils: Fetching hdfs://path/feature_map.txt to /mnt/ssd/0/yarn/nm-local-dir/usercache/user/appcache/application_1698132018785_8173703/spark-21bedef6-1c5e-464e-9cb0-bb6903b3d84c/userFiles-a4929fdb-b634-4829-a7e3-00d82b0d521b/fetchFileTemp8739978227963911629.tmp
   ```
   
   So the added file won't exist under the root dir with its file name.
   The code of `Utils.fetchFile()` is shown below:
   <img width="1110" alt="Screenshot 2023-11-22 10:21:58 AM" src="https://github.com/apache/spark/assets/46485123/68f6e2f9-a6e2-493d-bd65-d7b2cc88fadd">
   
   
   It's clear that an executor in local mode should pass `useCache=false`, since in local mode it should use the file already fetched by the SparkContext.
   But with the current code, the SparkContext does not add this file under its file name.
   
   So I think it should work like this:
   
   1. `SparkContext.addFile()` should also copy the file to the root dir under its file name, so the driver side can access the file by name and run local tasks in the driver.
   2. A non-local-mode executor will still update its dependencies and work as before.
   3. A local-mode executor is started inside the driver process, so it can use the file downloaded by `SparkContext.addFile()`.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org