wangyum opened a new pull request, #8331:
URL: https://github.com/apache/incubator-gluten/pull/8331

   ## What changes were proposed in this pull request?
   
   1. Avoid RPC calls when converting `viewfs://` paths to HDFS paths.
   2. Add a cache.
   
   (Fixes: \#8330)
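   Both changes can be illustrated with a minimal, hypothetical sketch. It is not the code in this PR. It assumes the conversion only needs the ViewFs mount table (`fs.viewfs.mounttable.<cluster>.link.<mountPoint>` entries). To run without Hadoop, it models the Hadoop `Configuration` as a plain `Map[String, String]`. The names `ViewfsPathSketch`, `convert` and `loadMountTable` are invented for illustration. Rewriting a path then becomes a string prefix substitution with no RPC, and the parsed mount table is cached per viewfs cluster name.

   ```scala
   import java.net.URI
   import java.util.concurrent.ConcurrentHashMap
   import java.util.concurrent.atomic.AtomicInteger

   // Hypothetical sketch, not the PR's implementation: rewrites viewfs:// paths
   // using only the mount table from configuration (no file-system RPCs) and
   // caches the parsed mount table per viewfs cluster name.
   object ViewfsPathSketch {
     private val MountTablePrefix = "fs.viewfs.mounttable."
     private val mountTables = new ConcurrentHashMap[String, Seq[(String, String)]]()
     // Counts how often a mount table is parsed, to show the cache at work.
     val loads = new AtomicInteger(0)

     // Collects "<prefix><cluster>.link.<mountPoint>" -> target entries,
     // longest mount point first so the most specific link wins.
     private def loadMountTable(cluster: String, conf: Map[String, String]): Seq[(String, String)] = {
       loads.incrementAndGet()
       val linkPrefix = s"$MountTablePrefix$cluster.link."
       conf.toSeq
         .collect { case (key, target) if key.startsWith(linkPrefix) => (key.stripPrefix(linkPrefix), target) }
         .sortBy { case (mountPoint, _) => -mountPoint.length }
     }

     // Returns the target URI for a viewfs path, or the input unchanged if it is
     // not a viewfs path or no mount point matches.
     def convert(path: String, conf: Map[String, String]): String = {
       val uri = new URI(path)
       if (uri.getScheme != "viewfs") path
       else {
         val table = mountTables.computeIfAbsent(uri.getAuthority, (cluster: String) => loadMountTable(cluster, conf))
         val p = uri.getPath
         table
           .find { case (mountPoint, _) =>
             val mp = mountPoint.stripSuffix("/")
             p == mp || p.startsWith(mp + "/")
           }
           .map { case (mountPoint, target) => target.stripSuffix("/") + p.stripPrefix(mountPoint.stripSuffix("/")) }
           .getOrElse(path)
       }
     }
   }
   ```

   Caching by cluster name assumes the mount table does not change for the lifetime of the process. The sketch also ignores other mount-table entry types, such as fallback links.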
   
   ## How was this patch tested?
   
   Tested manually by converting up to 50,000 `viewfs://` file paths (before: 225007 ms, after: 1330 ms):
   ```scala
   import org.apache.hadoop.fs.viewfs.ViewFileSystemUtils
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}
   
   val conf: Configuration = spark.sparkContext.hadoopConfiguration
   val fs = FileSystem.get(conf)
   // Collect up to 50,000 file paths under a viewfs table directory.
   val paths = fs.listStatus(new Path("viewfs://my-cluster/path/to/table")).take(50000).map(_.getPath.toString)
   
   // Before: FileSystem.resolvePath, which resolves each path through the target file system.
   val start1 = System.currentTimeMillis
   paths.map(p => FileSystem.get(new Path(p).toUri, conf).resolvePath(new Path(p)))
   println(s"Convert viewfs to HDFS using resolvePath took: ${System.currentTimeMillis - start1} ms")
   
   // After: ViewFileSystemUtils.convertViewfsToHdfs.
   val start2 = System.currentTimeMillis
   paths.map(p => ViewFileSystemUtils.convertViewfsToHdfs(p, conf))
   println(s"Convert viewfs to HDFS using convertViewfsToHdfs took: ${System.currentTimeMillis - start2} ms")
   ```
   
   Output:
   ```
   Convert viewfs to HDFS using resolvePath took: 225007 ms
   Convert viewfs to HDFS using convertViewfsToHdfs took: 1330 ms
   ```
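   For scale, the timings work out as follows. The per-path figures assume the listing returned the full 50,000 paths, which the script does not guarantee because `take(50000)` is only an upper bound.

   ```math
   \frac{225007\ \text{ms}}{1330\ \text{ms}} \approx 169.2\times, \qquad
   \frac{225007\ \text{ms}}{50000} \approx 4.5\ \text{ms/path}, \qquad
   \frac{1330\ \text{ms}}{50000} \approx 26.6\ \mu\text{s/path}
   ```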
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

