wangyum opened a new pull request, #8331:
URL: https://github.com/apache/incubator-gluten/pull/8331
## What changes were proposed in this pull request?
1. Avoid the per-path RPC calls (`FileSystem.resolvePath`) when converting viewfs paths to HDFS paths.
2. Add a cache.
(Fixes: #8330)
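To illustrate why a cache helps here, the following is a minimal sketch and not the code from this PR. It assumes that every file under the same viewfs mount point maps to the same HDFS prefix, so the expensive lookup only has to run once per mount point. The class `MountPointCache`, the `resolveMount` callback, and the `hdfs://nn1` target are hypothetical names chosen for the example.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: caches "viewfs mount point -> HDFS prefix" so that
// resolveMount (standing in for an expensive remote lookup) runs once per
// mount point instead of once per file.
final class MountPointCache(resolveMount: String => String) {
  private val cache = new ConcurrentHashMap[String, String]()

  // Rewrites `path` by swapping its mount point for the cached target prefix.
  def convert(path: String): String = {
    val end = mountPointEnd(path)
    val mount = path.substring(0, end)
    val target = cache.computeIfAbsent(mount, m => resolveMount(m))
    target + path.substring(end)
  }

  // Assumption: the mount point is "scheme://authority/firstSegment".
  private def mountPointEnd(path: String): Int = {
    val afterScheme = path.indexOf("://") + 3
    val afterAuthority = path.indexOf('/', afterScheme)
    if (afterAuthority < 0) path.length
    else {
      val afterFirstSegment = path.indexOf('/', afterAuthority + 1)
      if (afterFirstSegment < 0) path.length else afterFirstSegment
    }
  }
}

// Usage: count how often the (simulated) remote lookup is triggered.
val lookups = new AtomicInteger(0)
val cache = new MountPointCache(mount => {
  lookups.incrementAndGet()
  mount.replace("viewfs://my-cluster", "hdfs://nn1") // hypothetical target
})
val converted = Seq(
  "viewfs://my-cluster/path/to/table/a",
  "viewfs://my-cluster/path/to/table/b"
).map(cache.convert)
```

In this sketch both paths share the mount point `viewfs://my-cluster/path`, so the simulated lookup runs only once. A real implementation would derive mount points from the viewfs mount table rather than from the first path segment.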
## How was this patch tested?
Manual test converting up to 50,000 viewfs paths. Before (`resolvePath`): 225007 ms; after (`convertViewfsToHdfs`): 1330 ms:
```scala
import org.apache.hadoop.fs.viewfs.ViewFileSystemUtils
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(conf)
val statuses = fs.listStatus(new Path("viewfs://my-cluster/path/to/table"))
val paths = statuses.take(50000).map(_.getPath.toString)

// Before: resolve every path through the file system.
val start1 = System.currentTimeMillis
paths.map(p => FileSystem.get(new Path(p).toUri, conf).resolvePath(new Path(p)))
println(s"Convert viewfs to HDFS using resolvePath took: ${System.currentTimeMillis - start1} ms")

// After: convert every path with the utility added in this PR.
val start2 = System.currentTimeMillis
paths.map(p => ViewFileSystemUtils.convertViewfsToHdfs(p, conf))
println(s"Convert viewfs to HDFS using convertViewfsToHdfs took: ${System.currentTimeMillis - start2} ms")
```
Output:
```
Convert viewfs to HDFS using resolvePath took: 225007 ms
Convert viewfs to HDFS using convertViewfsToHdfs took: 1330 ms
```