[ 
https://issues.apache.org/jira/browse/HDFS-17855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040764#comment-18040764
 ] 

ASF GitHub Bot commented on HDFS-17855:
---------------------------------------

koodin9 opened a new pull request, #8102:
URL: https://github.com/apache/hadoop/pull/8102

   ### Description of PR
   ViewFS with linkMergeSlash generates invalid paths during 
listStatus/listLocatedStatus operations, causing InvalidPathException or 
incorrect path resolution.
   
   ### How was this patch tested?
   Added test cases in `TestViewFileSystemLinkMergeSlash.java`.
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> ViewFS with linkMergeSlash generates invalid paths during 
> listStatus/listLocatedStatus operations, causing InvalidPathException or 
> incorrect path resolution
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17855
>                 URL: https://issues.apache.org/jira/browse/HDFS-17855
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: viewfs
>    Affects Versions: 2.10.2, 3.4.1
>         Environment: * Hadoop version: 2.10.2
> * Configuration: ViewFS with linkMergeSlash enabled
> * Affected applications: JobHistoryServer, Hive, any application using ViewFS 
> with linkMergeSlash
>            Reporter: SeongHoon Ku
>            Priority: Major
>         Attachments: HADOOP-ViewFS-linkMergeSlash-fix.patch
>
>
> h1. Summary
> ViewFS with linkMergeSlash generates invalid paths during 
> listStatus/listLocatedStatus operations, causing InvalidPathException or 
> incorrect path resolution
> h1. Description
> When ViewFS is configured with {{linkMergeSlash}}, directory listing 
> operations using *RemoteIterator* generate invalid paths, causing 
> {{InvalidPathException}} errors in applications using the FileContext API.
> Affected code paths:
> * Applications using the *FileContext API (ViewFs)* with 
> {{listLocatedStatus()}} or {{listStatusIterator()}}
> * Examples: JobHistoryServer, Hive/Tez applications
> * The failure occurs in the {{ViewFs$WrappingRemoteIterator.next()}} method
> h2. Configuration Example
> {code:xml}
> <property>
>   <name>fs.defaultFS</name>
>   <value>viewfs://hadoop-cluster</value>
> </property>
> <property>
>   <name>fs.viewfs.mounttable.hadoop-cluster.linkMergeSlash</name>
>   <value>hdfs://hadoop-cluster</value>
> </property>
> {code}
> h2. Error Stack Trace
> *JobHistoryServer:*
> {noformat}
> org.apache.hadoop.fs.InvalidPathException: Invalid path name relative paths 
> not allowed:
> hadoop-cluster/user/history/done/2021
>     at org.apache.hadoop.fs.AbstractFileSystem.checkPath(AbstractFileSystem.java:370)
>     at org.apache.hadoop.fs.AbstractFileSystem.makeQualified(AbstractFileSystem.java:428)
>     at org.apache.hadoop.fs.viewfs.ViewFs$WrappingRemoteIterator.next(ViewFs.java:848)
>     at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:238)
> {noformat}
> *Hive (Tez):*
> {noformat}
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.io.IOException:
> cannot find dir = 
> viewfs://hadoop-cluster/user/hive/hadoop-cluster/tmp/hive/...
> {noformat}
> *Observed pattern:*
> * Invalid path: 
> {{viewfs://hadoop-cluster/user/hive/hadoop-cluster/tmp/hive/...}}
> * Correct path: {{viewfs://hadoop-cluster/tmp/hive/...}}
> * The working directory ({{/user/hive}}) and the cluster name 
> ({{hadoop-cluster}}) are inserted ahead of the real path
> ----
> h1. Root Cause
> h2. Technical Analysis
> When {{linkMergeSlash}} is configured, the ViewFS root node is created with 
> its path name ({{fullPath}}) incorrectly set to {{mountTableName}} instead of 
> {{"/"}}.
> *Bug location in {{InodeTree.java}}:*
> {code:java}
> // Current (buggy) code
> if (isMergeSlashConfigured) {
>   root = new INodeLink<T>(mountTableName, ugi,  // "hadoop-cluster" - BUG!
>       initAndGetTargetFs(), mergeSlashTarget);
>   mountPoints.add(new MountPoint<T>("/", (INodeLink<T>) root));
>   rootFallbackLink = null;
> }
> {code}
> This causes {{root.fullPath}} to be set to the cluster name (e.g., 
> {{"hadoop-cluster"}}) instead of {{"/"}}.
> h2. Impact Chain
> # During path resolution ({{InodeTree.java}}), {{root.fullPath}} is used as 
> {{ResolveResult.resolvedPath}}:
> {code:java}
> if (root.isLink()) {
>   ResolveResult<T> res = new ResolveResult<T>(ResultKind.EXTERNAL_DIR,
>       getRootLink().getTargetFileSystem(), root.fullPath, remainingPath);
>       //                                   ^^^^^^^^^^^^^ Uses mountTableName!
>   return res;
> }
> {code}
> # During path conversion in {{ViewFileSystem.getChrootedPath()}} (line 563):
> {code:java}
> return this.makeQualified(
>     suffix.length() == 0 ? f : new Path(res.resolvedPath, suffix));
> // Creates: new Path("hadoop-cluster", "user/history/done")
> // Result: "hadoop-cluster/user/history/done" (RELATIVE PATH!)
> {code}
> # {{makeQualified()}} then prepends the working directory to this relative 
> path:
> {noformat}
> Expected: viewfs://hadoop-cluster/user/history/done
> Actual:   viewfs://hadoop-cluster/user/mapred/hadoop-cluster/user/history/done
> {noformat}
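
The impact chain above can be reproduced without Hadoop using a deliberately simplified model. The sketch below is not Hadoop's code: `join`, `makeQualified`, and `listedPath` are hypothetical helpers that imitate only the behaviour described in the steps above (a relative parent yields a relative joined path, and a relative path is resolved against the working directory). The working directory `/user/mapred` is taken from the example output above.

```java
/**
 * Simplified stand-in for the path handling described above.
 * Not Hadoop code: join(), makeQualified() and listedPath() are hypothetical
 * helpers that only imitate the behaviour relevant to this bug.
 */
class MergeSlashPathModel {

    /** Joins a parent and a child, keeping the parent relative if it was relative. */
    static String join(String parent, String child) {
        if (child.isEmpty()) {
            return parent;
        }
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    /** Qualifies a path: a relative path is first resolved against the working directory. */
    static String makeQualified(String authority, String workingDir, String path) {
        String absolute = path.startsWith("/") ? path : join(workingDir, path);
        return "viewfs://" + authority + absolute;
    }

    /** Models the path reported for one listing entry, given the root node's fullPath. */
    static String listedPath(String rootFullPath, String suffix) {
        return makeQualified("hadoop-cluster", "/user/mapred", join(rootFullPath, suffix));
    }

    public static void main(String[] args) {
        // root.fullPath == mountTableName (buggy): the joined path is relative.
        System.out.println(listedPath("hadoop-cluster", "user/history/done"));
        // root.fullPath == "/" (expected): the joined path is already absolute.
        System.out.println(listedPath("/", "user/history/done"));
    }
}
```

With `rootFullPath` set to the cluster name, the model yields the "Actual" path shown above; with `"/"` it yields the "Expected" path.
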
> h2. Why linkMergeSlash Should Use "/"
> {{linkMergeSlash}} is designed to merge the entire ViewFS root with a single 
> target directory. Therefore:
> * ViewFS root ({{/}}) = Target directory specified by linkMergeSlash
> * The root node's {{fullPath}} should naturally be {{/}}
> * This maintains consistency with the {{MountPoint}} API which already 
> returns {{/}}
> ----
> h1. Testing
> h2. Test Cases
> Added comprehensive test cases in {{TestViewFileSystemLinkMergeSlash.java}}:
> # *{{testListStatusReturnsCorrectPaths()}}*
> ** Verifies {{listStatus()}} returns proper ViewFS paths
> ** Checks scheme, authority, and path correctness
> # *{{testListLocatedStatusReturnsCorrectPaths()}}*
> ** Verifies {{listLocatedStatus()}} with RemoteIterator
> ** Ensures lazy evaluation works correctly
> # *{{testResolvedPathIsAbsolute()}}*
> ** Reproduces the exact bug scenario (JobHistoryServer use case)
> ** Validates path resolution for {{/user/history/done/2021}}
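
The kind of check these tests describe (scheme, authority, and path correctness) can be sketched with plain `java.net.URI` and no Hadoop dependency. The class and method names below are assumptions for illustration; the actual assertions in `TestViewFileSystemLinkMergeSlash.java` are not shown in this message, and the malformed URI is built by hand from the pattern reported in the issue.

```java
import java.net.URI;

/** Illustrative check only; not taken from the Hadoop test suite. */
class ViewFsPathCheck {

    /**
     * Returns true if the listed path has the viewfs scheme, the expected
     * authority, and exactly the expected absolute path.
     */
    static boolean isWellFormedViewFsPath(URI listed, String authority, String expectedPath) {
        return "viewfs".equals(listed.getScheme())
            && authority.equals(listed.getAuthority())
            && expectedPath.startsWith("/")
            && expectedPath.equals(listed.getPath());
    }

    public static void main(String[] args) {
        String expected = "/user/history/done/2021";
        // Path as it should be reported by a listing through ViewFS.
        URI good = URI.create("viewfs://hadoop-cluster/user/history/done/2021");
        // Hand-built example of the malformed pattern: working directory and
        // cluster name inserted ahead of the real path.
        URI bad = URI.create(
            "viewfs://hadoop-cluster/user/mapred/hadoop-cluster/user/history/done/2021");

        System.out.println(isWellFormedViewFsPath(good, "hadoop-cluster", expected));
        System.out.println(isWellFormedViewFsPath(bad, "hadoop-cluster", expected));
    }
}
```

Comparing the full path, rather than only the scheme and authority, matters here: the malformed path still carries the correct `viewfs://hadoop-cluster` prefix and would pass a prefix-only check.
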


