[ https://issues.apache.org/jira/browse/HADOOP-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139993#comment-17139993 ]
Uma Maheswara Rao G edited comment on HADOOP-17072 at 6/18/20, 9:06 PM:
------------------------------------------------------------------------

Hi [~virajith], thanks for filing this and for the patch! I took a quick look at the patch.

FileSystem.java already exposes two public APIs with closely similar functionality:
# public Path getLinkTarget(Path f) throws IOException {
# public FileSystem[] getChildFileSystems() {

Did you get a chance to check whether they cover your use case without adding new APIs? I see that getClusterRoots already uses getChildFileSystems.

You added FileSystem#getRootURI. This looks to me like a simple util method; does it need to be in FileSystem.java? Is your plan to override getRootURI in specific file systems? The current implementation of ViewFileSystem#getClusterRoots could also live in any util class, using the public getChildFileSystems. Could you please elaborate a bit?
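To make the util-class suggestion concrete, here is a minimal sketch of what such a helper might look like. It is not taken from HADOOP-17072.001.patch: the class name ClusterRootUtil and its methods are hypothetical, and it works on plain java.net.URI values so that it stays self-contained. The assumption is that the caller has already collected the child file system URIs (for example via getChildFileSystems() and getUri()) and has a fully resolved path URI.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Hadoop
// and not part of HADOOP-17072.001.patch.
final class ClusterRootUtil {
    private ClusterRootUtil() {}

    /** Reduces a URI to the root of its file system: scheme://authority/ */
    static URI rootOf(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), "/", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot derive root of " + uri, e);
        }
    }

    /** Distinct cluster roots, in first-seen order, for the given child file system URIs. */
    static List<URI> clusterRoots(List<URI> childFsUris) {
        List<URI> roots = new ArrayList<>();
        for (URI uri : childFsUris) {
            URI root = rootOf(uri);
            if (!roots.contains(root)) {
                roots.add(root);
            }
        }
        return roots;
    }

    /** The known cluster root that contains the (already resolved) path, if any. */
    static Optional<URI> clusterRootFor(URI resolvedPath, List<URI> roots) {
        URI candidate = rootOf(resolvedPath);
        return roots.stream().filter(candidate::equals).findFirst();
    }
}
```

A retention job could then look up clusterRootFor(resolvedPath, roots) and pick the trash/quarantine directory under that root. Whether this is enough for ViewFileSystem depends on how mount links are resolved before the lookup, which the sketch leaves to the caller.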
> Add getClusterRoot and getClusterRoots methods to FileSystem and ViewFilesystem
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-17072
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17072
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs, viewfs
>            Reporter: Virajith Jalaparti
>            Assignee: Virajith Jalaparti
>            Priority: Major
>         Attachments: HADOOP-17072.001.patch
>
>
> In a federated setting (HDFS federation, federation across multiple buckets
> on S3, multiple containers across Azure storage), certain system
> tools/pipelines require the ability to map paths to the clusters/accounts.
> Consider the example of GDPR compliance/retention jobs that need to go over
> various datasets, ingested over a period of T days, and remove/quarantine
> datasets that are not properly annotated or have reached their retention period.
> Such jobs can rely on renames to a global trash/quarantine directory to
> accomplish their task. However, in a federated setting, efficient, atomic
> renames (such as those within a single HDFS cluster) are not supported across the
> different clusters/shards in the federation. As a result, such jobs will need to
> use a trash/quarantine directory per cluster/shard. Further, they would
> need to map from a particular path to the cluster/shard that contains this
> path.
> To address such cases, this JIRA proposes to add two new methods to
> {{FileSystem}}: {{getClusterRoot}} and {{getClusterRoots()}}.