[ https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090912#comment-17090912 ]
Virajith Jalaparti edited comment on HDFS-15289 at 4/23/20, 8:25 PM:
---------------------------------------------------------------------

cc: [~shv] [~cliang], [~abhishekd]

Hi [~umamaheswararao], thanks for posting this! At LinkedIn, we are currently evaluating the same situation and are implementing a solution along the same lines as described in the doc (both problems 1 and 2 discussed in your document).

Our use cases include:
* HDFS federation (we are currently working on federating our largest cluster).
* Accessing data across multiple storage accounts in Azure (as part of migration to cloud).
* The same user code (UDFs etc.) should be able to work in both of the cases above. In many cases, UDFs use {{hdfs:///}}.

Our concerns around overriding {{fs.hdfs.impl}} are:
# {{saveNamespace}} and other methods in {{FileSystem}} all need to be implemented in {{ViewFSOverloadScheme}}. Do you have any specific plans around testing this?
# Admins will not have a way to directly access HDFS unless configs on admin machines are deployed separately. Is this something you considered? How do you plan to make admin tools work?
# How do we handle cases where {{DistributedFileSystem}} is used instead of {{FileSystem}}? Do you plan to make {{ViewFSOverloadScheme}} extend {{DistributedFileSystem}}?

Any thoughts around 1-3 above?


> Allow viewfs mounts with hdfs scheme and centralized mount table
> ----------------------------------------------------------------
>
>                 Key: HDFS-15289
>                 URL: https://issues.apache.org/jira/browse/HDFS-15289
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 3.2.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: ViewFSOverloadScheme - V1.0.pdf
>
>
> ViewFS provides flexibility to mount different filesystem types with a mount
> points configuration table. Additionally, ViewFS provides flexibility to
> configure any fs (not only HDFS) scheme in the mount table mapping. This
> approach solves the scalability problems, but users need to reconfigure the
> filesystem to ViewFS and to its scheme. This is problematic when paths are
> persisted in meta stores, e.g. Hive. Systems like Hive store URIs in the
> meta store, so changing the file system scheme creates a burden to
> upgrade/recreate meta stores. In our experience many users are not ready to
> change that.
> Router based federation is another implementation to provide coordinated
> mount points for HDFS federation clusters. Even though this provides
> flexibility to handle mount points easily, this will not allow
> other (non-HDFS) file systems to mount.
> So, this does not solve the purpose when users want to mount external
> (non-HDFS) filesystems.
> So, the problem here is: even though many users want to adopt the scalable
> fs options available, the technical challenges of changing schemes (e.g. in
> meta stores) in deployments are obstructing them.
> So, we propose to allow the hdfs scheme in a ViewFS-like client-side mount
> system and to let users create mount links without changing URI paths.
> I will upload a detailed design doc shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
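
As a rough illustration of the mount links proposed in the issue description above, the following is a minimal sketch of client-side mount-table resolution by longest matching prefix. It is plain Java with no Hadoop dependency and is not the ViewFS or {{ViewFSOverloadScheme}} implementation. The class name {{MountTableSketch}}, the namespace {{ns1}}, and the Azure account and container names are all hypothetical, chosen only for illustration.

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a client-side mount table; not Hadoop's ViewFS code.
class MountTableSketch {
    // Mount point (path prefix) -> target file system URI.
    private final Map<String, URI> links = new TreeMap<>();

    void addLink(String mountPoint, URI target) {
        links.put(stripTrailingSlash(mountPoint), target);
    }

    // Resolve a path by its longest matching mount point.
    // A root mount ("/") is not handled in this sketch.
    URI resolve(String path) {
        String p = stripTrailingSlash(path);
        String best = null;
        for (String mount : links.keySet()) {
            // Match whole path components only, so "/data" does not match "/datasets".
            boolean matches = p.equals(mount) || p.startsWith(mount + "/");
            if (matches && (best == null || mount.length() > best.length())) {
                best = mount;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point for " + path);
        }
        String remainder = p.substring(best.length());
        String target = stripTrailingSlash(links.get(best).toString());
        return URI.create(target + remainder);
    }

    private static String stripTrailingSlash(String s) {
        return (s.length() > 1 && s.endsWith("/")) ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        // Hypothetical mount links: one HDFS namespace, one Azure storage account.
        table.addLink("/data", URI.create("hdfs://ns1/data"));
        table.addLink("/data/logs",
                URI.create("abfs://logs@exampleaccount.dfs.core.windows.net/archive"));

        // User code keeps using hdfs:/// paths; only the path part is resolved.
        String userPath = URI.create("hdfs:///data/logs/2020/04/23").getPath();
        System.out.println(table.resolve(userPath));
        System.out.println(table.resolve("/data/tables/t1"));
    }
}
```

The point of the sketch is that the path the user (or a meta store such as Hive) holds never changes; only the mount table decides which file system serves it. Questions 1-3 in the comment concern what happens around this lookup (admin tools, {{DistributedFileSystem}}-specific calls), which the sketch does not attempt to model.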