[jira] [Commented] (HADOOP-14210) Directories are not listed recursively when fs.defaultFs is viewFs
[ https://issues.apache.org/jira/browse/HADOOP-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396260#comment-16396260 ]

Hanisha Koneru commented on HADOOP-14210:
-----------------------------------------

It would be good for {{-ls -R}} on ViewFs to work as it does on Unix. But I do have a concern/question about recursively listing files and directories in ViewFs: how do we handle the scenario where one mount target is a parent of another mount target? For example, with the config below, if we recursively list the files and directories under the viewFs root, the files and directories under {{/user}} will be listed twice (once via {{/nn1}} and once via {{/user}}). I think this would be a bad experience for users.

{code:java}
fs.defaultFS = viewfs:///
fs.viewfs.mounttable.default.link./nn1 = hdfs://ns1/
fs.viewfs.mounttable.default.link./user = hdfs://ns1/user/
{code}

One option is to mirror how Unix treats symlinks. In Unix, {{ls -R}} does not list the contents of a symlink's target; the "*{{-L | --dereference}}*" option has to be added to recursively list the contents of symlinks along with directories. We can copy this behaviour in ViewFs. That is, we recursively list the contents of a mount's target filesystem only when {{-ls -R}} is called with the option {{-L}}. This would still list the contents of {{/user}} twice for the scenario mentioned above, but I think that should be fine.

Would love to hear thoughts on this.
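To make the overlap scenario above concrete, here is a minimal sketch that flags mount links whose target is an ancestor of another link's target. It uses only the JDK, not the Hadoop API; the class name {{MountOverlapCheck}} and its methods are hypothetical, and the comparison is a simplification (exact match on scheme and authority, prefix match on path components).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Hadoop: finds mount links whose target
// contains the target of another mount link.
final class MountOverlapCheck {

    // True when both URIs name the same filesystem (scheme + authority) and
    // parent's path is an ancestor of, or equal to, child's path.
    static boolean covers(URI parent, URI child) {
        if (!Objects.equals(parent.getScheme(), child.getScheme())
                || !Objects.equals(parent.getAuthority(), child.getAuthority())) {
            return false;
        }
        String p = withTrailingSlash(parent.getPath());
        String c = withTrailingSlash(child.getPath());
        return c.startsWith(p);
    }

    // Normalizes a path so that prefix matching respects component boundaries
    // ("/user/" is not a prefix of "/username/").
    private static String withTrailingSlash(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        return path.endsWith("/") ? path : path + "/";
    }

    // Returns one "A covers B" line for every ordered pair of distinct links
    // where A's target contains B's target.
    static List<String> findOverlaps(Map<String, URI> links) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, URI> a : links.entrySet()) {
            for (Map.Entry<String, URI> b : links.entrySet()) {
                if (!a.getKey().equals(b.getKey())
                        && covers(a.getValue(), b.getValue())) {
                    out.add(a.getKey() + " covers " + b.getKey());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Mount table from the config in the comment above.
        Map<String, URI> links = new LinkedHashMap<>();
        links.put("/nn1", URI.create("hdfs://ns1/"));
        links.put("/user", URI.create("hdfs://ns1/user/"));
        findOverlaps(links).forEach(System.out::println); // prints "/nn1 covers /user"
    }
}
```

A check like this could warn about overlapping targets when a mount table is loaded, independently of whatever {{-ls -R}} ends up doing.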
> Directories are not listed recursively when fs.defaultFs is viewFs
> ------------------------------------------------------------------
>
> Key: HADOOP-14210
> URL: https://issues.apache.org/jira/browse/HADOOP-14210
> Project: Hadoop Common
> Issue Type: Bug
> Components: viewfs
> Affects Versions: 2.7.0
> Reporter: Ajith S
> Priority: Major
> Labels: viewfs
> Attachments: HDFS-8413.patch
>
>
> Mount a cluster on client through viewFs mount table
> Example:
> {quote}
>
> fs.defaultFS
> viewfs:///
>
>
> fs.viewfs.mounttable.default.link./nn1
> hdfs://ns1/
>
>
> fs.viewfs.mounttable.default.link./user
> hdfs://host-72:8020/
>
>
> {quote}
> Try to list the files recursively *(hdfs dfs -ls -R / or hadoop fs -ls -R /)*
> only the parent folders are listed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14210) Directories are not listed recursively when fs.defaultFs is viewFs
[ https://issues.apache.org/jira/browse/HADOOP-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939527#comment-15939527 ]

Andrew Wang commented on HADOOP-14210:
--------------------------------------

I don't have a fully educated opinion on this topic, but I normally look to Unixy behavior to determine user expectations of HDFS. "ls -R /" lists everything. When I stat a mount point, it's a directory, not a symlink.

Symlinks are currently not enabled in HDFS, so no downstream apps have ever encountered a symlink. Again looking at Unix as a guide, applications still don't handle symlinks properly, from either a correctness or a security POV, after decades of existence. So, if we can roll out VFS without requiring apps to understand symlinks, I'd be happy.

A lot of our apps use HDFS APIs like the shell or {{FileSystem#listFiles}}. We'd want these to work the same way regardless of how the VFS is sharded, for transparency. Can we dredge up any of the initial motivations from the VFS design docs or JIRA?

Given that existing FSs can already span 100s of millions of files, clients need to handle high scale no matter what. For the CLI ls command, I'm hoping we're using the iterator-based listing methods, as well as providing a mode that doesn't try to align columns (which requires buffering all the output).
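As an illustration of the iterator-based idea mentioned above, the following sketch walks a tree lazily with an explicit stack, so memory grows with the depth of the tree rather than with the number of entries, and each path can be printed as soon as it is produced. It is a toy model under stated assumptions: the in-memory {{Map}} stands in for a remote filesystem, {{StreamingLister}} is a hypothetical name, and none of this is the Hadoop shell's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical lazy depth-first lister. Keys of the map are directories,
// values are the full paths of their children; paths that are not keys are
// treated as files.
final class StreamingLister implements Iterator<String> {
    private final Map<String, List<String>> children;
    // One child iterator per directory currently being traversed.
    private final Deque<Iterator<String>> stack = new ArrayDeque<>();

    StreamingLister(Map<String, List<String>> children, String root) {
        this.children = children;
        stack.push(children.getOrDefault(root, List.of()).iterator());
    }

    @Override
    public boolean hasNext() {
        // Drop exhausted directories until one with remaining entries is on top.
        while (!stack.isEmpty() && !stack.peek().hasNext()) {
            stack.pop();
        }
        return !stack.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String path = stack.peek().next();
        List<String> kids = children.get(path);
        if (kids != null) {
            // Descend into the directory on the next call.
            stack.push(kids.iterator());
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = Map.of(
                "/", List.of("/nn1", "/user"),
                "/nn1", List.of("/nn1/a"),
                "/user", List.of("/user/alice"));
        // Unaligned output: every path is printed immediately, nothing is buffered.
        new StreamingLister(tree, "/").forEachRemaining(System.out::println);
    }
}
```

Column alignment, by contrast, needs the widest value of each column before the first line can be printed, which is why it forces buffering of the whole listing.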
[jira] [Commented] (HADOOP-14210) Directories are not listed recursively when fs.defaultFs is viewFs
[ https://issues.apache.org/jira/browse/HADOOP-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939330#comment-15939330 ]

Manoj Govindassamy commented on HADOOP-14210:
---------------------------------------------

[~ajithshetty]
bq.
{code}
-
-      result[i++] = new FileStatus(0, false, 0, 0,
+      boolean isDir=link.isMergeLink;
+      if (link.targetDirLinkList.length == 1) {
+        try {
+          isDir =
+              link.targetFileSystem.getFileLinkStatus(link.getTargetLink())
+                  .isDirectory();
{code}
LinkMerge / MergeMounts are not supported yet. But I see your point about reaching out to the target filesystem to find out whether the linked item is indeed a directory or not.

[~xkrogen]
In the context of ViewFileSystem, all the linked entities, be it a Dir or a File in the target filesystem, are of type INodeLink. Only the internal directories in the ViewFileSystem mount tree are of type INodeDir. So, as [~ajithshetty] pointed out, ViewFileSystem treats only its internal directories as Dirs and all others as linked files. As a result, the FileStatus[] returned by ViewFileSystem has the Dir flag turned off for all the linked directories in the target filesystem, which makes the LS command stop the file tree traversal.

Given the scale a ViewFileSystem could reach, returning millions of FileStatus entries across all namenodes could be a problem for clients as well. So, I was assuming the intention for {{listStatus}} on ViewFileSystem is to list only the mount tree and not the entire world. But this doesn't go well with the "ls -R" expectation from clients.

[~andrew.wang], any thoughts on the expectation for "ls -R" on the ViewFileSystem root?
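The effect of the Dir flag described above can be reproduced with a small toy model. The sketch below is not ViewFileSystem code: {{LinkFlagDemo}}, {{Entry}} and the in-memory tree are hypothetical stand-ins, and the only assumption carried over from the discussion is that a recursive lister descends solely into entries whose status says "directory".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of why "ls -R" stops at mount links when their status is not
// flagged as a directory.
final class LinkFlagDemo {

    // Minimal stand-in for a file status: a path plus a directory flag.
    static final class Entry {
        final String path;
        final boolean isDir;

        Entry(String path, boolean isDir) {
            this.path = path;
            this.isDir = isDir;
        }
    }

    // Recursive listing that descends only into entries flagged as directories.
    static List<String> listRecursive(Map<String, List<Entry>> fs, String dir) {
        List<String> out = new ArrayList<>();
        for (Entry e : fs.getOrDefault(dir, List.of())) {
            out.add(e.path);
            if (e.isDir) {
                out.addAll(listRecursive(fs, e.path));
            }
        }
        return out;
    }

    // Builds a tree with one mount link, /nn1, whose directory flag is
    // controlled by the caller; everything below it is reported correctly.
    static Map<String, List<Entry>> tree(boolean linkReportedAsDir) {
        return Map.of(
                "/", List.of(new Entry("/nn1", linkReportedAsDir)),
                "/nn1", List.of(new Entry("/nn1/data", true)),
                "/nn1/data", List.of(new Entry("/nn1/data/f", false)));
    }

    public static void main(String[] args) {
        // Link reported as a plain file: traversal stops at the mount point.
        System.out.println(listRecursive(tree(false), "/")); // [/nn1]
        // Link status resolved against the target: traversal continues.
        System.out.println(listRecursive(tree(true), "/"));  // [/nn1, /nn1/data, /nn1/data/f]
    }
}
```

The second call corresponds to the direction of the quoted patch fragment, where the flag is taken from the target filesystem instead of being hard-coded to false.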