[jira] [Commented] (HADOOP-14210) Directories are not listed recursively when fs.defaultFs is viewFs

2018-03-12 Thread Hanisha Koneru (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396260#comment-16396260 ]

Hanisha Koneru commented on HADOOP-14210:
-

It would be good to have the {{-ls -R}} operation on ViewFs work as it does on 
Unix. But I do have a concern/question about recursively listing files/directories 
in ViewFs: how do we handle the scenario where one mount target is a parent of 
another mount target? For example, with the config below, if we recursively list 
the files/directories under the ViewFs root, then the files/directories under 
{{hdfs://ns1/user}} will be listed twice (once under {{/nn1/user}} and once under 
{{/user}}). I think this would be a bad experience for users.
{code:java}
fs.defaultFS = viewfs:///

fs.viewfs.mounttable.default.link./nn1 = hdfs://ns1/

fs.viewfs.mounttable.default.link./user = hdfs://ns1/user/

{code}
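To make the double-listing concern concrete, here is a minimal Java sketch (not ViewFs code) that checks a mount table, modeled as a plain map from mount point to target URI, for targets that sit at or below another mount's target. The names {{MountOverlap}}, {{relativeUnder}} and {{overlaps}} are hypothetical. The check compares scheme and authority textually, so it cannot tell when two different authorities resolve to the same NameNode.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Hadoop: finds mount targets nested under other mount targets.
final class MountOverlap {
  /**
   * Returns child's path relative to parent (no trailing slash), "" if they are the
   * same directory, or null if child is not at or below parent. Scheme and authority
   * are compared textually only.
   */
  static String relativeUnder(URI parent, URI child) {
    if (!Objects.equals(parent.getScheme(), child.getScheme())
        || !Objects.equals(parent.getAuthority(), child.getAuthority())) {
      return null;
    }
    String p = withSlash(parent.getPath());
    String c = withSlash(child.getPath());
    if (!c.startsWith(p)) {
      return null;
    }
    String rel = c.substring(p.length()); // e.g. "user/"
    return rel.isEmpty() ? rel : rel.substring(0, rel.length() - 1);
  }

  private static String withSlash(String path) {
    return path.endsWith("/") ? path : path + "/";
  }

  /** Describes every mount point whose target is also reachable through another mount point. */
  static List<String> overlaps(Map<String, URI> links) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, URI> outer : links.entrySet()) {
      for (Map.Entry<String, URI> inner : links.entrySet()) {
        if (outer.getKey().equals(inner.getKey())) {
          continue; // a mount point trivially overlaps itself
        }
        String rel = relativeUnder(outer.getValue(), inner.getValue());
        if (rel != null) {
          String alias = rel.isEmpty() ? outer.getKey() : outer.getKey() + "/" + rel;
          out.add(inner.getKey() + " is also listed as " + alias);
        }
      }
    }
    return out;
  }
}
```

For the config above, {{overlaps}} returns a single entry, "/user is also listed as /nn1/user". With the original report's {{/user -> hdfs://host-72:8020/}} link it finds nothing, even if that host happens to serve the same namespace as {{ns1}}, which is exactly the kind of overlap a purely textual check misses.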
 

One option is to mirror the Unix behavior for symlinks. In Unix, {{ls -R}} does 
not list the contents of a symlink's target; the "*{{-L | --dereference}}*" 
option has to be added to recursively list the contents of symlinks along with 
directories.
 We can copy this behavior in ViewFs. That is, we recursively list the 
contents of a mount's target filesystem only when {{-ls -R}} is called with the 
{{-L}} option. This would still list the contents of {{/user}} twice for the 
scenario mentioned above, but I think that should be fine.
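A small in-memory model of the proposed behavior: mount links are printed but not descended into unless a dereference flag (playing the role of the proposed {{-L}}) is set. {{DerefListing}} and its {{Node}} type are hypothetical, not Hadoop classes; this sketches the idea only, not an FsShell implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of "ls -R" with an optional -L style dereference of mount links.
final class DerefListing {
  /** Hypothetical namespace node: a directory, a plain file, or a mount link. */
  static final class Node {
    final Map<String, Node> children; // non-null only for directories (sorted by name)
    final Node linkTarget;            // non-null only for mount links

    private Node(Map<String, Node> children, Node linkTarget) {
      this.children = children;
      this.linkTarget = linkTarget;
    }

    static Node dir() { return new Node(new TreeMap<>(), null); }
    static Node file() { return new Node(null, null); }
    static Node link(Node target) { return new Node(null, target); }

    boolean isDir() { return children != null; }
    boolean isLink() { return linkTarget != null; }

    /** Adds a child entry and returns this directory, for chaining. */
    Node put(String name, Node child) {
      children.put(name, child);
      return this;
    }
  }

  /** Lists every path under root; follows mount links only when dereference is true. */
  static List<String> listRecursive(Node root, boolean dereference) {
    List<String> out = new ArrayList<>();
    walk("/", root, dereference, out);
    return out;
  }

  private static void walk(String path, Node dir, boolean dereference, List<String> out) {
    for (Map.Entry<String, Node> e : dir.children.entrySet()) {
      String childPath = path.endsWith("/") ? path + e.getKey() : path + "/" + e.getKey();
      Node child = e.getValue();
      out.add(childPath);
      // Like Unix "ls -R" without -L: a link is printed but its target is not entered.
      Node next = child.isLink() ? (dereference ? child.linkTarget : null) : child;
      if (next != null && next.isDir()) {
        walk(childPath, next, dereference, out);
      }
    }
  }
}
```

Built on the mount table above (with a single file {{a.txt}} under {{/user}}), listing without dereference yields just {{/nn1}} and {{/user}}; with dereference, {{a.txt}} shows up both as {{/nn1/user/a.txt}} and as {{/user/a.txt}}, matching the duplication described above. A real implementation would also need cycle detection, since dereferenced links can loop.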

Would love to hear thoughts on this.

> Directories are not listed recursively when fs.defaultFs is viewFs
> --
>
> Key: HADOOP-14210
> URL: https://issues.apache.org/jira/browse/HADOOP-14210
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: viewfs
>Affects Versions: 2.7.0
>Reporter: Ajith S
>Priority: Major
>  Labels: viewfs
> Attachments: HDFS-8413.patch
>
>
> Mount a cluster on the client through a viewFs mount table.
> Example:
> {quote}
> fs.defaultFS = viewfs:///
> fs.viewfs.mounttable.default.link./nn1 = hdfs://ns1/
> fs.viewfs.mounttable.default.link./user = hdfs://host-72:8020/
> {quote}
> When listing the files recursively *(hdfs dfs -ls -R / or hadoop fs -ls -R /)*, 
> only the parent folders are listed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14210) Directories are not listed recursively when fs.defaultFs is viewFs

2017-03-23 Thread Andrew Wang (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939527#comment-15939527 ]

Andrew Wang commented on HADOOP-14210:
--

I don't have a fully educated opinion on this topic, but I normally look to 
Unixy behavior to determine user expectations of HDFS. "ls -R /" lists 
everything. When I stat a mount point, it's a directory, not a symlink.

Symlinks are currently not enabled in HDFS, so no downstream apps have ever 
encountered a symlink. Again looking at Unix as a guide, applications still 
don't handle symlinks correctly from a correctness or security POV after 
decades of existence. So, if we can roll out VFS without requiring apps to 
understand symlinks, I'd be happy.

A lot of our apps use HDFS APIs like the shell or {{FileSystem#listFiles}}. 
We'd want these to work the same way regardless of how the VFS is sharded, for 
transparency.

Can we dredge up any of the initial motivations from the VFS design docs or 
JIRA?

Given that existing FSs can already scale to hundreds of millions of files, 
clients need to handle high scale no matter what. For the CLI ls command, I'm 
hoping we're using the iterator-based listing methods, and also providing a mode 
that doesn't try to align columns (which requires buffering all the output).
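As a rough illustration of the streaming approach, the sketch below walks a tree lazily with a stack of per-directory iterators and hands out each entry as soon as it is reached, instead of collecting the whole listing first. The {{lister}} function is a hypothetical stand-in for an iterator-based listing call (in Hadoop, {{FileSystem#listStatusIterator}} plays a similar role), and {{Entry}} is a hypothetical stand-in for {{FileStatus}}.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch of a lazy, pre-order recursive listing.
final class StreamingLs {
  /** Hypothetical directory entry; keeps only the fields a recursive ls needs. */
  record Entry(String path, boolean isDirectory) {}

  /**
   * Walks the tree depth-first without materializing it. The stack holds one
   * child iterator per directory currently being listed.
   */
  static Iterator<Entry> recursive(String root, Function<String, Iterator<Entry>> lister) {
    Deque<Iterator<Entry>> stack = new ArrayDeque<>();
    stack.push(lister.apply(root));
    return new Iterator<>() {
      @Override
      public boolean hasNext() {
        // Discard exhausted directory iterators until one still has entries.
        while (!stack.isEmpty() && !stack.peek().hasNext()) {
          stack.pop();
        }
        return !stack.isEmpty();
      }

      @Override
      public Entry next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        Entry e = stack.peek().next();
        if (e.isDirectory()) {
          stack.push(lister.apply(e.path())); // descend before visiting siblings
        }
        return e;
      }
    };
  }
}
```

Printing each entry in a fixed format as it comes off the iterator keeps memory roughly proportional to the number of directories currently open rather than to the size of the namespace, which is why column alignment (which needs every row up front) conflicts with this mode.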




[jira] [Commented] (HADOOP-14210) Directories are not listed recursively when fs.defaultFs is viewFs

2017-03-23 Thread Manoj Govindassamy (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939330#comment-15939330 ]

Manoj Govindassamy commented on HADOOP-14210:
-

[~ajithshetty]
bq. {code}
-
-  result[i++] = new FileStatus(0, false, 0, 0,
+  boolean isDir = link.isMergeLink;
+  if (link.targetDirLinkList.length == 1) {
+    try {
+      isDir =
+          link.targetFileSystem.getFileLinkStatus(link.getTargetLink())
+              .isDirectory();
{code}
LinkMerge/MergeMounts are not supported yet, but I see your point about reaching 
out to the target filesystem to find out whether the linked item is indeed a 
directory.
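The patch excerpt above can be read as the following standalone sketch: a link's directory flag starts from whether it is a merge link and, for a single-target link, is replaced by what the target filesystem reports. {{TargetFs}} and {{MountLink}} are hypothetical stand-ins for the target {{FileSystem}} and the mount-table link. The excerpt is truncated before its {{catch}} block, so falling back to the default on {{IOException}} is an assumption.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of resolving whether a mount link should be reported as a directory.
final class LinkDirResolver {
  /** Hypothetical view of a target filesystem; stands in for getFileLinkStatus(...).isDirectory(). */
  interface TargetFs {
    boolean isDirectory(String path) throws IOException;
  }

  /** Hypothetical mount link: merge flag, its target paths, and the filesystem serving them. */
  record MountLink(boolean isMergeLink, List<String> targets, TargetFs targetFs) {}

  static boolean resolveIsDir(MountLink link) {
    // Default taken from the quoted patch: start from the merge-link flag.
    boolean isDir = link.isMergeLink();
    if (link.targets().size() == 1) {
      try {
        // Ask the target filesystem what the single linked item actually is.
        isDir = link.targetFs().isDirectory(link.targets().get(0));
      } catch (IOException e) {
        // Assumption: keep the default instead of failing the whole listing.
      }
    }
    return isDir;
  }
}
```

The trade-off is one extra call to a target filesystem per mount point whenever an internal directory is listed, which matters when the mount table is large or a target is slow or unreachable.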

[~xkrogen]
In the context of ViewFileSystem, all linked entities, whether a directory or a 
file in the target filesystem, are of type INodeLink. Only the internal 
directories in the ViewFileSystem mount tree are of type INodeDir. So, as 
[~ajithshetty] pointed out, ViewFileSystem treats only its internal directories 
as directories and all other entries as linked files. As a result, the 
FileStatus[] returned by ViewFileSystem has the directory flag turned off for 
every link that points to a directory in the target filesystem, which makes the 
ls command stop its file tree traversal.
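A minimal model of why the traversal stops: a recursive lister like {{ls -R}} only descends into entries whose status says they are directories. {{ViewLsModel}} and {{Status}} are hypothetical; {{Status}} keeps only the two {{FileStatus}} fields that matter here, and the map plays the role of {{listStatus}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a recursive listing driven by per-entry directory flags.
final class ViewLsModel {
  /** Hypothetical stand-in for FileStatus: just a path and the directory flag. */
  record Status(String path, boolean isDirectory) {}

  /** Recursively lists path, using the map as the listStatus result for each directory. */
  static List<String> lsR(String path, Map<String, List<Status>> listStatus) {
    List<String> out = new ArrayList<>();
    for (Status s : listStatus.getOrDefault(path, List.of())) {
      out.add(s.path());
      if (s.isDirectory()) { // entries flagged as files are never descended into
        out.addAll(lsR(s.path(), listStatus));
      }
    }
    return out;
  }
}
```

If the root listing reports {{/nn1}} and {{/user}} with the directory flag off, as described above, the walk returns only those two paths even when the target filesystem has content under them; flipping the flags to true lets it continue into the targets.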

Given the scale a ViewFileSystem can reach, returning millions of FileStatus[] 
entries across all NameNodes could be a problem for clients as well. So I was 
assuming the intention of {{listStatus}} on ViewFileSystem is to list only the 
mount tree and not the entire world. But this doesn't match the "ls -R" 
expectation from clients. [~andrew.wang], any thoughts on the expected behavior 
of "ls -R" on the ViewFileSystem root?
