[ https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins updated HDFS-245:
-----------------------------

    Attachment: designdocv2.txt

While writing tests I noticed the current API doesn't match POSIX semantics that closely, eg if a path refers to a symlink then the symlink is not resolved: {{getFileStatus}} returns the FileStatus of the link rather than what it points to (ie behaves like {{lstat}} rather than {{stat}}), ditto for {{open}}, {{setReplication}} etc. While some APIs should act on the symlink itself (eg {{rename}}, {{delete}}), others need symlinks fully resolved. The design doc should specify the intended behavior of the FileContext API wrt symlinks. I attached an updated version and pasted the relevant section below. What do people think?

h2. FileContext APIs

This section specifies the behavior of the FileContext API when links are present in paths. The intent is to match POSIX semantics. For most functions, if symlinks are supported, all links leading up to the target of a path should automatically be resolved. Some functions will not resolve any links in a given path. Some functions will, if given a path that refers to a symlink, operate on the target of the symlink, while others will operate on the symlink itself. For example, {{setReplication}} and {{getFileBlockLocations}} act on the symlink target while {{delete}} and {{getLinkFileStatus}} act on the symlink itself. Behavior is specified both for filesystems that do and do not support symlinks. To support symlink-aware utilities the FileContext API requires some new interfaces (eg an equivalent of {{lstat}}) to indicate whether a path refers to a symlink.

- {{create}}, {{mkdir}} -- the path should not refer to a symlink since the path must not currently exist.
- {{delete}}, {{deleteOnExit}} -- if the given path refers to a symlink then the symlink is removed (like {{unlink}}).
- {{open}} -- if the given path refers to a symlink then the path is fully resolved.
- {{set|getWorkingDirectory}} -- if the given path refers to a symlink then the symlink is fully resolved when setting the working directory, ie if the working directory is changed to {{/link1/link2}} then subsequent queries of the working directory should return whatever {{link2}} points to.
- {{setReplication}} -- if the given path refers to a symlink then the path is fully resolved.
- {{setPermission}} -- if the given path refers to a symlink then the path is fully resolved (like {{chmod}}). Symlink access is determined by permissions of the target of the symlink.
- {{setOwner}} -- if the given path refers to a symlink then the path is fully resolved (like {{chown}}). We could add an {{lchown}} equivalent in the future.
- {{setTimes}} -- if the given path refers to a symlink then the path is fully resolved. Symlinks do not have access times.
- {{get|setFileChecksum}} -- if the given path refers to a symlink then the path is fully resolved, ie there are no checksums associated with symlinks.
- {{getFileStatus}} -- if the given path refers to a symlink then the path is fully resolved, ie returns the FileStatus of the file or directory the symlink points to.
- *new* {{getLinkFileStatus}} -- like {{lstat}}, if the given path refers to a symlink then the FileStatus of the symlink is returned, otherwise the results are the same as if {{getFileStatus}} was called. If symlink support is not enabled or the underlying filesystem does not support symlinks then the results are the same as if {{getFileStatus}} was called.
- {{isDirectory}}, {{isFile}} -- if the given path refers to a symlink then the path is fully resolved, ie if the symlink points to a directory then {{isDirectory}} returns true.
- *new* {{isLink}} -- returns true if the given path refers to a symlink. If symlink support is not enabled or the underlying filesystem does not support symlinks then {{isLink}} returns false.
- {{listStatus}} -- if the given path refers to a symlink then the path is fully resolved, ie the result is equivalent to calling {{listStatus}} with the target of the symlink.
- {{getFileBlockLocations}} -- if the given path refers to a symlink then the path is fully resolved, ie symlinks are not associated with blocks.
- {{getFsStatus}} -- if the given path refers to a symlink then the path is fully resolved, ie the FsStatus of the target of the symlink is returned.
- {{getLinkTarget}} -- only the first symlink in the given path is resolved. If symlink support is not enabled or the underlying filesystem does not support symlinks then an IOException is thrown.
- {{resolve}} -- all symlinks in the given path are resolved. If symlink support is not enabled or the underlying filesystem does not support symlinks then no symlinks are resolved.
- {{createSymlink(oldpath, newpath)}} -- newpath should not refer to a symlink since the path must not currently exist.
-- _No symlinks are resolved in oldpath_. For example, if {{/link1}} points to {{/dir}}, and {{/link1/link2}} points to {{/link1/file}}, then {{createSymlink("/link1/file", "/link1/link2")}} points {{link2}} to {{/link1/file}} (not {{/dir/file}}). The path {{/link1/link2}} resolves as follows: {{/dir/link2}} -> {{/link1/file}} -> {{/dir/file}}.
-- If symlink support is not enabled or the underlying filesystem does not support symlinks then an IOException is thrown.
- {{rename(oldpath, newpath)}} --
-- if oldpath refers to a symlink, the symlink is renamed (POSIX).
-- if newpath refers to a symlink, the symlink is overwritten (POSIX) if the OVERWRITE option is passed.
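
To make the resolution rules above concrete, here is a rough sketch of them as a toy in-memory namespace. It is not the FileContext API and not taken from any patch on this issue: the class {{SymlinkNamespace}}, its {{Kind}} enum (standing in for FileStatus) and the {{walk}} helper are made up for illustration, and it assumes normalized absolute paths and absolute link targets only. It reproduces the {{createSymlink("/link1/file", "/link1/link2")}} example: oldpath is stored verbatim, links leading up to the last component are always followed, and only the {{stat}}-like calls follow a link in the last component.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy namespace for illustration only; NOT the Hadoop FileContext API. */
final class SymlinkNamespace {
    /** Hypothetical stand-in for FileStatus: just the kind of entry. */
    enum Kind { FILE, DIR, LINK }

    private static final int MAX_HOPS = 32; // guards against symlink loops
    private final Map<String, Kind> kinds = new HashMap<>();
    private final Map<String, String> targets = new HashMap<>();

    SymlinkNamespace() { kinds.put("/", Kind.DIR); }

    /**
     * Resolves links in every component leading up to the last one; the last
     * component is followed only when followLast is true (stat vs lstat).
     * Assumes a normalized absolute path and absolute link targets.
     */
    private String walk(String path, boolean followLast, int hops) {
        if (hops > MAX_HOPS) throw new IllegalStateException("too many levels of symlinks: " + path);
        String[] parts = path.substring(1).split("/");
        String cur = "";
        for (int i = 0; i < parts.length; i++) {
            String next = cur + "/" + parts[i];
            boolean last = (i == parts.length - 1);
            if (kinds.get(next) == Kind.LINK && (!last || followLast)) {
                // Splice the stored target in front of the remaining components and restart.
                StringBuilder spliced = new StringBuilder(targets.get(next));
                for (int j = i + 1; j < parts.length; j++) spliced.append('/').append(parts[j]);
                return walk(spliced.toString(), followLast, hops + 1);
            }
            cur = next;
        }
        return cur;
    }

    void mkdir(String path) { kinds.put(walk(path, false, 0), Kind.DIR); }
    void createFile(String path) { kinds.put(walk(path, false, 0), Kind.FILE); }

    /** oldpath is stored as given: no symlinks in it are resolved. */
    void createSymlink(String oldpath, String newpath) {
        String key = walk(newpath, false, 0);
        if (kinds.containsKey(key)) throw new IllegalStateException("already exists: " + newpath);
        kinds.put(key, Kind.LINK);
        targets.put(key, oldpath);
    }

    /** Like unlink: removes the link itself, never its target. */
    void delete(String path) {
        String key = walk(path, false, 0);
        kinds.remove(key);
        targets.remove(key);
    }

    String resolve(String path) { return walk(path, true, 0); }
    /** stat-like: follows a link in the last component; null if nothing is there. */
    Kind getFileStatus(String path) { return kinds.get(walk(path, true, 0)); }
    /** lstat-like: reports the link itself; null if nothing is there. */
    Kind getLinkFileStatus(String path) { return kinds.get(walk(path, false, 0)); }
    boolean isLink(String path) { return getLinkFileStatus(path) == Kind.LINK; }
    /** Returns the stored target of the link, or null if the path is not a link. */
    String getLinkTarget(String path) { return targets.get(walk(path, false, 0)); }

    public static void main(String[] args) {
        SymlinkNamespace ns = new SymlinkNamespace();
        ns.mkdir("/dir");
        ns.createFile("/dir/file");
        ns.createSymlink("/dir", "/link1");
        ns.createSymlink("/link1/file", "/link1/link2");
        System.out.println(ns.getLinkTarget("/link1/link2"));     // /link1/file
        System.out.println(ns.resolve("/link1/link2"));           // /dir/file
        System.out.println(ns.getFileStatus("/link1/link2"));     // FILE
        System.out.println(ns.getLinkFileStatus("/link1/link2")); // LINK
    }
}
```

In this sketch {{delete("/link1/link2")}} removes only the entry at {{/dir/link2}} and leaves {{/dir/file}} in place, which is the {{unlink}}-like behavior described for {{delete}}. Relative link targets, permissions and the "symlinks not supported" cases are deliberately left out.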

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HDFS-245
>                 URL: https://issues.apache.org/jira/browse/HDFS-245
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, designdocv1.txt, designdocv2.txt,
> HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symLink1.patch,
> symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch,
> symLink14.patch, symLink15.txt, symLink15.txt, symlink16-common.patch,
> symlink16-hdfs.patch, symlink16-mr.patch, symlink17-common.txt,
> symlink17-hdfs.txt, symlink18-common.txt, symlink19-common.txt,
> symlink19-common.txt, symlink19-hdfs.txt, symLink4.patch, symLink5.patch,
> symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file
> that contains a reference to another file or directory in the form of an
> absolute or relative path and that affects pathname resolution. Programs
> which read or write to files named by a symbolic link will behave as if
> operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.