[ https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757438#action_12757438 ]
Konstantin Shvachko commented on HDFS-245:
------------------------------------------

~Alan asked for the summary. Here is the one before history takes a new turn in this issue. Tried to be objective.~

h2. Overview of the SymLink proposal.

The SymLink proposal HDFS-245 introduces symbolic links in HDFS, which can point to files or directories in the same file system (internal link) or in a different one (external link).
# Internal link example /users/shv/test
# External link example hdfs://nn1.yahoo.com:50030/users/shv/data

External links may also be viewed as mount points of other Hadoop file system(s), those that implement the FileSystem API.

Internal links can be resolved inside the name-node; there is no controversy here. External links should be returned to the HDFS client, which should connect to the linked file system on a different cluster and get the information from there.

The controversy here is how the information about external symbolic links should be returned to and handled by the client. Note that such handling can only be done at the top FileSystem level, because lower-level abstractions represent a single Hadoop file system.

The following major approaches have been considered.

*I.* Replace each call to the name-node with two: the first resolves the input path if it is a symbolic link, or does nothing if the path is local; the second performs the required action, like open() a file.

*II.* Each name-node operation with a path parameter throws an UnresolvedLinkException if the path crosses an external link (a mount point). The exception contains the new path pointed to by the symlink, which the client uses to repeat the call with a different file system.

*III.* Every call to the name-node returns a result that incorporates a potential symlink resolution, if any, in addition to the actual return type of the method. Thus the result of a method is a sort of union of a potential link and the real type. See examples below.
*IV.* Let the name-node throw MountPointException if the specified path contains an external symlink. The client catches MountPointException, calls a new method getLinkTarget() to resolve the link, and then uses the resolved path to perform the action, like open() a file.

*V.* Similar to (III), but the potential symlink resolution is hidden in a thread-local variable.

Most people agreed that (I) doubles the access time in the regular case, when there are no symlinks. Doug proposed at some point to benchmark it before ruling it out, because the name-node is fast based on the latest benchmarks.
(II), (III) and (V) do not require extra RPC calls.
(IV) does not require extra RPCs in the regular (no symlinks) case, but does require resolving external symlinks explicitly if one is on the path.

The latest patch submitted by Dhruba implements (III). There were several iterations over this. The latest API looks like this:
{code}
FsResult<LocatedBlocks> getBlockLocations(path, ...) throws IOException;
FsResult<FileStatus> getFileInfo(String src) throws IOException;
FsResult<BooleanWritable> delete(String src, boolean recursive) throws IOException;
FsResult<NullWritable> setOwner(String src, ...) throws IOException;
{code}
This is much better than the starting patch, which defined a separate union class for every return type. Still, replacing a simple {{void}} type with {{FsResult<NullWritable>}} and {{Boolean}} with {{FsResult<BooleanWritable>}} looks controversial.

- Doug opposes (II) and (IV) on the basis that "Exceptions should not be used for normal program control."
- He refers to ["Best Practices for Exception Handling":http://onjava.com/pub/a/onjava/2003/11/19/exceptions.html]. This lists just three cases where exceptions are appropriate: programming errors, client code errors, and resource failures, which the links are clearly not.
- I, Sanjay, et al.
argue that this and other sources mention and accept the use of so-called checked exceptions, which "represent invalid conditions in areas outside the immediate control of the program". UnresolvedLinkException and MountPointException fall under this category, because the name-node, being unaware of other clusters, cannot immediately handle external symlinks.
- Doug's counter-argument is that the whole of HDFS is a single monolithic layer, which should handle symlinks, and therefore this is not an exceptional condition for any sub-layer of HDFS.

> Create symbolic links in HDFS
> -----------------------------
>
> Key: HDFS-245
> URL: https://issues.apache.org/jira/browse/HDFS-245
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: dhruba borthakur
> Assignee: Eli Collins
> Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch,
> symlink-0.20.0.patch, symLink1.patch, symLink1.patch, symLink11.patch,
> symLink12.patch, symLink13.patch, symLink14.patch, symLink15.txt,
> symLink15.txt, symLink4.patch, symLink5.patch, symLink6.patch,
> symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file
> that contains a reference to another file or directory in the form of an
> absolute or relative path and that affects pathname resolution. Programs
> which read or write to files named by a symbolic link will behave as if
> operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
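
To make the difference between approaches (II) and (III) in the comment above more concrete, here is a minimal, self-contained sketch of how a client might follow an external link in each case. Everything in it is an assumption made for illustration: {{ToyNameNode}}, {{SymlinkSketch}}, the {{getLength*}} methods and the {{nn1}}/{{nn2}} cluster names are hypothetical, and the {{FsResult}} and {{UnresolvedLinkException}} classes only borrow their names from the discussion; none of this is the code from the attached patches.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Approach (II), hypothetical: the exception carries the path the symlink points to.
class UnresolvedLinkException extends IOException {
    private final String target;
    UnresolvedLinkException(String target) {
        super("path crosses an external link: " + target);
        this.target = target;
    }
    String getTarget() { return target; }
}

// Approach (III), hypothetical: a union of "the real result" and "a link to follow".
final class FsResult<T> {
    private final T value;
    private final String link; // null when the path was resolved locally
    private FsResult(T value, String link) { this.value = value; this.link = link; }
    static <T> FsResult<T> of(T value) { return new FsResult<>(value, null); }
    static <T> FsResult<T> link(String target) { return new FsResult<>(null, target); }
    boolean isLink() { return link != null; }
    T get() { return value; }
    String getLink() { return link; }
}

// Toy stand-in for a name-node: file lengths plus external links (mount points).
class ToyNameNode {
    private final Map<String, Long> files = new HashMap<>();
    private final Map<String, String> links = new HashMap<>();
    void addFile(String path, long length) { files.put(path, length); }
    void addLink(String path, String target) { links.put(path, target); }

    // Returns the rewritten target if the path crosses an external link, else null.
    private String crossedLink(String path) {
        for (Map.Entry<String, String> e : links.entrySet()) {
            String prefix = e.getKey();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return e.getValue() + path.substring(prefix.length());
            }
        }
        return null;
    }

    private long localLength(String path) throws IOException {
        Long length = files.get(path);
        if (length == null) throw new FileNotFoundException(path);
        return length;
    }

    // Approach (II): signal the link with an exception.
    long getLengthThrowing(String path) throws IOException {
        String target = crossedLink(path);
        if (target != null) throw new UnresolvedLinkException(target);
        return localLength(path);
    }

    // Approach (III): signal the link in the return value.
    FsResult<Long> getLengthResult(String path) throws IOException {
        String target = crossedLink(path);
        if (target != null) return FsResult.link(target);
        return FsResult.of(localLength(path));
    }
}

class SymlinkSketch {
    // Client side of (II): catch, switch file system, repeat the call.
    static long lengthViaException(Map<String, ToyNameNode> clusters, String authority, String path)
            throws IOException {
        while (true) { // no cycle detection in this sketch
            try {
                return clusters.get(authority).getLengthThrowing(path);
            } catch (UnresolvedLinkException e) {
                URI u = URI.create(e.getTarget());
                authority = u.getAuthority();
                path = u.getPath();
            }
        }
    }

    // Client side of (III): inspect the result, switch file system, repeat the call.
    static long lengthViaResult(Map<String, ToyNameNode> clusters, String authority, String path)
            throws IOException {
        while (true) { // no cycle detection in this sketch
            FsResult<Long> r = clusters.get(authority).getLengthResult(path);
            if (!r.isLink()) return r.get();
            URI u = URI.create(r.getLink());
            authority = u.getAuthority();
            path = u.getPath();
        }
    }

    static Map<String, ToyNameNode> demoClusters() {
        ToyNameNode nn1 = new ToyNameNode();
        nn1.addFile("/users/shv/test", 10L);
        nn1.addLink("/users/shv/data", "hdfs://nn2/archive/shv");
        ToyNameNode nn2 = new ToyNameNode();
        nn2.addFile("/archive/shv/part-0", 42L);
        Map<String, ToyNameNode> clusters = new HashMap<>();
        clusters.put("nn1", nn1);
        clusters.put("nn2", nn2);
        return clusters;
    }

    public static void main(String[] args) throws IOException {
        Map<String, ToyNameNode> c = demoClusters();
        System.out.println(lengthViaException(c, "nn1", "/users/shv/data/part-0"));
        System.out.println(lengthViaResult(c, "nn1", "/users/shv/data/part-0"));
    }
}
```

In both variants the regular (no symlink) case costs a single call, and the client loop looks almost the same. The visible difference is the one debated in the comment: (II) keeps the plain return type and pays with a checked exception, while (III) keeps exceptions for failures and pays with a wrapper around every return type.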