[ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757438#action_12757438
 ] 

Konstantin Shvachko commented on HDFS-245:
------------------------------------------

~Alan asked for a summary. Here it is, before history takes a new turn in 
this issue. I tried to be objective.~

h2. Overview of the SymLink proposal.

The SymLink proposal HDFS-245 introduces symbolic links in HDFS, which can 
point to files or directories in the same file system (internal link) or in a 
different file system (external link).
# Internal link example: /users/shv/test
# External link example: hdfs://nn1.yahoo.com:50030/users/shv/data
External links may also be viewed as mount points of other Hadoop file 
systems, that is, of any file system that implements the FileSystem API.

Internal links can be resolved inside the name-node; there is no controversy 
here.

External links should be returned to the HDFS client, which should then 
connect to the linked file system on the other cluster and get the information 
from there. The controversy here is how the information about external 
symbolic links should be returned to and handled by the client.
Note that such handling can only be done at the top FileSystem level, because 
the lower-level abstractions each represent a single Hadoop file system.

The following major approaches have been considered.

*I.* Replace each call to the name-node with two: the first resolves the input 
path if it is a symbolic link (and does nothing if the path is local); the 
second performs the required action, like open() on a file.
*II.* Each name-node operation with a path parameter throws an 
UnresolvedLinkException if the path crosses an external link (a mount point). 
The exception carries the new path pointed to by the symlink, which the client 
uses to repeat the call against the other file system (a client-side sketch 
follows this list).
*III.* Every call to the name-node returns a result, which incorporates a 
potential symlink resolution, if any, in addition to the actual return type of 
the method. Thus the result of a method is sort of a union of a potential link 
and the real type. See examples below.
*IV.* Let the name-node throw MountPointException if the specified path 
contains an external symlink. The client catches MountPointException, calls a 
new method getLinkTarget() to resolve the link, and then uses the resolved 
path to perform the action, e.g. open() a file.
*V.* Similar to (III) but the potential symlink resolution is hidden in a 
thread local variable.
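
For illustration, here is a minimal client-side sketch of how (II) could work. 
The wrapper class, the shape of UnresolvedLinkException, and its getTarget() 
accessor are assumptions made for this sketch, not taken from any of the 
attached patches.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Assumed shape of the exception in (II): it carries the unresolved link target. */
class UnresolvedLinkException extends IOException {
  private final Path target;
  UnresolvedLinkException(Path target) { this.target = target; }
  Path getTarget() { return target; }
}

/** Hypothetical client wrapper: repeats a call on the file system named by the link. */
class SymlinkAwareClient {
  private final FileSystem fs;      // the file system the client first talks to
  private final Configuration conf;

  SymlinkAwareClient(FileSystem fs, Configuration conf) {
    this.fs = fs;
    this.conf = conf;
  }

  FSDataInputStream open(Path p) throws IOException {
    try {
      return fs.open(p);                              // regular case: one RPC, no symlink
    } catch (UnresolvedLinkException e) {
      Path target = e.getTarget();                    // new path carried by the exception
      FileSystem linked = target.getFileSystem(conf); // connect to the linked cluster
      return linked.open(target);                     // repeat the call there
    }
  }
}
{code}
In the regular case the cost stays at a single RPC; only when a call actually 
crosses a mount point does the client pay for the second connection.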

Most people agreed that (I) doubles the access time in the regular case, when 
there are no symlinks. Doug proposed at some point to benchmark it before 
ruling it out, since the latest benchmarks show the name-node is fast.
(II), (III) and (V) do not require extra RPC calls.
(IV) does not require extra RPCs in the regular (no symlinks) case, but it 
does require the client to resolve external symlinks explicitly when one is on 
the path.

The latest patch submitted by Dhruba implements (III). There have been several 
iterations on it. The latest API looks like this:
{code}
FsResult<LocatedBlocks> getBlockLocations(path, ...) throws IOException;
FsResult<FileStatus> getFileInfo(String src) throws IOException;
FsResult<BooleanWritable> delete(String src, boolean recursive) throws 
IOException;
FsResult<NullWritable> setOwner(String src, ...) throws IOException;
{code}

This is much better than the initial patch, which defined a separate 
union class for every return type, but replacing a simple {{void}} return 
with {{FsResult<NullWritable>}} and {{Boolean}} with 
{{FsResult<BooleanWritable>}} still looks controversial.
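
For concreteness, here is a rough sketch of what a union result type in the 
spirit of (III) might look like; the field names and factory methods are 
assumptions for illustration, not the ones in Dhruba's patch.
{code}
/**
 * Hypothetical union result: either the real return value of a name-node call
 * or the target of an external link the call ran into. Illustrative only.
 */
class FsResult<T> {
  private final T value;      // set when the call completed normally
  private final String link;  // set when the path crossed an external link

  static <T> FsResult<T> of(T value)       { return new FsResult<T>(value, null); }
  static <T> FsResult<T> link(String link) { return new FsResult<T>(null, link); }

  private FsResult(T value, String link) { this.value = value; this.link = link; }

  boolean isLink() { return link != null; }
  String getLink() { return link; }
  T get()          { return value; }
}
{code}
The client checks isLink() on every return value: if it is set, it repeats the 
call against the linked file system; otherwise it unwraps get(). This is what 
forces the {{void}} and {{Boolean}} returns above to be wrapped in Writable 
carriers.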

- Doug opposes (II) and (IV) on the basis that "Exceptions should not be used 
for normal program control."
- He refers to ["Best Practices for Exception 
Handling":http://onjava.com/pub/a/onjava/2003/11/19/exceptions.html], which 
lists just three cases where exceptions are appropriate: programming errors, 
client code errors, and resource failures. External links clearly fall into 
none of these.
- I, Sanjay, and others argue that this and other sources also mention and 
accept the use of so-called checked exceptions, which "represent invalid 
conditions in areas outside the immediate control of the program". 
UnresolvedLinkException and MountPointException fall into this category, 
because the name-node, being unaware of other clusters, cannot handle external 
symlinks itself.
- Doug's counter-argument is that HDFS as a whole is a single monolithic 
layer which should handle symlinks, and therefore an external link is not an 
exceptional condition for any of its sub-layers.




> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HDFS-245
>                 URL: https://issues.apache.org/jira/browse/HDFS-245
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: Eli Collins
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, 
> symlink-0.20.0.patch, symLink1.patch, symLink1.patch, symLink11.patch, 
> symLink12.patch, symLink13.patch, symLink14.patch, symLink15.txt, 
> symLink15.txt, symLink4.patch, symLink5.patch, symLink6.patch, 
> symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
