[ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637384#action_12637384 ]

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

1. "symlink" vs "link":
   I think it makes sense to call the current implementation as "links" instead 
of symlinks. None of the existing file system implementations have any support 
for any kinds of links. It is ok for the first implementation to refer to this 
new construct as a generic "link". HDFS implements it as a symbolic link, but 
some other file system may implement "links" as hard links.

  +1 for calling this construct as "link" instead of "symlink".

2. Exceptions vs objects-as-return-status in a public API (FileSystem or 
ClientProtocol API)
   Exceptions and objects-as-return-values are two ways of communicating a 
certain piece of information to the user of the API.
   (a) One goal is to make that API somewhat future proof. With our current 
Hadoop RPC, the only way to serialize/deserialize an exception object is to 
serialize its message string; the client side de-serializes this string and 
reconstructs an exception object. If the return status needs to contain 
several different pieces of information, serializing/deserializing it as a 
string inside the exception object is not very elegant. Many other RPC 
systems (e.g. Thrift) allow versioning objects (adding new fields), but many 
might not allow adding new exceptions to a pre-existing method call. Thus, 
making API calls return objects-as-return-types seems more future-proof than 
adding exceptions (see the sketch after this item).
   (b) The rule of thumb that we have been following is that exceptions are 
generated when an abnormal situation occurs. If an exception is thrown by the 
ClientProtocol, it is logged by the RPC subsystem into an error log. This is 
a good characteristic to have in a distributed system: it makes debugging 
easy because a scan of the error logs pinpoints the exceptions raised by the 
API. Access control violations or disk-full conditions raise exceptions, and 
they are logged by the RPC subsystem. We do not want every call that 
traverses a symbolic link to log an exception message in the error logs, do 
we? (Of course, we could special-case it and say that we will not log 
UnresolvedPathException; but by special-casing it, we are acknowledging that 
this exception is not an abnormal behaviour.)

 +1 for RenameResult rename(Path src, Path dst) throws IOException;
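
   To make (2a) concrete, here is a rough sketch of what a versionable return 
object for rename could look like. The class name RenameResult comes from the 
proposal above, but the fields, setters and the version byte are only my 
illustrative assumptions, not part of any attached patch:

    // Illustrative sketch only: the field names and the version byte are
    // assumptions, not taken from any of the attached patches.
    public class RenameResult implements org.apache.hadoop.io.Writable {
      private static final byte VERSION = 1;  // bump if new trailing fields are added
      private boolean renamed;                // did the rename go through?
      private String unresolvedLink = "";     // non-empty if a link must be resolved first

      public void setRenamed(boolean renamed) { this.renamed = renamed; }
      public void setUnresolvedLink(String target) { this.unresolvedLink = target; }

      public void write(java.io.DataOutput out) throws java.io.IOException {
        out.writeByte(VERSION);
        out.writeBoolean(renamed);
        org.apache.hadoop.io.Text.writeString(out, unresolvedLink);
      }

      public void readFields(java.io.DataInput in) throws java.io.IOException {
        byte version = in.readByte();         // a later version can append fields and
        renamed = in.readBoolean();           // read them only when version > 1
        unresolvedLink = org.apache.hadoop.io.Text.readString(in);
      }
    }

   The point is only that a Writable return object can grow new fields behind 
a version check, which a serialized exception message string cannot do as 
cleanly.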

3. Exceptions vs objects-as-return-status inside the NameNode
   (a) Different file system implementations can have very different 
internals and very different sets of developers. For example, HDFS might 
implement link traversal so that it returns an object status, whereas S3 or 
KFS might throw an exception (internal to the implementation). If we write a 
file system implementation for Ceph, we are unlikely to rewrite the Ceph code 
to avoid exceptions (or vice versa). I would like to draw the distinction 
that this issue is not related to what is decided in case (2) above.
   (b) The primary focus for designing the internal methods of the NameNode 
is not future-proofing for backward compatibility. Also, there is no 
requirement to serialize/deserialize any exception objects as long as those 
objects are used only inside the NameNode. Thus, exceptions can be used here. 
This keeps most of the HDFS code clean and elegant (see the sketch after this 
item).

  +1 for Using Exceptions inside the NameNode internal methods.
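
   As a rough illustration of how (2) and (3) fit together: internal NameNode 
code throws exceptions freely, and the RPC-facing entry point translates them 
into the RenameResult sketched above. The method names and the shape of 
UnresolvedPathException below are my assumptions, not code from any attached 
patch:

    // Sketch only: renameInternal() and the shape of UnresolvedPathException
    // are hypothetical stand-ins for whatever the real internals end up doing.
    class UnresolvedPathException extends java.io.IOException {
      final String linkTarget;
      UnresolvedPathException(String linkTarget) { this.linkTarget = linkTarget; }
    }

    class NameNodeRpcSketch {
      // Public-facing entry point: returns a status object and never lets the
      // internal exception escape into the RPC subsystem's error log.
      RenameResult rename(String src, String dst) throws java.io.IOException {
        RenameResult result = new RenameResult();
        try {
          result.setRenamed(renameInternal(src, dst)); // internal code may throw
        } catch (UnresolvedPathException e) {
          result.setRenamed(false);
          result.setUnresolvedLink(e.linkTarget);      // translated, not logged as an error
        }
        return result;
      }

      private boolean renameInternal(String src, String dst)
          throws UnresolvedPathException {
        // ... internal path/link resolution; may throw UnresolvedPathException ...
        return true;
      }
    }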

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, 
> symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
