[
https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637384#action_12637384
]
dhruba borthakur commented on HADOOP-4044:
------------------------------------------
1. "symlink" vs "link":
I think it makes sense to call the current implementation "links" instead
of "symlinks". None of the existing file system implementations support
any kind of link. It is ok for the first implementation to refer to this
new construct as a generic "link". HDFS implements it as a symbolic link, but
some other file system may implement "links" as hard links.
+1 for calling this construct "link" instead of "symlink".
2. Exceptions vs Objects-as-return-status in a public API (FileSystem or
ClientProtocol API)
Exceptions and objects-as-return-values are two ways of communicating a
certain piece of information to the user of the API.
(a) One goal is to discuss how we can attempt to make that API somewhat
future proof. If we consider our current Hadoop RPC, the only way to
serialize/deserialize an exception object is to serialize its message string.
The client side can deserialize this string and reconstruct an exception
object. If the return status needs to contain several different pieces of
information, then serializing/deserializing them as a string inside the
exception object is not very elegant. Many other RPC systems (e.g. Thrift)
allow versioning objects (adding new fields), but many might not allow adding
new exceptions to a pre-existing method call. Thus, making API calls return
objects-as-return-types seems to be more future-proof than adding exceptions.
(b) The rule of thumb that we have been following is that exceptions are
generated when an abnormal situation occurs. If an exception is thrown by the
ClientProtocol, it is logged by the RPC subsystem into an error log. This is a
good characteristic to have in a distributed system: it makes debugging easy
because a scan of the error logs pinpoints the exceptions raised by the API.
Access control checks or disk-full conditions raise exceptions, and they are
logged by the RPC subsystem. We do not want every call to "traverse a symbolic
link" to log an exception message in the error logs, do we? (Of course, we can
special-case it and say that we will not log UnresolvedPathException; but by
special-casing it, we are acknowledging that this exception is not an abnormal
behaviour).
+1 for RenameResult rename(Path src, Path dst) throws IOException;
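
A minimal Java sketch of the object-as-return-status approach from point 2
follows. Only the signature RenameResult rename(...) throws IOException comes
from the comment above; the fields of RenameResult, the RenameProtocol
interface, the RenameClient helper and the use of String in place of Path are
assumptions made to keep the example self-contained, and are not taken from
the attached patches.

```java
import java.io.IOException;

// Hypothetical return-status object: encountering a link is reported as a
// normal result, so nothing is thrown (or logged) on the server side.
final class RenameResult {
    private final boolean renamed;
    private final String linkTarget; // null when the path resolved fully

    private RenameResult(boolean renamed, String linkTarget) {
        this.renamed = renamed;
        this.linkTarget = linkTarget;
    }

    static RenameResult done(boolean renamed) {
        return new RenameResult(renamed, null);
    }

    static RenameResult linkEncountered(String linkTarget) {
        return new RenameResult(false, linkTarget);
    }

    boolean isRenamed() { return renamed; }
    boolean needsLinkResolution() { return linkTarget != null; }
    String getLinkTarget() { return linkTarget; }
}

// Hypothetical stand-in for the RPC interface; paths are plain strings here.
interface RenameProtocol {
    RenameResult rename(String src, String dst) throws IOException;
}

final class RenameClient {
    // Retries the call against the link target until the server reports a
    // final answer. IOException remains reserved for abnormal conditions.
    static boolean renameFollowingLinks(RenameProtocol server, String src,
                                        String dst, int maxHops)
            throws IOException {
        String current = src;
        for (int hop = 0; hop <= maxHops; hop++) {
            RenameResult result = server.rename(current, dst);
            if (!result.needsLinkResolution()) {
                return result.isRenamed();
            }
            current = result.getLinkTarget();
        }
        throw new IOException("Too many levels of links while renaming " + src);
    }
}
```

With this shape, a new field can later be added to RenameResult without
changing the declared exceptions of rename().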
3. Exceptions vs Object-as-return-status inside the NameNode
(a) Different filesystem implementations can have very different
implementations and a very different set of developers. For example, HDFS might
implement code in such a way that traversing a link returns an object-status,
whereas S3 or KFS throws an exception (internal to the implementation). If we
write a file system implementation for Ceph, we are unlikely to rewrite the
Ceph code to stop using exceptions (or vice versa). I would like to draw the
distinction that this issue is not related to what is decided in case (2) above.
(b) The primary focus in designing the internal methods of the NameNode is
not future-proofing for backward compatibility. Also, there isn't any
requirement to serialize/deserialize exception objects as long as those objects
are used only inside the NameNode. Thus, exceptions could be used here. This
keeps most of the HDFS code clean and elegant.
+1 for using exceptions inside the NameNode internal methods.
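
One way points 2 and 3 could fit together is sketched below in Java: internal
methods throw, and the method at the RPC boundary converts the exception into
a return-status object, so it is never serialized or logged as an error. This
is an illustration only. UnresolvedPathException is named in the discussion
above, but its fields, as well as OpenResult and NamespaceSketch, are
hypothetical and not taken from the attached patches.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical definition; only the class name appears in the discussion.
class UnresolvedPathException extends IOException {
    private final String linkTarget;

    UnresolvedPathException(String linkTarget) {
        super("Unresolved link to " + linkTarget);
        this.linkTarget = linkTarget;
    }

    String getLinkTarget() { return linkTarget; }
}

// Hypothetical return-status object handed back across the RPC boundary.
final class OpenResult {
    final boolean opened;
    final String linkTarget; // null unless a link stopped the resolution

    OpenResult(boolean opened, String linkTarget) {
        this.opened = opened;
        this.linkTarget = linkTarget;
    }
}

// Toy stand-in for the namespace; it stores only path-to-target links.
final class NamespaceSketch {
    private final Map<String, String> links = new HashMap<>();

    void addLink(String path, String target) {
        links.put(path, target);
    }

    // Internal method: free to use an exception, since the exception object
    // never leaves this process and needs no serialization.
    private String resolve(String path) throws UnresolvedPathException {
        String target = links.get(path);
        if (target != null) {
            throw new UnresolvedPathException(target);
        }
        return path;
    }

    // Boundary method: translates the internal exception into a status
    // object instead of letting it propagate to the RPC layer.
    OpenResult open(String path) {
        try {
            resolve(path);
            return new OpenResult(true, null);
        } catch (UnresolvedPathException e) {
            return new OpenResult(false, e.getLinkTarget());
        }
    }
}
```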
> Create symbolic links in HDFS
> -----------------------------
>
> Key: HADOOP-4044
> URL: https://issues.apache.org/jira/browse/HADOOP-4044
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: symLink1.patch, symLink1.patch, symLink4.patch,
> symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file
> that contains a reference to another file or directory in the form of an
> absolute or relative path and that affects pathname resolution. Programs
> which read or write to files named by a symbolic link will behave as if
> operating directly on the target file. However, archiving utilities can
> handle symbolic links specially and manipulate them directly.