[ 
https://issues.apache.org/jira/browse/HADOOP-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189621#comment-15189621
 ] 

Chris Nauroth commented on HADOOP-12910:
----------------------------------------

I am sensing massive scope creep in this discussion.

bq. Actually, one more thing to define in HDFS-9924 and include any 
specification is: linearizability/serializability guarantees

I'm going to repeat some of my comments from HDFS-9924.  A big motivation for 
this effort is that we often see an application that needs to execute a large 
set of renames, where the application knows that there are no dependencies 
between the rename operations and no ordering requirements.  Although 
linearizability is certainly nicer to have than not, use cases like this 
don't need it.

Implementing a linearizability guarantee would significantly complicate this 
effort.  ZooKeeper has an async API with ordering guarantees, and it takes 
very delicate coordination between client-side and server-side state to make 
that happen.  Instead, I suggest that we focus on what we really need (async 
execution of independent operations) and tell clients that they are 
responsible for coordinating dependencies between calls.  I have also commented 
on HDFS-9924 that we could later provide a programming model of futures + 
promises as a more elegant way to help callers structure code with multiple 
dependent async calls.  Even that much is not an immediate need though.
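
To make that split of responsibility concrete, here is a minimal sketch in 
Java.  {{AsyncRenameSketch}} is a hypothetical in-memory stand-in, not the 
proposed FileSystem API, and its use of {{CompletableFuture}} is only an 
assumption about what a later futures + promises model might look like.  The 
independent renames are submitted with no ordering between them; the one 
dependent pair is ordered by the caller through chaining.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical in-memory stand-in for an async file system; not a Hadoop API.
class AsyncRenameSketch {
    // Path -> contents; stands in for the namespace.
    private final Map<String, String> files = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    void create(String path, String data) { files.put(path, data); }

    boolean exists(String path) { return files.containsKey(path); }

    // Returns immediately; the rename itself runs on a pool thread.
    // No ordering is promised between two calls to this method.
    CompletableFuture<Boolean> rename(String src, String dst) {
        return CompletableFuture.supplyAsync(() -> {
            String data = files.remove(src);
            if (data == null) {
                return false; // source missing
            }
            files.put(dst, data);
            return true;
        }, pool);
    }

    void shutdown() { pool.shutdown(); }

    static boolean demo() {
        AsyncRenameSketch fs = new AsyncRenameSketch();
        try {
            // Independent renames: submit them all, then wait for all.
            List<CompletableFuture<Boolean>> pending = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                fs.create("/tmp/part-" + i, "data");
                pending.add(fs.rename("/tmp/part-" + i, "/out/part-" + i));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            boolean allRenamed = pending.stream().allMatch(CompletableFuture::join);

            // Dependent renames: the caller enforces the order by chaining,
            // so the second rename is only submitted after the first completes.
            fs.create("/a", "x");
            boolean chained = fs.rename("/a", "/b")
                    .thenCompose(first -> fs.rename("/b", "/c"))
                    .join();

            return allRenamed && chained && fs.exists("/c") && fs.exists("/out/part-99");
        } finally {
            fs.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the {{thenCompose}} step, the second rename could run before the first 
and find no {{/b}} to rename, which is exactly the kind of dependency the 
caller would have to coordinate.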

This does not preclude providing a linearizability guarantee at some point in 
the future.  I'm just saying that we have an opportunity to provide something 
valuable sooner even without linearizability.

bq. I'm going to be ruthless and say "I'd like to see a specification of this 
alongside the existing one". Because that one has succeeded in being a 
reference point for everyone; we need to continue that for a key binding. It 
should be straightforward here.

Assuming the above project plan is acceptable (no linearizability right now), 
this reduces to a simple statement like "individual async operations adhere to 
the same contract as the corresponding sync operations, and there are no 
guarantees on ordering across multiple async operations."

bq. Is it the future that raises an IOE, or the operation? I can see both 
needing to

Certainly Hadoop-specific exceptions like {{AccessControlException}} and 
{{QuotaExceededException}} must be delivered asynchronously, for example 
wrapped in an {{ExecutionException}}.  You won't know if you're going to hit 
one of these at 
the time of submitting the call.  My opinion is that if the API is truly async, 
then it implies we cannot perform I/O on the calling thread, and therefore 
cannot throw an {{IOException}} at call time.  I believe Nicholas wants to put 
{{throws IOException}} in the method signatures anyway for ease of 
backwards-compatible changes in the future though, just in case we find a need 
later.  I think that's acceptable.
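
As an illustration of that behavior, here is a minimal Java sketch.  The 
{{rename}} helper, the {{/restricted}} path check, and the error message are 
all hypothetical; only {{Future}}, {{ExecutorService}}, and 
{{ExecutionException}} are real JDK types.  The method declares 
{{throws IOException}} as in the proposal, but the failure only surfaces when 
the caller invokes {{get()}}.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of async error delivery; not a Hadoop API.
class AsyncFailureSketch {
    // Declares IOException for future compatibility, but never throws it at
    // call time here because no I/O happens on the calling thread.
    static Future<Boolean> rename(ExecutorService pool, String src, String dst)
            throws IOException {
        return pool.submit(() -> {
            // Stand-in for a server-side permission check (an assumption).
            if (dst.startsWith("/restricted")) {
                throw new IOException("Permission denied: " + dst);
            }
            return true;
        });
    }

    // Waits for the result and reports either success or the wrapped cause.
    static String describe(Future<Boolean> result) throws InterruptedException {
        try {
            return "ok: " + result.get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        }
    }

    static String demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Submission succeeds even though the operation will fail later.
            Future<Boolean> result = rename(pool, "/a", "/restricted/a");
            return describe(result);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The caller has to unwrap {{getCause()}} to see the original exception, which 
is the cost of dispatching errors through the future rather than at call time.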


> Add new FileSystem API to support asynchronous method calls
> -----------------------------------------------------------
>
>                 Key: HADOOP-12910
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12910
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>
> Add a new API, namely FutureFileSystem (or AsynchronousFileSystem, if it is a 
> better name).  All the APIs in FutureFileSystem are the same as FileSystem 
> except that the return type is wrapped by Future, e.g.
> {code}
>   //FileSystem
>   public boolean rename(Path src, Path dst) throws IOException;
>   //FutureFileSystem
>   public Future<Boolean> rename(Path src, Path dst) throws IOException;
> {code}
> Note that FutureFileSystem does not extend FileSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
