[ https://issues.apache.org/jira/browse/HDFS-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603111#comment-16603111 ]

Steve Loughran commented on HDFS-13713:
---------------------------------------

* unused parts: call out the need to clean them up
* duplicate entries: should the s3a one do a check & fail consistently? Or call 
out that it MUST fail with IllegalArgumentException or IOE. (I'd prefer a 
consistent IllegalArgumentException, as this check is straightforward to do 
client side; see the sketch below)
* what if there's now a dir at the dest on completion?

For HDFS, that check can be done. For the stores, yes, there's a race, but the 
client could do some precursor work.
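
To make the duplicate-entries check concrete, here's a minimal sketch of how it 
could look client side (the list-of-pairs shape and the helper name 
checkNoDuplicateParts are my assumptions, not the committed API):

{code}
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.fs.PartHandle;

// Hypothetical helper: fail fast, client side, with the consistent
// IllegalArgumentException argued for above. Assumes the parts arrive as
// (partNumber, handle) pairs; the real signature may differ.
static void checkNoDuplicateParts(List<Map.Entry<Integer, PartHandle>> parts) {
  Set<Integer> seen = new HashSet<>();
  for (Map.Entry<Integer, PartHandle> part : parts) {
    if (!seen.add(part.getKey())) {
      throw new IllegalArgumentException(
          "Duplicate part number in upload: " + part.getKey());
    }
  }
}
{code}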

* S3AMultipartUploader.initialize() can call mkdirs() on the parent, so trigger 
the full check for parent paths & add a marker to stop a file going in there 
later. Or at the very least, check the parent for not being a file (nonexistent 
is OK), and check the dest for being nonexistent or a file.

which means we need two preconditions for initialize
{code}
* destination is nonexistent or a file   (getFileStatus() == FNFE or isFile)
* no parent entry is a file (or at least, getFileStatus(parent) == FNFE or 
isDirectory)
{code}
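
A sketch of how those two preconditions could be enforced client side against 
the plain FileSystem API (checkUploadDest is a hypothetical helper, not the 
actual S3AMultipartUploader code):

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper enforcing both initialize() preconditions.
static void checkUploadDest(FileSystem fs, Path dest) throws IOException {
  // precondition 1: destination is nonexistent or a file
  try {
    FileStatus st = fs.getFileStatus(dest);
    if (st.isDirectory()) {
      throw new IllegalArgumentException("Destination is a directory: " + dest);
    }
  } catch (FileNotFoundException fnfe) {
    // nonexistent destination is fine
  }
  // precondition 2: no parent entry is a file (nonexistent parents are OK)
  for (Path p = dest.getParent(); p != null; p = p.getParent()) {
    try {
      if (fs.getFileStatus(p).isFile()) {
        throw new IllegalArgumentException("Parent is a file: " + p);
      }
      break;  // found a directory; no need to walk up further
    } catch (FileNotFoundException fnfe) {
      // nonexistent parent: keep checking ancestors
    }
  }
}
{code}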

This is consistent with S3A.create()

On commit, the same conditions hold, but there is the cost of a second set of 
getFileStatus calls. I'll highlight that we don't do that second check on 
close() of an S3 output stream, even though that same condition can exist.

So maybe then
* add checks at init time
* at commit time say "undefined outcome if changes have happened", discuss race 
conditions of all stores here, *and make it the responsibility of applications 
using the code to manage this*

Or, and it's an interesting thought: don't do the checks at init time, but 
postpone them until commit. That's the smallest window, and it will make clear 
to callers that this is when you need to check for trouble. That is: 
initialize() returning doesn't guarantee that the path/parent conditions are met.
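
In code terms, the postponed variant just moves the same check into the commit 
path; a sketch, with UploadState as an illustrative placeholder rather than the 
real MultipartUploader API:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative placeholder for whatever state the uploader carries between
// initialize() and commit; not the real API.
interface UploadState {
  void finishAndCommit() throws IOException;
}

// Postponed-check variant: same checkUploadDest() helper as above, but run
// at commit time, in the smallest race window.
static void completeUpload(FileSystem fs, Path dest, UploadState state)
    throws IOException {
  checkUploadDest(fs, dest);  // check just before commit, not at init
  state.finishAndCommit();    // finalize the multipart upload
}
{code}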

All this is straightforward to test, obviously, which is why having consistent 
exceptions is nice. (And it's why I like these FS specs: they identify those 
corner cases we can trivially derive tests from, and they serve as the reference 
when trying to decide whether a test failure is a bug in the FS vs the test 
itself.)
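
For example, a contract test for the directory-at-dest failure mode could be as 
small as this (JUnit 4 style; the test name is mine, and path()/getFileSystem() 
are assumed to come from the usual contract test base class):

{code}
// Sketch of a contract test: a directory at the destination must fail
// with the consistent, spec-mandated IllegalArgumentException.
@Test
public void testInitRejectsDirectoryAtDest() throws Exception {
  Path dest = path("multipart/dest");   // unique path under the test dir
  getFileSystem().mkdirs(dest);         // create a directory at the target
  try {
    checkUploadDest(getFileSystem(), dest);
    fail("Expected IllegalArgumentException for a directory at " + dest);
  } catch (IllegalArgumentException expected) {
    // the consistent failure mode is exactly what the contract test asserts
  }
}
{code}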

> Add specification of Multipart Upload API to FS specification, with contract 
> tests
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-13713
>                 URL: https://issues.apache.org/jira/browse/HDFS-13713
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: fs, test
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Ewan Higgs
>            Priority: Blocker
>         Attachments: HDFS-13713.001.patch, HDFS-13713.002.patch, 
> multipartuploader.md
>
>
> There's nothing in the FS spec covering the new API. Add it in a new .md file
> * add FS model with the notion of a function mapping (uploadID -> Upload), 
> the operations (list, commit, abort). The [TLA+ 
> model|https://issues.apache.org/jira/secure/attachment/12865161/objectstore.pdf]
> of HADOOP-13786 shows how to do this.
> * Contract tests of not just the successful path, but all the invalid ones.
> * implementations of the contract tests of all FSs which support the new API.


