[ https://issues.apache.org/jira/browse/HDFS-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603111#comment-16603111 ]
Steve Loughran commented on HDFS-13713:
---------------------------------------

* unused parts: call out the need to clean up.
* duplicate entries: should the s3a one do a check & fail consistently? Or call out that it MUST fail with IllegalArgumentException or IOE. (I'd prefer a consistent IllegalArgumentException, as this check is straightforward to do client side.)
* now a dir at dest on completion? For HDFS, that check can be done. For the stores, yes, there's a race, but the client could do some precursor work.
* S3AMultipartUploader.initialize() can call mkdirs() on the parent, so trigger the full check for parent paths & add a marker to stop a file going in there later. Or at the very least, check the parent for not being a file (nonexistent is OK), and check the dest for being nonexistent or a file.

Which means we need two preconditions for initialize():
{code}
* destination is nonexistent or a file (getFileStatus() == FNFE or isFile)
* no parent entry is a file (or at least, getFileStatus(parent) == FNFE or isDirectory)
{code}
This is consistent with S3A.create().

On commit, the same conditions hold, but there is the cost there of a second set of getFileStatus() calls. I'll highlight that we don't do that second check on close() of an S3 output stream, even though that same condition can exist. So maybe then:
* add checks at init time
* at commit time say "undefined outcome if changes have happened", discuss the race conditions of all stores here, *and make it the responsibility of applications using the code to manage this*

Or, and it's an interesting thought: don't do the checks at init time, but postpone them until commit. That's the smallest window, and it will make clear to callers that this is when they need to check for trouble. That is: initialize() returning doesn't guarantee that the path/parent conditions are met.

All this is straightforward to test, obviously, which is why having consistent exceptions is nice.
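To make the two preconditions concrete, here is a minimal, self-contained sketch of the check logic. This is not the actual S3AMultipartUploader code: the toy path map (path -> isDirectory), the class name InitPreconditions, and the method checkInitPreconditions() are all invented for illustration; a real implementation would use FileSystem.getFileStatus() and catch FileNotFoundException instead.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the two initialize() preconditions argued for above.
 * Illustrative only: the Map-backed "store" and all names here are
 * assumptions, not the real S3A code.
 */
public class InitPreconditions {

  /** path -> true if directory, false if file; absent == nonexistent. */
  private final Map<String, Boolean> store = new HashMap<>();

  public void put(String path, boolean isDir) {
    store.put(path, isDir);
  }

  /**
   * Throws IllegalArgumentException (the consistent exception preferred
   * above) unless:
   *  1. the destination is nonexistent or a file, and
   *  2. no parent entry is a file (nonexistent parents are OK).
   */
  public void checkInitPreconditions(String dest) {
    Boolean destIsDir = store.get(dest);
    if (destIsDir != null && destIsDir) {
      throw new IllegalArgumentException("Destination is a directory: " + dest);
    }
    // walk up the parent chain; a nonexistent parent is acceptable
    String parent = parentOf(dest);
    while (parent != null) {
      Boolean parentIsDir = store.get(parent);
      if (parentIsDir != null && !parentIsDir) {
        throw new IllegalArgumentException("Parent is a file: " + parent);
      }
      parent = parentOf(parent);
    }
  }

  private static String parentOf(String path) {
    int i = path.lastIndexOf('/');
    return i <= 0 ? null : path.substring(0, i);
  }

  public static void main(String[] args) {
    InitPreconditions fs = new InitPreconditions();
    fs.put("/a", true);          // directory
    fs.put("/a/file", false);    // file
    fs.checkInitPreconditions("/a/dest");      // OK: dest nonexistent
    fs.checkInitPreconditions("/a/file");      // OK: dest is a file
    try {
      fs.checkInitPreconditions("/a/file/x");  // parent is a file
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    try {
      fs.checkInitPreconditions("/a");         // dest is a directory
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The same checks would run again at commit time under the "postpone until commit" variant; only the call site moves, not the logic.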
(and it's why I like these FS specs: they identify those corner cases we can trivially derive tests from, and use as the reference when trying to decide whether a test failure is a bug in the FS vs the test itself)

> Add specification of Multipart Upload API to FS specification, with contract
> tests
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-13713
>                 URL: https://issues.apache.org/jira/browse/HDFS-13713
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: fs, test
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Ewan Higgs
>            Priority: Blocker
>         Attachments: HDFS-13713.001.patch, HDFS-13713.002.patch, multipartuploader.md
>
>
> There's nothing in the FS spec covering the new API. Add it in a new .md file:
> * add an FS model with the notion of a function mapping (uploadID -> Upload) and the operations (list, commit, abort). The [TLA+ model|https://issues.apache.org/jira/secure/attachment/12865161/objectstore.pdf] of HADOOP-13786 shows how to do this.
> * Contract tests of not just the successful path, but all the invalid ones.
> * implementations of the contract tests for all FSs which support the new API.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)