[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678507#comment-13678507 ]

Suresh Srinivas commented on HADOOP-9371:
-----------------------------------------

Steve,

What are you trying to do in this jira? Some of the comments here suggest changing the semantics. Is your intent to document the semantics rigorously and add tests, so that any other file system implementation can be tested and certified against them (I do not know how you can test atomicity easily)? Or are you also planning to change the semantics?

As regards deciding the semantics where the documentation is sparse or unclear, the semantics as implemented by HDFS are the gold standard, because that is what the majority of applications depend upon. I would discourage others from second-guessing what applications need, because we do not know all the applications that are out there.

Define Semantics of FileSystem and FileContext more rigorously
--------------------------------------------------------------

Key: HADOOP-9371
URL: https://issues.apache.org/jira/browse/HADOOP-9371
Project: Hadoop Common
Issue Type: Sub-task
Components: fs
Affects Versions: 1.2.0, 3.0.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch, HadoopFilesystemContract.pdf
Original Estimate: 48h
Remaining Estimate: 48h

The semantics of {{FileSystem}} and {{FileContext}} are not completely defined in terms of
# core expectations of a filesystem
# consistency requirements.
# concurrency requirements.
# minimum scale limits

Furthermore, methods are not defined strictly enough in terms of their outcomes and failure modes.

The requirements and method semantics should be defined more strictly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667065#comment-13667065 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

Konstantin -good points. I'm going to redo it as .apt, so the linking isn't going to be useful (soon). MD may be well tooled, but as there isn't a consistent format for handling tables, it's not that much better than APT (though it does make it easier to use angle brackets in in-line code, and doesn't tie you to a single build tool forever).

# Atomic recursive deletes? It sort of happens today in every real FS, as the top-level inode goes away. I don't know how that spans filesystems -can I actually do an rm -rf above a mounted FS in Unix? That said: giving no guarantees about atomicity is one thing -it gives us flexibility in future- but as all normal filesystems appear to provide this, code will tend to assume it anyway. I think we should do it, but call out blobstores for breaking some of these rules.
# Atomic rename where the parent dir stays the same does seem a good compromise on atomicity; it means that more distributed filesystems don't have to provide atomic moves between directories. In fact, we could say there are no guarantees that rename() across filesystems works at all, and then add an explicit exception, {{RenameAcrossFileSystemsUnsupported}}, for this. I'm confident that you can't rename file:///c:/something.txt to file:///d:/something.txt on Windows.
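To make the "no guarantees that rename() across filesystems works" idea concrete, here is a minimal sketch of a guard that could run before a rename. Everything in it is an assumption for illustration: {{RenameAcrossFileSystemsUnsupported}} is only a name proposed in the comment above, not an existing Hadoop class, and treating "same scheme and authority" as "same filesystem" is a simplification of my own.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

/** Hypothetical guard for rename(); none of these names exist in Hadoop. */
final class CrossFileSystemRenameGuard {

    /** Exception name taken from the comment above; the class itself is an assumption. */
    static final class RenameAcrossFileSystemsUnsupported extends IOException {
        RenameAcrossFileSystemsUnsupported(URI src, URI dst) {
            super("cannot rename " + src + " to " + dst + ": different filesystems");
        }
    }

    /** Assumption: two URIs name the same filesystem when scheme and authority match. */
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(lower(a.getScheme()), lower(b.getScheme()))
            && Objects.equals(lower(a.getAuthority()), lower(b.getAuthority()));
    }

    /** Case-insensitive comparison is a simplification; user-info can be case-sensitive. */
    private static String lower(String part) {
        return part == null ? null : part.toLowerCase(Locale.ROOT);
    }

    /** Fails fast instead of attempting a rename that spans filesystems. */
    static void checkRename(URI src, URI dst) throws RenameAcrossFileSystemsUnsupported {
        if (!sameFileSystem(src, dst)) {
            throw new RenameAcrossFileSystemsUnsupported(src, dst);
        }
    }
}
```

A scheme/authority comparison like this would probably not catch the Windows drive-letter case mentioned above, since both {{file:///c:/...}} and {{file:///d:/...}} share a scheme and have no authority; a local filesystem would need its own volume check.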
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666838#comment-13666838 ]

Konstantin Shvachko commented on HADOOP-9371:
---------------------------------------------

Steve, you might want to link the document from github to the jira. "Add Link" has an option to add a web link.

Not requiring atomicity for mkdirs() and recursive deletes makes sense to me.

For renames, I think we should also restrict atomicity to one special case: when only the file or directory name changes, that is, the file is not moving from one directory to another. I call it in-place rename, which with inode numbers in place is a trivial operation.

Atomic moves are hard if you build a distributed namespace service (like Giraffa). Moving a file between directories that are located on different nodes requires distributed coordination, which can be complex.
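The "in-place rename" case described above can be stated as a small predicate. This is only a sketch under my own assumptions: paths are normalized, absolute, '/'-separated strings with no trailing slash, and the class and method names are hypothetical.

```java
/** Hypothetical helper; assumes normalized absolute paths such as "/a/b". */
final class InPlaceRename {

    /** Parent of a path; the root and top-level entries both report "/". */
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    /** True when only the final name changes and the parent directory stays the same. */
    static boolean isInPlaceRename(String src, String dst) {
        return !src.equals(dst) && parentOf(src).equals(parentOf(dst));
    }
}
```

Under the proposal, only renames for which {{isInPlaceRename}} is true would carry an atomicity requirement; a move such as {{/user/alice/f}} to {{/user/bob/f}} would not.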
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636552#comment-13636552 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

Also note that Apple's HFS didn't offer atomic renames until recently: [http://www.weirdnet.nl/apple/rename.html]
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635220#comment-13635220 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

We also need to specify {{Seekable}}, as the {{FSDataInputStream}} which must be returned from {{open()}} calls implements it, and the specifics of {{seek(long pos)}} are not completely defined, consistently implemented, or explicitly tested.

* Some implementation classes validate the range of a seek in the call; validation can also be postponed until the next {{read()}} (which is how Posix expects it).
* Not everything rejects negative seek offsets.
* While {{EOFException}} would be the appropriate exception to raise on going past the end of the file, it is rarely to be seen in the source.

Delayed seeks can deliver tangible performance benefits, and it would be unwise to demand stricter validation than {{::lseek()}} or {{::SetFilePointerEx()}}. We ought to say "you can if you want", and write tests that verify that either the seek fails or the read straight afterwards fails.

== Seekable ==

* When a file is opened, {{getPos()}} MUST equal 0.
* Implementations MAY choose not to implement {{seek()}}, and instead MAY throw an {{IOException}}.
* A {{seek(L)}} on a closed input stream MUST fail with an {{IOException}}.
* After a successful {{seek(L)}}, {{getPos()==L}} for all L: {{0 <= L < length(file)}}.
* On a {{seek(L)}} with {{L<0}}, an exception MUST be thrown. It SHOULD be an {{IOException}}. It MAY be an {{IllegalArgumentException}} or other {{RuntimeException}}.
* On a {{seek(L)}} with {{L>length(file)}}, an {{IOException}} MAY be thrown. It SHOULD be an {{EOFException}}.
* If an {{IOException}} is not thrown, then an {{IOException}} MUST be thrown on the next {{read()}} operation. It SHOULD be an {{EOFException}}.

This is actually a relaxation of the {{Seekable.seek()}} definition, which states "Can't seek past the end of the file." The {{RawLocalFileSystem}}, on which everything ultimately depends, does support seeking past the end of the file; it is only on the read operation that an exception is raised.

* After a {{seek(L)}} with {{L<length(file)}}, {{read()}} returns the byte at position L in the file.
* After a {{seek(L)}} with {{L==length(file)}}, {{read()}} returns -1.
* After a {{seek(L)}} with {{L==length(file)}}, {{read(byte[1],0,1)}} returns -1.

Tests to verify offset validation

Open a file of length {{file_len > 0}}:
# verify {{getPos()==0}}
# {{seek(file_len)}}, verify {{getPos()==file_len}}. If an exception is not raised, {{read()}} and expect an {{IOException}}.
# {{seek(file_len+1)}}, expect an {{EOFException}}. If an exception is not raised, {{read()}} and expect the exception then.
# {{seek(-1)}}, expect an {{IOException}} immediately.

Open a file of length {{file_len == 0}}:
# verify {{getPos()==0}}
# verify that {{seek(0)}} succeeds.
# verify that {{read()}} returns -1.

Tests to verify that {{seek()}} actually changes the location for future reads

* Verify that after a {{seek()}}, {{read()}} returns the data at the seek location. This must work for forward and backwards seeks.
* Verify that after a {{seek()}}, a {{read(byte[])}} returns the bytes of data at the seek location. This must work for forward and backwards seeks.

Repeat for very large offsets (e.g. a 128KB file), to ensure that filesystems with local caches/buffers handle longer range seeks correctly.
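As a rough illustration of the "delayed validation" variant of these rules, here is a self-contained in-memory stream. It is a sketch only: {{InMemorySeekableStream}} is a hypothetical class, not part of Hadoop and not an implementation of the real {{Seekable}} interface, and it picks one of the permitted options (accept a seek past the end, fail on the following read).

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stream that defers past-EOF validation to read(). */
final class InMemorySeekableStream {
    private final byte[] data;
    private long pos;      // starts at 0, matching "getPos() MUST equal 0" on open
    private boolean closed;

    InMemorySeekableStream(byte[] data) {
        this.data = data.clone();
    }

    long getPos() {
        return pos;
    }

    void seek(long target) throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        if (target < 0) {
            // Negative offsets are rejected immediately rather than ignored.
            throw new IOException("negative seek offset: " + target);
        }
        pos = target; // offsets beyond the end are accepted here and checked in read()
    }

    /** Returns the next byte as 0..255, or -1 exactly at end of file. */
    int read() throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        if (pos > data.length) {
            throw new EOFException("position " + pos + " is past end of file");
        }
        if (pos == data.length) {
            return -1;
        }
        return data[(int) pos++] & 0xff;
    }

    void close() {
        closed = true;
    }
}
```

An implementation that validates eagerly would instead throw the {{EOFException}} from {{seek()}} itself; a contract test written as described above has to accept either behaviour.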
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635267#comment-13635267 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

Note that {{BufferedFSInputStream}} doesn't meet this spec, as it treats a negative seek as a no-op:

{code}
public void seek(long pos) throws IOException {
  if( pos<0 ) {
    return;
  }
{code}
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628045#comment-13628045 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

bradley -thanks for your research. I wonder if we should just say, in the concurrency section:

* Multiple writers MAY open a file for writing. If this occurs, the outcome is undefined.

I guess we have to make sure that {{Syncable}} is defined here too.
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625721#comment-13625721 ]

bradley childs commented on HADOOP-9371:
----------------------------------------

Great work here guys. I've been researching the semantics around write locking and have a couple of comments. First, around this line regarding write atomicity:

"Only one writer can write to a file (ISSUE: does anything in MR/HBase use this for locks?)"

which implies fully atomic write transactions. If this line is a MUST (slightly unclear), then the file lock/release would have to be explicit around create(), append(), and open(). Any writer would have to go through a lock/release state for the file during the output stream instantiation (not desirable).

If you look at the create/open/append methods of HDFS' DistributedFileSystem.java (linked below), an FSDataOutputStream is returned with no locking or lifecycle. Further investigation shows no explicit locking inside the FSDataOutputStream class. Instead, FSDataOutputStream implements the o.a.h.fs.Syncable interface, which provides a sync() method. Per the interface, a call to the sync method "Synchronize[s] all buffer with the underlying devices."

To me this says that there are no exclusive writers. Instead, a writer's file consistency is only guaranteed the instant the sync(...) method is called on the underlying OutputStream, after which it only MAY be consistent until the sync(...) method is called again.

Summary: I believe "Only one writer can write to a file (ISSUE: does anything in MR/HBase use this for locks?)" should be changed to something like "A file may have multiple writers, with each writer's only guarantee on consistency being during a sync(...) call."

Ref:
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/FSDataOutputStream.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/Syncable.java
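The proposed wording ("multiple writers, consistency only at sync") can be pictured with a toy model. This is not how HDFS or {{FSDataOutputStream}} works internally; {{SharedFile}} and {{BufferingWriter}} are hypothetical names, and the model only shows that data buffered by a writer is invisible to readers until that writer calls sync(), so the visible order depends on who syncs first.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "visible only after sync()"; not a description of HDFS internals. */
final class SyncVisibilityModel {

    /** Stand-in for a file: the records that readers can currently see. */
    static final class SharedFile {
        private final List<String> visible = new ArrayList<>();

        synchronized void append(List<String> records) {
            visible.addAll(records);
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(visible);
        }
    }

    /** A writer that holds records locally until sync() pushes them to the file. */
    static final class BufferingWriter {
        private final SharedFile file;
        private final List<String> buffer = new ArrayList<>();

        BufferingWriter(SharedFile file) {
            this.file = file;
        }

        void write(String record) {
            buffer.add(record); // not visible to readers yet
        }

        void sync() {
            file.append(buffer); // everything buffered so far becomes visible together
            buffer.clear();
        }
    }
}
```

With two writers, a record written first but synced later ends up after the other writer's data, which is the kind of outcome the spec would have to call either defined or undefined.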
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613654#comment-13613654 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

I've just published a copy of my branch of hadoop-trunk with this patch to github. This has auto rendering of the [MD file|https://github.com/steveloughran/hadoop-trunk/blob/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem-contract.md].

I've merged in Mike's and Matt's comments already.

Matt: that {{mkdirs()}} point is significant. Have you found code that expects atomic directory creation? If so, we'd better fix it.

(This makes me think of something else: a front-end client to {{DFSClient}} that downconverts some of the ops to non-atomic. In the case of mkdirs, simply doing the mkdir chain client-side would suffice. I don't see an easy way to do the equivalent of {{mv}} without creating the dest dir and then moving the entries below the original.)
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606335#comment-13606335 ]

Matthew Farrellee commented on HADOOP-9371:
-------------------------------------------

[~ste...@apache.org] Does delete(path, true) need to be atomic? My research suggests that only the HDFS implementation is atomic.

(Note: current = r2.0.3-alpha on 2013-02-15 19:41)

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html

FilterFileSystem - delegates w/o locking
ChecksumFileSystem - delegates w/o locking
LocalFileSystem - inherits, delegates to RawLocalFileSystem
HarFileSystem - not implemented
FTPFileSystem - FTPClient.removeDirectory w/o locking
KosmosFileSystem - (not on trunk) no locking
NativeS3FileSystem - no locking (even createParent()s to avoid errors, weird)
RawLocalFileSystem - uses File.delete (if isFile) and FileUtil.fullyDelete w/o locking
S3FileSystem - no locking
ViewFileSystem - partial eval, no locking on top level - ChRootFileSystem uses RawLocalFileSystem

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/AbstractFileSystem.html

AbstractFileSystem (uses FileContext.delete)
FilterFs - delegates w/o locking
ChecksumFs - delegates w/o locking
LocalFs - inherits, delegates to RawLocalFs
DelegateToFileSystem - delegates w/o locking
RawLocalFs - inherits, delegates to RawLocalFileSystem
FtpFs - inherits, delegates to FTPFileSystem
ViewFs - partial eval, no locking at top level
FileContext.delete - no hint of atomic requirement, delegates to AbstractFileSystem

Side note - it's interesting to see how many FS implementations make their way back to RawLocalFileSystem, sometimes through 3+ layers of indirection.
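To show why a "no locking" recursive delete is not atomic, here is a small sketch using only the JDK's {{java.nio.file}} API (not Hadoop's {{FileUtil.fullyDelete}}). The class name is hypothetical. Each entry is removed by its own call, so a concurrent observer can see the tree half-deleted; the returned count is just the number of separate delete operations issued.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper: depth-first delete made of many independent operations. */
final class RecursiveDelete {

    /** Deletes root and everything below it; returns the number of delete calls made. */
    static int deleteRecursively(Path root) throws IOException {
        final int[] steps = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file); // one visible step per file
                steps[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir); // a directory goes only after its children are gone
                steps[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return steps[0];
    }
}
```

If any single step fails part-way through, the tree is left partially deleted, which is exactly the behaviour an "atomic delete" requirement would rule out.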
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602166#comment-13602166 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

[~mikelid] all good points. How about you submit a patch to the md file for the implicit assumptions, the copy-paste error and the root dir -that one being easy to test on all but localfs.

What happens to a read during a write or append is a tough one. HDFS silently serves up new data when the read crosses a block, which I'm not convinced is what anyone expects to have happen. We could rephrase consistency as "after any update operation has completed, read operations initiated afterwards see a consistent view of the latest data"? Even there, the ambiguity of what happens on read-during-write is something we should pull out, as it may be where user expectations != hdfs operation.
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601901#comment-13601901 ]

Mike Liddell commented on HADOOP-9371:
--------------------------------------

A few items for consideration:

Possible additions to 'implicit assumption':
- paths are represented as Unicode strings
- equality/comparison of paths is based on binary content. This implies case-sensitivity and no locale-specific comparison rules.

"The data added to a file during a write or append MAY be visible while the write operation is in progress."
- Allowing read(s) during write seems to break the subsequent rule that readers always see consistent data.

"Deleting the root path, /, MUST fail iff recursive==false."
- If the root path is empty, it seems reasonable for delete(/,false) to succeed but to have no effect.

"After a file is created, all ls operations on the file and parent directory MUST not find the file"
- copy-paste error: this should read "after a file is deleted ..."

"Security: if a caller has the rights to list a directory, it has the rights to list directories all the way up the tree."
- This point raises lots of interesting questions and requirements for individual methods. A section on security assumptions/rules would be great.
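The "binary content" rule for path equality suggested above could look like the sketch below. The UTF-8 encoding and the {{PathEquality}} name are my assumptions; the comment only says Unicode strings compared by binary content. The point is that no case folding and no Unicode normalization is applied.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: compares paths byte-for-byte, assuming UTF-8 encoding. */
final class PathEquality {

    /** No case folding, no locale rules, no Unicode normalization. */
    static boolean samePath(String a, String b) {
        return Arrays.equals(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8));
    }
}
```

Under this rule {{/Data/a}} and {{/data/a}} are different paths, and so are two spellings of an accented name that differ only in composed versus decomposed form, which a case-insensitive or normalizing local filesystem would not agree with.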
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599860#comment-13599860 ]

Steve Loughran commented on HADOOP-9371:
----------------------------------------

[~farrellee] -I think I just pulled that "{{mkdirs()}} is atomic" fact from HDFS, knowing that it's something blobstores dramatically break ({{mkdirs()}} taking the time for a chain of PUT operations from the potentially remote caller).

You are right, though: there's no guarantee that it has to be atomic, and a quick look at the Posix docs implies that while {{mkdir()}} is required to be (it's one of the API calls that must be atomic), {{mkdirs()}} can be done client side. When you start to consider cross-volume and NFS mounts, it would have to be non-atomic.

I'll change that, and we'd better hope that nobody relies on mkdirs being atomic. I wonder if there is a way to check this other than turning it off and seeing what breaks?
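A client-side mkdirs() of the kind mentioned above can be sketched with the JDK's {{java.nio.file}} API. This is an illustration only, not Hadoop's implementation; {{ClientSideMkdirs}} is a hypothetical name. Each level is created by a separate call, so the operation as a whole is not atomic, but repeating it is harmless.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: builds a directory chain one level at a time. */
final class ClientSideMkdirs {

    /** Creates dir and any missing ancestors; returns how many directories were created. */
    static int mkdirs(Path dir) throws IOException {
        Path absolute = dir.toAbsolutePath().normalize();
        Path current = absolute.getRoot();
        int created = 0;
        for (Path name : absolute) {
            current = current.resolve(name);
            if (Files.isDirectory(current)) {
                continue; // already present, which is what makes a repeat call a no-op
            }
            try {
                Files.createDirectory(current); // one independent operation per level
                created++;
            } catch (FileAlreadyExistsException e) {
                // Lost a race with another caller; acceptable if a directory now exists.
                if (!Files.isDirectory(current)) {
                    throw e;
                }
            }
        }
        return created;
    }
}
```

Another client can observe {{a}} and {{a/b}} before {{a/b/c}} exists, and a failure half-way leaves the partial chain behind; code that relies on all-or-nothing directory creation would break against this variant.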
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599411#comment-13599411 ]

Matthew Farrellee commented on HADOOP-9371:
-------------------------------------------

Page 2, Concurrency: you mention mkdir/mkdirs is atomic.

It seems reasonable that mkdir is atomic.

I've been researching mkdirs(), with a focus on idempotence and atomicity. ClientProtocol.java:mkdirs() clearly labels it as @Idempotent, and the documentation and various implementations support that claim. It's also a property that is relatively straightforward to implement on many back-end filesystems.

I'm having more difficulty tracking down the atomicity of mkdirs(). The LocalFS implementations are not themselves atomic. I tracked the HDFS implementation back to FSNamesystem.java:mkdirsInt(), which appears to provide an atomic implementation. However, the atomic nature of mkdirsInt() appears to come from HDFS-988, which looks to fix a bug by making mkdirs() atomic, rather than having an explicit purpose of making mkdirs() atomic by design.

How are you getting to mkdirs() as atomic?

A mild concern of mine is that even if mkdirs() isn't atomic by design, for HDFS it has been implemented as atomic, and who knows who may silently be relying on the not-by-design atomic property. That said, given mkdirs() is idempotent, it isn't suitable for use as a locking mechanism.
[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously
[ https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599651#comment-13599651 ]

Arun C Murthy commented on HADOOP-9371:
---------------------------------------

+1 for this effort - thanks for taking this on Steve!