[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-06-07 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678507#comment-13678507
 ] 

Suresh Srinivas commented on HADOOP-9371:
-

Steve,

What are you trying to do in this jira? I ask because some of the comments in 
this jira suggest changing the semantics.

Is your intent to document the semantics rigorously and add tests so that any 
other file system implementation can be tested (I do not know how you can test 
atomicity easily) and certified based on these tests? Or are you also planning 
to change the semantics?

As regards deciding the semantics where the documentation is either sparse 
or not clear, the semantics as implemented by HDFS are the gold standard, 
because that is what the majority of applications depend upon. I would 
discourage others from second-guessing what applications need, because we do 
not know all the applications that are out there.

 Define Semantics of FileSystem and FileContext more rigorously
 --

 Key: HADOOP-9371
 URL: https://issues.apache.org/jira/browse/HADOOP-9371
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 1.2.0, 3.0.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch, 
 HadoopFilesystemContract.pdf

   Original Estimate: 48h
  Remaining Estimate: 48h

 The semantics of {{FileSystem}} and {{FileContext}} are not completely 
 defined in terms of 
 # core expectations of a filesystem
 # consistency requirements.
 # concurrency requirements.
 # minimum scale limits
 Furthermore, methods are not defined strictly enough in terms of their 
 outcomes and failure modes.
 The requirements and method semantics should be defined more strictly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-05-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667065#comment-13667065
 ] 

Steve Loughran commented on HADOOP-9371:


Konstantin -good points

-I'm going to redo it as .apt so the linking isn't going to be useful (soon). 
MD may be well tooled, but as there isn't a consistent format for handling 
tables, it's not that much better than APT (though it does make it easier to 
use angle brackets in in-line code, and doesn't tie you to a single build tool 
forever).

# Atomic recursive deletes? It sort of happens today in every real FS as the 
top-level inode goes away. I don't know how that spans filesystems -can I 
actually do an rm -rf above a mounted FS in Unix?

That said: saying no guarantees about atomicity is one thing -it gives us 
flexibility in future - but as all normal filesystems appear to provide this, 
code will tend to assume it anyway. I think we should do it -but call out 
blobstores for breaking some of these rules.

# atomic rename where the parent dir stays the same does seem a good compromise 
on atomicity; it means that more distributed filesystems don't do it. 

In fact, we could say there are no guarantees that rename() across filesystems 
works at all, and then add an explicit exception, 
{{RenameAcrossFileSystemsUnsupported}}, for this. I'm confident that you can't 
rename file:///c:/something.txt to file:///d:/something.txt on Windows.
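
A minimal sketch of how such a check could look. The exception name comes from the proposal above; everything else ({{RenameCheck}}, treating "same scheme and authority" as "same filesystem") is an assumption for illustration, not existing Hadoop API.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Objects;

/** Hypothetical exception, named after the proposal above; not an existing Hadoop class. */
class RenameAcrossFileSystemsUnsupported extends IOException {
    RenameAcrossFileSystemsUnsupported(URI src, URI dst) {
        super("Cannot rename " + src + " to " + dst + ": different filesystems");
    }
}

/** Hypothetical helper class for the sketch. */
final class RenameCheck {
    private RenameCheck() {}

    /** Assumed definition: two URIs are on the same filesystem if scheme and authority match. */
    static boolean sameFileSystem(URI src, URI dst) {
        return Objects.equals(src.getScheme(), dst.getScheme())
            && Objects.equals(src.getAuthority(), dst.getAuthority());
    }

    /** Rejects a cross-filesystem rename before any work is attempted. */
    static void checkRename(URI src, URI dst) throws RenameAcrossFileSystemsUnsupported {
        if (!sameFileSystem(src, dst)) {
            throw new RenameAcrossFileSystemsUnsupported(src, dst);
        }
    }
}
```

Note that a scheme/authority comparison does not catch the Windows case above: both file:///c:/... and file:///d:/... have scheme "file" and no authority, so a real check would need filesystem-specific knowledge.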







[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-05-24 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13666838#comment-13666838
 ] 

Konstantin Shvachko commented on HADOOP-9371:
-

Steve, you might want to link the document from github to the jira. Add Link 
has an option to add a web link.

Not requiring atomicity for mkdirs() and recursive deletes makes sense to me.
For renames I think we should also restrict atomicity to one special case: when 
only the file or directory name changes, that is, the file is not moving from one 
directory to another. I call it in-place rename, which with inode numbers in 
place is a trivial operation. Atomic moves are hard if you build a distributed 
namespace service (like Giraffa). Moving a file between directories that are 
located on different nodes requires distributed coordination, which can be complex.
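
As a rough illustration of the distinction, an in-place rename can be recognised by comparing the parents of source and destination. This is a sketch using {{java.nio.file.Path}}; the class and method names are hypothetical and this is not how any Hadoop filesystem decides it.

```java
import java.nio.file.Path;
import java.util.Objects;

/** Hypothetical helper: classifies a rename by whether the parent directory changes. */
final class RenameKind {
    private RenameKind() {}

    /** True when source and destination share a parent, so only the final name changes. */
    static boolean isInPlaceRename(Path src, Path dst) {
        Path s = src.normalize();
        Path d = dst.normalize();
        // getParent() is null for a root or a bare name, hence Objects.equals
        return Objects.equals(s.getParent(), d.getParent());
    }
}
```

Under the proposal above, only renames for which this returns true would carry an atomicity guarantee; a move between directories would not.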



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-04-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13636552#comment-13636552
 ] 

Steve Loughran commented on HADOOP-9371:


Also note that Apple's HFS hasn't offered atomic renames until 
recently: [http://www.weirdnet.nl/apple/rename.html]



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-04-18 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635220#comment-13635220
 ] 

Steve Loughran commented on HADOOP-9371:


We also need to specify {{Seekable}}, as the {{FSDataInputStream}} which must 
be returned from {{open()}} calls implements it, and the specifics of 
{{seek(long pos)}} are not completely defined, consistently implemented, or 
explicitly tested.

* Some implementation classes validate the range of a seek in the call; it can 
also be postponed until the next read() (which is how Posix expects it).
* Not everything rejects negative seek offsets.
* While {{EOFException}} would be the appropriate exception to raise on going 
past the end of the file, it is rarely seen in the source.

Delayed seeks can deliver tangible performance benefits and it would be unwise 
to demand stricter validation than {{::lseek()}} or {{::SetFilePointerEx()}}. 
We ought to say you can validate in the seek if you want, and write tests that 
verify either that the seek fails, or that the read straight afterwards fails. 
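
For comparison, the JDK's own {{java.io.RandomAccessFile}} already behaves in this delayed way: a seek past the end of the file is accepted, and only a negative offset is rejected in the call. The sketch below shows that; the class name {{LazySeekDemo}} is made up, and note that the JDK reports end of file from {{read()}} with -1 rather than an exception, which is looser than what is proposed here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical demo class: shows that RandomAccessFile does not validate seeks against the length. */
final class LazySeekDemo {
    private LazySeekDemo() {}

    /** Writes data to a temporary file, seeks to pos, and returns the result of one read(). */
    static int seekThenRead(byte[] data, long pos) {
        try {
            File tmp = File.createTempFile("seek-demo", ".bin");
            try {
                Files.write(tmp.toPath(), data);
                try (RandomAccessFile in = new RandomAccessFile(tmp, "r")) {
                    in.seek(pos);      // throws IOException only for a negative offset
                    return in.read();  // -1 once the position is at or past the end
                }
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```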

== Seekable ==

* When a file is opened, {{getPos()}} MUST equal 0
* Implementations MAY choose not to implement {{seek()}}, and MAY instead throw an 
{{IOException}}
* A {{seek(L)}} on a closed input stream MUST fail with an {{IOException}}.
* After a successful {{seek(L)}}, {{getPos()==L}} for all L: {{0 <= L < 
length(file)}}
* On a {{seek(L)}} with {{L<0}} an exception MUST be thrown. It SHOULD be an 
{{IOException}}. It MAY be {{IllegalArgumentException}} or other {{RuntimeException}}
* On a {{seek(L)}} with {{L>length(file)}}, an {{IOException}} MAY be thrown. It 
SHOULD be an {{EOFException}}
* If an {{IOException}} is not thrown, then an {{IOException}} MUST be thrown 
on the next {{read()}} operation. It SHOULD be an {{EOFException}} 


This is actually a relaxation of the {{Seekable.seek()}} definition, which 
states "Can't seek past the end of the file." The {{RawLocalFileSystem}} on 
which everything ultimately depends does support seeking past the end of the 
file -it is only on the read operation where an exception is raised.

* After a {{seek(L)}} with {{L<length(file)}}, {{read()}} returns the byte at 
position L in the file.
* After a {{seek(L)}} with {{L==length(file)}}, {{read()}} returns -1
* After a {{seek(L)}} with {{L<length(file)}}, {{read(byte[1],0,1)}} returns 
the byte at position L in the file.
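
The rules above can be sketched with a small in-memory stream. This is only an illustration of one permitted behaviour (validation of offsets past the end is deferred to {{read()}}); {{InMemorySeekable}} and {{probe()}} are hypothetical names, not Hadoop classes.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stream following the proposed seek rules; not a Hadoop class. */
final class InMemorySeekable {
    private final byte[] data;
    private long pos;        // 0 when the stream is opened
    private boolean closed;

    InMemorySeekable(byte[] data) { this.data = data.clone(); }

    long getPos() { return pos; }

    void close() { closed = true; }

    void seek(long newPos) throws IOException {
        if (closed) throw new IOException("Stream is closed");
        if (newPos < 0) throw new IOException("Negative seek offset: " + newPos);
        pos = newPos;  // offsets past the end are accepted here and checked in read()
    }

    int read() throws IOException {
        if (closed) throw new IOException("Stream is closed");
        if (pos > data.length) throw new EOFException("Position " + pos + " is past the end");
        if (pos == data.length) return -1;   // exactly at the end of the file
        return data[(int) pos++] & 0xff;
    }

    /** Helper for experiments: describes the outcome of seek(pos) followed by one read(). */
    static String probe(byte[] data, long pos) {
        InMemorySeekable in = new InMemorySeekable(data);
        try {
            in.seek(pos);
            return "read=" + in.read();
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

An implementation that validates eagerly would instead throw the {{EOFException}} from {{seek()}}; both are allowed by the wording above.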

Tests to verify offset validation
# open a file of length {{file_len > 0}}, verify {{getPos()==0}}
# {{seek(file_len)}}, verify {{getPos()==file_len}}
 If an exception is not raised, read() and expect an {{IOException}} 
# {{seek(file_len+1)}}, expect an {{EOFException}}
 If an exception is not raised, read() and expect the exception then
# {{seek(-1)}}, expect an {{IOException}} immediately.

open a file of length {{file_len == 0}}
 # verify {{getPos()==0}}
 # verify that {{seek(0)}} succeeds.
 # verify that {{read()}} returns -1.

Test to verify {{seek()}} actually changes the location for future reads.
* verify that after a {{seek()}}, {{read()}} returns the data at the seek 
location. This must work for forward and backwards seeks.
* verify that after a {{seek()}}, a {{read(byte[])}} returns the bytes of data 
at the seek location. This must work for forward and backwards seeks.
Repeat for very large offsets (e.g. 128KB file), to ensure that filesystems 
with local caches/buffers handle longer range seeks correctly.




[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-04-18 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635267#comment-13635267
 ] 

Steve Loughran commented on HADOOP-9371:


note that {{BufferedFSInputStream}} doesn't meet this spec as it treats a 
negative seek as a no-op:
{code}
  public void seek(long pos) throws IOException {
    if( pos<0 ) {
      return;
    }
{code}



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-04-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628045#comment-13628045
 ] 

Steve Loughran commented on HADOOP-9371:


bradley -thanks for your research.

I wonder if we should just say, in the concurrency section:

* Multiple writers MAY open a file for writing. If this occurs, the outcome is 
undefined

I guess we have to make sure that Syncable is defined here too





[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-04-08 Thread bradley childs (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625721#comment-13625721
 ] 

bradley childs commented on HADOOP-9371:


Great work here guys.  I've been researching the semantics around write locking 
and have a couple of comments.  First, this line regarding write atomicity:

"Only one writer can write to a file (ISSUE: does anything in MR/HBase use this 
for locks?), which implies fully atomic write transactions."

If this line is a MUST (slightly unclear) then the file lock/release would have 
to be explicit around create(), append(), and open().  Any writer would have to 
go through a lock/release state for the file during the output stream 
instantiation (not desirable).  

If you look at the create/open/append methods of HDFS' DistributedFileSystem.java 
(linked below), an FSDataOutputStream is returned with no locking or 
lifecycle.  Further investigation shows no explicit locking inside the 
FSDataOutputStream class.

Instead, FSDataOutputStream implements the o.a.h.fs.Syncable interface, 
which provides a sync() method. Per the interface, a call to the sync method 
"Synchronize[s] all buffer with the underlying devices."

To me this says that there are no exclusive writers.  Instead, a writer's file 
consistency is only guaranteed the instant the sync(...) method is called on 
the underlying OutputStream, after which it only MAY be consistent until the 
sync(...) method is called again. 

Summary:  I believe "Only one writer can write to a file (ISSUE: does anything 
in MR/HBase use this for locks?)" should be changed to something like "A file 
may have multiple writers, with each writer's only guarantee on consistency 
being during a sync(...) call."
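
To make the proposed wording concrete, here is a toy model of it. It is an assumption-laden sketch, not HDFS behaviour: {{SyncModel}}, {{SharedFile}} and {{Writer}} are invented names, and the model simply treats data as invisible until its writer calls sync().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical model: writers hold no lock and publish their data only on sync(). */
final class SyncModel {
    private SyncModel() {}

    /** The bytes that readers can currently see. */
    static final class SharedFile {
        private final ByteArrayOutputStream visible = new ByteArrayOutputStream();
        synchronized void append(byte[] b) { visible.write(b, 0, b.length); }
        synchronized String contents() {
            return new String(visible.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    /** A writer that buffers locally; nothing reaches the shared file before sync(). */
    static final class Writer {
        private final SharedFile file;
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        Writer(SharedFile file) { this.file = file; }
        void write(String s) {
            byte[] b = s.getBytes(StandardCharsets.US_ASCII);
            pending.write(b, 0, b.length);
        }
        void sync() {
            file.append(pending.toByteArray());
            pending.reset();
        }
    }

    /** Two writers on one file: the visible result depends on the order of the sync() calls. */
    static String demo() {
        SharedFile f = new SharedFile();
        Writer a = new Writer(f);
        Writer b = new Writer(f);
        a.write("A1");
        b.write("B1");
        String before = f.contents();  // empty: nothing has been synced yet
        b.sync();
        a.sync();
        return before + "|" + f.contents();
    }
}
```

In this model the file content after both syncs is determined by sync order, not write order, which is the kind of outcome a "multiple writers" clause would have to either define or declare undefined.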

Ref:
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/hdfs/org/apache/hadoop/hdfs/DistributedFileSystem.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/FSDataOutputStream.java
https://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/src/core/org/apache/hadoop/fs/Syncable.java




[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13613654#comment-13613654
 ] 

Steve Loughran commented on HADOOP-9371:


I've just published a copy of my branch of hadoop-trunk with this patch to 
GitHub.

This has auto rendering of the [MD 
file|https://github.com/steveloughran/hadoop-trunk/blob/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem-contract.md]

I've merged in Mike's and Matt's comments already. 

Matt: that {{mkdirs()}} point is significant. Have you found code that expects 
atomic directory creation? If so, we'd better fix it.

(this makes me think of something else: a front end client to {{DFSClient}} 
that downconverts some of the ops to non-atomic. In the case of mkdirs, simply 
doing the mkdir chain client-side would suffice. I don't see an easy way to do 
the equivalent of {{mv}} without creating the dest dir then moving the entries 
below the original.)



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-19 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13606335#comment-13606335
 ] 

Matthew Farrellee commented on HADOOP-9371:
---

[~ste...@apache.org] Does delete(path, true) need to be atomic?

My research suggests that only the HDFS implementation is atomic.

(Note: current = r2.0.3-alpha on 2013-02-15 19:41)

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html

  FilterFileSystem - delegates w/o locking
    ChecksumFileSystem - delegates w/o locking
      LocalFileSystem - inherits, delegates to RawLocalFileSystem
    HarFileSystem - not implemented
  FTPFileSystem - FTPClient.removeDirectory w/o locking
  KosmosFileSystem - (not on trunk) no locking
  NativeS3FileSystem - no locking (even createParent()s to avoid errors, weird)
  RawLocalFileSystem - uses File.delete (if isFile) and FileUtil.fullyDelete w/o locking
  S3FileSystem - no locking
  ViewFileSystem - partial eval, no locking on top level
    - ChRootFileSystem uses RawLocalFileSystem
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/AbstractFileSystem.html

  AbstractFileSystem (uses FileContext.delete)
    FilterFs - delegates w/o locking
      ChecksumFs - delegates w/o locking
        LocalFs - inherits, delegates to RawLocalFs
    DelegateToFileSystem - delegates w/o locking
      RawLocalFs - inherits, delegates to RawLocalFileSystem
      FtpFs - inherits, delegates to FTPFileSystem
    ViewFs - partial eval, no locking at top level
  FileContext.delete - no hint of atomic requirement, delegates to AbstractFileSystem

Side note - it's interesting to see how many FS implementations make their way 
back to RawLocalFileSystem, sometimes through 3+ layers of indirection.
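
To show why a delete built on per-entry calls with no locking is not atomic, here is a sketch of a recursive delete over {{java.io.File}}. It is written for illustration only and is not the code of {{FileUtil.fullyDelete}}; the class name and the demo tree are made up.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a recursive delete built from per-entry deletes, with no locking. */
final class NonAtomicDelete {
    private NonAtomicDelete() {}

    /** Deletes children depth-first, then f itself; returns the number of entries removed. */
    static int deleteRecursive(File f) {
        int removed = 0;
        File[] children = f.listFiles();  // null when f is not a directory
        if (children != null) {
            for (File child : children) {
                removed += deleteRecursive(child);
            }
        }
        // Until this final call, other clients can observe a partially deleted tree.
        if (f.delete()) {
            removed++;
        }
        return removed;
    }

    /** Usage example: builds root/sub/x.txt and root/y.txt in a temp dir, then deletes root. */
    static int demo() {
        try {
            Path root = Files.createTempDirectory("delete-demo");
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.write(sub.resolve("x.txt"), new byte[] {1});
            Files.write(root.resolve("y.txt"), new byte[] {2});
            return deleteRecursive(root.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each {{File.delete()}} is a separate operation, so a failure or a concurrent reader part-way through sees some entries gone and others still present.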



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13602166#comment-13602166
 ] 

Steve Loughran commented on HADOOP-9371:


[~mikelid] all good points.

How about you submit a patch to the md file for the implicit assumptions, the 
copy-paste and the root dir -that one being easy to test on all but localfs.

What happens to a read during a write or append is a tough one. HDFS 
silently serves up new data when the read crosses a block, which I'm not 
convinced is what anyone expects to have happen. 

We could rephrase consistency as "after any update operation has completed, read 
operations initiated afterwards see a consistent view of the latest data"?

Even there, the ambiguity of what happens on read-during-write is something we 
should pull out, as it may be where user expectations != hdfs operation

 



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-13 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13601901#comment-13601901
 ] 

Mike Liddell commented on HADOOP-9371:
--

A few items for consideration:

Possible additions to 'implicit assumption': 
 - paths are represented as Unicode strings
 - equality/comparison of paths is based on binary content. this implies 
case-sensitivity and no locale-specific comparison rules.

"The data added to a file during a write or append MAY be visible while 
the write operation is in progress."
- Allowing read(s) during write seems to break the subsequent rule that 
readers always see consistent data.

"Deleting the root path, /, MUST fail iff recursive==false."
- If the root path is empty, it seems reasonable for delete("/",false) to 
succeed but to have no effect.

"After a file is created, all ls operations on the file and parent directory 
MUST not find the file"
- copy-paste error - should read "after a file is deleted ..."

"Security: if a caller has the rights to list a directory, it has the rights 
to list directories all the way up the tree."
- This point raises lots of interesting questions and requirements for 
individual methods.  A section on security assumptions/rules would be great.






[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13599860#comment-13599860
 ] 

Steve Loughran commented on HADOOP-9371:


[~farrellee] -I think I just pulled that "{{mkdirs()}} is atomic" fact from HDFS, 
knowing that it's something blobstores dramatically break ({{mkdirs()}} taking 
the time for a chain of PUT operations from the potentially remote caller).

You are right, though, there's no guarantee that it has to be atomic, and a 
quick look at the Posix docs implies that while {{mkdir()}} is required to be 
(it's one of the API calls that must be atomic), {{mkdirs()}} can be done 
client side. When you start to consider cross-volume and NFS mounts, it would 
have to be non-atomic.
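
A sketch of what "done client side" means: the chain is a loop of single-directory creates, each atomic on its own, with the whole sequence not atomic. It uses {{java.nio.file.Files.createDirectory}} on the local filesystem purely as an illustration; {{ClientSideMkdirs}} is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: mkdirs() as a client-side chain of single mkdir steps. */
final class ClientSideMkdirs {
    private ClientSideMkdirs() {}

    /** Creates dir and any missing ancestors, one level at a time; returns how many were created. */
    static int mkdirs(Path dir) {
        int created = 0;
        Path current = dir.getRoot();  // null for a relative path
        for (Path name : dir) {
            current = (current == null) ? name : current.resolve(name);
            if (Files.isDirectory(current)) {
                continue;  // already there: this is what makes the call idempotent
            }
            try {
                Files.createDirectory(current);  // each single step is atomic
                created++;
            } catch (FileAlreadyExistsException e) {
                // Lost a race with another client, or a file is in the way.
                if (!Files.isDirectory(current)) {
                    throw new UncheckedIOException(e);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return created;
    }

    /** Usage example: runs mkdirs twice on a/b/c under a fresh temporary directory. */
    static int[] demo() {
        try {
            Path target = Files.createTempDirectory("mkdirs-demo")
                .resolve("a").resolve("b").resolve("c");
            return new int[] { mkdirs(target), mkdirs(target) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller that observes the tree between two iterations sees a partially built path, which is the non-atomic behaviour discussed above.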

I'll change that, and we'd better hope that nobody relies on mkdirs being 
atomic. I wonder if there is a way to check this other than turning it off and 
seeing what breaks? 



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-11 Thread Matthew Farrellee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13599411#comment-13599411
 ] 

Matthew Farrellee commented on HADOOP-9371:
---

Page 2, Concurrency, you mention mkdir/mkdirs is atomic

It seems reasonable that mkdir is atomic.

I've been researching mkdirs(), with a focus on idempotence and atomicity.

ClientProtocol.java:mkdirs() clearly labels it as @Idempotent, and the 
documentation and various implementations support that claim. It's also a 
property that is relatively straight-forward to implement on many back-end 
filesystems.

I'm having more difficulty tracking down the atomicity of mkdirs(). The LocalFS 
implementations are not themselves atomic. I tracked the HDFS implementation 
back to FSNamesystem.java:mkdirsInt(), which appears to provide an atomic 
implementation. However, the atomic nature of mkdirsInt() appears to come from 
HDFS-988, which looks to fix a bug by making mkdirs() atomic rather than having 
an explicit purpose of making mkdirs() atomic by design.

How are you getting to mkdirs() as atomic?

A mild concern of mine is that even if mkdirs() isn't atomic by design, for 
HDFS it has been implemented as atomic and who knows who may silently be 
relying on the not-by-design atomic property. That said, given mkdirs() is 
idempotent it isn't suitable for use as a locking mechanism.



[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-11 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13599651#comment-13599651
 ] 

Arun C Murthy commented on HADOOP-9371:
---

+1 for this effort - thanks for taking this on Steve!
