[ 
https://issues.apache.org/jira/browse/HADOOP-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751761#action_12751761
 ] 

Todd Lipcon commented on HADOOP-6223:
-------------------------------------

I looked through FileContext9.patch in detail last night. My notes are below. 
Obviously this is still a preliminary patch, but I wasn't sure I'd have time to 
look again for a few weeks, so figured I'd give a detailed review now - feel 
free to ignore notes about things that are preliminary bits and won't be in the 
final patch.

FileContext.java;
  nits:
    - apache license headers

    - typos:
      FileContext.java:22 operationS
      "imples" -> implies
      "imples a  default" (extra space)

    - check for more than 80 columns (eg FileContext.java:33, 39)

    - "file related" -> "file-related" (line 56)

    - "all file systems instance" -> "all file system instances"

    - overall "filesystem" seems better to me than "file system"

    - inconsistent use of "you" vs "one" becomes confusing to read

    - Extraneous bit of javadoc left right before FileContext class definition 
- looks like notes to self.

    - indentation on initFromConfig method is too deep

    - javadoc for getFSofPath is not filled in

    - extra space before "theConfig" on line 195

    - extra ';' on line 209

    - new member variables defined in the middle of the class (LOCAL_FS_URI and 
defaultPerm)

    - typo: "relavent" line 267

    - javadoc for setWorkingDirectory refers to "newWdir" but the param is 
called "p". The description should be under the @param p section.

    - indentation on line 513

    - Javadoc for setTimes: I'd rather see "since the epoch" instead of "since 
Jan 1, 1970".

  code:

    - "conf" is the common name rather than "theConfig"

    - rather than FileSystem.getInitialWorkingDirectory() returning null by 
default and having the check in FileContext, have it default to just passing 
through to getHomeDirectory() in FileSystem.java

    - is defaultFsURI.equals(defaultFS.getUri()) an invariant? If so, why are 
there two separate member variables? If not, when is that not the case?

    - could makeAbsolute be made public? It seems generally useful

    - style: "isNotSchemeWithRelative" should be called 
"checkNotSchemeWithRelative" - otherwise it looks like it should return a 
boolean

    - also, that function claims to throw IOException if it's of the wrong 
type, but in fact it throws IllegalArgumentException.

    - are these kinds of paths *ever* legal in Hadoop? If not, can this check 
go into the Path constructor such that we can never end up with an invalid 
object?

    - getFSofPath seems like a hackish implementation, but unfortunately that's 
the API that FileSystem gives. Could that function in FileSystem be refactored 
a bit so there is a more sensical way of checking ownership of a path without 
having to throw/catch an exception like this?

    - JavaDoc for FileContext() constructor - again this is the normal behavior 
for hadoop configuration. I think you can leave it at just "using a default 
Configuration". In fact, I am sort of in favor of removing this constructor 
entirely and forcing the user to explicitly choose to construct a new 
Configuration().

    - the static factory methods (getFileContext, getLocalFSFileContext) seem 
confusing in that they are not singleton instances. I can see this tripping 
people up - if you think they should be new instances every time it might be 
worth renaming the methods to "createFileContext" or "newFileContext".

    - I'm not sure I see the point of the factory methods that call 
setDefaultFileSystem for you - again the "get" name makes it seem like it would 
be the same instance every time if you call it multiple times on the same FS.

    - Why allow the user to pass either URI or FileSystem instances? There's 
less code if you just provide one, and the user can always go from one to the 
other. I'm in favor of fewer code paths where possible.

    - setWorkingDirectory - why not call makeAbsolute here?

    - CreatOpts should be CreateOptions. I think Ken Thompson once remarked 
that if he had to do anything differently in his past he would have spelled 
O_CREAT right :)

    - ReflactionFac -> ReplicationFactor

    - I'm not sold on the varargs/CreatOpts solution. What about having a class 
using boxed Integers, where "null" means "not set by the user". Then use the 
method chaining pattern so you can do new 
CreateOptions().setDefaultReplication(3).setBlockSize(64*1024*1024).etc.etc.etc.
 Given this structure we could refactor the general FileSystem interface to 
take that single parameter, rather than the hard-to-migrate 
lots-of-arguments-to-open route we've got now.

    - If left that way, at least change the sequence of "ifs" to "else if's, 
and add an else clause at the end with "throw new IllegalArgumentException(...)"

    - mkdirs: this javadoc claims that it returns false if the directory 
already exists, but in my experience that isn't true, at least with HDFS. I 
vaguely recall you opened another JIRA about that elsewhere, so maybe 
out-of-scope for this one.

    - mkdirs: either don't accept null as an argument, or document that null 
means defaults in the javadoc

    - style: abs_dir -> absDir

    - delete: another maybe out-of-scope thing: I believe delete with 
recursive=false will still fail even on an empty directory. Would be nice to 
clear that up before 0.21.

    - rename: would be nice to clarify these semantics as well. Again I think I 
saw another JIRA about this.

    - rename: rather than comparing URIs for equality, would it be possible to 
implement FileSystem.equals to default to using URIs, but allow other classes 
to override?

    - setOwner: some kind of assertion that (username != null || groupname != 
null)?

    - setVerifyChecksum(bool) - since this isn't Path-specific, it seems odd 
that it's present here. The pattern imho should be anything that takes a Path 
should be present in FileContext and forward through to the appropriate FS, but 
non-Path-specific stuff shouldn't be delegated.

    - the "Not implemented yet" method should just be removed, imo

    - getFSStatus might be better named getFsStatus to match the type. Also I 
think better to call the no-param getStatus() rather than getStatus(null), 
stylewise

    - getFSStatus() doesn't makeAbsolute the path

    - Why does the inner class have a redundant reference thisFC? Since it's 
not a static inner class this reference already exists implicitly. The purpose 
of this inner class also isn't clear to me.

    - Agree with the "TBD not very clear" on line 1263 - this code path is 
really hard to follow.

    - Line 1263: should use srcFS.equals(dstFS) here

overall:


  - I'm not 100% sold that your working directory can be on a different 
filesystem than the default filesystem. In particular, I think it's unexpected 
that "foo" and "/foo" might be on different filesystems. Could this be changed 
such that there's a concept of a "home directory" and "working directory", and 
you can always "cd home". Then absolute paths and relative paths would always 
be on the same filesystem as the working directory.

  - I'm unclear of the definition of "Server Side Defaults" from the JavaDoc - 
are they in fact server side? They're just configurations, right? In general 
the discussion of Configuration in FileContext seems somewhat extraneous - the 
behavior here is the same as it is everywhere else in Hadoop, until HDFS-578 is 
done. If HDFS-578 is a true pre-requisite to this, can you link the tickets?

  - the isNotSchemeWithRelative function is only called in a few places - it's 
unclear to me whether this is on purpose or not. Maybe this should just be part 
of makeAbsolute?

  - There's already some confusion/inclarity in the Hadoop docs about what 
"qualified" vs "absolute" means in terms of paths. If the JavaDocs here could 
help clarify that, I think that would be a big plus. (what's the difference 
between makeQualified and makeAbsolute?)

  - The "TBD"s should be changed to "TODOs" so that they flag on the Hudson 
test-patch. And, of course, they should be fixed before this gets committed.

  - A lot of the stuff in util is copied straight from FileSystem.java. This 
code duplication should be avoided. This also includes DEFAULT_FILTER, 
GlobFilter below



FsConfig.java:

nits:
  - typo: line 10: static method*s*
  - missing apache license
  - "see the note on the javdoc" -> jav*a*doc

code:
  - the "NOTE" section of the class javadoc doesn't make sense to me. What 
Files class?

  - What is SERVER_DEFAULT(-1) referring to? I see neither the constant nor a 
literal -1 anywhere. [edit: looks like this is part of HDFS-578, see above]

  - FS_HOME_DIR default isn't quite correct.. it would be /user/<foo> but I 
don't think you can do a ${...} style substitution from this context. So the 
calculation of the default would have to be done in code, or else just not 
provide a default at all and throw an exception if for some reason it can't be 
grabbed from the Configuration object.

  - Would be nice to have javadocs for each of the static getters

  - getImplClass: check that the URI has a scheme. If it has no scheme, throw 
an IllegalArgumentException? Add javadoc to that effect.

  - The class javadoc indicates that there can be instances of this class, but 
it is entirely made of static methods. Remove that section of the javadoc and 
make the class abstract and final.


FileSystem.java:
  - line 544 typo agains -> against
  - typo: absolutPermission -> absolutePermission (perhaps this is a new brand 
of Hadoop sponsored vodka! +1!)


FileContextBaseTest.java:

  - line 102: should duplicate the assertion from line 103 before setting the 
working directory here too - make sure we ended up at the right absolute path 
by means of the relative "cd"s

  - should test setWorkingDirectory on a nonexistent dir

  - also, all of the setWorkingDirectory calls are called with 
already-qualified paths. Should try some with non-qualified paths too.

  - test on line 117 confirms my earlier comment that mkdirs returns true if 
you call it on an already-existing directory


> New improved FileSystem interface for those implementing new files systems.
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-6223
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6223
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>
> The FileContext API (HADOOP-4952) provides an improved interface for the 
> application writer.
> This lets us simplify the FileSystem API since it will no longer need to deal 
> with notions of default filesystem [ / ],  wd, and config
> defaults for blocksize, replication factor etc. Further it will not need the 
> many overloaded methods for create() and open() since
> the FileContext API provides that convenience.
> The FileSystem API can be simplified and can now be restricted to those 
> implementing new file systems.
> This jira proposes that we create new file system API,  and deprecate 
> FileSystem API after a few releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to