[ https://issues.apache.org/jira/browse/HADOOP-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187777#comment-13187777 ]
Robert Joseph Evans commented on HADOOP-7973:
---------------------------------------------

{quote}Is FsShell a publicly supported API now?{quote}

FsShell is marked as @InterfaceAudience.Private on trunk, so no, it is not a publicly supported API. However, it is used directly by Pig, and possibly others. The use that we are referring to is an Oozie action like the following.

{code}
<action name="copy">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
      </property>
    </configuration>
    <main-class>org.apache.hadoop.fs.FsShell</main-class>
    <arg>-cp</arg>
    <arg>${from}</arg>
    <arg>${to}</arg>
  </java>
</action>
{code}

This is more or less the same as calling {code}hadoop fs -cp $from $to{code} It is done this way because Oozie does not support copy from the fs action, since Oozie does not want significant amounts of data flowing to or from the node Oozie is running on.

Yes, this is technically a violation of our interface visibility guidelines, but only very slightly, because it is trying to act very much like {code}hadoop fs{code} which is a public interface. I am OK with telling the customer to fix their usage of this long term, because this is not what they are supposed to do. We have already told them this, but the practice is quite pervasive.

It worked before, it no longer works, and this is simply because our internal code, FsShell, is ignoring the guideline that we tell everyone else to follow: don't call FileSystem.close. Which kind of reminds me of that scene from "The Emperor's New Groove" ["Why do we even have that lever"|http://www.youtube.com/watch?v=AGdFiA0A_c0]. If this API is not supposed to be called, then why has it not been deprecated and replaced with something that has cleaner semantics that users actually understand?
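To make the hazard concrete, here is a toy model of a cached handle, along with one possible shape for a close with "cleaner semantics". This is only a sketch and not Hadoop code: ToyFileSystem, newInstance as written here, and closeIfUncached are hypothetical names invented for illustration. Unlike the real cache, the toy close() does not evict the instance from the cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a cached file-system handle; NOT Hadoop's real FileSystem. */
public class ToyFileSystem {
    // Shared cache keyed by URI string, loosely mimicking a get()-style cache.
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    private final boolean cached;
    private boolean open = true;

    private ToyFileSystem(boolean cached) {
        this.cached = cached;
    }

    /** Returns the shared instance for this URI, creating it on first use. */
    public static synchronized ToyFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new ToyFileSystem(true));
    }

    /** Returns a private, uncached instance owned solely by the caller. */
    public static ToyFileSystem newInstance(String uri) {
        return new ToyFileSystem(false);
    }

    public boolean isOpen() {
        return open;
    }

    /** Unconditional close: also breaks every other holder of a cached instance. */
    public void close() {
        open = false;
    }

    /** Hypothetical safer variant: closes only instances that are not shared. */
    public void closeIfUncached() {
        if (!cached) {
            open = false;
        }
    }

    public static void main(String[] args) {
        ToyFileSystem shell = ToyFileSystem.get("hdfs://nn");
        ToyFileSystem other = ToyFileSystem.get("hdfs://nn"); // same object as shell

        shell.closeIfUncached(); // no-op, because the instance is shared
        System.out.println("after closeIfUncached, other open: " + other.isOpen());

        shell.close(); // closes the shared instance out from under 'other'
        System.out.println("after close, other open: " + other.isOpen());
    }
}
```

In this toy, the first print reports the other handle as still open and the second reports it as closed, which is the same kind of breakage an embedding caller sees when FsShell closes a cached FileSystem.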
{quote}I've seen this bite users as well but its more so cause they do not understand how to use the FS objects than anything else:{quote}

That suggests to me that there is something wrong with the API, if people who use our main interface have to have a deep understanding of how FileSystem caching works and, what is more, that it can be disabled. I believe that we may want to leave FileSystem.close in place but deprecate it, and provide a method with the expected behavior: close the FileSystem if it is not part of the cache, and do nothing if it is part of the cache. At the same time, we update FsShell to use this new API.

I want to reiterate that I am not condoning the behavior that has exposed this issue. But we have customers that are doing this, and I would really like to unblock them, especially if I can unblock them with a tiny change on our part instead of a massive change on their part, and especially if doing so seems to fix an API that is causing problems.

> DistributedFileSystem close has severe consequences
> ---------------------------------------------------
>
>                 Key: HADOOP-7973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7973
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 1.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Blocker
>         Attachments: HADOOP-7973.patch
>
>
> The way {{FileSystem#close}} works is very problematic. Since the
> {{FileSystems}} are cached, any {{close}} by any caller will cause problems
> for every other reference to it. Will add more detail in the comments.