[ https://issues.apache.org/jira/browse/HADOOP-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721391#comment-13721391 ]
Colin Patrick McCabe commented on HADOOP-9758:
----------------------------------------------

bq. btw, I'm also going to move this to HADOOP rather than HDFS, since it's not just disabling DFS resolution.

OK.

bq. The provided test case also tests both FileContext and FileSystem. I could separate these if you wish.

Nah, I think they're fine. Thanks.

I don't think we should change {{FileContext}} to extend {{Configurable}}. The JavaDoc for FileContext clearly states that if you want a non-default configuration, you should call {{getFileContext}}.

{code}
 * Example 4: Use a specific config, ignoring $HADOOP_CONFIG
 *  Generally you should not need use a config unless you are doing
 *   <ul>
 *   <li> configX = someConfigSomeOnePassedToYou;
 *   <li> myFContext = getFileContext(configX); // configX is not changed,
 *                                              // is passed down
 *   <li> myFContext.create(path, ...);
 *   <li>...
 * </ul>
{code}

Let's cache the {{resolveSymlinks}} boolean, consistent with how we cache the other client configuration information in final variables in {{DFSClient#conf}}. Doing a lookup every time is slow and definitely not necessary.

{code}
    } catch (UnresolvedLinkException e) {
      Configuration conf = filesys.getConf();
      boolean resolveSymlinks = conf.getBoolean(
          CommonConfigurationKeys.FS_SYMLINKS_RESOLVE_KEY,
          CommonConfigurationKeys.FS_SYMLINKS_RESOLVE_DEFAULT);
      if (!resolveSymlinks) {
        throw new IOException("Path " + path + " contains a symlink"
            + " and symlink resolution is disabled ("
            + CommonConfigurationKeys.FS_SYMLINKS_RESOLVE_KEY + ").", e);
      }
{code}

It's dangerous to change the Configuration object inside a {{FileSystem}}, since you don't know how many other threads are sharing that same object (due to the caching). Users who want a {{FileSystem}} with a specific configuration can use {{FileSystem#newInstance(Configuration conf)}}.
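A minimal sketch of the caching suggestion above, written against {{java.util.Properties}} so it runs without Hadoop on the classpath. The class {{SymlinkPolicyClient}}, its methods, and the key string are hypothetical stand-ins for illustration only; they are not taken from the patch or from the Hadoop API. The point is just that the flag is read once at construction and kept in a final field, so later changes to the shared configuration object do not affect an existing client.

```java
import java.io.IOException;
import java.util.Properties;

/** Hypothetical stand-in for a client that snapshots its settings once. */
class SymlinkPolicyClient {
    // Hypothetical key and default, for illustration only.
    static final String RESOLVE_KEY = "example.fs.symlinks.resolve";
    static final boolean RESOLVE_DEFAULT = true;

    // Read once in the constructor; never looked up again.
    private final boolean resolveSymlinks;

    SymlinkPolicyClient(Properties conf) {
        this.resolveSymlinks = Boolean.parseBoolean(
            conf.getProperty(RESOLVE_KEY, String.valueOf(RESOLVE_DEFAULT)));
    }

    boolean resolvesSymlinks() {
        return resolveSymlinks;
    }

    /** Called at the point where the real code would catch an unresolved link. */
    void onUnresolvedLink(String path) throws IOException {
        if (!resolveSymlinks) {
            throw new IOException("Path " + path + " contains a symlink"
                + " and symlink resolution is disabled (" + RESOLVE_KEY + ").");
        }
        // Otherwise the caller would go on to resolve the link.
    }
}
```

With this shape, a caller who needs a different policy builds a new client from its own configuration instead of mutating one that other threads may share, which mirrors the {{FileSystem#newInstance(Configuration conf)}} advice.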
> Provide configuration option for FileSystem/FileContext symlink resolution
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-9758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9758
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-4968-1.patch, hdfs-4968-2.patch, hdfs-4968-3.patch
>
>
> With FileSystem symlink support incoming in HADOOP-8040, some clients will
> wish to not transparently resolve symlinks. This is somewhat similar to
> O_NOFOLLOW in open(2).
> The rationale is a security model where a user can invoke a third-party
> service running as a service user to operate on the user's data. For
> instance, users might want to use Hive to query data in their homedirs, where
> Hive runs as the Hive user and the data is readable by the Hive user. This
> leads to a security issue with symlinks:
> # User Mallory invokes Hive to process data files in {{/user/mallory/hive/}}
> # Hive checks permissions on the files in {{/user/mallory/hive/}} and allows
> the query to proceed.
> # RACE: Mallory replaces the files in {{/user/mallory/hive}} with symlinks
> that point to user Ann's Hive files in {{/user/ann/hive}}. These files aren't
> readable by Mallory, but she can create whatever symlinks she wants in her
> own scratch directory.
> # Hive's MR jobs happily resolve the symlinks and access Ann's private data.
> This is also potentially useful for clients using FileContext, so let's add
> it there too.