[ https://issues.apache.org/jira/browse/HDFS-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yongjun Zhang reassigned HDFS-12202:
------------------------------------

    Assignee: Yongjun Zhang

> Provide new set of FileSystem API to bypass external attribute provider
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12202
>                 URL: https://issues.apache.org/jira/browse/HDFS-12202
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs, hdfs-client
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> The HDFS client uses the following FileSystem methods to get the FileStatus
> of files:
> {code}
>   /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus getFileStatus(Path f) throws IOException;
>
>   /**
>    * List the statuses of the files/directories in the given path if the
>    * path is a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
>    * @param f given path
>    * @return the statuses of the files/directories in the given path
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus[] listStatus(Path f)
>       throws FileNotFoundException, IOException;
> {code}
> When an external attribute provider (INodeAttributeProvider) is enabled for a
> cluster, the provider is consulted for some of the relevant attributes
> (including ACL, group, etc.), and those values are returned in FileStatus.
> This causes a problem: when we use distcp to copy files from srcCluster to
> tgtCluster and srcCluster has an external attribute provider enabled, the
> copied data contains attributes from the provider, which we may not want.
> This jira is created to add a new set of interfaces for distcp to use, so
> that distcp can copy HDFS data only and bypass external attribute provider
> data.
> The new set of APIs would look like
> {code}
>   /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @param bypassExtAttrProvider if true, bypass external attr provider
>    *        when it's in use.
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public FileStatus getFileStatus(Path f,
>       final boolean bypassExtAttrProvider) throws IOException;
>
>   /**
>    * List the statuses of the files/directories in the given path if the
>    * path is a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
>    * @param f given path
>    * @param bypassExtAttrProvider if true, bypass external attr provider
>    *        when it's in use.
>    * @return the statuses of the files/directories in the given path
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public FileStatus[] listStatus(Path f,
>       final boolean bypassExtAttrProvider)
>       throws FileNotFoundException, IOException;
> {code}
> So when bypassExtAttrProvider is true, the external attribute provider will
> be bypassed.
> Thanks.
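
The effect of the proposed flag can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not Hadoop code: `Status`, `AttrProvider`, `ToyNamespace`, and `demoNamespace` are hypothetical stand-ins for FileStatus, INodeAttributeProvider, and the namespace, and the overlay logic is invented purely to show how a `bypassExtAttrProvider` argument could select between stored and provider-supplied attributes.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/** Toy model only; none of these types are Hadoop classes. */
class BypassSketch {

  /** Hypothetical stand-in for the attributes carried by FileStatus. */
  static final class Status {
    final String path;
    final String owner;
    final String group;

    Status(String path, String owner, String group) {
      this.path = path;
      this.owner = owner;
      this.group = group;
    }
  }

  /** Hypothetical stand-in for an external attribute provider. */
  interface AttrProvider {
    Status overlay(Status stored);
  }

  /** Hypothetical namespace that stores attributes and may consult a provider. */
  static final class ToyNamespace {
    private final Map<String, Status> stored = new HashMap<>();
    private final AttrProvider provider; // null means no provider is configured

    ToyNamespace(AttrProvider provider) {
      this.provider = provider;
    }

    void put(Status s) {
      stored.put(s.path, s);
    }

    /** Existing behaviour: the provider is consulted whenever it is configured. */
    Status getFileStatus(String path) throws FileNotFoundException {
      return getFileStatus(path, false);
    }

    /** Sketch of the proposed overload: skip the provider when the flag is true. */
    Status getFileStatus(String path, boolean bypassExtAttrProvider)
        throws FileNotFoundException {
      Status s = stored.get(path);
      if (s == null) {
        throw new FileNotFoundException(path);
      }
      if (provider == null || bypassExtAttrProvider) {
        return s; // attributes exactly as stored in the namespace
      }
      return provider.overlay(s); // attributes as rewritten by the provider
    }
  }

  /** Builds a namespace whose (invented) provider rewrites owner and group. */
  static ToyNamespace demoNamespace() {
    ToyNamespace ns = new ToyNamespace(
        s -> new Status(s.path, "provider-owner", "provider-group"));
    ns.put(new Status("/data/a", "hdfs", "supergroup"));
    return ns;
  }

  public static void main(String[] args) throws Exception {
    ToyNamespace ns = demoNamespace();
    System.out.println(ns.getFileStatus("/data/a").group);       // provider view
    System.out.println(ns.getFileStatus("/data/a", true).group); // stored view
  }
}
```

In this toy model the one-argument call returns the provider's group, while the call with `true` returns the group stored in the namespace, which is the view distcp would want when copying HDFS data only. Delegating the one-argument method to the overload with `false` keeps the existing behaviour unchanged for current callers.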