[ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997143#comment-15997143 ]
Chris Douglas commented on HDFS-6984:
-------------------------------------

Thanks for taking a look, Andrew.

bq. Looks like this latest patch has grown some new functionality.

That's not intended. It includes [~ste...@apache.org]'s request that the user-facing types be {{Serializable}}, and your request to move the flags out of {{FsPermissionExtension}}, deprecating the Writable APIs in/around FileStatus. Versions before v009 tried to avoid a set of feature flags in HdfsFileStatus and FileStatus, but ACLs/encryption rely on the FsPermissionExtension bits, not the PB record. I tried variants that used a placeholder, "remote" stub in the PB field that is sufficient to fill out the FileStatus API, but there are too many required fields in the PB messages to make that efficient or clean. ACLs, encryption, and EC are all handled slightly differently. Instead of adding to client impl complexity, v009 gives up and adds a set of flags. By new functionality, are you referring to anything beyond making HdfsFileStatus a subtype of FileStatus?

bq. I'd like to keep setting the bits server-side. There's no guarantee whether 2.9.0 or 3.0.0 will GA first, and if we have the option of avoiding incompatibility, I'd like to take it.

Agreed. I hope we eventually remove this, or at least stop adding new flags to FsPermission.

bq. Could you give a pointer to this? These bits are new to the JSON code, so it'd be good to catch any bugs quickly.

[Here|https://github.com/apache/hadoop/blob/81092b1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java#L373] is the relevant line. It's consistent with the documentation (that _is_ the FsPermission[Extension] for that path), but including the flag bits in the AclStatus permission field looks more like an accident of the implementation than part of its intended API.

bq. I'm worried about making HdfsFileStatus extend FileStatus, particularly the link target. HdfsFileStatus is used on the NameNode, and there is potentially some mangling from turning the link target String into a Path.

The target is supposed to be an opaque value within the filesystem, left to be interpreted by the client. For this particular case, shouldn't the caller be using {{HdfsFileStatus::getSymlinkInBytes}} in contexts where it's opaque? I share your concern, but extending FileStatus also required touching code that didn't expect (spurious) IOExceptions. In those and other existing contexts, I didn't find any instances where the String wasn't converted to a Path. Converting the byte[] to a String may also mangle it somewhat. To treat it as opaque in this patch, I'll change the serialization to use the {{getSymlinkInBytes}} conversion rather than {{Path::toString}}. I'll also go through its uses and try to find all the cases that don't immediately convert the String to a Path.

bq. How bad is it to split this change out?

I've split this twice already, to address concerns over DistCp and for {{Serializable}}. We can change the scope of the JIRA, but (IMHO) these changes are related, and implied by previous review requests.

bq. Wasn't clear what we gain by having HdfsFileStatus extend FileStatus

The case [~sershe] raised for HDFS-7878 involved passing a handle across process boundaries. If a PathHandle is required to store sufficient state to open the file, then it needs to copy (and serialize) data from HdfsFileStatus that, in most cases, will not be used. We can construct this on demand if the HdfsFileStatus is kept intact; a rough sketch follows.
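To make the "construct on demand" point concrete: the sketch below builds the handle lazily from an intact HdfsFileStatus. {{InodePathHandle}} and the method name are placeholders for illustration only, not the HDFS-7878 API; the only HDFS-specific call assumed is {{HdfsFileStatus#getFileId}}.

{code:java}
import java.io.Serializable;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

public class LazyHandleSketch {
  /** Placeholder handle type: only the state needed to re-open the file. */
  static final class InodePathHandle implements Serializable {
    private final long fileId;
    InodePathHandle(long fileId) { this.fileId = fileId; }
    long getFileId() { return fileId; }
  }

  /**
   * If the HdfsFileStatus is kept intact (i.e., it is the FileStatus handed
   * back to the caller), the handle is built only when a caller actually
   * asks for one, instead of copying and serializing HDFS-specific state
   * into every status up front.
   */
  static InodePathHandle pathHandle(HdfsFileStatus stat) {
    return new InodePathHandle(stat.getFileId()); // HDFS-only state, read on demand
  }
}
{code}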
There's also the minor inefficiency of creating a new FileStatus instance and copying fields, but 1) that's often incurred anyway, because the Located* types are irreconcilable, and 2) it's in the noise. FileStatus is exposing attributes that (AFAIK) only apply to HDFS (e.g., EC, encryption); the types are 1:1 mappings right now. If HDFS wants to pass additional information, or encode it differently, then it can return a different data type (perhaps with this as a field). As it's used inside the NN, it lazily binds some fields, but I don't know of a use that's inconsistent with the POD type. I wish FileStatus were an interface and not a concrete class, but that ship has sailed. If you're really uncomfortable with the change, then I can try to roll it back.

bq. HttpsFileSystem: Please do not use a wildcard import

Recently switched to IntelliJ; not intended. Will fix my settings.


> In Hadoop 3, make FileStatus serialize itself via protobuf
> -----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, HDFS-6984.003.patch, HDFS-6984.004.patch, HDFS-6984.005.patch, HDFS-6984.006.patch, HDFS-6984.007.patch, HDFS-6984.008.patch, HDFS-6984.009.patch, HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier. Originally, we used this to serialize it and send it over the wire. But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}}, which serves to serialize this information. The protobuf form is preferable, since it allows us to add new fields in a backwards-compatible way. Another issue is that a lot of subclasses of FileStatus already don't override the Writable methods of the superclass, breaking the interface contract that read(status.write) should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so that we don't have to deal with these issues. It's probably too late to do this in Hadoop 2, since user code may be relying on the existing FileStatus serialization there.
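(For reference, a minimal sketch of the PB round trip the description argues for. It assumes {{PBHelperClient.convert}} overloads between {{HdfsFileStatus}} and {{HdfsFileStatusProto}}; the helper names and import paths may differ by branch, so treat them as assumptions rather than the committed API.)

{code:java}
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;
import org.apache.hadoop.hdfs.protocol.proto.HdfsProtos.HdfsFileStatusProto;
import org.apache.hadoop.hdfs.protocolPB.PBHelperClient;

public class PbRoundTripSketch {
  /** Serialize a status through its PB form and back, instead of Writable. */
  static HdfsFileStatus roundTrip(HdfsFileStatus status) throws Exception {
    HdfsFileStatusProto proto = PBHelperClient.convert(status); // assumed overload
    byte[] wire = proto.toByteArray();                          // generated protobuf API
    // Fields added later as optional are skipped or preserved safely here,
    // which the Writable read(status.write) contract could not guarantee
    // across versions and subclasses.
    HdfsFileStatusProto parsed = HdfsFileStatusProto.parseFrom(wire);
    return PBHelperClient.convert(parsed);                      // assumed overload
  }
}
{code}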