[ 
https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997143#comment-15997143
 ] 

Chris Douglas commented on HDFS-6984:
-------------------------------------

Thanks for taking a look, Andrew.

bq.  Looks like this latest patch has grown some new functionality.
That's not intended. It includes [~ste...@apache.org]'s request that the 
user-facing types be {{Serializable}}, and your request to move the flags out 
of {{FsPermissionExtension}}, deprecating Writable APIs in/around FileStatus.

Versions before v009 tried to avoid a set of flags for features in 
HdfsFileStatus and FileStatus, but ACLs/encryption rely on the 
FsPermissionExtension bits, not the PB record. I tried variants that used a 
placeholder, "remote" stub in the PB field that's sufficient to fill out the 
FileStatus API, but there are too many required fields in the PB messages to 
make that efficient or clean. ACLs, encryption, and EC are all handled slightly 
differently. Instead of adding to client impl complexity, v009 gives up and 
adds a set of flags.

By new functionality, are you referring to anything beyond making 
HdfsFileStatus a subtype of FileStatus?

bq. I'd like to keep setting the bits server-side. There's no guarantee whether 
2.9.0 or 3.0.0 will GA first, and if we have the option of avoiding 
incompatibility, I'd like to take it.
Agreed. I hope we eventually remove this, or at least stop adding new flags to 
FsPermission.

bq. Could you give a pointer to this? These bits are new to the JSON code, so 
it'd be good to catch any bugs quickly.
[Here|https://github.com/apache/hadoop/blob/81092b1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java#L373]
 is the relevant line. It's consistent with the documentation- that _is_ the 
FsPermission\[Extension\] for that path- but including the flag bits in the 
AclStatus permission field looks more like an accident of the implementation, 
not part of its intended API.

bq. I'm worried about making HdfsFileStatus extend FileStatus, particularly the 
link target. HdfsFileStatus is used on the NameNode, and there is potentially 
some mangling from turning the link target String into a Path. The target is 
supposed to be an opaque value within the filesystem, left to be interpreted by 
the client.
For this particular case, shouldn't the caller be using 
{{HdfsFileStatus::getSymlinkInBytes}} in contexts where it's opaque? I share 
your concern, but extending FileStatus also required touching code that didn't 
expect (spurious) IOExceptions. In those and other existing contexts, I didn't 
find any instances where the String wasn't converted to a Path. Converting the 
byte[] to a String may also mangle it somewhat.

To treat it as opaque for the patch, I'll change the serialization to use the 
{{getSymlinkInBytes}} conversion, rather than {{Path::toString}}. I'll also go 
through its uses, and try to find all the cases that don't immediately convert 
the String to a Path.

bq. How bad is it to split this change out?
I've split this twice already, to fix concerns over DistCp and for 
{{Serializable}}. We can change the scope of the JIRA, but (IMHO) these changes 
are related, and implied by previous review requests.

bq. Wasn't clear what we gain by having HdfsFileStatus extend FileStatus
The case [~sershe] raised for HDFS-7878 involved passing a handle across 
process boundaries. If a PathHandle is required to store sufficient state to 
open the file, then it needs to copy (and serialize) data from HdfsFileStatus 
that, in most cases, will not be used. We can construct this on-demand, if the 
HdfsFileStatus is kept intact. There's also the minor inefficiency of creating 
a new FileStatus instance and copying fields, but 1) that's often incurred 
anyway because the Located\* types are irreconcilable and 2) in the noise.

FileStatus is exposing attributes that (AFAIK) only apply to HDFS (e.g., EC, 
encryption); the types are 1:1 mappings right now. If HDFS wants to pass 
additional information, or encode it differently, then it can return a 
different data type (perhaps with this as a field). As it's used inside the NN, 
it lazily binds some fields, but I don't know of a use that's inconsistent with 
the POD type. I wish FileStatus were an interface and not a concrete class, but 
that ship has sailed.

If you're really uncomfortable with the change, then I can try to roll it back.

bq. HttpsFileSystem: Please do not use a wildcard import
Recently switched to Intellij; not intended. Will fix my settings.

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, 
> HDFS-6984.003.patch, HDFS-6984.004.patch, HDFS-6984.005.patch, 
> HDFS-6984.006.patch, HDFS-6984.007.patch, HDFS-6984.008.patch, 
> HDFS-6984.009.patch, HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this 
> to serialize it and send it over the wire.  But in Hadoop 2 and later, we 
> have the protobuf {{HdfsFileStatusProto}} which serves to serialize this 
> information.  The protobuf form is preferable, since it allows us to add new 
> fields in a backwards-compatible way.  Another issue is that already a lot of 
> subclasses of FileStatus don't override the Writable methods of the 
> superclass, breaking the interface contract that read(status.write) should be 
> equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so 
> that we don't have to deal with these issues.  It's probably too late to do 
> this in Hadoop 2, since user code may be relying on the existing FileStatus 
> serialization there.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to