This sounds reasonable, and is not like the case of AFS which uses special tokens in symbolic links that can expand to other things.
I'm a bit concerned about potential effects on applications. It *seems*
like this is done in a manner that is safe, but there are a few items:

* Are applications consistent in their use of pathconf/fpathconf to get
  filesystem limits?

* Presumably archivers and such are not expected to traverse these?
  (They get handled like an ordinary symbolic link.)

* What happens when the referral is archived and then re-extracted?
  (Is the attribute lost?)

* As a nit, it's not truly file system independent, since it relies on
  symbolic links (not all filesystems support symlinks, though
  admittedly the ones of interest to this case all do).

I believe that this case likely exceeds the obviousness test for a fast
track. I certainly wouldn't be comfortable having it go through with
only a single +1 from another member (your own +1 doesn't count, as I
understand the rules -- case owners don't count).

Given this, I'm going to derail the case, just to force enough members
to read it to get a meaningful vote. I'll write any resulting opinion.
I don't think we need any additional materials apart from answers to
the questions I've already raised.

Note that I don't think there is anything intrinsically wrong with the
case (though my archivers question above is, I think, a real potential
concern) -- the derail here should not be taken as a negative statement
about the case itself; I just want to make sure it is adequately and
properly reviewed.

Thanks.

    - Garrett

Glenn Skinner wrote:
> I'm sponsoring the following fast track for Afshin Salek and the CIFS
> i-team. It times out on Friday, July 17th.
>
> A copy of the specification below appears in the case directory under
> the name "specification".
>
> I've pre-reviewed it and will give it a +1 up front.
>
> -- Glenn
>
> ----------------
>
> Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
> Copyright 2007 Sun Microsystems
>
> 1. Introduction
>    1.1. Project/Component Working Name:
>         Support for Reparse Points
>
>    1.2. Name of Document Author/Supplier:
>         Author: Afshin Salek
>
>    1.3. Date of This Document:
>         07/08/09
>
>    1.4. Name of Major Document Customer(s)/Consumer(s):
>         PSARC
>         CIFS team
>
>    1.5. Email Aliases:
>         1.5.1. Responsible Manager: Barry.Greenberg at Sun.COM
>         1.5.2. Responsible Engineer: Afshin.Ardakani at Sun.COM
>         1.5.3. Marketing Manager:
>         1.5.4. Interest List: cifs-team at sun.com
>
> A patch binding is requested for this change.
>
> 4. Technical Description:
>    4.1. Details:
>
> INTRODUCTION
>
> There are situations where a mechanism is needed to reflect
> the concept that data is not present at a particular path, but
> can be found in some alternate location(s). Examples include
> "referrals" used to build unified name spaces in NFSv4.x and
> SMB, and data relocation in HSM systems. A "reparse point" is
> defined as the marker for a namespace redirection and a
> container for the metadata to specify where the target of this
> redirection is.
>
> Reparse points are intended to be a general mechanism for
> location redirection and as such the file system that contains
> them is not cognizant of the reparse point format or content.
> Services that use reparse points know how to interpret and use
> the stored data.
>
> REPARSE POINT OBJECT
>
> After a lot of discussion the consensus is that the best way
> to represent reparse points in the file system, in order to
> minimize the effect on existing applications and utilities, is
> to use symbolic links. One of the main goals in this context has
> been the ability to use existing utilities for backup/restore
> and also ZFS send/receive without having to modify them to
> know how to deal with reparse points.
>
> Some of what is envisioned here could be done with extensions
> to the Solaris automounter capability. Part of the
> motivation, though, is to create centrally-administered
> namespaces served by a group of fileservers to near-zero-admin
> clients. It is expected to be easier to keep the namespaces
> uniform if only a small number of servers need to participate.
> HSM solutions would also normally be tied closely to a storage
> server by this mechanism. Also, for both NFS and SMB
> referrals, it is the client that chooses the target and not
> the server. The server only provides the targets' information
> and it is up to the client to pick the desirable target to
> access the data.
>
> To distinguish a regular symlink from a reparse point, an
> extensible system attribute will be set on the symlink. This
> system attribute is only one bit which indicates whether or
> not a symlink contains reparse data.
>
> The reparse data will be stored as the link target. The
> reparse data is not in file system path format, which is the
> typical format of a link target. In order to avoid coming up
> with a totally new format for reparse data as the link target
> we decided to adopt the format used by magic links in BSD:
> (http://www.daemon-systems.org/man/symlink.7.html)
>
>     @{repa...@{service-type1:data} [...@{service-type2:data}]...}
>
> Where some examples of service-type are:
>
>     #define REPARSE_SVC_SMB "SMB"
>     #define REPARSE_SVC_NFS "NFS"
>     #define REPARSE_SVC_HSM "HSM"
>
> The data for each service will be in string format, which is
> expected to be typically a UUID string.
>
> The pattern above starts with "REPARSE" to distinguish it from
> other magic links, such as those supported by BSD. Note
> that this case is not a proposal to support BSD magic links;
> the intent is to avoid precluding the future addition of full
> BSD magic link support.
>
> Multiple service entries can co-exist within the symlink
> data. It is expected that normally, all entries would resolve
> to the same logical location, e.g. NFS and CIFS clients would
> find the same files.
>
> BASIC INTERFACES
>
> There is a need for both userspace and kernel APIs to work
> with reparse points.
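
To make the token format quoted above a little more concrete, here is a minimal sketch of how a consumer might pull one service's data out of a link target. It rests on assumptions that are not confirmed by the specification: the pattern is partly elided in this copy, so the sketch assumes the string begins with the literal "@{REPARSE" (based on the statement that the pattern starts with "REPARSE"), that each entry has the form @{service-type:data}, and that the data contains no '}' character. The name reparse_find() is hypothetical and is not an interface proposed by the case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed literal prefix; the spec's pattern is partly elided in this copy. */
#define	REPARSE_TAG	"@{REPARSE"

/*
 * Hypothetical helper: copy the data for service type `svc` out of the
 * reparse string `link` into `out`.  Returns 0 on success, or -1 if the
 * string lacks the assumed tag, the service is absent, an entry is
 * unterminated, or `out` is too small.
 */
int
reparse_find(const char *link, const char *svc, char *out, size_t outlen)
{
	size_t taglen = strlen(REPARSE_TAG);
	size_t svclen;
	const char *p;

	assert(link != NULL && svc != NULL && out != NULL);
	svclen = strlen(svc);

	/* A regular symlink target will not carry the tag. */
	if (strncmp(link, REPARSE_TAG, taglen) != 0)
		return (-1);

	p = link + taglen;
	while ((p = strstr(p, "@{")) != NULL) {
		const char *entry = p + 2;
		const char *end = strchr(entry, '}');

		if (end == NULL)
			return (-1);	/* unterminated entry */

		/* Match "svc:" exactly, then copy everything up to '}'. */
		if (strncmp(entry, svc, svclen) == 0 && entry[svclen] == ':') {
			const char *data = entry + svclen + 1;
			size_t datalen = (size_t)(end - data);

			if (datalen >= outlen)
				return (-1);
			memcpy(out, data, datalen);
			out[datalen] = '\0';
			return (0);
		}
		p = end + 1;
	}
	return (-1);
}
```

Under those assumptions, looking up "NFS" in a target holding both an SMB and an NFS entry returns only the NFS data, while an ordinary path such as "/usr/lib" is rejected because it lacks the tag.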
>
> Userspace API
>
> In userspace the symlink(2) system call will be used to set a
> reparse point. The readlink(2) system call will be used in
> turn to read the reparse data.
>
> Kernel API
>
> In the kernel, VOP_SYMLINK and VOP_READLINK will be used to
> set/get reparse data.
>
> These interfaces will support all replication, archive and
> copy operations to preserve reparse points without further
> changes.
>
> fop_symlink() needs to be modified to recognize the reparse
> @{REPARSE} tag and pass the appropriate attribute (i.e.
> reparse system attribute) to VOP_SYMLINK to be set on the
> symlink.
>
> IMPLEMENTATION OBSERVATIONS
>
> VFS feature registration can be used to determine whether or
> not a file system supports reparse points.
>
> Two things are needed to obtain the reparse point data in the
> kernel. First, the consumer needs to know that a reparse
> point has been encountered and, second, it needs the vnode
> pointer to the symlink. The proposal is to enhance VOP_LOOKUP
> to return the attributes of the looked up vnode. This way
> when the vnode is available the caller can check the
> attributes to determine if the returned vnode is a reparse
> point or a regular symlink. Here are the old and revised
> signatures of VOP_LOOKUP:
>
>     int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
>         pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
>         caller_context_t *ct, int *deflags, pathname_t *ppnp)
>
>     int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
>         pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
>         caller_context_t *ct, int *deflags, pathname_t *ppnp,
>         vattr_t *vap)
>
> A vattr_t pointer argument is added at the end to return the
> attributes if it is non-NULL. This is an optimization so that
> consumers don't have to invoke an extra VOP_GETATTR after
> lookup for obtaining the attributes.
>
> The symlink target size should be increased to 16K to
> accommodate the maximum size supported for MS-DFS referrals by
> Windows. Applications are expected to query the PATH_MAX and
> SYMLINK_MAX values on the local system using
> pathconf(2)/fpathconf(2). The value of SYMLINK_MAX would be
> changed to 16K on ZFS. The value of PATH_MAX will not be
> affected.
>
> To provide compatibility with other UNIXes (see section 6
> below), sharemgr(1M) would be enhanced to support a "refer"
> option for NFS exports. This option would only result in
> creation of a reparse point at the specified path and does not
> actually share the path over NFS.
>
> This case is only about the underlying infrastructure and a
> future case will be presented to deal with details and
> specifics of handling referrals for NFSv4 server.
>
> SECURITY CONSIDERATIONS
>
> Referrals are similar to regular symbolic links in that they
> are only pointers to data that could be discovered in some
> other way. The presence of such a pointer does not compromise
> the security of the target object or data; the target service
> or file system must still enforce security.
>
> OPERATION FLOW
>
> Once a kernel service encounters a reparse point, it reads the
> data using VOP_READLINK and passes the data up to a user space
> daemon (e.g. reparsed) along with its desired record type.
> Depending on the requested record type the daemon could simply
> extract the information from the passed data and return it to
> kernel or do any other processing necessary to obtain the
> actual referral information e.g. in the case of FedFS,
> contacting NSDB. Going through a common user space daemon to
> get the referral data makes this process generic and easily
> expandable for possible future use cases.
>
> Referral extraction and creation by a userspace daemon can be
> handled via a library plugin architecture for different
> service types.
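
For the userspace API and the pathconf(2) query described above, a rough sketch of what a well-behaved application might do is shown below. It uses only standard POSIX calls (pathconf, symlink, readlink). The helper names symlink_limit(), set_reparse_data() and get_reparse_data(), and the FALLBACK_SYMLINK_MAX value, are hypothetical and do not come from the specification. The limit is queried on the containing directory rather than on the link itself, since pathconf(2) follows symbolic links and a reparse target is not a path. On a system without the proposed support, the result is just an ordinary dangling symlink; no reparse attribute is set.

```c
#define	_XOPEN_SOURCE 700	/* expose symlink(2), readlink(2), pathconf(2) */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fallback used when no fixed limit is reported. */
#define	FALLBACK_SYMLINK_MAX	1024

/* Ask the file system holding `dir` for its symlink target limit. */
long
symlink_limit(const char *dir)
{
	long max = pathconf(dir, _PC_SYMLINK_MAX);

	/* -1 means an error or "no fixed limit"; fall back either way. */
	return (max > 0 ? max : FALLBACK_SYMLINK_MAX);
}

/* Create dir/name as a symlink whose target is the string `data`. */
int
set_reparse_data(const char *dir, const char *name, const char *data)
{
	char path[4096];
	int n;

	assert(dir != NULL && name != NULL && data != NULL);

	/* Refuse targets longer than the file system says it can store. */
	if ((long)strlen(data) > symlink_limit(dir))
		return (-1);

	n = snprintf(path, sizeof (path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof (path))
		return (-1);
	return (symlink(data, path));
}

/* Read the target of dir/name back; caller frees.  NULL on error. */
char *
get_reparse_data(const char *dir, const char *name)
{
	char path[4096];
	long max;
	char *buf;
	ssize_t len;
	int n;

	assert(dir != NULL && name != NULL);

	n = snprintf(path, sizeof (path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof (path))
		return (NULL);

	/* Size the buffer from the queried limit, not a compiled-in one. */
	max = symlink_limit(dir);
	buf = malloc((size_t)max + 1);
	if (buf == NULL)
		return (NULL);

	len = readlink(path, buf, (size_t)max);
	if (len < 0) {
		free(buf);
		return (NULL);
	}
	buf[len] = '\0';	/* readlink(2) does not NUL-terminate */
	return (buf);
}
```

An application that instead sizes its readlink(2) buffer from a compiled-in constant would silently truncate a 16K target, which is the concern behind the question about consistent pathconf/fpathconf use.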
>
> Operation Flow Example
>
> Here is a simplified example of operation for a CIFS client
> that tries to access a file where the path contains a DFS
> link:
>
> a) Client tries to access \\srv\root\...\link\...\file.txt
>    where:
>      'root' is a share (namespace root)
>      'link' is a reparse point seen as a folder by client
>
> b) CIFS server does a VOP_LOOKUP for 'link', which it
>    recognizes as a reparse point by examining the attributes
>    returned by VOP_LOOKUP. At this point a
>    STATUS_PATH_NOT_COVERED is returned to client
>
> c) Client sends a "link referral" request to the server. CIFS
>    server uses VOP_READLINK to get the 'link' data and sends
>    the data to 'reparsed' daemon via a door call and gets back
>    the DFS link targets in a format understandable by the CIFS
>    client. The targets are sent back to the client in
>    response to its "link referral" request.
>
> d) Client picks one of the targets and contacts the target
>    server to access 'file.txt'
>
> NFS REFERRAL IN OTHER UNIXES
>
> FS referrals have been implemented in other major UNIX
> distributions such as Linux, AIX and HP-UX but there is no
> unified approach or implementation.
>
> Linux, AIX and HP-UX specify referrals as an NFS export
> option. The option format is basically the same in all three
> operating systems (refer=path@host) but the presentation is
> somewhat different in each case:
>
> - In Linux a referral is presented as a mount point.
> - In HP-UX a referral is a file system partition or logical volume.
> - In AIX a special object is used to represent a referral.
>
> These are all mechanisms to trigger a change in namespace
> while resolving a path.
>
> This proposal is somewhat aligned with the AIX approach but
> does not require a new object type to be defined, which has
> the advantage of not impacting existing applications. As
> mentioned previously, an NFS "refer" option will be supported
> to provide option format compatibility.
>
> Additionally, the Solaris requirements include support for
> both NFS and SMB referrals whereas these other operating
> systems only support NFS referrals, and they do not provide
> native SMB support. For the Solaris operating system, this
> proposal provides a generic solution to support multiple,
> disparate referral mechanisms without placing restrictions on
> the format required by each mechanism.
>
> The following links provide a bit more detail about each OS
> discussed above:
>
> http://www.citi.umich.edu/projects/nfsv4/linux/using-referrals.html
> http://nfsv4.bullopensource.org/doc/migration-and-replication-0.2.pdf
> http://docs.hp.com/en/5900-0306/ch01s11.html?jumpid=reg_R1002_USEN
> http://docs.hp.com/en/13578/nfsv4_whitepaper.pdf
>
> http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.commadmn/doc/commadmndita/nfs_referrals.htm
>
>
> INTERFACE TABLE
>
>                         |Proposed       |Specified |
>                         |Stability      |in what   |
> Interface Name          |Classification |Document? | Comments
> ===========================================================================
> XAT_REPARSE             |Consolidation  |This      |Reparse extensible
>                         |Private        |Document  |attribute
>                         |               |          |
> VOP_LOOKUP, fop_lookup  |Contracted     |This      |Added new argument:
>                         |Consolidation  |Document  |vattr_t *vap
>                         |Private*       |          |
>                         |               |          |
> Reparse token syntax    |Committed      |This      |
>                         |Private        |Document  |
>                         |               |          |
> SYMLINK_MAX             |Committed      |This      |Increased to 16K
>                         |               |Document  |
>
> * The project's deliverables will all go into the OS/NET
>   Consolidation, so no contracts are required.
>
> 6. Resources and Schedule:
>
>    6.4. Product Approval Committee requested information:
>         6.4.1. Consolidation or Component Name:
>                ON
>
>    6.5. ARC review type:
>         FastTrack
>
>