[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878992#comment-13878992
 ] 

Yongjun Zhang commented on HDFS-5767:
-------------------------------------

I ran some experiments and saw that if we configure nsswitch.conf as
  passwd: files ldap

then "getent passwd <userX>" will return the entry in /etc/passwd (ldap also 
has an entry for the same user).

And "getent passwd" returns the combined entries from /etc/passwd and ldap, 
with the /etc/passwd entries appearing first.

So the search order is the order in which the sources are listed in the 
configuration (I hope this is correct).

My current thinking is that, for simplicity, we can assume a unique mapping 
between userName and uid.
And we can assume
    "getent passwd <userName>"
and
   "getent passwd <userId>"
return the result we need (the first matching entry based on search order).

The mapping setup algorithm could look like this:

 foreach en in "getent passwd" {
     if (en.uname is already in map) {
        if (en.uid is not the same as in the map) {
          warn and ignore
          // At this point, I expect the uid currently in the map for
          // en.uname to be the same as what "getent passwd en.uname"
          // returns; we can probably add an assertion for this.
          //
          // The impact of ignoring this entry is subject to our monitoring
          // in the field, so we had better issue a good message here to
          // help with problem investigation.
        }
     } else {
       set <en.uname, en.uid> mapping as a new entry
     }
 }
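
As a rough illustration, the loop above could be sketched in Java as follows. 
This is only a sketch under stated assumptions, not the actual NFS gateway 
code: the class name UidNameMapSketch, the method buildNameToUid, and the 
warnings list are hypothetical names, and it assumes the output of 
"getent passwd" has already been read into a list of lines in search order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed mapping setup; not taken from Hadoop.
final class UidNameMapSketch {

    /**
     * Builds a userName -> uid map from "getent passwd"-style lines
     * (name:passwd:uid:gid:gecos:home:shell). The first entry seen for a
     * name wins; later entries with a different uid are ignored and a
     * message is appended to warnings.
     */
    static Map<String, Long> buildNameToUid(List<String> getentLines,
                                            List<String> warnings) {
        Map<String, Long> nameToUid = new LinkedHashMap<>();
        for (String line : getentLines) {
            String[] fields = line.split(":");
            if (fields.length < 3) {
                continue; // assumption: silently skip malformed lines
            }
            String name = fields[0];
            long uid;
            try {
                uid = Long.parseLong(fields[2]);
            } catch (NumberFormatException e) {
                continue; // assumption: skip entries with a non-numeric uid
            }
            Long existing = nameToUid.get(name);
            if (existing == null) {
                // New name: record <name, uid> as a new entry.
                nameToUid.put(name, uid);
            } else if (existing.longValue() != uid) {
                // Same name, different uid: warn and ignore this entry.
                warnings.add("Ignoring duplicate entry for user " + name
                        + ": uid " + uid + " conflicts with existing uid "
                        + existing);
            }
        }
        return nameToUid;
    }

    public static void main(String[] args) {
        // Sample lines taken from the "getent passwd" output in the issue.
        List<String> lines = Arrays.asList(
                "bin:x:2:2:bin:/bin:/bin/sh",
                "bin:x:1:1:bin:/bin:/sbin/nologin",
                "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
                "daemon:x:2:2:daemon:/sbin:/sbin/nologin");
        List<String> warnings = new ArrayList<>();
        System.out.println(buildNameToUid(lines, warnings));
        warnings.forEach(System.out::println);
    }
}
```

Because the lines are processed in the order "getent passwd" prints them, the 
entry that is kept for each name is the first one in search order, which is 
what the assertion mentioned in the comments would check.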

Do you think this makes sense?

I made quite a few comments in a row; you are welcome to look at my earlier 
ones and share your thoughts too.

Thanks.


> Nfs implementation assumes userName userId mapping to be unique, which is not 
> true sometimes
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5767
>                 URL: https://issues.apache.org/jira/browse/HDFS-5767
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>    Affects Versions: 2.3.0
>         Environment: With LDAP enabled
>            Reporter: Yongjun Zhang
>            Assignee: Brandon Li
>
> I'm seeing that the nfs implementation assumes a unique <userName, userId> 
> pair to be returned by the command "getent passwd". That is, for a given 
> userName, there should be a single userId, and for a given userId, there 
> should be a single userName.  The reason is explained in the following message:
>  private static final String DUPLICATE_NAME_ID_DEBUG_INFO =
>       "NFS gateway can't start with duplicate name or id on the host system.\n"
>       + "This is because HDFS (non-kerberos cluster) uses name as the only way to identify a user or group.\n"
>       + "The host system with duplicated user/group name or id might work fine most of the time by itself.\n"
>       + "However when NFS gateway talks to HDFS, HDFS accepts only user and group name.\n"
>       + "Therefore, same name means the same user or same group. To find the duplicated names/ids, one can do:\n"
>       + "<getent passwd | cut -d: -f1,3> and <getent group | cut -d: -f1,3> on Linux systms,\n"
>       + "<dscl . -list /Users UniqueID> and <dscl . -list /Groups PrimaryGroupID> on MacOS.";
> This requirement can not be met sometimes (e.g. because of the use of LDAP). 
> Let's do some examination:
> What exists in /etc/passwd:
> $ more /etc/passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> $ more /etc/passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> The above result says userName  "bin" has userId "2", and "daemon" has userId 
> "1".
>  
> What we can see with "getent passwd" command due to LDAP:
> $ getent passwd | grep ^bin
> bin:x:2:2:bin:/bin:/bin/sh
> bin:x:1:1:bin:/bin:/sbin/nologin
> $ getent passwd | grep ^daemon
> daemon:x:1:1:daemon:/usr/sbin:/bin/sh
> daemon:x:2:2:daemon:/sbin:/sbin/nologin
> We can see that there are multiple entries for the same userName with 
> different userIds, and the same userId could be associated with different 
> userNames.
> So the assumption stated in the above DEBUG_INFO message can not be met here. 
> The DEBUG_INFO also stated that HDFS uses name as the only way to identify 
> user/group. I'm filing this JIRA for a solution.
> Hi [~brandonli], since you implemented most of the nfs feature, would you 
> please comment? 
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
