On Thu, Mar 11, 2021 at 10:19:52AM +0000, Tom Ellis wrote:

> SPJ Wrote:
> > I've just installed WSL2 and built GHC. I get this (single)
> > validation failure in libraries/unix/tests/getGroupEntryForName.  It
> > seems to be just an error message wibble, but I can't push a change
> > to master because that'll affect everyone else.
> 
> Interesting, I've only ever built GHC on WSL and WSL2. I've seen this
> error message on WSL2 during every test run, I think.  I didn't
> realise that it never occurred on other platforms, let alone that it
> was WSL2 specific!

I am curious what specific version/branch of GHC (and associated
submodule commit of "unix") is being tested.

I've recently cleaned up a bunch of the upstream "unix" handling of
the group/passwd database, but I don't believe that GHC has yet
switched to the newer code.

A subtle facet of the delta points in the right direction:

    -getGroupEntryForName: getGroupEntryForName: does not exist (no such group)
    +getGroupEntryForName: getGroupEntryForName: does not exist (No such process)

not only is it complaining about "process" rather than "group", but
crucially the case of the word "No" is different.  The variance arises
because the group lookup code has two possible error paths:

        doubleAllocWhileERANGE loc enttype initlen unpack action =
          alloca $ go initlen
         where
          go len res = do
            r <- allocaBytes len $ \buf -> do
                   rc <- action buf (fromIntegral len) res
                   if rc /= 0
--hard-error->       then return (Left rc)
                     else do p <- peek res
--not-found-->               when (p == nullPtr) $ notFoundErr
                             fmap Right (unpack p)
            case r of
              Right x -> return x
              Left rc | Errno rc == eRANGE ->
                -- ERANGE means this is not an error
                -- we just have to try again with a larger buffer
                go (2 * len) res
              Left rc ->
--1-->          ioError (errnoToIOError loc (Errno rc) Nothing Nothing)
          notFoundErr =
--2-->      ioError $ flip ioeSetErrorString ("no such " ++ enttype)
                    $ mkIOError doesNotExistErrorType loc Nothing Nothing

The expected error path is "not-found" -> (2), where the group lookup
works, but no result is found (rc == 0).  This reports the lower-case
"no such group".

The unexpected error path is a non-zero return from "getgrnam_r"
(action) -> (1), which builds the error string from the returned error
number (rc), and that ends up being "No such process".

On Linux systems that's: ESRCH 3 /* No such process */
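
For readers less at home in Haskell, the same control flow can be
sketched in C.  This is only an illustration, not the code in "unix":
the names lookup_group and enum lookup_status, the 64-byte starting
size and the 1 MiB cap are all made up for the example.  The point is
that a non-zero return other than ERANGE lands on the "hard error"
path, while a zero return with a NULL result is the ordinary
"not found" path.

```c
#include <sys/types.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch; not part of any library. */
enum lookup_status { LOOKUP_FOUND, LOOKUP_NOT_FOUND, LOOKUP_HARD_ERROR };

/* Arbitrary upper bound on the buffer, so the doubling cannot run away. */
#define LOOKUP_MAX_LEN ((size_t)1 << 20)

/*
 * Look up a group by name, doubling the buffer while getgrnam_r()
 * returns ERANGE.  On LOOKUP_HARD_ERROR, *err holds the error number
 * that was returned; otherwise *err is 0.
 */
enum lookup_status
lookup_group(const char *name, gid_t *gid, int *err)
{
    enum lookup_status status;
    struct group g, *p = NULL;
    size_t len = 64;    /* deliberately small, to exercise the retry */
    char *buf = NULL, *tmp;
    int rc;

    *err = 0;
    for (;;) {
        tmp = realloc(buf, len);
        if (tmp == NULL) {
            rc = ENOMEM;
            break;
        }
        buf = tmp;
        p = NULL;
        rc = getgrnam_r(name, &g, buf, len, &p);
        /* ERANGE is not a failure: retry with a larger buffer. */
        if (rc != ERANGE || len >= LOOKUP_MAX_LEN)
            break;
        len *= 2;
    }

    if (rc != 0) {              /* path (1): hard error */
        *err = rc;
        status = LOOKUP_HARD_ERROR;
    } else if (p == NULL) {     /* path (2): lookup worked, no entry */
        status = LOOKUP_NOT_FOUND;
    } else {
        *gid = p->gr_gid;
        status = LOOKUP_FOUND;
    }
    free(buf);
    return status;
}
```

On the hard-error path, passing the value stored in *err to
strerror(3) gives the capitalised system message, which is where a
string like "No such process" would come from.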

So the call to "getgrnam_r" failed by returning ESRCH, rather than 0.
The Linux manpage does not suggest to me that one might expect a
non-zero return from getgrnam_r(3) just from a missing entry in the
group file:

    RETURN VALUE
           The getgrnam() and getgrgid() functions return a pointer to a
           group structure, or NULL if the matching entry is not found
           or an error occurs.  If an error occurs, errno is set
           appropriately.  If one wants to check errno after the call,
           it should be set to zero before the call.

           The return value may point to a static area, and may be
           overwritten by subsequent calls to getgrent(3), getgrgid(),
           or getgrnam().  (Do not pass the  returned  pointer  to
           free(3).)

           On  success, getgrnam_r() and getgrgid_r() return zero, and
--->       set *result to grp.  If no matching group record was found,
--->       these functions return 0 and store NULL in *result.  In case
--->       of error, an error number is returned, and NULL is stored in
--->       *result.

    ERRORS
           0 or ENOENT or ESRCH or EBADF or EPERM or ...
                  The given name or gid was not found.

           EINTR  A signal was caught; see signal(7).

           EIO    I/O error.

           EMFILE The per-process limit on the number of open file
                  descriptors has been reached.

           ENFILE The system-wide limit on the total number of open files
                  has been reached.

           ENOMEM Insufficient memory to allocate group structure.

           ERANGE Insufficient buffer space supplied.

The "0 or ENOENT or ESRCH ..." text then plausibly applies to
getgrnam(3), and its legacy behaviour.

So the question is why the lookup is failing.  To that end, compiling
the C program below and tracing it with "strace" should tell the story:

    #include <sys/types.h>
    #include <grp.h>
    #include <errno.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        struct group g, *p;
        char buf[1024];
        int rc;

        errno = 0;
        rc = getgrnam_r("nosuchgrouphere", &g, buf, sizeof(buf), &p);
        printf("%p: %m(%d)\n", (void *)p, errno);
        return (rc == 0 && p == NULL);
    }

On a Fedora 31 system I get:

    $ make g
    cc     g.c   -o g
    $ ./g
    (nil): Success(0)

If something else happens on WSL2, running

    $ strace -o g.trace ./g

may reveal something not going right during the lookup if the problem is
with some system call.  On the other hand, if the problem is entirely
in "user-land", then it may take more work to see what's going on.

Is the group database on these systems backed just by local files or
by AD LDAP?  A look at the "group" entry in /etc/nsswitch.conf may
shed some light on how groups are found.
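
If it is more convenient to check that from a program, something like
the following could pull out the relevant line.  This is a rough
sketch, not a faithful reimplementation of the NSS parser: the helper
name find_nss_sources is made up, and it assumes the simple
"database: source ..." layout with the colon directly after the
database name.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan an nsswitch.conf-style stream for the
 * line of database `db` (e.g. "group") and copy its source list into
 * `out`.  Returns 1 if such a line was found, 0 otherwise.
 */
int
find_nss_sources(FILE *fp, const char *db, char *out, size_t outlen)
{
    char line[512];
    size_t dblen = strlen(db);

    while (fgets(line, sizeof line, fp) != NULL) {
        char *s = line;
        char *end;

        while (isspace((unsigned char)*s))
            s++;
        /* Skip comments, blank lines and other databases. */
        if (*s == '#' || strncmp(s, db, dblen) != 0 || s[dblen] != ':')
            continue;
        s += dblen + 1;
        while (isspace((unsigned char)*s))
            s++;
        /* Drop a trailing comment, if any. */
        end = strchr(s, '#');
        if (end != NULL)
            *end = '\0';
        /* Trim trailing whitespace, including the newline. */
        end = s + strlen(s);
        while (end > s && isspace((unsigned char)end[-1]))
            end--;
        *end = '\0';
        snprintf(out, outlen, "%s", s);
        return 1;
    }
    return 0;
}
```

Calling it on a stream opened with fopen("/etc/nsswitch.conf", "r")
and db set to "group" would yield the source list for groups, if the
file has such a line.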

-- 
    Viktor.