Re: openFile gives "file is locked" error on Linux when creating a non-existing file

2024-10-09 Thread Viktor Dukhovni
On Wed, Oct 09, 2024 at 12:15:32PM +0530, Harendra Kumar wrote:
> We do use low level C APIs and GHC APIs to create a Handle in the
> event watching module. But that is for the watch-root and not for the
> file that is experiencing this problem. So here is how it works. We
> have a top level directory which is watched for events using inotify.
> We first create this directory, this directory is opened using
> inotify_init which returns a C file descriptor. We then create a
> Handle from this fd, this Handle is used for watching inotify events.
> We are then creating a file inside this directory which is being
> watched while we are reading events from the parent directory. The
> resource-busy issue occurs when creating a file inside this directory.
> So we are not creating the Handle for the file in question in a
> non-standard manner, but the parent directory Handle is being created
> in that manner. I do not know if that somehow affects anything. Or if
> the fact that the directory is being watched using inotify makes any
> difference?
> 
> The code for creating the watch Handle is here:
> https://github.com/composewell/streamly/blob/bbac52d9e09fa5ad760ab6ee5572c701e198d4ee/src/Streamly/Internal/FileSystem/Event/Linux.hs#L589
> . Viktor, you may want to take a quick look at this to see if it can
> make any difference to the issue at hand.

I don't have the cycles to isolate the problem.  I still suspect that
your code is somehow directly closing file descriptors associated with a
Handle.  This then orphans the associated logical reader/writer lock,
which then gets inherited by the next incarnation of the same (dev, ino)
pair.  However, if the filesystem underlying "/tmp" were actually "tmpfs",
inode reuse would be quite unlikely, because tmpfs inodes are assigned
from a strictly incrementing counter:

$ for i in {1..10}; do touch /tmp/foobar; ls -i /tmp/foobar; rm /tmp/foobar; done
3830 /tmp/foobar
3831 /tmp/foobar
3832 /tmp/foobar
3833 /tmp/foobar
3834 /tmp/foobar
3835 /tmp/foobar
3836 /tmp/foobar
3837 /tmp/foobar
3838 /tmp/foobar
3839 /tmp/foobar

but IIRC you mentioned that on Github "/tmp" is ext4, not "tmpfs"
(perhaps RAM-backed storage is a more scarce resource), in which
case indeed inode reuse is quite likely:

$ for i in {1..10}; do touch /var/tmp/foobar; ls -i /var/tmp/foobar; rm /var/tmp/foobar; done
25854141 /var/tmp/foobar
25854142 /var/tmp/foobar
25854141 /var/tmp/foobar
25854142 /var/tmp/foobar
25854141 /var/tmp/foobar
25854142 /var/tmp/foobar
25854141 /var/tmp/foobar
25854142 /var/tmp/foobar
25854141 /var/tmp/foobar
25854142 /var/tmp/foobar

But since normal open/close of Handles acquires the lock after open, and
releases it before close, the evidence points to a bypass of the normal
open file lifecycle.
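
The shell loops above can also be run from Haskell, which may be handy inside a test suite.  This is only a minimal sketch: the names "probe" and "probeMany" are made up for illustration, and it assumes the "unix" and "directory" packages that ship with GHC, on a Unix platform.

```haskell
import Control.Monad (replicateM)
import System.Directory (removeFile)
import System.IO (IOMode (WriteMode), hClose, openFile)
import System.Posix.Files (deviceID, fileID, getFileStatus)

-- | Create the file through the normal Handle API, record its
-- (device, inode) pair, then delete it again.
-- (Hypothetical helper, not part of any library.)
probe :: FilePath -> IO (Integer, Integer)
probe path = do
  openFile path WriteMode >>= hClose
  st <- getFileStatus path
  removeFile path
  pure (fromIntegral (deviceID st), fromIntegral (fileID st))

-- | Repeat the create/stat/delete cycle n times on the same path.
probeMany :: Int -> FilePath -> IO [(Integer, Integer)]
probeMany n = replicateM n . probe
```

Running `probeMany 10 "/var/tmp/foobar" >>= mapM_ print` in GHCi prints one pair per cycle; repeated inode numbers would indicate that the filesystem recycles inodes quickly.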

Your codebase contains a bunch of custom file management logic, which
could be the source of the problem.  To find the problem code path,
you'd probably need to instrument the RTS lock/unlock code to log its
activity: (mode, descriptor, dev, ino) tuples being added and removed.
You'd also want to run under strace, to be able to identify descriptor
open and close events.  Ideally the problem will be reproducible even
with strace.

Good luck.

-- 
Viktor.
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: openFile gives "file is locked" error on Linux when creating a non-existing file

2024-10-08 Thread Viktor Dukhovni
On Wed, Oct 09, 2024 at 10:24:30AM +0530, Harendra Kumar wrote:

> I just noticed that cabal seems to be running test suites in parallel.
> We have two test suites. Even though each test suite generates the
> temp names randomly, they use the same prefix, if the generated names
> have a possibility of conflict due to PRNG it may cause a problem.
> That is perhaps the more likely cause rather than hunting this in GHC.
> cabal running tests in parallel without explicitly saying so came as a
> surprise to me. In fact I found an issue in cabal repo asking for a
> "feature" to run them sequentially, the issue is still open -
> https://github.com/haskell/cabal/issues/6751 . Hopefully this is it.

Parallel execution alone is not sufficient to explain the observed
problem.  You still need to have the same inode/dev already open
in the same process, or the bookkeeping of which dev/ino pairs are
in use to be incorrect.

So either the Github filesystem is reusing inodes of already deleted,
but still open files (a deviation from expected Unix behaviour), or
somehow GHC fails to correctly track the dev/ino pairs of open handles.

My best guess is that something is manipulating file descriptors
directly, bypassing the Handle layer, and *then* parallel execution
could exacerbate the resulting inconsistent state.

-- 
Viktor.


Re: openFile gives "file is locked" error on Linux when creating a non-existing file

2024-10-08 Thread Viktor Dukhovni
On Tue, Oct 08, 2024 at 06:08:52PM +0530, Harendra Kumar wrote:

> What if we closed a file and created another one and the inode of the
> previous file got reused for the new one? Is it possible that there is
> a small window after the deletion of the old one in which GHC keeps
> the lock in its hash table?

That SHOULD NOT happen, GHC releases the (internal hash table entry)
lock before closing the file descriptor:

    close :: FD -> IO ()
    close fd =
      do let closer realFd =
               throwErrnoIfMinus1Retry_ "GHC.IO.FD.close" $
    #if defined(mingw32_HOST_OS)
               if fdIsSocket fd then
                 c_closesocket (fromIntegral realFd)
               else
    #endif
                 c_close (fromIntegral realFd)

         -- release the lock *first*, because otherwise if we're preempted
         -- after closing but before releasing, the FD may have been reused.
         -- (#7646)
         release fd

         closeFdWith closer (fromIntegral (fdFD fd))

    release :: FD -> IO ()
    release fd = do _ <- unlockFile (fromIntegral $ fdFD fd)
                    return ()

Solved in GHC 7.8 11 years ago:

https://gitlab.haskell.org/ghc/ghc/-/issues/7646#note_68902

This assumes that the application is not closing the file descriptor
"behind GHC's back".  That is, you're not using the POSIX package to
directly close file descriptors underlying Haskell file Handles
(which would then orphan the associated "lock").

-- 
Viktor.


Re: openFile gives "file is locked" error on Linux when creating a non-existing file

2024-10-08 Thread Viktor Dukhovni
On Tue, Oct 08, 2024 at 01:15:40PM +0530, Harendra Kumar wrote:
> On Tue, 8 Oct 2024 at 11:50, Viktor Dukhovni  wrote:
> 
> > What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?
> 
> This happens on github Linux CI. Not sure which filesystem they are
> using. Earlier I was wondering if something funny is happening in case
> they are using NFS. But NFS usually causes issues due to caching of
> directory entries if we are doing cross-node operations, here we are
> on a single node and operations are not running in parallel (or that's
> what I believe).  I will remove the hspec layer from the tests to make
> sure that the code is simpler and our understanding is correct.
> 
> I will also run the tests on circle-ci to check if the problem occurs
> there. I have never seen this problem in testing this on a Linux
> machine on AWS even if I ran the tests for days in a loop.

Looking more closely at the GHC code, we see that there's an internal
(RTS not OS level) exclusive lock on the (device, inode) pair as part of
opening a Unix file for writes, or shared lock for reads.

  rts/FileLock.c:

    int
    lockFile(StgWord64 id, StgWord64 dev, StgWord64 ino, int for_writing)
    {
        Lock key, *lock;

        ACQUIRE_LOCK(&file_lock_mutex);

        key.device = dev;
        key.inode  = ino;

        lock = lookupHashTable_(obj_hash, (StgWord)&key, hashLock, cmpLocks);

        if (lock == NULL)
        {
            lock = stgMallocBytes(sizeof(Lock), "lockFile");
            lock->device = dev;
            lock->inode  = ino;
            lock->readers = for_writing ? -1 : 1;
            insertHashTable_(obj_hash, (StgWord)lock, (void *)lock, hashLock);
            insertHashTable(key_hash, id, lock);
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
        else
        {
            // single-writer/multi-reader locking:
            if (for_writing || lock->readers < 0) {
                RELEASE_LOCK(&file_lock_mutex);
                return -1;
            }
            insertHashTable(key_hash, id, lock);
            lock->readers++;
            RELEASE_LOCK(&file_lock_mutex);
            return 0;
        }
    }

This is obtained in "libraries/base/GHC/IO/FD.hs", via:

    mkFD fd iomode mb_stat is_socket is_nonblock = do
        ...
        case fd_type of
            Directory ->
               ioException (IOError Nothing InappropriateType "openFile"
                               "is a directory" Nothing Nothing)

            -- regular files need to be locked
            RegularFile -> do
               -- On Windows we need an additional call to get a unique device id
               -- and inode, since fstat just returns 0 for both.
               -- See also Note [RTS File locking]
               (unique_dev, unique_ino) <- getUniqueFileInfo fd dev ino
               r <- lockFile (fromIntegral fd) unique_dev unique_ino
                             (fromBool write)
               when (r == -1)  $
                    ioException (IOError Nothing ResourceBusy "openFile"
                                       "file is locked" Nothing Nothing)
        ...

This suggests that when the file in question is opened there's already a
lock (read or write) in place for the same dev/ino.  Perhaps the Github
filesystem fails to ensure uniqueness of dev+ino of open files (perhaps
when open files are already unlinked)?
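
To make the single-writer/multi-reader rule easier to reason about, here is a toy pure-Haskell model of the lock table.  It is a sketch, not the RTS code: the type and function names are invented for illustration, "unlockFile" is not shown in the excerpt above and is modelled here simply as the inverse of "lockFile", and Data.Map stands in for the two RTS hash tables.

```haskell
import qualified Data.Map.Strict as Map

-- | (device, inode) pair identifying a file.
type FileKey = (Integer, Integer)

-- | Toy stand-in for the two RTS hash tables (illustrative only).
data LockTable = LockTable
  { objLocks :: Map.Map FileKey Int -- reader count; -1 means one writer
  , fdKeys :: Map.Map Int FileKey   -- descriptor -> file it holds a lock on
  }
  deriving (Eq, Show)

emptyTable :: LockTable
emptyTable = LockTable Map.empty Map.empty

-- | Model of lockFile; Nothing corresponds to the C function returning -1.
lockFile :: Int -> FileKey -> Bool -> LockTable -> Maybe LockTable
lockFile fd key forWriting (LockTable objs fds) =
  case Map.lookup key objs of
    Nothing ->
      Just (LockTable (Map.insert key (if forWriting then -1 else 1) objs)
                      (Map.insert fd key fds))
    Just readers
      | forWriting || readers < 0 -> Nothing
      | otherwise ->
          Just (LockTable (Map.insert key (readers + 1) objs)
                          (Map.insert fd key fds))

-- | Assumed inverse of lockFile: drop the descriptor's claim on its file.
unlockFile :: Int -> LockTable -> LockTable
unlockFile fd t@(LockTable objs fds) =
  case Map.lookup fd fds of
    Nothing -> t
    Just key -> LockTable (Map.update release key objs) (Map.delete fd fds)
  where
    release readers
      | readers <= 1 = Nothing -- the writer or the last reader: drop the entry
      | otherwise = Just (readers - 1)

-- | Normal lifecycle: lock, unlock, then the inode is reused for a new file.
normalScenario :: Maybe LockTable
normalScenario = do
  t1 <- lockFile 6 (1, 100) True emptyTable
  lockFile 6 (1, 100) True (unlockFile 6 t1)

-- | Descriptor closed without unlockFile; the stale entry blocks the reuse.
orphanScenario :: Maybe LockTable
orphanScenario = do
  t1 <- lockFile 6 (1, 100) True emptyTable
  lockFile 6 (1, 100) True t1
```

In this model "normalScenario" succeeds while "orphanScenario" yields Nothing, which is the situation that would surface as "file is locked" if a stale entry for a recycled dev/ino pair were left behind.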

-- 
Viktor.


Re: openFile gives "file is locked" error on Linux when creating a non-existing file

2024-10-07 Thread Viktor Dukhovni
On Tue, Oct 08, 2024 at 11:23:14AM +0530, Harendra Kumar wrote:

> This cannot be a TOCTOU bug as the code to check the existence of the
> file is only introduced for debugging this issue, to report in case
> the file exists for some reason.  Our understanding is that this file
> cannot exist in the first place. We have never seen the "File exists"
> message being printed, I will make that an error to make sure. The
> tests create a temporary file using a random directory name in the
> system temp directory, the directory is destroyed at the end of the
> test. Also, tests do not run in parallel, we are using hspec to run
> tests and it does not run tests in parallel unless we explicitly say
> so, so there is no possibility that two tests could be trying to use
> the same file. We will double check that. Also, this happens only on
> Linux. We will also try the append mode as you suggested.

What sort of filesystem is "/tmp/fsevent_dir-.../watch-root" located in?

Creating and closing a file in write mode from GHC:

    import System.IO

    main :: IO ()
    main = do
        putStrLn "Show time" >> hFlush stdout
        openFile "/tmp/foo.out" WriteMode >>= hClose

translates on Linux to (strace):

    write(1, "Show time\n", 10) = 10
    openat(AT_FDCWD, "/tmp/foo.out", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 6
    newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_EMPTY_PATH) = 0
    ftruncate(6, 0) = 0
    ioctl(6, TCGETS, 0x7ffd358412a0) = -1 ENOTTY (Inappropriate ioctl for device)
    close(6) = 0

Nothing at all unusual happening here, so if the OS returns EBUSY,
perhaps there's something interesting you can report about the state of
that directory before file creation?  Perhaps there's some filesystem or
other kernel resource you're maxing out during the tests?

-- 
Viktor.


Re: openFile gives "file is locked" error on Linux when creating a non-existing file

2024-10-07 Thread Viktor Dukhovni
On Mon, Oct 07, 2024 at 09:52:06PM +1100, Viktor Dukhovni wrote:

> If you need to know whether the file got created by this call, or was
> found to exist already, you need a lower-level API, such as (Unix C):
> 
>     /* In some cases 0600 is more appropriate */
>     int fd = open(path, O_WRONLY|O_CREAT|O_EXCL, 0666);
> 
>     if (fd >= 0) {
>         /* Just created */
>         (void) close(fd);
>         ...
>     } else if (errno == EEXIST) {
>         /* Already present */
>         ...
>     } else {
>         /* Permission or other problem */
>         ...
>     }

I should mention that the above assumes a "local" filesystem.  With NFS a
race may still be possible, and the open(2) manpage may describe
work-arounds, e.g. on Linux:

  On NFS, O_EXCL is supported only when using NFSv3 or later on
  kernel 2.6 or later.  In NFS environments where O_EXCL support is
  not provided, programs that rely on it for performing locking
  tasks will contain a race condition.  Portable programs that want
  to perform atomic file locking using a lockfile, and need to avoid
  reliance on NFS support for O_EXCL, can create a unique file on
  the same filesystem (e.g., incorporating hostname and PID), and
  use link(2) to make a link to the lockfile.  If link(2) returns
  0, the lock is successful.  Otherwise, use stat(2) on the unique
  file to check if its link count has increased to 2, in which case
  the lock is also successful.
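
The link(2) recipe quoted above can be sketched in Haskell roughly as follows.  This is an illustration under assumptions, not a vetted locking library: "createViaLink" is a made-up name, the unique file comes from openTempFile rather than a hostname/PID scheme, and it relies on the "unix", "directory" and "filepath" packages bundled with GHC.

```haskell
import Control.Exception (IOException, finally, try)
import System.Directory (removeFile)
import System.FilePath (takeDirectory)
import System.IO (hClose, openTempFile)
import System.Posix.Files (createLink, getFileStatus, linkCount)

-- | Try to create 'target' atomically by hard-linking a unique file to it.
-- Returns True if this call created the target, False otherwise.
-- (Hypothetical helper following the open(2) manpage recipe.)
createViaLink :: FilePath -> IO Bool
createViaLink target = do
  -- A unique, empty file in the same directory (hence same filesystem).
  (tmp, h) <- openTempFile (takeDirectory target) "lock.tmp"
  hClose h
  flip finally (removeFile tmp) $ do
    r <- try (createLink tmp target) :: IO (Either IOException ())
    case r of
      Right () -> pure True
      Left _ -> do
        -- link(2) reported failure; per the manpage, a link count of 2 on
        -- the unique file still means the link was made.
        st <- getFileStatus tmp
        pure (linkCount st == 2)
```

The target ends up as an empty file with the permissions openTempFile gave the unique file, so this suits lock files and markers rather than files whose mode matters.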

-- 
Viktor.


Re: openFile gives "file is locked" error on Linux when creating a non-existing file

2024-10-07 Thread Viktor Dukhovni
On Mon, Oct 07, 2024 at 08:25:21AM +0530, Harendra Kumar wrote:

>     exists <- doesFileExist filepath
>     if not exists
>     then do
>         putStrLn $ "Creating file: " ++ (parent </> file)
>         openFile (parent </> file) WriteMode >>= hClose
>         putStrLn $ "Created file: " ++ (parent </> file)
>     else error $ "File exists: " ++ filepath

This is a classic TOCTOU bug.  The file can come into existence between
the "doesFileExist" test and the attempt to create it.  To create a file
only if it does not exist, you need to use an "open" variant that
creates the file if necessary, and leaves it unmodified if it already
exists.  "AppendMode" works for this, because you're closing the file
immediately, so the fact that any writes would "append" is not material.

So replace the above with:

openFile filepath AppendMode >>= hClose

If you need to know whether the file got created by this call, or was
found to exist already, you need a lower-level API, such as (Unix C):

    /* In some cases 0600 is more appropriate */
    int fd = open(path, O_WRONLY|O_CREAT|O_EXCL, 0666);

    if (fd >= 0) {
        /* Just created */
        (void) close(fd);
        ...
    } else if (errno == EEXIST) {
        /* Already present */
        ...
    } else {
        /* Permission or other problem */
        ...
    }

-- 
Viktor.


Re: Problems building cabal-install?

2023-11-25 Thread Viktor Dukhovni
On Sat, Nov 25, 2023 at 05:23:33PM -0500, Viktor Dukhovni wrote:

> > Which GHC version are you attempting to build with? My guess is that
> > `cabal-install-3.4` excludes your GHC's `base` via its version
> > constraints.
> 
> No, I'm specifically using GHC 8.10, which actually comes with the Cabal
> 3.4 library.  Also tried 8.8 with same results.

Here's the build output:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.8.4

$ cabal --version
cabal-install version 3.0.1.0
compiled using version 3.0.1.0 of the Cabal library

$ cabal install --constraint 'cabal-install ^>=3.4' cabal-install
Resolving dependencies...
Build profile: -w ghc-8.8.4 -O1
In order, the following will be built (use -v for more details):
 - Cabal-3.4.1.0 (lib) (requires build)
 - Cabal-syntax-3.10.2.0 (lib) (requires build)
 ...
 - hackage-security-0.6.2.3 (lib) (requires build)
 ...
 - cabal-install-3.4.1.0 (exe:cabal) (requires build)
...
Installing   Cabal-syntax-3.10.2.0 (lib)
Completed    Cabal-syntax-3.10.2.0 (lib)
Starting     hackage-security-0.6.2.3 (lib)
Building     hackage-security-0.6.2.3 (lib)
Installing   hackage-security-0.6.2.3 (lib)
Completed    hackage-security-0.6.2.3 (lib)
Installing   Cabal-3.4.1.0 (lib)
Completed    Cabal-3.4.1.0 (lib)
Starting     cabal-install-3.4.1.0 (exe:cabal)
Building     cabal-install-3.4.1.0 (exe:cabal)

Failed to build exe:cabal from cabal-install-3.4.1.0.
Build log (
/home/viktor/.cabal/logs/ghc-8.8.4/cabal-install-3.4.1.0-0f55dc0aa499748357ddf42c4e32b1e210d53da7ef90484735d9a77309f7612d.log
):
Configuring executable 'cabal' for cabal-install-3.4.1.0..
Preprocessing executable 'cabal' for cabal-install-3.4.1.0..
Building executable 'cabal' for cabal-install-3.4.1.0..
[  1 of 180] Compiling Distribution.Client.Compat.Directory ( Distribution/Client/Compat/Directory.hs, dist/build/cabal/cabal-tmp/Distribution/Client/Compat/Directory.o )
[  2 of 180] Compiling Distribution.Client.Compat.ExecutablePath ( Distribution/Client/Compat/ExecutablePath.hs, dist/build/cabal/cabal-tmp/Distribution/Client/Compat/ExecutablePath.o )
...
[128 of 180] Compiling Distribution.Client.FetchUtils ( Distribution/Client/FetchUtils.hs, dist/build/cabal/cabal-tmp/Distribution/Client/FetchUtils.o )

Distribution/Client/FetchUtils.hs:195:36: error:
    • Couldn't match type ‘Distribution.Types.PackageId.PackageIdentifier’
                     with ‘Cabal-syntax-3.10.2.0:Distribution.Types.PackageId.PackageIdentifier’
      NB: ‘Cabal-syntax-3.10.2.0:Distribution.Types.PackageId.PackageIdentifier’
            is defined in ‘Distribution.Types.PackageId’
                in package ‘Cabal-syntax-3.10.2.0’
          ‘Distribution.Types.PackageId.PackageIdentifier’
            is defined in ‘Distribution.Types.PackageId’
                in package ‘Cabal-3.4.1.0’
      Expected type: Cabal-syntax-3.10.2.0:Distribution.Types.PackageId.PackageIdentifier
        Actual type: PackageId
    • In the second argument of ‘Sec.downloadPackage'’, namely ‘pkgid’
      In a stmt of a 'do' block: Sec.downloadPackage' rep pkgid path
      In the second argument of ‘($)’, namely
        ‘do info verbosity ("Writing " ++ path)
            Sec.downloadPackage' rep pkgid path’
    |
195 |           Sec.downloadPackage' rep pkgid path
    |                                    ^
cabal: Failed to build exe:cabal from cabal-install-3.4.1.0. See the build log
above for details.

-- 
Viktor.


Re: Problems building cabal-install?

2023-11-25 Thread Viktor Dukhovni
On Sat, Nov 25, 2023 at 05:09:59PM -0500, Ben Gamari wrote:
> Viktor Dukhovni  writes:
> 
> > Just, for example:
> >
> > $ cabal install --constraint "cabal-install ^>= 3.4" cabal-install
> >
> > This fails due to a conflict between Cabal-3.4 and Cabal-syntax-3.10,
> > (which is not the right choice of dependency for Cabal 3.4).
> >
> > The hackage dependency data looks wrong, the "cabal-syntax" flag in
> > "hackage-security" should not default to "on", and then an older
> > version of "Cabal-syntax" would be chosen.
>
> Which GHC version are you attempting to build with? My guess is that
> `cabal-install-3.4` excludes your GHC's `base` via its version
> constraints.

No, I'm specifically using GHC 8.10, which actually comes with the Cabal
3.4 library.  Also tried 8.8 with same results.

-- 
Viktor.


Re: Problems building cabal-install?

2023-11-25 Thread Viktor Dukhovni
On Sat, Nov 25, 2023 at 04:52:04PM -0500, Ben Gamari wrote:

> > The latter shows up as a dependency of "Cabal" on hackage, but not
> > in the upstream Git repo.  Is there some sort of problem
> > with the hackage metadata for Cabal 3.0, 3.2, 3.4, ...?
> >
> > It is odd for these to have "Cabal-syntax 3.10.*" as a dependency, with
> > conflicting definitions.
> >
> I believe it is expected that `Cabal-syntax` should appear in the
> dependency set of `Cabal`. I had no trouble building `cabal-install`
> from upstream `master` (4f53a2feeb17bd54b609ee7cfba3c25348aca997) with
> GHC 9.6.3.
> 
> Perhaps you could describe more precisely what you are doing?

Just, for example:

$ cabal install --constraint "cabal-install ^>= 3.4" cabal-install

This fails due to a conflict between Cabal-3.4 and Cabal-syntax-3.10,
(which is not the right choice of dependency for Cabal 3.4).

The hackage dependency data looks wrong, the "cabal-syntax" flag in
"hackage-security" should not default to "on", and then an older
version of "Cabal-syntax" would be chosen.

-- 
Viktor.


Re: Problems building cabal-install?

2023-11-24 Thread Viktor Dukhovni
On Sat, Nov 25, 2023 at 01:07:37AM -0500, Viktor Dukhovni wrote:

> I am having a rather unexpected difficulty building older versions of
> cabal-install.  They invariably run into conflicts between "Cabal" and
> "Cabal-syntax".
> 
> The latter shows up as a dependency of "Cabal" on hackage, but not
> in the upstream Git repo.  Is there some sort of problem
> with the hackage metadata for Cabal 3.0, 3.2, 3.4, ...?
> 
> It is odd for these to have "Cabal-syntax 3.10.*" as a dependency, with
> conflicting definitions.

I was able to build 'cabal-3.4' from git, after running "cabal freeze"
and editing the freeze file to clear the "+cabal-syntax" flag that for
some reason was getting set for "hackage-security" (this seems to be the
source of the problem) and removing the "3.10" pin for "Cabal-syntax".

-- 
Viktor.


Problems building cabal-install?

2023-11-24 Thread Viktor Dukhovni
I am having a rather unexpected difficulty building older versions of
cabal-install.  They invariably run into conflicts between "Cabal" and
"Cabal-syntax".

The latter shows up as a dependency of "Cabal" on hackage, but not
in the upstream Git repo.  Is there some sort of problem
with the hackage metadata for Cabal 3.0, 3.2, 3.4, ...?

It is odd for these to have "Cabal-syntax 3.10.*" as a dependency, with
conflicting definitions.

-- 
Viktor.


Re: Support conversion from (UArray i Word8) to ShortByteString?

2023-11-13 Thread Viktor Dukhovni
On Mon, Nov 13, 2023 at 09:35:06PM -0500, Matthew Craven wrote:

> Your proposed `arrayToByteArray` seems plausible.

Thanks, that could be handy, but see below.

> Your proposed `byteArrayToShort` is just the newtype-constructor
> `ShortByteString` which is exposed from Data.ByteString.Short since
> bytestring-0.12.0.0.

Thanks.  I did not notice these are now exposed without having to import
unstable "Internal" interfaces:

GHCi, version 9.8.1: ...
λ> import Data.Array.Byte
λ> import Data.ByteString.Short
λ> :t ShortByteString
ShortByteString :: Data.Array.Byte.ByteArray -> ShortByteString
λ> :t SBS
SBS :: GHC.Prim.ByteArray# -> ShortByteString
λ> :t ByteArray
ByteArray :: GHC.Prim.ByteArray# -> ByteArray

So as of GHC 9.8.1 and "bytestring" 0.12, I have all the missing glue.

> > An alternative is to add a tailored version of the UArray and STUArray
> > APIs to 'MutableByteArray' by extending the rather limited API of
> > 'Data.Array.Byte':
> 
> How does this compare to the interface provided by
> primitive:Data.Primitive.ByteArray? Their `runByteArray` is your
> `runMutableByteArray`.

I was unaware that the "primitive" package already provides what I was
looking for.  Unlike the case with "array", there's no duplication of
bounds checks for performing separate read/write at the same index
(because there are no bounds checks), so the "missing" modify operation
will not be missed.

Looks like I'm all set.  Just need to use 'SBS' in place of
'ShortByteString' while working with GHC 9.[246].*.

For my use case, I don't need the additional safety (bounds checks) of
"array", but it is perhaps reasonable to consider adding the proposed
bridge (from UArray i Word8), for users who want a bit more safety than
one gets with "primitive".

-- 
Viktor.


Support conversion from (UArray i Word8) to ShortByteString?

2023-11-11 Thread Viktor Dukhovni
The 'ShortByteString' type in the "bytestring" package has seen some
significant improvement recently, and yet its API is still noticeably
limited in comparison to its pinned, I/O friendly 'ByteString' elder
sibling.

One of the limitations is that there are fewer ways to construct a
'ShortByteString' object.  One often has to resort to constructing a
pinned ByteString, and then copying.  (There are no "ST" Builders that
write to resizable MutableByteArrays instead of raw memory pointers).

Meanwhile, under the covers, both the "UArray i Word8" type and
'ShortByteString' hold an immutable 'ByteArray', and the STUArray API
provides a flexible "UArray" construction interface.

Would it be reasonable to "bridge" the two APIs:

    Data.Array.Unboxed:  (re-export from Data.Array.Base)
        import Data.Array.Byte

        arrayToByteArray :: UArray i Word8 -> ByteArray
        arrayToByteArray (UArray _ _ _ ba#) = ByteArray ba#
        {-# INLINE arrayToByteArray #-}

    Data.ByteString.Short: (re-export from Data.ByteString.Short.Internal)
        byteArrayToShort :: ByteArray -> ShortByteString
        byteArrayToShort = coerce
        {-# INLINE byteArrayToShort #-}

It would then be possible to write:

    short = byteArrayToShort $ arrayToByteArray $ runSTUArray m
      where
        m = do
            a <- newArray (0, last) 0 -- zero fill
            sequence_ [ writeArray a ix e | (ix, e) <- generator ]
            pure a -- runSTUArray needs the array as the result

and generate the bytes of a 'ShortByteString' from an arbitrary
computation, possibly merging multiple inputs into some bytes by using
the recently introduced "modifyArray" (or explicit read/modify/write).
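
For comparison, something close to this bridge can already be written against today's libraries by reaching into internal modules.  The following is only a sketch: "uarrayToShort" and "fromAssocs" are made-up names, it fixes the index type to Int, and it depends on the internal 'SBS' constructor (or pattern) from Data.ByteString.Short.Internal and the 'UArray' constructor from Data.Array.Base, neither of which is a stable interface.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Array.Base (UArray (UArray))
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.ByteString.Short.Internal (ShortByteString (SBS))
import Data.Word (Word8)

-- | Reuse the ByteArray# inside a UArray as a ShortByteString, without
-- copying.  (Hypothetical helper; relies on internal constructors.)
uarrayToShort :: UArray Int Word8 -> ShortByteString
uarrayToShort (UArray _ _ _ ba#) = SBS ba#

-- | Build a ShortByteString of the given length from (index, byte) pairs.
-- Positions not mentioned stay zero; writeArray bounds-checks each index.
fromAssocs :: Int -> [(Int, Word8)] -> ShortByteString
fromAssocs len assocs = uarrayToShort $ runSTUArray $ do
  a <- newArray (0, len - 1) 0 -- zero fill
  mapM_ (uncurry (writeArray a)) assocs
  pure a
```

For example, `fromAssocs 3 [(0, 97), (1, 98), (2, 99)]` builds the three bytes of "abc" directly in an unpinned array, with no intermediate pinned ByteString.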

Any thoughts about the wisdom or lack thereof of this proposal?

An alternative is to add a tailored version of the UArray and STUArray
APIs to 'MutableByteArray' by extending the rather limited API of
'Data.Array.Byte':

    runMutableByteArray :: (forall s. ST s (MutableByteArray s))
                        -> ByteArray
    runMutableByteArray m = runST $ m >>= freezeMutableByteArray

    freezeMutableByteArray (MutableByteArray mba#) =
        ST $ \s -> case unsafeFreezeByteArray# mba# s of
                     (# s', ba# #) -> (# s', ByteArray ba# #)

Since "Data.Array.Byte" is an "array" (rather than string) interface, it
could have a richer set of indexed read/write/modify primitives along
the lines of those found for STUArray in "Data.Array.ST", but specialised
to 'Word8' elements and implicit zero-based integer indexing.

The flexible construction I seek would then be via "Data.Array.Byte",
rather than the somewhat too general index and value types from UArray.

    short = byteArrayToShort $ runMutableByteArray m
      where
        m = do
            a <- newByteArray size 0 -- 0 fill
            sequence_ [ writeByteArray a ix e | (ix, e) <- generator ]
            pure a -- the action must return the array

In this scenario, the indexed-mutation of ShortByteStrings under
construction, or indexed-mutation of copies for various transformations,
could live in Data.Array.Byte, with ShortByteString and various
applications leveraging the random-access mutation (and resizing, ...)
to implement higher level operations.

-- 
Viktor.


Re: [ANNOUNCE] GHC 9.4.8 is now available

2023-11-10 Thread Viktor Dukhovni
On Fri, Nov 10, 2023 at 04:22:54PM -0500, Viktor Dukhovni wrote:

> On Fri, Nov 10, 2023 at 09:23:11PM +0530, Zubin Duggal wrote:
> 
> > The GHC developers are happy to announce the availability of GHC 9.4.8. Binary
> > distributions, source distributions, and documentation are available on the
> > [release page](/download_ghc_9_4_8.html).
> > 
> 
> Many thanks.  I am, however, having a problem building 9.4.8 from source
> on Fedora 36, with GHC 9.4.6 as the compiler, and cabal-install
> 3.10.1.0.  The diagnostic output (some paths made relative to reduce
> clutter) is:
> 
> ...
> Error: hadrian: Encountered missing or private dependencies:
> hpc >=0.6.2 && <0.8

Never mind, looks like the source tree wasn't quite up-to-date with all
the submodules. Sorry about the noise...

-- 
Viktor.


Re: [ANNOUNCE] GHC 9.4.8 is now available

2023-11-10 Thread Viktor Dukhovni
On Fri, Nov 10, 2023 at 09:23:11PM +0530, Zubin Duggal wrote:

> The GHC developers are happy to announce the availability of GHC 9.4.8. Binary
> distributions, source distributions, and documentation are available on the
> [release page](/download_ghc_9_4_8.html).
> 

Many thanks.  I am, however, having a problem building 9.4.8 from source
on Fedora 36, with GHC 9.4.6 as the compiler, and cabal-install
3.10.1.0.  The diagnostic output (some paths made relative to reduce
clutter) is:

...
| Configure package 'haskeline'
| Run Ghc LinkHs Stage1: 
_build/stage1/libraries/process/build/c/cbits/posix/runProcess.dyn_o (and 8 
more) => _build/stage1/libraries/process/build/libHSprocess-1.6.18.0-ghc9.4.8.so
| ContextData oracle: resolving data for 'hpc-bin' (Stage1, dyn)...
# _build/stage1/utils/hpc/setup-config
| Configure package 'hpc-bin'
| Package 'hpc-bin' configuration flags: configure --distdir 
_build/stage1/utils/hpc --cabal-file utils/hpc/hpc-bin.cabal --ipid 
$pkg-$version --prefix ${pkgroot}/.. --htmldir 
${pkgroot}/../../doc/html/libraries/hpc-bin-0.69 
--with-ghc=_build/stage0/bin/ghc --ghc-option=-no-global-package-db 
--ghc-option=-package-db=_build/stage1/lib/package.conf.d 
--with-ghc-pkg=_build/stage0/bin/ghc-pkg 
--ghc-pkg-option=--global-package-db=_build/stage1/lib/package.conf.d 
--enable-library-vanilla --enable-library-profiling --disable-library-for-ghci 
--enable-shared --with-gcc=/usr/bin/cc --with-ld=ld.gold --with-ar=/bin/ar 
--with-alex=/bin/alex --with-happy=/bin/happy --configure-option=CFLAGS=-iquote 
utils/hpc --configure-option=LDFLAGS=-fuse-ld=gold --gcc-options=-iquote 
utils/hpc -fuse-ld=gold --configure-option=--with-gmp-includes=/usr/include 
--configure-option=--with-gmp-libraries=/usr/lib64 
--configure-option=--host=x86_64-unknown-linux 
--configure-option=--with-cc=/usr/bin/cc 
--ghc-option=-ghcversion-file=rts/include/ghcversion.h 
--ghc-option=-ghcversion-file=rts/include/ghcversion.h 
--flags=-build-tool-depends -v0
# cabal-configure (for _build/stage1/utils/hpc/setup-config)
Error: hadrian: Encountered missing or private dependencies:
hpc >=0.6.2 && <0.8

Error when running Shake build system:
  at want, called at src/Main.hs:124:44 in main:Main
* Depends on: binary-dist-dir
  at need, called at src/Rules/BinaryDist.hs:130:9 in main:Rules.BinaryDist
* Depends on: _build/stage1/bin/hpc
  at apply1, called at 
src/Development/Shake/Internal/Rules/Oracle.hs:159:32 in 
shake-0.19.7-0a34884117d1c1ae051fb71ab291372738b4fe99639c85970f08dbdf7c0632db:Development.Shake.Internal.Rules.Oracle
* Depends on: OracleQ (ContextDataKey (Context {stage = Stage1, package = 
Package {pkgType = Program, pkgName = "hpc-bin", pkgPath = "utils/hpc"}, way = 
dyn}))
  at need, called at src/Hadrian/Oracles/Cabal/Rules.hs:54:9 in 
main:Hadrian.Oracles.Cabal.Rules
* Depends on: _build/stage1/utils/hpc/setup-config
* Raised the exception:
ExitFailure 1

-- 
Viktor.


Re: Type-level sized Word literals???

2023-10-30 Thread Viktor Dukhovni
On Mon, Oct 30, 2023 at 09:20:16AM +0100, Vladislav Zavialov via ghc-devs wrote:

> Can you tell more about the code you're writing? Would it be possible
> to use it as the basis for the "Motivation" section of a GHC proposal?
> 

Working with DNS resource records (A, NS, CNAME, SOA, MX, SRV,
...) requires a runtime extensible data model:

- Some RR types will be known at DNS-library compile-time.

- Some RR types can be defined and added at runtime (application
  compile-time, and registered with the DNS library).

- Some RR types may appear "on the wire" in serialised form,
  with the RRTYPE not known to either the library or
  application code.

The library data model has separate types for the "rrData" parts of each
resource record type, which are existentially quantified in the RR:

RR { rrOwner, rrTTL, rrClass, rrData :: RData }

data RData = forall a. KnownRData a => RData a

with the usual:

fromRData :: KnownRData a => RData -> Maybe a
fromRData (RData a) = cast a

To filter a list of resource records to those of a particular RR type,
I have:

monoList :: forall a t. (KnownRData a, Foldable t) => t RData -> [a]
monoList = foldr (maybe id (:) . fromRData) []

I'd like to be able to use this also to distinguish between
"OpaqueRData" resource record types, so there's actually a separate
opaque type for each 16-bit RR number:

{-# LANGUAGE AllowAmbiguousTypes #-}

type OpaqueRData :: Word16 -> Type
data OpaqueRData w = OpaqueRData { getOpaqueRData :: Bytes16 }

-- Nat16 constraint enforces 65535 ceiling
-- natVal16 returns Nat as Word16.
--
instance Nat16 n => KnownRData (OpaqueRData n) where
    rdType = RRTYPE $ natVal16 @n

This works, because the phantom indices have to match for "cast" to
return a Just value, so that, for example:

λ> x1 = RData $ (OpaqueRData (coerce ("abc" :: ShortByteString)) :: OpaqueRData 1)
λ> fromRData x1 :: (Maybe (OpaqueRData 1))
Just (OpaqueRData @1 "616263")

λ> fromRData x1 :: (Maybe (OpaqueRData 2))
Nothing

λ> l1 = monoList [x1] :: [OpaqueRData 1]
λ> l2 = monoList [x1] :: [OpaqueRData 2]
λ> hPutBuilder stdout $ foldr (presentLn) ("That's all folks!\n") l1
\# 3 616263
That's all folks!
λ> hPutBuilder stdout $ foldr (presentLn) ("That's all folks!\n") l2
That's all folks!
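
A minimal, self-contained sketch of the pattern above, for readers who
want to experiment with it.  This is not the library's actual code: the
String payload, the bare Word16 result of rdType, and the "sample" list
are assumptions made for illustration.  The index is a plain Nat, and
the 65535 ceiling that the real Nat16 constraint enforces is left out
(a KnownNat constraint stands in for it).

```haskell
{-# LANGUAGE AllowAmbiguousTypes       #-}
{-# LANGUAGE DataKinds                 #-}
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE KindSignatures            #-}
{-# LANGUAGE ScopedTypeVariables       #-}
{-# LANGUAGE TypeApplications          #-}
module Main (main) where

import Data.Proxy    (Proxy (..))
import Data.Typeable (Typeable, cast)
import Data.Word     (Word16)
import GHC.TypeNats  (KnownNat, Nat, natVal)

-- Hypothetical stand-in for the library's natVal16.  Unlike the real
-- Nat16 constraint it does NOT enforce the 65535 ceiling: larger
-- indices are silently truncated by fromIntegral.
natVal16 :: forall n. KnownNat n => Word16
natVal16 = fromIntegral (natVal (Proxy @n))

-- Each record-data type reports its RR type number.
class Typeable a => KnownRData a where
    rdType :: Word16

-- Existential wrapper: any KnownRData value can be stored.
data RData = forall a. KnownRData a => RData a

-- Recover a concrete type; succeeds only when the types match exactly,
-- phantom index included.
fromRData :: KnownRData a => RData -> Maybe a
fromRData (RData a) = cast a

-- Keep only the elements of one particular record-data type.
monoList :: forall a t. (KnownRData a, Foldable t) => t RData -> [a]
monoList = foldr (maybe id (:) . fromRData) []

-- One opaque type per RR number; the payload is a String here purely
-- for illustration.
newtype OpaqueRData (n :: Nat) = OpaqueRData String
    deriving (Eq, Show)

-- Typeable n is listed explicitly for clarity.
instance (KnownNat n, Typeable n) => KnownRData (OpaqueRData n) where
    rdType = natVal16 @n

-- Illustrative data: two opaque records with different RR numbers.
sample :: [RData]
sample =
    [ RData (OpaqueRData "abc" :: OpaqueRData 1)
    , RData (OpaqueRData "xyz" :: OpaqueRData 2)
    ]

main :: IO ()
main = do
    print (monoList sample :: [OpaqueRData 1])  -- only the index-1 record
    print (monoList sample :: [OpaqueRData 3])  -- nothing matches
    print (rdType @(OpaqueRData 2))
```

Because the phantom index is part of the type that "cast" compares, the
filter needs no value-level inspection of the RR number.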

In addition to labeling unknown RData with Word16 values, I also
type-index unknown EDNS options (they're elements of the OPT pseudo RR
that carries DNS protocol extensions) and unknown SVCB/HTTPS key/value
service parameter pairs.

Applications can register novel RData types, EDNS options, and SVCB
key/value types at runtime, and the extended code points behave just
like the "built-in" ones, because the only "built-in" code points are
the opaque ones, the others are registered at runtime by the library
as part of default resolver settings.

So this is how I end up with Word16-indexed types.  One might argue that
"OpaqueRData" could be a single type, and that filtering by RRTYPE
should have the "rrType" method taking a value to optionally inspect,
but I like the type-level separation even between Opaque data of
different RRTYPEs, and ditto for EDNS options and SVCB/HTTPS fields.

This supports view patterns:

f (fromRData -> Just (T_a ipv4)) = ...
f (fromRData -> Just (T_ ipv6)) = ...

which should "morally" also work for:

getBlob42 :: OpaqueRData 42 -> ShortByteString
getBlob42 = fromBytes16 . getOpaqueRData

f (fmap getBlob42 . fromRData -> Just blob) = ...

yielding just the serialised blobs of RRTYPE 42, with little
chance of accidentally pulling in blobs of the wrong RRTYPE.

I may before long add an associated type to the KnownRData class:

type RdType :: Nat -- Ideally some day Word16

making it possible to write:

-- Identity functions on the actual Opaque types.
toOpaque :: a -> Opaque (RdType a)
fromOpaque :: Opaque (RdType a) -> a

at which point a simple tweak to the above "blob" pattern match could
also work when the RRtype 42 was decoded as known:

f (fmap getBlob42 . fromRData . toOpaque -> Just blob) = ...

-- 
Viktor.


Type-level sized Word literals???

2023-10-29 Thread Viktor Dukhovni
I am working on some code where it is useful to have types indexed by a
16-bit unsigned value.

Presently, I am using type-level naturals and having to now and then
provide witnesses that a 'Nat' value I am working with is at most 65535.

Also, perhaps there's some inefficiency here, as Naturals have two
constructors, and so a more complex representation with (AFAIK) no
unboxed forms.

I was wondering what it would take to have type-level fixed-size
Words (Word8, Word16, Word32, Word64) to go along with the Nats?

It looks like some of the machinery (KnownWord16, SomeWord16, wordVal16,
etc.) can be copied straight out of GHC.TypeNats with minor changes, and
that much works, but the three major things that aren't easily done seem
to be:

- There are it seems no TypeReps for types of Kind Word16, so one can't
  have (Typeable (Foo w)) with (w :: Word16).

- There are no literals of a promoted Word16 Kind.

type Foo :: Word16 -> Type
data Foo w = MkFoo Int

-- 1 has Kind 'Natural' (a.k.a. Nat)
x = MkFoo 13 :: Foo 1 -- Rejected

-- The new ExtendedLiterals syntax does not help
--
x = MkFoo 13 :: Foo (W16# 1#Word16) -- Syntax error!

- There are unsurprisingly also no built-in 'KnownWord16' instances
  for any hypothetical type-level literals of Kind Word16.

Likely the use case for type-level fixed-size words is too specialised
to rush to shoehorn into GHC, but is there something on the not too
distant horizon that would make it easier and reasonable to have
fixed-size unsigned integral type literals available?

[ I don't see a use-case for signed versions, they can trivially be
  represented by the unsigned value of the same width. ]

With some inconvenience, in many cases I can perhaps synthesise Proxies
for types of Kind Word16, and just never use literals directly.

--
Viktor.


Re: Reinstallable - base

2023-10-17 Thread Viktor Dukhovni
On Tue, Oct 17, 2023 at 04:54:41PM +0100, Adam Gundry wrote:

> Thanks for starting this discussion, it would be good to see progress in
> this direction. As it happens I was discussing this question with Ben and
> Matt over dinner last night, and unfortunately they explained to me that it
> is more difficult than I naively hoped, even once wired-in and known-key
> things are moved to ghc-internal.
> 
> The difficulty is that, as a normal Haskell library, ghc itself will be
> compiled against a particular version of base. Then when Template Haskell is
> used (with the internal interpreter), code will be dynamically loaded into a
> process that already has symbols for ghc's version of base, which means it
> is not safe for the code to depend on a different version of base. This is
> rather like the situation with TH and cross-compilers.

To avoid that problem, GHC's own dependency on "base" could be indirect
via a shared object with versioned symbol names and a version-specific
SONAME (possibly even a private to GHC SONAME and private symbol version
names).  Say "libbase.so.4.19.1".

The dependency on "base" in the TemplateHaskell-generated code would then
also need to be dynamic, allowing the two versions of base to coexist
without conflict, both in turn dependent on a common version of the GHC
internal libraries.

This would of course somewhat complicate binary distributions, but that
should be manageable.  Perhaps there are less invasive (more clever)
solutions?

-- 
Viktor


Re: Problem building 9.4.6 on Fedora 36 (bytestring/cbits/is-valid-utf8.c)

2023-08-08 Thread Viktor Dukhovni
On Tue, Aug 08, 2023 at 11:33:52AM -0400, Viktor Dukhovni wrote:

> The build was failing, because rts/OSThreads.h via Rts.h from
> libraries/bytestring/cbits/is-valid-utf8.c had no definition of
> `clockid_t`.  This type is not exposed when _POSIX_C_SOURCE is
> not defined to a sufficiently high value:

Apologies, original $subject should have said 9.4.6.

-- 
Viktor.


Problem building 9.4.7 on Fedora 36 (bytestring/cbits/is-valid-utf8.c)

2023-08-08 Thread Viktor Dukhovni
The build was failing, because rts/OSThreads.h via Rts.h from
libraries/bytestring/cbits/is-valid-utf8.c had no definition of
`clockid_t`.  This type is not exposed when _POSIX_C_SOURCE is
not defined to a sufficiently high value:

SYNOPSIS
   #include <time.h>

   int clock_getres(clockid_t clockid, struct timespec *res);

   int clock_gettime(clockid_t clockid, struct timespec *tp);
   int clock_settime(clockid_t clockid, const struct timespec *tp);

   Link with -lrt (only for glibc versions before 2.17).

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

   clock_getres(), clock_gettime(), clock_settime():
   _POSIX_C_SOURCE >= 199309L

A quick-and-dirty work-around was:

--- a/libraries/bytestring/cbits/is-valid-utf8.c
+++ b/libraries/bytestring/cbits/is-valid-utf8.c
@@ -27,6 +27,10 @@ LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.
 */
+#undef _POSIX_C_SOURCE
+#define _POSIX_C_SOURCE 200809L
+#undef _XOPEN_SOURCE
+#define _XOPEN_SOURCE   700
 #pragma GCC push_options
 #pragma GCC optimize("-O2")
 #include 


There's surely a better solution.

-- 
Viktor.


Re: I can't build HEAD

2023-07-09 Thread Viktor Dukhovni
--- Begin Message ---
On Sun, Jul 09, 2023 at 05:49:38PM +0100, Simon Peyton Jones wrote:

> in a clean HEAD build, including "git submodule update", I get this.
> 
> Can anyone help?

Did you run "cabal update"?  What is your boot compiler version?

> # cabal-configure (for _build/stage0/libraries/text/setup-config)
> | Run GhcPkg Recache (Stage0 InTreeLibs): none => none
> hadrian: Encountered missing or private dependencies:
> data-array-byte >=0.1 && <0.2

So far, it looks like HEAD is building for me (still running and got
past building "text"):

| Successfully built library 'text' (Stage0 InTreeLibs, way v).
| Library: /home/viktor/dev/ghc/_build/stage0/libraries/text/build/libHStext-2.0.2-inplace.a
| Library synopsis: An efficient packed Unicode text type.

... many lines and some time later ...

| Successfully built library 'text' (Stage1, way p).
| Library: /home/viktor/dev/ghc/_build/stage1/libraries/text/build/libHStext-2.0.2-inplace_p.a
| Library synopsis: An efficient packed Unicode text type.

* Source tree clean
* Boot compiler GHC 9.6.1
* Cabal 3.10.1.0 + cabal update

Build script (FreeBSD):

BOOTPREFIX=$HOME/.local/ghc-9.6
BOOTGHC=$BOOTPREFIX/bin/ghc
PREFIX=$HOME/.local/ghc-master
BDIR=$HOME/dev/ghc/_build

git submodule sync
git submodule update
./boot

GHC=$BOOTGHC \
CLANG=/usr/local/bin/clang15 \
LLC=/usr/local/bin/llc15 \
OPT=/usr/local/bin/opt15 \
AR=/usr/local/bin/ar fp_prog_ar=$AR bash ./configure \
--prefix=$PREFIX \
--enable-large-address-space \
--with-gmp-includes=/usr/local/include \
--with-gmp-libraries=/usr/local/lib \
--with-hs-cpp=/usr/bin/cc

GHC=$BOOTGHC \
hadrian/build -j9 -o"${BDIR}" --docs=no-sphinx binary-dist-dir

-- 
Viktor.
--- End Message ---


Re: Build of GHC 9.6 fails when the build directory is not a child of the source directory

2023-04-30 Thread Viktor Dukhovni
On Sun, Apr 30, 2023 at 11:18:07AM +0200, Torsten Schmits via ghc-devs wrote:

> I created an issue for this: 
> https://gitlab.haskell.org/ghc/ghc/-/issues/22741
> 
> You can share your insights there!

Done.  It does look like we encountered the same underlying issue.

-- 
Viktor.


Build of GHC 9.6 fails when the build directory is not a child of the source directory

2023-04-29 Thread Viktor Dukhovni
For some time now I'd been unable to build GHC 9.6 from source.  The
reason turned out to be that my hadrian command-line selected an
explicit build directory that was not an immediate child of the source
directory (default it seems is "_build").

With the source tree under "$HOME/dev/ghc/", the hadrian command

$ hadrian/build -V -V -o"$HOME/dev/buildghc" --docs=no-sphinx binary-dist-dir

after building stage0, and running "configure" in libraries/base,
reports an error finding HsFFI.h:

Reading parameters from $HOME/dev/buildghc/stage1/libraries/base/build/base.buildinfo
/usr/bin/cc '-fuse-ld=gold' /tmp/2303653-4.c -o /tmp/2303653-5 \
'-D__GLASGOW_HASKELL__=906' \
'-Dlinux_BUILD_OS=1' \
'-Dx86_64_BUILD_ARCH=1' \
'-Dlinux_HOST_OS=1' \
'-Dx86_64_HOST_ARCH=1' \
-I$HOME/dev/buildghc/stage1/libraries/base/build/autogen \
-I$HOME/dev/buildghc/stage1/libraries/base/build/include \
-Ilibraries/base/include \
-Ilibraries/base \
-I/usr/include \
-I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include/ \
-I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include/ \
-I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include \
-I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include \
-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include \
-I$HOME/dev/buildghc/stage1/rts/build/include \
'-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@FFIIncludeDir@' \
'-I$HOME/dev/buildghc/stage1/rts/build/@FFIIncludeDir@' \
'-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@LibdwIncludeDir@' \
'-I$HOME/dev/buildghc/stage1/rts/build/@LibdwIncludeDir@' \
-L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-bignum/build \
-L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-prim/build \
-L$HOME/dev/buildghc/stage1/inplace/../rts/build -iquote \
$HOME/dev/ghc/libraries/base \
'-fuse-ld=gold'

There are two issues to note here:

- "hadrian" fails to substitute @FFIIncludeDir@ and @LibdwIncludeDir@.
  This used to be handled by "configure", but the job of turning
  "rts.cabal.in" into "rts.cabal" seems to have been reassigned to
  "hadrian".

- With the build output directory a sibling rather than a child of
  the source tree, the path to "rts/include" is not constructed
  correctly.  The path:

-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include

  should have been:

-I$HOME/dev/buildghc/stage1/inplace/../../../ghc/rts/include

Switching to the default path proved to be a viable work-around, but
perhaps other choices should also work.
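
The second issue can be illustrated with a small sketch that resolves
the ".." components lexically.  This is not hadrian's code: the helper
names and the "/home/user" paths are hypothetical, chosen only to show
why the fixed "../../.." suffix reaches the source tree when the build
directory is a child of it, and misses it when the build directory is a
sibling.

```haskell
module Main (main) where

import Data.List (intercalate)

-- Split a POSIX-style path into its non-empty components, so that
-- repeated slashes ("//") are ignored.
components :: FilePath -> [String]
components = filter (not . null) . go
  where
    go s = case break (== '/') s of
        (c, [])       -> [c]
        (c, _ : rest) -> c : go rest

-- Lexically resolve "." and ".." in an absolute path.  Symlinks are
-- not considered; this only mimics what the include path denotes.
collapse :: FilePath -> FilePath
collapse = ('/' :) . intercalate "/" . reverse . foldl step [] . components
  where
    step acc       "."  = acc
    step (_ : acc) ".." = acc
    step []        ".." = []
    step acc       c    = c : acc

-- The suffix seen in the failing command line above.
suffix :: FilePath
suffix = "/stage1/inplace/../../..//rts/include"

main :: IO ()
main = do
    -- Build directory is a child of the source tree: lands in the tree.
    putStrLn (collapse ("/home/user/dev/ghc/_build" ++ suffix))
    -- Build directory is a sibling of the source tree: "ghc" is lost.
    putStrLn (collapse ("/home/user/dev/buildghc" ++ suffix))
```

With the default "_build" location the three ".." steps end at the
source directory; with a sibling build directory they end at its parent
instead, so "rts/include" is looked up one level too high.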

-- 
Viktor.


Re: [ANNOUNCE] GHC 9.6.1 is now available

2023-03-12 Thread Viktor Dukhovni
On Sat, Mar 11, 2023 at 09:14:12PM -0500, Ben Gamari wrote:

> The GHC team is very pleased to announce the availability of GHC 9.6.1.
> As usual, binaries and source distributions are available at
> downloads.haskell.org:
> 
> https://downloads.haskell.org/ghc/9.6.1/

Is anyone else having trouble building 9.6.1 from upstream source?

a58c028a18 (HEAD -> ghc-9.6, tag: ghc-9.6.1-release, origin/ghc-9.6) Fix TBA in base changelog
1f5bce0db8 Set RELEASE=YES
87ab8e353f Bump haddock submodule to 2.28
...

My attempts on a Fedora 36 system are so far unsuccessful, with both GCC
and Clang-14 as the C compilers, and GHC 9.4.4 as the bootstrap GHC.

The build fails when "HsBase.h" is not found during the "stage1" build
of the ghc-bignum library.

-- 
Viktor.


Re: DKIM failures for gitlab mail

2023-01-23 Thread Viktor Dukhovni
On Mon, Jan 23, 2023 at 03:41:21PM +0100, Joachim Breitner wrote:
> Hi Ben,
> 
> gentle reminder about this issue? I’m worried I (and maybe others) are
> going to miss gitlab notifications.

A recent gitlab notice has:

Received: by gitlab.haskell.org (Postfix, from userid 165)
id AF9E627CA9; Mon, 16 Jan 2023 20:50:59 -0500 (EST)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gitlab.haskell.org;
s=mail; t=1673920259;
bh=bezCH96kI1N9pklJv6GEpVDADij1+8Q/zwCT65Djz/4=;
h=Date:From:Reply-To:To:Subject:List-Id;
b=L7ikqNV+Hn0OZzM9AH+rLIvP5P9COe8/zuP7bmSsMJ50kFJ2a7gJy4cbxoX83bNqU
oBQV78j6nIFV/SRgbaF9vQciNBzWu1GNACMGaqVMVjTBki93xw/hvMv8JDIhAdAYaV
da96BBtxrTDoDUtFBtYlb5n361TqIDHXHkCqE5Dc=

The DKIM data in DNS is:

$ dig +short +nosplit -t txt mail._domainkey.gitlab.haskell.org
"v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDiTJ9J8+wWWFRzHjjr5CCbOx33rZaDH2PQsQtTLwOPVZDTSjz8pwUuyQ4s+Xxq6f6UEEAIo/8ZHySJqXG6HN3b6/Gq2SwnE2xLk307gcWzZgyF/9UM5SpcJ46VxYPu2spBQSWhDnRbp849ZouuY/orKT/HMb/9xow25KwWbAyh8wIDAQAB"

Putting it together:

$ echo MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDiTJ9J8+wWWFRzHjjr5CCbOx33rZaDH2PQsQtTLwOPVZDTSjz8pwUuyQ4s+Xxq6f6UEEAIo/8ZHySJqXG6HN3b6/Gq2SwnE2xLk307gcWzZgyF/9UM5SpcJ46VxYPu2spBQSWhDnRbp849ZouuY/orKT/HMb/9xow25KwWbAyh8wIDAQAB |
openssl base64 -A -d |
openssl pkey -pubin -inform DER -out /tmp/pkey.pem

$ openssl base64 -d <<-\EOF > /tmp/sig.dat
L7ikqNV+Hn0OZzM9AH+rLIvP5P9COe8/zuP7bmSsMJ50kFJ2a7gJy4cbxoX83bNq
UoBQV78j6nIFV/SRgbaF9vQciNBzWu1GNACMGaqVMVjTBki93xw/hvMv8JDIhAdA
YaVda96BBtxrTDoDUtFBtYlb5n361TqIDHXHkCqE5Dc=
EOF

$ openssl pkeyutl -pubin -inkey /tmp/pkey.pem \
-encrypt -pkeyopt rsa_padding_mode:none \
-in /tmp/sig.dat -hexdump

0000 - 52 90 e5 01 80 fa 77 53-b3 19 97 16 33 70 1e 29   R.....wS....3p.)
0010 - 7e 7b cf 5c a4 51 b2 eb-7c fa 88 dc ce 92 b2 ac   ~{.\.Q..|.......
0020 - 4f 86 d4 f1 32 83 55 0a-0b c0 49 92 a3 4a 54 47   O...2.U...I..JTG
0030 - dc 6b 5d bd 2c 1e 5d 85-cf f4 4f c8 3c c5 3f bd   .k].,.]...O.<.?.
0040 - 9d 56 29 a2 b5 dc 94 13-50 c3 28 23 0c a0 64 0b   .V).....P.(#..d.
0050 - 0e 99 96 4a 0f b4 36 1a-3a d6 ff 6f 50 00 1a 38   ...J..6.:..oP..8
0060 - 09 34 75 a6 d5 29 da 80-7c c1 bd 77 c4 a3 01 32   .4u..)..|..w...2
0070 - d1 16 b4 8f 6c 3d fd a4-25 8d 53 2b 64 9c d8 ed   ....l=..%.S+d...

We see that the RSA public key operation does not produce a valid PKCS#1
padded block, so most likely an outdated key is published in DNS, or the
wrong "selector" ("s=" value, currently "mail") was added to the DKIM
signature header (if the correct key is published under some other
selector).
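
What "a valid PKCS#1 padded block" means here can be sketched as a
small check.  A PKCS#1 v1.5 signature block (EMSA-PKCS1-v1_5 in RFC
8017) has the shape 00 01 FF..FF 00 followed by the DigestInfo, with at
least eight FF bytes.  The function and value names below are
hypothetical; "observedPrefix" is the first row of the dump above, and
"wellFormed" is a synthetic block whose last two bytes merely stand in
for a DigestInfo.

```haskell
module Main (main) where

import Data.Word (Word8)

-- True when the bytes have the EMSA-PKCS1-v1_5 shape:
--   00 01 <at least eight FF bytes> 00 <non-empty DigestInfo>
-- This checks the framing only; it does not parse the DigestInfo or
-- compare any hash.
looksLikePkcs1v15 :: [Word8] -> Bool
looksLikePkcs1v15 (0x00 : 0x01 : rest) =
    case span (== 0xff) rest of
        (pad, 0x00 : digestInfo) -> length pad >= 8 && not (null digestInfo)
        _                        -> False
looksLikePkcs1v15 _ = False

-- First 16 bytes of the RSA public-key operation output shown above.
observedPrefix :: [Word8]
observedPrefix =
    [ 0x52, 0x90, 0xe5, 0x01, 0x80, 0xfa, 0x77, 0x53
    , 0xb3, 0x19, 0x97, 0x16, 0x33, 0x70, 0x1e, 0x29 ]

-- Synthetic, correctly framed block (illustrative bytes only).
wellFormed :: [Word8]
wellFormed = [0x00, 0x01] ++ replicate 8 0xff ++ [0x00, 0x30, 0x31]

main :: IO ()
main = do
    print (looksLikePkcs1v15 observedPrefix)  -- framing is absent
    print (looksLikePkcs1v15 wellFormed)      -- framing is present
```

The observed output already fails on its first byte (52 rather than
00), which is why the result cannot have been produced by the key
published in DNS.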

-- 
Viktor.


Re: Deprecating Safe Haskell, or heavily investing in it?

2022-12-27 Thread Viktor Dukhovni
On Tue, Dec 27, 2022 at 09:39:22PM +0100, Hécate wrote:

> I came across the nsjail system from Google a little while after posting 
> this thread: https://github.com/google/nsjail/#overview

Yes, this is the sort of thing that one can begin to trust, provided
that the exposed capabilities are managed only by inclusion: all system
calls, filesystem namespaces, network namespaces, ... that are not
explicitly allowed are denied.

> Perhaps we could get the most value for our buck if we externalise the
> solution to work with OS-level mechanisms?  What do you think of that?
> Something based upon eBPF would certainly incur less modifications to
> the RTS?

Indeed, it would be simpler to leverage existing virtualisation and/or
containerisation technologies, than build a new microkernel within the
RTS.  Consequently, I guess I am saying that "Safe Haskell" was an
interesting research project, but may be a practical dead-end.

-- 
Viktor.


Re: Deprecating Safe Haskell, or heavily investing in it?

2022-12-27 Thread Viktor Dukhovni
On Tue, Dec 27, 2022 at 10:31:07PM +0100, Jaro Reinders wrote:

> The bytestring package does have run time bounds checks. So maybe Safe
> Haskell is safer than you think?

No.  The safety depends on careful Safe/Unsafe marking of an
unmanageable and growing set of modules.  How does GHC know
that "Data.ByteString.Unsafe" is actually "unsafe" in the
sense of "Safe" Haskell?

λ> BS.index x 10
*** Exception: Data.ByteString.index: index too large: 10, length = 6
CallStack (from HasCallStack):
  error, called at libraries/bytestring/Data/ByteString.hs:2026:23 in bytestring-0.11.3.1:Data.ByteString
  moduleError, called at libraries/bytestring/Data/ByteString.hs:1232:24 in bytestring-0.11.3.1:Data.ByteString
  index, called at <interactive>:7:1 in interactive:Ghci3
λ> import Data.ByteString.Unsafe as UBS
λ> UBS.unsafeIndex x 3
27
λ> UBS.unsafeIndex x 100
162
λ> UBS.unsafeIndex x 1000
185
λ> UBS.unsafeIndex x 1
Segmentation fault (core dumped)

This is too brittle to be safe on an ongoing basis in practice.

-- 
Viktor.


Re: Deprecating Safe Haskell, or heavily investing in it?

2022-12-27 Thread Viktor Dukhovni
On Tue, Dec 27, 2022 at 06:09:59PM +0100, Hécate wrote:

> Now, there are two options (convenient!) that are left to us:
> 
> 1. Deprecate Safe Haskell: We remove the Safe mechanism as it exists 
> today, and keep the IO restriction under another name. This will 
> certainly cause much joy amongst maintainers and GHC developers alike. 
> The downside is that we don't have a mechanism to enforce "Strict 
> type-safety" anymore.
> 
> 2. We heavily invest in Safe Haskell: This is the option where we amend 
> the PVP to take changes of Safety annotations into account, invest in 
> workforce to fix the bugs on the GHC side. Which means we also invest in 
> the tools that check for PVP compatibility to check for Safety. This is 
> not the matter of a GSoC, or a 2-days hackathon, and I would certainly 
> have remorse sending students to the salt mines like that.
> 
> I do not list the Status Quo as an option because it is terrible and has 
> led us to regularly have complaints from both GHC & Ecosystem libraries 
> maintainers. There can be no half-measures that they usually tend to 
> make us slide back into the status quo.
> 
> So, what do you think?

I think that "Restricted IO" would in principle be the more sensible
approach.  HOWEVER, for robust "sandboxing" of untrusted code what's
required is more than just hiding the raw IO Monad from the sandboxed
code.  Doing that correctly is much too difficult, as
evidenced by the ultimate failure (long history of bypass issues) of
similar efforts for enabling restricted execution of untrusted code in
Java (anyone still using Java "applets", or running Flash in their
browser???).

The only way to do this correctly is to provide strong memory separation
between the untrusted code and the TCB.  The only mainstream working
examples of this that I know of are:

* Kernel vs. user space memory separation.

* Tcl's multiple interpreters, where untrusted code runs in
  slave interpreters stripped of most verbs, with aliases
  added to wrappers that call back into the parent interpreter
  for argument validation and restricted execution.

Both systems provide strong memory isolation of untrusted code, only
data passes between the untrusted code and the TCB through a limited
set of callbacks (system calls if you like).

For "Safe Haskell" to really be *safe*, memory access from untrusted
code would need to be "virtualised", with a separate heap and foreign
memory allocator for evaluation of untrusted code, and the RTS rewriting
and restricting all direct memory access.  This means that "peek" and
"poke" et al. would not directly read memory, but rather be restricted
to specific address ranges allocated to the untrusted task.

Essentially the RTS would have to become a user-space microkernel.

This is in principle possible, but it is not clear whether this is worth
doing, given limited resources.

To achieve "safe" execution, restricted code needs to give up some
runtime performance; compile-time safety checks alone are not
sufficiently robust in practice.  For example, the underlying byte
arrays (pinned or not) behind ByteString and Text when used from
untrusted code would not allow access to data beyond the array bounds
(range checked on every access), ...  which again speaks to some
"virtualisation" of memory access by the RTS, at least to the extent of
always performing range checks when running untrusted code.

Bottom line, I don't trust systems like Safe Haskell, or Java's
type-system-based sandboxing of untrusted code, ... that try to perform
sandboxing in a shared address space by essentially static analysis
alone.  We've long left shared-address-space systems such as DOS and
MacOS 9 behind... good riddance.

-- 
Viktor.


Re: DKIM failures for gitlab mail

2022-11-30 Thread Viktor Dukhovni
On Wed, Nov 30, 2022 at 05:33:44PM +0100, Joachim Breitner wrote:

> I noticed that a small number of Gitlab notification emails end up in
> my spamfilter. While there is not much you can do about triggering some
> bayesian style spam filter at my email provider (mailbox.org), I did
> notice this in the headers:
> 
> X-Spam-Status: No, score=2.704 tagged_above=2 required=6
> tests=[DKIM_INVALID=0.1, DKIM_SIGNED=0.1, HS_RSPAMD_10_11=2.5,
> HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001,
> URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
> Authentication-Results: spamfilter01.heinlein-hosting.de (amavisd-new);
> dkim=fail (1024-bit key) reason="fail (bad RSA signature)"
> header.d=gitlab.haskell.org
> DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple;
> d=gitlab.haskell.org;
> s=mail; t=1669733134;
> bh=D0NUcHiskEnwSP99umP3zo8Fz8fl74OgAJ8NRDKCsp4=;
> h=Date:From:Reply-To:To:In-Reply-To:References:Subject:List-Id:
>  List-Unsubscribe;
> b=R+WMLfhRZZdYxMd6K6w+iodDe8EHzwONNArNyboqsU5NnafPRhKZ1UeGxO/BCMvEK
>  M7XHRRrBsPfRYpTph7xSGY427KGXieASVg1GDhAiwKSLBCiqDdkBaoJLLUIfUD02NS
>  ouI3tvQ9mddNdaEK7retq8N+29hzs/ezf9cpgy+Q=

Indeed the signature in "b=" was not made by the key at
mail._domainkey.gitlab.haskell.org.  Running the below:

sig=$(
printf "%s\n%s\n%s\n" \
R+WMLfhRZZdYxMd6K6w+iodDe8EHzwONNArNyboqsU5NnafPRhKZ1UeGxO/BCMvE \
KM7XHRRrBsPfRYpTph7xSGY427KGXieASVg1GDhAiwKSLBCiqDdkBaoJLLUIfUD0 \
2NSouI3tvQ9mddNdaEK7retq8N+29hzs/ezf9cpgy+Q=
)

pkey=$(
dig +short -t txt mail._domainkey.gitlab.haskell.org |
perl -MMIME::Base64 -ne '
/^"v=DKIM1;/ or next;
print decode_base64($1) if m{;\s*p=(\S+?)(?:;|$)}
' |
openssl pkey -pubin -inform DER
)

openssl rsautl -raw -encrypt -pubin \
-inkey <( printf "%s\n" "$pkey" ) \
-in <(printf "%s\n" "$sig" | openssl base64 -d) |
xxd -p

the output is:

509bfc93a492f1b5328308e51624d9a7ed1378861f577b11413c5034bc0c
673d61660434d4bc30844e7648da0f9605923805973a313a8c3bc82215cc
ac447e47551087c544a0592ac3ae48474584bad7d9ca5b850a67493a7977
d28aaa3a9a7580d165dc4f31ff484bdbc40e94a2be1750e71c51c555b5c1
6bc051947bb07ae4

Which is not a PKCS#1 v1.5 padded signature block.  So either the
"b=" value was corrupted in transit, or it was signed by a key
that is different from what is published in DNS.

> but maybe Postfix  is not using the right key?

Strictly speaking that's not Postfix itself, but some DKIM milter, but
nits aside, more likely a stale public key is published in DNS.

-- 
Viktor.


Re: 'Caching' of results of default instance definitions

2022-11-22 Thread Viktor Dukhovni
On Wed, Nov 23, 2022 at 12:28:46PM +1100, Clinton Mead wrote:

> I have a class with a "method" which has a default definition, and that
> default definition has no arguments on the LHS, will a separate "instance"
> of that default definition be created for each instance of that class that
> inherits that default definition? The important consequence of that being
> that the default definition is only computed once per type.

Typically, an instance method will have the instance type variable
present in either one of the parameter types or in the result type,
making it possible to infer at call sites which instance to invoke.
Such methods are polymorphic, and rarely admit a sensible default value.

However, when such a default value is possible, and if you disable
any inlining that might trigger separate per call site evaluation,
then indeed you can get a "once per-type" value.  The below prints
"Foo wuz here" only three times.

Main.hs:
module Main (main) where
import Again
import M

main :: IO ()
main = do
    print $ one : foo
    print $ 'X' : foo
    print $ [one] : foo
    again
  where one = 1 :: Int

Again.hs:
module Again(again) where
import M

{-# NOINLINE again #-}
again :: IO ()
again = do
    print $ one : foo
    print $ 'X' : foo
    print $ [one] : foo
  where one = 1 :: Int

M.hs:
{-# LANGUAGE FlexibleInstances #-}
module M (foo) where
import Debug.Trace

class M a where
    {-# NOINLINE foo #-}
    foo :: [a]
    foo = trace "Foo wuz here" $ []

instance M Int
instance M Char
instance M ([Int])


With `TypeApplications` and `AllowAmbiguousTypes`, you can define
non-polymorphic instance methods that require an explicit type
application at the call site.  In that case, with inlining disabled and
optimisation enabled, the various default `foo @sometype` calls can be
collapsed to a single constant across multiple types.

M.hs:
{-# LANGUAGE AllowAmbiguousTypes, FlexibleInstances #-}
{-# OPTIONS_GHC -O2 #-}
module M (foo) where
import Debug.Trace

class M a where
    {-# NOINLINE foo #-}
    foo :: Int
    foo = trace "Foo wuz here" $ 42

instance M Int
instance M Char
instance M ([Int])

With the above class definition and instance definitions and the print
statements necessarily written with type applications:

print $ foo @Int
print $ foo @Char
print $ foo @[Int]

the trace string is printed just once.

-- 
Viktor.


[Solved via #21974] GHC 9.4 symlink wrappers installed incorrectly on FreeBSD 12

2022-08-10 Thread Viktor Dukhovni
On Wed, Aug 10, 2022 at 10:06:43PM -0400, Viktor Dukhovni wrote:
> I just built a GHC 9.4 bindist on FreeBSD 12, and found that the bindist
> Makefile installed unusable wrappers when the source was a symlink.  It
> seems plausible that the issue may not be FreeBSD-specific, but that's
> all I've tried so far.

It seems I neglected to read the post-release announcement:

Due to an unfortunate packaging issue, the macOS binary
distributions for 9.4.1 are not usable as uploaded. The problem is
described in #21974, which also includes a small patch to mitigate
the breakage. We will be releasing a 9.4.2 within the week fixing
the issue.

Indeed #21974 fixes the issue.

-- 
Viktor.


GHC 9.4 symlink wrappers installed incorrectly on FreeBSD 12

2022-08-10 Thread Viktor Dukhovni
I just built a GHC 9.4 bindist on FreeBSD 12, and found that the bindist
Makefile installed unusable wrappers when the source was a symlink.  It
seems plausible that the issue may not be FreeBSD-specific, but that's
all I've tried so far.

The problem wrappers do not have execute permissions, and lack the
variable settings at the top of the scripts needed to make them work:

$ ls -l ~/.local/ghc-9.4/bin
total 98
-rw-r--r--  1 viktor  viktor    45 Aug 10 21:46 ghc
-rwxr-xr-x  1 viktor  viktor   406 Aug 10 21:46 ghc-9.4.1
-rw-r--r--  1 viktor  viktor    97 Aug 10 21:46 ghc-pkg
-rwxr-xr-x  1 viktor  viktor   466 Aug 10 21:46 ghc-pkg-9.4.1
-rw-r--r--  1 viktor  viktor    67 Aug 10 21:46 ghci
-rwxr-xr-x  1 viktor  viktor   430 Aug 10 21:46 ghci-9.4.1
-rw-r--r--  1 viktor  viktor    57 Aug 10 21:46 haddock
-rwxr-xr-x  1 viktor  viktor   434 Aug 10 21:46 haddock-ghc-9.4.1
-rw-r--r--  1 viktor  viktor    33 Aug 10 21:46 hp2ps
-rwxr-xr-x  1 viktor  viktor   406 Aug 10 21:46 hp2ps-ghc-9.4.1
-rw-r--r--  1 viktor  viktor    33 Aug 10 21:46 hpc
-rwxr-xr-x  1 viktor  viktor   402 Aug 10 21:46 hpc-ghc-9.4.1
-rw-r--r--  1 viktor  viktor   734 Aug 10 21:46 hsc2hs
-rwxr-xr-x  1 viktor  viktor  1109 Aug 10 21:46 hsc2hs-ghc-9.4.1
-rw-r--r--  1 viktor  viktor    50 Aug 10 21:46 runghc
-rwxr-xr-x  1 viktor  viktor   417 Aug 10 21:46 runghc-9.4.1
-rw-r--r--  1 viktor  viktor    50 Aug 10 21:46 runhaskell
-rwxr-xr-x  1 viktor  viktor   425 Aug 10 21:46 runhaskell-9.4.1

$ (cd ~/.local/ghc-9.4; grep . $(find bin ! -name hsc2hs ! -perm -001))
bin/ghci:executable="$bindir/ghc-9.4.1"
bin/ghci:exec $executable --interactive "$@"
bin/ghc:exec "$executablename" -B"$libdir" ${1+"$@"}
bin/haddock:exec "$executablename" -B"$libdir" -l"$libdir" ${1+"$@"}
bin/hp2ps:exec "$executablename" ${1+"$@"}
bin/ghc-pkg:PKGCONF="$libdir/package.conf.d"
bin/ghc-pkg:exec "$executablename" --global-package-db "$PKGCONF" ${1+"$@"}
bin/runghc:exec "$executablename" -f "$exedir/ghc" ${1+"$@"}
bin/hpc:exec "$executablename" ${1+"$@"}
bin/runhaskell:exec "$executablename" -f "$exedir/ghc" ${1+"$@"}

The same holds for "hsc2hs", but the "meat" of the script is longer so I
chose to skip it.

-- 
Viktor.


Re: A macOS static linking mystery

2022-08-08 Thread Viktor Dukhovni
On Mon, Aug 08, 2022 at 09:59:48AM -0400, Viktor Dukhovni wrote:

> On my MacOS laptop I get:
> 
> $ /usr/local/bin/pkg-config --libs libffi
> -lffi
> 
> which does not use the "brew"-installed libffi.  Not surprising, since
> /usr/local/lib/pkgconfig/ has no symlink to the "libffi.pc" file.

When updating "libffi" HomeBrew reports:

==> libffi
libffi is keg-only, which means it was not symlinked into /usr/local,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.

For compilers to find libffi you may need to set:
  export LDFLAGS="-L/usr/local/opt/libffi/lib"
  export CPPFLAGS="-I/usr/local/opt/libffi/include"

If the MacOS libffi works, it is probably safer to use it rather than
the HomeBrew version.

-- 
Viktor.


Re: A macOS static linking mystery

2022-08-08 Thread Viktor Dukhovni
On Mon, Aug 08, 2022 at 07:29:38AM -0400, Ryan Scott wrote:

> An exception to this rule is macOS, however. On macOS, building libffi
> always appears to default to linking against the static version of libffi,
> even when a dynamic version is also available. To reproduce this
> phenomenon, check out libffi [1] and run the following commands:
> 
> $ brew install libffi # If it is not already installed
> $ cabal build ctime
> $ otool -L $(cabal list-bin ctime)

What is the output of

$ pkg-config --libs libffi

on this system?  If "cabal" passes any additional flags to "pkg-config"
use those as well.

On my MacOS laptop I get:

$ /usr/local/bin/pkg-config --libs libffi
-lffi

which does not use the "brew"-installed libffi.  Not surprising, since
/usr/local/lib/pkgconfig/ has no symlink to the "libffi.pc" file.

> This is exceedingly strange, since my Homebrew installation does in fact
> provide libffi.dylib:
> 
> $ ls -alh ~/Software/homebrew/Cellar/libffi/3.4.2/lib/
> [...]
> drwxr-xr-x   3 rscott  1340850540    96B Aug  7 08:51 pkgconfig

For "pkg-config" to find the HomeBrew "libffi", there would need to be
a "libffi.pc" symlink to the one in the "pkgconfig" directory.

Perhaps there are additional steps to perform in HomeBrew to activate
this "libffi" as a default target for "pkg-config".

-- 
Viktor.


Re: Mixed boxed/unboxed arrays?

2022-08-03 Thread Viktor Dukhovni
On Wed, Aug 03, 2022 at 10:35:43PM +0200, J. Reinders wrote:

> I found the mistake:
> 
>compactAdd c k
>p <- anyToPtr k
> 
> Should be:
> 
>p <- anyToPtr . getCompact =<< compactAdd c k
> 
> Otherwise I guess I’m not using the pointer that’s on the compact region.

Correct, I started my reply to your previous message before seeing that
you also found the same error.

-- 
Viktor.


Re: Mixed boxed/unboxed arrays?

2022-08-03 Thread Viktor Dukhovni
On Wed, Aug 03, 2022 at 10:16:50PM +0200, J. Reinders wrote:

> I have an implementation that mostly works here:
> https://github.com/noughtmare/clutter
> in the src/Counter.hs file.
> 
> The only problem is that I get segfaults or internal GHC errors if I
> run it on large files. I’ve added some tracing and it seems to occur
> when I try to coerce back pointers from the hash table array to proper
> Haskell values in the ’toList’ function.

Yes, this is delicate, requiring detailed knowledge of the internals.

> Currently, I’m using the ‘ptrToAny’ and ‘anyToPtr’ functions to do the
> coercing, because that sounds like the safest option.
> 
> Do you know what’s going wrong or do you have a safer design for coercing
> the pointers?

The code at:

https://github.com/noughtmare/clutter/blob/main/src/Counter.hs#L50-L52

looks wrong.  You're ignoring the return value of `compactAdd`, and
coercing the original (non-compact) key to a pointer, but this is liable
to be moved by GC.  You need something like:

    p <- compactAdd c k >>= anyToPtr . getCompact

> I thought it might be because the compact region gets deallocated
> before all the pointers are extracted, but even if I add a ’touch c’
> (where c contains the compact region) at the end it still gives the
> same errors.

Given the issue above, it is too early to speculate along these lines.

It may also turn out that once the code works, it may be no faster or
even much slower than the two-array approach.  Compacting new keys has a
cost, and perhaps that will dominate any speedup from combining the key
and value in the same primitive cell.

-- 
Viktor.


Re: Mixed boxed/unboxed arrays?

2022-08-02 Thread Viktor Dukhovni
On Tue, Aug 02, 2022 at 05:32:58PM +0200, J. Reinders wrote:

> > Could you use `StablePtr` for the keys?
> 
> That might be an option, but I have no idea how performant stable
> pointers are and manual management is obviously not ideal.

If your hash table keys qualify for being stored in a "compact region",
you may not need per-key stable pointers, just (carefully) coercing the
keys to pointers suffices to produce primitive "handles" that are stable
for the lifetime of the "compact region".  The inverse (unsafe) coercion
recovers the key.

This also has the advantage that a large number of keys does not incur a high
ongoing GC cost.  The keys are of course copied into the compact region.

With this you could store "pointer + count" in a primitive cell.  The
hash table then holds a reference to the compact region and compacts
keys on insert.

https://hackage.haskell.org/package/compact-0.2.0.0/docs/Data-Compact.html
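
A minimal sketch of the "compact keys on insert" step, assuming the
`GHC.Compact` module from the `ghc-compact` package that ships with GHC
(rather than the `Data.Compact` module linked above).  The helper names
`newKeyRegion` and `internKey` are made up for illustration, and the
(unsafe) coercion of the compacted key to a raw pointer is deliberately
left out:

```haskell
import GHC.Compact (Compact, compact, compactAdd, getCompact)

-- | Create a region that will hold the hash table keys.
-- (Hypothetical helper, not part of any library.)
newKeyRegion :: IO (Compact ())
newKeyRegion = compact ()

-- | Copy a key into the region and return the copy that lives inside it.
-- It is this copy, not the original key, that stays at a fixed address
-- for the lifetime of the region.
internKey :: Compact b -> String -> IO String
internKey region key = do
    c <- compactAdd region key
    pure $! getCompact c
```

`inCompact region k` (also from `GHC.Compact`) can be used to check that
a value returned by `internKey` really lives in the region before any
pointer to it is stored.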

-- 
Viktor.


Re: Mixed boxed/unboxed arrays?

2022-08-02 Thread Viktor Dukhovni
On Tue, Aug 02, 2022 at 03:31:57PM +0200, J. Reinders wrote:

> I’ve been investigating fast hash table implementations. In particular
> hash tables used for counting unique items. For this use case, I
> believe the most performant hash tables are, in C terms, arrays of
> structures with a (boxed) pointer to the key, which is the item that
> we are counting, and an (unboxed) integer which holds the actual
> count.
> 
> I already know of the ‘vector-hashtables’ package which uses two
> separate arrays, for example one boxed to hold the keys and one
> unboxed to hold the counts. However, I believe it can be quite
> important to store all the elements in the same array as that can
> reduce the number of cache misses. Because with random access to two
> arrays there is a higher chance that there will be two cache misses
> even if it immediately finds the right key in the hash table.

Could you use `StablePtr` for the keys?


https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html

The corresponding `Ptr` can be stored in an unboxed Storable array along
with the count.

This comes at the cost of later having to explicitly free each StablePtr.


https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html#v:freeStablePtr
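
A rough sketch of the round trip, using only `Foreign.StablePtr` and
`Foreign.Ptr` from base.  The helper names (`keyToPtr`, `ptrToKey`,
`freeKey`) are made up for illustration; the array that would hold the
`Ptr ()` next to the count is not shown:

```haskell
import Foreign.Ptr (Ptr)
import Foreign.StablePtr
    ( castPtrToStablePtr
    , castStablePtrToPtr
    , deRefStablePtr
    , freeStablePtr
    , newStablePtr
    )

-- | Pin a key and obtain an opaque pointer that can be stored in an
-- unboxed Storable array.
keyToPtr :: a -> IO (Ptr ())
keyToPtr key = castStablePtrToPtr <$> newStablePtr key

-- | Recover the key.  The caller must ask for the same type that was
-- stored; nothing checks this.
ptrToKey :: Ptr () -> IO a
ptrToKey = deRefStablePtr . castPtrToStablePtr

-- | Release the stable pointer.  The Ptr must not be used afterwards.
freeKey :: Ptr () -> IO ()
freeKey = freeStablePtr . castPtrToStablePtr
```

Every `keyToPtr` needs a matching `freeKey`, otherwise the key is kept
alive indefinitely.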

How does the cost of computing object hashes and comparing colliding
objects compare with the potential cache miss cost of using boxed
integers or a separate array?  Would such an "optimisation" be worth
the effort?

-- 
Viktor.


Re: Git problem

2022-04-06 Thread Viktor Dukhovni
On Wed, Apr 06, 2022 at 10:55:09PM +0100, Simon Peyton Jones wrote:

> I see this
> bash$ git status
> On branch wip/romes/ttg-splices-improvements
> Your branch is up to date with 'origin/wip/romes/ttg-splices-improvements'.
> 
> modified   libraries/Cabal
> +Subproject commit d638e33dbc056048b393964286c7fe394b2730d7-dirty
> modified   libraries/unix
> +Subproject commit 1f72ccec55c1b61299310b994754782103a617f5-dirty
> 
> How can I get my submodules in sync with this branch?

( cd libraries/Cabal && { git clean -xdf .; git checkout .; } )
( cd libraries/unix && { git clean -xdf .; git checkout .; } )

-- 
Viktor.


Re: convention around pattern synonyms

2021-12-30 Thread Viktor Dukhovni
On Thu, Dec 30, 2021 at 04:46:29PM +, Richard Eisenberg wrote:

> I agree that this kind of backward-compatibility pattern synonym is
> good and shouldn't be prefixed with PS_.
> 
> But do you have a concrete example of this leakage of an internal GHC
> type via TH? While I can imagine this happening, I don't know of any
> examples in practice. Note that even enumeration types (like Role)
> have separate TH counterparts.

Perhaps my assumption that TH types directly mirror the internal AST is
not correct...  A recent user-visible change is in `ConP`

https://github.com/nikita-volkov/contravariant-extras/pull/9

-- 
Viktor.


Re: convention around pattern synonyms

2021-12-29 Thread Viktor Dukhovni
Some "GHC-internal" types leak to users via TH, and their constructors
occasionally pick up new fields, causing breakage downstream.  The extra
field often has a sensible default (Nothing, [], ...) and it should be
best practice to rename the constructor when adding the new field, while
replacing the original constructor with a pattern synonym with the "old"
signature.

    data Foo = ...
             | NewImprovedMkFoo X Y Z -- was MkFoo Y Z

    pattern MkFoo :: Y -> Z -> Foo
    pattern MkFoo y z = NewImprovedMkFoo Nothing y z

When pattern synonyms are used to maintain a backwards-compatible API,
there should of course be no special signalling to differentiate them
from "real" constructors.
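
A self-contained sketch of the idea, with made-up types loosely modelled
on the `ConP` change (none of the names below are the real TH ones).
Unlike the outline above it uses an explicitly bidirectional synonym, so
that matching with the old name ignores the new field rather than
requiring it to be empty; that is one possible design choice, not the
only one:

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Hypothetical AST type; the type-argument list is the "new" field.
data Pat
    = VarP String
    | NewConP String [String] [Pat]   -- was: ConP String [Pat]
    deriving (Eq, Show)

-- The old constructor name, kept alive as a pattern synonym.
pattern ConP :: String -> [Pat] -> Pat
pattern ConP name args <- NewConP name _ args   -- matching ignores the new field
  where
    ConP name args = NewConP name [] args       -- building fills in the default

-- "Downstream" code written against the old API still compiles unchanged.
arity :: Pat -> Int
arity (ConP _ args) = length args
arity _             = 0

justX :: Pat
justX = ConP "Just" [VarP "x"]
```

Here `arity justX` is 1, and `justX` is equal to
`NewConP "Just" [] [VarP "x"]`.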

The boundary between "GHC-internal" and external may not always be
obvious, some care is required to reduce leaking breakage via TH.

-- 
Viktor.


Re: [EXTERNAL] Unexpected duplicate join points in "Core" output?

2021-11-24 Thread Viktor Dukhovni
On Wed, Nov 24, 2021 at 06:32:04PM -0500, Viktor Dukhovni wrote:

> > Yes exactly. And it would not be hard to adapt the existing CSE pass
> > to support this.  Just needs doing.
> > 
> > A ticket and a repo case would be really helpful.
> 
> I'll do my best to construct a standalone reproducer that is not mired
> in ByteString code.  The ByteString example should not be too difficult
> to mimic in code that relies only on base.

Just noticed a complication: it seems that the placement of the IO state
token in the join point argument list is non-deterministic, so I'm
starting to see join points in which the argument lists are permuted,
with an equivalent permutation at the jump/call site... :-(

Two exit points returning equivalent data, the first returns early,
the second returns after first performing some I/O:

return $ Result valid acc (ptr `minusPtr` start)

become respectively (ipv2 and w3 are IO state tokens):

1. jump exit2 ww4 ww5 valid ipv2
   -- acc ptr valid s#
2. jump exit3 ww4 ww5 w3 valid
   -- acc ptr s# valid

So the join points are then only alpha equivalent up to argument
permutation:

  join {
    exit2 :: Word# -> Addr# -> Bool -> State# RealWorld -> Maybe (Int, ByteString)
    exit2 (ww4 :: Word#) (ww5 :: Addr#) (valid :: Bool) (ipv2 :: State# RealWorld)
      = ...

  join {
    exit3 :: Word# -> Addr# -> State# RealWorld -> Bool -> Maybe (Int, ByteString)
    exit3 (ww4 :: Word#) (ww5 :: Addr#) (w2 :: State# RealWorld) (valid :: Bool)
      = ...

I don't know how argument lists to join points are ordered, would it be
possible to make them predictably consistent?

-- 
Viktor.


Re: [EXTERNAL] Unexpected duplicate join points in "Core" output?

2021-11-24 Thread Viktor Dukhovni
On Wed, Nov 24, 2021 at 11:14:00PM +, Simon Peyton Jones via ghc-devs wrote:

> | For two join points to be duplicates they need to not only be alpha
> | equivalent but to also have the same continuation.  
> 
> Yes exactly. And it would not be hard to adapt the existing CSE pass
> to support this.  Just needs doing.
> 
> A ticket and a repo case would be really helpful.

I'll do my best to construct a standalone reproducer that is not mired
in ByteString code.  The ByteString example should not be too difficult
to mimic in code that relies only on base.

Though I might still have to use Foreign.Storable and Foreign.Ptr and
some sort of unsafePerformIO variant in there, so that I get essentially
the same basic structure of inlining and join points.

I guess I'll try removing excess baggage while the basic structure
persists, and ideally end up with something small enough.

-- 
Viktor.


Re: [EXTERNAL] Unexpected duplicate join points in "Core" output?

2021-11-24 Thread Viktor Dukhovni
On Sun, Nov 21, 2021 at 06:53:53AM -0500, Carter Schonwald wrote:

> On Sat, Nov 20, 2021 at 4:17 PM Simon Peyton Jones via ghc-devs <
> ghc-devs@haskell.org> wrote:
> 
> > There is absolutely no reason not to common-up those two join points.  But
> > we can't common up some join points when we could if they were let's.
> > Consider
> >
> > join j1 x = x+1
> > in case v of
> >   A -> f (join j2 x = x+1 in ...j2...)
> >   B -> j1...
> >   C -> j1...
> >
> > Even though j2 is identical to j1's, we can't eliminate j2 in favour of j1
> > because then j1 wouldn't be a join point any more.
>
> In this example: why would it stop being a join point ?
> 
> Admittedly, my intuition might be skewed by my own ideas about how
> join points are sortah a semantic special case of other constructs.

I think the point is that join points are tail calls that don't return
to the caller.  But here even though `j1` and `j2` have the same body
j1's continuation is not the same as j2's continuation.

Rather the result of `j2` is the input to `f`, but the result of j1 is a
possible output of the whole `case` block in the B and C branches.  For
two join points to be duplicates they need to not only be alpha
equivalent but to also have the same continuation.  Something like

join j1 x = x + 1 in
join j2 y = y + 1 in
... j1 ...
... j2 ...

where eliminating j2 in favour of j1 should be correct.
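
A source-level illustration of the distinction (just a sketch; whether
GHC actually turns either `j` into a join point, or simply inlines it,
depends on the optimiser and would need checking with -ddump-simpl).
The names are invented for the example:

```haskell
data V = A | B | C

-- Every call to 'j' is in tail position of the case expression, so all
-- calls share one continuation: whatever consumes the result of the
-- whole function body.
sameContinuation :: V -> Int -> Int
sameContinuation v x =
    let j y = y + 1
    in case v of
         A -> j x
         B -> j (x * 2)
         C -> j (x * 3)

-- In branch A the result of 'j' is fed to 'negate' first, so that call
-- has a different continuation from the calls in branches B and C.
mixedContinuation :: V -> Int -> Int
mixedContinuation v x =
    let j y = y + 1
    in case v of
         A -> negate (j x)
         B -> j (x * 2)
         C -> j (x * 3)
```

In the first function all uses of `j` are candidates for jumps; in the
second the call under `negate` has to return to its caller, which is the
situation Simon's `f (join j2 ...)` example describes.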

-- 
Viktor.


Re: [EXTERNAL] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Viktor Dukhovni
On Sat, Nov 20, 2021 at 09:15:15PM +, Simon Peyton Jones via ghc-devs wrote:

> GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any*
> join point.  It would not be hard to make it clever enough to CSE join
> points, but no one has yet done it.
> 
> Do open a ticket!

Thanks, I opened https://gitlab.haskell.org/ghc/ghc/-/issues/20717

-- 
Viktor.


Re: [Take 2] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Viktor Dukhovni
On Sat, Nov 20, 2021 at 01:54:36PM -0500, Viktor Dukhovni wrote:

> Is there some way for GHC to figure out to not float out such cheap
> computations?  The 'Result' constructor is strict, so there's no cost to
> evaluating `used > 0`, and cloning the entire computation is I think
> the more unfortunate choice...

I managed to get the loop to not emit duplicate code bloat by
inserting another NOINLINE term:

!keepGoing = acc < q || acc == q && d <= r
{-# NOINLINE keepGoing #-}

Thus the below produces Core with no significant bloat, matching roughly
what one might (reasonably?/naively?) expect.  But I am reluctant to
actually include such work-arounds in the PR, the code that produces
more "bloated" Core is easier to understand and maintain...

    _digits :: Accum -> Accum -> BI.ByteString -> Accum -> Result
    {-# INLINE _digits #-}
    _digits !q !r !(BI.BS !fp !len) = \ !acc ->
        BI.accursedUnutterablePerformIO $
            BI.unsafeWithForeignPtr fp $ \ptr -> do
                let end = ptr `plusPtr` len
                go ptr end ptr acc
      where
        go start end = loop
          where
            loop !ptr !acc | ptr == end
                = return $ Result (ptr `minusPtr` start) acc
            loop !ptr !acc =
                getDigit >>= \ !d ->
                    if | d <= 9    -> update d
                       | otherwise -> return $ Result (ptr `minusPtr` start) acc
              where
                fromDigit = \w -> fromIntegral w - 0x30 -- i.e. w - '0'
                --
                {-# NOINLINE getDigit #-}
                getDigit | ptr /= end = fromDigit <$> peek ptr
                         | otherwise  = pure 10  -- End of input
                --
                update d
                    | keepGoing = loop (ptr `plusPtr` 1) (acc * 10 + d)
                    | otherwise = return Overflow
                  where
                    {-# NOINLINE keepGoing #-}
                    !keepGoing = acc < q || acc == q && d <= r

The resulting Core is shown below; the duplicate comparison is the only
remaining visible inefficiency.

-- The exit/exit3 joins could be combined but are small,
-- ditto with exit1/exit2.

Rec {
-- RHS size: {terms: 190, types: 146, coercions: 0, joins: 8/10}
$wconsume
  :: ByteString -> Int# -> Word# -> Maybe (Word64, ByteString)
$wconsume
  = \ (w :: ByteString) (ww :: Int#) (ww1 :: Word#) ->
      case w of wild {
        Empty ->
          case ww of {
            __DEFAULT -> Just (W64# ww1, Empty);
            0# -> Nothing
          };
        Chunk dt dt1 dt2 cs ->
          let {
            end :: Addr#
            end = plusAddr# dt dt2 } in
          join {
            $s$j
              :: Int# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString)
            $s$j (sc :: Int#) (sc1 :: Word#) (sc2 :: State# RealWorld)
              = case touch# dt1 sc2 of { __DEFAULT ->
                case ==# sc dt2 of {
                  __DEFAULT ->
                    case ># sc 0# of {
                      __DEFAULT ->
                        case ww of {
                          __DEFAULT -> Just (W64# sc1, wild);
                          0# -> Nothing
                        };
                      1# -> Just (W64# sc1, Chunk (plusAddr# dt sc) dt1 (-# dt2 sc) cs)
                    };
                  1# -> $wconsume cs (orI# ww sc) sc1
                }
                } } in
          join {
            exit
              :: Addr# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString)
            exit (ww2 :: A

Re: [Take 2] Unexpected duplicate join points in "Core" output?

2021-11-20 Thread Viktor Dukhovni
On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote:

> For the assembly I opened a ticket:
> https://gitlab.haskell.org/ghc/ghc/-/issues/20714

Thanks, much appreciated.  Understood re redundant join points, though
in the non-toy context the redundant join point code is noticeably larger.

    join {
      exit4
        :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString)
      exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld)
        = case touch# dt1 ipv of { __DEFAULT ->
          let {
            dt3 :: Int#
            dt3 = minusAddr# ww4 dt } in
          case ==# dt3 dt2 of {
            __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
            1# -> jump $wconsume cs (orI# ww2 dt3) ww5
          }
          } } in
    join {
      exit5
        :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString)
      exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld)
        = case touch# dt1 w1 of { __DEFAULT ->
          let {
            dt3 :: Int#
            dt3 = minusAddr# ww4 dt } in
          case ==# dt3 dt2 of {
            __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
            1# -> jump $wconsume cs (orI# ww2 dt3) ww5
          }
          } } in

FWIW, these don't appear to be deduplicated, both result from the same
conditional: `acc < q || acc == q && d < 5`.  I need some way to make
this compute a single boolean value without forking the continuation.

There's another source of code bloat that I'd like to run by you...
In the WIP code for Lazy ByteString 'readInt', I started with:

    readInt !q !r =
        \ !s -> consume s False 0
      where
        -- All done
        consume s@Empty !valid !acc
            = if valid then convert acc s else Nothing
        -- skip empty chunk
        consume (Chunk (BI.BS _ 0) cs) !valid !acc
            -- Recurse
            = consume cs valid acc
        -- process non-empty chunk
        consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
            = case _digits q r c acc of
                Result used acc'
                    | used <= 0 -- No more digits present
                      -> if valid then convert acc' s else Nothing
                    | used < len -- valid input not entirely digits
                      -> let !c' = BU.unsafeDrop used c
                          in convert acc' $ Chunk c' cs
                    | otherwise -- try to read more digits
                      -- Recurse
                      -> consume cs True acc'
                Overflow -> Nothing

Now _digits is the I/O loop I shared before, and the calling code gets
inlined into that recursive loop with various join points.  But the loop
gets forked into multiple copies which are compiled separately, because
there are two different recursive calls into "consume" that got compiled
into separate "joinrec { ... }".

So I tried instead:

    readInt !q !r =
        \ !s -> consume s False 0
      where
        -- All done
        consume s@Empty !valid !acc
            = if valid then convert acc s else Nothing
        consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
            = case _digits q r c acc of
                Result used acc'
                    | used == len -- try to read more digits
                      -- Recurse
                      -> consume cs (valid || used > 0) acc'
                    | used > 0 -- valid input not entirely digits
                      -> let !c' = BU.unsafeDrop used c
                          in convert acc' $ Chunk c' cs
                    | otherwise -- No more digits present
                      -> if valid then convert acc' s else Nothing
                Overflow -> Nothing

But I was slightly surprised to find even more duplication (3 copies
instead of two) of the I/O loop, because in the call:

consume cs (valid || used > 0) acc'

the boolean argument got floated out, giving:

    case valid of
        True -> consume cs True acc'
        _    -> case used > 0 of
            True -> consume cs True acc'
            _    -> consume cs False acc'

and each of these then generates essentially the same code.  To get the
code to be emitted just once, I had to switch from a Bool "valid" to a
bitwise "valid":

    readInt !q !r =
        \ !s -> consume s 0 0
      where
        -- All done
        consume s@Empty !valid !acc
            = if valid /= 0 then convert acc s else Nothing
        consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
            = case _digits q r c acc of
                Result used acc'
                    | used == len -- try to read more digits
                      -- Recurse
                      -> consume cs (valid .|. used) acc'
                    | used > 0 -- valid input
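
The switch from a Bool flag to a bitwise flag can be looked at in
isolation.  The sketch below is not the ByteString code; it only shows
that, for non-negative "used" counts, OR-ing the counts together and
testing for non-zero at the end gives the same answer as the
`valid || used > 0` accumulation.  The function names are made up:

```haskell
import Data.Bits ((.|.))
import Data.List (foldl')

-- | Bool accumulator: the branching form that got expanded into
-- separate calls.
sawDigitsBool :: [Int] -> Bool
sawDigitsBool = foldl' (\valid used -> valid || used > 0) False

-- | Bitwise accumulator: a single data flow, no branching in the loop.
-- Only equivalent to the Bool version when every count is >= 0.
sawDigitsBits :: [Int] -> Bool
sawDigitsBits useds = foldl' (.|.) 0 useds /= 0
```

For example both functions return True on `[0, 0, 3, 0]` and False on
`[0, 0]`.
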

[Take 2] Unexpected duplicate join points in "Core" output?

2021-11-19 Thread Viktor Dukhovni
[ Sorry wrong version of attachment in previous message. ]

The below "Core" output from "ghc -O2" (9.2/8.10) for the attached
program shows seemingly redundant join points:

  join {
exit :: State# RealWorld -> (# State# RealWorld, () #)
exit (ipv :: State# RealWorld) = jump $s$j ipv } in

  join {
exit1 :: State# RealWorld -> (# State# RealWorld, () #)
exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in

that are identical in all but name.  These correspond to fallthrough
to the "otherwise" case in:

   ...
   | acc < q || (acc == q && d <= 5)
 -> loop (ptr `plusPtr` 1) (acc * 10 + d)
   | otherwise -> return Nothing

but it seems that the generated X86_64 code (also below) ultimately
consolidates these into a single target... Is that why it is harmless to
leave these duplicated in the generated "Core"?
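
For readers without the attachment, here is a pure sketch of the same
overflow test on a plain String (this is not the attached program, which
works on raw memory; `readWord64` is a made-up name).  With
q = maxBound `div` 10 and r = maxBound `mod` 10, another digit "d" fits
exactly when acc < q, or acc == q and d <= r; for Word64 r is 5:

```haskell
import Data.Char (isDigit, ord)
import Data.Word (Word64)

-- | Parse a leading run of decimal digits into a Word64, returning the
-- value and the unparsed remainder.  Gives Nothing if there are no
-- digits or if the value would overflow.
readWord64 :: String -> Maybe (Word64, String)
readWord64 = go False 0
  where
    q, r :: Word64
    q = maxBound `div` 10
    r = maxBound `mod` 10

    go :: Bool -> Word64 -> String -> Maybe (Word64, String)
    go _ acc (c:cs)
      | isDigit c =
          let d = fromIntegral (ord c - ord '0')
          in if acc < q || (acc == q && d <= r)
               then go True (acc * 10 + d) cs
               else Nothing                    -- would overflow
    go seen acc rest
      | seen      = Just (acc, rest)           -- stop at first non-digit
      | otherwise = Nothing                    -- no digits at all
```

The two ways of failing the guard (acc > q, or acc == q with d > r) are
what show up as the two "exit" join points in the Core.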

[ Separately, in the generated machine code, it'd also be nice to avoid
  comparing the same "q" with the accumulator twice.  A single load and
  compare should I think be enough, as I'd expect the status flags to
  persist across the jump to the second test.

  This happens to not be performance critical in my case, because most
  calls should satisfy the first test, but generally I think that 3-way
  "a < b", "a == b", "a > b" branches ideally avoid comparing twice... ]

 Associated Core output

-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
main2 :: Addr#
main2 = "12345678901234567890 junk"#

-- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
  = \ (eta :: State# RealWorld) ->
      let {
        end :: Addr#
        end = plusAddr# main2 25# } in
      join {
        $s$j :: State# RealWorld -> (# State# RealWorld, () #)
        $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in
      join {
        exit :: State# RealWorld -> (# State# RealWorld, () #)
        exit (ipv :: State# RealWorld) = jump $s$j ipv } in
      join {
        exit1 :: State# RealWorld -> (# State# RealWorld, () #)
        exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
      join {
        exit2
          :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #)
        exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# RealWorld)
          = case eqAddr# ww main2 of {
              __DEFAULT ->
                hPutStr2
                  stdout
                  (++
                     $fShowMaybe1
                     (case $w$cshowsPrec3 11# (integerFromWord# ww1) [] of
                      { (# ww3, ww4 #) ->
                      : ww3 ww4
                      }))
                  True
                  eta;
              1# -> jump $s$j ipv
            } } in
      joinrec {
        $wloop
          :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #)
        $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld)
          = join {
              getDigit :: State# RealWorld -> (# State# RealWorld, () #)
              getDigit (eta1 :: State# RealWorld)
                = case eqAddr# ww end of {
                    __DEFAULT ->
                      case readWord8OffAddr# ww 0# eta1 of { (# ipv, ipv1 #) ->
                      let {
                        ipv2 :: Word#
                        ipv2 = minusWord# (word8ToWord# ipv1) 48## } in
                      case gtWord# ipv2 9## of {
                        __DEFAULT ->
                          case ltWord# ww1 1844674407370955161## of {
                            __DEFAULT ->
                              case ww1 of {
                                __DEFAULT -> jump exit ipv;
                                1844674407370955161## ->
                                  case leWord# ipv2 5## of {
                                    __DEFAULT -> jump exit1 ipv;
                                    1# ->
                                      jump $wloop
                                        (plusAddr# ww 1#)
                                        (plusWord# 18446744073709551610## ipv2)
                                        ipv
                                  }
                              };
                            1# ->
                              jump $wloop
                                (plusAddr# ww 1#) (plusWord# (timesWord# ww1 10##) ipv2) ipv
                          };
                        1# -> jump exit2 ww ww1 ipv
                      }
                      };
                    1# -> jump exit2 ww ww1 eta1
                  } } in
            jump getDigit w; } in
      jump $wloop main2 0## realWorld#

 Executable

Re: Documenting GHC: blogs, wiki pages, Notes, Haddocks, etc

2021-09-14 Thread Viktor Dukhovni
> On 14 Sep 2021, at 8:29 am, Hécate  wrote:
> 
> I may have missed an episode or two here but what prevents us from writing 
> Notes as Named Chunks¹, write them where Haddock expects you to put 
> documentation, and refer to them from the relevant spot in the code?
> Viktor (in CC) has done a wonderful work at producing nice layouts for 
> Haddocks in base, and we could learn a couple of lessons from his MRs.

Thanks for the callout.  My contribution to the documentation has thus far been
limited to just Data.Foldable and Data.Traversable, though I was hoping that
the approach might catch on if others find it a step in the right direction.

Specific content aside, in terms of haddock techniques, the main thing I
did was to append a more expansive prose overview of a library module below
the function synopses.  This did not require anything fancy, just `$section`
references from the module header.
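
Roughly, the layout looks like the sketch below.  The module, its
function and the chunk name `$overview` are invented for illustration;
only the named-chunk mechanism itself (a `-- $name` entry in the export
list, with the matching `-- $name` comment block elsewhere in the
module) is standard Haddock:

```haskell
-- | Utilities for counting things.
module Counting
    ( -- * Functions
      countTrue
      -- * Overview
      -- $overview
    ) where

-- | Count the elements that satisfy a predicate.
countTrue :: (a -> Bool) -> [a] -> Int
countTrue p = length . filter p

-- $overview
--
-- Longer prose about the module goes here.  Because the chunk is
-- referenced after the functions in the export list, Haddock renders it
-- below the function synopses rather than at the top of the page.
```

The chunk body can live anywhere in the source file; its position in the
rendered page is decided by where the export list mentions it.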

However I also needed to occasionally create hyperlinks within the overview,
and here I ran into a limitation.  Haddock renders a hyperlink to a particular
anchor section of a module as:

Module.Name

with no syntax to customise the user-friendly text.  This means that one
is forced into some linguistic contortions to create natural sentences
with the desired hyperlinks.

This is particularly tricky when the hyperlink will appear not only in
the prose of a module's overview, but also in the synopsis of a function
or a class that may be re-exported by another module (e.g. the Prelude).

It would ideally be possible to render the hyperlink differently in its
"home" module than in a re-exporting module.

Otherwise, I found anchors and hyperlinks to be largely usable...

-- 
Viktor.

___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


Re: Failed to build latest stable GHC on FreeBSD with Hadrian

2021-08-27 Thread Viktor Dukhovni
On Fri, Aug 27, 2021 at 07:15:26PM +0200, Alexis Praga wrote:

> As a complete beginner in regards to GHC, I tried to build GHC 9.2 as it
> looked like the latest stable from git. I failed to build 9.0.1 before that.
> 
> After checking out the ghc-9.2 branch, I ran (following the wiki):
> 
> > ./boot
> > set LOCALBASE=/usr/local
> > ./configure --with-gmp-includes=$LOCALBASE/include 
> > --with-gmp-libraries=$LOCALBASE/lib --disable-large-address-space
> > hadrian/build -j
> 

The attached script works for me on FreeBSD 12.2.  Perhaps it'll
work for you as well (you may need to tweak some of the configured
paths).

-- 
Viktor.


build.sh
Description: Bourne shell script


Re: primitive (byte) string literal with length?

2021-08-25 Thread Viktor Dukhovni
On Wed, Aug 25, 2021 at 07:05:58PM +0300, Oleg Grenrus wrote:

> The newer proposal [1] is tagged as "needs revision". It doesn't
> include (# Int#, Addr# #), but those are easy to get from ByteArray#
> which has negligible overhead.
> [...]
> [1] https://github.com/ghc-proposals/ghc-proposals/pull/292

Yes, ByteArray# literals would work just as well for my needs.

The one thing that's missing, from the proposed variants:

    Rather than adding new syntax, this proposal leverages an existing
    GHC extension: QuasiQuotes. Rather than using TemplateHaskell, these
    quasiquoters would be built in to the compiler. Here are some
    examples of ByteArray# literals under this scheme:

        [octets|fe01bce8|] -- ByteArray# (four bytes)
        [utf8|Araña|]      -- ByteArray# (UTF-8)
        [utf16|Araña|]     -- ByteArray# (UTF-16, native endian)
        [utf16le|Araña|]   -- ByteArray# (UTF-16, little endian)
        [utf16be|Araña|]   -- ByteArray# (UTF-16, big endian)

is a syntax for octet-strings that does not force hex encoding of every
byte, thus something along the lines of:

[octetstr|foo%A0bar|] -- ByteArray# (seven bytes)

The "%hh" hex octet could be "\hh" or "\xhh", ... whatever is deemed
sufficiently natural/readable (perhaps "foo\xA0\&bar" for consistency
with Haskell strings?).  The "\xhh" form would be familiar to Python
users:

>>> x = b'foo\xA0bar'
>>> len(x)
7
>>> x[3]
160

So, I support the proposal, even though quasi-quoters are more bulky
than "somebytes"##, they have the advantage of supporting multiple
variant formats.  I might be tempted to use "octets" for the non-hex
form with "%" or other escapes, and "hexstr" (or similar) for the hex
form.
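
To make the intended reading of the "%hh" form concrete, here is a rough sketch of a run-time decoder in ordinary Haskell. The name `decodeOctets` is hypothetical and not part of the proposal; a built-in quasiquoter would presumably do the equivalent work at compile time and yield a ByteArray# rather than a list.

```haskell
import Data.Char (digitToInt, isHexDigit, ord)
import Data.Word (Word8)

-- | Decode a string in which "%hh" denotes the octet with hex value hh
-- and every other character must be a single-byte code point (< 256).
-- Returns Nothing on a malformed escape or an out-of-range character.
decodeOctets :: String -> Maybe [Word8]
decodeOctets [] = Just []
decodeOctets ('%' : h : l : rest)
    | isHexDigit h && isHexDigit l =
        (fromIntegral (16 * digitToInt h + digitToInt l) :) <$> decodeOctets rest
decodeOctets ('%' : _) = Nothing   -- "%" not followed by two hex digits
decodeOctets (c : rest)
    | ord c < 256 = (fromIntegral (ord c) :) <$> decodeOctets rest
    | otherwise   = Nothing

main :: IO ()
main = print (length <$> decodeOctets "foo%A0bar")   -- prints: Just 7
```

A literal "%" would need its own escape ("%25" in this sketch), which is one reason the "\xhh" spelling may be the more natural choice.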

-- 
Viktor.


Re: primitive (byte) string literal with length?

2021-08-25 Thread Viktor Dukhovni
On Tue, Aug 24, 2021 at 09:03:30AM -0400, Viktor Dukhovni wrote:

I originally wrote:

> > >Is there any GHC syntax for constructing a primitive string literal
> > >with a known (not hand coded) byte count?
> > >With `"some bytes"#` I get just the `Addr#` pointer, but not the size.
> > >
> > >If there's nothing available, would it be reasonable to introduce a new
> > >syntax?
> > >Perhaps:
> > >
> > >   "some bytes"## :: (# Addr#, Int# #)

But neglected to mention that I knew about `cstringLength#`, but found
it wanting, because it does not support octet-strings with embedded NUL
characters:

> Sadly, that does not work when the primitive octet string contains
> internal NUL bytes.
> 
> λ> :set -package ghc-prim
> λ> :set -XMagicHash
> λ> import GHC.CString
> λ> import GHC.Int
> λ>
> λ> I# (cstringLength# "foobar\xa0"#)
> 7
> λ> I# (cstringLength# "foo\0bar\xa0"#)
> 3

If there isn't some other extant work-around, I'd welcome any feedback on
my proposal of a new syntax for a primitive unboxed (address, length) pair:

"some bytes"## :: (# Addr#, Int# #)

-- 
Viktor.


Re: primitive (byte) string literal with length?

2021-08-24 Thread Viktor Dukhovni
On Tue, Aug 24, 2021 at 08:48:53AM +0200, Sylvain Henry wrote:

> On 24 Aug 2021 at 06:34, Viktor Dukhovni wrote:
> >
> >Is there any GHC syntax for constructing a primitive string literal
> >with a known (not hand coded) byte count?
> >With `"some bytes"#` I get just the `Addr#` pointer, but not the size.
> >
> >If there's nothing available, would it be reasonable to introduce a new
> >syntax?
> >Perhaps:
> >
> > "some bytes"## :: (# Addr#, Int# #)
>
> You can use cstringLength# which has constant-folding rules for
> literals. That's what we use in GHC to build FastString literals.

Sadly, that does not work when the primitive octet string contains
internal NUL bytes.

λ> :set -package ghc-prim
λ> :set -XMagicHash
λ> import GHC.CString
λ> import GHC.Int
λ>
λ> I# (cstringLength# "foobar\xa0"#)
7
λ> I# (cstringLength# "foo\0bar\xa0"#)
3
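
For comparison, the only approach that clearly works with embedded NULs is to count the bytes by hand, which is exactly what the question is trying to avoid. The sketch below assumes the ghc-prim package is visible (as in the GHCi session above); the helper `octetsAt` is hypothetical.

```haskell
{-# LANGUAGE MagicHash #-}
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import GHC.CString (cstringLength#)
import GHC.Exts (Addr#, Int (I#), Ptr (Ptr))

-- | Read exactly n octets starting at a primitive string literal.
-- The caller supplies n; nothing checks it against the literal.
octetsAt :: Addr# -> Int -> IO [Word8]
octetsAt addr n = peekArray n (Ptr addr)

main :: IO ()
main = do
    -- NUL-terminated length: stops at the embedded NUL.
    print (I# (cstringLength# "foo\0bar\xa0"#))
    -- Hand-counted length: sees all eight octets, but is easy to get wrong.
    octetsAt "foo\0bar\xa0"# 8 >>= print
```

If the hand-written 8 is ever out of step with the literal, the read either truncates the data or runs past its end.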

-- 
Viktor.


primitive (byte) string literal with length?

2021-08-23 Thread Viktor Dukhovni

Is there any GHC syntax for constructing a primitive string literal with a 
known (not hand coded) byte count?
With `"some bytes"#` I get just the `Addr#` pointer, but not the size.

If there's nothing available, would it be reasonable to introduce a new syntax?
Perhaps:

"some bytes"## :: (# Addr#, Int# #)

-- 
Viktor.



Re: Trying to speedup GHC compile times...Help!

2021-07-02 Thread Viktor Dukhovni
On Fri, Jul 02, 2021 at 08:08:39AM +, Simon Peyton Jones via ghc-devs wrote:

> I strongly urge you to keep a constantly-updated status wiki page,
> which lists the ideas you are working on, and points to relevant
> resources and tickets.  An email thread like this is a good way to
> gather ideas, but NOT a good way to organise and track them.

I remain curious as to whether "Scrap your type applications" is worth a
second look.  There are edge cases in which compile time blowup is a
result of type blowup (as opposed to code blowup via inlining).  Might
GHC have changed enough in the last ~5 years to make it now "another
compiler":

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/if.pdf

(Section 4.4):

Overall, allocation decreased by a mere 0.1%. The largest reduction was
4%, and the largest increase was 12%, but 120 of the 130 modules showed a
change of less than 1%. Presumably, the reduction in work that arises
from smaller types is balanced by the additional overheads of SystemIF.
On this evidence, the additional complexity introduced by the new
reduction rules does not pay its way. Nevertheless, these are matters
that are dominated by nitty-gritty representation details, and the
balance might well be different in another compiler.

Could it be that some of the more compile time intensive packages on hackage
(aeson, vector, ...) would benefit more than the various modules in base?

Wild speculation aside, of course finding and fixing inefficiencies in
the implementation of existing common primitives should be a win across
the board, and should not require changing major compiler design
features, just leaner code.

-- 
Viktor.


Re: GHC and the future of Freenode

2021-06-06 Thread Viktor Dukhovni
On 19 May 2021, at 11:48 am, Carter Schonwald  
wrote:

> I personally vote for irc.  Perhaps via Libera. 

Perhaps I'm too much of an IRC noob, but I still found it rather
surprising to be banned from libera.chat (my IP is blacklisted) for
pasting a 25-line build script for building GHC via hadrian on FreeBSD
into the #ghc channel.

This was in response to a discussion about issues with the bindist,
how the port is built, ... and while perhaps I'm expected to use a
paste bin, the abrupt ban was rather a harsh response.

The ban appears to have been "temporary", an hour or so later I am
able to reconnect, but this does not leave a good impression.

-- 
Viktor.


Re: ghcup failed

2021-06-02 Thread Viktor Dukhovni
On Wed, Jun 02, 2021 at 07:06:59PM +, Simon Peyton Jones via ghc-devs wrote:

> "/home/simonpj/.ghcup/ghc/8.10.4/lib/ghc-8.10.4/bin/ghc-pkg" --force 
> --global-package-db 
> "/home/simonpj/.ghcup/ghc/8.10.4/lib/ghc-8.10.4/package.conf.d" update 
> rts/dist/package.conf.install
> 
> ghc-pkg: Couldn't open database 
> /home/simonpj/.ghcup/ghc/8.10.4/lib/ghc-8.10.4/package.conf.d for 
> modification: {handle: 
> /home/simonpj/.ghcup/ghc/8.10.4/lib/ghc-8.10.4/package.conf.d/package.cache.lock}:
>  hLock: invalid argument (Invalid argument)

With WSL2, what sort of filesystem is /home/?  Is it a native
Linux filesystem (ext4, btrfs, ...) over a block device, or is it NTFS
(either shared with Windows or dedicated for WSL2)?

The "hLock" function in GHC.IO.Handle.Lock makes use of a "lockImpl"
handle that on Linux typically expects to find working support for the
sane "open file descriptor locking", which avoids historical POSIX lock
breakage by using:

   F_OFD_SETLK
   F_OFD_SETLKW
   F_OFD_GETLK

The supporting code is in GHC.IO.Handle.Lock.LinuxOFD.  It appears that

   F_OFD_SETLKW

is failing on WSL2 with EINVAL.  It is not clear whether the issue is
lack of support for F_OFD_SETLKW in the fcntl(2) implementation, or
something about the structure that's passed to acquire the lock:

    instance Storable FLock where
        sizeOf _ = #{size struct flock}
        alignment _ = #{alignment struct flock}
        poke ptr x = do
            fillBytes ptr 0 (sizeOf x)
            #{poke struct flock, l_type}   ptr (l_type x)
            #{poke struct flock, l_whence} ptr (l_whence x)
            #{poke struct flock, l_start}  ptr (l_start x)
            #{poke struct flock, l_len}    ptr (l_len x)
            #{poke struct flock, l_pid}    ptr (l_pid x)
        peek ptr =
            FLock <$> #{peek struct flock, l_type}   ptr
                  <*> #{peek struct flock, l_whence} ptr
                  <*> #{peek struct flock, l_start}  ptr
                  <*> #{peek struct flock, l_len}    ptr
                  <*> #{peek struct flock, l_pid}    ptr

or perhaps an issue with locking generally for the filesystem in
question.

Whether the lock is per open file or per file object across all its
open instances (POSIX breakage) should not depend on the filesystem
type, so if locking works with F_SETLKW, it should also work with
F_OFD_SETLKW, provided the latter is supported at all.

It should be possible to test lock support on WSL2 with a simple
program (source attached), compiled via:

$ make CFLAGS=-D_GNU_SOURCE ofdlock

and executed (with CWD in the relevant filesystem):

$ ./ofdlock ofdlock.dat
Size of struct flock = 32
  l_type ofset = 0
  l_whence ofset = 2
  l_start ofset = 8
  l_len ofset = 16
  l_pid ofset = 24

This should not report any errors, and should return a size and
structure offset values that match the upstream compilation environment.
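
The same path can also be exercised from Haskell through base's own locking API, which is closer to what ghc-pkg does. This is a sketch only: `tryLockFile` is a hypothetical helper, and whether `hLock` ends up issuing F_OFD_SETLKW depends on how the installed base library was configured.

```haskell
import GHC.IO.Handle.Lock (LockMode (ExclusiveLock), hLock, hUnlock)
import System.Environment (getArgs)
import System.IO (IOMode (ReadWriteMode), withFile)

-- | Open (creating if necessary) the file, take an exclusive lock,
-- then release it.  Any failure surfaces as an IOException from hLock.
tryLockFile :: FilePath -> IO ()
tryLockFile path = withFile path ReadWriteMode $ \h -> do
    hLock h ExclusiveLock   -- blocks until the lock is granted
    putStrLn "lock acquired"
    hUnlock h
    putStrLn "lock released"

main :: IO ()
main = do
    args <- getArgs
    case args of
        [path] -> tryLockFile path
        _      -> putStrLn "Usage: hlock pathname"
```

If this fails with the same "hLock: invalid argument" on the affected filesystem while the C program succeeds, the problem is more likely in the Haskell-side struct marshalling than in the kernel.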

-- 
Viktor.
/* Header names were lost in the archive; these are the ones the code needs. */
#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>

#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd;
    struct flock lck;

    if (argc < 2)
        errx(1, "Usage: %s pathname", argv[0]);

    /* Report the layout of struct flock as seen by this compiler. */
    printf("Size of struct flock = %zu\n", sizeof(lck));
    printf("  l_type ofset = %zu\n",   (char *)&lck.l_type - (char *)&lck);
    printf("  l_whence ofset = %zu\n", (char *)&lck.l_whence - (char *)&lck);
    printf("  l_start ofset = %zu\n", (char *)&lck.l_start - (char *)&lck);
    printf("  l_len ofset = %zu\n", (char *)&lck.l_len - (char *)&lck);
    printf("  l_pid ofset = %zu\n", (char *)&lck.l_pid - (char *)&lck);

    if ((fd = open(argv[1], O_WRONLY|O_CREAT, 0666)) < 0)
        err(1, "open: %s", argv[1]);

    /* Whole-file write lock (l_start = l_len = 0), blocking variant. */
    memset((void *)&lck, 0, sizeof(lck));
    lck.l_type   = F_WRLCK;
    lck.l_whence = SEEK_SET;
    if (fcntl(fd, F_OFD_SETLKW, &lck) != 0)
        err(1, "fcntl(F_WRLCK): %s", argv[1]);

    /* Release the lock again. */
    lck.l_type = F_UNLCK;
    if (fcntl(fd, F_OFD_SETLK, &lck) != 0)
        err(1, "fcntl(F_UNLCK): %s", argv[1]);
    return 0;
}


Re: instance {Semigroup, Monoid} (Bag a) ?

2021-04-14 Thread Viktor Dukhovni
On Wed, Apr 14, 2021 at 06:26:38PM +, Richard Eisenberg wrote:

> In the work on simplifying the error-message infrastructure (heavy
> lifting by Alfredo, in cc), I've been tempted (twice!) to add
> 
> > instance Semigroup (Bag a) where
> >   (<>) = unionBags
> > 
> > instance Monoid (Bag a) where
> >   mempty = emptyBag
> 
> to GHC.Data.Bag.

I agree that the new Monoid is appropriate.

> The downside to writing these is that users might be tempted to write
> e.g. mempty instead of emptyBag, while the latter gives more
> information to readers and induces less manual type inference (to a
> human reader). The upside is that it means Bags work well with
> Monoid-oriented functions, like foldMap.

I don't see the possibility of writing `mempty` as an issue.  I find
myself not infrequently writing `mempty` for, e.g., empty ByteStrings,
rather than ByteString.empty, because while there are lots of
type-specific "empties", they often need to be used qualified, while the
polymorphic `mempty` is both clear and flexible.

If anything, what's atypical here is that "emptyBag" has the type in its
name.  With many other types we have:

empty :: ByteString
empty :: ByteString.Builder
empty :: Map k v
empty :: Set a
empty :: Seq a
empty :: Text
empty :: Vector a
...

when the type is a Monoid, it is much simpler to just use mempty.
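
As a small illustration of the foldMap point, here is a toy list-backed stand-in for Bag. It is not GHC's actual GHC.Data.Bag, whose representation differs; `bagToList` and `evens` are hypothetical helpers added for the example.

```haskell
-- A toy stand-in for GHC's Bag, for illustration only.
newtype Bag a = Bag [a]

emptyBag :: Bag a
emptyBag = Bag []

unitBag :: a -> Bag a
unitBag x = Bag [x]

unionBags :: Bag a -> Bag a -> Bag a
unionBags (Bag xs) (Bag ys) = Bag (xs ++ ys)

instance Semigroup (Bag a) where
    (<>) = unionBags

instance Monoid (Bag a) where
    mempty = emptyBag

bagToList :: Bag a -> [a]
bagToList (Bag xs) = xs

-- foldMap needs only the Monoid instance; no Bag-specific fold is required.
evens :: [Int] -> Bag Int
evens = foldMap (\x -> if even x then unitBag x else mempty)

main :: IO ()
main = print (bagToList (evens [1 .. 10]))   -- prints: [2,4,6,8,10]
```

Here `mempty` reads naturally inside the lambda, and the signature on `evens` already tells the reader which empty value is meant.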

-- 
Viktor.


Re: How to ensure optimization for large immutable vectors to be shared w.r.t. Referential Transparency

2021-04-06 Thread Viktor Dukhovni
On Tue, Apr 06, 2021 at 11:10:51AM -0400, Viktor Dukhovni wrote:

> > λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v)
> 
> One thing I'm not sure about, that perhaps someone else can shed light
> on, is whether with optimisation one might expect the two (SomeVector v)
> values to be subject to CSE, given that they both invoke `v` at the same
> type.  Is there a non-default optimisation flag that makes CSE more
> aggressive that would make that happen?

On a hunch I tried suppressing the inlining of the definition of `v`,
and CSE then kicked in...

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE NoMonomorphismRestriction #-}

import Control.Monad.ST
import Data.Coerce
import qualified Data.Vector.Storable as VS

newtype SomeVector = SomeVector (VS.Vector Int) 


isSameVector :: SomeVector -> SomeVector -> Bool
isSameVector !(SomeVector x) !(SomeVector y) = runST $ do
  mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x
  my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y
  _ <- VS.unsafeFreeze mx
  _ <- VS.unsafeFreeze my
  return $ x'offset == y'offset && x'fp == y'fp

makev = VS.fromList [0..1023]
{-# NOINLINE makev #-}

main :: IO ()
main =
    let !v = makev
     in print $ isSameVector (SomeVector v) (SomeVector v)

So it appears that inlining of `v` into (SomeVector v) is the proximate
barrier to identifying the two (SomeVector v) terms.  Is this expected?

-- 
Viktor.


Re: How to ensure optimization for large immutable vectors to be shared w.r.t. Referential Transparency

2021-04-06 Thread Viktor Dukhovni
On Tue, Apr 06, 2021 at 10:58:20PM +0800, YueCompl via ghc-devs wrote:

> On a second thought, maybe GHCi's silence is a bad thing here? Maybe
> it should complain loudly as GHC does?

No, GHCi is doing the expected thing.  Because GHCi's REPL sees one line
at a time, it is not generally possible for it to infer specific
monomorphic types on the spot, so GHCi infers the polymorphic type.

And as for the complaint, that was because I prepended: v `seq` ...
in which the type of `v` is ambiguous when polymorphic.  GHCi also
complains if you try that.

> λ> let v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v)

One thing I'm not sure about, that perhaps someone else can shed light
on, is whether with optimisation one might expect the two (SomeVector v)
values to be subject to CSE, given that they both invoke `v` at the same
type.  Is there a non-default optimisation flag that makes CSE more
aggressive that would make that happen?

-- 
Viktor.


Re: How to ensure optimization for large immutable vectors to be shared w.r.t. Referential Transparency

2021-04-06 Thread Viktor Dukhovni
On Tue, Apr 06, 2021 at 07:12:51PM +0800, YueCompl via ghc-devs wrote:

> λ> import Control.Monad.ST
> λ> import qualified Data.Vector.Storable as VS
> λ> 
> λ> :{
> λ| 
> λ| newtype SomeVector = SomeVector (VS.Vector Int)
> λ| 
> λ| isSameVector :: SomeVector -> SomeVector -> Bool
> λ| isSameVector (SomeVector !x) (SomeVector !y) = runST $ do
> λ|   mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x
> λ|   my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y
> λ|   _ <- VS.unsafeFreeze mx
> λ|   _ <- VS.unsafeFreeze my
> λ|   return $ x'offset == y'offset && x'fp == y'fp
> λ| 
> λ| :}
> λ> 
> λ> let !v = VS.fromList [3,2,5] in isSameVector (SomeVector v) (SomeVector v)
> False
> λ> 
> λ> let !v = SomeVector (VS.fromList [3,2,5]) in isSameVector v v
> True

In GHCi, but not in compiled programs, the `NoMonomorphismRestriction`
extension is enabled by default.  If I compile your code with that
extension enabled, I can reproduce your results (the values are not
shared).

If I either skip the extension, or add an explicit type annotation for
the vector, then the values are shared.

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Monad.ST
import qualified Data.Vector.Storable as VS

newtype SomeVector = SomeVector (VS.Vector Int)

isSameVector :: SomeVector -> SomeVector -> Bool
isSameVector (SomeVector !x) (SomeVector !y) = runST $ do
  mx@(VS.MVector !x'offset !x'fp) <- VS.unsafeThaw x
  my@(VS.MVector !y'offset !y'fp) <- VS.unsafeThaw y
  _ <- VS.unsafeFreeze mx
  _ <- VS.unsafeFreeze my
  return $ x'offset == y'offset && x'fp == y'fp

main :: IO ()
main =
    let !v = VS.fromList [0..1023] -- :: VS.Vector Int
     in print $ isSameVector (SomeVector v) (SomeVector v)

Since newtypes are always strict in their argument, I don't think the
BangPattern does what you'd like it to do, it just makes "main" strict
in v.  As defined with `NoMonomorphismRestriction` v is a polymorphic
function, and I guess it is specialised at the call site.
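
The loss of sharing can be seen without vectors at all. In this sketch (not taken from the thread) the `trace` call is there only to make evaluation visible. Without optimisation one would expect the message on stderr once per use of `v`, though with -O GHC is free to specialise and share, so no particular count is guaranteed.

```haskell
{-# LANGUAGE NoMonomorphismRestriction #-}
import Debug.Trace (trace)

main :: IO ()
main = do
    -- No signature: with NoMonomorphismRestriction this generalises to
    --   v :: (Num a, Enum a) => [a]
    -- so `v` is really a function of the class dictionaries, and each
    -- use below is a separate application of it.
    let v = trace "building v" [0 .. 3]
    print (sum (v :: [Int]))
    print (sum (v :: [Int]))
```

Giving `v` a monomorphic signature, or dropping the pragma, turns it back into an ordinary shared value.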

-- 
Viktor.


Re: Type inference of singular matches on GADTs

2021-03-28 Thread Viktor Dukhovni
On Sun, Mar 28, 2021 at 11:00:56PM -0400, Carter Schonwald wrote:

> On Sun, Mar 28, 2021 at 10:19 PM Richard Eisenberg  wrote:
>
> > I think this is the key part of Alexis's plea: that the type checker take
> > into account exhaustivity in choosing how to proceed.
> >
> > Another way to think about this:
> >
> > f1 :: HList '[] -> ()
> > f1 HNil = ()
> >
> > f2 :: HList as -> ()
> > f2 HNil = ()
> >
> > Both f1 and f2 are well typed definitions. In any usage site where both
> > are well-typed, they will behave the same. Yet f1 is exhaustive while f2 is
> > not. ...
>
> I like how you've boiled down this discussion, it makes it much clearer to
> me at least :)

+1.  Very much distills it for me too.  Thanks!

FWIW, I've since boiled down the pattern-synonym example to the below,
where I find the choices of ":^" and ":$" to be pleasantly mnemonic,
though "HSolo" is perhaps a bit too distracting...

{-# language DataKinds, FlexibleInstances, FlexibleContexts, GADTs
   , PatternSynonyms, TypeOperators #-}
{-# OPTIONS_GHC -Wno-type-defaults #-}
import Data.Reflection
import Data.Proxy

default (Int)

data HList as where
  HNil_  :: HList '[]
  HCons_ :: a -> HList as -> HList (a ': as)
infixr 5 `HCons_`

pattern HNil :: HList '[];
pattern HNil  = HNil_

pattern HSolo :: a -> HList '[a]
pattern HSolo a = a :^ HNil

pattern (:^) :: a -> HList as -> HList (a ': as)
pattern (:^) a as = HCons_ a as
infixr 5 :^

pattern (:$) :: a -> b -> HList '[a,b]
pattern (:$) a b = a :^ HSolo b
infixr 5 :$

hApp :: Reifies s (HList as) => (HList as -> r) -> Proxy s -> r
hApp f = f . reflect

main :: IO ()
main = do
    print $ reify HNil          $ hApp (\ HNil -> 42)
    print $ reify (HSolo 42)    $ hApp (\ (HSolo a) -> a)
    print $ reify (28 :$ "0xe") $ hApp (\ (a :$ b) -> a + read b)

-- 
Viktor.


Re: Type inference of singular matches on GADTs

2021-03-26 Thread Viktor Dukhovni
On Fri, Mar 26, 2021 at 07:41:09PM -0500, Alexis King wrote:

> type applications in patterns are still not enough to satisfy me. I 
> provided the empty argument list example because it was simple, but I’d 
> also like this to typecheck:
> 
> baz :: Int -> String -> Widget
> baz = 
> 
> bar = foo (\(a `HCons` b `HCons` HNil) -> baz a b)
> 

Can you be a bit more specific on how the constraint `Blah` is presently
defined, and how `foo` uses the HList type to execute a function of the
appropriate arity and signature?

The example below my signature typechecks, provided I use pattern
synonyms for the GADT constructors, rather than use the constructors
directly.

-- 
Viktor.

{-# language DataKinds
   , FlexibleInstances
   , GADTs
   , PatternSynonyms
   , ScopedTypeVariables
   , TypeApplications
   , TypeFamilies
   , TypeOperators
   #-}

import GHC.Types
import Data.Proxy
import Type.Reflection
import Data.Type.Equality

data HList as where
  HNil_  :: HList '[]
  HCons_ :: a -> HList as -> HList (a ': as)
infixr 5 `HCons_`

pattern HNil :: HList '[];
pattern HNil = HNil_
pattern (:^) :: a -> HList as -> HList (a ': as)
pattern (:^) a as = HCons_ a as
pattern (:$) a b = a :^ b :^ HNil
infixr 5 :^
infixr 5 :$

class Typeable as => Blah as where
    params :: HList as
instance Blah '[Int,String] where
    params = 39 :$ "abc"

baz :: Int -> String -> Int
baz i s = i + length s

bar = foo (\(a :$ b) -> baz a b)

foo :: Blah as => (HList as -> Int) -> Int
foo f = f params


Re: GHC 8.10 backports?

2021-03-21 Thread Viktor Dukhovni
On Mon, Mar 22, 2021 at 12:39:28PM +0800, Gergő Érdi wrote:

> I'd love to have this in a GHC 8.10 release:
> https://mail.haskell.org/pipermail/ghc-devs/2021-March/019629.html

This is already in 9.0, 9.2 and master, but it is a rather non-trivial
change, given all the new work that went into the String case.  So I am
not sure it is small/simple enough to make for a compelling backport.

There's a lot of recent activity in this space.  See also
, which is not
yet merged into master, and might still be eta-reduced one more step).

I don't know whether such optimisation tweaks (not a bugfix) are in
scope for backporting, we certainly need to be confident they'll not
cause any new problems.  FWIW, 5259 is dramatically simpler...

Of course we also have
 in much the
same territory, but there we're still blocked on someone figuring out
what's going on with the 20% compile-time hit with T13056, and whether
that's acceptable or not...

-- 
Viktor.


Re: Type inference of singular matches on GADTs

2021-03-20 Thread Viktor Dukhovni
On Sat, Mar 20, 2021 at 08:13:18AM -0400, Viktor Dukhovni wrote:

> As soon as I try to add more complex constraints, I appear to need an
> explicit type signature for HNil, and then the code again compiles:

But aliasing the promoted constructors via pattern synonyms, and using
those instead, appears to resolve the ambiguity.

-- 
Viktor.

{-# LANGUAGE
DataKinds
  , GADTs
  , PatternSynonyms
  , PolyKinds
  , ScopedTypeVariables
  , TypeFamilies
  , TypeOperators
  #-}

import GHC.Types

infixr 1 `HC`

data HList as where
  HNil  :: HList '[]
  HCons :: a -> HList as -> HList (a ': as)

pattern HN :: HList '[];
pattern HN = HNil
pattern HC :: a -> HList as -> HList (a ': as)
pattern HC a as = HCons a as

class Nogo a where

type family   Blah (as :: [Type]) :: Constraint
type instance Blah '[]= ()
type instance Blah (_ ': '[]) = ()
type instance Blah (_ ': _ ': '[]) = ()
type instance Blah (_ ': _ ': _ ': _) = (Nogo ())

foo :: (Blah as) => (HList as -> Int) -> Int 
foo _ = 42

bar :: Int
bar = foo (\ HN -> 1)

baz :: Int
baz = foo (\ (True `HC` HN) -> 2)

pattern One :: Int
pattern One = 1
bam :: Int
bam = foo (\ (True `HC` One `HC` HN) -> 2)


Re: Type inference of singular matches on GADTs

2021-03-20 Thread Viktor Dukhovni
On Sat, Mar 20, 2021 at 04:40:59AM -0500, Alexis King wrote:

> Today I was writing some code that uses a GADT to represent 
> heterogeneous lists:
> 
> data HList as where
>    HNil  :: HList '[]
>    HCons :: a -> HList as -> HList (a ': as)
> 
> This type is used to provide a generic way to manipulate n-ary 
> functions. Naturally, I have some functions that accept these n-ary 
> functions as arguments, which have types like this:
> 
> foo :: Blah as => (HList as -> Widget) -> Whatsit
> 
> The idea is that Blah does some type-level induction on as and supplies 
> the function with some appropriate values. Correspondingly, my use sites 
> look something like this:
> 
> bar = foo (\HNil -> ...)
> 
> Much to my dismay, I quickly discovered that GHC finds these expressions 
> quite unfashionable, and it invariably insults them:
> 
> • Ambiguous type variable ‘as0’ arising from a use of ‘foo’
>    prevents the constraint ‘(Blah as0)’ from being solved.

FWIW, the simplest possible example:

{-# LANGUAGE DataKinds, TypeOperators, GADTs #-}

data HList as where
  HNil  :: HList '[]
  HCons :: a -> HList as -> HList (a ': as)

foo :: (as ~ '[]) => (HList as -> Int) -> Int
foo f = f HNil

bar :: Int
bar = foo (\HNil -> 1)

compiles without error.  As soon as I try to add more complex constraints, I
appear to need an explicit type signature for HNil, and then the code
again compiles:

{-# LANGUAGE
DataKinds
  , GADTs
  , PolyKinds
  , ScopedTypeVariables
  , TypeFamilies
  , TypeOperators
  #-}

import GHC.Types

data HList as where
  HNil  :: HList '[]
  HCons :: a -> HList as -> HList (a ': as)

class Nogo a where

type family   Blah (as :: [Type]) :: Constraint
type instance Blah '[]= ()
type instance Blah (_ ': '[]) = ()
type instance Blah (_ ': _ ': _)   = (Nogo ())

foo :: (Blah as) => (HList as -> Int) -> Int 
foo _ = 42

bar :: Int
bar = foo (\ (HNil :: HNilT) -> 1)
type HNilT = HList '[]

baz :: Int
baz = foo (\ (True `HCons` HNil :: HOneT Bool) -> 2)
type HOneT a = HList (a ': '[])

Is this at all useful?

-- 
Viktor.


Re: Is referring to GHC-proposals in GHC user manual bad practice or not?

2021-03-17 Thread Viktor Dukhovni
> On Mar 17, 2021, at 2:35 PM, Richard Eisenberg  wrote:
> 
> My vote is that the manual should be self-standing. References to proposals 
> are good, but as supplementary/background reading only. My gold standard 
> always is: if we lost all the source code to GHC and all its compiled 
> versions, but just had the manual and Haskell Reports (but without external 
> references), we could re-create an interface-equivalent implementation. (I 
> say "interface-equivalent" because we do not specify all the details of e.g. 
> optimizations and interface files.) We are very, very far from that gold 
> standard. Yet I still think it's a good standard to aim for when drafting new 
> sections of the manual.

I strongly agree.  Tracking down the evolving proposals is rather
a chore...

-- 
Viktor.



Re: Build failure -- missing dependency? Help!

2021-03-15 Thread Viktor Dukhovni
On Mon, Mar 15, 2021 at 06:44:20AM -0400, Viktor Dukhovni wrote:

> ..., the FreeBSD "validate --legacy"
> successfully builds GHC.  [ The tests seem to all be failing, perhaps
> the test driver scripts are not portable to FreeBSD, but previously
> the compiler was not building. ]

FWIW, the tests seem to fail for two reasons:

1.  The "install   dir" and "test   space" directories don't
appear to be handled correctly.  I had to drop the spaces.

2.  On FreeBSD many tests run into the dreaded:

unhandled ELF relocation(RelA) type 19

Can anyone versed in ELF internals help with:

https://gitlab.haskell.org/ghc/ghc/-/issues/19086

-- 
Viktor.


Re: Build failure -- missing dependency? Help!

2021-03-15 Thread Viktor Dukhovni
On Mon, Mar 15, 2021 at 09:46:35AM +0100, Sylvain Henry wrote:
> >
> > Thank you! Don’t forget to comment it – especially because it is fake.
>
> Done in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5265

Speaking of build failures with the legacy make system, I see a build
failure on FreeBSD 12.2 with "validate --legacy" that I don't see with
hadrian.  It looks like the C compiler flags aren't quite the same and
warnings are more tolerated in the hadrian build.

The issue is that once "PosixSource.h" is included, FreeBSD (rightly I
believe) hides header prototypes of various non-POSIX extensions.  In
particular pthread_setname_np(3), is not exposed from .

The hadrian build works fine, but the legacy build stops with a fatal
missing prototype.

The fix appears to be include  before "PosixSource.h" as
below.  Since we have no CI for FreeBSD, and this change only affects
FreeBSD, I'm not sure whether it makes sense to burn build CI cycles for
an MR with this change.  What's the right way to proceed?  FWIW, with
your MR and the below patch, the FreeBSD "validate --legacy"
successfully builds GHC.  [ The tests seem to all be failing, perhaps
the test driver scripts are not portable to FreeBSD, but previously
the compiler was not building. ]

--- a/rts/posix/Itimer.c
+++ b/rts/posix/Itimer.c
@@ -17,6 +17,12 @@
  * seems to support.  So much for standards.
  */

+#include "ghcconfig.h"
+#if defined(freebsd_HOST_OS)
+#include 
+#include 
+#endif
+
 #include "PosixSource.h"
 #include "Rts.h"

-- 
Viktor.


Re: Build failure -- missing dependency? Help!

2021-03-15 Thread Viktor Dukhovni


> On Mar 14, 2021, at 6:53 PM, Simon Peyton Jones via ghc-devs 
>  wrote:
> 
> I’m getting this (with ‘sh validate --legacy’).  Oddly
> 
>   • It does not happen on HEAD
>   • It does happen on wip/T19495, a tiny patch with one innocuous change 
> to GHC.Tc.Gen.HsType
> I can’t see how my patch could possibly cause “missing files” in ghc-bignum!
> 
> I’m guessing that there is a missing dependency that somehow doesn’t show up 
> in master, but does in my branch, randomly.
> 
> There’s something funny about ghc-bignum; it doesn’t seem to be a regular 
> library
> 
> Can anyone help?

I managed to reproduce the issue on my machine, and noticed that after:

 $ cd libraries/ghc-bignum/
 $ gmake
 $ cd ../..
 $ ./validate --legacy --no-clean

the build continues OK.  So it looks like the legacy parallel build
has a missing dependency on the completion of the build of
libraries/ghc-bignum at the point when it is trying to run:

  $ "inplace/bin/ghc-stage1" -v1 \
-hisuf hi  \
-osuf o  \
-hcsuf hc  \
-static -O0 -H64m -Wall -fllvm-fill-undef-with-garbage -Werror  \
-this-unit-id base-4.16.0.0  \
-hide-all-packages -package-env - -i \
-ilibraries/base/. \
-ilibraries/base/dist-install/build \
-Ilibraries/base/dist-install/build  \
-ilibraries/base/dist-install/build/./autogen \
-Ilibraries/base/dist-install/build/./autogen  \
-Ilibraries/base/include \
-Ilibraries/base/dist-install/build/include \
-optP-include  \
-optPlibraries/base/dist-install/build/./autogen/cabal_macros.h  \
-package-id ghc-bignum-1.0  \
-package-id ghc-prim-0.8.0  \
-package-id rts  \
-this-unit-id base  \
-Wcompat -Wnoncanonical-monad-instances  \
-XHaskell2010 -O  \
-dcore-lint -dno-debug-output  \
-no-user-package-db  \
-rtsopts  \
-Wno-trustworthy-safe -Wno-deprecated-flags -Wnoncanonical-monad-instances  \
-outputdir libraries/base/dist-install/build  \
-dynamic-too  \
-c libraries/base/./GHC/Exception/Type.hs-boot  \
-o libraries/base/dist-install/build/GHC/Exception/Type.o-boot  \
-dyno libraries/base/dist-install/build/GHC/Exception/Type.dyn_o-boot

My best guess is that the problem command fires via
libraries/base/dist-install/package-data.mk which
is created by cabal, and things get rather complicated
from there...

-- 
Viktor.



Re: WSL2

2021-03-11 Thread Viktor Dukhovni
On Thu, Mar 11, 2021 at 07:53:20PM +0000, Simon Peyton Jones via ghc-devs wrote:

> Voila

Thanks! 

> /etc/nsswitch.conf group entry
> group:  files systemd

The main "suspicious" thing here (decoded traces below my signature) is
that the nsswitch.conf file is configured to try "systemd" as a source
of group data, but attempts to contact "systemd" or read the underlying
systemd store directly are failing.  This is different from "not found",
where systemd might have furnished a negative reply (as is the case on
my Fedora 31 system, see below).

So a failure return code is not surprising: the answer is not
authoritative, because systemd might have answered differently had it
been possible to query it.  It appears the WSL2 systems have a
systemically misconfigured "nsswitch.conf" that wants to query "group"
(and likely other) data from an unavailable source.

[ Bottom line, the "unix" test case in question may need to be prepared
  to encounter such misconfiguration of the test platform and accept
  either type of error.  Perhaps catch the expected IO exception, and
  output a fixed "not found" message regardless of the exception details,
  or specifically check for either of the two expected forms. ]
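
To make the "accept either form" idea concrete, here is a minimal C-level
sketch (not the "unix" test itself, which is Haskell).  The names
is_not_found, lookup_message and self_check are made up for illustration.
The errno values treated as "not found" are the ones the Linux getgrnam(3)
manpage lists under "The given name or gid was not found"; whether a test
should really be that lenient is a judgment call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 when a getgrnam_r()-style outcome should
 * be reported as "not found".  That is either the documented form
 * (rc == 0 with a NULL result) or one of the errno values the manpage
 * lists for a missing name. */
int is_not_found(int rc, const struct group *result)
{
    if (rc == 0)
        return result == NULL;
    return rc == ENOENT || rc == ESRCH || rc == EBADF || rc == EPERM;
}

/* Map an outcome to a fixed message that does not depend on which of
 * the two "not found" forms the platform produced. */
const char *lookup_message(int rc, const struct group *result)
{
    if (is_not_found(rc, result))
        return "not found";
    return rc != 0 ? "lookup failed" : "found";
}

/* Self-check: the clean miss and the ESRCH miss give the same text. */
void self_check(void)
{
    assert(strcmp(lookup_message(0, NULL), "not found") == 0);
    assert(strcmp(lookup_message(ESRCH, NULL), "not found") == 0);
}
```

A test written this way prints the same line on a correctly configured
host and on a host where the systemd source is unreachable, while other
failures (EIO, say) still show up as "lookup failed".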

By way of contrast, on my Fedora system, systemd can actually be reached
and appears to respond to the "nss" library's satisfaction:

execve("/usr/bin/getent", ["getent", "group", "xyzzy0"], 0x7fff3afbcca0 /* 31 vars */) = 0
...
openat(AT_FDCWD, "/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/group", O_RDONLY|O_CLOEXEC) = 3
read(3, "root:x:0:\nbin:x:1:\ndaemon:x:2:\ns"..., 4096) = 1161
read(3, "", 4096)   = 0
...
openat(AT_FDCWD, "/lib64/libnss_systemd.so.2", O_RDONLY|O_CLOEXEC) = 3
access("/etc/systemd/dont-synthesize-nobody", F_OK) = -1 ENOENT (No such file or directory)
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 30) = 0
getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=0, gid=0}, [12]) = 0
getsockopt(3, SOL_SOCKET, SO_PEERSEC, 0x5568c64660e0, [64]) = -1 ENOPROTOOPT (Protocol not available)
getsockopt(3, SOL_SOCKET, SO_PEERGROUPS, 0x5568c6466130, [256->0]) = 0
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0AUTH 
EXTERNAL\r\nDATA\r\n", iov_len=22}, {iov_base="NEGOTIATE_UNIX_FD\r\n", 
iov_len=19}, {iov_base="BEGIN\r\n", iov_len=7}], msg_iovlen=3, 
msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 48
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="DATA\r\nOK 
7bc788e33c85b875f6b74a6"..., iov_len=256}], msg_iovlen=1, msg_controllen=0, 
msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 58
sendmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 
iov_len=128}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 
MSG_DONTWAIT|MSG_NOSIGNAL) = 128
recvmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="l\2\1\1\16\0\0\0\377\377\377\377G\0\0\0\5\1u\0\1\0\0\0", 
iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, 
MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
recvmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0"..., 
iov_len=78}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, 
MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 78
sendmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="l\1\0\1\v\0\0\0\2\0\0\0\247\0\0\0\1\1o\0\31\0\0\0/org/fre"...,
 iov_len=184}, {iov_base="\6\0\0\0xyzzy0\0", iov_len=11}], msg_iovlen=2, 
msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 195
recvmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="l\4\1\1\16\0\0\0\377\377\377\377\227\0\0\0\7\1s\0\24\0\0\0",
 iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, 
MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
recvmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="org.freedesktop.DBus\0\0\0\0\6\1s\0\t\0\0\0"..., 
iov_len=158}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, 
MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 158
recvmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="l\3\1\1(\0\0\0\257\30\r\0m\0\0\0\5\1u\0\2\0\0\0", 
iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, 
MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
recvmsg(3, {msg_name=NULL, msg_namelen=0, 
msg_iov=[{iov_base="\6\1s\0\t\0\0\0:1.303526\0\0\0\0\0\0\0\4\1s\0*\0\0\0"..., 
iov_len=144}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, 
MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 144
close(3) = 0

-- 
Viktor.

So group lookups are configured to try /etc/group first, and then some
systemd-based machinery (possibly creating groups on the fly, ...).

> == Tracing getent group xyzzy0

execve("/usr/bin/getent", ["getent",

Re: WSL2

2021-03-11 Thread Viktor Dukhovni
On Thu, Mar 11, 2021 at 12:21:15PM +0000, Simon Peyton Jones via ghc-devs wrote:

> Like Tom, I'm not following the details, but if you want me to run
> some commands and send you the output I can do that.  Just send the
> script!

See attached.  If any of the prerequisite shell utilities are not
installed, the script will exit asking that they be installed.

Please email me the output, or post to the list.  (Should be just a
couple of hundred lines of mostly hex output).

-- 
Viktor.


getgrnam.sh
Description: Bourne shell script


Re: WSL2

2021-03-11 Thread Viktor Dukhovni
> On Mar 11, 2021, at 9:41 AM, Tom Ellis wrote:
> 
> I'm not really following the details, but is this useful to you?
> 
> % cat g.c && cc g.c -o g && ./g
> #include <sys/types.h>
> #include <errno.h>
> #include <grp.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv)
> {
>char buf[1024];
>struct group g, *p;
>int rc;
> 
>errno = 0;
>rc = getgrnam_r(argc > 1 ? argv[1] : "nosuchgrouphere",
>&g, buf, sizeof(buf), &p);
>printf("%s(%p) %m(%d)\n", p ? g.gr_name : NULL, p, errno);
>return (rc == 0 && p == NULL);
> }
> (null)((nil)) No such process(3)

Yes, it means that the reported error is not an artefact of
the Haskell "unix" package, but rather originates directly
from normal use of the getgrnam_r(3) glibc API on these
systems.

It would now be useful to also post:

  - The output of "./g root" or some other group known to exist.
  - The output of "./g xyzzy" or some other short group name known to
 not exist
  - The output of "grep group /etc/nsswitch.conf"
  - Attach an strace output file (g.trace.txt) from:

strace -o g.trace.txt ./g

-- 
Viktor.


Re: WSL2

2021-03-11 Thread Viktor Dukhovni
On Thu, Mar 11, 2021 at 06:05:04AM -0500, Viktor Dukhovni wrote:

> So the question is why the lookup is failing.  To that end, compiling and
> tracing with "strace" the C program below should tell the story:
> 
> #include <sys/types.h>
> #include <errno.h>
> #include <grp.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv)
> {
> struct group g, *p;
> char buf[1024];
> int rc;
> 
> errno = 0;
> rc = getgrnam_r("nosuchgrouphere", &g, buf, sizeof(buf), &p);
> printf("%p: %m(%d)\n", p, errno);
> return (rc == 0 && p == NULL);
> }

To experiment with other group names and make sure that at least
group "root" or similar works, a slightly extended version is:

#include <sys/types.h>
#include <errno.h>
#include <grp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
char buf[1024];
struct group g, *p;
int rc;

errno = 0;
rc = getgrnam_r(argc > 1 ? argv[1] : "nosuchgrouphere",
&g, buf, sizeof(buf), &p);
printf("%s(%p) %m(%d)\n", p ? g.gr_name : NULL, p, errno);
return (rc == 0 && p == NULL);
}

This gives (again Fedora 31) the expected results:

$ make g
cc g.c   -o g
$ ./g
(null)((nil)) Success(0)
$ ./g root
root(0x7ffe6a6225d0) Success(0)

-- 
Viktor.


Re: WSL2

2021-03-11 Thread Viktor Dukhovni
On Thu, Mar 11, 2021 at 10:19:52AM +0000, Tom Ellis wrote:

> SPJ Wrote:
> > I've just installed WSL2 and built GHC. I get this (single)
> > validation failure in libraries/unix/tests/getGroupEntryForName.  It
> > seems to be just an error message wibble, but I can't push a change
> > to master because that'll affect everyone else.
> 
> Interesting, I've only ever built GHC on WSL and WSL2. I've seen this
> error message on WSL2 during every test run, I think.  I didn't
> realise that it never occurred on other platforms, let alone that it
> was WSL2 specific!

I am curious what specific version/branch of GHC (and associated
submodule commit of "unix") is being tested.

I've recently cleaned up a bunch of the upstream "unix" code for
group/passwd database handling, but I don't believe that GHC has yet
switched to the newer code.

A subtle facet of the delta points in the right direction:

-getGroupEntryForName: getGroupEntryForName: does not exist (no such group)
+getGroupEntryForName: getGroupEntryForName: does not exist (No such process)

Not only is it complaining about "process" rather than "group", but
crucially the case of the word "No" is different.  The variance is due
to the fact that there are two possible error paths in the group lookup
code:

doubleAllocWhileERANGE loc enttype initlen unpack action =
  alloca $ go initlen
 where
  go len res = do
    r <- allocaBytes len $ \buf -> do
           rc <- action buf (fromIntegral len) res
           if rc /= 0
--hard-error->  then return (Left rc)
                else do p <- peek res
--not-found-->          when (p == nullPtr) $ notFoundErr
                        fmap Right (unpack p)
    case r of
      Right x -> return x
      Left rc | Errno rc == eRANGE ->
        -- ERANGE means this is not an error
        -- we just have to try again with a larger buffer
        go (2 * len) res
      Left rc ->
--1-->  ioError (errnoToIOError loc (Errno rc) Nothing Nothing)
  notFoundErr =
--2-->  ioError $ flip ioeSetErrorString ("no such " ++ enttype)
                $ mkIOError doesNotExistErrorType loc Nothing Nothing

The expected error path is "not-found" -> (2), where the group lookup
works, but no result is found (rc == 0).  This reports the lower-case
"no such group".

The unexpected error path is a non-zero return from "getgrnam_r"
(action) -> (1), which uses `errno` to build the error string, which
ends up being "No such process".

On Linux systems that's: ESRCH 3 /* No such process */
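
For readers more at home in C, the same control flow can be sketched
roughly as follows.  This is an illustration under stated assumptions, not
the "unix" package's implementation: lookup_doubling, lookup_action,
getgrnam_action and fake_action are hypothetical names, and the callback
shape is only there so that the retry loop can be exercised without
touching the system group database.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* FOUND: rc == 0, non-NULL result.  NOT_FOUND: rc == 0, NULL result,
 * i.e. path (2).  HARD_ERROR: rc != 0 and not ERANGE, i.e. path (1). */
enum lookup_outcome { LOOKUP_FOUND, LOOKUP_NOT_FOUND, LOOKUP_HARD_ERROR };

/* Hypothetical callback: a getgrnam_r()-shaped call with the key in ctx. */
typedef int (*lookup_action)(struct group *grp, char *buf, size_t len,
                             struct group **result, void *ctx);

/* Retry with a doubled buffer while the action returns ERANGE.  On
 * LOOKUP_FOUND the caller must free(*bufp), since the strings in *grp
 * point into it.  Otherwise *bufp is NULL.  *rcp gets the raw code. */
enum lookup_outcome
lookup_doubling(lookup_action action, void *ctx, size_t initlen,
                struct group *grp, char **bufp, int *rcp)
{
    size_t len = initlen;

    assert(action != NULL && initlen > 0);
    *bufp = NULL;
    for (;;) {
        struct group *result = NULL;
        char *buf = malloc(len);
        int rc;

        if (buf == NULL) {
            *rcp = ENOMEM;
            return LOOKUP_HARD_ERROR;
        }
        rc = action(grp, buf, len, &result, ctx);
        if (rc == ERANGE && len <= SIZE_MAX / 2) {
            free(buf);              /* not an error: buffer too small */
            len *= 2;
            continue;
        }
        *rcp = rc;
        if (rc != 0) { free(buf); return LOOKUP_HARD_ERROR; }
        if (result == NULL) { free(buf); return LOOKUP_NOT_FOUND; }
        *bufp = buf;
        return LOOKUP_FOUND;
    }
}

/* Adapter for the real API: ctx is the group name. */
int getgrnam_action(struct group *grp, char *buf, size_t len,
                    struct group **result, void *ctx)
{
    return getgrnam_r((const char *)ctx, grp, buf, len, result);
}

/* Stand-in for experiments: pretends the entry needs *ctx bytes. */
int fake_action(struct group *grp, char *buf, size_t len,
                struct group **result, void *ctx)
{
    size_t need = *(const size_t *)ctx;

    if (len < need || len < sizeof "demo")
        return ERANGE;
    memcpy(buf, "demo", sizeof "demo");
    grp->gr_name = buf;
    *result = grp;
    return 0;
}
```

With getgrnam_action and a missing group name, a well-behaved host would
be expected to give LOOKUP_NOT_FOUND, whereas the behaviour reported here
corresponds to LOOKUP_HARD_ERROR with *rcp == ESRCH.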

So the call to "getgrnam_r" failed by returning ESRCH, rather than 0.
The Linux manpage does not suggest to me that one might expect a
non-zero return from getgrnam_r(3) just from a missing entry in the
group file:

RETURN VALUE
   The getgrnam() and getgrgid() functions return a pointer to a
   group structure, or NULL if the matching entry is not found
   or an error occurs.  If an error occurs, errno is set
   appropriately.  If one wants to check errno after the call,
   it should be set to zero before the call.

   The return value may point to a static area, and may be
   overwritten by subsequent calls to getgrent(3), getgrgid(),
   or getgrnam().  (Do not pass the  returned  pointer  to
   free(3).)

   On  success, getgrnam_r() and getgrgid_r() return zero, and
--->   set *result to grp.  If no matching group record was found,
--->   these functions return 0 and store NULL in *result.  In case
--->   of error, an error number is returned, and NULL is stored in
--->   *result.

ERRORS
   0 or ENOENT or ESRCH or EBADF or EPERM or ...
  The given name or gid was not found.

   EINTR  A signal was caught; see signal(7).

   EIO    I/O error.

   EMFILE The per-process limit on the number of open file descriptors
          has been reached.

   ENFILE The system-wide limit on the total number of open files has
          been reached.

   ENOMEM Insufficient memory to allocate group structure.

   ERANGE Insufficient buffer space supplied.

The "0 or ENOENT or ESRCH ..." text then plausibly applies to
getgrnam(3), and its legacy behaviour.
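
For comparison, a minimal sketch of how the legacy getgrnam(3) interface
has to be used, following the manpage advice to zero errno before the
call.  The wrapper name legacy_lookup is hypothetical.  Note that the
result may point to a static area, so it is only valid until the next
group lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stddef.h>

/* Look up a group with the non-reentrant getgrnam(3).  NULL means either
 * "not found" or "error", so errno is cleared first and whatever it holds
 * after a NULL return is handed back through *errp (0 if the library left
 * it untouched). */
const struct group *legacy_lookup(const char *name, int *errp)
{
    const struct group *g;

    assert(name != NULL && errp != NULL);
    errno = 0;
    g = getgrnam(name);
    *errp = (g == NULL) ? errno : 0;
    return g;
}
```

With this interface a caller seeing NULL cannot tell a clean miss from
ESRCH and friends without inspecting *errp, which is why the manpage
lumps "0 or ENOENT or ESRCH ..." together as "not found".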

So the question is why the lookup is failing.  To that end, compiling and
tracing with "strace" the C program below should tell the story:

#include <sys/types.h>
#include <errno.h>
#include <grp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
struct group g, *p;
char buf[1024];
int rc;

errno = 0;
rc = getgrnam_r("nosuchgrouphere", &g, buf, sizeof(buf), &p);
printf("%p: %m(%d)\n", p, errno);
return (rc == 0 && p == NULL);
}

On a Fedora 31 system I get:

$ make g
cc g.c   -o g
$ ./g
(nil): Success(0)

If somethin