It is likely this GHC issue: https://gitlab.haskell.org/ghc/ghc/-/issues/18832
On Sat, 28 Jun 2025 at 17:56, Harendra Kumar <harendra.ku...@gmail.com> wrote:
>
> We instrumented the GHC locking mechanism as suggested by Viktor and
> deployed the instrumented GHC in CI. We got our first crash after 4 months!
>
> Summary from the preliminary investigation of the crash:
>
> * The problem is not filesystem dependent or inode-reuse related.
> * After understanding it better, I am now able to reproduce it on the
>   local machine.
> * The problem may be related to hClose getting interrupted by an async
>   exception.
>
> The GHC panic message from the instrumented code is as follows:
>
>   2025-06-27T09:11:41.8177372Z lockFile: first lock: pid 21804, tid 21806, id 17 dev 2065 ino 262152 write 1
>   2025-06-27T09:11:41.8178053Z FileSystem.Event: internal error: close: lock exists: pid 21804, tid 21806, fd 17
>   2025-06-27T09:11:41.8178492Z
>   2025-06-27T09:11:41.8178604Z (GHC version 9.2.8 for x86_64_unknown_linux)
>   2025-06-27T09:11:41.8179160Z Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
>
> The message comes from the "close" system call wrapper that I created.
> It means we are closing the fd without releasing the lock first.
>
> Based on the test logs I figured out that the "close" call in the panic
> message is happening from this code:
>
>   createFile :: FilePath -> FilePath -> IO ()
>   createFile file parent = openFile (parent </> file) WriteMode >>= hClose
>
> Now, the interesting thing is that in this particular test we are
> running two threads. In one thread we watch for file system events
> related to this file, and in the other thread we run the createFile
> code snippet given above. As soon as the file gets created, we receive
> a file CREATED event in the first thread and we use throwTo to send a
> ThreadAbort async exception to the createFile thread to kill it.
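For readers who want to see the shape of the race, the two-thread setup described above can be reduced to a rough standalone sketch. This is not the actual test from the thread: the file path is made up, killThread stands in for the event-driven throwTo, and reproducing the real panic reportedly requires the instrumented GHC and many iterations.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)
import System.IO (IOMode (WriteMode), hClose, openFile)

main :: IO ()
main = do
    done <- newEmptyMVar
    tid <- forkIO $ do
        -- 'try' also catches the async exception, so the worker always
        -- signals completion once the exception lands.
        _ <- try loop :: IO (Either SomeException ())
        putMVar done ()
    threadDelay 10000 -- let the open/close loop run for ~10 ms
    killThread tid    -- deliver the async exception
    takeMVar done
    putStrLn "worker stopped"
  where
    -- Open and close the same (assumed) file in a tight loop, so the
    -- exception is likely to arrive somewhere inside hClose.
    loop = do
        h <- openFile "/tmp/close-race-demo" WriteMode
        hClose h
        loop
```

The panic described in the thread would occur, if at all, when the exception lands at the wrong point inside hClose; this sketch only shows the thread structure, not the failure itself.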
>
> My theory is that if the exception is delivered at a certain point
> during hClose, it closes the file but returns without releasing the
> lock. To test that theory I wrote the createFile thread code as shown
> below, so that it keeps doing open and close in a loop, making it more
> likely for the exception to interrupt hClose:
>
>   createFile :: FilePath -> FilePath -> IO ()
>   createFile file parent = go
>     where
>       go = do
>           h <- openFile (parent </> file) WriteMode
>           hClose h
>           go
>
> Voila! With this code I am able to reproduce it locally now, though it
> takes many runs (in a loop) and more than a few minutes to reproduce.
>
> Any ideas whether the problem is in hClose or whether it may be
> something else? Next, I will try a code review of hClose and further
> instrumentation to narrow down the problem.
>
> Thanks,
> Harendra
>
> On Sat, 16 Nov 2024 at 13:02, Viktor Dukhovni <ietf-d...@dukhovni.org> wrote:
> >
> > On Fri, Nov 15, 2024 at 06:45:40PM +0530, Harendra Kumar wrote:
> >
> > > Coming back to this issue after a break. I reviewed the code
> > > carefully and I cannot find anything where we are doing something in
> > > the application code that affects the RTS locking mechanism. Let me
> > > walk through the steps of the test up to the failure and what we are
> > > doing in the code. The test output is like this:
> >
> > It is indeed not immediately clear where in your code or in some
> > dependency (including base, GHC, ...) a descriptor that contributes to
> > the RTS file reader/writer count (indexed by (dev, ino)) might be
> > closed without adjusting the count by calling the RTS `unlockFile()`
> > function (via GHC.IO.FD.release).
> >
> > It may be worth noting that GHC does not *reliably* prevent
> > simultaneous open handles for the same underlying file, because
> > handles returned by hDuplicate do not contribute to the count:
> >
> > demo.hs:
> >
> >   import GHC.IO.Handle (hDuplicate)
> >   import System.IO
> >
> >   main :: IO ()
> >   main = do
> >       fh1 <- dupOpen "/tmp/foo"
> >       fh2 <- dupOpen "/tmp/foo"
> >       writeNow fh1 "abc\n"
> >       writeNow fh2 "def\n"
> >       readFile "/tmp/foo" >>= putStr
> >       hClose fh1
> >       hClose fh2
> >     where
> >       -- Look Mom, no lock!
> >       dupOpen path = do
> >           fh <- openFile path WriteMode
> >           hDuplicate fh <* hClose fh
> >
> >       writeNow fh s = hPutStr fh s >> hFlush fh
> >
> >   $ ghc -O demo.hs
> >   [1 of 2] Compiling Main             ( demo.hs, demo.o )
> >   [2 of 2] Linking demo
> >   $ ./demo
> >   def
> >
> > I am not sure that Haskell really should be holding the application's
> > hand in this area; corrupting output files with concurrent writers can
> > just as easily happen by running two independent processes. But
> > letting go of this guardrail would, IIRC, be a deviation from the
> > Haskell report, and there are likely applications that depend on it
> > (and don't use hDuplicate or an equivalent to break the reader/writer
> > tracking).
> >
> > --
> >     Viktor.
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs@haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
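If the interrupted-hClose theory from earlier in the thread holds, one defensive pattern is to run the close under an uninterruptible mask. The sketch below is written under that assumption and is not something proposed in the thread; the function and file names are illustrative.

```haskell
import Control.Exception (bracket, uninterruptibleMask_)
import System.FilePath ((</>))
import System.IO (IOMode (WriteMode), hClose, openFile)

-- A hypothetical variant of the thread's createFile: 'bracket' already
-- runs the cleanup under 'mask', and 'uninterruptibleMask_' additionally
-- keeps an async exception from landing at an interruptible point inside
-- hClose. The trade-off is that exception delivery is delayed until the
-- close finishes.
createFile :: FilePath -> FilePath -> IO ()
createFile file parent =
    bracket
        (openFile (parent </> file) WriteMode)
        (uninterruptibleMask_ . hClose)
        (\_ -> return ())

main :: IO ()
main = do
    createFile "uninterruptible-close-demo" "/tmp"
    putStrLn "created"
```

Whether this fully avoids the suspected bug depends on exactly where hClose can be interrupted; it narrows the window rather than proving a fix, and it should be used sparingly since uninterruptible sections make threads unkillable while they run.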