RE: Bug in IO libraries when sending data through a pipe?
I don't think mmap() provides exactly the right behaviour. It lets you specify that modifications made by the current process aren't committed to the file, but what we want is to snapshot the file so that subsequent modifications by *other* processes aren't seen by the local process. The Linux manpage isn't very explicit, so I've tried it. When making a private mapping (MAP_PRIVATE), it makes a snapshot which is private to the process, and isn't affected by other processes. It's exactly what's needed. (see attachment) It only makes a snapshot of those pages of the file which have been modified by the current process, because that's when the copy-on-write happens (a page is 4k on Linux/x86). In unmodified pages of the file the mapping is still exposed to modifications by other processes, even if the current process has already read data from that page - try it! Cheers, Simon ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
RE: Bug in IO libraries when sending data through a pipe?
On Mon, 2002-03-18 at 18:30, Simon Marlow wrote: The spec could perhaps *require* that it was a pure value, so that the file contents is snapshotted at the time of the hGetContents and you always get the same result regardless of subsequent or concurrent I/O operations. This can perhaps be implemented with copy-on-write if the OS supports it, but I don't know if any OSs actually do. The mmap system call provides this, on Unix systems. The man page says it's POSIX.1b standard, so it might even be supported by Windows 2000. I don't think mmap() provides exactly the right behaviour. It lets you specify that modifications made by the current process aren't committed to the file, but what we want is to snapshot the file so that subsequent modifications by *other* processes aren't seen by the local process. This strategy won't work for general IO streams (pipes, sockets...), of course. If the spec would require all stream contents to behave like pure values, this would limit lazy streams to files. So you could no longer handle general streams and files the same way... These kinds of streams are immutable, so the problem is less severe. We don't have to worry about the interaction with other processes. Cheers, Simon ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
RE: Bug in IO libraries when sending data through a pipe?
I've been following this discussion with some interest. It seems to me on important underlying problem is being hinted at, but has never been made explicit: * When we fork(), we lose sharing. *Any* lazy computation which passes to both children is going to penalize you, sometimes in very surprising ways. Thus, raw access to fork is guaranteed to be the wrong thing for nearly everybody all the time. It's probably worth noting this prominently next to any and all documentation for fork, and next to its code. Why? Because use of fork is part of the commonly accepted idiom for running one program from within another. It's likely programmers who've done this in other languages will go looking for fork rather than some nicer, higher-level functionality (POpen?) that has seqs in all the right places and actually does what they want. That said, I'd love to have lazy I/O that actually works right, if only because it actually *does* do the right thing for the 95% of the programs which get written which *aren't* doing fancy I/O. I say this having written programs which use lazy I/O to process files which are much larger than the total virtual memory on my machine (so mmap-ing regular files to snapshot their contents isn't going to be good enough for me, even if it works for smaller files). It seems to me part of the problem is that lazy I/O results in concurrency, and concurrency is hard. This is particularly true as the lazy I/O routines don't say WARNING! CONCURRENCY all over the place. Does this mean we should make semi-closed handles untouchable? Should there be rules that turn lines . getContents into something vaguely sensible and non-lazy? Should we say something sensible about the behavior of anything concurrent-ish across fork? Simon, what would it take to make you stop worrying and love lazy I/O? :-) -Jan-Willem Maessen ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
RE: Bug in IO libraries when sending data through a pipe?
* When we fork(), we lose sharing. *Any* lazy computation which passes to both children is going to penalize you, sometimes in very surprising ways. I'm not sure I understand why loss of sharing is the problem - losing sharing for pure computations is by no means a disaster, it just means that some work is duplicated. Here's another way to look at the problem with hGetContents: - hGetContents returns a stream whose value may be affected by subsequent I/O operations - evaluating the stream returned by hGetContents may perform some I/O if these two effects are connected to each other, as they are in the fork() example, then evaluating one pure value changes the subsequent value of another. This is clearly not referentially transparent, it's not just a loss of sharing. It's not just fork() that suffers from this interaction - you can get the same effect just using Posix.dupFd Posix.fdToHandle (although it's probably not possible using just standard Haskell 98). Thus, raw access to fork is guaranteed to be the wrong thing for nearly everybody all the time. It's probably worth noting this prominently next to any and all documentation for fork, and next to its code. Why? Because use of fork is part of the commonly accepted idiom for running one program from within another. It's likely programmers who've done this in other languages will go looking for fork rather than some nicer, higher-level functionality (POpen?) that has seqs in all the right places and actually does what they want. Yes, I agree we should stick large red notices next to Posix.forkProcess. That said, I'd love to have lazy I/O that actually works right, if only because it actually *does* do the right thing for the 95% of the programs which get written which *aren't* doing fancy I/O. I say this having written programs which use lazy I/O to process files which are much larger than the total virtual memory on my machine (so mmap-ing regular files to snapshot their contents isn't going to be good enough for me, even if it works for smaller files). It seems to me part of the problem is that lazy I/O results in concurrency, and concurrency is hard. This is particularly true as the lazy I/O routines don't say WARNING! CONCURRENCY all over the place. Does this mean we should make semi-closed handles untouchable? Should there be rules that turn lines . getContents into something vaguely sensible and non-lazy? Should we say something sensible about the behavior of anything concurrent-ish across fork? Simon, what would it take to make you stop worrying and love lazy I/O? :-) If there was a form of lazy I/O which didn't require the programmer to reason about the evaluatedness of the lazy stream, that would remove most of my complaints. I agree that lazy I/O results in concurrency - but it's a particularly intractable form of concurrency because it involves interaction between the IO and pure parts of the program. Concurrent Haskell is easy by comparison: everything's in the IO monad. One example that crops up regularly is trying to lazilly read a large number of files - if you're not careful, you run out of file descriptors. How do you avoid running out of file descriptors? Well, you make sure the lazy stream is fully evaluated. That requires adding seq or otherwise reasoning about whether we've evaluated the stream to the end. If you're going to use seq, then that defeats the purpose of lazy I/O in the first place, and reasoning about the evaluatedness of values is made hard by Haskell's underspecification of the evaulation order. IMHO if you find you need to worry about these things in your program, then you should switch to non-lazy I/O. Let's keep IO in the IO monad! Cheers, Simon ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
RE: Bug in IO libraries when sending data through a pipe?
On Fre, 2002-03-15 at 15:05, Volker Stolz wrote: Am 15. Mar 2002 um 14:39 MET schrieb Volker Wysk: - If instead the child's child (echo.c) closes stdin immediately after being executed, some data is lost. Where's the use in closing stdin when you're passing arguments as parameters? This is effectively a NOP and shouldn't influence the result, so I'd expect the same missing data as in your original case. Yes, it's a NOP (just to be sure). The difference is in child.hs: callIO (\ps - Kommando fehlgeschlagen mit ++ show ps ++ :\n ++ kommando prog par) (executeFile' prog True par Nothing) -- instead, to avoid bug: -- (hClose stdin executeFile' prog True par Nothing) If you use the commented out line instead of the one in effect ((executeFile' prog...), you get the first behaviour, which is, the problem doesn't occur. I'm still investigating this. It appears that the child process *is* reading from stdin before doing the exec, but I'm not sure why. But this at least partially explains why the parent process doesn't get to see all the data on stdin. Cheers, Simon ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
RE: Bug in IO libraries when sending data through a pipe?
Yes, it's a NOP (just to be sure). The difference is in child.hs: callIO (\ps - Kommando fehlgeschlagen mit ++ show ps ++ :\n ++ kommando prog par) (executeFile' prog True par Nothing) -- instead, to avoid bug: -- (hClose stdin executeFile' prog True par Nothing) If you use the commented out line instead of the one in effect ((executeFile' prog...), you get the first behaviour, which is, the problem doesn't occur. I'm still investigating this. It appears that the child process *is* reading from stdin before doing the exec, but I'm not sure why. But this at least partially explains why the parent process doesn't get to see all the data on stdin. Ok, here's my explanation. Let's take a look at the code for child (parent is unimportant - you can pipe data into child using 'cat' and still observe the disputed behaviour). Here's the code, simplified slightly: main = do qinh - getContents let pfade = zeilen qinh mapM_ (\pfad - run /bin/true [pfad]) pfade zeilen :: String - [String] zeilen txt = let gruppen = groupBy (\a b - (a == '\n') == (b == '\n')) txt in filter (\str - filter (/= '\n') str /= ) gruppen run prog par = callIO (executeFile' prog True par Nothing) callIO :: IO () - IO () callIO io = do maybepid - forkProcess case maybepid of Nothing - io exitWith ExitSuccess Just pid - do (Just ps) - getProcessStatus True True pid if ps == Exited ExitSuccess then return () else failIO (show ps) The issue here is whether the child process (i.e. the Nothing case in callIO) can cause any reading from stdin by virtue of evaluating the lazy stream returned by getContents. If the child process does cause any reading from stdin, then the parent process will lose data, because any reading by the child won't be seen by the parent. Looking at the code, it appears that the arguments passed to 'run', namely [pfad], will be fully evaluated by this point because groupBy has to examine the characters of the stream in order to split it up into chunks, and furthermore the call to filter will compare the chunk against the empty string. But on further investigation, groupBy is much lazier than this: groupBy :: (a - a - Bool) - [a] - [[a]] groupBy _ [] = [] groupBy eq (x:xs) = (x:ys) : groupBy eq zs where (ys,zs) = span (eq x) xs ie. it returns the chunk immediately with only the first character evaluated. Since the call to filter only needs to examine the chunk to determine whether it is empty or not, the rest of the chunk, and hence everything beyond the next character in the lazy stream, will be unevaluated until the child process needs to construct its argument list to pass to executeFile. Hence the child process pokes on the lazy stream, and when the parent process subsequently demands data from the lazy stream, there is a buffer's worth missing. Moral of this story: don't use lazy I/O. I'll say that again in case you missed it: _ _ | _ \ ___ _ __ ( ) |_ _ _ ___ ___ | | | |/ _ \| '_ \|/| __| | | | / __|/ _ \ | |_| | (_) | | | | | |_ | |_| \__ \ __/ |/ \___/|_| |_| \__| \__,_|___/\___| _ ____ | | __ _ _ _ |_ _| / / _ \ | |/ _` |_ / | | | | | / / | | | | | (_| |/ /| |_| | | | / /| |_| | |_|\__,_/___|\__, | |___/_/ \___/ |___/ Cheers, Simon ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
Re: Bug in IO libraries when sending data through a pipe?
Am 18. Mar 2002 um 13:51 MET schrieb Simon Marlow: unevaluated until the child process needs to construct its argument list to pass to executeFile. Hence the child process pokes on the lazy stream, and when the parent process subsequently demands data from the lazy stream, there is a buffer's worth missing. Moral of this story: don't use lazy I/O. There's a warning in the documentation of executeFile to exactly the same end after all... -- Volker Stolz * [EMAIL PROTECTED] * PGP + S/MIME ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
Re: Bug in IO libraries when sending data through a pipe?
In local.glasgow-haskell-bugs, you wrote: I've stripped down my program to produce an example. In the process, the problem disappeard a few times. I hope it shows up on your machine. The attached files reproduce it on my machine, but the exact results vary from run to run. There's no bug in the libraries: The data is read lazily by 'getContents', then you invoke 'forkProcess'. From this moment on, you should have two (heavy-weight) processes competing for input from the same source (stdin). Further proof might be that in making 'getContents' strict by inserting a 'print' statement to force evaluation now returns correct results: .. qinh - getContents let pfade = zeilen qinh print pfade -- FORCE EVALUATION mapM_ (\pfad - run /bin/echo [pfad]) pfade .. Notice that a 'hSetBuffering stdin NoBuffering' before 'getContents' doesn't seem to be sufficient, either. Regards, Volker -- Wonderful \hbox (0.80312pt too nice) in paragraph at lines 16--18 Volker Stolz * [EMAIL PROTECTED] Please use PGP or S/MIME for correspondence! ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
RE: Bug in IO libraries when sending data through a pipe?
In local.glasgow-haskell-bugs, you wrote: I've stripped down my program to produce an example. In the process, the problem disappeard a few times. I hope it shows up on your machine. The attached files reproduce it on my machine, but the exact results vary from run to run. There's no bug in the libraries: The data is read lazily by 'getContents', then you invoke 'forkProcess'. From this moment on, you should have two (heavy-weight) processes competing for input from the same source (stdin). Thanks Volker. This just strengthens my feeling that lazy I/O is a Very Bad Thing which should be avoided for non-trivial programs. Alternatives are: hGetChar, hGetLine, hGetBuf, and hGetArray (the last one will be in 5.04 and returns an IOUArray of Word8). Cheers, Simon ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
RE: Bug in IO libraries when sending data through a pipe?
Just to be sure, I've changed to example program a bit (see attachment). I think it now demonstrates clearly that there must be a bug in the libraries. - If the child closes its child's stdin before calling executeFile, all data gets through. - If instead the child's child (echo.c) closes stdin immediately after being executed, some data is lost. Volker #include stdio.h main(int argc, char** argv) { close(0); if (argc = 2) puts(argv[1]); return 0; } 0 :: parent child echo ./parent /dev/null parent : parent.hs ghc -O -o parent parent.hs -package posix -package util -package std -syslib lang child : child.hs ghc -O -o child child.hs -package posix -package util -package std -syslib lang echo : echo.c gcc -o echo echo.c clean :: rm -f *.o *~ parent child echo *.hi import System import List import Posix import PosixUtil import IO main = do pipeto senden ./child [] hPutStrLn stderr (sent: ++ show (length senden)) senden = concat (take 200 (repeat xxx\n)) pipeto :: String - String - [String] - IO () pipeto eing prog par = do catch (do (zu, von) - createPipe -- Pipe von -- zu erzeugen vonh - fdToHandle von -- Für von brauchen wir einen Handle hSetBuffering vonh NoBuffering -- nichts zurückhalten, wenn der Kindprozeß es lesen will mpid - forkProcess -- fork(). Danach hat die Pipe vier offene Enden. case mpid of Nothing - do-- Kind hClose vonh -- Kind-Schreibende schließen dupTo zu (intToFd 0) -- Kind-Leseende auf die Standardeingabe kopieren fdClose zu-- erstes Kind-Leseende schließen executeFile' prog True par Nothing Just pid - do -- Vater fdClose zu-- Vater-Leseende schließen hPutStr vonh eing -- Text druch die Röhre (verzögert) hClose vonh -- Vater-Schreibende schließen (Just ps) - getProcessStatus True True pid -- auf angehaltenes Kind warten if ps == Exited ExitSuccess then return () else failIO (Kommando fehlgeschlagen mit ++ show ps ++ :\n ++ kommando prog par) ) (\err - do errno - getErrorCode hPutStrLn stderr (Aufruf fehlgeschlagen: ... | ++ kommando prog par ++ \nerrno = ++ show errno) ioError err) executeFile' :: FilePath -- Command - Bool -- Search PATH? - [String] -- Arguments - Maybe [(String, String)] -- Environment - IO a executeFile' cmd args env ca = do executeFile cmd args env ca hPutStrLn stderr (Kommando ++ cmd ++ nicht gefunden) exitFailure failIO :: String - IO a failIO meld = do hPutStrLn stderr meld exitFailure kommando :: String - [String] - String kommando k par = concat (intersperse (map shell_quote (k:par))) shell_quote :: String - String shell_quote txt = let need_to_quote c = c `elem` ' \t\n\\\|;()!{}*[?]^$`# quote (z:zs) = if (z `elem` \$`\\) then ('\\':(z:(quote zs))) else (z:(quote zs)) quote = \ in if any need_to_quote txt then '' : quote txt else txt import System import List import Posix import PosixUtil import IO main = do qinh - getContents let pfade = zeilen qinh mapM_ (\pfad - run ./echo [pfad]) pfade hPutStrLn stderr (received: ++ show (length qinh)) run :: FilePath-- Command - [String]-- Arguments - IO () run prog par = callIO (\ps - Kommando fehlgeschlagen mit ++ show ps ++ :\n ++ kommando prog par) (executeFile' prog True par Nothing) -- instead, to avoid bug: -- (hClose stdin executeFile' prog True par Nothing) -- Die gegebene Aktion als neuen Prozeß ausführen. Kind wegforken und auf -- dessen Beendigung warten. Sein Ergebnis überwachen und bei Fehler mit -- Meldung abbrechen. Der erste Parameter ist eine Funktion, die aus einem -- ExitStatus eine Fehlermeldung generiert. Sie wird nur mit (Exited -- (ExitFailure _)) oder (Terminated _) aufgerufen. callIO :: (ProcessStatus - String) -- Fehlermeldung erzeugen - IO ()
Re: Bug in IO libraries when sending data through a pipe?
Am 15. Mar 2002 um 14:39 MET schrieb Volker Wysk: - If instead the child's child (echo.c) closes stdin immediately after being executed, some data is lost. Where's the use in closing stdin when you're passing arguments as parameters? This is effectively a NOP and shouldn't influence the result, so I'd expect the same missing data as in your original case. -- Volker Stolz * [EMAIL PROTECTED] * PGP + S/MIME ___ Glasgow-haskell-bugs mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
Re: Bug in IO libraries when sending data through a pipe?
On Fre, 2002-03-15 at 15:05, Volker Stolz wrote: Am 15. Mar 2002 um 14:39 MET schrieb Volker Wysk: - If instead the child's child (echo.c) closes stdin immediately after being executed, some data is lost. Where's the use in closing stdin when you're passing arguments as parameters? This is effectively a NOP and shouldn't influence the result, so I'd expect the same missing data as in your original case. Yes, it's a NOP (just to be sure). The difference is in child.hs: callIO (\ps - Kommando fehlgeschlagen mit ++ show ps ++ :\n ++ kommando prog par) (executeFile' prog True par Nothing) -- instead, to avoid bug: -- (hClose stdin executeFile' prog True par Nothing) If you use the commented out line instead of the one in effect ((executeFile' prog...), you get the first behaviour, which is, the problem doesn't occur. Volker signature.asc Description: This is a digitally signed message part
RE: Bug in IO libraries when sending data through a pipe?
On Mon, 11 Mar 2002, Simon Marlow wrote: There seems to be a bug in the IO libraries. I'm using the following procedure to call an external program and send it data through a pipe. Could you send us a complete example that we can run to reproduce the problem? I've stripped down my program to produce an example. In the process, the problem disappeard a few times. I hope it shows up on your machine. The attached files reproduce it on my machine, but the exact results vary from run to run. Volker import System import List import Posix import PosixUtil import IO main = do qinh - getContents let pfade = zeilen qinh mapM_ (\pfad - run /bin/echo [pfad]) pfade hPutStrLn stderr (received: ++ show (length qinh)) run :: FilePath-- Command - [String]-- Arguments - IO () run prog par = callIO (\ps - Kommando fehlgeschlagen mit ++ show ps ++ :\n ++ kommando prog par) (executeFile' prog True par Nothing) -- Die gegebene Aktion als neuen Prozeß ausführen. Kind wegforken und auf -- dessen Beendigung warten. Sein Ergebnis überwachen und bei Fehler mit -- Meldung abbrechen. Der erste Parameter ist eine Funktion, die aus einem -- ExitStatus eine Fehlermeldung generiert. Sie wird nur mit (Exited -- (ExitFailure _)) oder (Terminated _) aufgerufen. callIO :: (ProcessStatus - String) -- Fehlermeldung erzeugen - IO () -- Kindprozeß - IO () callIO fm io = do maybepid - forkProcess case maybepid of Nothing - -- Kind io exitWith ExitSuccess Just pid - do -- Vater (Just ps) - getProcessStatus True True pid -- auf angehaltenes Kind warten if ps == Exited ExitSuccess then return () else failIO (fm ps) zeilen :: String - [String] zeilen txt = let gruppen = groupBy (\a b - (a == '\n') == (b == '\n')) txt in filter (\str - filter (/= '\n') str /= ) gruppen kommando :: String - [String] - String kommando k par = concat (intersperse (map shell_quote (k:par))) executeFile' :: FilePath -- Command - Bool -- Search PATH? - [String] -- Arguments - Maybe [(String, String)] -- Environment - IO a executeFile' cmd args env ca = do executeFile cmd args env ca hPutStrLn stderr (Kommando ++ cmd ++ nicht gefunden) exitFailure failIO :: String - IO a failIO meld = do hPutStrLn stderr meld exitFailure shell_quote :: String - String shell_quote txt = let need_to_quote c = c `elem` ' \t\n\\\|;()!{}*[?]^$`# quote (z:zs) = if (z `elem` \$`\\) then ('\\':(z:(quote zs))) else (z:(quote zs)) quote = \ in if any need_to_quote txt then '' : quote txt else txt 0 :: parent child ./parent /dev/null parent : parent.hs ghc -O -o parent parent.hs -package posix -package util -package std -syslib lang child : child.hs ghc -O -o child child.hs -package posix -package util -package std -syslib lang clean :: rm -f *.o *~ parent child *.hi import System import List import Posix import PosixUtil import IO main = do pipeto senden ./child [] hPutStrLn stderr (sent: ++ show (length senden)) senden = concat (take 200 (repeat xxx\n)) pipeto :: String - String - [String] - IO () pipeto eing prog par = do catch (do (zu, von) - createPipe -- Pipe von -- zu erzeugen vonh - fdToHandle von -- Für von brauchen wir einen Handle hSetBuffering vonh NoBuffering -- nichts zurückhalten, wenn der Kindprozeß es lesen will mpid - forkProcess -- fork(). Danach hat die Pipe vier offene Enden. case mpid of Nothing - do-- Kind hClose vonh -- Kind-Schreibende schließen dupTo zu (intToFd 0) -- Kind-Leseende auf die Standardeingabe kopieren fdClose zu-- erstes Kind-Leseende schließen executeFile' prog True par Nothing Just pid - do -- Vater fdClose zu-- Vater-Leseende schließen hPutStr vonh eing -- Text druch die Röhre (verzögert)