Re: [Haskell] Re: haskell.org Public Domain
Ashley Yakeley wrote:

> I think we're going for public domain, assuming we can also add text
> to satisfy German law, etc.

AIUI, the main problem with the notion of public domain under typical
European copyright law is that authors have moral rights (e.g. the
right of attribution and to prohibit defacement) which are inalienable,
i.e. any statement waiving or rescinding such rights is void and
unenforceable.

IOW, no matter what language the licence uses, the author retains the
right to sue for violations of their moral rights.

--
Glynn Clements [EMAIL PROTECTED]
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] haskell.org Public Domain
Ashley Yakeley wrote:

> Sounds like a stupid idea? Thought so. A wiki should be public domain,
> plain and simple. (Put contributions with a different license
> somewhere else and link to them. No big deal.)

> There seems to be a consensus for public domain both here and on the
> wiki page.
> http://haskell.org/haskellwiki/HaskellWiki:Community_Portal
> Does anyone have any objections to putting everything in the public
> domain?

Insisting that everything is in the public domain prevents the
inclusion of third-party content, unless either:

a) that content is also in the public domain (which is unusual; even
   content which is freely redistributable usually has some kind of
   restriction, even if it's only an acknowledgement requirement), or

b) you can obtain a specific exemption from its author (assuming that
   you can actually identify and locate the author, which isn't always
   easy for projects with a long history and many contributors).

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell] GHC and GLUT
Wolfgang Jeltsch wrote:

> Does installing the libsm-dev package help? -- Mark

> Thank you, Mark and Jared. Installing the libsm-dev package resulted
> in ld outputting a similar error message for -lXmu. After also
> installing libxmu-dev, linking was possible.
>
> Are libsm-dev and/or libxmu-dev needed for every Haskell GLUT
> application or just for certain examples? If they are always needed,
> some package has not declared all its dependencies. In this case,
> which package is the one with the dependency declaration bug?

GLUT uses XmuLookupStandardColormap from libXmu. libXmu requires libXt,
which in turn requires libSM.

As that's the only Xmu function which GLUT uses, I would have thought
that it would be worth making the effort to remove that particular
dependency from GLUT.

Insofar as it's a bug, it's in the dependency list for GLUT.

Ultimately, the correct dependency list for the Haskell GLUT package is
GLUT, plus whatever GLUT happens to require on your particular system.
But I don't know whether the dependency list can be generated
dynamically.

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell-cafe] How to print a string (lazily)
Donn Cave wrote:

> I sometimes call a function with side-effects in IO a command. But the
> terms are fungible. But calling putStr a function is correct. It is
> not a pure function however.

> Is that the standard party line? I mean, we all know its type and
> semantics, whatever you want to call them, but if we want to put names
> to things, I had the impression that the IO monad is designed to work
> in a pure functional language - so that the functions are indeed
> actually pure, including putStr.

putStr is a pure function, but it isn't a pure function ;)

OTOH, getLine isn't even a function, just a value.

--
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
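The distinction drawn in the reply can be made concrete with a minimal sketch. The names greetings, echoTwice and runGreetings are invented for illustration. Applying putStr to a string only builds an IO action, which can sit in a list like any other value, whereas getLine takes no argument and is simply a value of type IO String.

```haskell
-- putStr is an ordinary function of type String -> IO ().
-- Applying it performs no output; it only builds an action.
greetings :: [IO ()]
greetings = map putStr ["hello, ", "world\n"]

-- getLine takes no arguments; it is a value of type IO String.
-- The same value can be used twice inside a larger action.
echoTwice :: IO ()
echoTwice = do
  a <- getLine
  b <- getLine
  putStrLn (a ++ b)

-- Nothing is printed until the stored actions are actually run.
runGreetings :: IO ()
runGreetings = sequence_ greetings
```

Evaluating `length greetings` inspects the list without printing anything, which is one way to see that building the actions is free of side effects.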
Re: [Haskell-cafe] Bug in Haskell for C programmers tutorial?
Cale Gibbard wrote:

> You shouldn't have to flush output manually. Which implementation are
> you using? Try importing System.IO and doing:
>
>     hGetBuffering stdout >>= print
>
> and see what gets printed. It should be NoBuffering.

> The buffering for stdout should be LineBuffering if stdout is a
> terminal and BlockBuffering otherwise. The buffering for stderr should
> always be NoBuffering.

> It's actually not, if you're starting your program from ghci, which is
> what confused me. From GHCi, you get NoBuffering on stdout, and
> LineBuffering on stdin, which is sane for interactive programs. Why
> anyone would want LineBuffering as default on stdout is somewhat
> mysterious to me.

Because most programs output entire lines, and the implementation of
Haskell's putStr etc sucks when using NoBuffering (one write() per
character).

C follows the same rules (stdin/stdout use line buffering for
terminals, block buffering otherwise, stderr is always unbuffered),
even though C's puts(), printf() etc behave a lot better with
unbuffered streams (they pass either whole strings or large chunks to
write() rather than individual characters).

--
Glynn Clements [EMAIL PROTECTED]
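To check which rules a particular implementation follows, the query quoted above can be extended to all three standard handles. This is only a sketch; reportBuffering is a made-up helper name, while hGetBuffering and BufferMode come from System.IO.

```haskell
import System.IO

-- Query the current buffering mode of each standard handle.
reportBuffering :: IO [(String, BufferMode)]
reportBuffering =
  mapM query [("stdin", stdin), ("stdout", stdout), ("stderr", stderr)]
  where
    query (name, h) = do
      mode <- hGetBuffering h
      return (name, mode)
```

Running `reportBuffering >>= mapM_ print` from a terminal, from GHCi, and with output redirected to a file should show how the defaults differ between those situations.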
Re: [Haskell-cafe] Bug in Haskell for C programmers tutorial?
Cale Gibbard wrote:

> You shouldn't have to flush output manually. Which implementation are
> you using? Try importing System.IO and doing:
>
>     hGetBuffering stdout >>= print
>
> and see what gets printed. It should be NoBuffering.

The buffering for stdout should be LineBuffering if stdout is a
terminal and BlockBuffering otherwise. The buffering for stderr should
always be NoBuffering.

> If for whatever reason it's not, you can set it to that at the start
> of your programs with
>
>     hSetBuffering stdout NoBuffering

I would suggest using an explicit hFlush after each putStr rather than
disabling buffering altogether, as disabling buffering will result in
putStr etc calling write() once per character, which is very
inefficient.

--
Glynn Clements [EMAIL PROTECTED]
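The explicit-flush approach suggested above might look like the following sketch. putStrNow and askName are hypothetical helper names; putStr, hFlush and getLine are the standard library calls.

```haskell
import System.IO

-- Write a string without a trailing newline and flush straight away,
-- so that it is visible even when stdout is line- or block-buffered.
putStrNow :: String -> IO ()
putStrNow s = putStr s >> hFlush stdout

-- A prompt that is guaranteed to appear before the program waits for input.
askName :: IO String
askName = do
  putStrNow "Name: "
  getLine
```

Buffering stays enabled, so longer outputs are still written in large chunks; only the prompt forces an early write.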
Re: [Haskell-cafe] Haskell vs OCaml
Branimir Maksimovic wrote:

> Could you give an example of a loop you find awkward in Haskell?

> Well I want simple loop
>
>     for (int i = 0; i < 10; ++i) doSomething(i);

    mapM_ doSomething [0..9]

--
Glynn Clements [EMAIL PROTECTED]
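For a loop body that spans several statements, forM_ (mapM_ with its arguments flipped) reads closer to the C original. A small sketch, with sumLoop as an invented example that accumulates into an IORef:

```haskell
import Control.Monad (forM_)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Equivalent of: int acc = 0; for (int i = 0; i < 10; ++i) acc += i;
sumLoop :: IO Int
sumLoop = do
  acc <- newIORef 0
  forM_ [0 .. 9] $ \i ->
    modifyIORef acc (+ i)
  readIORef acc
```

The list [0 .. 9] is consumed lazily, so no ten-element array needs to exist for the loop to run.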
Re: [Haskell-cafe] Functions with side-effects?
Daniel Carrera wrote:

> I'm a Haskell newbie and I don't really understand how Haskell deals
> with functions that really must have side-effects. Like a rand()
> function or getLine().

Those aren't functions.

A function is a single-valued relation, i.e. a (possibly infinite) set
of ordered pairs <x,y> such that the set doesn't contain two pairs
<a,b> and <c,d> where a == c and b =/= d. IOW, a static mapping from
argument to result.

Haskell uses the term function to mean a function in the strict
mathematical sense, and not (like most other languages) to mean a
procedure which returns a value as well as reading and writing some
implicit state.

> I know this has something to do with monads, but I don't really
> understand monads yet. Is there someone who might explain this in
> newbie terms? I don't need to understand the whole thing, I don't need
> a rand() function right this minute. I just want to understand how
> Haskell separates purely functional code from non-functional code (I
> understand that a rand() function is inevitably not functional code,
> right?)

All Haskell code is functional (discounting certain low-level details
such as unsafePerformIO).

Side effects are implemented by making the prior state an argument and
the new state a component of the result, i.e. a C procedure of type:

    res_t foo(arg_t);

becomes a Haskell function with type:

    ArgType -> State -> (State, ResType)

To simplify coding (particularly, making sure that you use the correct
iteration of the state at any given point), all of this is usually
wrapped up in an instance of the Monad class. But there isn't anything
special about Monad instances. The class itself and many of its
instances are written in standard Haskell.
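Before any monad is involved, the state-passing scheme can be written out by hand. The following is a minimal sketch with invented names (Counter, nextId, threeIds); it shows why picking the correct iteration of the state quickly becomes tedious.

```haskell
-- The state is a simple counter.
type Counter = Int

-- State in, (new state, result) out: ArgType is absent here,
-- so the type is just State -> (State, ResType).
nextId :: Counter -> (Counter, Int)
nextId n = (n + 1, n)

-- Drawing three identifiers means feeding each new state to the next
-- call by hand; using s0 twice by mistake would hand out a duplicate.
threeIds :: Counter -> (Counter, [Int])
threeIds s0 =
  let (s1, a) = nextId s0
      (s2, b) = nextId s1
      (s3, c) = nextId s2
  in (s3, [a, b, c])
```

A Monad instance hides exactly this plumbing: the s0, s1, s2 names disappear and the sequencing operator passes the state along.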
To provide a concrete example, here's a monadic random number
generator:

    import Control.Monad (ap, liftM)

    type Seed = Int

    data Rand a = R { app :: Seed -> (Seed, a) }

    myRand :: Rand Int
    myRand = R $ \seed ->
        let result = (seed' `div` 65536) `mod` 32768
            seed'  = seed * 1103515245 + 12345
        in (seed', result)

    -- Current GHC versions require Functor and Applicative instances
    -- before a Monad instance can be declared.
    instance Functor Rand where
        fmap = liftM

    instance Applicative Rand where
        pure x = R $ \seed -> (seed, x)
        (<*>)  = ap

    instance Monad Rand where
        f >>= g = R $ \seed ->
            let (seed', x) = app f seed
            in app (g x) seed'
        return = pure

    runR :: Seed -> Rand a -> a
    runR seed f = snd $ app f seed

Example usage:

    randomPair :: Rand (Int, Int)
    randomPair =
        myRand >>= \x ->
        myRand >>= \y ->
        return (x, y)

or, using do notation (which is simply syntactic sugar):

    randomPair :: Rand (Int, Int)
    randomPair = do
        x <- myRand
        y <- myRand
        return (x, y)

    main = print $ runR 99 randomPair

The main difference between the built-in IO monad and the Rand monad
above is that where the Rand monad has a Seed for its state, the IO
monad has the (conceptual) World type.

As the World type has to represent the entire observable state of the
universe, you can't actually obtain instances of it within a Haskell
program, and thus there is no equivalent to runR.

Instead, you provide an IO instance (main) to the runtime, which
(conceptually) applies it to the World value representing the state of
the universe at program start, and updates the universe to match the
World value returned from main at program end.

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX
Joel Reymont wrote:

> This should be enough reason to scan for keyboard events instead.
> There is no guarantee that SIGINT would be sent only by keyboard.

>     import System.Posix.Signals
>
>     main = do
>         installHandler sigINT Ignore Nothing
>         x <- getChar
>         if x == '\ETX'
>             then do print "Gotcha!"
>             else do print "Try again!"
>                     main
>
> This does not work for ^C. Can it actually be done? Of course I can
> just read q but that would be too simple :-).

You have to put the terminal into raw mode to be able to read any
character which is normally processed by the TTY driver, e.g.:

    import System.Posix.IO (stdInput)   -- the Fd for standard input
    import System.Posix.Terminal

    atts <- getTerminalAttributes stdInput
    let atts' = withoutMode atts ProcessInput
    setTerminalAttributes stdInput atts' Immediately

This disables all input processing, e.g. line-editing and CR-LF
translation (i.e. pressing the Enter/Return key will result in CR, not
LF).

Remember to set it back before exiting.

--
Glynn Clements [EMAIL PROTECTED]
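The advice to set the terminal back before exiting can be made automatic with bracket. The sketch below assumes the unix package; withRawInput is a hypothetical helper name. It also clears KeyboardInterrupts, on the assumption (not stated in the message above) that the signal-generating keys have to be switched off as well for ^C to arrive as an ordinary character.

```haskell
import Control.Exception (bracket)
import System.Posix.IO (stdInput)
import System.Posix.Terminal

-- Run an action with canonical input processing and signal keys disabled
-- on standard input, restoring the saved attributes afterwards, even if
-- the action throws an exception.
withRawInput :: IO a -> IO a
withRawInput action =
  bracket (getTerminalAttributes stdInput) restore $ \saved -> do
    -- Assumption: both flags need clearing to read ^C as '\ETX'.
    let raw = saved `withoutMode` ProcessInput `withoutMode` KeyboardInterrupts
    setTerminalAttributes stdInput raw Immediately
    action
  where
    restore saved = setTerminalAttributes stdInput saved Immediately
```

A call such as `withRawInput getChar` would then return each keypress immediately. It only makes sense when standard input is a terminal, which queryTerminal stdInput can check beforehand.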
Re: [Haskell-cafe] Records (was Re: [Haskell] Improvements to GHC)
Sebastian Sylvan wrote:

> How about (¤)? It looks like a ring to me, I'm not sure where that's
> located on a EN keyboard, but it's not terribly inconvenient on my SE
> keyboard. f ¤ g looks better than f . g for function composition, if
> you ask me.

> That symbol actually does look better, but isn't on any English
> keyboards to the best of my knowledge. I can get it in my setup with
> compose-key o x, but not many people have a compose key assigned.
> Also, this may just be a bug, but currently, ghc gives a lexical error
> if I try to use that symbol anywhere, probably just since it's not an
> ASCII character.

> Hmm. On my keyboard it's Shift+4. Strange that it's not available on
> other keyboards. As far as I know that symbol means nothing
> particularly Swedish. In fact, I have no idea what it means at all =)

It's a generic currency symbol (the X11 keysym is XK_currency). It
doesn't exist on a UK keyboard (where Shift-4 is the dollar sign).

In any case, using non-ASCII characters gives rise to encoding issues
(e.g. you have to be able to edit UTF-8 files).

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell-cafe] Converting [Word8] to String
Tomasz Zielonka wrote:

> How do I convert a list of bytes to a string?

> I assume you don't care about Unicode:

That should have said "I assume that the data is encoded using
ISO-8859-1 (or a subset thereof, e.g. US-ASCII)".

>     map (Char.chr . fromIntegral)
>
> or
>
>     map (toEnum . fromEnum)

For anything else, you will have to either write a decoder (or use
someone else's; several exist for UTF-8), or interface to iconv() using
the FFI.

--
Glynn Clements [EMAIL PROTECTED]
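Written out as a complete definition, the quoted one-liner looks as follows. latin1ToString is an invented name; the conversion is only correct under the ISO-8859-1 assumption stated above, because that encoding maps each byte value to the Unicode code point with the same number.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Decode bytes as ISO-8859-1: byte n becomes the character with code n.
latin1ToString :: [Word8] -> String
latin1ToString = map (chr . fromIntegral)
```

For example, the bytes 72, 105, 233 decode to 'H', 'i' and the character with code point 233. UTF-8 data containing non-ASCII characters would come out garbled, since each byte of a multi-byte sequence would be turned into a separate character.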
Re: [Haskell-cafe] newbe question
[EMAIL PROTECTED] wrote:

> obviously, Hugs thinks that =- is a special operator. In Haskell you
> have the ability to define your own operators, so it would be possible
> to define an operator =-. I would suggest that you always put spaces
> around the = in declarations.
>
> Best wishes, Wolfgang

> Hello, thank you for fast reply. Ok, but what is the semantic of
> '=-' ? If it's an operator, it should have some impact (right term?).

It isn't defined in the prelude or any of the standard libraries.

The point is that the Haskell tokeniser treats any consecutive sequence
of the symbols !#$%&*+./<=>?@\^|-~ as a single operator token. This
occurs regardless of whether a definition exists for the operator.

More generally, the tokenising phase is unaffected by whether or not an
operator, constructor, identifier etc is defined. A specific sequence
of characters will always produce the same sequence of tokens
regardless of what definitions exist.

--
Glynn Clements [EMAIL PROTECTED]
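The tokenising behaviour described above can be observed from within Haskell itself, since the Prelude's lex function reads one Haskell lexeme from the front of a string. The tokens helper below is a made-up wrapper that applies lex repeatedly; it is a sketch for experimenting, not a complete Haskell lexer.

```haskell
-- Split a string into Haskell lexemes by applying Prelude's lex repeatedly.
-- Stops at the end of input or at the first character lex cannot handle.
tokens :: String -> [String]
tokens s = case lex s of
  (tok, rest) : _ | not (null tok) -> tok : tokens rest
  _                                -> []
```

With GHC, `tokens "x=-1"` should give `["x","=-","1"]`, whereas `tokens "x = -1"` gives `["x","=","-","1"]`. No operator named =- is defined anywhere, yet the run of symbol characters is still grouped into one token.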
Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp
David F. Place wrote:

> I don't deny that all of the things you mentioned are wonderful
> indeed. I just wonder if they really could only be done in lisp or
> even most conveniently.

Obviously, if you can do it in Lisp, you can do it in any
Turing-complete language; in the worst case, you just write a Lisp
interpreter.

As for convenience: syntax matters. The equivalence of code and data in
Lisp lets you write your own syntactic sugar. You're still bound by the
lexical (token-level) grammar, although reader macros mean that isn't
much of a restriction.

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp
John Meacham wrote:

> In Haskell, code is data too because code in the sense of imperative
> actions is described by IO values.

> You cannot analyse them. And thus they are not data.

> Huh? I'd say they are not /concrete/ data, but (abstract) data they
> surely are(?) and you are certainly free to turn them into concrete
> data by creating your own data type which you then can inspect and
> modify and then interpret.

IOW, you are free to write a Lisp interpreter in Haskell. But it's a
lot easier to do it in Lisp.

That, in a nutshell, is Lisp's key strength. It uses the same structure
for code as for data, which makes it very easy to add new language
features.

--
Glynn Clements [EMAIL PROTECTED]
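The idea in the quoted text, turning actions into concrete data that can be inspected and then interpreted, can be sketched in a few lines of Haskell. The Action type and both functions are invented for illustration and cover only a toy language.

```haskell
-- A tiny language of output actions, represented as ordinary data.
data Action
  = Say String          -- print one line
  | Seq Action Action   -- run one action, then another
  | Repeat Int Action   -- run an action a number of times

-- Analysis: count how many lines would be printed, without printing any.
countSays :: Action -> Int
countSays (Say _)      = 1
countSays (Seq a b)    = countSays a + countSays b
countSays (Repeat n a) = max 0 n * countSays a

-- Interpretation: turn the description into a real IO action.
run :: Action -> IO ()
run (Say s)      = putStrLn s
run (Seq a b)    = run a >> run b
run (Repeat n a) = mapM_ (const (run a)) [1 .. n]
```

This also illustrates the reply's point: every new construct needs a new constructor plus a case in each function, whereas in Lisp the program text already is such a data structure.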
Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp
David F. Place wrote:

> That, in a nutshell, is Lisp's key strength. It uses the same
> structure for code as for data, which makes it very easy to add new
> language features.

> I assume that you refer to `eval' and the fact it operates on conses
> and symbols. Beyond the extremely contrived example of a metacircular
> interpreter, what are some examples of the benefits of this feature of
> lisp? What are some examples of language features that are easy to
> add?

Well, to state the obvious, being able to extend or replace the
language's syntax and semantics. In particular, being able to do so
locally.

Probably the most useful consequence is the ability to create new
control constructs without being constrained by the existing syntax and
semantics (and without having to write your own monadic versions of
existing functions).

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell-cafe] Trapped by the Monads
Mark Carter wrote:

> Could you briefly elaborate on what you mean by hybrid variables?

> According to Google, hybrid in genetics means The offspring of
> genetically dissimilar parents or stock, especially the offspring
> produced by breeding plants or animals of different varieties,
> species, or races. It's kind of like that - but for variables. The
> typical example in C is:
>
>     mem = malloc(1024)
>
> Malloc returns 0 to indicate that memory cannot be allocated, or a
> memory address if it can. The variable mem is a so-called hybrid
> variable; it crunches together 2 different concepts: a boolean value
> (could I allocate memory?) and an address value (what is the address
> where I can find my allocated memory).

Well in that case, Maybe provides the perfect example of how to
implement hybrid variables correctly.

The types Ptr a and Maybe (Ptr a) are distinct. If you try to pass the
latter to a function which expects the former, you'll get a
compile-time error. You first have to extract the underlying value,
which means that you need to match against (Just x). If the wrapped
value is Nothing, you'll get an exception. Furthermore, if you forget
to handle the Nothing case, you'll get a compile-time warning.

In C, there's no way to distinguish (using the type system) between a
possibly-null pointer and a non-null pointer.

Using a pair of a boolean and a pointer is the wrong approach because
the pointer is meaningless if the boolean is false, but the type system
won't prevent you from using the value of the pointer in that case.

A more general example is structures where certain fields are only
valid in certain circumstances (e.g. depending upon the type field).
Haskell-style sum types (of which Maybe is an example) are a much
better solution, as the fields only exist when they are meaningful.

--
Glynn Clements [EMAIL PROTECTED]
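A small sketch of the malloc situation in Haskell terms follows. Block, allocate, blockAddress and describe are all hypothetical names, and the allocator is a stand-in that merely pretends to hand out an address.

```haskell
-- A stand-in for an allocated block; the Int plays the role of an address.
newtype Block = Block Int
  deriving (Eq, Show)

-- Hypothetical allocator: the possibility of failure is part of the type.
allocate :: Int -> Maybe Block
allocate size
  | size <= 0 || size > 4096 = Nothing
  | otherwise                = Just (Block 4096)

-- This function needs a real block; passing a Maybe Block is a type error.
blockAddress :: Block -> Int
blockAddress (Block addr) = addr

-- The caller has to take the result apart, covering both cases.
describe :: Int -> String
describe size = case allocate size of
  Nothing -> "allocation failed"
  Just b  -> "allocated at " ++ show (blockAddress b)
```

Writing `blockAddress (allocate 1024)` is rejected by the type checker, which is the compile-time distinction that C's possibly-null pointers lack.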
Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp
Wolfgang Jeltsch wrote:

> Bearing this in mind, and hoping you can see where I'm coming from, I
> think my question is: shouldn't you guys be using Lisp?

> Lisp is impure, weakly typed and has way too many parentheses. Why
> would we use lisp? It seems to be lacking almost all the advantages of
> Haskell, and have an ugly, inflexible syntax to boot.

> The ability to dynamically generate, manipulate and analyse code in a
> structured manner provides a flexibility which is unmatched by any
> other language I know of. A good example is Emacs; lisp is entirely
> the right language for that, IMHO.

> Could you explain this a bit more, please? To the moment, I cannot
> imagine cases where you need LISP's way of code analysis and
> manipulation because Haskell's capabilities are not sufficient.
>
> In Haskell, code is data too because code in the sense of imperative
> actions is described by IO values.

> You cannot analyse them. And thus they are not data.

> But you can use your do expressions etc. to construct action
> descriptions with a more general type like MonadIO m => m a. Then you
> can instantiate m with a monad whose values store part of the action's
> structure so that this information can be used later. Or you use a
> monad which doesn't keep structural information to use it for later
> processing but which does the processing upon construction.

Yeah, but this is heading in the direction of Greenspun's Tenth Rule of
Programming:

    Any sufficiently complicated C or Fortran program contains an ad
    hoc informally-specified bug-ridden slow implementation of half of
    Common Lisp.

You could easily end up doing the same thing in Haskell.

The main thing about Lisp is that it tends to make it fairly easy to
write something close to the ideal language for the task in hand
without starting from the ground floor. You get stuck with Lisp's token
syntax, and the semantics of its core primitives, but you can replace
anything else.
Every other language (including Haskell) tends to have the problem that
eventually you will encounter a situation where the language's own
worldview gets in the way.

Or, to put it another way: if Haskell is so flexible, why do we need
Template Haskell? I can't imagine a Template Lisp; it would just be
Lisp.

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp
Tomasz Zielonka wrote:

> Every other language (including Haskell) tends to have the problem
> that eventually you will encounter a situation where the language's
> own worldview gets in the way.

> Are you sure that lisp's worldview never gets in the way?

I wouldn't say never. But its main advantage is that it doesn't really
have much of a worldview.

Its primary composite data type is the heterogeneous linked list, which
is isomorphic to both binary trees and n-ary trees. This provides a
reasonable fit for most common data structures, and also for most
languages (anything defined by a recursive grammar can be represented
as a parse tree).

The complete absence of keywords is another useful feature (I've lost
track of the various C packages which had to have identifiers named
class renamed to allow for C++). Even quote is just another symbol.

Ultimately, all languages are limited by their choice of primitives.

> Or, to put it another way: if Haskell is so flexible, why do we need
> Template Haskell?

> It's nice to have Template Haskell, but saying that we need it is a
> bit of an overstatement. In the GHC Survey 2005 only 9% of people said
> it's essential. Well, OK, I was one of them, but I think you know what
> I mean.

> I can't imagine a Template Lisp; it would just be Lisp.

> The power of lisp macros is often overrated. I remember a long
> discussion crossposted on comp.lang.lisp and comp.lang.functional. The
> lisp advocates gave examples for how macros allow to do things
> supposedly unavailable in other languages. Surprisingly, most of these
> things were equally easy to do with higher-order functions and
> closures in Haskell.
>
> I am sure that lisp gurus can achieve great things with macros, but
> I'm not sure they are the best tool for software engineering problems.
> I think they can make the code more difficult to understand, make the
> semantics less uniform (despite the uniform syntax), and can become an
> abused ugly hack.
Well, this is heading towards the inevitable paradox (Gödel's theorem,
halting problem, etc). If you allow the programmer to escape from your
chosen semantic model, you no longer have the luxury of being able to
assume those semantics.

Ultimately, the issue isn't whether shooting oneself in the foot is a
good idea, it's whether you leave it up to the language or to the
programmer to prevent it. Both have their pros and cons.

In that regard, Lisp and Haskell are almost opposite extremes, with
more conventional languages in between. Haskell's safety and
consistency can get in the way, while Lisp's freedom can be quite
unsafe and inconsistent.

> Don't get me wrong - I still think that lisp is one of the best
> programming languages around and from time to time I am trying to
> learn a bit of it. One of the things that puts me off is the attitude
> of its community - it seems to be very close minded.

Hmm. That depends upon which faction of the community you're dealing
with. If you get into discussions about the merits of Lisp on public
fora, you'll likely be dealing with the evangelists. Language
evangelists are often closed-minded whatever the language.

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell-cafe] Haskell versus Lisp
David Roundy wrote:

> Bearing this in mind, and hoping you can see where I'm coming from, I
> think my question is: shouldn't you guys be using Lisp?

> Lisp is impure, weakly typed and has way too many parentheses. Why
> would we use lisp? It seems to be lacking almost all the advantages of
> Haskell, and have an ugly, inflexible syntax to boot.

The ability to dynamically generate, manipulate and analyse code in a
structured manner provides a flexibility which is unmatched by any
other language I know of.

A good example is Emacs; lisp is entirely the right language for that,
IMHO.

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell] mailing list headaches
Frederik Eaton wrote:

> However, threading by References, which RFC 2822 says SHOULD be
> possible, and which works on my other folders, doesn't work well on
> Haskell mailing lists. Presumably the issue is that there are a large
> number of Windows users with strange mail clients which don't insert
> References headers.

It isn't so much that there are a large number of such users, but that
two of the core developers are among them (and are both employed by
Microsoft, so RFC-conformance probably isn't an option).

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell-cafe] Newbie question
André Vargas Abs da Cruz wrote:

> I think this is a totally newbie question as i am a complete novice to
> Haskell. I am trying to write down a few programs using GHC in order
> to get used with the language. I am having some problems with a piece
> of code (that is supposed to return a list of lines from a text file)
> which I transcribe below:
>
>     module Test where
>
>     import IO
>
>     readDataFromFile filename = do
>         bracket (openFile filename ReadMode) hClose
>                 (\h -> do contents <- hGetContents h
>                           return (lines contents))
>
> The question is: if i try to run this, it returns me nothing (just an
> empty list). Why does this happen ? When i remove the return and put a
> print instead, it prints everything as i expected.

hGetContents reads the file lazily; it won't actually read anything
until you try to consume the result. However, by that point, you will
have called hClose.

In general, you shouldn't use hClose in conjunction with lazy I/O
(hGetContents etc) unless you are certain that the data will have been
read.

When you put the print in place of the return, you force the data to be
consumed immediately, so the issue doesn't arise.

--
Glynn Clements [EMAIL PROTECTED]
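One way to keep the bracket/hClose structure is to make sure the data really has been read before the handle is closed. The sketch below uses the current module names (System.IO and Control.Exception rather than the old IO module); readLines is an invented name. Note also that, as far as I know, recent GHC versions report the original mistake as a "delayed read on closed handle" error rather than returning an empty list.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO

-- Read all lines of a file, forcing the lazy contents while the handle
-- is still open so that hClose cannot cut the read short.
readLines :: FilePath -> IO [String]
readLines path =
  bracket (openFile path ReadMode) hClose $ \h -> do
    contents <- hGetContents h
    _ <- evaluate (length contents)  -- walks the whole string, reading the file
    return (lines contents)
```

Forcing the length is the same trick as the print in the question: it consumes the data before bracket runs hClose.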
Re: [Haskell] ST/STRef vs. IO/IORef
Srinivas Nedunuri wrote:

> Hello, I have some code that manipulates STRefs within the ST monad.
> All good and fine, until I come across some computation that uses lets
> say IO and everything skids to a halt. At this point I have 3 choices:
>
> 1. Define a ST State Transformer monad and do all my previous ST
>    computations in that
> 2. convert all subsequent ST computations into IO computations using
>    stToIO
> 3. stop using the ST monad and do everything in the IO monad
>
> I was wondering what advice folks had. In particular, what are the
> disadvantages to doing everything in the IO monad - ie why even bother
> with the ST monad?

The most obvious disadvantage is that the IO monad has no equivalent of
runST. Also, there is no ioToST (only unsafeIOToST), so if you use the
IO monad, the code can only be used within the IO monad.

The IO monad is like a trap; once you're inside, you can't get out.

--
Glynn Clements [EMAIL PROTECTED]
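The asymmetry described above shows up in a short sketch. The function names are invented; runST, stToIO and the STRef operations are the standard ones from base. The same ST computation can be run purely or embedded in IO, while nothing comparable exists for going from IO back to pure code.

```haskell
import Control.Monad.ST (ST, runST, stToIO)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- An ST computation that sums a list using a mutable reference.
sumAction :: [Int] -> ST s Int
sumAction xs = do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref

-- runST gets back out: the result is an ordinary pure value.
sumPure :: [Int] -> Int
sumPure xs = runST (sumAction xs)

-- stToIO embeds the same computation in IO (option 2 in the question).
sumInIO :: [Int] -> IO Int
sumInIO xs = stToIO (sumAction xs)
```

Had sumAction been written with IORefs in IO, only the sumInIO form would be available.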
Re: [Haskell-cafe] Interaction in Haskell
Dinh Tien Tuan Anh wrote:

> Yes, it is certainly not Hugs which prevents from realtime interaction
> but it is the terminal you are using. If the terminal lets you delete
> the characters on the current line it has to keep them until you
> complete it with ENTER. Piping from and to other programs or files may
> not have this problem.

> You're right, i've been using shell in Emacs to run Hugs, but when
> back to normal terminal, it works. Just for curiosity, why does it
> happen ?

Emacs' shell-mode doesn't send anything to the terminal driver until
you hit Return, at which point it sends the whole line.

Note that the terminal driver itself normally does line-buffering,
although this is disabled if you set the buffering for stdin to
NoBuffering.

You can't disable the line-buffering inherent in Emacs' shell-mode; you
would have to use terminal-emulator instead (M-x term instead of
M-x shell).

--
Glynn Clements [EMAIL PROTECTED]
Re: [Haskell-cafe] matrix computations based on the GSL
Keean Schupke wrote:

> So the linear operator is translation (ie: + v)... effectively 'plus'
> could be viewed as a function which takes a vector and returns a
> matrix (operator)
>
>     (+) :: Vector -> Matrix

> Since a matrix _is_ not a linear map but only its representation, this
> would not make sense. As I said (v+) is not a linear map thus there is
> no matrix which represents it. A linear map f must fulfill
>
>     f 0 == 0
>
> But since
>
>     v+0 == v
>
> the function (v+) is only a linear map if 'v' is zero. I can't see how
> to fit in your vector extension by the 1-component.

> Eh? Translation is a linear operation no?

No. It's affine, but not linear. As Henning said, to be linear, it must
map zero to zero.

> Adding vectors translates the first by the second (or the second by
> the first - the two are isomorphic)... A translation can be
> represented by the matrix:
>
>      1  0  0  0
>      0  1  0  0
>      0  0  1  0
>     dx dy dz  1
>
> So the result of v+ is this matrix.

No. If the above matrix is M, then:

    [x y z w].M = [x+w.dx y+w.dy z+w.dz w]

which isn't a translation.

In the specific case of homogeneous coordinates, where:

    h  [x y z]   = [x y z 1]
    h' [x y z w] = [x/w y/w z/w]

then:

    \v -> h'(h(v).M)

is a translation, but M isn't itself a translation.

--
Glynn Clements [EMAIL PROTECTED]
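The formulas above can be checked numerically with a small sketch. Vectors and matrices are represented as plain lists here, and all names (mulRow, translationMatrix, toHomogeneous, fromHomogeneous, translate) are made up for illustration; toHomogeneous and fromHomogeneous play the roles of h and h'.

```haskell
import Data.List (transpose)

type Vec = [Double]
type Mat = [[Double]]

-- Row vector times matrix, matching the [x y z w].M convention above.
mulRow :: Vec -> Mat -> Vec
mulRow v m = [sum (zipWith (*) v col) | col <- transpose m]

-- The 4x4 matrix M from the message, with the offsets in the last row.
translationMatrix :: Double -> Double -> Double -> Mat
translationMatrix dx dy dz =
  [ [1,  0,  0,  0]
  , [0,  1,  0,  0]
  , [0,  0,  1,  0]
  , [dx, dy, dz, 1] ]

-- h: append a 1 as the fourth component.
toHomogeneous :: Vec -> Vec
toHomogeneous v = v ++ [1]

-- h': divide by the last component and drop it.
fromHomogeneous :: Vec -> Vec
fromHomogeneous v = map (/ last v) (init v)

-- \v -> h'(h(v).M), which is a translation of 3-vectors.
translate :: Mat -> Vec -> Vec
translate m v = fromHomogeneous (toHomogeneous v `mulRow` m)
```

With M = translationMatrix 2 3 4, `translate m [0,0,0]` yields [2,3,4], so the zero vector is not mapped to zero and the map on 3-vectors is not linear. On 4-vectors, `mulRow [1,1,1,2] m` yields [5,7,9,2]: the offset is scaled by w, as in the formula above.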
Re: [Haskell-cafe] New to Haskell, suggestions on code
Flavio Botelho wrote:

> At many places i have put a Char type instead of an abstract one
> because some functions were not working properly before and i wanted
> to be able to output things and so be able to see what was the
> problem. (Haskell doesnt seem a 'magic' function to output arbitrary
> structures? That would be quite helpful for debugging)

The show method in the Show class generates a string representation of
an instance. The print function can be used to print the string
representation of any instance of Show to stdout.

All standard types except for functions are instances of Show, and
Haskell can automatically derive Show instances for user-defined types,
provided that all of the constituent types are instances of Show.

--
Glynn Clements [EMAIL PROTECTED]
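For a user-defined type, deriving Show takes a single clause. A minimal sketch with an invented Tree type:

```haskell
-- A polymorphic binary tree; the Show instance is generated by the
-- compiler and is available whenever the element type has one.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving Show

sample :: Tree Int
sample = Node Leaf 1 (Node Leaf 2 Leaf)
```

Here `show sample` produces the string `Node Leaf 1 (Node Leaf 2 Leaf)`, and `print sample` writes the same text to stdout, so the element type does not need to be fixed to Char just for debugging output.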
Re: [Haskell-cafe] Specify array or list size?
Thomas Davie wrote:

I'm not familiar with your C++ example (not being familiar with C++), but I think that it's a bit of a stretch of the imagination to say that C introduces a variable of type array of 50 ints, the fact that this is now an array of 50 integers is never checked at any point in the compilation or run, and I'm not sure it can be even if K&R had wanted to.

The size is taken into account when such array type is an element of another array, and by sizeof.

    int (*p)[50]; /* p may legally point only to arrays of 50 ints each */
    ++p; /* p is assumed to point into an array, and is moved by one element, i.e. by 50 ints */

I'm not sure what you're trying to prove by saying that... There is still no type information that says that the contents of p are an array of 50 elements...

Put it this way, then:

    1 void foo(void)
    2 {
    3     int a[2][50];
    4     int b[2][60];
    5     int (*p)[50];
    6
    7     p = a;
    8     p = b;
    9 }

    $ gcc -c -Wall foo.c
    foo.c: In function `foo':
    foo.c:8: warning: assignment from incompatible pointer type

In line 7, an expression of type int (*)[50] is assigned to a variable of type int (*)[50], which is OK. In line 8, an expression of type int (*)[60] is assigned to a variable of type int (*)[50], and the compiler complains.

I can still attempt to access element 51 and get a runtime memory error.

That's because C doesn't do bounds checking on array accesses. It has nothing to do with types.

The type of p is still int**,

No it isn't. int** and int (*)[50] are different types and have different run-time behaviour.

not pointer to array of 50 ints

Yes it is. The semantics of C pointer arithmetic mean that the size of the target is an essential part of the pointer type.

In C, arrays and pointers are *not* the same thing. They are often confused because C does several automatic conversions:

1. When used as an expression (rather than an lvalue), arrays are automatically converted to pointers. Arrays only ever occur as lvalues, never as expressions.

2. In a declaration, the x[...] syntax indicates that x is an array, but in an expression, x must be a pointer (which includes an array which has been converted to a pointer due to rule 1 above).

3. When declaring function arguments, you can use T x[] or T x[N] as an alternative syntax for T *x; x is still a pointer, regardless of the syntax used.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
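The point that the size of the target is part of a pointer's type has a rough analogue on the Haskell side of the FFI. The following is only a sketch: strideOf is a hypothetical helper name, and it relies on advancePtr (Foreign.Marshal.Array) stepping a Ptr a by the size of one element. The pointer is never dereferenced, so nullPtr is enough to observe the stride.

```haskell
import Data.Int (Int32)
import Foreign.Marshal.Array (advancePtr)
import Foreign.Ptr (Ptr, minusPtr, nullPtr)
import Foreign.Storable (Storable)

-- Hypothetical helper: how many bytes does "pointer + 1" move for this
-- pointee type?  The answer depends only on the type of the pointer.
strideOf :: Storable a => Ptr a -> Int
strideOf p = advancePtr p 1 `minusPtr` p

-- Two pointers with the same address but different pointee types.
int32Ptr :: Ptr Int32
int32Ptr = nullPtr

doublePtr :: Ptr Double
doublePtr = nullPtr
```

Here strideOf int32Ptr is 4 and strideOf doublePtr is 8: the same "+1" moves a different distance purely because of the pointee type, much as ++p on an int (*)[50] moves by 50 ints.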
Re: [Haskell-cafe] Compiling with NHC98
Daniel Carrera wrote:

I haven't used NHC so I can't guarantee this will work, but try doing something like this:

    $ nhc98 -c RC4.hs
    $ nhc98 -c prng.hs
    $ nhc98 RC4.o prng.o -o prng

Yay! It does. And I just put it in a makefile:

---daniel's makefile
COMPILER=nhc98

RC4.o:
	$(COMPILER) -c RC4.hs

prng.o:
	$(COMPILER) -c prng.hs

prng: RC4.o prng.o
	$(COMPILER) RC4.o prng.o -o prng
---daniel's makefile

This can fail with a parallel make, which may try to compile RC4.hs and prng.hs concurrently. It can also fail if you rebuild after modifying any of the files, as it won't realise that it needs to re-compile.

To handle that, you need to be more precise about the dependencies, i.e. (with HC set to the compiler, e.g. nhc98):

RC4.o RC4.hi: RC4.hs
	$(HC) -c RC4.hs

prng.o prng.hi: prng.hs RC4.hi
	$(HC) -c prng.hs

prng: RC4.o prng.o
	$(HC) RC4.o prng.o -o prng

With most make programs (e.g. GNU make), you can use pattern rules to avoid repeating the commands, e.g.:

# how to compile any .hs file to produce .o and .hi files
%.o %.hi: %.hs
	$(HC) -c $<

# how to build the prng program
prng: RC4.o prng.o
	$(HC) -o $@ $+

# note that prng.o depends upon RC4.hi
prng.o: RC4.hi

-- 
Glynn Clements
Re: [Haskell] Should inet_ntoa Be Pure?
Axel Simon wrote: Does anyone know why these are in the IO monad? Aren't they pure functions converting between dotted-decimal strings and a 32-bit network byte ordered binary value? I guess the answer is no for both: The first one can fail That doesn't mean that it should be in the IO monad; using Maybe would suffice. and the second one overwrites a fixed string buffer (yuck!). From the man page: The return value from inet_ntoa() points to a buffer which is overwritten on each call. This buffer is implemented as thread-specific data in multithreaded applications. Hence ntoa needs to be an IO action so that the value is read immediately before the next ntoa call is executed. That shouldn't be an issue so long as the buffer contents are converted to a Haskell String before the function is called again within the same thread. However, I wouldn't rely upon all implementations of inet_ntoa() being thread-safe. What you could do is to apply unsafePerformIO to [snip] Or you could just re-implement the functions in Haskell. Apart from the re-entrancy issues with inet_ntoa(), many implementations of inet_addr() have misfeatures, e.g. allowing octets to be expressed in octal or hex, or allowing numbers outside of the 0-255 range (in which case, the top bits overflow into the next octet). -- Glynn Clements [EMAIL PROTECTED] ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
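Following the suggestion above to re-implement the conversions in Haskell and to use Maybe for failure, here is a minimal sketch of a pure, strict dotted-decimal parser and printer. The names parseIPv4 and showIPv4 are hypothetical, not part of any network library. The sketch assumes plain decimal octets only, so the octal, hex and out-of-range forms mentioned above are rejected. The Word32 holds the address as an ordinary number (first octet in the top byte); conversion to network byte order in memory is left out.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (isDigit)
import Data.List (intercalate)
import Data.Word (Word32)

-- Split on '.', keeping empty fields so that "1.2.3." is rejected later.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitDots rest

-- One octet: 1 to 3 decimal digits, no leading zero, value 0..255.
parseOctet :: String -> Maybe Word32
parseOctet ds@(d : rest)
  | length ds <= 3
  , all isDigit ds
  , d /= '0' || null rest
  , n <= 255 = Just n
  where
    n = read ds
parseOctet _ = Nothing

-- Pure replacement for the parsing direction: no IO, failure is Nothing.
parseIPv4 :: String -> Maybe Word32
parseIPv4 s = case mapM parseOctet (splitDots s) of
  Just [a, b, c, d] ->
    Just (a `shiftL` 24 .|. b `shiftL` 16 .|. c `shiftL` 8 .|. d)
  _ -> Nothing

-- Pure replacement for the printing direction: no shared static buffer.
showIPv4 :: Word32 -> String
showIPv4 w =
  intercalate "." [show ((w `shiftR` s) .&. 255) | s <- [24, 16, 8, 0]]
```

Because both functions are pure, neither the re-entrancy problem of inet_ntoa() nor the question of IO versus Maybe arises.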
Re: [HOpenGL] Re: OpenGL/GLUT examples crashing: known problem?
Claus Reinke wrote: Btw, is there a way to reset the opengl system to a sane state in software? Or are there some invalid assumptions about default state in the other examples? If OpenGL is getting stuck in a non-functional state, that indicates a bug in the driver. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: [Haskell] Memoization in Haskell
Bright Sun wrote:

I can not understand memoization in Haskell. I can not find an example except the fib. For example, I want to drop from a list 4 times like,

    drop 4 (drop 3 (drop 2 (drop 1 [1,2,3,4,5,6,7,8,9,10,11,12])))
    [11,12]

can I implement a memoization function like

    droploop [1,2,3,4] [1,2,3,4,5,6,7,8,9,10,11,12]

to get same result [11,12]

That isn't memoization. Memoization is where you record a set of prior argument/result pairs (i.e. a lookup table) to avoid having to perform the exact same calculation repeatedly.

You can implement droploop as a fold, e.g.:

    droploop ns xs = foldr drop xs ns

-- 
Glynn Clements
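To make the distinction concrete, here is a small sketch that puts the fold from the reply next to something that really is memoization. The waysTable/ways example (counting the ordered ways to write n as a sum of 1s, 2s and 3s) is an illustration chosen for this note, not something from the thread; it assumes n >= 0 and uses a lazily built list as the lookup table.

```haskell
-- The fold from the reply: apply each drop in turn.  No results are cached.
droploop :: [Int] -> [a] -> [a]
droploop ns xs = foldr drop xs ns

-- Memoization proper: a lazily built lookup table of argument/result pairs.
-- Entry n is computed at most once and then shared by every later lookup.
waysTable :: [Integer]
waysTable = map go [0 ..]
  where
    go :: Int -> Integer
    go 0 = 1
    go n = sum [waysTable !! (n - k) | k <- [1, 2, 3], n - k >= 0]

-- Number of ordered ways to write n as a sum of 1s, 2s and 3s (n >= 0).
ways :: Int -> Integer
ways n = waysTable !! n
```

In GHCi, droploop [1,2,3,4] [1..12] gives [11,12], and ways 10 gives 274. The list indexing keeps the sketch short; a real table would more likely use an array or a Map.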
Re: [Haskell] URLs in haskell module namespace
S. Alexander Jacobson wrote: As I move from machine to machine, it would be nice not to have to install all the libraries I use over and over again. I'd like to be able to do something like this: import http://module.org/someLib as someLib If the requested module itself does local imports, the implementation would first try to resolve the names on the client machine and otherwise make requests along remote relative paths. Embedding the path in the source code seems like a bad idea. If you want to allow modules to be loaded via URLs, it would make more sense to extend GHC's -i switch, i.e. ghc -i http://module.org/ ... -- Glynn Clements [EMAIL PROTECTED] ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] invalid character encoding
John Meacham wrote:

I'm not suggesting inventing conventions. I'm suggesting leaving such issues to the application programmer who, unlike the library programmer, probably has enough context to be able to reliably determine the correct encoding in any specific instance.

But the whole point of Foreign.C.String is to interface to existing C code. And one of the most common conventions of said interfaces is to represent strings in the current locale, which is why locale honoring conversion routines are useful.

My point is that most C functions which accept or return char*s will work regardless of whether those char*s can be decoded according to the current locale. E.g.

    while (d = readdir(dir), d) {
        stat(d->d_name, &st);
        ...
    }

will stat() every filename in the directory regardless of whether or not the filenames are valid in the locale's encoding.

The Haskell equivalent using FilePath (i.e. String), getDirectoryContents etc currently only works because the char* <-> String conversions are hardcoded to ISO-8859-1, which is infallible and reversible. If it used e.g. UTF-8, it would fail on any filename which wasn't valid UTF-8 even though it never actually needs to know the string of characters which the filename represents.

The same applies to reading filenames from argv[] and passing them to open() etc. This is one of the most common idioms in Unix programming, and it doesn't care about encodings at all. Again, it would cease to work reliably in Haskell if the automatic char* <-> String conversions in getArgs etc started using the locale.

I'm not arguing about *how* char* <-> String conversions should be performed so much as arguing about *whether* these conversions should be performed. The conversion issues are only problems because the conversions are being done at all.

-- 
Glynn Clements
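The claim that a UTF-8 conversion "would fail on any filename which wasn't valid UTF-8" can be illustrated with a small strict decoder. This is only a sketch written for this note (decodeUtf8 here is a local, hypothetical function over a list of bytes, not the one from any text library): a filename that is just a blob of bytes has no decoding at all once a byte such as 0xFF appears.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Strict UTF-8 decoding of a byte string: Nothing on any invalid sequence.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral (b .&. 0x1F)) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral (b .&. 0x0F)) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral (b .&. 0x07)) 0x10000
  | otherwise          = Nothing
  where
    -- n continuation bytes, payload bits of the lead byte, and the smallest
    -- code point allowed for this length (to reject overlong forms).
    multi :: Int -> Int -> Int -> Maybe String
    multi n lead minCode =
      let (cont, rest) = splitAt n bs
          code = foldl (\acc c -> acc `shiftL` 6 .|. fromIntegral (c .&. 0x3F))
                       lead cont
          valid = length cont == n
               && all (\c -> c .&. 0xC0 == 0x80) cont
               && code >= minCode
               && code <= 0x10FFFF
               && not (code >= 0xD800 && code <= 0xDFFF)
      in if valid then (chr code :) <$> decodeUtf8 rest else Nothing
```

For example, the bytes of "foo" decode fine, while [0xFF, 0x2E, 0x70, 0x70, 0x6D] (a file whose name starts with byte 0xFF) yields Nothing, even though stat() and open() would accept those bytes unchanged.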
Re: [Haskell-cafe] invalid character encoding
Einar Karttunen wrote: In what way is ISO-2022 non-reversible? Is it possible that an ISO-2022 file name that is converted to Unicode cannot be converted back any more (assuming you know for sure that it was ISO-2022 in the first place)? I am no expert on ISO-2022 so the following may contain errors, please correct if it is wrong. ISO-2022 -> Unicode is always possible. Also Unicode -> ISO-2022 should be always possible, but is a relation not a function. This means there are infinitely many (?) ways of encoding a particular Unicode string in ISO-2022. ISO-2022 works by providing escape sequences to switch between different character sets. One can freely use these escapes in almost any way you wish. Exactly. Moreover, while there are an infinite number of equivalent representations in theory (you can add as many redundant switching sequences as you wish), there are multiple plausible equivalent representations in practice. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Marcin 'Qrczak' Kowalczyk wrote: I'm talking about standard (XSI) curses, which will just pass printable (non-control) bytes straight to the terminal. If your terminal uses CP437 (or some other non-standard encoding), you can just pass the appropriate bytes to waddstr() etc and the corresponding characters will appear on the terminal. Which terminal uses CP437? Most software terminal emulators can use any encoding. Traditional comms packages tend to support this (including their own VGA font if necessary) because of its widespread use on BBSes which were targeted at MS-DOS systems. There exist hardware terminals (I can't name specific models, but I have seen them in use) which support this, specifically for use with MS-DOS systems. Linux console doesn't, except temporarily after switching the mapping to builtin CP437 (but this state is not used by curses) or after loading CP437 as the user map (nobody does this, and it won't work properly with all characters from the range 0x80-0x9F anyway). I *still* encounter programs written for the linux console which assume that the built-in CP437 font is being used (if you use an ISO-8859-1 font, you get dialogs with accented characters where you would expect line-drawing characters). You can treat it as immutable. Just don't call setlocale with different arguments again. Which limits you to a single locale. If you are using the locale's encoding, that limits you to a single encoding. There is no support for changing the encoding of a terminal on the fly by programs running inside it. If you support multiple terminals with different encodings, and the library uses the global locale settings to determine the encoding, you need to switch locale every time you write to a different terminal. The point is that a single program often generates multiple streams of text, possibly for different audiences (e.g. humans and machines). 
Different streams may require different conventions (encodings, numeric formats, collating orders), but may use the same functions. A single program has a single stdout and a single filesystem. The contexts which use the locale encoding don't need multiple encodings. Multiple encodings are needed e.g. for exchanging data with other machines over the network, for reading contents of text files after the user has specified an encoding explicitly etc. In these cases an API with explicitly provided encoding should be used. An API which is used for reading and writing text files or sockets is just as applicable to stdin/stdout. The current locale mechanism is just a way of avoiding the issues as much as possible when you can't get away with avoiding them altogether. It's a way to communicate the encoding of the terminal, filenames, strerror, gettext etc. It's *a* way, but it's not a very good way. It sucks when you can't apply a single convention to everything. It's not so bad as to justify inventing our own conventions and forcing users to configure the encoding of Haskell programs separately. I'm not suggesting inventing conventions. I'm suggesting leaving such issues to the application programmer who, unlike the library programmer, probably has enough context to be able to reliably determine the correct encoding in any specific instance. Unicode has no viable competition. There are two viable alternatives: byte strings with associated encodings, and ISO-2022. ISO-2022 is an insanely complicated brain-damaged mess. I know it's being used in some parts of the world, but the sooner it dies, the better. ISO-2022 has advantages and disadvantages relative to UTF-8. I don't want to go on about the specifics here because they aren't particularly relevant. What's relevant is that it isn't likely to disappear any time soon. 
A large part of the world already has a universal encoding which works well enough; they don't *need* UTF-8, and aren't going to rebuild their IT infrastructure from scratch for the sake of it. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Wolfgang Thaller wrote: Of course, it's quite possible that the only test cases will be people using UTF-8-only (or even ASCII-only) systems, in which case you won't see any problems. I'm kind of hoping that we can just ignore a problem that is so rare that a large and well-known project like GTK2 can get away with ignoring it. 1. The filename issues in GTK-2 are likely to be a major problem in CJK locales, where filenames which don't match the locale (which is seldom UTF-8) are common. 2. GTK's filename handling only really applies to file selector dialogs. Most other uses of filenames in a GTK-based application don't involve GTK; they use the OS API functions which just deal with byte strings. 3. GTK is a GUI library. Most of the text which it deals with is going to be rendered, so it *has* to be interpreted as characters. Treating it as blobs of data won't work. IOW, on the question of whether or not to interpret byte strings as character strings, GTK is at the far end of the scale. Also, IIRC, Java strings are supposed to be unicode, too - how do they deal with the problem? Files are represented by instances of the File class: http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html An abstract representation of file and directory pathnames. You can construct Files from Strings, and convert Files to Strings. The File class includes two sets of directory enumeration methods: list() returns an array of Strings, while listFiles() returns an array of Files. The documentation for the File class doesn't mention encoding issues at all. However, with that interface, it would be possible to enumerate and open filenames which cannot be decoded. So we can't do Unicode-based I18N because there exist a few unix systems with messed-up file systems? Declaring such systems to be messed up won't make the problems go away. If a design doesn't work in reality, it's the fault of the design, not of reality. In general, yes. 
But we're not talking about all of reality here, we're talking about one small part of reality - the question is, can the part of reality where the design doesn't work be ignored? Sure, you *can* ignore it; K&R C ignored everything other than ASCII. If you limit yourself to locales which use the Roman alphabet (i.e. ISO-8859-N for N=1/2/3/4/9/15), you can get away with a lot. Most such users avoid encoding issues altogether by dropping the accents and sticking to ASCII, at least when dealing with files which might leave their system. To get a better idea, you would need to consult users whose language doesn't use the Roman alphabet, e.g. CJK or Cyrillic. Unfortunately, you don't usually find too many of them on lists such as this. I'm only familiar with one OSS project which has a sizeable CJK user base, and that's XEmacs (whose I18N revolves around ISO-2022, and most of the documentation is in Japanese). Even there, there are separate mailing lists for English and Japanese, and the two seldom communicate. I think that if we wait long enough, the filename encoding problems will become irrelevant and we will live in an ideal world where Unicode actually works. Maybe next year, maybe only in ten years. Maybe not even then. If Unicode really solved encoding problems, you'd expect the CJK world to be the first adopters, but they're actually the least eager; you are more likely to find UTF-8 in an English-language HTML page or email message than a Japanese one. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Wolfgang Thaller wrote: If you try to pretend that I18N comes down to shoe-horning everything into Unicode, you will turn the language into a joke. How common will those problems you are describing be by the time this has been implemented? How common are they even now? Right now, GHC assumes ISO-8859-1 whenever it has to automatically convert between String and CString. Conversions to and from ISO-8859-1 cannot fail, and encoding and decoding are exact inverses. OK, so the intermediate string will be nonsense if ISO-8859-1 isn't the correct encoding, but that doesn't actually matter a lot of the time; frequently, you're just grabbing a blob of data from one function and passing it to another. The problems will only appear once you start dealing with fallible or non-reversible encodings such as UTF-8 or ISO-2022. If and when that happens, I guess we'll find out how common the problems are. Of course, it's quite possible that the only test cases will be people using UTF-8-only (or even ASCII-only) systems, in which case you won't see any problems. I haven't yet encountered a unix box where the file names were not in the system locale encoding. On all reasonably up-to-date Linux boxes that I've seen recently, they were in UTF-8 (and the system locale agreed). I've encountered boxes where multiple encodings were used; primarily web and FTP servers which were shared amongst multiple clients. Each client used whichever encoding(s) they felt like. IIRC, the most common non-ASCII encoding was MS-DOS codepage 850 (the clients were mostly using Windows 3.1 at that time). I haven't done sysadmin for a while, so I don't know the current situation, but I don't think that the world has switched to UTF-8 in the mean time. [Most of the non-ASCII filenames which I've seen recently have been either ISO-8859-1 or Win-12XX; I haven't seen much UTF-8.] On both Windows and Mac OS X, filenames are stored in Unicode, so it is always possible to convert them to unicode. 
So we can't do Unicode-based I18N because there exist a few unix systems with messed-up file systems? Declaring such systems to be messed up won't make the problems go away. If a design doesn't work in reality, it's the fault of the design, not of reality. Haskell's Unicode support is a joke because the API designers tried to avoid the issues related to encoding with wishful thinking (i.e. you open a file and you magically get Unicode characters out of it). OK, that part is purely wishful thinking, but assuming that filenames are text that can be represented in Unicode is wishful thinking that corresponds to 99% of reality. So why can't the remaining 1 percent of reality be fixed instead? The issue isn't whether the data can be represented as Unicode text, but whether you can convert it to and from Unicode without problems. To do this, you need to know the encoding, you need to store the encoding so that you can convert the wide string back to a byte string, and the encoding needs to be reversible. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
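The statement that conversions to and from ISO-8859-1 cannot fail and are exact inverses can be written down directly. The sketch below uses hypothetical names (decodeLatin1, encodeLatin1, roundTrips) over plain lists of bytes; it is not how GHC implements the conversion, only an illustration of the property being relied upon.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Decoding is total: every byte maps to the code point with the same number.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Encoding only fails for characters that decoding could never have produced.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM toByte
  where
    toByte c
      | ord c <= 0xFF = Just (fromIntegral (ord c))
      | otherwise     = Nothing

-- The property: any blob of bytes survives the trip through String intact,
-- whether or not ISO-8859-1 was the "correct" interpretation.
roundTrips :: [Word8] -> Bool
roundTrips bytes = encodeLatin1 (decodeLatin1 bytes) == Just bytes
```

roundTrips holds for every byte string, which is why a filename can pass through a String unharmed under this scheme. A fallible encoding such as UTF-8, or a non-reversible one such as ISO-2022, cannot offer the same guarantee.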
Re: [Haskell-cafe] invalid character encoding
, but I can see why they would have changed it (a single catalogue for encoding variants of a given locale). How would it know how to interpret filenames for graphical display? An option menu on the file selector is one option; heuristics are another. Heuristics won't distinguish various ISO-8859-x from each other. You treat the locale's encoding as a heuristic. If it looks like ISO-8859-x, and the locale's encoding is ISO-8859-x, you use that. If it looks like Shift-JIS, you don't complain and give up just because the locale is UTF-8. An option menu on the file selector is user-unfriendly because users don't want to configure it for each program separately. They want to set it in one place and expect it to work everywhere. Nothing will work everywhere. An option menu allows the user to force the encoding for individual cases when whatever other mechanism(s) you use get it wrong. I've needed to use Mozilla's View -> Character Encoding menu enough times when the browser's guess turned out to be wrong (and blindly honouring the charset specified by HTTP's Content-Type: or HTML's META tags would be a disaster). At least Gtk-1 would attempt to display the filename; you would get the odd question mark but at least you could select the file; Gtk+2 also attempts to display the filename. It can be opened even though the filename has inconvertible characters escaped. This isn't my experience; I just get messages like: Gtk-Message: The filename \377.ppm couldn't be converted to UTF-8. (try setting the environment variable G_FILENAME_ENCODING): Invalid byte sequence in conversion input and the filename is omitted altogether. The current locale mechanism is just a way of avoiding the issues as much as possible when you can't get away with avoiding them altogether. It's a way to communicate the encoding of the terminal, filenames, strerror, gettext etc. It's *a* way, but it's not a very good way. It sucks when you can't apply a single convention to everything. 
Unicode has been described (accurately, IMHO) as Esperanto for computers. Both use the same approach to try to solve essentially the same problem. And both will be about as successful in the long run. Unicode has no viable competition. There are two viable alternatives. Byte strings with associated encodings and ISO-2022. In CJK environments, ISO-2022 is still far more widespread than UTF-8, and will likely remain so for the foreseeable future. And byte strings with associated encodings are probably still the most common of all. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Marcin 'Qrczak' Kowalczyk wrote: Glynn Clements [EMAIL PROTECTED] writes: It should be possible to specify the encoding explicitly. Conversely, it shouldn't be possible to avoid specifying the encoding explicitly. What encoding should a binding to readline or curses use? Curses in C comes in two flavors: the traditional byte version and a wide character version. The second version is easy if we can assume that wchar_t is Unicode, but it's not always available and until recently in ncurses it was buggy. Let's assume we are using the byte version. How to encode strings? The (non-wchar) curses API functions take byte strings (char*), so the Haskell bindings should take CString or [Word8] arguments. If you provide wrapper functions which take String arguments, either they should have an encoding argument or the encoding should be a mutable per-terminal setting. A terminal uses an ASCII-compatible encoding. Wide character version of curses convert characters to the locale encoding, and byte version passes bytes unchanged. This means that if a Haskell binding to the wide character version does the obvious thing and passes Unicode directly, then an equivalent behavior can be obtained from the byte version (only limited to 256-character encodings) by using the locale encoding. I don't know enough about the wchar version of curses to comment on that. I do know that, to work reliably, the normal (byte) version of curses needs to pass printable bytes through unmodified. It is possible for curses to be used with a terminal which doesn't use the locale's encoding. Specifically, a single process may use curses with multiple terminals with differing encodings, e.g. an airport public information system displaying information in multiple languages. Also, it's quite common to use non-standard encodings with terminals (e.g. codepage 437, which has graphic characters beyond the ACS_* set which terminfo understands). 
The locale encoding is the right encoding to use for conversion of the result of strerror, gai_strerror, msg member of gzip compressor state etc. When an I/O error occurs and the error code is translated to a Haskell exception and then shown to the user, why would the application need to specify the encoding and how? Because the application may be using multiple locales/encodings. Having had to do this in C (i.e. repeatedly calling setlocale() to select the correct encoding), I would much prefer to have been able to pass the locale as a parameter. [The most common example is printf("%f"). You need to use the C locale (decimal point) for machine-readable text but the user's locale (locale-specific decimal separator) for human-readable text. This isn't directly related to encodings per se, but it is a good example of why parameters are preferable to state.] If application code doesn't want to use the locale's encoding, it shouldn't be shoe-horned into doing so because a library developer decided to duck the encoding issues by grabbing whatever encoding was readily to hand (i.e. the locale's encoding). If a C library is written with the assumption that texts are in the locale encoding, a Haskell binding to such a library should respect that assumption. C libraries which use the locale do so as a last resort. K&R C completely ignored I18N issues. ANSI C added the locale mechanism as a hack to provide minimal I18N support while maintaining backward compatibility and in a minimally-intrusive manner. The only reason that the C locale mechanism isn't a major nuisance is that you can largely ignore it altogether. Code which requires real I18N can use other mechanisms, and code which doesn't require any I18N can just pass byte strings around and leave encoding issues to code which actually has enough context to handle them correctly. Only some libraries allow working with different, explicitly specified encodings. 
Many libraries don't, especially if the texts are not the core of the library functionality but error messages. And most such libraries just treat text as byte strings. They don't care about their interpretation, or even whether or not they are valid in the locale's encoding. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
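The printf("%f") aside, i.e. that a parameter is preferable to global state, can be sketched in Haskell as a formatting function which takes its numeric convention as an argument. NumFormat, cFormat and formatFixed are hypothetical names invented for this illustration; only showFFloat comes from the standard Numeric module.

```haskell
import Numeric (showFFloat)

-- The convention is an ordinary value, not process-wide state.
newtype NumFormat = NumFormat { decimalSeparator :: Char }

-- The "C locale" convention, suitable for machine-readable output.
cFormat :: NumFormat
cFormat = NumFormat '.'

-- Fixed-point formatting with an explicitly supplied convention.
formatFixed :: NumFormat -> Int -> Double -> String
formatFixed fmt digits x = map localise (showFFloat (Just digits) x "")
  where
    localise '.' = decimalSeparator fmt
    localise c   = c
```

With this shape, one program can write formatFixed cFormat 2 3.14159 ("3.14") to a data file and formatFixed (NumFormat ',') 2 1234.5 ("1234,50") to a user-facing stream, without any equivalent of calling setlocale() back and forth between the two.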
Re: [Haskell-cafe] invalid character encoding
John Meacham wrote: It doesn't affect functions added by the hierarchical libraries, i.e. those functions are safe only with the ASCII subset. (There is a vague plan to make Foreign.C.String conform to the FFI spec, which mandates locale-based encoding, and thus would change all those, but it's still up in the air.) Hmm. I'm not convinced that automatically converting to the current locale is the ideal behaviour (it'd certainly break all my programs!). Certainly a function for converting into the encoding of the current locale would be useful for many users but it's important to be able to know the encoding with certainty. It should only be the default, not the only option. I'm not sure that it should be available at all. It should be possible to specify the encoding explicitly. Conversely, it shouldn't be possible to avoid specifying the encoding explicitly. Personally, I wouldn't provide an all-in-one "convert String to CString using locale's encoding" function, just in case anyone was tempted to actually use it. But this is exactly what is needed for most C library bindings. I very much doubt that "most" is accurate. C functions which take a char* fall into three main cases: 1. Unspecified encoding, i.e. it's a string of bytes, not characters. 2. Locale's encoding, as determined by nl_langinfo(CODESET); essentially, whatever was set with setlocale(LC_CTYPE), defaulting to C/POSIX if setlocale() hasn't been called. 3. Fixed encoding, e.g. UTF-8, ISO-2022, US-ASCII (or EBCDIC on IBM mainframes). Historically, library functions have tended to fall into category 1 unless they *need* to know the interpretation of a given byte or sequence of bytes (e.g. ctype.h), in which case they fall into category 2. Most of libc falls into category 1, with a minority of functions in category 2. 
Code which is designed to handle multiple languages simultaneously is more likely to fall into category 3, using one of the universal encodings (typically ISO-2022 in southeast Asia and UTF-8 elsewhere). E.g. Gtk-2.x uses UTF-8 almost exclusively, although you can force the use of the locale's encoding for filenames (if you have filenames in multiple encodings, you lose; filenames using the wrong encoding simply don't appear in file selectors). -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
it was the simplest way to retrofit minimal I18N onto K&R C. It also means that most code can easily duck the issues (i.e. so you don't have to pass a locale parameter to isupper() etc). OTOH, if you don't want to duck the issue, global locale settings are a nuisance. The only reason that the C locale mechanism isn't a major nuisance is that you can largely ignore it altogether. Then how would a Haskell program know what encoding to use for stdout messages? It doesn't necessarily need to. If you are using message catalogues, you just read bytes from the catalogue and write them to stdout. The issue then boils down to using the correct encoding for the catalogues; the code doesn't need to know. How would it know how to interpret filenames for graphical display? An option menu on the file selector is one option; heuristics are another. Both tend to produce better results in non-trivial cases than either of Gtk-2's choices: i.e. filenames must be either UTF-8 or must match the locale (depending upon the G_BROKEN_FILENAMES setting), otherwise the filename simply doesn't exist. At least Gtk-1 would attempt to display the filename; you would get the odd question mark but at least you could select the file; ultimately, the returned char* just gets passed to open(), so the encoding only really matters for display. Code which requires real I18N can use other mechanisms, and code which doesn't require any I18N can just pass byte strings around and leave encoding issues to code which actually has enough context to handle them correctly. Haskell can't just pass byte strings around without turning the Unicode support into a joke (which it is now). If you try to pretend that I18N comes down to shoe-horning everything into Unicode, you will turn the language into a joke. Haskell's Unicode support is a joke because the API designers tried to avoid the issues related to encoding with wishful thinking (i.e. you open a file and you magically get Unicode characters out of it). 
The current locale mechanism is just a way of avoiding the issues as much as possible when you can't get away with avoiding them altogether. Unicode has been described (accurately, IMHO) as Esperanto for computers. Both use the same approach to try to solve essentially the same problem. And both will be about as successful in the long run. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Marcin 'Qrczak' Kowalczyk wrote: E.g. Gtk-2.x uses UTF-8 almost exclusively, although you can force the use of the locale's encoding for filenames (if you have filenames in multiple encodings, you lose; filenames using the wrong encoding simply don't appear in file selectors). Actually they do appear, even though you can't type their names from the keyboard. The name shown in the GUI used to be escaped in different ways by different programs or even different places in one program (question marks, %hex escapes \oct escapes), but recently they added some functions to glib to make the behavior uniform. In the last version of Gtk-2.x which I tried, invalid filenames are just omitted from the list. Gtk-1.x displayed them (I think with question marks, but it may have been a box). I've just tried with a more recent version (2.6.2); the default behaviour is similar, although you can now get around the issue by using G_FILENAME_ENCODING=ISO-8859-1. Of course, if your locale is a long way from ISO-8859-1, that isn't a particularly good solution. The best test case would be a system used predominantly by Japanese, where (apparently) it's common to have a mixture of both EUC-JP and Shift-JIS filenames (occasionally wrapped in ISO-2022, but usually raw). -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] invalid character encoding
Marcin 'Qrczak' Kowalczyk wrote: It doesn't affect functions added by the hierarchical libraries, i.e. those functions are safe only with the ASCII subset. (There is a vague plan to make Foreign.C.String conform to the FFI spec, which mandates locale-based encoding, and thus would change all those, but it's still up in the air.) Hmm. I'm not convinced that automatically converting to the current locale is the ideal behaviour (it'd certainly break all my programs!). Certainly a function for converting into the encoding of the current locale would be useful for many users but it's important to be able to know the encoding with certainty. It should only be the default, not the only option. I'm not sure that it should be available at all. It should be possible to specify the encoding explicitly. Conversely, it shouldn't be possible to avoid specifying the encoding explicitly. Personally, I wouldn't provide an all-in-one convert String to CString using locale's encoding function, just in case anyone was tempted to actually use it. The decision as to the encoding belongs in application code; not in (most) libraries, and definitely not in the language. [Libraries dealing with file formats or communication protocols which mandate a specific encoding are an exception. But they will be using a fixed encoding, not the locale's encoding.] If application code chooses to use the locale's encoding, it can retrieve it then pass it as the encoding argument to any applicable functions. If application code doesn't want to use the locale's encoding, it shouldn't be shoe-horned into doing so because a library developer decided to duck the encoding issues by grabbing whatever encoding was readily to hand (i.e. the locale's encoding). -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
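To illustrate the "encoding is always an explicit argument" design, here is a rough sketch using the GHC.Foreign and GHC.IO.Encoding modules from a present-day GHC base library (an assumption; these are not the libraries under discussion in the thread). The names encodeWith, eAcuteLatin1 and eAcuteUtf8 are hypothetical.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, latin1, utf8)

-- Encode a String to bytes. There is deliberately no variant without
-- the encoding argument: the caller has to say which encoding it wants.
encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s =
  GHC.withCStringLen enc s $ \(ptr, len) -> peekArray len (castPtr ptr)

-- The same one-character text (U+00E9) under two explicitly chosen encodings.
eAcuteLatin1, eAcuteUtf8 :: IO [Word8]
eAcuteLatin1 = encodeWith latin1 "\233"  -- one byte: 0xE9
eAcuteUtf8   = encodeWith utf8   "\233"  -- two bytes: 0xC3 0xA9
```

Application code that does want the locale's encoding could fetch it with GHC.IO.Encoding.getLocaleEncoding and pass the result to encodeWith, which keeps that decision visible at the call site.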
RE: Process library and signals
Simon Marlow wrote: I think this covers most of the useful situations. If you want to do the same thing in both parent and child, or handle in the parent and SIG_DFL in the child: use runProcess. If you want to ignore in the parent and SIG_DFL in the child: use System.Cmd.{system,rawSystem}. To handle in the parent and ignore in the child: unfortunately not directly supported. As it stands, you can have whatever behaviour you want in the parent: set the desired handling before calling system/rawSystem/runProcess then set it back afterwards. However, this will cease to be true for system/rawSystem if you change them so that the child restores the handlers to their state upon entry. I don't understand... is there a typo somewhere above? Perhaps you meant child in the first paragraph? Sorry; I wasn't thinking straight. That part of my message is incorrect; changing the signal handling before calling system/rawSystem won't help, because they force both cases. If they were changed to behave like system(), the caller could determine the *child* behaviour, but that's prone to a race condition, so I doubt that it would be useful in practice. system/rawSystem now behave almost exactly like system() in C. The only difference is that you can't ignore SIGINT/SIGQUIT in the child, but I can fix that if necessary. I'm not sure how much it matters; system() isn't really of much use for real programs anyhow. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: [Haskell] Re: xemacs haskell major mode
Surendra Singhi wrote: Is there any ilisp or slime like package for haskell, which integrates haskell with xemacs or emacs and provides a kind of integrated development environment? I am using Hugs 98. Does URL: http://www.haskell.org/pipermail/haskell/2004-November/015015.html help? I downloaded the haskell mode from that site and I was trying to configure it, but during the process I ran into this error Debugger entered--Lisp error: (void-function charsetp) signal(void-function (charsetp)) The charsetp function only exists if XEmacs was built with the MuLE (MUlti-Lingual Emacs) option. However, the only use of that function which I can see in the haskell-mode code is: (and (fboundp 'make-char) (charsetp 'japanese-jisx0208) So charsetp should only be called if the make-char function exists, and that function should also only exist if XEmacs was built with the MuLE option. It may be that you have a version of XEmacs which has make-char but which doesn't have charsetp. In any case, I have CC'd this message to the maintainer of haskell-mode, Stefan Monnier, in case he has any ideas. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] Re: File path programme
Peter Simons wrote: Hmmm, I'm not really sure what equivalence for file paths should mean in the presence of hard/symbolic links, (NFS-)mounted file systems, etc. Well, there is a sort-of canonic version for every path; on most Unix systems the function realpath(3) will find it. My interpretation is that two paths are equivalent iff they point to the same target. I think that any definition which includes an iff is likely to be overly optimistic. More likely, you will have to settle for a definition such that, if two paths are considered equal, they refer to the same file, but without the converse (i.e. even if they aren't equal, they might still refer to the same file). Even so, you will need to make certain assumptions. E.g. older Unices would allow root to replace the . and .. entries; you probably want to assume that can't happen. You (and the others who pointed it out) are correct, though, that the current 'canon' function doesn't accomplish that. I guess, I'll have to move it into the IO monad to get it right. And I should probably rename it, too. ;-) A version in the IO monad would allow for a tighter definition (i.e. more likely to correctly identify that two different path values actually refer to the same file). [Certainly, you have to use the IO monad if you want to allow for case sensitivity, as that depends upon which filesystems are mounted where.] Within the IO monad, the obvious approach is to stat() both pathnames and check whether their targets have the same device/inode pairs. That's reasonably simple, and probably about as good as you can get. That still won't handle the case where you mount a single remote filesystem via both NFS and SMB though. I doubt that anything can achieve that. There are also issues of definition, e.g. is /dev/tty considered equivalent to the specific /dev/ttyXX device for the current process? 
-- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
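The stat()-based comparison suggested above might look roughly like this in the IO monad, assuming the unix package (System.Posix.Files). The name sameFile is hypothetical, and the sketch inherits every caveat from the message: it is a one-way test in spirit, and it cannot see through a filesystem mounted twice via different protocols.

```haskell
import System.Posix.Files (deviceID, fileID, getFileStatus)

-- True if both paths resolve (following symlinks) to the same
-- device/inode pair. Throws an IOError if either path cannot be stat()ed.
sameFile :: FilePath -> FilePath -> IO Bool
sameFile a b = do
  sa <- getFileStatus a
  sb <- getFileStatus b
  return (deviceID sa == deviceID sb && fileID sa == fileID sb)
```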
Re: [Haskell-cafe] File path programme
Keean Schupke wrote: I guess it's just that I'm more concerned with making possible what is currently impossible (according to the library standards)--that is, using FFI and IO on the same file--rather than just adding utility features that application developers could have written themselves. I suppose we don't need a class for this, all we need is a couple of functions to convert between FilePath and CString. Except paths are different on different platforms... for example: /a/b/../c/hello\ there/test and: A:\a\b\ notice how the backslash is used to 'escape' a space or meta-character on unix, That's Bourne-shell syntax, not Unix API syntax. So far as open() etc are concerned, a backslash is just another character. Also, Windows accepts both slash and backslash equally in most situations. It's only really command-line parsing (where slash is normally used to denote switches) where there's an issue. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File path programme
robert dockins wrote: I don't pretend to fully understand various unicode standard but it seems to me that these problems are deeper than file path library. The equation (decode . encode) /= id seems confusing for me. Can you give me an example when this happen? I am pretty sure that ISO 2022 encoded strings can have multiple ways to express the same unicode glyphs. This means that any sensible relation between ISO 2022 strings and unicode strings maps more than one ISO 2022 string onto the same unicode string. The inverse is therefore not a function. To make it a function one of the possibly several encodings of the unicode string will have to be chosen. So you have an ISO 2022 string A which is decoded to a unicode string U. We reencode U to an ISO 2022 string B. It may be that A /= B. That is the problem. Exactly. And it isn't a theoretical issue. E.g. in an environment where EUC-JP is used, filenames may begin with ESC$)B (designate JISX0208 to G1), or they may not (because G1 is assumed to contain JISX0208 initially). More generally, ISO-2022 strings frequently contain redundant character-set switching sequences, so conversion to unicode and back again typically won't yield the original sequence of bytes. The various UTF encodings do not have this particular problem; if a UTF string is valid, then it is a unique representation of a unicode string. Except that there are some ad-hoc extensions, e.g. the UTF-8 variant used by both Java and Tcl permits NUL characters to be embedded in NUL-terminated UTF-8 strings by encoding them as a two-byte sequence (which is invalid in UTF-8 proper). -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
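The round-trip problem can be shown with a toy codec. This is only a sketch: decodeLax and encodeStrict are hypothetical names, they handle ASCII only, and the single extension is the two-byte NUL (0xC0 0x80) of the Java/Tcl UTF-8 variant mentioned above. Real ISO-2022 handling is far more involved.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Lenient decoder: ASCII bytes, plus the two-byte form of NUL.
decodeLax :: [Word8] -> Maybe String
decodeLax [] = Just []
decodeLax (0xC0 : 0x80 : rest) = ('\0' :) <$> decodeLax rest
decodeLax (b : rest)
  | b < 0x80  = (chr (fromIntegral b) :) <$> decodeLax rest
  | otherwise = Nothing

-- Strict encoder: always emits the canonical one-byte form.
encodeStrict :: String -> Maybe [Word8]
encodeStrict = traverse enc
  where
    enc c
      | ord c < 0x80 = Just (fromIntegral (ord c))
      | otherwise    = Nothing

-- Bytes in, bytes out; the result need not equal the input.
roundTrip :: [Word8] -> Maybe [Word8]
roundTrip bytes = decodeLax bytes >>= encodeStrict
```

Here roundTrip [0x61, 0xC0, 0x80, 0x62] gives Just [0x61, 0x00, 0x62]: two different byte strings decode to the same text, so re-encoding cannot recover which one was read. For filenames that means the re-encoded name may not refer to the original file.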
Re: [Haskell-cafe] Re: File path programme
Marcin 'Qrczak' Kowalczyk wrote: The various UTF encodings do not have this particular problem; if a UTF string is valid, then it is a unique representation of a unicode string. However, decoding is still a partial function and can fail. And while it is partly true, it is qualified by the problems relative to canonicalization (an "é" in Unicode can both be represented as "é" or as two chars (an e and an accent) and they should (ideally) compare equal). In what sense "equal"? They are supposed to be equivalent as far as the semantics of the text is concerned, but representations are clearly different and most programs distinguish them. In particular they are different filenames on both Unix and Windows. AFAIK MacOS normalizes filenames, but using a slightly different algorithm than Unicode (perhaps just an older version). IMHO it makes no sense to pretend that they are exactly the same when strings consist of code points or lower level units (and I don't believe another choice for the default string type would be practical). Well, at least you and I agree on that. Once you start down the "semantic equivalence" route, you will quickly run into issues like "ß" == "ss", and it only gets worse from there on. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] File path programme
Ben Rudiak-Gould wrote: Is there an MSDN page that actually gives a grammar, or at least a taxonomy, of Win32 pathnames? That would be useful. It would also be longer than War and Peace, once you start allowing for MS-DOS 8.3 pathnames, codepages, the fact that anything anywhere which contains aux, con, lpt etc refers to a device (sometimes), the fact that ... == ../.. (sometimes), the handling of incomplete multibyte characters, ... Search the BugTraq archives for issues related to IIS access-control lists to discover the myriad different names which can be used to refer to a given file for which the administrator is (unsuccessfully) trying to restrict access. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions
Ketil Malde wrote: The point is that the Unix documentation does not consider the short pause as data is read off your hard drive to be blocking. So that's why select will always report that data is available when you use it with a file handle. Isn't this also for historic reasons? Partly. But I think that it's also because this functionality wasn't intended for the purpose which is being discussed, i.e. enabling a process to obtain maximal CPU utilisation. For that purpose, explicit overlapped I/O (in all forms) can only ever be a partial solution, because you still have the issue that memory (i.e. code/data/stack segments) is demand-paged. The only solution there is multiple threads. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions
Keean Schupke wrote: Why is disk a special case? With slow streams, where there may be an indefinite delay before the data is available, you can use non-blocking I/O, asynchronous I/O, select(), poll() etc to determine if the data is available. If it is, reading the data is essentially just copying from kernel memory to userspace. If it isn't, the program can do something else while it's waiting for the data to arrive. With files or block devices, the data is always deemed to be available, even if the data isn't in physical memory. Calling read() in such a situation will block until the data has been read into memory. I have never heard that all processes under linux wait for a disk read... The kernel most certainly does not busy wait for disks to respond, so the only alternative is that the process that needs to wait (and only that process) is put to sleep. In which case a second thread would be unaffected. Correct. The point is that maximising CPU utilisation requires the use of multiple kernel threads; select/poll or non-blocking/asynchronous I/O won't suffice. Linux does not busy wait in the Kernel! (don't forget the kernel does read-ahead, so it could be that read really does return 'immediately' and without any delay apart from at the end of file - In which case asynchronous IO just slows you down with extra context switches). It doesn't busy wait; it suspends the process/thread, then schedules some other runnable process/thread. The original thread remains suspended until the data has been transferred into physical memory. Reading data from a descriptor essentially falls into three cases: 1. The data is in physical RAM. read() copies the data to the supplied user-space buffer then returns control to the caller. 2. The data isn't in physical RAM, but is available with only a finite delay (i.e. time taken to read from block device or network filesystem). 3. The data isn't in physical RAM, and may take an indefinite amount of time to arrive (e.g. 
from a socket, pipe, terminal etc). The central issue is that the Unix API doesn't distinguish between cases 1 and 2 when it comes to non-blocking I/O, asynchronous I/O, select/poll etc. [OTOH, NT overlapped I/O and certain Unix extensions do distinguish these cases, i.e. data is only available when it's in physical RAM.] If you read from a non-blocking descriptor, and case 2 applies, read() will block while the data is read from disk then return the data; it won't return -1 with errno set to EAGAIN, as would happen with case 3. If you want to be able to utilise the CPU while waiting for disk I/O to occur, you have to use multiple kernel threads, with one thread for each pending I/O operation, plus another one for computations (or another one for each CPU if you want to obtain the full benefit of an SMP system). Even then, you still have to allow for the fact that user-space memory is subject to swapping and demand-paging. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
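A rough sketch of the "one thread per pending read, plus one for computation" idea, written with GHC's Control.Concurrent and the bytestring package. The names startRead and sumWhileReading are hypothetical. Whether the read really proceeds on a separate kernel thread depends on the runtime system and how the program was built, which is assumed here rather than shown.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import qualified Data.ByteString as B

-- Start reading a file on another Haskell thread and return at once.
-- Caveat: if the read fails, the MVar is never filled.
startRead :: FilePath -> IO (MVar B.ByteString)
startRead path = do
  box <- newEmptyMVar
  _ <- forkIO (B.readFile path >>= putMVar box)
  return box

-- Compute while the read is pending, then collect both results:
-- the sum 1..n and the number of bytes read.
sumWhileReading :: FilePath -> Int -> IO (Int, Int)
sumWhileReading path n = do
  box <- startRead path
  let total = sum [1 .. n]
  bytes <- total `seq` takeMVar box
  return (total, B.length bytes)
```

Even in this arrangement the computation can still stall on a page fault, as the message notes; threads only remove the explicit read() from the critical path.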
Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions
Keean Schupke wrote: read. I don't see the problem... (Okay, I can see that if select lies, and the read takes a long time you might miss the next scheduling timeslot - but as far as I am aware select doesn't lie, and read will return immediately if select says there is data ready)... select() _does_ lie for ordinary files, e.g., disk files. It assumes the data is immediately readable, even if it hasn't pulled it off disk yet. If the ordinary file actually resides on an NFS volume, or CD, or something else slow, then you have a problem. But the kernel does read-ahead so this data should just be a buffer copy. You can't rely upon the kernel always having read the data already. E.g. a program which performs trivial operations on large files may well be able to consume the data faster than the kernel can obtain it. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions
Keean Schupke wrote: Okay, my ignorance of Posix is showing again. Is it currently the case, then, that every GHC thread will stop running while a disk read is in progress in any thread? Is this true on all platforms? It's true on Unix-like systems, I believe. Even with -threaded. It might not be true on Win32. I think this is not true on linux, where a thread is just a process created with special flags to keep the same fds and memory. As threads on linux are scheduled like processes, one thread blocking should not affect the others? That should be true of all POSIX-like thread implementations (including Linux, whose threads aren't quite POSIX-compliant, e.g. in regard to signal handling, but aren't that far off). Essentially, blocking system calls only block the calling kernel thread. OTOH, if you are implementing multiple user-space threads within a single kernel thread, if that kernel thread blocks, all of the user-space threads within it will be blocked. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions
Simon Marlow wrote: We do use a thread pool. But you still need as many OS threads as there are blocked read() calls, unless you have a single thread doing select() as I described. How does the select() help? AFAIK, select() on a regular file or block device will always indicate that it is readable, even if a subsequent read() would have to read the data from disk. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions
Ben Rudiak-Gould wrote: GHC really needs non-blocking I/O to support its thread model, and memory-mapped I/O always blocks. If, by blocks, you mean that execution will be suspended until the data has been read from the device into the buffer cache, then Unix non-blocking I/O (i.e. O_NONBLOCK) also blocks. Okay, my ignorance of Posix is showing again. Is it currently the case, then, that every GHC thread will stop running while a disk read is in progress in any thread? The kernel thread which called read() will be blocked. If GHC threads are userspace threads running within a single kernel thread, then they will all block. If GHC uses multiple kernel threads, the other kernel threads will continue to run. Is this true on all platforms? Some platforms (but, AFAIK, not linux) allow asynchronous I/O on regular files. NT has overlapped I/O, which is essentially the same thing. There are two ways of reading from a file/stream in Win32 on NT. One is asynchronous: the call returns immediately and you receive a notification later that the read has completed. The other is synchronous but almost-nonblocking: it returns as much data as is available, and the entire contents of a file is considered always available. But it always returns at least one byte, and may spend an arbitrary amount of time waiting for that first byte. You can avoid this by waiting for the handle to become signalled; if it's signalled then a subsequent ReadFile will not block indefinitely. Win32's synchronous ReadFile is basically the same as Posix's (blocking) read. For some reason I thought that Win32's asynchronous ReadFile was similar to Posix's non-blocking read, but I gather from [1] that they're completely different. They're similar, but not identical. Traditionally, Unix non-blocking I/O (along with asynchronous I/O, select() and poll()) were designed for slow streams such as pipes, terminals, sockets etc. Regular files and block devices are assumed to return the data immediately. 
Essentially, for slow streams, you have to wait for the data to arrive before it can be read, so waiting may take an indefinite amount of time. For fast streams, the data is always available, you just have to wait for the system call to give it to you. IOW, the time taken to read from a block device is amortised into the execution time of the system call, rather than being treated as a delay. Also, even with blocking I/O, slow streams only block if no data is available. If less data is available than was requested, they will usually return whatever is available rather than waiting until they have the requested amount. Non-blocking I/O only affects the case where no data is available. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: I/O interface
Ferenc Wagner wrote: dup()-ed filehandles share a common file position. They also share the file status flags (O_NONBLOCK, O_APPEND etc). So, enabling or disabling non-blocking I/O will affect all descriptors obtained by duplication (either by dup/dup2 or by fork). OTOH, each descriptor has its own set of descriptor flags (i.e. the close-on-exec flag). A related issue is that device state (e.g. terminal settings) is a property of the device itself, and so is shared amongst all descriptors which refer to the device regardless of whether they were created by dup/dup2 or a separate open() call. For this reason, hSetBuffering shouldn't be modifying the ICANON flag, IMHO. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
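The difference between shared file status flags and per-descriptor flags can be observed directly. This is a sketch assuming the unix package (System.Posix.IO) on a POSIX system; dupFlagDemo is a hypothetical name, and a pipe is used only because it is an easy descriptor to create.

```haskell
import System.Posix.IO
  ( FdOption (CloseOnExec, NonBlockingRead)
  , closeFd, createPipe, dup, queryFdOption, setFdOption )

-- Set two flags on the original descriptor, then look at its duplicate.
-- Result: (is the duplicate non-blocking?, is the duplicate close-on-exec?)
dupFlagDemo :: IO (Bool, Bool)
dupFlagDemo = do
  (readEnd, writeEnd) <- createPipe
  copy <- dup readEnd
  setFdOption readEnd NonBlockingRead True  -- file status flag (O_NONBLOCK)
  setFdOption readEnd CloseOnExec True      -- descriptor flag (FD_CLOEXEC)
  nonBlocking <- queryFdOption copy NonBlockingRead
  closeOnExec <- queryFdOption copy CloseOnExec
  mapM_ closeFd [readEnd, writeEnd, copy]
  return (nonBlocking, closeOnExec)
```

On a POSIX system this should yield (True, False): O_NONBLOCK lives in the open file description that both descriptors share, while close-on-exec belongs to each descriptor separately.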
Re: [Haskell-cafe] Parse text difficulty
Malcolm Wallace wrote: Prelude> [1..5] `zipWith (+)` [7..] <interactive>:1: parse error on input `(' is there a technical reason for this or did it just happen? If you are asking why general expressions are prohibited between backticks, yes, there is a reason. The expression could be arbitrarily large, so you might have to search many lines to find the closing backtick. But in such a situation, it is surely much more likely that the programmer has simply forgotten to close the ticks around a simple identifier. Just think of the potential for delightfully baffling type error messages that might result! There's also the issue that you wouldn't be allowed to use backticks within such an expression, so you would need additional grammar rules describing expressions which are allowed within backticks. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
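Since only a plain identifier may appear between backticks, the usual workaround is to give the partially applied function a name first. A small sketch (plusZip and sums are hypothetical names):

```haskell
-- Name the expression so that it becomes a simple identifier.
plusZip :: [Integer] -> [Integer] -> [Integer]
plusZip = zipWith (+)

-- Now the infix form parses.
sums :: [Integer]
sums = [1 .. 5] `plusZip` [7 ..]  -- [8,10,12,14,16]
```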
Re: [Haskell-cafe] Re: Top Level TWI's again was Re: [Haskell] Re: Parameterized Show
Keean Schupke wrote: Can a C function be pure? I guess it can... The trouble is you cannot prove its pure? But - why would you want to use a pure C function. Because it already exists? E.g. most BLAS/LAPACK functions are pure; should they be re-written in Haskell? [Yes, I know that BLAS/LAPACK are written in Fortran, but I don't think that changes the argument. The resulting object code (which is what you would actually be using) wouldn't be significantly different if they were written in C.] -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
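For illustration, this is how an existing pure C function is typically imported without the IO monad. The sketch uses cos from the C maths library rather than a BLAS routine, to stay self-contained; c_cos and cosine are hypothetical names. The compiler takes the purity on trust: declaring the import with a non-IO type is the programmer's promise, not something that is checked.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble)

-- Imported at a pure type: we assert that cos has no side effects.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- A wrapper using ordinary Haskell Doubles.
cosine :: Double -> Double
cosine = realToFrac . c_cos . realToFrac
```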
Re: Bug in touchForeignPtr?
Keean Schupke wrote: C exit routines aren't responsible for freeing OS resources; the OS is. The fact that the SysV IPC objects aren't freed on exit is intentional; they are meant to be persistent. For the same reason, the OS doesn't delete upon termination any files which the process created. Right, which is why if you want to clean up temporary files, or temporary semaphores the OS doesn't do it for you, and you need to put some routine in place to do it (using at_exit)... It seems this is the only way to guarantee something gets run when a program exits for whatever reason. There isn't any way to *guarantee* that something is run upon termination. The program may be terminated due to SIGKILL (e.g. due to a system-wide lack of virtual memory). If you run out of stack, you may not be able to call functions to perform clean-up. Also, if the program crashes, handling the resulting SIGSEGV (etc) is likely to be unreliable, as the memory containing the resource references may have been trashed. Calling remove() on a filename which might have been corrupted is inadvisable. Also, at_exit() isn't standard. atexit() is ANSI C, but that is only supposed to be called for normal termination (exit() or return from main()), not for _exit() or fatal signals. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
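On the Haskell side, the nearest equivalent to an exit routine is Control.Exception.bracket. The sketch below assumes the directory package for removeFile; withScratchFile is a hypothetical name. It shares the limitation described above: the clean-up runs on normal return and on Haskell exceptions, but nothing runs if the process receives SIGKILL or crashes outright.

```haskell
import Control.Exception (bracket)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (Handle, hClose, openTempFile)

-- Run an action with a temporary file, removing the file afterwards.
-- Best effort only: this is not a guarantee that clean-up happens.
withScratchFile :: (FilePath -> Handle -> IO a) -> IO a
withScratchFile action = do
  dir <- getTemporaryDirectory
  bracket
    (openTempFile dir "scratch.tmp")
    (\(path, h) -> hClose h >> removeFile path)
    (\(path, h) -> action path h)
```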
Re: [Haskell-cafe] Sample rate inference
Henning Thielemann wrote: The computation sample rate should be propagated through the network as follows: If in a component of equal sample rate some processors have the same fixed sample rate, all uncertain processors must adapt that. If some processors have different fixed sample rates this is an error. If no processor has a fixed sample rate, the user must provide one manually. To me this looks very similar to type inference. Is there some mechanism in Haskell which supports this programming structure? If you define a class for sample rates, and an instance for each possible sample rate, then you could use type inference, Interesting approach, though it's not good idea to restrict to some sample rates. It's also not necessary to do the inference at compile time. Ah. I think that I took your comparison to type inference too literally. I doubt that this specific example would work in practice (the type inference would probably give the compiler a heart attack), but you could presumably construct an equivalent mechanism using base-N numerals. :-) How can one implement a sample rate inference that work at run-time for arbitrary rates? This will be the only way if one works with sampled sounds read from a file. This is essentially unification. Haskell and ML use it for type inference and for pattern matching, (although pattern matching is always unidirectional, i.e. unifying a pattern comprised of both variables and constants with a value comprised solely of constants). Prolog uses it more extensively (variables can occur on either side). Essentially, unification involves matching structures comprised of constants, variables, and other structures. An unbound variable matches anything, resulting in the variable becoming bound; a bound variable matches whatever its value matches; a constant matches itself; and a structure matches another structure if they have the same number of components and all of their components match. 
You could probably use GHC's type inference code, although converting it for your purposes may be more work than starting from scratch. The Hugs98 code contains a miniature prolog implementation, so you could take the unification algorithm from that. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
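For the sample-rate case the terms are only constants (fixed rates) and variables (unknown rates), with no nested structures, so unification degenerates into something quite small. A sketch under that assumption, resolving one connected component at run time; Rate and resolveRates are hypothetical names, and rates are taken to be integers in Hz.

```haskell
import Data.List (nub)

-- A processor either insists on a rate or will adopt whatever is chosen.
data Rate = Fixed Int | Unknown
  deriving (Eq, Show)

-- Resolve the rate for one component of processors sharing a rate.
resolveRates :: [Rate] -> Either String (Maybe Int)
resolveRates rates =
  case nub [r | Fixed r <- rates] of
    []  -> Right Nothing   -- nothing fixed: the user must supply a rate
    [r] -> Right (Just r)  -- all fixed rates agree: everyone adopts r
    rs  -> Left ("conflicting sample rates: " ++ show rs)
```

For example, resolveRates [Unknown, Fixed 44100, Fixed 44100] is Right (Just 44100), while mixing Fixed 44100 with Fixed 48000 is reported as an error. Working out which processors belong to the same component is a separate step that this sketch leaves out.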
Re: [Haskell-cafe] Space efficiency problem
Keith Wansbrough wrote: The problem is that there are so many iterations, that the program gets killed (kill -9) by the system. I'm not sure what you mean here - I've never encountered a system that kills processes with -9, other than at shutdown time. Are you sure it's -9? If a process exhausts its resource limits (as set with setrlimit()), the kernel will typically kill it with SIGKILL. Also, if the available system-wide memory gets too low, the kernel may start killing off processes, again with SIGKILL. When this occurs, the shell from which the process was spawned will typically write Killed to the terminal. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Set of reals...?
MR K P SCHUPKE wrote: Double already has +Inf and -Inf; it's just that Haskell doesn't have (AFAIK) syntax to write them as constants. In the source for the GHC libraries it uses 1/0 for +Infinity and -1/0 for -Infinity, so I assume these are the official way to do it. Personally I would define nicer names: positiveInfinity :: Double positiveInfinity = 1/0 negativeInfinity :: Double negativeInfinity = -1/0 Or just: infinity = 1/0 and use -infinity for the negative. One other nit: isn't the read/show syntax for Haskell98 types supposed to be valid Haskell syntax? From http://www.haskell.org/onlinereport/derived.html#derived-text The result of show is a syntactically correct Haskell expression containing only constants, given the fixity declarations in force at the point where the type is declared. [Note: the above sentence refers specifically to derived instances, but induction would require that it also holds for base types.] However: Prelude> let infinity = 1/0 :: Double Prelude> show infinity Infinity Prelude> read (show infinity) :: Double Infinity Prelude> Infinity <interactive>:1: Data constructor not in scope: `Infinity' -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Is it safe to call getProcessExitCode more than once?
David Brown wrote: Both [waitForProcess and getProcessExitCode] will throw an exception if the process terminated on a signal. So if I terminate a process manually, I'll have to wait for the ExitCode to avoid a zombie process, and waiting for the ExitCode invariably throws an exception. It's just the way that Unix process management works. I guess you have to catch the exception to handle it well. This is part of the aspect that makes writing shells so complicated. I think that Peter was referring primarily to the fact that the Haskell interface to waitpid() throws an exception if the process terminated due to a signal, not the fact that you have to reap children to prevent the accumulation of zombies. The C interface is that waitpid() (and similar) return a status code; you can then use the macros from sys/wait.h to determine whether the process terminated normally (e.g. via exit()) or abnormally (due to a fatal signal), and to obtain either the exit code or the signal number as appropriate. The Haskell interface oversimplifies matters, making it easier to get the exit code in the case of normal termination, but complicating the handling of abnormal termination. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
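For comparison, the lower-level System.Posix.Process interface in the unix package reports the waitpid() status as a value, much like the C macros, rather than raising an exception. The sketch below assumes a POSIX system; describeStatus and reapChild are hypothetical names. The empty record patterns are used because the fields of Terminated differ between versions of the unix package.

```haskell
import System.Exit (ExitCode (..))
import System.Posix.Process
  ( ProcessStatus (..), exitImmediately, forkProcess, getProcessStatus )

-- Turn a wait status into a description, covering both normal and
-- abnormal termination without any exception handling.
describeStatus :: Maybe ProcessStatus -> String
describeStatus Nothing = "still running"
describeStatus (Just (Exited ExitSuccess)) = "exited normally"
describeStatus (Just (Exited (ExitFailure n))) = "exited with code " ++ show n
describeStatus (Just Terminated {}) = "killed by a signal"
describeStatus (Just Stopped {}) = "stopped by a signal"

-- Fork a child that exits at once with the given non-zero code, then
-- reap it (blocking wait) so that no zombie is left behind.
reapChild :: Int -> IO String
reapChild code = do
  pid <- forkProcess (exitImmediately (ExitFailure code))
  describeStatus <$> getProcessStatus True False pid
```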
Re: [Haskell-cafe] Set of reals...?
Stijn De Saeger wrote: Now, for unions I tried the following: to take the union of two BasicSets, just append them and contract the result. contracting meaning: merge overlapping intervals. contract :: Range -> Range -> BasicSet contract (x1,y1) (x2,y2) | x2 = y1 = if x2 = x1 then [(x1, (max y1 y2))] else if y2 = x1 then [(x2, (max y1 y2))] else [(x2,y2), (x1,y1)] | x1 = y2 = if x1 = x2 then [(x2, (max y1 y2))] else if y1 = x2 then [(x1, (max y1 y2))] else [(x1,y1), (x2,y2)] | x1 = x2 = [(x1,y1), (x2, y2)] Now generalizing this from Ranges to BasicSets is where i got stuck. In my limited grasp of haskell and FP, this contractSet function below is just crying for the use of a fold operation, but i can't for the life of me see how to do it. As the result is a BasicSet, the accumulator would need to be a BasicSet and the operator would need to have type: BasicSet -> Range -> BasicSet This can presumably be implemented as a fold on contract, so contractSet would essentially be a doubly-nested fold. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
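One way the doubly-nested fold might be written, as a sketch rather than a reconstruction of the poster's code. It assumes Range is a pair of Doubles and BasicSet a list of Ranges (the thread does not show the definitions), treats intervals as closed so that touching intervals are merged, and uses its own overlap test instead of the contract function quoted above. The names overlaps, hull and insertRange are hypothetical.

```haskell
type Range = (Double, Double)
type BasicSet = [Range]

-- Closed intervals overlap if neither lies wholly to one side of the other.
overlaps :: Range -> Range -> Bool
overlaps (a1, b1) (a2, b2) = a1 <= b2 && a2 <= b1

-- Smallest interval covering both arguments.
hull :: Range -> Range -> Range
hull (a1, b1) (a2, b2) = (min a1 a2, max b1 b2)

-- Inner fold: add one range to a set of pairwise disjoint ranges,
-- absorbing every member that it overlaps.
insertRange :: BasicSet -> Range -> BasicSet
insertRange set r = merged : untouched
  where
    (merged, untouched) = foldr step (r, []) set
    step s (acc, rest)
      | overlaps s acc = (hull s acc, rest)
      | otherwise      = (acc, s : rest)

-- Outer fold: the accumulator is a BasicSet and the operator has type
-- BasicSet -> Range -> BasicSet, as described above.
contractSet :: BasicSet -> BasicSet
contractSet = foldl insertRange []
```

For instance, contractSet [(1,3),(5,7),(2,6)] collapses to the single range (1,7). The result is not sorted; sorting it afterwards is cheap if a canonical order is wanted.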
RE: Process library and signals
Simon Marlow wrote: So basically you're saying that if runProcess is to be used in a system()-like way, that is the parent is going to wait synchronously for the child, then the parent should be ignoring SIGQUIT/SIGINT. On the other hand, if runProcess is going to be used in a popen()-like way, then the parent should not be ignoring SIGQUIT/SIGINT. Exactly. The current interface doesn't allow for controlling the behaviour in this way. Yep. So the current signal handling in runProcess is wrong, and should probably be removed. What should we have instead? We could implement the system()-like signal handling for System.Cmd.system only, perhaps. Well, probably for system and rawSystem. The problem, as I see it, is that the Process library is meant to be both flexible and portable. If you don't need the portability, you already have the primitives in System.Posix, and separate fork/exec will inevitably provide more flexibility than an all-in one version. If you provide system/rawSystem and runInteractive{Command,Process}, that's covered the most common cases (i.e. system() and popen()). So what is runProcess for? If it doesn't do the signal handling, it's only really suitable for popen-style usage. Which is unfortunate; I can imagine a use for an intermediate semi-raw system, which supports e.g. file redirection or even command pipelines, but without using the shell (i.e. accepts the argv[] individually). In particular, using the shell is risky if you want to use untrusted data in the argument list (e.g. CGI programs). If runProcess doesn't do the signal handling between the fork and the exec, you can't change the child's signal handling after the exec. You could change the signal handling of the parent (i.e. the current process) before calling runProcess, let the child inherit it, then change it back again after runProcess returns, but that gives rise to a potential race condition. 
One possibility would be to allow an extra argument of type IO () (or Maybe (IO ()), where Nothing is shorthand for Just $ return ()) which would be executed between the fork and the exec on Unix and ignored on Windows. AFAICT, that would expose the full functionality available on Unix without interfering with Windows usage or adding complexity. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
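A minimal sketch of that idea on Unix, assuming the `unix` package's forkProcess and executeFile. `runWithHook`, `resetSignals` and `demo` are hypothetical names used only for illustration; this is not the Process library's interface, and the Windows side (where the hook would simply be ignored) is not shown.

```haskell
import Data.Maybe (fromMaybe)
import System.Posix.Process (ProcessStatus, executeFile, forkProcess, getProcessStatus)
import System.Posix.Signals (Handler (Default), installHandler, sigINT, sigQUIT)
import System.Posix.Types (ProcessID)

-- Hypothetical runProcess variant: the hook runs in the child,
-- after the fork and before the exec.
runWithHook :: Maybe (IO ()) -> FilePath -> [String] -> IO ProcessID
runWithHook hook cmd args =
  forkProcess $ do
    fromMaybe (return ()) hook         -- Nothing behaves like Just (return ())
    executeFile cmd True args Nothing  -- search PATH, inherit the environment

-- Example hook: give the child default SIGINT/SIGQUIT handling.
resetSignals :: IO ()
resetSignals = do
  _ <- installHandler sigINT  Default Nothing
  _ <- installHandler sigQUIT Default Nothing
  return ()

-- Usage: run "true" with the hook and wait for it to finish.
demo :: IO (Maybe ProcessStatus)
demo = runWithHook (Just resetSignals) "true" [] >>= getProcessStatus True False
```

Because the hook is an arbitrary IO action, the same argument could just as well close descriptors or change the working directory, which is the generality the fork/exec split provides.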
Re: Is it safe to call getProcessExitCode more than once?
Peter Simons wrote: Both [waitForProcess and getProcessExitCode] will throw an exception if the process terminated on a signal. So if I terminate a process manually, I'll have to wait for the ExitCode to avoid a zombie process, and waiting for the ExitCode invariably throws an exception. Or do I misunderstand something? No, that seems correct. Although, depending upon the OS, setting SIGCHLD to SIG_IGN may cause processes to be reaped automatically (i.e. not become zombies), so that's a possible alternative. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: [Haskell-cafe] Re: exitFailure under forkProcess
John Goerzen wrote: Oh also, I would very much appreciate Haskell interfaces to realpath() and readlink(). I don't know about realpath() (which is a BSD-ism, and included in GNU libc, but I'm not sure about other Unices), but readlink() exists as System.Posix.readSymbolicLink. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: exitFailure under forkProcess
Simon Marlow wrote: Yes. Its POSIX interface is, uhm, weird. I can't quite put my finger on it, but things like setting up a pipe to a child process's stdin just seem brittle and fragile with all sorts of weird errors. I can do this in my sleep in C, Perl, or Python but in Haskell I can barely make it work when I'm fully conscious :-) *laughs* Is there anything concrete we can do? The POSIX layer is supposed to be pretty minimal, so in theory most POSIX idioms should not be harder in Haskell, and hopefully should be easier. Part of the problem is that you can't always consider the use of individual POSIX functions in isolation. Things which are done (possibly unknowingly) in one place might affect the way in which other system calls behave. One major issue is the way in which fork() has global consequences. E.g. if a library has file descriptors for internal use, fork() will duplicate them. If the library subsequently closes its copy of the descriptor, but the inherited copy (which the child may not even know exists) remains open, the file (socket, device, etc) will remain open. Another example of this is the interaction between buffered streams and descriptors. If a process forks while unflushed data remains in a stream, the data may be written twice. This can be quite serious if the stream corresponds to some form of control channel (i.e. a pipe or socket communicating with another process). Ultimately, the only real solution to such issues is to ensure that any high-level functionality provides a sufficient level of cooperation with lower-level code, e.g. allowing it to be synchronised, or at least shut down into a state such that it doesn't interfere, ensuring that it doesn't hide unnecessary details which may actually be necessary in more involved programs, etc. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Set of reals...?
Stijn De Saeger wrote: Thanks for the explanation, at first it seemed like enumFromThenTo would indeed give me the functionality I am looking for. But then all of GHCi started acting weird while playing around... this is a copy-paste transcript from the terminal.

    *S3> 0.5 `elem` [0.0,0.1..1.0]
    True
    *S3> 0.8 `elem` [0.6,0.7..1.0]
    False
    *S3> 0.8 `elem` [0.6,0.7..1.0]
    False
    *S3> [0.6,0.7..0.9]
    [0.6,0.7,0.7999,0.8999]
    *S3>

Floating point has limited precision, and uses binary rather than decimal, so you can't exactly represent most multiples of 1/10 as floating-point values. Internally, the elements of the list would actually be out by a relative error of ~2e-16 for double-precision, ~1e-7 for single precision, but the code which converts to decimal representation for printing rounds it.

However, Haskell does support rationals:

    Prelude> [6/10 :: Rational,7/10..9/10]
    [3 % 5,7 % 10,4 % 5,9 % 10]
    Prelude> 4/5 `elem` [6/10 :: Rational,7/10..9/10]
    True

In your reply you wrote: However, you can't specify infinitesimally small steps, nor increment according to the resolution of the floating point type (at least, not using the enumeration syntax; you *could* do it manually using integer enumerations and encodeFloat, but that wouldn't be particularly practical). Is this what you were referring to? I wouldn't say 0.1 is an infinitesimal small step.

No; you could realistically use much smaller steps than that. My point was that you can't realistically use sufficiently small steps that values won't fall through the cracks:

    Prelude> 0.61 `elem` [0.6,0.7..0.9]
    False

Whilst you could, without too much effort, enumerate a range of floating-point values such that all intermediate values were included, the resulting list would be massive. Single-precision floating-point uses a 24-bit mantissa (including the implicit leading bit), so an exhaustive iteration of the range [0.5..1.0] would have 2^23+1 elements.
-- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
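The two workarounds touched on in this message can be sketched as follows: enumerate over Rational so that steps of 1/10 are exact, or, if the values must stay floating-point, test membership up to a tolerance. The names `tenths` and `approxElem` and the tolerance used in the usage note are illustrative choices, not anything from the thread.

```haskell
import Data.Ratio ((%))

-- Exact steps of 1/10: every element is a true multiple of 1/10.
tenths :: [Rational]
tenths = [6 % 10, 7 % 10 .. 9 % 10]

-- Membership up to an absolute tolerance, for floating-point lists
-- whose elements may be off by a rounding error.
approxElem :: Double -> Double -> [Double] -> Bool
approxElem eps x = any (\y -> abs (x - y) <= eps)
```

Here `(4 % 5) `elem` tenths` holds because the arithmetic is exact, and `approxElem 1e-9 0.8 [0.6,0.7..1.0]` holds because the rounding error is many orders of magnitude below the tolerance. Neither helps with a value such as 0.61, which genuinely lies between the enumerated points.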
Re: [Haskell-cafe] Re: exitFailure under forkProcess
John Goerzen wrote: I wonder what the behavior of fwrite() in this situation is. I don't know if it ever performs buffering such that write() is never called during a call to fwrite(). fwrite() is no different to other stdio functions in this regard. If the stream is buffered, a call to fwrite() may simply result in data being appended to the buffer; it doesn't guarantee a call to write(). -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Is it safe to call getProcessExitCode more than once?
Peter Simons wrote: John Goerzen writes: Assuming it is based on wait() or one of its derivatives, and I suspect it is, you cannot call it more than once for a single process. That's what I _assume_, too, but a definite answer would be nice. In the meanwhile, I have found out that it might not be safe to call it once, even: CaughtException waitForProcess: does not exist (No child processes) That's a child I _did_ start and which apparently terminated before I called waitForProcess. Shouldn't I be getting the exit code of that process rather than an exception? I can think of two reasons why this might be happening: 1. SIGCHLD is being ignored (SIG_IGN); the Process library doesn't appear to be doing this, but something else might. 2. Something else (e.g. the RTS) is handling SIGCHLD and reaping the process automatically. Do waitForProcess and getProcessExitCode differ in their behavior other than that one blocks and other doesn't? Both call waitpid(); getProcessExitCode uses WNOHANG, while waitForProcess doesn't. They differ in their handling of errors. waitForProcess will throw an exception if waitpid() indicates any error (except EINTR, where it just retries the waitpid() call), whereas getProcessExitCode will return Nothing. Both will throw an exception if the process terminated on a signal. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Process library and signals
Having looked at the latest version of the Process library, it appears that my earlier comments about signal handling may have been misinterpreted. First, my comments regarding the handling of SIGINT/SIGQUIT were specific to system(). The C system() function ignores these signals in the parent while the child is executing. However, this doesn't necessarily apply to other functions; e.g. popen() doesn't ignore these signals, and runProcess probably shouldn't either. With system(), the parent blocks until the child is finished, so if the user presses Ctrl-C to kill the currently executing process, they probably want to kill the child. If the parent wants to die on Ctrl-C, it can use WIFSIGNALED/WTERMSIG to determine that the child was killed and terminate itself. OTOH, with popen(), the parent continues to run alongside the child, with the child behaving as a slave, so the parent will normally want to control the signal handling. Ideally, system() equivalents (e.g. system, rawSystem) would ignore the signals in the parent, popen() equivalents (e.g. runInteractiveProcess) wouldn't, and lower-level functions (e.g. runProcess) would give you a choice. Unfortunately, there is an inherent conflict between portability and generality, as the Unix and Windows interfaces are substantially different. Unix has separate fork/exec primitives, with the option to execute arbitrary code between the two, whilst Windows has a single primitive with a fixed set of options. Essentially, I'm not sure that a Windows-compatible runProcess would be sufficiently general to accurately implement both system() and popen() equivalents on Unix. Either system/rawSystem should be implemented using lower-level functions (i.e. not runProcess) or runProcess needs an additional option to control the handling of signals in the child. Also, my comment regarding the signals being reset in the child was inaccurate. system() doesn't reset them in the sense of SIG_DFL. 
It sets them to SIG_IGN before the fork(), recording their previous handlers. After the fork, it resets them in the child to the values they had upon entry to the system() function (i.e. to the values they had before they were ignored). The effect is as if they had been set to SIG_IGN in the parent after the fork(), but without the potential race condition. Thus, if they were originally ignored in the parent before system() was entered, they will be ignored in the child. If they were at their defaults (SIG_DFL) before system() was entered, they will be so in the child. If they had been set to specific handlers, system() will restore those handlers in the child, but then execve() will reset them to SIG_DFL, as the handler functions won't exist after the execve(). -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
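The sequence described above can be sketched with the `unix` package as follows. This is only an approximation of what the C library does: `systemLike` is a made-up name, the sketch leaves out the SIGCHLD blocking that system() also performs, and signal dispositions installed through the Haskell runtime are not guaranteed to behave exactly like raw sigaction() calls.

```haskell
import Control.Exception (finally)
import System.Posix.Process (ProcessStatus, executeFile, forkProcess, getProcessStatus)
import System.Posix.Signals (Handler (Ignore), installHandler, sigINT, sigQUIT)

systemLike :: FilePath -> [String] -> IO (Maybe ProcessStatus)
systemLike cmd args = do
  -- Ignore the signals *before* forking, remembering the old dispositions.
  oldInt  <- installHandler sigINT  Ignore Nothing
  oldQuit <- installHandler sigQUIT Ignore Nothing
  let restore = do
        _ <- installHandler sigINT  oldInt  Nothing
        _ <- installHandler sigQUIT oldQuit Nothing
        return ()
      -- Child: put back whatever was in force on entry, then exec.
      child  = restore >> executeFile cmd True args Nothing
      -- Parent: wait synchronously with the signals still ignored.
      parent = forkProcess child >>= getProcessStatus True False
  -- The parent restores its own dispositions once the child has been reaped.
  parent `finally` restore
```

Because the signals are already ignored when the fork happens, there is no window in which Ctrl-C could reach the parent between the fork and the change of disposition, which is the race the message refers to.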
RE: [Haskell-cafe] Are handles garbage-collected?
Conal Elliott wrote: What happens when a System.IO.Handle falls out of scope without being explicitly hClosed? Is that a resource leak? Or will the RTS close the handle for me? AFAIK, Handles have finalisers which close them, but I don't know if GHC triggers garbage collection when file descriptors run out. If not, you will have problems if you manage to run out of fds between GCs. How about using bracket to introduce explicit close on end of scope? I'm puzzled why explicit bracketing is seen as an acceptable solution. It seems to me that bracketing has the same drawbacks as explicit memory management, namely that it sometimes retains the resource (e.g., memory or file descriptor) longer than necessary (resource leak) and sometimes not long enough (potentially disastrous programmer error). Whether the resource is system RAM, file descriptors, video memory, fonts, brushes, bitmaps, graphics contexts, 3D polygon meshes, or whatever, I'd like GC to track the resource use and free unused resources correctly and efficiently. File descriptors aren't simply a resource in the sense that memory is. Closing a descriptor may have significance beyond the process which closes it. If it refers to the write end of a pipe or socket, closing it may cause the reader to receive EOF; if it refers to a file, any locks will be released; and so on. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
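For reference, the explicit bracketing being debated looks roughly like this; `firstLine` is an illustrative name. The point relevant to the reply is that the close happens at a known moment, which matters when closing has effects visible to other processes (EOF on a pipe, release of a lock), rather than whenever a finaliser happens to run.

```haskell
import Control.Exception (bracket)
import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile)

-- Open the file, read its first line, and close the handle on the way
-- out, whether hGetLine returns normally or throws.
firstLine :: FilePath -> IO String
firstLine path = bracket (openFile path ReadMode) hClose hGetLine
```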
Re: ANNOUNCE: GHC version 6.2.2
Simon Marlow wrote: = The (Interactive) Glasgow Haskell Compiler -- version 6.2.2 = The GHC Team is pleased to announce the latest patchlevel release of GHC, 6.2.2. This is a bugfix release only, there are no new features. Code that worked with 6.2.1 will work unchanged with 6.2.2. Should it be possible to obtain this via CVS? My attempts to update from 6.2.1 with cvs update -r ghc-6-2-2 ... fail with: cvs [server aborted]: cannot write /cvs/CVSROOT/val-tags: Read-only file system -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: [Haskell] threading mutable state through callbacks
Vincenzo Ciancia wrote: Unfortunately, in this case the whole point of what people are trying to do with unsafePerformIO is to allow these things to be visible at the top level :-) Sometimes I get too much involved in what I think about, and forget the original goal :) A little _too_ naive, it seems, I apologize. So it's like the original idea, that using these toplevel IO bindings one has to impose an order of evaluation over all program bindings, which surely is against the current meaning of haskell programs, e.g. if I say conf - readMyConfFile init = fn conf people would agree that the correct meaning is to first evaluate all of the IO bindings and then the rest of the program: x1 - a1 ... xn - an v1 = expr1 ... vn = exprn main = action should be equivalent to main = do x1 - a1 ... xn - an let v1 = expr1 ... vn = exprn in action This would not change the meaning of a standard haskell program I think (but I am not an expert as you see). Am I wrong? In the former, the variables have global scope, and may be exported from the module. Also, what if you do this in a module other than Main? -- Glynn Clements [EMAIL PROTECTED] ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell-cafe] Re: What Functions are Standard?
Malcolm Wallace wrote: I can't comment on nhc98, but the Haskell98 standard doesn't include any mechanism for binary I/O. Ouch. That seems like a major oversight to me. Will there be any effort to fix that in the future? Note that, on Unix-like systems, there is no difference between text I/O and binary I/O on files. It is only Windows that requires a separation of the modes. There are two issues here. The first is EOL conversion; as Malcolm notes, this isn't an issue on Unix, but it is an issue on Windows. On Windows, there is no standard way to obtain the contents of a file such that \n and \r\n are distinct. The second is character encoding/decoding. The Haskell98 I/O functions all deal with Chars. When reading a file, the byte stream is converted to a list of characters using an *unspecified* encoding. AFAIK, all implementations are currently hardcoded to assume ISO-8859-1, so you can reliably obtain the original list of bytes using the ord function. However, nothing in the standard dictates that ISO-8859-1 is used, and there has been talk of using the locale's encoding instead. If that were to happen, it would be practically (as well as theoretically) impossible to perform binary I/O using the Haskell98 API, even on Unix. This issue has been beaten to death fairly recently, so I'm not going to repeat it here. See the thread entitled "Writing binary files" from Sep 11-18 for the details. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
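As an illustration of the "recover the bytes with ord" approach without depending on whichever encoding an implementation happens to default to, here is a minimal sketch using System.IO's withBinaryFile. That function is a GHC library extension, not part of Haskell98, and the sketch assumes that a handle in binary mode delivers one Char per byte with no EOL translation. `readBytes` is an illustrative name.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import System.IO (IOMode (ReadMode), hGetContents, withBinaryFile)

-- Read a whole file as bytes. In binary mode each Char is assumed to
-- be a single byte (0..255), so ord recovers the original value.
readBytes :: FilePath -> IO [Word8]
readBytes path =
  withBinaryFile path ReadMode $ \h -> do
    s <- hGetContents h
    let bytes = map (fromIntegral . ord) s
    -- hGetContents is lazy: walk the whole list before the handle closes.
    length bytes `seq` return bytes
```

A list of Word8 is a poor representation for large files; it is used here only because it matches the types discussed in the thread.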
Re: [Haskell-cafe] What Functions are Standard?
John Goerzen wrote: Hello, I have been writing code using the docs over at http://www.haskell.org/ghc/docs/latest/html/libraries/index.html, which is the only comprehensive library reference I could find. I am using some code from System.IO, supposedly from base. When I try to build this with nhc98, it doesn't know about hGetBuf, hPutBuf, or openBinaryFile from there or about mallocForeignPtrArray from the Foreign.* area. But all these look standard to me. They aren't; they are GHC extensions, except for mallocForeignPtrArray, which is specified by the FFI addendum: http://www.cse.unsw.edu.au/~chak/haskell/ffi/ What am I missing here? Does nhc98 really completely lack the ability to read binary data from a file? I can't comment on nhc98, but the Haskell98 standard doesn't include any mechanism for binary I/O. Or where should I be finding it, and how could I have known for myself that those particular ghc functions were unsupported elsewhere? The Haskell98 report can be found at: http://www.haskell.org/onlinereport/ Anything which isn't listed there is essentially a vendor extension. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: newCString -- to 'free' or not?
Peter Simons wrote: When I create a CString with Foreign.C.String.newCString, do I have to 'free' it after I don't need it anymore? Or is there some RTS magic taking place? How about Foreign.Marshal.Utils.new and all those other newXYZ functions? Yes. The new* functions allocate the memory with malloc, and you have to free it yourself. OTOH, the with* functions allocate the memory with alloca, and it is freed automatically. Also, a ForeignPtr includes a finaliser which will free the data automatically when it is no longer referenced. -- Glynn Clements [EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
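The two allocation styles side by side, as a minimal sketch; `viaNew` and `viaWith` are illustrative names, and peekCString merely stands in for whatever foreign call would consume the pointer.

```haskell
import Control.Exception (bracket)
import Foreign.C.String (newCString, peekCString, withCString)
import Foreign.Marshal.Alloc (free)

-- new*: the caller owns the memory and must free it explicitly.
-- bracket makes sure free runs even if the body throws.
viaNew :: String -> IO String
viaNew s = bracket (newCString s) free peekCString

-- with*: the memory is only valid inside the callback and is
-- released automatically when the callback returns.
viaWith :: String -> IO String
viaWith s = withCString s peekCString
```

The with* form is the safer default; the new* form is for pointers that must outlive the current scope, for example when the foreign side keeps a reference.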
RE: [Haskell-cafe] Re: Writing binary files?
MR K P SCHUPKE wrote: You wouldn't want to have to accumulate the entire body as a single byte string Ever heard of lazyness? Haskell does it quite well... Accumulating the entire body doesn't really do this because haskell is lazy. You don't need a more complex interface in Haskell! Are you sure that will work in the general case? Or are you assuming lazy I/O? -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
Marcin 'Qrczak' Kowalczyk wrote: What I'm suggesting in the above is to sidestep the encoding issue by keeping filenames as byte strings wherever possible. Ok, but let it be in addition to, not instead treating them as character strings. Provided that you know the encoding, nothing stops you converting them to strings, should you have a need to do so. Processing data in their original byte encodings makes supporting multiple languages harder. Filenames which are inexpressible as character strings get in the way of clean APIs. When considering only filenames, using bytes would be sufficient, but in overall it's more convenient to Unicodize them like other strings. It also harms reliability. Depending upon the encoding, two distinct byte strings may have the same Unicode representation. Such encodings are not suitable for filenames. Regardless of whether they are suitable, they are used. For me ISO-2022 is a brain-damaged concept and should die. Well, it isn't likely to. I haven't addressed any of the other stuff about ISO-2022, as it isn't really relevant. Whether ISO-2022 is good or bad doesn't matter; what matters is that it is likely to remain in use for the foreseeable future. Such tarballs are not portable across systems using different encodings. Well, programs which treat filenames as byte strings to be read from argv[] and passed directly to open() won't have any problems with this. The OS itself may have problems with this; only some filesystems accept arbitrary bytes apart from '\0' and '/' (and with the special meaning for '.'). Exotic characters in filenames are not very portable. No, but most Unix programs manage to handle them without problems. A Haskell program in my world can do that too. Just set the encoding to Latin1. But programs should handle this by default, IMHO. IMHO it's more important to make them compatible with the representation of strings used in other parts of the program. Why? 
Filenames are, for the most part, just tokens to be passed around. Filenames are often stored in text files, True. whose bytes are interpreted as characters. Sometimes true, sometimes not. Where filenames occur in data files, e.g. configuration files, the program which reads the configuration file typically passes the bytes directly to the OS without interpretation. Applying QP to non-ASCII parts of filenames is suitable only if humans won't edit these files by hand. Who said anything about QP? My specific point is that the Haskell98 API has a very big problem due to the assumption that the encoding is always known. Existing implementations work around the problem by assuming that the encoding is always ISO-8859-1. The API is incomplete and needs to be enhanced. Programs written using the current API will be limited to using the locale encoding. That just adds unnecessary failure modes. But otherwise programs would continuously have bugs in handling text which is not ISO-8859-1, especially with multibyte encoding where pretending that ISO-8859-2 is ISO-8859-1 too often doesn't work. Why? I can't switch my environment to UTF-8 yet precisely because too many programs were written with the attitude you are promoting: they don't care about the encoding, they just pass bytes around. That's all that many programs should be doing. Bugs range from small annoyances like tabular output which doesn't line up, through mangled characters on a graphical display, to full-screen interactive programs being unusable on a UTF-8 terminal. IOW: 1. display doesn't work correctly, 2. display doesn't work correctly, and 3. display doesn't work correctly. You keep citing cases involving graphical display as a reason why all programs should be working with characters all of the time. I haven't suggested that programs should never deal with characters, yet you keep insinuating that is my argument, then proceed to attack it. 
-- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Writing binary files?
-8859-2 unless specified otherwise. 1. In that situation, you can't avoid the encoding issues. It doesn't matter what the default is, because you're going to have to set the encoding anyhow. Why do you always want me to set the encoding? That should be the job of the RTS. Because you might know the encoding, and the RTS doesn't. The locale is a fallback mechanism, for the situation where you *need* an encoding but one hasn't been specified by other means. 2. If you assume ISO-8859-1, you can always convert back to Word8 If I want a list of Word8's, then I should be able to get them without extracting them from a string. The point is that, currently, you can't. Nothing in the core Haskell98 API actually uses Word8, it all uses Char/String. then re-decode as UTF-8. If you assume UTF-8, anything which is neither UTF-8 nor ASCII will fail far more severely than just getting the collation order wrong. If I use String's to handle binary data, then I should expect things to break. If I want to get text, and it's not in the expected encoding, then the user has messed up. Or maybe the expectation is incorrect. Well, my view is essentially that files should be treated as containing bytes unless you explicitly choose to decode them, at which point you have to specify the encoding. Why do you always want me to _manually_ specify an encoding? Because we don't have an oracle which will magically determine the encoding for you. If I want bytes, I'll use the (currently being discussed, see beginning of this thread) binary I/O API, if I want String's (i.e. text), I'll use the current I/O API (which is pretty text-orientated anyway, see hPutStrLn, hGetLine, ...). If you want text, well, tough; what comes out most system calls and core library functions (not just read()) are bytes. There isn't any magic wand which will turn them into characters without knowing the encoding. completely new wide-character API for those who wish to use it. 
Which would make it horrendously difficult to do even basic I18N. Why? That gets the failed attempt at I18N out of everyone's way with a minimum of effort and with maximum backwards compatibility for existing code. If existing code, expects String's to be just a list of bytes, it's _broken_. I know. That's what I'm saying. The problem is that the broken code is the Haskell98 API. String's are a list of unicode characters, [Word8] is a list of bytes. And what comes out of (and goes into) most core library functions is the latter. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
, to be used in an unfriendly environment. A Haskell program in my world can do that too. Just set the encoding to Latin1. But programs should handle this by default, IMHO. Filenames are, for the most part, just tokens to be passed around. You get a value from argv[], and pass it to open() or whatever. It doesn't need to have any meaning. My specific point is that the Haskell98 API has a very big problem due to the assumption that the encoding is always known. Existing implementations work around the problem by assuming that the encoding is always ISO-8859-1. The API is incomplete and needs to be enhanced. Programs written using the current API will be limited to using the locale encoding. That just adds unnecessary failure modes. Just as ReadFile is limited to text files because of line endings. What do you prefer: to provide a non-Haskell98 API for binary files, or to fix the current API by forcing programs to use \r\n on Windows and \n on Unix manually? That's a harder case. There is a good reason for auto-converting EOL, as most programs actually process file contents. Most programs don't process filenames; they just pass them around. If filenames were expressed as bytes in the Haskell program, how would you map them to WinAPI? If you use the current Windows code page, the set of valid characters is limited without a good reason. Windows filenames are arguably characters rather than bytes. However, if you want to present a common API, you can just use a fixed encoding on Windows (either UTF-8 or UTF-16). This encoding would be incompatible with most other texts seen by the program. In particular reading a filename from a file would not work without manual recoding. We already have that problem; you can't read non-Latin1 strings from files. In some regards, the problem is worse on Windows, because of the prevalence of non-ASCII text (Windows 12xx and smart quotes), so using UTF-8 for file contents on Windows is even harder. Which is a pity. 
ISO-2022 is brain-damaged because of enormous complexity, Or, depending upon one's perspective, Unicode is brain-damaged because, for the sake of simplicity, it over-simplifies the situation. The over-simplification is one reason for its lack of adoption in the CJK world. It's necessary to simplify things in order to make them usable by ordinary programs. People reject overly complicated designs even if they are in some respects more general. ISO-2022 didn't catch - about the only program I've seen which tries to fully support it is Emacs. And X. Compound text is ISO-2022. For commercial X software, Motif (which uses compound text) is still the most widely-used toolkit. But, then, the fact that you haven't seen many ISO-2022 programs is probably because you're used to using programs developed by and for Westerners. In the far East, ISO-2022 is by far the most popular encoding. There, you could realistically ignore all other encodings. BTW, that's why Emacs (and XEmacs) support ISO-2022 much better than they do UTF-8. Because MuLE was written by Japanese developers. Multi-lingual text consists of distinct sections written in distinct languages with distinct alphabets. It isn't actually one big chunk in a single global language with a single massive alphabet. Multi-lingual text is almost context-insensitive. You can copy a part of it into another text, even written in another language, and it will retain its alphabet - this is much harder with stateful ISO-2022. ISO-2022 is wrong not by distinguishing alphabets but by being stateful. Sure, the statefulness adds complexity (which is one of the reasons so many people prefer to work with UTF-8), but it has the benefit of providing distinct markers to indicate where the character set is being switched (that isn't a compelling advantage; you could reconstruct the markers if you could uniquely determine the character set for each character). OTOH, Unicode is wrong by not distinguishing character sets. 
This is a significant reason why it hasn't been adopted in the far East (specifically, Han unification). and ISO-8859-x have small repertoires. Which is one of the reasons why they are likely to persist for longer than UTF-8 true believers might like. My I/O design doesn't force UTF-8, it works with ISO-8859-x as well. But I was specifically addressing Unicode versus multiple encodings internally. The size of the Unicode alphabet effectively prohibits using codepoints as indices. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Writing binary files?
locale setting. If I want to read text and can't determine the encoding by other ways (protocol spec, ...), then it's what the user set his locale setting to. No. An oracle would always get it right. The locale merely provides a fallback. If you want text, well, tough; what comes out of most system calls and core library functions (not just read()) are bytes. Which need to be interpreted by the program depending on where these bytes come from. They don't necessarily need to be interpreted. A lot of data simply gets routed from one place to another. E.g. a program reads a filename from argv[i] and passes it to open(). It doesn't matter if the filename is in Klingon. There isn't any magic wand which will turn them into characters without knowing the encoding. If I know the encoding, I should be able to set it. If I don't, it's the locale setting. If you *need* an encoding, and don't have any better information, then the locale provides a last resort. Decoding bytes according to the locale for the sake of it just adds an unnecessary failure mode. completely new wide-character API for those who wish to use it. Which would make it horrendously difficult to do even basic I18N. Why? Having different types for single-byte and multi-byte strings together with separate functions to handle them (that's what I assume you mean by a new wide-character API) with single-byte strings being the preferred one (the cause of being a separate API) would make sorting, upper/lower case testing etc. not exactly easier. For case testing, locale-dependent sorting and the like, you need to convert to characters. [Although possibly only temporarily; you can sort a list of byte strings based upon their corresponding character strings using sortBy. This means that a decoding failure only means that the ordering will be wrong. This is essentially what happens with ls if you have filenames which aren't valid in the current locale.]
Note: there are still situations where sorting bytes makes sense, i.e. where you only need *an* ordering rather than a specific ordering, e.g. uniq. I know. That's what I'm saying. The problem is that the broken code is the Haskell98 API. No, it's not broken. It just has some missing features (i.e. I/O / env functions accepting bytes instead of strings). It's broken. Being able to represent filenames as byte strings is fundamental. Being able to convert them to or from character strings is useful but not essential. The only reason why the existing API doesn't cause serious problems is because the translation is currently hardwired to an encoding which can't fail. Strings are a list of Unicode characters, [Word8] is a list of bytes. And what comes out of (and goes into) most core library functions is the latter. Strictly speaking, the former comes out with the semantics of the latter. :-) By core library functions, I was referring primarily to libc, not the Haskell library functions which were built upon them. The Haskell developers can change Haskell, they can't change libc. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: Writing binary files?
Simon Marlow wrote: Which is why I'm suggesting changing Char to be a byte, so that we can have the basic, robust API now and wait for the more advanced API, rather than having to wait for a usable API while people sort out all of the issues. An easier way is just to declare that the existing API assumes a Latin-1 encoding consistently. Later we might add a way to let the application pick another encoding, or request that the I/O library uses the locale encoding. But how do you do that without breaking stuff? If the application changes the encoding to UTF-8 (either explicitly, or by using the locale's encoding when it happens to be UTF-8), then code such as:

    [filename] <- getArgs
    openFile filename ReadMode

will fail if filename isn't a valid UTF-8 sequence. Similarly for the other cases where the OS accepts/returns byte strings but the Haskell interface uses String. Currently, the use of String for byte strings doesn't cause problems because decoding using ISO-8859-1 can't fail. Allowing the use of a fallible decoder introduces a new set of issues. E.g. what happens if you call getDirectoryContents for a directory which contains filenames which aren't valid in the current encoding? Does the call fail outright, or are invalid entries silently omitted? I'm less concerned about the handling of streams, as you can reasonably add a way to change the encoding before any data has been read or written. I'm more concerned about FilePaths, argv, the environment etc. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Writing binary files?
Udo Stenzel wrote: One more reason to fix the I/O functions to handle encodings and have a separate/underlying binary I/O API. The problem is that we also need to fix them to handle *no encoding*. What are you proposing here? Making the breakage even worse by specifying a text based api that uses no encoding? No. I'm suggesting that many of the I/O functions shouldn't be treating their arguments or return values as text. Having a separate byte based api is far better. If you don't know the encoding, all you have is bytes, no text. My point is that many of the existing functions should be changed to use bytes instead of text (not separate byte/char versions). E.g.:

    type FilePath = [Byte]

If you have a reason to treat a FilePath as text, then you convert it. E.g.

    names <- getDirectoryContents dir
    let namesT = map (toString localeEncoding) names

We don't need a separate getDirectoryContentsAsText, and we certainly don't want that to be the default. For stream I/O, then having both text and binary read/write functions makes sense. Strings are a list of Unicode characters, [Word8] is a list of bytes. And what comes out of (and goes into) most core library functions is the latter. So System.Directory needs to be specified in terms of bytes, too. Looks like a clean solution to me. Sure. But I'm looking for a solution which doesn't involve re-writing everything, and which won't result in lots of programs suddenly becoming unreliable if the hardwired default ISO-8859-1 conversion is changed. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: Writing binary files?
MR K P SCHUPKE wrote: E.g. what happens if you call getDirectoryContents for a directory which contains filenames which aren't valid in the current encoding? Surely this shows the problem with the idea of a 'current encoding' Yes. In case I haven't already made this clear, my argument is essentially that it's the API which is broken, rather than the implementations. ... You could be reading files from two remote servers each using different encodings... So you could have read and write raw [Word8] and read and write char, something like:

    readWithEncoder :: ([Word8] -> [Char]) -> IO [Char]
    writeWithEncoder :: ([Char] -> [Word8]) -> [Char] -> IO ()

In the general case, it needs to be a bit more complex than that, in order to handle stateful encodings (e.g. ISO-2022), or to handle decoding multi-byte encodings (e.g. UTF-8) in chunks. Unfortunately, the iconv interface doesn't allow the encoder state to be extracted, so a generic iconv-based converter would have to be in the IO monad. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
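To make the "decoding in chunks" point concrete, here is a minimal sketch of a decoder that carries state from one chunk to the next. It is not from the thread: the names `Decoder`, `feed`, `utf8Decoder` and `decodeSome` are hypothetical, it is a pure alternative rather than an iconv binding, and it assumes malformed input should become U+FFFD. It does not reject overlong forms.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Feeding a chunk yields the decoded text plus the decoder to use for
-- the next chunk, which remembers any incomplete trailing sequence.
newtype Decoder = Decoder { feed :: [Word8] -> (String, Decoder) }

-- | Hypothetical incremental UTF-8 decoder (no overlong-form checks).
utf8Decoder :: Decoder
utf8Decoder = go []
  where
    go pending = Decoder $ \chunk ->
      let (out, rest) = decodeSome (pending ++ chunk)
      in  (out, go rest)

-- | Decode every complete sequence; return the undecoded tail, if any.
decodeSome :: [Word8] -> (String, [Word8])
decodeSome [] = ([], [])
decodeSome bs@(b : rest)
  | b < 0x80           = emit (chr (fromIntegral b)) rest
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07)
  | otherwise          = emit '\xFFFD' rest   -- stray or invalid lead byte
  where
    emit c more = let (s, pending) = decodeSome more in (c : s, pending)
    multi :: Int -> Int -> (String, [Word8])
    multi n acc
      | not (all isCont cont) = emit '\xFFFD' rest  -- bad continuation byte
      | length cont < n       = ([], bs)            -- incomplete: keep for later
      | otherwise             = emit (toChar (foldl step acc cont)) more
      where (cont, more) = splitAt n rest
    step a c  = (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
    isCont c  = c .&. 0xC0 == 0x80
    toChar cp = if cp > 0x10FFFF then '\xFFFD' else chr cp
```

A two-byte character split across two reads comes out whole: feeding `[0x61, 0xC3]` yields `"a"` with the `0xC3` held back, and feeding `[0xA9, 0x62]` to the returned decoder yields the character U+00E9 followed by `'b'`. A stateless `[Word8] -> [Char]` function cannot do this without first gluing the chunks together.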
RE: [Haskell-cafe] Re: Writing binary files?
Simon Marlow wrote: Which is why I'm suggesting changing Char to be a byte, so that we can have the basic, robust API now and wait for the more advanced API, rather than having to wait for a usable API while people sort out all of the issues. An easier way is just to declare that the existing API assumes a Latin-1 encoding consistently. Later we might add a way to let the application pick another encoding, or request that the I/O library uses the locale encoding. But how do you do that without breaking stuff? If the application changes the encoding to UTF-8 (either explicitly, or by using the locale's encoding when it happens to be UTF-8), then code such as:

    [filename] <- getArgs
    openFile filename ReadMode

will fail if filename isn't a valid UTF-8 sequence. Similarly for the other cases where the OS accepts/returns byte strings but the Haskell interface uses String. And that's the correct behaviour, isn't it? No. The correct behaviour is to keep such data as byte strings. Otherwise it's going to be hard to write robust programs if the hard-wired ISO-8859-1 encoding is ever changed. In the current implementation, getArgs gets a list of bytes from argv[], which it converts to a String. The String is passed to openFile, which converts it back to a list of bytes which are then passed to open(). Thus the list of bytes is effectively fed through (encode . decode). For ISO-8859-*, this is the identity function. For UTF-8, it's a subfunction of the identity function, i.e. it either returns its input or it fails. I don't see what is to be gained by having it fail. It would be preferable to just pass the byte string directly from argv[] to open(). I'm less concerned about the handling of streams, as you can reasonably add a way to change the encoding before any data has been read or written. I'm more concerned about FilePaths, argv, the environment etc. Yes, these are interesting issues. Filenames are stored as character strings on some OSs (eg. 
Windows) and byte strings on others. So the Haskell portable API should probably use String, and do decoding based on the locale (if the programmer asks for it). Argv and the environment - I don't know. Windows CreateProcess() allows these to be UTF-16 strings, but I don't know what encoding/decoding happens between CreateProcess() and what the target process sees in its argv[] (can't be bothered to dig through MSDN right now). I suspect these should be Strings in Haskell too, with appropriate decoding/encoding happening under the hood. I suspect that Windows will convert them according to the active codepage, so that OpenFileA(argv[i], ...) works. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
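The (encode . decode) argument above can be checked with a small sketch. Nothing here is the actual library code: `decodeLatin1`, `decodeUtf8` and the round-trip helpers are hypothetical names, and the UTF-8 decoder is a strict one written for this illustration (it rejects overlong forms and surrogates so that a successful round trip reproduces the input bytes).

```haskell
import Control.Monad (guard)
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1: every byte is a character, so decoding cannot fail.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Only meaningful for characters up to U+00FF (higher ones are truncated).
encodeLatin1 :: String -> [Word8]
encodeLatin1 = map (fromIntegral . ord)

-- Strict UTF-8 decoder: Nothing on any malformed sequence.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise          = Nothing
  where
    multi :: Int -> Int -> Int -> Maybe String
    multi n acc minCp = do
      let (cont, rest) = splitAt n bs
      guard (length cont == n && all (\c -> c .&. 0xC0 == 0x80) cont)
      let cp = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc cont
      -- Reject overlong forms, out-of-range values and surrogates.
      guard (cp >= minCp && cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF))
      (chr cp :) <$> decodeUtf8 rest

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . enc . ord)
  where
    enc :: Int -> [Int]
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), low c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), low (c `shiftR` 6), low c]
      | otherwise   = [0xF0 .|. (c `shiftR` 18), low (c `shiftR` 12), low (c `shiftR` 6), low c]
    low x = 0x80 .|. (x .&. 0x3F)

-- What getArgs followed by openFile effectively does to the bytes.
roundTripLatin1 :: [Word8] -> [Word8]
roundTripLatin1 = encodeLatin1 . decodeLatin1

roundTripUtf8 :: [Word8] -> Maybe [Word8]
roundTripUtf8 = fmap encodeUtf8 . decodeUtf8
```

With these definitions `roundTripLatin1` returns any byte list unchanged, while `roundTripUtf8` returns `Just` the same bytes for well-formed input and `Nothing` otherwise, e.g. for the single byte `0xE9`, which is a perfectly good Latin-1 filename.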
Re: [Haskell-cafe] Re: Writing binary files?
Gabriel Ebner wrote: For case testing, locale-dependent sorting and the like, you need to convert to characters. [Although possibly only temporarily; you can sort a list of byte strings based upon their corresponding character strings using sortBy. This means that a decoding failure only means that the ordering will be wrong. This is essentially what happens with ls if you have filenames which aren't valid in the current locale.] sortBy could only cope with single-byte encodings. Multi-byte encodings would need something else. I think that you may have misunderstood my point. I was referring to something like this:

    type ByteString = [Word8]

    decode :: ByteString -> String
    decode = ...

    comparator :: ByteString -> ByteString -> Ordering
    comparator s1 s2 = compare (decode s1) (decode s2)

    sortByteStrings :: [ByteString] -> [ByteString]
    sortByteStrings ss = sortBy comparator ss

The byte strings which are returned from sortByteStrings are the original byte strings, but the ordering will be determined by the encoding. This produces the same results as decode-sort-encode (in the cases where the latter actually works), but is more robust. It's broken. Being able to represent filenames as byte strings is fundamental. Being able to convert them to or from character strings is useful but not essential. The only reason why the existing API doesn't cause serious problems is because the translation is currently hardwired to an encoding which can't fail. Handling binary filenames is hardly fundamental. It isn't even very portable, see the posts about filename handling under modern Windows. It might be an important feature, but there are other programs out there (mostly GUIs) that expect filenames to be encoded according to the locale settings too. It's fundamental if you want your programs to be robust. For most programs, there is no legitimate reason to refuse to read a file because of its name. 
A GUI program (or for that matter, a terminal) might legitimately fail to *display* a filename correctly if it can't decode it (it has to index into the font). But that isn't a reason to reject it altogether. E.g. if I create a file whose name contains control characters, most GUI programs display it incorrectly in the file selection dialog, but they still manage to open it. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
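One way to read "a decoding failure only means that the ordering will be wrong" is sketched below. This is an illustration under assumptions, not code from the thread: `decodeAscii` is a deliberately tiny stand-in for a real fallible decoder, case folding stands in for locale collation, and `sortKey` and `sortNames` are hypothetical names.

```haskell
import Data.Char (chr, toLower)
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Word (Word8)

type ByteString = [Word8]

-- Stand-in fallible decoder: accepts 7-bit ASCII only.
decodeAscii :: ByteString -> Maybe String
decodeAscii = mapM step
  where
    step b
      | b < 0x80  = Just (chr (fromIntegral b))
      | otherwise = Nothing

-- Decodable names compare by case-folded text; names which fail to decode
-- fall back to raw byte order and sort first (Left sorts before Right).
sortKey :: ByteString -> Either ByteString String
sortKey s = maybe (Left s) (Right . map toLower) (decodeAscii s)

-- The result contains the original byte strings, never re-encoded ones.
sortNames :: [ByteString] -> [ByteString]
sortNames = sortBy (comparing sortKey)
```

Sorting `[[0x62], [0x41], [0xFF]]` yields `[[0xFF], [0x41], [0x62]]`: the undecodable name is merely placed by a different rule, and every name is still present and usable with open().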
RE: [Haskell-cafe] Re: Writing binary files?
MR K P SCHUPKE wrote: In the general case, it needs to be a bit more complex than that, That's why the functions handled lists not individual characters, I was assuming that each [Word8] -> [Char] represented a valid and complete encoding block... i.e. at the start of each call it assumes no escapes. All this means is that when reading in chunks you paste those chunks together before conversion, and you can only break outside of escapes. This in my opinion is better behaviour anyway... I don't want some hidden escape state mangling output, just because some earlier code generated invalid output. Right. Certainly, a stateless interface will handle converting complete strings (pathnames, arguments, etc). But, ultimately we will have need of a more general interface. E.g. in the chunked HTTP example which Oleg gave, you would probably want separate decoders for the headers and body, switching between them as you read the stream. You wouldn't want to have to accumulate the entire body as a single byte string just so that you could decode it in one go, and you can't just push a decoder onto the stream. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
) nor uses your locale's encoding? That's a pretty big sacrifice. My view is that, right now, we have the worst of both worlds, and taking a short step backwards (i.e. narrow the Char type and leave the rest alone) is a lot simpler (and more feasible) than the long journey towards real I18N. It would bury any hope in supporting a UTF-8 environment. I've heard that RedHat tried to impose UTF-8 by default. It was mostly a failure because it's too early, too many programs are not ready for it. I guess the RedHat move helped to identify some of them. But UTF-8 will inevitably be usable in future. If they tried a decade hence, it would still be too early. The single-byte encodings (ISO-8859-*, koi-8, win-12xx) aren't likely to be disappearing any time soon, nor is ISO-2022 (UTF-8 has quite spectacularly failed to make inroads in CJK-land; there are probably more UTF-8 users in the US than there). It would be great if Haskell programs were in the group which can support it instead of being forced to be abandoned because of lack of Unicode support in the language they are written in. Haskell should be able to support it, but it shouldn't refuse to support anything else, it shouldn't make you jump through hoops to write usable programs, and we shouldn't have to wait until all of the encoding issues have been sorted out to do things which don't even deal with encodings. Look, C has all of the functionality that we're talking about: wide characters, wide versions of string.h and ctype.h, and conversion between byte-streams and wide characters. But it did it without getting in the way of writing programs which don't care about encodings, without consigning everything which has gone before to the scrap heap, and without everyone having to wait a couple of decades to (reliably) do simple things like copying a file to a socket or enumerating a directory. 
-- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
as part of the encoding. E.g. for network protocols which use CRLF, it would be useful to be able to set CRLF as the EOL convention then use e.g. hPutStrLn to write lines. The same thoughts apply to filenames. Make them [Word8] and convert explicitly. Well, it's arguable that they should be [Word8] on Unix and String on Windows. I suppose that you could handle the Windows case by automatically converting to/from UTF-8. By the way, I think a path should be a list of names (that is of type [[Word8]]) and the library would be concerned with putting in the right path separator. Add functions to read and show pathnames in the local conventions and we'll never need to worry about path separators again. There would certainly be some advantages to making FilePath an abstract type, but there are quite a few corner cases to deal with. There are limits to the extent to which this can be achieved. E.g. what happens if you set the encoding to UTF-8, then call getDirectoryContents for a directory which contains filenames which aren't valid UTF-8 strings? Well, then you did something stupid, didn't you? If you don't know the encoding you shouldn't decode anything. That's a strong point against any implicit decoding, I think. Yes. However, I suspect that we will have to live with some of the mistakes of the past, i.e. using String in the I/O functions. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
portable. It's easier to emulate the traditional C paradigm in the Unicode paradigm than vice versa, I'm not entirely sure what you mean by that, but I think that I disagree. The C/Unix approach is more general; it isn't tied to any specific encoding. If filenames were expressed as bytes in the Haskell program, how would you map them to WinAPI? If you use the current Windows code page, the set of valid characters is limited without a good reason. Windows filenames are arguably characters rather than bytes. However, if you want to present a common API, you can just use a fixed encoding on Windows (either UTF-8 or UTF-16). If they tried a decade hence, it would still be too early. The single-byte encodings (ISO-8859-*, koi-8, win-12xx) aren't likely to be disappearing any time soon, nor is ISO-2022 (UTF-8 has quite spectacularly failed to make inroads in CJK-land; there are probably more UTF-8 users in the US than there). Which is a pity. ISO-2022 is brain-damaged because of enormous complexity, Or, depending upon one's perspective, Unicode is brain-damaged because, for the sake of simplicity, it over-simplifies the situation. The over-simplification is one reason for its lack of adoption in the CJK world. Multi-lingual text consists of distinct sections written in distinct languages with distinct alphabets. It isn't actually one big chunk in a single global language with a single massive alphabet. and ISO-8859-x have small repertoires. Which is one of the reasons why they are likely to persist for longer than UTF-8 true believers might like. E.g. languages which don't primarily use the Roman alphabet (Greek, Russian) can still be represented as one byte per character. And it's feasible to have tables which are indexed by codepoint; as a counter-example, calling XQueryFont for a Unicode font *really* sucks if either the server doesn't have the BigFont extension or, worse still, it can't use it because the client is remote. 
I would not *force* UTF-8, but it should work for those who voluntarily choose to use it as their locale encoding. Including filenames. Not forcibly decoding filenames isn't the same thing as preventing them from being decoded. Look, C has all of the functionality that we're talking about: wide characters, wide versions of string.h and ctype.h, and conversion between byte-streams and wide characters. ctype.h is useless for UTF-8. Hello? Let's try that again, with emphasis: C has ... WIDE VERSIONS OF string.h and ctype.h They're called wchar.h and wctype.h. There is no capability of attaching automatic recoders of explicitly chosen encodings to file handles. At this point you're starting to engage in diversionary tactics. Again. No, the C language doesn't make these issues easy and has lots of historic baggage. The issues aren't easy, and have lots of historic baggage. That's reality. Fortunately, C has a history of being geared to reality, rather than the comfortable fantasy where the issues don't exist. Which is why everyone uses it. But it did it without getting in the way of writing programs which don't care about encodings, It does get in the way of writing programs which do care, because they must do whole recoding themselves and remember which API has which character set limitations. No. Not doing something for you isn't the same thing as getting in the way. Getting in the way is doing for you something which you didn't want done in the first place. Getting in the way is not letting you do something yourself. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
Graham Klyne wrote: In particular, the idea of narrowing the Char type really seems like a bad idea to me (if I understand the intent correctly). Not so long ago, I did a whole load of work on the HaXml parser so that, among other things, it would support UTF-8 and UTF-16 Unicode (as required by the XML spec). To do this depends upon having a Char type that can represent the full repertoire of Unicode characters. Note: I wasn't proposing doing away with wide character support altogether. Essentially, I was suggesting making Char a byte and having e.g. WideChar for wide characters. The reason being that the existing Haskell98 API uses Char for functions which are actually dealing with bytes. In an ideal world, the IO, System and Directory modules (and the Prelude I/O functions) would have used Byte, leaving Char to represent a (wide) character. However, that isn't the hand we've been dealt. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] FilePath handling [Was: Writing binary files?]
Henning Thielemann wrote: Udo Stenzel wrote: The same thoughts apply to filenames. Make them [Word8] and convert explicitly. By the way, I think a path should be a list of names (that is of type [[Word8]]) and the library would be concerned with putting in the right path separator. Add functions to read and show pathnames in the local conventions and we'll never need to worry about path separators again. I even plead for an abstract data type FilePath which supports operations like 'enter a directory', 'go one level higher' and so on. Are you referring to pure operations on the FilePath, e.g. appending and removing entries? That's reasonable enough. But it needs to be borne in mind that there's a difference between:

    setCurrentDirectory ".."

and:

    dir <- getCurrentDirectory
    setCurrentDirectory $ parentDirectory dir

[where parentDirectory is a pure FilePath -> FilePath function.] if the last component in the path is a symlink. If you want to make FilePath an instance of Eq, the situation gets much more complicated. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
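A rough sketch of the pure operations being discussed, assuming a path is an absolute/relative flag plus a list of component names. The type `Path` and the functions `enter`, `parentDirectory` and `showUnix` are hypothetical, and components are Strings here only for readability; the proposal in the thread would use [Word8].

```haskell
import Data.List (intercalate)

-- Hypothetical abstract path. The derived Eq is purely structural: it
-- cannot tell that two different paths name the same file via a symlink.
data Path = Path
  { isAbsolute :: Bool
  , components :: [String]
  } deriving (Eq, Show)

-- 'enter a directory': append one component.
enter :: String -> Path -> Path
enter name (Path a cs) = Path a (cs ++ [name])

-- 'go one level higher', lexically: drop the last component. This never
-- touches the filesystem, so it does not resolve symlinks.
parentDirectory :: Path -> Path
parentDirectory (Path a cs)
  | null cs   = Path a cs
  | otherwise = Path a (init cs)

-- Render using Unix conventions; the separator lives only here.
showUnix :: Path -> String
showUnix (Path a cs)
  | null cs && not a = "."
  | otherwise        = (if a then "/" else "") ++ intercalate "/" cs
```

If /home/user/link is a symlink to /data/projects, then `parentDirectory` applied to the first path gives /home/user, whereas changing into the link and then into ".." asks the kernel for the parent of the target and lands in /data. That is the difference described in the message above.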
Re: [Haskell-cafe] Unicoded filenames
Marcin 'Qrczak' Kowalczyk wrote: Here is what happens when a language provides only narrow-char API for filenames: I have a filename as an UTF-8 encoded string. I need to be able to handle strange chars like accents, Asian chars etc. Is there any way to create a file with that name? I only need it on Win32. Windows uses UTF-16 for filenames, but provides a non-Unicode interface for legacy applications; the standard open() function that OCaml's open_out wraps appears to use the legacy interface. The precise codepage this uses is system-dependent, and AFAIK there's no way for a program to determine what it is without calling out to the Win32 API, but you can be pretty sure it won't be UTF-8. In other words, there is no reliable way to use a filename containing non-ASCII characters with OCaml's standard library. No, this is what happens when an API imposes restrictions upon the filenames which it can handle. Essentially, it's due to two (or possibly three) factors: 1. The fact that Windows uses wide strings, rather than multi-byte strings, for filenames. 2. The fact that Windows' compatibility interface is broken, i.e. it only lets you access filenames which can be represented in the current codepage (which, to me, is highly analogous to only supporting filenames which are valid in the current locale). 3. Possibly that OCaml insists upon using UTF-8. [I don't know that this is the case, but the fact that they specifically mention UTF-8 suggests that it might be.] IOW, this incident seems to oppose, rather than support, the filenames-as-characters viewpoint. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
David Menendez wrote: I'd like to see the following: - Duplicate the IO library. The duplicate should work with [Byte] everywhere where the old library uses String. Byte is some suitable unsigned integer, on most (all?) platforms this will be Word8 - Provide an explicit conversion between encodings. A simple conversion of type [Word8] -> String would suit me, iconv would provide all that is needed. I like this idea, but I say there should be a bit-oriented layer beneath everything. The byte stream is inherent, as that's (usually) what the OS gives you. Everything else is synthesised. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
in areas which work with arbitrary binary data (mostly file contents). And the ability to actually use any encoding except ISO-8859-1 in any meaningful way. I.e. encoders/decoders for other encodings, along with the means to specify which encoding to use for functions which need to perform encoding or decoding. My main concern is that someone will get sick of waiting and make the wrong fix, i.e. keep the existing API but default to the locale's encoding, so that every simple program then has to explicitly set it back to ISO-8859-1 to get reasonable worst-case behaviour. Supporting byte I/O and supporting character recoding needs to be done before this. My view is that, right now, we have the worst of both worlds, and taking a short step backwards (i.e. narrow the Char type and leave the rest alone) is a lot simpler (and more feasible) than the long journey towards real I18N. More generally, this is the most intrusive example of a common problem with too many Haskell libraries, i.e. exporting an interface which is too high-level and glosses over too many details. But this isn't some obscure third-party library. This is the Haskell98 standard library; some of it's in the Prelude. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Writing binary files?
Abraham Egnor wrote: Passing a Ptr isn't that onerous; it's easy enough to make functions that have the signature you'd like:

    import System.IO
    import Data.Word (Word8)
    import Foreign.Marshal.Array

    hPutBytes :: Handle -> [Word8] -> IO ()
    hPutBytes h ws = withArray ws $ \p -> hPutBuf h p $ length ws

    hGetBytes :: Handle -> Int -> IO [Word8]
    hGetBytes h c = allocaArray c $ \p -> do
        c' <- hGetBuf h p c
        peekArray c' p

The problem with this approach is that the entire array has to be held in memory, which could be an issue if the amount of data involved is large. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
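For the memory concern raised above, one possible workaround is to move the data through a fixed-size buffer instead of a single array. The sketch below is an illustration only: `copyBytes` and the 4096-byte buffer size are arbitrary choices made for this example, and it relies on `hGetBuf` returning 0 at end of file.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO

-- Copy everything from src to dst using one reusable buffer, so memory
-- use stays constant however much data passes through.
copyBytes :: Handle -> Handle -> IO ()
copyBytes src dst = allocaBytes bufSize loop
  where
    bufSize = 4096 :: Int

    loop :: Ptr Word8 -> IO ()
    loop buf = do
      n <- hGetBuf src buf bufSize      -- 0 means end of file
      if n > 0
        then hPutBuf dst buf n >> loop buf
        else return ()
```

Typical use would open both handles with `openBinaryFile` (or call `hSetBinaryMode`) and then run `copyBytes src dst`; no bytes are ever interpreted as characters along the way.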
Re: [Haskell-cafe] Writing binary files?
Sven Panne wrote: Also, changing the existing functions to deal with encodings is likely to break a lot of things (i.e. anything which reads or writes data which is in neither UTF-8 nor the locale-specified encoding). Hmmm, the Unicode tables start with ISO-Latin-1, so what would exactly break when we stipulate that the standard encoding for string I/O in Haskell is ISO-Latin-1? That would essentially be formally specifying the existing behaviour, which wouldn't break anything, including the mechanism for reading/writing binary data which I suggested (and which is the only choice if your Haskell implementation doesn't have h{Get,Put}Buf). The problems would come if it was decided to change the existing behaviour, i.e. use something other than Latin1. -- Glynn Clements [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe