from:"Glynn Clements"

Re: [Haskell] Re: haskell.org Public Domain

2006-01-11 Thread Glynn Clements


Ashley Yakeley wrote:

 I think we're going for public domain, assuming we can also add text to 
 satisfy German law, etc.

AIUI, the main problem with the notion of public domain under
typical European copyright law is that authors have moral rights (e.g. 
the right of attribution and to prohibit defacement) which are
inalienable, i.e. any statement waiving or rescinding such rights is
void and unenforceable. IOW, no matter what language the licence uses,
the author retains the right to sue for violations of their moral
rights.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell

Re: [Haskell] haskell.org Public Domain

2006-01-09 Thread Glynn Clements


Ashley Yakeley wrote:

  Sounds like a stupid idea?  Thought so.  A wiki should be public domain,
  plain and simple.  (Put contributions with a different license somewhere
  else and link to them.  No big deal.)
 
 There seems to be a consensus for public domain both here and on the 
 wiki page.
 http://haskell.org/haskellwiki/HaskellWiki:Community_Portal
 
 Does anyone have any objections to putting everything in the public 
 domain?

Insisting that everything is in the public domain prevents the
inclusion of third-party content, unless either:

a) that content is also in the public domain (which is unusual; even
content which is freely redistributable usually has some kind of
restriction, even if it's only an acknowledgement requirement), or

b) you can obtain a specific exemption from its author (assuming that
you can actually identify and locate the author, which isn't always
easy for projects with a long history and many contributors).

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] GHC and GLUT

2006-01-06 Thread Glynn Clements


Wolfgang Jeltsch wrote:

  Does installing the libsm-dev package help?
 
  -- Mark
 
 Thank you, Mark and Jared.
 
 Installing the libsm-dev package resulted in ld outputting a similar error 
 message for -lXmu.  After also installing libxmu-dev, linking was possible.
 
 Are libsm-dev and/or libxmu-dev needed for every Haskell GLUT application or 
 just for certain examples?  If they are always needed, some package has not 
 declared all its dependencies. In this case, which package is the one with
 the dependency declaration bug?

GLUT uses XmuLookupStandardColormap from libXmu. libXmu requires
libXt, which in turn requires libSM.

As that's the only Xmu function which GLUT uses, I would have thought
that it would be worth making the effort to remove that particular
dependency from GLUT.

Insofar as it's a bug, it's in the dependency list for GLUT. 
Ultimately, the correct dependency list for the Haskell GLUT package
is GLUT, plus whatever GLUT happens to require on your particular
system. But I don't know whether the dependency list can be generated
dynamically.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] How to print a string (lazily)

2006-01-03 Thread Glynn Clements


Donn Cave wrote:

  I sometimes call a function with side-effects in IO a command.  But
  the terms are fungible.  But calling putStr a function is correct.  It
  is not a pure function however.
 
 Is that the standard party line?  I mean, we all know its type and
 semantics, whatever you want to call them, but if we want to put names
 to things, I had the impression that the IO monad is designed to work
 in a pure functional language - so that the functions are indeed actually
 pure, including putStr.

putStr is a pure function, but it isn't a pure function ;)

OTOH, getLine isn't even a function, just a value.
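One way to see the distinction, as a minimal sketch: applying putStr to a string only builds an IO value, and nothing is printed until the runtime executes that value. The name `actions` is made up for illustration.

```haskell
-- Applying putStr yields IO values; building the list prints nothing.
actions :: [IO ()]
actions = map putStr ["a", "b", "c\n"]

main :: IO ()
main = sequence_ (reverse actions)  -- the actions run here, in reversed order
```

Because the actions are ordinary values, they can be stored, reordered or discarded like any other data before being run.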

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Bug in Haskell for C programmers tutorial?

2006-01-02 Thread Glynn Clements


Cale Gibbard wrote:

   You shouldn't have to flush output manually. Which implementation are
   you using? Try importing System.IO and doing:
   hGetBuffering stdout >>= print
   and see what gets printed. It should be NoBuffering.
 
  The buffering for stdout should be LineBuffering if stdout is a
  terminal and BlockBuffering otherwise. The buffering for stderr should
  always be NoBuffering.
 
 It's actually not, if you're starting your program from ghci, which is
 what confused me. From GHCi, you get NoBuffering on stdout, and
 LineBuffering on stdin, which is sane for interactive programs. Why
 anyone would want LineBuffering as default on stdout is somewhat
 mysterious to me.

Because most programs output entire lines, and the implementation of
Haskell's putStr etc sucks when using NoBuffering (one write() per
character).

C follows the same rules (stdin/stdout use line buffering for
terminals, block buffering otherwise, stderr is always unbuffered),
even though C's puts(), printf() etc behave a lot better with
unbuffered streams (they pass either whole strings or large chunks to
write() rather than individual characters).

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Bug in Haskell for C programmers tutorial?

2006-01-01 Thread Glynn Clements


Cale Gibbard wrote:

 You shouldn't have to flush output manually. Which implementation are
 you using? Try importing System.IO and doing:
 hGetBuffering stdout >>= print
 and see what gets printed. It should be NoBuffering.

The buffering for stdout should be LineBuffering if stdout is a
terminal and BlockBuffering otherwise. The buffering for stderr should
always be NoBuffering.

 If for whatever
 reason it's not, you can set it to that at the start of your programs
 with hSetBuffering stdout NoBuffering

I would suggest using an explicit hFlush after each putStr rather than
disabling buffering altogether, as disabling buffering will result in
putStr etc calling write() once per character, which is very
inefficient.
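A minimal sketch of that approach, assuming a simple prompt-and-read program (the prompt text is invented for the example):

```haskell
import System.IO (hFlush, stdout)

main :: IO ()
main = do
    putStr "Enter your name: "
    hFlush stdout              -- push the partial line out now
    name <- getLine
    putStrLn ("Hello, " ++ name)
```

The default buffering stays in place, so whole lines are still written in one go; only the prompt needs the explicit flush.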

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Haskell vs OCaml

2005-12-25 Thread Glynn Clements


Branimir Maksimovic wrote:

 Could you give an example of a loop you find awkward in Haskell?
 
 Well I want simple loop for(int i =0;i<10;++i)doSomething(i);

mapM_ doSomething [0..9]
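As a complete program, assuming doSomething is some IO action taking an Int (the body here is just a placeholder):

```haskell
-- Placeholder body; any function of type Int -> IO () would do.
doSomething :: Int -> IO ()
doSomething i = putStrLn ("iteration " ++ show i)

main :: IO ()
main = mapM_ doSomething [0 .. 9]
```

Control.Monad also provides forM_, which is mapM_ with the arguments flipped and reads more like the C loop when the body is written inline.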

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Functions with side-effects?

2005-12-23 Thread Glynn Clements


Daniel Carrera wrote:

 I'm a Haskell newbie and I don't really understand how Haskell deals 
 with functions that really must have side-effects. Like a rand() 
 function or getLine().

Those aren't functions.

A function is a single-valued relation, i.e. a (possibly infinite) set
of ordered pairs <x,y> such that the set doesn't contain two pairs
<a,b> and <c,d> where a == c and b =/= d. IOW, a static mapping from
argument to result.

Haskell uses the term "function" to mean a function in the strict
mathematical sense, and not (like most other languages) to mean a
procedure which returns a value as well as reading and writing some
implicit state.

 I know this has something to do with monads, but I don't really 
 understand monads yet. Is there someone who might explain this in newbie 
 terms? I don't need to understand the whole thing, I don't need a rand() 
 function right this minute. I just want to understand how Haskell 
 separates purely functional code from non-functional code (I understand 
 that a rand() function is inevitably not functional code, right?)

All Haskell code is functional (discounting certain low-level details
such as unsafePerformIO).

Side effects are implemented by making the prior state an argument and
the new state a component of the result, i.e. a C procedure of type:

res_t foo(arg_t);

becomes a Haskell function with type:

ArgType -> State -> (State, ResType)

To simplify coding (particularly, making sure that you use the correct
iteration of the state at any given point), all of this is usually
wrapped up in an instance of the Monad class. But there isn't anything
special about Monad instances. The class itself and many of its
instances are written in standard Haskell.

To provide a concrete example, here's a monadic random number
generator:

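import Control.Monad (ap, liftM)

type Seed = Int

-- A computation which takes the current seed and returns the new seed
-- together with a result.
data Rand a = R { app :: Seed -> (Seed, a) }

myRand :: Rand Int
myRand = R $ \seed -> let
        result = (seed' `div` 65536) `mod` 32768
        seed' = seed * 1103515245 + 12345
    in (seed', result)

-- GHC 7.10 and later require Functor and Applicative instances for
-- every Monad; these are the usual boilerplate definitions.
instance Functor Rand where
    fmap = liftM

instance Applicative Rand where
    pure x = R $ \seed -> (seed, x)
    (<*>) = ap

-- return defaults to pure.
instance Monad Rand where
    f >>= g = R $ \seed -> let (seed', x) = app f seed
                           in app (g x) seed'

runR :: Seed -> Rand a -> a
runR seed f = snd $ app f seed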

Example usage:

randomPair :: Rand (Int, Int)
randomPair =
    myRand >>= \x ->
    myRand >>= \y ->
    return (x, y)

or, using do notation (which is simply syntactic sugar):

randomPair :: Rand (Int, Int)
randomPair = do
    x <- myRand
    y <- myRand
    return (x, y)

main = print $ runR 99 randomPair

The main difference between the built-in IO monad and the Rand monad
above is that where the Rand monad has a Seed for its state, the IO
monad has the (conceptual) World type.

As the World type has to represent the entire observable state of the
universe, you can't actually obtain instances of it within a Haskell
program, and thus there is no equivalent to runR.

Instead, you provide an IO instance (main) to the runtime, which
(conceptually) applies it to the World value representing the state of
the universe at program start, and updates the universe to match the
World value returned from main at program end.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Re: syscall, sigpause and EINTR on Mac OSX

2005-12-11 Thread Glynn Clements


Joel Reymont wrote:

  This should be enough reason to scan  for keyboard events instead.
  There is no guarantee that SIGINT would be sent only by keyboard.
 
 import System.Posix.Signals
 
 main =
   do installHandler sigINT Ignore Nothing
      x <- getChar
      if x == '\ETX'
         then do print "Gotcha!"
         else do print "Try again!"
                 main
 
 This does not work for ^C. Can it actually be done? Of course I can  
 just read q but that would be too simple :-).

You have to put the terminal into raw mode to be able to read any
character which is normally processed by the TTY driver, e.g.:

import System.Posix.IO (stdInput)
import System.Posix.Terminal

atts <- getTerminalAttributes stdInput
let atts' = withoutMode atts ProcessInput
setTerminalAttributes stdInput atts' Immediately

This disables all input processing, e.g. line-editing and CR-LF
translation (i.e. pressing the Enter/Return key will result in CR, not
LF).

Remember to set it back before exiting.
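One way to make sure the settings are restored, sketched with bracket. The helper name withRawInput is invented here. It also clears KeyboardInterrupts, on the assumption that in the unix package ProcessInput corresponds to canonical mode (ICANON) and KeyboardInterrupts to signal generation (ISIG); check that against the System.Posix.Terminal documentation for your version before relying on it.

```haskell
import Control.Exception (bracket)
import System.IO (BufferMode (NoBuffering), hSetBuffering, stdin)
import System.Posix.IO (stdInput)
import System.Posix.Terminal

-- Hypothetical helper: run an action with line editing and keyboard
-- signals turned off, restoring the saved attributes afterwards,
-- even if the action throws.
withRawInput :: IO a -> IO a
withRawInput action =
    bracket (getTerminalAttributes stdInput)
            (\saved -> setTerminalAttributes stdInput saved Immediately)
            (\saved -> do
                let raw = foldl withoutMode saved
                              [ProcessInput, KeyboardInterrupts]
                setTerminalAttributes stdInput raw Immediately
                action)

main :: IO ()
main = do
    hSetBuffering stdin NoBuffering
    c <- withRawInput getChar
    print c   -- with the assumptions above, ^C should show up as '\ETX'
```

This needs stdin to be a terminal; getTerminalAttributes will fail if input is redirected from a file or pipe.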

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Records (was Re: [Haskell] Improvements to GHC)

2005-11-18 Thread Glynn Clements


Sebastian Sylvan wrote:

   How about (¤)? It looks like a ring to me, I'm not sure where that's
   located on a EN keyboard, but it's not terribly inconvenient on my SE
   keyboard. f ¤ g looks better than f . g for function composition, if
   you ask me.
  
  That symbol actually does look better, but isn't on any English
  keyboards to the best of my knowledge. I can get it in my setup with
  compose-key o x, but not many people have a compose key assigned.
  Also, this may just be a bug, but currently, ghc gives a lexical error
  if I try to use that symbol anywhere, probably just since it's not an
  ASCII character.
 
 Hmm. On my keyboard it's Shift+4. Strange that it's not available on
 other keyboards. As far as I know that symbol means nothing
 particularly swedish. In fact, I have no idea what it means at all
 =)

It's a generic currency symbol (the X11 keysym is XK_currency). It
doesn't exist on a UK keyboard (where Shift-4 is the dollar sign).

In any case, using non-ASCII characters gives rise to encoding issues
(e.g. you have to be able to edit UTF-8 files).

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Converting [Word8] to String

2005-10-04 Thread Glynn Clements


Tomasz Zielonka wrote:

  How do I convert a list of bytes to a string?
 
 I assume you don't care about Unicode:

That should have said "I assume that the data is encoded using
ISO-8859-1 (or a subset thereof, e.g. US-ASCII)".

 map (Char.chr . fromIntegral)
 
 or
 
 map (toEnum . fromEnum)

For anything else, you will either have to write a decoder (or use
someone else's; several exist for UTF-8), or interface to iconv()
using the FFI.
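For the ISO-8859-1 case, a self-contained sketch (the function name latin1ToString is made up for the example). It works because the first 256 Unicode code points coincide with ISO-8859-1, so each byte maps directly to the Char with the same number.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- Decode bytes as ISO-8859-1: byte n becomes code point n.
latin1ToString :: [Word8] -> String
latin1ToString = map (chr . fromIntegral)

main :: IO ()
main = putStrLn (latin1ToString [72, 101, 108, 108, 111])  -- prints "Hello"
```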

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] newbe question

2005-09-27 Thread Glynn Clements


[EMAIL PROTECTED] wrote:

  obviously, Hugs thinks that =- is a special operator.  In Haskell you have 
  the 
  ability to define your own operators, so it would be possible to define an 
  operator =-.  I would suggest that you always put spaces around the = in 
  declarations.
  
  Best wishes,
  Wolfgang
 
 
 Hello,
 thank you for fast reply. 
 Ok, but what is the semantic of '=-' ? If it's an operator, it should
 have some impact (right term?).

It isn't defined in the prelude or any of the standard libraries.

The point is that the Haskell tokeniser treats any consecutive
sequence of the symbols !#$%&*+./<=>?@\^|-~ as a single operator
token. This occurs regardless of whether a definition exists for the
operator.

More generally, the tokenising phase is unaffected by whether or not
an operator, constructor, identifier etc is defined. A specific
sequence of characters will always produce the same sequence of tokens
regardless of what definitions exist.
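A small sketch of the consequence. The operator definition is hypothetical, written only to show that =- is a legal operator name; nothing like it exists in the standard libraries.

```haskell
-- Hypothetical operator, defined only for illustration.
(=-) :: Int -> Int -> Int
a =- b = a - b

-- With a space, "=" and "-" are two separate tokens,
-- so this is an ordinary definition of x as minus one.
x :: Int
x = -1

-- Written as "x=-1", the same text is tokenised as x, =-, 1,
-- which is not the definition x = -1.

main :: IO ()
main = print (x, 5 =- 3)
```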

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp

2005-09-21 Thread Glynn Clements


David F. Place wrote:

 I don't deny that all of the things you mentioned are wonderful  
 indeed.  I just wonder if they really could only be done in lisp or  
 even most conveniently.

Obviously, if you can do it in Lisp, you can do it in any
Turing-complete language; in the worst case, you just write a Lisp
interpreter.

As for convenience: syntax matters. The equivalence of code and data
in Lisp lets you write your own syntactic sugar. You're still bound by
the lexical (token-level) grammar, although reader macros mean that
isn't much of a restriction.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp

2005-09-20 Thread Glynn Clements


John Meacham wrote:

In Haskell, code is data too because code in the sense of
imperative actions is described by IO values.  You cannot analyse
them.
  
   And thus they are not data.
  
  Huh? I'd say they are not /concrete/ data, but (abstract) data they 
  surely are(?)
 
 and you are certainly free to turn them into concrete data by creating
 your own data type which you then can inspect and modify and then
 interpret.

IOW, you are free to write a Lisp interpreter in Haskell. But it's a
lot easier to do it in Lisp.

That, in a nutshell, is Lisp's key strength. It uses the same
structure for code as for data, which makes it very easy to add new
language features.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp

2005-09-20 Thread Glynn Clements


David F. Place wrote:

  That, in a nutshell, is Lisp's key strength. It uses the same
  structure for code as for data, which makes it very easy to add new
  language features.
 
 
 I assume that you refer to `eval' and the fact it operates on conses  
 and symbols.  Beyond the extremely contrived example of a  
 metacircular interpreter, what are some examples of the benefits of  
 this feature of lisp?   What are some examples of language features  
 that are easy to add?

Well, to state the obvious, being able to extend or replace the
language's syntax and semantics. In particular, being able to do so
locally.

Probably the most useful consequence is the ability to create new
control constructs without being constrained by the existing syntax
and semantics (and without having to write your own monadic versions
of existing functions).

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Trapped by the Monads

2005-09-20 Thread Glynn Clements


Mark Carter wrote:

  Could you briefly elaborate on what you mean by hybrid variables?
 
 According to Google, hybrid in genetics means The offspring of 
 genetically dissimilar parents or stock, especially the offspring 
 produced by breeding plants or animals of different varieties, species, 
 or races. It's kind of like that - but for variables.
 
 The typical example in C is:
  mem = malloc(1024)
 Malloc returns 0 to indicate that memory cannot be allocated, or a 
 memory address if it can. The variable mem is a so-called hybrid 
 variable; it crunches together 2 different concepts: a boolean value 
 (could I allocate memory?) and an address value (what is the address 
 where I can find my allocated memory).

Well in that case, Maybe provides the perfect example of how to
implement hybrid variables correctly.

The types Ptr a and Maybe (Ptr a) are distinct. If you try to pass
the latter to a function which expects the former, you'll get a
compile-time error. You first have to extract the underlying value,
which means that you need to match against (Just x). If the wrapped
value is Nothing, you'll get an exception. Furthermore, if you forget
to handle the Nothing case, you'll get a compile-time warning.
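To make that concrete, here is a rough sketch. The helper names fromNullable and describe are invented for the example; only Ptr and nullPtr come from the standard Foreign.Ptr module.

```haskell
import Foreign.Ptr (Ptr, nullPtr)

-- Hypothetical helper: turn a possibly-null pointer into a Maybe,
-- so that the "did it work?" question is part of the type.
fromNullable :: Ptr a -> Maybe (Ptr a)
fromNullable p
    | p == nullPtr = Nothing
    | otherwise    = Just p

-- Both cases must be handled before the address can be used.
describe :: Maybe (Ptr a) -> String
describe Nothing  = "allocation failed"
describe (Just p) = "allocated at " ++ show p

main :: IO ()
main = putStrLn (describe (fromNullable (nullPtr :: Ptr Int)))
```

A function which expects a plain Ptr a cannot be handed the result of fromNullable directly; the caller has to go through a match like the one in describe first.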

In C, there's no way to distinguish (using the type system) between a
possibly-null pointer and a non-null pointer. Using a pair of a
boolean and a pointer is the wrong approach because the pointer is
meaningless if the boolean is false, but the type system won't prevent
you from using the value of the pointer in that case.

A more general example is structures where certain fields are only
valid in certain circumstances (e.g. depending upon the type field). 
Haskell-style sum types (of which Maybe is an example) are a much
better solution, as the fields only exist when they are
meaningful.
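For instance, with a made-up Shape type standing in for a C struct that has a type tag plus a union of fields:

```haskell
-- Hypothetical example type: each constructor carries only the
-- fields which are meaningful for that kind of shape.
data Shape
    = Circle Double          -- radius
    | Rect Double Double     -- width, height

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

main :: IO ()
main = mapM_ (print . area) [Circle 1.0, Rect 2.0 3.0]
```

There is no way to ask a Circle for its width; the field simply isn't there to be misused.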

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp

2005-09-16 Thread Glynn Clements


Wolfgang Jeltsch wrote:

Bearing this in mind, and hoping you can see where I'm coming from, I
think my question is: shouldn't you guys be using Lisp?
  
   Lisp is impure, weakly typed and has way too many parentheses.  Why would
   we use lisp? It seems to be lacking almost all the advantages of Haskell,
   and have an ugly, inflexible syntax to boot.
 
  The ability to dynamically generate, manipulate and analyse code in a
  structured manner provides a flexibility which is unmatched by any
  other language I know of.
 
  A good example is Emacs; lisp is entirely the right language for that,
  IMHO.
 
 Could you explain this a bit more, please?  To the moment, I cannot imagine 
 cases where you need LISP's way of code analysis and manipulation because 
 Haskell's capabilities are not sufficient.
 
 In Haskell, code is data too because code in the sense of imperative actions 
 is described by IO values.  You cannot analyse them.

And thus they are not data.

 But you can use your do 
 expressions etc. to construct action descriptions with a more general type 
 like MonadIO m => m a.  Then you can instantiate m with a monad whose values
 store part of the action's structure so that this information can be used 
 later.  Or you use a monad which doesn't keep structural information to use 
 it for later processing but which does the processing upon construction.

Yeah, but this is heading in the direction of Greenspun's Tenth Rule
of Programming:

Any sufficiently complicated C or Fortran program contains an
ad hoc informally-specified bug-ridden slow implementation of
half of Common Lisp.

You could easily end up doing the same thing in Haskell.

The main thing about Lisp is that it tends to make it fairly easy to
write something close to the ideal language for the task in hand
without starting from the ground floor. You get stuck with Lisp's
token syntax, and the semantics of its core primitives, but you can
replace anything else.

Every other language (including Haskell) tends to have the problem
that eventually you will encounter a situation where the language's
own worldview gets in the way.

Or, to put it another way: if Haskell is so flexible, why do we need
Template Haskell? I can't imagine a Template Lisp; it would just be
Lisp.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] Re: [Haskell-cafe] Haskell versus Lisp

2005-09-16 Thread Glynn Clements


Tomasz Zielonka wrote:

  Every other language (including Haskell) tends to have the problem
  that eventually you will encounter a situation where the language's
  own worldview gets in the way.
 
 Are you sure that lisp's worldview never gets in the way?

I wouldn't say never. But its main advantage is that it doesn't
really have much of a worldview.

Its primary composite data type is the heterogeneous linked list,
which is isomorphic to both binary trees and n-ary trees. This
provides a reasonable fit for most common data structures, and also
for most languages (anything defined by a recursive grammar can be
represented as a parse tree).

The complete absence of keywords is another useful feature (I've lost
track of the various C packages which had to have identifiers named
"class" renamed to allow for C++). Even "quote" is just another
symbol.

Ultimately, all languages are limited by their choice of primitives.

  Or, to put it another way: if Haskell is so flexible, why do we need
  Template Haskell?
 
 It's nice to have Template Haskell, but saying that we need it is a bit
 of an overstatement. In the GHC Survey 2005 only 9% of people said it's
 essential. Well, OK, I was one of them, but I think you know what I
 mean.
 
  I can't imagine a Template Lisp; it would just be Lisp.
 
 The power of lisp macros is often overrated. I remember a long
 discussion crossposted on comp.lang.lisp an comp.lang.functional. The
 lisp advocates gave examples for how macros allow to do things
 supposedly unavailable in other languages. Surprisingly, most of these
 things were equally easy to do with higher-order functions and closures
 in Haskell.
 
 I am sure that lisp gurus can achieve great things with macros, but I'm
 not sure they are the best tool for software engineering problems.
 I think they can make the code more difficult to understand, make the
 semantics less uniform (despite the uniform syntax), and can become an
 abused ugly hack.

Well, this is heading towards the inevitable paradox (Gödel's theorem,
halting problem, etc). If you allow the programmer to escape from your
chosen semantic model, you no longer have the luxury of being able to
assume those semantics.

Ultimately, the issue isn't whether shooting oneself in the foot is a
good idea, it's whether you leave it up to the language or to the
programmer to prevent it. Both have their pros and cons.

In that regard, Lisp and Haskell are almost opposite extremes, with
more conventional languages in between. Haskell's safety and
consistency can get in the way, while Lisp's freedom can be quite
unsafe and inconsistent.

 Don't get me wrong - I still think that lisp is one of the best
 programming languages around and from time to time I am trying to learn
 a bit of it.  One of the things that puts me off is the attitude of its
 community - it seems to be very close minded.

Hmm. That depends upon which faction of the community you're dealing
with. If you get into discussions about the merits of Lisp on public
fora, you'll likely be dealing with the evangelists. Language
evangelists are often closed-minded whatever the language.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Haskell versus Lisp

2005-09-16 Thread Glynn Clements


David Roundy wrote:

  Bearing this in mind, and hoping you can see where I'm coming from, I 
  think my question is: shouldn't you guys be using Lisp?
 
 Lisp is impure, weakly typed and has way too many parentheses.  Why would
 we use lisp? It seems to be lacking almost all the advantages of Haskell,
 and have an ugly, inflexible syntax to boot.

The ability to dynamically generate, manipulate and analyse code in a
structured manner provides a flexibility which is unmatched by any
other language I know of.

A good example is Emacs; lisp is entirely the right language for that,
IMHO.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] mailing list headaches

2005-09-08 Thread Glynn Clements


Frederik Eaton wrote:

 However, threading by References, which RFC 2822 says
 SHOULD be possible, and which works on my other folders, doesn't work
 well on Haskell mailing lists. Presumably the issue is that there are
 a large number of Windows users with strange mail clients which don't
 insert References headers.

It isn't so much that there are a large number of such users, but that
two of the core developers are among them (and are both employed by
Microsoft, so RFC-conformance probably isn't an option).

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Newbie question

2005-08-12 Thread Glynn Clements


André Vargas Abs da Cruz wrote:

 I think this is a totally newbie question as i am a complete novice 
 to Haskell. I am trying to write down a few programs using GHC in order 
 to get used with the language. I am having some problems with a piece of 
 code (that is supposed to return a list of lines from a text file) which 
 I transcribe below:
 
 module Test where
 
 import IO
 
 readDataFromFile filename = do
     bracket (openFile filename ReadMode) hClose
             (\h -> do contents <- hGetContents h
                       return (lines contents))
 
 The question is: if i try to run this, it returns me nothing (just 
 an empty list). Why does this happen ? When i remove the return and 
 put a print instead, it prints everything as i expected.

hGetContents reads the file lazily; it won't actually read anything
until you try to consume the result. However, by that point, you
will have called hClose.

In general, you shouldn't use hClose in conjunction with lazy I/O
(hGetContents etc) unless you are certain that the data will have been
read.

When you put the print in place of the return, you force the data to
be consumed immediately, so the issue doesn't arise.
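If you do want to keep the bracket/hClose structure, one option is to force the data before the handle is closed. This is only a sketch: forcing via length is one of several ways to do it, and the file name example.txt is invented for the demonstration.

```haskell
import Control.Exception (bracket, evaluate)
import System.IO

readDataFromFile :: FilePath -> IO [String]
readDataFromFile filename =
    bracket (openFile filename ReadMode) hClose $ \h -> do
        contents <- hGetContents h
        -- Computing the length walks the whole string, so the file
        -- has been read completely before hClose runs.
        _ <- evaluate (length contents)
        return (lines contents)

main :: IO ()
main = do
    writeFile "example.txt" "one\ntwo\n"
    readDataFromFile "example.txt" >>= print   -- prints ["one","two"]
```

The cost is that the whole file is held in memory at once, which is exactly what lazy I/O was trying to avoid.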

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] ST/STRef vs. IO/IORef

2005-08-03 Thread Glynn Clements


Srinivas Nedunuri wrote:

 Hello, I have some code that manipulates STRefs within the ST monad. All
 good and fine, until I come across some computation that uses lets say
 IO and everything skids to a halt. At this point I have 3 choices:
 
 1. Define a ST State Transformer monad and do all my previous ST
 computations in that
 2. convert all subsequent ST computations into IO computations using
 stToIO 
 3. stop using the ST monad and do everything in the IO monad
 
 I was wondering what advice folks had. In particular, what are the
 disadvantages to doing everything in the IO monad - ie why even bother
 with the ST monad?

The most obvious disadvantage is that the IO monad has no equivalent
of runST. Also, there is no ioToST (only unsafeIOToST), so if you use
the IO monad, the code can only be used within the IO monad. The IO
monad is like a trap; once you're inside, you can't get out.
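A minimal sketch of the difference, using an invented sumST function. Written in ST, the same code can be run purely with runST or lifted into IO with stToIO; written in IO with IORefs, only the second use would be available.

```haskell
import Control.Monad.ST (ST, runST, stToIO)
import Data.STRef (modifySTRef, newSTRef, readSTRef)

-- Hypothetical example: sum a list using a mutable reference.
sumST :: [Int] -> ST s Int
sumST xs = do
    ref <- newSTRef 0
    mapM_ (\x -> modifySTRef ref (+ x)) xs
    readSTRef ref

-- runST gives a pure function; there is no IO equivalent of this.
pureSum :: [Int] -> Int
pureSum xs = runST (sumST xs)

main :: IO ()
main = do
    print (pureSum [1 .. 10])       -- prints 55
    n <- stToIO (sumST [1 .. 10])   -- the same code, run inside IO
    print n                         -- prints 55
```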

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Interaction in Haskell

2005-07-11 Thread Glynn Clements


Dinh Tien Tuan Anh wrote:

  Yes, it is certainly not Hugs which prevents from realtime interaction but
  it is the terminal you are using. If the terminal lets you delete the
  characters on the current line it has to keep them until you complete it
  with ENTER.  Piping from and to other programs or files may not have this
  problem.
 
 You're right, i've been using shell in Emacs to run Hugs, but when back to 
 normal terminal, it works.
 
 Just for curiousity, why does it happen ?

Emacs' shell-mode doesn't send anything to the terminal driver until
you hit Return, at which point it sends the whole line.

Note that the terminal driver itself normally does line-buffering,
although this is disabled if you set the buffering for stdin to
NoBuffering.

You can't disable the line-buffering inherent in Emacs' shell-mode;
you would have to use terminal-emulator instead (M-x term instead of
M-x shell).

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] matrix computations based on the GSL

2005-07-09 Thread Glynn Clements


Keean Schupke wrote:

 So the linear operator is translation (ie: + v)... effectively 'plus'
 could be viewed as a function which takes a vector and returns a matrix
 (operator)
 
 (+) :: Vector -> Matrix
 
 
 
 Since a matrix _is_ not a linear map but only its representation, this
 would not make sense. As I said (v+) is not a linear map thus there is no
 matrix which represents it. A linear map f must fulfill
  f 0 == 0
 
 But since
  v+0 == v
   the function (v+) is only a linear map if 'v' is zero.
 
  I can't see how to fit in your vector extension by the 1-component.
 
   
 
 Eh?
 
 Translation is a linear operation no?

No. It's affine, but not linear. As Henning said, to be linear, it
must map zero to zero.

 Adding vectors translates the
 first by the second
 (or the second by the first - the two are isomorphic)... A translation
 can be represented
 by the matrix:
 
 1   0   0   0
 0   1   0   0
 0   0   1   0
 dx dy dz 1
 
 So the result of v+ is this matrix.

No. If the above matrix is M, then:

[x y z w].M = [x+w.dx y+w.dy z+w.dz w]

which isn't a translation.

In the specific case of homogeneous coordinates, where:

h  [x y z]   = [x y z 1]
h' [x y z w] = [x/w y/w z/w]

then \v -> h'(h(v).M) is a translation, but M isn't itself a
translation.
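The same argument written out in Haskell, as a sketch. Tuples stand in for vectors, and the names applyM and translate are invented; applyM d computes [x y z w].M for the matrix whose last row is [dx dy dz 1].

```haskell
type Vec3 = (Double, Double, Double)
type Vec4 = (Double, Double, Double, Double)

-- Row vector times M, where M is the identity with last row [dx dy dz 1].
applyM :: Vec3 -> Vec4 -> Vec4
applyM (dx, dy, dz) (x, y, z, w) = (x + w * dx, y + w * dy, z + w * dz, w)

-- Embed into homogeneous coordinates and project back.
h :: Vec3 -> Vec4
h (x, y, z) = (x, y, z, 1)

h' :: Vec4 -> Vec3
h' (x, y, z, w) = (x / w, y / w, z / w)

-- The composition is a translation on 3-vectors.
translate :: Vec3 -> Vec3 -> Vec3
translate d = h' . applyM d . h

main :: IO ()
main = do
    print (translate (1, 2, 3) (10, 20, 30))   -- (11.0,22.0,33.0)
    print (applyM (1, 2, 3) (0, 0, 0, 0))      -- (0.0,0.0,0.0,0.0)
```

The second line shows applyM mapping zero to zero, as a linear map must, whereas translate d maps the zero vector to d.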

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] New to Haskell, suggestions on code

2005-06-28 Thread Glynn Clements


Flavio Botelho wrote:

 At many places I have put a Char type instead of an abstract one because some
 functions were not working properly before and I wanted to be able to output
 things and so be able to see what was the problem.
 (Haskell doesn't seem to have a 'magic' function to output arbitrary structures? That
 would be quite helpful for debugging)

The show method in the Show class generates a string representation of
an instance. The print function can be used to print the string
representation of any instance of Show to stdout.

All standard types except for functions are instances of Show, and
Haskell can automatically derive Show instances for user defined
types, provided that all of the constituent types are instances of
Show.
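
As a small illustration (the Shape and Tree types are made up for this
example and are not from the original post), deriving Show is enough to
let print display values of a user-defined type:

```haskell
-- Two hypothetical user-defined types; "deriving Show" asks the
-- compiler to generate the show method for them.
data Shape = Circle Double | Rect Double Double
  deriving Show

data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving Show

main :: IO ()
main = do
  print (Circle 1.5)                     -- print = putStrLn . show
  print (Rect 2 3)
  putStrLn (show (Node Leaf 'x' Leaf))   -- works because Char is in Show
```

For debugging, this means an abstract element type can stay abstract as
long as it carries a Show constraint; it does not have to be replaced by
Char.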

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Specify array or list size?

2005-05-08 Thread Glynn Clements


Thomas Davie wrote:

  I'm not familiar with your C++ example (not being familiar with C++),
  but I think that it's a bit of a stretch of the imagination to say
  that C introduces a variable of type array of 50 ints, the fact
  that this is now an array of 50 integers is never checked at any
  point in the compilation or run, and I'm not sure it can be even if
  K&R had wanted to.
 
 
  The size is taken into account when such array type is an element of
  another array, and by sizeof.
 
  int (*p)[50]; /* p may legally point only to arrays of 50 ints each */
  ++p; /* p is assumed to point into an array, and is moved by one
  element, i.e. by 50 ints */
 I'm not sure what you're trying to prove by saying that... There is  
 still no type information that says that the contents of p are an  
 array of 50 elements...

Put it this way, then:

1   void foo(void)
2   {
3       int a[2][50];
4       int b[2][60];
5       int (*p)[50];
6
7       p = a;
8       p = b;
9   }

$ gcc -c -Wall foo.c
foo.c: In function `foo':
foo.c:8: warning: assignment from incompatible pointer type

In line 7, an expression of type int (*)[50] is assigned to a
variable of type int (*)[50], which is OK. In line 8, an expression
of type int (*)[60] is assigned to a variable of type int (*)[50],
and the compiler complains.

 I can still attempt to access element 51 and get a runtime memory
 error.

That's because C doesn't do bounds checking on array accesses. It has
nothing to do with types.

 The type of p is still int**,

No it isn't. int** and int (*)[50] are different types and have
different run-time behaviour.

 not pointer to array of 50 ints

Yes it is. The semantics of C pointer arithmetic mean that the size of
the target is an essential part of the pointer type.

In C, arrays and pointers are *not* the same thing. They are often
confused because C does several automatic conversions:

1. When used as a value in an expression (i.e. anywhere other than as
the operand of sizeof or unary &), an array is automatically converted
to a pointer to its first element. Arrays only ever occur as lvalues,
never as values.

2. In a declaration, the x[...] syntax indicates that x is an array,
but in an expression, x must be a pointer (which includes an array
which has been converted to a pointer due to rule 1 above).

3. When declaring function arguments, you can use T x[] or T x[N]
as an alternative syntax for T *x; x is still a pointer, regardless
of the syntax used.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Compiling with NHC98

2005-05-08 Thread Glynn Clements


Daniel Carrera wrote:

  I haven't used NHC so I can't guarantee this will work, but try doing
  something like this:
  
  $ nhc98 -c RC4.hs
  $ nhc98 -c prng.hs
  $ nhc98 RC4.o prng.o -o prng
 
 Yay! It does. And I just put it in a makefile:
 
 ---daniel's makefile
 COMPILER=nhc98
 
 RC4.o:
  $(COMPILER) -c RC4.hs
 
 prng.o:
  $(COMPILER) -c prng.hs
 
 prng: RC4.o  prng.o
  $(COMPILER) RC4.o prng.o -o prng
 ---daniel's makefile

This can fail with a parallel make, which may try to compile RC4.hs
and prng.hs concurrently. It can also fail if you rebuild after
modifying any of the files, as it won't realise that it needs to
re-compile. To handle that, you need to be more precise about the
dependencies, i.e.:

HC = nhc98

RC4.o RC4.hi: RC4.hs
	$(HC) -c RC4.hs

prng.o prng.hi: prng.hs RC4.hi
	$(HC) -c prng.hs

prng: RC4.o prng.o
	$(HC) RC4.o prng.o -o prng

With most make programs (e.g. GNU make), you can use pattern rules to
avoid repeating the commands, e.g.:

HC = nhc98

# how to compile any .hs file to produce .o and .hi files
%.o %.hi: %.hs
	$(HC) -c $<

# how to build the prng program
prng: RC4.o prng.o
	$(HC) -o $@ $+

# note that prng.o depends upon RC4.hi
prng.o: RC4.hi

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] Should inet_ntoa Be Pure?

2005-05-07 Thread Glynn Clements


Axel Simon wrote:

   Does anyone know why these are in the IO monad? Aren't they pure functions
   converting between dotted-decimal strings and a 32-bit network byte 
   ordered
   binary value?
 
 I guess the answer is no for both: The first one can fail

That doesn't mean that it should be in the IO monad; using Maybe would
suffice.

 and the second one overwrites a fixed string buffer (yuck!). From
 the man page:
 
  The return value from inet_ntoa() points to a  buffer  which
  is  overwritten on each call.  This buffer is implemented as
  thread-specific data in multithreaded applications.
 
 Hence ntoa needs to be an IO action so that the value is read
 immediately before the next ntoa call is executed.

That shouldn't be an issue so long as the buffer contents are
converted to a Haskell String before the function is called again
within the same thread.

However, I wouldn't rely upon all implementations of inet_ntoa() being
thread-safe.

 What you could do is to apply unsafePerformIO to 

[snip]

Or you could just re-implement the functions in Haskell.

Apart from the re-entrancy issues with inet_ntoa(), many
implementations of inet_addr() have misfeatures, e.g. allowing octets
to be expressed in octal or hex, or allowing numbers outside of the
0-255 range (in which case, the top bits overflow into the next
octet).
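
A rough sketch of what such a re-implementation might look like follows.
This is an illustration rather than the network library's code: the names
parseIPv4 and showIPv4 are assumptions, and the Word32 result holds the
address as a number with the first octet most significant, which is not
necessarily the in-memory network byte order that inet_addr() returns. It
is pure, reports failure with Maybe, and rejects the octal, hex and
out-of-range forms mentioned above.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (isDigit)
import Data.List (intercalate)
import Data.Word (Word32)

-- Split a string on '.' characters, keeping empty fields.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitDots rest

-- One octet: 1-3 decimal digits, no leading zero, value 0-255.
parseOctet :: String -> Maybe Word32
parseOctet ds
  | null ds || length ds > 3        = Nothing
  | not (all isDigit ds)            = Nothing
  | length ds > 1 && head ds == '0' = Nothing   -- no octal-looking forms
  | n > 255                         = Nothing
  | otherwise                       = Just (fromIntegral n)
  where
    n = read ds :: Int   -- only evaluated once ds is known to be digits

-- Strict dotted-decimal parser: exactly four octets.
parseIPv4 :: String -> Maybe Word32
parseIPv4 s = case mapM parseOctet (splitDots s) of
  Just [a, b, c, d] ->
    Just ((a `shiftL` 24) .|. (b `shiftL` 16) .|. (c `shiftL` 8) .|. d)
  _ -> Nothing

-- The inverse direction; this cannot fail, so no Maybe is needed.
showIPv4 :: Word32 -> String
showIPv4 w =
  intercalate "." [show ((w `shiftR` n) .&. 0xFF) | n <- [24, 16, 8, 0]]

main :: IO ()
main = do
  print (parseIPv4 "192.168.0.1")
  print (parseIPv4 "0300.1.2.3")    -- octal-looking octet: rejected
  print (parseIPv4 "1.2.3.256")     -- out of range: rejected
  print (fmap showIPv4 (parseIPv4 "10.0.0.255"))
```

Because neither function touches a shared buffer, there is nothing here
that needs the IO monad or raises re-entrancy questions.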

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell

Re: [HOpenGL] Re: OpenGL/GLUT examples crashing: known problem?

2005-04-09 Thread Glynn Clements


Claus Reinke wrote:

 Btw, is there a way to reset the opengl system to a sane state in 
 software? Or are there some invalid assumptions about default 
 state in the other examples?

If OpenGL is getting stuck in a non-functional state, that indicates
a bug in the driver.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: [Haskell] Memoization in Haskell

2005-04-03 Thread Glynn Clements


Bright Sun wrote:

 I can not understand memoization in Haskell.  I can
 not find an example except the fib.
 
 For example, I want drop a list 4 times like, 
 
 drop 4 (drop 3 (drop 2 (drop 1
 [1,2,3,4,5,6,7,8,9,10,11,12])))
 [11,12]
 
 can I implement a memoization function like
 
 droploop [1,2,3,4] [1,2,3,4,5,6,7,8,9,10,11,12]
 to get same result
 [11,12]

That isn't memoization.

Memoization is where you record a set of prior argument/result pairs
(i.e. a lookup table) to avoid having to perform the exact same
calculation repeatedly.

You can implement droploop as a fold, e.g.:

droploop ns xs = foldr drop xs ns
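
For contrast, here is a small sketch that puts the fold next to an actual
memoized function. The ways function and its table are invented for this
example (it counts the ordered sums of 1s, 3s and 4s that add up to n, and
assumes n is non-negative); the lazily built list plays the role of the
lookup table of prior argument/result pairs.

```haskell
-- The fold from the message: apply each drop count in turn.
droploop :: [Int] -> [a] -> [a]
droploop ns xs = foldr drop xs ns

-- Memoization proper: table !! k holds the result for argument k.
-- Each entry is computed at most once and then shared by later lookups.
table :: [Integer]
table = map go [0 ..]
  where
    go :: Int -> Integer
    go 0 = 1
    go k = sum [table !! (k - s) | s <- [1, 3, 4], s <= k]

-- Number of ways to write n as an ordered sum of 1s, 3s and 4s.
ways :: Int -> Integer
ways n = table !! n

main :: IO ()
main = do
  print (droploop [1, 2, 3, 4] [1 .. 12 :: Int])
  print (map ways [0 .. 5])
  print (ways 60)   -- quick, because earlier results are reused
```

Without the table, the recursion in go would recompute the same
subproblems over and over; droploop has no repeated subproblems, which is
why it needs a fold and not memoization.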

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell] URLs in haskell module namespace

2005-03-24 Thread Glynn Clements


S. Alexander Jacobson wrote:

 As I move from machine to machine, it would be nice not to have to 
 install all the libraries I use over and over again.  I'd like to be 
 able to do something like this:
 
import http://module.org/someLib as someLib
 
 If the requested module itself does local imports, the implementation 
 would first try to resolve the names on the client machine and 
 otherwise make requests along remote relative paths.

Embedding the path in the source code seems like a bad idea. If you
want to allow modules to be loaded via URLs, it would make more sense
to extend GHC's -i switch, i.e.

ghc -i http://module.org/ ...

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-21 Thread Glynn Clements


John Meacham wrote:

  I'm not suggesting inventing conventions. I'm suggesting leaving such
  issues to the application programmer who, unlike the library
  programmer, probably has enough context to be able to reliably
  determine the correct encoding in any specific instance.
 
 But the whole point of Foreign.C.String is to interface to existing C
 code. And one of the most common conventions of said interfaces is to
 represent strings in the current locale, Which is why locale honoring
 conversion routines are useful. 

My point is that most C functions which accept or return char*s will
work regardless of whether those char*s can be decoded according to
the current locale. E.g.

while (d = readdir(dir), d)
{
    stat(d->d_name, &st);
    ...
}

will stat() every filename in the directory regardless of whether or
not the filenames are valid in the locale's encoding.

The Haskell equivalent using FilePath (i.e. String),
getDirectoryContents etc currently only works because the char* <->
String conversions are hardcoded to ISO-8859-1, which is infallible
and reversible. If it used e.g. UTF-8, it would fail on any filename
which wasn't valid UTF-8 even though it never actually needs to know
the string of characters which the filename represents.
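
The difference can be sketched in a few lines of Haskell. This is an
illustration only: decodeLatin1, encodeLatin1 and validUtf8 are names
invented here, not library functions, and validUtf8 checks only the
lead-byte/continuation-byte structure (it does not reject overlong forms
or surrogates). The example filename is a single 0xFF byte followed by
".ppm".

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1 decoding: every byte is a character, so this cannot fail.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Encoding back; it only fails for characters above U+00FF, which
-- decodeLatin1 never produces.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM enc
  where
    enc c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing

-- Structural UTF-8 check: each lead byte must be followed by the
-- right number of continuation bytes (10xxxxxx).
validUtf8 :: [Word8] -> Bool
validUtf8 [] = True
validUtf8 (b : bs)
  | b < 0x80           = validUtf8 bs
  | b .&. 0xE0 == 0xC0 = continue 1
  | b .&. 0xF0 == 0xE0 = continue 2
  | b .&. 0xF8 == 0xF0 = continue 3
  | otherwise          = False
  where
    continue n =
      let (cont, rest) = splitAt n bs
      in length cont == n && all isCont cont && validUtf8 rest
    isCont c = c .&. 0xC0 == 0x80

main :: IO ()
main = do
  let name = [0xFF, 0x2E, 0x70, 0x70, 0x6D] :: [Word8]
  print (encodeLatin1 (decodeLatin1 name) == Just name)  -- round trip
  print (validUtf8 name)                                 -- 0xFF is never valid
```

The round trip holds for any byte sequence, so a program can carry the
filename through a String without ever knowing what it means; a UTF-8
decoder has no such guarantee.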

The same applies to reading filenames from argv[] and passing them to
open() etc. This is one of the most common idioms in Unix programming,
and it doesn't care about encodings at all. Again, it would cease to
work reliably in Haskell if the automatic char* -> String conversions
in getArgs etc started using the locale.

I'm not arguing about *how* char* <-> String conversions should be
performed so much as arguing about *whether* these conversions should
be performed. The conversion issues are only problems because the
conversions are being done at all.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-19 Thread Glynn Clements


Einar Karttunen wrote:

  In what way is ISO-2022 non-reversible? Is it possible that a ISO-2022 
  file name that is converted to Unicode cannot be converted back any 
  more (assuming you know for sure that it was ISO-2022 in the first 
  place)?
 
 I am no expert on ISO-2022 so the following may contain errors,
 please correct if it is wrong.
 
 ISO-2022 -> Unicode is always possible.
 Also Unicode -> ISO-2022 should be always possible, but is a relation
 not a function. This means there are an infinite? ways of encoding a
 particular unicode string in ISO-2022.
 
 ISO-2022 works by providing escape sequences to switch between different
 character sets. One can freely use these escapes in almost any way you
 wish.

Exactly.

Moreover, while there are an infinite number of equivalent
representations in theory (you can add as many redundant switching
sequences as you wish), there are multiple plausible equivalent
representations in practice.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-19 Thread Glynn Clements


Marcin 'Qrczak' Kowalczyk wrote:

  I'm talking about standard (XSI) curses, which will just pass
  printable (non-control) bytes straight to the terminal. If your
  terminal uses CP437 (or some other non-standard encoding), you can
  just pass the appropriate bytes to waddstr() etc and the corresponding
  characters will appear on the terminal.
 
 Which terminal uses CP437?

Most software terminal emulators can use any encoding. Traditional
comms packages tend to support CP437 (including their own VGA font if
necessary) because of its widespread use on BBSes which were targeted
at MS-DOS systems.

There exist hardware terminals (I can't name specific models, but I
have seen them in use) which support this, specifically for use with
MS-DOS systems.

 Linux console doesn't, except temporarily after switching the mapping
 to builtin CP437 (but this state is not used by curses) or after
 loading CP437 as the user map (nobody does this, and it won't work
 properly with all characters from the range 0x80-0x9F anyway).

I *still* encounter programs written for the linux console which
assume that the built-in CP437 font is being used (if you use an
ISO-8859-1 font, you get dialogs with accented characters where you
would expect line-drawing characters).

  You can treat it as immutable. Just don't call setlocale with
  different arguments again.
 
  Which limits you to a single locale. If you are using the locale's
  encoding, that limits you to a single encoding.
 
 There is no support for changing the encoding of a terminal on the fly
 by programs running inside it.

If you support multiple terminals with different encodings, and the
library uses the global locale settings to determine the encoding, you
need to switch locale every time you write to a different terminal.

  The point is that a single program often generates multiple streams of
  text, possibly for different audiences (e.g. humans and machines).
  Different streams may require different conventions (encodings,
  numeric formats, collating orders), but may use the same functions.
 
 A single program has a single stdout and a single filesystem. The
 contexts which use the locale encoding don't need multiple encodings.
 
 Multiple encodings are needed e.g. for exchanging data with other
 machines for the network, for reading contents of text files after the
 user has specified an encoding explicitly etc. In these cases an API
 with explicitly provided encoding should be used.

An API which is used for reading and writing text files or sockets is
just as applicable to stdin/stdout.

   The current locale mechanism is just a way of avoiding the issues
   as much as possible when you can't get away with avoiding them
   altogether.
  
  It's a way to communicate the encoding of the terminal, filenames,
  strerror, gettext etc.
 
  It's *a* way, but it's not a very good way. It sucks when you can't
  apply a single convention to everything.
 
 It's not so bad to justify inventing our own conventions and forcing
 users to configure the encoding of Haskell programs separately.

I'm not suggesting inventing conventions. I'm suggesting leaving such
issues to the application programmer who, unlike the library
programmer, probably has enough context to be able to reliably
determine the correct encoding in any specific instance.

  Unicode has no viable competition.
 
  There are two viable alternatives. Byte strings with associated
  encodings and ISO-2022.
 
 ISO-2022 is an insanely complicated brain-damaged mess. I know it's
 being used in some parts of the world, but the sooner it will die,
 the better.

ISO-2022 has advantages and disadvantages relative to UTF-8. I don't
want to go on about the specifics here because they aren't
particularly relevant. What's relevant is that it isn't likely to
disappear any time soon.

A large part of the world already has a universal encoding which works
well enough; they don't *need* UTF-8, and aren't going to rebuild
their IT infrastructure from scratch for the sake of it.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-19 Thread Glynn Clements


Wolfgang Thaller wrote:

  Of course, it's quite possible that the only test cases will be people
  using UTF-8-only (or even ASCII-only) systems, in which case you won't
  see any problems.
 
 I'm kind of hoping that we can just ignore a problem that is so rare 
 that a large and well-known project like GTK2 can get away with 
 ignoring it.

1. The filename issues in GTK-2 are likely to be a major problem in
CJK locales, where filenames which don't match the locale (which is
seldom UTF-8) are common.

2. GTK's filename handling only really applies to file selector
dialogs. Most other uses of filenames in a GTK-based application don't
involve GTK; they use the OS API functions which just deal with byte
strings.

3. GTK is a GUI library. Most of the text which it deals with is going
to be rendered, so it *has* to be interpreted as characters. Treating
it as blobs of data won't work. IOW, on the question of whether or not
to interpret byte strings as character strings, GTK is at the far end
of the scale.

 Also, IIRC, Java strings are supposed to be unicode, too - 
 how do they deal with the problem?

Files are represented by instances of the File class:

http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html

An abstract representation of file and directory pathnames.

You can construct Files from Strings, and convert Files to Strings. 

The File class includes two sets of directory enumeration methods:
list() returns an array of Strings, while listFiles() returns an array
of Files.

The documentation for the File class doesn't mention encoding issues
at all. However, with that interface, it would be possible to
enumerate and open filenames which cannot be decoded.

  So we can't do Unicode-based I18N because there exist a few unix
  systems with messed-up file systems?
 
  Declaring such systems to be messed up won't make the problems go
  away. If a design doesn't work in reality, it's the fault of the
  design, not of reality.
 
 In general, yes. But we're not talking about all of reality here, we're 
 talking about one small part of reality - the question is, can the part 
 of reality where the design doesn't work be ignored?

Sure, you *can* ignore it; K&R C ignored everything other than ASCII.
If you limit yourself to locales which use the Roman alphabet (i.e.
ISO-8859-N for N=1/2/3/4/9/15), you can get away with a lot.

Most such users avoid encoding issues altogether by dropping the
accents and sticking to ASCII, at least when dealing with files which
might leave their system.

To get a better idea, you would need to consult users whose language
doesn't use the Roman alphabet, e.g. CJK or Cyrillic. Unfortunately,
you don't usually find too many of them on lists such as this.

I'm only familiar with one OSS project which has a sizeable CJK user
base, and that's XEmacs (whose I18N revolves around ISO-2022, and most
of the documentation is in Japanese). Even there, there are separate
mailing lists for English and Japanese, and the two seldom
communicate.

 I think that if we wait long enough, the filename encoding problems 
 will become irrelevant and we will live in an ideal world where unicode 
 actually works. Maybe next year, maybe only in ten years.

Maybe not even then. If Unicode really solved encoding problems, you'd
expect the CJK world to be the first adopters, but they're actually
the least eager; you are more likely to find UTF-8 in an
English-language HTML page or email message than a Japanese one.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-18 Thread Glynn Clements


Wolfgang Thaller wrote:

  If you try to pretend that I18N comes down to shoe-horning everything
  into Unicode, you will turn the language into a joke.
 
 How common will those problems you are describing be by the time this 
 has been implemented?
 How common are they even now?

Right now, GHC assumes ISO-8859-1 whenever it has to automatically
convert between String and CString. Conversions to and from ISO-8859-1
cannot fail, and encoding and decoding are exact inverses.

OK, so the intermediate string will be nonsense if ISO-8859-1 isn't
the correct encoding, but that doesn't actually matter a lot of the
time; frequently, you're just grabbing a blob of data from one
function and passing it to another.

The problems will only appear once you start dealing with fallible or
non-reversible encodings such as UTF-8 or ISO-2022. If and when that
happens, I guess we'll find out how common the problems are. Of
course, it's quite possible that the only test cases will be people
using UTF-8-only (or even ASCII-only) systems, in which case you won't
see any problems.

 I haven't yet encountered a unix box where the file names were not in 
 the system locale encoding. On all reasonably up-to-date Linux boxes 
 that I've seen recently, they were in UTF-8 (and the system locale 
 agreed).

I've encountered boxes where multiple encodings were used; primarily
web and FTP servers which were shared amongst multiple clients. Each
client used whichever encoding(s) they felt like. IIRC, the most
common non-ASCII encoding was MS-DOS codepage 850 (the clients were
mostly using Windows 3.1 at that time).

I haven't done sysadmin for a while, so I don't know the current
situation, but I don't think that the world has switched to UTF-8 in
the mean time. [Most of the non-ASCII filenames which I've seen
recently have been either ISO-8859-1 or Win-12XX; I haven't seen much
UTF-8.]

 On both Windows and Mac OS X, filenames are stored in Unicode, so it is 
 always possible to convert them to unicode.
 So we can't do Unicode-based I18N because there exist a few unix 
 systems with messed-up file systems?

Declaring such systems to be messed up won't make the problems go
away. If a design doesn't work in reality, it's the fault of the
design, not of reality.

  Haskell's Unicode support is a joke because the API designers tried to
  avoid the issues related to encoding with wishful thinking (i.e. you
  open a file and you magically get Unicode characters out of it).
 
 OK, that part is purely wishful thinking, but assuming that filenames 
 are text that can be represented in Unicode is wishful thinking that 
 corresponds to 99% of reality.
 So why can't the remaining 1 percent of reality be fixed instead?

The issue isn't whether the data can be represented as Unicode text,
but whether you can convert it to and from Unicode without problems.
To do this, you need to know the encoding, you need to store the
encoding so that you can convert the wide string back to a byte
string, and the encoding needs to be reversible.
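
One way to picture "storing the encoding" is a value that keeps the
original bytes next to an encoding tag and only decodes on demand. The
sketch below is hypothetical throughout (the Encoding, Tagged and decode
names are invented, and only two trivial encodings are modelled); it is
meant to show the shape of the idea, not a proposed API.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- A deliberately tiny set of encodings for the sake of the example.
data Encoding = ASCII | Latin1
  deriving (Eq, Show)

-- A byte string together with the encoding it is claimed to be in.
data Tagged = Tagged
  { tagEncoding :: Encoding
  , tagBytes    :: [Word8]
  } deriving (Eq, Show)

-- Decoding may fail, but the bytes themselves are never lost.
decode :: Tagged -> Maybe String
decode (Tagged ASCII bs)
  | all (< 0x80) bs = Just (map (chr . fromIntegral) bs)
  | otherwise       = Nothing
decode (Tagged Latin1 bs) = Just (map (chr . fromIntegral) bs)

main :: IO ()
main = do
  let name = Tagged ASCII [0x66, 0x6F, 0xFF]   -- "fo" plus a stray 0xFF
  print (decode name)                          -- not valid ASCII
  print (tagBytes name)                        -- still usable as bytes
  print (decode name { tagEncoding = Latin1 }) -- a different guess succeeds
```

Because the bytes are retained, getting back to the byte string is
trivial and exact, whatever happens when the text is interpreted.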

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-18 Thread Glynn Clements

, but I can see why they would have
changed it (a single catalogue for encoding variants of a given
locale).

  How would it know how to interpret filenames for graphical
  display?
 
  An option menu on the file selector is one option; heuristics are
  another.
 
 Heuristics won't distinguish various ISO-8859-x from each other.

You treat the locale's encoding as a heuristic. If it looks like
ISO-8859-x, and the locale's encoding is ISO-8859-x, you use that. If
it looks like Shift-JIS, you don't complain and give up just because
the locale is UTF-8.

 An option menu on the file selector is user-unfriendly because users
 don't want to configure it for each program separately. They want to
 set it in one place and expect it to work everywhere.

Nothing will work everywhere. An option menu allows the user to force
the encoding for individual cases when whatever other mechanism(s) you
use get it wrong.

I've needed to use Mozilla's View -> Character Encoding menu enough
times when the browser's guess turned out to be wrong (and blindly
honouring the charset specified by HTTP's Content-Type: or HTML's META
tags would be a disaster).

  At least Gtk-1 would attempt to display the filename; you would get
  the odd question mark but at least you could select the file;
 
 Gtk+2 also attempts to display the filename. It can be opened
 even though the filename has inconvertible characters escaped.

This isn't my experience; I just get messages like:

Gtk-Message: The filename \377.ppm couldn't be converted to UTF-8. (try setting the environment variable G_FILENAME_ENCODING): Invalid byte sequence in conversion input

and the filename is omitted altogether.

  The current locale mechanism is just a way of avoiding the issues
  as much as possible when you can't get away with avoiding them
  altogether.
 
 It's a way to communicate the encoding of the terminal, filenames,
 strerror, gettext etc.

It's *a* way, but it's not a very good way. It sucks when you can't
apply a single convention to everything.

  Unicode has been described (accurately, IMHO) as Esperanto for
  computers. Both use the same approach to try to solve essentially the
  same problem. And both will be about as successful in the long run.
 
 Unicode has no viable competition.

There are two viable alternatives. Byte strings with associated
encodings and ISO-2022. In CJK environments, ISO-2022 is still far
more widespread than UTF-8, and will likely remain so for the
foreseeable future. And byte strings with associated encodings are
probably still the most common of all.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-17 Thread Glynn Clements


Marcin 'Qrczak' Kowalczyk wrote:

 Glynn Clements [EMAIL PROTECTED] writes:
 
  It should be possible to specify the encoding explicitly.
 
  Conversely, it shouldn't be possible to avoid specifying the
  encoding explicitly.
 
 What encoding should a binding to readline or curses use?
 
 Curses in C comes in two flavors: the traditional byte version and a
 wide character version. The second version is easy if we can assume
 that wchar_t is Unicode, but it's not always available and until
 recently in ncurses it was buggy. Let's assume we are using the byte
 version. How to encode strings?

The (non-wchar) curses API functions take byte strings (char*), so the
Haskell bindings should take CString or [Word8] arguments. If you
provide wrapper functions which take String arguments, either they
should have an encoding argument or the encoding should be a mutable
per-terminal setting.

 A terminal uses an ASCII-compatible encoding. Wide character version
 of curses convert characters to the locale encoding, and byte version
 passes bytes unchanged. This means that if a Haskell binding to the
 wide character version does the obvious thing and passes Unicode
 directly, then an equivalent behavior can be obtained from the byte
 version (only limited to 256-character encodings) by using the locale
 encoding.

I don't know enough about the wchar version of curses to comment on
that.

I do know that, to work reliably, the normal (byte) version of curses
needs to pass printable bytes through unmodified.

It is possible for curses to be used with a terminal which doesn't use
the locale's encoding. Specifically, a single process may use curses
with multiple terminals with differing encodings, e.g. an airport
public information system displaying information in multiple
languages.

Also, it's quite common to use non-standard encodings with terminals
(e.g. codepage 437, which has graphic characters beyond the ACS_* set
which terminfo understands).

 The locale encoding is the right encoding to use for conversion of the
 result of strerror, gai_strerror, msg member of gzip compressor state
 etc. When an I/O error occurs and the error code is translated to a
 Haskell exception and then shown to the user, why would the application
 need to specify the encoding and how?

Because the application may be using multiple locales/encodings.
Having had to do this in C (i.e. repeatedly calling setlocale() to
select the correct encoding), I would much prefer to have been able to
pass the locale as a parameter.

[The most common example is printf("%f"). You need to use the C locale
(decimal point) for machine-readable text but the user's locale
(locale-specific decimal separator) for human-readable text. This
isn't directly related to encodings per se, but a good example of why
parameters are preferable to state.]
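
A minimal sketch of the "locale as a parameter" style, in Haskell. The
NumLocale type, the two locale values and formatFixed are assumptions
invented for this example; only showFFloat comes from the standard
Numeric module.

```haskell
import Numeric (showFFloat)

-- A hypothetical, heavily simplified "locale": just the decimal separator.
newtype NumLocale = NumLocale { decimalSeparator :: Char }

cLocale, commaLocale :: NumLocale
cLocale     = NumLocale '.'   -- for machine-readable output
commaLocale = NumLocale ','   -- e.g. for a user whose locale uses a comma

-- Fixed-point formatting with the locale passed explicitly,
-- so no global state has to be switched between calls.
formatFixed :: NumLocale -> Int -> Double -> String
formatFixed loc places x = map fixSep (showFFloat (Just places) x "")
  where
    fixSep '.' = decimalSeparator loc
    fixSep c   = c

main :: IO ()
main = do
  let x = 1234.5
  putStrLn ("value=" ++ formatFixed cLocale 2 x)       -- for a parser
  putStrLn ("Value: " ++ formatFixed commaLocale 2 x)  -- for a person
```

Both lines come from the same function in the same program, with no
setlocale()-style switching in between.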

  If application code doesn't want to use the locale's encoding, it
  shouldn't be shoe-horned into doing so because a library developer
  decided to duck the encoding issues by grabbing whatever encoding
  was readily to hand (i.e. the locale's encoding).
 
 If a C library is written with the assumption that texts are in the
 locale encoding, a Haskell binding to such library should respect that
 assumption.

C libraries which use the locale do so as a last resort. K&R C
completely ignored I18N issues. ANSI C added the locale mechanism as a
hack to provide minimal I18N support while maintaining backward
compatibility, and in a minimally-intrusive manner.

The only reason that the C locale mechanism isn't a major nuisance is
that you can largely ignore it altogether. Code which requires real
I18N can use other mechanisms, and code which doesn't require any I18N
can just pass byte strings around and leave encoding issues to code
which actually has enough context to handle them correctly.

 Only some libraries allow to work with different, explicitly specified
 encodings. Many libraries don't, especially if the texts are not the
 core of the library functionality but error messages.

And most such libraries just treat text as byte strings. They don't
care about their interpretation, or even whether or not they are valid
in the locale's encoding.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] invalid character encoding

2005-03-17 Thread Glynn Clements


John Meacham wrote:

It doesn't affect functions added by the hierarchical libraries,
i.e. those functions are safe only with the ASCII subset. (There is
a vague plan to make Foreign.C.String conform to the FFI spec,
which mandates locale-based encoding, and thus would change all
those, but it's still up in the air.)
   
Hmm. I'm not convinced that automatically converting to the current
locale is the ideal behaviour (it'd certainly break all my programs!).
Certainly a function for converting into the encoding of the current
locale would be useful for many users but it's important to be able to
know the encoding with certainty.
   
   It should only be the default, not the only option.
  
  I'm not sure that it should be available at all.
  
   It should be possible to specify the encoding explicitly.
  
  Conversely, it shouldn't be possible to avoid specifying the encoding
  explicitly.
  
  Personally, I wouldn't provide an all-in-one convert String to
  CString using locale's encoding function, just in case anyone was
  tempted to actually use it.
 
 But this is exactly what is needed for most C library bindings.

I very much doubt that "most" is accurate.

C functions which take a char* fall into three main cases:

1. Unspecified encoding, i.e. it's a string of bytes, not characters.

2. Locale's encoding, as determined by nl_langinfo(CODESET);
essentially, whatever was set with setlocale(LC_CTYPE), defaulting to
C/POSIX if setlocale() hasn't been called.

3. Fixed encoding, e.g. UTF-8, ISO-2022, US-ASCII (or EBCDIC on IBM
mainframes).

Historically, library functions have tended to fall into category 1
unless they *need* to know the interpretation of a given byte or
sequence of bytes (e.g. ctype.h), in which case they fall into
category 2. Most of libc falls into category 1, with a minority of
functions in category 2.

Code which is designed to handle multiple languages simultaneously is
more likely to fall into category 3, using one of the universal
encodings (typically ISO-2022 in southeast Asia and UTF-8 elsewhere).

E.g. Gtk-2.x uses UTF-8 almost exclusively, although you can force the
use of the locale's encoding for filenames (if you have filenames in
multiple encodings, you lose; filenames using the wrong encoding
simply don't appear in file selectors).

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] invalid character encoding

2005-03-17 Thread Glynn Clements

 it was the
simplest way to retrofit minimal I18N onto K&R C. It also means that
most code can easily duck the issues (i.e. so you don't have to pass a
locale parameter to isupper() etc).

OTOH, if you don't want to duck the issue, global locale settings are
a nuisance.

  The only reason that the C locale mechanism isn't a major nuisance
  is that you can largely ignore it altogether.
 
 Then how would a Haskell program know what encoding to use for stdout
 messages?

It doesn't necessarily need to. If you are using message catalogues,
you just read bytes from the catalogue and write them to stdout. The
issue then boils down to using the correct encoding for the
catalogues; the code doesn't need to know.

 How would it know how to interpret filenames for graphical
 display?

An option menu on the file selector is one option; heuristics are
another.

Both tend to produce better results in non-trivial cases than either
of Gtk-2's choices: i.e. filenames must be either UTF-8 or must match
the locale (depending upon the G_BROKEN_FILENAMES setting), otherwise
the filename simply doesn't exist. At least Gtk-1 would attempt to
display the filename; you would get the odd question mark but at least
you could select the file; ultimately, the returned char* just gets
passed to open(), so the encoding only really matters for display.

  Code which requires real I18N can use other mechanisms, and code
  which doesn't require any I18N can just pass byte strings around and
  leave encoding issues to code which actually has enough context to
  handle them correctly.
 
 Haskell can't just pass byte strings around without turning the
 Unicode support into a joke (which it is now).

If you try to pretend that I18N comes down to shoe-horning everything
into Unicode, you will turn the language into a joke.

Haskell's Unicode support is a joke because the API designers tried to
avoid the issues related to encoding with wishful thinking (i.e. you
open a file and you magically get Unicode characters out of it).

The current locale mechanism is just a way of avoiding the issues as
much as possible when you can't get away with avoiding them
altogether.

Unicode has been described (accurately, IMHO) as Esperanto for
computers. Both use the same approach to try to solve essentially the
same problem. And both will be about as successful in the long run.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] invalid character encoding

2005-03-17 Thread Glynn Clements


Marcin 'Qrczak' Kowalczyk wrote:

  E.g. Gtk-2.x uses UTF-8 almost exclusively, although you can force the
  use of the locale's encoding for filenames (if you have filenames in
  multiple encodings, you lose; filenames using the wrong encoding
  simply don't appear in file selectors).
 
 Actually they do appear, even though you can't type their names
 from the keyboard. The name shown in the GUI used to be escaped in
 different ways by different programs or even different places in one
 program (question marks, %hex escapes, \oct escapes), but recently
 they added some functions to glib to make the behavior uniform.

In the last version of Gtk-2.x which I tried, invalid filenames are
just omitted from the list. Gtk-1.x displayed them (I think with
question marks, but it may have been a box).

I've just tried with a more recent version (2.6.2); the default
behaviour is similar, although you can now get around the issue by
using G_FILENAME_ENCODING=ISO-8859-1. Of course, if your locale is
a long way from ISO-8859-1, that isn't a particularly good solution.

The best test case would be a system used predominantly by Japanese,
where (apparently) it's common to have a mixture of both EUC-JP and
Shift-JIS filenames (occasionally wrapped in ISO-2022, but usually
raw).

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] invalid character encoding

2005-03-16 Thread Glynn Clements


Marcin 'Qrczak' Kowalczyk wrote:

  It doesn't affect functions added by the hierarchical libraries,
  i.e. those functions are safe only with the ASCII subset. (There is
  a vague plan to make Foreign.C.String conform to the FFI spec,
  which mandates locale-based encoding, and thus would change all
  those, but it's still up in the air.)
 
  Hmm. I'm not convinced that automatically converting to the current
  locale is the ideal behaviour (it'd certainly break all my programs!).
  Certainly a function for converting into the encoding of the current
  locale would be useful for many users but it's important to be able to
  know the encoding with certainty.
 
 It should only be the default, not the only option.

I'm not sure that it should be available at all.

 It should be possible to specify the encoding explicitly.

Conversely, it shouldn't be possible to avoid specifying the encoding
explicitly.

Personally, I wouldn't provide an all-in-one "convert String to
CString using locale's encoding" function, just in case anyone was
tempted to actually use it.

The decision as to the encoding belongs in application code; not in
(most) libraries, and definitely not in the language.

[Libraries dealing with file formats or communication protocols which
mandate a specific encoding are an exception. But they will be using a
fixed encoding, not the locale's encoding.]

If application code chooses to use the locale's encoding, it can
retrieve it then pass it as the encoding argument to any applicable
functions.

If application code doesn't want to use the locale's encoding, it
shouldn't be shoe-horned into doing so because a library developer
decided to duck the encoding issues by grabbing whatever encoding was
readily to hand (i.e. the locale's encoding).
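
As a rough illustration of the last two paragraphs, here is a minimal Python sketch. The names `to_c_string` and `locale_encoding` are made up for this example and are not part of any existing API; the point is only that the conversion function has no default encoding, so the choice is forced onto the caller.

```python
import locale


def to_c_string(text, encoding):
    """Encode text as a NUL-terminated byte string in an explicitly named encoding.

    There is deliberately no default for `encoding`: the caller must choose.
    """
    data = text.encode(encoding)
    if b"\x00" in data:
        # A C string cannot carry an embedded NUL byte.
        raise ValueError("embedded NUL cannot be represented in a C string")
    return data + b"\x00"


def locale_encoding():
    """Return the name of the current locale's encoding (an application-level choice)."""
    return locale.getpreferredencoding(False)
```

Application code which does want the locale's encoding would write `to_c_string(name, locale_encoding())`, while a library for a protocol with a mandated encoding would pass a fixed name such as `"utf-8"`.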

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

RE: Process library and signals

2005-02-07 Thread Glynn Clements


Simon Marlow wrote:

  I think this covers most of the useful situations.  If you want to do
  the same thing in both parent and child, or handle in the parent and
  SIG_DFL in the child: use runProcess.  If you want to ignore in the
  parent and SIG_DFL in the child: use System.Cmd.{system,rawSystem}. 
  To handle in the parent and ignore in the child: unfortunately not
  directly supported.
  
  As it stands, you can have whatever behaviour you want in the parent:
  set the desired handling before calling system/rawSystem/runProcess
  then set it back afterwards.
  
  However, this will cease to be true for system/rawSystem if you change
  them so that the child restores the handlers to their state upon
  entry.
 
 I don't understand...  is there a typo somewhere above?  Perhaps you
 meant child in the first paragraph?

Sorry; I wasn't thinking straight. That part of my message is
incorrect; changing the signal handling before calling
system/rawSystem won't help, because they force both cases.

If they were changed to behave like system(), the caller could
determine the *child* behaviour, but that's prone to a race condition,
so I doubt that it would be useful in practice.

 system/rawSystem now behave almost exactly like system() in C.  The only
 difference is that you can't ignore SIGINT/SIGQUIT in the child, but I
 can fix that if necessary.

I'm not sure how much it matters; system() isn't really of much use
for real programs anyhow.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: [Haskell] Re: xemacs haskell major mode

2005-02-02 Thread Glynn Clements


Surendra Singhi wrote:

   Is there any ilisp or slime like package for haskell, which integrates
   haskell with xemacs or emacs and provides a kind of integrated
   development environment?
   I am using Hugs 98.
  
  
   Does URL:
   http://www.haskell.org/pipermail/haskell/2004-November/015015.html
  
  help?
 
 I downloaded the haskell mode from that site and I was trying to
 configure it, but during the process I ran into this error
 
 Debugger entered--Lisp error: (void-function charsetp)

signal(void-function (charsetp))

The charsetp function only exists if XEmacs was built with the MuLE
(MUlti-Lingual Emacs) option.

However, the only use of that function which I can see in the
haskell-mode code is:

   (and (fboundp 'make-char) (charsetp 'japanese-jisx0208)

So charsetp should only be called if the make-char function exists,
and that function should also only exist if XEmacs was built with the
MuLE option.

It may be that you have a version of XEmacs which has make-char but
which doesn't have charsetp.

In any case, I have CC'd this message to the maintainer of
haskell-mode, Stefan Monnier, in case he has any ideas.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell

Re: [Haskell-cafe] Re: File path programme

2005-02-02 Thread Glynn Clements


Peter Simons wrote:

   Hmmm, I'm not really sure what equivalence for file
   paths should mean in the presence of hard/symbolic links,
   (NFS-)mounted file systems, etc.
 
 Well, there is a sort-of canonic version for every path; on
 most Unix systems the function realpath(3) will find it.
 My interpretation is that two paths are equivalent iff they
 point to the same target.

I think that any definition which includes an "iff" is likely to be
overly optimistic.

More likely, you will have to settle for a definition such that, if
two paths are considered equal, they refer to the same file, but
without the converse (i.e. even if they aren't equal, they might still
refer to the same file).

Even so, you will need to make certain assumptions. E.g. older Unices
would allow root to replace the "." and ".." entries; you probably
want to assume that can't happen.

 You (and the others who pointed it out) are correct, though,
 that the current 'canon' function doesn't accomplish that. I
 guess, I'll have to move it into the IO monad to get it
 right. And I should probably rename it, too. ;-)

A version in the IO monad would allow for a tighter definition (i.e. 
more likely to correctly identify that two different path values
actually refer to the same file).

[Certainly, you have to use the IO monad if you want to allow for case
sensitivity, as that depends upon which filesystems are mounted
where.]

Within the IO monad, the obvious approach is to stat() both pathnames
and check whether their targets have the same device/inode pairs. 
That's reasonably simple, and probably about as good as you can get.
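
A minimal sketch of that check, in Python rather than Haskell, assuming a POSIX-like system where `st_dev` and `st_ino` are meaningful. The name `same_file` is chosen for this example (the standard library's `os.path.samefile` performs essentially the same comparison).

```python
import os


def same_file(path_a, path_b):
    """True if both paths resolve (following symlinks) to the same device/inode pair."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Note that this gives only the one-directional guarantee described earlier: a True result means the same file, but a False result does not prove the files are different (e.g. one filesystem mounted via two protocols).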

That still won't handle the case where you mount a single remote
filesystem via both NFS and SMB though. I doubt that anything can
achieve that.

There are also issues of definition, e.g. is /dev/tty considered
equivalent to the specific /dev/ttyXX device for the current
process?

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] File path programme

2005-01-30 Thread Glynn Clements


Keean Schupke wrote:

 I guess it's just that I'm more concerned with making possible what is
 currently impossible (according to the library standards)--that is, using
 FFI and IO on the same file--rather than just adding utility features that
 application developers could have written themselves.  I suppose we don't
 need a class for this, all we need is a couple of functions to convert
 between FilePath and CString.
   
 
 Except paths are different on different platforms... for example:
 
 /a/b/../c/hello\ there/test
 
 and:
 
 A:\a\b\
 
 notice how the backslash is used to 'escape' a space or meta-character on
 unix,

That's Bourne-shell syntax, not Unix API syntax. So far as open() etc
are concerned, a backslash is just another character.
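
To make the distinction concrete, here is a small Python sketch using the standard `shlex` module, which approximates POSIX shell word splitting. The wrapper name `shell_words` is just for this example.

```python
import shlex


def shell_words(command_line):
    """Split a command line roughly as a POSIX shell would, removing quotes and escapes."""
    return shlex.split(command_line)
```

`shell_words(r"cat /a/b/../c/hello\ there/test")` yields the two words `cat` and `/a/b/../c/hello there/test`: the backslash is consumed by the shell-style parsing, and the pathname handed to open() contains a plain space.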

Also, Windows accepts both slash and backslash equally in most
situations. It's only really command-line parsing (where slash is
normally used to denote switches) where there's an issue.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] File path programme

2005-01-30 Thread Glynn Clements


robert dockins wrote:

  I don't pretend to fully understand various unicode standard but it
  seems to me that these problems are deeper than file path library. The
  equation (decode . encode)
  /= id seems confusing for me. Can you give me an example when this
  happen? 
 
 I am pretty sure that ISO 2022 encoded strings can have multiple ways to 
 express the same unicode glyphs.  This means that any sensible relation 
 between ISO 2022 strings and unicode strings maps more than one ISO 2022 
 string onto the same unicode string.  The inverse is therefore not a 
 function.  To make it a function one of the possibly several encodings 
 of the unicode string will have to be chosen.  So you have a ISO 2022 
 string A which is decoded to a unicode string U.  We reencode U to an 
 ISO 2022 string B.  It may be that A /= B.  That is the problem.

Exactly.

And it isn't a theoretical issue. E.g. in an environment where EUC-JP
is used, filenames may begin with ESC$)B (designate JISX0208 to G1),
or they may not (because G1 is assumed to contain JISX0208 initially).

More generally, ISO-2022 strings frequently contain redundant
character-set switching sequences, so conversion to unicode and back
again typically won't yield the original sequence of bytes.
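
A small Python sketch of the round-trip problem, assuming CPython's `iso2022_jp` codec (which accepts a redundant "designate ASCII to G0" escape but never emits one). The helper name `round_trips` and the sample bytes are invented for illustration.

```python
def round_trips(raw, encoding):
    """True if decoding raw bytes and re-encoding the text reproduces the same bytes."""
    return raw.decode(encoding).encode(encoding) == raw


# "abc" preceded by a redundant escape sequence (ESC ( B) designating ASCII to G0.
REDUNDANT = b"\x1b(Babc"
```

Under that assumption, `REDUNDANT` decodes to the same text as the plain bytes `b"abc"`, so re-encoding cannot recover the escape sequence and `round_trips(REDUNDANT, "iso2022_jp")` is false.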

 The various UTF encodings do not have this particular problem; if a UTF 
 string is valid, then it is a unique representation of a unicode string.

Except that there are some ad-hoc extensions, e.g. the UTF-8 variant
used by both Java and Tcl permits NUL characters to be embedded in
NUL-terminated UTF-8 strings by encoding them as a two-byte sequence
(which is invalid in UTF-8 proper).
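
For illustration, a short Python check of that two-byte sequence against a strict UTF-8 decoder. The helper name `is_strict_utf8` is made up for this sketch; `0xC0 0x80` is the overlong encoding of NUL used by the "modified" variant.

```python
def is_strict_utf8(raw):
    """True if raw is valid according to Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Overlong two-byte encoding of U+0000, as used by the modified UTF-8 variant.
MODIFIED_NUL = b"\xc0\x80"
```

A strict decoder rejects `b"a" + MODIFIED_NUL + b"b"` but accepts `b"a\x00b"`, so data written by the variant encoder is not always readable as UTF-8 proper.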

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: File path programme

2005-01-30 Thread Glynn Clements


Marcin 'Qrczak' Kowalczyk wrote:

  The various UTF encodings do not have this particular problem; if a UTF
  string is valid, then it is a unique representation of a unicode string.
  However, decoding is still a partial function and can fail.
 
  And while it is partly true, it is qualified by the problems relative to
  canonicalization (an "é" in Unicode can both be represented as "é" or as
  two chars (an e and an accent) and they should (ideally) compare equal).
 
 In what sense "equal"? They are supposed to be equivalent as far
 as the semantics of the text is concerned, but representations are
 clearly different and most programs distinguish them. In particular
 they are different filenames on both Unix and Windows. AFAIK MacOS
 normalizes filenames, but using a slightly different algorithm than
 Unicode (perhaps just an older version).
 
 IMHO it makes no sense to pretend that they are exactly the same when
 strings consist of code points or lower level units (and I don't
 believe another choice for the default string type would be practical).

Well, at least you and I agree on that.

Once you start down the "semantic equivalence" route, you will quickly
run into issues like "ß" == "ss", and it only gets worse from there
on.
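
A short Python sketch of the distinction, using the standard `unicodedata` module. The names `COMPOSED`, `DECOMPOSED` and `canonically_equal` are chosen for this example only.

```python
import unicodedata

COMPOSED = "\u00e9"      # "é" as a single code point
DECOMPOSED = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """Compare two strings after NFC normalization (canonical equivalence)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

As code-point sequences (and as UTF-8 bytes, hence as Unix filenames) the two strings differ; only the normalizing comparison treats them as equal. Canonical equivalence still does not make "ß" equal to "ss"; that needs a different notion again, such as case folding.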

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] File path programme

2005-01-30 Thread Glynn Clements


Ben Rudiak-Gould wrote:

 Is there an MSDN page that actually gives a grammar, or at least a 
 taxonomy, of Win32 pathnames? That would be useful.

It would also be longer than War and Peace, once you start allowing
for MS-DOS 8.3 pathnames, codepages, the fact that anything anywhere
which contains "aux", "con", "lpt" etc refers to a device (sometimes),
the fact that "..." == "../.." (sometimes), the handling of incomplete
multibyte characters, ...

Search the BugTraq archives for issues related to IIS access-control
lists to discover the myriad different names which can be used to
refer to a given file for which the administrator is (unsuccessfully)
trying to restrict access.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-24 Thread Glynn Clements


Ketil Malde wrote:

  The point is that the Unix documentation does not consider the short
  pause as data is read off your hard drive to be blocking. So that's why
  select will always report that data is available when you use it with a
  file handle.
 
 Isn't this also for historic reasons?

Partly.

But I think that it's also because this functionality wasn't intended
for the purpose which is being discussed, i.e. enabling a process to
obtain maximal CPU utilisation.

For that purpose, explicit overlapped I/O (in all forms) can only ever
be a partial solution, because you still have the issue that memory
(i.e. code/data/stack segments) is demand-paged. The only solution
there is multiple threads.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-20 Thread Glynn Clements


Keean Schupke wrote:

 Why is disk a special case?

With slow streams, where there may be an indefinite delay before the
data is available, you can use non-blocking I/O, asynchronous I/O,
select(), poll() etc to determine if the data is available.

If it is, reading the data is essentially just copying from kernel
memory to userspace.

If it isn't, the program can do something else while it's waiting for
the data to arrive.

With files or block devices, the data is always deemed to be
available, even if the data isn't in physical memory. Calling read()
in such a situation will block until the data has been read into
memory.

 I have never heard that all processes
 under linux wait for a disk read... The kernel most certainly does
 not busy wait for disks to respond, so the only alternative is that
 the process that needs to wait (and only that process) is put to
 sleep. In which case a second thread would be unaffected.

Correct. The point is that maximising CPU utilisation requires the use
of multiple kernel threads; select/poll or non-blocking/asynchronous
I/O won't suffice.

 Linux does not busy wait in the Kernel! (don't forget the kernel
 does read-ahead, so it could be that read really does return
 'immediately' and without any delay apart from at the end of file -
 In which case asynchronous IO just slows you down with extra context
 switches).

It doesn't busy wait; it suspends the process/thread, then schedules
some other runnable process/thread. The original thread remains
suspended until the data has been transferred into physical memory.

Reading data from a descriptor essentially falls into three cases:

1. The data is in physical RAM. read() copies the data to the supplied
user-space buffer then returns control to the caller.

2. The data isn't in physical RAM, but is available with only a finite
delay (i.e. time taken to read from block device or network
filesystem).

3. The data isn't in physical RAM, and may take an indefinite amount
of time to arrive (e.g. from a socket, pipe, terminal etc).

The central issue is that the Unix API doesn't distinguish between
cases 1 and 2 when it comes to non-blocking I/O, asynchronous I/O,
select/poll etc. [OTOH, NT overlapped I/O and certain Unix extensions
do distinguish these cases, i.e. data is only available when it's in
physical RAM.]

If you read from a non-blocking descriptor, and case 2 applies, read()
will block while the data is read from disk then return the data; it
won't return -1 with errno set to EAGAIN, as would happen with case 3. 
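
The difference between cases 2 and 3 can be observed from user space. The following Python sketch assumes a POSIX system; the helper names `ready_to_read` and `try_read` are invented for this example.

```python
import os
import select


def ready_to_read(fd):
    """Poll fd with a zero timeout; True if select() reports it as readable."""
    readable, _, _ = select.select([fd], [], [], 0)
    return fd in readable


def try_read(fd, size):
    """Read from a non-blocking descriptor; None stands for EAGAIN (no data yet)."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None
```

With `os.set_blocking(fd, False)` applied to the read end of an empty pipe, `ready_to_read` is false and `try_read` returns None. For a descriptor on a regular file, `ready_to_read` is always true and `try_read` never returns None, whether or not the data has actually been read from disk.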

If you want to be able to utilise the CPU while waiting for disk I/O
to occur, you have to use multiple kernel threads, with one thread for
each pending I/O operation, plus another one for computations (or
another one for each CPU if you want to obtain the full benefit of
an SMP system).

Even then, you still have to allow for the fact that user-space
memory is subject to swapping and demand-paging.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-20 Thread Glynn Clements


Keean Schupke wrote:

   read. I don't see the problem... (Okay, I can see that if select lies, 
   and the
   read takes a long time you might miss the next scheduling timeslot - but
   as far as I am aware select doesn't lie, and read will return immediately
   if select says there is data ready)...
   
   
 
 select() _does_ lie for ordinary files, e.g., disk files.  It
 assumes the data is immediately readable, even if it hasn't pulled it
 off disk yet.  If the ordinary file actually resides on an NFS
 volume, or CD, or something else slow, then you have a problem.
 
 But the kernel does read-ahead so this data should just be a buffer copy.

You can't rely upon the kernel always having read the data already. 
E.g. a program which performs trivial operations on large files may
well be able to consume the data faster than the kernel can obtain it.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-19 Thread Glynn Clements


Keean Schupke wrote:

 Okay, my ignorance of Posix is showing again. Is it currently the
 case, then, that every GHC thread will stop running while a disk read
 is in progress in any thread? Is this true on all platforms?
 
 It's true on Unix-like systems, I believe.  Even with -threaded.  It
 might not be true on Win32.
 
 I think this is not true on linux, where a thread is just a process created
 with special flags to keep the same fds and memory.
 
 As threads on linux are scheduled like processes, one thread blocking should
 not affect the others?

That should be true of all POSIX-like thread implementations
(including Linux, whose threads aren't quite POSIX-compliant, e.g. in
regard to signal handling, but aren't that far off).

Essentially, blocking system calls only block the calling kernel
thread.

OTOH, if you are implementing multiple user-space threads within a
single kernel thread, if that kernel thread blocks, all of the
user-space threads within it will be blocked.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

RE: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-19 Thread Glynn Clements


Simon Marlow wrote:

 We do use a thread pool.  But you still need as many OS threads as there
 are blocked read() calls, unless you have a single thread doing select()
 as I described.

How does the select() help? AFAIK, select() on a regular file or block
device will always indicate that it is readable, even if a subsequent
read() would have to read the data from disk.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-19 Thread Glynn Clements


Ben Rudiak-Gould wrote:

  GHC really needs non-blocking
  I/O to support its thread model, and memory-mapped I/O always blocks.
  
  If, by "blocks", you mean that execution will be suspended until the
  data has been read from the device into the buffer cache, then Unix
  non-blocking I/O (i.e. O_NONBLOCK) also blocks.
 
 Okay, my ignorance of Posix is showing again. Is it currently the case, 
 then, that every GHC thread will stop running while a disk read is in 
 progress in any thread?

The kernel thread which called read() will be blocked. If GHC threads
are userspace threads running within a single kernel thread, then they
will all block. If GHC uses multiple kernel threads, the other kernel
threads will continue to run.

 Is this true on all platforms?

Some platforms (but, AFAIK, not linux) allow asynchronous I/O on
regular files. NT has overlapped I/O, which is essentially the same
thing.

 There are two ways of reading from a file/stream in Win32 on NT. One is 
 asynchronous: the call returns immediately and you receive a 
 notification later that the read has completed. The other is synchronous 
 but almost-nonblocking: it returns as much data as is available, and 
 the entire contents of a file is considered always available. But it 
 always returns at least one byte, and may spend an arbitrary amount of 
 time waiting for that first byte. You can avoid this by waiting for the 
 handle to become signalled; if it's signalled then a subsequent ReadFile 
 will not block indefinitely.
 
 Win32's synchronous ReadFile is basically the same as Posix's (blocking) 
 read. For some reason I thought that Win32's asynchronous ReadFile was 
 similar to Posix's non-blocking read, but I gather from [1] that they're 
 completely different.

They're similar, but not identical. Traditionally, Unix non-blocking
I/O (along with asynchronous I/O, select() and poll()) were designed
for slow streams such as pipes, terminals, sockets etc. Regular
files and block devices are assumed to return the data immediately.

Essentially, for slow streams, you have to wait for the data to arrive
before it can be read, so waiting may take an indefinite amount of
time. For fast streams, the data is always available, you just
have to wait for the system call to give it to you.

IOW, the time taken to read from a block device is amortised into the
execution time of the system call, rather than being treated as a
delay.

Also, even with blocking I/O, slow streams only block if no data is
available. If less data is available than was requested, they will
usually return whatever is available rather than waiting until they
have the requested amount. Non-blocking I/O only affects the case
where no data is available.
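
A tiny Python sketch of the partial-read behaviour on a slow stream (a pipe here), assuming a POSIX system. The wrapper name `read_available` is made up for the example.

```python
import os


def read_available(fd, requested):
    """Issue a single blocking read(): it returns as soon as *some* data is available."""
    return os.read(fd, requested)
```

If three bytes have been written to a pipe, `read_available(read_end, 4096)` returns those three bytes at once rather than waiting for 4096; it would only block if the pipe were empty.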

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: I/O interface

2005-01-12 Thread Glynn Clements


Ferenc Wagner wrote:

 dup()-ed filehandles share a common file position.

They also share the file status flags (O_NONBLOCK, O_APPEND etc). So,
enabling or disabling non-blocking I/O will affect all descriptors
obtained by duplication (either by dup/dup2 or by fork).

OTOH, each descriptor has its own set of descriptor flags (i.e. the
close-on-exec flag).
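
The sharing rules above can be checked with a few fcntl() calls. This is a Python sketch for a POSIX system; the helper names are invented for illustration.

```python
import fcntl
import os


def is_nonblocking(fd):
    """True if O_NONBLOCK is set in the file status flags reached through fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


def set_nonblocking(fd):
    """Set O_NONBLOCK in the (shared) file status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def is_close_on_exec(fd):
    """True if FD_CLOEXEC is set in fd's own descriptor flags."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_close_on_exec(fd, enabled):
    """Set or clear FD_CLOEXEC in the (per-descriptor) descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)
```

After `d = os.dup(fd)`, calling `set_nonblocking(fd)` makes `is_nonblocking(d)` true as well, whereas `set_close_on_exec(fd, False)` leaves `is_close_on_exec(d)` unchanged.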

A related issue is that device state (e.g. terminal settings) is a
property of the device itself, and so is shared amongst all
descriptors which refer to the device regardless of whether they were
created by dup/dup2 or a separate open() call.

For this reason, hSetBuffering shouldn't be modifying the ICANON flag,
IMHO.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Parse text difficulty

2004-12-09 Thread Glynn Clements


Malcolm Wallace wrote:

 Prelude [1..5] `zipWith (+)` [7..]
 interactive:1: parse error on input `('
  
  is there a technical reason for this or did it just happen?
 
 If you are asking why general expressions are prohibited between
 backticks, yes, there is a reason.  The expression could be arbitrarily
 large, so you might have to search many lines to find the closing
 backtick.  But in such a situation, it is surely much more likely
 that the programmer has simply forgotten to close the ticks around
 a simple identifier.  Just think of the potential for delightfully
 baffling type error messages that might result!

There's also the issue that you wouldn't be allowed to use backticks
within such an expression, so you would need additional grammar rules
describing expressions which are allowed within backticks.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Top Level TWI's again was Re: [Haskell] Re: Parameterized Show

2004-11-23 Thread Glynn Clements


Keean Schupke wrote:

 Can a C function be pure? I guess it can... The trouble is you cannot
 prove it's pure?
 
 But - why would you want to use a pure C function.

Because it already exists? E.g. most BLAS/LAPACK functions are pure;
should they be re-written in Haskell?

[Yes, I know that BLAS/LAPACK are written in Fortran, but I don't
think that changes the argument. The resulting object code (which is
what you would actually be using) wouldn't be significantly different
if they were written in C.]

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Bug in touchForeignPtr?

2004-11-22 Thread Glynn Clements


Keean Schupke wrote:

  C exit routines aren't responsible for freeing OS resources; the OS
  is.
  
  The fact that the SysV IPC objects aren't freed on exit is
  intentional; they are meant to be persistent. For the same reason, the
  OS doesn't delete upon termination any files which the process
  created.

  
 Right, which is why if you want to clean up temporary files, or
 temporary semaphores the OS doesn't do it for you, and you
 need to put some routine in place to do it (using at_exit)... It
 seems this is the only way to guarantee something gets run when
 a program exits for whatever reason.

There isn't any way to *guarantee* that something is run upon
termination. The program may be terminated due to SIGKILL (e.g. due to
a system-wide lack of virtual memory). If you run out of stack, you
may not be able to call functions to perform clean-up.

Also, if the program crashes, handling the resulting SIGSEGV (etc) is
likely to be unreliable, as the memory containing the resource
references may have been trashed. Calling remove() on a filename which
might have been corrupted is inadvisable.

Also, at_exit() isn't standard. atexit() is ANSI C, but that is only
supposed to be called for normal termination (exit() or return from
main()), not for _exit() or fatal signals.
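
The same limitation can be seen with Python's `atexit` module, which is analogous to (not the same as) C's atexit(). The sketch below runs a child interpreter that registers a handler and then terminates in a given way; `CHILD` and `run_child` are names invented for this example.

```python
import subprocess
import sys

# Source of the child program; {exit_call} is replaced by the statement that ends it.
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran"))
{exit_call}
"""


def run_child(exit_call):
    """Run a child interpreter which registers an exit handler, then exits via exit_call."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(exit_call=exit_call)],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

`run_child("sys.exit(0)")` includes the handler's output, while `run_child("os._exit(0)")` does not; a fatal signal such as SIGKILL would bypass the handler in the same way.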

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: [Haskell-cafe] Sample rate inference

2004-11-11 Thread Glynn Clements


Henning Thielemann wrote:

The computation sample rate should be propagated through the network as
   follows:
 If in a component of equal sample rate some processors have the same
   fixed sample rate, all uncertain processors must adapt that. 
 If some processors have different fixed sample rates this is an error. 
 If no processor has a fixed sample rate, the user must provide one
   manually.
To me this looks very similar to type inference. Is there some mechanism
   in Haskell which supports this programming structure? 
  
  If you define a class for sample rates, and an instance for each
  possible sample rate, then you could use type inference,
 
 Interesting approach, though it's not good idea to restrict to some sample
 rates. It's also not necessary to do the inference at compile time. 

Ah. I think that I took your comparison to type inference too
literally.

  I doubt that this specific example would work in practice (the type
  inference would probably give the compiler a heart attack), but you
  could presumably construct an equivalent mechanism using base-N
  numerals.
 
 :-)
 
 How can one implement a sample rate inference that works at run-time for
 arbitrary rates? This will be the only way if one works with sampled
 sounds read from a file. 

This is essentially unification.

Haskell and ML use it for type inference and for pattern matching,
(although pattern matching is always unidirectional, i.e. unifying a
pattern comprised of both variables and constants with a value
comprised solely of constants). Prolog uses it more extensively
(variables can occur on either side).

Essentially, unification involves matching structures comprised of
constants, variables, and other structures. An unbound variable
matches anything, resulting in the variable becoming bound; a bound
variable matches whatever its value matches; a constant matches
itself; and a structure matches another structure if they have the
same number of components and all of their components match.

You could probably use GHC's type inference code, although converting
it for your purposes may be more work than starting from scratch. The
Hugs98 code contains a miniature prolog implementation, so you could
take the unification algorithm from that.
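
A minimal sketch of the run-time unification described above, specialised
to sample rates. The Rate type, its Known/Var constructors and the function
names are made up for illustration; they are not taken from GHC or Hugs.
Each connection in the network is a pair of rates that must agree:

```haskell
import Control.Monad (foldM)
import qualified Data.Map as M

-- A rate is either a known constant (in Hz) or a named unknown.
data Rate = Known Int | Var String
  deriving (Eq, Show)

-- Bindings discovered so far: variable name -> rate.
type Subst = M.Map String Rate

-- Follow variable bindings until reaching a constant or an unbound variable.
resolve :: Subst -> Rate -> Rate
resolve s (Var v) = case M.lookup v s of
  Just r  -> resolve s r
  Nothing -> Var v
resolve _ r = r

-- Unify two rates: an unbound variable matches anything and becomes bound,
-- a constant matches only itself.
unify :: Subst -> (Rate, Rate) -> Either String Subst
unify s (a, b) = case (resolve s a, resolve s b) of
  (Known x, Known y)
    | x == y    -> Right s
    | otherwise -> Left ("conflicting sample rates: "
                         ++ show x ++ " vs " ++ show y)
  (Var v, r)
    | r == Var v -> Right s
    | otherwise  -> Right (M.insert v r s)
  (r, Var v)     -> Right (M.insert v r s)

-- Unify every connection in the network, stopping at the first conflict.
unifyAll :: [(Rate, Rate)] -> Either String Subst
unifyAll = foldM unify M.empty
```

Any variable that still resolves to a Var after unifyAll corresponds to the
case where no processor fixes the rate, so the user has to supply one.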

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Space efficiency problem

2004-11-10 Thread Glynn Clements


Keith Wansbrough wrote:

  The problem is that there are
  so many iterations, that the program gets killed (kill -9) by the
  system.
 
 I'm not sure what you mean here - I've never encountered a system that
 kills processes with -9, other than at shutdown time.  Are you sure
 it's -9?

If a process exhausts its resource limits (as set with setrlimit()),
the kernel will typically kill it with SIGKILL. Also, if the available
system-wide memory gets too low, the kernel may start killing off
processes, again with SIGKILL.

When this occurs, the shell from which the process was spawned will
typically write "Killed" to the terminal.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Set of reals...?

2004-10-29 Thread Glynn Clements


MR K P SCHUPKE wrote:

 Double already has +Inf and -Inf; it's just that Haskell doesn't have
 (AFAIK) syntax to write them as constants.
 
   In the source for the GHC libraries it uses 1/0 for +Infinity
 and -1/0 for -Infinity, so I assume these are the official way to do it.
 
 Personally I would define nicer names:
 
   positiveInfinity :: Double
   positiveInfinity = 1/0
 
   negativeInfinity :: Double
   negativeInfinity = -1/0

Or just:

infinity = 1/0

and use -infinity for the negative.

One other nit: isn't the read/show syntax for Haskell98 types supposed
to be valid Haskell syntax?

From http://www.haskell.org/onlinereport/derived.html#derived-text

The result of show is a syntactically correct Haskell
expression containing only constants, given the fixity
declarations in force at the point where the type is declared.

[Note: the above sentence refers specifically to derived instances,
but induction would require that it also holds for base types.]

However:

Prelude> let infinity = 1/0 :: Double
Prelude> show infinity
"Infinity"
Prelude> read (show infinity) :: Double
Infinity
Prelude> Infinity

<interactive>:1: Data constructor not in scope: `Infinity'

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Is it safe to call getProcessExitCode more than once?

2004-10-28 Thread Glynn Clements


David Brown wrote:

Both [waitForProcess and getProcessExitCode] will throw
an exception if the process terminated on a signal.
  
  So if I terminate a process manually, I'll have to wait for
  the ExitCode to avoid a zombie process, and waiting for the
  ExitCode invariably throws an exception.
 
 It's just the way that Unix process management works.  I guess you have to
 catch the exception to handle it well.  This is part of the aspect that
 makes writing shells so complicated.

I think that Peter was referring primarily to the fact that the
Haskell interface to waitpid() throws an exception if the process
terminated due to a signal, not the fact that you have to reap
children to prevent the accumulation of zombies.

The C interface is that waitpid() (and similar) return a status code;
you can then use the macros from sys/wait.h to determine whether the
process terminated normally (e.g. via exit()) or abnormally (due to a
fatal signal), and to obtain either the exit code or the signal number
as appropriate.

The Haskell interface oversimplifies matters, making it easier to get
the exit code in the case of normal termination, but complicating the
handling of abnormal termination.
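
For comparison, a small sketch of how the same distinction looks through
System.Posix.Process, whose ProcessStatus type plays the role of the
sys/wait.h macros. The describeStatus name is hypothetical, and the
two-field Terminated constructor assumes a reasonably recent unix package
(older versions carry only the signal number):

```haskell
import System.Exit (ExitCode(..))
import System.Posix.Process (ProcessStatus(..))

-- Render a wait status, keeping normal and abnormal termination apart.
describeStatus :: ProcessStatus -> String
describeStatus (Exited ExitSuccess)     = "exited normally with code 0"
describeStatus (Exited (ExitFailure n)) = "exited normally with code " ++ show n
describeStatus (Terminated sig _)       = "killed by signal " ++ show sig
describeStatus (Stopped sig)            = "stopped by signal " ++ show sig
```

A status of this type is what getProcessStatus returns (wrapped in Maybe),
so no exception is needed to learn that the child died on a signal.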

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: [Haskell-cafe] Set of reals...?

2004-10-28 Thread Glynn Clements


Stijn De Saeger wrote:

 Now, for unions I tried the following: 
 to take the union of two BasicSets, just append them and contract the result.
 contracting meaning: merge overlapping intervals.
 
  contract :: Range -> Range -> BasicSet
  contract (x1,y1) (x2,y2) 
| x2 = y1 = if x2 = x1 then [(x1, (max y1 y2))] else 
   if y2 = x1 then [(x2, (max y1 y2))] else [(x2,y2), (x1,y1)]
| x1 = y2 = if x1 = x2 then [(x2, (max y1 y2))] else 
   if y1 = x2 then [(x1, (max y1 y2))] else [(x1,y1), (x2,y2)]
| x1 = x2 = [(x1,y1), (x2, y2)]
 
 
 Now generalizing this from Ranges to BasicSets is where i got stuck.
 In my limited grasp of haskell and FP, this contractSet function below
 is just crying for the use of a fold operation, but i can't for the
 life of me see how to do it.

As the result is a BasicSet, the accumulator would need to be a
BasicSet and the operator would need to have type:

BasicSet -> Range -> BasicSet

This can presumably be implemented as a fold on contract, so
contractSet would essentially be a doubly-nested fold.
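
A rough sketch of that shape, under some assumptions: Range is taken to be
a pair of Doubles and BasicSet a list of Ranges, and the inner operator
(insertRange, a hypothetical name) is written directly rather than in terms
of the contract function quoted above. Ranges that merely touch are treated
as overlapping:

```haskell
type Range    = (Double, Double)
type BasicSet = [Range]

-- Merge one range into a set of pairwise-disjoint ranges.
-- The inner fold absorbs every existing range that overlaps the new one.
insertRange :: BasicSet -> Range -> BasicSet
insertRange set r = merged : disjoint
  where
    (merged, disjoint) = foldr step (r, []) set
    step s@(x, y) (acc@(lo, hi), rest)
      | y < lo || hi < x = (acc, s : rest)               -- no overlap: keep s
      | otherwise        = ((min x lo, max y hi), rest)  -- overlap: widen acc

-- The outer fold feeds each range into the accumulated set in turn.
contractSet :: BasicSet -> BasicSet
contractSet = foldl insertRange []
```

For example, contractSet [(0,1),(2,3),(0.5,2.5)] collapses to [(0,3)]. The
order of the resulting ranges is not sorted; sort afterwards if that matters.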

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

RE: Process library and signals

2004-10-27 Thread Glynn Clements


Simon Marlow wrote:

 So basically you're saying that if runProcess is to be used in a
 system()-like way, that is the parent is going to wait synchronously for
 the child, then the parent should be ignoring SIGQUIT/SIGINT.  On the
 other hand, if runProcess is going to be used in a popen()-like way,
 then the parent should not be ignoring SIGQUIT/SIGINT.

Exactly.

 The current
 interface doesn't allow for controlling the behaviour in this way.

Yep.

 So the current signal handling in runProcess is wrong, and should
 probably be removed.  What should we have instead?  We could implement
 the system()-like signal handling for System.Cmd.system only, perhaps.

Well, probably for system and rawSystem.

The problem, as I see it, is that the Process library is meant to be
both flexible and portable. If you don't need the portability, you
already have the primitives in System.Posix, and separate fork/exec
will inevitably provide more flexibility than an all-in-one version.

If you provide system/rawSystem and runInteractive{Command,Process},
that's covered the most common cases (i.e. system() and popen()). So
what is runProcess for? If it doesn't do the signal handling, it's
only really suitable for popen-style usage.

Which is unfortunate; I can imagine a use for an intermediate
semi-raw system, which supports e.g. file redirection or even
command pipelines, but without using the shell (i.e. accepts the
argv[] individually). In particular, using the shell is risky if you
want to use untrusted data in the argument list (e.g. CGI programs).

If runProcess doesn't do the signal handling between the fork and the
exec, you can't change the child's signal handling after the exec. You
could change the signal handling of the parent (i.e. the current
process) before calling runProcess, let the child inherit it, then
change it back again after runProcess returns, but that gives rise to
a potential race condition.

One possibility would be to allow an extra argument of type IO () (or
Maybe (IO ()), where Nothing is shorthand for Just $ return ()) which
would be executed between the fork and the exec on Unix and ignored on
Windows. AFAICT, that would expose the full functionality available on
Unix without interfering with Windows usage or adding complexity.
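
A Unix-only sketch of that idea, built on the System.Posix primitives
rather than on the Process library itself. The names spawnWith,
resetSignals and waitExitCode are hypothetical; the extra IO () argument is
simply run in the child between the fork and the exec:

```haskell
import System.Exit (ExitCode(..))
import System.Posix.Process
  (forkProcess, executeFile, getProcessStatus, ProcessStatus(..))
import System.Posix.Signals (installHandler, Handler(Default), sigINT, sigQUIT)
import System.Posix.Types (ProcessID)

-- Fork, run the caller's hook in the child, then exec the program.
-- The True argument asks executeFile to search PATH for the program.
spawnWith :: IO () -> FilePath -> [String] -> IO ProcessID
spawnWith preExec prog args =
  forkProcess (preExec >> executeFile prog True args Nothing)

-- Example hook: give the child default SIGINT/SIGQUIT handling,
-- whatever the parent is currently doing with those signals.
resetSignals :: IO ()
resetSignals = do
  _ <- installHandler sigINT  Default Nothing
  _ <- installHandler sigQUIT Default Nothing
  return ()

-- Block until the child finishes; Nothing means it did not exit normally.
waitExitCode :: ProcessID -> IO (Maybe ExitCode)
waitExitCode pid = do
  st <- getProcessStatus True False pid
  return $ case st of
    Just (Exited code) -> Just code
    _                  -> Nothing
```

Passing return () as the hook gives plain fork/exec behaviour, which is the
Nothing case mentioned above.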

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: Is it safe to call getProcessExitCode more than once?

2004-10-27 Thread Glynn Clements


Peter Simons wrote:

   Both [waitForProcess and getProcessExitCode] will throw
   an exception if the process terminated on a signal.
 
 So if I terminate a process manually, I'll have to wait for
 the ExitCode to avoid a zombie process, and waiting for the
 ExitCode invariably throws an exception.
 
 Or do I misunderstand something?

No, that seems correct.

Although, depending upon the OS, setting SIGCHLD to SIG_IGN may cause
processes to be reaped automatically (i.e. not become zombies), so
that's a possible alternative.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: [Haskell-cafe] Re: exitFailure under forkProcess

2004-10-27 Thread Glynn Clements


John Goerzen wrote:

 Oh also, I would very much appreciate Haskell interfaces to realpath()
 and readlink().

I don't know about realpath() (which is a BSD-ism, and included in GNU
libc, but I'm not sure about other Unices), but readlink() exists as
System.Posix.readSymbolicLink.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

RE: [Haskell-cafe] Re: exitFailure under forkProcess

2004-10-27 Thread Glynn Clements


Simon Marlow wrote:

  Yes.  Its POSIX interface is, uhm, weird.  I can't quite put my finger
  on it, but things like setting up a pipe to a child process's stdin
  just seem brittle and fragile with all sorts of weird errors.  I can
  do this in my sleep in C, Perl, or Python but in Haskell I can barely
  make it work when I'm fully conscious :-)
 
 *laughs*
 
 Is there anything concrete we can do?  The POSIX layer is supposed to be
 pretty minimal, so in theory most POSIX idioms should not be harder in
 Haskell, and hopefully should be easier.

Part of the problem is that you can't always consider the use of
individual POSIX functions in isolation. Things which are done
(possibly unknowingly) in one place might affect the way in which
other system calls behave.

One major issue is the way in which fork() has global consequences.

E.g. if a library has file descriptors for internal use, fork() will
duplicate them. If the library subsequently closes its copy of the
descriptor, but the inherited copy (which the child may not even know
exists) remains open, the file (socket, device, etc) will remain open.

Another example of this is the interaction between buffered streams
and descriptors. If a process forks while unflushed data remains in
a stream, the data may be written twice. This can be quite serious if
the stream corresponds to some form of control channel (i.e. a pipe or
socket communicating with another process).
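
One common precaution against the duplicated-output problem is to flush
before forking. A minimal sketch, assuming the only buffered streams in
play are stdout and stderr (forkFlushed is a hypothetical name; any other
open Handle would need the same treatment by whoever owns it):

```haskell
import System.IO (hFlush, stdout, stderr)
import System.Posix.Process (forkProcess)
import System.Posix.Types (ProcessID)

-- Flush the standard streams so the child does not inherit pending output,
-- then fork and run the given action in the child.
forkFlushed :: IO () -> IO ProcessID
forkFlushed child = do
  hFlush stdout
  hFlush stderr
  forkProcess child
```

This does nothing for buffers held privately by a library, which is the
cooperation problem discussed here.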

Ultimately, the only real solution to such issues is to ensure that
any high-level functionality provides a sufficient level of
cooperation with lower-level code, e.g. allowing it to be
synchronised, or at least shut down into a state such that it
doesn't interfere, ensuring that it doesn't hide unnecessary details
which may actually be necessary in more involved programs, etc.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Set of reals...?

2004-10-27 Thread Glynn Clements


Stijn De Saeger wrote:

 Thanks for the explanation, at first it seemed like enumFromThenTo
 would indeed give me the functionality I am looking for. But then all
 of GHCi started acting weird while playing around... this is a
 copy-paste transcript from the terminal.
 
 *S3> 0.5 `elem` [0.0,0.1..1.0]
 True
 *S3> 0.8 `elem` [0.6,0.7..1.0]
 False
 *S3> 0.8 `elem` [0.6,0.7..1.0]
 False
 *S3> [0.6,0.7..0.9]
 [0.6,0.7,0.7999,0.8999]
 *S3> 
 
 

Floating point has limited precision, and uses binary rather than
decimal, so you can't exactly represent multiples of 1/10 as
floating-point values. Internally, the elements of the list would
actually be out by a relative error of ~2e-16 for double-precision,
~1e-7 for single precision, but the code which converts to decimal
representation for printing rounds it.

However, Haskell does support rationals:

Prelude> [6/10 :: Rational,7/10..9/10]
[3 % 5,7 % 10,4 % 5,9 % 10]
Prelude> 4/5 `elem` [6/10 :: Rational,7/10..9/10]
True

 in your reply you wrote :
  However, you can't specify infinitesimally small steps, nor increment
  according to the resolution of the floating point type (at least, not
  using the enumeration syntax; you *could* do it manually using integer
  enumerations and encodeFloat, but that wouldn't be particularly
  practical).
 
 Is this what you were referring to? i wouldn't say 0.1 is an
 infinitesimal small step.

No; you could realistically use much smaller steps than that. My point
was that you can't realistically use sufficiently small steps that
values won't fall through the cracks:

Prelude> 0.61 `elem` [0.6,0.7..0.9]
False

Whilst you could, without too much effort, enumerate a range of
floating-point values such that all intermediate values were included,
the resulting list would be massive. Single precision floating-point
uses a 24-bit mantissa, so an exhaustive iteration of the range
[0.5..1.0] would have 2^23+1 elements (the values in that range are
spaced 2^-24 apart).
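
A sketch of the "integer enumeration plus encodeFloat" approach mentioned
in the quoted text, for single precision only. The function name
floatsInBinade is made up; it relies on Float having a 24-bit mantissa, so
every integer mantissa in the enumerated range is exactly representable:

```haskell
-- Every Float in the closed range [2^(e-1), 2^e], in increasing order.
-- floatsInBinade 0 therefore enumerates [0.5, 1.0] with nothing skipped.
floatsInBinade :: Int -> [Float]
floatsInBinade e =
  [ encodeFloat m (e - 24) | m <- [2 ^ (23 :: Int) .. 2 ^ (24 :: Int)] ]
```

With this, 0.61 `elem` floatsInBinade 0 is True, but only after walking
several million elements, which is why it isn't particularly practical.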

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: exitFailure under forkProcess

2004-10-27 Thread Glynn Clements


John Goerzen wrote:

 I wonder what the behavior of fwrite() in this situation is.  I don't
 know if it ever performs buffering such that write() is never called
 during a call to fwrite().

fwrite() is no different to other stdio functions in this regard. If
the stream is buffered, a call to fwrite() may simply result in data
being appended to the buffer; it doesn't guarantee a call to write().

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Is it safe to call getProcessExitCode more than once?

2004-10-26 Thread Glynn Clements


Peter Simons wrote:

 John Goerzen writes:
 
   Assuming it is based on wait() or one of its derivatives,
   and I suspect it is, you cannot call it more than once
   for a single process.
 
 That's what I _assume_, too, but a definite answer would be
 nice. 
 
 In the meanwhile, I have found out that it might not be safe
 to call it once, even:
 
   CaughtException waitForProcess: does not exist (No child processes)
 
 That's a child I _did_ start and which apparently terminated
 before I called waitForProcess. Shouldn't I be getting the
 exit code of that process rather than an exception?

I can think of two reasons why this might be happening:

1. SIGCHLD is being ignored (SIG_IGN); the Process library doesn't
appear to be doing this, but something else might.

2. Something else (e.g. the RTS) is handling SIGCHLD and reaping the
process automatically.

 Do waitForProcess and getProcessExitCode differ in their
 behavior other than that one blocks and other doesn't?

Both call waitpid(); getProcessExitCode uses WNOHANG, while
waitForProcess doesn't.

They differ in their handling of errors. waitForProcess will throw an
exception if waitpid() indicates any error (except EINTR, where it
just retries the waitpid() call), whereas getProcessExitCode will
return Nothing. Both will throw an exception if the process terminated
on a signal.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Process library and signals

2004-10-26 Thread Glynn Clements


Having looked at the latest version of the Process library, it appears
that my earlier comments about signal handling may have been
misinterpreted.

First, my comments regarding the handling of SIGINT/SIGQUIT were
specific to system(). The C system() function ignores these signals in
the parent while the child is executing. However, this doesn't
necessarily apply to other functions; e.g. popen() doesn't ignore
these signals, and runProcess probably shouldn't either.

With system(), the parent blocks until the child is finished, so if
the user presses Ctrl-C to kill the currently executing process,
they probably want to kill the child. If the parent wants to die on
Ctrl-C, it can use WIFSIGNALED/WTERMSIG to determine that the child
was killed and terminate itself.

OTOH, with popen(), the parent continues to run alongside the child,
with the child behaving as a slave, so the parent will normally want
to control the signal handling.

Ideally, system() equivalents (e.g. system, rawSystem) would ignore
the signals in the parent, popen() equivalents (e.g. 
runInteractiveProcess) wouldn't, and lower-level functions (e.g. 
runProcess) would give you a choice.

Unfortunately, there is an inherent conflict between portability and
generality, as the Unix and Windows interfaces are substantially
different. Unix has separate fork/exec primitives, with the option to
execute arbitrary code between the two, whilst Windows has a single
primitive with a fixed set of options.

Essentially, I'm not sure that a Windows-compatible runProcess would
be sufficiently general to accurately implement both system() and
popen() equivalents on Unix. Either system/rawSystem should be
implemented using lower-level functions (i.e. not runProcess) or
runProcess needs an additional option to control the handling of
signals in the child.

Also, my comment regarding the signals being reset in the child was
inaccurate. system() doesn't reset them in the sense of SIG_DFL. It
sets them to SIG_IGN before the fork(), recording their previous
handlers. After the fork, it resets them in the child to the values
they had upon entry to the system() function (i.e. to the values they
had before they were ignored). The effect is as if they had been set
to SIG_IGN in the parent after the fork(), but without the potential
race condition.

Thus, if they were originally ignored in the parent before system()
was entered, they will be ignored in the child. If they were at their
defaults (SIG_DFL) before system() was entered, they will be so in the
child. If they had been set to specific handlers, system() will
restore those handlers in the child, but then execve() will reset them
to SIG_DFL, as the handler functions won't exist after the execve().

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

RE: [Haskell-cafe] Are handles garbage-collected?

2004-10-24 Thread Glynn Clements


Conal Elliott wrote:

   What happens when a System.IO.Handle falls out of scope
   without being explicitly hClosed? Is that a resource leak?
   Or will the RTS close the handle for me?
  
  AFAIK, Handles have finalisers which close them, but I don't know if GHC
  triggers garbage collection when file descriptors run out.  If not, you
  will have problems if you manage to run out of fds between GCs.
  
  How about using bracket to introduce explicit close on end of scope?
 
 I'm puzzled why explicit bracketing is seen as an acceptable solution.
 It seems to me that bracketing has the same drawbacks as explicit memory
 management, namely that it sometimes retains the resource (e.g., memory
 or file descriptor) longer than necessary (resource leak) and sometimes
 not long enough (potentially disastrous programmer error).  Whether the
 resource is system RAM, file descriptors, video memory, fonts, brushes,
 bitmaps, graphics contexts, 3D polygon meshes, or whatever, I'd like GC
 to track the resource use and free unused resources correctly and
 efficiently.

File descriptors aren't simply a resource in the sense that memory
is. Closing a descriptor may have significance beyond the process
which closes it. If it refers to the write end of a pipe or socket,
closing it may cause the reader to receive EOF; if it refers to a
file, any locks will be released; and so on.
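
Because the moment of closing is observable from outside, it is usually
worth making it explicit. A minimal sketch using bracket, as suggested in
the quoted text (writeLines is a hypothetical helper):

```haskell
import Control.Exception (bracket)
import System.IO (IOMode(WriteMode), openFile, hClose, hPutStrLn)

-- Write lines to a file. The handle is closed as soon as the action ends,
-- whether it returns normally or throws, rather than at some later GC.
writeLines :: FilePath -> [String] -> IO ()
writeLines path ls =
  bracket (openFile path WriteMode) hClose $ \h ->
    mapM_ (hPutStrLn h) ls
```

By the time writeLines returns, a reader of the file (or of a pipe, if the
handle referred to one) can rely on the descriptor having been closed.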

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: ANNOUNCE: GHC version 6.2.2

2004-10-15 Thread Glynn Clements


Simon Marlow wrote:

=
 The (Interactive) Glasgow Haskell Compiler -- version 6.2.2
=
 
 The GHC Team is pleased to announce the latest patchlevel release of
 GHC, 6.2.2.  This is a bugfix release only, there are no new features.
 Code that worked with 6.2.1 will work unchanged with 6.2.2.

Should it be possible to obtain this via CVS? My attempts to update
from 6.2.1 with cvs update -r ghc-6-2-2 ... fail with:

cvs [server aborted]: cannot write /cvs/CVSROOT/val-tags: Read-only file system

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

[Haskell] Re: ANNOUNCE: GHC version 6.2.2

2004-10-15 Thread Glynn Clements


Simon Marlow wrote:

=
 The (Interactive) Glasgow Haskell Compiler -- version 6.2.2
=
 
 The GHC Team is pleased to announce the latest patchlevel release of
 GHC, 6.2.2.  This is a bugfix release only, there are no new features.
 Code that worked with 6.2.1 will work unchanged with 6.2.2.

Should it be possible to obtain this via CVS? My attempts to update
from 6.2.1 with cvs update -r ghc-6-2-2 ... fail with:

cvs [server aborted]: cannot write /cvs/CVSROOT/val-tags: Read-only file system

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell

Re: [Haskell] threading mutable state through callbacks

2004-10-12 Thread Glynn Clements


Vincenzo Ciancia wrote:

  Unfortunately, in this case the whole point of what people are trying
  to do with unsafePerformIO is to allow these things to be visible at
  the top level :-)
 
 Sometimes I get too much involved in what I think about, and forget the 
 original goal :) A little _too_ naive, it seems, I apologize. So it's 
 like the original idea, that using these toplevel IO bindings one has 
 to impose an order of evaluation over all program bindings, which 
 surely is against the current meaning of haskell programs, e.g. if I 
 say
 
 conf <- readMyConfFile
 init = fn conf
 
 people would agree that the correct meaning is to first evaluate all of 
 the IO bindings and then the rest of the program:
 
 x1 <- a1
 ...
 xn <- an
 
 v1 = expr1
 ...
 vn = exprn
 
 main = action
 
 should be equivalent to
 
 main = do
  x1 <- a1
  ...
  xn <- an
  let v1 = expr1
  ...
  vn = exprn in
action
 
 This would not change the meaning of a standard haskell program I think 
 (but I am not an expert as you see). Am I wrong?

In the former, the variables have global scope, and may be exported
from the module. Also, what if you do this in a module other than
Main?

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell

Re: [Haskell-cafe] Re: What Functions are Standard?

2004-10-06 Thread Glynn Clements


Malcolm Wallace wrote:

   I can't comment on nhc98, but the Haskell98 standard doesn't include
   any mechanism for binary I/O.
  
  Ouch.  That seems like a major oversight to me.  Will there be any
  effort to fix that in the future?
 
 Note that, on Unix-like systems, there is no difference between
 text I/O and binary I/O on files.  It is only Windows that requires
 a separation of the modes.

There are two issues here.

The first is EOL conversion; as Malcolm notes, this isn't an issue on
Unix, but it is an issue on Windows. On Windows, there is no standard
way to obtain the contents of a file such that \n and \r\n are
distinct.

The second is character encoding/decoding. The Haskell98 I/O functions
all deal with Chars. When reading a file, the byte stream is converted
to a list of characters using an *unspecified* encoding. AFAIK, all
implementations are currently hardcoded to assume ISO-8859-1, so you
can reliably obtain the original list of bytes using the ord function.

However, nothing in the standard dictates that ISO-8859-1 is used, and
there has been talk of using the locale's encoding instead. If that
were to happen, it would be practically (as well as theoretically)
impossible to perform binary I/O using the Haskell98 API, even on
Unix.
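
For what it's worth, here is a sketch of byte-exact I/O that avoids both
issues by using GHC's binary-mode handles. This is not Haskell98:
withBinaryFile comes from GHC's System.IO, and writeBytes/readBytes are
hypothetical names. In binary mode each Char in the range 0-255 stands for
exactly one byte and no EOL translation takes place:

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO (IOMode(..), withBinaryFile, hPutStr, hGetContents)

-- Write raw bytes, one Char per byte.
writeBytes :: FilePath -> [Word8] -> IO ()
writeBytes path bytes =
  withBinaryFile path WriteMode $ \h ->
    hPutStr h (map (chr . fromIntegral) bytes)

-- Read raw bytes back, recovering each byte with ord.
readBytes :: FilePath -> IO [Word8]
readBytes path =
  withBinaryFile path ReadMode $ \h -> do
    s <- hGetContents h
    let bytes = map (fromIntegral . ord) s
    -- Force the whole list before the handle is closed (hGetContents is lazy).
    length bytes `seq` return bytes
```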

This issue has been beaten to death fairly recently, so I'm not going
to repeat it here. See the thread entitled "Writing binary files?" from
Sep 11-18 for the details.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] What Functions are Standard?

2004-10-05 Thread Glynn Clements


John Goerzen wrote:

 Hello,
 
 I have been writing code using the docs over at
 http://www.haskell.org/ghc/docs/latest/html/libraries/index.html, which
 is the only comprehensive library reference I could find.
 
 I am using some code from System.IO, supposedly from base.  When I try
 to build this with nhc98, it doesn't know about hGetBuf, hPutBuf, or
 openBinaryFile from there or about mallocForeignPtrArray from the
 Foreign.* area.  But all these look standard to me.

They aren't; they are GHC extensions, except for
mallocForeignPtrArray, which is specified by the FFI addendum:

http://www.cse.unsw.edu.au/~chak/haskell/ffi/

 What am I missing here?  Does nhc98 really completely lack the ability
 to read binary data from a file?

I can't comment on nhc98, but the Haskell98 standard doesn't include
any mechanism for binary I/O.

 Or where should I be finding it, and
 how could I have known for myself that those particular ghc functions
 were unsupported elsewhere?

The Haskell98 report can be found at:

http://www.haskell.org/onlinereport/

Anything which isn't listed there is essentially a vendor extension.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: newCString -- to 'free' or not?

2004-09-25 Thread Glynn Clements


Peter Simons wrote:

 When I create a CString with Foreign.C.String.newCString, do
 I have to 'free' it after I don't need it anymore? Or is
 there some RTS magic taking place?
 
 How about Foreign.Marshal.Utils.new and all those other
 newXYZ functions? 

Yes. The new* functions allocate the memory with malloc, and you have
to free it yourself. OTOH, the with* functions allocate the memory
with alloca, and it is freed automatically.

Also, a ForeignPtr includes a finaliser which will free the data
automatically when it is no longer referenced.
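
A short sketch contrasting the two styles. The roundTrip names are made up
for illustration; the Foreign functions are the standard ones:

```haskell
import Control.Exception (bracket)
import Foreign.C.String (newCString, withCString, peekCString)
import Foreign.Marshal.Alloc (free)

-- new* style: the caller owns the memory and must free it.
-- bracket ensures the free happens even if the action throws.
roundTripNew :: String -> IO String
roundTripNew s = bracket (newCString s) free peekCString

-- with* style: the memory lives only for the duration of the action,
-- so the pointer must not escape from it.
roundTripWith :: String -> IO String
roundTripWith s = withCString s peekCString
```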

-- 
Glynn Clements [EMAIL PROTECTED]
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

RE: [Haskell-cafe] Re: Writing binary files?

2004-09-17 Thread Glynn Clements


MR K P SCHUPKE wrote:

 You wouldn't want to have to accumulate the
 entire body as a single byte string
 
 Ever heard of laziness? Haskell does it quite well... Accumulating
 the entire body doesn't really do this because haskell is lazy. You
 don't need a more complex interface in Haskell!

Are you sure that will work in the general case? Or are you assuming
lazy I/O?

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Writing binary files?

2004-09-17 Thread Glynn Clements


Marcin 'Qrczak' Kowalczyk wrote:

  What I'm suggesting in the above is to sidestep the encoding issue
  by keeping filenames as byte strings wherever possible.
 
 Ok, but let it be in addition to, not instead of treating them as
 character strings.

Provided that you know the encoding, nothing stops you converting them
to strings, should you have a need to do so.

  Processing data in their original byte encodings makes supporting
  multiple languages harder. Filenames which are inexpressible as
  character strings get in the way of clean APIs. When considering only
  filenames, using bytes would be sufficient, but overall it's more
  convenient to Unicodize them like other strings.
 
  It also harms reliability. Depending upon the encoding, two distinct
  byte strings may have the same Unicode representation.
 
 Such encodings are not suitable for filenames.

Regardless of whether they are suitable, they are used.

 For me ISO-2022 is a brain-damaged concept and should die.

Well, it isn't likely to.

I haven't addressed any of the other stuff about ISO-2022, as it isn't
really relevant. Whether ISO-2022 is good or bad doesn't matter; what
matters is that it is likely to remain in use for the foreseeable
future.

  Such tarballs are not portable across systems using different encodings.
 
  Well, programs which treat filenames as byte strings to be read from
  argv[] and passed directly to open() won't have any problems with this.
 
 The OS itself may have problems with this; only some filesystems
 accept arbitrary bytes apart from '\0' and '/' (and with the special
 meaning for '.'). Exotic characters in filenames are not very
 portable.

No, but most Unix programs manage to handle them without problems.

  A Haskell program in my world can do that too. Just set the encoding
  to Latin1.
 
  But programs should handle this by default, IMHO.
 
 IMHO it's more important to make them compatible with the
 representation of strings used in other parts of the program.

Why?

  Filenames are, for the most part, just tokens to be passed around.
 
 Filenames are often stored in text files,

True.

 whose bytes are interpreted as characters.

Sometimes true, sometimes not.

Where filenames occur in data files, e.g. configuration files, the
program which reads the configuration file typically passes the bytes
directly to the OS without interpretation.

 Applying QP to non-ASCII parts of filenames is suitable
 only if humans won't edit these files by hand.

Who said anything about QP?

   My specific point is that the Haskell98 API has a very big problem due
   to the assumption that the encoding is always known. Existing
   implementations work around the problem by assuming that the encoding
   is always ISO-8859-1.
  
  The API is incomplete and needs to be enhanced. Programs written using
  the current API will be limited to using the locale encoding.
 
  That just adds unnecessary failure modes.
 
 But otherwise programs would continuously have bugs in handling text
 which is not ISO-8859-1, especially with multibyte encoding where
 pretending that ISO-8859-2 is ISO-8859-1 too often doesn't work.

Why?

 I can't switch my environment to UTF-8 yet precisely because too many
 programs were written with the attitude you are promoting: they don't
 care about the encoding, they just pass bytes around.

That's all that many programs should be doing.

 Bugs range from small annoyances like tabular output which doesn't
 line up, through mangled characters on a graphical display, to
 full-screen interactive programs being unusable on a UTF-8 terminal.

IOW:

1. display doesn't work correctly,
2. display doesn't work correctly, and
3. display doesn't work correctly.

You keep citing cases involving graphical display as a reason why all
programs should be working with characters all of the time.

I haven't suggested that programs should never deal with characters,
yet you keep insinuating that is my argument, then proceed to attack
it.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements

-8859-2 unless specified otherwise.
 
  1. In that situation, you can't avoid the encoding issues. It doesn't
  matter what the default is, because you're going to have to set the
  encoding anyhow.
 
 Why do you always want me to set the encoding?  That should be the job
 of the RTS.

Because you might know the encoding, and the RTS doesn't. The locale
is a fallback mechanism, for the situation where you *need* an
encoding but one hasn't been specified by other means.

  2. If you assume ISO-8859-1, you can always convert back to Word8
 
 If I want a list of Word8's, then I should be able to get them without
 extracting them from a string.

The point is that, currently, you can't. Nothing in the core Haskell98
API actually uses Word8, it all uses Char/String.

  then re-decode as UTF-8. If you assume UTF-8, anything which is neither
  UTF-8 nor ASCII will fail far more severely than just getting the
  collation order wrong.
 
 If I use String's to handle binary data, then I should expect things
 to break.  If I want to get text, and it's not in the expected
 encoding, then the user has messed up.

Or maybe the expectation is incorrect.

  Well, my view is essentially that files should be treated as
  containing bytes unless you explicitly choose to decode them, at
  which point you have to specify the encoding.
 
 Why do you always want me to _manually_ specify an encoding?

Because we don't have an oracle which will magically determine the
encoding for you.

 If I
 want bytes, I'll use the (currently being discussed, see beginning of
 this thread) binary I/O API, if I want String's (i.e. text), I'll use
 the current I/O API (which is pretty text-orientated anyway, see
 hPutStrLn, hGetLine, ...).

If you want text, well, tough; what comes out of most system calls and
core library functions (not just read()) are bytes. There isn't any
magic wand which will turn them into characters without knowing the
encoding.

  completely new wide-character API for those who wish to use it.
 
 Which would make it horrendously difficult to do even basic I18N.

Why?

  That gets the failed attempt at I18N out of everyone's way with a
  minimum of effort and with maximum backwards compatibility for
  existing code.
 
 If existing code, expects String's to be just a list of bytes, it's
 _broken_.

I know. That's what I'm saying. The problem is that the broken code
is the Haskell98 API.

  String's are a list of unicode characters, [Word8] is a
 list of bytes.

And what comes out of (and goes into) most core library functions is
the latter.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-16 Thread Glynn Clements

, to be used in an unfriendly
 environment.
 
 A Haskell program in my world can do that too. Just set the encoding
 to Latin1.

But programs should handle this by default, IMHO. Filenames are, for
the most part, just tokens to be passed around. You get a value from
argv[], and pass it to open() or whatever. It doesn't need to have any
meaning.

  My specific point is that the Haskell98 API has a very big problem due
  to the assumption that the encoding is always known. Existing
  implementations work around the problem by assuming that the encoding
  is always ISO-8859-1.
 
 The API is incomplete and needs to be enhanced. Programs written using
 the current API will be limited to using the locale encoding.

That just adds unnecessary failure modes.

 Just as ReadFile is limited to text files because of line endings.
 What do you prefer: to provide a non-Haskell98 API for binary files,
 or to fix the current API by forcing programs to use \r\n on
 Windows and \n on Unix manually?

That's a harder case. There is a good reason for auto-converting EOL,
as most programs actually process file contents. Most programs don't
process filenames; they just pass them around.

  If filenames were expressed as bytes in the Haskell program, how would
  you map them to WinAPI? If you use the current Windows code page, the
  set of valid characters is limited without a good reason.
 
  Windows filenames are arguably characters rather than bytes. However,
  if you want to present a common API, you can just use a fixed encoding
  on Windows (either UTF-8 or UTF-16).
 
 This encoding would be incompatible with most other texts seen by the
 program. In particular reading a filename from a file would not work
 without manual recoding.

We already have that problem; you can't read non-Latin1 strings from
files.

In some regards, the problem is worse on Windows, because of the
prevalence of non-ASCII text (Windows 12xx and smart quotes), so
using UTF-8 for file contents on Windows is even harder.

  Which is a pity. ISO-2022 is brain-damaged because of enormous
  complexity,
 
  Or, depending upon one's perspective, Unicode is brain-damaged because,
  for the sake of simplicity, it over-simplifies the situation. The
  over-simplification is one reason for its lack of adoption in the CJK
  world.
 
 It's necessary to simplify things in order to make them usable by
 ordinary programs. People reject overly complicated designs even if
 they are in some respects more general.
 
 ISO-2022 didn't catch on - about the only program I've seen which tries
 to fully support it is Emacs.

And X. Compound text is ISO-2022. For commercial X software, Motif
(which uses compound text) is still the most widely-used toolkit.

But, then, the fact that you haven't seen many ISO-2022 programs is
probably because you're used to using programs developed by and for
Westerners. In the far East, ISO-2022 is by far the most popular
encoding. There, you could realistically ignore all other encodings.

BTW, that's why Emacs (and XEmacs) support ISO-2022 much better than
they do UTF-8. Because MuLE was written by Japanese developers.

  Multi-lingual text consists of distinct sections written in distinct
  languages with distinct alphabets. It isn't actually one big chunk
  in a single global language with a single massive alphabet.
 
 Multi-lingual text is almost context-insensitive. You can copy a part
 of it into another text, even written in another language, and it will
 retain its alphabet - this is much harder with stateful ISO-2022.
 
 ISO-2022 is wrong not by distinguishing alphabets but by being
 stateful.

Sure, the statefulness adds complexity (which is one of the reasons so
many people prefer to work with UTF-8), but it has the benefit of
providing distinct markers to indicate where the character set is
being switched (that isn't a compelling advantage; you could
reconstruct the markers if you could uniquely determine the character
set for each character).

OTOH, Unicode is wrong by not distinguishing character sets. This is a
significant reason why it hasn't been adopted in the far East
(specifically, Han unification).

  and ISO-8859-x have small repertoires.
 
  Which is one of the reasons why they are likely to persist for longer
  than UTF-8 true believers might like.
 
 My I/O design doesn't force UTF-8, it works with ISO-8859-x as well.

But I was specifically addressing Unicode versus multiple encodings
internally. The size of the Unicode alphabet effectively prohibits
using codepoints as indices.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements

 locale setting.  If I want to read text and
 can't determine the encoding by other ways (protocol spec, ...), then
 it's what the user set his locale setting to.

No. An oracle would always get it right. The locale merely provides
a fallback.

  If you want text, well, tough; what comes out of most system calls and
  core library functions (not just read()) are bytes.
 
 Which need to be interpreted by the program depending on where these
 bytes come from.

They don't necessarily need to be interpreted. A lot of data simply
gets routed from one place to another. E.g. a program reads a
filename from argv[i] and passes it to open(). It doesn't matter if
the filename is in Klingon.

  There isn't any magic wand which will turn them into characters
  without knowing the encoding.
 
 If I know the encoding, I should be able to set it.  If I don't, it's
 the locale setting.

If you *need* an encoding, and don't have any better information, then
the locale provides a last resort. Decoding bytes according to the
locale for the sake of it just adds an unnecessary failure mode.

   completely new wide-character API for those who wish to use it.
  
  Which would make it horrendously difficult to do even basic I18N.
 
  Why?
 
 Having different types for single-byte and multi-byte strings together
 with separate functions to handle them (that's what I assume you mean
 by a new wide-character API) with single-byte strings being the
 preferred one (the cause of being a separate API) would make sorting,
 upper/lower case testing etc. not exactly easier.

For case testing, locale-dependent sorting and the like, you need to
convert to characters. [Although possibly only temporarily; you can
sort a list of byte strings based upon their corresponding character
strings using sortBy. This means that a decoding failure only means
that the ordering will be wrong. This is essentially what happens with
ls if you have filenames which aren't valid in the current locale.]

Note: there are still situations where sorting bytes makes sense, i.e. 
where you only need *an* ordering rather than a specific ordering,
e.g. uniq.

  I know. That's what I'm saying. The problem is that the broken code
  is the Haskell98 API.
 
 No, it's not broken.  It just has some missing features (i.e. I/O /
 env functions accepting bytes instead of strings).

It's broken. Being able to represent filenames as byte strings is
fundamental. Being able to convert them to or from character strings
is useful but not essential. The only reason why the existing API
doesn't cause serious problems is because the translation is currently
hardwired to an encoding which can't fail.

   String's are a list of unicode characters, [Word8] is a
  list of bytes.
 
  And what comes out of (and goes into) most core library functions is
  the latter.
 
 Strictly speaking, the former comes out with the semantics of the
 latter. :-)

By core library functions, I was referring primarily to libc, not
the Haskell library functions which were built upon them. The Haskell
developers can change Haskell, they can't change libc.

-- 
Glynn Clements [EMAIL PROTECTED]

RE: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements


Simon Marlow wrote:

  Which is why I'm suggesting changing Char to be a byte, so that we can
  have the basic, robust API now and wait for the more advanced API,
  rather than having to wait for a usable API while people sort out all
  of the issues.
 
 An easier way is just to declare that the existing API assumes a Latin-1
 encoding consistently.  Later we might add a way to let the application
 pick another encoding, or request that the I/O library uses the locale
 encoding.  

But how do you do that without breaking stuff? If the application
changes the encoding to UTF-8 (either explicitly, or by using the
locale's encoding when it happens to be UTF-8), then code such as:

    [filename] <- getArgs
    openFile filename ReadMode

will fail if filename isn't a valid UTF-8 sequence. Similarly for the
other cases where the OS accepts/returns byte strings but the Haskell
interface uses String.

Currently, the use of String for byte strings doesn't cause problems
because decoding using ISO-8859-1 can't fail. Allowing the use of a
fallible decoder introduces a new set of issues.

E.g. what happens if you call getDirectoryContents for a directory
which contains filenames which aren't valid in the current encoding? 
Does the call fail outright, or are invalid entries silently omitted?

I'm less concerned about the handling of streams, as you can
reasonably add a way to change the encoding before any data has been
read or written. I'm more concerned about FilePaths, argv, the
environment etc.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements


Udo Stenzel wrote:

   One more reason to fix the I/O functions to handle encodings and have
   a separate/underlying binary I/O API.
  
  The problem is that we also need to fix them to handle *no encoding*.
 
 What are you proposing here?  Making the breakage even worse by specifying
 a text based api that uses no encoding?  

No. I'm suggesting that many of the I/O functions shouldn't be
treating their arguments or return values as text.

 Having a separate byte based api is far better. If you don't know
 the encoding, all you have is bytes, no text.

My point is that many of the existing functions should be changed to
use bytes instead of text (not separate byte/char versions). E.g.:

type FilePath = [Byte]

If you have a reason to treat a FilePath as text, then you convert it.
E.g.

    names <- getDirectoryContents dir
    let namesT = map (toString localeEncoding) names

We don't need a separate getDirectoryContentsAsText, and we certainly
don't want that to be the default.
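
A minimal sketch of the "keep the bytes, convert only when you need
text" idea, for illustration only. RawName, displayName and listing are
hypothetical names, not part of any existing library, and the display
conversion here is a deliberately crude stand-in for a real
locale-based decoder: it shows printable ASCII as-is and escapes every
other byte, so it cannot fail.

```haskell
import Data.Char (chr)
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical stand-in for a byte-based FilePath.
type RawName = [Word8]

-- Display form: printable ASCII is shown as-is, every other byte is
-- escaped as \xNN. This never fails, whatever the bytes are.
displayName :: RawName -> String
displayName = concatMap showByte
  where
    showByte b
      | b >= 0x20 && b < 0x7F = [chr (fromIntegral b)]
      | otherwise             = "\\x" ++ pad (showHex b "")
    pad s = replicate (2 - length s) '0' ++ s

-- Pair each raw name with its display form. The raw bytes are what
-- would be handed back to the OS; the String is only for the user.
listing :: [RawName] -> [(String, RawName)]
listing names = [ (displayName n, n) | n <- names ]
```

The point of returning pairs is that a name which cannot be displayed
faithfully can still be opened, since the original bytes are never
thrown away.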

For stream I/O, having both text and binary read/write functions
makes sense.

String's are a list of unicode characters, [Word8] is a
   list of bytes.
  
  And what comes out of (and goes into) most core library functions is
  the latter.
 
 So System.Directory needs to be specified in terms of bytes, too.  Looks like
 a clean solution to me.

Sure. But I'm looking for a solution which doesn't involve re-writing
everything, and which won't result in lots of programs suddenly
becoming unreliable if the hardwired default ISO-8859-1 conversion is
changed.

-- 
Glynn Clements [EMAIL PROTECTED]

RE: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements


MR K P SCHUPKE wrote:

 E.g. what happens if you call getDirectoryContents for a directory
 which contains filenames which aren't valid in the current encoding?
 
 Surely this shows the problem with the idea of a 'current encoding'

Yes.

In case I haven't already made this clear, my argument is essentially
that it's the API which is broken, rather than the implementations.

 ... You could be reading files from two remote servers each using
 different encodings...
 
 So you could have read and write raw [Word8] and read and write char,
 something like:
 
 readWithEncoder :: ([Word8] -> [Char]) -> IO [Char]
 writeWithEncoder :: ([Char] -> [Word8]) -> [Char] -> IO ()

In the general case, it needs to be a bit more complex than that, in
order to handle stateful encodings (e.g. ISO-2022), or to handle
decoding multi-byte encodings (e.g. UTF-8) in chunks. Unfortunately,
the iconv interface doesn't allow the encoder state to be extracted,
so a generic iconv-based converter would have to be in the IO monad.
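
For a hand-written decoder (as opposed to an iconv wrapper), the state
can be carried explicitly between chunks without involving IO. The
following is only a sketch under that assumption: the Decoder type and
the names ucs2be and decodeChunks are made up for illustration, and the
example encoding (big-endian 16-bit units, no surrogate handling) was
chosen because it is short, not because it is what anyone should use.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- A decoder consumes one chunk and returns the characters it could
-- complete, plus the decoder to use for the next chunk.
newtype Decoder = Decoder { feed :: [Word8] -> (String, Decoder) }

-- Hypothetical example: big-endian 16-bit code units. The only state
-- is a possible leftover high byte from the previous chunk.
ucs2be :: Decoder
ucs2be = go Nothing
  where
    go pending = Decoder $ \chunk ->
      let (out, pending') = step pending chunk
      in (out, go pending')

    -- Finish a unit started in the previous chunk.
    step (Just hi) (lo:rest) =
      let (cs, p) = step Nothing rest in (unit hi lo : cs, p)
    -- Decode a complete unit inside this chunk.
    step Nothing (hi:lo:rest) =
      let (cs, p) = step Nothing rest in (unit hi lo : cs, p)
    -- One byte left over: remember it for the next chunk.
    step Nothing [hi] = ([], Just hi)
    -- Chunk exhausted: keep whatever is pending.
    step p [] = ([], p)

    unit hi lo = chr (fromIntegral hi * 256 + fromIntegral lo)

-- Run a decoder over a sequence of chunks, threading the state through.
decodeChunks :: Decoder -> [[Word8]] -> String
decodeChunks _ []     = []
decodeChunks d (c:cs) = let (out, d') = feed d c
                        in out ++ decodeChunks d' cs
```

Here the chunk boundaries may fall in the middle of a character, and
the result is the same as decoding the concatenated input.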

-- 
Glynn Clements [EMAIL PROTECTED]

RE: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements


Simon Marlow wrote:

  Which is why I'm suggesting changing Char to be a byte, so that we
  can have the basic, robust API now and wait for the more advanced
  API, rather than having to wait for a usable API while people sort
  out all of the issues.
  
  An easier way is just to declare that the existing API assumes a
  Latin-1 encoding consistently.  Later we might add a way to let the
  application pick another encoding, or request that the I/O library
  uses the locale encoding.
  
  But how do you do that without breaking stuff? If the application
  changes the encoding to UTF-8 (either explicitly, or by using the
  locale's encoding when it happens to be UTF-8), then code such as:
  
  [filename] <- getArgs
  openFile filename ReadMode
  
  will fail if filename isn't a valid UTF-8 sequence. Similarly for the
  other cases where the OS accepts/returns byte strings but the Haskell
  interface uses String.
 
 And that's the correct behaviour, isn't it?

No. The correct behaviour is to keep such data as byte strings. 
Otherwise it's going to be hard to write robust programs if the
hard-wired ISO-8859-1 encoding is ever changed.

In the current implementation, getArgs gets a list of bytes from
argv[], which it converts to a String. The String is passed to
openFile, which converts it back to a list of bytes which are then
passed to open().

Thus the list of bytes is effectively fed through (encode . decode). 
For ISO-8859-*, this is the identity function. For UTF-8, it's a
subfunction of the identity function, i.e. it either returns its input
or it fails. I don't see what is to be gained by having it fail. It
would be preferable to just pass the byte string directly from argv[]
to open().
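
To make the (encode . decode) point concrete, here is a minimal sketch
over [Word8]. The function names are hypothetical, and the UTF-8
decoder is deliberately simplified (it rejects stray and truncated
sequences, but it does not reject overlong three- or four-byte forms or
surrogates), so treat it as an illustration rather than a conforming
implementation.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO-8859-1: every byte is a character, so decoding cannot fail.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Inverse of decodeLatin1 (only meaningful for characters <= U+00FF).
encodeLatin1 :: String -> [Word8]
encodeLatin1 = map (fromIntegral . ord)

-- Simplified UTF-8 decoder: Nothing on any malformed sequence.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b:bs)
  | b < 0x80              = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) bs
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) bs
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) bs
  | otherwise             = Nothing
  where
    -- Consume n continuation bytes, accumulating the code point.
    multi :: Int -> Int -> [Word8] -> Maybe String
    multi 0 acc rest
      | acc <= 0x10FFFF = (chr acc :) <$> decodeUtf8 rest
      | otherwise       = Nothing
    multi n acc (c:rest)
      | c .&. 0xC0 == 0x80 =
          multi (n - 1) ((acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) rest
    multi _ _ _ = Nothing
```

With these definitions, a filename such as the bytes 0x66 0xE9 survives
the Latin-1 round trip unchanged, whereas the UTF-8 decoder returns
Nothing for it, which is exactly the new failure mode being discussed.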

  I'm less concerned about the handling of streams, as you can
  reasonably add a way to change the encoding before any data has been
  read or written. I'm more concerned about FilePaths, argv, the
  environment etc.
 
 Yes, these are interesting issues.  Filenames are stored as character
 strings on some OSs (eg. Windows) and byte strings on others.  So the
 Haskell portable API should probably use String, and do decoding based
 on the locale (if the programmer asks for it).
 
 Argv and the environment - I don't know.  Windows CreateProcess() allows
 these to be UTF-16 strings, but I don't know what encoding/decoding
 happens between CreateProcess() and what the target process sees in its
 argv[] (can't be bothered to dig through MSDN right now).  I suspect
 these should be Strings in Haskell too, with appropriate
 decoding/encoding happening under the hood.

I suspect that Windows will convert them according to the active
codepage, so that OpenFileA(argv[i], ...) works. 

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements


Gabriel Ebner wrote:

  For case testing, locale-dependent sorting and the like, you need to
  convert to characters. [Although possibly only temporarily; you can
  sort a list of byte strings based upon their corresponding character
  strings using sortBy. This means that a decoding failure only means
  that the ordering will be wrong. This is essentially what happens with
  ls if you have filenames which aren't valid in the current locale.]
 
 sortBy could only cope with single-byte encodings.  Multi-byte
 encodings would need something else.

I think that you may have misunderstood my point. I was referring to
something like this:

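    import Data.List (sortBy)
    import Data.Word (Word8)

    type ByteString = [Word8]

    -- Decode according to whichever encoding determines the ordering.
    decode :: ByteString -> String
    decode = ...

    -- Compare two byte strings by their decoded forms.
    comparator :: ByteString -> ByteString -> Ordering
    comparator s1 s2 = compare (decode s1) (decode s2)

    sortByteStrings :: [ByteString] -> [ByteString]
    sortByteStrings ss = sortBy comparator ss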

The byte strings which are returned from sortByteStrings are the
original byte strings, but the ordering will be determined by the
encoding. This produces the same results as decode-sort-encode (in
the cases where the latter actually works), but is more robust.
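
A runnable variant of the same idea, as a sketch only. The decoder here
is an assumption chosen to keep the example short: it accepts 7-bit
ASCII and fails on anything else. When decoding fails, the sort key
falls back to the raw byte values, so a bad name ends up in a possibly
odd position but is never dropped and never aborts the sort.

```haskell
import Data.Char (chr, toLower)
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Word (Word8)

type ByteString = [Word8]

-- Assumed decoder for illustration: accepts only 7-bit ASCII.
decodeAscii :: ByteString -> Maybe String
decodeAscii = mapM toChar
  where
    toChar b
      | b < 0x80  = Just (chr (fromIntegral b))
      | otherwise = Nothing

-- Sort key: case-folded text when decoding succeeds, otherwise the raw
-- byte values read as code points.
sortKey :: ByteString -> String
sortKey bs = maybe (map (chr . fromIntegral) bs) (map toLower) (decodeAscii bs)

-- The results are the original byte strings; only the order depends
-- on the decoder.
sortByteStrings :: [ByteString] -> [ByteString]
sortByteStrings = sortBy (comparing sortKey)
```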

  It's broken. Being able to represent filenames as byte strings is
  fundamental. Being able to convert them to or from character strings
  is useful but not essential. The only reason why the existing API
  doesn't cause serious problems is because the translation is currently
  hardwired to an encoding which can't fail.
 
 Handling binary filenames is hardly fundamental.  It isn't even very
 portable, see the posts about filename handling under modern Windows.
 It might be an important feature, but there are other programs out
 there (mostly GUIs) that expect filenames to be encoded according to
 the locale settings too.

It's fundamental if you want your programs to be robust. For most
programs, there is no legitimate reason to refuse to read a file
because of its name.

A GUI program (or for that matter, a terminal) might legitimately fail
to *display* a filename correctly if it can't decode it (it has to
index into the font). But that isn't a reason to reject it altogether.

E.g. if I create a file whose name contains control characters, most
GUI programs display it incorrectly in the file selection dialog, but
they still manage to open it.

-- 
Glynn Clements [EMAIL PROTECTED]

RE: [Haskell-cafe] Re: Writing binary files?

2004-09-16 Thread Glynn Clements


MR K P SCHUPKE wrote:

 In the general case, it needs to be a bit more complex than that,
 
 That's why the functions handled lists not individual characters,
 I was assuming that each [Word8] -> [Char] represented a valid
 and complete encoding block... IE at the start of each call it
 assumes no escapes. All this means is that when reading in chunks
 you paste those chunks together before conversion, and you can
 only break outside of escapes. This in my opinion is better
 behaviour anyway... I don't want some hidden escape state mangling
 output, just because some earlier code generated invalid output.

Right. Certainly, a stateless interface will handle converting
complete strings (pathnames, arguments, etc).

But, ultimately we will have need of a more general interface. E.g. in
the chunked HTTP example which Oleg gave, you would probably want
separate decoders for the headers and body, switching between them as
you read the stream. You wouldn't want to have to accumulate the
entire body as a single byte string just so that you could decode it
in one go, and you can't just push a decoder onto the stream.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-15 Thread Glynn Clements

) nor uses your locale's encoding?

That's a pretty big sacrifice.

  My view is that, right now, we have the worst of both worlds, and
  taking a short step backwards (i.e. narrow the Char type and leave the
  rest alone) is a lot simpler (and more feasible) than the long journey
  towards real I18N.
 
 It would bury any hope in supporting a UTF-8 environment.
 
 I've heard that RedHat tried to impose UTF-8 by default. It was mostly
 a failure because it's too early, too many programs are not ready for
 it. I guess the RedHat move helped to identify some of them. But UTF-8
 will inevitably be usable in future.

If they tried a decade hence, it would still be too early. The
single-byte encodings (ISO-8859-*, koi-8, win-12xx) aren't likely to
be disappearing any time soon, nor is ISO-2022 (UTF-8 has quite
spectacularly failed to make inroads in CJK-land; there are probably
more UTF-8 users in the US than there).

 It would be great if Haskell programs were in the group which can
 support it instead of being forced to be abandoned because of lack
 of Unicode support in the language they are written in.

Haskell should be able to support it, but it shouldn't refuse to
support anything else, it shouldn't make you jump through hoops to
write usable programs, and we shouldn't have to wait until all of the
encoding issues have been sorted out to do things which don't even
deal with encodings.

Look, C has all of the functionality that we're talking about: wide
characters, wide versions of string.h and ctype.h, and conversion
between byte-streams and wide characters.

But it did it without getting in the way of writing programs which
don't care about encodings, without consigning everything which has
gone before to the scrap heap, and without everyone having to wait a
couple of decades to (reliably) do simple things like copying a file
to a socket or enumerating a directory.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-15 Thread Glynn Clements

 as part of the encoding. E.g. for network protocols which
use CRLF, it would be useful to be able to set CRLF as the EOL
convention then use e.g. hPutStrLn to write lines.
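
As a sketch of what that could look like, assuming the newline-mode
API which GHC's System.IO gained later (hSetNewlineMode; it is not part
of Haskell98). The helper putCrlfLines is a hypothetical name used only
for this example.

```haskell
import System.IO

-- Hypothetical helper: write each string as a line terminated by CRLF,
-- whatever the platform's native convention is.
putCrlfLines :: Handle -> [String] -> IO ()
putCrlfLines h ls = do
  -- Translate '\n' to "\r\n" on output; leave input untranslated.
  hSetNewlineMode h NewlineMode { inputNL = LF, outputNL = CRLF }
  mapM_ (hPutStrLn h) ls
```

The caller keeps using the ordinary line-oriented functions; only the
handle's EOL convention changes.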

 The same thoughts apply to filenames.  Make them [Word8] and convert
 explicitly.

Well, it's arguable that they should be [Word8] on Unix and String on
Windows. I suppose that you could handle the Windows case by
automatically converting to/from UTF-8.

 By the way, I think a path should be a list of names (that
 is of type [[Word8]]) and the library would be concerned with putting in
 the right path separator.  Add functions to read and show pathnames in
 the local conventions and we'll never need to worry about path
 separators again.

There would certainly be some advantages to making FilePath an
abstract type, but there are quite a few corner cases to deal with.

  There are limits to the extent to which this can be achieved. E.g. 
  what happens if you set the encoding to UTF-8, then call
  getDirectoryContents for a directory which contains filenames which
  aren't valid UTF-8 strings?
 
 Well, then you did something stupid, didn't you?  If you don't know the
 encoding you shouldn't decode anything.  That's a strong point against
 any implicit decoding, I think.

Yes. However, I suspect that we will have to live with some of the
mistakes of the past, i.e. using String in the I/O functions.

-- 
Glynn Clements [EMAIL PROTECTED]
___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Writing binary files?

2004-09-15 Thread Glynn Clements

 portable. It's easier to emulate the traditional
  C paradigm in the Unicode paradigm than vice versa,
 
  I'm not entirely sure what you mean by that, but I think that I
  disagree. The C/Unix approach is more general; it isn't tied to any
  specific encoding.
 
 If filenames were expressed as bytes in the Haskell program, how would
 you map them to WinAPI? If you use the current Windows code page, the
 set of valid characters is limited without a good reason.

Windows filenames are arguably characters rather than bytes. However,
if you want to present a common API, you can just use a fixed encoding
on Windows (either UTF-8 or UTF-16).

  If they tried a decade hence, it would still be too early. The
  single-byte encodings (ISO-8859-*, koi-8, win-12xx) aren't likely to
  be disappearing any time soon, nor is ISO-2022 (UTF-8 has quite
  spectacularly failed to make inroads in CJK-land; there are probably
  more UTF-8 users in the US than there).
 
 Which is a pity. ISO-2022 is brain-damaged because of enormous
 complexity,

Or, depending upon one's perspective, Unicode is brain-damaged because,
for the sake of simplicity, it over-simplifies the situation. The
over-simplification is one reason for its lack of adoption in the CJK
world.

Multi-lingual text consists of distinct sections written in distinct
languages with distinct alphabets. It isn't actually one big chunk
in a single global language with a single massive alphabet.

 and ISO-8859-x have small repertoires.

Which is one of the reasons why they are likely to persist for longer
than UTF-8 true believers might like. E.g. languages which don't
primarily use the Roman alphabet (Greek, Russian) can still be
represented as one byte per character. And it's feasible to have
tables which are indexed by codepoint; as a counter-example, calling
XQueryFont for a Unicode font *really* sucks if either the server
doesn't have the BigFont extension or, worse still, it can't use it
because the client is remote.

 I would not *force* UTF-8, but it should work for those who
 voluntarily choose to use it as their locale encoding. Including
 filenames.

Not forcibly decoding filenames isn't the same thing as preventing
them from being decoded.

  Look, C has all of the functionality that we're talking about: wide
  characters, wide versions of string.h and ctype.h, and conversion
  between byte-streams and wide characters.
 
 ctype.h is useless for UTF-8.

Hello? Let's try that again, with emphasis:

  C has ... WIDE VERSIONS OF string.h and ctype.h

They're called wchar.h and wctype.h.

 There is no capability of attaching automatic recoders of explicitly
 chosen encodings to file handles.

At this point you're starting to engage in diversionary tactics. Again.

 No, the C language doesn't make these issues easy and has lots of
 historic baggage.

The issues aren't easy, and have lots of historic baggage. That's
reality.

Fortunately, C has a history of being geared to reality, rather than
the comfortable fantasy where the issues don't exist. Which is why
everyone uses it.

  But it did it without getting in the way of writing programs which
  don't care about encodings,
 
 It does get in the way of writing programs which do care, because they
 must do whole recoding themselves and remember which API has which
 character set limitations.

No. Not doing something for you isn't the same thing as getting in the
way. Getting in the way is doing for you something which you didn't
want done in the first place. Getting in the way is not letting you do
something yourself.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-15 Thread Glynn Clements


Graham Klyne wrote:

 In particular, the idea of narrowing the Char type really seems like a 
 bad idea to me (if I understand the intent correctly).  Not so long ago, I 
 did a whole load of work on the HaXml parser so that, among other things, 
 it would support UTF-8 and UTF-16 Unicode (as required by the XML 
 spec).  To do this depends upon having a Char type that can represent the 
 full repertoire of Unicode characters.

Note: I wasn't proposing doing away with wide character support
altogether. Essentially, I was suggesting making Char a byte and
having e.g. WideChar for wide characters. The reason being that the
existing Haskell98 API uses Char for functions which are actually
dealing with bytes.

In an ideal world, the IO, System and Directory modules (and the
Prelude I/O functions) would have used Byte, leaving Char to represent
a (wide) character. However, that isn't the hand we've been dealt.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] FilePath handling [Was: Writing binary files?]

2004-09-15 Thread Glynn Clements


Henning Thielemann wrote:

 Udo Stenzel wrote:
 
  The same thoughts apply to filenames.  Make them [Word8] and convert
  explicitly.  By the way, I think a path should be a list of names (that
  is of type [[Word8]]) and the library would be concerned with putting in
  the right path separator.  Add functions to read and show pathnames in
  the local conventions and we'll never need to worry about path
  separators again.
 
 I even plead for an abstract data type FilePath which supports operations
 like 'enter a directory', 'go one level higher' and so on.

Are you referring to pure operations on the FilePath, e.g. appending
and removing entries? That's reasonable enough. But it needs to be
borne in mind that there's a difference between:

	setCurrentDirectory ".."
and:
	dir <- getCurrentDirectory
	setCurrentDirectory $ parentDirectory dir

[where parentDirectory is a pure FilePath -> FilePath function.]

if the last component in the path is a symlink.

If you want to make FilePath an instance of Eq, the situation gets
much more complicated.
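
To make the distinction concrete, here is a minimal sketch of what such pure operations might look like. The Path type, its list-of-components representation, and showPath are assumptions made for illustration only; nothing like them exists in the Haskell98 libraries.

```haskell
-- Hypothetical representation: an absolute path as a list of components.
newtype Path = Path [String]
  deriving (Show, Eq)

-- Purely textual parent: drops the last component and never consults
-- the filesystem, so it cannot know whether that component is a symlink.
parentDirectory :: Path -> Path
parentDirectory (Path []) = Path []          -- treat the root as its own parent
parentDirectory (Path cs) = Path (init cs)

-- Render using '/' as the separator (POSIX convention assumed).
showPath :: Path -> String
showPath (Path []) = "/"
showPath (Path cs) = concatMap ('/' :) cs
```

If /home/user/link is a symlink to /data/project, parentDirectory yields /home/user, whereas setCurrentDirectory ".." issued from inside that directory ends up in /data. The derived Eq instance is likewise purely textual: it will report two paths as different even when they name the same file.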

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Unicoded filenames

2004-09-15 Thread Glynn Clements


Marcin 'Qrczak' Kowalczyk wrote:

 Here is what happens when a language provides only narrow-char API for
 filenames:

  I have a filename as an UTF-8 encoded string. I need to be able to 
  handle strange chars like accents, Asian chars etc.
  
  Is there any way to create a file with that name? I only need it on Win32.
 
 Windows uses UTF-16 for filenames, but provides a non-Unicode interface 
 for legacy applications; the standard open() function that OCaml's 
 open_out wraps appears to use the legacy interface.  The precise 
 codepage this uses is system-dependent, and AFAIK there's no way for a 
 program to determine what it is without calling out to the Win32 API, 
 but you can be pretty sure it won't be UTF-8.
 
 In other words, there is no reliable way to use a filename containing 
 non-ASCII characters with OCaml's standard library.

No, this is what happens when an API imposes restrictions upon the
filenames which it can handle.

Essentially, it's due to two (or possibly three) factors:

1. The fact that Windows uses wide strings, rather than multi-byte
strings, for filenames.

2. The fact that Windows' compatibility interface is broken, i.e. it
only lets you access filenames which can be represented in the current
codepage (which, to me, is highly analogous to only supporting
filenames which are valid in the current locale).

3. Possibly that OCaml insists upon using UTF-8. [I don't know that
this is the case, but the fact that they specifically mention UTF-8
suggests that it might be.]

IOW, this incident seems to oppose, rather than support, the
filenames-as-characters viewpoint.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-14 Thread Glynn Clements


David Menendez wrote:

  I'd like to see the following:
  
  - Duplicate the IO library.  The duplicate should work with [Byte]
everywhere where the old library uses String.  Byte is some suitable
unsigned integer, on most (all?) platforms this will be Word8
  
  - Provide an explicit conversion between encodings.  A simple
conversion of type [Word8] -> String would suit me, iconv would
provide all that is needed.
 
 I like this idea, but I say there should be a bit-oriented layer beneath
 everything.

The byte stream is inherent, as that's (usually) what the OS gives
you. Everything else is synthesised.
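
As a rough illustration, a bit-oriented view can be built on top of the byte stream in a few lines. The name bytesToBits and the most-significant-bit-first order are choices made for this sketch, not anything the OS or the standard library dictates.

```haskell
import Data.Bits (testBit)
import Data.Word (Word8)

-- Expand each byte into its eight bits, most significant bit first.
-- The bit order is decided here; the OS only ever supplies whole bytes.
bytesToBits :: [Word8] -> [Bool]
bytesToBits = concatMap byteBits
  where
    byteBits w = [testBit w i | i <- [7, 6 .. 0]]
```

The choice of bit order is exactly the kind of detail that has to be invented at this level, which is why the byte layer is the natural foundation.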

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-13 Thread Glynn Clements

 in areas which work
 with arbitrary binary data (mostly file contents).

And the ability to actually use any encoding except ISO-8859-1 in any
meaningful way. I.e. encoders/decoders for other encodings, along with
the means to specify which encoding to use for functions which need to
perform encoding or decoding.

  My main concern is that someone will get sick of waiting and make the
  wrong fix, i.e. keep the existing API but default to the locale's
  encoding, so that every simple program then has to explicitly set it
  back to ISO-8859-1 to get reasonable worst-case behaviour.
 
 Supporting byte I/O and supporting character recoding needs to be done
 before this.

My view is that, right now, we have the worst of both worlds, and
taking a short step backwards (i.e. narrow the Char type and leave the
rest alone) is a lot simpler (and more feasible) than the long journey
towards real I18N.

More generally, this is the most intrusive example of a common problem
with too many Haskell libraries, i.e. exporting an interface which is
too high-level and glosses over too many details. But this isn't some
obscure third-party library. This is the Haskell98 standard library;
some of it's in the Prelude.

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-12 Thread Glynn Clements


Abraham Egnor wrote:

 Passing a Ptr isn't that onerous; it's easy enough to make functions
 that have the signature you'd like:
 
 import System.IO
 import Data.Word (Word8)
 import Foreign.Marshal.Array
 
 hPutBytes :: Handle -> [Word8] -> IO ()
 hPutBytes h ws = withArray ws $ \p -> hPutBuf h p $ length ws
 
 hGetBytes :: Handle -> Int -> IO [Word8]
 hGetBytes h c = allocaArray c $ \p ->
     do c' <- hGetBuf h p c
        peekArray c' p

The problem with this approach is that the entire array has to be held
in memory, which could be an issue if the amount of data involved is
large.
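
One way around that, sketched below, is to read fixed-size chunks into a single reused buffer and fold over them, so that only one chunk is resident at a time. foldBytes, chunkSize and countBytes are names invented for this sketch; hGetBuf, allocaBytes, peekArray and withBinaryFile are the only library calls it relies on.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)
import System.IO

-- Size of the reused buffer, in bytes (an arbitrary choice).
chunkSize :: Int
chunkSize = 4096

-- Fold over a handle's contents one chunk at a time.
foldBytes :: (a -> [Word8] -> a) -> a -> Handle -> IO a
foldBytes step z h = allocaBytes chunkSize (loop z)
  where
    loop acc p = do
      n <- hGetBuf h p chunkSize
      if n == 0
        then return acc                      -- hGetBuf returns 0 only at EOF
        else do
          ws <- peekArray n (p :: Ptr Word8)
          let acc' = step acc ws
          -- Force the accumulator (to WHNF) so thunks don't pile up.
          acc' `seq` loop acc' p

-- Example use: count the bytes in a file without loading it whole.
countBytes :: FilePath -> IO Int
countBytes path =
  withBinaryFile path ReadMode (foldBytes (\n ws -> n + length ws) 0)
```

This keeps memory use bounded by chunkSize as long as the step function doesn't retain the chunks it is given.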

-- 
Glynn Clements [EMAIL PROTECTED]

Re: [Haskell-cafe] Writing binary files?

2004-09-12 Thread Glynn Clements


Sven Panne wrote:

  Also, changing the existing functions to deal with encodings is likely
  to break a lot of things (i.e. anything which reads or writes data
  which is in neither UTF-8 nor the locale-specified encoding).
 
 Hmmm, the Unicode tables start with ISO-Latin-1, so what would exactly break
 when we stipulate that the standard encoding for string I/O in Haskell is
 ISO-Latin-1?

That would essentially be formally specifying the existing behaviour,
which wouldn't break anything, including the mechanism for
reading/writing binary data which I suggested (and which is the only
choice if your Haskell implementation doesn't have h{Get,Put}Buf).

The problems would come if it was decided to change the existing
behaviour, i.e. use something other than Latin1.
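
The reason Latin1 is safe for this purpose can be sketched as follows: byte n maps to code point n and back, so arbitrary binary data survives a round trip through String. The function names below are made up for illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Under ISO-8859-1, byte n is simply code point n, so every byte
-- sequence decodes to some String.
latin1Decode :: [Word8] -> String
latin1Decode = map (chr . fromIntegral)

-- Inverse of latin1Decode for Chars below 256. Higher code points are
-- truncated to their low 8 bits here; a real encoder would have to
-- reject or replace them.
latin1Encode :: String -> [Word8]
latin1Encode = map (fromIntegral . ord)
```

With UTF-8 there is no such guarantee, since some byte sequences (a lone 0xFF, for instance) are not valid UTF-8 and so have no decoding to round-trip through.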

-- 
Glynn Clements [EMAIL PROTECTED]
