Re: H98 Text IO

2008-02-26 Thread Duncan Coutts

On Tue, 2008-02-26 at 14:18 +, Simon Marlow wrote:
> Simon Marlow wrote:
> > Duncan Coutts wrote:
> 
> Let's call this one proposal 0:
> 
> >>   * Haskell98 file IO should always use UTF-8.
> >>   * Haskell98 IO to terminals should use the current locale
> >> encoding.
> 
> and the others:
> 
> >   1. all text I/O is in the locale encoding (what C and Hugs do)
> > 
> >   2. stdin/stdout/stderr and terminals are always in the locale
> >  encoding, everything else is UTF-8
> > 
> >   3. everything is UTF-8

So it's clear that all these solutions have some downsides. We have to
decide what is more important.


Let me try and summarise:

Basically we can be consistent with the OS environment or consistent
with other Haskell systems in other environments or try to get some
mixture of the two. It is pretty clear however that trying to get a
mixture still leads to some inconsistency with the OS environment.

  * "status quo" (what ghc/hugs do now)
This gives consistency with the OS environment with hugs and jhc
but not ghc, nhc or yhc. It gives consistency between haskell
programs (using the same haskell implementation) on different
platforms for ghc and nhc but not for hugs or jhc. There is no
consistency between haskell implementations.

  * "always locale" (solution 1 above)
This gives us consistency with the OS environment. All of the
shell snippets people have posted work with this. The main
disadvantage is that files moved between systems may be
interpreted differently.

  * "always utf8" (solution 3 above)
This gives consistency between Haskell programs across
platforms. The main disadvantage is that it is very unhelpful if
the locale is not UTF8. It fails the "putStr" test of printing
string literals to the terminal.

  * "mixture A" (solution 0 above)
The input/output format changes depending on the device. prog |
cat prints junk in non-UTF8 locales.

  * "mixture B"  (solution 2 above)
The output format changes depending on the device. prog in
behaves differently to prog < in.
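
To make the comparison concrete, the five options can be written down as
a small table of "which encoding does a new Handle get". This is only a
sketch of the policies as summarised above. All type and function names
are made up for illustration and do not correspond to anything in an
actual implementation.

```haskell
module Main where

-- Hypothetical model of the default-encoding policies discussed in this thread.
data Encoding = Locale | UTF8 | Latin1
  deriving (Eq, Show)

data Policy
  = StatusQuoGhc   -- what ghc does now: Latin-1 everywhere
  | AlwaysLocale   -- solution 1
  | AlwaysUtf8     -- solution 3
  | MixtureA       -- solution 0: terminals use the locale, everything else UTF-8
  | MixtureB       -- solution 2: std handles and terminals use the locale
  deriving (Eq, Show, Enum, Bounded)

-- What a handle is attached to, as far as the policies care.
data Device = Device
  { isTerminal  :: Bool  -- attached to a terminal?
  , isStdHandle :: Bool  -- one of stdin/stdout/stderr?
  } deriving (Eq, Show)

defaultEncoding :: Policy -> Device -> Encoding
defaultEncoding StatusQuoGhc _ = Latin1
defaultEncoding AlwaysLocale _ = Locale
defaultEncoding AlwaysUtf8   _ = UTF8
defaultEncoding MixtureA d
  | isTerminal d = Locale
  | otherwise    = UTF8
defaultEncoding MixtureB d
  | isTerminal d || isStdHandle d = Locale
  | otherwise                     = UTF8

-- stdout on a terminal, stdout redirected into a pipe ("prog | cat"),
-- and a file opened by name.
stdoutTty, stdoutPipe, plainFile :: Device
stdoutTty  = Device True  True
stdoutPipe = Device False True
plainFile  = Device False False

main :: IO ()
main = mapM_ report [minBound .. maxBound]
  where
    report p = putStrLn $ show p
      ++ ": tty="  ++ show (defaultEncoding p stdoutTty)
      ++ " pipe=" ++ show (defaultEncoding p stdoutPipe)
      ++ " file=" ++ show (defaultEncoding p plainFile)
```

Under "mixture A" the tty and pipe columns differ, which is the
"prog | cat" problem. Under "mixture B" the pipe and file columns differ,
which is the "prog in" vs "prog < in" problem.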

And some examples people have noted:

  * putStr "αβγδεζηθικλ"
That is just printing a string literal to the console/terminal.
Now that major implementations support Unicode .hs source files
it's kind of nice if this works.

This works with "always locale" and "mixture A" and "mixture B"
above. This fails for "status quo" with ghc (but works for hugs)
and fails for "always utf8" unless the locale happens to be
utf8.

  * ./prog  vs  ./prog | cat
That is, piping the output of a haskell program through cat and
printing the result to a terminal produces the same output as
displaying the program output directly.

This works with "always locale" and "mixture B" and fails with
"mixture A". With "always utf8" and with "status quo" it has the
property that it consistently produces the same junk on the
terminal  which some people see as a bonus (when not in a utf8
or latin1 locale respectively).

  * ./prog  vs  ./prog >file; cat file
This is another variation on the above and it has the same
failures.

  * ./prog in  vs  ./prog < in
That is reading a file given as a command line arg via readFile
gives the same result as reading stdin that has been redirected
from the same file.

This works with "always locale" and "mixture A" and fails with
"mixture B". This is the dual of the previous two examples. This
fails with "always utf8" and with "status quo" when the file was
produced by another text processing program from the same
environment (eg a generic text editor).

  * ./foo vs  ./foo | hexdump -C
The output bytes sent to the terminal are exactly the same
as what we see piped to a program to examine those bytes.

This fails for "mixture A" and works for all the others. Works
in the strict sense that the bytes are the same, not in the
sense that the text output is readable.

So the problem with the mixture approaches is that the terminal and
files and pipes are all really interchangeable so we can find surprising
inconsistencies within the same OS environment.

The problem with the "always utf8" is that it's never right unless the
locale is set to utf8.
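
For anyone who wants to see the "junk" at the byte level, here is a small
sketch. It is not taken from any implementation: the UTF-8 encoder is
hand-rolled for illustration, and truncate8 is my model of "truncating to
8 bits", which is how the current ghc behaviour is described elsewhere in
this thread. It shows the bytes each approach produces for "αβγ" and what
a Latin-1 reader makes of the UTF-8 bytes.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode a String as UTF-8 bytes (illustrative; assumes valid code points).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . bytes . ord)
  where
    cont x = 0x80 .|. (x .&. 0x3F)   -- continuation byte 10xxxxxx
    bytes c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]

-- Decode bytes as Latin-1: every byte is its own code point.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- "Encode" by keeping only the low 8 bits of each code point.
truncate8 :: String -> [Word8]
truncate8 = map (fromIntegral . ord)

main :: IO ()
main = do
  let greek = "\945\946\947"                 -- alpha, beta, gamma
  print (encodeUtf8 greek)                   -- two bytes per character
  print (truncate8 greek)                    -- one byte each, high bits lost
  print (decodeLatin1 (encodeUtf8 "\945"))   -- one alpha read back as two chars
```

The truncated bytes cannot be turned back into the original text at all,
while the UTF-8 bytes only look wrong when the reader assumes a different
encoding.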


As a data point, Java and Python use "always locale" as the default if you
don't specify an encoding when opening a text stream.

I think personally I'm coming round to the "always locale" point of
view. We already have no cross-platform consistency for text files
because of the lf vs cr/lf issue and we have no cross-implementation
consistency.

Duncan


Re: H98 Text IO

2008-02-26 Thread Chris Kuklewicz

Reinier Lamers wrote:

On 26 Feb 2008, at 18:42, Chris Kuklewicz wrote:

The goal is that more complicated situations are reflected in
more complicated "ghc" or "main" invocations.  The least complicated
usage defaults to being identical cross-platform and regardless of
terminal I/O.

I think the best default would be UTF8 for all text handles.  This can
be easily documented, it can be easily understood, and will produce
the fewest surprises.

 > (...)

** Unless influenced by command-line switches, these default to UTF8.
I think that making the behavior of programs change, depending on 
compiler options, will produce a lot of surprises. I think that being 
only able to set the default encoding from within the program is a 
better idea, because it keeps the specification of the behavior of the 
program inside the source.


Reinier


I thought about that.  I started with realizing that *all* code written for GHC 
is written knowing Handles only return Word8 sized Latin1 characters.


So there are several ways one might proceed, some of which are:

  1) No command line switches, default to Latin1.  To get unicode you call
 a special 'turnOnUnicodeHandleGoodness' IO operation.  This is good since
 it does not break old code.

  2) No command line switches, default to something new.  This requires all old
 code to be conditionally retrofitted with a 'turnOffUnicode' IO operation.
 This breaks much of the code that has been written, and is thus bad.

  3) Add a "ghc --turn-on-unicode" command line switch.  This makes all old code
 build just fine, since it lacks the switch to activate the new behavior.

  4) Add a "ghc --turn-off-unicode" command line switch.  This is nice since
 it lets new code use the new Handle encoding by default, but not nice in
 requiring that old code built using ghc-6.10 use an additional option.

I also think the following are likely to be true:

  *) Cabal is already controlling the ghc compiler switches for most code.

  *) The experience of the ghc-6.6 to ghc-6.8 transition involved updating most
 cabal files to allow old code to work with new compiler.

  *) Other changes, unrelated to the unicode handles, will require most
 old packages to update their cabal files to work with ghc-6.10

  *) The additional work to update the cabal file to add the
 "--turn-off-unicode" command line switch to ghc would be 1 word to 1 line.

So I think that making ghc default to option (3) above saves nearly zero work 
when updating old cabal files compared to option (4).  The benefit of option (4) 
compared to (3) is that no boilerplate will be needed to obtain the new handle 
encoding.


And I simply prefer that the better handle encoding be the default; move the 
implementation forward.


Now if GHC does not have a command line switch then either with (2) you have to 
conditionally (perhaps with #ifdef) update almost every bit of code on hackage 
or with (1) you have all future programs burdened with boilerplate, which some 
people may forget.


So I will enjoy having switches as well as the IO commands.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: H98 Text IO

2008-02-26 Thread Duncan Coutts

On Wed, 2008-02-27 at 01:14 +1100, Roman Leshchinskiy wrote:
> Duncan Coutts wrote:
> > On Wed, 2008-02-27 at 00:31 +1100, Roman Leshchinskiy wrote:

> >> I'm probably mistaken,
> >> but doesn't this proposal mean that I can't implement cat in H98 using
> >> text I/O? That would be a bit disturbing.
> > 
> > You've never been able to do that with the guarantees provided by H98.
> 
> As a matter of fact, 21.10.2 from the Haskell Report suggests that at 
> least copying text files should be possible. Unless I'm mistaken, your 
> proposal would invalidate that example somewhat.

> This begs another question. What exactly does "current locale" mean, 
> given that we have lazy I/O and the locale can be changed on the fly?

The current locale is a Posix concept. There are posix functions for
changing it.

I'd suggest that a Handle inherits the current locale as its encoding at
the point of creation of the Handle. Further changes to the posix locale
would not change any existing open Handles.

If we were to provide an action to change the encoding of an open Handle
then it is clear that it cannot act on semi-closed handles. That'd make
lazy IO ok.
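
A note for readers on a later GHC: as far as I know, System.IO there
exports hGetEncoding and hSetEncoding, which did not exist in the base
library under discussion in this thread. Assuming those functions, the
"encoding is fixed when the Handle is created, but can be changed
explicitly" idea looks roughly like the sketch below. The file name is a
placeholder.

```haskell
module Main where

import System.IO

-- Write a string using an explicitly chosen encoding.
writeWith :: TextEncoding -> FilePath -> String -> IO ()
writeWith enc path s = withFile path WriteMode $ \h -> do
  hSetEncoding h enc
  hPutStr h s

-- Read a file using an explicitly chosen encoding.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  length s `seq` return s   -- force the lazy read before the handle closes

main :: IO ()
main = do
  -- A fresh handle reports the encoding it was given when it was opened.
  h <- openFile "encoding-demo.txt" WriteMode
  enc <- hGetEncoding h
  putStrLn ("encoding of a new handle: " ++ maybe "binary" show enc)
  hClose h
  -- Round trip with the encoding pinned on both sides.
  writeWith utf8 "encoding-demo.txt" "\945\946\947"
  s <- readWith utf8 "encoding-demo.txt"
  print (s == "\945\946\947")
```

Note that readWith sets the encoding before calling hGetContents, so the
question of changing the encoding of a semi-closed handle never arises.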



Re: H98 Text IO

2008-02-26 Thread Duncan Coutts

On Tue, 2008-02-26 at 07:28 -0800, John Meacham wrote:
> On Tue, Feb 26, 2008 at 01:34:54PM +, Duncan Coutts wrote:
> > Personally I'm not really fussed about which compromise we pick. I think
> > the more important point is that all the Haskell implementations pick
> > the same compromise so that we can effectively standardise the
> > behaviour.
> 
> Wait, are you talking about changing what ghc does or trying to change
> the haskell standard? I always thought ghc should do something more sane
> with character IO, non unicode aware programs are a blight.
> 
> I don't think choosing something arbitrary to standardize on is a good
> idea. It is not always clear what the best choice is. like, for instance
> until recently, jhc used locale encoding on linux, due to glibc's strong
> charset support and guaranteed use of unicode wchar_t's, but utf8 always
> on bsd-variants, where the wchar_t situation was less clear cut. On
> embedded systems, only supporting ASCII IO is certainly a valid choice.
> For a .NET backend, we will want to use .NET's native character IO
> routines.

Oh I wasn't trying to pin it down that much. If you want to use EBCDIC on
some embedded platform by default I don't care. I really mean that it'd
be nice if hugs, ghc, jhc, nhc98 etc could agree for each of the major
platforms, Linux/Unix, OS X and Windows. And I don't mean necessarily
that they should do the same thing across platforms (eg as I understand
it OS X would always use UTF8 not a variable locale) just that they
should do the same on the same platform.

So not a change of the H98 spec, just a common consensus on the major
platforms.

Duncan



Re: H98 Text IO

2008-02-26 Thread Reinier Lamers

On 26 Feb 2008, at 18:42, Chris Kuklewicz wrote:

The goal is that more complicated situations are reflected in
more complicated "ghc" or "main" invocations.  The least complicated
usage defaults to being identical cross-platform and regardless of
terminal I/O.

I think the best default would be UTF8 for all text handles.  This can
be easily documented, it can be easily understood, and will produce
the fewest surprises.

> (...)

** Unless influenced by command-line switches, these default to UTF8.
I think that making the behavior of programs change, depending on  
compiler options, will produce a lot of surprises. I think that being  
only able to set the default encoding from within the program is a  
better idea, because it keeps the specification of the behavior of  
the program inside the source.


Reinier


Re: H98 Text IO

2008-02-26 Thread Chris Kuklewicz

The H98 spec has the inside half of story nailed down: Char is
Unicode, and Handles are text I/O that deal in [Char].  The outside
half of the story is the binary encoding of the [Char], which was
unspecified, and left to the implementation.

The implementation dependence allows GHC to create a
"setHandleEncoding" (or "withHandleEncoding") operation. [I do not
want to get bogged down in syntax]. This is something that, like all
details of encoding, is not the H98 spec.  In addition, there may be
some command line parameters to GHC.

Imagine that GHC 6.10.1 is released with encoding support.  If the
user runs ghc with no options or setup changes, then the new defaults
will apply.

The goal is that more complicated situations are reflected in
more complicated "ghc" or "main" invocations.  The least complicated
usage defaults to being identical cross-platform and regardless of
terminal I/O.

I think the best default would be UTF8 for all text handles.  This can
be easily documented, it can be easily understood, and will produce
the fewest surprises.

I imagine that in this proposed ghc-6.10.1:

* GHC's handles now carry an encoding parameter.

** There is a way to create a new handle from an old one that differs
  only in the encoding.
  (perhaps 'hNew <- cloneHandleWithEncoding "Latin1" hOld')

* GHC has mutable global variables that control the encoding
  parameter of new handles.

** Unless influenced by command-line switches, these default to UTF8.

** There are IO commands to read & write these global variables.

** There are different defaults for new terminal I/O handles and other
   I/O handles, so they could be given different encodings.

If you want to use the "local" or native encoding, then compile with
"ghc --local-encoding" or start the program with something like
"main = handlesUseLocalEncoding >> do ..."

If you want to use "Latin1" then use either
"ghc --encoding Latin1" or
"main = handlesUseEncoding "Latin1" >> do ..."

To compile older programs one could use "ghc --compat 6.8" or "ghc
--encoding Latin1" to access the old defaults.

One might even add "+RTS --encoding Latin1 -RTS" runtime options to
set the initial encoding.  Though I think this is unlikely to be
useful in practice.

I think that having terminal I/O be special is great for command line
applications.  But the nice behavior of such applications like "ls"
must not determine what the GHC runtime does by default.
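
To illustrate the shape of what I have in mind, here is a toy model of
the "mutable defaults" part. Every name in it is hypothetical, including
handlesUseEncoding, and none of this exists in GHC. A real version would
keep the defaults in a global variable inside the runtime. The sketch
passes an IORef around explicitly to keep it self-contained.

```haskell
module Main where

import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- All names here are hypothetical; nothing like this exists in GHC's libraries.
data Encoding = UTF8 | Latin1 | LocaleEncoding
  deriving (Eq, Show)

-- Separate defaults for terminal handles and for everything else.
data Defaults = Defaults
  { terminalDefault :: Encoding
  , otherDefault    :: Encoding
  } deriving (Eq, Show)

-- The proposed out-of-the-box state: UTF-8 for all text handles.
initialDefaults :: Defaults
initialDefaults = Defaults UTF8 UTF8

-- Which default applies to a new handle, given whether it is a terminal.
pick :: Defaults -> Bool -> Encoding
pick d isTty = if isTty then terminalDefault d else otherDefault d

-- Stand-in for 'handlesUseEncoding "Latin1"': change both defaults at once.
handlesUseEncoding :: IORef Defaults -> Encoding -> IO ()
handlesUseEncoding ref enc = writeIORef ref (Defaults enc enc)

-- The encoding a newly created handle would receive right now.
encodingForNewHandle :: IORef Defaults -> Bool -> IO Encoding
encodingForNewHandle ref isTty = do
  d <- readIORef ref
  return (pick d isTty)

main :: IO ()
main = do
  ref <- newIORef initialDefaults
  encodingForNewHandle ref False >>= print   -- default before any change
  handlesUseEncoding ref Latin1
  encodingForNewHandle ref False >>= print   -- default after the change
```

A command-line switch such as the suggested "--encoding Latin1" would
then only change what initialDefaults is.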


Re: H98 Text IO

2008-02-26 Thread John Vogel
Why not leave the defaults as they are, or use UTF-8, and give the
programmer the capability to specify what encoding they want when
they want to use a different one?

John


Re: H98 Text IO

2008-02-26 Thread John Meacham
On Tue, Feb 26, 2008 at 01:34:54PM +, Duncan Coutts wrote:
> Personally I'm not really fussed about which compromise we pick. I think
> the more important point is that all the Haskell implementations pick
> the same compromise so that we can effectively standardise the
> behaviour.

Wait, are you talking about changing what ghc does or trying to change
the haskell standard? I always thought ghc should do something more sane
with character IO, non unicode aware programs are a blight.

I don't think choosing something arbitrary to standardize on is a good
idea. It is not always clear what the best choice is. For instance,
until recently jhc used the locale encoding on Linux, due to glibc's strong
charset support and guaranteed use of Unicode wchar_t's, but always UTF-8
on BSD variants, where the wchar_t situation was less clear cut. On
embedded systems, only supporting ASCII IO is certainly a valid choice.
For a .NET backend, we will want to use .NET's native character IO
routines.

The important thing is standardizing how _binary_ handles work across
compilers. As long as everyone has a compatible openBinaryHandle then we
can layer whatever we want on it with compatible libraries.

I think the current behavior of GHC is poor and should be fixed, I
believe the intent of the haskell 98 standard is that character IO be
performed in a suitable system-specific way, which always truncating to
8 bits does not meet IMHO. But no need to prescribe something arbitrary
language-wide for a particular issue with ghc.

John 

-- 
John Meacham - ⑆repetae.net⑆john⑈


Re: H98 Text IO

2008-02-26 Thread Simon Marlow

Roman Leshchinskiy wrote:

Duncan Coutts wrote:

On Wed, 2008-02-27 at 00:31 +1100, Roman Leshchinskiy wrote:



Also, would this affect the encoding used for file names? If so, how?


No, that's a separate issue.


Hmm, so how do I reliably read a list of file names from a file?


You didn't say what format the file takes, so there are a couple of 
options.  If you get to choose the format, then using read/show is easiest. 
 If you're stuck with a predefined format, say one filename per line, then 
it depends what system you're on:


 - on Windows, filenames are Unicode, so the file must be in
   some encoding: decode it appropriately.

 - on Unix, filenames are binary, so use openBinaryFile and hGetLine.

Yes, this is all broken (in particular FilePath == [Char] is wrong), but at 
least it's possible to do what you want, and it's not getting any worse 
with the proposed change.  Filenames are something else that need an 
overhaul, but one thing at a time.
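
For the Unix case, a minimal sketch of the "openBinaryFile and hGetLine"
approach might look like this. The name readFileNames and the file
"names.txt" are placeholders for the example. How the resulting strings
map back to real file names depends on how the implementation treats
FilePath, so take this as an illustration of the reading side only.

```haskell
module Main where

import System.IO

-- Read one file name per line without any text decoding.
-- Each Char in the result holds one raw byte (0..255).
readFileNames :: FilePath -> IO [FilePath]
readFileNames listFile = do
  h <- openBinaryFile listFile ReadMode
  names <- loop h
  hClose h
  return names
  where
    loop h = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          name <- hGetLine h
          rest <- loop h
          return (name : rest)

main :: IO ()
main = do
  -- Create a small list file in binary mode, then read it back.
  withBinaryFile "names.txt" WriteMode $ \h ->
    hPutStr h "a.txt\nb\233.txt\n"
  readFileNames "names.txt" >>= mapM_ print
```

The second name contains the byte 0xE9, which comes back as the single
Char '\233' no matter what the locale is.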


Cheers,
Simon


Re: H98 Text IO

2008-02-26 Thread Simon Marlow

Simon Marlow wrote:

Duncan Coutts wrote:


Let's call this one proposal 0:


  * Haskell98 file IO should always use UTF-8.
  * Haskell98 IO to terminals should use the current locale
encoding.


and the others:


  1. all text I/O is in the locale encoding (what C and Hugs do)

  2. stdin/stdout/stderr and terminals are always in the locale
 encoding, everything else is UTF-8

  3. everything is UTF-8


Some other points that came up on IRC:

 - there's a long precedent for behaving differently when connected to
   a terminal.  For example, 'ls' formats output in columns when
   connected to a terminal, or displays output in colour.  This is
   a point in favour of (0).

 - we might expect that "prog file" behaves the same as "prog <file" [...]


Re: H98 Text IO

2008-02-26 Thread Roman Leshchinskiy

Duncan Coutts wrote:

On Wed, 2008-02-27 at 00:31 +1100, Roman Leshchinskiy wrote:

Duncan Coutts wrote:

So here is a concrete proposal:

  * Haskell98 file IO should always use UTF-8.
  * Haskell98 IO to terminals should use the current locale
encoding.

Personally, I'd find this deeply surprising. I don't care that much what 
locale gets used for I/O (if it matters, you have to deal with it 
explicitly anyway) as long as it is consistent. I'm probably mistaken,
but doesn't this proposal mean that I can't implement cat in H98 using
text I/O? That would be a bit disturbing.


You've never been able to do that with the guarantees provided by H98.


As a matter of fact, 21.10.2 from the Haskell Report suggests that at 
least copying text files should be possible. Unless I'm mistaken, your 
proposal would invalidate that example somewhat.


This begs another question. What exactly does "current locale" mean, 
given that we have lazy I/O and the locale can be changed on the fly?



Also, would this affect the encoding used for file names? If so, how?


No, that's a separate issue.


Hmm, so how do I reliably read a list of file names from a file?

Roman



Re: H98 Text IO

2008-02-26 Thread Duncan Coutts

On Wed, 2008-02-27 at 00:31 +1100, Roman Leshchinskiy wrote:
> Duncan Coutts wrote:
> > 
> > So here is a concrete proposal:
> > 
> >   * Haskell98 file IO should always use UTF-8.
> >   * Haskell98 IO to terminals should use the current locale
> > encoding.
> 
> Personally, I'd find this deeply surprising. I don't care that much what 
> locale gets used for I/O (if it matters, you have to deal with it 
> explicitly anyway) as long as it is consistent. I'm probably mistaken,
> but doesn't this proposal mean that I can't implement cat in H98 using
> text I/O? That would be a bit disturbing.

You've never been able to do that with the guarantees provided by H98.

The current base lib provides System.IO.openBinaryFile which does make
it possible to implement cat on binary files.
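
As a rough sketch of what that looks like (copyBinary, readBinary and the
file names are just names chosen for this example), a byte-for-byte copy
can be written with the binary variants of the usual file functions:

```haskell
module Main where

import System.IO

-- Copy src to dst byte for byte: no decoding and no newline translation.
copyBinary :: FilePath -> FilePath -> IO ()
copyBinary src dst =
  withBinaryFile src ReadMode $ \hin ->
    withBinaryFile dst WriteMode $ \hout -> do
      contents <- hGetContents hin   -- each Char carries one byte
      hPutStr hout contents          -- consumed fully while hin is still open

-- Read a whole file in binary mode, forcing it before the handle closes.
readBinary :: FilePath -> IO String
readBinary path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  length s `seq` return s

main :: IO ()
main = do
  -- Bytes that text mode could mangle: NUL, 0xFF, CR LF and a UTF-8 pair.
  withBinaryFile "in.bin" WriteMode $ \h -> hPutStr h "\0\255\r\n\206\177"
  copyBinary "in.bin" "out.bin"
  a <- readBinary "in.bin"
  b <- readBinary "out.bin"
  print (a == b)
```

This holds whatever default encoding is chosen for text Handles, since
binary Handles are not affected by the proposal.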

> Also, would this affect the encoding used for file names? If so, how?

No, that's a separate issue.

Duncan



Re: H98 Text IO

2008-02-26 Thread Duncan Coutts
On Tue, 2008-02-26 at 13:22 +, Simon Marlow wrote:

> So some alternatives that fix this are
> 
>1. all text I/O is in the locale encoding (what C and Hugs do)
> 
>2. stdin/stdout/stderr and terminals are always in the locale
>   encoding, everything else is UTF-8

I was initially confused about how this one was different from what I
first proposed.

The difference is that I was suggesting stdin/stdout/stderr be in the
locale *only* if they're connected to a terminal, rather than always.

>3. everything is UTF-8


Personally I'm not really fussed about which compromise we pick. I think
the more important point is that all the Haskell implementations pick
the same compromise so that we can effectively standardise the
behaviour.

Duncan



Re: H98 Text IO

2008-02-26 Thread Roman Leshchinskiy

Duncan Coutts wrote:


So here is a concrete proposal:

  * Haskell98 file IO should always use UTF-8.
  * Haskell98 IO to terminals should use the current locale
encoding.


Personally, I'd find this deeply surprising. I don't care that much what 
locale gets used for I/O (if it matters, you have to deal with it 
explicitly anyway) as long as it is consistent. I'm probably mistaken, 
but doesn't this proposal mean that I can't implement cat in H98 using 
text I/O? That would be a bit disturbing.


Also, would this affect the encoding used for file names? If so, how?

Roman



Re: H98 Text IO

2008-02-26 Thread Simon Marlow

Duncan Coutts wrote:

From the H98 report:


All I/O functions defined here are character oriented. [...]
These functions cannot be used portably for binary I/O.

In the following, recall that String is a synonym for [Char]
(Section 6.1.2).

So ordinary text Handles are for text, not binary. Char is of course a
Unicode code point.

The crucial question of course is what encoding of text to use. For the
H98 IO functions we cannot set it as a parameter, we have to pick a
sensible default. Currently different implementations disagree on that
default. Hugs has for some time used the current locale on posix systems
(and I'm guessing the current code page on windows). GHC has always used
the Latin-1 encoding.

These days, most operating systems use a locale/codepage encoding that
covers the full Unicode range. So on hugs we get the benefit of that but
on GHC we do not.

This is endlessly surprising for beginners. They do
putStrLn "αβγδεζηθικλ"
and it comes out on their terminal as junk.

It also causes problems for serious programs, see for example the recent
hand-wringing on cabal-devel.

So here is a concrete proposal:

  * Haskell98 file IO should always use UTF-8.
  * Haskell98 IO to terminals should use the current locale
encoding.


While I support Duncan's proposal (we discussed it on IRC), I thought I 
should point out some of the ramifications of this, and the alternatives.


If everything that is not a terminal uses UTF-8 by default, then shell 
commands may behave in an unexpected way, e.g. for a Haskell program "prog",


  prog | cat

will output in UTF-8, and if your locale encoding is something other than 
UTF-8 you'll see junk.  Similarly,


  prog >file; cat file

will give the same (wrong) result.

So some alternatives that fix this are

  1. all text I/O is in the locale encoding (what C and Hugs do)

  2. stdin/stdout/stderr and terminals are always in the locale
 encoding, everything else is UTF-8

  3. everything is UTF-8

(1) has the advantage of being easy to understand, but causes problems when 
you want to move a file created on one system to another system, or share 
files between users.  The programmer in this case has to anticipate the 
problem and set an encoding (and we're not proposing to provide a way to 
specify encodings, yet, so openBinaryFile and a separate UTF-8 step would
be required).

(2) has a sort of "do what I want" feel, and will almost certainly cause
confusion in some cases, simply because it's an arbitrary choice.

(3) is easy to understand, but does the wrong thing for people who have
a locale encoding other than UTF-8.

Duncan's proposal occupies a useful point: text that we know to be 
ephemeral, because it is being sent to a terminal, is definitely sent in 
the user's default encoding.  Text that might be persistent or might be 
crossing a locale-boundary is always written in UTF-8, which is good for 
interchange and portability. The catch is that sometimes we identify a 
Handle as persistent when it is really ephemeral.


Note that sensible people who set their locale to UTF-8 are not affected by 
any of this - and that includes most new installations of Linux these days, 
I believe.


Cheers,
Simon


Re: H98 Text IO

2008-02-26 Thread John Meacham
I came to the same conclusions. I think using either the current
encoding or utf8 are perfectly reasonable interpretations of the
standard. Jhc used to use the current locale always, but now it uses
utf8 always as that was easier to make portable to other operating
systems. (though current locale support will likely be added back at
some point)

I think this is a-okay as far as haskell 98 goes. Assuming latin1
without doing an 'openBinaryFile' is certainly not okay in my book.

John


-- 
John Meacham - ⑆repetae.net⑆john⑈


H98 Text IO

2008-02-26 Thread Duncan Coutts

From the H98 report:

All I/O functions defined here are character oriented. [...]
These functions cannot be used portably for binary I/O.

In the following, recall that String is a synonym for [Char]
(Section 6.1.2).

So ordinary text Handles are for text, not binary. Char is of course a
Unicode code point.

The crucial question of course is what encoding of text to use. For the
H98 IO functions we cannot set it as a parameter, we have to pick a
sensible default. Currently different implementations disagree on that
default. Hugs has for some time used the current locale on posix systems
(and I'm guessing the current code page on windows). GHC has always used
the Latin-1 encoding.

These days, most operating systems use a locale/codepage encoding that
covers the full Unicode range. So on hugs we get the benefit of that but
on GHC we do not.

This is endlessly surprising for beginners. They do
putStrLn "αβγδεζηθικλ"
and it comes out on their terminal as junk.

It also causes problems for serious programs, see for example the recent
hand-wringing on cabal-devel.

So here is a concrete proposal:

  * Haskell98 file IO should always use UTF-8.
  * Haskell98 IO to terminals should use the current locale
encoding.

The main controversial point I think is whether to always use UTF-8 or
always use the current locale or some split as I've suggested. C chose
to always go with the current locale. Some people think that was a
mistake because the interpretation changes from user to user.

For terminals it is more clear cut that the locale is the right choice
because that is what the terminal is capable of displaying. Using
anything else will produce junk. We can detect if a handle is a terminal
when we open it using hIsTerminalDevice. This should be done
automatically (and ghc would get it for free because it already does
that check to determine default buffering modes).

Sockets and pipes would be treated the same as files when opened in the
default text mode. The only special case is terminals.
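
As a sketch of the rule, and not of how any implementation does it: the
proposal says this choice should be made automatically inside the IO
library when the Handle is opened. The code below instead applies it
from user code, and assumes the hSetEncoding, localeEncoding and utf8
names that, as far as I know, only appear in System.IO in later GHC
releases. The names proposedEncoding and applyProposal are made up.

```haskell
module Main where

import System.IO

-- The proposal as a function: locale encoding for terminals, UTF-8 otherwise.
proposedEncoding :: Bool -> TextEncoding
proposedEncoding isTty = if isTty then localeEncoding else utf8

-- Apply the rule to an already-open handle.
applyProposal :: Handle -> IO ()
applyProposal h = do
  isTty <- hIsTerminalDevice h
  hSetEncoding h (proposedEncoding isTty)

main :: IO ()
main = do
  applyProposal stdout
  -- On a terminal this goes out in the locale encoding; when stdout is
  -- redirected to a file or pipe it goes out as UTF-8.
  putStrLn "\945\946\947"
```

Running the program directly and then as "./prog | hexdump -C" in a
non-UTF-8 locale shows the difference in bytes that this split implies.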

The major problem is with code that assumes GHC's Handles are
essentially Word8 and layers its own UTF8 or other decoding over the
top. The utf8-string package has this problem for example. Such code
should be using openBinaryFile because they are reading/writing binary
data, not String text.

Note that many programs that really need to work with binary files
already use openBinaryFile; those that do not are already broken on
Windows, which does cr/lf conversion on text files and so breaks many
binary formats (though not utf8).
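
A tiny model of why the conversion hurts binary data. The record format
here is invented purely for the example: a length byte followed by that
many payload bytes.

```haskell
module Main where

import Data.Word (Word8)

-- Newline translation as a text-mode handle on Windows applies it on
-- output: every LF (0x0A) becomes CR LF (0x0D 0x0A).
lfToCrlf :: [Word8] -> [Word8]
lfToCrlf = concatMap expand
  where
    expand 0x0A = [0x0D, 0x0A]
    expand b    = [b]

-- A made-up binary record: a length byte, then that many payload bytes.
record :: [Word8]
record = [0x03, 0x0A, 0xFF, 0x0A]

main :: IO ()
main = do
  print record              -- 1 length byte + 3 payload bytes
  print (lfToCrlf record)   -- payload is now 5 bytes, length byte still says 3
```

UTF-8 text survives because the byte 0x0A never occurs inside a
multi-byte sequence, so only real newlines are touched.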

So we have to decide which is more painful, keeping a limited text IO
system in GHC or breaking some existing programs which assume GHC's
current behaviour.

Opinions?

Please can we keep this discussion to the interpretation of the H98 IO
functions and not get into the separate discussion of how we could
extend or redesign the whole IO system. This is a question of what are
the right defaults.

Duncan



Re: how to use ghci-debugger with packages

2008-02-26 Thread Simon Marlow

Frederik Eaton wrote:


P.S. Here are some suggestions for the GHCi debugger documentation:

http://www.haskell.org/ghc/dist/current/docs/users_guide/ghci-debugger.html

"There is one major restriction: breakpoints and single-stepping are only available 
in interpreted modules; compiled code is invisible to the debugger."
-->
"There is one major restriction: breakpoints and single-stepping are only available 
in interpreted modules; compiled code is invisible to the debugger. Note that packages 
only contain compiled code - so debugging a package requires finding its source and 
loading that directly."

"There is currently no support for obtaining a "stack trace", but the tracing and 
history features provide a useful second-best, which will often be enough to establish the context 
of an error."
-->
"There is currently no support for obtaining a "stack trace", but the tracing and 
history features provide a useful second-best, which will often be enough to establish the context 
of an error. For instance, it is possible to break automatically when an exception is thrown, even 
if it is thrown from within compiled code (see 3.5.6. Debugging exceptions)."


Thanks!  I'll push these changes.

Cheers,
Simon



Re: Visual Haskell sources

2008-02-26 Thread Simon Marlow

Andrei Formiga wrote:

Hello,

The Visual Haskell 0.2 release notes [1] say that sources are
available, but the download page only has binaries available. Where
are the sources? Also, does it use the Visual Studio SDK, and is it
compatible with VS 2008? Thanks.


[1] http://haskell.org/visualhaskell/doc/index.html#release-notes



Sources are here:

  http://darcs.haskell.org/vshaskell/

good luck compiling it :-)

Cheers,
Simon
