#3398: Unicode output in GHC
-------------------------------+--------------------------------------------
  Reporter:  simonmar          |          Owner:                  
      Type:  bug               |         Status:  new             
  Priority:  high              |      Milestone:  6.12.1          
 Component:  Compiler          |        Version:  6.11            
  Severity:  normal            |       Keywords:                  
Difficulty:  Unknown           |       Testcase:  2816            
        Os:  Unknown/Multiple  |   Architecture:  Unknown/Multiple
-------------------------------+--------------------------------------------
 Unicode output is somewhat broken in GHC as a whole.  We should fix it
 properly.

 Most output is generated by the Pretty module.  Pretty has two ways to
 produce output, and GHCi adds a further layer on top:

  * `printLeftRender`, which is used when the rendering mode is `LeftMode`.
    This method uses the `BufWrite` module to speed up output.  For
    `FastStrings`, the output will be in UTF-8; for strings and other
    characters, the output takes the low 8 bits of each character.

  * `printDoc`, when used in modes other than `LeftMode` (e.g. for things
    like error messages and `-ddump` output), calls `hPutStr` for strings,
    which uses the prevailing encoding on stdout.  However, it calls
    `hPutFS` for `FastStrings`, which always emits UTF-8.

  * In GHCi, there is an additional layer due to Haskeline, which pipes all
    the output through its own decoder (or tries to; I think there are
    cases it does not cover).

 This is all a bit of a mess.

 We should be using the Unicode layer in the IO library for all
 encoding/decoding now.  I suggest that:

  * We leave `printLeftRender` alone.  It is used for printing things like
    the `.s` file, and never outputs any Unicode characters because
    everything is Z-encoded.

  * `printDoc` should use `hPutStr . decodeFS` instead of `hPutFS`.

  * We get rid of the Haskeline decoding layer.
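 The claim that the `printLeftRender` path never sees non-ASCII text,
 because everything is Z-encoded, can be illustrated with a simplified
 Z-encoder.  This is a sketch, not GHC's code: the names `zEncode`,
 `encodeChar`, `unencoded` and `unicodeEscape` are hypothetical, the
 character table is assumed to match GHC's Z-encoding, and the special
 handling GHC gives to tuple names is omitted.

```haskell
import Data.Char (isAscii, isDigit, ord)
import Numeric (showHex)

-- Simplified Z-encoding of a name (hypothetical helper; assumes the
-- per-character table below mirrors GHC's).
zEncode :: String -> String
zEncode = concatMap encodeChar

encodeChar :: Char -> String
encodeChar c
  | unencoded c = [c]
encodeChar c = case lookup c table of
  Just s  -> s
  Nothing -> unicodeEscape c
  where
    table =
      [ ('(', "ZL"), (')', "ZR"), ('[', "ZM"), (']', "ZN"), (':', "ZC")
      , ('Z', "ZZ"), ('z', "zz"), ('&', "za"), ('|', "zb"), ('^', "zc")
      , ('$', "zd"), ('=', "ze"), ('>', "zg"), ('#', "zh"), ('.', "zi")
      , ('<', "zl"), ('-', "zm"), ('!', "zn"), ('+', "zp"), ('\'', "zq")
      , ('\\', "zr"), ('/', "zs"), ('*', "zt"), ('_', "zu"), ('%', "zv")
      ]

-- ASCII letters and digits pass through unchanged, except 'z' and 'Z',
-- which serve as the escape characters.
unencoded :: Char -> Bool
unencoded c =
  c /= 'z' && c /= 'Z'
    && (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || isDigit c)

-- Any other character becomes 'z', its code point in hex, then 'U';
-- a '0' is inserted when the hex digits would start with a letter.
unicodeEscape :: Char -> String
unicodeEscape c = 'z' : if startsWithDigit hex then hex else '0' : hex
  where
    hex = showHex (ord c) "U"
    startsWithDigit (d:_) = isDigit d
    startsWithDigit []    = False

main :: IO ()
main = do
  let names = ["Data.Map", "\x3BBx", "caf\xE9"]
  mapM_ (putStrLn . zEncode) names           -- DataziMap, z3bbUx, cafz0e9U
  print (all (all isAscii . zEncode) names)  -- True
```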
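 A minimal sketch of the proposed `printDoc` change, assuming a
 `FastString` can be treated as its list of UTF-8 bytes.  It does not use
 GHC's real `FastString` type: the hypothetical `decodeUtf8` plays the role
 of `decodeFS`, and `putUtf8` plays the role of `hPutStr . decodeFS`.
 Decoding goes through `GHC.Foreign.peekCStringLen` with the `utf8`
 encoding from `base`, and re-encoding is left entirely to `hPutStr`,
 which uses whatever encoding the `Handle` has.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (Handle, hPutStr, hSetEncoding, stdout, utf8)

-- Stand-in for a FastString: its UTF-8 encoded bytes.
-- (Hypothetical; GHC's FastString is a different, opaque type.)
type Utf8Bytes = [Word8]

-- Plays the role of decodeFS: turn UTF-8 bytes back into a String of
-- Unicode characters using the IO library's own UTF-8 decoder.
decodeUtf8 :: Utf8Bytes -> IO String
decodeUtf8 bytes =
  withArrayLen bytes $ \len ptr ->
    GHC.peekCStringLen utf8 (castPtr ptr, len)

-- Plays the role of `hPutStr . decodeFS`: decode first, then let
-- hPutStr encode with the handle's current encoding.
putUtf8 :: Handle -> Utf8Bytes -> IO ()
putUtf8 h bytes = decodeUtf8 bytes >>= hPutStr h

main :: IO ()
main = do
  -- Set explicitly here so the demo prints the same everywhere; under
  -- the proposal, stdout would keep whatever encoding it already has.
  hSetEncoding stdout utf8
  putUtf8 stdout [0xCE, 0xBB, 0x0A] -- the UTF-8 bytes of "\x3BB\n"
```

 Using the IO library's `utf8` codec for decoding, rather than a
 hand-written decoder, keeps all encoding and decoding in one place, which
 is the point of the proposal.  The sketch runs in `IO` only because
 `GHC.Foreign.peekCStringLen` does.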

 However, this will introduce a regression on Windows, because the
 Haskeline encoding layer currently does code-page encoding.  Judah has
 mentioned looking at doing code-page encoding in the GHC IO library, so
 let's see what happens there.

 Once this is done, we can do #2507 (quotation characters in error
 messages).

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3398>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs