Re: Should GHC default to -O1 ?

2011-11-09 Thread Nathan Howell
On Tue, Nov 8, 2011 at 11:28 PM, wagne...@seas.upenn.edu wrote:

 I don't agree that GHC's user interface should be optimized for newcomers
 to Haskell. GHC is an industrial-strength compiler with some very advanced
 features; the majority of its target audience is professional programmers.
 Let its interface reflect that fact.

 As Simon explained, GHC's current defaults are a very nice point in the
 programming space for people who are actively building and changing their
 programs.


It's easy to build arguments for either side, but my experience as a
professional developer is that new devs don't know what flags they need
for reasonable performance (often not even knowing what the various
optimization flags do), whereas experienced developers do know the
difference between -O0 and -O1, and frequently need -debug (not a
default option) more than -O0.

Seasoned GHC users can find that -O0 gives miserably slow compile times,
and fall back to GHCi for edit/rebuild cycles... which still aren't
terribly fast if you're using GHC's advanced features. I have a couple
of small modules that take 10 minutes each to compile on a current Core
i7 at -O0, and -O2 really doesn't take much longer. GHCi is very
slightly faster, but I'll still head directly downstairs for a coffee as
soon as either of these bad boys needs rebuilding... and still make it
back upstairs before they're done.
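
(As an aside, GHC also supports per-module optimization flags via an
OPTIONS_GHC pragma, so a project otherwise built at -O0 can still
optimize its hot modules. A minimal sketch, with a hypothetical module:)

  {-# OPTIONS_GHC -O2 #-}
  -- Hypothetical Hot.hs: compiled with -O2 even when the rest of the
  -- build uses -O0 for fast edit/rebuild cycles.
  module Hot (expensiveLoop) where

  expensiveLoop :: Int -> Int
  expensiveLoop n = sum [i * i | i <- [1 .. n]]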

And so I'd prefer the default to be -O1 or even -O2 and have people who
really need it use -O0. GHC shouldn't be painful on purpose, industrial
strength or not.

-n


Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Max Bolingbroke
On 8 November 2011 11:43, Simon Marlow marlo...@gmail.com wrote:
 Don't you mean 1 is what we have?

Yes, sorry!

 Failing to roundtrip in some cases, and doing so silently, seems highly
 suboptimal to me.  I'm sorry I didn't pick up on this at the time (Unicode
 is a swamp :).

I *can* change the implementation back to using lone surrogates. This
gives us guaranteed roundtripping, but it means that the user might see
lone-surrogate Char values in Strings from the filesystem/command
line. IIRC this does break some software -- e.g. Bryan's text
library explicitly checks for such characters and fails if it detects
them.
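
(For concreteness, a minimal sketch of the lone-surrogate scheme on the
PEP 383 model, assuming the usual mapping of each undecodable byte b to
the code point 0xDC00 + b; the helper names are mine:)

  import Data.Char (chr, ord)
  import Data.Word (Word8)

  -- An undecodable byte (always >= 0x80) becomes a lone surrogate.
  escapeByte :: Word8 -> Char
  escapeByte b = chr (0xDC00 + fromIntegral b)

  -- The encoder recognises such surrogates and emits the raw byte.
  unescapeChar :: Char -> Maybe Word8
  unescapeChar c
    | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
    | otherwise                  = Nothing
    where n = ord c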

So whatever happens we are going to end up making some group of users unhappy!
  * No PEP383: Haskellers using non-ASCII get upset when their command
line argument [String]s aren't in fact sequences of characters, but
sequences of bytes in some arbitrary encoding
  * PEP383 (surrogates): Unicoders get upset by lone surrogates (which
can actually occur at the moment, independent of PEP383 -- e.g. as
character literals or from the FFI)
  * PEP383 (private chars): Unixers get upset that we can't roundtrip
byte sequences that look like a codepoint U+EFxx encoded in the
current locale. In practice, U+EFxx is only decodable from a UTF
encoding, so we fail to roundtrip byte sequences like the one Ian
posted.

I'm happy to implement any behaviour; I would just like to know that
whatever we pick is accepted as the correct tradeoff :-)

RE exposing a ByteString-based interface to the IO library from
base/unix/whatever: AFAIK Python doesn't do this, and just tells
people to use the x.encode(sys.getfilesystemencoding(),
'surrogateescape') escape hatch, which is what I've been
recommending. I think this would be more satisfying to John if it were
actually guaranteed to work on arbitrary byte sequences, not just
*highly likely* to work :-)

Max



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Max Bolingbroke
On 7 November 2011 17:32, John Millikin jmilli...@gmail.com wrote:
 I am also not convinced that it is possible to correctly implement
 either of these functions if their behavior is dependent on the user's
 locale.

FWIW it's only dependent on the user's locale because whether glibc
iconv detects errors in the *from* sequence depends on what the *to*
locale is. Clearly an invalid *from* sequence should be reported as
invalid regardless of the *to* locale. I know this isn't much comfort to
you, though, since you do have to worry about broken behaviour in 7.2,
and possible future breakage with changes in iconv.

I understand your point that it would be better from a complexity
point of view to just roundtrip the bytes as *bytes* without relying
on all this escaping/unescaping code.

 Please understand, I am not arguing against the existence of this
 encoding layer in general. It's a fine idea for a simplistic
 high-level filesystem interaction library. But it should be
 *optional*, not part of the compiler or base.

The problem is that I *really really want* getArgs to decode the
command line arguments. That's almost the whole point of this change,
and it is what most users seem to expect. Given this constraint, the
code has to be part of base, and if getArgs has this behaviour then
any file system function we ship that takes a FilePath (i.e. all the
functions in base, directory, win32 and unix) must be prepared to
handle these escape characters for consistency.

I *would* be happy to expose an alternative file system API from the
posix package that operates with ByteString paths. This package could
provide a function of type FilePath -> ByteString that encodes the
string with the fileSystemEncoding (removing escapes in the process) for
interoperability with file names arriving via getArgs, and at that
point the decision about whether to use the escaping/unescaping code
would be (mostly) in the hands of the user. We could even have posix
expose APIs to get command line arguments/environment variables as
ByteStrings, and then you could avoid escape/unescape entirely.
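
(A minimal sketch of what that conversion might look like, assuming
GHC's getFileSystemEncoding and the marshalling functions in
GHC.Foreign; it comes out in IO because the encoding is read from the
environment, and the function name is mine:)

  import qualified Data.ByteString as B
  import GHC.Foreign (withCStringLen)
  import GHC.IO.Encoding (getFileSystemEncoding)

  -- Turn a (possibly escape-carrying) FilePath back into the raw
  -- bytes that the command line or filesystem originally supplied.
  encodeFilePath :: FilePath -> IO B.ByteString
  encodeFilePath fp = do
    enc <- getFileSystemEncoding
    withCStringLen enc fp B.packCStringLen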

Which of these solutions (if any) would satisfy you?
 1. The current situation, plus an alternative API exposed from
posix along the lines described above
 2. The current situation, but with the escape/unescape modified so it
allows true roundtripping (at the cost of weird surrogate Char
values popping up now and again). If you have this, you can reliably
implement the alternative API on top of the String-based one, assuming
we got our escape/unescape code right

I hope we can work together to find a solution here.

Cheers,
Max



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Ian Lynagh
On Wed, Nov 09, 2011 at 11:02:54AM +, Simon Marlow wrote:
 
 I would be happy with the surrogate approach I think.  Arguably, if
 you try to treat a string with lone surrogates as Unicode and it
 fails, then that is a feature: the original string wasn't Unicode.
 All you can do with an invalid Unicode string is use it as a
 FilePath again, and the right thing will happen.

If we aren't going to guarantee that the encoded string is unicode, then
is there any benefit to encoding it in the first place?

 Alternatively if we stick with the private char approach, it should
 be possible to have an escaping scheme for 0xEFxx characters in the
 input that would enable us to roundtrip correctly.  That is, escape
 0xEFxx into a sequence 0xYYEF 0xYYxx for some suitable YY.

Why not encode into private chars, i.e. encode U+EF00 (which in UTF-8 is
0xEE 0xBC 0x80) as U+EFEE U+EFBC U+EF80, etc.?
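
(A sketch of that byte-to-private-char mapping as I read the proposal;
the helper names are mine:)

  import Data.Char (chr, ord)
  import Data.Word (Word8)

  -- Each raw byte 0xXX of an escaped sequence is represented by the
  -- private-use character U+EFXX.
  toPrivate :: Word8 -> Char
  toPrivate b = chr (0xEF00 + fromIntegral b)

  fromPrivate :: Char -> Maybe Word8
  fromPrivate c
    | n >= 0xEF00 && n <= 0xEFFF = Just (fromIntegral (n - 0xEF00))
    | otherwise                  = Nothing
    where n = ord c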

(Max gave some reasons earlier in this thread, but I'd need examples of
what goes wrong to understand them).


Thanks
Ian




Re: Should GHC default to -O1 ?

2011-11-09 Thread Greg Weber
How much does using ghc without cabal imply a newer programmer? I don't use
cabal when trying out small bits of code (maybe I should be using ghci),
but am otherwise always using cabal.

On Wed, Nov 9, 2011 at 3:18 AM, Duncan Coutts
duncan.cou...@googlemail.com wrote:

 On 9 November 2011 00:17, Felipe Almeida Lessa felipe.le...@gmail.com
 wrote:
  On Tue, Nov 8, 2011 at 3:01 PM, Daniel Fischer
  daniel.is.fisc...@googlemail.com wrote:
  On Tuesday 08 November 2011, 17:16:27, Simon Marlow wrote:
  most people know about 1, but I think 2 is probably less well-known.
  When in the edit-compile-debug cycle it really helps to have -O off,
  because your compiles will be so much quicker due to both factors 1 & 2.
 
  Of course. So defaulting to -O1 would mean one has to specify -O0 in the
  .cabal file or Makefile, or on the command line, during development,
  which certainly is an inconvenience.
 
  AFAIK, Cabal already uses -O1 by default.

 Indeed, and cabal check / hackage upload complain if you put -O{n} in
 your .cabal file.

 The recommended method during development is to use:

 $ cabal configure -O0


 Duncan



Re: Should GHC default to -O1 ?

2011-11-09 Thread Duncan Coutts
On 9 November 2011 13:53, Greg Weber g...@gregweber.info wrote:
 How much does using ghc without cabal imply a newer programmer? I don't use
 cabal when trying out small bits of code (maybe I should be using ghci), but
 am otherwise always using cabal.

The main reason cabal has always defaulted to -O is that historically
it's been assumed that the user is installing something rather than
just hacking on their own code.

If we can distinguish cleanly in the user interface between the
installing and hacking use cases then we could default to -O0 for the
hacking case.

Duncan

 On Wed, Nov 9, 2011 at 3:18 AM, Duncan Coutts duncan.cou...@googlemail.com
 wrote:

 On 9 November 2011 00:17, Felipe Almeida Lessa felipe.le...@gmail.com
 wrote:
  On Tue, Nov 8, 2011 at 3:01 PM, Daniel Fischer
  daniel.is.fisc...@googlemail.com wrote:
  On Tuesday 08 November 2011, 17:16:27, Simon Marlow wrote:
  most people know about 1, but I think 2 is probably less well-known.
  When in the edit-compile-debug cycle it really helps to have -O off,
   because your compiles will be so much quicker due to both factors 1 & 2.
 
   Of course. So defaulting to -O1 would mean one has to specify -O0 in
   the .cabal file or Makefile, or on the command line, during
   development, which certainly is an inconvenience.
 
  AFAIK, Cabal already uses -O1 by default.

 Indeed, and cabal check / hackage upload complain if you put -O{n} in
 your .cabal file.

 The recommended method during development is to use:

 $ cabal configure -O0


 Duncan



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Max Bolingbroke
On 9 November 2011 13:11, Ian Lynagh ig...@earth.li wrote:
 If we aren't going to guarantee that the encoded string is unicode, then
 is there any benefit to encoding it in the first place?

(I think you mean decoded here -- my understanding is that decode ::
ByteString -> String, encode :: String -> ByteString)

 Why not encode into private chars, i.e. encode U+EF00 (which in UTF-8 is
 0xEE 0xBC 0x80) as U+EFEE U+EFBC U+EF80, etc.?

 (Max gave some reasons earlier in this thread, but I'd need examples of
 what goes wrong to understand them).

We can do this but it doesn't solve all problems. Here are three such problems:

PROBLEM 1 (bleeding from non-escaping to escaping TextEncodings)
===

So let's say we are reading a filename from stdin. Currently stdin
uses the utf8 TextEncoding -- this TextEncoding knows nothing about
private-char roundtripping, and will throw an exception when decoding
bad bytes or encoding our private chars.

Now the user types a UTF-8-encoded U+EF80 character -- i.e. we get the
bytes 0xEE 0xBE 0x80 on stdin.

The utf8 TextEncoding naively decodes this byte sequence to the
character sequence U+EF80.

We have lost at this point: if the user supplies the resulting String
to a function that encodes the String with the fileSystemEncoding, the
String will be encoded into the byte sequence 0x80. This is probably
not what we want to happen! It means that a program like this:


main = do
  fp <- getLine
  readFile fp >>= putStrLn


will fail ("file not found: \x80") when given the name of an
(existent) file 0xEE 0xBE 0x80.
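
(One way to sidestep this particular trap, assuming the escaping-aware
encoding is exposed as a TextEncoding, would be to put stdin into the
same encoding the filesystem functions use; a sketch:)

  import System.IO (hSetEncoding, stdin)
  import GHC.IO.Encoding (getFileSystemEncoding)

  main :: IO ()
  main = do
    -- Decode stdin the same way the file system APIs do, so the
    -- String we read can roundtrip back through readFile.
    enc <- getFileSystemEncoding
    hSetEncoding stdin enc
    fp <- getLine
    readFile fp >>= putStrLn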

PROBLEM 2 (bleeding between two different escaping TextEncodings)
===

So let's say the user supplies the UTF-8 encoded U+EF00 (byte sequence
0xEE 0xBC 0x80) as a command line argument, so it goes through the
fileSystemEncoding. In your scheme the resulting Char sequence is
U+EFEE U+EFBC U+EF80.

What happens when we then *encode* that Char sequence using a UTF-16
TextEncoding (one that knows about the 0xEFxx escape mechanism)? The
resulting byte sequence is 0xEE 0xBC 0x80, NOT the UTF-16-encoded
version of U+EF00! This is certainly contrary to what the user would
expect.

PROBLEM 3 (bleeding from escaping to non-escaping TextEncodings)
===

Just as above, let's say the user supplies the UTF-8 encoded U+EF00
(byte sequence 0xEE 0xBC 0x80) as a command line argument, so it goes
through the fileSystemEncoding. In your scheme the resulting Char
sequence is U+EFEE U+EFBC U+EF80.

If you try to write this String to stdout (which uses the UTF-8
encoding that knows nothing about 0xEFxx escapes) you just get an
exception, NOT the UTF-8 encoded version of U+EF00. Game over man,
game over!

CONCLUSION
===

As far as I can see, the proposed escaping scheme recovers the
roundtrip property but loses a number of other reasonable-looking
behaviours.

(Note that the above outlined problems are problems in the current
implementation too -- but the current implementation doesn't even
pretend to support U+EFxx characters. Its correctness is entirely
dependent on them never showing up, which is why we chose a part of
the private codepoint region that is reserved specifically for the
purpose of encoding hacks).

Max



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Max Bolingbroke
On 9 November 2011 11:02, Simon Marlow marlo...@gmail.com wrote:
 The performance overhead of all this worries me.  withCString has taken a
 huge performance hit, and I think there are people who want to know that
 there aren't several complex encoding/decoding passes between their Haskell
 code and the POSIX API.  We ought to be able to program to POSIX directly,
 and the same goes for Win32.

We are only really talking about environment variables, filenames and
command line arguments here. I'm sure there are performance
implications to all this decoding/encoding, but these bits of text are
almost always very short and are unlikely to be causing bottlenecks.
Adding a whole new API *just* to eliminate a hypothetical performance
problem seems like overkill.

OTOH, I'm happy to add it if we stick with using private chars for the
escapes, because then using it or not using it is a *correctness*
issue (albeit in rare cases).

Max



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Simon Marlow

On 08/11/2011 15:42, John Millikin wrote:

On Tue, Nov 8, 2011 at 03:04, Simon Marlow marlo...@gmail.com wrote:

I really think we should provide the native APIs.  The problem is that the
System.Posix.Directory API is all in terms of FilePath (=String), and if we
gave that a different meaning from the System.Directory FilePaths then
confusion would ensue.  So perhaps we need to add another API to
System.Posix with filesystem operations in terms of ByteString, and
similarly for Win32.


+1

I think most users would be OK with having System.Posix treat FilePath
differently, as long as this is clearly documented, but if you feel a
separate API is better then I have no objection. As long as there's
some way to say "I know what I'm doing, here's the bytes" to the
library.

The Win32 package uses wide-character functions, so I'm not sure
whether bytes would be appropriate there. My instinct says to stick
with chars, via withCWString or equivalent. The package maintainer
will have a better idea of what fits with the OS's idioms.


Ok, I spent most of today adding ByteString alternatives for all of the 
functions in System.Posix that use FilePath or environment strings.  The 
Haddocks for my augmented unix package are here:


http://community.haskell.org/~simonmar/unix-with-bytestring-extras/index.html

In particular, the module System.Posix.ByteString is the whole 
System.Posix API but with ByteString FilePaths and environment strings:


http://community.haskell.org/~simonmar/unix-with-bytestring-extras/System-Posix-ByteString.html

It has one addition relative to System.Posix:

  getArgs :: IO [ByteString]

Let me know what you think.  I suspect the main controversial aspect is 
that I included


  type FilePath = ByteString

which is a bit cute but might be confusing.
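
(For illustration, a small hypothetical client of the proposed module,
assuming it lands with the getArgs addition described above:)

  import qualified System.Posix.ByteString as P

  main :: IO ()
  main = do
    args <- P.getArgs   -- raw bytes, no locale decoding
    mapM_ print args    -- ByteString's Show instance escapes non-ASCII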

Cheers,
Simon



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Simon Marlow

On 09/11/2011 13:11, Ian Lynagh wrote:
 On Wed, Nov 09, 2011 at 11:02:54AM +, Simon Marlow wrote:

 I would be happy with the surrogate approach I think.  Arguably, if
 you try to treat a string with lone surrogates as Unicode and it
 fails, then that is a feature: the original string wasn't Unicode.
 All you can do with an invalid Unicode string is use it as a
 FilePath again, and the right thing will happen.

 If we aren't going to guarantee that the encoded string is unicode, then
 is there any benefit to encoding it in the first place?

With a decoded FilePath you can:

  - use it as a FilePath argument to some other function

  - map all the illegal characters to '?' and then treat it as
Unicode, e.g. for printing it out (but then you lose the ability to
roundtrip, which is why we can't do this automatically).

Ok, so since we need something like

  makePrintable :: FilePath -> String

arguably we might as well make that do the locale decoding.  That's 
certainly a good point...
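
(A sketch of makePrintable under the surrogate scheme, where
undecodable bytes surface as lone surrogates; the implementation is my
assumption, not something settled in the thread:)

  import Data.Char (ord)

  -- Replace the lone surrogates that mark undecodable bytes with '?',
  -- making the string printable at the cost of roundtripping.
  makePrintable :: FilePath -> String
  makePrintable = map replaceBad
    where
      replaceBad c
        | ord c >= 0xD800 && ord c <= 0xDFFF = '?'
        | otherwise                          = c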


Cheers,
Simon



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread John Millikin
On Wed, Nov 9, 2011 at 08:04, Simon Marlow marlo...@gmail.com wrote:
 Ok, I spent most of today adding ByteString alternatives for all of the
 functions in System.Posix that use FilePath or environment strings.  The
 Haddocks for my augmented unix package are here:

 http://community.haskell.org/~simonmar/unix-with-bytestring-extras/index.html

 In particular, the module System.Posix.ByteString is the whole System.Posix
 API but with ByteString FilePaths and environment strings:

 http://community.haskell.org/~simonmar/unix-with-bytestring-extras/System-Posix-ByteString.html

This looks lovely -- thank you.

Once it's released, I'll port all my libraries over to using it.

 It has one addition relative to System.Posix:

  getArgs :: IO [ByteString]

Thank you very much! Several tools I use daily accept binary data as
command-line options, and this will make it much easier to port them
to Haskell in the future.

 Let me know what you think.  I suspect the main controversial aspect is that
 I included

  type FilePath = ByteString

 which is a bit cute but might be confusing.

Indeed, I was very confused when I saw that in the docs. If it's not
too much trouble, could those functions accept/return ByteString
directly?



Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Simon Marlow

On 09/11/2011 15:58, Max Bolingbroke wrote:


(Note that the above outlined problems are problems in the current
implementation too -- but the current implementation doesn't even
pretend to support U+EFxx characters. Its correctness is entirely
dependent on them never showing up, which is why we chose a part of
the private codepoint region that is reserved specifically for the
purpose of encoding hacks).


But we can't make that assumption, because the user might have 
accidentally set the locale wrong and then all kinds of garbage will 
show up in decoded file paths.  I think it's important that programs 
that just traverse the file system keep working under those conditions, 
rather than randomly failing due to (encode . decode) being almost but 
not quite the identity.
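
(The property at stake here can be checked directly; a sketch using
GHC's own marshalling functions, with the helper name being mine:)

  import qualified Data.ByteString as B
  import Data.Word (Word8)
  import GHC.Foreign (peekCStringLen, withCStringLen)
  import GHC.IO.Encoding (getFileSystemEncoding)

  -- Does decoding these bytes and re-encoding them give back exactly
  -- the original bytes?  Under the current scheme this can fail (or
  -- the decode can throw) for inputs resembling encoded U+EFxx.
  roundtrips :: [Word8] -> IO Bool
  roundtrips ws = do
    enc <- getFileSystemEncoding
    let bs = B.pack ws
    str <- B.useAsCStringLen bs (peekCStringLen enc)
    bs' <- withCStringLen enc str B.packCStringLen
    return (bs' == bs)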


Cheers,
Simon





Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread Ian Lynagh
On Wed, Nov 09, 2011 at 03:58:47PM +, Max Bolingbroke wrote:
 
 (Note that the above outlined problems are problems in the current
 implementation too

Then the proposal seems to me to be strictly better than the current
system. Under both systems the wrong thing happens when U+EFxx is
entered as Unicode text, but the proposed system works for all filenames
read from the filesystem.


In the longer term, I think we need to fix the underlying problem that
(for example) both getLine and getArgs produce a String from bytes, but
do so in different ways. At some point we should change the type of
getArgs and friends.


Thanks
Ian




Re: behaviour change in getDirectoryContents in GHC 7.2?

2011-11-09 Thread John Lask

My primary concerns are (in order of priority - and I only speak for myself)

(a) consistency across platforms
(b) minimize (unrequired) performance overhead

I would prefer an API that is consistent across win32, posix and other
OSes, and that only does as much as the user (us) wants

for example ...

module System.Directory.ByteString ...

type FilePath = ByteString

getDirectoryContents :: FilePath -> IO [FilePath]

which is the same for both win32 and posix and represents raw,
uninterpreted bytestrings in whatever encoding (or non-encoding) the OS
provides... implicitly it is for the user to know and understand what
they're getting (UTF-16 in the case of Windows, bytes in the case of
posix platforms)



then this API can be re-exported, with the decoding/encoding added, by
System.Directory/System.IO

which would export FilePath = String

i.e. a two-level API...
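
(A sketch of the glue the top level might use, parameterised over the
raw operation so it stands alone; names are illustrative:)

  import qualified Data.ByteString as B
  import GHC.Foreign (peekCStringLen, withCStringLen)
  import GHC.IO.Encoding (getFileSystemEncoding)

  -- Lift a raw ByteString-level directory operation to the String
  -- level by encoding the argument and decoding the results with
  -- the filesystem encoding.
  wrapRaw :: (B.ByteString -> IO [B.ByteString])
          -> FilePath -> IO [FilePath]
  wrapRaw rawOp dir = do
    enc <- getFileSystemEncoding
    bs  <- withCStringLen enc dir B.packCStringLen
    out <- rawOp bs
    mapM (\b -> B.useAsCStringLen b (peekCStringLen enc)) out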


