Re: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Brandon S Allbery KF8NH

On 6/27/10 03:11 , Roman Beslik wrote:
> No! The target encoding is the current locale. It is a no-brainer to find
> it. Use your Unix.
> $ man setlocale
> $ locale

So you want to use someone else's file, and their locale isn't the same as
yours.  What now?

-- 
brandon s. allbery [linux,solaris,freebsd,perl]  allb...@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Alexey Khudyakov
On Sun, 27 Jun 2010 10:28:49 +0300
Roman Beslik  wrote:

>   On 27.06.10 10:17, Bulat Ziganshin wrote:
> > Sunday, June 27, 2010, 11:07:47 AM, you wrote:
> >>> Currently, FilePath is an alias for String.  Changing FilePath to a real
> >>> type
> >> Just do not change FilePath, what may be simpler?
> > if FilePath will become abstract type, it will break all programs
> > that use it since they use it as String
>
> Hello, do you read me? I said: "do not change FilePath".

It's no good either. Then there is no way to have both automatic
decoding of file names and working with file names with incorrect
encoding. 


-- 
Alexey Khudyakov 


Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Bulat Ziganshin
Hello Roman,

Sunday, June 27, 2010, 11:37:24 AM, you wrote:

>>> No! The target encoding is the current locale. It is a no-brainer to
>> not necessarily. current locale, encoding of current terminal and
>> encoding of every filesystem mounted are all different things
> And we should stick to the current locale. Problem solved.
> "6.3  CString
> The module CString provides routines marshalling Haskell into C strings
> and vice versa. The marshalling converts each Haskell character, 
> representing a Unicode code point, to one or more bytes in a manner 
> that, by default, is determined by the *current locale*."
> "The Haskell 98 Foreign Function Interface."

1. it doesn't work in practice. ghc provides simple 8-bit conversion
and i think a lot of code relies on this behavior

2. when you mount an external/network volume, it doesn't necessarily have
the same encoding as your current locale



-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Bulat Ziganshin
Hello Roman,

Sunday, June 27, 2010, 11:28:49 AM, you wrote:
>>> Just do not change FilePath, what may be simpler?
>> if FilePath will become abstract type, it will break all programs
>> that use it since they use it as String
> Hello, do you read me? I said: "do not change FilePath".

what do you mean by abstract type then? :)


-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Bulat Ziganshin
Hello Roman,

Sunday, June 27, 2010, 11:24:16 AM, you wrote:

> O'kay, but IMHO few people want to have a headache with recoding. You
> knew that the implementation was incorrect, why you relied on it?

what is the alternative? :)  on windows i've used low-level open()-style
APIs, on Linux i got the same results with the official API



-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Roman Beslik

 On 27.06.10 10:18, Bulat Ziganshin wrote:
> Hello Roman,
>
> Sunday, June 27, 2010, 11:11:59 AM, you wrote:
>
>> No! The target encoding is the current locale. It is a no-brainer to
>
> not necessarily. current locale, encoding of current terminal and
> encoding of every filesystem mounted are all different things

And we should stick to the current locale. Problem solved.
"6.3  CString
The module CString provides routines marshalling Haskell into C strings
and vice versa. The marshalling converts each Haskell character,
representing a Unicode code point, to one or more bytes in a manner
that, by default, is determined by the *current locale*."
"The Haskell 98 Foreign Function Interface."

--
Best regards,
  Roman Beslik.



Re: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Roman Beslik

 On 27.06.10 10:17, Bulat Ziganshin wrote:
> Sunday, June 27, 2010, 11:07:47 AM, you wrote:
>>> Currently, FilePath is an alias for String.  Changing FilePath to a real
>>> type
>> Just do not change FilePath, what may be simpler?
> if FilePath will become abstract type, it will break all programs
> that use it since they use it as String

Hello, do you read me? I said: "do not change FilePath".

--
Best regards,
  Roman Beslik.



Re: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Roman Beslik

 On 27.06.10 09:38, Bulat Ziganshin wrote:
> Sunday, June 27, 2010, 3:52:54 AM, you wrote:
>> I fail to see how it will break programs. Current programs do not use
>> Unicode because it is implemented incorrectly.
> i use it. current Linux implementation treats String as sequence of
> bytes, and with manual recoding it allows to use filesystems with
> any encoding

O'kay, but IMHO few people want to have a headache with recoding. You
knew that the implementation was incorrect, why you relied on it?


--
Best regards,
  Roman Beslik.



Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Bulat Ziganshin
Hello Roman,

Sunday, June 27, 2010, 11:11:59 AM, you wrote:

> No! The target encoding is the current locale. It is a no-brainer to

not necessarily. current locale, encoding of current terminal and
encoding of every filesystem mounted are all different things


-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Bulat Ziganshin
Hello Roman,

Sunday, June 27, 2010, 11:07:47 AM, you wrote:

>> Currently, FilePath is an alias for String.  Changing FilePath to a real
>> type
> Just do not change FilePath, what may be simpler?

if FilePath will become abstract type, it will break all programs
that use it since they use it as String



-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Roman Beslik

 On 27.06.10 03:58, Felipe Lessa wrote:
> On Sun, Jun 27, 2010 at 02:55:33AM +0300, Roman Beslik wrote:
>>  On 26.06.10 15:44, Felipe Lessa wrote:
>>> However, suppose your program needs to create a file with a name
>>> based on a database information.  Your database is UTF-8.  How do
>>> you translate that UTF-8 data into a filepath?  This is the
>>> problem we got in Haskell.  We have a nice coding-agnostic String
>>> datatype, but we don't know how to create a file with this very
>>> name.
>>
>> It is simple: you recode from (database | "network server" | file)
>> encoding to the current locale.
>
> Recoding is indeed very simple.  You know the source coding
> (e.g. your database is in UTF-8).  But how do you discover the
> target coding?  How can you find out that this system uses
> ISO8859-1, while this other one uses UTF-16, while...?
>
> See the problem now? :)

No! The target encoding is the current locale. It is a no-brainer to
find it. Use your Unix.

$ man setlocale
$ locale

--
Best regards,
  Roman Beslik.



Re: [Haskell-cafe] Core packages and locale support

2010-06-27 Thread Roman Beslik

 On 27.06.10 04:07, Brandon S Allbery KF8NH wrote:
> On 6/26/10 19:52 , Roman Beslik wrote:
>> I fail to see how it will break programs. Current programs do not use
>> Unicode because it is implemented incorrectly.
>
> Currently, FilePath is an alias for String.  Changing FilePath to a real
> type

Just do not change FilePath, what may be simpler?

--
Best regards,
  Roman Beslik.



Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Bulat Ziganshin
Hello Roman,

Sunday, June 27, 2010, 3:52:54 AM, you wrote:

> I fail to see how it will break programs. Current programs do not use
> Unicode because it is implemented incorrectly.

i use it. current Linux implementation treats String as sequence of
bytes, and with manual recoding it allows to use filesystems with
any encoding
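
To make the manual recoding concrete, here is a minimal sketch. It assumes the behaviour described above, where each Char of a FilePath is handed to the OS as one byte. The names encodeChar and encodeUtf8Path are made up for illustration and are not part of any library.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Encode one code point as UTF-8, returning each byte as a Char in 0..255
-- so the result can be used where a byte-per-Char String is expected.
encodeChar :: Char -> String
encodeChar c
  | n < 0x80    = [chr n]
  | n < 0x800   = map chr [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = map chr [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = map chr [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                          , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Hypothetical helper: recode a Unicode file name to UTF-8 bytes-as-Chars.
encodeUtf8Path :: String -> FilePath
encodeUtf8Path = concatMap encodeChar

main :: IO ()
main = print (map ord (encodeUtf8Path "\1092.txt"))
-- prints [209,132,46,116,120,116]
```

On a compiler that decodes file names itself, this step would double-encode, so it only applies under the stated assumption.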


-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Felipe Lessa
On Sat, Jun 26, 2010 at 10:01:57PM -0300, Felipe Lessa wrote:
> The types are:
>
>   getArgs   :: IO [String]
>   writeFile :: FilePath -> String -> IO ()

On a similar note, getArgs probably suffers from the same
problem.  Which should it be?

  a) getArgs :: IO [String]
  b) getArgs :: IO [Word8]
  c) getArgs :: IO [FilePath]
  d) getArgs :: IO [Argument]

Cheers,

--
Felipe.


Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Brandon S Allbery KF8NH

On 6/26/10 19:52 , Roman Beslik wrote:
> I fail to see how it will break programs. Current programs do not use
> Unicode because it is implemented incorrectly.

Currently, FilePath is an alias for String.  Changing FilePath to a real
type will break programs because there is no constructor for FilePath
currently, so everyone uses String.  And Haskell doesn't auto-coerce, so you
would need to use a typeclass and separate String and FilePath instances for
compatibility.

(On the other hand, this might be a good idea anyway; another instance that
would be useful would be [Word8].)
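
The typeclass idea can be sketched as follows. This is only an illustration under assumptions: the names Path, IsPath, toLegacy and writeFileP are hypothetical, and the String instance simply keeps the low 8 bits of each Char, where a real design would have to pick an encoding.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical opaque path type. The name Path and its byte-list
-- representation are assumptions made for this sketch.
newtype Path = Path [Word8]
  deriving (Eq, Show)

-- Anything that can be turned into a Path.
class IsPath a where
  toPath :: a -> Path

instance IsPath Path where
  toPath = id

instance IsPath [Word8] where
  toPath = Path

-- Compatibility instance for existing String-based code. It keeps only the
-- low 8 bits of each Char (an assumption, not a recommendation).
instance IsPath [Char] where
  toPath = Path . map (fromIntegral . ord)

-- Convert back to the legacy byte-per-Char FilePath for the existing API.
toLegacy :: Path -> FilePath
toLegacy (Path bs) = map (chr . fromIntegral) bs

-- A writeFile that accepts any of the three representations.
writeFileP :: IsPath p => p -> String -> IO ()
writeFileP p = writeFile (toLegacy (toPath p))

main :: IO ()
main = do
  print (toPath "out.txt")
  print (toPath ([111, 117, 116] :: [Word8]))
```

With such a class, a call like writeFileP a "hello world" keeps compiling whether a is a String, a [Word8] or a Path.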

-- 
brandon s. allbery [linux,solaris,freebsd,perl]  allb...@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH


Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Felipe Lessa
On Sun, Jun 27, 2010 at 02:52:54AM +0300, Roman Beslik wrote:
>  On 26.06.10 16:34, Alexey Khudyakov wrote:
> >On Sat, 26 Jun 2010 10:14:50 -0300
> >Felipe Lessa  wrote:
> >>On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
> >>>but what you propose cannot be used in Windows at all! while current
> >>>FilePath still works on Unix, with manual filenames en/decoding
> >>Now we got back on topic! :)
> >>
> >>The FilePath datatype is OS-dependent and making it abstract
> >>should be at least a first step.  If you got it from somewhere
> >>else where it is already encoded, then fine.  If you need to
> >>construct it, then you need to use a smart constructor.  If you
> >>need to show/print it, then you need to convert it to String.
> >>And so on.
> >>
> >It should solve most of problems I believe. But such change will break
> >a lot of programs maybe most of them. How could one introduce such a
> >change? One variant is to create new hierarchy and gradually deprecate
> >old.
>
> I fail to see how it will break programs. Current programs do not
> use Unicode because it is implemented incorrectly.

For example, this program would break:

  import System.Environment (getArgs)

  main :: IO ()
  main = getArgs >>= \[a] -> writeFile a "hello world"

The types are:

  getArgs   :: IO [String]
  writeFile :: FilePath -> String -> IO ()

Right now we have

  type FilePath = String

so the code above works.  If we had

  data FilePath = ...

then that would be a type error.  So even one of the
most trivial programs wouldn't compile anymore.

Cheers,

--
Felipe.


Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Felipe Lessa
On Sun, Jun 27, 2010 at 02:55:33AM +0300, Roman Beslik wrote:
>  On 26.06.10 15:44, Felipe Lessa wrote:
> >However, suppose your program needs to create a file with a name
> >based on a database information.  Your database is UTF-8.  How do
> >you translate that UTF-8 data into a filepath?  This is the
> >problem we got in Haskell.  We have a nice coding-agnostic String
> >datatype, but we don't know how to create a file with this very
> >name.
>
> It is simple — you recode from (database | "network server" | file)
> encoding to the current locale.

Recoding is indeed very simple.  You know the source coding
(e.g. your database is in UTF-8).  But how do you discover the
target coding?  How can you find out that this system uses
ISO8859-1, while this other one uses UTF-16, while...?

See the problem now? :)

Cheers,

--
Felipe.


Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Roman Beslik

 On 26.06.10 15:44, Felipe Lessa wrote:
> On Sat, Jun 26, 2010 at 09:29:29AM +0300, Roman Beslik wrote:
>> Incorrect encoding of filepaths is common in e.g. Cyrillic Linux
>> (because of multiple possible encodings --- CP1251, KOI8-R, UTF-8)
>> and is solved by fiddling with the current locale and media mount
>> options. No need to change a program, or to tell character encoding
>> to a program. It is not a programming language issue.
>
> If your program saves files using filepaths given by the user or
> created programmatically from another filepath, then you don't
> need to decode/encode anything and the problem isn't in the
> programming language.
>
> However, suppose your program needs to create a file with a name
> based on a database information.  Your database is UTF-8.  How do
> you translate that UTF-8 data into a filepath?  This is the
> problem we got in Haskell.  We have a nice coding-agnostic String
> datatype, but we don't know how to create a file with this very
> name.

It is simple: you recode from (database | "network server" | file)
encoding to the current locale.


--
Best regards,
  Roman Beslik.



Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Roman Beslik

 On 26.06.10 16:34, Alexey Khudyakov wrote:
> On Sat, 26 Jun 2010 10:14:50 -0300
> Felipe Lessa  wrote:
>> On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
>>> but what you propose cannot be used in Windows at all! while current
>>> FilePath still works on Unix, with manual filenames en/decoding
>>
>> Now we got back on topic! :)
>>
>> The FilePath datatype is OS-dependent and making it abstract
>> should be at least a first step.  If you got it from somewhere
>> else where it is already encoded, then fine.  If you need to
>> construct it, then you need to use a smart constructor.  If you
>> need to show/print it, then you need to convert it to String.
>> And so on.
>
> It should solve most of problems I believe. But such change will break
> a lot of programs maybe most of them. How could one introduce such a
> change? One variant is to create new hierarchy and gradually deprecate
> old.

I fail to see how it will break programs. Current programs do not use
Unicode because it is implemented incorrectly.


--
Best regards,
  Roman Beslik.



Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Felipe Lessa
On Sat, Jun 26, 2010 at 05:34:49PM +0400, Alexey Khudyakov wrote:
> It should solve most of problems I believe. But such change will break
> a lot of programs maybe most of them. How could one introduce such a
> change? One variant is to create new hierarchy and gradually deprecate
> old.
>
> Also same problem affect command line arguments and process module.

So that means we should make this change as soon as possible,
doesn't it? :)

The problem now is designing a future-proof OS-agnostic API to
avoid having to change this core part of the base library again
in the near future.

Cheers,

--
Felipe.


Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Alexey Khudyakov
On Sat, 26 Jun 2010 10:14:50 -0300
Felipe Lessa  wrote:
> On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
> > but what you propose cannot be used in Windows at all! while current
> > FilePath still works on Unix, with manual filenames en/decoding
> 
> Now we got back on topic! :)
> 
> The FilePath datatype is OS-dependent and making it abstract
> should be at least a first step.  If you got it from somewhere
> else where it is already encoded, then fine.  If you need to
> construct it, then you need to use a smart constructor.  If you
> need to show/print it, then you need to convert it to String.
> And so on.
> 
I believe it should solve most of the problems. But such a change will break
a lot of programs, maybe most of them. How could one introduce such a
change? One option is to create a new hierarchy and gradually deprecate
the old one.

Also, the same problem affects command-line arguments and the process module.


Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Felipe Lessa
On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
> >> > Even if we said "we don't care", we at least should change
> >> > FilePath to be [Word8], and not [String].  Currently filepaths
>
> > other OSs worked fine, should I use this API (i.e. type FilePath
> > = String) to its fullest extent, my program will suddenly become
> > unportable to all Unix OSs.
>
> but what you propose cannot be used in Windows at all! while current
> FilePath still works on Unix, with manual filenames en/decoding

Now we got back on topic! :)

The FilePath datatype is OS-dependent and making it abstract
should be at least a first step.  If you got it from somewhere
else where it is already encoded, then fine.  If you need to
construct it, then you need to use a smart constructor.  If you
need to show/print it, then you need to convert it to String.
And so on.
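
A minimal sketch of what such an abstract type could look like. Everything here is an assumption for illustration: the names Path, mkPath and displayPath are hypothetical, and the validation rules (no empty name, no NUL character) are just examples of what a smart constructor might check.

```haskell
import Data.Maybe (isNothing)

-- Hypothetical abstract path. In a real library the Path constructor would
-- not be exported, so mkPath would be the only way to build a value.
newtype Path = Path String
  deriving Eq

-- Smart constructor. The checks are illustrative assumptions: reject the
-- empty name and any NUL character, neither of which Unix accepts.
mkPath :: String -> Maybe Path
mkPath s
  | null s        = Nothing
  | '\0' `elem` s = Nothing
  | otherwise     = Just (Path s)

-- Explicit conversion for showing or printing.
displayPath :: Path -> String
displayPath (Path s) = s

main :: IO ()
main = do
  print (fmap displayPath (mkPath "notes.txt"))
  print (isNothing (mkPath "bad\0name"))
```

The point of the design is that construction and display become explicit steps, so an OS-specific encoding decision has exactly two places to live.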

Cheers,

--
Felipe.


Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Bulat Ziganshin
Hello Felipe,

Saturday, June 26, 2010, 4:54:16 PM, you wrote:

>> > Even if we said "we don't care", we at least should change
>> > FilePath to be [Word8], and not [String].  Currently filepaths

> other OSs worked fine, should I use this API (i.e. type FilePath
> = String) to its fullest extent, my program will suddenly become
> unportable to all Unix OSs.

but what you propose cannot be used in Windows at all! while current
FilePath still works on Unix, with manual filenames en/decoding

-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Felipe Lessa
On Sat, Jun 26, 2010 at 04:48:39PM +0400, Bulat Ziganshin wrote:
> Saturday, June 26, 2010, 4:44:20 PM, Felipe Lessa wrote:
> > Even if we said "we don't care", we at least should change
> > FilePath to be [Word8], and not [String].  Currently filepaths
> > are silently "truncated" if any codepoint is beyond 255.
>
> and there is no OS except Unix ;)

Of course there is, however we should use the least common
denominator if we want to create portable programs.  Even if
other OSs worked fine, should I use this API (i.e. type FilePath
= String) to its fullest extent, my program will suddenly become
unportable to all Unix OSs.

Cheers,

--
Felipe.


Re[2]: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Bulat Ziganshin
Hello Felipe,

Saturday, June 26, 2010, 4:44:20 PM, you wrote:

> Even if we said "we don't care", we at least should change
> FilePath to be [Word8], and not [String].  Currently filepaths
> are silently "truncated" if any codepoint is beyond 255.

and there is no OS except Unix ;)


-- 
Best regards,
 Bulat  mailto:bulat.zigans...@gmail.com



Re: [Haskell-cafe] Core packages and locale support

2010-06-26 Thread Felipe Lessa
On Sat, Jun 26, 2010 at 09:29:29AM +0300, Roman Beslik wrote:
> Incorrect encoding of filepaths is common in e.g. Cyrillic Linux
> (because of multiple possible encodings --- CP1251, KOI8-R, UTF-8)
> and is solved by fiddling with the current locale and media mount
> options. No need to change a program, or to tell character encoding
> to a program. It is not a programming language issue.

If your program saves files using filepaths given by the user or
created programmatically from another filepath, then you don't
need to decode/encode anything and the problem isn't in the
programming language.

However, suppose your program needs to create a file with a name
based on information from a database.  Your database is UTF-8.  How do
you translate that UTF-8 data into a filepath?  This is the
problem we got in Haskell.  We have a nice coding-agnostic String
datatype, but we don't know how to create a file with this very
name.

The opposite may also be a problem.  Okay, you got an already
correctly-encoded filepath.  But you want to store this
information in your database.  Now, you have two options:

  a) Save the encoded filepath.  Each record of your database
  will potentially have a different encoding, which is very bad.

  b) Recode into, say, UTF-8.  But to do that you need to know
  the original encoding used in the filepath, so we have the same
  problem as above.

Even if we said "we don't care", we at least should change
FilePath to be [Word8], and not [String].  Currently filepaths
are silently "truncated" if any codepoint is beyond 255.
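
The truncation can be modelled with a short sketch. It assumes the conversion keeps only the low byte of each code point. It is an illustration of the effect, not GHC's actual code, and the function names are made up.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Assumed model of the 8-bit conversion: keep only the low byte of each
-- code point (fromIntegral wraps modulo 256 when converting to Word8).
truncateToBytes :: String -> [Word8]
truncateToBytes = map (fromIntegral . ord)

-- Reinterpret each byte as a code point in the range 0..255.
bytesToString :: [Word8] -> String
bytesToString = map (chr . fromIntegral)

main :: IO ()
main = do
  let name = "\1092\1072\1081\1083.txt"  -- a Cyrillic name followed by ".txt"
  print (truncateToBytes name)
  -- The round trip loses information for any code point above 255.
  print (bytesToString (truncateToBytes name) == name)
```

Under this model U+0444 becomes byte 68, the letter 'D', so two different names can collapse to the same file.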

Cheers,

--
Felipe.


Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Roman Beslik



On 25.06.10 20:09, Jason Dagit wrote:
>> you got everything right here. So, as you said, there is a mismatch
>> between representation in Haskell (list of code points) and
>> representation in the operating system (list of bytes), so we need to
>> know the encoding. Encoding is supplied by the user via locale
>> (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
>> LC_CTYPE variable.
>>
>> The problem with encodings is not new -- it was already solved e.g. for
>> input/output.
>
> This is the part where I don't understand the problem well.  I thought
> that with IO the program assumes the locale of the environment but
> that with filepaths you don't know what locale (more specifically
> which encoding) they were created with.  So if you try to treat them
> as having the locale of the current environment you run the risk of
> misunderstanding their encoding.

Incorrect encoding of filepaths is common in e.g. Cyrillic Linux
(because of multiple possible encodings --- CP1251, KOI8-R, UTF-8) and
is solved by fiddling with the current locale and media mount options.
No need to change a program, or to tell character encoding to a program.
It is not a programming language issue.


--
Best regards,
  Roman Beslik.



Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Jason Dagit
On Fri, Jun 25, 2010 at 3:15 PM, Brandon S Allbery KF8NH <
allb...@ece.cmu.edu> wrote:

> On 6/25/10 17:56 , Roman Cheplyaka wrote:
> > * Brandon S Allbery KF8NH  [2010-06-25 05:00:08-0400]
> >> You might want to look at how Python is dealing with this (including the
> >> pain involved; best to learn from example).
> >
> > Do you mean the pain when filenames can not be decoded using current
> > locale settings and thus the files are not accessible? (The same about
> > environment variables.)
>
> Yes, this.
>
> > Agreed, it's unpleasant. The other way would be changing [Char] to [Word8]
> > or ByteString. But this would a) break all existing programs and b) be
> > an OS-specific hack. Crap.
>
> But it *is* OS-specific, just as Windows' UTF-16 is an OS-specific
> mechanism.  Unfortunately, there's no good solution in the Unix case aside
> from assuming a specific encoding, and the locale is as good as any; but I
> think LC_CTYPE is probably the most applicable.  This will, however, confuse
> everyone else.
>
> Perhaps best is to look at whether there is any consensus building as to how
> to resolve it, and if not use locale but document it as an unstable
> interface.  Or possibly just leave things as is until consensus develops.
> It would be Bad to choose one (say, locale) only to have everyone else go in
> a different direction (say, UTF-8 with the application libraries potentially
> re-encoding filenames).

In the case of IO you can disable the locale specific encoding/decoding by
switching to binary mode.  Would a similar API be available when working
with filepaths?  Darcs, for instance, deals with lots of file paths and has
very specific requirements.  Losing access to files due to bad encodings, or
mistaken encodings, is the sort of thing that would break some people's
repositories.  So tools like Darcs would probably need a way to disable this
sort of automatic encoding/decoding.
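
For comparison, this is roughly what the text/binary switch looks like for file contents today. A small sketch using System.IO from base: writeUtf8 and readRaw are names invented here, and the scratch file name is arbitrary. hSetBinaryMode offers the same switch for a handle that is already open.

```haskell
import System.IO

-- Text mode with an explicit encoding: each Char is encoded as UTF-8.
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- Binary mode: no decoding, every byte comes back as one Char in 0..255.
readRaw :: FilePath -> IO String
readRaw path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  length s `seq` return s  -- force the read before the handle closes

main :: IO ()
main = do
  let tmp = "encoding-demo.tmp"  -- scratch file name chosen for this sketch
  writeUtf8 tmp "\1092"
  raw <- readRaw tmp
  print (map fromEnum raw)
```

A filepath API with an equivalent escape hatch would let a tool read names as raw bytes and decide for itself whether and how to decode them.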

Jason


Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Brandon S Allbery KF8NH

On 6/25/10 17:56 , Roman Cheplyaka wrote:
> * Brandon S Allbery KF8NH  [2010-06-25 05:00:08-0400]
>> You might want to look at how Python is dealing with this (including the
>> pain involved; best to learn from example).
> 
> Do you mean the pain when filenames can not be decoded using current
> locale settings and thus the files are not accessible? (The same about
> environment variables.)

Yes, this.

> Agreed, it's unpleasant. The other way would be changing [Char] to [Word8]
> or ByteString. But this would a) break all existing programs and b) be
> an OS-specific hack. Crap.

But it *is* OS-specific, just as Windows' UTF-16 is an OS-specific
mechanism.  Unfortunately, there's no good solution in the Unix case aside
from assuming a specific encoding, and the locale is as good as any; but I
think LC_CTYPE is probably the most applicable.  This will, however, confuse
everyone else.

Perhaps best is to look at whether there is any consensus building as to how
to resolve it, and if not use locale but document it as an unstable
interface.  Or possibly just leave things as is until consensus develops.
It would be Bad to choose one (say, locale) only to have everyone else go in
a different direction (say, UTF-8 with the application libraries potentially
re-encoding filenames).

(The flip side of *that*, of course, is that everyone else (save GvR) may be
waiting for the same thing.  Which is why we look around first.  At worst it
may be a reason to push for something, perhaps as part of LSB and then
assumed anywhere that doesn't have its own solution.  I think the main
issues there would be *BSD, which is used to being on the wrong end of the
Linux stick, and Solaris which Oracle has helpfully (effectively; nobody
manning the rudders) nuked from orbit. :/ )

-- 
brandon s. allbery [linux,solaris,freebsd,perl]  allb...@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH


Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Roman Cheplyaka
* Brandon S Allbery KF8NH  [2010-06-25 05:00:08-0400]
> On 6/25/10 02:42 , Roman Cheplyaka wrote:
> > * Jason Dagit  [2010-06-24 20:52:03-0700]
> >> On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka  wrote:
> >>> While ghc 6.12 finally has proper locale support, core packages (such as
> >>> unix) still use withCString and therefore work incorrectly when argument
> >>> (e.g. file path) is not ASCII.
> >>
> >> Pardon me if I'm misunderstanding withCString, but my understanding of unix
> >> paths is that they are to be treated as strings of bytes.  That is, unlike
> >> windows, they do not have an encoding predefined.  Furthermore, you could
> >> have two filepaths in the same directory with different encodings due to
> >> this.
> > 
> > you got everything right here. So, as you said, there is a mismatch
> > between representation in Haskell (list of code points) and
> > representation in the operating system (list of bytes), so we need to
> > know the encoding. Encoding is supplied by the user via locale
> > (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
> > LC_CTYPE variable.
> 
> You might want to look at how Python is dealing with this (including the
> pain involved; best to learn from example).

Do you mean the pain when filenames can not be decoded using current
locale settings and thus the files are not accessible? (The same about
environment variables.)

Agreed, it's unpleasant. The other way would be changing [Char] to [Word8]
or ByteString. But this would a) break all existing programs and b) be
an OS-specific hack. Crap.

Brandon, do you have any ideas on how we should proceed with this?

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain


Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Brandon S Allbery KF8NH
On 6/25/10 17:05 , Roman Cheplyaka wrote:
> By the way, GTK (which internally uses UTF-8 for strings) treats this
> problem differently -- it has special variable G_FILENAME_ENCODING and
> also G_BROKEN_FILENAMES (which means that filenames are encoded as
> locale says). I have no clue how their G_* variables are better than our
> conventional LC_* variables though.
> http://www.gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html

I would assume what they really mean by this is that the filename encoding
should be part of the file metadata and G_BROKEN_FILENAMES means it isn't.
G_FILENAME_ENCODING would then be the encoding used when creating new files.

-- 
brandon s. allbery [linux,solaris,freebsd,perl]  allb...@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH


Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Roman Cheplyaka
* Jason Dagit  [2010-06-25 10:09:21-0700]
> On Thu, Jun 24, 2010 at 11:42 PM, Roman Cheplyaka  wrote:
> 
> > * Jason Dagit  [2010-06-24 20:52:03-0700]
> > > On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka  wrote:
> > >
> > > > While ghc 6.12 finally has proper locale support, core packages (such as
> > > > unix) still use withCString and therefore work incorrectly when argument
> > > > (e.g. file path) is not ASCII.
> > >
> > > Pardon me if I'm misunderstanding withCString, but my understanding of unix
> > > paths is that they are to be treated as strings of bytes.  That is, unlike
> > > windows, they do not have an encoding predefined.  Furthermore, you could
> > > have two filepaths in the same directory with different encodings due to
> > > this.
> > >
> > > In this case, what would be the correct way of handling the paths?
> > > Converting to a Haskell String would require knowing the encoding, right?
> > > My reasoning is that Haskell Char type is meant to correspond to code
> > > points so putting them into a string means you have to know their code point
> > > which is different from their (multi-)byte value right?
> > >
> > > Perhaps I have some details wrong?  If so, please clarify.
> >
> > Jason,
> >
> > you got everything right here. So, as you said, there is a mismatch
> > between representation in Haskell (list of code points) and
> > representation in the operating system (list of bytes), so we need to
> > know the encoding. Encoding is supplied by the user via locale
> > (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
> > LC_CTYPE variable.
> >
> > The problem with encodings is not new -- it was already solved e.g. for
> > input/output.
> >
> 
> This is the part where I don't understand the problem well.  I thought that
> with IO the program assumes the locale of the environment but that with
> filepaths you don't know what locale (more specifically which encoding) they
> were created with.  So if you try to treat them as having the locale of the
> current environment you run the risk of misunderstanding their encoding.

Sure you do. But there is no other source of encoding information apart
from the current locale. So UNIX (currently) puts the responsibility on
the user.
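
As a small illustration of where a Haskell program can pick that information
up: newer versions of base (not the one shipped with ghc 6.12) expose the
encodings derived from the environment through GHC.IO.Encoding. The function
name encodingNames below is made up for this sketch, and the strings it
returns depend entirely on the environment the program runs in.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding)

-- | Names of the encodings the runtime derived from the environment:
-- the one used for text I/O and the one used for file names.
encodingNames :: IO (String, String)
encodingNames = do
  loc <- getLocaleEncoding
  fs  <- getFileSystemEncoding
  return (show loc, show fs)
```

Running it under different LC_ALL or LC_CTYPE settings should show the
reported names following the locale.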

It's hard to give convincing examples demonstrating these semantics
because UNIX userspace is mostly written in C, where char is just a
byte, so most programs don't bother with encoding and decoding.

The difference between I/O and filenames is vague -- what if you pipe the
output of ls(1) to some program? Since ls does no recoding, encoding
filenames differently from the locale is a bad idea.

By the way, GTK (which internally uses UTF-8 for strings) treats this
problem differently -- it has a special variable G_FILENAME_ENCODING and
also G_BROKEN_FILENAMES (which means that filenames are encoded as the
locale says). I have no clue how their G_* variables are better than our
conventional LC_* variables, though.
http://www.gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html
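
For comparison, the selection rule can be sketched as a pure function. This
is my reading of the GLib documentation rather than a port of its code: a set
G_FILENAME_ENCODING is taken as a comma-separated list in which "@locale"
stands for the locale charset, otherwise a set G_BROKEN_FILENAMES selects the
locale charset, otherwise UTF-8 is assumed. The names filenameCharsets and
splitOnComma are made up for this example.

```haskell
import Data.Maybe (isJust)

-- | Split a string on commas (no trimming of whitespace).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (x, [])       -> [x]
  (x, _ : rest) -> x : splitOnComma rest

-- | Candidate charsets for file names, given the locale charset and the
-- values (if set) of G_FILENAME_ENCODING and G_BROKEN_FILENAMES.
filenameCharsets :: String -> Maybe String -> Maybe String -> [String]
filenameCharsets localeCharset filenameEncoding brokenFilenames =
  case filenameEncoding of
    -- G_FILENAME_ENCODING takes priority when both variables are set
    Just list -> map resolve (splitOnComma list)
    Nothing
      | isJust brokenFilenames -> [localeCharset]
      | otherwise              -> ["UTF-8"]
  where
    resolve "@locale" = localeCharset
    resolve name      = name
```

The two Maybe arguments could be filled in from
System.Environment.lookupEnv. The point of the sketch is only that the
filename charset is chosen separately from the LC_* locale.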

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain


Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Jason Dagit
On Thu, Jun 24, 2010 at 11:42 PM, Roman Cheplyaka  wrote:

> * Jason Dagit  [2010-06-24 20:52:03-0700]
> > On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka  wrote:
> >
> > > While ghc 6.12 finally has proper locale support, core packages (such as
> > > unix) still use withCString and therefore work incorrectly when argument
> > > (e.g. file path) is not ASCII.
> >
> > Pardon me if I'm misunderstanding withCString, but my understanding of unix
> > paths is that they are to be treated as strings of bytes.  That is, unlike
> > windows, they do not have an encoding predefined.  Furthermore, you could
> > have two filepaths in the same directory with different encodings due to
> > this.
> >
> > In this case, what would be the correct way of handling the paths?
> > Converting to a Haskell String would require knowing the encoding, right?
> > My reasoning is that Haskell Char type is meant to correspond to code
> > points so putting them into a string means you have to know their code point
> > which is different from their (multi-)byte value right?
> >
> > Perhaps I have some details wrong?  If so, please clarify.
>
> Jason,
>
> you got everything right here. So, as you said, there is a mismatch
> between representation in Haskell (list of code points) and
> representation in the operating system (list of bytes), so we need to
> know the encoding. Encoding is supplied by the user via locale
> (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
> LC_CTYPE variable.
>
> The problem with encodings is not new -- it was already solved e.g. for
> input/output.
>

This is the part where I don't understand the problem well.  I thought that
with IO the program assumes the locale of the environment but that with
filepaths you don't know what locale (more specifically which encoding) they
were created with.  So if you try to treat them as having the locale of the
current environment you run the risk of misunderstanding their encoding.

Jason


Re: [Haskell-cafe] Core packages and locale support

2010-06-25 Thread Brandon S Allbery KF8NH
On 6/25/10 02:42 , Roman Cheplyaka wrote:
> * Jason Dagit  [2010-06-24 20:52:03-0700]
>> On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka  wrote:
>>> While ghc 6.12 finally has proper locale support, core packages (such as
>>> unix) still use withCString and therefore work incorrectly when argument
>>> (e.g. file path) is not ASCII.
>>
>> Pardon me if I'm misunderstanding withCString, but my understanding of unix
>> paths is that they are to be treated as strings of bytes.  That is, unlike
>> windows, they do not have an encoding predefined.  Furthermore, you could
>> have two filepaths in the same directory with different encodings due to
>> this.
> 
> you got everything right here. So, as you said, there is a mismatch
> between representation in Haskell (list of code points) and
> representation in the operating system (list of bytes), so we need to
> know the encoding. Encoding is supplied by the user via locale
> (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
> LC_CTYPE variable.

You might want to look at how Python is dealing with this (including the
pain involved; best to learn from example).

-- 
brandon s. allbery [linux,solaris,freebsd,perl]  allb...@kf8nh.com
system administrator  [openafs,heimdal,too many hats]  allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university  KF8NH


Re: [Haskell-cafe] Core packages and locale support

2010-06-24 Thread Roman Cheplyaka
* Jason Dagit  [2010-06-24 20:52:03-0700]
> On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka  wrote:
> 
> > While ghc 6.12 finally has proper locale support, core packages (such as
> > unix) still use withCString and therefore work incorrectly when argument
> > (e.g. file path) is not ASCII.
> >
> 
> Pardon me if I'm misunderstanding withCString, but my understanding of unix
> paths is that they are to be treated as strings of bytes.  That is, unlike
> windows, they do not have an encoding predefined.  Furthermore, you could
> have two filepaths in the same directory with different encodings due to
> this.
> 
> In this case, what would be the correct way of handling the paths?
>  Converting to a Haskell String would require knowing the encoding, right?
>  My reasoning is that Haskell Char type is meant to correspond to code
> points so putting them into a string means you have to know their code point
> which is different from their (multi-)byte value right?
> 
> Perhaps I have some details wrong?  If so, please clarify.

Jason,

You got everything right here. So, as you said, there is a mismatch
between the representation in Haskell (a list of code points) and the
representation in the operating system (a list of bytes), so we need to
know the encoding. The encoding is supplied by the user via the locale
(https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
the LC_CTYPE variable.

The problem with encodings is not new -- it was already solved e.g. for
input/output.
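
A minimal sketch of that mismatch, assuming a base library recent enough to
provide GHC.Foreign (it is not part of the base shipped with ghc 6.12); the
helper name encodeWith is made up for this example. The same list of code
points turns into different byte sequences depending on the encoding chosen.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import qualified GHC.Foreign as GF
import GHC.IO.Encoding (TextEncoding, latin1, utf8)

-- | Encode a String (a list of code points) into the bytes that the
-- given encoding produces for it.
encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s =
  GF.withCStringLen enc s $ \(ptr, len) ->
    -- CChar may be signed; fromIntegral maps each value into 0..255
    map fromIntegral <$> peekArray len ptr

-- encodeWith utf8   "caf\233"  yields [99,97,102,195,169]
-- encodeWith latin1 "caf\233"  yields [99,97,102,233]
```

So a file created as "café" under a Latin-1 locale and one created under a
UTF-8 locale have different names at the byte level, and only the locale says
which interpretation is intended.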

As I said, I'm willing to prepare the patches, but I really need a
mentor for this.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain


Re: [Haskell-cafe] Core packages and locale support

2010-06-24 Thread Jason Dagit
On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka  wrote:

> While ghc 6.12 finally has proper locale support, core packages (such as
> unix) still use withCString and therefore work incorrectly when argument
> (e.g. file path) is not ASCII.
>

Pardon me if I'm misunderstanding withCString, but my understanding of Unix
paths is that they are to be treated as strings of bytes.  That is, unlike on
Windows, they do not have a predefined encoding.  Furthermore, because of
this you could have two file paths in the same directory with different
encodings.

In this case, what would be the correct way of handling the paths?
Converting to a Haskell String would require knowing the encoding, right?
My reasoning is that the Haskell Char type is meant to correspond to code
points, so putting them into a String means you have to know their code
points, which are different from their (multi-)byte values, right?

Perhaps I have some details wrong?  If so, please clarify.

Jason


[Haskell-cafe] Core packages and locale support

2010-06-19 Thread Roman Cheplyaka
While ghc 6.12 finally has proper locale support, core packages (such as
unix) still use withCString and therefore work incorrectly when an argument
(e.g. a file path) is not ASCII.
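
A small sketch of the symptom. It assumes that withCString in the current
base behaves like withCAString (one byte per Char, no locale conversion),
which is my understanding of the code rather than something I have checked
across versions; truncatedBytes is a made-up helper name.

```haskell
import Data.Word (Word8)
import Foreign.C.String (withCAStringLen)
import Foreign.Marshal.Array (peekArray)

-- | The bytes a C function receives when a String is marshalled
-- without any encoding: each Char is cut down to its low 8 bits.
truncatedBytes :: String -> IO [Word8]
truncatedBytes s =
  withCAStringLen s $ \(ptr, len) ->
    map fromIntegral <$> peekArray len ptr

-- '\1078' is U+0436 (CYRILLIC SMALL LETTER ZHE); only its low byte
-- 0x36 (the digit '6') survives, so the path "\1078.txt" reaches the
-- OS as "6.txt".
```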

Is someone already working on this? If it's just a matter of time and
manpower, I can help, but I need some guidance from someone with authority.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain