Re: [Haskell-cafe] Core packages and locale support
On 6/27/10 03:11, Roman Beslik wrote:
> No! The target encoding is the current locale. It is a no-brainer to find
> it. Use your Unix.
> $ man setlocale
> $ locale

So you want to use someone else's file, and their locale isn't the same as yours. What now?

-- 
brandon s. allbery [linux,solaris,freebsd,perl]      allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
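Brandon's objection can be made concrete: the bytes of a file name do not change when the file moves to another user, but the text a program reads back depends on which encoding it assumes. A minimal sketch, assuming a GHC recent enough to ship the `GHC.Foreign` module (later than the GHC 6.12 discussed in this thread); `decodeWith`, `nameFromUtf8User` and `readings` are hypothetical helper names. Load it in GHCi and evaluate `readings`.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, latin1, utf8)

-- Decode raw on-disk bytes with an explicitly chosen encoding,
-- instead of whatever the current locale happens to be.
decodeWith :: TextEncoding -> [Word8] -> IO String
decodeWith enc bytes =
  withArrayLen bytes $ \len ptr ->
    GHC.peekCStringLen enc (castPtr ptr, len)

-- A file name created under a UTF-8 locale: U+00E9 is stored as C3 A9.
nameFromUtf8User :: [Word8]
nameFromUtf8User = [0xC3, 0xA9]

-- The same bytes as seen by a UTF-8 user and by an ISO 8859-1 user.
readings :: IO (String, String)
readings = do
  asUtf8   <- decodeWith utf8 nameFromUtf8User
  asLatin1 <- decodeWith latin1 nameFromUtf8User
  pure (asUtf8, asLatin1)
```

The UTF-8 reader gets the single character U+00E9, while the ISO 8859-1 reader gets two characters, U+00C3 U+00A9, for the same file.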
Re: [Haskell-cafe] Core packages and locale support
On Sun, 27 Jun 2010 10:28:49 +0300
Roman Beslik wrote:
> On 27.06.10 10:17, Bulat Ziganshin wrote:
> > Sunday, June 27, 2010, 11:07:47 AM, you wrote:
> >>> Currently, FilePath is an alias for String. Changing FilePath to a real
> >>> type
> >> Just do not change FilePath, what may be simpler?
> > if FilePath will become abstract type, it will break all programs
> > that use it since they use it as String
>
> Hello, do you read me? I said: "do not change FilePath".

It's no good either. Then there is no way to have both automatic decoding of file names and working with file names with incorrect encoding.

-- 
Alexey Khudyakov
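Alexey's dilemma shows up as soon as decoding is automatic: some byte sequences are simply not valid in the assumed encoding. One way to keep both behaviours is a name type that can hold either decoded text or the original bytes. This is only a sketch under stated assumptions: `FileName`, `decodeStrict` and `classify` are hypothetical names, and it relies on the `GHC.Foreign` module of later GHC releases, whose `utf8` decoder throws an `IOException` on invalid input.

```haskell
import Control.Exception (IOException, try)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, utf8)

-- Hypothetical name type: either text we managed to decode, or the raw
-- bytes we could not, so that no file becomes unreachable.
data FileName = Decoded String | Undecodable [Word8]
  deriving (Eq, Show)

-- Decode with an explicit encoding; throws on an invalid byte sequence.
decodeStrict :: TextEncoding -> [Word8] -> IO String
decodeStrict enc bytes =
  withArrayLen bytes $ \len ptr ->
    GHC.peekCStringLen enc (castPtr ptr, len)

-- Try automatic decoding and fall back to the original bytes on failure.
classify :: TextEncoding -> [Word8] -> IO FileName
classify enc bytes = do
  r <- try (decodeStrict enc bytes) :: IO (Either IOException String)
  pure (either (const (Undecodable bytes)) Decoded r)
```

For example, `classify utf8 [0x61, 0x62]` yields `Decoded "ab"`, while `classify utf8 [0x66, 0xFF]` keeps the bytes, because 0xFF never occurs in valid UTF-8.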
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Roman,

Sunday, June 27, 2010, 11:37:24 AM, you wrote:

>>> No! The target encoding is the current locale. It is a no-brainer to
>> not necessarily. current locale, encoding of current terminal and
>> encoding of every filesystem mounted are all different things

> And we should stick to the current locale. Problem solved.
> "6.3 CString
> The module CString provides routines marshalling Haskell into C strings
> and vice versa. The marshalling converts each Haskell character,
> representing a Unicode code point, to one or more bytes in a manner
> that, by default, is determined by the *current locale*."
> "The Haskell 98 Foreign Function Interface."

1. it doesn't work in practice. ghc provides simple 8-bit conversion and i think a lot of code relies on this behavior
2. when you mount an external/network volume, it doesn't necessarily have the same encoding as your current locale

-- 
Best regards,
 Bulat                            mailto:bulat.zigans...@gmail.com
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Roman,

Sunday, June 27, 2010, 11:28:49 AM, you wrote:

>>> Just do not change FilePath, what may be simpler?
>> if FilePath will become abstract type, it will break all programs
>> that use it since they use it as String

> Hello, do you read me? I said: "do not change FilePath".

what do you mean by abstract type then? :)

-- 
Best regards,
 Bulat
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Roman,

Sunday, June 27, 2010, 11:24:16 AM, you wrote:

> Okay, but IMHO few people want to have a headache with recoding. You
> knew that the implementation was incorrect, why did you rely on it?

what is the alternative? :) on windows i've used low-level open()-style APIs, on Linux i got the same results with the official API

-- 
Best regards,
 Bulat
Re: [Haskell-cafe] Core packages and locale support
On 27.06.10 10:18, Bulat Ziganshin wrote:
> Hello Roman,
>
> Sunday, June 27, 2010, 11:11:59 AM, you wrote:
>
>> No! The target encoding is the current locale. It is a no-brainer to
> not necessarily. current locale, encoding of current terminal and
> encoding of every filesystem mounted are all different things

And we should stick to the current locale. Problem solved.

"6.3 CString
The module CString provides routines marshalling Haskell into C strings and vice versa. The marshalling converts each Haskell character, representing a Unicode code point, to one or more bytes in a manner that, by default, is determined by the *current locale*."
"The Haskell 98 Foreign Function Interface."

-- 
Best regards,
Roman Beslik.
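The FFI text quoted above describes `Foreign.C.String`, whose marshalling follows an encoding derived from the current locale. Later GHC releases added `GHC.Foreign`, whose functions take the `TextEncoding` as an explicit argument. This sketch (the helper `encodeWith` is a hypothetical name) shows the bytes that reach C when the encoding is fixed instead of taken from the locale.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, utf8)

-- Marshal a String to C bytes with an explicit encoding and return those
-- bytes. Foreign.C.String.withCStringLen does the same job, but uses GHC's
-- "foreign encoding", which by default comes from the current locale.
encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s =
  GHC.withCStringLen enc s $ \(ptr, len) ->
    peekArray len (castPtr ptr)
```

With `utf8`, U+00E9 becomes `[0xC3, 0xA9]` and Cyrillic U+0416 becomes `[0xD0, 0x96]`, whatever the user's locale says.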
Re: [Haskell-cafe] Core packages and locale support
On 27.06.10 10:17, Bulat Ziganshin wrote:
> Sunday, June 27, 2010, 11:07:47 AM, you wrote:
>>> Currently, FilePath is an alias for String. Changing FilePath to a real
>>> type
>> Just do not change FilePath, what may be simpler?
> if FilePath will become abstract type, it will break all programs
> that use it since they use it as String

Hello, do you read me? I said: "do not change FilePath".

-- 
Best regards,
Roman Beslik.
Re: [Haskell-cafe] Core packages and locale support
On 27.06.10 09:38, Bulat Ziganshin wrote:
> Sunday, June 27, 2010, 3:52:54 AM, you wrote:
>> I fail to see how it will break programs. Current programs do not use
>> Unicode because it is implemented incorrectly.
> i use it. current Linux implementation treats String as sequence of
> bytes, and with manual recoding it allows to use filesystems with any
> encoding

Okay, but IMHO few people want to have a headache with recoding. You knew that the implementation was incorrect, why did you rely on it?

-- 
Best regards,
Roman Beslik.
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Roman,

Sunday, June 27, 2010, 11:11:59 AM, you wrote:

> No! The target encoding is the current locale. It is a no-brainer to

not necessarily. current locale, encoding of current terminal and encoding of every filesystem mounted are all different things

-- 
Best regards,
 Bulat
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Roman,

Sunday, June 27, 2010, 11:07:47 AM, you wrote:

>> Currently, FilePath is an alias for String. Changing FilePath to a real
>> type
> Just do not change FilePath, what may be simpler?

if FilePath will become abstract type, it will break all programs that use it since they use it as String

-- 
Best regards,
 Bulat
Re: [Haskell-cafe] Core packages and locale support
On 27.06.10 03:58, Felipe Lessa wrote:
> On Sun, Jun 27, 2010 at 02:55:33AM +0300, Roman Beslik wrote:
>> On 26.06.10 15:44, Felipe Lessa wrote:
>>> However, suppose your program needs to create a file with a name
>>> based on a database information. Your database is UTF-8. How do
>>> you translate that UTF-8 data into a filepath? This is the
>>> problem we got in Haskell. We have a nice coding-agnostic String
>>> datatype, but we don't know how to create a file with this very
>>> name.
>>
>> It is simple: you recode from (database | "network server" | file)
>> encoding to the current locale.
>
> Recoding is indeed very simple. You know the source coding (e.g.
> your database is in UTF-8). But how do you discover the target
> coding? How can you find out that this system uses ISO8859-1, while
> this other one uses UTF-16, while...? See the problem now? :)

No! The target encoding is the current locale. It is a no-brainer to find it. Use your Unix.
$ man setlocale
$ locale

-- 
Best regards,
Roman Beslik.
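From inside a Haskell program, the rough counterpart of running `locale` is asking the runtime which encodings it derived from the locale environment. Later GHC releases expose these through `GHC.IO.Encoding`; the getters below are real but postdate GHC 6.12, and `currentEncodings` is a hypothetical helper.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, getForeignEncoding, getLocaleEncoding)

-- List the encodings GHC uses for handles (locale), file names and
-- environment data (file system), and C strings in the FFI (foreign).
currentEncodings :: IO [(String, String)]
currentEncodings = do
  loc <- getLocaleEncoding
  fs  <- getFileSystemEncoding
  ffi <- getForeignEncoding
  pure [("locale", show loc), ("file system", show fs), ("foreign", show ffi)]
```

Note that this only tells you what the *current* process was told; as Brandon and Bulat point out, it says nothing about the encoding that was in effect when someone else created a given file.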
Re: [Haskell-cafe] Core packages and locale support
On 27.06.10 04:07, Brandon S Allbery KF8NH wrote:
> On 6/26/10 19:52, Roman Beslik wrote:
>> I fail to see how it will break programs. Current programs do not use
>> Unicode because it is implemented incorrectly.
>
> Currently, FilePath is an alias for String. Changing FilePath to a real
> type

Just do not change FilePath, what may be simpler?

-- 
Best regards,
Roman Beslik.
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Roman,

Sunday, June 27, 2010, 3:52:54 AM, you wrote:

> I fail to see how it will break programs. Current programs do not use
> Unicode because it is implemented incorrectly.

i use it. current Linux implementation treats String as sequence of bytes, and with manual recoding it allows to use filesystems with any encoding

-- 
Best regards,
 Bulat
Re: [Haskell-cafe] Core packages and locale support
On Sat, Jun 26, 2010 at 10:01:57PM -0300, Felipe Lessa wrote:
> The types are:
>
>   getArgs :: IO [String]
>   writeFile :: FilePath -> String -> IO ()

On a similar note, getArgs probably suffers from the same problem. Which should it be?

  a) getArgs :: IO [String]
  b) getArgs :: IO [[Word8]]
  c) getArgs :: IO [FilePath]
  d) getArgs :: IO [Argument]

Cheers,
-- 
Felipe.
Re: [Haskell-cafe] Core packages and locale support
On 6/26/10 19:52, Roman Beslik wrote:
> I fail to see how it will break programs. Current programs do not use
> Unicode because it is implemented incorrectly.

Currently, FilePath is an alias for String. Changing FilePath to a real type will break programs because there is no constructor for FilePath currently, so everyone uses String. And Haskell doesn't auto-coerce, so you would need to use a typeclass and separate String and FilePath instances for compatibility. (On the other hand, this might be a good idea anyway; another instance that would be useful would be [Word8].)

-- 
brandon s. allbery
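Brandon's typeclass idea can be sketched as follows. Everything here is hypothetical (`RawPath`, `ToPath`, `pathBytes`), and the String instance uses a one-byte-per-Char placeholder encoding rather than any real policy; the point is only that String callers, raw-byte callers and callers holding the new path type can all share one signature.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical opaque path type holding the bytes the OS would see.
newtype RawPath = RawPath [Word8]
  deriving (Eq, Show)

-- Anything usable as a path. Existing String-based code keeps compiling
-- because String gets an instance.
class ToPath a where
  toPath :: a -> RawPath

instance ToPath RawPath where
  toPath = id

-- Placeholder encoding: one byte per Char, as FilePath = String effectively
-- does today. Code points above 255 would need a real encoder.
instance ToPath [Char] where
  toPath = RawPath . map (fromIntegral . ord)

-- The extra [Word8] instance Brandon suggests: raw bytes pass through.
instance ToPath [Word8] where
  toPath = RawPath

-- Stand-in for a path-taking API such as a writeFile variant: any
-- ToPath value is accepted.
pathBytes :: ToPath p => p -> [Word8]
pathBytes p = let RawPath bs = toPath p in bs
```

The cost of this design is that every path-taking function becomes polymorphic, and string literals can become ambiguous if `OverloadedStrings` is turned on.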
Re: [Haskell-cafe] Core packages and locale support
On Sun, Jun 27, 2010 at 02:52:54AM +0300, Roman Beslik wrote:
> On 26.06.10 16:34, Alexey Khudyakov wrote:
> >On Sat, 26 Jun 2010 10:14:50 -0300
> >Felipe Lessa wrote:
> >>On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
> >>>but what you propose cannot be used in Windows at all! while current
> >>>FilePath still works on Unix, with manual filenames en/decoding
> >>Now we got back on topic! :)
> >>
> >>The FilePath datatype is OS-dependent and making it abstract
> >>should be at least a first step. If you got it from somewhere
> >>else where it is already encoded, then fine. If you need to
> >>construct it, then you need to use a smart constructor. If you
> >>need to show/print it, then you need to convert it to String.
> >>And so on.
> >>
> >It should solve most of problems I believe. But such change will break
> >a lot of programs maybe most of them. How could one introduce such a
> >change? One variant is to create new hierarchy and gradually deprecate
> >old.
>
> I fail to see how it will break programs. Current programs do not
> use Unicode because it is implemented incorrectly.

For example, this program would break:

  import System.Environment (getArgs)

  main :: IO ()
  main = getArgs >>= \[a] -> writeFile a "hello world"

The types are:

  getArgs :: IO [String]
  writeFile :: FilePath -> String -> IO ()

Right now we have

  type FilePath = String

so the code above works. If we had

  data FilePath = ...

then that would be a type error and wouldn't work at all. So even one of the most trivial programs wouldn't compile anymore.

Cheers,
-- 
Felipe.
Re: [Haskell-cafe] Core packages and locale support
On Sun, Jun 27, 2010 at 02:55:33AM +0300, Roman Beslik wrote:
> On 26.06.10 15:44, Felipe Lessa wrote:
> >However, suppose your program needs to create a file with a name
> >based on a database information. Your database is UTF-8. How do
> >you translate that UTF-8 data into a filepath? This is the
> >problem we got in Haskell. We have a nice coding-agnostic String
> >datatype, but we don't know how to create a file with this very
> >name.
>
> It is simple: you recode from (database | "network server" | file)
> encoding to the current locale.

Recoding is indeed very simple. You know the source coding (e.g. your database is in UTF-8). But how do you discover the target coding? How can you find out that this system uses ISO8859-1, while this other one uses UTF-16, while...? See the problem now? :)

Cheers,
-- 
Felipe.
Re: [Haskell-cafe] Core packages and locale support
On 26.06.10 15:44, Felipe Lessa wrote:
> On Sat, Jun 26, 2010 at 09:29:29AM +0300, Roman Beslik wrote:
>> Incorrect encoding of filepaths is common in e.g. Cyrillic Linux
>> (because of multiple possible encodings --- CP1251, KOI8-R, UTF-8)
>> and is solved by fiddling with the current locale and media mount
>> options. No need to change a program, or to tell character encoding
>> to a program. It is not a programming language issue.
>
> If your program saves files using filepaths given by the user or
> created programmatically from another filepath, then you don't need
> to decode/encode anything and the problem isn't in the programming
> language.
>
> However, suppose your program needs to create a file with a name
> based on a database information. Your database is UTF-8. How do
> you translate that UTF-8 data into a filepath? This is the
> problem we got in Haskell. We have a nice coding-agnostic String
> datatype, but we don't know how to create a file with this very
> name.

It is simple: you recode from (database | "network server" | file) encoding to the current locale.

-- 
Best regards,
Roman Beslik.
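Roman's recipe, decode from the known source encoding and re-encode into the locale's, is short to write once both encodings are explicit values. A sketch assuming the `GHC.Foreign` module of later GHC releases; `recode`, `utf8ToLocale` and `exampleLatin1` are hypothetical names. It does not answer Felipe's question of whether the locale is the right target; it only shows the mechanics.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, latin1, localeEncoding, utf8)

-- Recode raw bytes: decode them to a Haskell String (code points) with the
-- source encoding, then encode that String with the target encoding.
-- Throws if the input is invalid or the target cannot represent a character.
recode :: TextEncoding -> TextEncoding -> [Word8] -> IO [Word8]
recode from to bytes = do
  s <- withArrayLen bytes $ \len ptr -> GHC.peekCStringLen from (castPtr ptr, len)
  GHC.withCStringLen to s $ \(ptr, len) -> peekArray len (castPtr ptr)

-- Roman's proposal: database text is UTF-8, the target is the current locale.
utf8ToLocale :: [Word8] -> IO [Word8]
utf8ToLocale = recode utf8 localeEncoding

-- A fixed target for illustration: U+00E9 becomes the single byte 0xE9.
exampleLatin1 :: IO [Word8]
exampleLatin1 = recode utf8 latin1 [0xC3, 0xA9]
```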
Re: [Haskell-cafe] Core packages and locale support
On 26.06.10 16:34, Alexey Khudyakov wrote:
> On Sat, 26 Jun 2010 10:14:50 -0300
> Felipe Lessa wrote:
>> On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
>>> but what you propose cannot be used in Windows at all! while current
>>> FilePath still works on Unix, with manual filenames en/decoding
>> Now we got back on topic! :)
>>
>> The FilePath datatype is OS-dependent and making it abstract
>> should be at least a first step. If you got it from somewhere
>> else where it is already encoded, then fine. If you need to
>> construct it, then you need to use a smart constructor. If you
>> need to show/print it, then you need to convert it to String.
>> And so on.
>>
> It should solve most of problems I believe. But such change will break
> a lot of programs maybe most of them. How could one introduce such a
> change? One variant is to create new hierarchy and gradually deprecate
> old.

I fail to see how it will break programs. Current programs do not use Unicode because it is implemented incorrectly.

-- 
Best regards,
Roman Beslik.
Re: [Haskell-cafe] Core packages and locale support
On Sat, Jun 26, 2010 at 05:34:49PM +0400, Alexey Khudyakov wrote:
> It should solve most of problems I believe. But such change will break
> a lot of programs maybe most of them. How could one introduce such a
> change? One variant is to create new hierarchy and gradually deprecate
> old.
>
> Also same problem affect command line arguments and process module.

So that means we should make this change as soon as possible, doesn't it? :)

The problem now is designing a future-proof OS-agnostic API to avoid having to change this core part of the base library again in the near future.

Cheers,
-- 
Felipe.
Re: [Haskell-cafe] Core packages and locale support
On Sat, 26 Jun 2010 10:14:50 -0300
Felipe Lessa wrote:
> On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
> > but what you propose cannot be used in Windows at all! while current
> > FilePath still works on Unix, with manual filenames en/decoding
>
> Now we got back on topic! :)
>
> The FilePath datatype is OS-dependent and making it abstract
> should be at least a first step. If you got it from somewhere
> else where it is already encoded, then fine. If you need to
> construct it, then you need to use a smart constructor. If you
> need to show/print it, then you need to convert it to String.
> And so on.
>
It should solve most of problems I believe. But such change will break a lot of programs maybe most of them. How could one introduce such a change? One variant is to create new hierarchy and gradually deprecate old.

Also same problem affect command line arguments and process module.
Re: [Haskell-cafe] Core packages and locale support
On Sat, Jun 26, 2010 at 05:01:06PM +0400, Bulat Ziganshin wrote:
> >> > Even if we said "we don't care", we at least should change
> >> > FilePath to be [Word8], and not String. Currently filepaths
>
> > other OSs worked fine, should I use this API (i.e. type FilePath
> > = String) to its fullest extent, my program will suddenly become
> > unportable to all Unix OSs.
>
> but what you propose cannot be used in Windows at all! while current
> FilePath still works on Unix, with manual filenames en/decoding

Now we got back on topic! :)

The FilePath datatype is OS-dependent and making it abstract should be at least a first step. If you got it from somewhere else where it is already encoded, then fine. If you need to construct it, then you need to use a smart constructor. If you need to show/print it, then you need to convert it to String. And so on.

Cheers,
-- 
Felipe.
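Felipe's outline (paths from the OS are kept as-is, paths built from text go through a smart constructor, display is an explicit conversion) might look like the sketch below. All names are hypothetical, and the one-byte-per-Char rule inside `mkRawPath` is a placeholder for whatever encoding a real library would choose; the design point is that the constructor reports failure instead of silently truncating.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical abstract path type. A real library would export the type
-- but not the RawPath constructor, so values come only from the functions below.
newtype RawPath = RawPath [Word8]
  deriving Eq

-- Smart constructor: the single place where text becomes a path. As a
-- placeholder it assumes a one-byte-per-Char target encoding and refuses
-- (rather than silently truncating) anything above U+00FF.
mkRawPath :: String -> Maybe RawPath
mkRawPath s
  | all ((< 256) . ord) s = Just (RawPath (map (fromIntegral . ord) s))
  | otherwise             = Nothing

-- Paths that already come from the OS are wrapped as-is, with no decoding.
fromOsBytes :: [Word8] -> RawPath
fromOsBytes = RawPath

-- Showing or printing a path is an explicit, separate conversion.
displayPath :: RawPath -> String
displayPath (RawPath bs) = map (chr . fromIntegral) bs
```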
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Felipe,

Saturday, June 26, 2010, 4:54:16 PM, you wrote:

>> > Even if we said "we don't care", we at least should change
>> > FilePath to be [Word8], and not String. Currently filepaths

> other OSs worked fine, should I use this API (i.e. type FilePath
> = String) to its fullest extent, my program will suddenly become
> unportable to all Unix OSs.

but what you propose cannot be used in Windows at all! while current FilePath still works on Unix, with manual filenames en/decoding

-- 
Best regards,
 Bulat
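Bulat's objection is about representation: Windows file names are sequences of 16-bit UTF-16 code units, so a `[Word8]` FilePath does not fit there any better than `String` fits Unix. The sketch below (hypothetical helper `toUtf16`) shows the String-to-UTF-16 step a Windows backend would need; lone surrogate code points in the input are passed through unchecked.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode a String as UTF-16 code units. Code points up to U+FFFF take one
-- unit; higher ones become a surrogate pair (high unit, then low unit).
toUtf16 :: String -> [Word16]
toUtf16 = concatMap unit
  where
    unit c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [hi, lo]
      where
        n  = ord c
        m  = n - 0x10000
        hi = 0xD800 + fromIntegral (m `shiftR` 10)
        lo = 0xDC00 + fromIntegral (m .&. 0x3FF)
```

For instance, U+1F600 becomes the pair `[0xD83D, 0xDE00]`, while a UTF-8 byte list for the same character has four elements, which is why a single concrete element type cannot serve both platforms.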
Re: [Haskell-cafe] Core packages and locale support
On Sat, Jun 26, 2010 at 04:48:39PM +0400, Bulat Ziganshin wrote:
> Saturday, June 26, 2010, 4:44:20 PM, Felipe Lessa wrote:
> > Even if we said "we don't care", we at least should change
> > FilePath to be [Word8], and not String. Currently filepaths
> > are silently "truncated" if any codepoint is beyond 255.
>
> and there is no OS except Unix ;)

Of course there is, however we should use the least common denominator if we want to create portable programs. Even if other OSs worked fine, should I use this API (i.e. type FilePath = String) to its fullest extent, my program will suddenly become unportable to all Unix OSs.

Cheers,
-- 
Felipe.
Re[2]: [Haskell-cafe] Core packages and locale support
Hello Felipe,

Saturday, June 26, 2010, 4:44:20 PM, you wrote:

> Even if we said "we don't care", we at least should change
> FilePath to be [Word8], and not String. Currently filepaths
> are silently "truncated" if any codepoint is beyond 255.

and there is no OS except Unix ;)

-- 
Best regards,
 Bulat
Re: [Haskell-cafe] Core packages and locale support
On Sat, Jun 26, 2010 at 09:29:29AM +0300, Roman Beslik wrote:
> Incorrect encoding of filepaths is common in e.g. Cyrillic Linux
> (because of multiple possible encodings --- CP1251, KOI8-R, UTF-8)
> and is solved by fiddling with the current locale and media mount
> options. No need to change a program, or to tell character encoding
> to a program. It is not a programming language issue.

If your program saves files using filepaths given by the user or created programmatically from another filepath, then you don't need to decode/encode anything and the problem isn't in the programming language.

However, suppose your program needs to create a file with a name based on a database information. Your database is UTF-8. How do you translate that UTF-8 data into a filepath? This is the problem we got in Haskell. We have a nice coding-agnostic String datatype, but we don't know how to create a file with this very name.

The opposite may also be a problem. Okay, you got an already correctly-encoded filepath. But you want to store this information in your database. Now, you have two options:

  a) Save the encoded filepath. Each record of your database will potentially have a different encoding, which is very bad.

  b) Recode into, say, UTF-8. But to do that you need to know the original coding used in the filepath, so we got the same problem above.

Even if we said "we don't care", we at least should change FilePath to be [Word8], and not String. Currently filepaths are silently "truncated" if any codepoint is beyond 255.

Cheers,
-- 
Felipe.
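The silent truncation Felipe mentions amounts to keeping only the low 8 bits of each code point, and its worst consequence is that distinct names can collide on disk. A small sketch with hypothetical names (`truncateToBytes`, `collides`), modelling the behaviour described in the message rather than quoting GHC's actual code:

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Keep only the low 8 bits of each code point
-- (fromIntegral to Word8 wraps modulo 256).
truncateToBytes :: String -> [Word8]
truncateToBytes = map (fromIntegral . ord)

-- Cyrillic U+0441 ("es") and Latin 'A' both end up as the byte 0x41,
-- so two different names would refer to the same file.
collides :: Bool
collides = truncateToBytes "\x441" == truncateToBytes "A"
```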
Re: [Haskell-cafe] Core packages and locale support
On 25.06.10 20:09, Jason Dagit wrote:
>> you got everything right here. So, as you said, there is a mismatch
>> between representation in Haskell (list of code points) and
>> representation in the operating system (list of bytes), so we need to
>> know the encoding. Encoding is supplied by the user via locale
>> (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
>> LC_CTYPE variable.
>>
>> The problem with encodings is not new -- it was already solved e.g. for
>> input/output.
>
> This is the part where I don't understand the problem well. I thought
> that with IO the program assumes the locale of the environment but that
> with filepaths you don't know what locale (more specifically which
> encoding) they were created with. So if you try to treat them as having
> the locale of the current environment you run the risk of
> misunderstanding their encoding.

Incorrect encoding of filepaths is common in e.g. Cyrillic Linux (because of multiple possible encodings --- CP1251, KOI8-R, UTF-8) and is solved by fiddling with the current locale and media mount options. No need to change a program, or to tell character encoding to a program. It is not a programming language issue.

-- 
Best regards,
Roman Beslik.
Re: [Haskell-cafe] Core packages and locale support
On Fri, Jun 25, 2010 at 3:15 PM, Brandon S Allbery KF8NH <allb...@ece.cmu.edu> wrote:
> On 6/25/10 17:56, Roman Cheplyaka wrote:
> > * Brandon S Allbery KF8NH [2010-06-25 05:00:08-0400]
> >> You might want to look at how Python is dealing with this (including the
> >> pain involved; best to learn from example).
> >
> > Do you mean the pain when filenames can not be decoded using current
> > locale settings and thus the files are not accessible? (The same about
> > environment variables.)
>
> Yes, this.
>
> > Agreed, it's unpleasant. The other way would be changing [Char] to [Word8]
> > or ByteString. But this would a) break all existing programs and b) be
> > an OS-specific hack. Crap.
>
> But it *is* OS-specific, just as Windows' UTF-16 is an OS-specific
> mechanism. Unfortunately, there's no good solution in the Unix case aside
> from assuming a specific encoding, and the locale is as good as any; but I
> think LC_CTYPE is probably the most applicable. This will, however,
> confuse everyone else.
>
> Perhaps best is to look at whether there is any consensus building as to
> how to resolve it, and if not use locale but document it as an unstable
> interface. Or possibly just leave things as is until consensus develops.
> It would be Bad to choose one (say, locale) only to have everyone else go
> in a different direction (say, UTF-8 with the application libraries
> potentially re-encoding filenames).

In the case of IO you can disable the locale-specific encoding/decoding by switching to binary mode. Would a similar API be available when working with filepaths?

Darcs, for instance, deals with lots of file paths and has very specific requirements. Losing access to files due to bad encodings, or mistaken encodings, is the sort of thing that would break some people's repositories. So tools like Darcs would probably need a way to disable this sort of automatic encoding/decoding.

Jason
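For handles, the binary mode Jason mentions is `System.IO.hSetBinaryMode`. For file names, later GHC releases (not the 6.12 of this thread) let a program switch the file-name encoding itself through `GHC.IO.Encoding.setFileSystemEncoding`; `char8` maps each byte to one Char and back, which is close to the "disable decoding" switch he asks about. A sketch with a hypothetical helper `withByteFileNames`; it assumes no other thread changes the setting concurrently, since it is process-global, and it does not affect values that were already decoded earlier.

```haskell
import Control.Exception (bracket)
import GHC.IO.Encoding (char8, getFileSystemEncoding, setFileSystemEncoding)

-- Run an action with file names treated as one byte per Char (no locale
-- decoding), restoring the previous file-name encoding afterwards, even
-- if the action throws.
withByteFileNames :: IO a -> IO a
withByteFileNames act =
  bracket getFileSystemEncoding setFileSystemEncoding $ \_ ->
    setFileSystemEncoding char8 >> act
```

Inside the action, names handed to functions such as `writeFile` or returned by directory listings are byte-per-Char Strings, so a tool like Darcs could store and compare them without any encoding guess.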
Re: [Haskell-cafe] Core packages and locale support
On 6/25/10 17:56, Roman Cheplyaka wrote:
> * Brandon S Allbery KF8NH [2010-06-25 05:00:08-0400]
>> You might want to look at how Python is dealing with this (including the
>> pain involved; best to learn from example).
>
> Do you mean the pain when filenames can not be decoded using current
> locale settings and thus the files are not accessible? (The same about
> environment variables.)

Yes, this.

> Agreed, it's unpleasant. The other way would be changing [Char] to [Word8]
> or ByteString. But this would a) break all existing programs and b) be
> an OS-specific hack. Crap.

But it *is* OS-specific, just as Windows' UTF-16 is an OS-specific mechanism. Unfortunately, there's no good solution in the Unix case aside from assuming a specific encoding, and the locale is as good as any; but I think LC_CTYPE is probably the most applicable. This will, however, confuse everyone else.

Perhaps best is to look at whether there is any consensus building as to how to resolve it, and if not use locale but document it as an unstable interface. Or possibly just leave things as is until consensus develops. It would be Bad to choose one (say, locale) only to have everyone else go in a different direction (say, UTF-8 with the application libraries potentially re-encoding filenames).

(The flip side of *that*, of course, is that everyone else (save GvR) may be waiting for the same thing. Which is why we look around first. At worst it may be a reason to push for something, perhaps as part of LSB and then assumed anywhere that doesn't have its own solution. I think the main issues there would be *BSD, which is used to being on the wrong end of the Linux stick, and Solaris which Oracle has helpfully (effectively; nobody manning the rudders) nuked from orbit. :/ )

-- 
brandon s. allbery
Re: [Haskell-cafe] Core packages and locale support
* Brandon S Allbery KF8NH [2010-06-25 05:00:08-0400]
> On 6/25/10 02:42, Roman Cheplyaka wrote:
> > * Jason Dagit [2010-06-24 20:52:03-0700]
> >> On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka wrote:
> >>> While ghc 6.12 finally has proper locale support, core packages (such as
> >>> unix) still use withCString and therefore work incorrectly when argument
> >>> (e.g. file path) is not ASCII.
> >>
> >> Pardon me if I'm misunderstanding withCString, but my understanding of unix
> >> paths is that they are to be treated as strings of bytes. That is, unlike
> >> windows, they do not have an encoding predefined. Furthermore, you could
> >> have two filepaths in the same directory with different encodings due to
> >> this.
> >
> > you got everything right here. So, as you said, there is a mismatch
> > between representation in Haskell (list of code points) and
> > representation in the operating system (list of bytes), so we need to
> > know the encoding. Encoding is supplied by the user via locale
> > (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
> > LC_CTYPE variable.
>
> You might want to look at how Python is dealing with this (including the
> pain involved; best to learn from example).

Do you mean the pain when filenames can not be decoded using current locale settings and thus the files are not accessible? (The same about environment variables.)

Agreed, it's unpleasant. The other way would be changing [Char] to [Word8] or ByteString. But this would a) break all existing programs and b) be an OS-specific hack. Crap.

Brandon, do you have any ideas on how we should proceed with this?

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain
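The mismatch described above (Haskell sees a list of code points, the kernel sees a list of bytes) is bridged by an encoder. Written out by hand for UTF-8, it looks like the sketch below. UTF-8 is chosen only for illustration, since the thread's point is that the locale may name some other encoding; `encodeUtf8` here is a hypothetical local helper, not the `text` package's `Data.Text.Encoding.encodeUtf8`, and it does not reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Map each code point to its 1-4 byte UTF-8 form.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
  where
    encodeChar c
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [0xC0 .|. lead 6, cont 0]
      | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
      where
        n = ord c
        -- high bits that go into the leading byte
        lead s = fromIntegral (n `shiftR` s)
        -- a continuation byte: 10xxxxxx carrying 6 bits of the code point
        cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)
```

One Char may thus become one to four bytes, which is why a String cannot be handed to the OS without first choosing an encoding.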
Re: [Haskell-cafe] Core packages and locale support
On 6/25/10 17:05, Roman Cheplyaka wrote:
> By the way, GTK (which internally uses UTF-8 for strings) treats this
> problem differently -- it has special variable G_FILENAME_ENCODING and
> also G_BROKEN_FILENAMES (which means that filenames are encoded as
> locale says). I have no clue how their G_* variables are better than our
> conventional LC_* variables though.
> http://www.gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html

I would assume what they really mean by this is that the filename encoding should be part of the file metadata and G_BROKEN_FILENAMES means it isn't. G_FILENAME_ENCODING would then be the encoding used when creating new files.

-- 
brandon s. allbery
Re: [Haskell-cafe] Core packages and locale support
* Jason Dagit [2010-06-25 10:09:21-0700] > On Thu, Jun 24, 2010 at 11:42 PM, Roman Cheplyaka wrote: > > > * Jason Dagit [2010-06-24 20:52:03-0700] > > > On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka > > wrote: > > > > > > > While ghc 6.12 finally has proper locale support, core packages (such > > as > > > > unix) still use withCString and therefore work incorrectly when > > argument > > > > (e.g. file path) is not ASCII. > > > > > > > > > > Pardon me if I'm misunderstanding withCString, but my understanding of > > unix > > > paths is that they are to be treated as strings of bytes. That is, > > unlike > > > windows, they do not have an encoding predefined. Furthermore, you could > > > have two filepaths in the same directory with different encodings due to > > > this. > > > > > > In this case, what would be the correct way of handling the paths? > > > Converting to a Haskell String would require knowing the encoding, > > right? > > > My reasoning is that Haskell Char type is meant to correspond to code > > > points so putting them into a string means you have to know their code > > point > > > which is different from their (multi-)byte value right? > > > > > > Perhaps I have some details wrong? If so, please clarify. > > > > Jason, > > > > you got everything right here. So, as you said, there is a mismatch > > between representation in Haskell (list of code points) and > > representation in the operating system (list of bytes), so we need to > > know the encoding. Encoding is supplied by the user via locale > > (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly > > LC_CTYPE variable. > > > > The problem with encodings is not new -- it was already solved e.g. for > > input/output. > > > > This is the part where I don't understand the problem well. I thought that > with IO the program assumes the locale of the environment but that with > filepaths you don't know what locale (more specifically which encoding) they > were created with. 
So if you try to treat them as having the locale of the > current environment you run the risk of misunderstanding their encoding. Sure you do. But there is no other source of encoding information apart from the current locale, so UNIX (currently) puts the responsibility on the user. It's hard to give convincing examples demonstrating these semantics because UNIX userspace is mostly written in C, where char is just a byte, so most programs don't bother with encoding and decoding at all. The difference between IO and filenames is vague -- what if you pipe ls(1) to some program? Since ls does no recoding, encoding filenames differently from the locale is a bad idea. By the way, GTK (which internally uses UTF-8 for strings) treats this problem differently -- it has a special variable G_FILENAME_ENCODING and also G_BROKEN_FILENAMES (which means that filenames are encoded as the locale says). I have no clue how their G_* variables are better than our conventional LC_* variables, though. http://www.gtk.org/api/2.6/glib/glib-Character-Set-Conversion.html -- Roman I. Cheplyaka :: http://ro-che.info/ "Don't let school get in the way of your education." - Mark Twain
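Roman's point is that, without other metadata, the locale (LC_CTYPE) is the only source of encoding information, while GLib adds its own override variable on top. The following is a minimal sketch of such a policy in Haskell. It assumes a newer base than GHC 6.12, which provides getFileSystemEncoding and lookupEnv. filenameEncoding and firstCharset are hypothetical helpers. The handling of the comma-separated list and the "@locale" token is only loosely modeled on GLib's documentation, so treat it as an assumption rather than GLib's actual code.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding)
import System.Environment (lookupEnv)
import System.IO (TextEncoding, mkTextEncoding)

-- Assumed format: a comma-separated charset list. Only the first entry is
-- used, and "@locale" (or an empty value) means "defer to the locale".
firstCharset :: String -> Maybe String
firstCharset spec =
  case takeWhile (/= ',') spec of
    ""        -> Nothing
    "@locale" -> Nothing
    name      -> Just name

-- Hypothetical policy: an explicit override wins; otherwise fall back to the
-- locale-derived file-system encoding (LC_CTYPE and friends).
-- mkTextEncoding throws if the override names an unknown encoding.
filenameEncoding :: IO TextEncoding
filenameEncoding = do
  override <- lookupEnv "G_FILENAME_ENCODING"
  case override >>= firstCharset of
    Just name -> mkTextEncoding name
    Nothing   -> getFileSystemEncoding

main :: IO ()
main = filenameEncoding >>= print  -- Show for TextEncoding prints its name
```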
Re: [Haskell-cafe] Core packages and locale support
On Thu, Jun 24, 2010 at 11:42 PM, Roman Cheplyaka wrote: > * Jason Dagit [2010-06-24 20:52:03-0700] > > On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka > wrote: > > > > > While ghc 6.12 finally has proper locale support, core packages (such > as > > > unix) still use withCString and therefore work incorrectly when > argument > > > (e.g. file path) is not ASCII. > > > > > > > Pardon me if I'm misunderstanding withCString, but my understanding of > unix > > paths is that they are to be treated as strings of bytes. That is, > unlike > > windows, they do not have an encoding predefined. Furthermore, you could > > have two filepaths in the same directory with different encodings due to > > this. > > > > In this case, what would be the correct way of handling the paths? > > Converting to a Haskell String would require knowing the encoding, > right? > > My reasoning is that Haskell Char type is meant to correspond to code > > points so putting them into a string means you have to know their code > point > > which is different from their (multi-)byte value right? > > > > Perhaps I have some details wrong? If so, please clarify. > > Jason, > > you got everything right here. So, as you said, there is a mismatch > between representation in Haskell (list of code points) and > representation in the operating system (list of bytes), so we need to > know the encoding. Encoding is supplied by the user via locale > (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly > LC_CTYPE variable. > > The problem with encodings is not new -- it was already solved e.g. for > input/output. > This is the part where I don't understand the problem well. I thought that with IO the program assumes the locale of the environment but that with filepaths you don't know what locale (more specifically which encoding) they were created with. So if you try to treat them as having the locale of the current environment you run the risk of misunderstanding their encoding. 
Jason
Re: [Haskell-cafe] Core packages and locale support
On 6/25/10 02:42 , Roman Cheplyaka wrote: > * Jason Dagit [2010-06-24 20:52:03-0700] >> On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka wrote: >>> While ghc 6.12 finally has proper locale support, core packages (such as >>> unix) still use withCString and therefore work incorrectly when argument >>> (e.g. file path) is not ASCII. >> >> Pardon me if I'm misunderstanding withCString, but my understanding of unix >> paths is that they are to be treated as strings of bytes. That is, unlike >> windows, they do not have an encoding predefined. Furthermore, you could >> have two filepaths in the same directory with different encodings due to >> this. > > you got everything right here. So, as you said, there is a mismatch > between representation in Haskell (list of code points) and > representation in the operating system (list of bytes), so we need to > know the encoding. Encoding is supplied by the user via locale > (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly > LC_CTYPE variable. You might want to look at how Python is dealing with this (including the pain involved; best to learn from example). -- brandon s. allbery [linux,solaris,freebsd,perl] allb...@kf8nh.com system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
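Python 3's approach (PEP 383) is to decode undecodable bytes in file names to lone surrogate code points. When the name is encoded again, the original bytes come back unchanged. As far as I know, later GHC versions adopted the same trick through a "//ROUNDTRIP" suffix accepted by mkTextEncoding. The following is a minimal sketch of that round trip. It assumes such a base, which also provides GHC.Foreign. decodeBytes and encodeBytes are illustrative helper names.

```haskell
import qualified GHC.Foreign as GHC
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import System.IO (TextEncoding, mkTextEncoding)

-- Decode raw bytes (e.g. a file name read from the OS) into a String.
decodeBytes :: TextEncoding -> [Word8] -> IO String
decodeBytes enc bytes =
  withArrayLen bytes $ \len ptr -> GHC.peekCStringLen enc (castPtr ptr, len)

-- Encode a String back into raw bytes with the same encoding.
encodeBytes :: TextEncoding -> String -> IO [Word8]
encodeBytes enc str =
  GHC.withCStringLen enc str $ \(ptr, len) -> peekArray len (castPtr ptr)

main :: IO ()
main = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  -- "café.txt" written by a Latin-1 program: byte 233 is invalid UTF-8 here.
  let original = [99, 97, 102, 233, 46, 116, 120, 116] :: [Word8]
  name <- decodeBytes enc original  -- the bad byte becomes an escape Char
  back <- encodeBytes enc name      -- ...and turns back into the same byte
  print (back == original)
```

The design trade-off is the same one Python faced. The decoded String is not "real" text, because it contains surrogates. But the program can still open, rename, or delete the file.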
Re: [Haskell-cafe] Core packages and locale support
* Jason Dagit [2010-06-24 20:52:03-0700] > On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka wrote: > > > While ghc 6.12 finally has proper locale support, core packages (such as > > unix) still use withCString and therefore work incorrectly when argument > > (e.g. file path) is not ASCII. > > > > Pardon me if I'm misunderstanding withCString, but my understanding of unix > paths is that they are to be treated as strings of bytes. That is, unlike > windows, they do not have an encoding predefined. Furthermore, you could > have two filepaths in the same directory with different encodings due to > this. > > In this case, what would be the correct way of handling the paths? > Converting to a Haskell String would require knowing the encoding, right? > My reasoning is that Haskell Char type is meant to correspond to code > points so putting them into a string means you have to know their code point > which is different from their (multi-)byte value right? > > Perhaps I have some details wrong? If so, please clarify. Jason, you got everything right here. So, as you said, there is a mismatch between the representation in Haskell (a list of code points) and the representation in the operating system (a list of bytes), so we need to know the encoding. The encoding is supplied by the user via the locale (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly the LC_CTYPE variable. The problem with encodings is not new -- it has already been solved, e.g., for input/output. As I said, I'm willing to prepare the patches, but I really need a mentor for this. -- Roman I. Cheplyaka :: http://ro-che.info/ "Don't let school get in the way of your education." - Mark Twain
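The proposal here is to marshal file paths with the locale's encoding instead of a fixed 8-bit conversion, as handles already do for I/O. The following is a minimal sketch of that idea. It assumes a POSIX system and a newer base than GHC 6.12, which provides GHC.Foreign and GHC.IO.Encoding.getFileSystemEncoding. withFilePathEncoded and pathExists are hypothetical names for illustration, not part of the unix package. Only access(2) is a real C function.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import qualified GHC.Foreign as GHC
import Foreign.C.String (CString)
import Foreign.C.Types (CInt (..))
import GHC.IO.Encoding (getFileSystemEncoding)

-- POSIX access(2); mode 0 is F_OK, i.e. "does this path exist?".
foreign import ccall unsafe "unistd.h access"
  c_access :: CString -> CInt -> IO CInt

-- Hypothetical helper: encode the FilePath with the locale-derived
-- file-system encoding (rather than truncating each Char to 8 bits).
withFilePathEncoded :: FilePath -> (CString -> IO a) -> IO a
withFilePathEncoded path act = do
  enc <- getFileSystemEncoding
  GHC.withCString enc path act

pathExists :: FilePath -> IO Bool
pathExists path = withFilePathEncoded path $ \cpath -> (== 0) <$> c_access cpath 0

main :: IO ()
main = do
  pathExists "/" >>= print
  -- Non-ASCII name: its bytes depend on the locale; under an ASCII-only
  -- locale the encoder may reject it with an exception.
  pathExists "/tmp/\1092\1072\1081\1083" >>= print
```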
Re: [Haskell-cafe] Core packages and locale support
On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka wrote: > While ghc 6.12 finally has proper locale support, core packages (such as > unix) still use withCString and therefore work incorrectly when argument > (e.g. file path) is not ASCII. > Pardon me if I'm misunderstanding withCString, but my understanding of Unix paths is that they are to be treated as strings of bytes. That is, unlike Windows paths, they do not have a predefined encoding. Furthermore, because of this, you could have two file paths in the same directory with different encodings. In this case, what would be the correct way of handling the paths? Converting to a Haskell String would require knowing the encoding, right? My reasoning is that the Haskell Char type is meant to correspond to code points, so putting them into a String means you have to know their code points, which are different from their (multi-)byte values, right? Perhaps I have some details wrong? If so, please clarify. Jason
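Jason's point that a Unix path is just bytes can be made concrete. The same bytes yield different Strings, or an error, depending on which encoding is assumed. The following sketch uses GHC.Foreign from a newer base than GHC 6.12. decodeWith, latin1Name and utf8Name are illustrative names, and the two byte strings are the Latin-1 and UTF-8 spellings of "café.txt".

```haskell
import qualified GHC.Foreign as GHC
import Control.Exception (IOException, try)
import Data.Word (Word8)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (castPtr)
import System.IO (TextEncoding, latin1, utf8)

-- Decode a raw byte string (all a Unix path really is) with a chosen
-- encoding. GHC's utf8 decoder throws on invalid input, so catch that.
decodeWith :: TextEncoding -> [Word8] -> IO (Either IOException String)
decodeWith enc bytes =
  withArrayLen bytes $ \len ptr -> try (GHC.peekCStringLen enc (castPtr ptr, len))

-- "café.txt" as written by a Latin-1 program and by a UTF-8 program.
latin1Name, utf8Name :: [Word8]
latin1Name = [99, 97, 102, 233, 46, 116, 120, 116]
utf8Name   = [99, 97, 102, 195, 169, 46, 116, 120, 116]

main :: IO ()
main = mapM_ run
  [ ("Latin-1 bytes, decoded as latin1", latin1, latin1Name)
  , ("Latin-1 bytes, decoded as utf8",   utf8,   latin1Name)
  , ("UTF-8 bytes, decoded as latin1",   latin1, utf8Name)
  , ("UTF-8 bytes, decoded as utf8",     utf8,   utf8Name)
  ]
  where
    -- show keeps the output ASCII-only, whatever the terminal's encoding.
    run (label, enc, bytes) = do
      result <- decodeWith enc bytes
      putStrLn (label ++ ": " ++ either (const "decoding error") show result)
```

Only the matching encoding gives "café.txt" back. A mismatch gives either an error or mojibake, which is exactly why a String cannot be built without knowing the encoding.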
[Haskell-cafe] Core packages and locale support
While ghc 6.12 finally has proper locale support, core packages (such as unix) still use withCString and therefore work incorrectly when an argument (e.g. a file path) is not ASCII. Is someone already working on this? If it's just a matter of time and manpower, I can help, but I need some guidance from someone in authority. -- Roman I. Cheplyaka :: http://ro-che.info/ "Don't let school get in the way of your education." - Mark Twain
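The problem being reported is this: an 8-bit conversion keeps only the low byte of each Char, whereas an encoding-aware conversion produces the proper multi-byte sequence. The following sketch assumes a newer base than GHC 6.12. In 6.12, Foreign.C.String.withCString reportedly did the 8-bit conversion. In later versions it follows the locale, while withCAString keeps the 8-bit behavior and GHC.Foreign.withCString takes the encoding explicitly. truncatedBytes and utf8Bytes are hypothetical helpers.

```haskell
import qualified GHC.Foreign as GHC
import Data.Word (Word8)
import Foreign.C.String (CString, withCAString)
import Foreign.Marshal.Array (peekArray0)
import System.IO (utf8)

-- Read the NUL-terminated bytes behind a CString as unsigned octets.
cBytes :: CString -> IO [Word8]
cBytes p = map fromIntegral <$> peekArray0 0 p

-- 8-bit marshalling: each Char is cut down to its low byte.
truncatedBytes :: String -> IO [Word8]
truncatedBytes s = withCAString s cBytes

-- Encoding-aware marshalling, with UTF-8 chosen explicitly.
utf8Bytes :: String -> IO [Word8]
utf8Bytes s = GHC.withCString utf8 s cBytes

main :: IO ()
main = do
  let name = "\1092\1072\1081\1083"  -- Cyrillic "файл", written as escapes
  truncatedBytes name >>= print      -- [68,48,57,59]
  utf8Bytes name >>= print           -- [209,132,208,176,208,185,208,187]
```

Under truncation, four Cyrillic letters silently become the unrelated ASCII name "D09;". That is how a non-ASCII path ends up naming a different file.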