Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
* Jakson A. Aquino [EMAIL PROTECTED] [2006-06-19 21:45]:

> On Tue, Jun 20, 2006 at 12:31:51AM +0200, Rafael Laboissiere wrote:
> > I tried the above but it does not work for me. How should I set my
> > locale variables such that it works?
>
> By default, the UTF-8 locales aren't available in a Debian
> installation. To set my locale to UTF-8, I ran dpkg-reconfigure
> locales and replaced the ISO-8859-1 entries with UTF-8 ones. But it's
> possible to have both pt_BR.ISO-8859-1 and pt_BR.UTF-8 on the same
> system. I also put the following lines in my .bashrc:
>
>     export LANGUAGE=pt_BR.UTF-8
>     export LANG=pt_BR.UTF-8
>
> Perhaps the above lines are unnecessary in my case because I no longer
> have the ISO-8859-1 locale. I also had to change the settings of some
> applications, and the following web page was useful:
>
>     http://melkor.dnp.fmph.uniba.sk/~garabik/debian-utf8/HOWTO/howto.html

Thanks for the instructions, but I cannot replicate your findings. I
need a precise cookbook. Here is what I did:

    $ dpkg-reconfigure locales   # Here I have activated pt_BR.UTF-8
    $ cd /var/tmp
    $ cp /usr/bin/conjugue .
    $ cp /usr/lib/brazilian-conjugate/verbos .
    $ recode l1..utf8 conjugue
    $ recode l1..utf8 verbos
    $ perl -pi -e 's|/usr/lib/brazilian-conjugate|.|' conjugue
    $ LC_ALL=pt_BR.UTF-8 LC_CTYPE=pt_BR.UTF-8 LANGUAGE=pt_BR.UTF-8 LANG=pt_BR.UTF-8 ./conjugue

The last command gives the 200 lines of error messages that you
observed before. Is this the case for you too?

I am wondering whether gawk (which is the /etc/alternatives/awk on my
system) supports UTF-8.

Cheers,

--
Rafael

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
On Tue, Jun 20, 2006 at 08:18:52PM +0200, Rafael Laboissiere wrote:
> * Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 14:34]:
> > If my guess is correct the problem happens when gawk is called in a
> > locale and the files conjugue and verbos were encoded in a different
> > locale.
>
> I found the source of the problem: I was doing the tests on a system
> with gawk 3.1.4. When I did it in my up-to-date unstable chroot with
> gawk 3.1.5, I could replicate what you reported.
>
> I am now about to do the following for the package brazilian-conjugate:
>
> * Install the original conjugue script into
>   /usr/bin/conjugue-ISO-8859-1.
> * Create /usr/bin/conjugue-UTF-8 with the recode command as you
>   suggested.
> * Create the appropriate /usr/lib/brazilian-conjugate/verbos-char-enc
>   files and change the content of /usr/bin/conjugue-char-enc
>   accordingly.
> * Create a simple wrapper script /usr/bin/conjugue that would call the
>   appropriate /usr/bin/conjugue-char-enc according to the current
>   locale, something like the following:
>
>     #!/usr/bin/perl -w
>     my $encoding = "ISO-8859-1";  # default value
>     $ENV{LANG} =~ /[a-z]{2}?(?:(?:_[A-Z]{2}?)?(?:\.(.*))?)?/;
>     $encoding = $1 if defined $1;
>     system ("/usr/bin/conjugue-$encoding", @ARGV)
>
> Notice that the script above relies on the environment variable LANG.
> Do you think that this would be okay?

I tested it and it worked. I only had to change line 1810 of conjugue-*
to fix the name of the verbos-* files.

But it doesn't work if I simply export my locale as:

    $ export LC_ALL=pt_BR
    $ export LANG=pt_BR

I configured my system to have en_US.UTF-8 as the default locale and
added pt_BR.UTF-8 and pt_BR.ISO-8859-1 as other available locales. I
think that when I don't specify the charset encoding it defaults to
UTF-8, and not to ISO-8859-1, as assumed by the wrapper script.

One possible solution would be to not assume any default charset and
make the script exit if the encoding wasn't found in the locale string.
In this case, the output would be a help message (in Portuguese and
English) teaching how to make an unambiguous specification of the
locale. This is just a suggestion. It certainly would be better if the
script could always discover the correct encoding.

Best regards,

--
Jakson
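The fallback suggested above (refuse to guess when the locale string names no encoding) can be sketched in shell. This is only an illustration, not the script that went into the package: the function name `locale_codeset` is hypothetical, and it inspects nothing but the string it is given, so it shares the wrapper's limitation of not seeing the system default charmap.

```shell
#!/bin/sh
# Hypothetical helper (not part of brazilian-conjugate): print the codeset
# part of a locale name such as "pt_BR.UTF-8". Return non-zero when the
# name carries no codeset (e.g. "pt_BR"), so the caller can print a help
# message instead of assuming a default.
locale_codeset() {
    case "$1" in
        *.*)
            codeset=${1#*.}          # drop everything up to the first dot
            codeset=${codeset%%@*}   # drop a modifier such as "@euro"
            [ -n "$codeset" ] || return 1
            printf '%s\n' "$codeset"
            ;;
        *)
            return 1
            ;;
    esac
}
```

A wrapper could call `locale_codeset "$LANG"` and, on failure, explain how to set an unambiguous locale such as pt_BR.UTF-8.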
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
* Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 17:06]:

> On Tue, Jun 20, 2006 at 08:18:52PM +0200, Rafael Laboissiere wrote:
> > I am now about to do the following for the package
> > brazilian-conjugate:
> >
> > * Install the original conjugue script into
> >   /usr/bin/conjugue-ISO-8859-1.
> > * Create /usr/bin/conjugue-UTF-8 with the recode command as you
> >   suggested.
> > * Create the appropriate
> >   /usr/lib/brazilian-conjugate/verbos-char-enc files and change the
> >   content of /usr/bin/conjugue-char-enc accordingly.
> > * Create a simple wrapper script /usr/bin/conjugue that would call
> >   the appropriate /usr/bin/conjugue-char-enc according to the
> >   current locale, something like the following: [...]
>
> I tested it and it worked. I only had to change line 1810 of
> conjugue-* to fix the name of the verbos-* files.

This is in my third point above, but not very explicitly written.

> But it doesn't work if I simply export my locale as:
>
>     $ export LC_ALL=pt_BR
>     $ export LANG=pt_BR
>
> I configured my system to have en_US.UTF-8 as the default locale and
> added pt_BR.UTF-8 and pt_BR.ISO-8859-1 as other available locales. I
> think that when I don't specify the charset encoding it defaults to
> UTF-8, and not to ISO-8859-1, as assumed by the wrapper script.
>
> One possible solution would be to not assume any default charset and
> make the script exit if the encoding wasn't found in the locale
> string. In this case, the output would be a help message (in
> Portuguese and English) teaching how to make an unambiguous
> specification of the locale. This is just a suggestion. It certainly
> would be better if the script could always discover the correct
> encoding.

I will try to find out whether it is possible to discover the correct
encoding. How can it be that on your system pt_BR defaults to
pt_BR.UTF-8?

--
Rafael
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
On Tue, Jun 20, 2006 at 10:36:33PM +0200, Rafael Laboissiere wrote:
> * Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 17:06]:
> > On Tue, Jun 20, 2006 at 08:18:52PM +0200, Rafael Laboissiere wrote:
> > > I am now about to do the following for the package
> > > brazilian-conjugate:
> > >
> > > * Install the original conjugue script into
> > >   /usr/bin/conjugue-ISO-8859-1.
> > > * Create /usr/bin/conjugue-UTF-8 with the recode command as you
> > >   suggested.
> > > * Create the appropriate
> > >   /usr/lib/brazilian-conjugate/verbos-char-enc files and change
> > >   the content of /usr/bin/conjugue-char-enc accordingly.
> > > * Create a simple wrapper script /usr/bin/conjugue that would call
> > >   the appropriate /usr/bin/conjugue-char-enc according to the
> > >   current locale, something like the following: [...]
> >
> > I tested it and it worked. I only had to change line 1810 of
> > conjugue-* to fix the name of the verbos-* files.
>
> This is in my third point above, but not very explicitly written.

Sorry, I missed the point.

> > But it doesn't work if I simply export my locale as:
> >
> >     $ export LC_ALL=pt_BR
> >     $ export LANG=pt_BR
> >
> > I configured my system to have en_US.UTF-8 as the default locale
> > and added pt_BR.UTF-8 and pt_BR.ISO-8859-1 as other available
> > locales. I think that when I don't specify the charset encoding it
> > defaults to UTF-8, and not to ISO-8859-1, as assumed by the wrapper
> > script.
> >
> > One possible solution would be to not assume any default charset
> > and make the script exit if the encoding wasn't found in the locale
> > string. In this case, the output would be a help message (in
> > Portuguese and English) teaching how to make an unambiguous
> > specification of the locale. This is just a suggestion. It
> > certainly would be better if the script could always discover the
> > correct encoding.
>
> I will try to find out whether it is possible to discover the correct
> encoding. How can it be that on your system pt_BR defaults to
> pt_BR.UTF-8?

I configured my .bashrc and restarted the session four times, exporting
the following values to LANG and LANGUAGE:

    VALUE               RESULT
    pt_BR               The lines of error
    pt_BR.ISO-8859-1    The lines of error
    pt_BR.UTF-8         OK
    (nothing)           OK

I manually configured my system to UTF-8, and I'm not an expert in this
issue. Probably I put something in a configuration file that isn't
standard in Linux systems configured to UTF-8, but I don't know what I
did wrong or differently. Anyway, your correction to conjugue is working
fine here as long as I either set my locale to *.UTF-8 or do not set it.

Best regards,

Jakson
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
* Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 19:00]:

> I manually configured my system to UTF-8, and I'm not an expert in
> this issue. Probably I put something in a configuration file that
> isn't standard in Linux systems configured to UTF-8, but I don't know
> what I did wrong or differently. Anyway, your correction to conjugue
> is working fine here as long as I either set my locale to *.UTF-8 or
> do not set it.

I think that I understand what is going on. The default locale on your
system should be en_US.UTF-8. In this case, the default charmap for you
should be UTF-8. When you do:

    export LANG=pt_BR

the system will pick the locale pt_BR.UTF-8, inheriting the charmap
from the default.

In any event, I found the way to know which is the current charmap on a
system:

    locale charmap

Here is an example:

    $ export LC_CTYPE=en_US.ISO-8859-1
    $ export LANG=pt_BR
    $ locale charmap
    ISO-8859-1
    $ export LC_CTYPE=en_US.UTF-8
    $ locale charmap
    UTF-8

Could you please try the modified conjugue script below:

    #!/usr/bin/perl -w
    # Ask the system for the current charmap (last line of the output).
    my $encoding = (my @lines = `locale charmap`)[-1];
    chomp $encoding;
    # Dispatch to the copy of conjugue encoded in that charmap.
    my $script = "/usr/bin/conjugue-$encoding";
    if (-f $script) {
        system ($script, @ARGV);
    } else {
        die "Current locale charmap `$encoding' is unknown to conjugue.\n"
            . "Accepted charmaps are UTF-8 and ISO-8859-1.\n";
    }

--
Rafael
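For comparison, the same charmap-based dispatch can be sketched in plain shell. This is a rough equivalent under stated assumptions, not the packaged wrapper: the function name `select_conjugue` and its optional directory argument are hypothetical (the Perl script above hard-codes /usr/bin), and the lookup is split into a function only so that it can be exercised without the package installed.

```shell
#!/bin/sh
# Hypothetical helper: given a charmap name (as printed by `locale charmap`)
# and an optional directory, print the path of the matching conjugue-<charmap>
# script, or fail with the same message as the Perl wrapper.
select_conjugue() {
    charmap=$1
    dir=${2:-/usr/bin}               # assumption: scripts live in /usr/bin
    script="$dir/conjugue-$charmap"
    if [ -f "$script" ]; then
        printf '%s\n' "$script"
    else
        echo "Current locale charmap '$charmap' is unknown to conjugue." >&2
        echo "Accepted charmaps are UTF-8 and ISO-8859-1." >&2
        return 1
    fi
}

# Typical use inside a wrapper (commented out so that sourcing this file
# has no side effects):
#   script=$(select_conjugue "$(locale charmap)") && exec "$script" "$@"
```

Querying `locale charmap` rather than parsing LANG means that a bare LANG=pt_BR is resolved the same way the C library resolves it.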
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
On Wed, Jun 21, 2006 at 02:26:40AM +0200, Rafael Laboissiere wrote:
> Could you please try the modified conjugue script below:
>
>     #!/usr/bin/perl -w
>     my $encoding = (my @lines = `locale charmap`)[-1];
>     chomp $encoding;
>     my $script = "/usr/bin/conjugue-$encoding";
>     if (-f $script) {
>         system ($script, @ARGV);
>     } else {
>         die "Current locale charmap `$encoding' is unknown to conjugue.\n"
>             . "Accepted charmaps are UTF-8 and ISO-8859-1.\n";
>     }

Your script seems to be very robust now! I used the commands locale and
locale charmap to check what was happening and you are correct: if no
charmap is set by the user, the system inherits the default one.

I reconfigured my locales and set none as the default system locale,
and didn't set my personal locale. If I open either xterm or the
console I get:

    $ locale
    LANG=
    LC_CTYPE=POSIX
    LC_NUMERIC=POSIX
    LC_TIME=POSIX
    LC_COLLATE=POSIX
    LC_MONETARY=POSIX
    LC_MESSAGES=POSIX
    LC_PAPER=POSIX
    LC_NAME=POSIX
    LC_ADDRESS=POSIX
    LC_TELEPHONE=POSIX
    LC_MEASUREMENT=POSIX
    LC_IDENTIFICATION=POSIX
    LC_ALL=
    $ conjugue
    Current locale charmap `ANSI_X3.4-1968' is unknown to conjugue.
    Accepted charmaps are UTF-8 and ISO-8859-1.

However, if I open uxterm it automatically converts LC_CTYPE into
en_US.UTF-8 and conjugue does its job and calls conjugue-UTF-8.

It seems that the script will never print the hundreds of lines of
error messages. I think that the worst thing that might happen is a
mismatch between the user's locale configuration and the charmap of his
terminal. But in this case all applications that output non-ASCII
characters will print strange letters.

Thanks,

Jakson
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
Package: brazilian-conjugate
Version: 2.4.really.3.0.beta4-9.1
Severity: normal

If the locale is UTF-8, conjugue outputs more than 200 lines of error
before conjugating the verb, and both the error messages and the
conjugated verb are output in ISO-8859-1 charset. Below are some of the
error lines:

    vogal não normalizada: w=, y= (apoi:ap)
    cuidado: mpc do verbo apoiar (ap) não resolvido
    ... 248 lines omitted ...
    conjugue: erro na linha 5219 do banco
    destroçar não é verbo

Solution: I don't know how to solve the problem in the Debian package,
but here I recoded two files into UTF-8:

    # recode l1..utf8 /usr/bin/conjugue
    # recode l1..utf8 /usr/lib/brazilian-conjugate/verbos

Best regards,

Jakson

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.12-1-amd64-k8
Locale: LANG=pt_BR.UTF-8, LC_CTYPE=pt_BR.UTF-8 (charmap=UTF-8)

-- no debconf information
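The recode workaround in the report rewrites the installed files in place. A more cautious variant, sketched here with iconv instead of recode, writes the UTF-8 text to a separate file so the ISO-8859-1 original is kept. The function name `latin1_to_utf8` is hypothetical, and the sketch assumes an iconv that knows the ISO-8859-1 and UTF-8 names (as glibc's does).

```shell
#!/bin/sh
# Hypothetical helper: convert a file from ISO-8859-1 (Latin-1) to UTF-8,
# writing the result to a new file and leaving the source untouched.
#   $1: source file encoded in ISO-8859-1
#   $2: destination file, written in UTF-8
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}

# Example (commented out; the paths are the ones named in the report):
#   latin1_to_utf8 /usr/lib/brazilian-conjugate/verbos ./verbos-UTF-8
```

Keeping both encodings side by side is what lets a wrapper later choose between them at run time.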
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
* Jakson Aquino [EMAIL PROTECTED] [2006-06-19 15:39]:

> Package: brazilian-conjugate
> Version: 2.4.really.3.0.beta4-9.1
> Severity: normal
>
> If the locale is UTF-8, conjugue outputs more than 200 lines of error
> before conjugating the verb, and both the error messages and the
> conjugated verb are output in ISO-8859-1 charset. Below are some of
> the error lines:
>
>     vogal não normalizada: w=, y= (apoi:ap)
>     cuidado: mpc do verbo apoiar (ap) não resolvido
>     ... 248 lines omitted ...
>     conjugue: erro na linha 5219 do banco
>     destroçar não é verbo
>
> Solution: I don't know how to solve the problem in the Debian package,
> but here I recoded two files into UTF-8:
>
>     # recode l1..utf8 /usr/bin/conjugue
>     # recode l1..utf8 /usr/lib/brazilian-conjugate/verbos

I tried the above but it does not work for me. How should I set my
locale variables such that it works?

--
Rafael
Bug#374510: brazilian-conjugate: error messages in UTF-8 locale
On Tue, Jun 20, 2006 at 12:31:51AM +0200, Rafael Laboissiere wrote:
> I tried the above but it does not work for me. How should I set my
> locale variables such that it works?

By default, the UTF-8 locales aren't available in a Debian
installation. To set my locale to UTF-8, I ran dpkg-reconfigure locales
and replaced the ISO-8859-1 entries with UTF-8 ones. But it's possible
to have both pt_BR.ISO-8859-1 and pt_BR.UTF-8 on the same system. I
also put the following lines in my .bashrc:

    export LANGUAGE=pt_BR.UTF-8
    export LANG=pt_BR.UTF-8

Perhaps the above lines are unnecessary in my case because I no longer
have the ISO-8859-1 locale. I also had to change the settings of some
applications, and the following web page was useful:

    http://melkor.dnp.fmph.uniba.sk/~garabik/debian-utf8/HOWTO/howto.html

Best regards,

Jakson