Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-20 Rafael Laboissiere
* Jakson A. Aquino [EMAIL PROTECTED] [2006-06-19 21:45]:

 On Tue, Jun 20, 2006 at 12:31:51AM +0200, Rafael Laboissiere wrote:
  I tried the above but it does not work for me.  How should I set my
  locale variables such that it works?
 
 By default, the UTF-8 locales aren't available in a Debian
 installation. To set my locale to UTF-8, I ran dpkg-reconfigure
 locales and replaced the ISO-8859-1 entries with UTF-8 ones.  But
 it's possible to have both pt_BR.ISO-8859-1 and pt_BR.UTF-8 in the
 same system. I also put the following lines in my .bashrc:
 
 export LANGUAGE=pt_BR.UTF-8
 export LANG=pt_BR.UTF-8
 
 Perhaps the above lines are unnecessary in my case because I no longer
 have the ISO-8859-1 locale. I also had to change the settings of some
 applications, and the following web page was useful:
 
 http://melkor.dnp.fmph.uniba.sk/~garabik/debian-utf8/HOWTO/howto.html

Thanks for the instructions, but I cannot replicate your findings.  I
need a precise cookbook.  Here is what I did:

$ dpkg-reconfigure locales # Here I have activated pt_BR.UTF-8
$ cd /var/tmp
$ cp /usr/bin/conjugue .
$ cp /usr/lib/brazilian-conjugate/verbos .
$ recode l1..utf8 conjugue
$ recode l1..utf8 verbos
$ perl -pi -e 's|/usr/lib/brazilian-conjugate|.|' conjugue
$ LC_ALL=pt_BR.UTF-8 LC_CTYPE=pt_BR.UTF-8 LANGUAGE=pt_BR.UTF-8 LANG=pt_BR.UTF-8 \
    ./conjugue

The last command produces the 200 lines of error messages that you observed
before.  Is this the case for you too?  I am wondering whether gawk
(which /etc/alternatives/awk points to on my system) supports UTF-8.
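One plausible mechanism, consistent with the guess later in this thread that the trouble starts when gawk runs in a locale whose charset differs from the one conjugue and verbos were written in: the same Portuguese word has a different byte length in ISO-8859-1 and in UTF-8, and a Latin-1 accented byte is not valid UTF-8 at all. The Perl sketch below makes this concrete using the core Encode module; the helper name is_valid_utf8 is made up for illustration, and the sketch does not test gawk itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK);

# Hypothetical helper: true if a byte string decodes cleanly as UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;   # local copy, since decode() with a CHECK flag may consume it
    return eval { decode('UTF-8', $bytes, FB_CROAK); 1 } ? 1 : 0;
}

my $word   = "destro\x{e7}ar";             # "destroçar" as Perl characters
my $latin1 = encode('ISO-8859-1', $word);  # bytes as stored in the original files
my $utf8   = encode('UTF-8', $word);       # bytes a UTF-8 locale expects

printf "%d characters, %d bytes in ISO-8859-1, %d bytes in UTF-8\n",
    length($word), length($latin1), length($utf8);
printf "ISO-8859-1 bytes valid as UTF-8? %s\n", is_valid_utf8($latin1) ? 'yes' : 'no';
printf "UTF-8 bytes valid as UTF-8? %s\n",      is_valid_utf8($utf8)   ? 'yes' : 'no';
```

Running it reports 9 characters, 9 bytes in ISO-8859-1 and 10 bytes in UTF-8, and only the UTF-8 bytes decode cleanly. Any program that interprets the Latin-1 files as UTF-8 text will therefore see malformed input around every accented letter.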

Cheers,

-- 
Rafael


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-20 Jakson A. Aquino
On Tue, Jun 20, 2006 at 08:18:52PM +0200, Rafael Laboissiere wrote:
 * Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 14:34]:
 
  If my guess is correct the problem happens when gawk is called in a
  locale and the files conjugue and verbos were encoded in a
  different locale.
 
 I found the source of the problem: I was doing the tests in a system with
 gawk 3.1.4.  When I did it in my up-to-date unstable chroot with gawk
 3.1.5, I could replicate what you reported.
 
 I am now about to do the following for the package brazilian-conjugate:
 
   * Install the original conjugue script into /usr/bin/conjugue-ISO-8859-1.
   * Create /usr/bin/conjugue-UTF-8 with the recode command as you
     suggested.
   * Create the appropriate /usr/lib/brazilian-conjugate/verbos-char-enc
     files and change the content of /usr/bin/conjugue-char-enc
     accordingly.
   * Create a simple wrapper script /usr/bin/conjugue that would call the
     appropriate /usr/bin/conjugue-char-enc according to the current
     locale, something like the following:
   
 #!/usr/bin/perl -w
 my $encoding = 'ISO-8859-1';   # default value
 $ENV{LANG} =~ /[a-z]{2}?(?:(?:_[A-Z]{2}?)?(?:\.(.*))?)?/;
 $encoding = $1 if defined $1;
 system ("/usr/bin/conjugue-$encoding", @ARGV);
 
 Notice that the script above relies on the environment variable LANG.  Do
 you think that this would be okay?

I tested and it worked. I had only to change line 1810 of conjugue-*
to fix the name of the verbos-* files. But it doesn't work if I simply
export my locale as:

  $ export LC_ALL=pt_BR
  $ export LANG=pt_BR

I configured my system to have en_US.UTF-8 as default locale and added
pt_BR.UTF-8 and pt_BR.ISO-8859-1 as other available locales. I think
that when I don't specify the charset it defaults to UTF-8,
and not to ISO-8859-1 as the wrapper script assumes. One possible
solution would be to not assume any default charset and to make the
script exit if the encoding isn't found in the locale string. In that
case, the output would be a help message (in Portuguese and English)
explaining how to specify the locale unambiguously. This
is just a suggestion. It would certainly be better if the script could
always discover the correct encoding.
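A minimal Perl sketch of that suggestion, under a few stated assumptions: it follows the POSIX precedence LC_ALL, then LC_CTYPE, then LANG instead of reading LANG alone (here LC_ALL=pt_BR would otherwise be ignored); it accepts both the "UTF-8" and "utf8" spellings; and instead of exec'ing anything it only prints which conjugue-* script it would run. The helper names ctype_locale and charset_of, and the wording of the help text, are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Help text kept to plain ASCII on purpose: the terminal's charset is
# exactly what the wrapper could not determine.
my $HELP = <<'END';
conjugue: nao foi possivel determinar a codificacao do locale atual.
Defina-a explicitamente, por exemplo: export LANG=pt_BR.UTF-8
conjugue: cannot tell the charset of the current locale.
Set it explicitly, for example: export LANG=pt_BR.UTF-8
END

# Locale that governs character handling, following the POSIX
# precedence LC_ALL > LC_CTYPE > LANG (empty values are ignored).
sub ctype_locale {
    my ($env) = @_;
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$var};
        return $value if defined $value && $value ne '';
    }
    return undef;
}

# Charset part of language[_territory][.charset][@modifier], mapped to
# the conjugue-* suffixes assumed here; undef when absent or unsupported.
sub charset_of {
    my ($locale) = @_;
    return undef unless defined $locale;
    my ($raw) = $locale =~ /^[a-z]{2,3}(?:_[A-Z]{2})?\.([^@]+)/
        or return undef;
    (my $key = lc $raw) =~ s/[^a-z0-9]//g;    # "UTF-8" and "utf8" -> "utf8"
    my %suffix = (utf8 => 'UTF-8', iso88591 => 'ISO-8859-1');
    return $suffix{$key};
}

my $charset = charset_of(ctype_locale(\%ENV));
if (defined $charset) {
    # A real wrapper would exec "/usr/bin/conjugue-$charset" with @ARGV here.
    print "would run /usr/bin/conjugue-$charset\n";
} else {
    print STDERR $HELP;
}
```

With LC_ALL=pt_BR and LANG=pt_BR this prints the help text rather than guessing, which is the behaviour proposed above; a locale such as pt_BR.utf8 is mapped to the conjugue-UTF-8 script.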

Best regards,

Jakson

-- 
Jakson






Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-20 Rafael Laboissiere
* Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 17:06]:

 On Tue, Jun 20, 2006 at 08:18:52PM +0200, Rafael Laboissiere wrote:
  I am now about to do the following for the package brazilian-conjugate:
  
    * Install the original conjugue script into /usr/bin/conjugue-ISO-8859-1.
    * Create /usr/bin/conjugue-UTF-8 with the recode command as you
      suggested.
    * Create the appropriate /usr/lib/brazilian-conjugate/verbos-char-enc
      files and change the content of /usr/bin/conjugue-char-enc
      accordingly.
    * Create a simple wrapper script /usr/bin/conjugue that would call the
      appropriate /usr/bin/conjugue-char-enc according to the current
      locale, something like the following:

  [...]

 I tested and it worked. I had only to change line 1810 of conjugue-*
 to fix the name of the verbos-* files.

This is in my third point above, but not very explicitly written.

 But it doesn't work if I simply export my locale as:
 
   $ export LC_ALL=pt_BR
   $ export LANG=pt_BR
 
 I configured my system to have en_US.UTF-8 as default locale and added
 pt_BR.UTF-8 and pt_BR.ISO-8859-1 as other available locales. I think
 that when I don't specify the charset it defaults to UTF-8,
 and not to ISO-8859-1 as the wrapper script assumes. One possible
 solution would be to not assume any default charset and to make the
 script exit if the encoding isn't found in the locale string. In that
 case, the output would be a help message (in Portuguese and English)
 explaining how to specify the locale unambiguously. This
 is just a suggestion. It would certainly be better if the script could
 always discover the correct encoding.

I will try to find out whether it is possible to discover the correct
encoding.  How can it be that on your system pt_BR defaults to
pt_BR.UTF-8?
 
-- 
Rafael





Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-20 Jakson A. Aquino
On Tue, Jun 20, 2006 at 10:36:33PM +0200, Rafael Laboissiere wrote:
 * Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 17:06]:
 
  On Tue, Jun 20, 2006 at 08:18:52PM +0200, Rafael Laboissiere wrote:
   I am now about to do the following for the package brazilian-conjugate:
   
     * Install the original conjugue script into /usr/bin/conjugue-ISO-8859-1.
     * Create /usr/bin/conjugue-UTF-8 with the recode command as you
       suggested.
     * Create the appropriate /usr/lib/brazilian-conjugate/verbos-char-enc
       files and change the content of /usr/bin/conjugue-char-enc
       accordingly.
     * Create a simple wrapper script /usr/bin/conjugue that would call the
       appropriate /usr/bin/conjugue-char-enc according to the current
       locale, something like the following:
 
   [...]
 
  I tested and it worked. I had only to change line 1810 of conjugue-*
  to fix the name of the verbos-* files.
 
 This is in my third point above, but not very explicitly written.

Sorry, I missed the point.

 
  But it doesn't work if I simply export my locale as:
  
    $ export LC_ALL=pt_BR
    $ export LANG=pt_BR
  
  I configured my system to have en_US.UTF-8 as default locale and added
  pt_BR.UTF-8 and pt_BR.ISO-8859-1 as other available locales. I think
  that when I don't specify the charset it defaults to UTF-8,
  and not to ISO-8859-1 as the wrapper script assumes. One possible
  solution would be to not assume any default charset and to make the
  script exit if the encoding isn't found in the locale string. In that
  case, the output would be a help message (in Portuguese and English)
  explaining how to specify the locale unambiguously. This
  is just a suggestion. It would certainly be better if the script could
  always discover the correct encoding.
 
 I will try to find out whether it is possible to discover the correct
 encoding.  How can it be that on your system pt_BR defaults to
 pt_BR.UTF-8?

I configured my .bashrc and restarted the session four times exporting
the following values to LANG and LANGUAGE:

  VALUE              RESULT
  pt_BR              The lines of error
  pt_BR.ISO-8859-1   The lines of error
  pt_BR.UTF-8        OK
  (nothing)          OK

I manually configured my system for UTF-8, and I'm not an expert on this
issue. Probably I put something in a configuration file that isn't
standard on Linux systems configured for UTF-8, but I don't know what I
did wrong or differently. Anyway, your correction to conjugue is working
fine here as long as I either set my locale to *.UTF-8 or do not set
it.

Best regards,

Jakson





Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-20 Rafael Laboissiere
* Jakson A. Aquino [EMAIL PROTECTED] [2006-06-20 19:00]:

 I manually configured my system for UTF-8, and I'm not an expert on this
 issue. Probably I put something in a configuration file that isn't
 standard on Linux systems configured for UTF-8, but I don't know what I
 did wrong or differently. Anyway, your correction to conjugue is working
 fine here as long as I either set my locale to *.UTF-8 or do not set
 it.

I think that I understand what is going on.  The default locale on your
system is presumably en_US.UTF-8, so your default charmap is
UTF-8.  When you do:

export LANG=pt_BR

Then the system will pick the locale pt_BR.UTF-8, inheriting the charmap
from the default.

In any event, I found a way to find out the current charmap of a
system:

locale charmap

Here is an example:

$ export LC_CTYPE=en_US.ISO-8859-1
$ export LANG=pt_BR
$ locale charmap
ISO-8859-1
$ export LC_CTYPE=en_US.UTF-8
$ locale charmap
UTF-8

Could you please try the modified conjugue script below:

#!/usr/bin/perl -w
my $encoding = (my @lines = `locale charmap`)[-1];
chomp $encoding;
my $script = "/usr/bin/conjugue-$encoding";
if (-f $script) {
  system ($script, @ARGV);
} else {
  die "Current locale charmap `$encoding' is unknown to conjugue.\n"
    . "Accepted charmaps are UTF-8 and ISO-8859-1.\n";
}
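The same lookup could also be done without spawning `locale`: Perl's core I18N::Langinfo module wraps nl_langinfo(3), and after setlocale(LC_CTYPE, "") it should report the same codeset name that `locale charmap` prints (for example ANSI_X3.4-1968 in the POSIX locale on glibc, as seen later in this thread). A sketch, assuming a Perl that ships I18N::Langinfo; the helper name current_charmap is made up, and the script prints what it would run instead of running it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: codeset of the locale selected by the environment
# (LC_ALL, then LC_CTYPE, then LANG). If that locale is not installed,
# setlocale() fails and the previously active LC_CTYPE is reported.
sub current_charmap {
    setlocale(LC_CTYPE, '');     # adopt the environment's LC_CTYPE
    return langinfo(CODESET);    # e.g. "UTF-8" or "ISO-8859-1"
}

my %accepted = map { $_ => 1 } qw(UTF-8 ISO-8859-1);
my $charmap  = current_charmap();

if ($accepted{$charmap}) {
    # The real wrapper would run "/usr/bin/conjugue-$charmap" with @ARGV here.
    print "would run /usr/bin/conjugue-$charmap\n";
} else {
    print STDERR "Current locale charmap `$charmap' is unknown to conjugue.\n",
                 "Accepted charmaps are UTF-8 and ISO-8859-1.\n";
}
```

Checking membership in a fixed list, rather than testing whether /usr/bin/conjugue-$charmap exists, is a design choice here: it keeps the accepted set in one place, at the cost of having to update it if another conjugue-* variant is ever installed.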


-- 
Rafael





Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-20 Jakson A. Aquino
On Wed, Jun 21, 2006 at 02:26:40AM +0200, Rafael Laboissiere wrote:
 Could you please try the modified conjugue script below:
 
 #!/usr/bin/perl -w
 my $encoding = (my @lines = `locale charmap`)[-1];
 chomp $encoding;
 my $script = "/usr/bin/conjugue-$encoding";
 if (-f $script) {
   system ($script, @ARGV);
 } else {
   die "Current locale charmap `$encoding' is unknown to conjugue.\n"
     . "Accepted charmaps are UTF-8 and ISO-8859-1.\n";
 }
 

Your script seems to be very robust now! I used the commands locale
and locale charmap to check what was happening and you are correct:
if no charmap is set by the user, the system inherits the default one.
I reconfigured my locales, set "None" as the default system locale,
and didn't set my personal locale. If I open either xterm or a console,
I get:

$ locale
LANG=
LC_CTYPE=POSIX
LC_NUMERIC=POSIX
LC_TIME=POSIX
LC_COLLATE=POSIX
LC_MONETARY=POSIX
LC_MESSAGES=POSIX
LC_PAPER=POSIX
LC_NAME=POSIX
LC_ADDRESS=POSIX
LC_TELEPHONE=POSIX
LC_MEASUREMENT=POSIX
LC_IDENTIFICATION=POSIX
LC_ALL=
$ conjugue
Current locale charmap `ANSI_X3.4-1968' is unknown to conjugue.
Accepted charmaps are UTF-8 and ISO-8859-1.

However, if I open uxterm, it automatically sets LC_CTYPE to
en_US.UTF-8, and conjugue does its job and calls conjugue-UTF-8.

It seems that the script will never print the hundreds of lines of
error messages. I think the worst that can happen is a mismatch
between the user's locale configuration and the charmap of his
terminal. But in that case all applications that output non-ASCII
characters will print garbled letters.

Thanks,

Jakson






Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-19 Jakson Aquino
Package: brazilian-conjugate
Version: 2.4.really.3.0.beta4-9.1
Severity: normal

If the locale is UTF-8, conjugue outputs more than 200 lines of error
messages before conjugating the verb, and both the error messages and
the conjugated verb are output in the ISO-8859-1 charset. Below are
some of the error lines:

  vogal não normalizada: w=, y= (apoi:ap)
  cuidado: mpc do verbo apoiar (ap) não resolvido
  ... 248 lines omitted ...
  conjugue: erro na linha 5219 do banco
  destroçar não é verbo

Solution: I don't know how to fix the problem in the Debian package,
but on my machine I recoded these two files into UTF-8:

  # recode l1..utf8 /usr/bin/conjugue
  # recode l1..utf8 /usr/lib/brazilian-conjugate/verbos
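For readers without recode installed, the conversion it performs here (ISO-8859-1, which recode calls "l1", to UTF-8) can be sketched with Perl's core Encode module. This is an illustrative stand-in, not what the Debian package ships; the subroutine names are hypothetical. Like recode, running it twice on the same file would double-encode the accented characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Reinterpret ISO-8859-1 bytes as characters, then emit them as UTF-8 bytes.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Convert a file in place, in the spirit of `recode l1..utf8 FILE`.
sub recode_l1_to_utf8 {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "cannot read $path: $!\n";
    my $data = do { local $/; <$in> } // '';   # slurp the whole file
    close $in;
    open my $out, '>:raw', $path or die "cannot write $path: $!\n";
    print {$out} latin1_to_utf8($data);
    close $out or die "cannot write $path: $!\n";
}

recode_l1_to_utf8($_) for @ARGV;
```

Saved under a hypothetical name such as recode-l1.pl, it would be invoked as root with the two paths above as arguments, mirroring the two recode commands.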

Best regards,

Jakson

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.12-1-amd64-k8
Locale: LANG=pt_BR.UTF-8, LC_CTYPE=pt_BR.UTF-8 (charmap=UTF-8)

-- no debconf information






Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-19 Rafael Laboissiere
* Jakson Aquino [EMAIL PROTECTED] [2006-06-19 15:39]:

 Package: brazilian-conjugate
 Version: 2.4.really.3.0.beta4-9.1
 Severity: normal
 
 If the locale is UTF-8, conjugue outputs more than 200 lines of error
 messages before conjugating the verb, and both the error messages and
 the conjugated verb are output in the ISO-8859-1 charset. Below are
 some of the error lines:
 
   vogal não normalizada: w=, y= (apoi:ap)
   cuidado: mpc do verbo apoiar (ap) não resolvido
   ... 248 lines omitted ...
   conjugue: erro na linha 5219 do banco
   destroçar não é verbo
 
 Solution: I don't know how to fix the problem in the Debian package,
 but on my machine I recoded these two files into UTF-8:
 
   # recode l1..utf8 /usr/bin/conjugue
   # recode l1..utf8 /usr/lib/brazilian-conjugate/verbos

I tried the above but it does not work for me.  How should I set my
locale variables such that it works?

-- 
Rafael





Bug#374510: brazilian-conjugate: error messages in UTF-8 locale

2006-06-19 Jakson A. Aquino
On Tue, Jun 20, 2006 at 12:31:51AM +0200, Rafael Laboissiere wrote:
 I tried the above but it does not work for me.  How should I set my
 locale variables such that it works?

By default, the UTF-8 locales aren't available in a Debian
installation. To set my locale to UTF-8, I ran dpkg-reconfigure
locales and replaced the ISO-8859-1 entries with UTF-8 ones.  But
it's possible to have both pt_BR.ISO-8859-1 and pt_BR.UTF-8 in the
same system. I also put the following lines in my .bashrc:

export LANGUAGE=pt_BR.UTF-8
export LANG=pt_BR.UTF-8

Perhaps the above lines are unnecessary in my case because I no longer
have the ISO-8859-1 locale. I also had to change the settings of some
applications, and the following web page was useful:

http://melkor.dnp.fmph.uniba.sk/~garabik/debian-utf8/HOWTO/howto.html

Best regards,

Jakson

