Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread Eric Blake
On 02/22/2012 10:02 PM, Linda Walsh wrote:
> 
> 
> Eric Blake wrote:
> 
>>
>> Don't think of it as 'wide-int', rather, think of it as 'the integral
>> type that both contains wchar_t and WEOF'.  You cannot write 'signed
> >> wint_t' nor 'unsigned wint_t'.
> 
> 
> ---
> ?? You say don't think of it that way, but unless I missed something,
> just like wchar stood for 'wide char', (and char's have always been
> signed or unsigned, (separate from short ints/unsigned short),  the
> term 'wint' would have come from wide int.  But ints have never been
> unsigned unless specifically prefixed as such... so wints shouldn't
> have the ambiguity that chars have.
> 
>  It may very well exist as unsigned somewhere -- but the implementer
> should be chained to a 1960's card punch and forced to write in cobol.
> 
> You still haven't mentioned anyplace where wint_t is an unsigned
> value.

Yes, I have:
https://lists.gnu.org/archive/html/bug-bash/2012-02/msg00070.html
"both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit
unsigned int for wint_t."

$ printf '#include <wchar.h>\n' | gcc -E - | grep wint_t | head -n1
typedef unsigned int wint_t;

>   Is this a hypothetical issue?  I.e. in theory it could
> be unsigned , but in practice no one has ever made it so?

No, it is not hypothetical.  It is real.  wint_t can be either signed or
unsigned, and portable code cannot assume.
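
A program can also ask the compiler directly instead of grepping the
preprocessor output. The following is only a minimal sketch; the function
names wint_is_signed and wchar_is_signed are made up for illustration and
are not part of bash or of any library.

```c
#include <wchar.h>

/* Hypothetical helper: returns 1 if wint_t is a signed type here,
   0 if it is unsigned.  Converting -1 to an unsigned type yields the
   largest value of that type, which never compares below zero. */
int
wint_is_signed (void)
{
  return (wint_t) -1 < (wint_t) 0;
}

/* Same probe for wchar_t, whose signedness is equally unspecified. */
int
wchar_is_signed (void)
{
  return (wchar_t) -1 < (wchar_t) 0;
}
```

Going by the typedef shown above, wint_is_signed () would be expected to
return 0 on that system; other platforms may differ, which is the point.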

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread John Kearney
And on the up side if they do ever give in and allow registration of
family name characters we may get a wchar_t, schar_t lwchar_t and a
llwchar_t
:)
just imagine a variable length 64bit char system.

Everything from Sumerian to Klingon in Unicode, though I think they
already are, though not officially, or are being done,

Oh god what I really want now is bash in klingon.

:))
just imagine black background glaring green text.
know what I'm doing tonight.

check out ( shakes head in disbelief, while chuckling )
Ubuntu Klingon Translators https://launchpad.net/~ubuntu-l10n-tlh
Expansion: Ubuntu Font should support pIqaD (Klingon)
https://bugs.launchpad.net/ubuntu/+source/ubuntu-font-family-sources/+bug/650729



On 02/23/2012 04:54 AM, Eric Blake wrote:
> On 02/22/2012 07:43 PM, John Kearney wrote:
> >> ^ caveat: you can represent the full 0x10FFFF in UTF-16, you just
>> need 2 UTF-16 characters. check out the latest version of
>> unicode.c for an example how.
> 
> Yes, and Cygwin actually does this.
> 
> A strict reading of POSIX states that wchar_t must be wide enough
> for all supported characters, technically limiting things to just
> the basic plane if you have 16-bit wchar_t and a POSIX-compliant
> app.  But cygwin has exploited a loophole in the POSIX wording -
> POSIX does not require that all bit patterns are valid characters.
> So the actual Cygwin implementation is that on paper, rather than
> representing all 65536 patterns as valid characters, the values
> used in surrogate halves (0xd800 to 0xdfff) are listed as
> non-characters (so the use of them triggers undefined behavior per
> POSIX), but actually using them treats them as surrogate pairs
> (leading to the full Unicode character set, but reintroducing the
> headaches that multibyte characters had with 'char', but now with
> wchar_t, where you are back to dealing with variable-sized 
> character elements).
> 
> Furthermore, the mess of 16-bit vs. 32-bit wchar_t is one of the
> reasons why C11 has introduced two new character types, 16-bit and
> 32-bit characters, designed to fully map to the full Unicode set,
> regardless of what size wchar_t is.  It will be interesting to see
> how the next version of POSIX takes the additions of C11 and
> retrofits the other wide-character functions in POSIX but not C99
> to handle the new character types.
> 




Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread Linda Walsh



Eric Blake wrote:



Don't think of it as 'wide-int', rather, think of it as 'the integral
type that both contains wchar_t and WEOF'.  You cannot write 'signed
wint_t' nor 'unsigned wint_t'.



---
?? You say don't think of it that way, but unless I missed something,
just like wchar stood for 'wide char', (and char's have always been
signed or unsigned, (separate from short ints/unsigned short),  the
term 'wint' would have come from wide int.  But ints have never been
unsigned unless specifically prefixed as such... so wints shouldn't
have the ambiguity that chars have.

 It may very well exist as unsigned somewhere -- but the implementer
should be chained to a 1960's card punch and forced to write in cobol.

You still haven't mentioned anyplace where wint_t is an unsigned
value.   Is this a hypothetical issue?  I.e. in theory it could
be unsigned , but in practice no one has ever made it so?

If so, it might be a good time to shoot that idea in the foot.
(or something like that...)...







Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread Eric Blake
On 02/22/2012 07:43 PM, John Kearney wrote:
> ^ caveat: you can represent the full 0x10FFFF in UTF-16, you just need 2
> UTF-16 characters. check out the latest version of unicode.c for an
> example how.

Yes, and Cygwin actually does this.

A strict reading of POSIX states that wchar_t must be wide enough for
all supported characters, technically limiting things to just the basic
plane if you have 16-bit wchar_t and a POSIX-compliant app.  But cygwin
has exploited a loophole in the POSIX wording - POSIX does not require
that all bit patterns are valid characters.  So the actual Cygwin
implementation is that on paper, rather than representing all 65536
patterns as valid characters, the values used in surrogate halves
(0xd800 to 0xdfff) are listed as non-characters (so the use of them
triggers undefined behavior per POSIX), but actually using them treats
them as surrogate pairs (leading to the full Unicode character set, but
reintroducing the headaches that multibyte characters had with 'char',
but now with wchar_t, where you are back to dealing with variable-sized
character elements).
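
For readers who have not met surrogate pairs, the arithmetic is short. The
sketch below is an illustration only: it is not the code in bash's unicode.c
and not Cygwin's implementation, and the name u32_to_utf16 is hypothetical.
It uses uint16_t and uint32_t so that it does not depend on the width of
wchar_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-16 into out[].
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is
   a surrogate value or lies above 0x10FFFF. */
size_t
u32_to_utf16 (uint32_t cp, uint16_t out[2])
{
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return 0;                   /* not a valid Unicode scalar value */

  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;   /* basic plane: one unit */
      return 1;
    }

  cp -= 0x10000;                /* 20 bits remain */
  out[0] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
  out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
  return 2;
}
```

With this sketch, 0x1F600 comes out as the pair 0xD83D 0xDE00, while 0x20AC
stays a single unit. Any consumer then has to cope with one-or-two-unit
characters, which is the variable-size headache described above.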

Furthermore, the mess of 16-bit vs. 32-bit wchar_t is one of the reasons
why C11 has introduced two new character types, 16-bit and 32-bit
characters, designed to fully map to the full Unicode set, regardless of
what size wchar_t is.  It will be interesting to see how the next
version of POSIX takes the additions of C11 and retrofits the other
wide-character functions in POSIX but not C99 to handle the new
character types.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread John Kearney
^ caveat: you can represent the full 0x10FFFF in UTF-16, you just need 2
UTF-16 characters. check out the latest version of unicode.c for an
example how.

On 02/22/2012 11:32 PM, Eric Blake wrote:
> On 02/22/2012 03:01 PM, Linda Walsh wrote:
>> My question had to do with an unqualified wint_t not
>> unsigned wint_t and what platform existed where an 'int' type or
>> wide-int_t, was, without qualifiers, unsigned.  I still would like
>> to know -- and posix allows int/wide-ints to be unsigned without
>> the unsigned keyword?
> 
> 'int' is signed, and at least 16 bits (these days, it's usually 32).  It
> can also be written 'signed int'.
> 
> 'unsigned int' is unsigned, and at least 16 bits (these days, it's
> usually 32).
> 
> 'wchar_t' is an arbitrary integral type, either signed or unsigned, and
> capable of holding the value of all valid wide characters.   It is
> possible to define a system where wchar_t and char are identical
> (limiting yourself to 256 valid characters), but that is not done in
> practice.  More common are platforms that use 65536 characters (only the
> basic plane of Unicode) for 16 bits, or full Unicode (0 to 0x10FFFF) for
> 32 bits.  Platforms that use 65536 characters and 16-bit wchar_t must
> have wchar_t be unsigned; whereas platforms that have wchar_t wider than
> the largest valid character can choose signed or unsigned with no impact.
> 
> 'wint_t' is an arbitrary integral type, either signed or unsigned, at
> least as wide as wchar_t, and capable of holding the value of all valid
> wide characters and the sentinel WEOF.  Like wchar_t, it may hold values
> that are neither WEOF or valid characters; and in fact, it is more
> likely to do so, since either wchar_t is saturated (all bit values are
> valid characters) and thus wint_t is a wider type, or wchar_t is sparse
> (as is the case with 32-bit wchar_t encoding Unicode), and the addition
> of WEOF to the set does not plug in the remaining sparse values; but
> using such values has unspecified results on any interface that takes a
> wint_t.  WEOF only has to be distinct, it does not have to be negative.
> 
> Don't think of it as 'wide-int', rather, think of it as 'the integral
> type that both contains wchar_t and WEOF'.  You cannot write 'signed
> wint_t' nor 'unsigned wint_t'.
> 




Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread Eric Blake
On 02/22/2012 03:01 PM, Linda Walsh wrote:
> My question had to do with an unqualified wint_t not
> unsigned wint_t and what platform existed where an 'int' type or
> wide-int_t, was, without qualifiers, unsigned.  I still would like
> to know -- and posix allows int/wide-ints to be unsigned without
> the unsigned keyword?

'int' is signed, and at least 16 bits (these days, it's usually 32).  It
can also be written 'signed int'.

'unsigned int' is unsigned, and at least 16 bits (these days, it's
usually 32).

'wchar_t' is an arbitrary integral type, either signed or unsigned, and
capable of holding the value of all valid wide characters.   It is
possible to define a system where wchar_t and char are identical
(limiting yourself to 256 valid characters), but that is not done in
practice.  More common are platforms that use 65536 characters (only the
basic plane of Unicode) for 16 bits, or full Unicode (0 to 0x10FFFF) for
32 bits.  Platforms that use 65536 characters and 16-bit wchar_t must
have wchar_t be unsigned; whereas platforms that have wchar_t wider than
the largest valid character can choose signed or unsigned with no impact.

'wint_t' is an arbitrary integral type, either signed or unsigned, at
least as wide as wchar_t, and capable of holding the value of all valid
wide characters and the sentinel WEOF.  Like wchar_t, it may hold values
that are neither WEOF or valid characters; and in fact, it is more
likely to do so, since either wchar_t is saturated (all bit values are
valid characters) and thus wint_t is a wider type, or wchar_t is sparse
(as is the case with 32-bit wchar_t encoding Unicode), and the addition
of WEOF to the set does not plug in the remaining sparse values; but
using such values has unspecified results on any interface that takes a
wint_t.  WEOF only has to be distinct, it does not have to be negative.

Don't think of it as 'wide-int', rather, think of it as 'the integral
type that both contains wchar_t and WEOF'.  You cannot write 'signed
wint_t' nor 'unsigned wint_t'.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread Linda Walsh



Eric Blake wrote:


On 02/22/2012 05:19 AM, Linda Walsh wrote:


Eric Blake wrote:



Not only can wchar_t be either signed or unsigned, you also have to
worry about platforms where it is only 16 bits, such as cygwin; on the
other hand, wint_t is always 32 bits, but you still have the issue that
it can be either signed or unsigned.



What platform uses unsigned wide ints?  Is that even posix compat?


Yes, it is posix compatible to have wint_t be unsigned.  Not only that,
but both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit
unsigned int for wint_t.  Any code that expects WEOF to be less than 0
is broken.



I never had any question that wchar_t could be signed or
unsigned.

My question had to do with an unqualified wint_t not
unsigned wint_t and what platform existed where an 'int' type or
wide-int_t, was, without qualifiers, unsigned.  I still would like
to know -- and posix allows int/wide-ints to be unsigned without
the unsigned keyword?

That seems very confusing.





Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread John Kearney
On 02/22/2012 01:59 PM, Eric Blake wrote:
> On 02/22/2012 05:19 AM, Linda Walsh wrote:
>>
>>
>> Eric Blake wrote:
>>
>>
>>> Not only can wchar_t be either signed or unsigned, you also have to
>>> worry about platforms where it is only 16 bits, such as cygwin; on the
>>> other hand, wint_t is always 32 bits, but you still have the issue that
>>> it can be either signed or unsigned.
>>
>>
>>
>> What platform uses unsigned wide ints?  Is that even posix compat?
> 
> Yes, it is posix compatible to have wint_t be unsigned.  Not only that,
> but both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit
> unsigned int for wint_t.  Any code that expects WEOF to be less than 0
> is broken.
> 
But if what you want is a uint32  use a uint32_t ;)



Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread Eric Blake
On 02/22/2012 05:19 AM, Linda Walsh wrote:
> 
> 
> Eric Blake wrote:
> 
> 
>> Not only can wchar_t be either signed or unsigned, you also have to
>> worry about platforms where it is only 16 bits, such as cygwin; on the
>> other hand, wint_t is always 32 bits, but you still have the issue that
>> it can be either signed or unsigned.
> 
> 
> 
> What platform uses unsigned wide ints?  Is that even posix compat?

Yes, it is posix compatible to have wint_t be unsigned.  Not only that,
but both glibc (32-bit wchar_t) and cygwin (16-bit wchar_t) use a 32-bit
unsigned int for wint_t.  Any code that expects WEOF to be less than 0
is broken.
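
In practice that means comparing against WEOF itself rather than testing the
sign of the result. A minimal sketch follows; is_weof and count_wide_chars
are names invented for this example, while getwc and WEOF are the standard
ones from <wchar.h>.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Portable end-of-input test: equality with WEOF works whether wint_t
   is signed or unsigned.  A test such as "wc < 0" is never true when
   wint_t is unsigned. */
int
is_weof (wint_t wc)
{
  return wc == WEOF;
}

/* Hypothetical usage: count the wide characters left on a stream.
   The loop stops on WEOF, which getwc returns at end of file or on
   error. */
size_t
count_wide_chars (FILE *fp)
{
  size_t n = 0;
  wint_t wc;

  while (!is_weof (wc = getwc (fp)))
    n++;
  return n;
}
```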

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-22 Thread Linda Walsh



Eric Blake wrote:



Not only can wchar_t be either signed or unsigned, you also have to
worry about platforms where it is only 16 bits, such as cygwin; on the
other hand, wint_t is always 32 bits, but you still have the issue that
it can be either signed or unsigned.




What platform uses unsigned wide ints?  Is that even posix compat?




Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-21 Thread Chet Ramey
On 2/21/12 8:43 AM, John Kearney wrote:

> signed / unsigned isn't really the problem anyway; utf-8 only encodes
> up to 0x7fffffff and utf-16 only encodes up to 0x0010ffff.
> 
> In my latest version I've pretty much removed all reference to wchar_t
> in unicode.c. It was unnecessary.

It's useful if the platform defines __STDC_ISO_10646__, wchar_t is 32 bits,
and the value is less than 0x7fffffff.
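
Those three conditions can be spelled out as a compile-time check plus a
range test. This is a sketch of the idea only, not the code in unicode.c;
the function name wchar_holds_ucs4 is hypothetical, and the WCHAR_MAX
comparison is one assumed way of expressing "wchar_t is 32 bits".

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 when code point cp can be stored
   directly in a wchar_t as its ISO 10646 value, 0 otherwise. */
int
wchar_holds_ucs4 (uint32_t cp)
{
#if defined (__STDC_ISO_10646__) && WCHAR_MAX >= 0x7fffffff
  /* wchar_t values are ISO 10646 code points and the type is wide
     enough, so only the range of cp remains to be checked. */
  return cp <= 0x7fffffff;
#else
  (void) cp;    /* no such guarantee on this platform */
  return 0;
#endif
}
```

When it returns 0, a caller would have to fall back to some other
conversion path, for example iconv where that is available.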

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-21 Thread John Kearney
On 02/21/2012 01:34 PM, Eric Blake wrote:
> On 02/20/2012 07:42 PM, Chet Ramey wrote:
>> On 2/18/12 5:39 AM, John Kearney wrote:
>> 
>>> Bash Version: 4.2 Patch Level: 10 Release Status: release
>>> 
>>> Description: Current u32toutf8 only encodes values below 0xFFFF
>>> correctly. wchar_t can be ambiguous size better in my opinion
>>> to use unsigned long, or uint32_t, or something clearer.
>> 
>> Thanks for the patch.  It's good to have a complete
>> implementation, though as a practical matter you won't see UTF-8
>> characters longer than four bytes.  I agree with you about the
>> unsigned 32-bit int type; wchar_t is signed, even if it's 32
>> bits, on several systems I use.
> 
> Not only can wchar_t be either signed or unsigned, you also
> have to worry about platforms where it is only 16 bits, such as
> cygwin; on the other hand, wint_t is always 32 bits, but you still
> have the issue that it can be either signed or unsigned.
> 
signed / unsigned isn't really the problem anyway; utf-8 only encodes
up to 0x7fffffff and utf-16 only encodes up to 0x0010ffff.

In my latest version I've pretty much removed all reference to wchar_t
in unicode.c. It was unnecessary.

However I would be interested in something like utf16_t or uint16_t
currently using unsigned short, which is inelegant but works.



Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-21 Thread Eric Blake
On 02/20/2012 07:42 PM, Chet Ramey wrote:
> On 2/18/12 5:39 AM, John Kearney wrote:
> 
>> Bash Version: 4.2
>> Patch Level: 10
>> Release Status: release
>>
>> Description:
>>  Current u32toutf8 only encodes values below 0xFFFF correctly.
>> wchar_t can be ambiguous size better in my opinion to use
>> unsigned long, or uint32_t, or something clearer.
> 
> Thanks for the patch.  It's good to have a complete implementation,
> though as a practical matter you won't see UTF-8 characters longer
> than four bytes.  I agree with you about the unsigned 32-bit int
> type; wchar_t is signed, even if it's 32 bits, on several systems
> I use.

Not only can wchar_t be either signed or unsigned, you also have to
worry about platforms where it is only 16 bits, such as cygwin; on the
other hand, wint_t is always 32 bits, but you still have the issue that
it can be either signed or unsigned.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-20 Thread Chet Ramey
On 2/18/12 5:39 AM, John Kearney wrote:

> Bash Version: 4.2
> Patch Level: 10
> Release Status: release
> 
> Description:
>   Current u32toutf8 only encodes values below 0xFFFF correctly.
> wchar_t can be ambiguous size better in my opinion to use
> unsigned long, or uint32_t, or something clearer.

Thanks for the patch.  It's good to have a complete implementation,
though as a practical matter you won't see UTF-8 characters longer
than four bytes.  I agree with you about the unsigned 32-bit int
type; wchar_t is signed, even if it's 32 bits, on several systems
I use.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Fix u32toutf8 so it encodes values > 0xFFFF correctly.

2012-02-18 Thread John Kearney
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-pc-linux-gnu'
-DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash'
-DSHELL -DHAVE_CONFIG_H   -I.  -I../bash -I../bash/include
-I../bash/lib   -g -O2 -Wall
uname output: Linux DETH00 3.0.0-15-generic #26-Ubuntu SMP Fri Jan 20
17:23:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 4.2
Patch Level: 10
Release Status: release

Description:
Current u32toutf8 only encodes values below 0xFFFF correctly.
wchar_t can be ambiguous size better in my opinion to use
unsigned long, or uint32_t, or something clearer.
Repeat-By:
  ---'

Fix:
diff --git a/lib/sh/unicode.c b/lib/sh/unicode.c
index d34fa08..3f7d378 100644
--- a/lib/sh/unicode.c
+++ b/lib/sh/unicode.c
@@ -54,7 +54,7 @@ extern const char *locale_charset __P((void));
 extern char *get_locale_var __P((char *));
 #endif

-static int u32init = 0;
+static int u32init = 0;
 static int utf8locale = 0;
 #if defined (HAVE_ICONV)
 static iconv_t localconv;
@@ -115,26 +115,61 @@ u32tochar (wc, s)
 }

 int
-u32toutf8 (wc, s)
-     wchar_t wc;
+u32toutf8 (c, s)
+     unsigned long c;
      char *s;
 {
   int l;

-  l = (wc < 0x0080) ? 1 : ((wc < 0x0800) ? 2 : 3);
-
-  if (wc < 0x0080)
-    s[0] = (unsigned char)wc;
-  else if (wc < 0x0800)
+  if (c <= 0x7F)
+    {
+      s[0] = (char)c;
+      l = 1;
+    }
+  else if (c <= 0x7FF)
+    {
+      s[0] = (c >>   6)        | 0xc0; /* 110x xxxx */
+      s[1] = (c        & 0x3f) | 0x80; /* 10xx xxxx */
+      l = 2;
+    }
+  else if (c <= 0xFFFF)
+    {
+      s[0] =  (c >> 12)         | 0xe0; /* 1110 xxxx */
+      s[1] = ((c >>  6) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[2] =  (c        & 0x3f) | 0x80; /* 10xx xxxx */
+      l = 3;
+    }
+  else if (c <= 0x1FFFFF)
     {
-      s[0] = (wc >> 6) | 0xc0;
-      s[1] = (wc & 0x3f) | 0x80;
+      s[0] =  (c >> 18)         | 0xf0; /* 1111 0xxx */
+      s[1] = ((c >> 12) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[2] = ((c >>  6) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[3] = ( c        & 0x3f) | 0x80; /* 10xx xxxx */
+      l = 4;
+    }
+  else if (c <= 0x3FFFFFF)
+    {
+      s[0] =  (c >> 24)         | 0xf8; /* 1111 10xx */
+      s[1] = ((c >> 18) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[2] = ((c >> 12) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[3] = ((c >>  6) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[4] = ( c        & 0x3f) | 0x80; /* 10xx xxxx */
+      l = 5;
+    }
+  else if (c <= 0x7FFFFFFF)
+    {
+      s[0] =  (c >> 30)         | 0xfc; /* 1111 110x */
+      s[1] = ((c >> 24) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[2] = ((c >> 18) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[3] = ((c >> 12) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[4] = ((c >>  6) & 0x3f) | 0x80; /* 10xx xxxx */
+      s[5] = ( c        & 0x3f) | 0x80; /* 10xx xxxx */
+      l = 6;
     }
   else
     {
-      s[0] = (wc >> 12) | 0xe0;
-      s[1] = ((wc >> 6) & 0x3f) | 0x80;
-      s[2] = (wc & 0x3f) | 0x80;
+      /* Error Invalid UTF-8 */
+      l = 0;
     }
   s[l] = '\0';
   return l;
@@ -150,7 +185,7 @@ u32cconv (c, s)
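
For comparison with the patch, here is a standalone sketch of an encoder
that stops at four bytes, which covers everything up to 0x10FFFF. It is not
the code that went into bash. The name utf8_encode, the uint32_t parameter,
and the choice to reject surrogate values and anything above 0x10FFFF are
assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into s, which
   must have room for 5 bytes (4 data bytes plus the terminator).
   Returns the number of bytes written, not counting the terminator,
   or 0 if cp is a surrogate or lies above 0x10FFFF. */
size_t
utf8_encode (uint32_t cp, char *s)
{
  size_t l;

  if (cp <= 0x7F)
    {
      s[0] = (char) cp;
      l = 1;
    }
  else if (cp <= 0x7FF)
    {
      s[0] = (char) ((cp >> 6) | 0xc0);             /* 110x xxxx */
      s[1] = (char) ((cp & 0x3f) | 0x80);           /* 10xx xxxx */
      l = 2;
    }
  else if (cp >= 0xD800 && cp <= 0xDFFF)
    l = 0;                      /* surrogate halves are not characters */
  else if (cp <= 0xFFFF)
    {
      s[0] = (char) ((cp >> 12) | 0xe0);            /* 1110 xxxx */
      s[1] = (char) (((cp >> 6) & 0x3f) | 0x80);    /* 10xx xxxx */
      s[2] = (char) ((cp & 0x3f) | 0x80);           /* 10xx xxxx */
      l = 3;
    }
  else if (cp <= 0x10FFFF)
    {
      s[0] = (char) ((cp >> 18) | 0xf0);            /* 1111 0xxx */
      s[1] = (char) (((cp >> 12) & 0x3f) | 0x80);   /* 10xx xxxx */
      s[2] = (char) (((cp >> 6) & 0x3f) | 0x80);    /* 10xx xxxx */
      s[3] = (char) ((cp & 0x3f) | 0x80);           /* 10xx xxxx */
      l = 4;
    }
  else
    l = 0;                      /* outside the Unicode range */

  s[l] = '\0';
  return l;
}
```

Taking a fixed-width unsigned type as input sidesteps the wchar_t size and
signedness questions raised in the replies: 0x20AC encodes to the three
bytes e2 82 ac and 0x1F600 to the four bytes f0 9f 98 80 on any platform.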