On Wed, Oct 31, 2012 at 12:13:56AM +0000, Nicholas Marriott wrote:
> That is:
> 
> - libedit has a wchar_t* buffer (el->el_line.buffer) and el_line calls
>   ct_encode_string to convert it to a char*.
> 
> - ct_encode_string calls wctomb which it expects to make UTF-8 but in
>   fact because setlocale has not been called it outputs ASCII.
> 
> - el_line then uses ct_enc_width which assumes UTF-8 and returns 2. So
>   the offset is adjusted by 2 even though only 1 byte was filled in.
> 
> - ftp obviously isn't happy about having a position after a \0, so it
>   goes boom.
> 
> The setlocale() change below will only fix the problem if LC_CTYPE or
> LC_ALL is set to UTF-8. ftp still cores if pasting UTF-8 in C locale.
> 
> I think the right fix is for libedit to use the return value of wctomb
> to adjust the offset rather than assuming UTF-8 and working out the
> width itself.
> 
> Perhaps something like this (very lightly tested):
> 
> Index: chartype.c
> ===================================================================
> RCS file: /cvs/src/lib/libedit/chartype.c,v
> retrieving revision 1.4
> diff -u -p -r1.4 chartype.c
> --- chartype.c        17 Nov 2011 20:14:24 -0000      1.4
> +++ chartype.c        31 Oct 2012 00:13:12 -0000
> @@ -44,6 +44,8 @@
>  #define CT_BUFSIZ 1024
>  
>  #ifdef WIDECHAR
> +protected ssize_t ct_encode_char1(char *, size_t, Char);
> +
>  protected void
>  ct_conv_buff_resize(ct_buffer_t *conv, size_t mincsize, size_t minwsize)
>  {
> @@ -178,27 +180,25 @@ ct_decode_argv(int argc, const char *arg
>  protected size_t
>  ct_enc_width(Char c)
>  {
> -     /* UTF-8 encoding specific values */
> -     if (c < 0x80)
> -             return 1;
> -     else if (c < 0x0800)
> -             return 2;
> -     else if (c < 0x10000)
> -             return 3;
> -     else if (c < 0x110000)
> -             return 4;
> -     else
> -             return 0; /* not a valid codepoint */
> +     char s[MB_CUR_MAX];
> +
> +     return ct_encode_char1(s, sizeof s, c);
>  }
>  
>  protected ssize_t
>  ct_encode_char(char *dst, size_t len, Char c)
>  {
> -     ssize_t l = 0;
>       if (len < ct_enc_width(c))
>               return -1;
> -     l = ct_wctomb(dst, c);
> +     return ct_encode_char1(dst, len, c);
> +}
>  
> +protected ssize_t
> +ct_encode_char1(char *dst, size_t len, Char c)
> +{
> +     ssize_t l = 0;
> +
> +     l = ct_wctomb(dst, c);
>       if (l < 0) {
>               ct_wctomb_reset;
>               l = 0;
> 
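As a rough illustration of the mismatch described above (not libedit
code): the sketch below puts the UTF-8-only width calculation next to a
loop that advances the offset by whatever wctomb() actually returns.
The names utf8_width and encode_by_return are made up for this example,
and treating a failed conversion as zero bytes is an assumption that
mirrors the "l = 0" branch in the patch.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Width the old ct_enc_width() assumed; only correct for UTF-8 output. */
static size_t
utf8_width(wchar_t c)
{
	unsigned long u = (unsigned long)c;

	if (u < 0x80)
		return 1;
	if (u < 0x800)
		return 2;
	if (u < 0x10000)
		return 3;
	if (u < 0x110000)
		return 4;
	return 0;	/* not a valid codepoint */
}

/*
 * Hypothetical helper: encode src into dst, advancing the offset by the
 * value wctomb() returns instead of by an assumed width.
 * Returns the number of bytes written, or -1 if dst is too small.
 */
static long
encode_by_return(char *dst, size_t len, const wchar_t *src)
{
	char tmp[MB_LEN_MAX];
	size_t off = 0;
	int n;

	for (; *src != L'\0'; src++) {
		n = wctomb(tmp, *src);
		if (n < 0) {
			/* Not representable in this locale: reset, add 0 bytes. */
			(void)wctomb(NULL, L'\0');
			continue;
		}
		if ((size_t)n > len - off)
			return -1;
		memcpy(dst + off, tmp, (size_t)n);
		off += (size_t)n;
	}
	return (long)off;
}
```

For a character such as U+00BA, utf8_width() says 2 regardless of the
locale, while the offset from encode_by_return() follows what the
current locale really produced, so the cursor cannot end up past the
bytes that were written.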

With this patch (and without the patch for ftp), the behavior of ftp is
a little weird.

> cd pub/Open
             ^

Here I type the character from https://en.wikipedia.org/wiki/%C2%BA and
press tab. I can see the character. I press backspace and the completion
doesn't work. I press backspace again and then tab, and ftp completes to
"cd pub/OpeBSD".


> 
> On Tue, Oct 30, 2012 at 11:56:18PM +0000, Nicholas Marriott wrote:
> > Hi
> > 
> > The buffer isn't zero-terminated, it's the result of calling wctomb to
> > convert the internal wchar_t* that libedit has into a char*.
> > 
> > libedit works out the offset in el_line with ct_enc_width which rather
> > foolishly makes the assumption that wctomb will convert to UTF-8, but
> > ftp doesn't call setlocale so it just leaves it as ASCII.
> > 
> > Try this:
> > 
> > Index: main.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/ftp/main.c,v
> > retrieving revision 1.85
> > diff -u -p -r1.85 main.c
> > --- main.c  26 Aug 2012 02:16:02 -0000      1.85
> > +++ main.c  30 Oct 2012 23:52:34 -0000
> > @@ -67,6 +67,7 @@
> >  
> >  #include <ctype.h>
> >  #include <err.h>
> > +#include <locale.h>
> >  #include <netdb.h>
> >  #include <pwd.h>
> >  #include <stdio.h>
> > @@ -90,6 +91,8 @@ main(volatile int argc, char *argv[])
> >     char *outfile = NULL;
> >     const char *errstr;
> >     int dumb_terminal = 0;
> > +
> > +   setlocale(LC_CTYPE, "");
> >  
> >     ftpport = "ftp";
> >     httpport = "http";
> > 
> > 
> > 
> > 
> > 
> > On Tue, Oct 30, 2012 at 10:31:16PM +0100, Otto Moerbeek wrote:
> > > On Tue, Oct 30, 2012 at 10:17:12PM +0100, Otto Moerbeek wrote:
> > > 
> > > > On Tue, Oct 30, 2012 at 08:59:27PM +0100, Juan Francisco Cantero Hurtado wrote:
> > > > 
> > > > > On Tue, Oct 30, 2012 at 09:31:58AM +0100, Otto Moerbeek wrote:
> > > > > > On Mon, Oct 29, 2012 at 06:43:13PM +0100, Juan Francisco Cantero Hurtado wrote:
> > > > > > 
> > > > > > > Chris Cappuccio sent me a mail saying he can't see the
> > > > > > > characters, only a question mark.
> > > > > > > 
> > > > > > > I'm linking each character to their wikipedia page, so you can
> > > > > > > copy-paste the character.
> > > > > > > 
> > > > > > > On Thu, Oct 25, 2012 at 05:07:34AM +0200, Juan Francisco Cantero Hurtado wrote:
> > > > > > > > This afternoon I was downloading a tarball from an OpenBSD
> > > > > > > > mirror. I pressed the "?" key and then the tab key. ftp
> > > > > > > > crashed with a segfault.
> > > > > > 
> > > > > > Please also include your environment settings. It is likely locale
> > > > > > plays a role here.
> > > > > > 
> > > > > > At least env | grep LC
> > > > > > 
> > > > > 
> > > > > I've reproduced the bug on amd64 without locales and also with
> > > > > LC_TIME="es_ES.ISO8859-1" LC_CTYPE="en_US.UTF-8".
> > > > > 
> > > > > The i386 system was a clean installation in a virtual machine.
> > > > 
> > > > I can now reproduce using a terminal that accepts more than just
> > > > low ascii.
> > > > 
> > > > What I see is that when complete() is called the cursor position in
> > > > the EditLine struct is not what it is supposed to be, it points a
> > > > couple of bytes beyond the terminating NUL while it is supposed to
> > > > point to the NUL. That causes confusion in the scanner, which gets
> > > > the argument list count wrong.
> > > 
> > > Ehh, the buffer is not NUL terminated, but the observation still
> > > holds: the cursor position is a couple of bytes further than it
> > > should be.
> > > 
> > > > 
> > > > The root of the problem seems to be inside the editline lib.
> > > > 
> > > > Cc:ing nicm@, maybe he has a clue
> > > > 
> > > >         -Otto
> > > >         
> > > > 
> > > > > 
> > > > > > 
> > > > > > > https://en.wikipedia.org/wiki/%C2%BA
> > > > > > > > 
> > > > > > > > Steps to reproduce:
> > > > > > > > # ftp ftp.fr.openbsd.org
> > > > > > > > user and password
> > > > > > > > ascii art
> > > > > > > > ftp> cd pub/Open?    <- Here press the tab key
> > > > > > > https://en.wikipedia.org/wiki/%C2%BA
> > > > > > > > segmentation fault (core dumped)  ftp ftp.fr.openbsd.org
> > > > > > > > 
> > > > > > > > It also crashes with the letters "?" and "?".
> > > > > > > https://en.wikipedia.org/wiki/%C3%81
> > > > > > > https://en.wikipedia.org/wiki/%C3%91
> > > > > > > > 
> > > > > > > > Tested in:
> > > > > > > > - A snapshot from yesterday. i386. root account. console/ksh
> > > > > > > >   without locales.
> > > > > > > > - A snapshot from a few days ago. amd64. user. urxvt/zsh with
> > > > > > > >   utf8 locales.
> > > > > > > > 
> > > > > > > > I also tested the bug in a remote session with OpenBSD 4.7
> > > > > > > > and ftp works without problems.
> > > > > > > > 
> > > > > > > > I've updated the code of usr.bin/ftp to 2012-10-01 and
> > > > > > > > 2012-01-01 and tried both versions. ftp also crashes.
> > > > > > > > 
> > > > > > > > Backtrace:
> > > > > > > > Thread 1 (process 3436):
> > > > > > > > #0  memcpy (dst0=0x9d4160, src0=Variable "src0" is not available.
> > > > > > > > ) at /usr/src/lib/libc/string/bcopy.c:115
> > > > > > > > #1  0x000000000040432b in complete (el=Variable "el" is not available.
> > > > > > > > ) at /usr/src/usr.bin/ftp/complete.c:313
> > > > > > > > #2  0x000000000041eb84 in el_wgets (el=0x20da64800, nread=0x7f7ffffe3ebc) at read.c:612
> > > > > > > > #3  0x000000000041ef8d in el_gets (el=0x20da64800, nread=Variable "nread" is not available.
> > > > > > > > ) at eln.c:78
> > > > > > > > #4  0x000000000040e55f in cmdscanner (top=Variable "top" is not available.
> > > > > > > > ) at /usr/src/usr.bin/ftp/main.c:465
> > > > > > > > #5  0x000000000040eb7c in main (argc=1, argv=0x7f7ffffe4398) at /usr/src/usr.bin/ftp/main.c:369
> > > > > > > > 
> > > > > > > > Let me know if you need more info or anything else :)
> > > > > > > > 
> > > > > > > > Cheers.
> > > > > > > > 
> > > > > > > 

-- 
Juan Francisco Cantero Hurtado http://juanfra.info
