At Sun, 15 Mar 2009 15:57:40 +0100
Gregor Best wrote:

> At Sun, 15 Mar 2009 10:05:20 +0100
> Julien Danjou wrote:
> 
> > Hi Gregor,
> > 
> > At 1237059176 time_t, Gregor Best wrote:
> > > The attached patch fixes something that has been bugging me for a long
> > > time: if you open a prompt, it lets you enter UTF-8 strings like "αλπψα"
> > > just fine, but if you then want to use BackSpace to remove these UTF-8
> > > glyphs, you have to press it twice for each character.
> > 
> > Indeed. :-(
> > 
> > > To fix this, I used a "feature" of string:wlen() (I don't know whether
> > > it is intended or not): on a malformed UTF-8 string, it returns an
> > > illogically high number. For example, the string "αλπψα" mangled by
> > > sub(1, len() - 1) yielded something around 4 billion, which sounds
> > > "unlikely", to say the least.
> > 
> > Well, it's actually a feature, but it was badly implemented.
> > We use mbstowcs() to count the length of a UTF-8 string, but it returns
> > a size_t that we pushed directly onto the Lua stack.
> > Unfortunately, it returns (size_t) -1 on error, so we pushed a very big
> > number instead of -1.
> > I've pushed a fix so you will get -1
> > (5afd2586970e23165c900e03e6ee600e6d5a8ccd).
> > 
> > > I patched the BackSpace
> > > part of prompt.run() so that it removes the last two bytes if the
> > > difference between the wlen()s of the old command and the new command
> > > is greater than 1.
> > 
> > I did not dig into it, but I tested with €, and it still fails here.
> > Could you check?
> > 
> > Cheers,
> 
> Okay, here's an updated version. It turned out that Greek characters like
> α, β, γ, etc. consist of two bytes each, for which the original patch
> worked fine, whereas characters like € consist of more than two bytes
> (three in this case). So I modified the routine to remove bytes from the
> command until command:wlen() is no longer -1, which should work in any case.
> 
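
The fix Julien describes comes down to checking for the (size_t) -1 error sentinel before the count reaches Lua. Below is a minimal C sketch of that check. The names wlen_from_result() and checked_wlen() are hypothetical, not awesome's actual code, and the sketch relies on the POSIX behaviour where mbstowcs() with a NULL destination only counts characters. In awesome the signed result would then be pushed onto the Lua stack (e.g. with lua_pushnumber()); that part is left out so the sketch stays self-contained.

```c
#include <stdlib.h>

/* Map mbstowcs()'s size_t result to a signed length: the error
   sentinel (size_t) -1 becomes -1 instead of a huge positive number. */
static long wlen_from_result(size_t n)
{
    return n == (size_t)-1 ? -1L : (long)n;
}

/* Number of characters in s according to the current LC_CTYPE locale,
   or -1 if s is not a valid multibyte string. With a NULL destination,
   POSIX mbstowcs() only counts and ignores the length argument. */
static long checked_wlen(const char *s)
{
    return wlen_from_result(mbstowcs(NULL, s, 0));
}
```

The count depends on the locale: only after the program has selected a UTF-8 locale (e.g. via setlocale(LC_CTYPE, "")) should checked_wlen("\xe2\x82"), a truncated €, come back as -1.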

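The updated BackSpace routine (drop bytes until wlen() stops returning -1) can be sketched in C as follows. This is an illustration, not the actual Lua patch: utf8_wlen() and backspace() are hypothetical names, and utf8_wlen() is a simplified, locale-independent stand-in for string:wlen() that only checks lead and continuation bytes (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for string:wlen(): counts UTF-8
   characters in the first len bytes of s, or returns -1 if they are
   malformed (e.g. a multibyte character cut off at the end). */
static long utf8_wlen(const char *s, size_t len)
{
    long count = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char c = (unsigned char)s[i];
        size_t extra; /* number of continuation bytes expected */

        if (c < 0x80)
            extra = 0;
        else if ((c & 0xE0) == 0xC0)
            extra = 1;
        else if ((c & 0xF0) == 0xE0)
            extra = 2;
        else if ((c & 0xF8) == 0xF0)
            extra = 3;
        else
            return -1; /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return -1; /* sequence truncated */
        for (size_t k = 1; k <= extra; k++)
            if (((unsigned char)s[i + k] & 0xC0) != 0x80)
                return -1; /* expected a continuation byte */

        i += extra + 1;
        count++;
    }
    return count;
}

/* BackSpace following the approach above: drop one byte, then keep
   dropping bytes while the remaining string is malformed. Edits buf
   in place. */
static void backspace(char *buf)
{
    size_t len = strlen(buf);

    if (len == 0)
        return;
    do {
        len--;
    } while (len > 0 && utf8_wlen(buf, len) == -1);
    buf[len] = '\0';
}
```

Each iteration re-scans the whole string, which is harmless for a prompt line; a cheaper variant would step back over continuation bytes (10xxxxxx) until it reaches a lead byte.
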
Hmm, I just noticed that awful.prompt.prompt_text_with_cursor() needs some
reworking for multibyte characters, too, and I guess the same applies to the
part of prompt.run() that moves the cursor on Left/Right keypresses and the
part that deletes characters with the DEL key.

-- 
GCS/IT/M d- s+:- a--- C++ UL+++ US UB++ P+++ L+++ E--- W+ N+ o--
K- w--- O M-- V PS+ PE- Y+ PGP+++ t+ 5 X+ R tv+ b++ DI+++ D+++ G+
e- h! r y+

    Gregor Best
