Re: Call for testers: the features/str-metrics branch

Vincent van Ravesteijn Wed, 07 May 2014 02:08:28 -0700

(re-sent, including figures this time)

Jean-Marc Lasgouttes wrote on 6-5-2014 22:33:

On 06/05/14 22:21, Vincent van Ravesteijn wrote:

OK, I have an idea for having correct selections without losing the
Color_selectiontext enum: we could draw the complete string as
selected and non-selected, but use clipping to make sure that only the
right part of the selection is visible. It will be a bit tricky, but
it is doable.


In LyX 2.0.7 coloring parts of Arabic strings works OK. So, I'm not sure
why there is a problem here now. OK, ligatures that should have
different parts colored differently are a bit difficult. My feeling is
that it is OK to split the ligatures in this exceptional case. The
contextual forms in Arabic, though, are not ligatures and can be painted
in different colors without problems.

Actually, in master, the composition of characters is also done by looking forward and therefore by using characters beyond the ones we are interested in. However, this is all hardcoded stuff, and I would like instead to rely on whatever information Qt can give me.

Have you had a look at QFontMetrics::width(QString const & str, int n = -1)? This function interprets the whole string str, and computes the width of the string up to the nth character. This gives you the correct positions for the Arabic contextual forms.

The bigger problem will be cursor positioning, but I need more information from people who understand Arabic writing to progress on
that.
What info do you need?
What is the difference between a ligature and a contextual form?

According to http://en.wikipedia.org/wiki/Arabic_alphabet#Ligatures, there is only one compulsory ligature (having two forms, see later), and that's the one I showed in a previous mail.

Contextual forms means that, in general, each character has four different presentation forms. These are the Unicode points in the arabic_table in src/Encoding.cpp. The Unicode points are located in the "Arabic Presentation Forms-B" Unicode table.


In the four columns you can see the:
- isolated form;
- end form, when the character is only connected to the character in front;
- initial form, when the character is only connected to the character behind;
- mid form, when the character is connected to both the character in front and the character behind.


An example of ha (0x0647):

Ha has four different representation forms.

Here is an example of meem (0x0645):

Although the character looks pretty much the same in the first/second and third/fourth form, they are different forms and therefore have different Unicode points.


The case is different when considering, for example, waw (0x0648):

This character can only be connected to the character before, but never to the character behind. This means that the first and third form have the same Unicode point, as do the second and fourth form. The reader can confirm in the arabic_table that this is indeed the case.
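
To make the four-column layout concrete, here is a minimal sketch of a lookup that picks a presentation form from the two "is it connected" answers. It is not the code from src/Encoding.cpp: the names FormRow, kTable and presentationForm are made up for illustration, and the table is only a three-row excerpt using the column order described above (isolated, end, initial, mid).

```cpp
// Hypothetical three-row excerpt of a contextual-forms table.
// Columns follow the order used in this mail:
// isolated, end, initial, mid.
struct FormRow {
    char32_t base;
    char32_t forms[4];
};

constexpr FormRow kTable[] = {
    {0x0645, {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}}, // meem
    {0x0647, {0xFEE9, 0xFEEA, 0xFEEB, 0xFEEC}}, // ha
    {0x0648, {0xFEED, 0xFEEE, 0xFEED, 0xFEEE}}, // waw: never joins the next char
};

// joinsPrev: connected to the character in front (the preceding one).
// joinsNext: connected to the character behind (the following one).
char32_t presentationForm(char32_t c, bool joinsPrev, bool joinsNext)
{
    // 0 = isolated, 1 = end, 2 = initial, 3 = mid
    const int col = joinsPrev ? (joinsNext ? 3 : 1) : (joinsNext ? 2 : 0);
    for (const FormRow & row : kTable)
        if (row.base == c)
            return row.forms[col];
    return c; // not in the excerpt: leave the character unchanged
}
```

Because the waw row repeats its first two entries, asking for its "initial" or "mid" form simply falls back to the isolated or end form, which is the behaviour described above.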

Are there in Arabic compose characters that do not really have their own width (like accents in Latin scripts), but decorate another character?

The most important compose chars are the "accents" that indicate the vowel sounds, and a few more. The range as defined by Encodings::arabicComposeChar follows exactly what is defined by ISO-8859-6.

I think that the chars are recognized by Qt through QChar::category() == QChar::Mark_NonSpacing.
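
As a rough, Qt-free illustration of what "no width of their own" means for cursor positions, here is a small sketch. The range U+064B..U+0652 (fathatan through sukun) is my assumption for the marks that ISO-8859-6 encodes; the function names are hypothetical and this is not the LyX implementation.

```cpp
#include <cstddef>
#include <string>

// Assumed range: fathatan (U+064B) through sukun (U+0652), i.e. the
// marks that ISO-8859-6 encodes at 0xEB..0xF2.
bool isArabicVowelMark(char32_t c)
{
    return c >= 0x064B && c <= 0x0652;
}

// Count the characters that occupy their own horizontal cell;
// vowel marks decorate the preceding character and are skipped.
std::size_t countSpacingChars(const std::u32string & s)
{
    std::size_t n = 0;
    for (char32_t c : s)
        if (!isArabicVowelMark(c))
            ++n;
    return n;
}
```

For a string such as meem + fatha + ha, only two characters take up horizontal space, so a cursor has fewer visual stops than the string has code points.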

I want to have some feeling of how this works. If you have a web page for newbies describing these features, this would be perfect. Also, what program is supposed to have a sound implementation of these languages in terms of behavior? Word? LibreOffice?
I am not sure when I will have time to continue, but I want to understand all these things.
And first I will probably try to implement your idea of using Qt to place the cursor.

Ah that answers my first question.

JMarc

Vincent
