(re-sent, including figures this time)

Jean-Marc Lasgouttes wrote on 6-5-2014 22:33:
On 06/05/14 22:21, Vincent van Ravesteijn wrote:
OK, I have an idea for having correct selections without losing the
Color_selectiontext enum: we could draw the complete string as
selected and non-selected, but use clipping to make sure that only the
right part of the selection is visible. It will be a bit tricky, but
it is doable.

In LyX 2.0.7 coloring parts of Arabic strings works ok. So, I'm not sure
why there is a problem here now. Ok, ligatures whose parts should be
colored differently are a bit difficult. My feeling is
that it is ok to split the ligatures in this exceptional case. The
contextual forms in Arabic, though, are not ligatures and can be painted
in different colors without problems.

Actually, in master, the composition of characters is also done by looking forward, and therefore by using characters beyond the ones we are interested in. However, this is all hardcoded stuff, and I would like instead to rely on whatever information Qt can give me.

Have you had a look at QFontMetrics::width(QString const & str, int n = -1)? This function interprets the whole string str, and computes the width of the string up to the nth character. This gives you the correct positions for the Arabic contextual forms.
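
A minimal sketch of how such prefix widths could bound a selection, written in plain C++ so that it compiles without Qt. PrefixWidth stands in for a call like fm.width(str, n); selectionSpan is a hypothetical name used only for illustration, and mapping the span onto a right-to-left line is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>

// Stand-in for QFontMetrics::width(str, n): the width of the first n
// characters, measured in the context of the whole string.
using PrefixWidth = std::function<int(std::size_t n)>;

// Hypothetical helper: horizontal extent covered by the characters in
// [from, to), measured from the start of the string.
std::pair<int, int> selectionSpan(PrefixWidth const & width,
                                  std::size_t from, std::size_t to)
{
	int const a = width(from);
	int const b = width(to);
	return std::make_pair(std::min(a, b), std::max(a, b));
}
```

With a toy metric where every character is 7 pixels wide, selecting characters 2 to 5 gives the span 14 to 35. Such a span could serve as the clip rectangle when the complete string is painted in the selection color.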

The bigger problem will be cursor positioning, but I need more information from people who understand Arabic writing to progress on
that.

What info do you need ?

What is the difference between a ligature and a contextual form?

According to: http://en.wikipedia.org/wiki/Arabic_alphabet#Ligatures, there is only one compulsory ligature (having two forms, see later), and that's the one I showed in a previous mail.

Contextual forms mean that, in general, each character has four different presentation forms. These are the unicode points in the arabic_table in src/Encoding.cpp. The unicode points are located in the "Arabic Presentation Forms-B" unicode table.

In the four columns you can see the:
- isolated form,
- end form; when the character is only connected to the character in front,
- initial form; when the character is only connected to the character behind,
- mid form; when the character is connected to both the character in front and the character behind.

An example of ha (0x0647):

Ha has four different representation forms.

Here is an example of meem (0x0645):


Although the character looks pretty much the same in the first/second and third/fourth form, they are different forms and have therefore different unicode points.

The case is different when considering for example waw (0x0648):

This character can only be connected to the character before, but never to the character behind. This means that the first and third form have the same unicode point, as well as the second and fourth form. The reader can confirm in the arabic_table that this is indeed the case.
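
As a rough illustration of the four-column idea (not the actual arabic_table code), the sketch below picks a presentation form from the joining context. The ArabicForms struct and pickForm are hypothetical names; the code points are the ha, meem and waw entries of the "Arabic Presentation Forms-B" block, in the column order described above.

```cpp
#include <cstdint>

// One row of a four-column table: isolated, end, initial, mid.
struct ArabicForms {
	std::uint32_t isolated;
	std::uint32_t final_;   // joined only to the character in front
	std::uint32_t initial;  // joined only to the character behind
	std::uint32_t medial;   // joined on both sides
};

constexpr ArabicForms ha   = {0xFEE9, 0xFEEA, 0xFEEB, 0xFEEC}; // 0x0647
constexpr ArabicForms meem = {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}; // 0x0645
// waw never joins to the character behind, so columns 3 and 4
// repeat columns 1 and 2.
constexpr ArabicForms waw  = {0xFEED, 0xFEEE, 0xFEED, 0xFEEE}; // 0x0648

// Hypothetical helper: choose the form from the joining context.
constexpr std::uint32_t pickForm(ArabicForms const & f,
                                 bool joinsPrev, bool joinsNext)
{
	return joinsPrev ? (joinsNext ? f.medial : f.final_)
	                 : (joinsNext ? f.initial : f.isolated);
}
```

For waw, asking for the "initial" or "mid" form simply yields the isolated or end form again, which is why a lookup like this needs no special case for characters that only join on one side.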


Are there compose characters in Arabic that do not really have their own width (like accents in Latin scripts), but decorate another character?

The most important compose chars are the "accents" that indicate the vowel sounds and a few more. The range as defined by Encodings::arabicComposeChar follows exactly what is defined by ISO-8859-6.

I think that Qt recognizes these chars by QChar::category() == QChar::Mark_NonSpacing.
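
A Qt-free sketch of the same test, for illustration only. It assumes that the ISO-8859-6 marks 0xEB to 0xF2 correspond to U+064B (fathatan) through U+0652 (sukun); isArabicVowelMark and spacingCount are hypothetical names, not LyX or Qt functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: the compose range is U+064B .. U+0652, the code points
// matching the ISO-8859-6 marks 0xEB .. 0xF2.
constexpr bool isArabicVowelMark(std::uint32_t c)
{
	return c >= 0x064B && c <= 0x0652;
}

// Number of characters in s that occupy their own width, i.e. that
// are not marks decorating the preceding character.
std::size_t spacingCount(std::u32string const & s)
{
	std::size_t n = 0;
	for (char32_t c : s)
		if (!isArabicVowelMark(c))
			++n;
	return n;
}
```

For meem followed by fatha (U"\u0645\u064E") the count is 1, since the fatha only decorates the meem.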

I want to have some feeling of how this works. If you have a web page for newbies describing these features, this would be perfect. Also, what program is supposed to have a sound implementation of these languages in terms of behavior? Word? LibreOffice?

I am not sure when I will have time to continue, but I want to understand all these things.

And first I will probably try to implement your idea of using Qt to place cursor.

Ah that answers my first question.
JMarc
Vincent
