(re-sent, including figures this time)
Jean-Marc Lasgouttes wrote on 6-5-2014 22:33:
On 06/05/14 22:21, Vincent van Ravesteijn wrote:
OK, I have an idea for having correct selections without losing the
Color_selectiontext enum: we could draw the complete string as
selected and non-selected, but use clipping to make sure that only the
right part of the selection is visible. It will be a bit tricky, but
it is doable.
In LyX 2.0.7, coloring parts of Arabic strings works OK, so I'm not sure
why there is a problem here now. Admittedly, ligatures whose parts
should be colored differently are a bit difficult. My feeling is
that it is OK to split the ligatures in this exceptional case. The
contextual forms in Arabic, though, are not ligatures and can be painted
in different colors without problems.
Actually, in master, the composition of characters is also done by
looking forward and therefore by using characters beyond the ones we
are interested in. However, this is all hardcoded stuff, and I would
like instead to rely on whatever information Qt can give me.
Have you had a look at QFontMetrics::width(QString const & str, int n =
-1)? This function interprets the whole string str, and computes the
width of the string up to the nth character. This gives you the correct
positions for the Arabic contextual forms.
The bigger problem will be cursor positioning, but I need more
information from people who understand Arabic writing to progress on
that.
What info do you need?
What is the difference between a ligature and a contextual form?
According to: http://en.wikipedia.org/wiki/Arabic_alphabet#Ligatures,
there is only one compulsory ligature (having two forms, see later), and
that's the one I showed in a previous mail.
Contextual forms mean that, in general, each character has four
different presentation forms. These are the Unicode code points in the
arabic_table in src/Encoding.cpp. The code points are located in the
"Arabic Presentation Forms-B" Unicode block.
In the four columns you can see the:
- isolated form;
- end form, when the character is connected only to the character in front;
- initial form, when the character is connected only to the character
behind;
- mid form, when the character is connected to both the character in
front and the character behind.
An example of ha (0x0647):
Ha has four different presentation forms.
Here is an example of meem (0x0645):
Although the character looks pretty much the same in the first/second
and third/fourth form, they are different forms and have therefore
different unicode points.
The case is different when considering, for example, waw (0x0648):
This character can only be connected to the character before, but never
to the character behind. This means that the first and third forms have
the same Unicode code point, as do the second and fourth forms. You can
confirm in the arabic_table that this is indeed the case.
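To make the four-column idea concrete, here is a minimal sketch of how such a lookup could work. The struct, the table name kForms and the function presentationForm() are hypothetical and only mimic the column order described above; they are not the actual arabic_table code from src/Encoding.cpp. The code points are the Presentation Forms-B values for meem, ha and waw.

```cpp
// Column order as described above: isolated, end, initial, mid.
enum Form { Isolated = 0, End = 1, Initial = 2, Mid = 3 };

struct FormsEntry {
    char32_t base;      // code point as stored in the text
    char32_t forms[4];  // Arabic Presentation Forms-B code points
};

// Hypothetical three-row excerpt; the real table covers the whole alphabet.
static const FormsEntry kForms[] = {
    {0x0645, {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4}},  // meem: four distinct forms
    {0x0647, {0xFEE9, 0xFEEA, 0xFEEB, 0xFEEC}},  // ha: four distinct forms
    {0x0648, {0xFEED, 0xFEEE, 0xFEED, 0xFEEE}},  // waw: columns 1/3 and 2/4 equal
};

// Pick the presentation form from the two connection flags.
// joinsPrev: connected to the character in front (the preceding one).
// joinsNext: connected to the character behind (the following one).
char32_t presentationForm(char32_t c, bool joinsPrev, bool joinsNext)
{
    const Form f = joinsPrev ? (joinsNext ? Mid : End)
                             : (joinsNext ? Initial : Isolated);
    for (const FormsEntry &e : kForms) {
        if (e.base == c)
            return e.forms[f];
    }
    return c;  // not in the table: leave the character unchanged
}
```

Because waw repeats its isolated and end code points in the third and fourth columns, asking for its initial or mid form simply yields a shape that does not connect to the character behind.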
Are there compose characters in Arabic that do not really have their
own width (like accents in Latin scripts), but decorate another
character?
The most important compose chars are the "accents" that indicate the vowel
sounds, plus a few more. The range as defined by
Encodings::arabicComposeChar follows exactly what is defined by ISO-8859-6.
I think that these chars are recognized in Qt by QChar::category() ==
QChar::Mark_NonSpacing.
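As a rough illustration of what "no width of their own" means for cursor movement, here is a small sketch. It assumes the compose chars are the marks U+064B to U+0652 (the code points that ISO-8859-6 bytes 0xEB to 0xF2 map to); the actual range used by LyX may differ, and both function names are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Assumption: compose chars are fathatan (U+064B) through sukun (U+0652),
// i.e. the marks that ISO-8859-6 encodes as 0xEB..0xF2.
bool isComposeCandidate(char32_t c)
{
    return c >= 0x064B && c <= 0x0652;
}

// Count the characters that take up their own slot on screen.
// Compose chars decorate the preceding character and are skipped.
std::size_t countBaseChars(const std::u32string &s)
{
    std::size_t n = 0;
    for (char32_t c : s) {
        if (!isComposeCandidate(c))
            ++n;
    }
    return n;
}
```

With this, meem followed by fatha and then ha counts as two slots, so a cursor moving through that string would only need two stops after its start position.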
I want to have some feeling of how this works. If you have a web page
for newbies describing these features, this would be perfect. Also,
what program is supposed to have a sound implementation of these
languages in terms of behavior? Word? LibreOffice?
I am not sure when I will have time to continue, but I want to
understand all these things.
And first I will probably try to implement your idea of using Qt to
place cursor.
Ah, that answers my first question.
JMarc
Vincent