Hi!

In Options - Language Settings - Writing Aids there is an option "Minimal 
number of characters for hyphenation", which defaults to 5. The effect is that 
a word with 4 characters or fewer will not be automatically hyphenated. So 
far, so good. But what exactly should count as a "character" here?

In some languages, at least in Finnish, compound words are hyphenated by 
parts. If we have the compound word "valokuva", which is composed of "valo" 
and "kuva", the preferred position to split the word is between the 
parts: "valo-kuva". It is also possible to hyphenate the individual parts. 
Since "valokuva" has 8 characters and can therefore be hyphenated, the 
possible hyphenation points for this word are "va-lo-ku-va". I call 
this Option 1.

When "Minimal number of characters for hyphenation" is 5, the words "valo" 
and "kuva" will not be hyphenated when they occur alone. We could then argue 
that they should not be hyphenated inside compound words either, so 
that "valokuva" would only be split as "valo-kuva" by the hyphenator. This is 
Option 2.
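
To make the difference concrete, here is a minimal sketch in Python. It is 
not code from Voikko or OOo: the function name, the per_part flag and the 
hand-written syllable lists are made up for illustration, and a real 
hyphenator would get the compound parts from morphological analysis.

```python
def hyphenate(parts, min_chars=5, per_part=False):
    """Return the word with '-' at every allowed hyphenation point.

    parts     -- compound parts, each given as a list of syllables
                 (hypothetical input; normally produced by an analyzer)
    min_chars -- "Minimal number of characters for hyphenation"
    per_part  -- False: Option 1, the limit applies to the whole word
                 True:  Option 2, the limit also applies to each part
    """
    word_len = sum(len(syl) for part in parts for syl in part)
    if word_len < min_chars:
        # The whole word is too short: no hyphenation at all.
        return "".join("".join(part) for part in parts)

    pieces = []
    for part in parts:
        part_len = sum(len(syl) for syl in part)
        if per_part and part_len < min_chars:
            # Option 2: a short part is kept in one piece.
            pieces.append("".join(part))
        else:
            pieces.append("-".join(part))
    # The boundary between compound parts is always a hyphenation point.
    return "-".join(pieces)


valokuva = [["va", "lo"], ["ku", "va"]]
print(hyphenate(valokuva))                  # Option 1: va-lo-ku-va
print(hyphenate(valokuva, per_part=True))   # Option 2: valo-kuva
print(hyphenate([["va", "lo"]]))            # too short: valo
```

The only difference between the two options in this sketch is whether the 
length test is repeated for each compound part.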

The Finnish spellchecker extension Voikko currently uses Option 1. We have 
also implemented Option 2, and it can be activated by adding one line of code 
before building the extension. The OpenOffice.org built-in hyphenator uses 
Option 1. As I understand it, it currently cannot do anything else, because 
no morphological analysis is performed on the words before hyphenation, and 
that is needed for Option 2. But hunspell does support morphological analysis, 
and [1] suggests that it might in the future be used in OOo to improve 
hyphenation quality.

Therefore I would like to know: if Option 2 becomes technically possible 
for OOo's built-in hyphenator, would it make sense to use it instead of the 
current behaviour? Or should there perhaps be a separate option that lets 
users choose, defaulting to the current model (Option 1)? I have no 
strong opinion about which behaviour is actually better. I do not have MS 
Word, but I have been told that it does something close to, but not quite 
the same as, Option 2 for Finnish compound words. I do think it makes 
sense for all backends (OOo built-in, Voikko, proprietary extensions) to 
interpret the options in the same way, which is why I am bringing this up for 
discussion.

Harri

[1] http://hunspell.sourceforge.net/tb87nemeth.pdf
