Hi! In Options - Language Settings - Writing aids there is an option "Minimal number of characters for hyphenation", which defaults to 5. This means that a word with 4 characters or fewer will not be hyphenated automatically. So far, so good. But what exactly should count as a "character" here?
In some languages, Finnish at least, compound words are hyphenated by their parts. The compound word "valokuva" is composed of "valo" and "kuva", so the preferred hyphenation point is between the parts: "valo-kuva". The individual parts can also be hyphenated. Since "valokuva" has 8 characters and can therefore be hyphenated, all of its possible hyphenation points are "va-lo-ku-va". I call this Option 1.

When "Minimal number of characters for hyphenation" is 5, the words "valo" and "kuva" are not hyphenated when they occur alone. One could then argue that they should not be hyphenated inside compound words either, so that the hyphenator would split "valokuva" only as "valo-kuva". This is Option 2.

The Finnish spellchecker extension Voikko currently uses Option 1, but we have also implemented Option 2, and it can be activated by adding one line of code before building the extension. The OpenOffice.org builtin hyphenator uses Option 1. As I understand it, it currently cannot do anything else, because Option 2 requires morphological analysis of the words before hyphenation, and none is performed. However, hunspell does support morphological analysis, and [1] suggests that it might be used in OOo in the future to improve hyphenation quality.

Therefore I would like to know: if Option 2 becomes technically possible for OOo's builtin hyphenator, would it make sense to use it instead of the current behaviour? Or should there perhaps be a separate option that lets users choose, defaulting to the current model (Option 1)?

I have no strong opinion about which behaviour is actually better. I do not have MS Word, but I have been told that for Finnish compound words it does something close to, but not quite the same as, Option 2. I do think, however, that all backends (OOo builtin, Voikko, proprietary extensions) should interpret the options in the same way, which is why I am bringing this up for discussion.
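To make the difference between the two options concrete, here is a small sketch in Python. It is not Voikko's or OOo's code: the function name `hyphenate_compound` is hypothetical, and the compound parts and their syllables are supplied by hand, whereas a real hyphenator would get the syllables from hyphenation patterns and, for Option 2, the part boundaries from morphological analysis. The only rule modelled is the minimum word length.

```python
# Hypothetical illustration only: compound parts and syllables are given by
# hand; a real hyphenator would derive them from patterns and morphology.

MIN_WORD_LENGTH = 5  # "Minimal number of characters for hyphenation"


def hyphenate_compound(parts, min_length=MIN_WORD_LENGTH, per_part=False):
    """Return the hyphenated form of a (possibly compound) word.

    parts: list of compound parts, each given as a list of syllables,
           e.g. [["va", "lo"], ["ku", "va"]] for "valokuva".
    per_part=False -> Option 1: the length check applies to the whole word.
    per_part=True  -> Option 2: the length check also applies to each part.
    """
    word = "".join("".join(syllables) for syllables in parts)
    if len(word) < min_length:
        return word  # whole word too short: no hyphenation at all

    pieces = []
    for syllables in parts:
        part = "".join(syllables)
        if per_part and len(part) < min_length:
            pieces.append(part)  # Option 2: keep a short part unbroken
        else:
            pieces.append("-".join(syllables))  # allow breaks inside the part

    # The boundary between compound parts is always a hyphenation point.
    return "-".join(pieces)
```

With the example word, `hyphenate_compound([["va", "lo"], ["ku", "va"]])` gives "va-lo-ku-va" (Option 1), while passing `per_part=True` gives "valo-kuva" (Option 2). A lone "valo" stays "valo" under either option because it has only 4 characters.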
Harri

[1] http://hunspell.sourceforge.net/tb87nemeth.pdf
