On Tuesday 2007-10-23, r.baars wrote:
> For some weeks now, I have been preparing to create new hyphenation 
> patterns for Dutch, including the feature of changing the word while 
> hyphenating.
> 
> Normally, one generates patterns from a hyphenation file, which is in 
> the rather simple format of hyphenated words (ex-am-ple).
> 
> This format is clearly not good enough to show all hyphenation patterns 
> for these changing words.  I'll partly use Dutch examples from now on, 
> though it applies to German and Greek (at least) as well.
> 
> For an input dictionary, I see 2 alternatives:
> 1) just list all possible hyphenations:
> ex=ample, exam=ple
> omaatje, oma=tje
> ruïne, ru=ine
> tv-=special,
> 2) Make special notes for the changes, signalling the hyphenations with 
> special chars like brackets containing the optional alteration:
> ex[]am[]ple
> oma[a=]tje   (remove 1 a when hyphenating)
> ru[ï=i]ne    (change ï into i when hyphenating)
> tv-[-=]special (remove the -; another one will be inserted by the 
> hyphenating process)
> 
> For the first, and most common hyphenation, a shorthand could be 
> introduced by any char, saving 1 char per word (Is that worth it?)
> The chars for the brackets and hyphenation could be 'declared' in the 
> file header, leading to a format like:
> 
> []   #hyphenation area
> =   #hyphenation char
> ru[ï=i]ne   #example comment
> 
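
To make the bracket notation concrete, here is a minimal Python sketch of
how such entries might be read. It assumes one particular reading of the
proposal: inside [...] the text before '=' belongs to the unbroken word,
and the text after '=' is what follows the hyphen when the word is broken
there. The name parse_entry and the hard-coded '[', ']' and '=' are
illustrative only; the header declaration of these chars is not handled.

```python
import re

# Assumed reading of the notation: "[L=R]" marks a break point where the
# unbroken word contains L and the broken word continues with R after the
# hyphen. "[]" is a plain break point with nothing changed.
BREAK = re.compile(r"\[([^\]=]*)(?:=([^\]]*))?\]")


def parse_entry(entry):
    """Return (word, breaks); breaks holds one (left, right) pair per break."""
    pieces = []   # literal text between break points
    changes = []  # (text_in_unbroken_word, text_after_hyphen) per break
    pos = 0
    for m in BREAK.finditer(entry):
        pieces.append(entry[pos:m.start()])
        changes.append((m.group(1), m.group(2) or ""))
        pos = m.end()
    pieces.append(entry[pos:])

    # The unbroken word keeps the left-hand text of every break point.
    word = pieces[0]
    for (unbroken, _), piece in zip(changes, pieces[1:]):
        word += unbroken + piece

    breaks = []
    for i, (_, after) in enumerate(changes):
        # Only break point i is taken; all others stay unbroken.
        left = pieces[0]
        for (unbroken, _), piece in zip(changes[:i], pieces[1:i + 1]):
            left += unbroken + piece
        right = after + pieces[i + 1]
        for (unbroken, _), piece in zip(changes[i + 1:], pieces[i + 2:]):
            right += unbroken + piece
        breaks.append((left, right))
    return word, breaks


if __name__ == "__main__":
    for entry in ("ex[]am[]ple", "oma[a=]tje", "ru[ï=i]ne"):
        print(entry, parse_entry(entry))
```

Under this reading, oma[a=]tje gives the word "omaatje" with the single
break oma-tje, and alternative 1 (listing every hyphenation) can be
generated from alternative 2, but not the other way around without
comparing strings.
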
> * Would more languages than Dutch have use for a format like this?
> * Would it be feasible to base a pattern generator on this format?
> 
> * What are the general thoughts about trying to set a standard for 
> hyphenation registration?
> 
> Please feel free to comment on this.
> 
> 
> Ruud Baars
> 

Hi Ruud

I've implemented the Afrikaans hyphenation. The rules for Afrikaans are
(to the best of my knowledge) very similar to Dutch, so we need all of the
same features. So obviously at some stage we also need to look at these
things. Up to now, we only do standard hyphenation without removing the
diaeresis (deelteken) or handling the obvious hyphenation points of
things like TV-kanaal.

One thing that I should perhaps mention is that I didn't follow a
word-based approach at all. I just built the rules by hand. So my interest
is not so much in a word list specifying per-word rules (for now at least),
but in the way to specify this in the hyph.dic files. Is this already
supported?

Regards
Friedel
