On 12.01.2007 09:25:59 Vincent Hennebert wrote:
> Jeremias Maerki wrote:
> > Good to see that happen! Here's my take:
> >
> > On 11.01.2007 13:24:16 Manuel Mall wrote:
> >> Hi,
> >>
> >> when I implemented the UAX#14 line breaking I noticed that FOP doesn't
> >> currently support the Unicode soft hyphen (SHY).
> >>
> >> I am thinking of adding support for this character to the line breaking
> >> but am unsure of its correct behaviour in an XSL-FO environment. So I
> >> have a few questions related to the treatment of the SHY:
> >>
> >> 1) If hyphenation is not enabled, should a SHY still produce a valid
> >> break opportunity or should it be ignored?
> >
> > I think it should represent a valid break opportunity.
>
> Well, I don't agree. See the description of SHY in section 15.2 of the
> Unicode standard: SHY is used as a hint for automatic hyphenators and
> overrides their behavior. I would typically use it for nicely rendering
> veryLongProgramVariablesLikeWeCanFindInJava in, e.g., a portion of text
> describing them in some documentation. Here I obviously want to force
> hyphenation to occur between the words that make up the variable name
> (Long-Program-Variables instead of LongPro-gramVar-iables or whatever).
>
> So, as a hint for hyphenators, SHY should be ignored when hyphenation is
> disabled and, when it is enabled, take priority over automatic hyphenation.
Hmm, I'm used to different behaviour in word processors, and I don't read
the UCD spec the way you do. Section 5.3 of UAX#14 also doesn't give me
the impression that a SHY is only active when hyphenation is enabled. It
says:

"The action of a hyphenation algorithm is equivalent to the insertion of
a SHY. However, when a word contains an explicit SHY, it is customarily
treated as overriding the action of the hyphenator for that word."

I read this as: "SHY is the basic operator for adding break points, and a
hyphenator can be added to do that task automatically."

An example from the OpenOffice Help:

"Definite separator
To support automatic hyphenation by entering a separator inside a word
yourself, use the keys Ctrl+minus sign. The word is separated at this
position when it is at the end of the line, even if automatic hyphenation
for this paragraph is switched off."

<snip/>

>
> >> 2) If hyphenation is enabled, shall a word containing a SHY still
> >> undergo hyphenation?
> >
> > Yes, IMO. A SHY may sometimes be used to handle a special case, and if
> > that is done in a longer word, I still expect the hyphenation to do its
> > work on the rest of the word, but taking the SHY into account when
> > doing word-splitting. Nothing fancy, though.
> >
> > [Jörg]
> > That's an interesting question. The problem is languages that use
> > compound words and agglutination. Last time I looked, English words
> > containing a shy were not automatically hyphenated, because this
> > wouldn't make sense. German, Hungarian, Turkish etc. are somewhat
> > more delicate.
> > I think it's best to do automatic hyphenation, but remove the shy (as
> > well as other Unicode chars like joiners) before passing the word to
> > the hyphenator. The shy position should however dominate the other
> > hyphenation positions, perhaps by giving it a lower penalty.
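To make the proposals concrete, here is a minimal sketch in Java (FOP's language) of how SHY and automatic break points could be combined: SHY positions are recorded while the SHY characters are stripped, the SHY-free word is handed to a hyphenator, and automatic break points only fill positions not already claimed by a SHY, at a higher (less preferred) Knuth penalty. The `Hyphenator` interface, the `breakPoints` method and the penalty values 25 and 50 are hypothetical placeholders, not FOP's actual API or tuned values.

```java
import java.util.List;
import java.util.TreeMap;

class ShyBreaks {
    static final char SHY = '\u00AD';

    // Hypothetical Knuth penalties: a lower value means a preferred break.
    // Real values would have to be tuned.
    static final int SHY_PENALTY = 25;
    static final int AUTO_HYPHEN_PENALTY = 50;

    /** Hypothetical hyphenator: returns break offsets within a SHY-free word. */
    interface Hyphenator {
        List<Integer> hyphenate(String word);
    }

    /**
     * Returns break offsets (into the SHY-free word) mapped to their penalty.
     * SHY breaks are kept even when hyphenation is disabled (the UAX#14 5.3
     * reading); under the other reading they would be dropped in that case.
     */
    static TreeMap<Integer, Integer> breakPoints(String word,
            boolean hyphenationEnabled, Hyphenator hyphenator) {
        TreeMap<Integer, Integer> breaks = new TreeMap<>();
        StringBuilder stripped = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == SHY) {
                // Break at the current offset, with the preferred penalty.
                breaks.put(stripped.length(), SHY_PENALTY);
            } else {
                stripped.append(c);
            }
        }
        if (hyphenationEnabled) {
            // The hyphenator never sees the SHY characters.
            for (int pos : hyphenator.hyphenate(stripped.toString())) {
                // Automatic points fill gaps but never override a SHY penalty.
                breaks.putIfAbsent(pos, AUTO_HYPHEN_PENALTY);
            }
        }
        // A break before the first or after the last character is meaningless.
        breaks.remove(0);
        breaks.remove(stripped.length());
        return breaks;
    }

    public static void main(String[] args) {
        // Stub hyphenator standing in for a real pattern-based one.
        Hyphenator stub = w -> List.of(2, 8, 11);
        String word = "very\u00ADLong\u00ADProgram";
        System.out.println(breakPoints(word, true, stub));  // {2=50, 4=25, 8=25, 11=50}
        System.out.println(breakPoints(word, false, stub)); // {4=25, 8=25}
    }
}
```

With hyphenation enabled, the SHY points (very-/Long-) carry the lower penalty, while the automatic points remain available as fallbacks; with hyphenation disabled only the SHY points survive, which corresponds to the UAX#14 5.3 reading. Skipping the SHY points when `hyphenationEnabled` is false would implement the other reading instead.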
>
> We would just have to set the right penalties for SHY and automatic
> hyphens, such that SHYs are preferred yet don't completely prevent
> breaks at the other hyphenation points in the word. This will probably
> need some trial and error.
>
> >
> >> 3) Shall a break opportunity created by a SHY be given the same penalty
> >> (in the Knuth sense) as a normal hyphenation break?
> >
> > Yes, IMO.
>
> Well, I also thought yes at first, but given point 2 above...

Given the wording of section 5.3 of UAX#14, I stand by my opinion.

Jeremias Maerki