Nicholas Moser created FOP-2963:
-----------------------------------

             Summary: Add Option for Safer Hyphenation
                 Key: FOP-2963
                 URL: https://issues.apache.org/jira/browse/FOP-2963
             Project: FOP
          Issue Type: Improvement
            Reporter: Nicholas Moser
         Attachments: example-after-disabled.pdf, example-after-enabled.pdf, example-before.pdf, patch.diff

This is a new proposed setting for FOP I have decided to call *safer 
hyphenation*.

Currently, FOP may generate PDFs where text can overlap or go off the page. The 
most common scenarios I've seen this occur are:
 # A very small amount of space is allocated for text, such as the cell of a 
table. Even if a word has valid hyphenation points, a sufficiently long word 
may still overflow the cell because it does not contain enough of them.
 # A string of characters such as numbers will overflow the space allocated for 
it even if there is plenty of room for a line break. This is because hyphenation 
patterns do not define break points inside strings of digits, so the hyphenator 
finds no valid hyphenation points.

Examples of these issues can be seen in the attached PDF *example-before.pdf*. 
The third row of the first table has a very long word with many hyphenation 
points. Despite this, it overflows the cell twice because there are not enough 
hyphenation points. Additionally, the rows below it contain long series of 
numbers that have no hyphenation points and run off the page.

My proposed fix for this involves a new configuration setting called *safer 
hyphenation*. It effectively does three things.
 # Places hyphenation points between every character in a string buffer, 
ignoring hyphenation patterns.
 # Moves hyphenation from the second pass to the third pass of 
findOptimalBreakingPoints(...)
 # Massively increases the penalty for hyphenation.

The first change is fairly simple: a hyphenation can now occur anywhere in any 
word in the document. This fixes both problems, since long words and strings of 
digits will now break before they leave their allocated space. The drawback is 
that the line breaking algorithm will also use these new hyphenation points when 
they are not necessary, which results in many ugly hyphenations. Since 
hyphenation patterns are no longer used, I argue that the best way to handle 
this is to avoid hyphenation unless it is absolutely necessary.
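To illustrate the first change, here is a minimal sketch of "a hyphenation 
point between every character". It is not the code from patch.diff: it assumes 
hyphenation points are represented as offsets into the word, and the class and 
method names are hypothetical. It walks the word by Unicode code point so that 
a surrogate pair is never split in half.

```java
import java.util.Arrays;

public class SaferHyphenationSketch {

    /**
     * Hypothetical helper: returns every offset (in UTF-16 units) at which the
     * word may be broken, i.e. the boundary between each pair of adjacent
     * characters. Hyphenation patterns are ignored entirely.
     */
    static int[] everyCharacterBreakPoints(String word) {
        int count = word.codePointCount(0, word.length());
        if (count < 2) {
            return new int[0]; // a single character cannot be split
        }
        int[] points = new int[count - 1];
        int offset = 0;
        for (int i = 0; i < count - 1; i++) {
            // Advance by one code point so surrogate pairs stay intact.
            offset = word.offsetByCodePoints(offset, 1);
            points[i] = offset;
        }
        return points;
    }

    public static void main(String[] args) {
        // A digit string has no pattern-based break points, but gets one
        // between every digit here.
        System.out.println(Arrays.toString(everyCharacterBreakPoints("12345")));
        // prints [1, 2, 3, 4]
    }
}
```

Because every boundary becomes a legal break, the line breaker can never be 
forced to let a word overflow; the cost is that it now has far too many 
hyphenation choices, which the next two changes are meant to restrain.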

The second and third changes implement this. The second change only allows 
hyphenation during the third pass of the optimal breaking point search, after 
the max adjustment has been changed to 20. The third change massively increases 
the penalty for using a hyphenation. Together, these make the algorithm avoid 
hyphenation unless there are no other options.
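The effect of the second and third changes can be sketched with a toy model. 
This is not FOP's code: it assumes a simplified Knuth-Plass demerits formula 
((line penalty + badness) squared, plus the break penalty squared, ignoring 
flagged and fitness-class demerits), and the names Candidate, choose and 
hyphenationAllowed, the pass gating, the badness values and the penalty values 
50 and 900 are all hypothetical illustrations.

```java
import java.util.Comparator;
import java.util.List;

public class HyphenPenaltySketch {

    /** One candidate way to end the current line (simplified model). */
    static final class Candidate {
        final String label;
        final double badness;     // how much the line is stretched or shrunk
        final int penalty;        // cost attached to the break itself
        final boolean hyphenated; // true if the break inserts a hyphen

        Candidate(String label, double badness, int penalty, boolean hyphenated) {
            this.label = label;
            this.badness = badness;
            this.penalty = penalty;
            this.hyphenated = hyphenated;
        }
    }

    // Penalty added to every line break (TeX's default \linepenalty is 10).
    static final int LINE_PENALTY = 10;

    /** Simplified Knuth-Plass demerits for a break with a non-negative penalty. */
    static double demerits(Candidate c) {
        double base = LINE_PENALTY + c.badness;
        return base * base + (double) c.penalty * c.penalty;
    }

    /**
     * Hypothetical gating: with safer hyphenation, hyphenation is deferred to
     * pass 3; otherwise it is assumed to be available from pass 2 onward.
     */
    static boolean hyphenationAllowed(int pass, boolean saferHyphenation) {
        return saferHyphenation ? pass >= 3 : pass >= 2;
    }

    /** Cheapest admissible candidate, or null if none is allowed in this pass. */
    static Candidate choose(List<Candidate> candidates, boolean allowHyphens) {
        return candidates.stream()
                .filter(c -> allowHyphens || !c.hyphenated)
                .min(Comparator.comparingDouble(HyphenPenaltySketch::demerits))
                .orElse(null);
    }

    public static void main(String[] args) {
        int ordinaryPenalty = 50; // hypothetical ordinary hyphen penalty
        int saferPenalty = 900;   // hypothetical "massive" penalty, still finite
        for (int hyphenPenalty : new int[] {ordinaryPenalty, saferPenalty}) {
            List<Candidate> options = List.of(
                    new Candidate("tight hyphenated line", 0, hyphenPenalty, true),
                    new Candidate("loose line, no hyphen", 50, 0, false));
            System.out.println(hyphenPenalty + " -> " + choose(options, true).label);
        }
        // prints "50 -> tight hyphenated line" then "900 -> loose line, no hyphen"
    }
}
```

With the ordinary penalty the hyphenated line wins (2600 vs 3600 demerits); 
with the large penalty the loose line wins (810100 vs 3600), so a hyphen is only 
chosen when no non-hyphenated break is feasible. The large penalty has to stay 
below whatever value the line breaker treats as an infinite (forbidden) 
penalty, otherwise hyphenation would be ruled out entirely instead of being a 
last resort.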

Since this is a new configuration setting, I've included two additional PDFs, 
*example-after-disabled.pdf* and *example-after-enabled.pdf*. The first PDF 
demonstrates that when the setting is off, the changes are entirely passive and 
make no difference. The second PDF shows the improvements from using safer 
hyphenation. It also shows the downside: old hyphenation (with hyphenation 
patterns) can no longer be used to improve the layout of a paragraph.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
