Just a quick heads-up that I finally took the plunge to add UAX#14 line breaking to FOP. This is based on code donated by Joerg quite some time ago, on which I did some work in October 2005. That work was documented on the list at the time.

One of the major stumbling blocks in progressing this was the conflict between the recursive / nested getNextKnuthElement calls and the need to do the UAX#14 line breaking processing across inline boundaries. In the end I decided, in the interest of making at least some progress in this area, not to attempt the 'all singing, all dancing' solution, but simply to apply this to the TextLayoutManager only. Yes, that gives us only limited new functionality, but hopefully it's still an improvement. Also, the code is based on the Unicode 4.1 standard and not 5.0, but that can be fixed later.

It's looking OK so far and most of the layout engine tests pass. The change consists of a new package, org.apache.fop.text.linebreak, containing two classes, plus changes to the TextLayoutManager. Nothing else has been touched so far. It's not ready for a commit yet, but hopefully will be in a few days.

The question that arises is whether this should go into the planned release, or whether that is too risky and I should either hold the commit until the release is out or do it in a branch.

Another issue is that one of the two new files is actually generated by a little Java program (also from Joerg) from Unicode data files. While it would be 'nice to have' this generation integrated into the FOP build, I would initially commit the generated file into the repository. To integrate the generation into the build we would either need to have the Unicode data files in the Apache repository (not sure about licensing issues here), or the build would need to fetch those files, causing an external dependency, which is usually a hassle for people behind corporate firewalls etc. That's why I propose to apply the KISS principle initially.

Manuel
