On 27.02.2006 14:59:31 Manuel Mall wrote:
> On Monday 27 February 2006 21:33, Jeremias Maerki wrote:
> > On 27.02.2006 12:36:58 Manuel Mall wrote:
> > > On Monday 27 February 2006 18:55, Jeremias Maerki wrote:
> > > > What's the status of UAX#14? Has anybody had time to work
> > > > on that, yet? I'm asking because I'm considering hacking in
> > > > support for the fixed width spaces (U+2000..U+200A). One of my
> > > > clients asks for that but I can't allocate enough time right now
> > > > to do the whole thing, unfortunately.
> > > >
> > > I don't think UAX#14 will happen in a hurry. However in
> > > http://wiki.apache.org/xmlgraphics-fop/LineBreaking I do describe
> > > possible handling of fixed width spaces. The main decision, and
> > > that has little to do with UAX#14, is if these spaces are to be
> > > treated like white space when it comes to linebreaks or like
> > > non-breakable spaces. If one follows the XSL-FO spec to the letter
> > > these spaces are not white space and therefore are not removed
> > > around a line break. I have no idea what actual user expectations
> > > are when it comes to these spaces. Would authors (especially in
> > > non-English / non-Latin languages) expect these spaces to be removed
> > > around a linebreak or not? The relevant Knuth sequences which need
> > > to be generated depend on that decision: Is the space removable or
> > > not when a break occurs?
> >
> > I think we're talking about two different removals here, right? Once
> > it's about the FO white-space-affecting properties. Here's where I
> > think that these do not affect special Unicode spaces (only XML white
> > space, see below). When we're talking about line-breaking I think the
> > space that makes up the break possibility is removed (except in the
> > case of tagged PDF where the space will need to be preserved for the
> > structure info) but not any of the other "special" spaces in the
> > vicinity.
> > At least, that would be my expectation and my
> > interpretation.
> >
>
> Removal of spaces around formatter line breaks is also covered by the
> spec. The property suppress-at-line-break controls it. And check its
> definition of "auto". The fixed width spaces are explicitly excluded.
> So, contrary to my initial post there is no ambiguity in the spec.
> Fixed width spaces are not removed unless the user explicitly sets the
> suppress-at-line-break property. As we do not yet support the
> suppress-at-line-break property the only Knuth sequences which need to
> be generated are for non-elastic, non-removable spaces. That should be
> reasonably straightforward.
>
> Interestingly enough this means the default behaviour of
> suppress-at-line-break is that independent of any other white space
> handling properties U+0020 (space) is always(!) removed around
> formatter generated line breaks. Need to think about that a bit more.

Wait a sec! suppress-at-line-break only applies to fo:character not to
general text content!!! I think it is less complicated than you think
right now.

> > I've just gone through the FO spec again searching for "white" and it
> > seems clear to me that the spec makes a rather clear distinction when
> > white-space in terms of the XML spec is meant or when general
> > white-space is meant.
> >
> > > I am also uncertain how these spaces interact with line
> > > justification. They are by definition not elastic. So if you have a
> > > fixed width space only between two words this is not an inter word
> > > gap that can be used for justification.
> >
> > Yes.
> >
> > > Therefore any calculations which rely on knowing the
> > > number of words on a line to determine how many inter word gaps we
> > > have to then calculate the per gap justification amount will need
> > > to be adjusted to not count inter word gaps which only contain
> > > fixed width spaces. On the other hand they are still word
> > > boundaries for the purpose of finding words for hyphenation.
> >
> > Yes.
> >
> > But is there really a problem when it comes to adjusting inter-word
> > gaps because that's already handled by the right element list for all
> > the different cases, right? At least, I don't see where exactly
> > you're uncertain. The fixed width spaces just don't have any
> > stretch/shrink they contribute to inter-word gaps.
> >
>
> Yes, the Knuth algorithm will take the stretch/shrink into account when
> doing its optimal line breaking but it will not tell you what the final
> inter word gap is. That is I think separately computed based on the
> number of words found with some fine tuning. This is where you may (or
> may not) run into trouble.

Ah, ok. Thanks for the clarification.

> > I'll look into the fixed width spaces. So, thanks for your fast
> > answer and the valuable pointer to the Wiki.
> > In case I don't manage
> > to do this cleanly, the least I can do is make sure we don't get ugly
> > "#" in the output because the renderers don't know about the special
> > spaces. This will also help for when someone has time to go towards
> > UAX#14.
> >
> > Jeremias Maerki
>
> Manuel

Jeremias Maerki
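
To make the justification point above a bit more concrete, here is a minimal sketch in Python (not FOP code) of counting only those inter-word gaps that can actually be stretched. It assumes a gap is adjustable as soon as it contains at least one ordinary U+0020, and that a gap made up solely of fixed width spaces (U+2000..U+200A) still separates words but contributes nothing to justification. The names GAP_RE, words, adjustable_gap_count and per_gap_adjustment are invented for illustration only; a real layout engine would work on its element list and areas rather than on raw strings.

```python
import re

# One or more consecutive spaces: the ordinary space U+0020 (elastic) or
# any of the fixed width spaces U+2000..U+200A (assumed non-elastic).
GAP_RE = re.compile("[\u0020\u2000-\u200A]+")


def words(line):
    """Split a line into words; every kind of space is a word boundary."""
    return [w for w in GAP_RE.split(line) if w]


def adjustable_gap_count(line):
    """Count gaps that contain at least one elastic U+0020.

    Leading and trailing U+0020 are dropped first, mirroring the removal
    of ordinary spaces around a line break. Fixed width spaces are kept.
    """
    gaps = GAP_RE.findall(line.strip("\u0020"))
    return sum(1 for gap in gaps if "\u0020" in gap)


def per_gap_adjustment(line, leftover_width):
    """Distribute leftover_width evenly over the adjustable gaps."""
    count = adjustable_gap_count(line)
    return leftover_width / count if count else 0.0


if __name__ == "__main__":
    sample = "a b\u2003c d"  # U+2003 EM SPACE between "b" and "c"
    print(words(sample))
    print(adjustable_gap_count(sample))
    print(per_gap_adjustment(sample, 6.0))
```

In the sample line there are three gaps between four words, but only two of them contain an ordinary space, so the leftover width is divided by two rather than by three.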