Re: Knuth linebreaking questions
Finn Bock wrote:

> I tend to read that to mean that word spacing may be pushed beyond the specified range by justification. And I would think that unjustified alignment still has the option of using the word-spacing range but of course has to stay within the range.

I'm not convinced ... With left-aligned text and adjustable word-spacing, the output would have most lines justified, except the ones in which the adjustment ratio would be > 1 ... I really don't think this would be better than having all lines left-aligned! :-)

And if the user sets text-align="left" but does not explicitly set word-spacing, so that the default value is used, and by a lucky coincidence the algorithm finds breaking points involving only ratios < 1, the output would show justified lines instead of the left-aligned lines the user would most likely have expected.

Regards
Luca
Re: Knuth linebreaking questions
On Thu, Dec 02, 2004 at 12:16:55PM +0100, Finn Bock wrote:

> > and point #4 uses the user-definable threshold; where should this constant be stored? Inside the code of LineLM or in a configuration file?
>
> An extension attribute?
>
> ...
>
> I suspect that the other knuth parameters should be specified the same way. But it is not a high priority IMO.

It is not a layout specification in the fo file; it is a fine-tuning of the algorithm applied by a particular FO Processor. It should be in the user configuration. It may be specified in the configuration file, or it may be specified by the calling application in the configuration object FOUserAgent.userConfig. In the configuration file it should be something like:

    5 other parameters

FOUserAgent should get appropriate methods to extract the layout part of the configuration and pass it on to a client class, e.g. LineLM. Cf. FOUserAgent.getUserRendererConfig().

TeX's terms are pretolerance and tolerance for the two values of maxAdjustment.

Regards, Simon

--
Simon Pepping
home page: http://www.leverkruid.nl
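As a rough illustration of passing such settings to a client class, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the class `LayoutConfigSketch`, the keys `layout.pretolerance` and `layout.tolerance`, and the defaults 1 and 5 (taken from the threshold values discussed in this thread) are hypothetical and are not FOP API or FOP configuration syntax.

```java
import java.util.Properties;

/** Hypothetical holder for line-breaking settings; not part of FOP. */
final class LayoutConfigSketch {
    final double pretolerance; // maxAdjustment for the first attempt
    final double tolerance;    // maxAdjustment for the second attempt

    private LayoutConfigSketch(double pretolerance, double tolerance) {
        this.pretolerance = pretolerance;
        this.tolerance = tolerance;
    }

    /** Reads the two values from a user configuration, falling back to assumed defaults. */
    static LayoutConfigSketch from(Properties userConfig) {
        // Key names are invented for this sketch.
        double pre = Double.parseDouble(userConfig.getProperty("layout.pretolerance", "1"));
        double tol = Double.parseDouble(userConfig.getProperty("layout.tolerance", "5"));
        if (tol < pre) {
            throw new IllegalArgumentException("tolerance must not be below pretolerance");
        }
        return new LayoutConfigSketch(pre, tol);
    }
}
```

A layout manager would then receive a `LayoutConfigSketch` instance instead of hard-coding the two thresholds.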
Re: Knuth linebreaking questions
> > And why not adjust the spacing within the user specified min/max for START and END alignment?
>
> [Luca]
> Should the user desire adjusted spaces, wouldn't it be better for him to specify justified alignment? :-) Seriously, the recommendation (at 7.16.2 "letter-spacing" and 7.16.8 "word-spacing") states that these spaces "may also be influenced by justification", but says nothing about start and end alignments.

I tend to read that to mean that word spacing may be pushed beyond the specified range by justification. And I would think that unjustified alignment still has the option of using the word-spacing range but of course has to stay within the range.

> > I'm still not sure why it would be ok to ignore any user specified min and max values of 'word-spacing' during START and END alignment. If a user specifies a length range, what would the reason be for not using it? Perhaps with additional DEFAULT_SPACE_WIDTH.
>
> When alignment is start or end, each space always has its .optimum width, so there is no need to look at the .minimum and .maximum: the user's most preferred value is already used.

Is there anything that prevents using a non .optimum value within the range if the result is judged to be better (with a lower demerit)?

> > Ok, performance is indeed a fine reason, but IMHO such quality vs. speed tradeoffs should eventually be made by the user rather than us.
>
> Simon said the same:
>
> # Note that in TeX such thresholds are user-adjustable parameters. I
> # think they should eventually be so in FOP too, for those of us who
> # have the most exquisite taste of line layout.
>
> and I think it's a good idea; the algorithm should:
>   1 find breaking points without hyphenation
>   2 hyphenate
>   3 find breaking points with hyphenation
>   4 decide which ones are "better"
> and point #4 uses the user-definable threshold; where should this constant be stored? Inside the code of LineLM or in a configuration file?

An extension attribute?

...

I suspect that the other knuth parameters should be specified the same way. But it is not a high priority IMO.

regards,
finn
Re: Knuth linebreaking questions
Finn Bock wrote:

(starting from the second question)

> And why not adjust the spacing within the user specified min/max for START and END alignment?

Should the user desire adjusted spaces, wouldn't it be better for him to specify justified alignment? :-)

Seriously, the recommendation (at 7.16.2 "letter-spacing" and 7.16.8 "word-spacing") states that these spaces "may also be influenced by justification", but says nothing about start and end alignments.

> I'm still not sure why it would be ok to ignore any user specified min and max values of 'word-spacing' during START and END alignment. If a user specifies a length range, what would the reason be for not using it? Perhaps with additional DEFAULT_SPACE_WIDTH.

When alignment is start or end, each space always has its .optimum width, so there is no need to look at the .minimum and .maximum: the user's most preferred value is already used. But the Knuth algorithm would not work if there were no elements with adjustable width (glue with stretchability and/or shrinkability); the actual value used is not very relevant, because the computed adjustment ratio will not be applied.

> Ok, performance is indeed a fine reason, but IMHO such quality vs. speed tradeoffs should eventually be made by the user rather than us.

Simon said the same:

# Note that in TeX such thresholds are user-adjustable parameters. I
# think they should eventually be so in FOP too, for those of us who
# have the most exquisite taste of line layout.

and I think it's a good idea; the algorithm should:
  1 find breaking points without hyphenation
  2 hyphenate
  3 find breaking points with hyphenation
  4 decide which ones are "better"
and point #4 uses the user-definable threshold; where should this constant be stored? Inside the code of LineLM or in a configuration file?

Regards
Luca
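The four steps listed in this message could be sketched roughly as follows in Java. This is only an illustration under stated assumptions: `TwoPassSketch`, `Breaks`, the `hyphenator` and `breaker` functions and the `acceptDemerits` parameter are hypothetical names, not FOP classes, and the rule used in step 4 (accept the unhyphenated result if its demerits are at or below the threshold, otherwise take the result with fewer demerits) is one possible reading of "decide which ones are better".

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical two-pass driver; the real breaker and hyphenator are passed in. */
final class TwoPassSketch {

    /** Result of one run of a line breaker: break positions and total demerits. */
    static final class Breaks {
        final int[] positions;
        final double demerits;
        Breaks(int[] positions, double demerits) {
            this.positions = positions;
            this.demerits = demerits;
        }
    }

    /**
     * @param plain          element sequence without hyphenation points
     * @param hyphenator     adds hyphenation points (the expensive step)
     * @param breaker        returns empty when no feasible set of breaks exists
     * @param acceptDemerits user-adjustable threshold for skipping hyphenation
     */
    static <E> Optional<Breaks> choose(E plain,
                                       Function<E, E> hyphenator,
                                       Function<E, Optional<Breaks>> breaker,
                                       double acceptDemerits) {
        // 1: find breaking points without hyphenation
        Optional<Breaks> first = breaker.apply(plain);
        if (first.isPresent() && first.get().demerits <= acceptDemerits) {
            return first; // good enough: hyphenation is never computed
        }
        // 2: hyphenate, 3: find breaking points with hyphenation
        Optional<Breaks> second = breaker.apply(hyphenator.apply(plain));
        // 4: decide which result is "better"
        if (!first.isPresent()) {
            return second;
        }
        if (!second.isPresent()) {
            return first;
        }
        return second.get().demerits < first.get().demerits ? second : first;
    }
}
```

Keeping the threshold as a parameter rather than a constant is what would let it come from user configuration.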
Re: Knuth linebreaking questions
> > 1) What is the purpose of 2 glues for a normal space in END and START alignment:
> >
> >     new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
> >     new KnuthPenalty(0, 0, false, , true));
> >     new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));
>
> [Luca Furini]
> The purpose is to give each line (but the last one) the same stretchability, regardless of the number of spaces in it.
>
> If the penalty is not used (there is no line ending there), the overall effect of the 2 glues is a 0 stretchability and does not modify the line total; if the penalty is used (a line ends there), then the stretchability of the previous glue is added to the line total, which becomes 3 * wordSpaceIPD.opt because the previous spaces, as said before, added 0 (the following glue is suppressed).
>
> In justified text, a line with many spaces can be adjusted in order to be much shorter, or much longer. If left-aligned text used the same elements, the algorithm would find the same breaking points; but this time adjustment ratios are not used, so a line with many spaces would be much longer, or much shorter, than the other lines. Using these elements, the algorithm creates lines whose unadjusted widths are nearly the same.

Ok, thank you for the explanation. I'm still not sure why it would be ok to ignore any user specified min and max values of 'word-spacing' during START and END alignment. If a user specifies a length range, what would the reason be for not using it? Perhaps with additional DEFAULT_SPACE_WIDTH.

And why not adjust the spacing within the user specified min/max for START and END alignment?

> > 3) What is the reasoning for doing hyphenation only after threshold=1 fails. Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.
>
> Finding hyphenation points is time-expensive (all words must be hyphenated, not only the ones "near a line's end"), the sequence of elements becomes longer, there are more feasible breaking points, and a line ending with a "-" is less beautiful; so I thought it better to skip hyphenation if a set of breaking points could be found without it.
>
> I just took the "hyphenate" property as a suggestion instead of an order! :-)
>
> Note that the same algorithm with the same threshold could find a different set of breaking points with and without hyphenation, because the elements are different. Without hyphenation, spaces could need a slightly higher adjustment, for example.

Ok, performance is indeed a fine reason, but IMHO such quality vs. speed tradeoffs should eventually be made by the user rather than us.

Thank you for taking the time to explain it all in such great detail.

regards,
finn
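The effect of the glue, penalty, glue sequence quoted in this message can be checked with a small Java sketch. The class `RaggedSpaceSketch` and its `Element` type are hypothetical stand-ins that model only width and stretch; they are not FOP's KnuthGlue or KnuthPenalty, and the numbers (word width 10, optimum space 2) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the elements used for START/END alignment. */
final class RaggedSpaceSketch {

    /** A box (word), a glue, or a penalty (legal break); only width and stretch are kept. */
    static final class Element {
        final String kind;
        final int width;
        final int stretch;
        Element(String kind, int width, int stretch) {
            this.kind = kind;
            this.width = width;
            this.stretch = stretch;
        }
    }

    /** Builds word, space, word, ... with glue, penalty, glue standing for each space. */
    static List<Element> paragraph(int words, int wordWidth, int spaceOpt) {
        List<Element> seq = new ArrayList<>();
        for (int i = 0; i < words; i++) {
            if (i > 0) {
                seq.add(new Element("glue", 0, 3 * spaceOpt));         // stretch only
                seq.add(new Element("penalty", 0, 0));                 // possible line end
                seq.add(new Element("glue", spaceOpt, -3 * spaceOpt)); // width, cancels the stretch
            }
            seq.add(new Element("box", wordWidth, 0));
        }
        return seq;
    }

    /**
     * Totals for a line starting at index from and ending at the penalty at index breakAt.
     * The glue after the chosen penalty is not counted, mirroring its suppression at a break.
     * Returns {width, stretch}.
     */
    static int[] lineTotals(List<Element> seq, int from, int breakAt) {
        int width = 0;
        int stretch = 0;
        for (int i = from; i < breakAt; i++) {
            width += seq.get(i).width;
            stretch += seq.get(i).stretch;
        }
        return new int[] {width, stretch};
    }
}
```

With `paragraph(4, 10, 2)` the penalties sit at indices 2, 6 and 10. Breaking at any of them gives a line stretch of 6, that is 3 times the optimum space, while the width grows with the number of words, which is the property the explanation describes.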
Re: Knuth linebreaking questions
Finn Bock wrote:

> 3) What is the reasoning for doing hyphenation only after threshold=1 fails. Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.

The purpose of professional typography and layout is to assist the reader: provide easy reading with minimal distractions. Typographic concepts reflect this. Justified text makes it easier to identify paragraphs. Unfortunately, long words may cause word spaces to be stretched into large white blobs which disrupt reading. Hyphenation is essential to cut down on the space allocated for text justification, especially for languages which can form arbitrarily long compound words.

Hyphenation has of course its own drawback: words are mostly identified by the letters at the beginning and the end, and hyphenation disrupts this. Several lines ending in hyphenated words may also cause the reader to pick up the wrong continuation line (that's the reason for having the hyphenation-ladder-count property).

The tradeoff between using hyphenation to avoid visual artefacts and having lots of hyphenated words disrupting the flow has to be balanced.

J.Pietschmann
Re: Knuth linebreaking questions
On Tue, Nov 30, 2004 at 07:27:29PM +0100, Luca Furini wrote:

> Finn Bock wrote:
>
> > 3) What is the reasoning for doing hyphenation only after threshold=1 fails. Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.
>
> Finding hyphenation points is time-expensive (all words must be hyphenated, not only the ones "near a line's end"), the sequence of elements becomes longer, there are more feasible breaking points, and a line ending with a "-" is less beautiful; so I thought it better to skip hyphenation if a set of breaking points could be found without it.
>
> I just took the "hyphenate" property as a suggestion instead of an order! :-)

This is the practice in TeX too. It may be considered a satisfactory implementation of hyphenate="true": take hyphenation into account when your line layout algorithm considers it a better solution to hyphenate these lines. This algorithm does not think it necessary to try hyphenation when there is a non-hyphenated solution with an amount of demerits below a certain threshold.

Note that in TeX such thresholds are user-adjustable parameters. I think they should eventually be so in FOP too, for those of us who have the most exquisite taste of line layout.

> Note that the same algorithm with the same threshold could find a different set of breaking points with and without hyphenation, because the elements are different. Without hyphenation, spaces could need a slightly higher adjustment, for example.
>
> > 4) I've compared your code to tex_wrap
> > http://oedipus.sourceforge.net/texlib/
> > and the main difference is in the way new KnuthNodes are added to the active list. Is the BestRecords part of Knuth or is it your own invention? Why is it only fitness_class'es in BestRecord that is higher than minDemerits + incompatibleFitnessDemerit that is added to activeList? Why not all fitness_class'es in BestRecords?
>
> At the moment I don't have the book at hand, but I am quite sure it's *not* an invention of mine! :-)
>
> As far as I can remember, the Knuth book uses 4 different variables, named C1, ... C4 :-( (or maybe D or A, anyway not a very self-documenting name!) and I just created this structure to store them.

The algorithm distinguishes four classes of lines: tight, normal, loose, very loose. When two consecutive lines are not of the same or of two adjacent classes, it gives a penalty of incompatibleFitnessDemerit.

For each breakpoint b there is one best line of each class leading to it. If the best line of class i has an amount of demerits, best.getDemerits(i), that is not less than best.getMinDemerits() (the minimum over all four classes) plus incompatibleFitnessDemerit, it can never be selected. The optimization omits it from the list of best breakpoints. Knuth mentions that it saves him 25% of executions of his loop, in his computational experiments.

Regards, Simon

--
Simon Pepping
home page: http://www.leverkruid.nl
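The pruning rule described in this message can be sketched in Java as below. `BestRecordsSketch` is a hypothetical simplification written for this illustration, not FOP's BestRecords class; it keeps only the best demerits per fitness class. The sketch keeps a class when its demerits are at most the minimum plus incompatibleFitnessDemerit; how exact ties are treated is an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-breakpoint record of the best demerits for each fitness class. */
final class BestRecordsSketch {
    static final int CLASSES = 4; // tight, normal, loose, very loose

    private final double[] best = new double[CLASSES];

    BestRecordsSketch() {
        Arrays.fill(best, Double.POSITIVE_INFINITY); // nothing found yet
    }

    /** Remembers the cheapest line of the given class leading to this breakpoint. */
    void offer(int fitnessClass, double demerits) {
        if (demerits < best[fitnessClass]) {
            best[fitnessClass] = demerits;
        }
    }

    /** Minimum demerits over all classes. */
    double minDemerits() {
        double min = Double.POSITIVE_INFINITY;
        for (double d : best) {
            min = Math.min(min, d);
        }
        return min;
    }

    /**
     * Classes worth turning into active nodes. A class costing more than
     * min + incompatibleFitnessDemerit can never win: a later line could always
     * continue from the cheapest class and pay at most that penalty.
     */
    List<Integer> classesToActivate(double incompatibleFitnessDemerit) {
        double limit = minDemerits() + incompatibleFitnessDemerit;
        List<Integer> keep = new ArrayList<>();
        for (int c = 0; c < CLASSES; c++) {
            if (best[c] < Double.POSITIVE_INFINITY && best[c] <= limit) {
                keep.add(c);
            }
        }
        return keep;
    }
}
```

For example, with best demerits 100, 5000 and 10200 for classes 0 to 2 and a penalty of 10000, only classes 0 and 1 are kept.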
Re: Knuth linebreaking questions
[Finn]
3) What is the reasoning for doing hyphenation only after threshold=1 fails. Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.

[Luca]
Finding hyphenation points is time-expensive (all words must be hyphenated, not only the ones "near a line's end"), the sequence of elements becomes longer, there are more feasible breaking points, and a line ending with a "-" is less beautiful; so I thought it better to skip hyphenation if a set of breaking points could be found without it.

I've just started to read Knuth's chapter on breaking paragraphs into lines, and from what I've read, he considers excessive hyphenation bad form. The main benefits he gives for taking the entire paragraph into account when deciding where to break lines (as opposed to the more traditional just-look-at-the-current-line analysis) are a reduced need for hyphenation and a reduced number of over-spaced lines (i.e., too few words on a line requiring large spaces between them for the line to be justified).

Glen
Re: Knuth linebreaking questions
Finn Bock wrote:

> 1) What is the purpose of 2 glues for a normal space in END and START alignment:
>
>     new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
>     new KnuthPenalty(0, 0, false, , true));
>     new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

The purpose is to give each line (but the last one) the same stretchability, regardless of the number of spaces in it.

If the penalty is not used (there is no line ending there), the overall effect of the 2 glues is a 0 stretchability and does not modify the line total; if the penalty is used (a line ends there), then the stretchability of the previous glue is added to the line total, which becomes 3 * wordSpaceIPD.opt because the previous spaces, as said before, added 0 (the following glue is suppressed).

In justified text, a line with many spaces can be adjusted in order to be much shorter, or much longer. If left-aligned text used the same elements, the algorithm would find the same breaking points; but this time adjustment ratios are not used, so a line with many spaces would be much longer, or much shorter, than the other lines. Using these elements, the algorithm creates lines whose unadjusted widths are nearly the same.

> and why isn't the min and max of wordspaceIPD used.

Well, you just made me notice there is a little bug: LineLayoutManager.DEFAULT_SPACE_WIDTH should be used instead! :-) It's just a "magic number": the point is that every TextLM should use the same value.

> 2) What does the threshold parameter to findBreakingPoints control? It seems to be a performance parameter which controls the number of active nodes, rather than a quality parameter. Or to frame my question differently, if threshold=1 finds a set of breaks, will threshold=5 always pick the same set of breaks? Or can threshold=5 find a better set of breaks?

It controls both performance and quality: minimum quality.

If threshold = 1 finds a set of breaks, it is the best possible set of breaks, because the adjustment ratio of each break is <= 1, which means that spaces and other adjustable objects will not need to be longer than their .max width. But with this optimal threshold the algorithm could fail and find no set of breaking points; so, a second attempt with a higher threshold must be made.

If with threshold = 1 a set is found, with threshold = 5 the same set would be found, but it would take more time, because a greater number of active nodes are used.

> 3) What is the reasoning for doing hyphenation only after threshold=1 fails. Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.

Finding hyphenation points is time-expensive (all words must be hyphenated, not only the ones "near a line's end"), the sequence of elements becomes longer, there are more feasible breaking points, and a line ending with a "-" is less beautiful; so I thought it better to skip hyphenation if a set of breaking points could be found without it.

I just took the "hyphenate" property as a suggestion instead of an order! :-)

Note that the same algorithm with the same threshold could find a different set of breaking points with and without hyphenation, because the elements are different. Without hyphenation, spaces could need a slightly higher adjustment, for example.

> 4) I've compared your code to tex_wrap
> http://oedipus.sourceforge.net/texlib/
> and the main difference is in the way new KnuthNodes are added to the active list. Is the BestRecords part of Knuth or is it your own invention? Why is it only fitness_class'es in BestRecord that is higher than minDemerits + incompatibleFitnessDemerit that is added to activeList? Why not all fitness_class'es in BestRecords?

At the moment I don't have the book at hand, but I am quite sure it's *not* an invention of mine! :-)

As far as I can remember, the Knuth book uses 4 different variables, named C1, ... C4 :-( (or maybe D or A, anyway not a very self-documenting name!) and I just created this structure to store them.

I'll try and find some time to look at this ...

Thanks for your interest and your comments, they are most welcome!

Regards
Luca
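To make the role of the threshold concrete, here is a small Java sketch of the adjustment ratio and the feasibility test. It follows the usual Knuth-Plass definition (the ratio is the missing width divided by the available stretch, or the excess width divided by the available shrink); the class and method names are hypothetical and the numbers are made up, so this is an illustration and not FOP's code.

```java
/** Hypothetical helper showing how a threshold limits acceptable lines. */
final class AdjustmentRatioSketch {

    /**
     * Adjustment ratio of a candidate line.
     * Positive: the line is short and its glue must stretch.
     * Negative: the line is long and its glue must shrink.
     */
    static double ratio(double target, double natural, double stretch, double shrink) {
        double diff = target - natural;
        if (diff > 0) {
            return stretch > 0 ? diff / stretch : Double.POSITIVE_INFINITY;
        }
        if (diff < 0) {
            return shrink > 0 ? diff / shrink : Double.NEGATIVE_INFINITY;
        }
        return 0.0;
    }

    /** A line is acceptable when it shrinks no more than fully and stretches within the threshold. */
    static boolean feasible(double r, double threshold) {
        return r >= -1.0 && r <= threshold;
    }
}
```

A line 12 units short with 6 units of stretch has ratio 2: it is rejected with threshold 1 but accepted with threshold 5, which is why a higher threshold keeps more active nodes alive and costs more time.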
Knuth linebreaking questions
Hi Luca (and others),

I've been trying to get my head around the line breaking code, and during that process some questions have come up. I urge you *not* to take anything I ask as a sign of criticism or as a request for changes. I don't have the Knuth paper where the algorithm is described, so perhaps the answers would be obvious if I read it.

1) What is the purpose of 2 glues for a normal space in END and START alignment:

    new KnuthGlue(0, 3 * wordSpaceIPD.opt, 0, , false));
    new KnuthPenalty(0, 0, false, , true));
    new KnuthGlue(wordSpaceIPD.opt, - 3 * wordSpaceIPD.opt, 0, , true));

and why isn't the min and max of wordspaceIPD used.

2) What does the threshold parameter to findBreakingPoints control? It seems to be a performance parameter which controls the number of active nodes, rather than a quality parameter. Or to frame my question differently, if threshold=1 finds a set of breaks, will threshold=5 always pick the same set of breaks? Or can threshold=5 find a better set of breaks?

3) What is the reasoning for doing hyphenation only after threshold=1 fails. Naive common sense tells me that if the user specifies hyphenation we should do hyphenation before finding line breaks.

4) I've compared your code to tex_wrap
http://oedipus.sourceforge.net/texlib/
and the main difference is in the way new KnuthNodes are added to the active list. Is the BestRecords part of Knuth or is it your own invention? Why is it only fitness_class'es in BestRecord that is higher than minDemerits + incompatibleFitnessDemerit that is added to activeList? Why not all fitness_class'es in BestRecords?

regards,
finn