Re: DO NOT REPLY [Bug 29124] - New line breaking algorithm
Simon Pepping wrote:

> Still to be done:
> - Resolve the regressions mentioned above.

As concerns leaders with use-content, a patch has been created and successfully tested. The ContentLM calls getNextKnuthElements on its child InlineStackingLM, uses the returned elements to calculate the pattern width, and returns them to the LeaderLM. The LeaderLM uses them when calling addAreas.

I also found a bug affecting leaders with leader-pattern = dots: the TextArea with the dot (created in LeaderLM.getLeaderInlineArea) had width = 0; calling setWidth() fixes this problem. There is still a small difference between a leader with leader-pattern = dots and one with use-content and a single dot as content: the former is placed slightly above the baseline, but I couldn't find the reason.

Note that if you use the fo file xml-fop/examples/fo/basic/leader.fo to test the patch, you won't see the leaders with leader-pattern = use-content, as they don't have a width property and the default .opt value (12pt) is smaller than the pattern width. Setting a larger width, or text-align-last = justify, makes the leaders visible.

> - I support the idea to create an InlineLayoutManager interface, which extends LayoutManager.

Done, same patch (or maybe I should create a different one?). I also removed the getWordSpaceIPD() method, as I found out that a constant value works better: the LineLM and its child must use the same value, or the result is not always correct.

> 1. Can we be sure that U+A is always alone or the first item in a textArray; does this not depend on the Parser, how it calls the SAX characters method?

Right, it's better to handle the most general case. The patch will fix this too.

I will try to fix the other points reported by Simon as soon as possible.

Regards
Luca
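The "most general case" for a preserved linefeed, where U+000A may appear anywhere in the text rather than only first or alone, might be handled along these lines. This is only a sketch: LinefeedSplitter, splitAtLinefeeds and the FORCED_BREAK marker are hypothetical names, not taken from the patch, and plain strings stand in for the real Knuth elements.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of FOP: splits text at U+000A wherever it occurs. */
class LinefeedSplitter {

    /** Marker standing in for a penalty item of value -infinity (a forced break). */
    static final String FORCED_BREAK = "<penalty -inf>";

    /**
     * Returns the text runs between linefeeds, with a FORCED_BREAK marker at
     * each linefeed position. The position of U+000A does not matter, so the
     * result does not depend on how the parser chunks its characters() calls.
     */
    static List<String> splitAtLinefeeds(char[] textArray) {
        List<String> result = new ArrayList<>();
        int runStart = 0;
        for (int i = 0; i < textArray.length; i++) {
            if (textArray[i] == '\n') {
                // emit the text collected so far, if any
                if (i > runStart) {
                    result.add(new String(textArray, runStart, i - runStart));
                }
                result.add(FORCED_BREAK);
                runStart = i + 1;
            }
        }
        // emit the trailing run after the last linefeed
        if (runStart < textArray.length) {
            result.add(new String(textArray, runStart, textArray.length - runStart));
        }
        return result;
    }
}
```

With this shape, a caller no longer needs to assume that a list holding a single forced-break item is the only way a linefeed can show up.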
DO NOT REPLY [Bug 29124] - New line breaking algorithm
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=29124. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=29124

New line breaking algorithm

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Additional Comments From [EMAIL PROTECTED] 2004-09-05 18:21 ---
Luca,

Patch applied. Thanks for this innovative and extensive contribution.

Still to be done:
- Resolve the regressions mentioned above.
- I support the idea to create an InlineLayoutManager interface, which extends LayoutManager.
- This patch has made a lot of existing code redundant. Much of that code is still present. To keep the code clean and intelligible, the redundant pieces should be removed at some time by somebody.

I have added a space after casts. See the style guidelines in the file dev/conventions.html of the FOP web site.

I have a few remarks about the code. I leave it to you to follow these up or not, but I would like to see point 1 addressed:

1. In TextLM:

    // linefeed; this can happen when linefeed-treatment=preserve
    // the linefeed character is the first one in textArray,
    // so we can just return a list with a penalty item

In LineLM:

    if (returnedList.size() > 1
        || !(thisElement.isPenalty()
             && ((KnuthPenalty) thisElement).getP() == -KnuthElement.INFINITE)) {
    } else {
        // a list with a single penalty item whose value is -inf
        // represents a preserved linefeed, which forces a line break

Can we be sure that U+A is always alone or the first item in a textArray; does this not depend on the Parser, how it calls the SAX characters method?

2. In InlineStackingLM.applyChanges: Falling over the end of oldListIterator can be done better: treat only currLM != prevLM in the loop, treat !oldListIterator.hasNext() after the loop. Same for getChangedKnuthElements?

    while (oldListIterator.hasNext()) {
        oldElement = (KnuthElement) oldListIterator.next();
        currLM = oldElement.getLayoutManager();
        // initialize prevLM
        if (prevLM == null) {
            prevLM = currLM;
        }
        if (currLM != prevLM) {
            bSomethingChanged
                = prevLM.applyChanges(oldList.subList(fromIndex, oldListIterator.previousIndex()))
                  || bSomethingChanged;
            prevLM = currLM;
            fromIndex = oldListIterator.previousIndex();
        }
    }
    bSomethingChanged
        = currLM.applyChanges(oldList.subList(fromIndex, oldList.size()))
          || bSomethingChanged;

Possible cases, after the loop:
    xxyy or yy, xx done
    prevLM = currLM = y
    fromIndex = last done (2 and 0)

3. In InlineStackingLM: Unnecessary differences between treatment of returnedList and returnList in getNextKnuthElements and getChangedKnuthElements. In getChangedKnuthElements it is not necessary to have a separate returnedList and returnList.

4. Break up long methods in LineLM: findHyphenationPoints, getNextBreakPoss, considerLegalBreak (?), findBreakingPoints (?).

Regards, Simon
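The loop shape suggested in point 2 (react only to a change of owner inside the loop, close the last run once after it) can be tried out in isolation. The following is a minimal sketch under stated assumptions: RunGrouper and groupRuns are hypothetical names, and strings stand in for the layout managers that own each element.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not FOP code: groups consecutive elements by owner. */
class RunGrouper {

    /**
     * Splits the owner list into [from, to) index ranges of consecutive equal
     * owners. Inside the loop only a change of owner is handled; the final
     * run is closed exactly once, after the loop.
     */
    static List<int[]> groupRuns(List<String> owners) {
        List<int[]> runs = new ArrayList<>();
        if (owners.isEmpty()) {
            return runs;
        }
        int fromIndex = 0;
        String prev = owners.get(0);
        for (int i = 1; i < owners.size(); i++) {
            String curr = owners.get(i);
            if (!curr.equals(prev)) {
                // owner changed: the previous run ends just before index i
                runs.add(new int[] {fromIndex, i});
                prev = curr;
                fromIndex = i;
            }
        }
        // the last run always ends at the end of the list
        runs.add(new int[] {fromIndex, owners.size()});
        return runs;
    }
}
```

For the two cases listed in the message, owners x x y y give the ranges [0, 2) and [2, 4), and owners y y give the single range [0, 2), so the final run starts at index 2 and 0 respectively.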
[Fwd: DO NOT REPLY [Bug 29124] - New line breaking algorithm]
--- Additional Comments From [EMAIL PROTECTED] 2004-08-31 18:44 ---

> Thanks for the new patch. I could apply it without problems, and testing it goes well. You mention that you have not implemented the Knuth algorithm for ContentLM. Would it be difficult to do that?
>
> FOP team, If I would apply this patch, we would get the following regressions:
> - ContentLM does not show its content. A leader with leader-pattern=use-content results in a blank area of the right size.

Doesn't sound like it will be difficult to fix after the patch is applied.

> - When for an exceptionally difficult paragraph no set of breaking points can be found, the whole paragraph is printed on a single line. This occurs, for example, when in a narrow typesetting width only a single word or a part of it fits in a line.

I would think that strange effects like this are possible today. Can you see what the output would look like in such a scenario with the current code?

> I am working towards applying this patch despite these regressions, for these reasons:
> - This patch is a good piece of work, and a step forward for FOP's layout.

Agreed.

> - It becomes increasingly hard to maintain this patch outside of CVS.

I know how you feel. I found it hard work before when I examined Luca's earlier Knuth patch.

Chris
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-09-02 04:23 ---

> I am working towards applying this patch despite these regressions, for these reasons:
> - This patch is a good piece of work, and a step forward for FOP's layout.
> - It becomes increasingly hard to maintain this patch outside of CVS.
>
> Please, speak up if you are against this.

Simon, I have not had the time to follow this issue closely, so I will defer to your judgment.

Thanks,
Glen
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-31 18:44 ---
Luca,

Thanks for the new patch. I could apply it without problems, and testing it goes well. You mention that you have not implemented the Knuth algorithm for ContentLM. Would it be difficult to do that?

FOP team, If I would apply this patch, we would get the following regressions:
- ContentLM does not show its content. A leader with leader-pattern=use-content results in a blank area of the right size.
- When for an exceptionally difficult paragraph no set of breaking points can be found, the whole paragraph is printed on a single line. This occurs, for example, when in a narrow typesetting width only a single word or a part of it fits in a line.

I am working towards applying this patch despite these regressions, for these reasons:
- This patch is a good piece of work, and a step forward for FOP's layout.
- It becomes increasingly hard to maintain this patch outside of CVS.

Please, speak up if you are against this.

Simon
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-28 10:09 ---
Created an attachment (id=12560)
patch to existing files (version 7.1)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-27 19:29 ---
Luca,

The build failed. That is probably due to this line in the patch:

cvs server: I know nothing about layoutmgr/LeafNodeLaoyutManager.java

Simon
Re: DO NOT REPLY [Bug 29124] - New line breaking algorithm
Simon Pepping wrote:

> Nested inline and other LMs: The output contains errors, see the comments in the text. The errors occur when hyphenation is set to true.

Fixed: there were errors in the method addALetterSpaceTo of LeafNodeLM and InlineStackingLM. I also found a bug in the LeafNodeLM.addAreas method, affecting HEAD too: the area is added to the area tree (with parentLM.addChild(curArea)) *before* widthAdjustArea is called, so its width is not correctly added to the inline parent width and the output sometimes shows overlapping text (when there is another child of the inline parent after the leader area).

> Justification: This is a test fo you submitted earlier. According to the text in the file the second block should be hyphenated; it is not. Should it still be hyphenated, or can this not be enforced with the Knuth algorithm and text-align=start?

I cannot find a hyphenate property in the fo file you attached, so I'm not sure whether I understand what you mean. Anyway, hyphenate = true means, according to the recommendation (7.9.4), that hyphenation may be used in the line-breaking algorithm, not that it *must* be used. As hyphenation is time-expensive and bad-looking, I think it should be used only if necessary.

> No breakpoints: An exception is thrown, at LineLayoutManager.getNextBreakPoss(LineLayoutManager.java:495). It occurs because breakpoints has size 0; the third call to findBreakingPoints also returned 0. This should not be possible; the algorithm should always return a breakpoint.

Right, I completely forgot to provide a fallback in case the algorithm doesn't find a good set of breaking points. I added a boolean argument called force to findBreakingPoints: if it is true, and after the main loop there are no active nodes, the last deactivated node is used to create LineBreakPositions. There will be zero or more good lines followed by a single line including all the remaining content (this line will obviously run past the right margin).

The method findBreakingPoints will be called no more than three times:
  I) no hyphenation, adjustment ratios must be <= 1
  II) hyphenation (if allowed), or ratios up to 5
  III) ratios up to 20, and if necessary force the creation of LineBreakPositions

> A few small remarks: Can you move the following log messages to trace log level:
> [DEBUG] AbstractLayoutManager - - Word to hyphenate: We

Done

> In TextLM, returning null for a forced LF is not an idea that I like, because it overloads the null return value. Cannot you return a special Knuth element for LF? Alternatively, you could return null and process the paragraph. The second paragraph would then be produced and processed later.

A preserved linefeed can be represented by a penalty item whose value is -infinite: +inf means that there can't be a break here, -inf means that there must be a break (as there can't be a better breakpoint). Preserved linefeeds inside inlines are much more problematic than I first thought, but they should work now: I had to add a List argument to the applyChanges() and getChangedKnuthElements() methods, to tell an ISLM which children it has to consider.

> InlineStackingLM.getNextKnuthElements: 'if (lc.startsNewArea())' no longer used?

I tried to preserve the existing code as much as possible, so I didn't touch that if statement. Maybe I removed some lines in the LineLM so that lc.startsNewArea is never true?

Regards,
Luca
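The three-attempt strategy described in the message above could be driven by a small wrapper like the one below. This is a sketch only: BreakingPasses, Breaker and breakParagraph are hypothetical names, the Breaker interface merely stands in for the real findBreakingPoints, and reading step II as "hyphenate if allowed, otherwise relax the ratio to 5" (keeping the ratio limit at 1 while hyphenating) is an assumption, since the message does not spell that out.

```java
import java.util.List;

/** Hypothetical sketch of the three-attempt strategy; not the patch's code. */
class BreakingPasses {

    /** Stand-in for the real algorithm: returns break indices, or an empty list on failure. */
    interface Breaker {
        List<Integer> findBreakingPoints(double maxRatio, boolean hyphenate, boolean force);
    }

    static List<Integer> breakParagraph(Breaker breaker, boolean hyphenationAllowed) {
        // I) no hyphenation, adjustment ratios at most 1
        List<Integer> breaks = breaker.findBreakingPoints(1.0, false, false);
        if (!breaks.isEmpty()) {
            return breaks;
        }
        // II) hyphenation if allowed, otherwise ratios up to 5
        //     (the ratio limit used while hyphenating is an assumption)
        if (hyphenationAllowed) {
            breaks = breaker.findBreakingPoints(1.0, true, false);
        } else {
            breaks = breaker.findBreakingPoints(5.0, false, false);
        }
        if (!breaks.isEmpty()) {
            return breaks;
        }
        // III) ratios up to 20, forcing a result if nothing else works
        return breaker.findBreakingPoints(20.0, hyphenationAllowed, true);
    }
}
```

Because only the third call passes force = true, a well-behaved Breaker never leaves the caller with an empty list of breakpoints, which is the situation that caused the exception reported for the "No breakpoints" test.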
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-24 15:24 ---
I'm going to attach yet another version of the patch :-), corrected according to Simon's comments. The new files (Knuth*.java) are unchanged, so I don't attach them.

Regards,
Luca
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-24 15:25 ---
Created an attachment (id=12518)
patch to existing files (sixth edition)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-19 16:36 ---
Created an attachment (id=12488)
patch - existing files (fifth edition)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-19 19:28 ---
The last patches do not apply without errors:
- Last lines of patch files do not end with a newline.
- Diff of area/inline/TextArea.java is incomplete.
- Compile error in render/xml/XMLRenderer.java.

I did manage to fix all problems.
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-19 19:32 ---
Created an attachment (id=12492)
test fo: Nested inline and other LMs
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-19 19:32 ---
Created an attachment (id=12493)
test fo: Justification
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-19 19:33 ---
Created an attachment (id=12494)
test fo: No breakpoints
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-17 13:51 ---
Oops, sorry again, Simon!

In the code I used when creating the last patch there was an error affecting the TextLayoutManager.getChangedKnuthElements() method: two missing break statements inside a switch. Due to this error, the sequence of elements generated for each space (when text-align is center, start or end) is wrong, and some text disappears (I even got IndexOutOfBounds exceptions).

Inserting these breaks is enough to make everything work:

    ...
    // ai refers to a space
    switch (alignment) {
    case CENTER:
        ...
        iReturnedIndex++;
        break; /* this was missing */
    case START: // fall through
    case END:
        ...
        iReturnedIndex++;
        break; /* this was missing */
    case JUSTIFY:
        ...
    }
    ...

As you can see, in the last patch I changed the getNextKnuthElements() and getChangedKnuthElements() return type, so they now return a sequence of elements instead of a single one. This may reduce the similarities between getNextKnuthElements() and getNextBreakPoss(), but I think it makes the code simpler and easier to understand. Maybe it would be even better to make them return the whole sequence, so that these methods are called once per LM.

Now I'm working on the newly-created LMs, so the next patch (which I think will be ready tomorrow) will apply to the latest code version and will include Finn Bock's changes.

Regards,
Luca
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-16 19:13 ---
Hi Luca,

I have problems with the results for text-align=start, end and center in your test FO file. The lines are too long, and the first line ends with a space. For start and end a paragraph is dropped. Herewith the area tree output for start/end and center:

<block width="288000" ipd="288000" height="30800" props="border-start:(87,#00,1000);break-after:8;border-end:(87,#00,1000);border-after:(87,#00,1000);border-before:(87,#00,1000);break-before:8;">
  <lineArea height="14400">
    <text twsadjust="0" tlsadjust="0" props="font-size:12000;font-family:F1;color:#00;">Poche corte parole molto corte, in modo che tutto vada bene e non ci siano guai. Qualche </text>
  </lineArea>
  <lineArea height="14400">
    <text twsadjust="0" tlsadjust="0" props="font-size:12000;font-family:F1;color:#00;">tra parola per fare tre righe.</text>
  </lineArea>
</block>

<block width="288000" ipd="288000" height="30800" props="border-start:(87,#ff,1000);break-after:8;border-end:(87,#ff,1000);border-after:(87,#ff,1000);border-before:(87,#ff,1000);break-before:8;">
  <lineArea height="14400">
    <text twsadjust="0" tlsadjust="0" props="font-size:12000;font-family:F1;color:#00;">Poche corte parole molto corte, in modo che tutto vada bene e non </text>
  </lineArea>
  <lineArea height="14400">
    <text twsadjust="0" tlsadjust="0" props="font-size:12000;font-family:F1;color:#00;">ci siano guai. Qualche altra paro</text>
    <text twsadjust="0" tlsadjust="0" props="font-size:12000;font-family:F1;color:#00;">la per fare tre righe.</text>
  </lineArea>
</block>

Regards, Simon
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-12 08:23 ---
I'm going to attach an updated patch, including HyphContext (which I forgot to include in the previous versions, sorry) and a few changes to fix a couple of bugs.

I used linux's diff between the modified files and the original ones (updated yesterday, 11 August); for some reason (maybe I used some wrong options) wincvs's diff did not include new files and did not use the latest version of the original files, so it reported lots of differences caused by recent cvs commits.

I hope I did not forget anything this time! :-)

Luca
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-12 08:25 ---
Created an attachment (id=12400)
patch file (fourth edition, including HyphContext and bug fixes)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-10 19:28 ---
Luca,

Your patch applied cleanly to a checkout from CVS dated 1 August 2004, but I get build errors. Could it be that differences to HyphContext are not included in the patch?

[javac] src/java/org/apache/fop/layoutmgr/LineLayoutManager.java:1081: cannot resolve symbol
[javac] symbol  : constructor HyphContext (int[],int)
[javac] location: class org.apache.fop.layoutmgr.HyphContext
[javac]     return new HyphContext(hyph.getHyphenationPoints(), sbChars.length());
[javac]     ^
[javac] src/java/org/apache/fop/layoutmgr/TextLayoutManager.java:425: cannot resolve symbol
[javac] symbol  : method isWordEnd ()
[javac] location: class org.apache.fop.layoutmgr.HyphContext
[javac]     newIPD.add(MinOptMax.multiply(letterSpaceIPD, (hc.isWordEnd()?
[javac]     ^
[javac] src/java/org/apache/fop/layoutmgr/TextLayoutManager.java:437: cannot resolve symbol
[javac] symbol  : method isWordEnd ()
[javac] location: class org.apache.fop.layoutmgr.HyphContext
[javac]     (short)0, (short)(hc.isWordEnd()? (iStopIndex - iStartIndex - 1): (iStopIndex - iStartIndex)),
[javac]     ^
[javac] 3 errors

Simon
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-05 16:09 ---
I'm going to attach the corrected patch, sorry again!

Knuth's algorithm is described in the essay
  D. E. Knuth and M. F. Plass, "Breaking paragraphs into lines"
and I found it in the book
  D. E. Knuth, "Digital typography", published by CSLI Publications

Unfortunately, I couldn't find any link to an on-line version of this essay.

As regards the names of the classes, they were mainly chosen so that the new files could be spotted quickly among the others! So, it's not a problem for me to change them :-)

Luca
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-05 16:11 ---
Created an attachment (id=12345)
patch file (third edition, minor oversight fixed)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-03 12:41 ---
Hi all

At long last, I have finished the patch implementing Knuth's line breaking algorithm; it took me longer than I expected, mainly because of a long sequence of hw and sw troubles ... Murphy's laws are not something to laugh at! :-)

I have worked on [Line, Text, InlineStacking, LeafNode]LM, so the algorithm should work well with any fo file containing text, leaders, characters, inlines and the other formatting objects handled by a LeafNodeLM (external graphics, page numbers and citations).

The general idea of Knuth's algorithm is:

  try to find breaking points without hyphenating words
  if this fails
    hyphenate all words
    try again

The "hyphenate all words" phase could be time-expensive, so this step is performed trying to use as much as possible the information already known, and to minimize the changes to the existing sequence of elements. The old sequence is used to collect word fragments, and elements are replaced only if the LM which created them has something to change.

So, "hyphenate all words" means:

  scan the old sequence once:
    collect word fragments
    hyphenate word
  scan the old sequence once more:
    if the LM which returned this element has changed something
      replace all elements returned by this LM

These are the new methods added; at the moment, I added them to the LayoutManager interface, but maybe it would be better to create a new interface implemented only by LMs returning inline areas.

+ getNextKnuthElement()
  This is used instead of getNextBreakPoss(). The next step (I have already started working on it) would be to use the same method all LMs use.

+ addALetterSpaceTo()
  The low-level LMs (TLMs, LNLMs) have only a partial view of the text, and therefore cannot know the exact number of letter spaces, while the LineLM has a full view. If a TLM's text is "Tex", it can only suppose it has 2 letter spaces; if the following formatting object is a character "t", the LineLM tells the TLM to add a letter space, as the "x" is not the last letter of the word.

+ getWordChars()
  This is not a new method, it just has different parameters; text is collected from fo:characters too.

+ hyphenate()
  The TLM does not apply the changes to vecAreaInfo immediately, otherwise the existing Position objects stored in the old sequence couldn't be used any more. The LeafNodeLM returns a single area, so it can apply changes immediately.

+ applyChanges()
  This method tells the TLM to apply the changes to vecAreaInfo; every LM returns true if something has changed or false otherwise, so the LLM knows whether it has to replace the old elements or not.

+ getChangedKnuthElement()
  This is used by the LLM to obtain the new elements.

+ getWordSpaceIPD()
  This is used by the LLM to ask for the word space dimension; the LLM needs it to center text.

A few details to fix:

- word spacing and letter spacing are now fully implemented, they can both have MinOptMax values; but I am still thinking about how to differentiate a user-defined zero value from a default zero value ...

- Leaders with leader-pattern = rule or space work well; with dots the space left is right, but the dots don't fill it properly. Leaders with leader-pattern = use-content don't work, as the ContentLayoutManager has at the moment only a null implementation of the method getNextKnuthElement. There is also a minor bug concerning (IMO) white space handling: if there is white space both before and after the leader, the latter is removed, so instead of
    word __ word
  the output shows
    word __word

- with the other fo elements (fo:externalgraphic, fo:page-number and fo:page-number-citation) the LeafNodeLM behaves exactly the same way as with the old code, i.e. a fo:page-number-citation generates a "?".

- text-align-last is partially implemented; text-align-last = justify works only if text-align = justify too; this is because Knuth's algorithm doesn't provide for a different alignment for the last line.

I'm going to attach:
- the patch to existing files and new files
- a test fo file
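The letter-space correction that addALetterSpaceTo() is described as performing can be illustrated with a small counting exercise. This is a sketch under stated assumptions, not the patch's implementation: LetterSpaceCount and its two methods are hypothetical names, and plain strings stand in for the text fragments held by separate layout managers.

```java
import java.util.List;

/** Hypothetical illustration of the letter-space correction; not FOP code. */
class LetterSpaceCount {

    /** Letter spaces a fragment can see on its own: one between each pair of its letters. */
    static int localLetterSpaces(String fragment) {
        return Math.max(0, fragment.length() - 1);
    }

    /**
     * Per-fragment counts after the line-level correction: every fragment
     * that is not the last one of the word gets one extra letter space,
     * because its final letter is followed by another letter of the same word.
     */
    static int[] correctedLetterSpaces(List<String> fragmentsOfOneWord) {
        int[] counts = new int[fragmentsOfOneWord.size()];
        for (int i = 0; i < fragmentsOfOneWord.size(); i++) {
            String fragment = fragmentsOfOneWord.get(i);
            counts[i] = localLetterSpaces(fragment);
            boolean wordContinues = i < fragmentsOfOneWord.size() - 1;
            if (wordContinues && !fragment.isEmpty()) {
                counts[i]++;
            }
        }
        return counts;
    }
}
```

For the example in the message, the fragments "Tex" and "t" start with 2 and 0 letter spaces; after the correction they hold 3 and 0, which adds up to the 3 letter spaces of the four-letter word "Text".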
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-03 12:45 ---
Created an attachment (id=12308)
patch (second edition)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-03 12:45 ---
Created an attachment (id=12309)
test fo file
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-03 13:32 ---
Thanks. You will need to add the comments in this patch to the *code*, as people will not always have the benefit of this Bugzilla entry when looking at it. This is extremely important, as layout is very complex.

Also, is this Knuth algorithm copyrighted? Where did you get it from? It is rare that we have classes named after authors.

Glen
DO NOT REPLY [Bug 29124] - New line breaking algorithm
--- Additional Comments From [EMAIL PROTECTED] 2004-08-03 14:42 ---
Thanks again for your work--we have very few who can work on layout. Apparently, Dr. Knuth wouldn't seem to mind us using his algorithms: http://lpf.ai.mit.edu/Patents/knuth-to-pto.txt
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-08-03 15:00 --- Luca, I also thank you for your time and commitment. Grazie mille^H^H^H^H^H un miliardo! I'm sure it is very much appreciated by the other COMMITTERs, as well as the throngs who will benefit from your time and energy in the future. This is a very exciting addition to FOP, and I'm hoping it will help to simplify the code in other ways as well. It's really nice to have a multitude of people who 'capish' (grok) the inner workings of FOP. Web Maestro Clay
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-21 17:21 --- Created an attachment (id=11625) LineLayoutManager.java
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-21 17:22 --- Created an attachment (id=11626) TextLayoutManager.java
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:31 --- Created an attachment (id=11602) patch to existing files
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:31 --- Created an attachment (id=11603) KnuthElement.java
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:32 --- Created an attachment (id=11604) KnuthBox.java
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:32 --- Created an attachment (id=11605) KnuthGlue.java
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:32 --- Created an attachment (id=11606) KnuthPenalty.java
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:33 --- Created an attachment (id=11607) KnuthPossPosIter.java
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:34 --- Created an attachment (id=11608) fo test file (text-indent)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:34 --- Created an attachment (id=11609) fo test file (text-align-last)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:35 --- Created an attachment (id=11610) fo test file (word-spacing and letter-spacing)
DO NOT REPLY [Bug 29124] - New line breaking algorithm
http://issues.apache.org/bugzilla/show_bug.cgi?id=29124 New line breaking algorithm --- Additional Comments From [EMAIL PROTECTED] 2004-05-20 15:36 --- Created an attachment (id=11611) fo test file (long paragraphs of text)