Re: Proposed GNU troff behavior change: require end-of-input macros to exit
Quoth hoh...@posteo.de:
> If gpic gets Ä (0xc3 0x84) it complains about 0x84. If gpic gets ä
> (0xc3 0xa4) it does not complain about 0xa4. gpic says: "invalid input
> character". Since both are above ASCII (in the 8-bit area), what makes
> 0x84 wrong? It seems that 0x84 is located in a control area, whereas
> 0xa4 is in a graphics one. ECMA-48 says for 0x84:
> 8.3.132 SPI - SPACING INCREMENT
> Hm.
>
> (If you want to know why I ignore preconv, read the last mail.)

This is from src/libs/libgroff/invalid.cpp:

// Table of invalid input characters.

char invalid_char_table[256]= {
#ifndef IS_EBCDIC_HOST
  1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#else
  1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1,
  1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#endif
};

So the bad input bytes are a bunch of the ASCII C0 control characters
and all of the C1 control characters.

And that’s the way it is.
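The non-EBCDIC half of the table quoted above can be checked mechanically. What follows is a minimal sketch, not groff code: it rebuilds the 256-entry table from the pattern visible in the listing (the function name `build_invalid_char_table` is made up for illustration) and looks up the two bytes under discussion.

```python
def build_invalid_char_table():
    """Rebuild the non-EBCDIC table as it appears in the quoted listing.

    Hypothetical helper for illustration; groff itself uses the literal
    C++ array from invalid.cpp.
    """
    table = [0] * 256
    # First row of the listing: 0x00, 0x0B, and 0x0D-0x0F are marked.
    for code in (0x00, 0x0B, 0x0D, 0x0E, 0x0F):
        table[code] = 1
    # Second row: the rest of the C0 controls, 0x10-0x1F.
    for code in range(0x10, 0x20):
        table[code] = 1
    # Ninth and tenth rows: all of the C1 controls, 0x80-0x9F.
    for code in range(0x80, 0xA0):
        table[code] = 1
    return table


table = build_invalid_char_table()
for byte in (0x84, 0xA4):
    verdict = "invalid" if table[byte] else "accepted"
    print(f"0x{byte:02x} ({byte}): {verdict}")
```

Under this reading, 0x84 (decimal 132, the code gpic reports) falls in the C1 block, while 0xa4 lies just past it, which matches the observed behavior.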
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
If gpic gets Ä (0xc3 0x84) it complains about 0x84. If gpic gets ä
(0xc3 0xa4) it does not complain about 0xa4. gpic says: "invalid input
character". Since both are above ASCII (in the 8-bit area), what makes
0x84 wrong? It seems that 0x84 is located in a control area, whereas
0xa4 is in a graphics one. ECMA-48 says for 0x84:
8.3.132 SPI - SPACING INCREMENT
Hm.

(If you want to know why I ignore preconv, read the last mail.)

On Thu, 28 Dec 2023 17:43:12 + Lennart Jablonka wrote:
> Quoth holger.herrl...@posteo.de:
> >echo ä | gpic | hexStream
> >0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53 | .if !dPS
> >0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a | .ds PS.
> >0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45 | .if !dPE
> >0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a | .ds PE.
> >0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a | .lf 1 -.
> >0xc3 0xa4 0x0a | ...
> >
> >echo Ä | gpic | hexStream
> >gpic::1: invalid input character code 132
> >0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53 | .if !dPS
> >0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a | .ds PS.
> >0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45 | .if !dPE
> >0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a | .ds PE.
> >0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a | .lf 1 -.
> >0xc3 0x0a| ..
> >
> >The character emerges from a input file name. So it is missed by
> >preconv somewhere, however why is 'ä' working properly/ just passed
> >through?
>
> You don’t seem to be running preconv. Are you?
>
> gpic is reading from standard input the bytes a4 c3 (ä) or
> 84 c3 (Ä). It interprets those as Latin 1: a4 c3 is ¤ Ã.
> 84 c3 is a control character followed by Ã. The control
> characters 80–9f are invalid.

On Fri, 8 Dec 2023 18:48:50 -0600 g.branden.robin...@gmail.com wrote:
> [self-follow-up]
>
> Some clarifications, to our Texinfo manual and to my own remarks...
>
> At 2023-12-08T15:34:28-0600, G. Branden Robinson wrote:
> > The '\c' in the above example needs explanation. For
> > historical reasons (and for compatibility with AT&T 'troff'), the
> > end macro exits as soon as it causes a page break and no remaining
> > data is in the partially collected line.
>
> Clearer would be:
>
> "as soon as it causes a page break and no output line is pending."
>
> > To always force processing the whole end macro independently of
> > this behaviour it is thus advisable to insert something that
> > starts an empty partially filled line ('\c') whenever there is a
> > chance that a page break can happen.
>
> "An empty partially filled line" is somewhat baffling wording.
> Clearer would be:
>
> "to ensure that an output line is pending, even if it has no visible
> content, whenever a page break might occur during end-of-input macro
> processing."
>
> > I would prefer to just make `em` behave the way people expect, but
> > retain the weird old behavior for the benefit of historical
> > documents.
>
> ...in AT&T compatibility mode ("groff -C") only.
>
> Regards,
> Branden
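The byte-level explanation quoted in this message (UTF-8 input read as Latin-1, with 0x80 to 0x9f rejected) can be reproduced outside gpic. This is a minimal sketch using only Python's standard codecs; the helper names are invented for illustration and nothing here calls groff.

```python
def utf8_bytes_as_latin1(text):
    """Encode text as UTF-8, then reinterpret each byte as Latin-1."""
    raw = text.encode("utf-8")
    return raw, raw.decode("latin-1")


def c1_control_bytes(text):
    """Return the UTF-8 bytes of text that fall in the C1 range 0x80-0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


for ch in ("\u00c4", "\u00e4"):  # A-umlaut (uppercase), a-umlaut (lowercase)
    raw, _ = utf8_bytes_as_latin1(ch)
    rejected = c1_control_bytes(ch)
    print(ch, raw.hex(" "), "C1 bytes:", [hex(b) for b in rejected])
```

The second byte of the uppercase letter lands in the C1 control range, while the second byte of the lowercase letter is an ordinary Latin-1 graphic character, so only the former would trip a check for that range.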
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
At 2023-12-17T21:13:27+, Deri wrote:
> Add .fl after Hello. :-)

A quick experiment reveals that `br` has the same effect, which means
that a different documentary statement of mine is wrong:

--snip--
What if the file ends before enough words have been collected to fill
an output line? Or the output line is exactly full but not yet broken,
and there is no more input? GNU 'troff' interprets the end of input as
a break. Certain requests also cause breaks, implicitly or explicitly.
This is discussed in *note Manipulating Filling and Adjustment::.
--end snip--

"GNU 'troff' interprets the end of input as a break."

Guess I'm going to have to walk that back.

That puts me back in the position of not knowing how to distinguish
`fl` and `br`.

Regards,
Branden
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
At 2023-12-17T21:13:27+, Deri wrote:
> > I challenge you to explain! :D
>
> Add .fl after Hello. :-)

That's not an explanation! Just more magic! :-O

Anyone want to spare me some time in GDB? >cringe<

Regards,
Branden
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
On Sunday, 17 December 2023 20:53:52 GMT G. Branden Robinson wrote:
> Hi Deri,
>
> At 2023-12-10T18:43:42+, Deri wrote:
> > On Saturday, 9 December 2023 19:25:27 GMT G. Branden Robinson wrote:
> > > When a line of output is "finished" and sent to the device
> > > (device-independent output is prepared for it), the vertical
> > > position advances by one vee, and, (in groff, if vertical position
> > > traps are not disabled,) any visible vertical position traps planted
> > > between the previous text baseline and the new one are sprung.[1]
> > > If one of these traps is what I term "the implicit page trap"[2],
> > > then the page is ejected and a new one started.
> >
> > Is this the problem? It does not make sense to start a new page until
> > groff "knows" there will be further output.
>
> Changing that is a deeper, more intrusive change than I am proposing.
> It would affect approximately all page transitions processed by the
> formatter.
>
> Not saying that it's wrong or a bad idea, but it's significantly more
> ambitious than what I proposed.
>
> > So if the implicit page trap is triggered it should set a flag to
> > trigger the new page code if further output, other than "x trailer",
> > is output. This is an example of troff output:-
> >
> > x T pdf
> > x res 72000 1 1
> > x init
> > p1
> > x font 5 TR
> > f5
> > s1
> > V12000
> > H72000
> > mr 0 0 0
> > DFd
> > tline
> > n12000 0
> > V792000
> > H72000
> > tline
> > n12000 0
> > V792000
> > p2
> > x trailer
> > V792000
> > x stop
> >
> > The V792000 and p2 are unnecessary unless there is further output.
>
> That's true, but my hunch is that most *roff users over the years who
> want to write within 1v of the page bottom use "local" vertical motions
> from a safe distance to do so.
>
> Hmmm, I was going to show you an exhibit involving `\V` but managed to
> surprise myself.
>
> Here's a counterexample.
>
> $ cat ATTIC/kiss-foot.tr
> .\" U.S. letter paper assumed
> .sp 65v
> Hello!
>
> This produces a ONE page PostScript document with "Hello!"'s text
> baseline sitting at the very boundary of the paper. And there is no
> blank second page. I had predicted one, so obviously there is more for
> me to understand here. (And another trip back to groff.texi with my
> text editor after that.)
>
> Obviously when the formatter performed a break and pushed the output
> line, it recognized that there was no more input, and didn't break the
> page as well.
>
> Here's the grout:
>
> $ groff -Z ATTIC/kiss-foot.tr
> x T ps
> x res 72000 1 1
> x init
> p1
> x font 5 TR
> f5
> s1
> V792000
> H72000
> md
> DFd
> tHello!
> n12000 0
> x trailer
> V792000
> x stop
>
> (It even _SAYS_ there's a break after the text, with the 'n12000 0'
> documentary command.)
>
> I challenge you to explain! :D
>
> Regards,
> Branden

Add .fl after Hello. :-)

Cheers

Deri
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
Hi Deri,

At 2023-12-10T18:43:42+, Deri wrote:
> On Saturday, 9 December 2023 19:25:27 GMT G. Branden Robinson wrote:
> > When a line of output is "finished" and sent to the device
> > (device-independent output is prepared for it), the vertical
> > position advances by one vee, and, (in groff, if vertical position
> > traps are not disabled,) any visible vertical position traps planted
> > between the previous text baseline and the new one are sprung.[1]
> > If one of these traps is what I term "the implicit page trap"[2],
> > then the page is ejected and a new one started.
>
> Is this the problem? It does not make sense to start a new page until
> groff "knows" there will be further output.

Changing that is a deeper, more intrusive change than I am proposing.
It would affect approximately all page transitions processed by the
formatter.

Not saying that it's wrong or a bad idea, but it's significantly more
ambitious than what I proposed.

> So if the implicit page trap is triggered it should set a flag to
> trigger the new page code if further output, other than "x trailer",
> is output. This is an example of troff output:-
>
> x T pdf
> x res 72000 1 1
> x init
> p1
> x font 5 TR
> f5
> s1
> V12000
> H72000
> mr 0 0 0
> DFd
> tline
> n12000 0
> V792000
> H72000
> tline
> n12000 0
> V792000
> p2
> x trailer
> V792000
> x stop
>
> The V792000 and p2 are unnecessary unless there is further output.

That's true, but my hunch is that most *roff users over the years who
want to write within 1v of the page bottom use "local" vertical motions
from a safe distance to do so.

Hmmm, I was going to show you an exhibit involving `\V` but managed to
surprise myself.

Here's a counterexample.

$ cat ATTIC/kiss-foot.tr
.\" U.S. letter paper assumed
.sp 65v
Hello!

This produces a ONE page PostScript document with "Hello!"'s text
baseline sitting at the very boundary of the paper. And there is no
blank second page. I had predicted one, so obviously there is more for
me to understand here. (And another trip back to groff.texi with my
text editor after that.)

Obviously when the formatter performed a break and pushed the output
line, it recognized that there was no more input, and didn't break the
page as well.

Here's the grout:

$ groff -Z ATTIC/kiss-foot.tr
x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s1
V792000
H72000
md
DFd
tHello!
n12000 0
x trailer
V792000
x stop

(It even _SAYS_ there's a break after the text, with the 'n12000 0'
documentary command.)

I challenge you to explain! :D

Regards,
Branden
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
Hi Holger,

At 2023-12-11T19:35:42+0100, Holger Herrlich wrote:
> As far as I got, by playing around, the '\c' doesn't matter.

I think it does; I think it was Werner who put the cautionary language
(which I quoted) in our Texinfo manual about this, and this seemed to be
squarely on point when I fixed the bug in our mm(7) that I also cited.

> It seems that the additional page comes from an additional call to the
> default page break.

Do you have an alternative explanation for why my fix to Savannah #64336
worked?

https://git.savannah.gnu.org/cgit/groff.git/commit/?id=3b615aa0fac692a8a24442a14726e71594c5f805

Regards,
Branden
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
As far as I got, by playing around, the '\c' doesn't matter. It seems
that the additional page comes from an additional call to the default
page break.

Using a custom trap, just disable it in your end trap:

8<
.\"
.\" run: groff em-test.groff > em-test.ps
.\"
.nr PAGE-trap 20c
.nr PAGE-ll 13c
.\"
.de my-trap
.tl '\\v'|\\n[PAGE-trap]u'\\h'|0'\\D'l \\n[PAGE-ll]u 0c
'bp \" matters: not .bp
..
.\"
.de your-end
\c \" doesn't matter
. ne 3v
. sp (\\n[.t]u - 3v)
.wh \n[PAGE-trap]u\" matters: disable regular trap
. in +4i
. lc _
. br
Approved:\t\a
. sp
Date:\t\t\a
..
.\"
.wh \n[PAGE-trap]u my-trap
.em your-end
.\"
.\".SF-std
.\"
XXX first line XXX
.br
.\"
.sp |(\n[PAGE-trap]u-2.99v) \" matters: 3.01v get you one page only
YYY last line YYY
.\"
8<

Without custom trap handling, one needs to prevent the default one from
engaging:

8<
.\"
.\" run: groff em-test.groff > em-test.ps
.\" or: groff em-test.groff > em-test.ps
.\"
.nr Pt 20c\" page trap
.nr Pl 13c\" page length
.\"
.pl \n(Ptu
.\"
.\".de mt\" my trap
.\".tl '\\v'|\\n(Ptu'\\h'|0'\\D'l \\n(Plu 0c
.\"'bp \" matters: not .bp
.\"..
.\"
.de ye\" your end
.\"\c \" doesn't matter
. ne 3v
. sp (\\n(.tu - 3.01v) \" prevent another page break
.\".wh \n(Ptu\" matters: disable regular trap
. in +4i
. lc _
. br
Approved:\t\a
. sp
Date:\t\t\a
..
.\"
.\".wh \n(Ptu mt
.em ye
.\"
.\".SF-std
.\"
XXX first line XXX
.br
.\"
.sp |(\n(Ptu-2.99v) \" matters: 3.01v get you one page only
YYY last line YYY
.\"
8<
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
On Saturday, 9 December 2023 19:25:27 GMT G. Branden Robinson wrote:
> At 2023-12-09T09:26:16-0500, Douglas McIlroy wrote:
> > > For historical reasons (and for compatibility with AT&T 'troff'),
> > > the end macro exits as soon as it causes a page break and
> > > no remaining data is in the partially collected line.
> >
> > This isn't the only anomalous behavior at the end of a document. Since
> > day one, troff has occasionally emitted a blank page at the end. I
> > believe this is because a new page is triggered when the previous page
> > is filled rather than when some output needs somewhere to go. A
> > document that exactly fills the last page thus gets an extra page.
>
> Right. When a line of output is "finished" and sent to the device
> (device-independent output is prepared for it), the vertical position
> advances by one vee, and, (in groff, if vertical position traps are not
> disabled,) any visible vertical position traps planted between the
> previous text baseline and the new one are sprung.[1] If one of these
> traps is what I term "the implicit page trap"[2], then the page is
> ejected and a new one started.

Is this the problem? It does not make sense to start a new page until
groff "knows" there will be further output. So if the implicit page trap
is triggered it should set a flag to trigger the new page code if
further output, other than "x trailer", is output. This is an example of
troff output:-

x T pdf
x res 72000 1 1
x init
p1
x font 5 TR
f5
s1
V12000
H72000
mr 0 0 0
DFd
tline
n12000 0
V792000
H72000
tline
n12000 0
V792000
p2
x trailer
V792000
x stop

The V792000 and p2 are unnecessary unless there is further output.

Cheers

Deri
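The "set a flag and only start the page if further output arrives" idea in this message can be illustrated as a filter over the device-independent output. This is only a sketch of the suggestion, written as a post-processing pass over a list of lines; it is not how GNU troff is implemented, and the function name `drop_trailing_blank_page` is hypothetical.

```python
def drop_trailing_blank_page(grout_lines):
    """Defer 'V' motions and 'p' page starts until other output follows.

    If 'x trailer' arrives while a page start is still deferred, the
    deferred page start and the motions held with it are discarded.
    """
    out = []
    pending = []  # deferred absolute vertical motions and page starts
    for line in grout_lines:
        is_page_start = line.startswith("p") and line[1:].isdigit()
        is_v_motion = line.startswith("V") and line[1:].isdigit()
        if is_page_start or is_v_motion:
            pending.append(line)
        elif line == "x trailer":
            # Keep deferred motions only when no page start is among them.
            if not any(held.startswith("p") for held in pending):
                out.extend(pending)
            pending = []
            out.append(line)
        else:
            # Real output follows, so the deferred commands were needed.
            out.extend(pending)
            pending = []
            out.append(line)
    out.extend(pending)
    return out
```

Applied to the listing in this message, the filter would drop the `V792000` and `p2` that immediately precede `x trailer` and leave everything else in place.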
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
Hi Doug,

At 2023-12-09T09:26:16-0500, Douglas McIlroy wrote:
> > For historical reasons (and for compatibility with AT&T 'troff'),
> > the end macro exits as soon as it causes a page break and
> > no remaining data is in the partially collected line.
>
> This isn't the only anomalous behavior at the end of a document. Since
> day one, troff has occasionally emitted a blank page at the end. I
> believe this is because a new page is triggered when the previous page
> is filled rather than when some output needs somewhere to go. A
> document that exactly fills the last page thus gets an extra page.

Right. When a line of output is "finished" and sent to the device
(device-independent output is prepared for it), the vertical position
advances by one vee, and, (in groff, if vertical position traps are not
disabled,) any visible vertical position traps planted between the
previous text baseline and the new one are sprung.[1] If one of these
traps is what I term "the implicit page trap"[2], then the page is
ejected and a new one started.

Here is how I try to present this information in our Texinfo manual and
roff(7).

    Vertical spacing has an impact on page‐breaking decisions.
    Generally, when a break occurs, the formatter moves the drawing
    position to the next text baseline automatically. If the formatter
    were already writing to the last line that would fit on the page,
    advancing by one vee would place the next text baseline off the
    page. Rather than let that happen, roff formatters instruct the
    output driver to eject the page, start a new one, and again set the
    drawing position to one vee below the page top; this is a page
    break.

    When the last line of input text corresponds to the last output
    line that fits on the page, the break caused by the end of input
    will also break the page, producing a useless blank one. Macro
    packages keep users from having to confront this difficulty by
    setting “traps”; moreover, all but the simplest page layouts tend
    to have headers and footers, or at least bear vertical margins
    larger than one vee.

Of itself, I don't think this procedure is closely coupled with
end-of-input macro handling.

> Before jumping for a special fix for .em, you might like to consider
> the more general question of how a page gets initiated and/or when a
> trap gets sprung (upon reaching it or upon passing it?).

I've been trying to, and I haven't come up with any other ideas, hence
mooting it on the list. :)

> Then .em
> might not need so much special pleading.
>
> In regard to the narrow issue of .em, what alternate
> fixes have you considered? For example, instead of
> exiting, .em might be required to do .rm em.

You mean that we might require the macro called by the `em` request to
delete the `em` request itself (or whatever the user renamed it to)?

> Or the removal could be done automatically when .em is invoked. Under
> either regime, a user could even arrange for .em to be reinstated to
> accomplish a second coming--er, ending.

It seems like this would imply looping on "if end-of-input macro
defined" at the end of input. That seems similar to my proposal; either
one could lead to groff "hanging" (blocking while trying to read from
stdin) if the user doesn't follow the rules.

One of the reasons my proposal has the shape it does is because it is so
hard to explain.

[groff.texi from Git HEAD]
--snip--
(1) While processing an end-of-input macro, the formatter assumes that
the next page break must be the last; it goes into "sudden death
overtime".
--end snip--

Any time I have to resort to a sports metaphor (and possibly a
U.S.-centric one at that) to explain a technical point, I treat that as
an indicator either that I'm being too whimsical or that the system in
question is too baroque.

Regards,
Branden

[1] With mysterious exceptions that seem to be widely shared among
    *roffs. In a nutshell, implementations seem to assume that only one
    vertical position between the current and next baselines will have
    traps planted. This is not the same thing as having one trap hide
    others at the _same_ vertical position, nor the same thing as
    vertical position traps not being able to be sprung while a
    trap-called macro is being interpreted.

    https://savannah.gnu.org/bugs/?56499

[2] https://www.gnu.org/software/groff/manual/groff.html.node/The-Implicit-Page-Trap.html
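The page-break rule as stated in the manual excerpt in this message (each break advances the drawing position by one vee, and a baseline that would land off the page forces a new page) can be put into a toy model. This is a sketch of the rule as worded, not of groff's actual code; the defaults (a page length of 792000 units and a vee of 12000 units) are borrowed from the grout listings elsewhere in this thread, and `count_pages` is a made-up name. The kiss-foot.tr experiment reported in the thread suggests real end-of-input handling is subtler than this.

```python
def count_pages(output_lines, page_length=792000, vee=12000):
    """Count pages produced by the 'advance one vee after each line' rule.

    Assumes at least one output line, no traps other than the implicit
    page trap, and a first baseline one vee below the page top.
    """
    pages = 1
    position = vee  # baseline of the first line on the page
    for _ in range(output_lines):
        # The line is written at `position`; the break then advances it.
        position += vee
        if position > page_length:
            # The next baseline would be off the page: eject, start anew.
            pages += 1
            position = vee
    return pages


for n in (65, 66, 67):
    print(n, "lines ->", count_pages(n), "page(s)")
```

With these numbers exactly 66 baselines fit on a page, so the model gives one page for 65 lines but two for 66, the second being blank. That is the extra trailing page described in the quoted text.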
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
[self-follow-up]

Some clarifications, to our Texinfo manual and to my own remarks...

At 2023-12-08T15:34:28-0600, G. Branden Robinson wrote:
> The '\c' in the above example needs explanation. For historical
> reasons (and for compatibility with AT&T 'troff'), the end macro
> exits as soon as it causes a page break and no remaining data is
> in the partially collected line.

Clearer would be:

"as soon as it causes a page break and no output line is pending."

> To always force processing the whole end macro independently of
> this behaviour it is thus advisable to insert something that starts
> an empty partially filled line ('\c') whenever there is a chance
> that a page break can happen.

"An empty partially filled line" is somewhat baffling wording.
Clearer would be:

"to ensure that an output line is pending, even if it has no visible
content, whenever a page break might occur during end-of-input macro
processing."

> I would prefer to just make `em` behave the way people expect, but
> retain the weird old behavior for the benefit of historical documents.

...in AT&T compatibility mode ("groff -C") only.

Regards,
Branden
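The rule as reworded in this message ("exits as soon as it causes a page break and no output line is pending") can be expressed as a tiny state machine. This is a deliberately simplified toy, not a model of troff internals: the step names, the `run_end_macro` function, and the `at_compat` flag (standing in for the proposed "only under groff -C" behavior) are all invented for illustration.

```python
def run_end_macro(steps, at_compat=True):
    r"""Return the steps of an end-of-input macro that actually run.

    Each step is one of:
      r"\c"         make an output line pending with no visible content
      "text"        add text, which also leaves an output line pending
      "br"          break the line, so nothing is pending afterward
      "page-break"  the macro causes a page break at this point
    """
    executed = []
    line_pending = False
    for step in steps:
        executed.append(step)
        if step in (r"\c", "text"):
            line_pending = True
        elif step == "br":
            line_pending = False
        elif step == "page-break" and at_compat and not line_pending:
            # Historical rule: the macro is abandoned here.
            break
    return executed


print(run_end_macro(["page-break", "text"]))
print(run_end_macro([r"\c", "page-break", "text"]))
print(run_end_macro(["page-break", "text"], at_compat=False))
```

In this toy, the leading `\c` is what keeps the remainder of the macro alive across the page break, and turning the compatibility flag off removes the need for it.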
Re: Proposed GNU troff behavior change: require end-of-input macros to exit
On Fri, Dec 08, 2023, G. Branden Robinson wrote:
> I propose that GNU troff stop behaving like AT&T troff in one aspect of
> end-of-input macro processing, documented in our Texinfo manual.

I'm all for it, for all the reasons given.

--
Peter Schaffter
https://www.schaffter.ca