[Warning: another long message, including SIMH PDP-11 output]

At 2022-06-06T12:17:30-0500, G. Branden Robinson wrote:
> It is disappointing that I couldn't get any useful information out of
> Unix V7 nroff. But I tried only the default device (Teletype Model
> 37), and maybe others are more performant. That in turn may require
> that I temporarily learn the escape sequences for ancient terminals
> like the "GE TermiNet 300", "DASI-300S", or "Diablo Hyperterm", which
> I've never even heard of in any other context. Some days my beard is
> not so gray.
>
> And that will be possible only if adequate documentation for those
> devices survives.

It doesn't, as far as I've been able to tell. I couldn't find any
manuals for the "DASI 300S" terminal (which apparently came in "GSI" and
"DTC" varieties). All surviving documentation appears to be nroff or
old Unix plot(1)-related.

Nevertheless, the nroff terminal descriptions in V7 Unix suffice to draw
some guarded inferences.

Letting "nroff -T300s" output go to the emulated terminal under SIMH
caused only confusion in the terminal driver, so I used the trusty old
"od -c" standby.

I'll illustrate step-by-step my sequence of experiments. I started with
about the simplest input document there is.

$ cat > 1lineR.roff
foo
.pl \n(nlu
$ nroff -T300s 1lineR.roff | od -c
0000000 033 006 f o o \r \n \n \n \n \n \n \n \n \n \n
0000020 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000100 \n \n \n \n \n \n \n \n 033 006
0000112

It seems that limiting that page length with the `pl` request didn't do
a lot of good, but nevertheless I retained it in the subsequent
experiments to minimize confounding factors.

It seems that DASI 300s output involves the sequence 033 006 (ESC, ACK),
always at the beginning, and occasionally later. Possibly this is some
kind of time-fill or synchronization primitive, because it isn't
correlated with output in a way I can easily discern, except that you
get more of them as the byte count of the output goes up.

Next, let's see what happens when we try to set boldface.

$ cat > 1lineB.roff
\fBfoo
.pl \n(nlu
$ nroff -T300s |#1lineB.roff | od -c
0000000 033 006 033 E f o o \r \n \n \n \n \n \n \n \n
0000020 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000100 \n \n \n \n \n \n \n \n \n \n 033 006
0000114

(Here you can enjoy a laugh at my expense as my fumble-fingered typos
are recorded for posterity much as they were on actual Teletypes.
Recall that on PDP-11 Unix, the terminal driver used '#' for erase and
DEL (^?) for keyboard interrupt.)

It appears that the sequence 033 E enabled boldface on the 300S.
It was not achieved with overstriking, contrary to my expectations.
(But then I know _nothing_ about this terminal device's operation.)

Next, let's have a look at "italics".

$ cat > 1lineI.roff
\fIfoo
.pl \n(nlu
$ nroff -T300s 1lineI.roff | od -c
0000000 033 006 _ \b f _ \b o _ \b o \r \n \n \n \n
0000020 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000100 \n \n \n \n \n \n \n \n \n \n \n \n \n \n 033 006
0000120

Italics are achieved by overstriking with ASCII underscores, a mechanism
recognized by less(1) even today.

Now let's try a style change, going from italics to roman.

$ cat > 1lineIthenR.roff
\fIfoo\fRbar
.pl \n(nlu
$ nroff -T300s 1lineIthenR.ro | od -c
0000000 033 006 _ \b f _ \b o _ \b o b a r \r \n
0000020 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000120 \n 033 006 \n
0000123

(I omitted an ls(1) command where I was reminded about the 14-character
limit of file name components in Unix V7.)

There is no surprise here.

Let's see how the device gets out of boldface.

$ cat > 1lineBthenR.ro
\fBfoo\fRbar
.pl \n(nlu
$ nroff -T300s 1lineBthenR.ro | od -c
0000000 033 006 033 E f o o 033 E b a r \r \n \n \n
0000020 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000100 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 033
0000120 006 \0
0000121

Another expectation overturned. ESC E is apparently a toggle.

But wait. Maybe bold mode has to be refreshed every n characters or
something; maybe it expires for some reason. (Terminals have done
dumber things.) Let's see.

$ cat alphabet.roff
\fBabcdefghijklmopqrstuvwxyz
.pl \n(nlu
$ nroff -T300s alphabet.roff | od -c
0000000 033 006 033 E a b c d e f g h i j k l
0000020 m o p q r s t u v w x y z \r \n \n
0000040 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000140 033 006
0000142

If there is such a limit, we're not likely to trip it with our inputs.

What if we change between both non-roman styles?

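As an aside, the inferences so far can be written down as a small
decoder. This is only a minimal sketch in Python, nothing I ran on the
PDP-11. The name `decode_300s` and the style labels are made up for
illustration, and the byte meanings (ESC E toggles bold, underscore plus
backspace marks "italics", ESC ACK is skipped because its purpose is
unknown) are assumptions drawn from the dumps above, not from any
terminal manual.

```python
ESC, ACK, BS = 0x1B, 0x06, 0x08


def decode_300s(data: bytes):
    """Split an `nroff -T300s` byte stream into (text, style) runs.

    Assumptions inferred from the od -c dumps only:
      ESC E    toggles bold
      ESC ACK  purpose unknown; skipped
      _ BS c   character c is underlined ("italic")
    Treating every ESC as a two-byte sequence is also an assumption.
    """
    runs = []
    bold = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            if i + 1 < len(data) and data[i + 1] == ord("E"):
                bold = not bold
            i += 2  # skip ESC and the byte that follows it
            continue
        if b == ACK:
            i += 1  # stray ACK; meaning unknown
            continue
        if b == ord("_") and i + 2 < len(data) and data[i + 1] == BS:
            ch, style = chr(data[i + 2]), "italic"
            i += 3
        else:
            ch, style = chr(b), "bold" if bold else "roman"
            i += 1
        if ch in "\r\n\0":
            continue  # line structure is ignored in this sketch
        if runs and runs[-1][1] == style:
            runs[-1] = (runs[-1][0] + ch, style)
        else:
            runs.append((ch, style))
    return runs


# First line of the 1lineBthenR.ro dump
print(decode_300s(b"\033\006\033Efoo\033Ebar\r\n"))
# prints [('foo', 'bold'), ('bar', 'roman')]
```

Back to the experiments, with the bold-to-italic case.
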
$ cat > 1lineBthenI.ro
\fBfoo\fIbar
.pl \n(nlu
$ nroff -T300s 1lineBthenI.ro | od -c
0000000 033 006 033 E f o o 033 E _ \b b _ \b a _
0000020 \b r \r \n \n \n \n \n \n \n \n \n \n \n \n \n
0000040 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000120 \n \n \n \n \n 033 006 \n
0000127

No surprise here.

Let's bring in the man(7) package, throw Ingo's example at this
terminal, and see what happens.

$ cat > 1lineBtrap.man
.TH foo 1
.B
bar
baz
.pl \(nlu
$ nroff -T300s -man 1lineBtrap.man | od -c
0000000 033 006 \n \n \n f o o ( 1 )
0000020 006 033 006 U
0000040 N I X P r o g r a m m e r ' s
0000060 M a n u a l
0000100 006 033 006 f o o ( 1
0000120 ) \r \n \n \n \n 033 E b a r
0000140 033 E b a z \r \n \n \n \n \n \n \n \n \n
0000160 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000220 \n \n \n \n \n \n \n \n \n \n \n \n \n \n P r
0000240 i n t e d 9 / 2 2 / 8 8
0000260
*
0000320 1 \r
0000340 \n \n \n 033 h 006 \n \n \n 033 006 006 \n 033 006 033
0000360 006 \0
0000361

We see ESC, ACK sequences much more now, and also some isolated ACKs. I
can't account for these. It's _possible_ they invalidate any
conclusions I might draw.

But apart from that nothing looks surprising. The man(7) header and
footer are there. The body of the man page contains "bar" in bold (I'm
already using "foo" for the page name, and didn't want to confuse my
weary eyes), then a space, then "baz" apparently in plain roman.

_That_ is exactly what we would expect.

Now for the moment of truth. Stick a `\c` after the "bar".

$ cat > 1lineBtrapcont.man
.TH foo 1
.B
bar\c
baz
.pl \(nlu
$ nroff -T300s -man 1lineBtrapcont.man | os#d-# -c
0000000 033 006 \n \n \n f o o ( 1 )
0000020 006 033 006 U
0000040 N I X P r o g r a m m e r ' s
0000060 M a n u a l
0000100 006 033 006 f o o ( 1
0000120 ) \r \n \n \n \n 033 E b a r
0000140 033 E b a z \r \n \n \n \n \n \n \n \n \n \n
0000160 \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
*
0000220 \n \n \n \n \n \n \n \n \n \n \n \n \n P r i
0000240 n t e d 9 / 2 2 / 8 8
0000260
*
0000320 1 \r \n
0000340 \n \n 033 h 006 \n \n \n 033 006 006 \n 033 006 033 006
0000360

Our output stream is one byte shorter; the space between "bar" and "baz"
has disappeared. "baz" remains roman.

This is enough to push me fairly hard toward a conclusion that it was
_not_ correct to use `itc` with groff man(7)'s input traps for macros
like `B` and `I`. The change was made about 5 years ago. That
apparently no one has complained about this in the 3+ years groff 1.22.4
has been out suggests that reverting it will not be too disruptive.

There is a practical reason to believe that too; if you need to join
together two different styles, man(7)'s font alternation macros (BI, BR,
IB, and IR [the other two would not see use in this scenario]) are the
much more obvious tool for the job, and are heavily attested in extant
man pages[1].

In other words, you wouldn't say:

.B foo\c
bar

You'd say:

.BR foo bar

There are other input trap users in groff man(7), though.

SM, SB: These are relatively rarely used (except in historical SunOS man
pages). They are indistinguishable from regular roman and bold text,
respectively, on nroff devices, namely the terminal emulators that most
people use to view man pages. However, their use with \c is more likely
because there are no alternation macros for type size.

I do remember one real-world use of \c with a small font macro, that
being the ksh93 man page[2]. Here's what it did.

.SS Field Splitting.
After parameter expansion and command substitution,
the results of substitutions are scanned for the field separator
characters (those found in
.SM
.B IFS\^\c
)
and split into distinct fields where such characters are found.

Here, the page author was clearly _not_ expecting `itc` semantics. We
can be confident that they wanted the closing parenthesis to match the
opening one in size. The 1/12th em horizontal motion looks like
detail-oriented typography to me, to keep the closing parenthesis at a
larger type size from crowding the smaller "S" preceding it.

So, groff 1.22.4 rendered their page worse--if someone looked at it with
a troff device instead of a terminal.

Consequently, the input traps for `SM` and `SB` should be switched back
to `it` as well.

That leaves `SH` and `SS`. I can't think of a reason not to revert
these to `it` as well. First, it is vanishingly rare for man page
authors to knowingly leverage the input traps of these macros in the
first place.[3] Second, it makes very little sense to apply any of the
other input-trap-using macros to a section or subsection title. The
typeface and size of these headings is already under the control of the
macro package (though groff man(7) exposes registers and strings to
parameterize them at rendering time to suit the taste of the reader).
Using `TP` with `SH` or `SS` would be a hideous structural violation and
should not be supported under any circumstance. Blissfully, I haven't
seen even docbook-to-man produce such an acrid emesis.

I therefore propose to proceed as follows.

1. Move all of B, I, SM, SB, SS, and SH back from `itc` to `it`.
2. Keep TP right where it is because `itc` does good there, as
   originally discussed in 2017. The fix simply got carried away due to
   lack of understanding (not least my own).[4]

Any objections?

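For anyone weighing in, here is a toy model of the difference at issue.
It is a hedged sketch in Python, not groff's implementation: the
function names are invented, only a trailing `\c` is recognized, and the
sole rule modeled is that `it` counts a line interrupted with `\c`
toward the trap while `itc` does not.

```python
def trap_springs_after(lines, count=1, use_itc=False):
    """Index of the input line after which the trap springs, or None.

    Toy rule: with `it` every text line is counted; with `itc` a line
    ending in \\c is not counted. Only a trailing \\c is recognized.
    """
    remaining = count
    for idx, line in enumerate(lines):
        if use_itc and line.endswith("\\c"):
            continue  # interrupted line does not count under `itc`
        remaining -= 1
        if remaining == 0:
            return idx
    return None


def style_after_bare_B(lines, use_itc):
    """Style each text line that follows a `.B` call with no arguments."""
    end = trap_springs_after(lines, 1, use_itc)
    styled = []
    for idx, line in enumerate(lines):
        text = line[:-2] if line.endswith("\\c") else line
        bold = end is None or idx <= end
        styled.append((text, "bold" if bold else "roman"))
    return styled


# The text lines of 1lineBtrapcont.man after `.B`
page = ["bar\\c", "baz"]
print(style_after_bare_B(page, use_itc=False))
# prints [('bar', 'bold'), ('baz', 'roman')]
print(style_after_bare_B(page, use_itc=True))
# prints [('bar', 'bold'), ('baz', 'bold')]
```

The `it` case lines up with what V7 nroff emitted above: "baz" stays
roman.
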
Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2017-05/msg00066.html
[2] https://lists.gnu.org/archive/html/groff/2017-04/msg00027.html
[3] I say "knowingly" because the traps are unconditional. What happens
    with these macros (except TP) is that if they are given arguments,
    those arguments are immediately placed on the output by the macro
    itself, springing the input trap. Thus, by the time the macro
    "returns", the input trap has already disappeared.
[4] https://lists.gnu.org/archive/html/groff/2017-05/msg00019.html