[dropped Peter from CC; I'm sure he'll find one copy of this enough]

Hi Michał,
At 2023-04-05T18:13:16+0000, Michał Kruszewski wrote:
> I have once evaluated ms, mm and mom. I have come from the Latex
> world after being sick of its bloat. I was looking for something
> simple. I know some differences between ms, mm and mom, but I do not
> really understand why people did not want to cooperate to create a
> single macro package and single program.

The reasons are mostly historical and organizational. Adding "me", "man", and mdoc to the above list, I'll offer a summary. Some of this is grounded in my absorption of historical documents and some is reckless speculation with a conspiratorial bent. I am a first-hand observer of practically nothing discussed here.

ms, written by Mike Lesk, came first, in Version 6 Unix (1975).

man(7) came next, in Version 7 Unix (1979). While man page documents date back all the way to First Edition Unix (1971) (and the basic format even farther than that, apparently, to Multics documentation), they did not get a set of macros designed for them until Version 7. What came before can be found in the archives of TUHS; whether that constitutes a "macro package" may be a matter of argument, and I haven't researched them myself.

Doug McIlroy designed and implemented man(7), and subscribes to this list. He is thus best positioned to address why man(7) was developed instead of routing man page composition through ms(7). But I can guess. ms was born with typesetting in mind, and man pages needed to be formattable on the Teletype machines used as Unix time-sharing terminals.

There is also the matter of execution speed. When a person at a terminal wants a man page, they want it fast. Today even the most complex man pages format in time intervals below the threshold of human perception, but that wasn't the case in the 1970s or for many years afterward (thus the now obsolescent phenomenon of "cat pages").
I suspect that there was also a general understanding that man pages could (and should) be written by people who otherwise did not concern themselves with the construction of typeset documents.

There were reasons, then, to construct a domain-specific macro package for man page documents. The package was clearly inspired by ms(7): many of the macro names are the same, and there were (originally) two macros, `LP` and `PP`, that did exactly the same thing (start a new paragraph) rather than differing in first-line indentation as ms's distinct macros did. `IP`, `RS`, and `RE` also "port" in the document writer's mind between ms(7) and man(7). Some macros share names but behave a little differently, as with `B`, `I`, and `SH`. I've been tripped up by those small differences occasionally.

mm(7) also shares many macros with ms(7), and in many ways matches ms(7)'s behavior more closely than man(7) does. But it also has a _lot_ more macros. Also, both ms(7) and mm(7) come from AT&T. So why do we have both?

I think the answer is corporate structure/politics. My inference from reading anecdotes and a variety of historical Unix docs, Brian Kernighan's memoir, and a scanned copy of a Bell Labs CSRC office personnel list (complete with telephone extensions!) is that the ms/mm bifurcation arose from the organizational distinction between CSRC (the Computing Science Research Center) and USG (the Unix Support Group).

At the corporate level, AT&T desperately wanted to make money selling Unix, and through most of the 1970s this was difficult because of an old U.S. antitrust case that forbade them from going commercial outside the telephone industry. However, AT&T lawyers repeatedly tested the waters by charging higher and higher convenience fees (as Ticketmaster/Live Nation has in more recent years) for Unix source licenses throughout the '70s, and apparently drew no official rebuke from the Federal Trade Commission.
By 1980 it was clear that a revanchist wave of social and economic conservatism was breaking over the country--Congress and Democratic President Jimmy Carter had already deregulated the airline industry two years earlier--and that laws against the exercise of monopoly power and restrictions on rentiers of every sort would be tumbling. Ronald Reagan (Carter's main electoral opponent that year) had a campaign team that made a deal with the new (and avowedly U.S.-hostile) autocratic, theocratic regime in Tehran to hold on to some American hostages from our embassy just a bit longer, to keep Carter from claiming a PR win in election season[1]--though that might not have been enough to keep him in office regardless, as Federal Reserve chairman Paul Volcker had been "hitting the economy over the head with a sledge hammer" to combat inflation.[2] But I digress.

My supposition is that forces within USG wanted to add features to ms(7) to support more of their own needs--though this may have been couched in terms of better supporting "users"--and that Research Unix didn't want to be bothered with such things, since their business was _research_. For them, ms(7) was a tool for writing journal papers. For USG, a macro package was a terrific tool for increasing the volume of cool-looking inter-office mail. Thus USG decided to "fork" ms(7) and support their own package, mm(7).

The names on the original mm documentation from 1980 are D. W. Smith, J. R. Mashey, E. C. Pariser, and N. W. Smith. I don't recall ever reading any interviews/emails with any of these people about mm. (People have sought out John Mashey to discuss his shell, the immediate predecessor of the Bourne shell.) It might be a good idea to get their perspectives documented, along with Mike Lesk's, before they pass away. (Eric Allman has told the story of me(7)'s origin at least once.)
It is not clear to me that AT&T commercial Unix even continued to ship ms(7), though there are certainly some people on the TUHS mailing list who could tell you. When we look at the features that USG actually added to distinguish mm(7) from ms(7), we see some conveniences and a lot of highly particular stuff for composing AT&T official documents (some of which groff mm supports, but much of which it doesn't, and the lack of which we don't seem to get complaints about). Personally I think a lot of this comes from executives insisting on "getting the icon in cornflower blue". This sort of micromanagement might explain why the DWB 3.3 mm manual credits no authors at all.

Meanwhile, as the 1980s dawned, the University of California at Berkeley was spinning up an operating systems research organization, the Computer Systems Research Group (CSRG), that would come to rival the Bell Labs CSRC in notoriety. (And for those for whom this is the sole criterion of merit, the CSRG was affiliated with at least one billionaire, Bill Joy, whereas as far as I know the CSRC is not.[3]) Ken Thompson had done a sabbatical at Berkeley and a thousand flowers bloomed from what he left behind. Initially there was much cross-pollination between Berkeley's CSRG and Bell's CSRC, but over time the relationship appears to have become strained, perhaps due to organizational issues and/or the non-stop ratcheting up of Unix license fees by AT&T. The latter's leadership appears to have been frustrated with Berkeley for distributing its own work gratis to anybody who already had an AT&T Unix license, instead of bottling up its nice new features and bug fixes so that AT&T could make more money selling those same people System III or System V or whatever.

The fruits of this fraught relationship can be seen in the fact that 4.2BSD (August 1983) shipped with some extensions to the ms(7) package. But Research Unix didn't take them and, as noted above, I'm not sure AT&T commercial Unix kept shipping ms(7) at all.
Into this collaborative void, a Berkeley undergraduate named Eric Allman came along and wrote a macro package that the local system administrators decided to name "me".

So AT&T and Berkeley Unices were fighting with each other all through the 1980s, which led to a legendary lawsuit[4] establishing that (1) people who shout the loudest about ownership and copyrights are often the poorest stewards of copyright and the lousiest keepers of records of ownership and (2) they would rather hold their counterparty to a non-disclosure agreement than permit fact (1) to come to public light.

This growing enmity was terrible for *roff development, tragically so because Kernighan's device-independent rewrite of it circa 1980 positioned it really well for the laser printer/desktop publishing revolution. But Kernighan either didn't have the power to free its source code or didn't want to die on that hill.

Meanwhile, a guy named Brian Reid wrote a typesetting system called Scribe that was proprietary but won a lot of admirers, including Richard Stallman, and by extension the rather frothy Texinfo community (witness recent messages from Eli Zaretskii on the help-texinfo mailing list).

And another guy in California named Donald Knuth created a phenomenal achievement of software engineering that produced more diagnostic output than a human could read in a lifetime, written employing literate programming techniques in a language that, in spite of some technical flaws, was much more readable than most of its competitors. But it was (more or less) freely licensed, and gratis, so a vibrant community rapidly sprang up around it; two of the first things this community did, as far as I can tell, were to get rid of literate programming and the readable programming language. For the win!

In 1989-1990, James Clark wrote and released groff. It was neither literately programmed nor implemented in a readable programming language, but it was assuredly free[5] and gratis.
But by this time much of troff's lunch had been eaten by TeX. groff was pretty successful, and many of the remaining users of Unix troff threw it over in favor of the GNU implementation. This was aided by groff's aggressive absorption/reimplementation of some Sun extensions to man(7)--since Sun workstations were phenomenally popular among the sorts of Unix nerds who spent their entire lives at universities--but more important in my opinion was groff's embrace of a great many extended features from sqtroff, a now nearly forgotten descendant of Unix troff produced by a Canadian company called SoftQuad.[6]

But not far into the 1990s, as groff's star rose in the limited skies of the Unix world, Microsoft made a play to kill Unix, while at the same time its brilliant, visionary founder with an unerring ability to predict the future[7] failed to anticipate the importance of the Web as an application (in the OSI model sense). So a whole lot of Unix developer energy was directed toward adoption and improvement of the Linux kernel and BSD systems as "back-end" platforms for "delivery of content", and into the development of skill with tools for the presentation of that content in a Web browser. Initially, this meant presenting HTML. And it's not easy to get a *roff to turn out HTML, in part because HTML's original design was pretty dire. <MARQUEE><BLINK>Worse is better!</BLINK></MARQUEE>

This narrow, obsessive focus on Linux and BSD solely as network switches and Web content delivery engines, rather than as development environments (for which Unix was originally purposed) or as platforms for knowledge workers in general (those benighted souls who devote most of their labor to the absorption, analysis, and composition of natural languages rather than machine-interpretable ones), sucked a lot of energy away from *roff, and much of what remained got funneled into TeX. With, perhaps, some of the consequences you complained about above.
But the main outcome in *roff macro package land was that groff maintained its reimplementations of AT&T's man(7), ms(7), and mm(7), and, thanks to the free licensing, adopted BSD's me(7) and mdoc(7)...oh, I forgot to cover mdoc(7).

Okay, well, that project documents its own history amply,[8] but in a nutshell, the Berkeley CSRG decided that man(7) sucked, mainly, I think, because it lacked semantic tagging. Importantly, they realized this way before Tim Berners-Lee did. Cynthia Livingston took two cracks at solving the problem of writing a macro package for composition of semantically oriented man pages. (I could be mistaken here; perhaps someone else wrote the first one, now called "old mdoc".) mdoc(7) caught on like wildfire in the BSDs and not so much anywhere else, though you will find the occasional champion of it elsewhere.

In an echo of your original (implied) question, the historical reasons for the multiplicity of BSDs are also worth pursuing. And since a huge email archive of Theo de Raadt's fight with Charles Hannum is available on the Web, you'll find that archive more authoritative, more entertaining, and even less edifying of human nature than my account here.

But we can end our story on a positive note! In 2002, Peter Schaffter looked at the state of groff, decided that it was too damned hard to learn (it may be) and that the existing macro packages had been stagnating for years (they certainly had), and determined to solve both problems at once by writing a new macro package that was sui generis and went to great lengths not to document itself in terms of the underlying formatting engine. Peter has told me (correct me if I misstate this) that he wishes he'd had my improvements to groff's documentation when he first encountered it, but after observing how much work those improvements have required, he'd still have taken the route he did. Certainly I find the examples of mom's output that we ship with groff to be impressive. And I think other people will too, if they just look.
So check out $DESTDIR/share/doc/groff/examples/mom sometime. (s/groff/groff-base/ on Debian systems)

> The *roff community is rather small. Dividing it by providing
> multiple packages doing more or less the same, or implementing
> multiple programs (groff/pdfroof for example) is not probably the
> right move.

pdfroff is a wrapper, but as I noted recently regarding its (lack of) support for groff's "-a" option, it is not a perfect one. Then, too, Ingo Schwarze has opined about the dubious wisdom of having wrappers around wrappers. groff(1) is itself a wrapper.

> I do not want to learn and use ms, mm or mom depending on the type of
> the document I write. My impression was that ms is the most minimal
> and the simplest.

Of those three, yes--ms is the simplest.

> I can easily extend ms by defining my own macros or by writing
> Perl/Python scripts.

As far as I know, that's true of all three (plus me(7)).

> Current pdf support in ms is far from being perfect.

Yes, regrettably. I _really_ want to improve this in the groff 1.24 cycle.

> However, I hope that one day it will be obvious that groff + ms is the
> way to go.

I don't have any ambition to bring groff ms into feature parity with mom. I think that would be the sort of waste of effort you lament. My objectives for the "historical" macro packages are to:

(1) correctly render correctly composed historical documents using these packages (except where we disclaim interest, as with the proprietary markings and corporate logo support in mm);

(2) support rendering to hyperlink-capable output formats (HTML and PDF) with reasonable, basic support for hyperlink features (so, actual hypertext in both formats, and a well-realized contents pane for bookmarks in PDF); and

(3) support composition of new documents that don't demand features not covered by the above points.

For example, for me it's an anti-goal to add macros for drop caps to ms, mm, or me(7).
Regards,
Branden

[1] https://nymag.com/intelligencer/2023/03/did-reagan-teams-iran-hostage-sabotage-defeat-jimmy-carter.html
[2] https://www.mercatus.org/macro-musings/paul-krugman-year-inflation-infamy
[3] Because he got rich at Sun Microsystems, Joy is frequently credited with innovations he didn't personally make. See, e.g., "Relationship with vi", <https://invisible-island.net/ncurses/ncurses.faq.html>.
[4] USL v. BSDI
[5] unless you have a BSD brain that regards copyleft as a parasitic virus that frustrates your otherwise inevitable status as the next Bill Joy
[6] https://lists.gnu.org/archive/html/groff/2022-12/msg00097.html
[7] https://en.wikipedia.org/wiki/The_Road_Ahead_(Gates_book)
[8] https://man.openbsd.org/mdoc.7