Update of bug #64279 (project groff):

Category: None => General
Severity: 3 - Normal => 1 - Wish
Item Group: None => Documentation
Status: None => Need Info
_______________________________________________________

Follow-up Comment #1:

Are you sure you're looking at a copy of roff(7) from the latest release candidate, 1.23.0.rc4? The page covers much more than just *roff history.

[...]

Below we present typographical concepts that form the background of all roff implementations, narrate the development history of some roff systems, detail the command pipeline managed by groff(1), survey the formatting language, suggest tips for editing roff input, and recommend further reading materials.

Concepts

roff input files contain text interspersed with instructions to control the formatter. Even in the absence of such instructions, a roff formatter still processes its input in several ways, by filling, hyphenating, breaking, and adjusting it, and supplementing it with inter-sentence space. These processes are basic to typesetting, and can be controlled at the input document's discretion.

When a device-independent roff formatter starts up, it obtains information about the device for which it is preparing output from the latter's description file (see groff_font(5)). An essential property is the length of the output line, such as "6.5 inches".

The formatter interprets plain text files employing the Unix line-ending convention. It reads input a character at a time, collecting words as it goes, and fits as many words together on an output line as it can--this is known as filling. To a roff system, a word is any sequence of one or more characters that aren't spaces, tabs, or newlines. The exceptions separate words.

A roff formatter attempts to detect boundaries between sentences, and supplies additional inter-sentence space between them. It flags certain characters (normally "!", "?", and ".") as potentially ending a sentence.
When the formatter encounters one of these end-of-sentence characters at the end of an input line, or one of them is followed by two (unescaped) spaces on the same input line, it appends an inter-word space followed by an inter-sentence space in the output.

The dummy character escape sequence \& can be used after an end-of-sentence character to defeat end-of-sentence detection on a per-instance basis. Normally, the occurrence of a visible non-end-of-sentence character (as opposed to a space or tab) immediately after an end-of-sentence character cancels detection of the end of a sentence. However, several characters are treated transparently after the occurrence of an end-of-sentence character. That is, a roff does not cancel end-of-sentence detection when it processes them. This is because such characters are often used as footnote markers or to close quotations and parentheticals. The default set is ", ', ), ], *, \[dg], \[dd], \[rq], and \[cq]. The last four are examples of special characters, escape sequences whose purpose is to obtain glyphs that are not easily typed at the keyboard, or which have special meaning to the formatter (like \).

When an output line is nearly full, it is uncommon for the next word collected from the input to exactly fill it--typically, there is room left over only for part of the next word. The process of splitting a word so that it appears partially on one line (with a hyphen to indicate to the reader that the word has been broken) with its remainder on the next is hyphenation. Hyphenation points can be manually specified; groff also uses a hyphenation algorithm and language-specific pattern files to decide which words can be hyphenated and where. Hyphenation does not always occur even when the hyphenation rules for a word allow it; it can be disabled, and when not disabled there are several parameters that can prevent it in certain circumstances.
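The end-of-sentence rules described above can be sketched in Python. This is a simplified model, not groff's implementation: it assumes words are separated by spaces only (no tabs), uses the default end-of-sentence and transparent sets quoted above, and treats any other token (including the dummy character \&) as cancelling detection. The names `ends_sentence` and `sentence_ends` are hypothetical.

```python
import re

# Characters that may end a sentence (the default set named above).
END_OF_SENTENCE = {".", "?", "!"}
# Characters that are "transparent" after an end-of-sentence character.
TRANSPARENT = {'"', "'", ")", "]", "*", r"\[dg]", r"\[dd]", r"\[rq]", r"\[cq]"}
# A token is a bracketed special character such as \[dg], another
# two-character escape such as \&, or a single ordinary character.
TOKEN = re.compile(r"\\\[[^\]]*\]|\\.|.")


def ends_sentence(word: str) -> bool:
    """True if `word` ends in an end-of-sentence character, possibly
    followed only by transparent characters."""
    for tok in reversed(TOKEN.findall(word)):
        if tok in TRANSPARENT:
            continue  # skip closing quotes, parentheses, footnote marks
        return tok in END_OF_SENTENCE  # \& lands here and defeats detection
    return False


def sentence_ends(line: str) -> list[str]:
    """Words on one input line after which inter-sentence space would be added."""
    # Trailing spaces are discarded; the capturing group keeps the gaps.
    parts = re.split(r"( +)", line.rstrip(" "))
    found = []
    for i in range(0, len(parts), 2):
        word = parts[i]
        if not word:
            continue  # leading spaces produce an empty first element
        at_line_end = i == len(parts) - 1
        gap = "" if at_line_end else parts[i + 1]
        if ends_sentence(word) and (at_line_end or len(gap) >= 2):
            found.append(word)
    return found


print(sentence_ends("It works.  Really?  Yes e.g. this"))  # ['works.', 'Really?']
print(sentence_ends(r"Use e.g.\&"))                         # []
```

Note how "e.g." followed by a single space is not a sentence end, while the same word at the end of the line is rescued only by the trailing \&.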
Once an output line is full, the next word (or remainder of a hyphenated one) is placed on a different output line; this is called a break. In this document and in roff discussions generally, a "break" if not further qualified always refers to the termination of an output line. When the formatter is filling text, it introduces breaks automatically to keep output lines from exceeding the configured line length. After an automatic break, a roff formatter adjusts the line if applicable (see below), and then resumes collecting and filling text on the next output line.

Sometimes, a line cannot be broken automatically. This usually does not happen with natural language text unless the output line length has been manipulated to be extremely short, but it can with specialized text like program source code. groff provides a means of telling the formatter where the line may be broken without hyphens. This is done with the non-printing break point escape sequence \:.

There are several ways to cause a break at a predictable location.

A blank input line not only causes a break, but by default it also outputs a one-line vertical space (effectively a blank output line). Macro packages may discourage or disable this "blank line method" of paragraphing in favor of their own macros.

A line that begins with one or more spaces causes a break. The spaces are output at the beginning of the next line without being adjusted (see below). Again, macro packages may provide other methods of producing indented paragraphs. Trailing spaces on text lines (see below) are discarded.

The end of input causes a break.

After the formatter performs an automatic break, it may then adjust the line, widening inter-word spaces until the text reaches the right margin. Extra spaces between words are preserved. Leading and trailing spaces are handled as noted above. Text can be aligned to the left or right margin only, or centered, using requests.
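Filling, automatic breaks, and adjustment as described above can be modeled with a short greedy sketch. It assumes a monospaced output device where every character is one unit wide (roughly like a terminal), and it ignores hyphenation, inter-sentence space, and leading spaces. Leftover space goes to the leftmost gaps; real formatters may distribute it differently. All function names are hypothetical.

```python
def fill(words: list[str], line_length: int) -> list[list[str]]:
    """Greedily pack words onto output lines (no hyphenation)."""
    lines: list[list[str]] = []
    current: list[str] = []
    width = 0
    for word in words:
        needed = width + 1 + len(word) if current else len(word)
        if current and needed > line_length:
            lines.append(current)  # automatic break
            current, width = [word], len(word)
        else:
            current.append(word)  # an over-long word is set on its own line
            width = needed
    if current:
        lines.append(current)  # the end of input also causes a break
    return lines


def adjust(words: list[str], line_length: int) -> str:
    """Widen inter-word spaces so the line reaches the right margin."""
    if len(words) == 1:
        return words[0]
    gaps = len(words) - 1
    extra = line_length - sum(len(w) for w in words) - gaps
    base, remainder = divmod(extra, gaps)
    out = words[0]
    for i, word in enumerate(words[1:]):
        # Simplification: leftmost gaps receive any leftover space.
        out += " " * (1 + base + (1 if i < remainder else 0)) + word
    return out


def format_paragraph(text: str, line_length: int) -> list[str]:
    lines = fill(text.split(), line_length)
    if not lines:
        return []
    # Adjust only lines ended by an automatic break, not the final one.
    body = [adjust(line, line_length) for line in lines[:-1]]
    return body + [" ".join(lines[-1])]


for line in format_paragraph("roff formatters fill and adjust text to the line length", 20):
    print(f"|{line}|")
# |roff formatters fill|
# |and  adjust  text to|
# |the line length|
```

The last line is left unadjusted because it was ended by the end of input rather than by an automatic break.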
A roff formatter translates horizontal tab characters, also called simply "tabs", in the input into movements to the next tab stop. These tab stops are by default located every half inch measured from the current position on the input line. With them, simple tables can be made. However, this method can be deceptive, as the appearance (and width) of the text in an editor and the results from the formatter can vary greatly, particularly when proportional typefaces are used. A tab character does not cause a break and therefore does not interrupt filling. The formatter provides facilities for sophisticated table composition; there are many details to track when using the "tab" and "field" low-level features, so most users turn to the tbl(1) preprocessor for table construction.

Requests and macros

A request is an instruction to the formatter that occurs after a control character, which is recognized at the beginning of an input line. The regular control character is a dot ".". Its counterpart, the no-break control character, a neutral apostrophe "'", suppresses the break implied by some requests. These characters were chosen because it is uncommon for lines of text in natural languages to begin with them. If you require a formatted period or apostrophe (closing single quotation mark) where the formatter is expecting a control character, prefix the dot or neutral apostrophe with the dummy character escape sequence, "\&".

An input line beginning with a control character is called a control line. Every line of input that is not a control line is a text line.

Requests often take arguments, words (separated from the request name and each other by spaces) that specify details of the action the formatter is expected to perform. If a request is meaningless without arguments, it is typically ignored. Of key importance are the requests that define macros. Macros are invoked like requests, enabling the request repertoire to be extended or overridden.
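The distinction between control lines and text lines can be sketched as a line classifier. This assumes the default control characters ("." and "'") and splits arguments on whitespace only; double-quoted arguments containing spaces, which real macro calls often use, are not handled. `InputLine` and `classify` are hypothetical names; "sp" and "br" are ordinary roff requests used as sample input.

```python
from dataclasses import dataclass, field

CONTROL = "."           # regular control character
NO_BREAK_CONTROL = "'"  # no-break control character (neutral apostrophe)


@dataclass
class InputLine:
    kind: str                 # "control" or "text"
    no_break: bool = False    # True if introduced by the no-break control character
    name: str = ""            # request or macro name, for control lines
    args: list[str] = field(default_factory=list)
    text: str = ""            # the line itself, for text lines


def classify(line: str) -> InputLine:
    """Sort one input line into a control line or a text line."""
    if line[:1] in (CONTROL, NO_BREAK_CONTROL):
        # Simplification: arguments are split on whitespace only.
        words = line[1:].split()
        return InputLine(
            kind="control",
            no_break=line[0] == NO_BREAK_CONTROL,
            name=words[0] if words else "",
            args=words[1:],
        )
    # Anything else, including a dot protected by a leading \&, is text.
    return InputLine(kind="text", text=line)


for line in [".sp 2v", "'br", r"\&. is printed as text", "Plain words."]:
    print(classify(line))
```

Because \& occupies the first position, the dot that follows it is no longer at the beginning of the line and so is not recognized as a control character.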
A macro can be thought of as an abbreviation you can define for a collection of control and text lines. When the macro is called by giving its name after a control character, it is replaced with what it stands for. The process of textual replacement is known as interpolation. Interpolations are handled as soon as they are recognized, and once performed, a roff formatter scans the replacement for further requests, macro calls, and escape sequences. In roff systems, the "de" request defines a macro.

Page geometry

roff systems format text under certain assumptions about the size of the output medium, or page. For the formatter to correctly break a line it is filling, it must know the line length, which it derives from the page width. For it to decide whether to write an output line to the current page or wait until the next one, it must know the page length.

A device's resolution converts practical units like inches or centimeters to basic units, a convenient length measure for the output device or file format. The formatter and output driver use basic units to reckon page measurements. The device description file defines its resolution and page dimensions (see groff_font(5)).

A page is a two-dimensional structure upon which a roff system imposes a rectangular coordinate system with its upper left corner as the origin. Coordinate values are in basic units and increase down and to the right. Useful ones are therefore always positive and within numeric ranges corresponding to the page boundaries.

While the formatter (and, later, output driver) is processing a page, it keeps track of its drawing position, which is the location at which the next glyph will be written, from which the next motion will be measured, or where a geometric primitive will commence rendering. Notionally, glyphs are drawn from the text baseline upward and to the right. (groff does not yet support right-to-left scripts.)
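The role of the device resolution can be illustrated by converting practical measurements to basic units. The sketch uses groff's scaling-unit letters i (inch), c (centimeter), p (point, 1/72 inch), P (pica, 1/6 inch), and u (basic unit), but the resolution of 72000 basic units per inch is a hypothetical value chosen for illustration; a real device takes it from its description file. Results are rounded to the nearest whole basic unit, which is an assumption of this sketch. `to_basic_units` is a hypothetical name.

```python
from fractions import Fraction

# Hypothetical device resolution: basic units per inch (assumption).
RESOLUTION = 72000

# Size of each scaling unit in inches: i = inch, c = centimeter,
# p = point (1/72 inch), P = pica (1/6 inch).
INCHES_PER_UNIT = {
    "i": Fraction(1),
    "c": Fraction(100, 254),
    "p": Fraction(1, 72),
    "P": Fraction(1, 6),
}


def to_basic_units(measure: str, resolution: int = RESOLUTION) -> int:
    """Convert a measurement such as "6.5i" to whole basic units."""
    number, unit = Fraction(measure[:-1]), measure[-1]
    if unit == "u":
        return round(number)  # already in basic units
    # Exact rational arithmetic, then round to the nearest basic unit.
    return round(number * INCHES_PER_UNIT[unit] * resolution)


print(to_basic_units("6.5i"))   # 468000
print(to_basic_units("2.54c"))  # 72000
print(to_basic_units("12p"))    # 12000
```

Using `Fraction` keeps conversions such as centimeters to inches exact until the final rounding step.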
The text baseline is a (usually invisible) line upon which the glyphs of a typeface are aligned. A glyph therefore "starts" at its bottom-left corner. If drawn at the origin, a typical letter glyph would lie partially or wholly off the page, depending on whether, like "g", it features a descender below the baseline.

Such a situation is nearly always undesirable. It is furthermore conventional not to write or draw at the extreme edges of the page. Therefore the initial drawing position of a roff formatter is not at the origin, but below and to the right of it. This rightward shift from the left edge is known as the page offset. (groff's terminal output devices have page offsets of zero.) The downward shift leaves room for a text output line.

Text is arranged on a one-dimensional lattice of text baselines from the top to the bottom of the page. Vertical spacing is the distance between adjacent text baselines. Typographic tradition sets this quantity to 120% of the type size. The initial vertical drawing position is one unit of vertical spacing below the page top. Typographers term this unit a vee.

Vertical spacing has an impact on page-breaking decisions. Generally, when a break occurs, the formatter moves the drawing position to the next text baseline automatically. If the formatter were already writing to the last line that would fit on the page, advancing by one vee would place the next text baseline off the page. Rather than let that happen, roff formatters instruct the output driver to eject the page, start a new one, and again set the drawing position to one vee below the page top; this is a page break.

When the last line of input text corresponds to the last output line that fits on the page, the break caused by the end of input will also break the page, producing a useless blank one.
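The page-breaking arithmetic above can be worked through in a small sketch. It assumes a hypothetical resolution of 72000 basic units per inch, an 11-inch page, 10-point type with the traditional 120% vee, no traps or margins, and that every output line ends with a break; `vee_for` and `count_pages` are hypothetical names.

```python
RESOLUTION = 72000  # hypothetical basic units per inch (assumption)


def vee_for(type_size: int) -> int:
    """Vertical spacing at the traditional 120% of the type size."""
    return type_size * 6 // 5


def count_pages(output_lines: int, page_length: int, vee: int) -> int:
    """Count pages used when every output line ends with a break."""
    pages = 1
    position = vee  # first baseline sits one vee below the page top
    for _ in range(output_lines):
        # The line is written at `position`; the break then advances one vee.
        position += vee
        if position > page_length:  # next baseline would fall off the page
            pages += 1              # page break: eject and start a new page
            position = vee
    return pages


page_length = 11 * RESOLUTION           # an 11-inch page
vee = vee_for(10 * RESOLUTION // 72)    # 10-point type gives a 12-point vee
print(page_length // vee)                # 66
print(count_pages(65, page_length, vee)) # 1
print(count_pages(66, page_length, vee)) # 2
```

With these numbers, 66 baselines fit on the page; a document of exactly 66 output lines fills the first page, and the break caused by the end of input then starts a second, blank page.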
Macro packages keep users from having to confront this difficulty by setting "traps"; moreover, all but the simplest page layouts tend to have headers and footers, or at least bear vertical margins larger than one vee.

Other language elements

Escape sequences start with the escape character, a backslash \, and are followed by at least one additional character. They can appear anywhere in the input.

With requests, the escape and control characters can be changed; further, escape sequence recognition can be turned off and back on.

Strings store character sequences. In groff, they can be parameterized as macros can.

Registers store numerical values, including measurements. The latter are generally in basic units; scaling units can be appended to numeric expressions to clarify their meaning when stored or interpolated. Some read-only predefined registers interpolate text.

Fonts are identified either by a name or by a mounting position (a non-negative number). Four styles are available on all devices. R is "roman": normal, upright text. B is bold, an upright typeface with a heavier weight. I is italic, a face that is oblique on typesetter output devices and usually underlined instead on terminal devices. BI is bold-italic, combining both of the foregoing style variations. Typesetting devices group these four styles into families of text fonts; they also typically offer one or more special fonts that provide unstyled glyphs; see groff_char(7).

groff supports named colors for glyph rendering and drawing of geometric primitives. Stroke and fill colors are distinct; the stroke color is used for glyphs.

Glyphs are visual representation forms of characters. In groff, the distinction between those two elements is not always obvious (and a full discussion is beyond our scope). In brief, "A" is a character when we consider it in the abstract: to make it a glyph, we must select a typeface with which to render it, and determine its type size and color.
The formatting process turns input characters into output glyphs. A few characters commonly seen on keyboards are treated specially by the roff language and may not look correct in output if used unthinkingly; they are the (double) quotation mark ("), the neutral apostrophe ('), the minus sign (-), the backslash (\), the caret or circumflex accent (^), the grave accent (`), and the tilde (~). All of these and more can be produced with special character escape sequences; see groff_char(7).

groff offers streams, identifiers for writable files, but for security reasons this feature is disabled by default.

A further few language elements arise as page layouts become more sophisticated and demanding. Environments collect formatting parameters like line length and typeface. A diversion stores formatted output for later use. A trap is a condition on the input or output, tested automatically by the formatter, that is associated with a macro, calling it when that condition is fulfilled.

Footnote support often exercises all three of the foregoing features. A simple implementation might work as follows. A pair of macros is defined: one starts a footnote and the other ends it. The author calls the first macro where a footnote marker is desired. The macro establishes a diversion so that the footnote text is collected at the place in the body text where its corresponding marker appears. An environment is created for the footnote so that it is set at a smaller typeface. The footnote text is formatted in the diversion using that environment, but it does not yet appear in the output. The document author calls the footnote end macro, which returns to the previous environment and ends the diversion. Later, after much more body text in the document, a trap, set a small distance above the page bottom, is sprung. The macro called by the trap draws a line across the page and emits the stored diversion. Thus, the footnote is rendered.

History

[...]
The "History" section is only about one third of the page by line count. Significant, but not even a majority of the content.

_______________________________________________________

Reply to this item at:

<https://savannah.gnu.org/bugs/?64279>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/