Update of bug #65108 (group groff):

Status: None => Need Info
Assigned to: None => barx
_______________________________________________________

Follow-up Comment #3:

Well, let's rough out a syntax that would work both for existing uses of `so` as _soelim_(1) understands it and for formatter syntax, which interprets `so` under slightly different rules (since it brings to bear the full power of the _troff_ lexical analyzer).

1. An argument of type `file` (as described in _groff_(7)) to a request consumes the rest of the line.
2. Unescaped spaces can therefore populate the argument.
3. A leading double quote is recognized and removed; a file name can thus start with spaces.
4. Any other/remaining double quotes are not treated specially.
5. Only the following escape sequences are recognized.
5a. `\ ` (backslash-space) represents a space. It is not necessary in _troff_, but is recognized to avoid disrupting existing _soelim_(1) usage.
5b. `\"` ends the file name argument and starts a comment.
5c. `\\` represents a (single) literal backslash. It is handled however the system's standard C library wants to handle it.
5d. `\[u00XX]` where each X is an uppercase hexadecimal digit encodes a character. Only codes in the range 00-1F and 80-FF are accepted in this syntax; those in the range 20-7F are ignored with a diagnostic advising the user to deobfuscate their inputs.

How are these handled today?
Specimen:

    $ cat EXPERIMENTS/extending-so-syntax.troff
    .so foo bar file.troff
    .so foo\ bar\ file.troff
    .so "foo bar file.troff
    .so foo.troff\" comment
    .so foo\u[0020]bar\u[0020]file.troff

_groff_ _soelim_:

    $ soelim EXPERIMENTS/extending-so-syntax.troff
    .lf 1 ./EXPERIMENTS/extending-so-syntax.troff
    soelim:./EXPERIMENTS/extending-so-syntax.troff:1: error: can't open 'foo': No such file or directory
    .so foo bar file.troff
    soelim:./EXPERIMENTS/extending-so-syntax.troff:2: error: can't open 'foo bar file.troff': No such file or directory
    .so foo\ bar\ file.troff
    soelim:./EXPERIMENTS/extending-so-syntax.troff:3: error: can't open '"foo': No such file or directory
    .so "foo bar file.troff
    .so foo.troff\" comment
    .so foo\u[0020]bar\u[0020]file.troff

DWB 3.3 _soelim_: ...never mind, DWB 3.3 _troff_ *has* no _soelim_. Wow! Learned something new today.

Heirloom Doctools _soelim_:

    $ ./bin/soelim ./extending-so-syntax.troff
    foo: No such file or directory
    .so foo bar file.troff
    foo\: No such file or directory
    .so foo\ bar\ file.troff
    "foo: No such file or directory
    .so "foo bar file.troff
    foo.troff\": No such file or directory
    .so foo.troff\" comment
    foo\u[0020]bar\u[0020]file.troff: No such file or directory
    .so foo\u[0020]bar\u[0020]file.troff

Uh, that's a little hard to interpret.

    $ printf '.so foo bar file.troff\n' | ./bin/soelim
    foo: No such file or directory
    .so foo bar file.troff

Interesting that it transforms the input in this way, by adding a newline where it decided to stop lexing the file name. I'm tempted to call that a bug.

    0000000 . s o f o o \n b a r f i l e
    0000020 . t r o f f \n
    0000026

The other cases:

    $ printf '.so foo\\ bar\\ file.troff\n' | ./bin/soelim
    foo\: No such file or directory
    .so foo\ bar\ file.troff
    $ printf '.so "foo bar file.troff\n' | ./bin/soelim
    "foo: No such file or directory
    .so "foo bar file.troff
    $ printf '.so "foo.troff\\"comment\n' | ./bin/soelim
    "foo.troff\"comment: No such file or directory
    .so "foo.troff\"comment
    $ printf '.so foo\u[0020]bar\u[0020]file.troff\n' | ./bin/soelim
    $ printf '.so foo\\u[0020]bar\\u[0020]file.troff\n' | ./bin/soelim
    foo\u[0020]bar\u[0020]file.troff: No such file or directory
    .so foo\u[0020]bar\u[0020]file.troff

There seem to be no further surprises here.

Unix V7 did not have _soelim_, either. Let me check Solaris 10.

    $ printf '.so foo\\ bar\\ file.troff\n' | soelim
    foo\: No such file or directory
    .so foo\ bar\ file.troff
    $ printf '.so "foo bar file.troff\n' |soelim
    "foo: No such file or directory
    .so "foo bar file.troff
    $ printf '.so "foo.troff\\"comment\n' |soelim
    "foo.troff\"comment: No such file or directory
    .so "foo.troff\"comment
    $ printf '.so foo\u[0020]bar\u[0020]file.troff\n' |soelim
    foo\u[0020]bar\u[0020]file.troff: No such file or directory
    .so foo\u[0020]bar\u[0020]file.troff

These look identical to Heirloom to me. I guess we know now where Heirloom got its inspiration, and perhaps even code, for _soelim_ from.

Since backslash-space is apparently a GNU extension in the first place, we might consider dropping it. It wasn't portable, and even the rest of the _groff_ ecosystem struggled to handle files with spaces in their names.

I further venture that this exact same syntax could be applied to the `sy`/`pso` problem in bug #62787 and to user-constructed diagnostic messages in bug #64071. I highly value the prospect of having a parallel syntax for these 3 issues if we can get it.

For _soelim_(1) itself I would further add that this program will continue to recognize only backslash as an escape character, but GNU _troff_ will recognize the configured escape character.
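To pin down rules 1 through 5, here is a rough sketch in Python. It is only an illustration, not code from _groff_ or _soelim_: the function name `parse_file_argument` is made up, and three behaviors the rules leave open are assumptions marked in the comments (spaces between the request name and the argument are skipped; an unrecognized escape sequence is passed through unchanged; a `\[u00XX]` escape in the 20-7F range contributes nothing to the file name). It hard-codes backslash as the escape character, as _soelim_ would, and maps accepted codes with `chr()`, leaving open how those become bytes in a file name.

```python
import re

# Rule 5d: "[u00XX]" with exactly two uppercase hexadecimal digits.
_UNICODE_ESCAPE = re.compile(r"\[u00([0-9A-F]{2})\]")


def parse_file_argument(text):
    """Return (file_name, diagnostics) for the text after a request name.

    `text` is the rest of the input line (rule 1), without the newline.
    Hypothetical helper; not part of groff or soelim.
    """
    diagnostics = []
    # Assumption: spaces separating the request name from its argument
    # are skipped before the argument proper begins.
    text = text.lstrip(" ")
    # Rule 3: one leading double quote is recognized and removed, so
    # the file name itself may start with spaces.
    if text.startswith('"'):
        text = text[1:]

    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            # Rules 2 and 4: spaces and later double quotes are ordinary.
            out.append(ch)
            i += 1
            continue
        rest = text[i + 1:]
        if rest.startswith(" "):        # 5a: backslash-space is a space
            out.append(" ")
            i += 2
        elif rest.startswith('"'):      # 5b: a comment starts; stop here
            break
        elif rest.startswith("\\"):     # 5c: one literal backslash
            out.append("\\")
            i += 2
        else:
            match = _UNICODE_ESCAPE.match(rest)
            if match is None:
                # Assumption: anything else is passed through unchanged.
                out.append(ch)
                i += 1
                continue
            code = int(match.group(1), 16)
            if 0x20 <= code <= 0x7F:
                # 5d: ignored with a diagnostic (assumed: adds nothing).
                diagnostics.append(
                    "code %02X is in 20-7F; write the character itself"
                    % code
                )
            else:
                out.append(chr(code))
            i += 1 + match.end()
    return "".join(out), diagnostics


if __name__ == "__main__":
    # The arguments from the specimen file, minus the leading ".so ".
    specimen = [
        "foo bar file.troff",
        r"foo\ bar\ file.troff",
        '"foo bar file.troff',
        r'foo.troff\" comment',
        r"foo\u[0020]bar\u[0020]file.troff",
    ]
    for argument in specimen:
        print(repr(argument), "->", parse_file_argument(argument))
```

Fed the specimen lines, this sketch yields `foo bar file.troff` for the first three and `foo.troff` for the fourth; the fifth passes through untouched because `\u[0020]` is not the `\[u00XX]` form that rule 5d describes.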
Thoughts?

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65108>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/