Hi, I think I should probably respond in the channel so the other folks in the z/OS Open Tools community can see. I think I may have botched this a little. Do you typically respond through email or do you use the web interface? I'm probably not doing this quite right on my end... I am not subscribed yet, but I can subscribe. Is that a better way to respond?
thanks, mike

On Fri, Mar 31, 2023 at 8:57 AM G. Branden Robinson <g.branden.robin...@gmail.com> wrote:

> [let me know if you're subscribed to the list or if you'd prefer not to
> be CCed]
>
> [also, if you want to break any of the several subjects arising in this
> message into a separate thread, please feel free]
>
> Hi Mike,
>
> At 2023-03-31T07:29:16-0700, Mike Fulton wrote:
> > Over the last year, we have been working hard in the z/OS Open Tools
> > community (https://zosopentools.github.io/meta/#/) to not only port
> > the fundamental tools to z/OS, but also to do it completely in the
> > open.
>
> This is good news! Knowing that you're a software developer might also
> make communications easier. :)
>
> > We create one 'port' repo for each Open Source package and the repo
> > contains information on compiler options, dependencies, and so forth
> > so that anyone can (relatively easily) build the software.
>
> > We also have a special repo (meta) that has a rudimentary package
> > manager and build tool that we use (e.g. _zopen install_ to install
> > binaries, _zopen build_ to build from source, etc.).
>
> Much as with GNU/Linux distributions; this is a pleasure to hear.
>
> As a groff developer, I'm interested in minimizing the number of patches
> you have to carry "downstream" to support groff.
>
> I assume the change here:
>
> https://github.com/ZOSOpenTools/groffport/blob/main/patches/makevarescape.sed.patch
>
> is due to a limitation of the system's sed(1)?
>
> If the problem is the '\+' part of the pattern, I see that POSIX says
> that the interpretation of that is "undefined", though the
> latest draft of Issue 8 (just out in the past 24 hours or so) says that
> "a future version of this standard may require "\?", "\+", and "\|" to
> behave as described for the ERE special characters '?', '+', and '|',
> respectively." (IEEE P1003.1™-202x/D3, March 2023, p. 181).
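A minimal sketch of the portability point, assuming a POSIX sed: in a basic regular expression, `//*` (one slash, then zero or more slashes) matches the same strings as the GNU-style `/\+`, without relying on the escaped plus sign. The function name and the sample paths are hypothetical. This sketch emits a single backslash before the colon so the output is easy to read; the expression in the patch emits two.

```shell
# Hypothetical helper: append "\:" after each run of slashes that
# follows a non-space character, using only portable BRE syntax.
add_break_hints() {
    sed 's|[^ ]//*|&\\:|g'
}

# Hypothetical sample input with a repeated slash.
printf '%s\n' '/usr//share/man' | add_break_hints
```

A leading slash, or a slash that follows a space, is left alone because the pattern requires a non-space character immediately before the slashes.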
>
> A workaround would be:
>
> -s|[^ ]/\+|&\\\\:|g
> +s|[^ ]//*|&\\\\:|g
>
> If you also want to steal a slight improvement from groff 1.23, you can
> do this instead:
>
> -s|[^ ]/\+|&\\\\:|g
> +s|[^ ]//*|&\\\\:\\\\%|g
>
> > We have indeed moved to a 'UTF-8 first' model, which for the most part
> > is a 'ISO8859-1 first' model
>
> Interestingly, this meshes closely with groff's assumptions. Due to its
> chronological origins ca. 1990, it does not accept UTF-8 input, but it
> is aware of UTF-8 and can produce it as output. The formatter, troff(1),
> accepts ISO Latin-1 input, except on systems where the C preprocessor
> macro "IS_EBCDIC_HOST" evaluates true; it then assumes that its input is
> encoded using code page 1047.
>
> I reckon you've already dealt with this if necessary, and ensured that
> your groff 1.22.4 build does not define that symbol.
>
> Is code page 1047 deprecated or obsolescent on z/OS? If groff dropped
> support for it, do you suspect any z/OS users would be inconvenienced?
>
> > and we have a special OS library that takes care of edge case
> > conversions to EBCDIC (and provides a couple functions that are
> > missing). This is also Open Source (zoslib).
>
> This is really good stuff to hear about; thanks for bringing this
> initiative to my attention.
>
> > We have about 80 packages we are porting / have ported. Some are very
> > far along like gnu make and Perl with many fixes upstreamed. Some are
> > just barely building - htop is probably a good example of one we have
> > just started on.
>
> I'm glad groff is a member of the first 100! :D
>
> > I am also not sure if we want to work in UTF-8 or in ISO-8859-1. My
> > goal would be UTF-8 across the board, but I expect there are things we
> > still need to fix to get there. Our vim port seems to work well with
> > UTF-8 but I'll be honest that the testing of that is sparse still.
>
> My suggestion would be to back the UTF-8 horse.
> groff already has machinery in place for accommodating input in UTF-8
> via the preconv(1) preprocessor.
>
> If there is no longer an audience for code page 1047, several aspects of
> groff could be simplified, and it might make the transition of GNU
> troff's internal type to int32_t easier. (I started down this road once
> before.)
>
> > With all that background, I'm wondering if 'both' is the right answer?
>
> I don't feel qualified to answer this question in general; for groff,
> it's a pickle because the original implementer (James Clark) used many
> C0 and C1 control code points for internal purposes, to encode "node
> types" that could be encountered internally by the formatter when
> processing diversions (a Unix nroff/troff feature that usually only
> authors of macro packages mess with).
>
> You can see these assignments in the "input.h" header file.
>
> https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.h
>
> Use of these codes for internal purposes isn't necessarily incompatible
> with UTF-8 input; GNU troff already rejects them upon input, and almost
> none of them are meaningful for a "plain text" document that is going to
> achieve format control mostly via roff language features rather than
> control characters. Input processing could be made more sophisticated
> (and more stateful when reading the input byte stream to keep track of
> UTF-8 sequences).
>
> > Would others also find it valuable to be able to have the mathematical
> > angle brackets in UTF-8 be transliterated to angle brackets in
> > ISO8859-1?
>
> Unless you mean degradation to basic Latin less than and greater than
> signs, U+003C and U+003E, then I don't think there are any valid
> transliteration targets in ISO Latin-1. The "left-" and "right-pointing
> double angle quotation mark"s (U+00AB and U+00BB) are indeed visually
> similar but semantically pretty distinct. I don't think I'd want to
> impose such a fallback in general.
> (There are multiple ways groff users could provide fallbacks for
> themselves.)
>
> > If so, perhaps a 'starter fix' would be if I worked with the libiconv
> > folks to see if that can be added (I opened a similar question in the
> > libiconv channel since honestly I'm not sure the best way to fix
> > this).
>
> You can pursue both lines of attack independently, especially if the
> iconv developers have a good reason for not performing this fallback
> already.
>
> I'm not sure groff has a good reason for not performing this fallback.
> At this point I think I will tap Dave Kemper, another groff developer
> who has a fairly strong interest in the fallback issue.
>
> > In parallel, I think I need to understand how I could change the way I
> > build man so that it operates in UTF-8 mode.
>
> I think that is a good idea. It looks like your man is man-db, which is
> really good news because that's developed by Colin Watson who has also
> been groff's package maintainer for Debian for a long time.
>
> Probably the first thing to do is make sure we know what groff is
> producing in your environment.
>
> Here is how to (mostly) bypass man(1) and render the groff(1) man page
> much as man(1) itself would do.
>
> $ zcat $(man -w groff) | groff -man -Tutf8 | less -R
>
> (If less(1) is not available, try "more", "more -b", or this:
>
> $ zcat $(man -w groff) | groff -man -Tutf8 -P -c | ul | more
>
> FYI: The version of "more" on my Debian system breaks lines at incorrect
> places when given the above.)
>
> Here, we are using man(1) only as a librarian, to tell us where the
> groff(1) man page is. We are directing formatting ourselves.
>
> If this looks fine and you get the angle brackets you're expecting, then
> something is running in the pipeline man-db man(1) constructs, _after_
> grotty(1) produces the output, and doing violence to the angle brackets;
> that would be where the bug lies.
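Whether the rendered text still carries the expected brackets can also be checked at the byte level rather than by eye. The sketch below assumes that "mathematical angle brackets" means U+27E8 and U+27E9, whose UTF-8 encodings are E2 9F A8 and E2 9F A9; the function name and the sample strings are hypothetical, and which code points a given groff build actually emits is not confirmed here.

```shell
# UTF-8 encodings of U+27E8 and U+27E9 (an assumption about which
# brackets are expected), written as octal escapes so the script
# itself stays pure ASCII.
left_bracket=$(printf '\342\237\250')
right_bracket=$(printf '\342\237\251')

# Hypothetical helper: succeed only if stdin contains both brackets.
# LC_ALL=C makes grep compare raw bytes regardless of the locale.
has_utf8_angle_brackets() {
    text=$(cat)
    printf '%s\n' "$text" | LC_ALL=C grep -qF "$left_bracket" &&
        printf '%s\n' "$text" | LC_ALL=C grep -qF "$right_bracket"
}

# Hypothetical sample standing in for a line of rendered output.
sample=$(printf 'GNU \342\237\250http://www.gnu.org\342\237\251.')
if printf '%s\n' "$sample" | has_utf8_angle_brackets; then
    echo "brackets intact"
else
    echo "brackets missing or replaced"
fi
```

Running the same check on the output of each stage of a pipeline is one way to find the stage that alters the brackets.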
>
> To cut out yet another source of trouble, if your terminal emulator has
> more than 765 lines of scrollback buffer, you can omit paging the
> groff(1) document entirely.
>
> But if it _doesn't_ look fine, then we need to find out why.
>
> I would next inspect groff's device-independent output (which I call
> "grout" for short) to see what's being handed to groff's terminal output
> driver (grotty(1)).
>
> $ zcat $(man -w groff) | groff -man -Tutf8 -Z | less
>
> Around line 459 you should see a sequence of lines like this.
>
> tGNU
> wh24
> Cla
> h24
> thttp://www.gnu.org
> Cra
> h24
> t.
>
> Those "Cla" and "Cra" lines are key. If they are absent, then you
> have almost certainly found a bug in groff.
>
> Another thing I would do is to view the groff_char(7) man page.
>
> $ man groff_char
>
> On my system, code point coverage is complete except for three
> characters.
>
> troff: <standard input>:1051: warning: can't find special character 'bs'
> troff: <standard input>:1192: warning: can't find special character 'radicalex'
> troff: <standard input>:1195: warning: can't find special character 'sqrtex'
>
> These problems are expected everywhere[1] for historical and technical
> reasons I won't get into unless asked.
>
> Let me know what you find and we'll see if we can narrow this down.
>
> Regards,
> Branden
>
> [1] the first everywhere, the last two on all terminal devices
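The check on the "Cla" and "Cra" lines described above can be scripted roughly as follows. This is only a sketch: the function name is hypothetical, and the sample text is modeled on the excerpt quoted in the message rather than taken from a real run.

```shell
# Hypothetical helper: read groff intermediate output on stdin and
# succeed only if both special-character lines ("Cla" and "Cra")
# appear somewhere in it.
has_angle_bracket_commands() {
    grout=$(cat)
    printf '%s\n' "$grout" | grep -q '^Cla$' &&
        printf '%s\n' "$grout" | grep -q '^Cra$'
}

# Hypothetical excerpt modeled on the lines quoted in the message.
sample_grout='tGNU
wh24
Cla
h24
thttp://www.gnu.org
Cra
h24
t.'

if printf '%s\n' "$sample_grout" | has_angle_bracket_commands; then
    echo "angle bracket commands present"
else
    echo "angle bracket commands missing"
fi
```

Against a real page, the same function could be fed from `zcat $(man -w groff) | groff -man -Tutf8 -Z`, where `-Z` tells groff to print its intermediate output instead of running the output driver.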