On Mon, Mar 02, 2020 at 06:25:47PM +0100, Ingo Schwarze wrote:
> Hi,
> 
> Marc Chantreux wrote on Mon, Mar 02, 2020 at 11:49:31AM +0100:
> 
> > coming from linux, i'm used to read manpages
> > in a vi buffer so i can do much more than
> > reading the content.
> 
> I have no idea what the "much more" refers to.  The main effect is to
> lose tagging functionality.  That is, compared to man(1) with the
> default pager, you cannot use the :t functionality to move to the
> place where a word is defined.
> 
> > i basically use
> > 
> >     :r !man ls
> >     or
> >     !!sh (when the line content is "man ls")
> 
> Yikes.  I had no idea what either of these are doing and had to
> try them out.  vi(1) contains so much bloat that is never really
> needed and doesn't belong in a text editor at all.
> 
> > under openbsd, it seems man doesn't if stdout
> > is a tty.
> 
> You mean, man(1) doesn't *imply col -b* if stdout is *not* a tty?
> 
> > i digged the man manual a little bit
> > without finding a solution so i worked the
> > things around:
> > 
> >     :r !man ls|fmt
> 
> As others said, the normal way to strip backspace formatting is
> 
>    $ man ls | col -b
> 
> It is documented in man(1) below the -c option and below EXAMPLES,
> and in mandoc(1) below "ASCII Output":
> 
>   https://man.openbsd.org/man.1#c
>   https://man.openbsd.org/man.1#EXAMPLES
>   https://man.openbsd.org/mandoc.1#ASCII_Output
> 
> You find such stuff as follows:
> 
>    $ man -k 'Xr=col(1)'
>   man(1) - display manual pages
>   mandoc(1) - format manual pages
> 
> The advantage of col(1) over fmt(1) is that it is guaranteed to not
> mess up line breaks.
> 
> > now i would like a poor version of keyword
> > feature in openbsd vi. the linux version
> > 
> >     map K yw:E /tmp/vi.keyword.$$p!!xargs man
> 
> You don't say what that is supposed to do.
> 
> Under Debian Jessie, if i start "vim", then type
> 
>   :map K yw:E /tmp/vi.keyword.$$p!!xargs man   <ENTER>
>   als   <ESC>
>   K   <ENTER>
> 
> i get:
> 
>   Error detected while processing function 
> netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore..netrw#Explore:
>   line   30:
>   E132: Function call depth is higher than 'maxfuncdepth'
>   Press ENTER or type command to continue
> 
> That doesn't seem useful to me.
> 
> I also tried the same with OpenBSD vi(1) and it resulted in
> 
>   Usage: e[dit][!] [+cmd] [file].
> 
> So, no idea what you are trying to do.
> 
> > becomes
> > 
> >     map K yw:E /tmp/vi.keyword.$$p!!xargs -IX sh -c 'man X|fmt'
> > 
> > which doesn't work as | separates 2 vi commands.
> > 
> > i really would like to know one or the two of these:
> > 
> > * is there a way to ask man to deliver pure (non-formatted) text ?
> 
> In 2014, i already wrote a patch to do that because the question
> came up repeatedly.  But demand wasn't that high after all, so i
> never committed it.  Now, i updated the patch to -current, see
> below.
> 
> On the one hand, the UNIX philosophy is to have each tool do one
> thing well, then use pipes to connect tools as needed.  Then again,
> arguably, you maybe shouldn't need another tool to just revert
> something that the first tool does.  Why would *not* adding backspace
> formatting require a pipe to another program, rather than not adding
> it in the first place?
> 
> Also, the patch that would be required is very small and straightforward.
> 
> So, what do people think?  Should i test the patch below in more
> depth and commit it?  Or do people consider this bloat?
> 
> Yours,
>   Ingo
> 
> 
> Index: main.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/main.c,v
> retrieving revision 1.247
> diff -u -p -r1.247 main.c
> --- main.c    24 Feb 2020 21:15:05 -0000      1.247
> +++ main.c    2 Mar 2020 17:06:53 -0000
> @@ -158,6 +158,7 @@ main(int argc, char *argv[])
>       /* Search options. */
>  
>       memset(&conf, 0, sizeof(conf));
> +     conf.output.backspace = -1;
>       conf_file = NULL;
>       defpaths = auxpaths = NULL;
>  
> @@ -373,6 +374,9 @@ main(int argc, char *argv[])
>                       return mandoc_msg_getrc();
>               }
>       }
> +
> +     if (conf.output.backspace == -1)
> +             conf.output.backspace = 1;
>  
>       /* Parse arguments. */
>  
> Index: manconf.h
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/manconf.h,v
> retrieving revision 1.7
> diff -u -p -r1.7 manconf.h
> --- manconf.h 22 Nov 2018 11:30:15 -0000      1.7
> +++ manconf.h 2 Mar 2020 17:06:54 -0000
> @@ -1,6 +1,6 @@
>  /*   $OpenBSD: manconf.h,v 1.7 2018/11/22 11:30:15 schwarze Exp $ */
>  /*
> - * Copyright (c) 2011, 2015, 2017, 2018 Ingo Schwarze <schwa...@openbsd.org>
> + * Copyright (c) 2011,2015,2017,2018,2020 Ingo Schwarze <schwa...@openbsd.org>
>   * Copyright (c) 2011 Kristaps Dzonsons <krist...@bsd.lv>
>   *
>   * Permission to use, copy, modify, and distribute this software for any
> @@ -33,6 +33,7 @@ struct      manoutput {
>       char     *tag;
>       size_t    indent;
>       size_t    width;
> +     int       backspace;
>       int       fragment;
>       int       mdoc;
>       int       noval;
> Index: mandoc.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/mandoc.1,v
> retrieving revision 1.166
> diff -u -p -r1.166 mandoc.1
> --- mandoc.1  15 Feb 2020 15:28:01 -0000      1.166
> +++ mandoc.1  2 Mar 2020 17:06:55 -0000
> @@ -284,6 +284,13 @@ The following
>  .Fl O
>  arguments are accepted:
>  .Bl -tag -width Ds
> +.It Cm format Ns = Ns Cm none
> +No back-spaced encoding is used, neither for bold face and underlining
> +nor for character overstrikes.  Only the last character of each
> +overstrike group is printed.
> +This has the same effect as piping the output through
> +.Xr col 1
> +.Fl bx .
>  .It Cm indent Ns = Ns Ar indent
>  The left margin for normal text is set to
>  .Ar indent
> Index: manpath.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/manpath.c,v
> retrieving revision 1.28
> diff -u -p -r1.28 manpath.c
> --- manpath.c 10 Feb 2020 14:42:03 -0000      1.28
> +++ manpath.c 2 Mar 2020 17:06:57 -0000
> @@ -1,6 +1,6 @@
>  /*   $OpenBSD: manpath.c,v 1.28 2020/02/10 14:42:03 schwarze Exp $ */
>  /*
> - * Copyright (c) 2011,2014,2015,2017-2019 Ingo Schwarze <schwa...@openbsd.org>
> + * Copyright (c) 2011,2014,2015,2017-2020 Ingo Schwarze <schwa...@openbsd.org>
>   * Copyright (c) 2011 Kristaps Dzonsons <krist...@bsd.lv>
>   *
>   * Permission to use, copy, modify, and distribute this software for any
> @@ -226,7 +226,7 @@ manconf_output(struct manoutput *conf, c
>  {
>       const char *const toks[] = {
>           "includes", "man", "paper", "style", "indent", "width",
> -         "tag", "fragment", "mdoc", "noval", "toc"
> +         "format", "tag", "fragment", "mdoc", "noval", "toc"
>       };
>       const size_t ntoks = sizeof(toks) / sizeof(toks[0]);
>  
> @@ -247,11 +247,11 @@ manconf_output(struct manoutput *conf, c
>               }
>       }
>  
> -     if (tok < 6 && *cp == '\0') {
> +     if (tok < 7 && *cp == '\0') {
>               mandoc_msg(MANDOCERR_BADVAL_MISS, 0, 0, "-O %s=?", toks[tok]);
>               return -1;
>       }
> -     if (tok > 6 && tok < ntoks && *cp != '\0') {
> +     if (tok > 7 && tok < ntoks && *cp != '\0') {
>               mandoc_msg(MANDOCERR_BADVAL, 0, 0, "-O %s=%s", toks[tok], cp);
>               return -1;
>       }
> @@ -308,22 +308,43 @@ manconf_output(struct manoutput *conf, c
>                   "-O width=%s is %s", cp, errstr);
>               return -1;
>       case 6:
> +             switch (conf->backspace) {
> +             case 0:
> +                     oldval = mandoc_strdup("none");
> +                     break;
> +             case 1:
> +                     oldval = mandoc_strdup("backspace");
> +                     break;
> +             default:
> +                     if (strcmp(cp, "none") == 0) {
> +                             conf->backspace = 0;
> +                             return 0;
> +                     } else if (strcmp(cp, "backspace") == 0) {
> +                             conf->backspace = 1;
> +                             return 0;
> +                     }
> +                     mandoc_msg(MANDOCERR_BADVAL_BAD, 0, 0,
> +                         "-O format=%s", cp);
> +                     return -1;
> +             }
> +             break;
> +     case 7:
>               if (conf->tag != NULL) {
>                       oldval = mandoc_strdup(conf->tag);
>                       break;
>               }
>               conf->tag = mandoc_strdup(cp);
>               return 0;
> -     case 7:
> +     case 8:
>               conf->fragment = 1;
>               return 0;
> -     case 8:
> +     case 9:
>               conf->mdoc = 1;
>               return 0;
> -     case 9:
> +     case 10:
>               conf->noval = 1;
>               return 0;
> -     case 10:
> +     case 11:
>               conf->toc = 1;
>               return 0;
>       default:
> Index: term.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/term.c,v
> retrieving revision 1.141
> diff -u -p -r1.141 term.c
> --- term.c    3 Jun 2019 20:23:39 -0000       1.141
> +++ term.c    2 Mar 2020 17:07:04 -0000
> @@ -1,7 +1,7 @@
>  /*   $OpenBSD: term.c,v 1.141 2019/06/03 20:23:39 schwarze Exp $ */
>  /*
>   * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <krist...@bsd.lv>
> - * Copyright (c) 2010-2019 Ingo Schwarze <schwa...@openbsd.org>
> + * Copyright (c) 2010-2020 Ingo Schwarze <schwa...@openbsd.org>
>   *
>   * Permission to use, copy, modify, and distribute this software for any
>   * purpose with or without fee is hereby granted, provided that the above
> @@ -795,24 +795,26 @@ encode1(struct termp *p, int c)
>       f = (c == ASCII_HYPH || c > 127 || isgraph(c)) ?
>           p->fontq[p->fonti] : TERMFONT_NONE;
>  
> -     if (p->flags & TERMP_BACKBEFORE) {
> -             if (p->tcol->buf[p->col - 1] == ' ' ||
> -                 p->tcol->buf[p->col - 1] == '\t')
> -                     p->col--;
> -             else
> +     if (p->backspace) {
> +             if (p->flags & TERMP_BACKBEFORE) {
> +                     if (p->tcol->buf[p->col - 1] == ' ' ||
> +                         p->tcol->buf[p->col - 1] == '\t')
> +                             p->col--;
> +                     else
> +                             p->tcol->buf[p->col++] = '\b';
> +                     p->flags &= ~TERMP_BACKBEFORE;
> +             }
> +             if (f == TERMFONT_UNDER || f == TERMFONT_BI) {
> +                     p->tcol->buf[p->col++] = '_';
>                       p->tcol->buf[p->col++] = '\b';
> -             p->flags &= ~TERMP_BACKBEFORE;
> -     }
> -     if (f == TERMFONT_UNDER || f == TERMFONT_BI) {
> -             p->tcol->buf[p->col++] = '_';
> -             p->tcol->buf[p->col++] = '\b';
> -     }
> -     if (f == TERMFONT_BOLD || f == TERMFONT_BI) {
> -             if (c == ASCII_HYPH)
> -                     p->tcol->buf[p->col++] = '-';
> -             else
> -                     p->tcol->buf[p->col++] = c;
> -             p->tcol->buf[p->col++] = '\b';
> +             }
> +             if (f == TERMFONT_BOLD || f == TERMFONT_BI) {
> +                     if (c == ASCII_HYPH)
> +                             p->tcol->buf[p->col++] = '-';
> +                     else
> +                             p->tcol->buf[p->col++] = c;
> +                     p->tcol->buf[p->col++] = '\b';
> +             }
>       }
>       if (p->tcol->lastcol <= p->col || (c != ' ' && c != ASCII_NBRSP))
>               p->tcol->buf[p->col] = c;
> @@ -839,7 +841,9 @@ encode(struct termp *p, const char *word
>               adjbuf(p->tcol, p->col + 2 + (sz * 5));
>  
>       for (i = 0; i < sz; i++) {
> -             if (ASCII_HYPH == word[i] ||
> +             if (p->backspace == 0 && word[i] == '\b')
> +                     p->col--;
> +             else if (word[i] == ASCII_HYPH ||
>                   isgraph((unsigned char)word[i]))
>                       encode1(p, word[i]);
>               else {
> Index: term.h
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/term.h,v
> retrieving revision 1.75
> diff -u -p -r1.75 term.h
> --- term.h    4 Jan 2019 03:20:44 -0000       1.75
> +++ term.h    2 Mar 2020 17:07:04 -0000
> @@ -1,7 +1,7 @@
>  /*   $OpenBSD: term.h,v 1.75 2019/01/04 03:20:44 schwarze Exp $ */
>  /*
>   * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons <krist...@bsd.lv>
> - * Copyright (c) 2011-2015, 2017, 2019 Ingo Schwarze <schwa...@openbsd.org>
> + * Copyright (c) 2011-2015,2017,2019,2020 Ingo Schwarze <schwa...@openbsd.org>
>   *
>   * Permission to use, copy, modify, and distribute this software for any
>   * purpose with or without fee is hereby granted, provided that the above
> @@ -73,6 +73,7 @@ struct      termp {
>       size_t            viscol;       /* Chars on current line. */
>       size_t            trailspace;   /* See term_flushln(). */
>       size_t            minbl;        /* Minimum blanks before next field. */
> +     int               backspace;    /* Use \b in output. */
>       int               synopsisonly; /* Print the synopsis only. */
>       int               mdocstyle;    /* Imitate mdoc(7) output. */
>       int               ti;           /* Temporary indent for one line. */
> Index: term_ascii.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/mandoc/term_ascii.c,v
> retrieving revision 1.50
> diff -u -p -r1.50 term_ascii.c
> --- term_ascii.c      19 Jul 2019 21:45:37 -0000      1.50
> +++ term_ascii.c      2 Mar 2020 17:07:04 -0000
> @@ -1,7 +1,7 @@
>  /*   $OpenBSD: term_ascii.c,v 1.50 2019/07/19 21:45:37 schwarze Exp $ */
>  /*
>   * Copyright (c) 2010, 2011 Kristaps Dzonsons <krist...@bsd.lv>
> - * Copyright (c) 2014, 2015, 2017, 2018 Ingo Schwarze <schwa...@openbsd.org>
> + * Copyright (c) 2014, 2015, 2017-2020 Ingo Schwarze <schwa...@openbsd.org>
>   *
>   * Permission to use, copy, modify, and distribute this software for any
>   * purpose with or without fee is hereby granted, provided that the above
> @@ -112,6 +112,8 @@ ascii_init(enum termenc enc, const struc
>               }
>       }
>  
> +     if (outopts->backspace)
> +             p->backspace = 1;
>       if (outopts->mdoc) {
>               p->mdocstyle = 1;
>               p->defindent = 5;
> 


Hi,

I wanted to do a similar thing (mandoc to UTF-8 text) and used col -b.
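
For reference, here is a minimal sketch of the overstrike handling that
col -b performs on this kind of output. It is not the col(1) source:
strip_overstrike() is a hypothetical helper, it works on bytes only (so
multibyte characters inside an overstrike group are not handled), and it
ignores tabs, carriage returns and escape sequences. It only shows the
rule that the last byte written to a position wins.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from col(1) or mandoc(1).
 * Copy `in` to `out`, resolving backspace overstrikes so that the
 * last byte written to each position survives:
 *   "_\bx" becomes "x" (underline), "x\bx" becomes "x" (bold).
 * `out` must have room for as many bytes as `in`, including the NUL.
 * Returns the length of the result.
 */
size_t
strip_overstrike(const char *in, char *out)
{
	size_t base = 0;	/* start of the current output line */
	size_t pos = 0;		/* cursor position in out */
	size_t end = 0;		/* number of bytes used in out */

	assert(in != NULL && out != NULL);
	for (; *in != '\0'; in++) {
		if (*in == '\b') {
			if (pos > base)
				pos--;		/* step back, keep the byte */
		} else if (*in == '\n') {
			out[end++] = '\n';	/* finish the line */
			base = pos = end;
		} else {
			out[pos++] = *in;	/* overwrite at the cursor */
			if (pos > end)
				end = pos;
		}
	}
	out[end] = '\0';
	return end;
}
```

Feeding it a bold heading encoded as "N\bNA\bAM\bME\bE" leaves "NAME" in
the output buffer.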

While processing the ASCII/UTF-8 output of mandoc(1) with col(1), I noticed
that col also filters out UTF-8 non-breaking spaces (\xc2\xa0), for example.

To reproduce more simply:

OpenBSD:
printf 'test\xc2\xa0.\n' | col -b | hexdump -C
00000000  74 65 73 74 2e 0a                                 |test..|

util-linux col uses wide-chars and outputs:
00000000  74 65 73 74 c2 a0 2e 0a                           |test....|
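
A rough model of why the bytes disappear, and of what a pass-through flag
changes, is sketched below. This is an assumption-laden illustration, not
the col(1) code: filter_bytes() and ascii_isgraph() are hypothetical
names, the character test is hard-wired to ASCII instead of calling
isgraph(3), and only space, tab and newline count as "recognized" here.
The real program also tracks columns and interprets backspace, carriage
return and escape sequences.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only stand-in for isgraph(3); an assumption of this sketch. */
static int
ascii_isgraph(unsigned char ch)
{
	return ch > 0x20 && ch < 0x7f;
}

/*
 * Hypothetical model of a byte-oriented filter loop.
 * Copy `len` bytes from `in` to `out`.  Bytes that are not ASCII
 * graphic characters are dropped unless they are space, tab or
 * newline, or unless `pass_unknown` is set.
 * `out` must have room for `len` bytes.  Returns the bytes written.
 */
size_t
filter_bytes(const unsigned char *in, size_t len, unsigned char *out,
    int pass_unknown)
{
	size_t i, n = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < len; i++) {
		unsigned char ch = in[i];

		if (!ascii_isgraph(ch)) {
			if (ch == ' ' || ch == '\t' || ch == '\n') {
				out[n++] = ch;	/* recognized: keep */
				continue;
			}
			if (!pass_unknown)
				continue;	/* default: drop silently */
		}
		out[n++] = ch;
	}
	return n;
}
```

With the eight input bytes from the printf above, the model writes six
bytes when pass_unknown is 0 and all eight when it is 1, which matches
the two hexdumps.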

On NetBSD and other col implementations there is a -p option.
The -p option is specified in an older standard:

Technical Standard Commands and Utilities Issue 4, Version 2
page 200
https://pubs.opengroup.org/onlinepubs/9695969399/toc.pdf

The patch below adds -p to col (taken from NetBSD):

diff --git usr.bin/col/col.1 usr.bin/col/col.1
index cceebfec5db..f0f1e906992 100644
--- usr.bin/col/col.1
+++ usr.bin/col/col.1
@@ -41,7 +41,7 @@
 .Nd filter reverse line feeds and backspaces from input
 .Sh SYNOPSIS
 .Nm col
-.Op Fl bfhx
+.Op Fl bfhpx
 .Op Fl l Ar num
 .Sh DESCRIPTION
 .Nm
@@ -73,6 +73,12 @@ Buffer at least
 .Ar num
 lines in memory.
 By default, 128 lines are buffered.
+.It Fl p
+Force unknown control sequences to be passed through unchanged.
+Normally,
+.Nm
+will filter out any control sequences from the input other than those
+recognized and interpreted by itself, which are listed below.
 .It Fl x
 Output multiple spaces instead of tabs.
 .El
diff --git usr.bin/col/col.c usr.bin/col/col.c
index c3c51b4c630..8b59a2f09cf 100644
--- usr.bin/col/col.c
+++ usr.bin/col/col.c
@@ -92,6 +92,7 @@ int   fine;                   /* if `fine' resolution (half lines) */
 int    max_bufd_lines;         /* max # of half lines to keep in memory */
 int    nblank_lines;           /* # blanks after last flushed line */
 int    no_backspaces;          /* if not to output any backspaces */
+int    pass_unknown_seqs;      /* whether to pass unknown control sequences */
 
 #define        PUTC(ch) \
        if (putchar(ch) == EOF) \
@@ -118,7 +119,8 @@ main(int argc, char *argv[])
 
        max_bufd_lines = 256;
        compress_spaces = 1;            /* compress spaces into tabs */
-       while ((opt = getopt(argc, argv, "bfhl:x")) != -1)
+       pass_unknown_seqs = 0;          /* remove unknown escape sequences */
+       while ((opt = getopt(argc, argv, "bfhl:px")) != -1)
                switch (opt) {
                case 'b':               /* do not output backspaces */
                        no_backspaces = 1;
@@ -136,6 +138,9 @@ main(int argc, char *argv[])
                                errx(1, "bad -l argument, %s: %s", errstr, 
                                        optarg);
                        break;
+               case 'p':               /* pass unknown control sequences */
+                       pass_unknown_seqs = 1;
+                       break;
                case 'x':               /* do not compress spaces into tabs */
                        compress_spaces = 0;
                        break;
@@ -212,7 +217,8 @@ main(int argc, char *argv[])
                                addto_lineno(&cur_line, -2);
                                continue;
                        }
-                       continue;
+                       if (!pass_unknown_seqs)
+                               continue;
                }
 
                /* Must stuff ch in a line - are we at the right one? */
@@ -534,7 +540,7 @@ xreallocarray(void *p, size_t n, size_t size)
 void
 usage(void)
 {
-       (void)fprintf(stderr, "usage: col [-bfhx] [-l num]\n");
+       (void)fprintf(stderr, "usage: col [-bfhpx] [-l num]\n");
        exit(1);
 }
 

-- 
Kind regards,
Hiltjo
