Re: WAYTO: indexed man pages

2021-06-02 Thread James K. Lowden
On Mon, 31 May 2021 14:48:42 -0400
Douglas McIlroy  wrote:

> > Now that I think of it, the current system is makewhatis/apropos. I
> > often get a ton of noise entries, usually perl modules, but maybe
> > there's a way around that.
> 
> apropos whatever | grep -v  '(3p'

That's what we have, but is it the best we can do?  I bet we agree the
answer to that question is No.  

Try for example 

apropos color

On my system, with xlib and some perl stuff installed, that yields 178
entries. I can winnow it down to 38 with:

man -k color | grep -v ^'[xX]' | grep -v :: 

including ppmquant (for X) and ctail (for dot) and 3 items related to
what I want, dircolors.  That's a 2% signal/noise ratio.  Ironically,
most of the apropos output is not apropos to the input.  
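
The grep filters above can be generalised into a small section filter.
What follows is only a sketch: keep_sections is a hypothetical helper
name, and it assumes apropos prints "name (section) - description" (or
"name(section) - description") on each line.

```shell
# keep_sections: keep only apropos-style lines whose section is wanted.
# Reads lines on stdin; $1 is a space-separated list of sections
# (hypothetical default: "1 8").
keep_sections() {
    sections=${1:-"1 8"}
    awk -v want="$sections" '
        BEGIN {
            n = split(want, s, " ")
            for (i = 1; i <= n; i++) ok[s[i]] = 1
        }
        {
            # The section is the text inside the first parenthesised group.
            if (match($0, /\([^)]*\)/)) {
                sec = substr($0, RSTART + 1, RLENGTH - 2)
                if (sec in ok) print
            }
        }'
}
```

Typical use would be "apropos color | keep_sections '1 5'".  Where the
-s option of apropos is available, it restricts the search to a section
without a filter, but either way the query still carries no context
beyond the section number.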

The problem, I assert, is lack of context.  apropos has no way to know
I'm interested in colors for filenames in ls.  

The first order of business IMO is to navigate large documents by
indexed keyword, something the info reader does tolerably well.  

More sophisticated -- and requiring no new input -- would be an ability
to "zoom out" of the context of one manpage to show related pages that
reference the term. If I'm reading the ls(1) page and don't find what I
want, what's "in the neighborhood"?  Well, dpkg tells us that ls(1) is
part of GNU coreutils. AFAIK, the man system offers no way to ask "what
coreutils manpages reference color?"  A further outer ring of
association can also be derived from the packaging system, namely
packages that depend on the package, or that it depends on, or that are
recommended, subject to the constraint (or not) that they're
installed.  
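
The innermost ring, "pages from the same package that mention the
term", can be approximated today on a Debian-like system.  This is a
rough sketch, not a proposal for the real interface: man_pages_matching
and pkg_man_grep are hypothetical names, and the wrapper assumes that
dpkg -S and dpkg -L are available and that pages live under a
.../man/manN/ directory.

```shell
# man_pages_matching: from the paths listed on stdin, print the man
# pages whose text mentions $1 (case-insensitive).  Paths that do not
# look like installed man pages are ignored.
man_pages_matching() {
    term=$1
    while IFS= read -r path; do
        case $path in
            */man/man*/*) ;;          # looks like an installed man page
            *) continue ;;
        esac
        [ -f "$path" ] || continue
        # Installed pages are often gzip-compressed.
        case $path in
            *.gz) gzip -cd -- "$path" ;;
            *)    cat -- "$path" ;;
        esac | grep -qi -e "$term" && printf '%s\n' "$path"
    done
}

# pkg_man_grep: search the man pages of the package owning file $1
# for term $2 (assumes dpkg; hypothetical wrapper).
pkg_man_grep() {
    file=$1
    term=$2
    pkg=$(dpkg -S "$file" | head -n 1 | cut -d: -f1)
    dpkg -L "$pkg" | man_pages_matching "$term"
}
```

For example, "pkg_man_grep /bin/ls color" would list the coreutils
pages that mention color.  The outer rings (dependencies, recommended
packages) would feed more file lists into the same filter.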

Another basis for "zoom out" could be the kind of work that made Google
rich: citation counts.  If ls(1) references certain documents or
environment variables, what other documents reference those same
documents/variables?  If many do, that's information.  It's not rocket
surgery, either; it's basically what cscope has been doing for 30
years for function calls. 
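
As a minimal illustration of the citation-count idea, the sketch below
tallies how often each page is the target of a cross-reference.  It
assumes mdoc sources in which references appear as ".Xr name section"
at the start of a line; xref_counts is a hypothetical name, and
references embedded in other macro lines are not counted.

```shell
# xref_counts: count inbound ".Xr name section" references across the
# mdoc source files given as arguments; most-cited pages come first.
xref_counts() {
    awk '$1 == ".Xr" && NF >= 3 { count[$2 "(" $3 ")"]++ }
         END { for (p in count) print count[p], p }' "$@" |
    sort -k1,1nr -k2
}
```

Run over a whole tree of sources, the top of that list is a crude
measure of which pages the others treat as important.  The same tally
over environment variables instead of cross-references would answer
the "who else mentions this variable" question.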

ISTM that we rely too heavily on general tools like regular
expressions, and don't exploit information already present in our
systems.  We're training ourselves to create "google-able" terms --
like go-lang for the Go language -- because general purpose search
engines lack context specifiers.  

We also don't leverage the documentation writer's expertise
and *time*: any effort to add index terms to documentation is nothing
next to the thousands or millions of times that page will be read and
searched.  That is why I want to provide authors with macros for index
terms: to let them express their expertise for the benefit of all.  

I don't see how the value of a subject index can be doubted, given that
every large body of information is indexed, be it the Encyclopedia
Britannica or your local library. Nor is technical feasibility a high
bar.  The real obstacle, as ever, is people.  

Where there's a will there's a way.  But: is there a will? 

--jkl



Re: WAYTO: indexed man pages

2021-06-02 Thread Ingo Schwarze
Hi James,

James K. Lowden wrote on Sat, May 29, 2021 at 06:01:52PM -0400:

> how indexes could be implemented in

That problem was solved in 2016, and five years of experience have
shown the solution works well.

  https://man.openbsd.org/mandoc.db

> then man(1) might be adapted to use it,

It already is.  Try

  man -O tag=set ksh

on a system running a reasonably recent (less than two years old)
version of the mandoc implementation of man(1).

By the way, the same tags that man(1) reaches with -O tag
can be navigated to on the web as well:

  https://man.openbsd.org/ksh#set~2

Imagine the noise trying to search by typing "/set" in less(1)...

> and then man page authors would have cause to designate indexed terms.

Five years of experience show there is surprisingly little need for
that.  The mandoc implementation of deep linking to anchors fully
automated the task without requiring any additional markup.  In
practice, it just works without needing to bother the author: regular
semantic markup is sufficient, without adding manual anchors, in the
vast majority of cases.

All the same, the mdoc(7) language did recently start supporting
manual tagging for the very small fraction of situations where it
helps, and for the relatively few authors who are willing to go to
such lengths:

  https://man.openbsd.org/mdoc#Tg~2

> The best "find term by index" feature in documentation viewing software
> isn't very good: in the info viewer, you can type " i " and a string,
> and enter.   It also supports tab-completion on indexed terms, which is
> as close as info ever comes to "very cool".  

That's not the state of the art.
Mandoc has been doing semantic searching since about 2014:

  https://man.openbsd.org/apropos#Macro_Keys
  https://man.openbsd.org/apropos#EXAMPLES

I often use queries akin to

  man -ks 2 Vt=timespec

when i wonder which system calls deal with the "struct timespec"
data type.  And the like - you get the picture.
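
Queries of that kind are easy to wrap for repeated use.  The two
functions below are only a sketch with hypothetical names; they assume
that man(1) is the mandoc implementation and that the pages carry Vt
and Ev markup, both of which appear in the list of macro keys linked
above.

```shell
# syscalls_using_type: list section 2 pages whose variable-type (Vt)
# markup mentions the given type name.
syscalls_using_type() {
    man -k -s 2 "Vt=$1"
}

# pages_for_env: list pages that document the given environment
# variable via Ev markup.
pages_for_env() {
    man -k "Ev=$1"
}
```

"syscalls_using_type timespec" runs the same query as the command
shown above.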

> The nearest thing in less(1) to support for an index would be its tag
> support. If groff emitted a tags file, less could use it to navigate
> within the man page.  

Yes, the mandoc implementation of man(1) has been doing exactly that
for years.

> I would bet it's technically feasible to use "one giant tags file",
> indexing the whole man-page corpus.

Well, actually, you get one mandoc.db(5) file per manual page tree,
so for example if you have your base system in /usr/, your ports and
packages in /usr/local/ and your X11 in /usr/X11R6/, then you get

  /usr/share/man/mandoc.db
  /usr/local/man/mandoc.db
  /usr/X11R6/man/mandoc.db

and man(1) transparently knows how to use those.
If you have specialised collections of manual pages installed,
you may also have files like

  /usr/local/lib/tcl/tcl8.5/man/mandoc.db
  /usr/local/plan9/man/mandoc.db
  /usr/local/share/doc/posix/man/mandoc.db
  /usr/local/lib/eopenssl11/man/mandoc.db

such that these don't get confused with native system documentation
but can be accessed when desired using the -M option or the MANPATH
environment variable.
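
To check which trees on a given machine actually carry such a
database, a small helper like the following may do.  It is a sketch
under the layout described above, and indexed_trees is a hypothetical
name.

```shell
# indexed_trees: given a colon-separated list of manual page trees,
# print those that contain a mandoc.db file.
indexed_trees() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
        if [ -n "$dir" ] && [ -f "$dir/mandoc.db" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

For example:

  indexed_trees /usr/share/man:/usr/local/man:/usr/X11R6/man

A tree that is missing from the output can usually be indexed by
running makewhatis(8) on it.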

> There is the, ahem,  *slight* problem of how to denote index anchors,
> i.e., locations in the document that the index "points at" for a term
> included in the index. Currently in mdoc we have only "points to"
> macros:
> 
>   .Sx point to a heading
>   .Xr point to a page
>   .Lk point to a URL (mandoc)
> 
> IOW mdoc has nothing akin to the HTML functionality 
> 
>   <a name="...">
> and
>   <a href="#...">.

It has: .Tg
Only, it has not yet been implemented in groff.

Then again, it is very rarely needed.
The OpenBSD base system currently contains 82 instances of .Tg
in its 3325 mdoc manual pages, so on average one every forty pages.

> Possible ways to denote a target in mdoc: 
> 
> 1.  Extend .Lk to denote a target
> 2.  Introduce a new macro, perhaps .Ix

Please, do not reinvent what is already in production and has been
working well for years.

> 1.  If grotty produced etags output, less(1) could support navigation
> by "man page tags". 

The mandoc implementation of man(1) has already been doing exactly
that for years.  And the tags you can access with less(1) :t are
more or less the same that you can navigate to on the web using
fragment identifiers.

> 3.  If the above two were accomplished, man page authors would finally
> have the opportunity to enhance their documents with index anchors. 

Little need to, it mostly just works without providing manual anchors.

> IMO hyperlink nonsupport is the Achilles heel of the man page system.

Maybe it was a decade ago.

All that said, specific suggestions to improve the system are always
welcome, and so are patches.  During the last decade, i have
received useful suggestions from probably more than a hundred
people, and patches from probably more than twenty, too.

To name just one person who helped move tagging and indexing support
forward even further recently: Klemens Nanni provided valuable input
on several occasions, but others helped, too.  And of course

Re: WAYTO: indexed man pages

2021-06-02 Thread Steffen Nurpmeso
Ingo Schwarze wrote in
 <20210602223746.ga92...@athene.usta.de>:
 ...
 |Then again, i feel that's a problem of relatively little urgency.

That is because you use that mutilated less.  The original can.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)