Nicholas Clark wrote:
First, a definition. By "scope", I mean the part of the document that is
deemed relevant to an index entry, and that may be extracted and shown
in isolation by a processing or display tool. For example, perldoc -f
considers the scope of a function to end at the beginning of the next
=item, or at the end of the enclosing =over.
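To make the perldoc -f rule concrete, here is a small hypothetical POD fragment. The descriptions are abbreviated paraphrases, not text copied from perlfunc.pod, and the placement of X<> on the line after =item is just one possible convention. Under that rule, the scope of the X<chop> entry runs from its =item up to, but not including, the next =item. The scope of X<chomp> runs until the closing =back. The =for comment paragraphs are ignored by formatters and only annotate the scope boundaries.

```pod
=over 8

=item chop VARIABLE
X<chop>

Removes the last character of a string and returns it.

=for comment Still inside the scope of the chop entry.

A second paragraph about chop, still inside its scope.

=item chomp VARIABLE
X<chomp>

=for comment The scope of the chop entry ended just before this item.

Removes any trailing string that matches $/ (usually a newline).

=back
```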
* There's no limitation as to the number of times that a given entry can
appear in a document or collection of documents. That is, it is not an
error to have X<whatever> appear twice in the same file.
This means that if two or more adjacent paragraphs are needed to make sense,
it's no problem under the scope rules - just mark both.
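As a hypothetical illustration of "just mark both" (the entry name and wording are invented for this sketch), the two paragraphs below each carry X<closure>. That is legal because an entry may appear any number of times. An index tool would therefore see two hits for "closure", each with its own one-paragraph scope. Since X<> renders as nothing, readers of the formatted page see only the prose.

```pod
X<closure>A closure is an anonymous subroutine that captures
lexical variables from the scope in which it was created.

X<closure>The captured variables keep their values between calls,
so each closure created this way carries its own copy of that state.
```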
Good point. And perhaps some of the programs that use the data could
choose to treat two consecutive paragraphs as one entry. That said, in
some cases it's simpler to just choose a slightly wider scope, as it
might be better to have too much context than not enough.
Currently X<> is used in only *one* place in the Perl documentation:
pod/perlfunc.pod uses it for the "-X" filetest operators.
* It should be considered case-insensitive.
This would lead to some ambiguities:
    -b  File is a block special file.
    -B  File is a "binary" file (opposite of -T).
    -c  File is a character special file.
    -C  Script start time minus file inode change time, in days
        (Unix; may differ on other platforms).
    -s  File has nonzero size (returns size in bytes).
    -S  File is a socket.
    -t  Filehandle is opened to a tty.
    -T  File is an ASCII text file (heuristic guess).
I'm not sure this is really a problem.
Or maybe I was too quick to dismiss case-sensitivity. Maybe it should be
up to the processing programs to decide whether they want to be
case-sensitive or not (and maybe the user can control it with
command-line arguments and such). But if we go for case-sensitivity, I'd
add a style rule that says:
* all entries should be written in lowercase, unless uppercase is
necessary due to case sensitivity. For example, for generic keywords
like "operator", use X<operator>, not X<Operator>.
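Under the proposed style rule, a hypothetical fragment might look like this. The first heading is modeled loosely on perlop, and the second heading and all entry names are illustrative assumptions. Generic keywords are lowercase even when the heading text is capitalized. The filetest entries keep their uppercase letters only because -B and -T are distinct from -b and -t.

```pod
=head2 Operator Precedence and Associativity
X<operator> X<precedence> X<associativity>

Generic keywords are indexed in lowercase, even though this
heading is capitalized.

=head2 Binary and text filetests
X<-B> X<-T>

These entries keep their case: -b (block special file) and
-t (tty) are different operators.
```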
Perl comes with over 100 files in the pod/ directory, totaling over
100,000 lines of POD. Obviously, indexing all of it by hand is a very
large task, so the question arises as to who will do it. If people agree
that this is a good idea and are willing to apply the patches, I could
lead the project, and hope to attract volunteers. In the worst case
(no one else is willing to help), I believe that even if I can't index
*all* of the pods, a partial index is better than no index at all. I
would start with the documents that I consider more important, such as
perlop, perlsub, perlre, perlobj, etc. Documents such as perldelta* and
the faqs probably don't need indexing that much.
Potentially the faqs do. The deltas might benefit from it, particularly if
searching for a topic on a recent set of documentation brings up an important
bug fix in a specific version of perl, and the user realises that their perl
is older than that.
I agree that it can be useful; it's just a lower priority IMO. The faqs,
because they can already be searched with perldoc -q (at least the
questions), and perldelta* because it's a bit "esoteric" (not what a
beginner would be looking at when trying to figure out the purpose of an
operator or function, for example ;-)
The proposal seems very well thought through, and I'd be very happy to see
you start on this soon. I'd hope that you'd soon attract volunteers to help,
but as you rightly say, only time will tell.
Thanks, I'm glad you like it. I've already started. Any volunteers? ;-)
I'll post later on PerlMonks to see if there are any potential
volunteers out there.
Ivan