Nicholas Clark wrote:
First, a definition. By "scope", I mean the part of the document that is deemed relevant to an index entry, and that may be extracted and shown in isolation by a processing or display tool. For example, perldoc -f considers the scope of a function to end at the beginning of the next =item, or at the end of the enclosing =over.

* There's no limitation as to the number of times that a given entry can appear in a document or collection of documents. That is, it is not an error to have X<whatever> appear twice in the same file.

This means that if two or more adjacent paragraphs are needed to make sense,
it's no problem under the scope rules - just mark both.

Good point. And perhaps some of the programs that use the data could choose to treat two consecutive paragraphs as one entry. Although in some cases it's simpler to just choose a slightly wider scope, as it might be better to have too much context than not enough.
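As a rough illustration of how a processing program might do that merging, here is a minimal Python sketch. It is not taken from any existing tool: `build_index` and the sample text are hypothetical, a paragraph is assumed to be a blank-line-separated block, and only the simple X<entry> form is recognised.

```python
import re
from collections import defaultdict

# Matches simple X<entry> codes only; forms such as X<< ... >> are not handled.
X_CODE = re.compile(r"X<([^<>]+)>")

def build_index(pod_text):
    """Map each index entry to a list of (first, last) paragraph ranges.

    Consecutive paragraphs carrying the same entry are merged into one range,
    and an entry may legitimately appear in several separate ranges.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", pod_text.strip()) if p.strip()]
    index = defaultdict(list)
    for i, para in enumerate(paragraphs):
        for entry in set(X_CODE.findall(para)):
            ranges = index[entry]
            if ranges and ranges[-1][1] == i - 1:
                # Adjacent to the previously marked paragraph: extend that range.
                ranges[-1] = (ranges[-1][0], i)
            else:
                ranges.append((i, i))
    return dict(index)

# Hypothetical sample document, for illustration only.
sample = """X<whatever> First paragraph about the topic.

X<whatever> Second paragraph, continuing the same topic.

An unrelated paragraph with no index entry.

X<whatever> X<other> A later, separate mention.
"""

index = build_index(sample)
print(index)
```

Here the first two paragraphs end up as a single range for "whatever", while the later mention stays a separate one.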

Currently X<> is used in only *one* place in the perl documentation: pod/perlfunc.pod uses it for the "-X" filetest operators.

* It should be considered case-insensitive.

This would lead to some ambiguities:

    -b  File is a block special file.
    -B  File is a "binary" file (opposite of -T).
    -c  File is a character special file.
    -C  Same for inode change time (Unix, may differ for other platforms)
    -s  File has nonzero size (returns size in bytes).
    -S  File is a socket.
    -t  Filehandle is opened to a tty.
    -T  File is an ASCII text file (heuristic guess).

I'm not sure if this is really a problem.
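To make the ambiguity concrete, here is a small Python sketch that folds the entries above to lowercase and reports which ones would become indistinguishable. `case_collisions` is a hypothetical helper, not something an existing POD tool provides.

```python
from collections import defaultdict

def case_collisions(entries):
    """Group entries by their lowercase form and keep groups with more than one member."""
    groups = defaultdict(set)
    for entry in entries:
        groups[entry.lower()].add(entry)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}

# The filetest operators listed above.
filetests = ["-b", "-B", "-c", "-C", "-s", "-S", "-t", "-T"]

collisions = case_collisions(filetests)
for key, members in collisions.items():
    print(key, "->", members)
```

Each of the four pairs collapses into a single key, so a strictly case-insensitive lookup for "-s" would return both the size test and the socket test.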

Or maybe I was too quick to dismiss case-sensitivity. Maybe it should be up to the processing programs to decide whether they want to be case-sensitive or not (and maybe the user can control it with command-line arguments and such). But if we go for case-sensitivity, I'd add a style rule that says:

* all entries should be written in lowercase, unless uppercase is necessary due to case sensitivity. For example, for generic keywords like "operator", use X<operator>, not X<Operator>.
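Leaving the choice to the processing program could look roughly like the following Python sketch. The `lookup` function, its `case_sensitive` parameter, and the sample index contents are all hypothetical; a real tool would presumably expose the switch as a command-line option.

```python
def lookup(index, term, case_sensitive=False):
    """Return the locations recorded for term.

    index maps entry strings to lists of locations. With case_sensitive=False,
    every entry that matches term after lowercasing contributes its locations.
    """
    if case_sensitive:
        return list(index.get(term, []))
    wanted = term.lower()
    hits = []
    for entry, locations in index.items():
        if entry.lower() == wanted:
            hits.extend(locations)
    return hits

# Hypothetical index contents, for illustration only.
index = {
    "operator": ["perlop"],
    "-s": ["perlfunc: size"],
    "-S": ["perlfunc: socket"],
}

print(lookup(index, "Operator"))                 # insensitive: finds "operator"
print(lookup(index, "-S", case_sensitive=True))  # sensitive: socket test only
print(lookup(index, "-S"))                       # insensitive: both filetests
```

With the lowercase style rule in place, a case-sensitive search only matters for entries like the filetests, where the case actually carries meaning.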

Perl comes with over 100 files in the pod/ directory, totaling over 100,000 lines of POD. Obviously, indexing all of it by hand is a very large task, so the question arises as to who will do it. If people agree that this is a good idea and are willing to apply the patches, I could lead the project, and hope to attract volunteers. In the worst case (no one else is willing to help), I believe that even if I can't index *all* of the pods, a partial index is better than no index at all. I would start with the documents that I consider more important, such as perlop, perlsub, perlre, perlobj, etc. Documents such as perldelta* and the faqs probably don't need indexing that much.


Potentially the faqs do. The deltas might benefit from it, particularly if
searching for a topic on a recent set of documentation brings up an important
bug fix in a specific version of perl, and the user realises that their perl
is older than this.

I agree that it can be useful; it's just a lower priority IMO. The faqs because they can already be searched with perldoc -q (at least the questions), and perldelta* because it's a bit "esoteric" (not what a beginner would be looking at when trying to figure out the purpose of an operator or function, for example ;-)

The proposal seems very well thought through, and I'd be very happy to see
you start on this soon. I'd hope that you'd soon attract volunteers to help,
but as you rightly say, only time will tell.

Thanks, I'm glad you like it. I've already started. Any volunteers? ;-) I'll post later in perlmonks to see if there are any potential volunteers out there.

Ivan
