Oleg Kobchenko wrote:
> OK, everybody got it that attaching keywords to
> any and all J documentation is suggested. Now, how to
> go about it practically?
Well, from my standpoint as a librarian, there's a bit more to the
issue than just keywords. "Strict" keywords (meaning single words
only) strip meaning (semantics) from words because, in most cases,
meaning depends on the context of the surrounding words. Thus, for
this whole thing to be useful, the underlying keyword mechanism must
also be able to handle *phrases* (two or more words in sequence), not
just single words. In most search engines, the user puts quotation
marks around the exact words of a phrase to search for it (which, of
course, requires that the phrase be indexed, too).
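To make the phrase requirement concrete, here is a minimal sketch (in
Python, purely for illustration) of an index that records word
positions, so that a query matches only where its words occur in
sequence. The function names and the sample texts are made up for the
example; nothing here describes an existing J documentation tool.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [positions of that word in the doc]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, query):
    """Return sorted ids of docs containing the query words consecutively."""
    words = tokenize(query)
    if not words or any(w not in index for w in words):
        return []
    # Only documents containing every word can contain the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = []
    for doc_id in candidates:
        for start in index[words[0]][doc_id]:
            # Word i of the phrase must sit at position start + i.
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

# Hypothetical sample texts, for illustration only.
docs = {
    "a": "The rank conjunction applies a verb to cells",
    "b": "This conjunction changes the rank of a verb",
}
index = build_index(docs)
print(phrase_search(index, "rank conjunction"))  # ['a']
print(phrase_search(index, "conjunction"))       # ['a', 'b']
```

A single word is just a phrase of length one here, so the same lookup
serves both cases.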
The previously mentioned issue of multiple terminologies (synonyms) is
one that libraries had to address long ago. The current trend of
"social tagging" ("use your own words") only exacerbates the problem
unless there's some way of linking terminology. The library approach
won't solve the problem facing the group, but libraries solved it for
themselves through what are called "controlled vocabularies". In
other words, a single synonym is chosen, somewhat arbitrarily but
usually based on frequency of usage (this is termed an "authorized
heading"), and all the related synonyms refer to this single heading
for the concept.
(The authorized heading together with the cross-referenced synonyms or
closely related terms is called an "authority record".) Essentially,
only authorized headings/terms are assigned (as "tags" are) to content.
If someone searches for one of the nonauthorized terms, the
cross-reference points (or takes) them to the authorized term in the
index and displays all matching content. As I said, that's how
libraries do it, but I doubt it's very practical here, since you
really want the computer, rather than humans, to do most of the grunt
work automatically. On the other hand, the only way I see to permit
concepts,
ideas, and meanings to get indexed is to be sure that the underlying
mechanism is capable of handling both single words and phrases of
multiple words in sequence. To me, that's probably going to get
everybody closest to the best of all worlds, searchwise.
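For anyone who wants to see the authority-record idea in miniature,
the following Python sketch maps every variant term to its authorized
heading and then looks up the content tagged with that heading. The
heading, the variants, and the section ids are invented for the
example, and the function names are hypothetical.

```python
def build_authority(records):
    """Turn {authorized_heading: [variant, ...]} into a term -> heading map."""
    lookup = {}
    for heading, variants in records.items():
        lookup[heading.lower()] = heading      # the heading finds itself
        for variant in variants:
            lookup[variant.lower()] = heading  # cross-reference to the heading
    return lookup

def find_by_term(lookup, tagged, term):
    """Resolve a search term to its heading and the content tagged with it."""
    heading = lookup.get(term.lower())
    if heading is None:
        return None, []
    return heading, sorted(tagged.get(heading, []))

# Hypothetical authority record and tagged sections, for illustration only.
authority = build_authority({"sorting": ["ordering", "grade"]})
tagged = {"sorting": ["sec-3.2", "sec-1.4"]}

print(find_by_term(authority, tagged, "Ordering"))
# ('sorting', ['sec-1.4', 'sec-3.2'])
```

Content is tagged only with the authorized heading; the variants live
in the lookup table, which is what keeps the tagging itself small.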
The only outstanding "problem" is that, in some cases, whole groups of
synonyms will need to be entered (rather than a single term or phrase)
for the sake of finding alternative terminologies. The only way I
think this truly might work successfully is to create much finer
granularity in the documentation, so that variant terminology is
connected (indexed) to only the small section of text dealing with the
particular idea, concept, algorithm, etc. In other words, in many
cases indexing might have to be oriented around paragraphs or very
small sections rather than around whole large sections or chapters.
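As a rough illustration of the finer granularity, the Python sketch
below splits a text into paragraphs and attaches each group of variant
terms to a single paragraph, so that a search returns only that
paragraph rather than a whole chapter. Splitting on blank lines, the
function names, and the sample text are all assumptions made for the
example.

```python
def split_paragraphs(text):
    """Split on blank lines, dropping empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def index_paragraphs(text, variants_by_paragraph):
    """Attach each group of variant terms to one paragraph number.

    variants_by_paragraph: {paragraph_number: [term, ...]}, zero-based.
    """
    paragraphs = split_paragraphs(text)
    term_index = {}
    for number, terms in variants_by_paragraph.items():
        for term in terms:
            term_index.setdefault(term.lower(), set()).add(number)
    return paragraphs, term_index

def find_paragraphs(paragraphs, term_index, term):
    """Return only the paragraphs indexed under the given term."""
    return [paragraphs[n] for n in sorted(term_index.get(term.lower(), ()))]

# Hypothetical text and term groups, for illustration only.
text = "First paragraph about sorting.\n\nSecond paragraph about boxes."
paragraphs, term_index = index_paragraphs(
    text, {0: ["sort", "order"], 1: ["box"]}
)
print(find_paragraphs(paragraphs, term_index, "ORDER"))
# ['First paragraph about sorting.']
```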
Just my $.02 on this issue.
Harvey