Oleg Kobchenko wrote:
> OK, everybody got it that attaching keywords to 
> any and all J documentation is suggested. Now, how to
> go about it practically?

Well, from my standpoint as a librarian, there's a bit more to the 
issue than just keywords.  "Strict" keywords (meaning single words 
only) strip meaning (semantics) from words because, in most cases, 
meaning depends on the context of the surrounding words.  So, for 
this whole thing to be useful, the underlying keyword mechanism 
must be able to handle *phrases* (two or more words in sequence), 
not just single words.  In most search engines, the user puts 
quotation marks around the exact words of a phrase to search for it 
(which, of course, means the phrase has to be indexed, too).
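
To make the phrase requirement concrete, here is a minimal sketch in 
Python of a positional index, one common way of supporting quoted 
phrase searches.  It is only an illustration under stated 
assumptions: the function names and the two sample pages are 
hypothetical, not part of any existing J documentation tooling.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} so word order is kept."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted ids of documents containing the phrase's words in sequence."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    for doc_id, positions in index.get(words[0], {}).items():
        for start in positions:
            # Every following word must sit at the next consecutive position.
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words[1:], start=1)):
                hits.append(doc_id)
                break
    return sorted(hits)

# Hypothetical sample pages, invented for this illustration.
SAMPLE_DOCS = {
    "page_a": "The rank conjunction applies a verb at a given rank",
    "page_b": "The conjunction changes the rank of its argument",
}
SAMPLE_INDEX = build_positional_index(SAMPLE_DOCS)
```

With this index, searching for the single word "rank" matches both 
sample pages, while the phrase "rank conjunction" matches only the 
page where the two words occur in that order.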

The previously mentioned issue of multiple terminologies (synonyms) is 
one that libraries had to address long ago.  The current trend of 
"social tagging" ("use your own words") only exacerbates the problem 
unless there's some way of linking terminology.  This won't solve the 
problem facing the group, but libraries solved it for themselves 
through the use of what's called "controlled vocabularies".  In other 
words, a single synonym was chosen, somewhat arbitrarily but usually 
based on frequency of usage (this is termed an "authorized heading"), 
and all the related synonyms refer to this single heading for the 
concept.  
(The authorized heading together with the cross-referenced synonyms or 
closely related terms is called an "authority record".)  Essentially, 
only authorized headings/terms are assigned (as "tags" are) to content. 
If someone searches for one of the nonauthorized terms, the cross 
reference points (or takes) them to the authorized term in the index 
and displays all matching content.  As I said, that's how libraries do 
it, but I doubt that it's very practical here, since you really want 
the computer, rather than humans, to do most of the grunt work.  On 
the other hand, the only way I see to permit concepts, 
ideas, and meanings to get indexed is to be sure that the underlying 
mechanism is capable of handling both single words and phrases of 
multiple words in sequence.  To me, that's probably going to get 
everybody closest to the best of all worlds, searchwise.
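
The authority-record idea described above can be sketched in a few 
lines of Python.  This is an assumption-laden illustration, not a 
proposal for the actual mechanism: the headings, variant terms, and 
tagged sections below are all made up for the example.

```python
def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())

def build_cross_references(records):
    """Map every heading and every variant term to its authorized heading."""
    lookup = {}
    for heading, variants in records.items():
        lookup[normalize(heading)] = heading
        for variant in variants:
            lookup[normalize(variant)] = heading
    return lookup

def search(term, lookup, tagged_content):
    """Resolve a term to its authorized heading and return the tagged content."""
    heading = lookup.get(normalize(term))
    if heading is None:
        return None, []
    return heading, tagged_content.get(heading, [])

# Hypothetical authority records: authorized heading -> cross-referenced terms.
AUTHORITY_RECORDS = {
    "sorting": ["ordering", "grade"],
    "matrix inverse": ["matrix inversion", "inverse matrix"],
}

# Hypothetical content, tagged with authorized headings only.
TAGGED_CONTENT = {
    "sorting": ["section on sorting lists"],
    "matrix inverse": ["section on inverting matrices"],
}

CROSS_REFERENCES = build_cross_references(AUTHORITY_RECORDS)
```

A search for the nonauthorized term "inverse matrix" is redirected to 
the heading "matrix inverse" and returns whatever was tagged with that 
heading; a term with no authority record returns nothing.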

The only outstanding "problem" is that, in some cases, whole groups of 
synonyms will need to be entered (rather than a single term or phrase) 
for the sake of finding alternative terminologies.  The only way I 
can see this truly working is to create much finer granularity in 
the documentation, so that variant terminology is connected 
(indexed) to only the small section of text dealing with the 
particular idea, concept, algorithm, etc.  In other words, in many 
cases indexing might have to be oriented around paragraphs or very 
small sections rather than around whole large sections or chapters.  
Just my $.02 on this issue.
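
As a rough illustration of paragraph-level indexing, the Python sketch 
below splits a text on blank lines and attaches a group of variant 
terms to individual paragraphs rather than to the whole document.  
The function names, the document id, and the sample text are 
hypothetical, and the terms are assumed to be assigned by hand.

```python
import re

def split_paragraphs(text):
    """Split text on blank lines and collapse whitespace within each paragraph."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [" ".join(part.split()) for part in parts if part.strip()]

def index_paragraphs(doc_id, text, terms_by_paragraph):
    """Map each term to the (doc_id, paragraph number, paragraph) it was attached to.

    terms_by_paragraph is {paragraph_number: [terms]}, numbered from 0.
    """
    index = {}
    for number, paragraph in enumerate(split_paragraphs(text)):
        for term in terms_by_paragraph.get(number, []):
            index.setdefault(term.lower(), []).append((doc_id, number, paragraph))
    return index

# Hypothetical two-paragraph document with hand-assigned variant terms.
SAMPLE_TEXT = "First paragraph, about sorting.\n\nSecond paragraph,\nabout inverses."
SAMPLE_TERMS = {0: ["sorting", "ordering"], 1: ["matrix inverse"]}
PARAGRAPH_INDEX = index_paragraphs("doc1", SAMPLE_TEXT, SAMPLE_TERMS)
```

Looking up either "sorting" or "ordering" then leads to the first 
paragraph only, instead of to the entire document.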

Harvey


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm