On Jul 1, 2004, at 5:54 AM, George Abraham wrote:

>  This KWIC concept sounds cool. Any place I could find more info?

I got curious, so I googled for KWIC, and one thing led to another.

Here is an overall demonstration of the concept:


    Hit the right arrow (upper right corner) to start the demo
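For anyone who would rather see the idea in code, here is a minimal
sketch of a KWIC index over a few document titles. The stop-word list,
the column width, the `kwic_index` name and the sample titles are
illustrative assumptions, not taken from any real KWIC system. Each
significant word gets one line, the words before it are right-aligned
in a fixed column, and the lines are sorted by keyword, so every
occurrence of a word lines up in the middle of the page.

```python
# Minimal KWIC (keyword-in-context) index over document titles.
# Assumption: STOP_WORDS is an illustrative list, not a historical one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles, width=30):
    """Return one line per significant word, sorted by that word.

    The words before the keyword are right-aligned in a column `width`
    characters wide (truncated on the left if longer), so every keyword
    starts in the same column.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i:])  # keyword plus the rest of the title
            entries.append((word.lower(), left, right))
    entries.sort()  # alphabetical by keyword, then by left context
    return [f"{left[-width:]:>{width}}  {right}" for _, left, right in entries]


if __name__ == "__main__":
    # Hypothetical sample titles.
    titles = [
        "Indexing the Titles of Technical Documents",
        "Automatic Indexing of Documents",
    ]
    for line in kwic_index(titles):
        print(line)
```

Scanning down the middle column gives every title containing a given
word, with a few words of context on either side.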

1) I had forgotten that the early KWIC indexes published at IBM were
produced before hard disks were in widespread use (a 10 MB drive cost
about $13,000 per month), so KWIC (or any) indexing of the full text
of documents was impractical. Instead, they prepared KWIC indexes of
document titles.

2) KWIC indexing was developed by Hans Peter Luhn, who went to work
for IBM.


3) As the computing industry advanced, a more general form of a KWIC
index, called a concordance, became something of a CS class exercise.


4) Further advances made it practical to provide KWIC/concordance
indexing of the full text of documents.


5) Today, several institutions, including Stanford University and
Amazon.com, use KWIC indexing to augment full-text searches. What
appears to happen is this:

a) A keyword search is performed on the title, author, and bio, as
well as on the content of the documents (using boolean logic,
stemming, synonyms, etc.).

b) Any hits found in the full-text content are extracted along with a
given number of leading and trailing words. A quick index is then
generated dynamically from the extracted lines.

c) The extracted text snippets are presented with the keywords
highlighted (bold color) as a more detailed subindex of the particular
document.
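Steps (b) and (c) can be sketched roughly as follows. This is only a
guess at the general technique, not what Stanford or Amazon actually
run. It assumes exact, case-insensitive word matching (no stemming or
synonyms), a fixed window of leading and trailing words, and asterisks
as a plain-text stand-in for bold highlighting; the `snippets` function
and its parameters are hypothetical.

```python
import re


def snippets(text, keywords, context=3):
    """Return (keyword, snippet) pairs for every keyword hit in `text`.

    Hypothetical helper: matching is exact and case-insensitive (no
    stemming or synonyms), each snippet keeps up to `context` leading
    and trailing words, and hit words are wrapped in asterisks as a
    plain-text stand-in for bold highlighting.
    """
    wanted = {k.lower() for k in keywords}
    words = text.split()
    # Compare words with non-word characters removed, e.g. "IBM." -> "ibm".
    bare = [re.sub(r"\W", "", w).lower() for w in words]
    results = []
    for i, w in enumerate(bare):
        if w in wanted:
            lo = max(0, i - context)
            hi = min(len(words), i + context + 1)
            window = [
                f"*{words[j]}*" if bare[j] in wanted else words[j]
                for j in range(lo, hi)
            ]
            results.append((w, " ".join(window)))
    # A quick index over the snippets: group by keyword. The sort is
    # stable, so hits for the same keyword stay in document order.
    return sorted(results, key=lambda r: r[0])


if __name__ == "__main__":
    doc = ("Hans Peter Luhn developed KWIC indexing at IBM. "
           "KWIC lists every keyword in context.")
    for keyword, snippet in snippets(doc, ["kwic", "context"], context=2):
        print(f"{keyword:<8} ...{snippet}...")
```

Note that each snippet is tied to one document, which is exactly the
limitation discussed next.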



First of all, this really isn't a KWIC index; it is just a text
snippet with the hit words highlighted.

Second, the "KWIC" index appears only if the keywords do not all appear
in the title/author/bio.

Third, the "KWIC" index is subordinate to, and relative to, a single
document, so you do not get the advantage of seeing the results from
all the documents "in context".

What I think would be much more useful would be a composite KWIC index
of all the hits (with a link to the doc).
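A composite index of that kind might look something like the sketch
below: it pools the hits from every document into one KWIC listing,
sorted by keyword and then by the words that follow it, with each line
ending in the document's link. The `composite_kwic` function, its
parameters, and the example.com links are hypothetical, and matching is
plain case-insensitive word comparison with surrounding punctuation
stripped.

```python
import string


def _bare(word):
    """Lower-case a word and strip surrounding punctuation."""
    return word.strip(string.punctuation).lower()


def composite_kwic(docs, keywords, context=3, width=30):
    """Build one KWIC listing of every keyword hit across all documents.

    Hypothetical helper: `docs` maps a link (URL or path) to that
    document's text. Each row shows the left context right-aligned,
    then the keyword with its right context, then the link, so hits
    from different documents interleave in one sorted list.
    """
    wanted = {k.lower() for k in keywords}
    rows = []
    for link, text in docs.items():
        words = text.split()
        for i, word in enumerate(words):
            if _bare(word) in wanted:
                left = " ".join(words[max(0, i - context):i])
                right = " ".join(words[i:i + context + 1])
                # Sort key: keyword, then the words that follow it.
                rows.append((_bare(word), right.lower(), link, left, right))
    rows.sort()
    return [
        f"{left[-width:]:>{width}}  {right[:width]:<{width}}  {link}"
        for _, _, link, left, right in rows
    ]


if __name__ == "__main__":
    # Hypothetical documents and links.
    docs = {
        "https://example.com/a": "Search results shown in context help readers.",
        "https://example.com/b": "KWIC puts every hit in context on one line.",
    }
    for line in composite_kwic(docs, ["context"], context=2, width=20):
        print(line)
```

Because every hit from every document sits in the same aligned column,
a reader can scan the whole result set at once and follow the link for
the line that looks right.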

Apple's search technology (kind of) uses KWIC-type indexing in iTunes;
they just don't rearrange the text or highlight the hit words. They
simply display the text as-is.

Based on my experience of finding things with KWIC, I think modern
search techniques are missing something by not fully exploiting the
KWIC way of presenting results. It is ugly, but a human can very
quickly scan the (KWIC-formatted) context of all the matches to find
what he seeks.

