On Jul 1, 2004, at 5:54 AM, George Abraham wrote:

>  This KWIC concept sounds cool. Any place I could find more info?
>

I got curious, so I googled for KWIC -- one thing led to another.

Here is an overall demonstration of the concept:

    http://coronet.iicm.edu/wbtmaster/courses/kwic_intro_start9.htm

    Hit the right arrow (upper right corner) to start the demo

1) I forgot that the early KWIC indexes published at IBM were done
before the widespread use of hard disks (a 10 meg drive cost about
$13,000 per month) -- so KWIC (or any) indexing of the full text of
documents was impractical -- instead they prepared KWIC indexes of
document titles.

2) KWIC indexing was developed by Hans Peter Luhn -- who went to work
for IBM

    http://web.utk.edu/~jgantt/hanspeterluhn.html

3) As the computing industry advanced, a more general form of a KWIC
index, called a concordance, became something of a CS class exercise.

    http://www.cs.wm.edu/~noonan/cs312/homework/concordance/

4) Further advances made it practical to provide KWIC/concordance
indexing of the full text of documents

    http://www.georgetown.edu/faculty/ballc/corpora/tutorial3.html

5) Today, several institutions, including Stanford University and
Amazon.com, use KWIC indexing to augment full-text searches.  What
appears to happen is this:

a) A keyword search is performed on title, author, bio, as well as the
content of the documents (using boolean logic, stemming, synonyms,
whatever)

b) Any hits found in the full-text content are extracted along with
a given number of leading and trailing words.  A KWIC index is then
dynamically generated on the extracted lines.

c) The extracted text snippets are presented with the keywords
highlighted (bold, color) as a more detailed subindex of the particular
document.

    http://www.infotoday.com/newsbreaks/nb031103-1.shtml

    http://highwire.stanford.edu/inthepress/asbmb/asbmb_2003feb.dtl
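
As a rough illustration of steps b) and c), here is a minimal Python
sketch. It is only a guess at the general idea, not how Stanford or
Amazon actually do it: the function name snippets, the width parameter
and the asterisk "highlighting" are all made up for the example.

```python
def snippets(text, keyword, width=3):
    """Return each hit of `keyword` with up to `width` words on either side.

    Hypothetical helper: the hit word is wrapped in asterisks as a
    plain-text stand-in for bold or colored highlighting.
    """
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.strip(".,;:!?\"'()").lower() == keyword.lower():
            leading = words[max(0, i - width):i]
            trailing = words[i + 1:i + 1 + width]
            hits.append(" ".join(leading + ["*" + word + "*"] + trailing))
    return hits


for line in snippets("The quick brown fox jumps over the lazy dog", "fox", width=2):
    print(line)
```

Each hit comes back as its own short line of context, which is all the
"subindex" under a document really is.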

First of all, this really isn't a KWIC index -- all it is is a text
snippet with the hit words highlighted.

Second, the "KWIC" index only appears if the keywords do not all appear
in the title/author/bio.

Third, the "KWIC" index is subordinate to, and relative to, a single
document -- you do not get the advantage of seeing the results of all
the documents "In Context".

What I think would be much more useful would be a composite KWIC index
of all the hits (with a link to the doc).
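
To make that concrete, here is a small Python sketch of such a
composite index. It is an assumption-laden toy: the function name
kwic_index, the docs dictionary (document id mapped to text), the width
parameter and the bracketed document id standing in for a link are all
invented for illustration.

```python
import re


def kwic_index(docs, keyword, width=30):
    """Build one KWIC listing for `keyword` across every document in `docs`.

    `docs` maps a document id to its text (hypothetical layout). Each hit
    becomes one line with the keyword in a fixed column, `width` characters
    of context on each side, and the document id in brackets.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    entries = []
    for doc_id, text in docs.items():
        flat = " ".join(text.split())  # collapse line breaks and runs of spaces
        for match in pattern.finditer(flat):
            left = flat[max(0, match.start() - width):match.start()]
            right = flat[match.end():match.end() + width]
            line = f"{left:>{width}}{match.group()}{right:<{width}}  [{doc_id}]"
            # Sort on the text following the keyword, as printed KWIC indexes do.
            entries.append((right.lower(), line))
    entries.sort()
    return [line for _, line in entries]


docs = {
    "a.txt": "A KWIC index of titles",
    "b.txt": "The KWIC concept sounds cool",
}
for line in kwic_index(docs, "KWIC", width=10):
    print(line)
```

Because every hit from every document lands in the same aligned
listing, the eye can run straight down the keyword column.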

Apple's search technology (kind of) uses KWIC-type indexing in iTunes
-- they just neither rearrange the text nor highlight the hit words --
they just display the text "as-is".

Based on my experience of finding things with KWIC, I think modern
search techniques are missing something by not fully exploiting the
KWIC way of presenting results -- it is ugly, but a human can very
quickly scan the (KWIC-formatted) context of all the matches to find
what he seeks.

HTH

  Dick