On Jan 2, 8:07 pm, robert <no-s...@no-spam.invalid> wrote:
> how can one index (text documents) for efficient similar word search?
> existing modules?
> what principles are used by search engines therefore?

Only your second question is on-topic for this newsgroup. Try this:

http://pylucene.osafoundation.org/

The site for Lucene itself should have references to the various technologies it uses; reading those, plus some (definitely recommended) googling, should give you some clues about your other questions.

Some relevant computer science topics are: Burkhard-Keller tree, Voronoi diagram/tree, permuted lexicon ...

Do bear in mind that what is actually used in real-world search engines like Google may be rather difficult to find out; Google sure ain't open source, not any more.

HTH,
John
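
P.S. To make the first of those topics a bit more concrete, here is a minimal sketch of a Burkhard-Keller tree over Levenshtein (edit) distance in plain Python. It only illustrates the general idea; it is not what Lucene or any real search engine does, and the names `levenshtein` and `BKTree` are made up for this example. Each child hangs off its parent under the edit distance between the two words, so a search can skip every subtree whose edge label lies outside [d - max_dist, d + max_dist], where d is the distance from the query to the current node (this follows from the triangle inequality).

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Hypothetical minimal BK-tree; a node is (word, {distance: child})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_dist:
                results.append((d, candidate))
            # Only subtrees whose edge label is within max_dist of d
            # can contain a match.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "boon", "cook", "cart"]:
    tree.add(w)
print(tree.search("bok", 1))
```

For a real document collection you would build the tree over the set of distinct words and keep a separate mapping from each word to the documents containing it; that part is left out here.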