similar words index?

2009-01-06 Thread robert

How can one index text documents for efficient similar-word search?
Are there existing modules?
What principles do search engines use for this?
--
http://mail.python.org/mailman/listinfo/python-list


Re: similar words index?

2009-01-02 Thread John Machin
On Jan 2, 8:07 pm, robert <no-s...@no-spam.invalid> wrote:
> How can one index text documents for efficient similar-word search?
> Are there existing modules?
> What principles do search engines use for this?

Only your second question is on-topic for this newsgroup. Try this:

http://pylucene.osafoundation.org/

Look at the site for Lucene itself, where you should find references
to the various technologies they use; that and some (definitely
recommended) googling should give you some clues about your other
questions. Some computer science topics to look up are: Burkhard-Keller
tree, Voronoi diagram/tree, permuted lexicon ... Do bear in mind that
what is actually used in real-world search engines like Google may be
rather difficult to find out; Google sure ain't open source, not any
more.
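
A Burkhard-Keller tree (BK-tree), one of the structures named above, indexes words under a metric such as Levenshtein edit distance and uses the triangle inequality to skip whole subtrees during a search. Below is a minimal pure-Python 3 sketch, assuming Levenshtein distance as the metric; the names `levenshtein` and `BKTree` are hypothetical and are not taken from Lucene, mspace.py or the recipe linked elsewhere in this thread.

```python
from collections import deque


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


class BKTree:
    """Burkhard-Keller tree over a metric (here: Levenshtein distance).

    Each node is a tuple (word, children), where children maps the
    distance from this node's word to a child subtree.
    """

    def __init__(self, distance=levenshtein):
        self._distance = distance
        self._root = None

    def add(self, word):
        if self._root is None:
            self._root = (word, {})
            return
        node = self._root
        while True:
            node_word, children = node
            d = self._distance(word, node_word)
            if d == 0:
                return  # word is already in the tree
            child = children.get(d)
            if child is None:
                children[d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        if self._root is None:
            return results
        pending = deque([self._root])
        while pending:
            candidate, children = pending.popleft()
            d = self._distance(word, candidate)
            if d <= max_dist:
                results.append((d, candidate))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    pending.append(child)
        return sorted(results)


if __name__ == "__main__":
    tree = BKTree()
    for w in ["book", "books", "cake", "boo", "boon", "cook", "cape", "cart"]:
        tree.add(w)
    print(tree.search("book", 1))
    # [(0, 'book'), (1, 'boo'), (1, 'books'), (1, 'boon'), (1, 'cook')]
```

A search computes the distance only for the nodes it actually visits; how many nodes it can skip depends on the vocabulary and on `max_dist`, so a large tolerance degrades towards a full scan.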

HTH,
John


Re: similar words index?

2009-01-02 Thread Jochen Schulz
* robert:
> How can one index text documents for efficient similar-word search?
> Are there existing modules?

I implemented one approach in mspace.py:

http://well-adjusted.de/mspace.py/

But beware that it is pure Python and not optimized for speed. You gain
quite a lot by having Psyco installed, though.

J.


Re: similar words index?

2009-01-02 Thread bearophileHUGS
Jochen Schulz:
> I implemented one approach in mspace.py:
> http://well-adjusted.de/mspace.py/
> But beware that it is pure Python and not optimized for speed. You gain
> quite a lot by having Psyco installed, though.

Here is something similar (I haven't compared their performance); Psyco
helps a lot here too:
http://code.activestate.com/recipes/572156/

(I have also implemented the same code in the D language, through a bridge
built with Pyd; it is more than 100 times faster.)

Bye,
bearophile