DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10340>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

[PATCH] Phonetic Search capability

------- Additional Comments From [EMAIL PROTECTED]  2004-01-20 00:33 -------
Copy/paste from Robert's original email follows.

I took the reference to Phonetix and went one better... the attached patch
allows for phonetic searching without adding new terms, fields, or
analyzers.

There is an interface, 'PhoneticProvider', that IndexReaders can implement
to improve performance; otherwise it falls back to a linear search of
terms, similar to the way Fuzzy searches work.
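
As a rough illustration of the provider-or-fallback behaviour described
above (this is not the code from the patch; the interface shape, the class
and method names, and the toy encoder are all assumptions made for the
example, and plain Java collections stand in for the index):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the patch's 'PhoneticProvider'; the real
// signature is not shown in this message.
interface PhoneticProviderSketch {
    // Returns the terms whose encoded form equals the given code.
    List<String> termsWithCode(String code);
}

final class PhoneticLookupSketch {

    // Toy encoder (an assumption, not the Phonetix algorithm): keeps the
    // first letter, drops later vowels, collapses repeated letters.
    static String toyEncode(String term) {
        StringBuilder out = new StringBuilder();
        String s = term.toLowerCase();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean vowel = "aeiou".indexOf(c) >= 0;
            if (i > 0 && vowel) {
                continue;
            }
            if (out.length() > 0 && out.charAt(out.length() - 1) == c) {
                continue;
            }
            out.append(c);
        }
        return out.toString();
    }

    // 'reader' stands in for an IndexReader; 'allTerms' for its term list.
    static List<String> soundsLike(String queryTerm, Iterable<String> allTerms,
                                   Object reader, UnaryOperator<String> encoder) {
        String code = encoder.apply(queryTerm);
        if (reader instanceof PhoneticProviderSketch) {
            // Fast path: the reader can answer the lookup itself.
            return ((PhoneticProviderSketch) reader).termsWithCode(code);
        }
        // Fallback: encode every term and compare (linear in the term count).
        List<String> matches = new ArrayList<>();
        for (String term : allTerms) {
            if (code.equals(encoder.apply(term))) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

With this toy encoder, "balloon", "ballon" and "baloon" share one code, so
the fallback path would return all three for the query term "balloon".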

An interesting point is that the encoder is completely definable, so
'phonetic searching' does not necessarily have to relate to 'phonetics'
at all; rather, it can be viewed as 'alternate term' support, where a
single term can have an alternate representation.

The expression language has been changed to treat terms ending with "$"
as a phonetic search, so

+balloon$

would find all terms that sound like balloon.
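
To make the "$" suffix concrete, here is a minimal sketch of how a single
query token could be classified. It is only an illustration: the patch
changes the query parser grammar itself, which is not reproduced here, and
the class and method names below are hypothetical.

```java
// Hypothetical helper; not part of the patch or of the query parser.
final class PhoneticSyntaxSketch {

    // True when the token (ignoring a leading '+' or '-') ends with '$'
    // and has at least one character before it.
    static boolean isPhonetic(String token) {
        String t = stripOperator(token);
        return t.length() > 1 && t.endsWith("$");
    }

    // The term to encode or search for, without operator or '$' marker.
    static String baseTerm(String token) {
        String t = stripOperator(token);
        return isPhonetic(token) ? t.substring(0, t.length() - 1) : t;
    }

    // Drops a leading required ('+') or prohibited ('-') operator.
    private static String stripOperator(String token) {
        boolean hasOperator = token.startsWith("+") || token.startsWith("-");
        return hasOperator ? token.substring(1) : token;
    }
}
```

Under these assumptions, "+balloon$" is classified as phonetic with base
term "balloon", while "+balloon" remains an ordinary term.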

This implementation will work with all existing index files, but if the
standard IndexReader/Writer were modified to store an 'encoding index'
for each term, it would be easy to implement PhoneticProvider, which
would avoid the linear term search.
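
One way to picture such an 'encoding index' is a map from each encoded
form to the terms that produce it, filled in as terms are written. The
sketch below is an assumption about the idea only; it uses an in-memory
map rather than the index file format, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical in-memory model of an 'encoding index'.
final class EncodingIndexSketch {
    private final UnaryOperator<String> encoder;
    private final Map<String, List<String>> termsByCode = new HashMap<>();

    EncodingIndexSketch(UnaryOperator<String> encoder) {
        this.encoder = encoder;
    }

    // Called once per term at write time: file the term under its code.
    void addTerm(String term) {
        termsByCode.computeIfAbsent(encoder.apply(term), k -> new ArrayList<>())
                   .add(term);
    }

    // Lookup at search time: one map access instead of a scan of all terms.
    List<String> termsSoundingLike(String queryTerm) {
        return termsByCode.getOrDefault(encoder.apply(queryTerm),
                                        Collections.emptyList());
    }
}
```

The cost of encoding moves from query time to write time, which is the
trade-off the paragraph above is pointing at.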

The posted patch contains code under the LGPL that came directly from
the Phonetix library. This is my first patch, so I am not sure that is OK;
I might instead have to post the entire library and change the build to
link with it?

Let me know what you think.

Robert Engels
