http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10340

[PATCH] Phonetic Search capability

------- Additional Comments From [EMAIL PROTECTED] 2004-01-20 00:33 -------

Copy/paste from Robert's original email follows.

I took the reference to Phonetix and went one better... the attached patch allows for phonetic searching without adding new terms, fields, or analyzers.

There is an interface 'PhoneticProvider' that IndexReaders can implement to improve performance; otherwise it falls back to a linear search of terms, similar to the way Fuzzy searches work.

An interesting point is that the encoder is completely definable, so 'phonetic searching' does not necessarily have to relate to 'phonetics' at all. Rather, it can be viewed as 'alternate term' support, where a single term can have an alternate representation.

The expression language has been changed to allow terms ending with "$" to be a phonetic search, so +balloon$ would find all terms that sound like balloon.

This implementation will work with all existing index files, but if the standard IndexReader/Writer were modified to store an 'encoding index' for each term, it would be easy to implement PhoneticProvider, which would stop the linear term search.

The posted patch contains code under the LGPL that came directly from the phonetix library. This is my first patch, so I am not sure that is OK; I might instead have to post the entire library and change the build to link with it??? Let me know what you think.

Robert Engels
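
To make the fallback described above concrete, here is a minimal sketch of the linear term search and of the 'encoding index' idea. It is not the code from the patch: the class PhoneticSketch, the methods matchTerms and buildEncodingIndex, and the use of a simplified Soundex as the encoder are all assumptions made for illustration (the patch itself used code from the phonetix library, and the real PhoneticProvider signature is not shown in this message). The sketch also ignores the leading "+" operator and only looks at the trailing "$".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: none of these names come from the patch or from Lucene.
class PhoneticSketch {

    // Simplified American Soundex, standing in for any pluggable encoder.
    static String soundex(String word) {
        final String codes = "01230120022455012623010202"; // digit for each letter a..z
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (char raw : word.toCharArray()) {
            char c = Character.toLowerCase(raw);
            if (c < 'a' || c > 'z') continue;          // skip non-letters
            char code = codes.charAt(c - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(c));   // keep the first letter as-is
            } else if (code != '0' && code != prev) {
                out.append(code);                       // new consonant group
            }
            if (c != 'h' && c != 'w') prev = code;      // h and w do not separate groups
            if (out.length() == 4) break;
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');       // pad to four characters
        return out.toString();
    }

    // Fallback path: a term ending in "$" is matched by encoding every index term
    // and comparing codes (a linear scan); any other term is matched exactly.
    static List<String> matchTerms(String queryTerm, List<String> indexTerms,
                                   Function<String, String> encoder) {
        List<String> hits = new ArrayList<>();
        if (!queryTerm.endsWith("$")) {
            if (indexTerms.contains(queryTerm)) hits.add(queryTerm);
            return hits;
        }
        String target = encoder.apply(queryTerm.substring(0, queryTerm.length() - 1));
        for (String term : indexTerms) {
            if (encoder.apply(term).equals(target)) hits.add(term);
        }
        return hits;
    }

    // Assumed shape of an 'encoding index': code -> terms, built once so that a
    // lookup replaces the linear scan.
    static Map<String, List<String>> buildEncodingIndex(List<String> indexTerms,
                                                        Function<String, String> encoder) {
        Map<String, List<String>> index = new HashMap<>();
        for (String term : indexTerms) {
            index.computeIfAbsent(encoder.apply(term), k -> new ArrayList<>()).add(term);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("baboon", "ballon", "balloon", "baloon", "bell");
        System.out.println(matchTerms("balloon$", terms, PhoneticSketch::soundex));
        System.out.println(buildEncodingIndex(terms, PhoneticSketch::soundex).get(soundex("balloon")));
    }
}
```

Because the encoder is passed in as a plain function, swapping Soundex for any other mapping (stemming, transliteration, a synonym key) gives the 'alternate term' behaviour mentioned above without touching the matching code.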
