You should present all the alternatives to the user as well as the contexts of each hit in terms of country, state and full name etc. and let them decide which one they intended.
Peter -----Original Message----- From: Jayakumar.V [mailto:[EMAIL PROTECTED] Sent: 28 September 2005 12:03 To: Peter Gelderbloem; java-user@lucene.apache.org Subject: RE: Issue with sounds-like queries Hi, The reason I'm using sounds-like queries is that this search feature will be used by our lobby staff(s), who'll be of different nationalities. No two users may spell the place name the same way. They may also misspell the names. To bring out the closest match based on what they've input, I need to use sounds-like queries. -- Jayakumar.V -----Original Message----- From: Peter Gelderbloem [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 28, 2005 14:14 To: java-user@lucene.apache.org; [EMAIL PROTECTED] Subject: RE: Issue with sounds-like queries May be you should not be using sounds like queries in the first place? They are supposed to be fuzzy afaik. -----Original Message----- From: Jayakumar.V [mailto:[EMAIL PROTECTED] Sent: 27 September 2005 14:54 To: java-user@lucene.apache.org Subject: Issue with sounds-like queries Hi, I'm facing an issue with sounds-like queries. I've experimented with both Apache Codec & the Phonetix library from Tangentum Technologies (http://www.tangentum.biz/en/products/phonetix/faqs/index.html ) to see if I could sort out the issue somehow using either of the libraries. I've an index containing details of various Banks in the world & their associated Branches. Each document has a field holding the Branch Name(s) for the individual Bank(s). While searching for the following branch name :- QUILON, it also returns back details where the branch name may contain the word COLONY, since using Metaphone or DoubleMetaphone, both QUILON & COLONY get encoded to the same value :- KLN. This returns in-correct results. Another example would be CALICUT (located in South India) & CALCUTTA (located in North India), both get encoded to KLKT. I can narrow down the result by filtering based on COUNTRY or COUNTRY + STATE but still I might get back results which may not be the one intended. I also tried using the RefinedSoundex class. The issue here is that, "QUILON BRANCH" will get encoded as - Q50708190830, whereas "QUILON" alone will get encoded as - Q50708. The user may input only "QUILON" while making a search which will not return back hits in the above case. Hope I was clear in communicating the issue. Any thoughts / inputs will be really helpful. Thanks & Regards Jayakumar.V UAE Xchange Center PB.No. : 170, Abudhabi, UAE Phone: + 971-2-6105656, 6105658 Fax: +971-2-6323775 _____ Confidentiality Notice : This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. _____ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]