Hi Colin,

Even if you came up with a good way of indexing, the example query "Ontario, CA" should yield 3 hits: entries 2, 3 and 4 are all valid retrievals. Could you please explain which 2 hits you want, and why?
Thanks,
Rajesh Munavalli

On 1/27/06, Colin Young <[EMAIL PROTECTED]> wrote:
>
> I'm having some trouble coming up with a good search strategy for
> geographical data. For example, given:
>
> [1] city: London, United Kingdom
> [2] city: London, Ontario, Canada
> [3] city: Ontario, California, United States
> [4] state: Ontario, Canada
> [5] city: Vancouver, Washington, United States
> [6] city: Vancouver, British Columbia, Canada
> [7] city: Washington, DC, United States
> [8] state: Washington, United States
>
> and the following synonyms:
>
> Ontario = ON
> California = CA
> Washington = WA
> Canada = CA
> United States = US = America = United States of America
> United Kingdom = UK = Great Britain = England
>
> I want the following queries to return the number of hits shown in
> parentheses, matching the entries shown in brackets:
>
> i.    Ontario              (2) [3, 4]
> ii.   London               (2) [1, 2]
> iii.  Ontario, Canada      (1) [4]
> iv.   Ontario, California  (1) [3]
> v.    Ontario, CA          (2) [3, 4]
> vi.   Ontario, US          (1) [3]
> vii.  Vancouver            (2) [5, 6]
> viii. Washington           (2) [7, 8]
> ix.   Washington, DC       (1) [7]
> x.    Vancouver, CA        (1) [6]
> xi.   Vancouver, WA        (1) [5]
>
> How should I index and store the input so that I get the desired
> results? (Assume that I know the mechanics, so I'm not looking for
> specific Java syntax or for how to generate synonyms during analysis.)
>
> My current attempt indexes strings like "London Ontario Canada",
> "London ON Canada", "London Ontario CA" and "London ON CA" (i.e. every
> combination of entity name and corresponding code) in a content field,
> and creates a type field containing "city" (or "state" or "country", as
> appropriate) to identify the type of entity being indexed. It then uses
> a phrase query with a slop of 1. This works really well except for
> queries like "Ontario CA", for which I'd like 2 hits: given the above
> data it gives 3 (from 2, 3 and 4), and the problem will only get worse
> as I add more cities in Ontario, since each one results in another hit.
> The slop of 1 is required because not all countries customarily use
> states, and I need to support the user optionally dropping the state,
> as in the "Ontario, CA" example above, where we don't know whether the
> user intended "CA" to mean the state of California or the country of
> Canada. By contrast, "London, UK" would be unambiguous.
>
> The major problem, as I see it, is that at parse time I don't know
> whether the user is searching for a city, state or country, and I don't
> want to force them to specify that.
>
> Does anyone have any good ideas to help me solve this problem?
>
> Thanks.
>
> Colin Young
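For illustration, here is a rough, self-contained Java simulation of the scheme described in the quoted message: index every name/code combination and match the query as a phrase with a slop of 1. It is not Lucene code, and the class and method names (`GeoPhraseDemo`, `combinations`, `phraseMatches`, `search`) are hypothetical. It also assumes that the synonyms apply to every component of an entry regardless of its role, that each combination is matched as its own token list, and that slop can be approximated as in-order terms with at most `slop` extra tokens between them (real Lucene slop also tolerates reordering). Under these assumptions, "Ontario, CA" matches entries 2, 3 and 4, consistent with Rajesh's observation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Rough simulation of the quoted scheme (NOT Lucene code): index every
// name/code combination, then match queries as a phrase with slop.
final class GeoPhraseDemo {
    // Synonyms from the message. Assumption: applied to every component of
    // an entry, whatever its role (city, state or country).
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "Ontario", List.of("Ontario", "ON"),
        "California", List.of("California", "CA"),
        "Washington", List.of("Washington", "WA"),
        "Canada", List.of("Canada", "CA"),
        "United States", List.of("United States", "US", "America", "United States of America"),
        "United Kingdom", List.of("United Kingdom", "UK", "Great Britain", "England"));

    // Entries [1]..[8] from the message (type field omitted).
    static final String[][] ENTRIES = {
        {"London", "United Kingdom"},
        {"London", "Ontario", "Canada"},
        {"Ontario", "California", "United States"},
        {"Ontario", "Canada"},
        {"Vancouver", "Washington", "United States"},
        {"Vancouver", "British Columbia", "Canada"},
        {"Washington", "DC", "United States"},
        {"Washington", "United States"},
    };

    // Lowercase and split on non-letters: "Ontario, CA" -> [ontario, ca].
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Every name/code combination of one entry, each as its own token list.
    static List<List<String>> combinations(String[] parts) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (String part : parts) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String form : SYNONYMS.getOrDefault(part, List.of(part))) {
                    List<String> extended = new ArrayList<>(prefix);
                    extended.addAll(tokenize(form));
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    // Simplified sloppy phrase: terms in order, at most `slop` extra tokens
    // in total between them (real Lucene slop also tolerates reordering).
    static boolean phraseMatches(List<String> doc, List<String> query, int slop) {
        for (int start = 0; start < doc.size() && !query.isEmpty(); start++) {
            if (matchFrom(doc, query, 0, start, slop)) return true;
        }
        return false;
    }

    static boolean matchFrom(List<String> doc, List<String> query, int qi, int pos, int slopLeft) {
        if (!doc.get(pos).equals(query.get(qi))) return false;
        if (qi == query.size() - 1) return true;
        for (int gap = 0; gap <= slopLeft; gap++) {
            int next = pos + 1 + gap;
            if (next < doc.size() && matchFrom(doc, query, qi + 1, next, slopLeft - gap)) return true;
        }
        return false;
    }

    // 1-based numbers of the entries with at least one matching combination.
    static List<Integer> search(String queryText, int slop) {
        List<String> query = tokenize(queryText);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < ENTRIES.length; i++) {
            for (List<String> combo : combinations(ENTRIES[i])) {
                if (phraseMatches(combo, query, slop)) { hits.add(i + 1); break; }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("Ontario, CA", 1)); // [2, 3, 4]
        System.out.println(search("Ontario, US", 1)); // [3]
    }
}
```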