Hi Colin,
        Even assuming you come up with a good way of indexing, the example
query "Ontario, CA" should yield 3 hits: records 2, 3 and 4 are all valid
retrievals. Could you please justify which 2 hits you want, and why?
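
To make the count concrete, here is a minimal Python sketch of the matching. It
is not Lucene code, and the names RECORDS, SYNONYMS, forms, matches and hits
are made up for illustration. It assumes each place name is a single unit and
approximates the slop-1 phrase query as "query terms in order, with at most one
record component skipped", which is a simplification of real token positions:

```python
from itertools import combinations

# Hypothetical model of the eight records quoted below.
RECORDS = {
    1: ["London", "United Kingdom"],
    2: ["London", "Ontario", "Canada"],
    3: ["Ontario", "California", "United States"],
    4: ["Ontario", "Canada"],
    5: ["Vancouver", "Washington", "United States"],
    6: ["Vancouver", "British Columbia", "Canada"],
    7: ["Washington", "DC", "United States"],
    8: ["Washington", "United States"],
}

# Synonym groups as listed in the quoted message.
SYNONYMS = [
    {"Ontario", "ON"},
    {"California", "CA"},
    {"Washington", "WA"},
    {"Canada", "CA"},
    {"United States", "US", "America", "United States of America"},
    {"United Kingdom", "UK", "Great Britain", "England"},
]


def forms(name):
    """All lower-cased spellings under which a record component is indexed."""
    out = {name.lower()}
    for group in SYNONYMS:
        if name in group:
            out |= {g.lower() for g in group}
    return out


def matches(query, record, slop=1):
    """True if the comma-separated query terms occur in order in the record,
    skipping at most `slop` record components between first and last term."""
    terms = [t.strip().lower() for t in query.split(",")]
    names = [forms(n) for n in record]
    for idx in combinations(range(len(names)), len(terms)):
        skipped = (idx[-1] - idx[0] + 1) - len(terms)
        if skipped <= slop and all(t in names[i] for t, i in zip(terms, idx)):
            return True
    return False


def hits(query):
    """Ids of all records matching the query."""
    return [rid for rid, rec in RECORDS.items() if matches(query, rec)]


print(hits("Ontario, CA"))  # [2, 3, 4]
```

Under these assumptions "CA" matches both California and Canada, and nothing
ties "Ontario" to the start of the record, so record 2 comes back along with
3 and 4.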

Thanks,

Rajesh Munavalli

On 1/27/06, Colin Young <[EMAIL PROTECTED]> wrote:
>
> I'm having some trouble coming up with a good search strategy for
> geographical data. e.g., given:
>
> [1] city: London, United Kingdom
> [2] city: London, Ontario, Canada
> [3] city: Ontario, California, United States
> [4] state: Ontario, Canada
> [5] city: Vancouver, Washington, United States
> [6] city: Vancouver, British Columbia, Canada
> [7] city: Washington, DC, United States
> [8] state: Washington, United States
>
> and also given the following synonyms:
>
> Ontario = ON
> California = CA
> Washington = WA
> Canada = CA
> United States = US = America = United States of America
> United Kingdom = UK = Great Britain = England
>
> for the following queries, I want the number of hits shown in '()', matching
> the records listed in '[]':
>
> i. Ontario (2) [3, 4]
> ii. London (2) [1, 2]
> iii. Ontario, Canada (1) [4]
> iv. Ontario, California (1) [3]
> v. Ontario, CA (2) [3, 4]
> vi. Ontario, US (1) [3]
> vii. Vancouver (2) [5, 6]
> viii. Washington (2) [7, 8]
> ix. Washington, DC (1) [7]
> x. Vancouver, CA (1) [6]
> xi. Vancouver, WA (1) [5]
>
> How do I index and store the input so that I get the desired results? (Assume
> that I know the mechanics, so I'm not looking for specific Java syntax or how
> to generate synonyms during analysis.)
>
> My current attempt indexes strings like "London Ontario Canada", "London ON
> Canada", "London Ontario CA", "London ON CA" (i.e. every combination of
> entity name and corresponding code) in a content field, and creates a type
> field containing "city" (or "state" or "country", as appropriate to identify
> the type of entity being indexed). It uses a phrase query with a slop of 1,
> which works really well except for e.g. "Ontario CA": I'd like 2 hits, but
> the above data gives 3 (from 2, 3 and 4). The problem will only get worse as
> I add more cities in Ontario, since each one results in a hit.
>
> The slop of 1 is required because not all countries customarily use states,
> and I need to support the user optionally dropping the state. In the above
> example of "Ontario, CA", we don't know whether the user intended "CA" to
> mean the state of California or the country of Canada, while "London, UK"
> would be unambiguous.
>
> The major problem as I see it is that at parse time I don't know if the
> user is searching for a city, state or country, and I don't want to force
> them to specify that.
>
> Does anyone have any good ideas to help me solve this problem?
>
> Thanks.
>
> Colin Young
