Hi. I'm trying to improve autocomplete search results on a GeoNames Cities index. I have been using django-haystack, but have run into issues there. I may need to replace it, or bypass it. But my question here pertains to indexing and querying with autocomplete using multiple fields.
Users expect to be able to use two-letter abbreviations for states to narrow their city choices. For example, "San Francisco, CA" and "New York, NY" should have the cities you'd expect at the top of the list. However that is not the case, and I think for different reasons. You can see the results below. It turns out that there are a lot of San Franciscos in the world!

Searching for "San Francisco CA" retrieves

    San Francisco, Caraga, 13, PH                                    5.5191193
    San Francisco, Caraga, 13, PH                                    5.5163627
    San Francisco, Calabarzon, 40, PH                                5.4498897
    San Francisco, Calabarzon, 40, PH                                5.281434
    San Francisco, Caraga, 13, PH                                    5.281434
    San Francisco, California, CA, US                                5.2123656
    South San Francisco, California, CA, US                          4.3138
    San Francisco (El Calvito), Chiapas, 05, MX                      4.137272
    San Francisco, Baja California Sur, 03, MX                       4.137272
    San Francisco (Baños de Agua Caliente), Guanajuato, 11, MX       3.3008962

I would like to boost the state (region_code) value so that San Francisco and South San Francisco are at the top.

For "New York NY" I get

    Nyack, New York, US          3.0575132
    West Nyack, New York, US     2.670291
    South Nyack, New York, US    2.5124028
    Upper Nyack, New York, US    2.5124028

instead of what I want, which is "New York City, New York, US".

The autocomplete field is EdgeNGram called "content_auto". It currently has the following format, which is what I want to return: "CityName, RegionName, CountryCode."

So I think what I want to do in both cases is boost results if there is a match on the region_code field, but *not* display the region_code field in the results.

The type of the search is currently query_string, which is what haystack uses. If there is some way to make that work, then that would be good. However, I'm afraid it is limiting what I'm able to do.

I did some experiments -- If I query directly with curl for sf using

    {
      "query": {
        "multi_match": {
          "query": "San Francisco CA",
          "type": "cross_fields",
          "fields": ["content_auto", "region_code^3"]
        }
      }
    }

I get a result I'm satisfied with.
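To repeat that experiment with other inputs, the same request body can be generated rather than typed by hand. This is only a minimal sketch: the function name build_cross_fields_query and its region_boost parameter are made up for illustration, it bypasses haystack entirely, and it builds the body without sending it anywhere. The field names are the ones from the query above.

```python
import json


def build_cross_fields_query(user_input, region_boost=3):
    """Return a multi_match / cross_fields request body for the given text.

    Hypothetical helper for this sketch only; it is not part of haystack
    or of any Elasticsearch client.
    """
    return {
        "query": {
            "multi_match": {
                "query": user_input,
                "type": "cross_fields",
                # "field^N" applies a per-field boost of N to that field.
                "fields": [
                    "content_auto",
                    "region_code^{}".format(region_boost),
                ],
            }
        }
    }


if __name__ == "__main__":
    # Print one body per test input so each can be piped to curl -d.
    for text in ("San Francisco CA", "New York NY"):
        print(json.dumps(build_cross_fields_query(text), indent=2))
```

Changing region_boost makes it easy to compare how different boosts on region_code affect the ranking for each test input.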
However the similar query using "New York NY" puts the city as the sixth result! I also tried putting the region_code in the content_auto string, and boosting the region_code field.

Also, the following works for SF, but I have no way of knowing in advance what the region_code is going to be. It ranks New York City third, and I would have to pick out two-letter combinations.

    "default_field": "text",
    "default_operator": "OR",
    "query": "(content_auto:(san) AND content_auto:(francisco)) CA^1.5"

It would really help if someone could help me limit my *own* queries about how ElasticSearch works, so that I can focus on the best approach!

Thanks in advance for your help :-)

    curl 'localhost:9200/cities/_mapping?&pretty'
    {
      "cities" : {
        "mappings" : {
          "modelresult" : {
            "_boost" : {
              "name" : "boost",
              "null_value" : 1.0
            },
            "properties" : {
              "content_auto" : {
                "type" : "string",
                "analyzer" : "edgengram_analyzer"
              },
              "django_ct" : {
                "type" : "string",
                "index" : "not_analyzed",
                "include_in_all" : false
              },
              "django_id" : {
                "type" : "string",
                "index" : "not_analyzed",
                "include_in_all" : false
              },
              "id" : {
                "type" : "string"
              },
              "location" : {
                "type" : "geo_point"
              },
              "region_code" : {
                "type" : "string",
                "analyzer" : "snowball"
              },
              "text" : {
                "type" : "string",
                "analyzer" : "snowball"
              }
            }
          }
        }
      }
    }

NY example:

    curl -XGET 'http://localhost:9200/cities/modelresult/_search?pretty' -d '{
      "from": 0,
      "query": {
        "filtered": {
          "filter": {
            "terms": {
              "django_ct": [ "cities.city" ]
            }
          },
          "query": {
            "query_string": {
              "analyze_wildcard": true,
              "auto_generate_phrase_queries": true,
              "default_field": "text",
              "default_operator": "AND",
              "query": "(content_auto:(new) AND content_auto:(york,) AND content_auto:(ny))"
            }
          }
        }
      },
      "size": 10,
      "sort": [ { "_score": { "order": "desc" } } ]
    }'

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 4,
        "max_score" : 3.0575132,
        "hits" : [ {
          "_index" : "cities",
          "_type" : "modelresult",
          "_id" : "cities.city.5129433",
          "_score" : 3.0575132,
          "_source":{"django_id": "5129433", "region_code": "NY", "text": "Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.09065,-73.91791", "content_auto": "Nyack, New York, US", "id": "cities.city.5129433"}
        }, {
          "_index" : "cities",
          "_type" : "modelresult",
          "_id" : "cities.city.5143946",
          "_score" : 2.670291,
          "_source":{"django_id": "5143946", "region_code": "NY", "text": "West Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.09648,-73.97292", "content_auto": "West Nyack, New York, US", "id": "cities.city.5143946"}
        }, {
          "_index" : "cities",
          "_type" : "modelresult",
          "_id" : "cities.city.5138940",
          "_score" : 2.5124028,
          "_source":{"django_id": "5138940", "region_code": "NY", "text": "South Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.08315,-73.92014", "content_auto": "South Nyack, New York, US", "id": "cities.city.5138940"}
        }, {
          "_index" : "cities",
          "_type" : "modelresult",
          "_id" : "cities.city.5142011",
          "_score" : 2.5124028,
          "_source":{"django_id": "5142011", "region_code": "NY", "text": "Upper Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.10704,-73.92014", "content_auto": "Upper Nyack, New York, US", "id": "cities.city.5142011"}
        } ]
      }
    }