Hi. I'm trying to improve autocomplete search results on a GeoNames cities
index.
I have been using django-haystack but have run into issues with it, and I
may need to replace or bypass it. My question here, though, is about
indexing and querying for autocomplete across multiple fields.

Users expect to be able to use two-letter state abbreviations to narrow
their city choices. For example, "San Francisco, CA" and "New York, NY"
should put the cities you'd expect at the top of the list.
However, that is not the case, and I think the two queries fail for
different reasons. You can see the results below.

It turns out that there are a lot of San Franciscos in the world!
Searching for "San Francisco CA" retrieves:

San Francisco, Caraga, 13, PH 5.5191193
San Francisco, Caraga, 13, PH 5.5163627
San Francisco, Calabarzon, 40, PH 5.4498897
San Francisco, Calabarzon, 40, PH 5.281434
San Francisco, Caraga, 13, PH 5.281434
San Francisco, California, CA, US 5.2123656
South San Francisco, California, CA, US 4.3138
San Francisco (El Calvito), Chiapas, 05, MX 4.137272
San Francisco, Baja California Sur, 03, MX 4.137272
San Francisco (Baños de Agua Caliente), Guanajuato, 11, MX 3.3008962

I would like to boost the state (region_code) value so that San Francisco 
and South San Francisco are at the top.

For "New York NY" I get 

Nyack, New York, US 3.0575132
West Nyack, New York, US 2.670291
South Nyack, New York, US 2.5124028
Upper Nyack, New York, US 2.5124028

Instead of what I want, which is "New York City, New York, US".
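
My guess at why New York City is missing entirely: the query haystack
generates (shown in the NY example at the end of this message) ANDs the
terms together, so every term has to match a prefix in content_auto, and
nothing in "New York City, New York, US" starts with "ny". A minimal sketch
of that reasoning in Python, assuming edgengram_analyzer is haystack's
default (lowercase tokenizer, edge n-grams of 2 to 15 characters). I have
not checked the actual index settings, and edge_ngrams and matches_all are
hypothetical helpers, not Elasticsearch APIs:

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=15):
    """Return the set of edge n-grams (prefixes) for each token in text.

    ASSUMPTION: approximates a lowercase tokenizer plus an edgeNGram filter.
    Simplification: only ASCII letters are treated as token characters.
    """
    grams = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        for n in range(min_gram, min(len(token), max_gram) + 1):
            grams.add(token[:n])
    return grams


def matches_all(query, indexed_text):
    """True if every query term equals some indexed edge n-gram (AND semantics)."""
    indexed = edge_ngrams(indexed_text)
    terms = re.findall(r"[a-z]+", query.lower())
    return all(term in indexed for term in terms)


print(matches_all("new york ny", "Nyack, New York, US"))          # True
print(matches_all("new york ny", "New York City, New York, US"))  # False
```

If that assumption holds, "ny" only ever matches as a prefix of "Nyack",
which would explain why all four hits are Nyacks.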

The autocomplete field is an EdgeNGram field called "content_auto". It
currently has the following format, which is also what I want to return:
"CityName, RegionName, CountryCode".

So I think what I want to do in both cases is boost results if there is a 
match on the region_code field, but *not* display the region_code field in 
the results.
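
A minimal sketch of the kind of query I think this implies, assuming the
region code has already been separated from the city text: a bool query
where content_auto must match and region_code is only a boosted "should"
clause, so it affects ranking without being part of what is displayed.
build_city_query and its region_boost default are hypothetical names for
illustration, and I have not verified the ranking this produces against my
index:

```python
import json


def build_city_query(city_text, region_code=None, region_boost=3.0):
    """Build a search body: city_text must match, region_code only boosts.

    ASSUMPTION: field names come from the mapping in this message; the
    function name and the boost value are illustrative only.
    """
    bool_query = {
        "must": [
            {"match": {"content_auto": {"query": city_text, "operator": "and"}}}
        ]
    }
    if region_code:
        # A "should" clause raises the score of matching documents
        # without excluding documents that do not match it.
        bool_query["should"] = [
            {"match": {"region_code": {"query": region_code, "boost": region_boost}}}
        ]
    return {"query": {"bool": bool_query}, "size": 10}


body = build_city_query("San Francisco", "CA")
print(json.dumps(body, indent=2))
```

The printed body could then be sent with curl -d, the same way as the
other queries in this message.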

The search type is currently query_string, which is what haystack uses. If
there is some way to make that work, that would be good. However, I'm
afraid it is limiting what I'm able to do.

I ran some experiments. If I query directly with curl for San Francisco
using

{
        "query":{
                "multi_match":{
                        "query": "San Francisco CA",
                        "type": "cross_fields",
                        "fields": ["content_auto", "region_code^3"]
                }
        }
}


I get a result I'm satisfied with. However, the equivalent query using "New
York NY" puts the city sixth! I also tried putting the region_code in the
content_auto string and boosting the region_code field.
The following also works for SF, but I have no way of knowing in advance
what the region_code is going to be, so I would have to pick out two-letter
combinations from the input. It ranks New York City third.

          "default_field": "text",
          "default_operator": "OR",
          "query": "(content_auto:(san) AND content_auto:(francisco)) CA^1.5"
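
Picking out the two-letter combination could be sketched like this in
Python. split_region_code is a hypothetical helper, and treating any
trailing two-letter token as a region code is an assumption that misfires
while the user is still typing (for example, "San Jo" would be split into
"San" and "JO"):

```python
import re


def split_region_code(user_input):
    """Split 'San Francisco, CA' into ('San Francisco', 'CA').

    HEURISTIC (assumption): a trailing two-letter token, when it is not the
    only token, is treated as a candidate region code.
    """
    tokens = [t for t in re.split(r"[,\s]+", user_input.strip()) if t]
    if len(tokens) > 1 and re.fullmatch(r"[A-Za-z]{2}", tokens[-1]):
        return " ".join(tokens[:-1]), tokens[-1].upper()
    return " ".join(tokens), None


print(split_region_code("San Francisco, CA"))
print(split_region_code("New York NY"))
print(split_region_code("Nyack"))
```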


It would really help if someone could help me limit my *own* queries about 
how ElasticSearch works, so that I can focus on the best approach!

Thanks in advance for your help :-)




Mapping:
curl 'localhost:9200/cities/_mapping?pretty'
{
  "cities" : {
    "mappings" : {
      "modelresult" : {
        "_boost" : {
          "name" : "boost",
          "null_value" : 1.0
        },
        "properties" : {
          "content_auto" : {
            "type" : "string",
            "analyzer" : "edgengram_analyzer"
          },
          "django_ct" : {
            "type" : "string",
            "index" : "not_analyzed",
            "include_in_all" : false
          },
          "django_id" : {
            "type" : "string",
            "index" : "not_analyzed",
            "include_in_all" : false
          },
          "id" : {
            "type" : "string"
          },
          "location" : {
            "type" : "geo_point"
          },
          "region_code" : {
            "type" : "string",
            "analyzer" : "snowball"
          },
          "text" : {
            "type" : "string",
            "analyzer" : "snowball"
          }
        }
      }
    }
  }
}


NY example:
curl -XGET 'http://localhost:9200/cities/modelresult/_search?pretty' -d '{
   "from": 0,
   "query": {
     "filtered": {
       "filter": {
         "terms": {
           "django_ct": [
             "cities.city"
           ]
         }
       },
       "query": {
         "query_string": {
           "analyze_wildcard": true,
           "auto_generate_phrase_queries": true,
           "default_field": "text",
           "default_operator": "AND",
           "query": "(content_auto:(new) AND content_auto:(york,) AND content_auto:(ny))"
         }
       }
     }
   },
   "size": 10,
   "sort": [
     {
       "_score": {
         "order": "desc"
       }
     }
   ]
 }'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 3.0575132,
    "hits" : [ {
      "_index" : "cities",
      "_type" : "modelresult",
      "_id" : "cities.city.5129433",
      "_score" : 3.0575132,
      "_source":{"django_id": "5129433", "region_code": "NY", "text": "Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.09065,-73.91791", "content_auto": "Nyack, New York, US", "id": "cities.city.5129433"}
    }, {
      "_index" : "cities",
      "_type" : "modelresult",
      "_id" : "cities.city.5143946",
      "_score" : 2.670291,
      "_source":{"django_id": "5143946", "region_code": "NY", "text": "West Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.09648,-73.97292", "content_auto": "West Nyack, New York, US", "id": "cities.city.5143946"}
    }, {
      "_index" : "cities",
      "_type" : "modelresult",
      "_id" : "cities.city.5138940",
      "_score" : 2.5124028,
      "_source":{"django_id": "5138940", "region_code": "NY", "text": "South Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.08315,-73.92014", "content_auto": "South Nyack, New York, US", "id": "cities.city.5138940"}
    }, {
      "_index" : "cities",
      "_type" : "modelresult",
      "_id" : "cities.city.5142011",
      "_score" : 2.5124028,
      "_source":{"django_id": "5142011", "region_code": "NY", "text": "Upper Nyack\nNew York\nNY\nUnited States\nUS\n", "django_ct": "cities.city", "location": "41.10704,-73.92014", "content_auto": "Upper Nyack, New York, US", "id": "cities.city.5142011"}
    } ]
  }
}



