I am trying to set up some common analyzers and mappings via Java, but I am
having trouble.
The exact JSON I use is included below.

Essentially, I want to get shingles for a field called "text". I
had this working, but when running a facet search, all the usual stop
words came out on top as unigrams, which is not very useful. So I tried
adding a few stop word lists to the mix. Because I am dealing with English
and German web content, I need a German and an English stop word list,
plus a custom one containing terms like "www" and "http".
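
To make the goal concrete, here is a minimal plain-Java sketch of the token stream I am after. It is not Elasticsearch code: the class ShingleSketch, the method shingles, and the tiny stop word set are made up for illustration, and the real shingle filter may well treat the gaps left by removed stop words differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is not how Elasticsearch implements it.
final class ShingleSketch {

    // Lowercases the text, splits on anything that is not a letter or digit,
    // drops stop words, then emits each unigram followed by the word n-grams
    // (shingles) of minSize..maxSize tokens that start at the same position.
    static List<String> shingles(String text, Set<String> stopWords, int minSize, int maxSize) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            out.add(tokens.get(start)); // unigram, as with "output_unigrams":"true"
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical mixed English/German/custom stop word set.
        Set<String> stop = new HashSet<>(Arrays.asList("the", "and", "der", "und", "www", "http"));
        System.out.println(shingles("The quick fox and der Hund", stop, 2, 5));
    }
}
```

With the sample input, the stop words from all three lists disappear and only "quick", "fox" and "hund" remain as unigrams, together with the shingles built from them. That is roughly what I would like to see at the top of the facet results.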

First question: Can someone see whether the below JSON should work? 

On the Java side, I am using the following code to pass the JSON to
Elasticsearch:

client.admin().indices().prepareClose(INDEX_NAME).execute().actionGet();
client.admin().indices().prepareUpdateSettings(INDEX_NAME).setSettings(settingString).execute().actionGet();
client.admin().indices().prepareOpen(INDEX_NAME).execute().actionGet();
client.admin().indices().preparePutMapping(INDEX_NAME).setType("_default_").setSource(mappingString).execute().actionGet();

Second question: Is this correct? Also, when should or can I run
this code? After the index has been created? Will it still work after
some content has already been added to the index? Do I need to give ES
some time after issuing the above commands? If so, how do I know when it
is ready again?

Many Thanks! 

  

Settings: 
{
  "analysis":{
    "analyzer":{
      "analyzer_shingle":{
        "tokenizer":"standard",
        "filter":["standard", "lowercase"]
      },
      "title":{
        "type":"string",
        "index":"not_analyzed"
      },
      "analyzer_shingle_tf":{
        "tokenizer":"standard",
        "filter":["standard", "lowercase", "filter_english", "filter_german",
                  "filter_www", "filter_stop", "filter_shingle"]
      }
    },
    "filter":{
      "filter_shingle":{
        "type":"shingle",
        "max_shingle_size":5,
        "min_shingle_size":2,
        "output_unigrams":"true"
      },
      "filter_stop":{
        "type":"stop",
        "enable_position_increments":"true"
      },
      "filter_english":{
        "type":"stop",
        "stopwords":"_english_"
      },
      "filter_german":{
        "type":"stop",
        "stopwords":"_german_"
      },
      "filter_www":{
        "type":"stop",
        "stopwords_path":"stopwords_www.txt"
      }
    }
  }
}

Mapping: 
{
  "_default_":{
    "properties":{
      "coordinates":{
        "type":"geo_point"
      },
      "text":{
        "search_analyzer":"analyzer_shingle_tf",
        "index_analyzer":"analyzer_shingle_tf",
        "type":"string"
      }
    }
  }
}
