Hello Ravi,

Your approach is wrong. When you use the synonym filter, it indexes all the synonyms of a token, so any of those synonyms will match against that term. As a result, when you run a facet, you get an aggregation over all the synonym terms rather than just one name.
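A minimal Python sketch of the point above, under a deliberately simplified model: it assumes the analyzer only splits on whitespace and lowercases (the synonym step is left out), and it models a terms facet as a count of documents per indexed term. The helpers analyze and terms_facet are hypothetical illustrations, not Elasticsearch APIs.

```python
from collections import Counter

# The four company values indexed later in this thread.
docs = ["ibm", "ibm corporation", "ibm suisse", "ibm business"]


def analyze(text):
    """Hypothetical stand-in for a whitespace tokenizer + lowercase filter.

    The synonym filter is deliberately omitted; this only models how one
    field value is split into several separate indexed terms.
    """
    return [token.lower() for token in text.split()]


def terms_facet(values, analyzer):
    """Count documents per indexed term, the way a terms facet reports them."""
    counts = Counter()
    for value in values:
        # set(): a document is counted at most once for each term it contains
        counts.update(set(analyzer(value)))
    return counts


# Facet over an analyzed field: one bucket per token, not per company name.
token_facet = terms_facet(docs, analyze)
# Facet over a field holding the whole value as a single term.
whole_value_facet = terms_facet(docs, lambda value: [value.lower()])

print(sorted(token_facet.items()))
# [('business', 1), ('corporation', 1), ('ibm', 4), ('suisse', 1)]
print(sorted(whole_value_facet.items()))
# [('ibm', 1), ('ibm business', 1), ('ibm corporation', 1), ('ibm suisse', 1)]
```

Under these assumptions the token-level facet reports "ibm" plus separate "corporation", "suisse" and "business" buckets. Adding a synonym step changes which terms are emitted, but the facet still counts indexed terms, not original strings.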
A better approach would be to store the unique (canonical) name in a separate field and run the facet on that field.

Thanks
Vineeth

On Mon, Jul 21, 2014 at 11:21 PM, <ravi...@gmail.com> wrote:

> Hi All,
>
> I have a requirement in which I need to find distinct company names. I was
> using the "Keyword" tokenizer for that field, and through a terms facet I was
> able to get distinct company names. However, the terms facet treated company
> names like "ibm suisse", "ibm corporation" and "ibm" as different companies.
> The online documentation suggested using the "Synonym filter" to solve this.
> My settings are:
>
> curl -XPUT 'http://localhost:9200/dataindex/' -d '{
>   "settings": {
>     "index": {
>       "analysis": {
>         "analyzer": {
>           "customAnalyzer": {
>             "type": "custom",
>             "tokenizer": "whitespace",
>             "filter": [
>               "lowercase", "synonym"
>             ]
>           }
>         },
>         "filter": {
>           "synonym": {
>             "type": "synonym",
>             "tokenizer": "keyword",
>             "synonyms_path": "analysis/synonym.txt"
>           }
>         }
>       }
>     }
>   }
> }'
>
> My mapping is:
>
> curl -XPUT 'http://localhost:9200/dataindex/tweet/_mapping' -d '
> {
>   "tweet": {
>     "properties": {
>       "company": {
>         "type": "string",
>         "analyzer": "customAnalyzer"
>       }
>     }
>   }
> }'
>
> In the synonym.txt file I have:
>
>   ibm suisse, ibm corporation, ibm business, ibm => ibm corp ltd
>
> Indexed data:
>
> curl -XPUT 'http://localhost:9200/dataindex/tweet/1' -d '{
>   "company": "ibm"
> }'
> curl -XPUT 'http://localhost:9200/dataindex/tweet/2' -d '{
>   "company": "ibm corporation"
> }'
> curl -XPUT 'http://localhost:9200/dataindex/tweet/3' -d '{
>   "company": "ibm suisse"
> }'
> curl -XPUT 'http://localhost:9200/dataindex/tweet/4' -d '{
>   "company": "ibm business"
> }'
>
> If I run a terms facet:
>
> {
>   "facets": {
>     "loc_facet": {
>       "terms": {
>         "field": "company"
>       }
>     }
>   }
> }
>
> I get 3 terms, i.e. {term: ibm corp ltd, count: 3}, {term: suisse, count: 1},
> {term: corporation, count: 1}.
> I want the facet result to return only one term: ibm corp ltd with
> count=3.
> This way I will get distinct company names and also map synonymous names
> to a single company name.
> Please correct me if I am using the wrong tokenizer or if my approach is not
> correct.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1ba32926-7015-4b8a-89ae-bf43a2561b71%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/1ba32926-7015-4b8a-89ae-bf43a2561b71%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
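A minimal Python sketch of the separate-field idea from the reply above, assuming the application resolves each company name to its canonical form before indexing. It reuses the rule from Ravi's synonym.txt; the field name "company_name" and the helpers parse_rules and canonical_name are hypothetical, and only explicit "=>" rules are handled.

```python
from collections import Counter

# The rule from synonym.txt in this thread ("variant, variant => canonical").
SYNONYM_RULES = ["ibm suisse, ibm corporation, ibm business, ibm => ibm corp ltd"]


def parse_rules(lines):
    """Build a lowercase variant -> canonical-name map from explicit "=>" rules."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=>" not in line:
            continue  # this sketch ignores comments and equivalence-only rules
        left, right = line.split("=>", 1)
        canonical = right.strip().lower()
        for variant in left.split(","):
            mapping[variant.strip().lower()] = canonical
    return mapping


def canonical_name(name, mapping):
    """Return the canonical name, or the lowercased, space-normalized input if unmapped."""
    key = " ".join(name.lower().split())
    return mapping.get(key, key)


mapping = parse_rules(SYNONYM_RULES)

# Hypothetical documents: "company" keeps the original text for searching,
# while "company_name" holds the single canonical value used for faceting.
docs = [{"company": c} for c in ["ibm", "ibm corporation", "ibm suisse", "ibm business"]]
for doc in docs:
    doc["company_name"] = canonical_name(doc["company"], mapping)

# What a terms facet on the untokenized "company_name" field would count.
facet = Counter(doc["company_name"] for doc in docs)
print(facet)  # Counter({'ibm corp ltd': 4})
```

In this sketch all four sample documents map to "ibm corp ltd", so the facet has a single bucket. In the Elasticsearch 1.x mapping syntax used in this thread, such a field would be declared as a string with "index": "not_analyzed", so the whole canonical name is indexed as one term.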