I also used Clint's example, mapped it to a document, and searched the field, but I am still getting HTML in the query results. Here is my code. I appreciate the help.

//Tokenizer
PUT /foo/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "test_1": {
            "char_filter": [ "html_strip" ],
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}

//Mapping
PUT /foo/foo_type/_mapping
{
  "foo_type": {
    "properties": {
      "title": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "test_1"
      }
    }
  }
}

GET /foo/foo_type/_mapping
{
  "foo": {
    "mappings": {
      "foo_type": {
        "properties": {
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "title": {
            "type": "string",
            "analyzer": "test_1"
          }
        }
      }
    }
  }
}

//Index
PUT /foo/foo_type/1
{
  "date": "2009-11-15T14:12:12",
  "title": "The quick & <b>brown</b> fox"
}

//Search
GET /foo/_search?pretty=true
{
  "fields": ["title"],
  "query": {
    "query_string": {
      "query": "brown",
      "analyzer": "test_1"
    }
  }
}

//Results showing html tags still
"hits": [
  {
    "_index": "foo",
    "_type": "foo_type",
    "_id": "1",
    "_score": 0.076713204,
    "fields": {
      "title": [
        "The quick & <b>brown</b> fox"
      ]
    }

On Thursday, August 7, 2014 6:06:56 PM UTC-4, Jörg Prante wrote:
>
> Have you checked Clint's example?
>
> https://gist.github.com/clintongormley/780895
>
> Jörg
>
>
> On Thu, Aug 7, 2014 at 8:23 PM, IronMike <sabda...@gmail.com> wrote:
>
>> I would like to strip html tags for indexing. Here is a simple example I
>> tried so far, but doesn't seem to strip html tags. Any ideas what's missing?
>>
>> //settings & Mappings
>> POST twitter
>> {
>>   "mappings": {
>>     "tweet": {
>>       "properties": {
>>         "message": {
>>           "type": "string",
>>           "analyzer": "strip_html_analyzer"
>>         },
>>         "date": {
>>           "type": "date"
>>         },
>>         "name": {
>>           "type": "string"
>>         }
>>       }
>>     }
>>   },
>>   "settings": {
>>     "analysis": {
>>       "analyzer": {
>>         "strip_html_analyzer": {
>>           "type": "custom",
>>           "tokenizer": "standard",
>>           "filter": "standard",
>>           "char_filter": "my_html"
>>         }
>>       },
>>       "char_filter": {
>>         "my_html": {
>>           "type": "html_strip"
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> //Index a document
>> PUT /twitter/tweet/1
>> {
>>   "name": "mike",
>>   "date": "2009-11-15T14:12:12",
>>   "message": "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>> }
>>
>> //query result for "html", I expect the query to return nothing since it
>> is supposed to strip the tag?
>> "hits": {
>>   "total": 1,
>>   "max_score": 0.11626227,
>>   "hits": [
>>     {
>>       "_index": "twitter",
>>       "_type": "tweet",
>>       "_id": "1",
>>       "_score": 0.11626227,
>>       "fields": {
>>         "message": [
>>           "<html>trying out <b>Elasticsearch</b>, This is an html test</html>"
>>         ]
>>       },
>>       "highlight": {
>>         "message": [
>>           "<html>trying out <b>Elasticsearch</b>, This is an <em>html</em> test</html>"
>>         ]
>>       }
>>     }
>>   ]
>> }
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/517fe8b8-0b38-4646-bc8f-a27896454515%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
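
To check what the analyzer emits separately from what the search returns, the text can be run through the _analyze API. This is a minimal sketch, assuming the foo index and test_1 analyzer from the message above and a 1.x cluster, where the request body is taken as the raw text to analyze (newer versions expect a JSON body instead):

```
GET /foo/_analyze?analyzer=test_1
The quick & <b>brown</b> fox
```

If no "b" token comes back, html_strip is being applied to the indexed terms, which would suggest the tags in the hits come from the original field value that the search returns rather than from the analyzed tokens.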