Re: _all analyzer advice
Ah. Cheers. I had looked at that page a few times but missed that.

On Tuesday, 1 July 2014 19:04:56 UTC+1, Glen Smith wrote:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html
Re: _all analyzer advice
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html

On Tuesday, July 1, 2014 6:23:54 AM UTC-4, mooky wrote:
> So default_index and default_search have special meaning.
> Is this in the docs anywhere?
Re: _all analyzer advice
Thanks.
So default_index and default_search have special meaning.
Is this in the docs anywhere?

-N

On Monday, 30 June 2014 17:21:40 UTC+1, Glen Smith wrote:
> Totally. For example: [...]
Re: _all analyzer advice
Hi Glen,

On a related note, I have a use case where I want to search using wildcards on a custom-analyzed field, and I am seeing some discrepancies with respect to what I expect.

Basically, I have string data in a field, such as "Name-55", "Name-56", etc. I want to be able to search for "Name-5*" and get these results.

I have indexed the data as the terms
"Name", "-", "55"
"Name", "-", "56"

I am using a custom pattern analyzer to achieve this. I use a similar custom pattern analyzer for the query string, except that it also swallows &, ? and *.

"my_template" : {
  "template" : "*",
  "order" : 1,
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "custom_index" : {
          "type" : "pattern",
          "pattern" : "([\\s]+)|((?<=\\p{L})(?=\\P{L})|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d)))"
        },
        "custom_search" : {
          "type" : "pattern",
          "pattern" : "([?&*\\s]+)|((?<=\\p{L})(?=\\P{L})|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d)))"
        }
      }
    }
  },
  "mappings" : {
    "account" : {
      "properties" : {
        "myfield" : {
          "type" : "string",
          "store" : "yes",
          "index" : "analyzed",
          "index_analyzer" : "custom_index",
          "search_analyzer" : "custom_search"
        }
      }
    }
  }
}

Using this, when I search for "Name-5*", I get no results. However, if I search for "Name- 5*" (note the additional whitespace in the search string), I get the results Name-55 and Name-56.

Do you have an understanding of why Elasticsearch may be exhibiting this behavior? Is there some issue in the way I have set up the patterns in my analyzers?

Your help is much appreciated!

Thanks,

On Monday, June 30, 2014 9:21:40 AM UTC-7, Glen Smith wrote:
> Totally. For example: [...]
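To see how the two pattern analyzers split these strings, here is a rough Python approximation. It assumes Python's str.isalpha()/str.isdigit() are close enough stand-ins for \p{L} and \d, and that the pattern analyzer's default lowercasing applies; it is not Elasticsearch's actual code. It only models tokenization: "Name-55" becomes name, -, 55, so no single indexed term starts with "name-5". If the query parser treats "Name-5*" as one wildcard term without running it through custom_search (I believe query_string leaves wildcard terms unanalyzed unless analyze_wildcard is enabled, but check the docs for your version), that would be consistent with what you are seeing.

```python
def pattern_tokens(text, extra_separators="", lowercase=True):
    """Approximate the custom_index / custom_search patterns: drop runs of
    whitespace (plus any extra separator characters) and split at
    letter/non-letter and digit/non-digit boundaries."""
    tokens = []
    current = ""
    prev = None
    for ch in text:
        if ch.isspace() or ch in extra_separators:
            # Separator characters end the current token and are discarded.
            if current:
                tokens.append(current)
            current = ""
            prev = None
            continue
        # Zero-width split between a letter and a non-letter,
        # or between a digit and a non-digit.
        if prev is not None and (prev.isalpha() != ch.isalpha()
                                 or prev.isdigit() != ch.isdigit()):
            tokens.append(current)
            current = ""
        current += ch
        prev = ch
    if current:
        tokens.append(current)
    return [t.lower() for t in tokens] if lowercase else tokens


index_terms = pattern_tokens("Name-55")           # like custom_index
query_terms = pattern_tokens("Name-5*", "?&*")    # like custom_search
print(index_terms)   # ['name', '-', '55']
print(query_terms)   # ['name', '-', '5']
# An unanalyzed wildcard "name-5*" matches no individual indexed term:
print(any(t.startswith("name-5") for t in index_terms))  # False
```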
Re: _all analyzer advice
Totally. For example:

"analyzer": {
  "default_index": {
    "tokenizer": "standard",
    "filter": ["standard", "lowercase"]
  },
  "default_search": {
    "tokenizer": "standard",
    "filter": ["standard", "lowercase", "stop"]
  },

On Monday, June 30, 2014 12:19:55 PM UTC-4, mooky wrote:
> Is it possible to set my custom analyser as the default analyser for an
> index (ie instead of standard_analyzer)
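Following the same idea, here is a sketch of index settings that would make a whitespace + lowercase analyzer the index-wide default, which is what mooky asked for. It builds the request body as a Python dict and prints it as JSON, assuming the body is sent when the index is created (e.g. PUT /my-index). The analyzer names default_index/default_search come from the example above. The helper function names are purely illustrative.

```python
import json


def custom_analyzer(tokenizer, filters):
    """One custom analyzer definition: a tokenizer plus a chain of token filters."""
    return {"type": "custom", "tokenizer": tokenizer, "filter": list(filters)}


def default_analyzer_settings(tokenizer="whitespace", filters=("lowercase",)):
    """Create-index body whose default_index/default_search analyzers
    take the place of the built-in standard analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default_index": custom_analyzer(tokenizer, filters),
                    "default_search": custom_analyzer(tokenizer, filters),
                }
            }
        }
    }


print(json.dumps(default_analyzer_settings(), indent=2))
```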
Re: _all analyzer advice
Excellent. Thanks for the info.

Is it possible to set my custom analyser as the default analyser for an index (i.e. instead of standard_analyzer)?

-N

On Monday, 30 June 2014 14:41:10 UTC+1, Glen Smith wrote:
> You can set up an analyser for your index... [...]
Re: _all analyzer advice
You can set up an analyser for your index...

...
"my-index": {
  "analysis": {
    "analyzer": {
      "default_index": {
        "tokenizer": "standard",
        "filter": ["standard", "icu_fold_filter", "stop"]
      },
      "default_search": {
        "tokenizer": "standard",
        "filter": ["standard", "icu_fold_filter", "stop"]
      },
      "custom_index": {
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      },
      "custom_search": {
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      }
    }
  }
}
...

and then map your relevant field accordingly:

{
  "_timestamp": {
    "enabled": "true",
    "store": "yes"
  },
  "properties": {
    "my_field": {
      "type": "string",
      "index_analyzer": "custom_index",
      "search_analyzer": "custom_search"
    }
  }
}

Note that you can (and often should) set up index analysis and search analysis differently (e.g. if you use synonyms, only expand search terms).

Hope I haven't missed the point...

On Monday, June 30, 2014 8:47:36 AM UTC-4, mooky wrote:
> I *_think_* I just want a whitespace analyzer with lower-casing
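To illustrate the point about synonyms, here is a toy Python model. It does not reflect how Lucene stores or scores terms, and the SYNONYMS table and function names are hypothetical. Both sides tokenize on whitespace and lowercase, mirroring custom_index/custom_search above, but only the search side expands synonyms.

```python
# Hypothetical synonym table, applied only at search time.
SYNONYMS = {"car": ["car", "automobile"]}


def index_analyze(text):
    """Index side: whitespace tokenizer + lowercase filter, no synonyms."""
    return text.lower().split()


def search_analyze(text):
    """Search side: same tokenization, then expand each term via SYNONYMS."""
    terms = []
    for token in text.lower().split():
        terms.extend(SYNONYMS.get(token, [token]))
    return terms


def matches(doc, query):
    """True if any search-side term appears among the document's indexed terms."""
    indexed = set(index_analyze(doc))
    return any(term in indexed for term in search_analyze(query))


print(index_analyze("Automobile for sale"))     # ['automobile', 'for', 'sale']
print(search_analyze("Car"))                    # ['car', 'automobile']
print(matches("Automobile for sale", "car"))    # True
```

Because the expansion happens only on the search side, the indexed terms never include the extra synonyms.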
_all analyzer advice
Hi all,

I have a Google-style search capability in my app that uses the _all field with the default (standard) analyzer (I don't configure anything, so it's Elastic's default).

There are a few cases where we don't quite get the behaviour we want, and I am trying to work out how to tweak the analyzer configuration.

1) If the user searches using 99.97, they get the results they expect, but if they search using 99.97%, they get nothing. They should get the results that match "99.97%". The default analyzer config loses the %, I guess.

2) I have no idea what the text is ( : ) ), but the user wants to search using 托克金通贸易, which is in the data, and currently we get zero results. It looks like the standard analyzer/tokenizer breaks on each character.

I *_think_* I just want a whitespace analyzer with lower-casing. However,
a) I am not exactly sure how to configure that, and
b) I am not 100% sure what I am losing/gaining vs the standard analyzer. (I don't need stop-words; in any case, the default config for the standard analyser doesn't have any, IIRC.)

(FWIW, on all our other text fields, we tend to use no analyzer.)

(Elastic 1.1.1 and 1.2 ...)

Cheers.
-M
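For a) and b) above, here is a rough Python sketch of what a whitespace tokenizer followed by a lowercase filter would do to these inputs. Here str.split() stands in for the whitespace tokenizer, so this approximates Lucene's behaviour rather than reproducing it. It also shows one trade-off: punctuation stays attached to tokens. The dict at the end shows one possible way to point _all at such an analyzer. The analyzer name whitespace_lower and type name my_type are hypothetical, and the _all "analyzer" setting is worth checking against the 1.x mapping docs.

```python
import json


def whitespace_lowercase(text):
    """Approximate a custom analyzer: whitespace tokenizer + lowercase filter."""
    return [token.lower() for token in text.split()]


print(whitespace_lowercase("Yield 99.97% p.a."))   # ['yield', '99.97%', 'p.a.']
print(whitespace_lowercase("托克金通贸易"))          # ['托克金通贸易']
# Trade-off: trailing punctuation is kept, so "99.97%." will not match "99.97%".
print(whitespace_lowercase("It was 99.97%."))       # ['it', 'was', '99.97%.']

# Hypothetical settings + mapping pointing _all at that analyzer (ES 1.x style).
body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "whitespace_lower": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {"my_type": {"_all": {"analyzer": "whitespace_lower"}}},
}
print(json.dumps(body, indent=2))
```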