Re: _all analyzer advice
Ah. Cheers. I had looked at that page a few times but missed that.

On Tuesday, 1 July 2014 19:04:56 UTC+1, Glen Smith wrote:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html

On Tuesday, July 1, 2014 6:23:54 AM UTC-4, mooky wrote:

Thanks. So default_index and default_search have special meaning. Is this in the docs anywhere?
-N

On Monday, 30 June 2014 17:21:40 UTC+1, Glen Smith wrote:

Totally. For example:

    analyzer: {
        default_index: {
            tokenizer: standard,
            filter: [standard, lowercase]
        },
        default_search: {
            tokenizer: standard,
            filter: [standard, lowercase, stop]
        }
    }

On Monday, June 30, 2014 12:19:55 PM UTC-4, mooky wrote:

Excellent. Thanks for the info. Is it possible to set my custom analyser as the default analyser for an index (i.e. instead of standard_analyzer)?
-N

On Monday, 30 June 2014 14:41:10 UTC+1, Glen Smith wrote:

You can set up an analyser for your index...

    ...
    my-index: {
        analysis: {
            analyzer: {
                default_index: {
                    tokenizer: standard,
                    filter: [standard, icu_fold_filter, stop]
                },
                default_search: {
                    tokenizer: standard,
                    filter: [standard, icu_fold_filter, stop]
                },
                custom_index: {
                    tokenizer: whitespace,
                    filter: [lower]
                },
                custom_search: {
                    tokenizer: whitespace,
                    filter: [lower]
                }
            }
        }
    }

... and then map your relevant field accordingly:

    {
        _timestamp: { enabled: true, store: yes },
        properties: {
            my_field: {
                type: string,
                index_analyzer: custom_index,
                search_analyzer: custom_search
            }
        }
    }

Note that you can (and often should) set up index analysis and search analysis differently (e.g. if you use synonyms, only expand search terms).

Hope I haven't missed the point...

On Monday, June 30, 2014 8:47:36 AM UTC-4, mooky wrote:

Hi all,

I have a google-style search capability in my app that uses the _all field with the default (standard) analyzer (I don't configure anything, so it's Elastic's default).

There are a few cases where we don't quite get the behaviour we want, and I am trying to work out how to tweak the analyzer configuration.

1) If the user searches using 99.97, then they get the results they expect, but if they search using 99.97%, they get nothing. They should get the results that match 99.97%. The default analyzer config loses the %, I guess.

2) I have no idea what the text is :) but the user wants to search using 托克金通贸易, which is in the data, but currently we get zero results. It looks like the standard analyzer/tokenizer breaks on each character.

I *think* I just want a whitespace analyzer with lower-casing. However:

a) I am not exactly sure how to configure that, and;
b) I am not 100% sure what I am losing/gaining vs the standard analyzer. (I don't need stop-words; in any case the default config for the standard analyser doesn't have any, IIRC.)

(FWIW, on all our other text fields, we tend to use no analyzer.)
(Elastic 1.1.1 and 1.2 ...)

Cheers.
-M

--
You received this message because you are subscribed to the Google Groups elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6796a0dc-5eaa-4db4-ab47-400215743c61%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
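To make the whitespace-plus-lowercase idea from the original question a little more concrete, here is a minimal Python sketch of what such an analysis chain would do to the two problem inputs. It only imitates the behaviour in plain Python (the helper name `whitespace_lowercase` and the sample sentences are made up for illustration); it does not call Elasticsearch, and a real analyzer may differ in edge cases.

```python
def whitespace_lowercase(text):
    """Imitate a whitespace tokenizer followed by a lowercase filter.

    Hypothetical helper for illustration only; this is plain Python,
    not an Elasticsearch API.
    """
    # str.split() with no arguments splits on runs of whitespace,
    # so punctuation such as '%' stays attached to its token.
    return [token.lower() for token in text.split()]


if __name__ == "__main__":
    print(whitespace_lowercase("Yield 99.97% today"))
    print(whitespace_lowercase("Supplier 托克金通贸易 Ltd"))
```

In this model 99.97% and 托克金通贸易 each survive as a single token. The flip side is that matching becomes exact per token: a search for 99.97 would no longer match a document containing 99.97%, and trailing punctuation (as in "Ltd.") stays attached to the word.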
Re: _all analyzer advice
Thanks. So default_index and default_search have special meaning. Is this in the docs anywhere?
-N
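As a rough illustration of the default_index / default_search names discussed above, the sketch below builds an index-settings body in Python and prints it as strict JSON. The whitespace tokenizer and lowercase token filter are built-in Elasticsearch components; using the same chain under both names, and the helper name `default_analyzer_settings`, are assumptions made for this example. It has not been run against a 1.x cluster.

```python
import json


def default_analyzer_settings():
    """Build an index-settings body that overrides the default analyzers.

    Hypothetical helper; the layout follows the analyzer snippets quoted
    in this thread, written out as strict JSON.
    """
    chain = {
        "type": "custom",
        "tokenizer": "whitespace",  # built-in tokenizer: split on whitespace only
        "filter": ["lowercase"],    # built-in token filter: lower-case each token
    }
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Assumption: the same chain at index time and search time.
                    "default_index": dict(chain),
                    "default_search": dict(chain),
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(default_analyzer_settings(), indent=2))
```

The printed body is meant to be supplied when the index is created. Fields that name their own analyzer in the mapping would not be affected by these defaults.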
Re: _all analyzer advice
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html
Re: _all analyzer advice
You can set up an analyser for your index...

    ...
    my-index: {
        analysis: {
            analyzer: {
                default_index: {
                    tokenizer: standard,
                    filter: [standard, icu_fold_filter, stop]
                },
                default_search: {
                    tokenizer: standard,
                    filter: [standard, icu_fold_filter, stop]
                },
                custom_index: {
                    tokenizer: whitespace,
                    filter: [lower]
                },
                custom_search: {
                    tokenizer: whitespace,
                    filter: [lower]
                }
            }
        }
    }

... and then map your relevant field accordingly:

    {
        _timestamp: { enabled: true, store: yes },
        properties: {
            my_field: {
                type: string,
                index_analyzer: custom_index,
                search_analyzer: custom_search
            }
        }
    }

Note that you can (and often should) set up index analysis and search analysis differently (e.g. if you use synonyms, only expand search terms).

Hope I haven't missed the point...
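The remark about synonyms can be illustrated with a small toy model, assuming a hypothetical synonym table and plain Python sets in place of a real inverted index. Documents are indexed without expansion, and only the search terms are expanded. None of the names below come from Elasticsearch.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"usa": {"usa", "us", "america"}}


def analyze_for_index(text):
    """Index-time analysis: whitespace split + lowercase, no synonym expansion."""
    return {token.lower() for token in text.split()}


def analyze_for_search(text):
    """Search-time analysis: same chain, then expand each term to its synonyms."""
    terms = set()
    for token in text.split():
        token = token.lower()
        # Terms without an entry in the table are kept unchanged.
        terms |= SYNONYMS.get(token, {token})
    return terms


def matches(document, query):
    """A document matches if it contains at least one expanded search term."""
    return bool(analyze_for_index(document) & analyze_for_search(query))


if __name__ == "__main__":
    print(matches("Shipped to America", "USA"))
    print(matches("Shipped to Canada", "USA"))
```

Keeping the expansion on the search side means the synonym table can change without reindexing the documents, which is one common reason for configuring the two analyzers differently.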
Re: _all analyzer advice
Excellent. Thanks for the info. Is it possible to set my custom analyser as the default analyser for an index (i.e. instead of standard_analyzer)?
-N
Re: _all analyzer advice
Hi Glen,

On a related note, I have a use case where I want to search using wildcards on a field with a custom analyzer. I am currently seeing some discrepancies with respect to what I expect.

Basically, I have string data in a field such as Name-55, Name-56 etc. I want to be able to search for Name-5* and get these results. I have indexed the data as the terms

    Name, -, 55
    Name, -, 56

I am using a custom pattern analyzer to achieve this. I am using a similar custom pattern analyzer for my query string, except that it also swallows ? and *.

    my_template : {
        template : *,
        order: 1,
        settings : {
            analysis: {
                analyzer: {
                    custom_index: {
                        type: pattern,
                        pattern: ([\\s]+)|((?<=\\p{L})(?=\\P{L})|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d)))
                    },
                    custom_search: {
                        type: pattern,
                        pattern: ([?*\\s]+)|((?<=\\p{L})(?=\\P{L})|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d)))
                    }
                }
            }
        },
        mappings : {
            account : {
                properties : {
                    myfield : {
                        type : string,
                        store : yes,
                        index : analyzed,
                        index_analyzer : custom_index,
                        search_analyzer : custom_search
                    }
                }
            }
        }
    }

Using this, I see that when I search for Name-5*, I do not get any results returned. However, if I search for Name- 5* (note the additional whitespace in the search string), then I get the results Name-55 and Name-56.

Do you have an understanding of why Elasticsearch may be exhibiting this behavior? Is there some issue in the way I have set up the patterns in my analyzer?

Your help is much appreciated!

Thanks,
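One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the search goes through a query_string-style parser that splits the input on whitespace and does not analyze terms containing wildcards, then Name-5* is compared as a single pattern against individual indexed terms, and no single term contains the whole of Name-5. The toy model below imitates that behaviour in plain Python. The tokenizer regex only approximates the letter/digit boundary splitting described above, clauses are combined with AND for simplicity, and every function name is hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Runs of letters, runs of digits, or runs of other non-space characters.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|(?:[^\w\s]|_)+")


def analyze(text):
    """Approximate the pattern analyzer described above (with lowercasing)."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def search(query, documents):
    """Toy query parser: split on whitespace; wildcard clauses skip analysis."""
    hits = []
    for doc in documents:
        terms = analyze(doc)
        ok = True
        for clause in query.split():
            if "*" in clause or "?" in clause:
                # Assumed behaviour: the raw pattern is compared to each term.
                pattern = clause.lower()
                ok = ok and any(fnmatchcase(t, pattern) for t in terms)
            else:
                # Non-wildcard clauses are analyzed; every token must be present.
                ok = ok and all(t in terms for t in analyze(clause))
        if ok:
            hits.append(doc)
    return hits


if __name__ == "__main__":
    docs = ["Name-55", "Name-56", "Name-70"]
    print(search("Name-5*", docs))
    print(search("Name- 5*", docs))
```

In this model the first query finds nothing, while the second finds Name-55 and Name-56, because "Name-" is analyzed into name and - while "5*" is matched against the terms 55 and 56 on its own. The analyze_wildcard option of the query_string query may be relevant here, though whether it helps with this particular analyzer has not been verified.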