Re: _all analyzer advice

2014-07-02 Thread mooky
Ah. Cheers.
I had looked at that page a few times but missed that.

On Tuesday, 1 July 2014 19:04:56 UTC+1, Glen Smith wrote:


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html

 On Tuesday, July 1, 2014 6:23:54 AM UTC-4, mooky wrote:

 Thanks.
 So default_index and default_search have special meaning.
 Is this in the docs anywhere?

 -N



 On Monday, 30 June 2014 17:21:40 UTC+1, Glen Smith wrote:

 Totally. For example:

 "analyzer": {
   "default_index": {
     "tokenizer": "standard",
     "filter": ["standard", "lowercase"]
   },
   "default_search": {
     "tokenizer": "standard",
     "filter": ["standard", "lowercase", "stop"]
   }
 }


 On Monday, June 30, 2014 12:19:55 PM UTC-4, mooky wrote:

 Excellent. Thanks for the info.

 Is it possible to set my custom analyser as the default analyser for an 
 index (i.e. instead of the standard analyzer)?

 -N

 On Monday, 30 June 2014 14:41:10 UTC+1, Glen Smith wrote:

 You can set up an analyser for your index...

 ...
 "my-index": {
   "analysis": {
     "analyzer": {
       "default_index": {
         "tokenizer": "standard",
         "filter": ["standard", "icu_fold_filter", "stop"]
       },
       "default_search": {
         "tokenizer": "standard",
         "filter": ["standard", "icu_fold_filter", "stop"]
       },
       "custom_index": {
         "tokenizer": "whitespace",
         "filter": ["lowercase"]
       },
       "custom_search": {
         "tokenizer": "whitespace",
         "filter": ["lowercase"]
       }
     }
   }
 }
 ...

 and then map your relevant field accordingly:

 {
   "_timestamp": {
     "enabled": true,
     "store": "yes"
   },
   "properties": {
     "my_field": {
       "type": "string",
       "index_analyzer": "custom_index",
       "search_analyzer": "custom_search"
     }
   }
 }


 Note that you can (and often should) configure index analysis and search 
 analysis differently (e.g. if you use synonyms, expand them only at search time).

 Hope I haven't missed the point...

 On Monday, June 30, 2014 8:47:36 AM UTC-4, mooky wrote:

 Hi all,

 I have a google-style search capability in my app that uses the _all 
 field with the default (standard) analyzer (I don't configure anything, 
 so it's Elasticsearch's default).

 There are a few cases where we don't quite get the behaviour we want, 
 and I am trying to work out how to tweak the analyzer configuration.

 1) If the user searches using 99.97, they get the results they expect, 
 but if they search using 99.97%, they get nothing. They should get the 
 results that match 99.97%. The default analyzer config loses the %, I guess.

 2) I have no idea what the text means ( :) ) but the user wants to 
 search using 托克金通贸易 - which is in the data - but currently we get zero 
 results. It looks like the standard analyzer/tokenizer breaks on each 
 character.

 I *_think_* I just want a whitespace analyzer with lower-casing. 
 However, 
 a) I am not exactly sure how to configure that, and; 
 b) I am not 100% sure what I am losing/gaining vs the standard analyzer 
 (don't need stop-words; in any case the default config for the standard 
 analyser doesn't have any, IIRC).

 (FWIW, on all our other text fields, we tend to use no analyzer)

 (Elastic 1.1.1 and 1.2 ...)

 Cheers.
 -M
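 [Editor's note: the two cases above can be checked directly with the 
 _analyze API. A quick sketch, assuming a local 1.x node on port 9200 
 (the host and port are placeholders):

 # The standard analyzer drops the '%' and splits CJK text per character;
 # the whitespace analyzer keeps '99.97%' as a single token.
 curl 'localhost:9200/_analyze?analyzer=standard'   -d '99.97%'
 curl 'localhost:9200/_analyze?analyzer=whitespace' -d '99.97%'
 curl 'localhost:9200/_analyze?analyzer=standard'   -d '托克金通贸易'

 Comparing the returned token lists shows why 99.97% matches under one 
 analyzer and not the other.]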



-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/6796a0dc-5eaa-4db4-ab47-400215743c61%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: _all analyzer advice

2014-07-01 Thread mooky
Thanks.
So default_index and default_search have special meaning.
Is this in the docs anywhere?

-N





Re: _all analyzer advice

2014-07-01 Thread Glen Smith
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html



Re: _all analyzer advice

2014-06-30 Thread Glen Smith
You can set up an analyser for your index...

...
"my-index": {
  "analysis": {
    "analyzer": {
      "default_index": {
        "tokenizer": "standard",
        "filter": ["standard", "icu_fold_filter", "stop"]
      },
      "default_search": {
        "tokenizer": "standard",
        "filter": ["standard", "icu_fold_filter", "stop"]
      },
      "custom_index": {
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      },
      "custom_search": {
        "tokenizer": "whitespace",
        "filter": ["lowercase"]
      }
    }
  }
}
...

and then map your relevant field accordingly:

{
  "_timestamp": {
    "enabled": true,
    "store": "yes"
  },
  "properties": {
    "my_field": {
      "type": "string",
      "index_analyzer": "custom_index",
      "search_analyzer": "custom_search"
    }
  }
}


Note that you can (and often should) configure index analysis and search 
analysis differently (e.g. if you use synonyms, expand them only at search time).

Hope I haven't missed the point...
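
[Editor's note: for reference, the analysis block above goes under 
"settings" when the index is created. A minimal sketch answering the 
"custom analyzer as the default" question (index name, host, and the 
whitespace/lowercase choice are placeholders, not the poster's exact setup):

# Create the index with default_index / default_search analyzers defined.
# These names are special: default_index is applied at index time and
# default_search at query time for any field without its own analyzer.
curl -XPUT 'localhost:9200/my-index' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "default_index": {
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        },
        "default_search": {
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'

Fields that name their own index_analyzer/search_analyzer in the mapping 
still override these defaults.]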



Re: _all analyzer advice

2014-06-30 Thread mooky
Excellent. Thanks for the info.

Is it possible to set my custom analyser as the default analyser for an 
index (i.e. instead of the standard analyzer)?

-N



Re: _all analyzer advice

2014-06-30 Thread Robbie
Hi Glen,
 On a related note, I have a use case where I want to search using 
wildcards on a custom-analyzed field. I am currently seeing some 
discrepancies with respect to what I expect.

Basically, I have string data in a field such as "Name-55", "Name-56", etc. 
I want to be able to search for "Name-5*" and get these results.

I have indexed the data as the terms 
"Name", "-", "55" 
"Name", "-", "56"

I am using a custom pattern analyzer to achieve this. I am using a similar 
custom pattern analyzer for my query string, except that it also swallows 
'?' and '*'.


"my_template": {
    "template": "*",
    "order": 1,
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_index": {
                    "type": "pattern",
                    "pattern": "([\\s]+)|((?<=\\p{L})(?=\\P{L}))|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d))"
                },
                "custom_search": {
                    "type": "pattern",
                    "pattern": "([?*\\s]+)|((?<=\\p{L})(?=\\P{L}))|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d))"
                }
            }
        }
    },
    "mappings": {
        "account": {
            "properties": {
                "myfield": {
                    "type": "string",
                    "store": "yes",
                    "index": "analyzed",
                    "index_analyzer": "custom_index",
                    "search_analyzer": "custom_search"
                }
            }
        }
    }
}


Using this, I see that when I search for "Name-5*", I do not get any 
results returned.

However, if I search for "Name- 5*" (note the additional white-space in the 
search string), then I get the results "Name-55" and "Name-56".

Do you have an understanding of why Elasticsearch may be exhibiting this 
behavior? Is there some issue in the way I have set up the patterns in my 
analyzer?

Your help is much appreciated!

Thanks,
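
[Editor's note: the search-side pattern can be sanity-checked outside 
Elasticsearch. Below is a rough Python sketch, not the Lucene pattern 
tokenizer itself: the boundary regexes are ASCII-only stand-ins for the 
template's \p{L}/\P{L} classes, which the stdlib `re` module does not 
support.

```python
import re

# ASCII-only approximations of the template's patterns. A pattern
# tokenizer splits on matches; these split on whitespace and on
# letter/non-letter and digit/non-digit boundaries. The search
# variant additionally swallows '?' and '*'.
BOUNDARIES = (r"(?<=[A-Za-z])(?=[^A-Za-z])|(?<=[^A-Za-z])(?=[A-Za-z])"
              r"|(?<=[0-9])(?=[^0-9])|(?<=[^0-9])(?=[0-9])")
INDEX_PATTERN = r"\s+|" + BOUNDARIES
SEARCH_PATTERN = r"[?*\s]+|" + BOUNDARIES

def tokens(pattern, text):
    # re.split drops the (non-captured) separators; discard empty pieces.
    return [t for t in re.split(pattern, text) if t]

print(tokens(INDEX_PATTERN, "Name-55"))    # ['Name', '-', '55']
print(tokens(SEARCH_PATTERN, "Name-5*"))   # ['Name', '-', '5'] -- '*' is gone
print(tokens(SEARCH_PATTERN, "Name- 5*"))  # ['Name', '-', '5'] -- same tokens
```

The point the sketch illustrates: the search analyzer consumes the '*' as a 
separator, so the wildcard never survives analysis of "Name-5*"; any 
difference between the two query forms would therefore come from how the 
query parser handles the string before analysis, not from the tokens 
themselves.]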
