Re: Best way to search/index the data - with and without whitespace
Thale, I played with your data a little and it turns out it is more complex than I thought. Something like this works somewhat but may require some fine-tuning depending on your exact requirements. Anyway give this a try and see how it works (BTW I did this in ES 1.0 RC 2):

1) PUT http://localhost:9200/test

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
        "analyzer": {
          "en1": {
            "tokenizer": "standard",
            "filter": [ "standard", "lowercase", "en1" ]
          }
        },
        "filter": {
          "en1": { "type": "ngram", "min_gram": 4, "max_gram": 4 }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "street": { "type": "string", "analyzer": "en1" }
      }
    }
  }
}

2) POST http://localhost:9200/test/doc/_bulk

{ "index": {} }
{ "street": "Lakeshore Dr" }
{ "index": {} }
{ "street": "Sunnyshore Dr" }
{ "index": {} }
{ "street": "Lake View Dr" }
{ "index": {} }
{ "street": "Shore Dr" }

Query example:

GET http://localhost:9200/test/doc/_search

{
  "query": {
    "match": {
      "street": {
        "query": "lake shore dr",
        "minimum_should_match": "350%"
      }
    }
  }
}

--
You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b3c62899-30ad-4223-9dfe-7053e2c72f72%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
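For readers following along, here is a rough Python sketch of what the 4-gram analyzer above does to the four street names and to the query. It is only a toy model, not Elasticsearch itself: `standard_tokens`, `ngrams` and `analyze` are hypothetical helpers, the tokenizer is approximated by splitting on non-alphanumeric characters, and it assumes that tokens shorter than min_gram (such as "dr") produce no grams at all.

```python
import re

def standard_tokens(text):
    # Rough stand-in for the standard tokenizer plus lowercase filter:
    # lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def ngrams(token, n=4):
    # All contiguous substrings of length n.
    # Assumption: tokens shorter than n yield no grams.
    return [token[i:i + n] for i in range(len(token) - n + 1)]

def analyze(text, n=4):
    # Set of distinct grams produced for a piece of text.
    return {g for tok in standard_tokens(text) for g in ngrams(tok, n)}

docs = ["Lakeshore Dr", "Sunnyshore Dr", "Lake View Dr", "Shore Dr"]
query_terms = analyze("lake shore dr")

# Count how many distinct query grams each document shares.
overlap = {d: len(query_terms & analyze(d)) for d in docs}

for d in docs:
    print(d, overlap[d], "of", len(query_terms))
```

In this toy model the query yields three grams (lake, shor, hore), and only "Lakeshore Dr" contains all of them, which is the intuition behind tuning minimum_should_match so that most or all query grams must be present. Real scoring in Elasticsearch involves more than this overlap count.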
Re: Best way to search/index the data - with and without whitespace
Hello Binh Ly - Thanks for the reply. I thought I had read that ngram searching should only be used at either index time or search time, but not both... Is that not the case?

Thanks again.
Thale

On Wednesday, January 29, 2014 6:49:10 PM UTC-5, Binh Ly wrote:

Thale, I would try edge ngrams (both index and search) and see how that works. I don't see why it wouldn't work for your 2 cases - just make your queries into match queries and use the AND operator. Good luck!
Re: Best way to search/index the data - with and without whitespace
This is how I set up the mappings:

curl -s -XPUT 'localhost:9200/test' -d '{
  "mappings": {
    "properties": {
      "name": {
        "street": {
          "type": "string",
          "index_analyzer": "index_ngram",
          "search_analyzer": "search_ngram"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "desc_ngram": { "type": "edgeNGram", "min_gram": 3, "max_gram": 20 }
      },
      "analyzer": {
        "index_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "desc_ngram", "lowercase" ]
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  }
}'

This is how I built the index:

curl -s -XPUT 'localhost:9200/test/name/1' -d '{ "street": "Lakeshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/2' -d '{ "street": "Sunnyshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/3' -d '{ "street": "Lake View Dr" }'
curl -s -XPUT 'localhost:9200/test/name/4' -d '{ "street": "Shore Dr" }'

If a user attempts to search for Lake Shore Dr, I want to only match document 1/Lakeshore Dr.
If a user attempts to search for Lakeview Dr, I want to only match document 3/Lake View Dr.

Here is an example of the query that is not working correctly:

curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        { "match": { "street": { "query": "lake shore dr", "type": "boolean" } } }
      ]
    }
  }
}';

So is the issue with how I am setting up the mappings (tokenizer? edge ngrams vs ngrams? size of ngrams?) or with the query? I have tried things like setting minimum_should_match and the analyzer to use, but I have not been able to get the desired results.

Thanks all.

On Thursday, February 6, 2014 10:16:40 AM UTC-5, Binh Ly wrote:

Thale, you are correct - ngrams are usually used at index time only, but given your case and requirements, you might want to experiment with them at both index and search time. I'd probably just increase the edge min ngram size to something reasonable like maybe 4(?) and see if that works or not.
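One way to reason about why the query above finds nothing is to work out, outside of Elasticsearch, which terms the two analyzers would produce. The Python sketch below is only a toy model under stated assumptions: it assumes the analyzers are actually applied to the street field as intended, it treats a match as "the single search term equals one of the indexed terms", and the helper names (`edge_ngrams`, `index_terms`, `search_term`, `matches`) are hypothetical.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    # Prefixes of the token, from min_gram up to max_gram characters.
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def index_terms(text):
    # keyword tokenizer: the whole string is one token (spaces included),
    # then edge ngrams, then lowercase.
    return {g.lower() for g in edge_ngrams(text)}

def search_term(text):
    # keyword tokenizer plus lowercase: the whole query is a single term.
    return text.lower()

docs = {
    1: "Lakeshore Dr",
    2: "Sunnyshore Dr",
    3: "Lake View Dr",
    4: "Shore Dr",
}

def matches(query):
    # Ids of documents whose indexed terms contain the search term.
    term = search_term(query)
    return [doc_id for doc_id, street in docs.items()
            if term in index_terms(street)]

print(matches("lake shore dr"))  # query keeps its space, so no prefix equals it
print(matches("Lakeview Dr"))
print(matches("lake"))
```

Under these assumptions, the keyword tokenizer keeps the space as part of the term, so "lake shore dr" is never a prefix of "lakeshore dr", and "lakeview dr" is never a prefix of "lake view dr". That suggests the tokenizer choice, rather than the ngram size alone, is what prevents the two desired matches.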
Re: Best way to search/index the data - with and without whitespace
Thale, I would try edge ngrams (both index and search) and see how that works. I don't see why it wouldn't work for your 2 cases - just make your queries into match queries and use the AND operator. Good luck!