Re: Best way to search/index the data - with and without whitespace

2014-02-07 Thread Binh Ly
Thale,

I played with your data a little and it turns out it is more complex than I 
thought. Something like this works somewhat but may require some 
fine-tuning depending on your exact requirements. Anyway give this a try 
and see how it works (BTW I did this in ES 1.0 RC 2):

1) PUT http://localhost:9200/test
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
        "analyzer": {
          "en1": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "en1"
            ]
          }
        },
        "filter": {
          "en1": {
            "type": "ngram",
            "min_gram": 4,
            "max_gram": 4
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "street": {
          "type": "string",
          "analyzer": "en1"
        }
      }
    }
  }
}

2) POST http://localhost:9200/test/doc/_bulk
{ "index": {} }
{ "street": "Lakeshore Dr" }
{ "index": {} }
{ "street": "Sunnyshore Dr" }
{ "index": {} }
{ "street": "Lake View Dr" }
{ "index": {} }
{ "street": "Shore Dr" }

Query example:

GET http://localhost:9200/test/doc/_search
{
  "query": {
    "match": {
      "street": {
        "query": "lake shore dr",
        "minimum_should_match": "350%"
      }
    }
  }
}
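
To see why 4-grams help with the missing or extra space, it can be useful to reproduce the analysis by hand. The following is only a rough Python sketch, not what Elasticsearch runs internally: the helper names (four_grams, shared_gram_counts) are made up for illustration, word splitting is approximated with a regular expression, and it assumes that words shorter than min_gram (such as "dr") produce no grams at all.

```python
import re

def four_grams(text, n=4):
    """Simplified stand-in for the en1 analyzer: lowercase, split into
    words, then emit every n-character substring of each word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Assumption: a word shorter than n yields no grams, so "dr" vanishes.
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams

# The four sample streets from the bulk request above.
STREETS = ["Lakeshore Dr", "Sunnyshore Dr", "Lake View Dr", "Shore Dr"]

def shared_gram_counts(query):
    """For each street, count how many of the query's grams it also contains."""
    query_grams = four_grams(query)
    return {street: len(query_grams & four_grams(street)) for street in STREETS}

if __name__ == "__main__":
    for query in ("lake shore dr", "lakeview dr"):
        print(query, sorted(four_grams(query)), shared_gram_counts(query))
```

Under these assumptions "lake shore dr" becomes three grams (lake, shor, hore) and only Lakeshore Dr contains all three, while "lakeview dr" becomes five grams of which Lake View Dr shares two and Lakeshore Dr only one. How many shared grams to require is exactly what minimum_should_match controls, which is where the fine-tuning comes in.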


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b3c62899-30ad-4223-9dfe-7053e2c72f72%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Best way to search/index the data - with and without whitespace

2014-02-06 Thread thale jacobs
Hello Binh Ly - Thanks for the reply.  I thought I had read that ngram 
searching should only be used at either index time or search time, but not 
both...  Is that not the case?  Thanks again.  Thale

On Wednesday, January 29, 2014 6:49:10 PM UTC-5, Binh Ly wrote:

 Thale, I would try edge ngrams (both index and search) and see how that 
 works. I don't see why it wouldn't work for your 2 cases - just make your 
 queries into match queries and use the AND operator. Good luck!




Re: Best way to search/index the data - with and without whitespace

2014-02-06 Thread thale jacobs
This is how I set up the mappings:

curl -s -XPUT 'localhost:9200/test' -d '{
  "mappings": {
    "properties": {
      "name": {
        "street": {
          "type": "string",
          "index_analyzer": "index_ngram",
          "search_analyzer": "search_ngram"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "desc_ngram": {
          "type": "edgeNGram",
          "min_gram": 3,
          "max_gram": 20
        }
      },
      "analyzer": {
        "index_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "desc_ngram", "lowercase" ]
        },
        "search_ngram": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  }
}'


This is how I built the index:

curl -s -XPUT 'localhost:9200/test/name/1' -d '{ "street": "Lakeshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/2' -d '{ "street": "Sunnyshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/3' -d '{ "street": "Lake View Dr" }'
curl -s -XPUT 'localhost:9200/test/name/4' -d '{ "street": "Shore Dr" }'

If a user attempts to search  for Lake Shore Dr, I want to only match to 
document 1/Lakeshore Dr
If a user attempts to search for Lakeview Dr, I want to only match to 
document 3/Lake View Dr

Here is an example of the query that is not working correctly:

curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "street": {
              "query": "lake shore dr",
              "type": "boolean"
            }
          }
        }
      ]
    }
  }
}';


So is the issue with how I am setting up the mappings (tokenizer?, edgegram 
vs ngrams?, size of ngrams?) or the query (I have tried things like setting 
the minimum_should_match, and the analyzer to use), but I have not been 
able to get the desired results.

Thanks all.
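
For anyone following along, here is a rough Python sketch of what the two analyzers above would produce. It is a simplified model rather than Elasticsearch itself: the function names are invented for illustration, and it assumes that index_ngram and search_ngram really are applied to the street field as intended. The keyword tokenizer keeps the whole value as one token, so the edge n-grams are prefixes of the entire street name, spaces included.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Prefixes of the token with lengths min_gram..max_gram (front edge n-grams)."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def index_terms(street):
    # keyword tokenizer: the whole value is one token; then edge n-grams, then lowercase.
    return {gram.lower() for gram in edge_ngrams(street)}

def search_terms(query):
    # keyword tokenizer + lowercase: the whole query stays a single term.
    return {query.lower()}

# Documents 1-4 as indexed above.
STREETS = {1: "Lakeshore Dr", 2: "Sunnyshore Dr", 3: "Lake View Dr", 4: "Shore Dr"}

def matching_ids(query):
    """Ids of documents whose indexed terms contain the single search term."""
    term = search_terms(query)
    return [doc_id for doc_id, street in STREETS.items() if term & index_terms(street)]

if __name__ == "__main__":
    print(sorted(index_terms("Lakeshore Dr")))
    print(matching_ids("lake shore dr"))
    print(matching_ids("Lake V"))
```

In this model the query "lake shore dr" is one term that is not a prefix of any indexed street, so nothing matches, whereas a prefix such as "Lake V" matches only document 3. That suggests the keyword tokenizer, rather than the n-gram sizes, is what prevents matches across a missing or extra space.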







On Thursday, February 6, 2014 10:16:40 AM UTC-5, Binh Ly wrote:

 Thale, you are correct - ngrams are usually used at index time only, but 
 given your requirements, you might want to experiment with them at both index 
 and search time. I'd probably just increase the edge ngram min size to 
 something reasonable like maybe 4(?) and see if that works or not.




Re: Best way to search/index the data - with and without whitespace

2014-01-29 Thread Binh Ly
Thale, I would try edge ngrams (both index and search) and see how that 
works. I don't see why it wouldn't work for your 2 cases - just make your 
queries into match queries and use the AND operator. Good luck!
