Hi! This is a sample setup, close to what I am working with:

https://gist.github.com/anonymous/6e1457321a8ad78c6af8

As you can see, I am trying to remove the hyphens from all words, so that 
words like "hand-made" are indexed as "handmade". The goal is to make a 
search for "handmade" find all documents containing "hand-made" and vice 
versa.
For some reason it doesn't work, though :(

I have also attached 3 sample queries. The expected result would be for all 
of them to return the same result set. 
1) Astonishingly, a search for "Chemie-ingenieur" finds 2 results, but a 
search for "Chemieingenieur" finds none. This is pretty creepy to me, since 
the char_filter is supposed to strip the hyphens before tokenizing during 
indexing.
2) Another creepy fact is that if I specify the searchAnalyzer explicitly, 
I find no results (see query 3) in this document set.
3) Moreover, the Analyze API shows that the search term "Chemie-ingenieur" 
gets translated to "Chemieingenieur" using this analyzer.

4) And the creepiest fact is that when I run these queries against the 
actual index data (800+ documents), I get 17 results for "Chemie-ingenieur" 
and 22 for "Chemieingenieur", and the two result sets DO NOT OVERLAP AT ALL. 
I.e. I have a total of 39 documents that should match either of the queries. 
Some of the documents that match "Chemie-ingenieur" don't actually contain 
the word with the hyphen. So I would expect these documents to be contained 
in both result sets, maybe with a different relevance score. This is, 
however, not the case.
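
To make my expectation concrete, here is a minimal Python sketch of how I 
understand the pipeline. It is not Elasticsearch code: the function names 
(strip_hyphens, analyze, build_index, search) and the two sample documents 
are made up for illustration, and the tokenizer is a plain word splitter 
rather than the one from my mapping. It only assumes that the same analysis 
is applied at index time and at search time.

```python
import re
from collections import defaultdict


def strip_hyphens(text):
    """Stand-in for the char_filter: remove hyphens before tokenizing."""
    return text.replace("-", "")


def analyze(text):
    """Char filter first, then a simple word tokenizer, then lowercasing."""
    return [token.lower() for token in re.findall(r"\w+", strip_hyphens(text))]


def build_index(docs):
    """Map every analyzed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Analyze the query the same way and intersect the posting sets."""
    postings = [index.get(token, set()) for token in analyze(query)]
    return set.intersection(*postings) if postings else set()


# Made-up sample documents, not taken from my real index.
docs = {
    1: "Wir suchen einen Chemie-ingenieur",
    2: "Erfahrener Chemieingenieur gesucht",
}
index = build_index(docs)

hyphenated = search(index, "Chemie-ingenieur")
joined = search(index, "Chemieingenieur")
print(hyphenated == joined)
```

In this simplified model both spellings are reduced to the same token, so 
both queries return the same set of documents. That is the behaviour I 
expected from my setup, which is why the disjoint result sets surprise me.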

Please help me get past this; I have been struggling with it for a full 
week already. I would also be very grateful for an explanation in addition 
to a solution, since the output is very different from what my understanding 
leads me to expect, which means that I don't really understand the system.

P.S. Please focus on the actual problem and let's not discuss the mapping 
in detail. The version I have pasted is pretty different from what I 
started with, due to the trial-and-error approach I have been using for 
almost a week.

Thanks sincerely,
Georgi
