Re: Searching with wrong keyboard layout or using translit
Another approach to this problem is to use a separate Solr core that stores user queries for autocomplete (see http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/). Index not only the user_query field but also its transliterated and diff_layout versions, and use the dismax query parser to search for suggestions across all of these fields. This solution is only viable if you have a huge log of user queries (which I believe Google does).

HTH,
Alex

2010/10/29 Alexander Kanarsky kanarsky2...@gmail.com:

Pavel, it depends on the size of your document corpus, the complexity and types of the queries you plan to use, etc. I would recommend searching for the discussions on synonym expansion in Lucene (index-time vs. query-time tradeoffs etc.), since your problem is quite similar to that (think Moskva vs. Moskwa). Unless you have a small corpus, I would go with the second approach and expand the terms at query time. However, the first approach might be useful, too: say, you may want to boost the score of the documents that naturally contain the word 'Moskva', so that such documents appear at the top of the result list. Having both forms indexed lets you achieve this easily with Solr's dismax query (to boost the results from the field with the original terms):

http://localhost:8983/solr/select/?q=Moskva&defType=dismax&qf=text^10.0+text_translit^0.1

(the 'text' field has the original Cyrillic tokens, 'text_translit' is for the transliterated ones)

-Alexander
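For illustration, here is a minimal Python sketch that assembles the dismax request mentioned above. The host, the field names 'text' and 'text_translit', and the boosts come from the message; the helper name dismax_url is hypothetical, and the standard library percent-encodes the '^' characters, which is equivalent for the server.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance at the URL quoted in the message.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def dismax_url(user_query, qf="text^10.0 text_translit^0.1"):
    """Build a dismax request that favours matches in the original field.

    `qf` lists the fields to search with their boosts, separated by spaces.
    """
    params = {"q": user_query, "defType": "dismax", "qf": qf}
    # urlencode inserts the '&' separators and escapes spaces and '^'.
    return SOLR_SELECT + "?" + urlencode(params)


print(dismax_url("Moskva"))
```

The same helper could be pointed at a suggestions core by passing a different qf string, for example one listing the user_query, transliterated and diff_layout fields.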
Re: Searching with wrong keyboard layout or using translit
Pavel, I think there is no single way to implement this. Some ideas that might be helpful:

1. Consider adding additional terms while indexing. This means converting the Russian text to both its translit and wrong-keyboard-layout forms and indexing the converted terms along with the original ones (i.e. your Analyzer/Filter should produce Moskva and Vjcrdf for the term Москва). You may reuse the same field (if you plan on simple term queries) or create separate fields for the generated terms (better for phrase queries, proximity queries, etc., since this keeps the positional information of the original text). The query can then use any of these forms to fetch the document. If you use separate fields, you'll need to expand your query to search them as well, of course.

2. If you have to index just the original Russian text, you can generate all term forms while analyzing the query, then treat the converted terms as synonyms and use a combination of TermQuery for all term forms, or MultiPhraseQuery for phrases. For Solr, in this case you will probably need to add a custom filter similar to SynonymFilter.

Hope this helps,
-Alexander
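The two conversions described above can be sketched in plain Python. This is not a Solr filter, only an illustration of the term forms such a filter would emit. The layout table assumes the standard ЙЦУКЕН and QWERTY keyboards, the translit table is a simplified informal scheme chosen for the example, and all function names are hypothetical.

```python
# Keys at the same physical position on the Russian and English layouts.
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбюё"
EN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,.`"

# Cyrillic letter -> Latin key (what you get when typing Russian on QWERTY).
TO_LATIN_KEYS = {ord(r): e for r, e in zip(RU_KEYS, EN_KEYS)}
TO_LATIN_KEYS.update({ord(r.upper()): e.upper() for r, e in zip(RU_KEYS, EN_KEYS)})

# Reverse direction, useful at query time to recover the intended Russian word.
TO_CYRILLIC_KEYS = {ord(e): r for r, e in zip(RU_KEYS, EN_KEYS)}
TO_CYRILLIC_KEYS.update(
    {ord(e.upper()): r.upper() for r, e in zip(RU_KEYS, EN_KEYS) if e.isalpha()}
)

# Simplified transliteration table (an assumption, not an official standard).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def wrong_layout(text):
    """Russian text as it comes out when typed with the English layout active."""
    return text.translate(TO_LATIN_KEYS)


def fix_layout(text):
    """Latin keystrokes mapped back to the Russian letters on the same keys."""
    return text.translate(TO_CYRILLIC_KEYS)


def translit(text):
    """Letter-by-letter transliteration; unknown characters pass through."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


def term_variants(term):
    """Original term plus the two generated forms from idea 1."""
    return [term, translit(term), wrong_layout(term)]


print(term_variants("Москва"))  # ['Москва', 'Moskva', 'Vjcrdf']
print(fix_layout("vjcrdf"))     # москва
```

For idea 1 the output of term_variants would be indexed alongside the original term; for idea 2 something like fix_layout would run on the incoming query instead, so the index stays untouched.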
Re: Searching with wrong keyboard layout or using translit
Alexander,

Thanks. Which variant has better performance?

-- Pavel Minchenkov
Searching with wrong keyboard layout or using translit
Hi,

When I search Google with the wrong keyboard layout, it corrects my query. Example: http://www.google.ru/search?q=vjcrdf (I typed the word Moscow in Russian, but with the English keyboard layout).

Also, when I search using translit, it does the same: http://www.google.ru/search?q=moskva

What is the right way to implement this feature in Solr?

-- Pavel Minchenkov