Re: Searching with wrong keyboard layout or using translit

2010-10-31 Thread Alexey Serba
Another approach to this problem is to use a separate Solr core that
stores user queries for autocomplete (see
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
). Index not only the user_query field but also its transliterated and
diff_layout (wrong keyboard layout) versions, and use the dismax query
parser to search for suggestions across all of these fields.

This solution is only viable if you have a huge log of user queries
(which I believe Google does).
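
A rough sketch of what one document in such a suggestion core and the
matching dismax parameters might look like. Only user_query comes from
the description above; the other two field names are made up for
illustration, and the variants are assumed to be computed elsewhere.
Prefix matching is assumed to come from edge n-gram analysis on the
fields, as in the linked post.

```python
# Only "user_query" is named in the thread; the other field names are hypothetical.
SUGGEST_FIELDS = ("user_query", "user_query_translit", "user_query_layout")

def suggestion_doc(query, translit_form, layout_form):
    """One document for the suggestion core: the logged query plus its variants."""
    return dict(zip(SUGGEST_FIELDS, (query, translit_form, layout_form)))

def suggest_params(typed_prefix):
    """dismax parameters that search the typed text in every variant field."""
    return {
        "q": typed_prefix,
        "defType": "dismax",
        "qf": " ".join(SUGGEST_FIELDS),  # qf is a space-separated field list
    }

doc = suggestion_doc("москва", "moskva", "vjcrdf")
params = suggest_params("vjc")
```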

HTH,
Alex



2010/10/29 Alexander Kanarsky kanarsky2...@gmail.com:
 Pavel,

 It depends on the size of your document corpus, the complexity and types
 of queries you plan to use, etc. I would recommend searching for
 the discussions on synonym expansion in Lucene (index-time vs. query-time
 tradeoffs etc.), since your problem is quite similar to that
 (think Moskva vs. Moskwa). Unless you have a small corpus, I would go
 with the second approach and expand the terms at query time.
 However, the first approach might be useful, too: say, you may want to
 boost the score for documents that naturally contain the word
 'Moskva', so that such documents appear at the top of the result list.
 Having both forms indexed will allow you to achieve this easily by
 using Solr's dismax query (to boost the results from the field
 with the original terms):
 http://localhost:8983/solr/select/?q=Moskva&defType=dismax&qf=text^10.0+text_translit^0.1
 ('text' field has the original Cyrillic tokens, 'text_translit' is for
 transliterated ones)
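
 The same request can be assembled programmatically, which avoids
 hand-encoding mistakes in the URL. A minimal sketch using only the
 Python standard library; the endpoint and field names are taken from the
 example URL above, and the helper name dismax_url is made up for
 illustration.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL in the message above.
SOLR_SELECT = "http://localhost:8983/solr/select/"

def dismax_url(user_query, boosts):
    """Build a dismax request URL; boosts maps field name -> boost factor."""
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    params = {"q": user_query, "defType": "dismax", "qf": qf}
    # urlencode adds the '&' separators and percent-encodes '^' and spaces.
    return SOLR_SELECT + "?" + urlencode(params)

url = dismax_url("Moskva", {"text": 10.0, "text_translit": 0.1})
print(url)
```

 The boost values are just the ones from the example; they would need
 tuning against real queries.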

 -Alexander



Re: Searching with wrong keyboard layout or using translit

2010-10-28 Thread Alexander Kanarsky
Pavel,

I think there is no single way to implement this. Some ideas that
might be helpful:

1. Consider adding additional terms while indexing. This means
converting the Russian text to both translit and wrong-keyboard-layout
forms and indexing the converted terms along with the original terms
(i.e. your Analyzer/Filter should produce Moskva and Vjcrdf for the term
Москва). You may reuse the same field (if you plan only simple term
queries) or create separate fields for the generated terms (better for
phrase and proximity queries, since this keeps the original text's
positional info). Then the query could use any of these forms to fetch
the document. If you use separate fields, you'll need to expand your
query to search them as well, of course.
2. If you have to index just the original Russian text, you might
generate all term forms while analyzing the query. You could then
treat the converted terms as synonyms and use a combination of
TermQuery clauses for all term forms, or a MultiPhraseQuery for phrases.
For Solr, in this case you will probably need to add a custom filter
similar to SynonymFilter.
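
To make the two ideas concrete, here is a small sketch of the term
conversions themselves, in plain Python rather than as a Lucene filter.
It assumes the standard Russian (ЙЦУКЕН) and US QWERTY layouts and one
simple transliteration table out of many possible schemes (think Moskva
vs. Moskwa). All function names are made up for illustration.

```python
# Characters of the Russian layout and the US QWERTY characters on the
# same physical keys: unshifted row first, then the shifted row.
RU = "йцукенгшщзхъфывапролджэячсмитьбюё"
EN = "qwertyuiop[]asdfghjkl;'zxcvbnm,.`"
EN_SHIFT = "QWERTYUIOP{}ASDFGHJKL:\"ZXCVBNM<>~"

TO_LAYOUT = str.maketrans(RU + RU.upper(), EN + EN_SHIFT)
FROM_LAYOUT = str.maketrans(EN + EN_SHIFT, RU + RU.upper())

# One possible transliteration scheme (an assumption, not a standard).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}

def to_wrong_layout(term):
    """Russian text as it comes out when typed with the English layout active."""
    return term.translate(TO_LAYOUT)

def from_wrong_layout(term):
    """Inverse mapping: recover the Russian text from wrong-layout input."""
    return term.translate(FROM_LAYOUT)

def translit(term):
    """Transliterate Cyrillic to Latin; unknown characters pass through."""
    out = []
    for ch in term:
        latin = TRANSLIT.get(ch.lower(), ch)
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)

def term_forms(term):
    """Idea 1: all forms to index for one original term, without duplicates."""
    return list(dict.fromkeys([term, translit(term), to_wrong_layout(term)]))

def expand_query(user_input):
    """Idea 2: OR the typed term with its layout-corrected form."""
    forms = dict.fromkeys([user_input, from_wrong_layout(user_input)])
    return " OR ".join(forms)

print(term_forms("Москва"))
print(expand_query("vjcrdf"))
```

Reversing a transliteration at query time is ambiguous (several Cyrillic
spellings can map to the same Latin string), so the sketch only reverses
the keyboard layout in expand_query.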

Hope this helps,
-Alexander



Re: Searching with wrong keyboard layout or using translit

2010-10-28 Thread Pavel Minchenkov
Alexander,

Thanks,
Which variant has better performance?


-- 
Pavel Minchenkov


Searching with wrong keyboard layout or using translit

2010-10-27 Thread Pavel Minchenkov
Hi,

When I search Google with the wrong keyboard layout, it corrects
my query. Example: http://www.google.ru/search?q=vjcrdf (I typed the word
"Moscow" in Russian, but with the English keyboard layout).
Also, when I search using translit, it does the same:
http://www.google.ru/search?q=moskva

What is the right way to implement this feature in Solr?

-- 
Pavel Minchenkov