Re: Re: How to properly use Levenshtein distance with ~ in Java
Hi Aleksander,

Fuzzy search with '~' is not supported in dismax (defType=dismax): https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser

You are using the SearchComponent spellchecker. This does not change the query results.

By the way, it looks like you are using the path /select with qt=dismax. This would normally throw an exception. Is there a <requestHandler name="/dismax" ...> tag in your solrconfig.xml?

Best regards,
Karsten

P.S. In context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-td4164793.html

On 20 October 2014 11:13, Aleksander Sadecki wrote:
Ok, thank you for your response. But why I cannot use '~'?
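For illustration, here is a minimal Java sketch of one way to send a fuzzy term to the standard query parser rather than dismax. It only builds the request URL with the JDK and does not use SolrJ. The base URL, the field name content, and the term taveranx are placeholders taken from elsewhere in this thread, the helper name is hypothetical, and it assumes the /select handler accepts defType=lucene in your setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FuzzySelectUrlSketch {

    /**
     * Builds a /select URL that asks for the standard "lucene" parser
     * instead of dismax, so that term~N is read as a fuzzy query.
     * Hypothetical helper: baseUrl, field and term are supplied by the caller.
     */
    static String fuzzySelectUrl(String baseUrl, String field, String term, int maxEdits) {
        String q = field + ":" + term + "~" + maxEdits; // e.g. content:taveranx~1
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&defType=lucene";
    }

    public static void main(String[] args) {
        // Placeholder values, not a confirmed configuration.
        System.out.println(
                fuzzySelectUrl("http://localhost:8983/solr", "content", "taveranx", 1));
    }
}
```

The query value is URL-encoded so that ':' and '~' survive transport. Whether the term is then found still depends on the field's analysis chain.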
Re: How to properly use Levenshtein distance with ~ in Java
We’re reimplementing fuzzy support in edismax on Solr 4.x right now. See: https://issues.apache.org/jira/browse/SOLR-629

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Oct 22, 2014, at 11:08 PM, karsten-s...@gmx.de wrote:
Hi Aleksander, fuzzy search with '~' is not supported in dismax (defType=dismax).
Re: How to properly use Levenshtein distance with ~ in Java
The last real update on that is 2.5 years old. Is there a more recent update? I am interested in this topic as well.

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 23 October 2014 10:10, Walter Underwood wun...@wunderwood.org wrote:
We’re reimplementing fuzzy support in edismax on Solr 4.x right now. See: https://issues.apache.org/jira/browse/SOLR-629
RE: How to properly use Levenshtein distance with ~ in Java
Given your expressed interest and the recent work on edit distance (specifically Levenshtein), you might find this paper provocative:

"We measure the keyword similarity between two strings by lemmatizing them, removing stopwords, and computing the cosine similarity. We then include the keyword similarity between the query and the input question, the keyword similarity between the query and the returned evidence, and an indicator feature for whether the query involves a join. The evidence features compute KB-specific properties... We compute the join-key string similarity measured using the Levenshtein distance."

http://dx.doi.org/10.1145/2623330.2623677

re will

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Thursday, October 23, 2014 12:05 PM
To: solr-user
Subject: Re: How to properly use Levenshtein distance with ~ in Java

The last real update on that is 2.5 years old. Is there a more recent update? I am interested in this topic as well.
Re: How to properly use Levenshtein distance with ~ in Java
When used on bare terms, ~ is indeed fuzzy matching rather than proximity; it's an overloaded operator in that sense.

If I had to guess, I'd say that your analysis chain for the field is doing interesting things to taveranx, and the resulting token is far enough away (in the Levenshtein sense) that it's not found. The admin/analysis page is very much your friend here: it'll show you what the term taveranx becomes in your index.

You might try varying the closeness of the term by adding taveranx~0.2 (or whatever) to your query to see if it's eventually found. And as a test, see if specifying fuzzy operations works on other terms, in which case my hypothesis will get a little support.

Best,
Erick

On Tue, Oct 21, 2014 at 1:07 AM, Ramzi Alqrainy ramzi.alqra...@gmail.com wrote:
Because ~ is proximity matching. Lucene supports finding words that are within a specific distance of each other.
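To make "far enough away in the Levenshtein sense" concrete, here is a small Java sketch of the textbook dynamic-programming edit distance. It is not how Lucene evaluates fuzzy queries internally; it only shows how the distance between an indexed token and a query term is counted. The target word tavern is a made-up example, since the thread does not say which word taveranx was meant to match.

```java
final class LevenshteinSketch {

    /** Textbook Levenshtein distance: insert, delete and substitute each cost 1. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("hell", "dell"));            // 1
        System.out.println(distance("ultrashar", "ultrasharp")); // 1
        System.out.println(distance("taveranx", "tavern"));      // 2 (hypothetical target)
    }
}
```

If analysis (stemming, for example) changes the indexed token, the distance that matters is the one between the query term and that analyzed token, which is why the analysis page is worth checking first.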
Re: How to properly use Levenshtein distance with ~ in Java
Ok, thank you for your response. But why can't I use '~'?

On 20 October 2014 07:40, Ramzi Alqrainy ramzi.alqra...@gmail.com wrote:
You can use the Levenshtein distance algorithm inside Solr without writing code by specifying the source of terms in solrconfig.xml.

--
Pozdrawiam / Best regards
Aleksander Sadecki
Re: How to properly use Levenshtein distance with ~ in Java
Because ~ is proximity matching. Lucene supports finding words that are within a specific distance of each other. To search for foo and bar within 4 words of each other:

  "foo bar"~4

Note that for proximity searches, exact matches are proximity zero, and word transpositions (bar foo) are proximity 1. A query such as "foo bar"~1000 is an interesting alternative to foo AND bar.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-tp4164793p4165079.html
Sent from the Solr - User mailing list archive at Nabble.com.
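Since ~ means different things after a quoted phrase and after a bare term, a short Java sketch of the two query-string forms may help. The helper names are hypothetical, the code does not escape query-parser special characters, and the 0 to 2 limit on edits reflects my understanding of Lucene 4.x fuzzy queries rather than anything stated in this thread.

```java
final class TildeSyntaxSketch {

    /** Proximity (slop) query on a quoted phrase, e.g. "foo bar"~4. */
    static String proximity(String phrase, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be >= 0");
        }
        return "\"" + phrase + "\"~" + slop;
    }

    /** Fuzzy (edit distance) query on a single bare term, e.g. foo~1. */
    static String fuzzy(String term, int maxEdits) {
        // Assumption: the parser accepts 0, 1 or 2 edits for fuzzy terms.
        if (maxEdits < 0 || maxEdits > 2) {
            throw new IllegalArgumentException("maxEdits must be 0, 1 or 2");
        }
        return term + "~" + maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(proximity("foo bar", 4)); // "foo bar"~4
        System.out.println(fuzzy("foo", 1));         // foo~1
    }
}
```

The quotes are what switch the operator from fuzzy to proximity, so a string like foo bar~4 without quotes would apply the ~4 to bar alone.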
Re: How to properly use Levenshtein distance with ~ in Java
You can use the Levenshtein distance algorithm inside Solr without writing code by specifying the source of terms in solrconfig.xml:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="field">content</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

This example shows the results of a simple request that supplies the text to check in the spellcheck.q parameter. The request also includes spellcheck.build=true, which needs to be sent only once in order to build the index. spellcheck.build should not be specified on every request.

  http://localhost:8983/solr/spellCheckCompRH?q=*:*&spellcheck.q=hell%20ultrashar&spellcheck=true&spellcheck.build=true

  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="hell">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">4</int>
        <arr name="suggestion">
          <str>dell</str>
        </arr>
      </lst>
      <lst name="ultrashar">
        <int name="numFound">1</int>
        <int name="startOffset">5</int>
        <int name="endOffset">14</int>
        <arr name="suggestion">
          <str>ultrasharp</str>
        </arr>
      </lst>
    </lst>
  </lst>

Once the suggestions are collected, they are ranked by the configured distance measure (Levenshtein distance by default) and then by aggregate frequency.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-tp4164793p4164883.html
Sent from the Solr - User mailing list archive at Nabble.com.
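Since the thread asks about Java, here is a rough sketch of reading the suggestions out of a response shaped like the one above, using only the JDK's DOM parser (no SolrJ). The class and method names are hypothetical, and the sample XML is a shortened copy of the response in this message. A real response from your Solr version may be laid out differently, so treat the traversal as an assumption to verify.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class SpellcheckResponseSketch {

    // Shortened copy of the response shown in this message.
    static final String SAMPLE =
            "<lst name=\"spellcheck\"><lst name=\"suggestions\">"
            + "<lst name=\"hell\"><int name=\"numFound\">1</int>"
            + "<arr name=\"suggestion\"><str>dell</str></arr></lst>"
            + "<lst name=\"ultrashar\"><int name=\"numFound\">1</int>"
            + "<arr name=\"suggestion\"><str>ultrasharp</str></arr></lst>"
            + "</lst></lst>";

    /** Maps each misspelled input term to its suggested corrections, in document order. */
    static Map<String, List<String>> suggestions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList arrays = doc.getElementsByTagName("arr");
            for (int i = 0; i < arrays.getLength(); i++) {
                Element arr = (Element) arrays.item(i);
                Node parent = arr.getParentNode();
                if (!"suggestion".equals(arr.getAttribute("name")) || !(parent instanceof Element)) {
                    continue; // not a suggestion list
                }
                // The enclosing <lst name="..."> carries the original input term.
                String term = ((Element) parent).getAttribute("name");
                List<String> words = new ArrayList<>();
                NodeList strs = arr.getElementsByTagName("str");
                for (int j = 0; j < strs.getLength(); j++) {
                    words.add(strs.item(j).getTextContent().trim());
                }
                result.put(term, words);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse spellcheck response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(suggestions(SAMPLE));
    }
}
```

Note that this only reads suggestions. As pointed out elsewhere in the thread, the spellcheck component does not change which documents the main query returns.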