Levenshtein Distance
I have a list of synonyms that is expanded at query time. This yields a lot of results (in the millions). My use case is name search, and I want to sort the results by Levenshtein distance. I know this can be done with the strdist function, but sorting is already inefficient, and evaluating a Solr function on top of it kills performance. I want the results to be returned as quickly as possible. One way I think Levenshtein distance could work is to apply strdist to the synonym file itself and get a score for each synonym, then use these scores to boost the results appropriately; that should be equivalent to sorting by Levenshtein distance. But I am not sure how to do this in Solr, or in fact whether Solr supports it.
-- View this message in context: http://lucene.472066.n3.nabble.com/Levenstein-Distance-tp3988026.html
Sent from the Solr - User mailing list archive at Nabble.com.
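The precomputation idea in this post can be done offline. Read the synonym file once, compute the edit distance from each term to each of its expansions, and keep the table for query time. This is a minimal Python sketch, not Solr code. It assumes a Solr-style synonyms file: comma-separated equivalents, `a => b, c` explicit mappings, and `#` comments. `levenshtein`, `split_terms` and `build_distance_table` are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def split_terms(text: str) -> list:
    """Split a comma-separated synonym list into lowercase terms."""
    return [t.strip().lower() for t in text.split(",") if t.strip()]


def build_distance_table(lines) -> dict:
    """Map each term a user may type to {expansion: distance}, nearest first."""
    table = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "=>" in line:                      # explicit mapping: lhs => rhs
            lhs, rhs = line.split("=>", 1)
            sources, targets = split_terms(lhs), split_terms(rhs)
        else:                                 # equivalent terms: each maps to all
            sources = targets = split_terms(line)
        for src in sources:
            dists = table.setdefault(src, {})
            for tgt in targets:
                dists[tgt] = levenshtein(src, tgt)
    # Order each term's expansions by distance, then alphabetically for ties.
    return {src: dict(sorted(d.items(), key=lambda kv: (kv[1], kv[0])))
            for src, d in table.items()}


synonym_lines = ["raj, raju, rajesh, rajan", "# a comment", "jim => jim, jimmy, james"]
table = build_distance_table(synonym_lines)
print(table["raj"])  # {'raj': 0, 'raju': 1, 'rajan': 2, 'rajesh': 3}
```

At query time, looking up the typed name returns its expansions already ordered by distance. Those distances can then be turned into per-term boosts.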
Sorting performance
Here is the use case: I am using synonym expansion at query time to get results. This is essentially a name search, so a search for Jim may be expanded at query time to James, Jung, Jimmy, etc. Ranking factors like TF, IDF and norms do not mean anything to me, so I just reset them to zero, and all the results I get have the same rank. I have used a copy field to boost the weight of exact matches, so Jim is boosted to the top. However, I want the other results, like Jimmy, Jung and James, to be sorted by Levenshtein distance from the word Jim (the original query). The number of results returned is quite large, so a general strdist sort takes 6-7 seconds. Is there any option other than applying sort= in the query to achieve the same functionality? Is there a particular way to index the data to achieve the same result? Any ideas to boost the performance and still get the intended functionality?
-- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-performance-tp3987633.html
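One possible alternative to `sort=strdist(...)` is sketched below under stated assumptions. The number of expansions per query is small, so the distances can be computed once per distinct term rather than once per document. They can then be turned into query-time boosts with the standard Lucene `term^boost` syntax. The field name `first_name_exact` is hypothetical. It stands for a copy field without the synonym filter, so that each boosted clause is not expanded again. The boost formula `max_boost / (1 + distance)` is an arbitrary choice for illustration.

```python
from urllib.parse import urlencode


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def boosted_query(field: str, original: str, expansions, max_boost: float = 10.0) -> str:
    """Hypothetical helper: OR query where names closer to `original` get larger boosts."""
    # Closest terms first (ties alphabetical), so the query string is easy to read.
    terms = sorted(set([original, *expansions]), key=lambda t: (levenshtein(original, t), t))
    clauses = []
    for term in terms:
        boost = max_boost / (1 + levenshtein(original, term))  # distance 0 -> max_boost
        clauses.append(f"{field}:{term}^{boost:g}")
    return " OR ".join(clauses)


# "first_name_exact" is a hypothetical copy field with no synonym filter.
params = {"q": boosted_query("first_name_exact", "jim", ["james", "jimmy", "jung"]), "rows": 20}
print("/select?" + urlencode(params))
```

This relies on two assumptions. First, the custom similarity makes every single-term match score the same. Second, each document matches exactly one clause. If both hold, documents should order by the boost of the clause they matched. I have not measured whether this is faster than the strdist sort on a real index.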
Re: Difference between textfield and strfield
Is there any alternative to sorting? Sorting can affect query performance. Is there a way to build this into Solr without taking a toll on the system? I tried boosting the scores based on strdist, but that seems to bring in more results than expected.
-- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-textfield-and-strfield-tp3986916p3987338.html
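The extra results may have come from adding a strdist function query as a clause of `q`, since a function query on its own matches every document. If so, one possible fix is Solr's `{!boost}` query parser. It multiplies the scores of the documents the wrapped query already matched instead of adding new ones. Below is a sketch of the request parameters, built in Python.

Assumptions:
- `first_name_exact` is a hypothetical StrField copy of `first_name`. strdist compares against a single field value, which is why a tokenized TextField is awkward here.
- `$dist` uses Solr's parameter dereferencing.
- The base scores are equal and non-zero. A zero base score would stay zero after multiplication.

```python
from urllib.parse import urlencode


def distance_boost_params(name: str, match_field: str = "first_name",
                          exact_field: str = "first_name_exact") -> dict:
    """Request params that rescale scores by strdist without changing which docs match.

    {!boost b=$dist} wraps the query that follows it (the synonym-expanded field
    query) and multiplies each matching document's score by the function in $dist.
    """
    return {
        "q": f"{{!boost b=$dist}}{match_field}:{name}",
        "dist": f'strdist("{name}",{exact_field},edit)',  # 1.0 for an identical value
        "fl": f"id,{match_field},score",
    }


print("/select?" + urlencode(distance_boost_params("jim")))
```

This still evaluates strdist for every matching document. It targets the unexpected extra matches, not necessarily the speed.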
Re: Difference between textfield and strfield
I cannot move from TextField to StrField, since I am using synonym expansion. Is there anything we can do on the TextField itself?
-- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-textfield-and-strfield-tp3986916p3986938.html
Re: Difference between textfield and strfield
Well, I do not have phrases in my synonym expansion, so it works well. The synonym expansion is done at query time. Since I am only searching against the first name field, tf, idf and other ranking parameters do not make sense, so their weight has been initialized to 1. As a result, after synonym expansion I get results in no particular order. The results themselves are correct; they are just not ordered by Levenshtein distance from the original query. So the use case is: if the user enters the query ab, it gets expanded at query time to abc, abxy, aberfg, and I get results for ab, abc, abxy and aberfg. But I want them sorted by Levenshtein distance from the original query (ab), so the order should be ab, abc, abxy, aberfg. Does TextField make this even more difficult? Any other suggestions? Spellcheckers? N-grams?
-- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-textfield-and-strfield-tp3986916p3986928.html
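A quick check of this example supports the intuition. The edit distances from ab are 0, 1, 2 and 4, so the desired order is exactly ascending Levenshtein distance. The sketch ranks the matched terms, not documents; mapping each document to the term it matched is left out, and the alphabetical tie-break is an arbitrary choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


query = "ab"
results = ["aberfg", "abxy", "ab", "abc"]  # terms matched after synonym expansion
ranked = sorted(results, key=lambda t: (levenshtein(query, t), t))
print([(t, levenshtein(query, t)) for t in ranked])
# [('ab', 0), ('abc', 1), ('abxy', 2), ('aberfg', 4)]
```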
Difference between textfield and strfield
Hi, can anyone explain the basic pros and cons of TextField and StrField? I am trying to use Levenshtein distance on a TextField, but it seems that it can only be applied to a StrField. So my question is: what is the difference between the two, and what are the major advantages of one over the other? Currently I have a TextField defined for first_name, and I apply synonym expansion at query time to this field.
-- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-textfield-and-strfield-tp3986916.html
Relevancy ranking for synonym matches
I was wondering if there is any solution for this. Currently I expand my queries to match synonyms at query time. So if I enter James, I get results for Jim, Gomes, Game, etc., because the query is expanded with the synonyms for James. But since this is just a one-word match, tf, idf and other parameters don't make sense, so I have reset those factors to 1. Hence all the results I get have an equal score. What I really want to do is sort these results by Levenshtein distance without using the ~ operator. The issue with ~ is that if I have a synonym that is radically different (say Greg for James) and I use James~0, Greg would not match James closely, and the number of results returned would be less than the actual number of synonym matches. So my use case is: without reducing the number of results, I want to sort them by Levenshtein distance, i.e. by closest string match to the original query.
-- View this message in context: http://lucene.472066.n3.nabble.com/Relevancy-ranking-for-synonym-matches-tp3986634.html
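The concern about ~ can be made concrete. A fuzzy-style cutoff keeps only names within some edit distance, while ranking keeps every synonym match and only reorders it. The sketch uses a hypothetical cutoff of 2 edits purely for illustration; the meaning of the number after ~ has differed between Lucene versions. The names come from this post.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


query = "james"
synonyms = ["jim", "gomes", "game", "greg"]  # expansions named in this post
MAX_EDITS = 2                                # hypothetical fuzzy-style cutoff

# Filtering drops distant synonyms, shrinking the result set.
kept_by_cutoff = [s for s in synonyms if levenshtein(query, s) <= MAX_EDITS]
# Ranking keeps every synonym and only changes the order (ties broken alphabetically).
ranked_all = sorted(synonyms, key=lambda s: (levenshtein(query, s), s))

print(kept_by_cutoff)  # ['gomes', 'game']
print(ranked_all)      # ['game', 'gomes', 'jim', 'greg']
```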
Re: Solr boost relevancy
Wait, I thought fuzzy matching is invoked with ~. I am not using ~; I am expanding my query terms with synonyms at query time. So, from what I understand, when I query for James, Solr internally expands it via the synonym list to James, Jim, Games and Jameson. I guess the information about the original query is lost, and Solr returns the results matched for Games, Jameson, Jim and James in any order (since I normalized the scores). Using a copy field for James would return results for James as the top results, but I don't see the other three keywords being ordered by Levenshtein distance. Or am I thinking in the wrong direction?
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-boost-relevancy-tp3986200p3986283.html
Re: Solr boost relevancy
Hi Lori, yeah, I thought of exactly the same solution: use a copy field and boost the relevancy of the exact match. But my question is broader. For example, say I have Jim, Games, Jimmy and Jameson as synonyms for James, and I normalize tf, norms and the other factors to 1. Searching for James could then give me Jameson and Jim as my top matches, since every document now has a score of 1. Having a copy field for James and boosting it would definitely put James as the top result. But after James, the order of results for the other synonyms is still arbitrary. By Levenshtein distance, I would want Games to be the next set of results and probably Jameson after that. How do I achieve that? That's my bigger question.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-boost-relevancy-tp3986200p3986280.html
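Computing the distances for the James example supports the ordering asked for here. Games is one edit away and Jameson is two, while Jim and Jimmy tie at three, so some tie-breaker is needed. The alphabetical tie-break below is an arbitrary assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


query = "james"
expansions = ["jim", "games", "jimmy", "jameson"]
# The exact match has distance 0, so it sorts first even without a separate boost.
ranked = sorted([query, *expansions], key=lambda t: (levenshtein(query, t), t))
print([(t, levenshtein(query, t)) for t in ranked])
# [('james', 0), ('games', 1), ('jameson', 2), ('jim', 3), ('jimmy', 3)]
```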
Solr boost relevancy
Consider a database of just names. If I use synonym expansion at query time, I get a set of results. (Background: I created a class that resets idf, tf, etc. all to 1, since they don't matter to me anymore.) What really matters is how closely the query matches a given name. Currently I am getting all results with the same score (which makes sense, since I reset all the factors to 1), but how do I now rank them by closeness of match? P.S.: the query is expanded at query time to match all the documents for the synonyms. I want to make sure that if I enter "Raj", I get Raj as the topmost result and synonyms like "Raju" after that.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-boost-relevancy-tp3986200.html
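To see why an exact match lands on top when scoring by closeness, it helps to look at how the raw edit distance is normalized. As far as I can tell, Solr's `strdist(..., edit)` is backed by Lucene's spellchecker `LevensteinDistance` class. That class returns 1 minus the distance divided by the longer string's length. The function below is my Python reading of that normalization, not the Lucene code itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - distance / longer length: 1.0 for identical strings, lower for distant ones."""
    if not a or not b:
        return 1.0 if a == b else 0.0  # empty-string edge case
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


for name in ["raju", "raj", "rajesh"]:
    print(name, normalized_similarity("raj", name))
# raju 0.75
# raj 1.0
# rajesh 0.5
```

Dividing by the longer length can reorder candidates compared with the raw distance. Against the query "a", the name "b" has distance 1 but similarity 0.0. The name "abcd" has distance 3 but similarity 0.25.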