Re: full name free text search problem
You need to tokenize the full name in several different ways and then search all of the tokenized versions with different boosts. For example, you can index the name as a single full string (perhaps lowercased), again split on whitespace, and perhaps once more with a phonetic mapping to catch spelling variants.

You can see something similar in: https://gist.github.com/arafalov/5e04884e5aefaf46678c

Regards,
Alex.
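The multi-tokenization idea can be sketched in plain Python to show why it favours the exact name. This is only an illustration, not Solr code: the boost values (10, 3, 1), the function names, and the crude vowel-dropping "phonetic" key are all assumptions standing in for real Solr field types and a real phonetic filter.

```python
# Illustrative only: mimics "several tokenizations, different boosts" in plain Python.
# Boosts and the crude phonetic key are assumptions, not values from the thread.

def exact_key(name):
    """Whole name as one lowercased token (like a keyword-tokenized field)."""
    return " ".join(name.lower().split())

def word_tokens(name):
    """Lowercased whitespace tokens."""
    return set(name.lower().split())

def crude_phonetic(token):
    """Rough stand-in for a phonetic filter: keep the first letter, drop later vowels."""
    token = token.lower()
    return token[:1] + "".join(c for c in token[1:] if c not in "aeiouy")

def score(query, name, exact_boost=10.0, token_boost=3.0, phonetic_boost=1.0):
    """Sum the boosted matches from each tokenization."""
    total = 0.0
    if exact_key(query) == exact_key(name):
        total += exact_boost
    q_tokens, n_tokens = word_tokens(query), word_tokens(name)
    total += token_boost * len(q_tokens & n_tokens)
    q_phon = {crude_phonetic(t) for t in q_tokens}
    n_phon = {crude_phonetic(t) for t in n_tokens}
    total += phonetic_boost * len(q_phon & n_phon)
    return total

names = ["Rodney Kim", "Dae Kim", "Day Kim"]
ranked = sorted(names, key=lambda n: score("Dae Kim", n), reverse=True)
print(ranked)
```

In Solr itself the same effect would usually come from copying the name into several differently analyzed fields and weighting them at query time; the field layout in the linked gist is the place to look for a concrete version.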
Re: full name free text search problem
"I am getting the records matching the full name sorted by distance. If the input string (for ex Dae Kim) is provided, I am getting the records other than Dae Kim (for ex Rodney Kim) too at the top of the search results including Dae Kim just before the next Dae Kim because Kim is matching with all the fields like full name, facility name and the office name. So, the hit frequency is high and it's distance is less compared to the next Dae Kim in the search results with higher distance."

This is all quite confusing. First of all, when you say sorted by distance, do you mean string distance or spatial distance?

You are analysing the fields without tokenization and then putting everything into the same multivalued field. This means you are going to get only exact matches, and you lose the semantics of the source field (which could have given a different score boost depending on the field).

If you want to sort or score by string distance, you need to use function query sorting or boosting [1]. In particular, you are interested in strdist (you will find the details on the linked page). If it is geographical distance, take a look at the spatial module [2].

Regards

[1] https://lucene.apache.org/solr/guide/6_6/function-queries.html
[2] https://lucene.apache.org/solr/guide/6_6/spatial-search.html

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
Re: full name free text search problem
Hi Deepak,

Look at the score of each result in your response. You can do this in debug mode.

Rahul.

--
"Learning is not necessary, neither is survival"
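As a rough sketch of what "look at the score in debug mode" can mean in practice: if the request asks for `fl=*,score` and `debugQuery=true`, the JSON response carries a score on each document and a `debug.explain` section keyed by document id. The helper below only reads an already-parsed response; the function name and the assumption that the unique key field is called `id` are mine, not from the thread.

```python
def scores_with_explanations(response, id_field="id"):
    """Pair each returned document's score with its explain text.

    `response` is a Solr JSON response already parsed into a dict, from a
    request that used fl=*,score and debugQuery=true. `id_field` is assumed
    to be the collection's unique key.
    """
    explain = response.get("debug", {}).get("explain", {})
    rows = []
    for doc in response.get("response", {}).get("docs", []):
        doc_id = str(doc.get(id_field))
        rows.append((doc_id, doc.get("score"), explain.get(doc_id, "")))
    return rows
```

Comparing the explain text of a Rodney Kim record with that of a Dae Kim record should show which fields contributed to each score.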
full name free text search problem
Hi all,

I have the following scenario in a full name search that we are trying to implement.

Solr configuration :-

<fieldType name="keywords_text" class="solr.TextField">

Scenario :-

The Solr configuration has the office name, facility name and full name, as displayed above. We search on the input name, and the records are sorted by distance.

Problem :-

I am getting the records that match the full name, sorted by distance. If I provide an input string (for example, Dae Kim), I also get records other than Dae Kim (for example, Rodney Kim) at the top of the search results. A Rodney Kim record appears just before the next Dae Kim record because Kim matches in all of the fields (full name, facility name and office name), so its hit frequency is high and its distance is smaller than that of the next Dae Kim record.

Expected results :-

I want all the records for Dae Kim to appear at the top of the search results, sorted by distance, without any irrelevant results.

Queries :-

What is the fix for this problem, if anyone has faced it? How do I handle it?

Any input would be highly appreciated.

Thanks in advance.

Regards,
Deepak
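Two readings of "sorted by distance" come up in this thread: string distance (the `strdist` function query) and geographic distance (the spatial `geodist` function). The sketch below only builds request parameters for each reading. The field names `full_name_s` and `location`, the coordinates, and the helper names are hypothetical; `strdist`, `geodist`, `sfield` and `pt` are the Solr names as I understand them from the reference guide, and the exact behaviour should be checked there.

```python
from urllib.parse import urlencode

def string_distance_params(name, field="full_name_s"):
    """Sort by similarity between `name` and a single-valued string field.

    strdist(s1, s2, measure) with the "edit" measure is believed to return a
    value where higher means more similar, hence the descending sort.
    """
    safe = name.replace('"', '\\"')
    return {"q": "*:*", "sort": f'strdist("{safe}",{field},edit) desc', "fl": "*,score"}

def geo_distance_params(query, lat, lon, field="location"):
    """Restrict to `query`, then sort by distance from the point (lat, lon)."""
    return {"q": query, "sfield": field, "pt": f"{lat},{lon}", "sort": "geodist() asc"}

string_query = urlencode(string_distance_params("Dae Kim"))
geo_query = urlencode(geo_distance_params('full_name:"Dae Kim"', 37.77, -122.42))
print(string_query)
print(geo_query)
```

For the expected result described above, the second form is probably closer: a phrase match on the full name keeps Rodney Kim out of the result set, and the sort then orders only the Dae Kim records by distance.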
Re: text search problem
Ravi, for the hyphen issue, try setting autoGeneratePhraseQueries="true" for that fieldType (no re-index needed). As of schema version 1.4, this defaults to false. One word of caution: autoGeneratePhraseQueries may not work as expected for languages that aren't whitespace delimited.

As Erick mentioned, the Analysis page will help you verify that your content and your queries are handled the way you expect them to be.

See this thread for more info on autoGeneratePhraseQueries:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3c439f69a3-f292-482b-a102-7c011c576...@gmail.com%3E
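To make the effect of autoGeneratePhraseQueries concrete, here is a toy model in Python. It is not Solr's query parser: the analyzer below simply lowercases and splits on non-alphanumerics (an assumption that differs from Ravi's actual analysis chain), and the function names are made up. It shows the general rule that when one whitespace-separated chunk analyzes to several tokens, the flag decides between a phrase and a group of separate terms.

```python
import re

def analyze(chunk):
    """Toy analyzer: lowercase, split on non-alphanumerics (assumed, not Ravi's chain)."""
    return [t for t in re.split(r"[^0-9a-z]+", chunk.lower()) if t]

def parse(query, field="Text", auto_generate_phrase_queries=False):
    """Render a rough 'parsed query' string for each whitespace-separated chunk."""
    clauses = []
    for chunk in query.split():
        tokens = analyze(chunk)
        if len(tokens) <= 1:
            clauses.extend(f"{field}:{t}" for t in tokens)
        elif auto_generate_phrase_queries:
            phrase = " ".join(tokens)
            clauses.append(f'{field}:"{phrase}"')   # tokens must be adjacent
        else:
            terms = " ".join(f"{field}:{t}" for t in tokens)
            clauses.append(f"({terms})")            # tokens matched independently
    return " ".join(clauses)

print(parse("ABC-123"))
print(parse("ABC-123", auto_generate_phrase_queries=True))
```

With the flag off, a document containing only ABC can match; with it on, ABC and 123 have to appear next to each other.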
RE: text search problem
Thanks for the reply, Erick. I will try as you suggested.

I have another question along these lines. When I have a hyphen (-) in my description or name, the search results are different. For example, for ABC-123 it looks for ABC or 123. I want to treat this search as an exact match, i.e. if my document has ABC-123, then I should get that result. When I check with hl=on, the highlighting shows <em>ABC</em> and I still get the results. How can I avoid this situation?

Thanks,
Ravi
Re: text search problem
Try escaping the hyphen as \-. Or enclosing it all in quotes.

But you _really_ have to spend some time with the debug option and the admin/analysis page, or you will find endless surprises.

Best,
Erick
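Both suggestions, escaping the hyphen and quoting the whole value, are easy to apply on the client before the query is sent. A minimal sketch, assuming the standard Lucene query syntax special characters; the helper names are hypothetical:

```python
# Characters the Lucene/Solr query syntax treats as special.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')

def escape_term(text):
    """Backslash-escape every query-syntax special character in `text`."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

def as_phrase(text):
    """Wrap `text` in double quotes, escaping embedded backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

print("Text:" + escape_term("ABC-123"))
print("Text:" + as_phrase("ABC-123"))
```

Escaping only changes how the query parser reads the input. What the field's analyzer then does with ABC-123 is a separate question, which is why the analysis page is still worth checking.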
Re: text search problem
Try adding debug=all to the query and see what the parsed form of the query is. Likely you're either 1) using phrase queries, so broadway hotel requires both words in the text, or 2) if you're not using phrases, you're searching for the AND of the two terms. But debug=all will show you.

Plus, take a look at the admin/analysis page; your tokenization may not be what you expect.

Best,
Erick
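A small sketch of the debug=all suggestion: build the request URL, then read the `parsedquery` entry from the `debug` section of the JSON response. The base URL and core name are placeholders, and the helper names are made up; no request is actually sent here.

```python
from urllib.parse import urlencode

def debug_url(base_url, query):
    """Build a /select URL that asks Solr for full debug output as JSON."""
    params = {"q": query, "debug": "all", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"

def parsed_query(response):
    """Return the parsed form of the query from an already-parsed JSON response."""
    return response.get("debug", {}).get("parsedquery")

# Placeholder host and core name.
url = debug_url("http://localhost:8983/solr/mycore", "Text:Broadway Hotel")
print(url)
```

The parsed form shows whether Broadway Hotel became a phrase, two required terms, or something else entirely, for example the second word being searched in the default field rather than in Text.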
text search problem
Hi,

Below is the text_general field type. When I search for Text:Broadway, it does not return all the records, only a few. But when I search for Text:*Broadway*, it returns more records. When I search for multiple words, such as Broadway Hotel, it may not match records containing Broadway, Hotel, or Broadway Hotel. Do you have any thoughts on how to handle this type of keyword search?

Text:Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car Wash Water Recovery

My field type looks like this:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>
  </analyzer>
</fieldType>

Do you have any thoughts on this behavior, or on how to get the results I expect?

Thanks,
Ravi
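One possible explanation, not confirmed anywhere in this thread, is that the sample value has no space after its commas, so a whitespace tokenizer never produces a standalone "broadway" term. The toy function below is only a rough approximation of the index chain above (whitespace tokenizer, lowercasing, and a word delimiter filter that catenates parts without generating them); it ignores stop words, stemming and HTML stripping, and its name is made up. The admin/analysis page is the authoritative check.

```python
import re

def toy_index_terms(text):
    """Approximate: split on whitespace, then glue each token's alphanumeric parts together."""
    terms = []
    for token in text.split():
        parts = re.findall(r"[A-Za-z0-9]+", token)
        if parts:
            terms.append("".join(parts).lower())
    return terms

value = "Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car Wash Water Recovery"
terms = toy_index_terms(value)
print(terms)
print("broadway" in terms)                  # exact term lookup
print(any("broadway" in t for t in terms))  # wildcard-style substring lookup
```

If the real chain behaves like this, Text:Broadway would miss this record while Text:*Broadway* would still find it, which matches the behaviour described above.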