Re: full name free text search problem

2018-01-31 Thread Alexandre Rafalovitch
You need to tokenize the full name in several different ways and then
search both (all) tokenization versions with different boosts.

This way you can tokenize as full string (perhaps lowercased) and then
also on white space and then maybe even with phonetic mapping to catch
spellings.

You can see something similar in:
https://gist.github.com/arafalov/5e04884e5aefaf46678c
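
As a rough illustration of the idea (not taken from the gist), here is a toy Python model of indexing a name two ways and boosting the stricter match more. The names `name_exact` and `name_words` and the boosts 10 and 1 are assumptions made up for this sketch, and the scoring is a simple stand-in, not how Solr computes relevance.

```python
# Toy model of "index the name several ways, boost the stricter match more".
# The two "fields" and their boosts are illustrative assumptions, not Solr defaults.

def exact_key(name):
    """Whole name as one lowercased token (like a keyword tokenizer + lowercase)."""
    return " ".join(name.lower().split())

def word_tokens(name):
    """Lowercased whitespace tokens (like a whitespace tokenizer + lowercase)."""
    return name.lower().split()

BOOSTS = {"name_exact": 10.0, "name_words": 1.0}  # hypothetical boosts

def score(query, full_name):
    """Add the exact-match boost (if any) to one point per matching word."""
    total = 0.0
    if exact_key(query) == exact_key(full_name):
        total += BOOSTS["name_exact"]
    doc_words = set(word_tokens(full_name))
    matched = sum(1 for word in word_tokens(query) if word in doc_words)
    total += BOOSTS["name_words"] * matched
    return total

names = ["Rodney Kim", "Dae Kim", "Kim Dae Ho"]
ranked = sorted(names, key=lambda n: score("dae kim", n), reverse=True)
print(ranked)
```

With these made-up boosts, the record whose whole name matches ranks ahead of records that only share one or two words. A phonetic variant would be a third key with its own, lower boost.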

Regards,
   Alex.



Re: full name free text search problem

2018-01-31 Thread Alessandro Benedetti
"I am getting the records matching the full name sorted by distance. 
If the input string(for ex Dae Kim) is provided, I am getting the records
other than Dae Kim(for ex Rodney Kim) too at the top of the search results
including Dae Kim 
just before the next Dae Kim because Kim is matching with all the fields
like full name, facility name and the office name. So, the hit frequency is
high and it's 
distance is less compared to the next Dae Kim in the search results with
higher distance. "

This is all quite confusing.
First of all, when you say "sorted by distance", do you mean string distance
or spatial (geographical) distance?
You are analysing the fields without tokenization and then putting
everything into the same multivalued field.
This means you are only going to get exact matches.
You also lose the semantics of the source field (which could have been given
a different score boost depending on the field).
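
A minimal sketch of what per-field boosting could look like if the three sources were kept as separate fields and queried with the edismax parser. The field names `full_name`, `facility_name` and `office_name` and all boost values are assumptions for illustration; only `defType`, `q`, `qf` and `pf` are actual edismax parameters.

```python
from urllib.parse import urlencode

def build_name_query(user_input):
    """Build edismax parameters that weight each source field separately."""
    return {
        "defType": "edismax",
        "q": user_input,
        # field^boost pairs; the field names and boost values are assumptions
        "qf": "full_name^5 facility_name^2 office_name^1",
        # pf adds an extra boost when the whole input matches as a phrase
        "pf": "full_name^10",
    }

params = build_name_query("Dae Kim")
query_string = urlencode(params)
print(query_string)
```

The point of the design is that a hit on "Kim" in the office name no longer counts the same as a hit in the full name.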

If you want to sort or score by a string distance, you need to use function
query sorting or boosting [1].
In particular, you are interested in strdist (you will find the details on the
linked page).
If it is geographical distance, take a look at the spatial module [2].
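
The two options can be sketched as request parameters, here built in Python. `strdist` and `geodist` are function queries described in the linked pages; the field names `full_name_exact`, `full_name` and `location` are hypothetical, and the sketch assumes the name contains no double quotes.

```python
def string_distance_params(name):
    """Rank by string similarity between the input and an untokenized name field."""
    return {
        "q": "*:*",
        "fl": "*,score",
        # strdist(s1, s2, measure); jw selects Jaro-Winkler. As far as I know
        # identical strings give the highest value, hence the descending sort.
        "sort": 'strdist("%s",full_name_exact,jw) desc' % name.lower(),
    }

def geo_distance_params(name, lat, lon):
    """Match the name as a phrase, then order the hits by distance from a point."""
    return {
        "q": 'full_name:"%s"' % name,
        "sfield": "location",          # hypothetical spatial field
        "pt": "%s,%s" % (lat, lon),    # "lat,lon" of the searcher
        "sort": "geodist() asc",       # nearest first
    }

by_string = string_distance_params("Dae Kim")
by_geo = geo_distance_params("Dae Kim", 34.05, -118.24)
print(by_string["sort"])
print(by_geo["sort"], by_geo["pt"])
```

In the second variant the phrase query decides which records qualify and the sort only orders them, so a nearby Rodney Kim cannot overtake a more distant Dae Kim.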

Regards

[1] https://lucene.apache.org/solr/guide/6_6/function-queries.html
[2] https://lucene.apache.org/solr/guide/6_6/spatial-search.html



-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


Re: full name free text search problem

2018-01-30 Thread Rahul Sood
Hi Deepak,
Look at the scores of the results in your response.
You can see how each score was calculated by running the query in debug mode.
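
For example, a request along these lines returns the score with each document plus a debug section explaining it. This is only a sketch: the host, port and the core name `providers` are assumptions, while `fl=*,score` and `debugQuery=true` are standard request parameters.

```python
from urllib.parse import urlencode

def debug_url(base_url, query):
    """Build a /select URL that returns scores and the debug explanation."""
    params = {
        "q": query,
        "fl": "*,score",        # include the relevance score with each document
        "debugQuery": "true",   # add the debug section to the response
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)

# Hypothetical local core; adjust to your own installation.
url = debug_url("http://localhost:8983/solr/providers", "Dae Kim")
print(url)
```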
Rahul.




-- 
"Learning is not necessary, neither is survival"


full name free text search problem

2018-01-30 Thread Deepak Udapudi
Hi all,

I have the below scenario in full name search that we are trying to implement.

Solr configuration :-

fieldType name="keywords_text" class="solr.TextField">
  [...]
  [...] delimiter="/"/>
  [...]

[...] multiValued="true" />
  [...]


Scenario :-

The Solr configuration has the office name, facility name and full name
fields, as displayed above.
We search on the input name, with the records sorted by distance.

Problem :-

I am getting the records matching the full name, sorted by distance.
If the input string is, for example, Dae Kim, I also get records other than
Dae Kim (for example, Rodney Kim) at the top of the search results. A Rodney
Kim record appears just before the next Dae Kim record because Kim matches in
all the fields (full name, facility name and office name), so its hit
frequency is high, and its distance is smaller than that of the next Dae Kim
record in the search results.

Expected results :-

I want all the records for Dae Kim to appear at the top of the search
results, sorted by distance, without any irrelevant results.

Queries :-

What is the fix for the above problem if anyone has faced it?
How do I handle the problem?

Any inputs would be highly appreciated.

Thanks in advance.

Regards,
Deepak




The information contained in this email message and any attachments is 
confidential and intended only for the addressee(s). If you are not an 
addressee, you may not copy or disclose the information, or act upon it, and 
you should delete it entirely from your email system. Please notify the sender 
that you received this email in error.


Re: text search problem

2014-07-23 Thread Josh Lincoln
Ravi, for the hyphen issue, try setting autoGeneratePhraseQueries="true" for
that fieldType (no re-index needed). As of schema version 1.4, this defaults
to false. One word of caution: autoGeneratePhraseQueries may not work as
expected for languages that aren't whitespace delimited. As Erick mentioned,
the Analysis page will help you verify that your content and your queries are
handled the way you expect them to be.
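
To show what the setting changes, here is a toy Python model of how a single unquoted input that analyzes into several tokens ends up being queried. It is a simplification for illustration, not the Lucene query parser, and the token list for ABC-123 is an assumed analysis result.

```python
def build_query(field, analyzed_tokens, auto_generate_phrase_queries):
    """Toy model of querying one input chunk that analysis split into tokens."""
    if len(analyzed_tokens) == 1:
        return "%s:%s" % (field, analyzed_tokens[0])
    if auto_generate_phrase_queries:
        # Tokens must appear next to each other, in order.
        return '%s:"%s"' % (field, " ".join(analyzed_tokens))
    # Otherwise each token becomes its own independent term clause.
    return " ".join("%s:%s" % (field, token) for token in analyzed_tokens)

tokens = ["abc", "123"]  # assumed analysis result for the input ABC-123
as_phrase = build_query("Text", tokens, True)
as_terms = build_query("Text", tokens, False)
print(as_phrase)
print(as_terms)
```

With the setting off, a document containing only ABC or only 123 can match, which is the behaviour Ravi describes.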

See this thread for more info on autoGeneratePhraseQueries
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3c439f69a3-f292-482b-a102-7c011c576...@gmail.com%3E





RE: text search problem

2014-07-21 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)

Thanks for the reply, Erick. I will try what you suggested. I have another
question along the same lines.

When there is a hyphen (-) in my description or name, the search results are
different. For example:

For ABC-123, it looks for ABC or 123. I want to treat this search as an exact
match, i.e. if my document has ABC-123 then I should get that result.

When I check with hl=on, it highlights <em>ABC</em> and returns the results.
How can I avoid this situation?

Thanks

Ravi





Re: text search problem

2014-07-21 Thread Erick Erickson
Try escaping the hyphen as \-. Or enclosing it all
in quotes.
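
Both options can be done on the client before the query is sent. A small Python sketch, with helper names that are made up for this example; the character list follows the special characters of the standard query syntax.

```python
import re

# Characters with special meaning in the standard Solr/Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_term(text):
    """Put a backslash in front of every special character (ABC-123 -> ABC\\-123)."""
    return _SPECIAL.sub(r"\\\1", text)

def quote_phrase(text):
    """Wrap the text in double quotes, escaping embedded backslashes and quotes."""
    return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')

escaped = escape_term("ABC-123")
quoted = quote_phrase("ABC-123")
print("Text:" + escaped)
print("Text:" + quoted)
```

Note that escaping only stops the query parser from treating the hyphen as an operator; the field's analyzer may still split the value afterwards.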

But you _really_ have to spend some time with the debug option
and the admin/analysis page or you will find endless surprises.

Best,
Erick





Re: text search problem

2014-07-19 Thread Erick Erickson
Try adding debug=all to the query and see what the parsed form of the query
is. Likely you're
1> using phrase queries, so "broadway hotel" requires both words in the text,
or
2> if you're not using phrases, searching for the AND of the two terms.

But debug=all will show you.
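
As a sketch of what to look for, the parsed form is in the `parsedquery` entry of the `debug` section of a JSON response. The response below is a hand-written sample, not real Solr output, and the classification is a rough heuristic for illustration only.

```python
import json

# Hand-written sample in the shape of a JSON response with debug enabled.
sample_response = json.loads("""
{
  "response": {"numFound": 0, "docs": []},
  "debug": {
    "rawquerystring": "Text:(Broadway Hotel)",
    "parsedquery": "+Text:broadway +Text:hotel"
  }
}
""")

def describe_parsed_query(response):
    """Return the parsed query and a rough guess at how the terms are combined."""
    parsed = response["debug"]["parsedquery"]
    if '"' in parsed:
        kind = "phrase"
    elif parsed.count("+") >= 2:
        kind = "all terms required (AND)"
    else:
        kind = "any term may match (OR)"
    return parsed, kind

parsed, kind = describe_parsed_query(sample_response)
print(parsed, "->", kind)
```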

Plus, take a look at the admin/analysis page; your tokenization may not be
what you expect.
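
One possible surprise, sketched in Python: a whitespace tokenizer splits only on whitespace, so commas stay inside the tokens. This models just the tokenizer and lowercasing steps and assumes the sample text is indexed as a single value; the later filters in the chain would change the tokens further.

```python
import fnmatch

value = ("Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,"
         "Car Wash Water Recovery")

# Whitespace tokenization plus lowercasing only; no other filters are modelled.
tokens = value.lower().split()

exact_hits = [token for token in tokens if token == "broadway"]
wildcard_hits = fnmatch.filter(tokens, "*broadway*")

print(tokens)
print(exact_hits)
print(wildcard_hits)
```

Under these assumptions no token is exactly `broadway`, which would explain why the plain term finds fewer records than the wildcard form.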

Best,
Erick





text search problem

2014-07-18 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, below is the text_general field type. When I search for Text:Broadway, it
does not return all the records; it returns only a few. But when I search for
Text:*Broadway*, it returns more records. When I search with multiple words,
like Broadway Hotel, it may not return records containing Broadway, Hotel, or
Broadway Hotel. Do you have any thoughts on how to handle this type of keyword
search?

Text:Broadway,Vehicle Detailing,Water Systems,Vehicle Detailing,Car Wash Water 
Recovery

My field type looks like this.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
            catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->

  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0" catenateWords="1"
            catenateNumbers="1" catenateAll="1" preserveOriginal="0"/>

  </analyzer>
</fieldType>



Do you have any thoughts on this behavior, or on how to get the expected
results?

Thanks

Ravi