AutoSuggest with custom sorting
Hi,

I am supposed to implement auto-suggest where the prefix matches are sorted based on the following criteria. We have two fields (max ~100 characters) that we need to search. Field1 contains only one word (no spaces), whereas Field2 contains multiple words separated by spaces. Example:

Row1: Field1 - ROFL, Field2 - Rolls on the floor laughing
Row2: Field1 - IRLL, Field2 - Rolling
Row3: Field1 - IRLTR, Field2 - I Roll

1. Results matching Field1 should be ranked higher than results matching Field2. Results matching the first word of Field2 should be ranked higher than any subsequent matches. If you search for "RO*" in the above example, the ranking should be Row1 -> Row2 -> Row3.
2. The next sort parameter is the length of the word. So, if you are searching for "IR", Row2 (2 out of 4 characters) ranks higher than Row3 (2 out of 5).
3. The final sort parameter is an integer field that we already have as part of the schema.

Any help or pointers will be deeply appreciated.

-Papiya

Pink OTC Markets Inc. provides the leading inter-dealer quotation and trading system in the over-the-counter (OTC) securities market. We create innovative technology and data solutions to efficiently connect market participants, improve price discovery, increase issuer disclosure, and better inform investors. Our marketplace, comprised of the issuer-listed OTCQX and broker-quoted Pink Sheets, is the third largest U.S. equity trading venue for company shares. This document contains confidential information of Pink OTC Markets and is only intended for the recipient. Do not copy, reproduce (electronically or otherwise), or disclose without the prior written consent of Pink OTC Markets. If you receive this message in error, please destroy all copies in your possession (electronically or otherwise) and contact the sender above.
Re: AutoSuggest with custom sorting
I guess my basic issue is that Solr scores all matches for prefix searches equally. Is there any way to score PINK over PINKSHEETS when you are searching for "PI"?

Thanks,
Papiya
Re: AutoSuggest with custom sorting
First off: I would suggest that instead of doing a simple prefix search, you look into using EdgeNGrams for this sort of thing. I'm also assuming that since you need custom scoring for this, you aren't going to get what you need using the TermsComponent or any other simple solution using your main corpus. It would make more sense to set up a special index consisting of one document per "term" to include in your autosuggest.

: > 1. Results matching field1 should be ranked higher. Results matching the

Easily done with dismax, even if you are using EdgeNGrams (just make sure you have EdgeNGrams on at index time, but not at query time).

: > 2. The next sort parameter is the length of the word. So, if you are
: > searching for IR, Row2 (2 out of 4) matches higher than Row3 (2 out of 5).

This can be accomplished by indexing a numeric field containing the "length" of the field as a number, and then doing a secondary sort on it. The fieldNorm typically takes care of this sort of thing for you, but it is more of a generalized concept, and doesn't give you exact precision for small numbers.

-Hoss
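To make the advice concrete, here is a minimal plain-Java sketch of the ranking Papiya described. It is a simulation under stated assumptions, not Solr code and not how Solr works internally. At "index time" every word is expanded into its edge n-grams (minimum gram size 1); at "query time" the typed prefix is looked up as a single term, mirroring "EdgeNGrams on at index time, but not at query time". Each hit carries a tier (0 for Field1, 1 for the first word of Field2, 2 for later words) that stands in for a dismax field boost; treating the first word of Field2 as its own tier is an assumption. Ties are broken by the length of the matched word, then by the existing integer field, here given the hypothetical name `rank` and assumed to sort descending.

```java
import java.util.*;

public class AutoSuggestSketch {

    // "rank" is a hypothetical name for the existing integer field; higher is assumed better.
    record Row(String name, String field1, String field2, int rank) {}

    // One match: tier 0 = Field1, 1 = first word of Field2, 2 = later word of Field2,
    // plus the length of the word that matched.
    record Hit(Row row, int tier, int wordLength) {}

    // Index-time analysis only: every leading substring of the word (edge n-grams, min size 1).
    static List<String> edgeNGrams(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 1; i <= word.length(); i++) {
            grams.add(word.substring(0, i).toLowerCase(Locale.ROOT));
        }
        return grams;
    }

    static void addWord(Map<String, List<Hit>> index, Row row, String word, int tier) {
        for (String gram : edgeNGrams(word)) {
            index.computeIfAbsent(gram, k -> new ArrayList<>()).add(new Hit(row, tier, word.length()));
        }
    }

    // Builds the inverted index gram -> hits once, at "index time".
    static Map<String, List<Hit>> buildIndex(List<Row> rows) {
        Map<String, List<Hit>> index = new HashMap<>();
        for (Row row : rows) {
            addWord(index, row, row.field1(), 0);
            String[] words = row.field2().split("\\s+");
            for (int i = 0; i < words.length; i++) {
                addWord(index, row, words[i], i == 0 ? 1 : 2);
            }
        }
        return index;
    }

    // Query time: the prefix is NOT n-grammed, just looked up as one term.
    // Order: tier, then shorter matched word, then rank (descending).
    static List<String> suggest(Map<String, List<Hit>> index, String prefix) {
        Comparator<Hit> order = Comparator.comparingInt(Hit::tier)
                .thenComparingInt(Hit::wordLength)
                .thenComparing(Comparator.comparingInt((Hit h) -> h.row().rank()).reversed());
        // A row can match through several words; keep only its best hit.
        Map<Row, Hit> best = new LinkedHashMap<>();
        for (Hit h : index.getOrDefault(prefix.toLowerCase(Locale.ROOT), List.of())) {
            best.merge(h.row(), h, (a, b) -> order.compare(a, b) <= 0 ? a : b);
        }
        List<Hit> ranked = new ArrayList<>(best.values());
        ranked.sort(order);
        List<String> names = new ArrayList<>();
        for (Hit h : ranked) {
            names.add(h.row().name());
        }
        return names;
    }

    // The three example rows from the original question (rank values are made up).
    static List<Row> exampleRows() {
        return List.of(
                new Row("Row1", "ROFL", "Rolls on the floor laughing", 0),
                new Row("Row2", "IRLL", "Rolling", 0),
                new Row("Row3", "IRLTR", "I Roll", 0));
    }

    public static void main(String[] args) {
        Map<String, List<Hit>> index = buildIndex(exampleRows());
        System.out.println(suggest(index, "RO")); // [Row1, Row2, Row3]
        System.out.println(suggest(index, "IR")); // [Row2, Row3]
    }
}
```

With the example rows, "RO" yields Row1, Row2, Row3 and "IR" yields Row2, Row3, which is the ordering the question asks for. In Solr itself the tier would roughly correspond to per-field boosts in the dismax `qf` parameter, with the length field and the integer field added as secondary sort criteria after the score.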
Re: AutoSuggest with custom sorting
Chris Hostetter wrote:
> this can be accomplished by indexing a numeric field containing the "length" of the field as a number, and then doing a secondary sort on it.

Or see https://issues.apache.org/jira/browse/LUCENE-1360 if you don't want to index a field length.

-Sean
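Hoss's remark that the fieldNorm "doesn't give you exact precision for small numbers" can be seen by encoding length norms into a single byte, as Lucene releases of that era stored them. The sketch below assumes the default length norm of 1/sqrt(numTerms) and re-implements a one-byte float encoding with 3 mantissa bits, modeled on Lucene's `SmallFloat.floatToByte315`/`byte315ToFloat`; treat it as an illustration rather than the library code. Note that the norm counts tokens, not characters, so two single-word fields such as PINK and PINKSHEETS get the same norm regardless.

```java
public class NormPrecisionSketch {

    // Assumed default length norm: 1 / sqrt(number of tokens in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // One-byte float: 3 mantissa bits (counting the implicit leading bit), zero exponent 15.
    // Values are truncated, so nearby norms can collapse to the same byte.
    static byte encode(float f) {
        int fzero = (63 - 15) << 3;
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: zero or smallest value
        } else if (smallfloat >= fzero + 0x100) {
            return -1; // overflow: largest value
        }
        return (byte) (smallfloat - fzero);
    }

    // Inverse of encode: rebuilds the (truncated) float from the stored byte.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            float exact = lengthNorm(n);
            System.out.printf("%2d tokens: exact %.4f, stored %.4f%n", n, exact, decode(encode(exact)));
        }
    }
}
```

Running it shows fields of 3 and 4 tokens ending up with the same stored norm (0.5), and likewise 6 and 7 tokens, and 8 through 10 tokens. That loss of precision is why an explicit numeric length field gives an exact tie-break where the norm cannot.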
Re: AutoSuggest with custom sorting
This was extremely helpful. Thanks a lot.

On 05/04/2010 05:30 PM, Chris Hostetter wrote:
> First off: i would suggest that instead of doing a simple prefix search, you look into using EdgeNGrams for this sort of thing.