AutoSuggest with custom sorting

2010-04-26 Thread Papiya Misra

Hi

I am supposed to implement autosuggest where the prefix matches are
sorted according to the following criteria.

We have two fields (max ~100 characters) that we need to search. Field1
contains a single word (no spaces), whereas Field2 contains multiple words
separated by spaces.

Example:

Row1
   Field1 - ROFL
   Field2 - Rolls on the floor laughing
Row2
   Field1 - IRLL
   Field2 - Rolling
Row3
   Field1 - IRLTR
   Field2 - I Roll


1. Results matching Field1 should be ranked highest. Results matching the
   first word of Field2 should be ranked higher than matches on any later word.
   If you search for "RO*" in the above example, the ranking should be
   Row1 -> Row2 -> Row3.

2. The next sort parameter is the length of the matched word, shorter first.
   So, if you are searching for IR, Row2 (2 of 4 characters) ranks higher
   than Row3 (2 of 5 characters).

3. The final sort parameter is an integer field that we already have as
part of the schema.
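The three criteria amount to a lexicographic comparison. A minimal in-memory sketch in Java, useful as a reference ordering when checking what the search engine returns; the `Row` and `Match` records, the method names, and the name `popularity` for the existing integer field are hypothetical, and the sketch assumes a higher integer value should rank first (the original does not say which direction).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class SuggestOrder {

    // Hypothetical row shape: field1 holds one word, field2 holds space-separated words,
    // and popularity stands in for the existing integer field in the schema.
    record Row(String name, String field1, String field2, int popularity) {}

    // tier 0 = prefix matches Field1, 1 = first word of Field2, 2 = a later word of Field2;
    // wordLength is the length of the word that matched.
    record Match(Row row, int tier, int wordLength) {}

    // Returns the best (lowest-tier) match for a row, or null if nothing matches.
    static Match bestMatch(Row row, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        if (row.field1().toLowerCase(Locale.ROOT).startsWith(p)) {
            return new Match(row, 0, row.field1().length());
        }
        String[] words = row.field2().trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // Only the first matching word of Field2 is considered.
            if (words[i].toLowerCase(Locale.ROOT).startsWith(p)) {
                return new Match(row, i == 0 ? 1 : 2, words[i].length());
            }
        }
        return null;
    }

    static List<Row> suggest(List<Row> rows, String prefix) {
        List<Match> matches = new ArrayList<>();
        for (Row row : rows) {
            Match found = bestMatch(row, prefix);
            if (found != null) {
                matches.add(found);
            }
        }
        matches.sort(Comparator.comparingInt(Match::tier)          // criterion 1: field / word position
                .thenComparingInt(Match::wordLength)                 // criterion 2: shorter word first
                .thenComparing(Comparator.comparingInt((Match m) -> m.row().popularity())
                        .reversed()));                               // criterion 3: assumed higher first
        List<Row> ordered = new ArrayList<>();
        for (Match m : matches) {
            ordered.add(m.row());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("Row1", "ROFL", "Rolls on the floor laughing", 0),
                new Row("Row2", "IRLL", "Rolling", 0),
                new Row("Row3", "IRLTR", "I Roll", 0));
        for (String prefix : new String[] {"RO", "IR"}) {
            StringBuilder line = new StringBuilder(prefix + ":");
            for (Row r : suggest(rows, prefix)) {
                line.append(' ').append(r.name());
            }
            System.out.println(line);
        }
    }
}
```

On the example rows this yields Row1, Row2, Row3 for "RO" and Row2, Row3 for "IR", matching the rankings described above.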

Any help or pointers will be deeply appreciated.

-Papiya

Pink OTC Markets Inc. provides the leading inter-dealer quotation and trading 
system in the over-the-counter (OTC) securities market.   We create innovative 
technology and data solutions to efficiently connect market participants, 
improve price discovery, increase issuer disclosure, and better inform 
investors.   Our marketplace, comprised of the issuer-listed OTCQX and 
broker-quoted   Pink Sheets, is the third largest U.S. equity trading venue for 
company shares.

This document contains confidential information of Pink OTC Markets and is only 
intended for the recipient.   Do not copy, reproduce (electronically or 
otherwise), or disclose without the prior written consent of Pink OTC Markets.  
If you receive this message in error, please destroy all copies in your 
possession (electronically or otherwise) and contact the sender above.


Re: AutoSuggest with custom sorting

2010-04-27 Thread Papiya Misra

I guess my basic issue is that Solr scores all matches for prefix
searches equally. Is there any way to score PINK higher than PINKSHEETS
when searching for PI?

Thanks
Papiya



Re: AutoSuggest with custom sorting

2010-05-04 Thread Chris Hostetter

First off: I would suggest that, instead of doing a simple prefix search,
you look into using EdgeNGrams for this sort of thing.
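Edge n-grams move the prefix expansion to index time: each word is stored together with its leading substrings, so the query "pi" becomes an ordinary term match that is scored like any other term, rather than a prefix query whose matches are all scored equally. The sketch below only mimics what an edge n-gram filter emits for a single token; the `minGram`/`maxGram` bounds are values you would choose, and the real filter's exact output depends on its version and configuration.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGrams {

    // Returns the leading substrings of token whose lengths fall in [minGram, maxGram],
    // e.g. "pink" with bounds 1..25 gives [p, pi, pin, pink].
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Both words emit the gram "pi" at index time, so the query term "pi" matches both.
        System.out.println(edgeNGrams("pink", 1, 25));
        System.out.println(edgeNGrams("pinksheets", 1, 25));
    }
}
```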

I'm also assuming that, since you need custom scoring for this, you aren't
going to get what you need from the TermsComponent or any other simple
solution using your main corpus. It would make more sense to set up a
special index consisting of one document per "term" to include in your
autosuggest.
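A rough sketch of deriving such a dedicated suggestion index from the rows in the original question: one document per candidate word, recording which row and field it came from, whether it was the first word of its field, and its length. The `TermDoc` shape and all of its field names are hypothetical; in a real setup each would map to a field in the separate suggestion index.

```java
import java.util.ArrayList;
import java.util.List;

public class SuggestIndexBuilder {

    // Hypothetical document for the dedicated autosuggest index.
    record TermDoc(String term, String rowId, String sourceField, boolean firstWord, int length) {}

    static List<TermDoc> termDocs(String rowId, String field1, String field2) {
        List<TermDoc> docs = new ArrayList<>();
        // Field1 holds a single word, so it becomes one document (trivially its first word).
        docs.add(new TermDoc(field1, rowId, "field1", true, field1.length()));
        // Field2 is split on whitespace; only its first word gets firstWord = true.
        String[] words = field2.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            docs.add(new TermDoc(words[i], rowId, "field2", i == 0, words[i].length()));
        }
        return docs;
    }

    public static void main(String[] args) {
        for (TermDoc doc : termDocs("Row3", "IRLTR", "I Roll")) {
            System.out.println(doc);
        }
    }
}
```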

: > 1. Results matching field1 should be ranked higher. Results matching the

This is easily done with dismax, even if you are using EdgeNGrams (just
make sure EdgeNGrams are applied at index time, but not at query time).
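One way to read the dismax suggestion: index Field1, the first word of Field2, and the full Field2 into separate edge-n-gram fields and give them descending boosts in `qf`. The sketch below only assembles the request parameters; `defType`, `q`, and `qf` are standard dismax parameters, while the field names (`field1_ngram`, `field2_first_ngram`, `field2_ngram`), the boost values, and the `/select` path are assumptions for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DismaxSuggestQuery {

    // Builds a URL query string for a dismax autosuggest request.
    static String buildQuery(String userInput) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "dismax");
        params.put("q", userInput);
        // Hypothetical edge-n-gram fields; a larger boost should push those matches higher.
        params.put("qf", "field1_ngram^10 field2_first_ngram^5 field2_ngram^1");
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println("/select?" + buildQuery("ro"));
    }
}
```

Because the n-gram filter sits only in the index-time analyzer, "ro" is looked up as a single term in each field. The boost values would need tuning, since dismax scores still include idf and norms, so a field boost alone does not guarantee strict tiers.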

: > 2.The next sort parameter is the length of the word. So, if you are
: > searching for IR, Row2 (2 out of 4 ) matches higher than Row3 (2 out of 5).

This can be accomplished by indexing a numeric field containing the
"length" of the field as a number, and then doing a secondary sort on it.
The fieldNorm typically takes care of this sort of thing for you, but it is
a more generalized concept and doesn't give you exact precision for
small numbers.
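A minimal sketch of that idea, assuming hypothetical integer fields `word_length` (filled at index time) and `popularity` (standing in for the existing integer field). Solr's `sort` parameter takes a comma-separated list of `field direction` clauses, and `score` may be one of them.

```java
import java.util.List;

public class LengthSort {

    // Index time: value for the hypothetical numeric "word_length" field.
    static int wordLength(String word) {
        return word.codePointCount(0, word.length());
    }

    // Query time: relevance first, then shorter words, then the existing integer field.
    static String sortParam() {
        return String.join(",", List.of("score desc", "word_length asc", "popularity desc"));
    }

    public static void main(String[] args) {
        System.out.println("PINK -> " + wordLength("PINK"));
        System.out.println("PINKSHEETS -> " + wordLength("PINKSHEETS"));
        System.out.println("sort=" + sortParam());
    }
}
```

Because `score desc` comes first, `word_length` only decides between suggestions whose scores are exactly equal, which holds for equally scored prefix matches; once boosts and norms enter the score, the clause order is worth testing against real queries.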


-Hoss



Re: AutoSuggest with custom sorting

2010-05-04 Thread Sean Timm

Chris Hostetter wrote:
> this can be accomplished by indexing a numeric field containing the
> "length" of the field as a number, and then doing a secondary sort on it.
> the fieldNorm typically takes care of this sort of thing for you, but is
> more of a generalized concept, and doesn't give you exact precision for
> small numbers

Or see https://issues.apache.org/jira/browse/LUCENE-1360 if you don't
want to index a field length.


-Sean


Re: AutoSuggest with custom sorting

2010-05-05 Thread Papiya Misra

This was extremely helpful. Thanks a lot.