Re: Solr facet search improvements

2015-01-28 Thread Jack Krupansky
It would probably be better to do entity extraction and normalization of
job titles as a front-end process before ingesting the data into Solr, but
you could also do it as a custom or script update processor. The latter can
be easily coded in JavaScript to run within Solr.

Your first step in any case will be to define the specific rules you wish
to use for both normalization of job titles and the actual matching. Yes,
you can do that in Solr, but you have to do it yourself; Solr will not do it
magically for you. Also, post some specific query examples that completely
cover the range of queries you need to be able to handle.
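
As a rough illustration of what such rules might look like in a front-end
step, here is a minimal Python sketch. The misspelling table, the trailing
noise pattern, and the function name normalize_title are all assumptions
made up for this example; the real rules have to come from your own data.

```python
import re

# Hypothetical rules for illustration only; derive the real ones from your data.
MISSPELLINGS = {"sceintist": "scientist", "andriod": "android"}
TRAILING_NOISE = re.compile(r"\s+(?:job|\d+)$", re.IGNORECASE)


def normalize_title(raw: str) -> str:
    """Return a cleaned-up job title that could be used as a facet value."""
    # Treat "/" and "-" as word breaks.
    text = re.sub(r"[-/]", " ", raw.strip())
    # Strip a trailing grade number or the literal word "Job" until stable.
    previous = None
    while previous != text:
        previous = text
        text = TRAILING_NOISE.sub("", text)
    # Lowercase each word, fix known misspellings, then capitalize.
    words = [MISSPELLINGS.get(w.lower(), w.lower()) for w in text.split()]
    return " ".join(w.capitalize() for w in words)
```

With these assumed rules, "Quantum Computing Theorist Job" loses its
trailing "Job" and "Data Sceintist Smart Analytics" has its spelling fixed.
Whether "/" and "-" should really become spaces is exactly the kind of
decision you need to make up front.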

-- Jack Krupansky

On Wed, Jan 28, 2015 at 5:56 AM, thakkar.aayush thakkar.aay...@gmail.com
wrote:

 I have around 1 million job titles which are indexed on Solr and am looking
 to improve the faceted search results on job title matches.

 For example, when a job search for *Research Scientist Computer Architecture*
 is made, the facet field title, which is tokenized in Solr, gives the
 following results:

 1. Senior Data Scientist
 2. PARALLEL COMPUTING SOFTWARE ENGINEER
 3. Engineer/Scientist 4
 4. Data Scientist
 5. Engineer/Scientist
 6. Senior Research Scientist
 7. Research Scientist-Wireless Networks
 8. Research Scientist-Andriod Development
 9. Quantum Computing Theorist Job
 10. Data Sceintist Smart Analytics

 I want to be able to improve / optimize the job titles and be able to make
 exclusions and some normalizations. Is this possible with Solr? What is the
 best way to have more granular control over the faceted search results?

 For example, *Engineer/Scientist 4* is too specific to be useful, and
 titles like *Quantum Computing Theorist* would ideally also be excluded.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-facet-search-improvements-tp4182502.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr facet search improvements

2015-01-28 Thread Shawn Heisey
Normally, if the field is tokenized, you will not get the original
values in the facet.  You will get values like "senior" instead of
"Senior Data Scientist".  If DocValues are enabled on the field, then
you may well get the original values.  I've never tried facets on
a tokenized field with DocValues, but everything I understand about the
feature says it would result in the original (not tokenized) values.
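
To make the difference concrete, here is a small Python simulation of the
two kinds of facet counts. It is not Solr code: it assumes an analyzer that
only lowercases and splits on whitespace, and the function names are made
up for this sketch.

```python
from collections import Counter

titles = ["Senior Data Scientist", "Data Scientist", "Senior Research Scientist"]


def facet_counts_tokenized(values):
    """Count documents per term, as faceting on a tokenized field would."""
    counts = Counter()
    for value in values:
        # Each document is counted once per distinct term it contains.
        counts.update(set(value.lower().split()))
    return counts


def facet_counts_untokenized(values):
    """Count documents per whole value, as faceting on a string field would."""
    return Counter(values)
```

In this simulation the tokenized counts are keyed by single terms such as
"scientist" and "senior", while the untokenized counts are keyed by the
complete titles.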

If you want different values in the facets, then you'll need to change
those values before they get indexed in Solr.  That can be done with
custom UpdateProcessor code embedded in the update chain, or you can
simply do the changes in your program that indexes the data in Solr.
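
A minimal sketch of the second option, in Python, might look like the
following. The field name title_facet, the exclusion patterns, and the
function build_docs are assumptions for illustration; the schema would need
a matching untokenized field for the facet. The sketch only builds the
documents and leaves sending them to Solr to whatever client you already use.

```python
import json
import re

# Hypothetical exclusion rules: titles matching any of these get no facet value.
EXCLUDE = [
    re.compile(r"\btheorist\b", re.IGNORECASE),  # assumed to be too niche
    re.compile(r"\s\d+$"),                       # trailing grade number
]


def build_docs(rows):
    """Turn (id, title) pairs into documents with an optional facet field."""
    docs = []
    for doc_id, title in rows:
        # The original title is always kept for searching.
        doc = {"id": doc_id, "title": title}
        if not any(pattern.search(title) for pattern in EXCLUDE):
            # Collapse whitespace and normalize case for the facet value.
            doc["title_facet"] = " ".join(title.split()).title()
        docs.append(doc)
    return docs


rows = [("1", "Engineer/Scientist 4"), ("2", "DATA  scientist")]
payload = json.dumps(build_docs(rows))  # JSON array of documents to index
```

Excluded titles remain searchable through the title field; they simply
contribute nothing to the facet on title_facet.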

Thanks,
Shawn



Solr facet search improvements

2015-01-28 Thread thakkar.aayush
I have around 1 million job titles which are indexed on Solr and am looking
to improve the faceted search results on job title matches.

For example, when a job search for *Research Scientist Computer Architecture*
is made, the facet field title, which is tokenized in Solr, gives the
following results:

1. Senior Data Scientist 
2. PARALLEL COMPUTING SOFTWARE ENGINEER 
3. Engineer/Scientist 4 
4. Data Scientist 
5. Engineer/Scientist 
6. Senior Research Scientist 
7. Research Scientist-Wireless Networks 
8. Research Scientist-Andriod Development 
9. Quantum Computing Theorist Job 
10. Data Sceintist Smart Analytics

I want to be able to improve / optimize the job titles and be able to make
exclusions and some normalizations. Is this possible with Solr? What is the
best way to have more granular control over the faceted search results?

For example, *Engineer/Scientist 4* is too specific to be useful, and
titles like *Quantum Computing Theorist* would ideally also be excluded.