I don't mean to hijack.
Yes, there are two ways.
1) Index-time field boosting: Note that this effectively hard-codes the
boosts into the index. If you want to change the boost for a field, you
will have to re-index.
2) Query-time (field-level) boosting: This is more flexible and achieves
the same effect as the above. I don't think it has any significant
performance impact.
When it comes to Lucene/Solr, you always specify the field name along
with the keyword, as in fieldName:keyword(s), which is the atomic unit
that's searchable in Lucene/Solr. In this case, you just have to add a
boost to that term as well (for example, fieldName:keyword^2), as
described in the references below.
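As a rough sketch of query-time boosting for the three fields in Derek's
example (in Python, only to build the request URLs): the field names and
ranks come from his message, the host and collection name are the ones in
Mark's example URL, and the keyword "widget" and the helper function names
are made up for illustration. The first form boosts each fieldName:keyword
term directly; the second passes the same boosts through the edismax "qf"
parameter, assuming the handler accepts defType=edismax.

```python
from urllib.parse import urlencode

# Field names and ranks come from Derek's example; "widget" is a made-up keyword.
FIELD_BOOSTS = {"Category": 3, "SPPKeyWord": 2, "KeySpecification": 1}

def boosted_query(keyword, field_boosts):
    """Return fieldName:keyword^boost clauses joined with OR (hypothetical helper)."""
    return " OR ".join(
        f"{field}:{keyword}^{boost}" for field, boost in field_boosts.items()
    )

def edismax_params(keyword, field_boosts):
    """Return request parameters that put the boosts in edismax's qf (hypothetical helper)."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": keyword, "qf": qf}

def search_url(base_url, handler, params):
    """Append URL-encoded parameters to a search handler path."""
    return f"{base_url}/{handler}?{urlencode(params)}"

# Assumed host and collection, as in Mark's example URL.
BASE = "http://localhost:8983/solr/collection_name"

print(boosted_query("widget", FIELD_BOOSTS))
print(search_url(BASE, "select", {"q": boosted_query("widget", FIELD_BOOSTS)}))
print(search_url(BASE, "select", edismax_params("widget", FIELD_BOOSTS)))
```

Keep in mind that boosts only weight each field's contribution to the
score. A match on Category will usually rank above a match on SPPKeyWord,
but that is not a strict guarantee, so the boost values may need tuning
against real data.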
References
http://www.solrtutorial.com/solr-search-relevancy.html
https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_increase_the_score_for_specific_documents
On Tue, Jun 10, 2014 at 12:26 AM, Derek Poh <[email protected]> wrote:
> Hi Mark
>
> Appreciate you taking the time to reply and with references.
>
> Regarding 3. Configure and define the relevance ranking and matching
> logic of the return result.
>
> Can each search handler be configured to
> - search on a few fields
> - assign a numeric rank to each of the field, such that a match on a field
> with the highest rank will rank the document higher in the return search
> result.
> - the ranking of each field will also act as tie-breaker.
> Eg.
> Category = 3
> SPPKeyWord= 2
> KeySpecification= 1
>
> Document that has match on field Category will be ranked higher in the
> result than document that has match on SPPKeyWord.
> Document that has match only on field KeySpecification will rank the lowest
> in the result.
>
>
>
> On 6/10/2014 12:27 AM, Mark Bennett wrote:
>
>> Hello Derek,
>>
>> See answers inline.
>>
>> --
>> Mark Bennett / LucidWorks: Search & Big Data /
>> [email protected]
>> Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513
>>
>> On Jun 9, 2014, at 12:00 AM, Derek Poh <[email protected]> wrote:
>>
>> My company is actively looking at alternative search engine applications
>>> to replace our current Endeca application.
>>>
>>> I have no experience and knowledge on Solr and Lucene.
>>> Please bear with me, I would like to find out if the following features
>>> are available on Solr.
>>>
>>> 1. Aggregate results (rollups).
>>> Eg. From a list of search result of products (each has field = supplier
>>> id), can the results be aggregated by supplier id with the original results
>>> ordering retain.
>>>
>> Yes it can:
>> http://wiki.apache.org/solr/FieldCollapsing
>>
>> 2. Filter/Navigator, counts.
>>> List out a field's possible values and their counts from the indexed data
>>> and from the return results.
>>> The field's values can be sorted by the values description or by the
>>> values counts in the return results.
>>>
>> Yes, Solr calls these "Facets" and offers several types:
>> http://wiki.apache.org/solr/SimpleFacetParameters
>> http://wiki.apache.org/solr/HierarchicalFaceting
>>
>> Eg. Field "Business Type" below with its possible values and the count
>>> for each value (in bracket). Can the field be return in the result with its
>>> values sorted either by description or by counts?
>>> Business Type
>>> Manufacturer (15269)
>>> Exporter (12493)
>>> Trading Company (5541)
>>> Agent (1324)
>>> Wholesaler (1202)
>>> Importer (682)
>>> Buying Office (394)
>>> Distributor (278)
>>> Other (157)
>>> Retailer (116)
>>> Consultant (54)
>>>
>> Absolutely, and Solr is very fast and accurate.
>>
>> 3. Configure and define the relevance ranking and matching logic of the
>>> return result.
>>>
>> Yes, though not by that name.
>> Step 1:
>> Configure default edismax parameters in your solrconfig.xml
>>
>> Step 2:
>> Create additional search handlers in solrconfig.xml, and each search
>> handler can have its own edismax configuration.
>>
>> Normally the format of the search URL is:
>> http://localhost:8983/solr/collection_name/select?q=text:budget
>>
>> You would replace the "select" with the name of the search handler that
>> has the edismax config you want.
>>
>> With multiple search handlers, you'd use something like:
>> http://localhost:8983/solr/collection_name/search_freshest?q=text:budget
>> http://localhost:8983/solr/collection_name/search_most_popular?q=text:budget
>>
>> 4. Define and configure the thesaurus (1-way or 2-way), stemming and
>>> stop words.
>>>
>> Yes, Solr is very good about this, you have both options.
>>
>> Also, Solr lets you choose:
>> * Index time, or query time, or both
>> * Use expansion or reduction
>>
>> You can even have more than one thesaurus file and have them each handled
>> differently.
>>
>> For example:
>> * Use an english_language thesaurus, which rarely changes, and expand
>> that at index time
>> * Use your company_synonyms, which may change frequently, and expand them
>> at search time.
>>
>> I'll let you find these in the wiki, http://wiki.apache.org
>>
>> 5. Multi-language support for Simplified Chinese and Spanish.
>>>
>> Yes!
>>
>> And for simplified Chinese, please make sure to use the SmartCN analyzer,
>> and not the simplistic "CJK"; SmartCN actually looks for Chinese language
>> word breaks using statistical methods, and therefore should give better
>> results.
>>
>> 6. Scalability.
>>> At present, we are indexing 4 million records and the number is expected
>>> to increase more than 10-fold in the near future.
>>>
>> 40 million documents can normally be handled on a single machine,
>> assuming it has enough RAM and doesn't have a lot of other stuff running.
>> You might want a second machine for failover.
>>
>> When people use multiple machines, then the way to do that is via
>> SolrCloud.
>>
>> 7. Search results debugging. Eg. why record was matched or why record was
>>> ranked as such.
>>>
>> Yes.
>>
>> You typically add &debugQuery=true&debug.explain.structured=true to the
>> URL.
>>
>> The output is a bit technical, it takes some practice to understand.
>>
>> There's also a graphical relevancy debugger with a free eval period:
>> http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/
>>
>> Derek
>>>
>>> ----------------------
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential and/or
>>> privileged information. If you are not the intended recipient or have
>>> received this e-mail in error, please inform the sender immediately and
>>> delete this e-mail (including any attachments) from your computer, and you
>>> must not use, disclose to anyone else or copy this e-mail (including any
>>> attachments), whether in whole or in part.
>>> This e-mail and any reply to it may be monitored for security, legal,
>>> regulatory compliance and/or other appropriate reasons.
>>>
>>
>>
>>
>
>