Re: search engine - Precision, recall

2017-07-27 Thread Florian Meier
Hi Itay,

in IR research there's a long tradition (TREC and the like) of measuring the 
effectiveness of search engines. Effectiveness is measured using a so-called 
test collection, which consists of three things:
1. Documents 
2. Topics, i.e. information needs/queries of users for these documents 
3. Relevance assessment data, i.e. which documents are relevant for which topics 

Using the results your search engine returns for each topic, you can calculate 
Precision and Recall. Depending on the context and use case the search engine 
runs in, other measures might be more appropriate. For example, in an 
enterprise context it might be the case that only one document can fulfill the 
searcher's information need. If you find this to be the case for a lot of 
information needs, measures like MRR (mean reciprocal rank) might be more 
suitable. 
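
As a rough illustration of the measures mentioned above, here is a minimal
Python sketch. The names `qrels` and `runs`, the topic IDs and the document IDs
are all made up for the example; a real evaluation would load them from your
own test collection.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(ranked: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single topic."""
    retrieved = set(ranked)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_reciprocal_rank(runs: Dict[str, List[str]],
                         qrels: Dict[str, Set[str]]) -> float:
    """Mean over topics of 1/rank of the first relevant document (0 if none)."""
    total = 0.0
    for topic, ranked in runs.items():
        relevant = qrels.get(topic, set())
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs) if runs else 0.0


# Hypothetical toy data: relevance assessments and ranked results per topic.
qrels = {"t1": {"d3"}, "t2": {"d7", "d9"}}
runs = {"t1": ["d1", "d3", "d5"], "t2": ["d7", "d2", "d4", "d8"]}

for topic in runs:
    print(topic, precision_recall(runs[topic], qrels[topic]))
print("MRR:", mean_reciprocal_rank(runs, qrels))
```

Topic t1 has exactly one relevant document, which is the situation where
MRR tends to be more informative than recall.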

Moreover, there's also the question of whether you are aiming for high 
precision or high recall. Balancing both is a hard task, and it is up to you 
and your users' needs to find out which is more important for them. E.g., is 
it critical not to miss certain documents? Then high recall might be your aim...

If you are aiming for high precision you could, for example, measure something 
like Precision@10, i.e. the fraction of the top 10 documents returned that are 
relevant. For this you don't need complete relevance assessment data, only 
judgments for the documents that show up in the top 10. However, what you need 
in any case is an idea of what your users are searching for, so that you can 
generate possible test queries from it, and an idea of which documents are 
relevant. Finally, I think it's not possible to give a recommended percentage 
because it depends a lot on your context.
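
A minimal Python sketch of Precision@k, assuming relevance is judged as a
simple yes/no per document. The function name, the document IDs and the
judgments are hypothetical and only serve the illustration.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k returned documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Dividing by k (not len(top_k)) penalises result lists shorter than k.
    return hits / k


# Hypothetical ranked result list d1..d12 and judgments for one test query.
ranked = [f"d{i}" for i in range(1, 13)]
judged_relevant = {"d2", "d5", "d11"}

print(precision_at_k(ranked, judged_relevant, k=10))
```

Here d11 is relevant but sits outside the top 10, so it does not count toward
Precision@10.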

Hope this helps,
Florian


> On 27.07.2017 at 15:20, Itay K wrote:
> 
> Hi,
> 
> I'm trying to measure Precision and recall for a search engine which is
> crawling data sources of an organization.
> 
> Are there any best practices regarding these indexes and specific
> industries (e.g. for financial organizations, the recommended percentage
> for precision and recall is ~60%)?
> 
> Is there any best practice in general for the recommended percentage?
> 
> I read an article from 2005 regarding measured precision and recall for web
> search engines but unfortunately my use case isn't a web application and I
> believe that since then a lot has changed.
> 
> thanks



Re: difference in json update handler update/json and update/json/docs

2017-02-09 Thread Florian Meier
this was the right lead, thanks Alex

> On 08.02.2017 at 22:20, Alexandre Rafalovitch wrote:
> 
> /update/json expects Solr JSON update format.
> /update is an auto-route that should be equivalent to /update/json
> with the right content type/extension.
> 
> /update/json/docs expects random JSON and tries to extract fields for
> indexing from it.
> https://cwiki.apache.org/confluence/display/solr/Transforming+and+Indexing+Custom+JSON
> 
> Regards,
>   Alex.
> 
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
> 
> 
> On 8 February 2017 at 15:54, Florian Meier
>  wrote:
>> dear solr users,
>> can somebody explain the exact difference between the two update handlers? 
>> I'm asking because with some curl commands solr fails to identify the fields 
>> of the json doc and indexes everything in _str_:
>> 
>> Those work perfectly:
>> curl 'http://localhost:8983/solr/testcore2/update/json?commit=true' 
>> --data-binary @example/exampledocs/cacmDocs.json
>> 
>> 
>> curl 'http://localhost:8983/solr/testcore2/update?commit=true' --data-binary 
>> @example/exampledocs/cacmDocs.json -H 'Content-type:application/json'
>> 
>> But those two (both with update/json/docs) don't
>> 
>> curl 'http://localhost:8983/solr/testcore2/update/json/docs?commit=true' 
>> --data-binary @example/exampledocs/cacmDocs.json -H 
>> 'Content-type:application/json'
>> 
>> curl 'http://localhost:8983/solr/testcore2/update/json/docs?commit=true' 
>> --data-binary @example/exampledocs/cacmDocs.json
>> 
>> Cheers,
>> Florian
>> 
>> 
>> 
>> 
>> 
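
The distinction Alex describes above can be sketched in Python without
contacting a server. The two payloads are invented examples, not the contents
of cacmDocs.json, and the helper names are hypothetical. The `split` and `f`
request parameters are the ones described on the reference guide page linked
above; how unmapped fields are handled depends on the Solr version and
configuration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8983/solr/testcore2"  # core name taken from the thread

# Shape 1: Solr JSON update format, as expected by /update/json:
# a list of flat documents whose keys are field names.
solr_format_docs = [
    {"id": "1", "title": "first doc"},
    {"id": "2", "title": "second doc"},
]

# Shape 2: arbitrary ("custom") JSON, as accepted by /update/json/docs.
# Solr has to be told where the documents are and how to map fields.
custom_json = {
    "collection": "demo",
    "records": [
        {"key": "1", "meta": {"title": "first doc"}},
        {"key": "2", "meta": {"title": "second doc"}},
    ],
}


def update_json_url(commit=True):
    """URL for posting documents that are already in Solr JSON update format."""
    return f"{BASE}/update/json?" + urlencode({"commit": str(commit).lower()})


def update_json_docs_url(split, field_mappings, commit=True):
    """URL for posting custom JSON: split path plus one f=field:/path per field."""
    params = [("split", split)]
    params += [("f", f"{field}:{path}") for field, path in field_mappings.items()]
    params.append(("commit", str(commit).lower()))
    return f"{BASE}/update/json/docs?" + urlencode(params)


docs_url = update_json_docs_url(
    "/records", {"id": "/records/key", "title": "/records/meta/title"}
)
print(update_json_url())
print(parse_qs(urlsplit(docs_url).query))
print(json.dumps(solr_format_docs)[:40], "...")
```

The request body would be `json.dumps(solr_format_docs)` for the first URL and
`json.dumps(custom_json)` for the second, sent with a JSON content type.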



difference in json update handler update/json and update/json/docs

2017-02-08 Thread Florian Meier
dear solr users,
can somebody explain the exact difference between the two update handlers? I'm 
asking because with some curl commands solr fails to identify the fields of 
the json doc and indexes everything in _str_:

Those work perfectly:
curl 'http://localhost:8983/solr/testcore2/update/json?commit=true' \
  --data-binary @example/exampledocs/cacmDocs.json

curl 'http://localhost:8983/solr/testcore2/update?commit=true' \
  --data-binary @example/exampledocs/cacmDocs.json \
  -H 'Content-type:application/json'

But those two (both with update/json/docs) don't:

curl 'http://localhost:8983/solr/testcore2/update/json/docs?commit=true' \
  --data-binary @example/exampledocs/cacmDocs.json \
  -H 'Content-type:application/json'

curl 'http://localhost:8983/solr/testcore2/update/json/docs?commit=true' \
  --data-binary @example/exampledocs/cacmDocs.json

Cheers,
Florian