Re: [Dbp-spotlight-users] How is the confidence value calculated?

Radim Rehurek Tue, 17 Jun 2014 02:38:33 -0700

Thanks David.




If I understand your reply correctly, you're advocating using 
"similarityScore" directly as Spotlight's detection confidence.




I wonder if this is better than Stefano's formula. Stefano, did you evaluate
your formula somehow? Mixing support into the confidence formula makes good 
sense to me too.




Best,

Radim















---------- Original message ----------
From: David Przybilla <[email protected]>
To: Radim Rehurek <[email protected]>
Date: 17 Jun 2014 11:06:01
Subject: Re: [Dbp-spotlight-users] How is the confidence value calculated?

"

Hi Radim, Stefano,

1. This is roughly how I think it works; it is best to confirm by checking 
the code or the paper:

The support you pass to the endpoint serves as a filter on how many 
annotated counts an entity must have.

The confidence value you pass to the endpoint is used twice:

 - To filter spots (chunks of surface forms which will later be matched to a
topic)
 - To filter topic annotations once they have been disambiguated (the 
secondRank filter is also used at this stage)


Similarity_of_t = ln(surfaceForm Prior) + ln(prior_of_t) + contextSimilarity_for_t
softTotalSimilarity = sum(e ^ Similarity_of_i)
final_similarity_of_t = e ^ (Similarity_of_t - softTotalSimilarity)


-- order the topics by similarity (greatest first)
secondRank = e ^ (bottomTopicFinalSimilarityScore - topTopicFinalSimilarityScore)

Topics with secondRank > (1 - confidence ^ 2) are filtered out.
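
As a rough illustration, the scoring and filtering steps above could be sketched in Python as follows. This is a sketch under assumptions, not Spotlight's actual code: the function names and the input layout are hypothetical, the total is taken in log space (log-sum-exp) so that the final similarities of all candidates sum to 1, and secondRank is read as the ratio of the runner-up's final similarity to the top candidate's.

```python
import math


def normalized_scores(candidates):
    """Map topic -> final similarity, normalized over all candidates.

    `candidates` maps topic -> (surface_form_prior, topic_prior,
    context_similarity); this layout is hypothetical.
    """
    # Similarity_of_t = ln(surface form prior) + ln(prior_of_t) + context similarity
    log_scores = {
        topic: math.log(sf_prior) + math.log(prior) + context
        for topic, (sf_prior, prior, context) in candidates.items()
    }
    # Assumption: the total is a log-sum-exp, which makes this a softmax.
    top = max(log_scores.values())
    log_total = top + math.log(sum(math.exp(s - top) for s in log_scores.values()))
    return {topic: math.exp(s - log_total) for topic, s in log_scores.items()}


def second_rank_ratio(scores):
    """Ratio of the second-best final similarity to the best one."""
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return 0.0  # no competing candidate
    return ranked[1] / ranked[0]


def passes_second_rank_filter(scores, confidence):
    """Keep the top topic only if secondRank <= 1 - confidence^2."""
    return second_rank_ratio(scores) <= 1.0 - confidence ** 2


if __name__ == "__main__":
    scores = normalized_scores({"A": (0.5, 0.5, 0.0), "B": (0.5, 0.25, 0.0)})
    print(scores, second_rank_ratio(scores))
    print(passes_second_rank_filter(scores, 0.5))
    print(passes_second_rank_filter(scores, 0.8))
```

With this reading, a higher confidence lowers the threshold 1 - confidence^2, so the top topic survives only when it is clearly ahead of the runner-up.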


2. What is the best value?

I think this really depends on your use case. For example, if you need lots 
of general topics you might want a low value, but be prepared for a wave of 
dodgy topic and surface-form annotations as well.

If you are working with social media, you most likely have lots of surface 
forms and variations of them which are not getting spotted because of the 
confidence value.
My advice is to adjust the confidence and support values empirically and 
then tweak the Spotlight model to adapt it to your particular use case [1].

[1] https://github.com/idio/spotlight-model-editor






On Mon, Jun 16, 2014 at 5:32 PM, Radim Rehurek <[email protected]> wrote:
"
I would also be extremely interested in an answer to this. Thanks for 
asking, Stefano.


What's the best way to calculate "Spotlight's detection confidence" = a 
single number?



Cheers,

Radim



---------- Original message ----------
From: Stefano Bocconi <[email protected]>
To: [email protected] <dbp-spotlight-users@lists.sourceforge.net>
Date: 16 Jun 2014 18:14:26
Subject: [Dbp-spotlight-users] How is the confidence value calculated?

"



Hi,




I am new to this list; I came here from the GitHub Spotlight page about 
support and feedback. Questions related to what I am asking have come up a
couple of times on this list as far as I can see, but the answers do not 
provide what I am looking for.




I am using the statistical back-end, and I am basically trying to 
reconstruct the confidence value of the entities extracted.




I have extracted entities from tweets, and as a first experiment I did not 
ask for any confidence threshold. Now I would like to calculate the 
confidence of each result to see how filtering on it influences the
quality of another process I run on the entities.




I am now using the formula:




(1 - .5 * percentageofsecondrank) * similarityscore




This is based on the idea that confidence increases with the similarity 
score, but decreases if the second candidate is also similar.
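
For concreteness, the formula above can be written as a small Python function. This is only a sketch of the heuristic as stated in this message; the function and parameter names are hypothetical, and nothing here is confirmed to match what Spotlight computes internally.

```python
def estimated_confidence(similarity_score, percentage_of_second_rank):
    """Heuristic: (1 - 0.5 * percentageofsecondrank) * similarityscore.

    The result grows with the similarity score and shrinks as the
    second-ranked candidate gets closer to the top one.
    """
    return (1.0 - 0.5 * percentage_of_second_rank) * similarity_score


if __name__ == "__main__":
    # No competing candidate: the similarity score is returned unchanged.
    print(estimated_confidence(0.9, 0.0))
    # A runner-up as similar as the winner halves the estimate.
    print(estimated_confidence(0.9, 1.0))
```

Because the penalty factor stays between 0.5 and 1 when the second-rank percentage lies in [0, 1], this estimate never drops below half of the similarity score.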




Is this comparable to what Spotlight uses in 
http://spotlight.dbpedia.org/rest/annotate? Or else, what is the formula? 
Does support play a role?




Thanks,




Stefano










------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users"






"



"

