Thanks to everyone for the help yesterday with the statistical endpoint.

I'm trying to understand how to tune the tool to get optimal results.

I used the example text from the demo interface:

President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing
  that the policy provides more generous assistance.

A confidence of 0.5 only picked up 'Congress'.  Reducing the confidence to
0.3 picked up a lot more stuff - including linking 'Wednesday' to a sports
team, which seems bizarre to me.

On my own data, which comes from Twitter, I see weird things like mentions
of 'police' linking to the musical group The Police, and the word
'celebrate' (in the context of celebrating an anniversary) linking to the
Madonna song.  If I turn the confidence up, I lose those references, but I
also lose 'good' references as well.

I feel like whitelisting or blacklisting is the way to go, but I'm having
trouble correlating the types I see in my results with the ontology at
http://mappings.dbpedia.org/server/ontology/classes/  That ontology
particularly confuses me, as it seems very uneven - as an example, under
'Organization', there are classes that make sense to me, like 'Company' and
'Sports League', and then oddly specific things like 'Comedy Group' and
'Samba School' at the same level.  In my results, there is a mix of types
from DBpedia, Schema, and Freebase, and it's not clear to me how I would
specify (for example) that I'm interested in people, places, and events,
but not musical groups, internet concepts (it always picks up 'http' from
embedded links and gives me 'Hypertext Transfer Protocol'), etc.
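
To make the whitelist/blacklist idea concrete, here is a rough sketch of the kind of post-filter I have in mind, applied to results after they come back. It assumes each annotation is a dict whose "@types" field is a comma-separated string of prefixed type names; the specific type names in WANTED and UNWANTED, and the helper names, are only illustrative guesses on my part, not a confirmed list.

```python
# Hypothetical post-filter for Spotlight-style annotations.
# Assumption: each annotation is a dict with an "@types" field holding a
# comma-separated string such as "DBpedia:Band,Schema:MusicGroup".
# The type names below are illustrative, not a verified list.

WANTED = {"DBpedia:Person", "DBpedia:Place", "DBpedia:Event"}
UNWANTED = {"DBpedia:Band", "Schema:MusicGroup", "DBpedia:Song"}


def parse_types(annotation):
    """Split the comma-separated type string into a set of type names."""
    raw = annotation.get("@types", "")
    return {t.strip() for t in raw.split(",") if t.strip()}


def keep(annotation, wanted=WANTED, unwanted=UNWANTED):
    """Keep an annotation that has a wanted type and no unwanted type."""
    types = parse_types(annotation)
    return bool(types & wanted) and not (types & unwanted)


def filter_annotations(annotations):
    """Return only the annotations that pass the whitelist/blacklist test."""
    return [a for a in annotations if keep(a)]


sample = [
    {"@surfaceForm": "Obama", "@types": "DBpedia:Person,Schema:Person"},
    {"@surfaceForm": "police", "@types": "DBpedia:Band,Schema:MusicGroup"},
    {"@surfaceForm": "http", "@types": ""},
]
kept = filter_annotations(sample)
print([a["@surfaceForm"] for a in kept])
```

With a whitelist like this, anything that comes back with no types at all (as in the made-up 'http' entry above) is dropped as well, which may or may not be what I want.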

Thanks!

Betsey Benagh

Boston Fusion Corp.
1 Van de Graaff Drive, Ste 107
Burlington, MA 01803-5176
[email protected]
617-583-5730 x106 (office)
781-367-6720 (mobile)