Hi David,
Great to hear that you are making progress. The linking part of the
annotation process is the youngest, and therefore the least smooth to get
running at this point. The code you see in trunk does some very
rudimentary linking: it throws away results that don't pass a threshold.
In my branch we have some much more powerful machine-learning linkers under
development. I hope to report on those this summer.

The basic idea to "train" the simple linker is to look through the log of
training disambiguations and find a score threshold that throws away X% of
the incorrect results (at the cost of throwing away some correct results
too). As you have noticed, the EvaluateDisambiguationOnly class does two
things: it creates a log for us to train the linker, and it reports some
evaluation results for us. I want to separate these two things, but haven't
had time yet. The log file produced by this class is what you need to build
similarity-thresholds.txt.
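
To make the threshold idea concrete, here is a minimal Python sketch. It is not the code in trunk; the function names are made up for illustration, and it assumes that a result is thrown away when its score is at or below the threshold (the real linker may use a strict comparison).

```python
import math


def pick_threshold(incorrect_scores, discard_fraction):
    """Score that discards at least `discard_fraction` of the incorrect results.

    Assumption: a result is discarded when score <= threshold.
    """
    ranked = sorted(incorrect_scores)
    # How many of the lowest-scoring incorrect results we want to drop.
    k = math.ceil(discard_fraction * len(ranked))
    if k == 0:
        return float("-inf")  # nothing to discard
    return ranked[k - 1]


def lost_correct_fraction(correct_scores, threshold):
    """Fraction of correct results that the same threshold also discards."""
    if not correct_scores:
        return 0.0
    lost = sum(1 for s in correct_scores if s <= threshold)
    return lost / len(correct_scores)


if __name__ == "__main__":
    incorrect = [0.1, 0.4, 0.2, 0.3]
    correct = [0.15, 0.5, 0.6, 0.9]
    t = pick_threshold(incorrect, 0.5)
    print(t, lost_correct_fraction(correct, t))
```

Trying a few values of the fraction and watching how many correct results are lost is one simple way to pick X.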

You could for example grab the scores of all the incorrect results like this:

cat training.tsv.log | cut -f 2,12 | grep -P "^0" | cut -f 2

You could save this directly into similarity-thresholds.txt, for example.
Or you could try to do some curve fitting, or use some "art" to find better
values. If you send me your log, I can take a look at the values for you.
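
If you would rather do the same extraction in Python before experimenting with the values, a rough equivalent of the pipeline above could look like the sketch below. It takes the column positions from the shell command (field 2 starts with "0" for an incorrect result, field 12 holds the score); that layout is an assumption about your log, and the function name is only illustrative.

```python
def incorrect_scores(lines, flag_col=2, score_col=12):
    """Collect the score column of every row whose flag column starts with "0".

    Columns are 1-based and tab-separated, as with `cut -f`.
    """
    scores = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < max(flag_col, score_col):
            continue  # skip short or malformed rows
        if not fields[flag_col - 1].startswith("0"):
            continue  # not marked as incorrect
        try:
            scores.append(float(fields[score_col - 1]))
        except ValueError:
            continue  # skip rows where the score is not a number
    return scores
```

You would call it with an open file, for example `incorrect_scores(open("training.tsv.log"))`, and then write the resulting values out one per line.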

Cheers,
Pablo

On Tue, Feb 21, 2012 at 9:52 AM, David Müller <[email protected]> wrote:

> Dear all,
>
> I finally found some time to work on a Korean language version for
> Spotlight 0.5 and already made good progress. I followed the
> instructions given in index.sh and reached the last point where it
> says train a linker based on similarity thresholds.
>
> My question here is: What do I need to do to generate the similarity
> thresholds file? Which class will create that file? Obviously not the
> evaluation process as described in
> http://wiki.dbpedia.org/spotlight/datageneration since it expects the
> similarity thresholds file as input.
>
> Second question I have: How will the output of
> org.dbpedia.spotlight.evaluation.EvaluateDisambiguationOnly be used to
> train a linker? The only output it seems to generate is a log file and
> a file of not found surface forms.
>
> I appreciate any help at this point,
>
> Cheers
> David
>
> Knowledge Systems Lab
> Knowledge Service Engineering
> KAIST, Korea
>
>
> ------------------------------------------------------------------------------
> Keep Your Developer Skills Current with LearnDevNow!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-d2d
> _______________________________________________
> Dbp-spotlight-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>
