Hi Reinhard,
Better treatment of word shape has been on our list for a long time. For
now, we have just done the simplest thing possible.
The trade-off is:
- Keeping the original case makes the index too sparse. You will miss
annotations just because a word never appeared with that particular
casing in an anchor link. This hurts your recall.
- Lowercasing everything introduces too much noise. You will annotate
things that do not make sense, e.g. LED versus led (the past tense of
lead). This hurts your precision.
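To make the trade-off concrete, here is a minimal Python sketch. It is
not Spotlight code: the toy index, the resource names and the helpers
build_index and lookup are all made up for illustration, and the real
indexer works differently. It only shows how an exact-case index loses
matches while a lowercased one picks up spurious ones.

```python
# Toy surface forms "seen in anchor links" -> resource (hypothetical data).
ANCHOR_SURFACE_FORMS = {
    "LED": "Light-emitting_diode",
    "Lassen": "Lassen_Peak",
    "Berlin": "Berlin",
}


def build_index(surface_forms, lowercase):
    """Map each surface form (optionally lowercased) to its candidate resources."""
    index = {}
    for surface_form, resource in surface_forms.items():
        key = surface_form.lower() if lowercase else surface_form
        index.setdefault(key, set()).add(resource)
    return index


def lookup(index, token, lowercase):
    """Return candidates for a token, normalising it the same way as the index."""
    key = token.lower() if lowercase else token
    return index.get(key, set())


exact = build_index(ANCHOR_SURFACE_FORMS, lowercase=False)
folded = build_index(ANCHOR_SURFACE_FORMS, lowercase=True)

# Recall: "BERLIN" never occurred with this casing in an anchor link,
# so the exact-case index misses it while the lowercased index finds it.
print(lookup(exact, "BERLIN", lowercase=False))
print(lookup(folded, "BERLIN", lowercase=True))

# Precision: the verbs "led" and "lassen" should not be annotated, but
# the lowercased index returns a candidate for both.
for verb in ("led", "lassen"):
    print(verb, lookup(exact, verb, lowercase=False),
          lookup(folded, verb, lowercase=True))
```

The second half is the situation you describe with "lassen": once a
lowercased variant is in the index, the verb gets a candidate even
though the anchor text was capitalised.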
A better solution is under development, but with very few resources
allocated to it. Let us know if you'd like to help out.
Cheers,
Pablo
On Thu, Mar 15, 2012 at 12:05 PM, reinhard schwab <[email protected]> wrote:
> hi,
>
> one question about generating the candidate index.
> the CandidateIndexer class has one parameter
>
> --case-sensitive
>
>
> but to my surprise every surface form is lowercased too because
> a lowercased variant of the surface form is always generated in
> AddSurfaceFormsToIndex.
>
> what are the pros and cons of ignoring the case-sensitive flag when
> generating the lowercase variant?
> at the beginning of a sentence, words start with a capital letter.
>
> why do i ask?
> in german the verb and surface form "lassen" is matched with the
> dbpedia resource "Lassen Peak" or something similar
> (http://de.dbpedia.org/page/Lassen),
> even if i set the case sensitive flag.
>
> best regards
> reinhard
>
>
>
> ------------------------------------------------------------------------------
> This SF email is sponsored by:
> Try Windows Azure free for 90 days Click Here
> http://p.sf.net/sfu/sfd2d-msazure
> _______________________________________________
> Dbp-spotlight-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>