Hi Pablo,

I just want to generate spotter dictionaries for German.

If I use tsv as the option, the generated dictionary is based on the surface
forms of the URIs (titles, redirects, disambiguations).
If I use index as the option, is the generated dictionary based on the
occurrences in Wikipedia?

In the downloads, you provide:

http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.thresh3.spotterDictionary.gz
http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh10.tsv.spotterDictionary.gz
http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
http://spotlight.dbpedia.org/download/release-0.5/spotter.small.dict
http://spotlight.dbpedia.org/download/release-0.5/spotter.large.dict

I just want to generate these files for German.
Do the thresholds of 75 and 10 refer to c(uri)?
Does thresh3 refer to c(sf,uri)?
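
To make the question concrete, here is a minimal sketch of how I understand the two thresholds when filtering directly from the TSV. It assumes surfaceForms.tsv has two tab-separated columns (surface form, URI) with one line per occurrence; that layout and the names read_pairs and filter_pairs are my own assumptions for illustration, not Spotlight's actual code.

```python
from collections import Counter


def read_pairs(path):
    """Yield (surface form, URI) pairs from a TSV file.

    Assumed layout: surface form in column 1, URI in column 2.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                yield parts[0], parts[1]


def filter_pairs(pairs, uri_thresh=0, pair_thresh=0):
    """Keep distinct (sf, uri) pairs that meet both count thresholds.

    uri_thresh  -- minimum c(uri): occurrences of the URI overall
    pair_thresh -- minimum c(sf,uri): occurrences of the sf->uri pair
    """
    pairs = list(pairs)
    c_uri = Counter(uri for _, uri in pairs)
    c_sf_uri = Counter(pairs)
    return sorted(
        pair
        for pair in c_sf_uri
        if c_uri[pair[1]] >= uri_thresh and c_sf_uri[pair] >= pair_thresh
    )
```

Under this reading, uriThresh10 would correspond to filter_pairs(read_pairs("surfaceForms.tsv"), uri_thresh=10) and thresh3 to pair_thresh=3, but that mapping is exactly what I am asking you to confirm.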

Best regards,
Reinhard


On 05.03.2012 12:57, Pablo Mendes wrote:
> Hi Reinhard,
> We assumed that you would filter the URIs before creating the index,
> as this seems to be the most space- and time-efficient solution.
>
> On which of the two alternatives below do you intend to filter?
> 1. c(uri) -- number of occurrences of a given URI
> 2. c(sf,uri) -- number of occurrences of a given sf->uri pair
>
> You could easily do c(uri) because that's usually stored in the index.
> However, c(sf,uri) does not go to the context index anymore. In my dev
> branch, it goes to the candidate index, though. But that one is built
> from a TSV file, and it would be much easier to filter directly from that.
>
> Is there any particular reason for building that file from the index?
>
> Best,
> Pablo
>
> On Mon, Mar 5, 2012 at 12:26 PM, reinhard schwab
> <[email protected] <mailto:[email protected]>> wrote:
>
>     Hi,
>
>     I now want to create a spotter dictionary using
>     IndexLingPipeSpotter, as mentioned in
>     http://sourceforge.net/mailarchive/message.php?msg_id=28435284
>
>     There are two optional inputs:
>     - tsv (surfaceForms.tsv)
>     - index
>
>     If I want to use the index as input, how can I keep only those URIs
>     whose number of occurrences is above a threshold?
>     There is no parameter for a threshold.
>
>     Best regards,
>     Reinhard
>
>     
> ------------------------------------------------------------------------------
>     Try before you buy = See our experts in action!
>     The most comprehensive online learning library for Microsoft
>     developers
>     is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3,
>     MVC3,
>     Metro Style Apps, more. Free future releases when you subscribe now!
>     http://p.sf.net/sfu/learndevnow-dev2
>     _______________________________________________
>     Dbp-spotlight-users mailing list
>     [email protected]
>     <mailto:[email protected]>
>     https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>
>
