Hi David,

Yes, my objective was to test the running time of each endpoint, so that I have an idea of which phase takes longest during the annotation process. I ran a few tests with small text files, and it seems that the phrase spotting phase (spot endpoint + candidates endpoint) takes longer than the disambiguation one (annotate endpoint). My explanation would be that during the disambiguation phase only the contextual score is taken into account (if I understood the paper "DBpedia Spotlight: Shedding Light on the Web of Documents" correctly, the resource with the highest contextual score is chosen), and this score is already calculated during phrase spotting (more precisely, during the candidate generation sub-phase). Given that, disambiguation consists of just picking the resource with the highest contextual score, so it takes much less time than phrase spotting. Please let me know if you have a different opinion on the matter.
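In case it helps, below is a small sketch of how the timing could be measured from Java (which is what I'm using). It is only an illustration: the base URL (a local instance on port 2222), the confidence value, the query parameter names and the example sentence are assumptions from my own setup and may need adjusting for yours. The last line is just the rough estimate of the disambiguation time we discussed earlier (annotate minus candidates).

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class EndpointTimer {

    // Assumed base URL of a local Spotlight instance; adjust to your own setup.
    private static final String BASE = "http://localhost:2222/rest/";

    public static void main(String[] args) throws Exception {
        String text = "Berlin is the capital of Germany.";

        long spot = timeCall("spot", text);
        long candidates = timeCall("candidates", text);
        long annotate = timeCall("annotate", text);

        System.out.println("spot:       " + spot + " ms");
        System.out.println("candidates: " + candidates + " ms");
        System.out.println("annotate:   " + annotate + " ms");
        // Rough estimate of the disambiguation phase (annotate minus candidates).
        System.out.println("annotate - candidates: " + (annotate - candidates) + " ms");
    }

    // Issues one GET request against the given endpoint and returns the
    // wall-clock time of the full round trip in milliseconds.
    private static long timeCall(String endpoint, String text) throws Exception {
        String query = "?text=" + URLEncoder.encode(text, "UTF-8") + "&confidence=0.2";
        URL url = new URL(BASE + endpoint + query);
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (InputStream in = conn.getInputStream()) {
            // Drain the response so the measurement covers the whole request.
            while (in.read() != -1) { /* discard */ }
        }
        conn.disconnect();
        return (System.nanoTime() - start) / 1_000_000;
    }
}

I run it a few times and average the results, since the first call after starting the server tends to be slower.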
Best,
Pajolma

----- Original Message -----
> From: "David Przybilla" <[email protected]>
> To: "Pajolma Rupi" <[email protected]>
> Cc: [email protected]
> Sent: Friday, June 5, 2015 10:19:07 AM
> Subject: Re: [Dbp-spotlight-users] Time performance for each phase
>
> Hi Pajolma,
>
> Sorry, I misunderstood "performance" :) and I thought we were talking about
> the quality of the extractions.
>
> If it is benchmarking time, then I guess yes, you could call the given
> endpoints and subtract the time.
>
> Another possibility is for you to take a look at SpotlightInterface, which
> encodes all the pipelines for `candidates`, `annotate` and `spot`, then
> isolate the calls, passing some testing set that you could provide.
>
> On Thu, Jun 4, 2015 at 4:30 PM, Pajolma Rupi <[email protected]> wrote:
> > Hi David,
> >
> > I managed to find the kore50 corpus but not the milne-witten one. Do you
> > know if it's still publicly available?
> >
> > In order to test the time performance of each phase, I was thinking to use
> > the available endpoints:
> > 1-spot
> > 2-candidates
> > 3-disambiguate
> > 4-annotate
> >
> > Because using the disambiguate endpoint would require me to provide NE
> > annotations in my call, I was thinking to use the annotate endpoint instead
> > and subtract the time consumed by the candidates endpoint in order to get
> > the time consumed by the disambiguation phase. Would such logic be correct
> > with respect to the implementation? Is there any other phase in the
> > pipeline (between disambiguation and annotation) which might affect this
> > logic? If I understood it well, the pipeline consists of the processing
> > done by each of the endpoints in the order that I've listed them above.
> > Please let me know if it is not the case.
> >
> > Thank you in advance,
> > Pajolma
> >
> > > From: "David Przybilla" <[email protected]>
> > > To: "Pajolma Rupi" <[email protected]>
> > > Cc: [email protected]
> > > Sent: Tuesday, June 2, 2015 6:45:19 PM
> > > Subject: Re: [Dbp-spotlight-users] Time performance for each phase
> > >
> > > Hi Pajolma,
> > >
> > > As far as I know there are no separate evaluations out of the box, but
> > > you could use the milne-witten corpus to evaluate only the spotter and
> > > disambiguation separately.
> > >
> > > In my experience problems are usually related to spotting: surface forms
> > > which are not in the models, surface forms without enough probability.
> > >
> > > There is also a specific corpus for evaluating disambiguation (kore50).
> > >
> > > On Tue, Jun 2, 2015 at 1:58 PM, Pajolma Rupi <[email protected]> wrote:
> > > > Dear all,
> > > >
> > > > I was not able to find information regarding the time performance of
> > > > the Spotlight service for each of the phases (separately): phrase
> > > > spotting (candidate generation, candidate selection), disambiguation,
> > > > indexing. There are some numbers in the paper "Improving Efficiency
> > > > and Accuracy in Multilingual Entity Extraction", but they are
> > > > calculated for the whole annotation process, while I'm interested in
> > > > knowing during which specific phase the service performs better and
> > > > during which phase it performs worse.
> > > >
> > > > Could you please let me know if such information exists already?
> > > >
> > > > I would also be interested in knowing if I can produce such
> > > > information by running my own local instance of Spotlight (I'm using
> > > > Java in order to annotate text).
> > > >
> > > > Thank you in advance,
> > > > Pajolma
------------------------------------------------------------------------------
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
