Hi David,

Yes, my objective was to test the running time of each endpoint, so that I have an idea of which phase takes longest during the annotation process. I ran a few tests with small text files, and it seems that the phrase spotting phase (spot endpoint + candidates endpoint) takes longer than the disambiguation one (annotate endpoint). My explanation would be that during the disambiguation phase only the contextual score is taken into account (if I understood the paper "DBpedia Spotlight: Shedding Light on the Web of Documents" correctly, the resource with the highest contextual score is chosen), and this score is already calculated during phrase spotting (more precisely, during the candidate generation sub-phase). Given that, disambiguation consists of just picking the resource with the highest contextual score, so it takes much less time than phrase spotting. Please let me know if you have a different opinion on the matter.
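In case it helps, below is a small sketch of how the timing could be measured from Java (which is what I'm using). It is only an illustration: the base URL (a local instance on port 2222), the confidence value, the query parameter names and the example sentence are assumptions from my own setup and may need adjusting for yours. The last line is just the rough estimate of the disambiguation time we discussed earlier (annotate minus candidates).

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class EndpointTimer {

    // Assumed base URL of a local Spotlight instance; adjust to your own setup.
    private static final String BASE = "http://localhost:2222/rest/";

    public static void main(String[] args) throws Exception {
        String text = "Berlin is the capital of Germany.";

        long spot = timeCall("spot", text);
        long candidates = timeCall("candidates", text);
        long annotate = timeCall("annotate", text);

        System.out.println("spot:       " + spot + " ms");
        System.out.println("candidates: " + candidates + " ms");
        System.out.println("annotate:   " + annotate + " ms");
        // Rough estimate of the disambiguation phase (annotate minus candidates).
        System.out.println("annotate - candidates: " + (annotate - candidates) + " ms");
    }

    // Issues one GET request against the given endpoint and returns the
    // wall-clock time of the full round trip in milliseconds.
    private static long timeCall(String endpoint, String text) throws Exception {
        String query = "?text=" + URLEncoder.encode(text, "UTF-8") + "&confidence=0.2";
        URL url = new URL(BASE + endpoint + query);
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (InputStream in = conn.getInputStream()) {
            // Drain the response so the measurement covers the whole request.
            while (in.read() != -1) { /* discard */ }
        }
        conn.disconnect();
        return (System.nanoTime() - start) / 1_000_000;
    }
}

I run it a few times and average the results, since the first call after starting the server tends to be slower.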
Best,
Pajolma

----- Original Message -----
> From: "David Przybilla" <[email protected]>
> To: "Pajolma Rupi" <[email protected]>
> Cc: [email protected]
> Sent: Friday, June 5, 2015 10:19:07 AM
> Subject: Re: [Dbp-spotlight-users] Time performance for each phase
>
> Hi Pajolma,
>
> Sorry, I misunderstood "performance" :) and I thought we were talking about
> the quality of the extractions.
>
> If it is benchmarking time, then I guess yes, you could call the given
> endpoints and subtract the time.
>
> Another possibility is for you to take a look at SpotlightInterface, which
> encodes all the pipelines for `candidates`, `annotate` and `spot`, then
> isolate the calls, passing some testing set that you could provide.
>
> On Thu, Jun 4, 2015 at 4:30 PM, Pajolma Rupi <[email protected]> wrote:
> > Hi David,
> >
> > I managed to find the kore50 corpus but not the milne-witten one. Do you
> > know if it's still publicly available?
> >
> > In order to test the time performance of each phase, I was thinking to use
> > the available endpoints:
> > 1-spot
> > 2-candidates
> > 3-disambiguate
> > 4-annotate
> >
> > Because using the disambiguate endpoint would require me to provide NE
> > annotations in my call, I was thinking to use the annotate endpoint instead
> > and subtract the time consumed by the candidates endpoint in order to get
> > the time consumed by the disambiguation phase. Would such logic be correct
> > with respect to the implementation? Is there any other phase in the
> > pipeline (between disambiguation and annotation) which might affect this
> > logic? If I understood it well, the pipeline consists of the processing
> > done by each of the endpoints in the order that I've listed them above.
> > Please let me know if it is not the case.
> >
> > Thank you in advance,
> > Pajolma
> >
> > > From: "David Przybilla" <[email protected]>
> > > To: "Pajolma Rupi" <[email protected]>
> > > Cc: [email protected]
> > > Sent: Tuesday, June 2, 2015 6:45:19 PM
> > > Subject: Re: [Dbp-spotlight-users] Time performance for each phase
> > >
> > > Hi Pajolma,
> > >
> > > As far as I know there are no separate evaluations out of the box, but
> > > you could use the milne-witten corpus to evaluate only the spotter and
> > > disambiguation separately.
> > >
> > > In my experience problems are usually related to spotting: surface forms
> > > which are not in the models, surface forms without enough probability.
> > >
> > > There is also a specific corpus for evaluating disambiguation (kore50).
> > >
> > > On Tue, Jun 2, 2015 at 1:58 PM, Pajolma Rupi <[email protected]> wrote:
> > > > Dear all,
> > > >
> > > > I was not able to find information regarding the time performance of
> > > > the Spotlight service for each of the phases (separately): phrase
> > > > spotting (candidate generation, candidate selection), disambiguation,
> > > > indexing. There are some numbers in the paper "Improving Efficiency
> > > > and Accuracy in Multilingual Entity Extraction", but they are
> > > > calculated for the whole annotation process, while I'm interested in
> > > > knowing during which specific phase the service performs better and
> > > > during which phase it performs worse.
> > > >
> > > > Could you please let me know if such information exists already?
> > > >
> > > > I would also be interested in knowing if I can produce such
> > > > information by running my own local instance of Spotlight (I'm using
> > > > Java in order to annotate text).
> > > >
> > > > Thank you in advance,
> > > > Pajolma
------------------------------------------------------------------------------
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
