Hi all,

My proposal has not been selected for GSoC, but I still want to continue
with my project. Could someone give me some guidelines on how to proceed (if
I can continue)?

Thanks,
Abhishek

On Thu, Apr 9, 2015 at 11:53 PM, Abhishek Gupta <a.gu...@gmail.com> wrote:

> Hi Thiago,
>
> Thanks for your reply and assurance.
> I have also replied to your question about the extraction framework, and I
> have created an issue about using bold instances as probable surface forms
> here
> <https://github.com/dbpedia-spotlight/dbpedia-spotlight/issues/353>.
>
> Thanks,
> Abhishek
>
> On Thu, Apr 9, 2015 at 1:19 AM, Thiago Galery <tgal...@gmail.com> wrote:
>
>> Hi Abhishek,
>> sorry for taking so long to write to you. Things at work have been really
>> busy. Regarding the issue you raised about the originality of your proposal,
>> rest assured that no one sent a proposal similar to yours.
>>
>> I'm happy that you sent a PR for the extraction framework. It seems that
>> Dimitris is already taking a look at it.
>> As for your suggestions for Spotlight, I wouldn't advise simply removing the
>> stopword filter, because I remember getting a lot of crap once. Maybe it
>> should be modified somehow. If you have a good idea and want to send a PR,
>> it would be very welcome. I think discussing things on GitHub would be
>> better.
>>
>> All the best,
>> Thiago
>>
>> On Mon, Apr 6, 2015 at 6:15 AM, Abhishek Gupta <a.gu...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Recently I was checking out the indexing process of dbpedia-spotlight
>>> and I observed a few things:
>>>
>>> 1) A constructor definition is missing in the WikiPage object
>>> <https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/sources/WikiPage.scala>
>>> for the
>>> instance created in the function wikiPageCopy here
>>> <https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/index/src/main/scala/org/dbpedia/spotlight/io/DisambiguationContextSource.scala#L67>.
>>> I have created a PR for this:
>>> https://github.com/dbpedia/extraction-framework/pull/377
>>>
>>> 2) For the stopwords filter defined here
>>> <https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/index/src/main/scala/org/dbpedia/spotlight/util/ExtractCandidateMap.scala#L186>,
>>> I analysed the conceptURI extraction against the stopwords list
>>> here
>>> <http://wifo5-04.informatik.uni-mannheim.de/downloads/release-0.4/stopwords.en.list>.
>>> The analysis showed that we are discarding around 25481 entities,
>>> almost all of which belong to important categories like music, film,
>>> band etc. E.g. Am_(musician)
>>> <http://en.wikipedia.org/wiki/AM_(musician)>, Home_(2015_film)
>>> <http://en.wikipedia.org/wiki/Home_(2015_film)>, The_Who
>>> <http://en.wikipedia.org/wiki/The_Who> etc. Even if we do a case-sensitive
>>> check (checking whether the entity contains more than one capital letter,
>>> as one is the default), we will still reject some entities that have only
>>> one word, like Am, Home etc. Moreover, the garbage (can't etc.) we would
>>> incur by removing this filter won't be much. So I suggest that we remove
>>> this filter.
>>>
>>> 3) I would like to suggest a surface form extraction. If we can extract
>>> the bold text in the first line of a Wikipedia article, then we can use it
>>> as a probable surface form for that entity. E.g. Stanford_University
>>> <http://en.wikipedia.org/wiki/Stanford_University>, Aon_(company)
>>> <http://en.wikipedia.org/wiki/Aon_%28company%29>, Radio_Warwick
>>> <http://en.wikipedia.org/wiki/Radio_Warwick>, Phi_Gamma_Delta
>>> <http://en.wikipedia.org/wiki/Phi_Gamma_Delta> etc. These are the best
>>> surface forms for the respective entities.
>>>
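Points 2 and 3 above could be sketched roughly as follows. This is only an illustration in Python, not the Spotlight code (which is Scala): it assumes the stopword filter drops a title when every word of it, after stripping a trailing disambiguation suffix, is a stopword, and it assumes bold text is marked with the usual wikitext triple apostrophes. All function names, the tiny stopword set, and the wikitext snippet are made up for the example.

```python
import re

# --- Point 2: which titles would an "all words are stopwords" filter drop? ---

def surface_form(title):
    """Strip a trailing suffix like '_(2015_film)' and replace underscores."""
    return re.sub(r"_\([^)]*\)$", "", title).replace("_", " ")

def is_all_stopwords(title, stopwords):
    """True when every word of the cleaned title is in the stopword set."""
    words = surface_form(title).lower().split()
    return bool(words) and all(word in stopwords for word in words)

def rejected_titles(titles, stopwords, case_sensitive=False):
    """Titles the assumed filter would drop.

    With case_sensitive=True, a title is kept when it contains more than
    one capital letter (one capital is the default for any title).
    """
    rejected = []
    for title in titles:
        if not is_all_stopwords(title, stopwords):
            continue
        capitals = sum(1 for ch in surface_form(title) if ch.isupper())
        if case_sensitive and capitals > 1:
            continue
        rejected.append(title)
    return rejected

# --- Point 3: bold text in the first line as probable surface forms ---

BOLD = re.compile(r"'''(.+?)'''")
SKIP_PREFIXES = ("{", "|", "}", "=", "<", "[[File:", "[[Image:")

def first_text_line(wikitext):
    """First non-empty line that does not look like a template, table,
    heading, or file link. A crude heuristic, not a wikitext parser."""
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(SKIP_PREFIXES):
            return stripped
    return ""

def bold_surface_forms(wikitext):
    """Bold spans found in the first text line of the article."""
    spans = BOLD.findall(first_text_line(wikitext))
    # strip("'") removes leftover quotes from bold-italic markup
    return [s.strip("'").strip() for s in spans if s.strip("'").strip()]

if __name__ == "__main__":
    stopwords = {"am", "home", "the", "who", "can't"}  # toy list
    titles = ["Am_(musician)", "Home_(2015_film)", "The_Who",
              "Stanford_University"]
    print(rejected_titles(titles, stopwords))
    print(rejected_titles(titles, stopwords, case_sensitive=True))

    snippet = ("{{Infobox university\n| name = Example\n}}\n"
               "'''Example University''', or simply '''Example''', "
               "is a made-up school.")
    print(bold_surface_forms(snippet))
```

On the toy input, the plain filter drops all three stopword-only titles, while the case-sensitive variant keeps The_Who but still drops Am and Home, which is the behaviour described in point 2.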
>>> Thanks,
>>> Abhishek
>>>
>>> On Fri, Mar 27, 2015 at 11:56 AM, Abhishek Gupta <a.gu...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I would also like to inform you that my proposal went public in one of
>>>> the recent mails, when Thiago accidentally sent a mail to both me and the
>>>> dbpedia-gsoc mailing list. Details of the mail are below. The Google Docs
>>>> link was in the quoted text, and the doc could be seen and even edited by
>>>> anyone with that link, although nobody has changed its content. I
>>>> believe there is a chance that someone will copy my ideas, so I
>>>> request you to take care of this issue. I hope this will not
>>>> affect my application.
>>>> I have now changed the sharing settings, so please let me know if
>>>> there is any access problem.
>>>>
>>>> *Mail details:*
>>>> from: Thiago Galery <tgal...@gmail.com>
>>>> to: Abhishek Gupta <a.gu...@gmail.com>,
>>>> dbpedia-gsoc <dbpedia-gsoc@lists.sourceforge.net>
>>>> date: Tue, Mar 24, 2015 at 3:47 AM
>>>> subject: Re: [Dbpedia-gsoc] Fwd: Contribute to DbPedia
>>>>
>>>> I have also modified the Candidate Entity Scoring methodology in my
>>>> proposal. Please take a look at it.
>>>> GSoC proposal link:
>>>> https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/abhishek_g/5629499534213120
>>>> Google Docs Link:
>>>> https://docs.google.com/document/d/1U4BvJpGUvL2odVA6VxnYggfEX_hmLSYP4yqhXB7dLQU/edit
>>>>
>>>> I would also like to ask one more question which might help me in
>>>> modelling the problem. In the example texts below, which entity would you
>>>> annotate "river" (in bold) with: "http://dbpedia.org/page/River",
>>>> "http://dbpedia.org/page/River_Thames", or something else?
>>>> 1. The River Thames is a river that flows through southern England.
>>>> This *river* is the longest entirely in England and the second longest
>>>> in the United Kingdom, after the River Severn.
>>>> 2. I would like to swim in the longest *river* entirely in England and
>>>> the second longest in the United Kingdom, after the River Severn.
>>>>
>>>> In the first example, "river" explicitly refers to the River Thames,
>>>> much like coreference resolution. In the second example there is only
>>>> an implicit reference to the River Thames (the longest river entirely in
>>>> England etc.), which we can infer from the context. So I would
>>>> like to know whether we are trying to annotate "river" with a simple
>>>> "River" or with "River Thames".
>>>>
>>>> Thanks,
>>>> Abhishek
>>>>
>>>>
>>>
>>
>
_______________________________________________
Dbpedia-gsoc mailing list
Dbpedia-gsoc@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
