Hi Thiago,

Sorry for the delay!
I have set up the Spotlight server and it is running fine, though only
with minimal settings. After setting it up I played with the Spotlight server
and came across some discrepancies, as follows:

Example taken:
http://spotlight.dbpedia.org/rest/annotate?text=First documented in the
13th century, Berlin was the capital of the Kingdom of Prussia (1701–1918),
the German Empire (1871–1918), the Weimar Republic (1919–33) and the Third
Reich (1933–45). Berlin in the 1920s was the third largest municipality in
the world. In 1990 German reunification took place in whole Germany in which
 the city regained its status as the capital of Germany.

1) If we run this, "13th Century" is annotated as "
http://dbpedia.org/page/19th_century". This might be happening because the
context is largely about the 19th century and, moreover, "13th Century" and
"19th Century" differ by only one character. But I am not
sure whether this is good or bad.
In my opinion, if we have an entity in our store (
http://dbpedia.org/page/13th_century) that exactly matches the
surface form in the raw text ("13th Century"), we should annotate the SF with
that entity.
The same might be the case with "Germany", which is associated with "History
of Germany <http://dbpedia.org/page/History_of_Germany>" rather than "Germany
<http://dbpedia.org/page/Germany>".

2) We are spotting "place" and associating it with "Portland Place
<http://dbpedia.org/resource/Portland_Place>", maybe due to stemming of the SF.
Even "Location (geography)
<http://dbpedia.org/page/Location_(geography)>" would not be the correct entity
for this. The problem is that we are not able to detect the sense of the
word "place" itself, so we may have to use word senses, e.g. from
WordNet.

3) We are detecting ". Berlin" as a surface form, but I could not find out
where this SF comes from. I suspect it does not come from
Wikipedia.

4) We spotted "capital of Germany", but I did not get any candidates for it
when I called "candidates" instead of "annotate".

5) We are able to spot "1920s" as a surface form but not "1920".
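
To make the comparisons above easier to reproduce, here is a minimal sketch
that builds the request URL for either endpoint with the text properly
percent-encoded. The base URL and the "text" parameter are the ones from the
example above; the helper name build_request_url is only my own, and I am
assuming the "candidates" endpoint accepts the same "text" parameter as
"annotate".

```python
from urllib.parse import urlencode

# Base URL as used in the example request in this mail.
BASE_URL = "http://spotlight.dbpedia.org/rest"


def build_request_url(text, endpoint="annotate"):
    """Return a percent-encoded request URL (hypothetical helper).

    Assumption: "candidates" takes the same "text" parameter as "annotate".
    """
    if endpoint not in ("annotate", "candidates"):
        raise ValueError(f"unexpected endpoint: {endpoint}")
    # urlencode turns spaces into '+' and escapes characters such as the
    # en dash and parentheses that appear in the sample text.
    return f"{BASE_URL}/{endpoint}?{urlencode({'text': text})}"


sample = "Berlin was the capital of the Kingdom of Prussia (1701–1918)."
annotate_url = build_request_url(sample)
candidates_url = build_request_url(sample, endpoint="candidates")
print(annotate_url)
print(candidates_url)
```

Fetching both URLs for the same text should make it easy to check which
spotted surface forms (e.g. "capital of Germany") come back without candidates.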

A few more questions:
1) Are we trying to annotate every word, every noun, or only entities (e.g.
proper nouns) in the raw text? In the above link I found "documented" (a word
that is neither a noun nor an entity) annotated as
"http://dbpedia.org/resource/Document".

2) Are we using surface forms to deal with only syntactic references (e.g.
surface form "municipality" referring to "Municipality
<http://dbpedia.org/page/Municipality>" or "Metropolitan_municipality
<http://dbpedia.org/page/Metropolitan_municipality>" or "
Municipalities_of_Mexico <http://dbpedia.org/page/Municipalities_of_Mexico>")
or both, syntactic and semantic references (e.g. aliases like "Third Reich"
referring to "Nazi Germany <http://dbpedia.org/page/Nazi_Germany>")?

I am working on generating extra possible surface forms from
a canonical surface form or the entity itself to deal with unseen SF
association problems.
I have also started working on my proposal and will submit it soon.
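
As a rough illustration of the surface form generation mentioned above, here
is a minimal sketch based on the 'Michael Jordan' -> 'M. Jordan', 'Jordan'
example. The function name and the three heuristics are assumptions for
illustration only; this is not existing Spotlight code, and real names will
need more careful handling.

```python
def generate_surface_forms(canonical):
    """Generate simple variants of a canonical name (illustrative heuristics)."""
    tokens = canonical.split()
    variants = []
    if len(tokens) >= 2:
        first, last = tokens[0], tokens[-1]
        middle = tokens[1:-1]
        # Abbreviate the first token: 'Michael Jordan' -> 'M. Jordan'
        variants.append(" ".join([first[0] + "."] + middle + [last]))
        # Same abbreviation without the period: 'M Jordan'
        variants.append(" ".join([first[0]] + middle + [last]))
        # Last token alone: 'Jordan'
        variants.append(last)
    # Drop duplicates and the canonical form itself, preserving order.
    seen = {canonical}
    result = []
    for variant in variants:
        if variant not in seen:
            seen.add(variant)
            result.append(variant)
    return result


print(generate_surface_forms("Michael Jordan"))
# ['M. Jordan', 'M Jordan', 'Jordan']
```

Single-token names such as "Berlin" yield no variants here, so anything
generated this way would still need a score before being used for annotation.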

Thanks,
Abhishek

On Thu, Mar 12, 2015 at 8:20 PM, Thiago Galery <tgal...@gmail.com> wrote:

> Hi Abhishek, thanks for the contribution. Your suggestions are pretty much
> aligned with what we were thinking in any event, and the initial plan
> seems good.
> On the assumption that there's some code that generates extra possible
> surface forms from a canonical surface form, like your 'Michael Jordan' ->
> 'M. Jordan', 'Jordan' and so on example, it would be worth looking in the
> literature on Machine Translation on how to establish some score for the
> surface form. That is, if you spot 'M Jordan' on the text, what is the
> probability of it being a translation of the canonical name 'Michael
> Jordan'. If there's a simple way to implement this, we could try to get
> the raw data with counts, generate some extra sfs in a principled manner and
> use that to calculate probabilities. Still, for the moment, I'd focus on
> setting the spotlight server up and playing with the warm-up tasks.
> Thanks for the good work,
> Thiago
>
>
_______________________________________________
Dbpedia-gsoc mailing list
Dbpedia-gsoc@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
