Hi Dileepa,
On 18/06/13 13:20, Dileepa Jayakody wrote:
Hi All,
After going through a lot of documentation on Stanbol and entity
disambiguation, I started trying out the Stanbol EntityHub indexing tool
[1] to create a site for a FOAF dataset. I found a suitable FOAF dataset in
N-Quads format here [2], and would like to know whether you are OK with me
going ahead with it. I found a couple more sites that publish their
contacts as FOAF, but thought of starting with this dataset because it is
collected/crawled from various sources.
I would need more information about this dataset. At first glance it
seems that Datahub already contains the DBpedia and Freebase dumps as well
as many other Linked Open Data datasets, but the documentation says that
"The seed set for the Datahub crawl contained all example URIs marked
example/*". Do you know exactly what that means? Also, you need to be
careful not to create duplicate entries from Datahub, Freebase and DBpedia.
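One rough way to spot candidates for duplication before indexing is to check which subjects are described by more than one source. The sketch below is only an illustration: it assumes the fourth term of each N-Quads line is the URL of the crawled document, uses a simplified whitespace-based parser rather than a real N-Quads parser, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def parse_subject_and_graph(line):
    """Return (subject, graph) for one N-Quads line, or None if it is not a quad.

    Simplified on purpose: terms are split on whitespace, and a line only
    counts as a quad if its last term looks like an IRI in angle brackets.
    """
    line = line.strip()
    if not line or line.startswith("#") or not line.endswith("."):
        return None
    tokens = line[:-1].split()
    if len(tokens) < 4:
        return None  # plain triple, no graph/context term
    graph = tokens[-1]
    if not (graph.startswith("<") and graph.endswith(">")):
        return None
    return tokens[0], graph


def multi_source_subjects(lines):
    """Map each subject described by more than one source host to those hosts."""
    hosts_by_subject = defaultdict(set)
    for line in lines:
        parsed = parse_subject_and_graph(line)
        if parsed is None:
            continue
        subject, graph = parsed
        # Treat the host of the context URL as the "source" (an assumption).
        hosts_by_subject[subject].add(urlparse(graph.strip("<>")).netloc)
    return {
        subject: sorted(hosts)
        for subject, hosts in hosts_by_subject.items()
        if len(hosts) > 1
    }


# Hypothetical sample data, not taken from the BTC 2012 dump.
sample = [
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice Example" <http://a.example.org/doc1> .',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/homepage> <http://example.org/~alice> <http://b.example.org/doc2> .',
    '<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" <http://a.example.org/doc1> .',
]

for subject, hosts in multi_source_subjects(sample).items():
    print(subject, hosts)
```

Subjects reported this way are not necessarily duplicates; they are only the ones worth checking by hand, or merging, before the index is built.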
For indexing purposes I'm following the steps given at [3]. I believe I
should configure this as a ReferencedSite and not a ManagedSite because the
data is collected from various sources. Please correct me if I'm wrong.
I think that ReferencedSites are only used when you already have your
own external knowledge base, so using the EntityHub indexing tool and
then deploying the index will automatically create a ManagedSite.
Regards
Your pointers and suggestions are very much appreciated.
Thanks,
Dileepa
[1] https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/genericrdf
[2] http://km.aifb.kit.edu/projects/btc-2012/
[3] http://stanbol.apache.org/docs/trunk/customvocabulary.html
On Fri, Jun 14, 2013 at 9:46 AM, Dileepa Jayakody <[email protected]> wrote:
Thanks a lot Rafa.
I will go through these docs and let you guys know if I have questions.
Regards,
Dileepa
On Thu, Jun 13, 2013 at 5:48 PM, Rafa Haro <[email protected]> wrote:
Hi Dileepa,
I can suggest a couple of useful links. The first is a quite good
guide to creating new engines in Stanbol. I hope it is not out of date:
http://blog.iks-project.eu/getting-started-with-apache-stanbol-enhancement-engine/
The second is about working with custom vocabularies in Stanbol. I think
it will show you how to configure the EntityHub indexing tools for storing
the FOAF data: http://stanbol.apache.org/docs/trunk/customvocabulary.html
I hope this helps.
Cheers,
Rafa Haro
On 12/06/13 23:05, Dileepa Jayakody wrote:
Hi All,
Can you please give me some direction on which components in the Stanbol
code base I should study more closely for my project? (It does not seem
feasible to go through all areas of the code base, as it is pretty big :))
At the moment I'm looking at the components below:
/enhancer/generic/servicesapi
/enhancement-engines/entityhublinking, disambiguation-mlt
/entityhub/site/managed
I would greatly appreciate your pointers to relevant areas of the codebase
that I should focus on.
Thanks,
Dileepa
On Tue, Jun 11, 2013 at 2:36 PM, Dileepa Jayakody <[email protected]> wrote:
Hi Rafa
On Tue, Jun 11, 2013 at 2:16 PM, Rafa Haro <[email protected]> wrote:
Hi Dileepa,
On 11/06/13 07:07, Dileepa Jayakody wrote:
My suggestion of integrating foaf-search [3] would basically require
on-the-fly retrieval of data, and as you have pointed out, that could
impose a performance hit. But foaf-search looks promising, with a big
index of FOAF data.
A concern about using foaf-search is that you need to make sure the FOAF
information you manage corresponds to the EntityHub site used for entity
linking. So, for instance, if you want to link your entities with DBpedia,
can you be sure that the results of searching foaf-search with a surface
form (name mention) are going to include the FOAF data of the right entity?
In other words, does the foaf-search index have information about all the
entities in your DBpedia EntityHub site?
AFAIK foaf-search has integrated DBpedia 3.8, so we can assume it is
up to date with DBpedia entities. However, free API access is restricted
to 50000 calls per day, and there are some terms of use [8] that might be
a bit of a concern (e.g. availability of service, warranty).
[8] http://www.foaf-search.net/Terms
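To make the call limit concrete, here is a minimal sketch of a client-side wrapper that caches results and stops calling the remote service once a daily budget is spent. It does not use the real foaf-search API: the `fetch` callable, the class name and the stub below are all hypothetical, and only the 50000 figure comes from the discussion above.

```python
import datetime


class DailyQuotaCache:
    """Cache lookups and refuse remote calls once the daily budget is used up."""

    def __init__(self, fetch, daily_limit=50000, today=datetime.date.today):
        self._fetch = fetch              # hypothetical function: name -> result
        self._daily_limit = daily_limit
        self._today = today              # injectable clock, handy for testing
        self._cache = {}
        self._day = today()
        self._calls = 0

    @property
    def calls_today(self):
        return self._calls

    def lookup(self, name):
        # Cached names cost nothing against the budget.
        if name in self._cache:
            return self._cache[name]
        # Reset the counter when the date changes.
        day = self._today()
        if day != self._day:
            self._day, self._calls = day, 0
        if self._calls >= self._daily_limit:
            raise RuntimeError("daily call budget exhausted")
        self._calls += 1
        result = self._fetch(name)
        self._cache[name] = result
        return result


# Stub standing in for the remote search; it records which names were fetched.
fetched = []


def fake_fetch(name):
    fetched.append(name)
    return {"name": name}


client = DailyQuotaCache(fake_fetch, daily_limit=2)
client.lookup("Alice")
client.lookup("Alice")   # served from the cache, no remote call
client.lookup("Bob")
print(fetched, client.calls_today)
```

A cache like this only helps when the same surface forms recur across documents; it does nothing about the availability and warranty concerns in the terms of use.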
Regards
--
------------------------------
This message should be regarded as confidential. If you have received
this email in error please notify the sender and destroy it immediately.
Statements of intent shall only become binding when confirmed in hard copy
by an authorised signatory.
Zaizi Ltd is registered in England and Wales with the registration number
6440931. The Registered Office is 222 Westbourne Studios, 242 Acklam Road,
London W10 5JJ, UK.
--
------------------------------
This message should be regarded as confidential. If you have received
this email in error please notify the sender and destroy it immediately.
Statements of intent shall only become binding when confirmed in hard copy
by an authorised signatory.
Zaizi Ltd is registered in England and Wales with the registration number
6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
London W6 7AN.