Hi All,

For my project, FOAF co-reference based disambiguation, the first
milestone is developing an EntityHub ReferencedSite for a FOAF dataset.
With help from Rupert and others, I was able to index a sample FOAF dataset
using the genericrdf indexing tool and set up a ReferencedSite. The FOAF data
can be filtered by configuring propertyfilter.config to import foaf:*, which
imports all entities that define FOAF properties. The next step will be to
develop an EntityProcessor that further filters and cleans the FOAF data by
defining the FOAF properties required for disambiguation.
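The filtering step described above could work roughly like this. This is a minimal, hypothetical sketch in plain Python, not Stanbol's actual EntityProcessor interface: it assumes an entity is a dict mapping property URIs to lists of values, and the KEEP and REQUIRED property sets are illustrative choices, not a decided list.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical: FOAF properties we might keep for disambiguation.
KEEP = {FOAF + "name", FOAF + "knows", FOAF + "homepage", FOAF + "mbox_sha1sum"}
# Hypothetical: properties an entity must have to be indexed at all.
REQUIRED = {FOAF + "name"}


def process_entity(props):
    """Return a cleaned copy of an entity's properties, or None to drop it.

    props: dict mapping property URI -> list of values (assumed layout).
    """
    # Keep only the whitelisted FOAF properties.
    kept = {p: v for p, v in props.items() if p in KEEP}
    # Drop entities that lack any of the required properties.
    if not REQUIRED.issubset(kept):
        return None
    return kept
```

An entity with foaf:name and foaf:knows keeps both and loses everything else; an entity without foaf:name is dropped entirely.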

To continue my project, I need to finalize which FOAF dataset to use, and I
would highly appreciate your input on this.
The FOAF wiki [1] lists many data source projects, but many of them are out
of date.

Here are my findings so far on candidate datasets:

1. The Billion Triple Challenge 2012 dataset [2], a web-crawled dataset
including data from the dbpedia, freebase, datahub, timbl and rest data
sources. In terms of size I think it has a sufficient amount of data
(1,436,545,545 quads), and it is fairly up to date.
2. The WebDataCommons project [3], which has a dataset (1,079,175,202 quads)
created in August 2012. However, the project does not specify the sources of
the data. I have posted on their group asking whether their dataset contains
FOAF data and am waiting for their reply.

3. DBpedia also has resources with FOAF properties; in particular, entities
of type 'dbpedia-ont:Person' contain FOAF properties. I think we could map
dbpedia-ont:Person to a FOAF profile here. WDYT?

4. Several websites, such as http://iwlearn.net/ and the Opera community,
expose their contact lists as FOAF, but AFAIK they don't contain data on
public figures or celebrities.
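The mapping suggested in item 3 could be sketched as follows. This is a minimal sketch under stated assumptions: triples are plain (subject, predicate, object) string tuples, every subject typed dbpedia-ont:Person is re-typed as foaf:Person, and only its existing foaf:* triples are kept. The function name to_foaf_profiles is hypothetical; a real mapping would probably also translate selected DBpedia ontology properties.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO_PERSON = "http://dbpedia.org/ontology/Person"
FOAF = "http://xmlns.com/foaf/0.1/"


def to_foaf_profiles(triples):
    """Build FOAF-only profiles for every dbpedia-ont:Person subject."""
    # Collect subjects typed as dbpedia-ont:Person.
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == DBO_PERSON}
    out = []
    # Emit a foaf:Person type triple for each person (sorted for stable order).
    for s in sorted(persons):
        out.append((s, RDF_TYPE, FOAF + "Person"))
    # Keep only the FOAF properties already attached to those persons.
    for s, p, o in triples:
        if s in persons and p.startswith(FOAF):
            out.append((s, p, o))
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(out))
```

Non-FOAF properties such as dbpedia-ont:birthPlace are dropped here, and subjects that are not typed dbpedia-ont:Person are ignored even if they carry FOAF properties.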

Could you please share your opinions on which dataset to use for my project?
I appreciate your help.

Thanks,
Dileepa

[1] http://www.w3.org/wiki/FoafSites
[2] http://km.aifb.kit.edu/projects/btc-2012/
[3] http://webdatacommons.org/
