Hi All,

For my project, FOAF co-reference based disambiguation, the first milestone is to develop an EntityHub ReferencedSite for a FOAF dataset. With help from Rupert and others I was able to index a sample FOAF dataset using the genericrdf indexing tool and set up a referenced site. The FOAF data can be filtered by using propertyfilter.config to import foaf:*. This will import all entities that define FOAF properties. The next step will be to develop an EntityProcessor to further filter and clean the FOAF data by defining the required FOAF properties that will be used for disambiguation.
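
As a rough illustration of the two filtering steps described above (keep only foaf:* statements, then keep only entities that carry the required FOAF properties), here is a minimal Python sketch. It is not the Stanbol propertyfilter or EntityProcessor API; the function names and the choice of foaf:name as a required property are assumptions made for the example, and the line splitting assumes well-formed N-Triples or N-Quads input.

```python
from collections import defaultdict

FOAF_NS = "http://xmlns.com/foaf/0.1/"


def foaf_statements(lines):
    """Yield (subject, predicate, rest) for statements with a FOAF predicate.

    Hypothetical helper: assumes one N-Triples/N-Quads statement per line,
    so the subject and predicate are the first two whitespace-separated tokens.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip malformed lines
        subject, predicate, rest = parts
        if predicate.startswith("<" + FOAF_NS):
            yield subject, predicate, rest


def entities_with_required(lines, required):
    """Group FOAF statements by subject and keep only subjects that have
    every property in `required` (given as FOAF local names, e.g. "name")."""
    by_subject = defaultdict(list)
    for subject, predicate, rest in foaf_statements(lines):
        by_subject[subject].append((predicate, rest))
    required_iris = {"<" + FOAF_NS + name + ">" for name in required}
    return {
        subject: statements
        for subject, statements in by_subject.items()
        if required_iris <= {predicate for predicate, _ in statements}
    }
```

For example, calling entities_with_required(open("sample.nq"), ["name"]) on a hypothetical sample.nq file would return only the subjects that have a foaf:name, together with all of their FOAF statements.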

To continue my project I would like to finalize the FOAF dataset I need to use, and I would highly appreciate your input on this. The FOAF wiki site [1] lists many data source projects, but many of them are out of date. Here are the candidate datasets I have found so far:

1. The Billion Triple Challenge 2012 project [2], a web-crawled dataset including data from DBpedia, Freebase, Datahub, timbl and other data sources. Quantity-wise I think it has a sufficient amount of data (1436545545 quads), and it is fairly up to date.

2. The WebDataCommons project [3], which has a dataset (1079175202 quads) created in August 2012. However, the sources of the data are not specified by the project. I have posted on their group asking whether the dataset contains FOAF data and am waiting for their reply.

3. DBpedia also has resources with FOAF properties. In particular, entities of type 'dbpedia-ont:Person' contain FOAF properties. I think we can map dbpedia-ont:Person to a FOAF profile here. WDYT?

4. There are several websites, such as http://iwlearn.net/ and opera-community, that expose their contact lists as FOAF, but AFAIK they don't contain data on public figures or celebrities.

Can I please have your opinions on finalizing a dataset for my project? I appreciate your help.

Thanks,
Dileepa

[1] http://www.w3.org/wiki/FoafSites
[2] http://km.aifb.kit.edu/projects/btc-2012/
[3] http://webdatacommons.org/