Hi All,

Below is the reply I got from Andreas Harth of the WebDataCommons project. He confirms that the BTC 2012 dataset I mentioned in my previous mail contains FOAF data, and suggests extending it with a crawl seeded from its foaf:Person instances. Shall I go ahead with that dataset for my project?
"The BTC 2012 dataset has FOAF data [1]. You'd get a more comprehensive FOAF dataset if you first get all instances of foaf:Person (simple grep) and then start a crawl from those, e.g., via LDSpider [2]. I assume that a hop-1 crawl would already get you a sizable dataset.

All the best with your project, I look forward to seeing the results!

Best regards,
Andreas

[1] http://km.aifb.kit.edu/projects/btc-2012/
[2] http://code.google.com/p/ldspider/"

Thanks,
Dileepa

On Tue, Jun 25, 2013 at 5:45 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> Hi All,
>
> For my project, FOAF co-reference based disambiguation, I'm developing an
> EntityHub ReferencedSite for a FOAF dataset as the first milestone.
> With help from Rupert and others I was able to index a sample FOAF dataset
> using the genericrdf indexing tool and set up a ReferencedSite. FOAF data
> can be filtered by using propertyfilter.config to import foaf:*, which
> imports all entities that define FOAF properties. The next step is to
> develop an EntityProcessor that further filters and cleans the FOAF data
> by defining the FOAF properties required for disambiguation.
>
> To continue my project I would like to finalize the FOAF dataset I will
> use, and would highly appreciate your input on this. The FOAF wiki [1]
> lists many data source projects, but many of them are out of date.
>
> Following are my findings for a dataset for my project:
>
> 1. The Billion Triple Challenge 2012 project [2], a web-crawled dataset
> including data from the dbpedia, freebase, datahub, timbl and rest data
> sources. Quantity-wise I think it has a sufficient amount of data
> (1,436,545,545 quads), and it's fairly up to date.
>
> 2. The WebDataCommons project [3], which has a dataset (1,079,175,202
> quads) created in August 2012. However, the sources of the data are not
> specified by the project. I have posted on their group asking whether they
> have FOAF data in their dataset and am waiting for their suggestions.
>
> 3. DBpedia also has resources with FOAF properties; in particular,
> 'dbpedia-ont:Person' type entities contain FOAF properties. I think we can
> map dbpedia-ont:Person to a FOAF profile here. WDYT?
>
> 4. Several websites, such as http://iwlearn.net/ and the Opera community,
> expose their contact lists as FOAF, but AFAIK they don't contain data on
> public figures or celebrities.
>
> Can I please have your opinions on finalizing a dataset for my project?
> I appreciate your help.
>
> Thanks,
> Dileepa
>
> [1] http://www.w3.org/wiki/FoafSites
> [2] http://km.aifb.kit.edu/projects/btc-2012/
> [3] http://webdatacommons.org/
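Andreas's first step, collecting every foaf:Person instance from the BTC 2012 dump with a simple grep, can be approximated in Python as below. This is a sketch, assuming the dump has already been decompressed to N-Quads with one statement per line, and that only IRI subjects are useful crawl seeds (blank nodes are skipped because a crawler cannot dereference them). The function name `person_seeds` is hypothetical; the resulting URIs would be written out, one per line, as a seed list for a hop-1 crawl with a tool such as LDSpider, whose exact command-line options are not covered here.

```python
from typing import Iterable, Set

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
FOAF_PERSON = "<http://xmlns.com/foaf/0.1/Person>"


def person_seeds(lines: Iterable[str]) -> Set[str]:
    """Return the IRIs of all subjects typed foaf:Person in N-Quads lines.

    Equivalent in spirit to grepping for 'rdf:type foaf:Person' statements.
    """
    seeds: Set[str] = set()
    for line in lines:
        # IRIs and blank node labels contain no whitespace, so a whitespace
        # split isolates subject, predicate and an IRI object.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # blank or malformed line
        subj, pred, obj = parts[0], parts[1], parts[2]
        # Keep only IRI subjects ("<...>"); blank nodes ("_:x") are skipped.
        if pred == RDF_TYPE and obj == FOAF_PERSON and subj.startswith("<"):
            seeds.add(subj[1:-1])  # strip the angle brackets
    return seeds
```

For a gzip-compressed file, the lines could be supplied with `gzip.open(path, "rt")`. A set is used so that a person typed in several quads (e.g. in different named graphs) yields a single seed.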
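The two-stage filtering described in the quoted mail (import every entity that defines FOAF properties, then keep only the properties needed for disambiguation) can be illustrated with a small Python sketch. It is not the Stanbol indexing tool or its EntityProcessor interface: `import_foaf_entities`, `clean_entity` and `process` are hypothetical names, the input is assumed to be already-parsed (subject, predicate, object) tuples, and `REQUIRED_PROPS`, together with the rule of dropping entities that lack foaf:name, are placeholder choices rather than decisions made in the project.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

FOAF = "http://xmlns.com/foaf/0.1/"
Triple = Tuple[str, str, str]
Props = List[Tuple[str, str]]

# Hypothetical property whitelist for disambiguation (placeholder choice).
REQUIRED_PROPS: Set[str] = {
    FOAF + "name",
    FOAF + "mbox_sha1sum",
    FOAF + "homepage",
    FOAF + "knows",
}


def import_foaf_entities(triples: Iterable[Triple]) -> Dict[str, Props]:
    """Stage 1: keep only FOAF properties, grouped by subject.

    Subjects without any FOAF property never get an entry, so only entities
    that define FOAF properties are imported (approximating a foaf:* filter).
    """
    entities: Dict[str, Props] = defaultdict(list)
    for s, p, o in triples:
        if p.startswith(FOAF):
            entities[s].append((p, o))
    return dict(entities)


def clean_entity(props: Props, required: Set[str] = REQUIRED_PROPS) -> Optional[Props]:
    """Stage 2: restrict an entity to the required properties.

    Returns None (entity dropped) when no foaf:name survives, since a name
    is assumed here to be the minimum needed for disambiguation.
    """
    kept = [(p, o) for p, o in props if p in required]
    if not any(p == FOAF + "name" for p, _ in kept):
        return None
    return kept


def process(triples: Iterable[Triple]) -> Dict[str, Props]:
    """Run both stages and return the cleaned entities keyed by subject."""
    cleaned: Dict[str, Props] = {}
    for subject, props in import_foaf_entities(triples).items():
        result = clean_entity(props)
        if result is not None:
            cleaned[subject] = result
    return cleaned
```

Passing the whitelist as a parameter makes it easy to compare different FOAF property sets before settling on the EntityProcessor configuration.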
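The idea in point 3 of the quoted mail, mapping dbpedia-ont:Person entities to FOAF profiles, could start from something like the following Python sketch. As an illustration only, it assumes the mapping asserts foaf:Person for every resource typed http://dbpedia.org/ontology/Person and carries over the FOAF properties those resources already have, dropping everything else. The function name `dbpedia_persons_as_foaf` is hypothetical, and a real mapping would still need to decide which dbpedia-ont properties have FOAF counterparts.

```python
from typing import Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO_PERSON = "http://dbpedia.org/ontology/Person"
FOAF = "http://xmlns.com/foaf/0.1/"
Triple = Tuple[str, str, str]


def dbpedia_persons_as_foaf(triples: Iterable[Triple]) -> List[Triple]:
    """Turn dbpedia-ont:Person resources into minimal FOAF profiles."""
    triples = list(triples)  # iterated twice below
    # Every subject explicitly typed dbpedia-ont:Person.
    persons: Set[str] = {s for s, p, o in triples if p == RDF_TYPE and o == DBO_PERSON}
    out: List[Triple] = []
    # Assert foaf:Person for each person (sorted for deterministic output).
    for s in sorted(persons):
        out.append((s, RDF_TYPE, FOAF + "Person"))
    # Carry over only the FOAF properties those persons already have.
    for s, p, o in triples:
        if s in persons and p.startswith(FOAF):
            out.append((s, p, o))
    return out
```

Resources that merely use FOAF properties without the dbpedia-ont:Person type (for example places with foaf:name) are deliberately left out, so the output stays restricted to people.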