Hi Dileepa,

On 26/06/13 08:23, Dileepa Jayakody wrote:
Hi All,

Below is the reply I got from Andreas Harth from the WebDataCommons project. He
suggests that the btc-2012 dataset I mentioned in my previous mail contains
sufficient FOAF data.
Shall I go ahead with that dataset for my project?
+1 from me. Because it contains real-world data from many domains, it should be easier for you to build a good evaluation dataset for your approach.

"the BTC 2012 has FOAF data [1].  You'd get a more comprehensive FOAF
dataset if you first get all instances of foaf:Persons (simple grep)
and then start a crawl from those, e.g., via LDSpider [2].  I assume
that a hop-1 crawl would already get you a sizable dataset.

All the best with your project, I look forward to seeing the results!

Best regards,
Andreas.

[1] http://km.aifb.kit.edu/projects/btc-2012/
[2] http://code.google.com/p/ldspider/
"
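
The "simple grep" step Andreas describes could be sketched roughly as below. This is only an illustration, assuming the dump is in N-Quads format with one statement per line; the function name person_subjects is made up for this example and is not part of BTC or LDSpider. Blank-node subjects are skipped because they cannot be used as crawl seeds.

```python
import re

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
FOAF_PERSON = "<http://xmlns.com/foaf/0.1/Person>"

# Matches the first three terms of a statement when predicate and object
# are IRIs; the subject may be an IRI or a blank node.
_STATEMENT = re.compile(r"^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(<[^>]*>)")


def person_subjects(lines):
    """Yield each distinct IRI subject that is typed as foaf:Person.

    Hypothetical helper for illustration; `lines` is any iterable of
    N-Quads (or N-Triples) statements, e.g. an open file.
    """
    seen = set()
    for line in lines:
        match = _STATEMENT.match(line)
        if not match:
            continue  # literal object, comment, or malformed line
        subject, predicate, obj = match.groups()
        if predicate != RDF_TYPE or obj != FOAF_PERSON:
            continue
        if not subject.startswith("<"):
            continue  # blank nodes cannot be dereferenced by a crawler
        iri = subject[1:-1]
        if iri not in seen:
            seen.add(iri)
            yield iri
```

The resulting IRIs could then be written one per line to a seed file for a crawler; how the seeds are handed to LDSpider is not shown here.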
Thanks,
Dileepa


On Tue, Jun 25, 2013 at 5:45 PM, Dileepa Jayakody <dileepajayak...@gmail.com
wrote:
Hi All,

For my project, FOAF co-reference based disambiguation, the first
milestone is to develop an EntityHub ReferencedSite for a FOAF dataset.
With help from Rupert and others I was able to index a sample FOAF dataset
using the genericrdf indexing tool and set up a referenced site. FOAF data
can be filtered by using propertyfilter.config to import foaf:*. This will
import all entities which define FOAF properties. The next step will be
to develop an EntityProcessor to further filter and clean the FOAF data by
defining the required FOAF properties that are going to be used for
disambiguation.
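
The filter-and-clean step described above could look roughly like the sketch below. It is written in Python only to show the logic and is not the Stanbol EntityProcessor API. The function name clean_entity and the choice of required and kept properties are assumptions made for this example; the thread does not fix which FOAF properties will be used.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Assumption: an illustrative selection, not one decided in this thread.
REQUIRED = {FOAF + "name"}
KEPT = REQUIRED | {FOAF + "homepage", FOAF + "mbox_sha1sum", FOAF + "knows"}


def clean_entity(properties):
    """Restrict an entity to the KEPT properties.

    `properties` maps a property IRI to a list of values. Returns the
    filtered mapping, or None if any REQUIRED property is missing or empty.
    """
    filtered = {
        prop: list(values)
        for prop, values in properties.items()
        if prop in KEPT and values
    }
    if not REQUIRED.issubset(filtered):
        return None  # entity is not usable for disambiguation
    return filtered
```

Entities for which the function returns None would simply be dropped from the index.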

To continue my project I would like to finalize the FOAF dataset I need to
use, and would highly appreciate your input on this.
In the foaf-wiki site [1] there are many datasource projects but many of
them are out of date.

Following are my findings for a dataset for my project:

1. The Billion Triple Challenge 2012 project [2], a web-crawled dataset
including data from dbpedia, freebase, datahub, timbl and rest datasources.
Quantity-wise I think this has a sufficient amount of data (1436545545 quads)
and it's fairly up to date.
2. The WebDataCommons project [3], which has a dataset (1079175202 quads)
created in August 2012. But the sources of the data are not specified in the
project. I have posted on their group asking if they have FOAF data in
their dataset, and am waiting for their suggestions.

3. DBpedia also has resources with FOAF properties. In particular, entities
of type 'dbpedia-ont:Person' contain FOAF properties. I think we can map
dbpedia-ont:Person to a FOAF profile here. WDYT?

4. There are several websites like http://iwlearn.net/ and opera-community
exposing their contact lists as FOAF, but AFAIK they don't contain data on
public figures or celebrities.
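
For option 3 above, one simple reading of "map dbpedia-ont:Person to a FOAF profile" is to give every dbpedia-ont:Person an additional foaf:Person type and keep its existing FOAF properties as they are. The sketch below assumes triples are already parsed into (subject, predicate, object) tuples of plain IRI strings; the function name add_foaf_person_type is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO_PERSON = "http://dbpedia.org/ontology/Person"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def add_foaf_person_type(triples):
    """Yield every input triple, plus a foaf:Person typing for each
    subject that is typed as dbpedia-ont:Person."""
    for subject, predicate, obj in triples:
        yield (subject, predicate, obj)
        if predicate == RDF_TYPE and obj == DBO_PERSON:
            yield (subject, RDF_TYPE, FOAF_PERSON)
```

With that in place, the same foaf:Person based filtering could be applied to DBpedia data and to crawled FOAF profiles alike.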

Can I please have your opinions on finalizing a dataset for my project?
Appreciate your help.

Thanks,
Dileepa

[1] http://www.w3.org/wiki/FoafSites
[2] http://km.aifb.kit.edu/projects/btc-2012/
[3] http://webdatacommons.org/


