Hi all

I also agree with the dataset selection. I also had a short
conversation with Dileepa about this in the #stanbol channel.

FYI: As Dileepa can currently only run the import on his laptop, we
will need to limit the size of the dataset to about 10 million RDF
triples. While this looks really low compared to the size of the
referenced datasets (both above 1 billion triples), it is not
necessarily a problem, as the Stanbol Indexing Tool allows filtering
triples even before they are imported into the triple store. E.g. it
is possible to define a filter that will only import FOAF namespace
related triples.
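
To illustrate the idea only: below is a minimal Python sketch of such a
pre-import filter over N-Triples/N-Quads lines. It is not how the Stanbol
Indexing Tool implements its filter; the function name, the `limit`
parameter and the sample data are made up for this example.

```python
from typing import Iterable, Iterator, Optional

# FOAF namespace IRI
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def filter_by_predicate_namespace(
    lines: Iterable[str],
    namespace: str = FOAF_NS,
    limit: Optional[int] = None,
) -> Iterator[str]:
    """Yield only statements whose predicate IRI starts with `namespace`.

    `limit` (hypothetical parameter) caps the number of statements kept,
    e.g. to stay below what a laptop can index.
    """
    kept = 0
    for line in lines:
        if limit is not None and kept >= limit:
            return
        stripped = line.strip()
        # Skip blank lines and comments
        if not stripped or stripped.startswith("#"):
            continue
        # Subject and predicate contain no whitespace in N-Triples/N-Quads,
        # so the second whitespace-separated token is the predicate.
        parts = stripped.split(None, 2)
        if len(parts) < 3:
            continue
        if parts[1].startswith("<" + namespace):
            kept += 1
            yield stripped


# Made-up sample statements (N-Quads style)
sample = [
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" <http://example.org/g> .',
    '<http://example.org/a> <http://purl.org/dc/terms/created> "2012-08-01" <http://example.org/g> .',
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> <http://example.org/g> .',
]

for statement in filter_by_predicate_namespace(sample):
    print(statement)
```

Note that this predicate-only check does not keep rdf:type statements
pointing to foaf:Person; a real configuration would need to decide
whether those are wanted as well.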

With that it will be possible to create a good dataset for the initial
phase(s) of the GSoC project. At a later point in time we can create a
much bigger index by running the same configuration on a server
machine.

best
Rupert


On Wed, Jun 26, 2013 at 2:02 PM, Stéphane Corlosquet
<scorlosq...@gmail.com> wrote:
> btc-2012 is definitely a good idea, you should start with it.
>
> If you have time, you might want to also extract foaf:Person and
> schema:Person URIs from the more recent Web Data Commons (WDC) from August
> 2012 [1], and use them as seed sets for crawling more FOAF data (you might
> have to align the schema.org vocabulary to FOAF, I think stanbol allows
> such functionality out of the box).
>
> Steph.
>
> [1]
>
> On Jun 26, 2013 2:24 AM, "Dileepa Jayakody" <dileepajayak...@gmail.com>
> wrote:
>>
>> Hi All,
>>
>> Below is the reply I got from Andreas Harth from webdatacommons project.
>> He suggests that the btc-2012 dataset I mentioned in my previous mail has
>> a sufficient FOAF dataset.
>> Shall I go ahead with that dataset for my project?
>>
>> "the BTC 2012 has FOAF data [1].  You'd get a more comprehensive FOAF
>> dataset if you first get all instances of foaf:Persons (simple grep)
>> and then start a crawl from those, e.g., via LDSpider [2].  I assume
>> that a hop-1 crawl would already get you a sizable dataset.
>>
>> All the best with your project, I look forward to seeing the results!
>>
>> Best regards,
>> Andreas.
>>
>> [1] http://km.aifb.kit.edu/projects/btc-2012/
>> [2] http://code.google.com/p/ldspider/
>> "
>> Thanks,
>> Dileepa
>>
>>
>> On Tue, Jun 25, 2013 at 5:45 PM, Dileepa Jayakody
>> <dileepajayak...@gmail.com> wrote:
>>
>> > Hi All,
>> >
>> > For my project: FOAF co-reference based disambiguation, as the first
>> > milestone I'm developing an EntityHub ReferencedSite for a foaf data-set.
>> > With help from Rupert and others I was able to index a sample foaf dataset
>> > using the genericrdf indexing tool and set up a referenced-site. foaf-data
>> > can be filtered by using propertyfilter.config to import foaf:*. This will
>> > import all entities which define foaf properties. The next step will be
>> > to develop an EntityProcessor to further filter and clean the foaf data by
>> > defining the required foaf properties that are going to be used for
>> > disambiguation purposes.
>> >
>> > To continue my project I would like to finalize the FOAF dataset I need to
>> > use, and highly appreciate your input on this.
>> > In the foaf-wiki site [1] there are many datasource projects but many of
>> > them are out of date.
>> >
>> > Following are my findings for a dataset for my project:
>> >
>> > 1. The billion-triple challenge 2012 project [2], a web-crawled dataset
>> > including data from dbpedia, freebase, datahub, timbl, rest datasources.
>> > Quantity wise I think this has a sufficient amount (1436545545 quads) of
>> > data and it's fairly up to date.
>> > 2. WebDataCommons project [3] which has a dataset (1079175202 quads)
>> > created in August 2012. But the sources of the data are not specified in
>> > the project. I have posted on their group asking if they have foaf data in
>> > their dataset, waiting for their suggestions on it.
>> >
>> > 3. DBpedia also has resources having foaf properties. Especially
>> > 'dbpedia-ont:Person' type entities contain foaf properties. I think we can
>> > map dbpedia-ont:Person to a FOAF profile here. WDYT?
>> >
>> > 4. There are several websites like http://iwlearn.net/, opera-community
>> > exposing their contact list as FOAF, but they don't contain data on public
>> > figures, celebrities AFAIK.
>> >
>> > Can I please have your opinions on finalizing a dataset for my project?
>> > Appreciate your help.
>> >
>> > Thanks,
>> > Dileepa
>> >
>> > [1] http://www.w3.org/wiki/FoafSites
>> > [2] http://km.aifb.kit.edu/projects/btc-2012/
>> > [3] http://webdatacommons.org/
>> >



--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
