On Mon, Jul 8, 2013 at 2:19 AM, Dileepa Jayakody
<dileepajayak...@gmail.com> wrote:

> Hi All,
>
> Following your suggestions, I went ahead with the btc2012 dataset to create
> a foaf-site for Stanbol.
> Thanks to everyone for the feedback. @Andreas, I have updated the
> foaf-wiki page as you suggested by removing the obsolete links to
> foaf data-source projects :)
>
> btc2012 contains data from 5 main sources: datahub, dbpedia, freebase,
> rest and timbl.
> Since Stanbol already has the dbpedia and freebase datasets integrated, I used
> only the datahub and timbl datasets to create a foaf-site.
> I used the following two datasets, each about 1GB in size:
> datahub/data-3.nq.gz <http://km.aifb.kit.edu/projects/btc-2012/datahub/data-3.nq.gz>
> timbl/data-6.nq.gz <http://km.aifb.kit.edu/projects/btc-2012/timbl/data-6.nq.gz>
>
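
Fetching the two chunks named above could look like the following. This is a minimal sketch and not part of the original message: the helper name fetch_btc_chunks and the target directory are hypothetical, and it assumes curl is installed and the URLs are still being served.

```shell
#!/bin/sh
# The two btc2012 chunks named in the message above, one URL per line.
BTC_URLS="http://km.aifb.kit.edu/projects/btc-2012/datahub/data-3.nq.gz
http://km.aifb.kit.edu/projects/btc-2012/timbl/data-6.nq.gz"

# Hypothetical helper: download every chunk into the directory given as $1.
fetch_btc_chunks() {
    dest=$1
    mkdir -p "$dest" || return 1
    for url in $BTC_URLS; do
        # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
        (cd "$dest" && curl -fLO "$url") || return 1
    done
}
```

For example, `fetch_btc_chunks "$HOME/btc2012"` would leave data-3.nq.gz and data-6.nq.gz in that directory.
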
> For the foaf-site creation and indexing process, I used the generic-rdf
> indexing tool [1].
> The process I used to create a foaf-site for Stanbol from the
> btc2012 dataset is as follows.
>
> *Steps*
>
> 1. Build the generic-rdf indexing tool using *mvn clean install*.
>
> 2. Initialize the tool with the command below:
> *java -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar init*
> This initialization command creates the directories the indexing tool
> uses during the indexing process.
>
> 3. Configure the tool to filter foaf entities.
> ${indexingToolDir}/indexing/config is the main configuration directory of
> the tool.
> 3.1. To filter entities which define foaf properties, configure the entry
> below in indexing.properties:
>
> *entityDataIterable=org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource,config:indexingsource,bnode:true*
> (Please note that the additional bnode:true parameter above is enabled so
> that blank nodes in the dataset are processed.)
>
> The entityDataIterable configuration above requires 2 additional configuration
> files: indexingsource.properties and propertyfilter.config. These files
> are not included in the generic-rdf indexing tool by default.
> You can use the 2 files from the freebase indexing tool at [2] for
> filtering. Copy the 2 files into ${indexingToolDir}/indexing/config
> and add the entry below to propertyfilter.config:
>
> foaf:*
>
> The entry above instructs the tool to filter entities which define some
> property in the foaf namespace.
>
> 3.2. Configure the FieldValueFilter to index only entities of type
> foaf:Person and foaf:Organization by setting 'values' as below:
> *values=foaf:Person;foaf:Organization*
>
> 3.3. Check that the entity filtering above (FieldValueFilter) is enabled in
> indexing.properties by searching for the entry below:
> *entityProcessor=org.apache.stanbol.entityhub.indexing.core.processor.FieldValueFilter,config:entityTypes;*
>
> 4. Change the 'name' value in indexing.properties to a suitable new site
> name (e.g. foaf-site) and run the indexing tool using the command below:
> *java -Xmx1024m -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar index*
>
> Don't forget to copy the N-Quads data files downloaded from btc2012 to the
> ${indexingToolDir}/indexing/resources/rdfdata directory before running the
> index command :)
>
> 5. The command above runs the entity import and indexing process and
> creates 2 files in the ${indexingToolDir}/indexing/dist directory.
> Copy the generated org.apache.stanbol.data.site.foaf-site-1.0.0.jar to the
> ${stanbol}/fileinstall directory.
> Copy the generated foaf-site.solrindex.zip to the ${stanbol}/datafiles
> directory.
>
> 6. Launch the Stanbol server using the full launcher and access the
> foaf-site at: localhost:8080/entityhub/site/foaf-site
>
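
The steps above can be sketched as one script. This is a minimal sketch and not part of the original message. The variables TOOL_DIR, DATA_DIR and STANBOL_HOME, the helper names, and the DRY_RUN switch are all hypothetical, and it assumes the built jar has been copied into TOOL_DIR. The configuration edits from step 3 are still done by hand; check_config only greps for the entries quoted in steps 3.1 and 3.3.

```shell
#!/bin/sh
# Assumed layout (hypothetical variables, adjust to your setup):
#   TOOL_DIR      working directory holding the built indexing-tool jar
#   DATA_DIR      directory with the downloaded btc2012 .nq.gz files
#   STANBOL_HOME  working directory of the Stanbol launcher
TOOL_DIR=${TOOL_DIR:-$HOME/foaf-indexing}
DATA_DIR=${DATA_DIR:-$HOME/btc2012}
STANBOL_HOME=${STANBOL_HOME:-$HOME/stanbol}
JAR=org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar
SITE=foaf-site   # must match the 'name' value set in indexing.properties

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2: create the indexing/ directory tree (runs in a subshell).
init_tool() (
    run cd "$TOOL_DIR" && run java -jar "$JAR" init
)

# Step 3.3: confirm the entries from step 3 are present before indexing.
check_config() {
    cfg=$TOOL_DIR/indexing/config
    grep -q '^entityDataIterable=.*bnode:true' "$cfg/indexing.properties" &&
    grep -q '^entityProcessor=.*FieldValueFilter,config:entityTypes' "$cfg/indexing.properties" &&
    grep -q '^foaf:\*' "$cfg/propertyfilter.config"
}

# Step 4: stage the N-Quads dumps, then run the indexer.
index_site() (
    run cd "$TOOL_DIR" &&
    run cp "$DATA_DIR"/*.nq.gz indexing/resources/rdfdata/ &&
    run java -Xmx1024m -jar "$JAR" index
)

# Step 5: hand the two generated artifacts to Stanbol.
deploy_site() {
    run cp "$TOOL_DIR/indexing/dist/org.apache.stanbol.data.site.$SITE-1.0.0.jar" "$STANBOL_HOME/fileinstall/" &&
    run cp "$TOOL_DIR/indexing/dist/$SITE.solrindex.zip" "$STANBOL_HOME/datafiles/"
}
```

A cautious first pass is `DRY_RUN=1; init_tool; index_site; deploy_site`, which only prints the commands. For a real run, call init_tool, edit the configuration, then run `check_config && index_site && deploy_site`.
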
> With this I have completed the first milestone I had in mind for my
> project.
> The next task is to identify and define the set of foaf properties that
> will be used as keys in the disambiguation algorithm. This task also
> includes developing an EntityProcessor that filters foaf entities further,
> keeping only the entities which have the disambiguation properties
> identified above.
>
> Your thoughts and opinions on moving forward are highly appreciated.
>
> Thanks,
> Dileepa
>
> [1]
> https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/genericrdf
> [2]
> https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/freebase
>
> On Thu, Jun 27, 2013 at 11:00 AM, Andreas Kuckartz <a.kucka...@ping.de> wrote:
>
>> Dileepa Jayakody:
>> > In the foaf-wiki site [1] there are many datasource projects but many
>> > of them are out of date.
>>
>> If possible please take a few minutes to update that Wiki page.
>>
>> > Can I please have your opinions on finalizing a dataset for my
>> > project?
>>
>> The main criteria in my opinion should be:
>> - how much effort is necessary ?
>> - how much data can be expected regarding "co-reference" ?
>>
>> That being said, I think that the btc dataset would be a good choice. It
>> was created to be used in projects such as yours.
>>
>> Cheers,
>> Andreas
>>
>
>
