Hi All,

After successfully deploying the foaf-site, I configured an enhancement
chain that performs content enhancement using the foaf-site. Most of the
configuration was done via the OSGi console configuration manager of
Apache Stanbol, accessible at [1].

I re-ran the indexing tool with some changes to mappings.txt so that
foaf:name, foaf:firstName, foaf:givenName, etc. are used as labels when
identifying and linking entities in the content. I also wanted to use both
rdfs:seeAlso and owl:sameAs as redirect fields, so I mapped both of them to
fise:redirects and used that as the redirect field in the linking engine
configuration explained below.

The following are the extra entries I added to mappings.txt in the
indexing tool:
rdfs:seeAlso > fise:redirects
owl:sameAs > fise:redirects

foaf:name > rdfs:label
foaf:nick > rdfs:label
foaf:givenName > rdfs:label
foaf:familyName > rdfs:label
foaf:firstName > rdfs:label
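To illustrate what these entries are meant to achieve, here is a small
Python sketch. It assumes that a line "source > target" in mappings.txt
copies the values of the source property into the target field of the
indexed entity, so the various FOAF name properties end up as rdfs:label
values and both redirect properties end up in fise:redirects. The function
apply_mappings and the sample entity are hypothetical; this is not the
indexing tool's own implementation.

```python
from collections import defaultdict

# Hypothetical illustration: assumes "source > target" copies the values of
# `source` into `target`. This is not the Stanbol indexing tool's code.
MAPPINGS = [
    ("rdfs:seeAlso", "fise:redirects"),
    ("owl:sameAs", "fise:redirects"),
    ("foaf:name", "rdfs:label"),
    ("foaf:nick", "rdfs:label"),
    ("foaf:givenName", "rdfs:label"),
    ("foaf:familyName", "rdfs:label"),
    ("foaf:firstName", "rdfs:label"),
]


def apply_mappings(entity: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the target fields populated from the entity's source properties."""
    mapped: dict[str, list[str]] = defaultdict(list)
    for source, target in MAPPINGS:
        for value in entity.get(source, []):
            if value not in mapped[target]:  # skip duplicate values per field
                mapped[target].append(value)
    return dict(mapped)


person = {
    "foaf:name": ["Tim Berners-Lee"],
    "foaf:givenName": ["Tim"],
    "owl:sameAs": ["http://example.org/people/tbl"],
}
print(apply_mappings(person))
# {'fise:redirects': ['http://example.org/people/tbl'], 'rdfs:label': ['Tim Berners-Lee', 'Tim']}
```

Several source properties collapse into one target field, which is why a
single field (rdfs:label or fise:redirects) can be configured in the
linking engine.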


The following are the enhancement engine configurations I made:

1. Configure a new entityhub-linking-engine [2]:
Name : foaf-site-linking
Referenced site : foaf-site
Redirect field : fise:redirects
Case sensitivity : disabled

2. Configure a weighted enhancement chain [3]:
Name : foaf-site-chain
Engines : langdetect, opennlp-sentence, opennlp-token, opennlp-pos,
foaf-site-linking

3. Now you can invoke the new foaf-site-chain by going to:
http://localhost:8080/enhancer/chain/foaf-site-chain
and entering test content such as: "Tim Berners-Lee is the inventor of
the World Wide Web"
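The same chain can also be called over HTTP instead of through the web
form. Below is a minimal Python sketch using only the standard library. It
assumes that the enhancer endpoint accepts a POST with a text/plain body
and returns the enhancements in the RDF format requested via the Accept
header (Turtle here). The helper names are hypothetical, and a Stanbol
server must be running on localhost:8080 for enhance() to succeed.

```python
import urllib.request

ENHANCER_URL = "http://localhost:8080/enhancer/chain/foaf-site-chain"


def build_enhancement_request(text: str, url: str = ENHANCER_URL) -> urllib.request.Request:
    """Prepare a POST request that sends plain text to an enhancement chain."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "text/turtle",  # assumption: request the results as Turtle
        },
        method="POST",
    )


def enhance(text: str) -> str:
    """Send the text and return the raw RDF response (needs a running server)."""
    request = build_enhancement_request(text)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

For example, print(enhance("Tim Berners-Lee is the inventor of the World
Wide Web")) prints the raw RDF returned by the chain, which should contain
the entity annotations shown in the screenshot.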

The following screenshot shows the entities identified from my foaf-site
dataset: Tim Berners-Lee and World Wide Web.
[image: Inline image 1]


Thanks,
Dileepa

[1] http://localhost:8080/system/console/configMgr
[2] https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
[3] http://stanbol.apache.org/docs/trunk/components/enhancer/chains/weightedchain.html


On Mon, Jul 8, 2013 at 2:24 AM, Dileepa Jayakody
<dileepajayak...@gmail.com> wrote:

>
>
>
> On Mon, Jul 8, 2013 at 2:19 AM, Dileepa Jayakody <
> dileepajayak...@gmail.com> wrote:
>
>> Hi All,
>>
>> I continued with the btc2012 dataset to create a foaf-site for Stanbol, as
>> per your suggestions. Thanks to everyone for your input. @Andreas, I have
>> updated the foaf-wiki page as you suggested by removing the obsolete links
>> to foaf data-source projects :)
>>
>> btc2012 contains data from 5 main sources: datahub, dbpedia, freebase,
>> rest and timbl.
>> Since Stanbol already has the dbpedia and freebase datasets integrated, I
>> used only the datahub and timbl datasets to create a foaf-site.
>> I used the datahub/data-3.nq.gz
>> <http://km.aifb.kit.edu/projects/btc-2012/datahub/data-3.nq.gz> and
>> timbl/data-6.nq.gz
>> <http://km.aifb.kit.edu/projects/btc-2012/timbl/data-6.nq.gz> datasets,
>> each about 1 GB in size.
>>
>> For creating and indexing the foaf-site, I used the generic-rdf indexing
>> tool [1].
>> The following is the process I used to create a foaf-site for Stanbol from
>> the btc2012 dataset.
>>
>> *Steps*
>>
>> 1. Build the generic-rdf indexing tool using *mvn clean install*.
>>
>> 2. Initialize the tool with the following command:
>> *java -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar init*
>> This command creates the directories the indexing tool uses during the
>> indexing process.
>>
>> 3. Configure the tool to filter foaf entities.
>> ${indexingToolDir}/indexing/config is the main configuration directory of
>> the tool.
>> 3.1. To filter entities that define properties in the foaf namespace,
>> configure the following entry in indexing.properties:
>>
>> *entityDataIterable=org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource,config:indexingsource,bnode:true*
>> (Note that the additional bnode:true parameter above enables processing of
>> blank nodes in the dataset.)
>>
>> The above entityDataIterable configuration requires 2 additional
>> configuration files: indexingsource.properties and propertyfilter.config.
>> These files are not included in the generic-rdf indexing tool by default.
>> You can reuse the 2 files from the freebase indexing tool at [2] for
>> filtering. Copy them into ${indexingToolDir}/indexing/config and add the
>> following entry to propertyfilter.config:
>> foaf:*
>> This entry instructs the tool to filter for entities that define at least
>> one property in the foaf namespace.
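To make the intent of the foaf:* entry concrete, here is a small Python
illustration of prefix-wildcard filtering over (subject, property, value)
triples. It assumes that a pattern ending in * matches every property with
that prefix and that any other pattern must match exactly. It is not the
filter class shipped with the freebase or generic-rdf tools (which may
work on individual triples during import), and all names are hypothetical.

```python
# Illustrative only: a prefix-wildcard property filter over
# (subject, property, value) triples; not Stanbol's filter implementation.
Triple = tuple[str, str, str]


def matches(prop: str, pattern: str) -> bool:
    """True if a prefixed property such as 'foaf:name' matches the pattern."""
    if pattern.endswith("*"):
        return prop.startswith(pattern[:-1])  # 'foaf:*' matches any 'foaf:...'
    return prop == pattern


def filter_triples(triples: list[Triple], patterns: list[str]) -> list[Triple]:
    """Keep only the triples whose property matches one of the patterns."""
    return [t for t in triples if any(matches(t[1], p) for p in patterns)]


def matching_entities(triples: list[Triple], patterns: list[str]) -> set[str]:
    """Subjects that define at least one matching property."""
    return {subject for subject, _, _ in filter_triples(triples, patterns)}


data = [
    ("ex:tbl", "foaf:name", "Tim Berners-Lee"),
    ("ex:tbl", "rdf:type", "foaf:Person"),
    ("ex:paper", "dc:title", "An example document"),
]
print(matching_entities(data, ["foaf:*"]))  # {'ex:tbl'}
```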
>>
>> 3.2. Configure the FieldValueFilter to index only entities of type
>> foaf:Person and foaf:Organization by activating the 'values' entry as
>> follows:
>> *values=foaf:Person;foaf:Organization*
>>
>> 3.3. Check that the above entity filter (FieldValueFilter) is enabled in
>> indexing.properties by looking for the following entry:
>> *entityProcessor=org.apache.stanbol.entityhub.indexing.core.processor.FieldValueFilter,config:entityTypes;*
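Together, 3.2 and 3.3 mean that only entities whose rdf:type is one of the
configured values get indexed. Below is a minimal Python sketch of that
check. It assumes 'values' is a semicolon-separated list and that an
entity passes if any of its types is listed; parse_values and
passes_type_filter are hypothetical helpers, not the FieldValueFilter code.

```python
def parse_values(config_value: str) -> set[str]:
    """Split a 'values=' setting such as 'foaf:Person;foaf:Organization'."""
    return {v.strip() for v in config_value.split(";") if v.strip()}


def passes_type_filter(entity_types: list[str], allowed: set[str]) -> bool:
    """True if at least one of the entity's rdf:type values is allowed."""
    return any(t in allowed for t in entity_types)


allowed_types = parse_values("foaf:Person;foaf:Organization")
print(passes_type_filter(["foaf:Person", "foaf:Agent"], allowed_types))  # True
print(passes_type_filter(["foaf:Document"], allowed_types))  # False
```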
>>
>> 4. Change the 'name' value in indexing.properties to a suitable new site
>> name (e.g. foaf-site) and run the indexing tool using the following command:
>> *java -Xmx1024m -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar index*
>>
> Don't forget to copy the n-quad data files downloaded from btc2012 into the
> ${indexingToolDir}/indexing/resources/rdfdata directory before running the
> indexing command :)
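Copying the dumps and starting the indexer can be scripted. This Python
sketch assumes the downloaded files sit in one directory and match
*.nq.gz (as the two btc2012 files above do), and that the generic-rdf jar
has been placed in ${indexingToolDir} so the tool runs from that directory.
The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Jar name from steps 2 and 4; assumed to be located in the tool directory.
JAR = "org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar"


def stage_rdf_data(download_dir: Path, indexing_tool_dir: Path) -> list[Path]:
    """Copy *.nq.gz dumps into indexing/resources/rdfdata; return the copies."""
    target = indexing_tool_dir / "indexing" / "resources" / "rdfdata"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for dump in sorted(download_dir.glob("*.nq.gz")):
        copied.append(Path(shutil.copy2(dump, target / dump.name)))
    return copied


def index_command(heap: str = "1024m") -> list[str]:
    """Command line for step 4, with the JVM heap size passed via -Xmx."""
    return ["java", f"-Xmx{heap}", "-jar", JAR, "index"]


def run_indexing(indexing_tool_dir: Path) -> None:
    """Run the indexer with the tool directory as working directory."""
    subprocess.run(index_command(), cwd=indexing_tool_dir, check=True)
```

Staging first and then calling run_indexing() avoids starting a
long-running index job with an empty rdfdata directory.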
>
>
>> 5. This runs the entity import and indexing process and creates 2 files in
>> the ${indexingToolDir}/indexing/dist directory.
>> Copy the generated org.apache.stanbol.data.site.foaf-site-1.0.0.jar to the
>> ${stanbol}/fileinstall directory.
>> Copy the generated foaf-site.solrindex.zip to the ${stanbol}/datafiles
>> directory.
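Step 5 can be scripted in the same way. This sketch assumes the artifact
names follow the pattern shown above for foaf-site (including the 1.0.0
version) and that stanbol_home is the directory referred to as ${stanbol};
deploy_site is a hypothetical helper.

```python
import shutil
from pathlib import Path


def deploy_site(indexing_tool_dir: Path, stanbol_home: Path,
                site: str = "foaf-site") -> list[Path]:
    """Copy the bundle jar and Solr index archive from indexing/dist into Stanbol."""
    dist = indexing_tool_dir / "indexing" / "dist"
    targets = [
        (dist / f"org.apache.stanbol.data.site.{site}-1.0.0.jar", stanbol_home / "fileinstall"),
        (dist / f"{site}.solrindex.zip", stanbol_home / "datafiles"),
    ]
    # Check both artifacts first so a failed indexing run is not half-deployed.
    for artifact, _ in targets:
        if not artifact.is_file():
            raise FileNotFoundError(f"missing indexing output: {artifact}")
    deployed = []
    for artifact, dest_dir in targets:
        dest_dir.mkdir(parents=True, exist_ok=True)
        deployed.append(Path(shutil.copy2(artifact, dest_dir / artifact.name)))
    return deployed
```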
>>
>> 6. Launch the Stanbol server using the full launcher and access the
>> foaf-site at http://localhost:8080/entityhub/site/foaf-site
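To check from a script that the site answers label queries, here is a
minimal Python sketch. It assumes the site exposes a /find resource that
accepts a form-encoded 'name' (label) parameter plus an optional 'limit',
and returns JSON when asked via the Accept header; treat the endpoint and
parameters as assumptions to verify against the Entityhub documentation.
build_find_request and find are hypothetical helpers.

```python
import urllib.parse
import urllib.request

SITE_URL = "http://localhost:8080/entityhub/site/foaf-site"


def build_find_request(name: str, limit: int = 10,
                       site_url: str = SITE_URL) -> urllib.request.Request:
    """Prepare a label search against the referenced site (assumed /find resource)."""
    form = urllib.parse.urlencode({"name": name, "limit": limit}).encode("ascii")
    return urllib.request.Request(
        f"{site_url}/find",
        data=form,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def find(name: str) -> str:
    """Run the search and return the raw response body (needs a running server)."""
    with urllib.request.urlopen(build_find_request(name), timeout=30) as response:
        return response.read().decode("utf-8")
```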
>>
>> With this, I have completed the first milestone I had planned for my
>> project.
>> The next task is to identify and define the set of foaf properties that
>> will be used as keys in the disambiguation algorithm. This task also
>> includes developing an EntityProcessor that filters foaf entities further,
>> keeping only the entities that have the disambiguation properties
>> identified above.
>>
>> Your thoughts and opinions on how to move forward are highly appreciated.
>>
>> Thanks,
>> Dileepa
>>
>> [1] https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/genericrdf
>> [2] https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/freebase
>>
>> On Thu, Jun 27, 2013 at 11:00 AM, Andreas Kuckartz <a.kucka...@ping.de> wrote:
>>
>>> Dileepa Jayakody:
>>> > In the foaf-wiki site [1] there are many datasource projects but many
>>> > of them are out of date.
>>>
>>> If possible please take a few minutes to update that Wiki page.
>>>
>>> > Can I please have your opinions on finalizing a dataset for my
>>> > project?
>>>
>>> The main criteria in my opinion should be:
>>> - how much effort is necessary?
>>> - how much data can be expected regarding "co-reference"?
>>>
>>> That being said, I think that the btc dataset would be a good choice. It
>>> was created to be used in projects such as yours.
>>>
>>> Cheers,
>>> Andreas
>>>