Hi All,

After successfully deploying the foaf-site, I configured an enhancement chain to perform content enhancements using my foaf-site. Most of these configurations were done via the OSGi console configuration manager of Apache Stanbol, accessible at [1].

I reran the indexing tool with some changes in mappings.txt so that foaf:name, firstName, givenName etc. are used as labels when identifying and linking entities in the content. I also wanted to use both rdfs:seeAlso and owl:sameAs as redirect fields, so I mapped both of them to fise:redirects and used that as the redirect field in the linking engine configuration explained below.

Following are the extra configurations I added to mappings.txt in the indexing tool:

rdfs:seeAlso > fise:redirects
owl:sameAs > fise:redirects
foaf:name > rdfs:label
foaf:nick > rdfs:label
foaf:givenName > rdfs:label
foaf:familyName > rdfs:label
foaf:firstName > rdfs:label

Following are the enhancement engine configurations I did.

1. Configure a new entityhub-linking-engine [2]:
   Name : foaf-site-linking
   Referenced site : foaf-site
   Redirect field : fise:redirects
   Case sensitivity : disabled

2. Configure a weighted enhancement chain [3]:
   Name : foaf-site-chain
   Engines : langdetect, opennlp-sentence, opennlp-token, opennlp-pos, foaf-site-linking

3. Now you can invoke the new foaf-site-chain by going to
   http://localhost:8080/enhancer/chain/foaf-site-chain
   and giving a test content like:
   "Tim Berners Lee is the inventor of World Wide Web"

Following is a screenshot of the identified entities, Tim Berners Lee and World Wide Web, from my foaf-site dataset.

[image: Inline image 1]

Thanks,
Dileepa

[1] http://localhost:8080/system/console/configMgr
[2] https://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
[3] http://stanbol.apache.org/docs/trunk/components/enhancer/chains/weightedchain.html

On Mon, Jul 8, 2013 at 2:24 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:

> On Mon, Jul 8, 2013 at 2:19 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
>
>> Hi All,
>>
>> I continued with the btc2012 dataset to create a foaf-site for Stanbol as
>> per your opinions.
>> Thanks to all for providing me your opinions.
>> @Andreas I have updated the foaf-wiki page as you suggested by removing
>> obsolete links to foaf data-source projects :)
>>
>> btc2012 contains data from 5 main sources: datahub, dbpedia, freebase,
>> rest and timbl.
>> Since Stanbol already has dbpedia and freebase datasets integrated, I used
>> only the datahub and timbl datasets to create a foaf-site.
>> I used the datahub/data-3.nq.gz
>> <http://km.aifb.kit.edu/projects/btc-2012/datahub/data-3.nq.gz> and
>> timbl/data-6.nq.gz
>> <http://km.aifb.kit.edu/projects/btc-2012/timbl/data-6.nq.gz>
>> datasets, both of size ~1GB.
>>
>> For the foaf-site creation and indexing process, I used the generic-rdf
>> indexing tool [1].
>> Following is the process I used to create a foaf-site for Stanbol using
>> the btc2012 dataset.
>>
>> *Steps*
>>
>> 1. Build the generic-rdf indexing tool using *mvn clean install*.
>>
>> 2. Initialize the tool with the command below:
>> *java -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar init*
>> The initialization command creates the directories that the indexing tool
>> uses during the indexing process.
>>
>> 3. Configure the tool to filter foaf entities.
>> ${indexingToolDir}/indexing/config is the main configuration directory of
>> the tool.
>> 3.1. To filter entities which define foaf properties, configure the entry
>> below in indexing.properties:
>>
>> *entityDataIterable=org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource,config:indexingsource,bnode:true*
>> (Please note that the additional bnode:true parameter above is activated to
>> process blank nodes in the dataset.)
>>
>> The above entityDataIterable configuration requires 2 additional
>> configuration files: indexingsource.properties and propertyfilter.config.
>> These files are not included in the generic-rdf indexing tool by default.
>> You can use the 2 files used in the freebase indexing tool at [2] for
>> filtering purposes.
>> Copy the 2 files into ${indexingToolDir}/indexing/config
>> and add the entry below to propertyfilter.config:
>>
>> *foaf:**
>> The above entry instructs the tool to filter entities which define some
>> property in the foaf namespace.
>>
>> 3.2. Configure the FieldValueFilter to index only foaf:Person and
>> foaf:Organization type entities by activating 'values' as below.
>> *values=foaf:Person;foaf:Organization*
>>
>> 3.3. Check that the above entity filtering (FieldValueFilter) is enabled in
>> indexing.properties by searching for the entry below.
>> *entityProcessor=org.apache.stanbol.entityhub.indexing.core.processor.FieldValueFilter,config:entityTypes;*
>>
>> 4. Change the 'name' value in indexing.properties to a suitable new Site
>> name (e.g. foaf-site) and run the indexing tool using the command below:
>> *java -Xmx1024m -jar org.apache.stanbol.entityhub.indexing.genericrdf-0.12.0-SNAPSHOT.jar index*
>>
> Don't forget to copy the n-quad data files downloaded from btc2012 to the
> ${indexingToolDir}/indexing/resources/rdfdata directory prior to executing
> the indexing command :)
>
>> 5. The above will execute the entity importing and indexing process and
>> create 2 files in the ${indexingToolDir}/indexing/dist directory.
>> Copy the generated org.apache.stanbol.data.site.foaf-site-1.0.0.jar to the
>> ${stanbol}/fileinstall directory.
>> Copy the generated foaf-site.solrindex.zip to the ${stanbol}/datafiles
>> directory.
>>
>> 6. Launch the Stanbol server using the full-launcher and access the
>> foaf-site at: localhost:8080/entityhub/site/foaf-site
>>
>> So with this I have completed the first milestone I had in mind for my
>> project.
>> The next task is to identify and define the set of foaf properties which are
>> going to be used as keys in the disambiguation algorithm. This task also
>> includes developing an EntityProcessor to filter foaf entities further by
>> allowing only the entities which have the disambiguation properties
>> identified above.
>>
>> Your thoughts and opinions in moving forward are highly appreciated.
>>
>> Thanks,
>> Dileepa
>>
>> [1] https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/genericrdf
>> [2] https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/freebase
>>
>> On Thu, Jun 27, 2013 at 11:00 AM, Andreas Kuckartz <a.kucka...@ping.de> wrote:
>>
>>> Dileepa Jayakody:
>>> > In the foaf-wiki site [1] there are many datasource projects but many
>>> > of them are out of date.
>>>
>>> If possible please take a few minutes to update that Wiki page.
>>>
>>> > Can I please have your opinions on finalizing a dataset for my
>>> > project?
>>>
>>> The main criteria in my opinion should be:
>>> - how much effort is necessary ?
>>> - how much data can be expected regarding "co-reference" ?
>>>
>>> That being said I think that the btc dataset would be a good choice. It
>>> was created to be used in projects such as yours.
>>>
>>> Cheers,
>>> Andreas
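
Follow-up note on step 3 of the enhancement chain setup at the top of this thread: besides the web form, the chain can probably be invoked from a script. The following is only a minimal sketch, not part of the setup described above. It assumes the enhancer endpoint accepts a plain-text POST and can return Turtle. The function names build_enhance_request and enhance are made up for illustration; the chain name and URL are the ones configured above.

```python
import urllib.request


def build_enhance_request(text, chain="foaf-site-chain",
                          base_url="http://localhost:8080"):
    """Build a POST request for a named enhancement chain.

    Assumption: the chain endpoint takes the content as a text/plain body
    and honours an Accept header asking for Turtle.
    """
    url = f"{base_url}/enhancer/chain/{chain}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "text/turtle",
        },
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the text to the chain and return the raw response body."""
    request = build_enhance_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Usage, with a Stanbol instance running locally:
#   print(enhance("Tim Berners Lee is the inventor of World Wide Web"))
```

Building the request separately from sending it makes it possible to check the URL and headers without a running server.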