Hi Alok, 

Depending on the kind of architecture you want for your system, there are 
different possibilities. Let me list some of them:

1. CMS Adapter: I have played around with it in the past but never tested it 
seriously, so I can’t speak from experience. Depending on the CMS, other users 
have reported all kinds of problems in the past. According to the 
documentation, it allows you to represent your repository as an RDF graph 
(which will probably later allow you, for example, to run SPARQL queries over 
this representation) and it also allows you to feed the Stanbol ContentHub 
directly for Semantic Search.

2. ContentHub: this component 
(https://stanbol.apache.org/docs/trunk/components/contenthub/contenthub5min) 
allows users to define custom semantic cores on top of Solr. It is no longer 
supported in the current version of Stanbol (1.0), but it is supported in the 
0.12.* releases. The documentation is more or less clear, but basically what 
you can do with ContentHub is define a custom schema using an LDPath program. 
The LDPath program defines a set of fields to be stored in Solr and how to 
populate those fields from the Enhancer results. The workflow is the 
following: you take the content out of your CMS and send it to the ContentHub 
through a REST API. The content is enriched with a configured chain. The 
Enhancement Structure resulting from the enrichment process is parsed using 
the configured LDPath program. As a result, you get a list of field values to 
be stored in Solr. Besides these fields, by default, the textual content is 
also stored in Solr, and the Enhancement Structure is stored in a Clerezza 
graph with a unique id for your index. So in the end you have a graph relating 
your content to entities.
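
To make the enrichment step of that workflow a bit more concrete, here is a 
minimal Python sketch that sends plain text to an enhancement chain over REST. 
It talks to the Enhancer directly rather than to the ContentHub, which would 
run this step for you. The base URL http://localhost:8080, the chain name 
"default", the /enhancer/chain/<name> path as I understand the Enhancer REST 
layout, and the helper names are all assumptions for illustration, so adjust 
them to your installation.

```python
import urllib.request

# Assumption: a Stanbol launcher running locally on the default port.
STANBOL_BASE = "http://localhost:8080"


def build_enhancer_request(text, chain="default", base=STANBOL_BASE):
    """Build (but do not send) a POST asking an enhancement chain to enrich text."""
    return urllib.request.Request(
        url=f"{base}/enhancer/chain/{chain}",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Ask for the Enhancement Structure serialized as RDF/XML.
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


def enhance(text, chain="default", base=STANBOL_BASE):
    """Send the request and return the serialized Enhancement Structure."""
    request = build_enhancer_request(text, chain, base)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling enhance("Paris is the capital of France.") against a running instance 
should return the RDF that an LDPath program would then be evaluated against.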

3. Use Apache ManifoldCF: Apache ManifoldCF is an effort to provide an open 
source framework for connecting source content repositories, like Microsoft 
SharePoint, EMC Documentum, Alfresco or any CMIS-compatible CMS, to target 
repositories or indexes, such as Apache Solr or ElasticSearch. ManifoldCF 
allows you to crawl the content in your CMS with support for “incremental 
crawling”, i.e., handling deletions, additions, modifications, etc. of the 
content in your CMS. ManifoldCF recently added support for Transformation 
Connectors, which basically allow you to process the content before indexing 
it. I’m currently working on a Stanbol Transformation Connector that, 
following the ContentHub use case, will allow you to enrich the content with 
Stanbol and store the extracted entity information as plain metadata. I will 
be contributing this to ManifoldCF in the coming weeks.
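
As a rough illustration of what "plain metadata" could mean here (this is not 
the connector's actual code), the Python sketch below flattens entity 
annotations into multi-valued fields that an index such as Solr could store. 
The input structure (dicts with "type" and "label" keys), the "entity_" field 
prefix, and the function name are hypothetical.

```python
from collections import defaultdict


def entities_to_metadata(entities, prefix="entity_"):
    """Flatten entity annotations into multi-valued plain metadata fields.

    `entities` is a hypothetical list of dicts with "type" and "label" keys,
    e.g. something you extracted from the Enhancer results yourself.
    """
    fields = defaultdict(list)
    for entity in entities:
        field = prefix + entity["type"].lower()
        if entity["label"] not in fields[field]:  # skip duplicate mentions
            fields[field].append(entity["label"])
    return dict(fields)
```

For example, two mentions of the place "Paris" and one of the person 
"Bob Marley" would end up as one "entity_place" field and one "entity_person" 
field, each holding a list of distinct labels.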

Hope this email helps.
Cheers,
Rafa


On 29 October 2014 at 7:09:07, Alok K. Shukla ([email protected]) 
wrote:

Hi everyone  

I would like to use Stanbol with an existing CMS for Semantic Search. From the 
documentation of the CMS Adapter, I gather that it would be the starting point 
for the task. Can someone please guide me, especially with building indexes 
and how entities would be created out of CMS data? Any help would be highly 
appreciated.  

Thanks  
Alok  
