Thank you, Rafa, for the response. The ManifoldCF approach that you outlined
perfectly matches my use case. Good luck with the Stanbol Transformation
Connector; I hope to see it soon.

Thanks
Alok

> On 30-Oct-2014, at 17:06, Rafa Haro <[email protected]> wrote:
> 
> Hi Alok, 
> 
> Depending on what kind of architecture you want for your system, there are 
> different possibilities. Let me list some of them:
> 
> 1. CMS Adapter: I have played around with it in the past but never tested it 
> seriously, so I can’t speak from experience. Depending on the CMS, other 
> users have reported all kinds of problems in the past. According to the 
> documentation, it allows you to represent your repository as an RDF graph 
> (which will probably also allow you, for example, to run SPARQL queries over 
> this representation later), and it also allows you to feed the Stanbol 
> ContentHub directly for Semantic Search.
> 
> 2. ContentHub: this component 
> (https://stanbol.apache.org/docs/trunk/components/contenthub/contenthub5min) 
> allows users to define custom Semantic cores on top of Solr. It is no longer 
> supported in the current version of Stanbol (1.0), but it is supported in the 
> 0.12.* releases. The documentation is reasonably clear, but basically what 
> you can do with ContentHub is define a custom schema using an LDPath 
> program. The LDPath program defines a set of fields to be stored in Solr and 
> how to populate those fields from the Enhancer results. The workflow is as 
> follows: you take the content out of your CMS and send it to the 
> ContentHub through a REST API. The content is enriched with a configured 
> chain. The Enhancement Structure resulting from the enrichment process is 
> parsed using the configured LDPath program. As a result, you get a list of 
> field values to be stored in Solr. Besides these fields, by default, the 
> textual content is also stored in Solr, and the Enhancement Structure is 
> stored in a Clerezza graph with a unique id for your index. So in the end 
> you have a graph relating your content to entities.
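
To make the ContentHub workflow above more concrete, here is a minimal Python sketch of what happens after enrichment. It is not the ContentHub API: `EntityAnnotation`, `FIELD_PROGRAM`, `build_index_entries`, the Solr field names and the `ex:mentions` predicate are all hypothetical. A plain dictionary stands in for a real LDPath program, and a list of tuples stands in for the Clerezza graph.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class EntityAnnotation:
    """Hypothetical, simplified stand-in for one entity found by the enhancement chain."""
    entity_uri: str
    entity_type: str
    label: str

# Hypothetical stand-in for an LDPath program: each Solr field name is mapped
# to the entity type whose labels should populate that field.
FIELD_PROGRAM: Dict[str, str] = {
    "persons_t": "dbpedia:Person",
    "places_t": "dbpedia:Place",
    "organizations_t": "dbpedia:Organisation",
}

def build_index_entries(
    doc_id: str,
    text: str,
    annotations: List[EntityAnnotation],
    program: Dict[str, str] = FIELD_PROGRAM,
) -> Tuple[Dict[str, object], List[Tuple[str, str, str]]]:
    # By default the textual content is stored alongside the extracted fields.
    solr_doc: Dict[str, object] = {"id": doc_id, "text_all": text}
    # Populate each configured field from the annotations of the matching type.
    for field_name, entity_type in program.items():
        values = sorted({a.label for a in annotations if a.entity_type == entity_type})
        if values:
            solr_doc[field_name] = values
    # Keep a small (subject, predicate, object) graph relating the content to entities.
    subject = f"urn:content:{doc_id}"
    graph = [(subject, "ex:mentions", a.entity_uri) for a in annotations]
    return solr_doc, graph
```

For example, annotations for "Victor Hugo" (typed `dbpedia:Person`) and "Paris" (typed `dbpedia:Place`) would yield `persons_t` and `places_t` fields plus two `ex:mentions` triples, while `organizations_t` is simply left out.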
> 
> 3. Use Apache ManifoldCF: Apache ManifoldCF is an effort to provide an open 
> source framework for connecting source content repositories such as Microsoft 
> SharePoint, EMC Documentum, Alfresco or any CMIS-compatible CMS to target 
> repositories or indexes, such as Apache Solr or Elasticsearch. ManifoldCF 
> lets you crawl the content of your CMS with support for “incremental 
> crawling”, i.e., handling deletions, additions, modifications, etc. of the 
> content in your CMS. ManifoldCF has recently added support for Transformation 
> Connectors, which basically allow you to process the content before indexing 
> it. I’m currently working on a Stanbol Transformation Connector that, 
> following the ContentHub use case, will allow you to enrich the content with 
> Stanbol and store the extracted entity information as plain metadata. I will 
> be contributing this to ManifoldCF in the coming weeks.
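
The idea of a transformation step that enriches content before indexing can be sketched in Python under a simplified document model. This is only an illustration of the concept: real ManifoldCF connectors are written in Java against ManifoldCF's own connector interfaces, and `CrawledDocument`, `enrich_before_indexing`, `toy_enhancer`, the metadata field names and the example.org URIs are all hypothetical. `toy_enhancer` is a keyword matcher standing in for a call to the Stanbol Enhancer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class CrawledDocument:
    """Hypothetical, simplified stand-in for a document crawled from the CMS."""
    uri: str
    content: str
    metadata: Dict[str, List[str]] = field(default_factory=dict)

# An enhancer returns (entity_uri, entity_type, label) tuples for a text.
Enhancer = Callable[[str], List[Tuple[str, str, str]]]

def enrich_before_indexing(doc: CrawledDocument, enhance: Enhancer) -> CrawledDocument:
    """Return a copy of doc whose metadata also lists the extracted entities."""
    metadata = {key: list(values) for key, values in doc.metadata.items()}
    for entity_uri, entity_type, label in enhance(doc.content):
        # Entities are flattened into plain metadata fields for the target index.
        metadata.setdefault("entity_uris", []).append(entity_uri)
        metadata.setdefault(f"entities_{entity_type}", []).append(label)
    return CrawledDocument(doc.uri, doc.content, metadata)

def toy_enhancer(text: str) -> List[Tuple[str, str, str]]:
    """Toy keyword matcher standing in for a call to the Stanbol Enhancer."""
    known = {
        "Alfresco": ("http://example.org/entity/Alfresco", "organization", "Alfresco"),
        "Solr": ("http://example.org/entity/Solr", "software", "Solr"),
    }
    return [entity for keyword, entity in known.items() if keyword in text]
```

Because the enhancer is passed in as a function, the toy matcher could be replaced by a real HTTP call to Stanbol without changing the transformation logic, and the original document is left untouched so the step has no side effects on the crawled data.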
> 
> Hope this email helps.
> Cheers,
> Rafa
> 
> 
> On 29 October 2014 at 7:09:07, Alok K. Shukla 
> ([email protected]) wrote:
> 
> Hi everyone,
> 
> I would like to use Stanbol with an existing CMS for Semantic Search. From the 
> documentation of the CMS Adapter, I gather that it would be the starting point 
> for the task. Can someone please guide me, especially with building indexes 
> and with how entities would be created from CMS data? Any help would be 
> highly appreciated.
> 
> Thanks  
> Alok  