Hi Alok,

Depending on the architecture you want for your system, there are several options. Let me list some of them:

1. CMS Adapter: I have played around with it in the past but never tested it seriously, so I can't speak from experience. Depending on the CMS, other users have reported all kinds of problems. According to the documentation, it lets you represent your repository as an RDF graph (which will probably later allow you, for example, to run SPARQL queries over this representation), and it also lets you feed the Stanbol ContentHub directly for semantic search.

2. ContentHub: this component (https://stanbol.apache.org/docs/trunk/components/contenthub/contenthub5min) allows users to define custom semantic cores on top of Solr. It is no longer supported in the current version of Stanbol (1.0), but it is supported in the 0.12.* releases. The documentation is more or less clear; basically, what you can do with ContentHub is define a custom schema using an LDPath program. The LDPath program defines a set of fields to be stored in Solr and how to populate those fields from the Enhancer results. The workflow is the following: you take the content out of your CMS and send it to the ContentHub through a REST API. The content is enriched with a configured enhancement chain. The Enhancement Structure resulting from the enrichment process is parsed using the configured LDPath program. As a result, you get a list of field values to be stored in Solr. Besides these fields, by default, the textual content is also stored in Solr, and the Enhancement Structure is stored in a Clerezza graph with a unique id for your index. So in the end you have a graph relating your content to entities.

3. Use Apache ManifoldCF: Apache ManifoldCF is an effort to provide an open source framework for connecting source content repositories, such as Microsoft SharePoint, EMC Documentum, Alfresco or any CMIS-compatible CMS, to target repositories or indexes, such as Apache Solr or ElasticSearch.
ManifoldCF allows you to crawl the content from your CMS with support for “incremental crawling”, i.e., it keeps track of deletions, additions, modifications, etc. of the content in your CMS. ManifoldCF has recently added support for Transformation Connectors, which basically let you process the content before indexing it. I'm currently working on a Stanbol Transformation Connector that, following the ContentHub use case, will let you enrich the content with Stanbol and store the extracted entity information as plain metadata. I will be contributing this to ManifoldCF in the coming weeks.

Hope this email helps.

Cheers,
Rafa

On 29 October 2014 at 7:09:07, Alok K. Shukla ([email protected]) wrote:

Hi everyone,

I would like to use Stanbol with an existing CMS for semantic search. From the documentation of the CMS Adapter, I gather that it would be the starting point for the task. Can someone please guide me, especially with building indexes and how entities would be created out of CMS data?

Any help would be highly appreciated.

Thanks,
Alok

Sent from my iPhone
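
For reference, the "enrich the content, then flatten the enhancements into index fields" idea from the reply (point 2 and the planned ManifoldCF connector) can be sketched roughly as below. This is only a minimal illustration, not the ContentHub or ManifoldCF API: it assumes a Stanbol instance exposing the plain Enhancer endpoint at http://localhost:8080/enhancer and returning N-Triples for the `text/rdf+nt` media type. The function names and the output field names (`entity_uri`, `entity_label`) are hypothetical, and the parser only handles simple one-line triples.

```python
import re
import urllib.request

# Namespace of the Stanbol enhancement structure (fise:entity-reference, ...).
FISE = "http://fise.iks-project.eu/ontology/"
# Matches one simple N-Triples statement: <subject> <predicate> object .
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def enhance(text, url="http://localhost:8080/enhancer"):
    """POST plain text to a Stanbol Enhancer endpoint and return N-Triples.

    The URL and the text/rdf+nt Accept value are assumptions about the
    local Stanbol setup; adjust them to your installation.
    """
    request = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/rdf+nt"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def entity_fields(ntriples, doc_id, content):
    """Flatten entity annotations into a flat, Solr-style document (a dict)."""
    uris, labels = set(), set()
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything this sketch cannot parse
        _, predicate, obj = match.groups()
        if predicate == FISE + "entity-reference" and obj.startswith("<"):
            uris.add(obj[1:-1])
        elif predicate == FISE + "entity-label" and obj.startswith('"'):
            # Keep the lexical form; drop the language tag or datatype.
            labels.add(obj[1:obj.rindex('"')])
    return {
        "id": doc_id,
        "content": content,
        "entity_uri": sorted(uris),      # hypothetical multi-valued field
        "entity_label": sorted(labels),  # hypothetical multi-valued field
    }
```

With a running Stanbol, something like `entity_fields(enhance(text), "doc-1", text)` would give a dictionary that can be sent to Solr as a document with the extracted entities as plain metadata. A real setup would more likely use an LDPath program or an RDF library instead of the regular expression.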
