Thank you, Rafa, for the response. The ManifoldCF approach that you outlined perfectly matches my use case. Good luck with the Stanbol Transformation Connector; I hope to see it soon.

Thanks
Alok

Sent from my iPhone

> On 30-Oct-2014, at 17:06, Rafa Haro <[email protected]> wrote:
>
> Hi Alok,
>
> Depending on which kind of architecture you want for your system, there are
> different possibilities. Let me list some of them:
>
> 1. CMS Adapter: I have played around with it in the past but never tested it
> seriously, so I can't speak from experience. Depending on the CMS, other
> users have reported all kinds of problems in the past. According to the
> documentation, it allows you to represent your repository as a graph in RDF
> (which will probably allow you later, for example, to perform SPARQL queries
> over this representation), and it also allows you to directly feed the
> Stanbol ContentHub for Semantic Search.
>
> 2. ContentHub: this component
> (https://stanbol.apache.org/docs/trunk/components/contenthub/contenthub5min)
> allows users to define custom Semantic cores on top of Solr. It is no longer
> supported in the current version of Stanbol (1.0), but it is supported in
> the 0.12.* releases. The documentation is more or less clear, but basically
> what you can do with ContentHub is define a custom schema using an LDPath
> program. The LDPath program defines a set of fields to be stored in Solr and
> how to populate those fields from the Enhancer results. The workflow is the
> following: you take the content out of your CMS and send it to the
> ContentHub through a REST API. The content is enriched with a configured
> chain. The Enhancement Structure resulting from the enrichment process is
> parsed using the configured LDPath program. As a result, you get a list of
> field values to be stored in Solr. Besides these fields, the textual content
> is also stored in Solr by default, and the Enhancement Structure is stored
> in a Clerezza graph with a unique id for your index. So at the end you have
> a graph relating your content to entities.
>
> 3. Use Apache ManifoldCF: Apache ManifoldCF is an effort to provide an open
> source framework for connecting source content repositories, like Microsoft
> Sharepoint, EMC Documentum, Alfresco or any CMIS-compatible CMS, to target
> repositories or indexes, such as Apache Solr or ElasticSearch. ManifoldCF
> allows you to crawl the content from your CMS with support for “incremental
> crawling”, i.e., managing deletions, additions, modifications, etc. of the
> content in your CMS. ManifoldCF has recently added support for
> Transformation Connectors, which basically allow you to process the content
> before indexing it. I'm currently working on a Stanbol Transformation
> Connector that, following the ContentHub use case, will allow you to enrich
> the content with Stanbol and store the extracted entity information as plain
> metadata. I will be contributing this to ManifoldCF in the coming weeks.
>
> Hope this email helps.
>
> Cheers,
> Rafa
>
> On 29 October 2014 at 7:09:07, Alok K. Shukla ([email protected]) wrote:
>
> Hi everyone
>
> I would like to use Stanbol with an existing CMS for Semantic Search. From
> the documentation of the CMS Adapter, I gather that it would be the starting
> point for the task. Can someone please guide me, especially with building
> indexes and how entities would be created out of CMS data? Any help would be
> highly appreciated.
>
> Thanks
> Alok
>
> Sent from my iPhone
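
For anyone following the thread, the "enrich the content, then store the extracted entities as plain metadata" flow described in option 3 can be sketched roughly as below. This is only an illustration in Python, not the connector's code. It assumes a Stanbol launcher reachable at http://localhost:8080/enhancer, and the annotation dictionaries (with `reference`, `label` and `confidence` keys), the metadata field names and both function names are hypothetical simplifications; the real Enhancement Structure is an RDF graph that would have to be parsed first.

```python
from urllib.request import Request

# Assumption: a Stanbol launcher running locally on port 8080.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain-text content."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/turtle"},
        method="POST",
    )


def entities_to_metadata(annotations, min_confidence=0.5):
    """Flatten entity annotations into plain multi-valued metadata fields.

    `annotations` is a hypothetical, simplified list of dicts standing in
    for the entity annotations found in the enhancement results.
    """
    metadata = {"entity_uri": [], "entity_label": []}
    for ann in annotations:
        # Skip low-confidence suggestions.
        if ann.get("confidence", 0.0) < min_confidence:
            continue
        # Keep each entity only once, in first-seen order.
        if ann["reference"] not in metadata["entity_uri"]:
            metadata["entity_uri"].append(ann["reference"])
            metadata["entity_label"].append(ann["label"])
    return metadata
```

The resulting dictionary is the kind of flat, multi-valued metadata that could be indexed next to the document text in Solr or ElasticSearch. Sending the request (for example with `urllib.request.urlopen`) and parsing the returned RDF are left out on purpose.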
