Dear all,

The Composite Resource Indexing Service (CRIS) is now ready for review (issue CLEREZZA-501). JUnit tests and documentation are available: install rdf.cris/core on Clerezza and search for "Composite Resource Indexing Service" under /documentation.
Excerpt:

CRIS is based on Apache Lucene and provides a means to index RDF resources. It works by indexing the values of properties on a resource, which makes it possible to search for those property values using CRIS. The results that CRIS delivers are the corresponding RDF resources.

GraphIndexer

The core of CRIS is the GraphIndexer class. Note that GraphIndexer is not an OSGi service; it has to be instantiated by the user to provide an index. The GraphIndexer needs two graphs to work with. One graph contains the IndexDefinitions, that is, the specification of which resources and properties to index (see IndexDefinitionManager). The other graph contains the resources to index. Note that CRIS indexes RDF resources based on their rdf:type and that indexing works on a per-property basis. That means that not all properties on a resource are indexed by default; the user has to specify which properties to index.

GraphIndexer also provides the interface to search for resources, using the findResources method. The search is specified using Conditions and, optionally, a SortSpecification and FacetCollectors. The findResources method is overloaded with variants that allow the resource type and search query to be specified directly.

IndexDefinitionManager

The IndexDefinitionManager helps to manage indexing specifications using the CRIS ontology in the index definition graph (see GraphIndexer). Indexing is enabled for resources according to their rdf:type. Additionally, the index definitions specify which properties of the resource are indexed. One can think of an index definition as specifying the keys (properties) that are mapped to the value (the resource URI) in the index.

....

Note:
- GraphIndexer is quite complex and has many responsibilities.
- No other Clerezza project depends on the Composite Resource Indexing Service.
- GraphIndexer is available as the Platform CRIS Service in project platform.cris (for the content graph, including additions).

@Tommaso: Lucene is used in LuceneTools.java in rdf.cris/core. Feedback is appreciated. I have little experience with Lucene, so feel free to improve it. In particular, I am not sure when to call optimize (see the comment in LuceneTools).

Thanks to Reto, Daniel and Hasan for the work! We already use it in a monitoring tool, and the performance is outstanding compared to the available alternatives in Clerezza (filter or SPARQL).

Cheers
Tsuy
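
For readers who want a concrete picture of the remark in the excerpt that an index definition maps keys (properties) to a value (the resource URI), here is a minimal sketch in plain Java. It is only a toy in-memory model: it uses neither Lucene nor the actual CRIS classes, and the names ToyPropertyIndex, defineIndex, index and find are hypothetical and not part of rdf.cris/core. It only mirrors the two ideas described above: indexing is enabled per rdf:type, and only the properties named in the definition are indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the idea only; NOT the CRIS or Lucene API.
class ToyPropertyIndex {
    // rdf:type URI -> property URIs indexed for resources of that type
    private final Map<String, Set<String>> definitions = new HashMap<>();
    // (type, property, value) -> URIs of the resources carrying that value
    private final Map<List<String>, Set<String>> entries = new HashMap<>();

    // Enables indexing for a type and names the properties to index.
    void defineIndex(String type, String... properties) {
        definitions.computeIfAbsent(type, t -> new LinkedHashSet<>())
                   .addAll(Arrays.asList(properties));
    }

    // Indexes one resource; properties without a definition are skipped.
    void index(String resourceUri, String type, Map<String, String> propertyValues) {
        Set<String> indexed = definitions.get(type);
        if (indexed == null) {
            return; // no index definition for this rdf:type
        }
        for (Map.Entry<String, String> e : propertyValues.entrySet()) {
            if (indexed.contains(e.getKey())) {
                List<String> key = Arrays.asList(type, e.getKey(), e.getValue());
                entries.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(resourceUri);
            }
        }
    }

    // Returns the URIs of resources of the given type whose property
    // has exactly the given value (exact match only in this toy model).
    List<String> find(String type, String property, String value) {
        List<String> key = Arrays.asList(type, property, value);
        return new ArrayList<>(entries.getOrDefault(key, Collections.emptySet()));
    }
}
```

With a definition for foaf:Person that lists only foaf:name, a lookup by name returns the person's URI, while a lookup by foaf:mbox returns nothing because that property was never declared for indexing. The real GraphIndexer works on graphs and Lucene queries rather than exact string matches, so please treat this purely as an illustration.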
