Rupert Westenthaler created STANBOL-1089:
--------------------------------------------

             Summary: Provide Topic Engine SolrConfiguration that uses n-grams
                 Key: STANBOL-1089
                 URL: https://issues.apache.org/jira/browse/STANBOL-1089
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancement Engines
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Now that the Topic Classification Engine supports configuring different 
SolrCore configurations, we should provide a configuration that uses n-grams 
for topic classification.

While this will not scale to very large classification schemes, it should 
provide improvements for small and medium-sized models.

Indexing of n-grams will be based on the Solr ShingleFilterFactory [1].
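
To illustrate what indexing shingles (word n-grams) means, here is a minimal 
plain-Java sketch. It is not the Lucene/Solr implementation and does not use 
the Solr API; the class name ShingleSketch, the method shingles() and the 
sample tokens are made up for this example. It assumes that unigrams are kept 
alongside the shingles and that tokens are joined with a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of word n-grams (shingles); not Solr code. */
public class ShingleSketch {

    /**
     * Returns every unigram plus all shingles of 2..maxSize adjacent tokens,
     * grouped by the position of their first token.
     */
    public static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            // Grow the shingle one token at a time, emitting each length.
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                if (len > 1) {
                    sb.append(' ');
                }
                sb.append(tokens.get(start + len - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("apache", "stanbol", "topic", "engine");
        // prints [apache, apache stanbol, stanbol, stanbol topic, topic, topic engine, engine]
        System.out.println(shingles(tokens, 2));
    }
}
```

Because every additional shingle size adds roughly one more term per token 
position, the index grows with the shingle size, which is why this approach 
targets small and medium-sized models.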

The SolrCore configuration will be provided to the DataFileProvider by the 
Topic Classification Engine bundle under the name 
'shingle-topic-model.solrindex.zip'. Users will therefore need to set this 
name as the value of the 
'org.apache.stanbol.enhancer.engine.topic.solrCoreConfig' property of the 
TopicClassificationEngine. This property was added by STANBOL-1087.
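
As a rough sketch of the setting described above, the following plain-Java 
snippet builds the property as it might appear in an engine configuration. 
Only the property key and the archive name come from this issue; the class 
name TopicEngineConfigSketch and the method engineProperties() are 
hypothetical, and any other properties the engine requires are omitted. How 
the dictionary reaches the engine (for example through the OSGi configuration 
tooling) is outside the scope of this sketch.

```java
import java.util.Dictionary;
import java.util.Hashtable;

/** Hypothetical helper; only the key and the archive name come from the issue. */
public class TopicEngineConfigSketch {

    // Property key added by STANBOL-1087.
    static final String SOLR_CORE_CONFIG =
            "org.apache.stanbol.enhancer.engine.topic.solrCoreConfig";

    // Name under which the shingle-based SolrCore configuration is provided.
    static final String SHINGLE_CONFIG = "shingle-topic-model.solrindex.zip";

    /** Builds the single property that selects the n-gram configuration. */
    public static Dictionary<String, Object> engineProperties() {
        Dictionary<String, Object> props = new Hashtable<>();
        props.put(SOLR_CORE_CONFIG, SHINGLE_CONFIG);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(engineProperties());
    }
}
```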


[1] 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
