[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924971#action_12924971 ]
Tommaso Teofili commented on SOLR-2129:
---------------------------------------

Hi Grant,
I think it would be great to have Mahout classifiers inside Solr :)

I like your suggestion at point 1. I can replace the current hardcoded mapping mechanism with a simple mapping between UIMA extracted types/features and field names, defined inside solrconfig.xml.
Another option would be to develop a SolrCASConsumer component in UIMA (similar to Lucas [1], the Lucene CAS Consumer), which would give full control over how UIMA annotations and features are mapped to Solr fields, but on the UIMA side ;)

Regarding point 2, the jars are already under contrib/uima/lib, so I can modify the sample solrconfig.xml to add the proper <lib> tag.

Thanks for your comments and suggestions.

[1] : https://svn.apache.org/repos/asf/uima/sandbox/trunk/Lucas

> Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences
> (with a tokenizer and a hidden Markov model tagger), named entities,
> language, suggested category, keywords and concepts (exploiting external
> services from OpenCalais and AlchemyAPI).
> Such an implementation can be easily extended by adding or selecting different
> UIMA analysis engines, either from UIMA repositories on the web or by creating
> new ones from scratch.