[ 
https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924971#action_12924971
 ] 

Tommaso Teofili commented on SOLR-2129:
---------------------------------------

Hi Grant, I think it would be great to have Mahout classifiers inside Solr :)

I like your suggestion in point 1. 
I can replace the current hardcoded mapping mechanism with a simple 
mapping between UIMA-extracted types/features and field names, defined inside 
solrconfig.xml.
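
To make the idea concrete, here is a rough sketch of what such a mapping could 
look like once loaded in memory. It is only an illustration, not the code in 
the patch: the class name UimaFieldMapping, its methods, and the type/feature 
names used in main are all hypothetical, and reading the entries from 
solrconfig.xml is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a configurable type/feature -> field mapping. */
class UimaFieldMapping {
    // UIMA type name -> (feature name -> Solr field name)
    private final Map<String, Map<String, String>> mappings = new LinkedHashMap<>();

    /** Registers one mapping entry and returns this object for chaining. */
    UimaFieldMapping map(String typeName, String featureName, String fieldName) {
        mappings.computeIfAbsent(typeName, k -> new LinkedHashMap<>())
                .put(featureName, fieldName);
        return this;
    }

    /** Returns the Solr field name for a type/feature pair, or null if unmapped. */
    String fieldFor(String typeName, String featureName) {
        Map<String, String> features = mappings.get(typeName);
        return features == null ? null : features.get(featureName);
    }

    public static void main(String[] args) {
        // The type, feature and field names below are made up for illustration.
        UimaFieldMapping mapping = new UimaFieldMapping()
                .map("example.Sentence", "coveredText", "sentence")
                .map("example.Entity", "name", "entity");
        System.out.println(mapping.fieldFor("example.Entity", "name"));
    }
}
```

The update processor would then look up the target field for each extracted 
annotation instead of relying on names fixed in the code.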

A different option could be to develop a SolrCASConsumer component in UIMA 
(similar to Lucas [1], the Lucene CAS Consumer), giving full control over how 
UIMA annotations and features are mapped to Solr fields, but on the UIMA side ;)

Regarding point 2, the jars are already under contrib/uima/lib, so I can modify 
the sample solrconfig.xml to add the proper <lib> tag.
Thanks for your comments and suggestions.

[1] : https://svn.apache.org/repos/asf/uima/sandbox/trunk/Lucas

> Provide a Solr module for dynamic metadata extraction/indexing with Apache 
> UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, 
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be 
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document 
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which 
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
> (with a tokenizer and a hidden Markov model tagger), named entities, 
> language, suggested category, keywords and concepts (exploiting external 
> services from OpenCalais and AlchemyAPI). Such an implementation can be 
> easily extended by adding or selecting different UIMA analysis engines, either 
> from UIMA repositories on the web or by creating new ones from scratch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


