[ 
https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated SOLR-2129:
----------------------------------

    Description: 
Provide components to enable Apache UIMA automatic metadata extraction to be 
exploited when indexing documents.
The purpose of this is to get unstructured information "inside" a document and 
create structured metadata (as fields) to enrich each document.

Basically this can be done with a custom UpdateRequestProcessor which triggers 
UIMA while indexing documents.
The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
(with a tokenizer and a hidden Markov model tagger), named entities, language, 
suggested category, keywords and concepts (exploiting external services from 
OpenCalais and AlchemyAPI). Such an implementation can be easily extended by 
adding or selecting different UIMA analysis engines, either from UIMA 
repositories on the web or by creating new ones from scratch.
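
As a rough illustration of the idea above, here is a minimal, self-contained 
sketch of a processor that enriches a document with extracted metadata before 
handing it on to the rest of the chain. It does not use the real Solr or UIMA 
APIs: Doc, Analyzer, Processor, EnrichingProcessor and ToyAnalyzer are all 
hypothetical stand-ins invented for this example, and the regex sentence 
splitter only takes the place of a real UIMA analysis engine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Solr input document: field name -> values.
class Doc {
    final Map<String, List<Object>> fields = new LinkedHashMap<>();

    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    Object getFirst(String name) {
        List<Object> values = fields.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }
}

// Hypothetical stand-in for a UIMA analysis engine:
// plain text in, metadata (target field name -> extracted values) out.
interface Analyzer {
    Map<String, List<String>> analyze(String text);
}

// Hypothetical stand-in for one link in an update processor chain.
abstract class Processor {
    protected final Processor next;

    Processor(Processor next) {
        this.next = next;
    }

    // Default behaviour: pass the document on to the next processor, if any.
    void processAdd(Doc doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Runs the analyzer on one source field and adds the results as new fields.
class EnrichingProcessor extends Processor {
    private final Analyzer analyzer;
    private final String sourceField;

    EnrichingProcessor(Analyzer analyzer, String sourceField, Processor next) {
        super(next);
        this.analyzer = analyzer;
        this.sourceField = sourceField;
    }

    @Override
    void processAdd(Doc doc) {
        Object text = doc.getFirst(sourceField);
        if (text != null) {
            analyzer.analyze(text.toString()).forEach(
                (field, values) -> values.forEach(v -> doc.addField(field, v)));
        }
        super.processAdd(doc); // continue down the chain
    }
}

// Toy analyzer (an assumption, not a UIMA engine): splits text into sentences.
class ToyAnalyzer implements Analyzer {
    @Override
    public Map<String, List<String>> analyze(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            if (!s.trim().isEmpty()) {
                sentences.add(s.trim());
            }
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        out.put("sentence", sentences);
        return out;
    }
}
```

In this sketch, swapping in a different Analyzer corresponds to selecting a 
different analysis engine; the processor itself does not change.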

More information can be found on the dedicated wiki page: 
http://wiki.apache.org/solr/SolrUIMA

  was:
Provide components to enable Apache UIMA automatic metadata extraction to be 
exploited when indexing documents.
The purpose of this is to get unstructured information "inside" a document and 
create structured metadata (as fields) to enrich each document.

Basically this can be done with a custom UpdateRequestProcessor which triggers 
UIMA while indexing documents.
The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
(with a tokenizer and an hidden Markov model tagger), named entities, language, 
suggested category, keywords and concepts (exploiting external services from 
OpenCalais and AlchemyAPI). Such an implementation can be easily extended 
adding or selecting different UIMA analysis engines, both from UIMA 
repositories on the web or creating new ones from scratch.

More information can be found on dedicated the wiki page: 
http://wiki.apache.org/solr/SolrUIMA


> Provide a Solr module for dynamic metadata extraction/indexing with Apache 
> UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, 
> SOLR-2129-version-5.patch, SOLR-2129-version-6.patch, 
> SOLR-2129-version2.patch, SOLR-2129-version3.patch, SOLR-2129.patch, 
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be 
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document 
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which 
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
> (with a tokenizer and a hidden Markov model tagger), named entities, 
> language, suggested category, keywords and concepts (exploiting external 
> services from OpenCalais and AlchemyAPI). Such an implementation can be 
> easily extended by adding or selecting different UIMA analysis engines, 
> either from UIMA repositories on the web or by creating new ones from scratch.
> More information can be found on the dedicated wiki page: 
> http://wiki.apache.org/solr/SolrUIMA

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
