[jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

Mark Miller (JIRA) Mon, 03 Jan 2011 10:37:13 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976862#action_12976862
 ]


Mark Miller commented on SOLR-2129:
-----------------------------------

bq.  I have no problem committing this to contrib so future iterations can be 
from svn. any objections?

+1 - getting into trunk will likely expand usage and feedback, and get things 
rolling much faster. Bar is much lower for Solr contrib as well.


I've only started looking at the patch, but a few notes I jotted down:

StringBuffer usage in UpdateRequestProcessor - should be StringBuilder, right?

The below is a little odd, no (critical code I know ;) )?

/* execute the AE on the given JCas */
private void executeAE(AnalysisEngine ae, JCas jcas) throws AnalysisEngineProcessException {
  ae.getLogger().log(Level.INFO, new StringBuffer("Analazying text").toString());
  ae.process(jcas);
  ae.getLogger().log(Level.INFO, new StringBuffer("Text processing completed").toString());
}
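
For illustration only, here is a minimal sketch of the same shape with the constant messages passed directly, since wrapping a string literal in a StringBuffer and immediately calling toString() yields the same string. It uses java.util.logging and a hypothetical Engine interface as stand-ins for the UIMA logger and AnalysisEngine, so none of these names come from the patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ExecuteAeSketch {
  private static final Logger LOG = Logger.getLogger(ExecuteAeSketch.class.getName());

  /** Hypothetical stand-in for the analysis engine; not the UIMA interface. */
  interface Engine {
    void process(String text);
  }

  /** Logs before and after processing, passing the literal messages directly. */
  static void executeAE(Engine engine, String text) {
    LOG.log(Level.INFO, "Analyzing text");
    engine.process(text);
    LOG.log(Level.INFO, "Text processing completed");
  }

  public static void main(String[] args) {
    executeAE(text -> { /* no-op engine for the demo */ }, "some document text");
    // The wrapper adds nothing: the round trip returns an equal string.
    String wrapped = new StringBuffer("Analyzing text").toString();
    System.out.println(wrapped.equals("Analyzing text"));
  }
}
```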


AEProviderFactory should be thread safe? At a minimum, you have to consider 
multicore ... consider that you could be sharing AEProvider across threads 
because of this as well (static cache in AEProviderFactory). Perhaps the cache 
should not be static?
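
To make the concern concrete, here is a rough sketch of a per-instance cache that is safe to call from several threads. ProviderCache and its loader are hypothetical names for illustration; this is not how AEProviderFactory is written in the patch, just one possible shape using ConcurrentHashMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.IntStream;

/** Hypothetical per-instance cache; the names are illustrative, not taken from the patch. */
class ProviderCache<K, V> {
  // Instance field rather than static: each owner (for example, each core) gets its own cache.
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  ProviderCache(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, creating it at most once per key even under concurrent calls. */
  V get(K key) {
    return cache.computeIfAbsent(key, loader);
  }

  int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    AtomicInteger created = new AtomicInteger();
    ProviderCache<String, Object> cache = new ProviderCache<>(key -> {
      created.incrementAndGet();
      return new Object(); // stand-in for an expensive provider
    });
    // Request the same key from many threads at once.
    IntStream.range(0, 1000).parallel().forEach(i -> cache.get("core0"));
    System.out.println("providers created: " + created.get());
  }
}
```

Note that this only makes the lookup safe. If the cached provider itself is not thread safe, handing the same instance to several threads is still a problem, which is the second half of the question above.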


Don't want to at least log this?

} catch (AnalysisEngineProcessException e) {
     // do nothing
}
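
A minimal sketch of what logging here could look like, assuming the intent is to keep going without the extracted metadata when analysis fails. ProcessingException and Step are hypothetical stand-ins for AnalysisEngineProcessException and the UIMA call, and java.util.logging stands in for whatever logger the processor actually uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LogInsteadOfSwallow {
  private static final Logger LOG = Logger.getLogger(LogInsteadOfSwallow.class.getName());

  /** Hypothetical stand-in for UIMA's AnalysisEngineProcessException. */
  static class ProcessingException extends Exception {
    ProcessingException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for the call that may fail during analysis. */
  interface Step {
    void run() throws ProcessingException;
  }

  /** Runs the step; on failure, logs the exception with its stack trace and returns false. */
  static boolean runLogged(Step step) {
    try {
      step.run();
      return true;
    } catch (ProcessingException e) {
      LOG.log(Level.WARNING, "Analysis failed; continuing without extracted metadata", e);
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(runLogged(() -> { }));
    System.out.println(runLogged(() -> {
      throw new ProcessingException("simulated failure");
    }));
  }
}
```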



> Provide a Solr module for dynamic metadata extraction/indexing with Apache 
> UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, 
> SOLR-2129-version2.patch, SOLR-2129-version3.patch, SOLR-2129.patch, 
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be 
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document 
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which 
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
> (with a tokenizer and a hidden Markov model tagger), named entities, 
> language, suggested category, keywords and concepts (exploiting external 
> services from OpenCalais and AlchemyAPI). Such an implementation can be 
> easily extended adding or selecting different UIMA analysis engines, both 
> from UIMA repositories on the web or creating new ones from scratch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

