[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976862#action_12976862 ]
Mark Miller commented on SOLR-2129:
-----------------------------------

bq. I have no problem committing this to contrib so future iterations can be from svn. any objections?

+1 - getting into trunk will likely expand usage and feedback, and get things rolling much faster. Bar is much lower for Solr contrib as well.

I've only started looking at the patch, but a few notes I jotted down:

StringBuffer usage in UpdateRequestProcessor - should be StringBuilder right?

The below is a little odd, no (critical code I know ;) )?

  /* execute the AE on the given JCas */
  private void executeAE(AnalysisEngine ae, JCas jcas) throws AnalysisEngineProcessException {
    ae.getLogger().log(Level.INFO, new StringBuffer("Analazying text").toString());
    ae.process(jcas);
    ae.getLogger().log(Level.INFO, new StringBuffer("Text processing completed").toString());
  }

AEProviderFactory should be thread safe?? At a min, you have to consider multicore ... consider that you could be sharing AEProvider across threads because of this as well (static cache in AEProviderFactory). Perhaps the cache should not be static?

Don't want to at least log this?

  } catch (AnalysisEngineProcessException e) {
    // do nothing
  }

> Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch,
> SOLR-2129-version2.patch, SOLR-2129-version3.patch, SOLR-2129.patch,
> SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences
> (with a tokenizer and a hidden Markov model tagger), named entities,
> language, suggested category, keywords and concepts (exploiting external
> services from OpenCalais and AlchemyAPI). Such an implementation can be
> easily extended by adding or selecting different UIMA analysis engines,
> either from UIMA repositories on the web or by creating new ones from scratch.
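
On the AEProviderFactory thread-safety note in the comment, here is a minimal sketch of what a non-static, thread-safe cache could look like. It is not taken from the patch: Provider and ProviderCache are hypothetical stand-ins for AEProvider and AEProviderFactory, the cache key (a core name) is an assumption, and computeIfAbsent assumes Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical stand-in for the patch's AEProvider; not the real class.
interface Provider {
    String describe();
}

// Hypothetical stand-in for AEProviderFactory: the cache is an instance
// field rather than a static one, so each owner (for example, each core)
// can hold its own cache instead of sharing one JVM-wide map.
final class ProviderCache {
    private final ConcurrentMap<String, Provider> cache = new ConcurrentHashMap<>();
    private final Function<String, Provider> loader;

    ProviderCache(Function<String, Provider> loader) {
        this.loader = loader;
    }

    // computeIfAbsent is atomic on ConcurrentHashMap: concurrent callers
    // asking for the same key end up with the same Provider instance.
    Provider get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    int size() {
        return cache.size();
    }
}
```

This only makes the lookup safe. Whether a cached provider can itself be used from several threads at once is a separate question that the sketch does not address.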