[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447293#comment-13447293 ]
Tommaso Teofili commented on LUCENE-4345:
-----------------------------------------

bq. Nice! I've found that filtering for nouns & verbs makes another NLP task (latent semantic indexing) work much better. This will benefit from parts-of-speech filtering.

My former comment is only partially correct: the Analyzer is currently applied only to the unseen text, not to the whole set of docs as well. Applying it (or other Analyzers) to the existing docs' text would make training slower, but it could improve accuracy. Maybe a subclass of the current one that is capable of doing that would be a nice addition.

> Create a Classification module
> ------------------------------
>
>                 Key: LUCENE-4345
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4345
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>            Priority: Minor
>         Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, SOLR-3700_2.patch, SOLR-3700.patch
>
>
> Lucene/Solr can host huge sets of documents containing lots of information in
> fields so that these can be used as training examples (w/ features) in order
> to very quickly create classifier algorithms to use on new documents and /
> or to provide an additional service.
> So the idea is to create a contrib module (called 'classification') to host a
> ClassificationComponent that will use already seen data (the indexed
> documents / fields) to classify new documents / text fragments.
> The first version will contain a (simplistic) Lucene-based Naive Bayes
> classifier but more implementations should be added in the future.

--
This message is automatically generated by JIRA.