Benglish,

I'm not sure I understand your requirements, but perhaps you could use a Naive 
Bayes classifier?
https://en.wikipedia.org/wiki/Naive_Bayes_classifier

A typical Naive Bayes setup separates items into two classes, Yes/No (spam
detection, etc.), but it can be extended to N categories.

Lucene provides access to the words it has indexed in your documents.  You 
could feed those to a classifier for training.
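
To make the N-category idea a bit more concrete, here is a rough multinomial
Naive Bayes sketch in plain Java. It is hand-rolled for illustration only: the
class name NaiveBayesSketch is hypothetical, it is not Lucene's
SimpleNaiveBayesClassifier, and it assumes each document has already been
reduced to a list of terms (for example, the words Lucene indexed for it).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical multinomial Naive Bayes over N categories (illustration, not Lucene's API). */
class NaiveBayesSketch {
    // Number of training documents seen per category (used for the prior).
    private final Map<String, Integer> docCounts = new HashMap<>();
    // Per-category term frequencies.
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    // Total number of term occurrences per category.
    private final Map<String, Integer> totalTerms = new HashMap<>();
    // Every distinct term seen in training (used for add-one smoothing).
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one pre-tagged document, given as a list of already-analyzed terms. */
    void train(String category, List<String> terms) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String t : terms) {
            counts.merge(t, 1, Integer::sum);
            totalTerms.merge(category, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the category with the highest log posterior for the given terms. */
    String classify(List<String> terms) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // log P(category), estimated from training document counts
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(category);
            int total = totalTerms.getOrDefault(category, 0);
            for (String t : terms) {
                // log P(term | category) with add-one (Laplace) smoothing
                int c = counts.getOrDefault(t, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

You'd call train() once per pre-tagged file and classify() on each test file.
Summing log probabilities instead of multiplying raw ones keeps long documents
from underflowing to zero.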

A quick Google search turned this up; perhaps it will get you started:
http://lucene.apache.org/core/4_8_1/classification/org/apache/lucene/classification/SimpleNaiveBayesClassifier.html

Lucene also has a k-nearest-neighbor version (KNearestNeighborClassifier); see
the list of implementing classes here:
http://lucene.apache.org/core/4_8_1/classification/org/apache/lucene/classification/Classifier.html
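
The k-nearest-neighbor idea can be sketched the same way: compare the new
document against every pre-tagged one and let the k most similar vote on the
tag. Again, this is a hypothetical plain-Java illustration (KnnSketch, with
cosine similarity over raw term counts as an assumed similarity measure), not
Lucene's KNearestNeighborClassifier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical k-nearest-neighbor text classifier (illustration, not Lucene's API). */
class KnnSketch {
    // One labeled training document: its category and term-frequency vector.
    private static final class Example {
        final String category;
        final Map<String, Integer> tf;
        Example(String category, Map<String, Integer> tf) {
            this.category = category;
            this.tf = tf;
        }
    }

    private final List<Example> examples = new ArrayList<>();
    private final int k;

    KnnSketch(int k) { this.k = k; }

    // Counts how often each term occurs in a document.
    private static Map<String, Integer> termFreqs(List<String> terms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : terms) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Cosine similarity between two sparse term-frequency vectors.
    private static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Stores one pre-tagged document as a training example. */
    void train(String category, List<String> terms) {
        examples.add(new Example(category, termFreqs(terms)));
    }

    /** Majority vote among the k training documents most similar to the input. */
    String classify(List<String> terms) {
        Map<String, Integer> query = termFreqs(terms);
        List<Example> sorted = new ArrayList<>(examples);
        // Most similar first.
        sorted.sort(Comparator.comparingDouble((Example ex) -> cosine(query, ex.tf)).reversed());
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < Math.min(k, sorted.size()); i++) {
            votes.merge(sorted.get(i).category, 1, Integer::sum);
        }
        String best = null;
        int bestVotes = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                bestVotes = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

kNN needs no real training step, but every classification scans all stored
documents, so with a huge collection you'd want the index to find the
neighbors rather than a linear scan like this one.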

You might also want to consider Solr, which is a layer on top of Lucene.

--
Mark Bennett / LucidWorks: Search & Big Data
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Jun 15, 2014, at 10:37 PM, benglish wrote:

> Hi pals,
> 
> I have a huge number of text files tagged with predefined topics. What I
> want to do is tag the test files based on those pre-tagged files.
> Searching on the Net, I couldn't find my answer: is it possible to train
> Lucene with tagged files and then have it tag test files according to those
> predefined tags?
> 
> Yours Sincerely,
> benglish
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Train-Lucene-with-topic-defined-files-tp4141979.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
