Dear Koji,
Since I am new to Lucene, I'm afraid I do not yet understand the .xml file you
mentioned in your post.
Let's imagine I have 5 categories named {A, B, C, D, E} and 100 files named
1 to 100. In my case it is impossible to train the classifier outside a
loop, because I have to extract the content of each file and its category and
then add them to the training set, so it has to happen in a loop. Could you
please tell me whether the following pseudocode is right:
directory = directory of training files
trainingNumber = number of training files
for (int i = 0; i < trainingNumber; i++)
{
    String category = category of ith file
    String text = content of ith file
    classifier.train(ar, text, category, new SomeAnalyzer(Version.LUCENE_46));
}
If it is wrong, please let me know how I should train the classifier outside
the loop.
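If I read the Lucene 4.6 javadocs correctly, Classifier.train(AtomicReader, String textFieldName, String classFieldName, Analyzer) takes the names of the text and category fields of an existing index, not the text itself. That would mean the loop should only collect (or index) each file's content and category, and train should be called once afterwards. To check that I understand this structure, here is a minimal stdlib-only Java sketch of it. ToyNaiveBayes is a hypothetical toy classifier (add-one-smoothed naive Bayes) standing in for a Lucene classifier, and the four in-memory strings stand in for my files; none of this is Lucene code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy classifier standing in for a Lucene Classifier (NOT Lucene's API).
class ToyNaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Called ONCE with the whole training set; each entry is {text, category}.
    void train(List<String[]> examples) {
        for (String[] ex : examples) {
            String text = ex[0];
            String category = ex[1];
            totalDocs++;
            docCounts.merge(category, 1, Integer::sum);
            Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
            for (String token : tokenize(text)) {
                counts.merge(token, 1, Integer::sum);
                totalWords.merge(category, 1, Integer::sum);
                vocabulary.add(token);
            }
        }
    }

    // Returns the category with the highest log prior + log likelihood.
    String assignClass(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(category) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                int c = counts.getOrDefault(token, 0);
                // Laplace (add-one) smoothing so unseen words do not zero the score
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    // Lower-cases and splits on non-word characters, dropping empty tokens.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }
}

public class TrainOnceExample {
    public static void main(String[] args) {
        // Stand-ins for "category of ith file" and "content of ith file".
        String[] categories = {"A", "A", "B", "B"};
        String[] contents = {
            "lucene index search query",
            "search engine index terms",
            "football match goal score",
            "team won the match"
        };

        // Inside the loop: only collect the (text, category) pairs.
        List<String[]> trainingSet = new ArrayList<>();
        for (int i = 0; i < contents.length; i++) {
            trainingSet.add(new String[] {contents[i], categories[i]});
        }

        // Outside the loop: train once on the whole set, then classify.
        ToyNaiveBayes classifier = new ToyNaiveBayes();
        classifier.train(trainingSet);
        System.out.println(classifier.assignClass("index query")); // prints A
    }
}
```

With Lucene itself, my understanding is that the loop body would instead add one Document per file (a text field and a category field) to an IndexWriter, and after the loop a single train call would receive a reader over that index plus the two field names.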
Yours sincerely,
benglish
--
View this message in context:
http://lucene.472066.n3.nabble.com/Train-Lucene-with-topic-defined-files-tp4141979p4143318.html
Sent from the Lucene - General mailing list archive at Nabble.com.