Manoj, in machine learning the weights you're speaking of are learned. Words that are more important for a given category end up with higher weights, and those weights are what drive classification.
However, rather than requiring a user to assign these weights manually, a machine learning system learns them from the training data. That's what happens when you call, say, DocumentCategorizerME.train("en", sampleStream); The model that the train method returns is just a record of the "weights" that have been learnt.

Cohan

On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan <manojb.narayanan2...@gmail.com> wrote:

> Hi,
>
> I was wondering if there is a way to assign weights to certain words of a
> class in the Document Classifier.
>
> Some words are important for a particular class. Even though these words
> may occur in other classes, the level of importance may vary. So, if
> certain words in certain classes are given specific weights, it would
> produce more accurate results.
>
> Let me explain this with an example.
>
> Say we have 2 classes: Nature and Sports.
> Consider these 2 sentences:
> 1. We played basket ball, under the sun.
> 2. The sun is a big ball of fire.
>
> In the first sentence, which belongs to the class 'Sports', the words
> 'played', 'basket', 'ball' are more important than the word 'sun'. Whereas,
> in the second sentence, the words 'sun' and 'fire' are more important than
> the word 'ball'.
>
> The level of importance can be assigned by assigning weight to a few
> specific words that are distinct for a class.
>
> Is there already a way to do this in OpenNLP Document Classifier? If not,
> please consider this.
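To make the point in the reply concrete (weights are learned from labelled examples rather than assigned by hand), here is a minimal sketch of a toy bag-of-words perceptron trained on the two sentences from the question. It is an illustration only and is not how OpenNLP's DocumentCategorizerME works internally; the class name LearnedWordWeights and its methods are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy multi-class perceptron over bag-of-words features.
// Hypothetical illustration; NOT OpenNLP's implementation.
class LearnedWordWeights {
    private final List<String> labels = new ArrayList<>();
    // label -> (word -> learned weight)
    private final Map<String, Map<String, Double>> weights = new HashMap<>();

    // Lowercase the text and split on anything that is not a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Learned weight of a word for a label (0.0 if never updated).
    double weight(String label, String word) {
        Map<String, Double> w = weights.get(label);
        return (w == null) ? 0.0 : w.getOrDefault(word, 0.0);
    }

    private double score(String label, List<String> tokens) {
        double s = 0.0;
        for (String t : tokens) s += weight(label, t);
        return s;
    }

    // Pick the label with the highest summed word weight.
    // Ties go to the label seen first during training.
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : labels) {
            double s = score(label, tokens);
            if (s > bestScore) { bestScore = s; best = label; }
        }
        return best;
    }

    private void adjust(String label, List<String> tokens, double delta) {
        Map<String, Double> w = weights.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) w.merge(t, delta, Double::sum);
    }

    // Each sample is {label, text}. On a mistake, raise the weights of the
    // sample's words for the true label and lower them for the wrong one.
    void train(String[][] samples, int epochs) {
        for (String[] s : samples) {
            if (!labels.contains(s[0])) labels.add(s[0]);
        }
        for (int e = 0; e < epochs; e++) {
            for (String[] s : samples) {
                String predicted = classify(s[1]);
                if (!predicted.equals(s[0])) {
                    List<String> tokens = tokenize(s[1]);
                    adjust(s[0], tokens, +1.0);
                    adjust(predicted, tokens, -1.0);
                }
            }
        }
    }

    public static void main(String[] args) {
        String[][] samples = {
            {"Sports", "We played basket ball, under the sun."},
            {"Nature", "The sun is a big ball of fire."},
        };
        LearnedWordWeights model = new LearnedWordWeights();
        model.train(samples, 10);
        for (String word : new String[] {"played", "ball", "sun", "fire"}) {
            System.out.printf("%-7s Sports=%+.1f Nature=%+.1f%n",
                word, model.weight("Sports", word), model.weight("Nature", word));
        }
    }
}
```

With only these two sentences, words that occur in both classes ('ball', 'sun', 'the') carry no discriminating information, so the learner leaves them with lower weights than class-specific words such as 'played' or 'fire'. With more training data, a shared word like 'sun' could pick up different weights per class, which is the effect asked about in the original question.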