Manoj, you can add custom feature using a generator that implements this: https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/FeatureGenerator.java
take a look at https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/BagOfWordsFeatureGenerator.java and https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java Damiano 2017-01-18 12:41 GMT+01:00 Cohan Sujay Carlos <co...@aiaioo.com>: > In machine learning, one learns the weights you're speaking of, Manoj. > > So, the words that are more important for any category are given higher > weightage during classification. > > However, rather than requiring a user to manually assign these weights, a > machine learning system learns the weights from training data. > > That's what happens when you call say DocumentCategorizerME.train(*"en"*, > sampleStream); > > The model that the train method returns is just a record of the "weights" > that have been learnt. > > Cohan > > On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan < > manojb.narayanan2...@gmail.com> wrote: > > > Hi, > > > > I was wondering if there is a way to assign weights to certain words of a > > class in the Document Classifier. > > > > Some words are important for a particular class. Even though these words > > may occur in other classes, the level of importance may vary. So, if > > certain words in certain classes are given specific weights, it would > > produce more accurate results. > > > > Let me explain this with an example. > > > > Say we have 2 classes. Nature and Sports. > > Consider these 2 sentences : > > 1. We played basket ball, under the sun. > > 2. The sun is a big ball of fire. > > > > In the first sentence, which belongs to the class 'Sports', the words > > 'played','basket','ball' are more important than the word 'sun'. Whereas, > > in the second sentence, the words 'sun' and 'fire' are important than the > > word 'ball'. > > > > Thelevel of importance can be assigned by assigning weight to a few > > specific words that are distinct for a class. > > > > Is there already a way to do this in OpenNLP Document Classifier? If not > > please consider this. > > >