In machine learning, one learns the weights you're speaking of, Manoj.

So, the words that are more important for any category are given higher
weightage during classification.

However, rather than requiring a user to manually assign these weights, a
machine learning system learns the weights from training data.

That's what happens when you call, say,
DocumentCategorizerME.train("en", sampleStream);

The model that the train method returns is just a record of the "weights"
that have been learnt.
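
To make the idea concrete, here is a minimal, self-contained sketch of
learning per-word weights from labelled text. It is not OpenNLP's
implementation (its trainers are more sophisticated); it is a simple
naive-Bayes-style illustration. The class and method names
(WordWeightSketch, train, categorize, tokenize) are made up for this
example, and it assumes all categories are equally likely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical names, not part of the OpenNLP API.
class WordWeightSketch {

    // Lowercase the text and split it on anything that is not a letter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Each sample is {category, text}. The returned "model" maps
    // category -> word -> weight, where the weight is the add-one
    // smoothed log-probability of the word within that category.
    static Map<String, Map<String, Double>> train(List<String[]> samples) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        Map<String, Integer> totals = new HashMap<>();
        Set<String> vocab = new HashSet<>();
        for (String[] sample : samples) {
            for (String word : tokenize(sample[1])) {
                counts.computeIfAbsent(sample[0], k -> new HashMap<>())
                      .merge(word, 1, Integer::sum);
                totals.merge(sample[0], 1, Integer::sum);
                vocab.add(word);
            }
        }
        Map<String, Map<String, Double>> weights = new HashMap<>();
        for (String category : counts.keySet()) {
            Map<String, Double> perWord = new HashMap<>();
            double denominator = totals.get(category) + vocab.size();
            for (String word : vocab) {
                int c = counts.get(category).getOrDefault(word, 0);
                perWord.put(word, Math.log((c + 1.0) / denominator));
            }
            weights.put(category, perWord);
        }
        return weights;
    }

    // Sum the learnt weights of the words in the text for each category
    // and return the category with the highest total.
    static String categorize(Map<String, Map<String, Double>> weights, String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> entry : weights.entrySet()) {
            double score = 0.0;
            for (String word : tokenize(text)) {
                // Words never seen in training contribute nothing.
                score += entry.getValue().getOrDefault(word, 0.0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The two example sentences from the question, used as training data.
        List<String[]> samples = Arrays.asList(
            new String[] {"Sports", "We played basket ball, under the sun."},
            new String[] {"Nature", "The sun is a big ball of fire."});
        Map<String, Map<String, Double>> weights = train(samples);
        System.out.println(categorize(weights, "played basket"));
        System.out.println(categorize(weights, "big fire"));
    }
}
```

With only those two sentences as training data, 'played' and 'basket'
get a higher weight under Sports, 'fire' gets a higher weight under
Nature, and the shared words 'sun' and 'ball' end up with nearly equal
weights in both classes. Nobody assigned those values by hand; they
fall out of the training data.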

Cohan

On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan <
manojb.narayanan2...@gmail.com> wrote:

> Hi,
>
> I was wondering if there is a way to assign weights to certain words of a
> class in the Document Classifier.
>
> Some words are important for a particular class. Even though these words
> may occur in other classes, the level of importance may vary. So, if
> certain words in certain classes are given specific weights, it would
> produce more accurate results.
>
> Let me explain this with an example.
>
> Say we have 2 classes. Nature and Sports.
> Consider these 2 sentences :
>     1. We played basket ball, under the sun.
>     2. The sun is a big ball of fire.
>
> In the first sentence, which belongs to the class 'Sports', the words
> 'played','basket','ball' are more important than the word 'sun'. Whereas,
> in the second sentence, the words 'sun' and 'fire' are more important
> than the word 'ball'.
>
> The level of importance can be assigned by assigning weight to a few
> specific words that are distinct for a class.
>
> Is there already a way to do this in OpenNLP Document Classifier? If not,
> please consider this.
>
