Hi Cohan,

Thanks for the reply. I am not sure my intention came across properly, so let
me rephrase it. Let us assume that we have three key tokens that decide the
outcome. Out of the three, one token can matter a lot for one outcome, while
the same token appears in another outcome with less importance. In this case,
while computing the overall score, is it possible to boost the weight of that
particular token for that outcome?

When I have closely related outcomes, the words used in those outcomes will
overlap. In such a case, I should be able to teach the machine that certain
words should be given importance when calculating the likelihood for a
particular outcome, and treated normally when calculating the likelihood for
the other outcomes.

For example, the word 'player' is much more important for a 'Sport' outcome
than for a 'Politics' outcome.
1. He has been a very popular basketball player among our country's clubs
since the 90s. - Sport
2. The country's changes made it a very popular player in world politics
since the 90s. - Politics

While calculating the likelihood of sentence 1 for the 'Sport' outcome, the
word 'player' would be given more weight than it is when calculating the
likelihood for the 'Politics' outcome.
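
To make the idea concrete, here is a rough sketch of the kind of per-outcome
boost table I have in mind. This is not an existing OpenNLP API; the map
layout and the numbers are made up for illustration (it assumes
java.util.Map / java.util.HashMap):

    // Hypothetical boost table: outcome -> (token -> multiplier).
    // Tokens that are not listed default to 1.0, i.e. they are treated normally.
    Map<String, Map<String, Double>> boosts = new HashMap<>();
    boosts.put("Sport", Map.of("player", 3.0));
    boosts.put("Politics", Map.of("politics", 2.0));

So 'player' would count three times as much when scoring 'Sport', but would
keep its normal weight when scoring 'Politics'.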

The worst case will be when I have 3 outcomes and 3 tokens that are used in
all 3 outcomes, and each outcome gives importance to just 1 of those 3
tokens. That is the same worst case as before, where the surrounding words
determine the outcome. But the best case will improve by a lot.

Say I have a sentence of 10 words. 9 of the 10 words say that the sentence
belongs to A, and 5 of the 10 say that it belongs to B. I know that the
sentence belongs to B, but A would be chosen over B.

What I suggest is that, when calculating the likelihood for B, I would boost
one or more of those 5 tokens that point to B, so that the machine would
choose B over A. A rough sketch of how such a boost could enter the score is
below.
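
This is only an illustration of the idea, not how DocumentCategorizerME
actually computes its scores; the class, the probability maps and the
smoothing value are all made up:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: a naive-Bayes-like scorer where each token's
    // contribution to an outcome is multiplied by a per-outcome boost.
    class BoostedScorer {

        // outcome -> P(outcome), assumed to come from training data.
        private final Map<String, Double> priors;
        // outcome -> (token -> P(token | outcome)), assumed to come from training data.
        private final Map<String, Map<String, Double>> tokenProbs;
        // outcome -> (token -> boost multiplier); missing entries mean 1.0.
        private final Map<String, Map<String, Double>> boosts;

        BoostedScorer(Map<String, Double> priors,
                      Map<String, Map<String, Double>> tokenProbs,
                      Map<String, Map<String, Double>> boosts) {
            this.priors = priors;
            this.tokenProbs = tokenProbs;
            this.boosts = boosts;
        }

        // Log-likelihood of an outcome given the tokens, with per-outcome boosts.
        double score(String outcome, String[] tokens) {
            double score = Math.log(priors.getOrDefault(outcome, 1e-9));
            Map<String, Double> probs = tokenProbs.getOrDefault(outcome, new HashMap<>());
            Map<String, Double> outcomeBoosts = boosts.getOrDefault(outcome, new HashMap<>());
            for (String token : tokens) {
                double p = probs.getOrDefault(token, 1e-9);   // crude smoothing for unseen tokens
                double boost = outcomeBoosts.getOrDefault(token, 1.0);
                score += boost * Math.log(p);                 // boosted contribution
            }
            return score;
        }
    }

The learned probabilities stay as they are; only the listed (outcome, token)
pairs get extra weight, and only when that particular outcome is being scored.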

I hope this makes my intention clearer.

Manoj.

On Wed, Jan 18, 2017 at 5:11 PM, Cohan Sujay Carlos <co...@aiaioo.com>
wrote:

> In machine learning, one learns the weights you're speaking of, Manoj.
>
> So, the words that are more important for any category are given higher
> weightage during classification.
>
> However, rather than requiring a user to manually assign these weights, a
> machine learning system learns the weights from training data.
>
> That's what happens when you call, say, DocumentCategorizerME.train("en",
> sampleStream);
>
> The model that the train method returns is just a record of the "weights"
> that have been learnt.
>
> Cohan
>
> On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan <
> manojb.narayanan2...@gmail.com> wrote:
>
> > Hi,
> >
> > I was wondering if there is a way to assign weights to certain words of a
> > class in the Document Classifier.
> >
> > Some words are important for a particular class. Even though these words
> > may occur in other classes, the level of importance may vary. So, if
> > certain words in certain classes are given specific weights, it would
> > produce more accurate results.
> >
> > Let me explain this with an example.
> >
> > Say we have 2 classes. Nature and Sports.
> > Consider these 2 sentences :
> >     1. We played basket ball, under the sun.
> >     2. The sun is a big ball of fire.
> >
> > In the first sentence, which belongs to the class 'Sports', the words
> > 'played', 'basket', 'ball' are more important than the word 'sun'. Whereas,
> > in the second sentence, the words 'sun' and 'fire' are more important than
> > the word 'ball'.
> >
> > The level of importance could be set by assigning weights to a few
> > specific words that are distinctive for a class.
> >
> > Is there already a way to do this in the OpenNLP Document Classifier? If
> > not, please consider adding it.
> >
>
