Re: Suggestion/Query - Adding weights to words in Document Classifier

Cohan Sujay Carlos Wed, 18 Jan 2017 06:38:22 -0800

I understood you.

The assignment of "weights" to words (or to other features) happens
automatically.


Here's a set of slides on how the naive Bayes classifier learns those
"weights":

http://www.slideshare.net/aiaioo/fun-with-text-hacking-text-analytics (you
may want to start at slide 16)
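To make "learned weights" concrete, here is a minimal multinomial naive Bayes sketch in Java. It is an illustration only, not OpenNLP's code. The class and method names (`NaiveBayesWeights`, `train`, `weight`, `classify`) and the simple lowercase tokenizer are assumptions made for the example. Training only counts words per category. The "weight" of a word for a category is then log P(word | category) with add-one smoothing, and a document's score for a category is the log prior plus the sum of its words' weights.

```java
import java.util.*;

// Minimal multinomial naive Bayes sketch (illustration only, not OpenNLP code).
// The learned "weight" of a word for a category is log P(word | category),
// estimated from training counts with add-one (Laplace) smoothing.
public class NaiveBayesWeights {
    private final Map<String, Map<String, Integer>> wordCounts = new TreeMap<>();
    private final Map<String, Integer> tokenTotals = new HashMap<>();
    private final Map<String, Integer> docCounts = new TreeMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumed tokenizer: lowercase, split on anything that is not a letter, digit or apostrophe.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9']+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Training only counts documents per category and words per category.
    public void train(String category, String text) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            tokenTotals.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Learned weight: log P(word | category) with add-one smoothing.
    public double weight(String category, String word) {
        int count = wordCounts.getOrDefault(category, Collections.emptyMap()).getOrDefault(word, 0);
        int total = tokenTotals.getOrDefault(category, 0);
        return Math.log((count + 1.0) / (total + vocabulary.size()));
    }

    // Score = log prior + sum of the document's word weights; highest score wins.
    public String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            for (String token : tokens) score += weight(category, token);
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesWeights nb = new NaiveBayesWeights();
        nb.train("Sport", "He has been a very popular basket ball player among our country's clubs since the 90's.");
        nb.train("Politics", "The country's changes made it a very popular player in world politics since the 90's.");
        System.out.println(nb.classify("basket ball clubs")); // prints Sport
    }
}
```

Trained on Manoj's two example sentences, 'ball' ends up with a higher weight for Sport and 'politics' with a higher weight for Politics, with no manual input. 'player' occurs once in each sentence, so it gets nearly the same weight in both categories; only more Sport training data containing 'player' would raise its Sport weight.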

Does that answer your question?

Cohan


On Wed, Jan 18, 2017 at 7:52 PM, Manoj B. Narayanan <
[email protected]> wrote:

> Hi Cohan,
>
> Thanks for the reply. I am not sure my intention was conveyed properly. To
> rephrase it: let us assume that we have three key tokens that decide the
> outcome. One of the three tokens can mean a lot to one outcome, while the
> same token can appear in another outcome with less importance. In this
> case, while computing the overall score, is it possible to boost the
> weight of one particular token for that outcome?
>
> When I have closely related outcomes, the words used in those outcomes
> will overlap. In such a case, I should be able to teach the machine that
> certain words should be given importance when calculating the likelihood
> for a particular outcome, and treated normally when calculating the
> likelihood for other outcomes.
>
> For example, the word 'player' is much more important in a 'Sport' outcome
> than in a 'Politics' outcome.
> 1. He has been a very popular basket ball player among our country's clubs
> since the 90's. - Sport
> 2. The country's changes made it a very popular player in world politics
> since the 90's. - Politics
>
> While calculating the likelihood of sentence 1 for the 'Sport' outcome,
> the word 'player' will be given more weight than when calculating the
> likelihood for the 'Politics' outcome.
>
> The worst case is when I have 3 outcomes and 3 tokens that are used in
> all 3 outcomes, with each outcome giving importance to a different one of
> the 3 tokens. That is the same worst case as before, where the surrounding
> words determine the outcome. But the best case will improve a lot.
>
> Say I have a sentence of 10 words. 9 of the 10 words suggest that the
> sentence belongs to A, and 5 of the 10 suggest that it belongs to B. I know
> that the sentence belongs to B, but A would be chosen over B.
>
> What I suggest is that, when calculating the likelihood for B, I could
> boost one or more of the 5 tokens that indicate B, so that the machine
> chooses B over A.
>
> I believe I have made my intention clearer now.
>
> Manoj.
>
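Manoj's proposal could be sketched as a manual boost applied on top of learned weights. This is a hypothetical illustration, not an existing OpenNLP feature: `BoostedScoring`, the boost table and the toy probabilities are all made up for the example. One design point is worth noting. If the learned weights are log-probabilities (which are negative), multiplying a weight by a boost greater than 1 would *lower* the category's score. The sketch therefore multiplies P(word | category) by the boost, which adds log(boost) to that category's score; a boost of 1.0 leaves the score unchanged.

```java
import java.util.*;

// HYPOTHETICAL sketch of the proposed feature (not an OpenNLP API).
// logProbs: learned log P(word | category); boosts: manual per-(category, word) multipliers.
// A boost b multiplies P(word | category), i.e. adds log(b) to that category's score.
public class BoostedScoring {

    public static String classify(Map<String, Map<String, Double>> logProbs,
                                  Map<String, Map<String, Double>> boosts,
                                  List<String> tokens,
                                  double unseenLogProb) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> entry : logProbs.entrySet()) {
            String category = entry.getKey();
            Map<String, Double> categoryBoosts = boosts.getOrDefault(category, Collections.emptyMap());
            double score = 0.0;
            for (String token : tokens) {
                // Learned weight (or a fallback for unseen words) ...
                score += entry.getValue().getOrDefault(token, unseenLogProb);
                // ... plus log(boost); a missing boost defaults to 1.0, i.e. adds 0.
                score += Math.log(categoryBoosts.getOrDefault(token, 1.0));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy numbers chosen for illustration only.
        Map<String, Map<String, Double>> logProbs = new TreeMap<>();
        logProbs.put("A", Map.of("x", Math.log(0.5), "y", Math.log(0.5)));
        logProbs.put("B", Map.of("x", Math.log(0.2), "y", Math.log(0.4)));
        List<String> tokens = List.of("x", "y");

        System.out.println(classify(logProbs, Map.of(), tokens, Math.log(0.01))); // prints A
        Map<String, Map<String, Double>> boosts = Map.of("B", Map.of("y", 4.0));
        System.out.println(classify(logProbs, boosts, tokens, Math.log(0.01))); // prints B
    }
}
```

Without boosts, A scores log(0.25) and B scores log(0.08), so A wins; boosting token y by 4.0 for B raises B to log(0.32) and flips the decision, which is the kind of A-versus-B flip described in the example above.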
> On Wed, Jan 18, 2017 at 5:11 PM, Cohan Sujay Carlos <[email protected]>
> wrote:
>
> > In machine learning, one learns the weights you're speaking of, Manoj.
> >
> > So, the words that are more important for any category are given higher
> > weightage during classification.
> >
> > However, rather than requiring a user to manually assign these weights, a
> > machine learning system learns the weights from training data.
> >
> > That's what happens when you call, say,
> > DocumentCategorizerME.train("en", sampleStream);
> >
> > The model that the train method returns is just a record of the "weights"
> > that have been learnt.
> >
> > Cohan
> >
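Cohan's point that the trained model "is just a record of the weights" can be illustrated with a multiclass perceptron. It is used here as a shorter stand-in for the maximum-entropy trainer that, as far as I know, OpenNLP's document categorizer uses by default; it is not OpenNLP code, and the names `PerceptronDoccat` and `trainEpoch` are hypothetical. Each (category, word) pair has its own weight, so the same word can carry a different weight in each category, which is what the original question asks for.

```java
import java.util.*;

// Multiclass perceptron sketch (a stand-in for a maximum-entropy trainer; not OpenNLP code).
// The whole "model" is this table: one weight per (category, word) pair.
public class PerceptronDoccat {
    private final Map<String, Map<String, Double>> weights = new TreeMap<>();

    // All categories must be known up front; ties go to the alphabetically first one.
    public PerceptronDoccat(Collection<String> categories) {
        for (String category : categories) weights.put(category, new HashMap<>());
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9']+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public double weight(String category, String word) {
        return weights.get(category).getOrDefault(word, 0.0);
    }

    // Score of a category = sum of its weights for the document's tokens.
    public String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : weights.keySet()) {
            double score = 0.0;
            for (String token : tokens) score += weight(category, token);
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    // One pass over {category, text} pairs: on each mistake, raise the correct
    // category's weights for the document's tokens and lower the predicted one's.
    public void trainEpoch(List<String[]> examples) {
        for (String[] example : examples) {
            String gold = example[0];
            String predicted = classify(example[1]);
            if (predicted.equals(gold)) continue;
            for (String token : tokenize(example[1])) {
                weights.get(gold).merge(token, 1.0, Double::sum);
                weights.get(predicted).merge(token, -1.0, Double::sum);
            }
        }
    }

    public static void main(String[] args) {
        PerceptronDoccat p = new PerceptronDoccat(List.of("Nature", "Sports"));
        List<String[]> data = List.of(
            new String[] {"Sports", "We played basket ball, under the sun."},
            new String[] {"Nature", "The sun is a big ball of fire."});
        for (int epoch = 0; epoch < 5; epoch++) p.trainEpoch(data);
        System.out.println(p.classify("we played ball")); // prints Sports
    }
}
```

On the two sentences from the original question, a couple of passes leave 'played' and 'basket' with positive weights for Sports, 'fire' and 'big' with positive weights for Nature, and the shared words 'sun' and 'ball' at zero for both classes, so the classifier relies on the words that actually distinguish the classes.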
> > On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan <
> > [email protected]> wrote:
> >
> > > Hi,
> > >
> > > I was wondering if there is a way to assign weights to certain words of
> > > a class in the Document Classifier.
> > >
> > > Some words are important for a particular class. Even though these
> > > words may occur in other classes, their level of importance may vary.
> > > So, if certain words in certain classes were given specific weights, it
> > > would produce more accurate results.
> > >
> > > Let me explain this with an example.
> > >
> > > Say we have 2 classes: Nature and Sports.
> > > Consider these 2 sentences:
> > >     1. We played basket ball, under the sun.
> > >     2. The sun is a big ball of fire.
> > >
> > > In the first sentence, which belongs to the class 'Sports', the words
> > > 'played', 'basket' and 'ball' are more important than the word 'sun',
> > > whereas in the second sentence, the words 'sun' and 'fire' are more
> > > important than the word 'ball'.
> > >
> > > The level of importance can be set by assigning weights to a few
> > > specific words that are distinctive for a class.
> > >
> > > Is there already a way to do this in the OpenNLP Document Classifier?
> > > If not, please consider this.
> > >
> >
>
