On Tuesday 25 March 2008, Robin Anil wrote:
> You may be interested in reading the paper that talks more about it here:
> <http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf>.

The paper looks interesting: the modifications to Naive Bayes presented in the
paper seem to lead to a classifier whose accuracy on text classification is
comparable to that of an SVM while being much faster.
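For reference, the transformed weight-normalized complement NB (TWCNB) variant from the paper can be sketched roughly as below: log TF transform, IDF weighting, L2 length normalization, parameters estimated from the complement of each class, log weights normalized by their absolute sum, and argmin at classification time. This is only a minimal Python sketch under my own assumptions (pre-tokenized documents, smoothing alpha = 1, and the made-up function names train_twcnb and classify); it is not Mahout code and may differ in details from the paper's exact formulation.

```python
import math
from collections import Counter


def train_twcnb(docs, labels, alpha=1.0):
    """Train a transformed weight-normalized complement naive Bayes model.

    docs: list of token lists; labels: list of class names (one per doc).
    Returns {class: {term: weight}}.
    """
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # Number of documents containing each term (for the IDF transform).
    df = Counter(t for d in docs for t in set(d))

    vectors = []
    for d in docs:
        # TF transform log(1 + count), then IDF weighting log(N / df).
        v = {t: math.log(1 + c) * math.log(n_docs / df[t])
             for t, c in Counter(d).items()}
        # Length normalization: scale each document to unit L2 norm.
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vectors.append({t: x / norm for t, x in v.items()})

    weights = {}
    for c in set(labels):
        # Complement statistics: sum the vectors of all documents NOT in c.
        comp = Counter()
        for v, y in zip(vectors, labels):
            if y != c:
                comp.update(v)
        total = sum(comp.values())
        # Smoothed complement log-probabilities.
        w = {t: math.log((comp[t] + alpha) / (total + alpha * len(vocab)))
             for t in vocab}
        # Weight normalization: divide by the sum of absolute weights.
        z = sum(abs(x) for x in w.values())
        weights[c] = {t: x / z for t, x in w.items()}
    return weights


def classify(weights, doc):
    """Return the class whose complement weights fit the document worst."""
    tf = Counter(doc)
    scores = {c: sum(n * w.get(t, 0.0) for t, n in tf.items())
              for c, w in weights.items()}
    # The weights describe the *other* classes, hence argmin, not argmax.
    return min(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["great", "phone", "love"], ["love", "great", "battery"],
            ["bad", "phone", "hate"], ["hate", "bad", "screen"]]
    labels = ["pos", "pos", "neg", "neg"]
    model = train_twcnb(docs, labels)
    print(classify(model, ["great", "love"]))  # → pos
```

Training is just a few passes of counting, which is where the speed advantage over an SVM comes from.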


> the feature selection module is overloaded for each of them.

Sounds reasonable to me. I would guess the feature selection module is 
independent of the classifier?


> > So what you are hoping for is a system that can crawl and answer queries
> > at the same time, integrating more and more information as it becomes
> > available, right?
>
> No, because the queries aren't fixed. If you disregard the TREC queries, say
> a person is sitting there asking for opinions about a target. He may type
> "Nokia 6600" or "My left hand". Now, I would have to go through the DB,
> find everything that talks about Nokia (or the other target), and do
> post-processing if it's not yet processed.

I see - you want to do the sentiment classification step at query time and 
therefore you need it to be efficient. This implies that you need to store 
each text unit (say each blog posting) either in clear text or as some 
general feature vector (depending on whether your features are query-dependent
or not) and do the classification at query time.
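Storing a query-independent vector per posting and deferring classification to query time might look roughly like the following. It is a hypothetical sketch: DocumentStore, classify_at_query_time and the weights dict are made-up names, tokenization is plain whitespace splitting, and the sentiment model is assumed to be a linear scorer over term counts (naive Bayes fits that shape). The clear text is kept alongside the vector so that query-dependent features, such as words near the target, could still be extracted later.

```python
from collections import Counter


class DocumentStore:
    """Hypothetical store: clear text plus a query-independent
    term-frequency vector, both computed once at crawl time."""

    def __init__(self):
        self.docs = {}  # doc_id -> (text, Counter of lower-cased tokens)

    def add(self, doc_id, text):
        self.docs[doc_id] = (text, Counter(text.lower().split()))

    def matching(self, target):
        """Yield ids of documents that contain every word of the target."""
        terms = target.lower().split()
        for doc_id, (_, tf) in self.docs.items():
            if all(t in tf for t in terms):
                yield doc_id


def classify_at_query_time(store, target, weights, bias=0.0):
    """Score only documents mentioning the target with a linear model
    (term -> weight); here higher scores are assumed to mean more positive."""
    results = {}
    for doc_id in store.matching(target):
        _, tf = store.docs[doc_id]
        results[doc_id] = bias + sum(n * weights.get(t, 0.0) for t, n in tf.items())
    return results


if __name__ == "__main__":
    store = DocumentStore()
    store.add("a", "The Nokia 6600 is great")
    store.add("b", "My Nokia 6600 battery is bad")
    store.add("c", "My left hand hurts")
    scores = classify_at_query_time(store, "Nokia 6600", {"great": 1.0, "bad": -1.0})
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Only the postings that mention the target are scored, so the per-query cost grows with the number of matches rather than with the size of the whole crawl (ignoring the linear scan in matching, which a real inverted index would replace).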


> Another reason is that ranking the results becomes a problem. How do I say
> which among the 1000 results gives the better opinion: the doc that talks
> more about the target, or the one that has more opinions about the target?
> Neither; we need to rank them based on the output of the classification
> algorithms.

Seems like you need an algorithm that outputs comparable scores for each 
document and is neither under- nor overconfident. I remember vaguely that the 
vanilla NB had some problems in this respect.
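To illustrate the concern: under multinomial NB the log-odds are a sum of per-token terms, so they grow roughly linearly with document length and the posteriors of long documents get pushed towards 0 or 1 regardless of how opinionated they are. One possible workaround (my own assumption, not something taken from the paper) is to rank by the average per-token log-odds instead. The function names and toy probabilities below are made up for illustration.

```python
import math


def nb_log_odds(tokens, log_p_pos, log_p_neg, prior_log_odds=0.0):
    """log P(pos|d) - log P(neg|d) under multinomial naive Bayes.

    log_p_pos / log_p_neg map term -> log P(term | class); terms unknown
    to either class are skipped.
    """
    score = prior_log_odds
    for t in tokens:
        if t in log_p_pos and t in log_p_neg:
            score += log_p_pos[t] - log_p_neg[t]
    return score


def posterior(z):
    """Numerically stable logistic function: log-odds -> P(pos | d)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def normalized_score(tokens, log_p_pos, log_p_neg):
    """Average per-token log-odds, so long documents do not dominate."""
    known = [t for t in tokens if t in log_p_pos and t in log_p_neg]
    return nb_log_odds(known, log_p_pos, log_p_neg) / max(1, len(known))


if __name__ == "__main__":
    log_p_pos = {"good": math.log(0.6), "bad": math.log(0.4)}
    log_p_neg = {"good": math.log(0.4), "bad": math.log(0.6)}
    short = ["good"]
    long_mixed = ["good"] * 10 + ["bad"] * 5
    for name, doc in [("short", short), ("long_mixed", long_mixed)]:
        z = nb_log_odds(doc, log_p_pos, log_p_neg)
        print(name, round(posterior(z), 3),
              round(normalized_score(doc, log_p_pos, log_p_neg), 3))
```

With these toy numbers the raw posterior ranks the long, mixed document above the short, uniformly positive one (about 0.88 vs 0.6), while the length-normalized score reverses that order.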

Isabel


-- 
The most important design issue... is the fact that Linux is supposed to be 
fun...          -- Linus Torvalds at the First Dutch International Symposium on 
Linux
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[EMAIL PROTECTED]>
