Re: [Scikit-learn-general] TFIDF for short text

Olivier Grisel Thu, 03 May 2012 19:42:39 -0700

2012/5/3 Philipp Singer <[email protected]>:
>
> Hey!
>
> I am currently working with tweets crawled from Twitter and trying to
> do text classification on them. My first idea was to use TFIDF for this.
>
> But thinking more about it, that doesn't really make sense for short
> texts that are limited to 140 characters, because the TF value will
> nearly always be 1.


Binary occurrence features are probably better for short texts,
although I don't have a reference handy, nor practical experience with
classifying raw tweets.
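
As a rough illustration of what binary occurrence features could look
like, here is a minimal sketch, assuming a scikit-learn release where
CountVectorizer accepts the binary=True option. The example tweets are
made up for illustration and are not from the original question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy tweets, made up for this sketch.
tweets = [
    "great great match tonight",
    "traffic is terrible tonight",
    "new phone arrived today",
]

# Plain term counts: a repeated word gets a count above 1.
count_vectorizer = CountVectorizer()
X_counts = count_vectorizer.fit_transform(tweets)

# Binary occurrence: every non-zero count is clipped to 1, so a
# feature only records whether a term appears in the tweet at all.
binary_vectorizer = CountVectorizer(binary=True)
X_binary = binary_vectorizer.fit_transform(tweets)

print(X_counts.toarray())
print(X_binary.toarray())
```

The resulting sparse matrix can be passed to a linear classifier in the
same way as a TFIDF matrix. Whether it actually works better on tweets
would have to be checked empirically, for instance with
cross-validation.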

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
