2012/5/3 Philipp Singer <[email protected]>:
> Hey!
>
> I am currently using tweets crawled from Twitter and trying to do text
> classification on them. My first idea was to use TF-IDF for this case.
>
> But when thinking more about it, that doesn't really make sense for
> short texts which are limited to 140 characters, because the TF value
> will nearly always be 1.

Binary occurrence features are probably better for short texts, although
I don't have a reference handy, nor any practical experience with
classifying raw tweets.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
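
To make the point about short texts concrete, here is a minimal sketch in plain Python that compares raw term counts with binary occurrence features. The example tweets and the helper names `term_counts` and `binary_occurrence` are made up for illustration, and the tokenization is deliberately naive; nothing here comes from the original thread.

```python
from collections import Counter

# Hypothetical example tweets, invented for illustration only.
tweets = [
    "loving the new phone camera",
    "traffic is terrible this morning",
    "great game last night, great crowd",
]


def term_counts(text):
    """Raw term frequencies using naive lowercase whitespace tokenization."""
    return Counter(text.lower().replace(",", " ").split())


def binary_occurrence(text):
    """Binary features: 1 if the term occurs at all, whatever its count."""
    return {term: 1 for term in term_counts(text)}


for tweet in tweets:
    counts = term_counts(tweet)
    binary = binary_occurrence(tweet)
    # Terms whose raw count differs from the binary feature (count > 1).
    repeated = sorted(t for t in counts if counts[t] != binary[t])
    print(repr(tweet), "terms with count > 1:", repeated)
```

In texts this short, most terms occur once, so the count and binary representations differ only for the occasional repeated word. In scikit-learn, the same idea should be available by passing `binary=True` to `CountVectorizer`; whether it actually classifies tweets better than TF-IDF would need to be checked on real data.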
