2010/1/21 Olivier Grisel <[email protected]>:
> 2010/1/20 Ian Holsman <[email protected]>:
>> On 1/20/10 2:35 AM, Jason Rutherglen wrote:
>>>
>>> We've got Newsgroup classification. I'm kinda of interested in
>>> creating a Twitter classification system, or at least playing
>>> around with it. Also I think as a relevant growing large data
>>> set, it seems Twitter fit well with Hadoop based machine
>>> learning algorithms... Just throwing out into the wild!
>>>
>>
>> Hi Jason.
>> I think the biggest issues here are twofold.
>>
>> 1. access to the data, although I'm sure the ASF could work something out
>> here
>
> Firehose (the live complete twitter stream) is going to be open to the
> public this year. In the meantime it is possible to
> gain access to a sample stream and to perform ad hoc search queries on
> specific terms or user profiles.

BTW, I just stumbled upon the following project to dump a Twitter
statuses stream directly to HDFS:

http://github.com/ieure/Twidoop

--
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
