Right, I think this answers the previous questions. There are two main APIs a workbench could tie into: the Streaming API and the older Search API: http://apiwiki.twitter.com/Twitter-Search-API-Method%3A-search

Ted mentioned that simply playing with the data visually is the best way to start. Perhaps we can build some helper tools?

As far as classification goes, it seems like search via Twitter is going to become fairly useless quickly, so value-added search, or perhaps personalized search via classification, could be more handy. I could see various vertical web sites classifying Tweets into categories based on their own custom-trained models. So rather than a one-size-fits-all model, I'm thinking some easy open source tools (like Mahout) will allow anyone to build many different models to assist in organizing a stream of Tweets. What happens after that is part of the fun!

> 1. access to the data, although I'm sure the ASF could work something out here

I think we're providing software here; I can't see downloading the data into ASF repositories. Mahout being on Hadoop is great for archived Tweets, and then some realtime algorithms could be useful for the streaming data.

> 2. training data. wouldn't you need a set of 'tweets' classified in some manner? or were you thinking of using a different data source to base it on?

It'd be nice to develop a workbench to easily build the training set, and then allow easy retraining, which should occur quite often with Twitter.

> Do you have any deeper thinkings about this topic?

We can try things out... I think Twitter offers some unique challenges to machine learning. Ted, do you agree?

On Wed, Jan 20, 2010 at 1:10 PM, Hannes Carl Meyer <[email protected]> wrote:
> Hi Jason,
>
> to get access to the Twitter Data you could use the Twitter Streaming API:
> http://apiwiki.twitter.com/Streaming-API-Documentation
>
> Regards
> Hannes
>
> On Wed, Jan 20, 2010 at 10:02 PM, Ian Holsman <[email protected]> wrote:
>
>> On 1/20/10 2:35 AM, Jason Rutherglen wrote:
>>
>>> We've got Newsgroup classification. I'm kinda of interested in
>>> creating a Twitter classification system, or at least playing
>>> around with it. Also I think as a relevant growing large data
>>> set, it seems Twitter fit well with Hadoop based machine
>>> learning algorithms... Just throwing out into the wild!
>>>
>> Hi Jason.
>> I think the biggest issues here are twofold.
>>
>> 1. access to the data, although I'm sure the ASF could work something out
>> here
>> 2. training data. wouldn't you need a set of 'tweets' classified in some
>> manner? or were you thinking of using a different data source to base it on?
>>
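
To make the training-data point (item 2 above) a little more concrete, here is a rough sketch of what the core of such a workbench might look like: hand-labeled Tweets go in one at a time, and the model can be updated whenever new labels arrive. This is not Mahout's API. It is a small multinomial naive Bayes written against the Python standard library only, and the names `TweetClassifier`, `train`, `classify`, and `tokenize`, as well as the example Tweets and categories, are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word, #hashtag, and @mention tokens."""
    return re.findall(r"[#@]?\w+", text.lower())


class TweetClassifier:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # labeled tweets seen per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocabulary = set()                  # every token seen in training

    def train(self, text, label):
        """Add one hand-labeled tweet; call again at any time to retrain."""
        tokens = tokenize(text)
        self.label_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the most probable category for a tweet."""
        if not self.label_counts:
            raise ValueError("no training data yet")
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        # +1 reserves probability mass for tokens never seen in training
        vocab_size = len(self.vocabulary) + 1
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the category
            denom = sum(self.word_counts[label].values()) + vocab_size
            for token in tokens:
                # add-one smoothed log likelihood of the token in this category
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training set standing in for what a labeling workbench would produce.
model = TweetClassifier()
model.train("Hadoop cluster upgrade finished #hadoop", "tech")
model.train("new Mahout release adds classifiers", "tech")
model.train("great goal in the match tonight", "sports")
model.train("our team won the game", "sports")
print(model.classify("Mahout on a Hadoop cluster"))  # prints "tech"
```

Because the model is just counts, retraining is only a matter of calling `train` with more labeled Tweets, which fits the idea of retraining often. Each vertical site could keep its own instance with its own categories; for archived Tweets at scale, the same counting would presumably move to something like Mahout on Hadoop.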
