Hi Tonci,

Public data sets: check out infochimps.org/ or aws.amazon.com/publicdatasets/
I find that a lot of the Hadoopified algorithms out there originate from linguistics departments (TF-IDF is one example), but have you considered looking into information theory, i.e. entropy analytics using algorithms like Pointwise Mutual Information? I'd imagine most government security agencies would be interested in using Hadoop for signal processing and code breaking, especially given the cost savings of using commodity machines. The trick will be finding a dataset that suits your algorithm.

Kind regards
Steve Watt

From: Tonci Buljan <tonci.bul...@gmail.com>
To: common-user@hadoop.apache.org
Date: 03/01/2010 08:27 AM
Subject: Re: Hadoop as master's thesis

Thank you for your reply.

I didn't mention that I had already installed Hadoop on 2 machines back at home (for an essay on Hadoop I wrote), one as a namenode and datanode and one as a datanode only. Everything worked perfectly. I would really like to install it on more machines to see how a cluster works in more detail. So I was thinking: "Now I have a cluster, where do I find a large dataset to work with?"

I like your idea about publicly available datasets; do you have any links for that? The other idea, about student grades, is also great (thank you for that) and I might just start with that.

Thank you very much, you both really helped me.

On 1 March 2010 15:15, Mark Kerzner <markkerz...@gmail.com> wrote:
> Tonci,
>
> to start with, you can run Hadoop on one computer in pseudo-distributed mode.
> Installing and configuring it will be enough of a headache on its own. Then
> you can think of a problem, such as processing student records and grades to
> find some statistics, or relating grades to students' future achievements.
> Or, you can look at some publicly available datasets and do something with
> them.
>
> Cheers,
> Mark
>
> On Mon, Mar 1, 2010 at 8:01 AM, Tonci Buljan <tonci.bul...@gmail.com>
> wrote:
>
> > Hello everyone,
> >
> > I'm thinking of using Hadoop as the subject of my master's thesis in
> > Computer Science. I'm supposed to solve some kind of problem with Hadoop,
> > but I can't think of any :)).
> >
> > We have a lab with 10-15 computers and I thought of installing Hadoop on
> > those computers, and now I should write some kind of program to run on my
> > cluster.
> >
> > I really hope you understood my problem :). I'd really appreciate any kind
> > of suggestion.
> >
> >
> > P.S. Sorry for my bad English, I'm from Croatia.
> >