Hi Tonci,

Actually, I am doing my Master's thesis on developing algorithms on Hadoop.
My project is to extend algorithms into MapReduce form and to discover whether there is an optimal choice. Most of them belong to the machine learning area. Personally, I think this is a fresh area; if you search the main academic databases, you will find little literature about it. I recently wrote a proposal for my study on Hadoop, and I would like to discuss it with you in depth if you wish.

Another interesting topic is to discover the limits of Hadoop. We have a very large cluster ranked very high in the TOP500, so I'm wondering whether Hadoop can perform as we expect.

Hope this is helpful.

Regards,
Song Liu

On Mon, Mar 1, 2010 at 9:16 PM, Stephen Watt <sw...@us.ibm.com> wrote:

> Hi Tonci
>
> Public Data Sets - Check out infochimps.org/ or
> aws.amazon.com/publicdatasets/
>
> I find a lot of the Hadoopified algorithms out there originate from
> Linguistics departments (TF-IDF is one example), but have you considered
> looking into Information Theory? I.e. entropy analytics using algorithms
> like Pointwise Mutual Information. I'd imagine most government security
> agencies would be interested in using Hadoop for signal processing/code
> breaking, especially given the cost savings of using commodity machines.
> The trick will be to find a dataset that suits your algorithm.
>
> Kind regards
> Steve Watt
>
> From: Tonci Buljan <tonci.bul...@gmail.com>
> To: common-user@hadoop.apache.org
> Date: 03/01/2010 08:27 AM
> Subject: Re: Hadoop as master's thesis
>
> Thank you for your reply.
>
> I didn't mention that I already installed Hadoop on 2 machines back at
> home (for an essay on Hadoop which I did), one as a namenode and datanode
> and one as a datanode only. Everything worked perfectly. I would really
> like to try installing it on more machines to see how a cluster works in
> more detail. So I was thinking: "Now I have a cluster, where do I find a
> large dataset to work with?"
>
> I like your idea about publicly available datasets. Do you have any links
> on that?
>
> The other idea, about student grades, is also great (thank you for that)
> and I might just start with that.
>
> Thank you very much, you both really helped me.
>
> On 1 March 2010 15:15, Mark Kerzner <markkerz...@gmail.com> wrote:
>
> > Tonci,
> >
> > to start with, you can run Hadoop on one computer in pseudo-cluster
> > mode. Installing and configuring will be enough of a headache on its
> > own. Then you can think of a problem, such as processing student
> > records and grades to find some statistics, or grades and their future
> > achievements. Or, you can look at some publicly available datasets and
> > do something with them.
> >
> > Cheers,
> > Mark
> >
> > On Mon, Mar 1, 2010 at 8:01 AM, Tonci Buljan <tonci.bul...@gmail.com>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I'm thinking of using Hadoop as a subject in my master's thesis in
> > > Computer Science. I'm supposed to solve some kind of a problem with
> > > Hadoop, but can't think of any :)).
> > >
> > > We have a lab with 10-15 computers and I thought of installing Hadoop
> > > on those computers, and now I should write some kind of a program to
> > > run on my cluster.
> > >
> > > I really hope you understood my problem :). I really need any kind of
> > > suggestion.
> > >
> > > P.S. Sorry for my bad English, I'm from Croatia.