Hi Tonci,

You'll find good dataset pointers here:

  http://www.simpy.com/user/otis/search/dataset


You may find inspiration for Hadoop usage here, assuming you have an ML background:

  http://cwiki.apache.org/MAHOUT/algorithms.html

Oh, and you may also want to keep an eye out for GSoC (Google Summer of Code).

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Tonci Buljan <tonci.bul...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Mon, March 1, 2010 9:24:53 AM
> Subject: Re: Hadoop as master's thesis
> 
> Thank you for your reply.
> 
> 
> I didn't mention that I already installed Hadoop on 2 machines back at home
> (for an essay on Hadoop that I wrote), one as both namenode and datanode and
> one as a datanode only. Everything worked perfectly. I would really like to
> install it on more machines to see how a cluster works in more detail. So I
> was thinking: "Now I have a cluster, where do I find a large dataset to work
> with?"
> 
> 
> I like your idea about publicly available datasets. Do you have any links
> for those?
> 
> The other idea, about student grades, is also great (thank you for that) and
> I might just start with that.
> 
> 
> Thank you very much, you both really helped me.
> 
> 
> On 1 March 2010 15:15, Mark Kerzner wrote:
> 
> > Tonci,
> >
> > To start with, you can run Hadoop on one computer in pseudo-distributed mode.
> > Installing and configuring it will be enough of a headache on its own. Then
> > you can think of a problem, such as processing student records and grades to
> > compute some statistics, or relating grades to students' future achievements.
> > Or you can look at some publicly available datasets and do something with
> > them.
> >
> > Cheers,
> > Mark
> >
> > On Mon, Mar 1, 2010 at 8:01 AM, Tonci Buljan wrote:
> >
> > > Hello everyone,
> > >
> > >  I'm thinking of using Hadoop as the subject of my master's thesis in
> > > Computer Science. I'm supposed to solve some kind of problem with Hadoop,
> > > but I can't think of any :)).
> > >
> > >  We have a lab with 10-15 computers, and I thought of installing Hadoop on
> > > those computers. Now I need to write some kind of program to run on my
> > > cluster.
> > >
> > >  I really hope you understand my problem :). I would appreciate any kind
> > > of suggestion.
> > >
> > >
> > >  P.S. Sorry for my bad English, I'm from Croatia.
> > >
> >
