Hi Tonci,

Actually, I am doing my Master's thesis on developing algorithms on Hadoop.

My project is to extend algorithms into MapReduce fashion and to discover
whether there is an optimal choice. Most of them belong to the Machine
Learning area. Personally, I think this is a fresh area, and if you search
the main academic databases, you will find little literature about it.

I recently made a proposal about my study on Hadoop, and I would like to
discuss it with you in depth if you wish.

Another interesting topic is to discover the limits of Hadoop. We have a very
large cluster ranked very high in the TOP500, so I'm wondering whether
Hadoop can perform as we expect.

Hope this helps.

Regards
Song Liu


On Mon, Mar 1, 2010 at 9:16 PM, Stephen Watt <sw...@us.ibm.com> wrote:

> Hi Tonci
>
> Public Data Sets - Check out infochimps.org/ or
> aws.amazon.com/publicdatasets/
>
> I find a lot of the Hadoopified algorithms out there originate from
> Linguistics departments; TF-IDF is one example. But have you considered
> looking into Information Theory, i.e. entropy analytics using algorithms
> like Pointwise Mutual Information? I'd imagine most government security
> agencies would be interested in using Hadoop for signal processing/code
> breaking, especially given the cost savings of using commodity machines.
> The trick will be to find a dataset that suits your algorithm.
>
> Kind regards
> Steve Watt
>
>
>
>
> From: Tonci Buljan <tonci.bul...@gmail.com>
> To: common-user@hadoop.apache.org
> Date: 03/01/2010 08:27 AM
> Subject: Re: Hadoop as master's thesis
>
>
>
> Thank you for your reply.
>
>
>  I didn't mention that I already installed Hadoop on 2 machines back at
> home (for an essay on Hadoop which I did), one as a namenode and datanode
> and one as a datanode only. Everything worked perfectly. I would really
> like to install it on more machines to see how a cluster works in more
> detail. So I was thinking: "Now I have a cluster, where do I find a large
> dataset to work with?"
>
>
>  I like your idea about publicly available datasets. Do you have any links
> for that?
>
> The other idea, about student grades, is also great (thank you for that)
> and I might just start with that.
>
>
>  Thank you very much, you both really helped me.
>
>
> On 1 March 2010 15:15, Mark Kerzner <markkerz...@gmail.com> wrote:
>
> > Tonci,
> >
> > to start with, you can run Hadoop on one computer in pseudo-distributed
> > mode. Installing and configuring will be enough of a headache on its own.
> > Then you can think of a problem, such as processing student records and
> > grades to find some statistics, or relating grades to future
> > achievements. Or, you can look at some publicly available datasets and
> > do something with them.
> >
> > Cheers,
> > Mark
> >
> > On Mon, Mar 1, 2010 at 8:01 AM, Tonci Buljan <tonci.bul...@gmail.com> wrote:
> >
> > > Hello everyone,
> > >
> > >  I'm thinking of using Hadoop as a subject in my master's thesis in
> > > Computer Science. I'm supposed to solve some kind of a problem with
> > > Hadoop, but can't think of any :)).
> > >
> > >  We have a lab with 10-15 computers and I thought of installing Hadoop
> > > on those computers, and now I should write some kind of a program to
> > > run on my cluster.
> > >
> > >  I really hope you understood my problem :). I really need any kind of
> > > suggestion.
> > >
> > >
> > >  P.S. Sorry for my bad English, I'm from Croatia.
> > >
> >
>
>
>
>
