In hadoop-*-examples.jar, use "randomwriter" to generate the data and "sort"
to sort it.
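
To see why a 516MB input may not keep 15 nodes busy (the point quoted below about not having enough fragments of data to allocate one per node), here is a rough back-of-the-envelope sketch. It assumes a 64 MB HDFS block size, a single input file, and one map task per block; the function names are made up for illustration, and the real split count depends on the input format and on how many files the input consists of.

```python
import math

MB = 1024 * 1024


def estimate_map_tasks(input_bytes, block_bytes=64 * MB):
    """Rough number of input splits, ASSUMING one split per HDFS block
    of a single input file (64 MB block size is an assumption)."""
    return max(1, math.ceil(input_bytes / block_bytes))


def idle_nodes(input_bytes, nodes, block_bytes=64 * MB):
    """Nodes that get no map task even in the best case where every
    task lands on a different node."""
    return max(0, nodes - estimate_map_tasks(input_bytes, block_bytes))


if __name__ == "__main__":
    size = 516 * MB
    for n in (3, 15):
        # prints: node count, estimated map tasks, nodes left idle
        print(n, estimate_map_tasks(size), idle_nodes(size, n))
```

Under these assumptions the 516MB input yields about 9 map tasks, so at least 6 of the 15 nodes would have nothing to do, while all 3 nodes of the smaller cluster stay busy. In practice each node can usually run more than one task at a time, so even fewer nodes may end up with work.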
- Aaron

On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <forpan...@gmail.com> wrote:

> Your data is probably too small for 15 nodes, so the overhead of the extra
> nodes may be making your MR jobs take longer overall.
> I think you will have to try with a larger set of data.
>
> Pankil
> On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <mnage...@asu.edu>
> wrote:
>
> > Aaron
> >
> > That could be the issue: my data is just 516MB. Wouldn't this still see a
> > bit of speedup?
> > Could you point me to the example? I'll run it on my cluster and see what
> > I get. Also, for my program I had a Java timer running to record the time
> > taken to complete execution. Does Hadoop have a built-in timer?
> >
> > Mithila
> >
> > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <aa...@cloudera.com>
> > wrote:
> >
> > > Virtually none of the examples that ship with Hadoop are designed to
> > > showcase its speed. Hadoop's speedup comes from its ability to process
> > > very large volumes of data (starting around, say, tens of GB per job, and
> > > going up in orders of magnitude from there). So if you are timing the pi
> > > calculator (or something like that), its results won't necessarily be
> > > very consistent. If a job doesn't have enough fragments of data to
> > > allocate one per node, some of the nodes will also just go unused.
> > >
> > > The best example for you to run is to use randomwriter to fill up your
> > > cluster with several GB of random data and then run the sort program. If
> > > that doesn't scale up performance from 3 nodes to 15, then you've
> > > definitely got something strange going on.
> > >
> > > - Aaron
> > >
> > >
> > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <mnage...@asu.edu>
> > > wrote:
> > >
> > > > Hey all
> > > > I recently set up a three-node Hadoop cluster and ran an example on it.
> > > > It was pretty fast, and all three nodes were being used (I checked the
> > > > log files to make sure that the slaves were utilized).
> > > >
> > > > Now I've set up another cluster consisting of 15 nodes. I ran the same
> > > > example, but instead of speeding up, the map-reduce task seems to take
> > > > forever! The slaves are not being used for some reason. This second
> > > > cluster has lower per-node processing power, but should that make any
> > > > difference?
> > > > How can I ensure that the data is being mapped to all the nodes?
> > > > Presently, the only node that seems to be doing all the work is the
> > > > master node.
> > > >
> > > > Does having 15 nodes in a cluster increase the network cost? What can I
> > > > do to set up the cluster to function more efficiently?
> > > >
> > > > Thanks!
> > > > Mithila Nagendra
> > > > Arizona State University
> > > >
> > >
> >
>
