The firewall was run at system startup; I think there was an /etc/sysconfig/iptables file present which triggered it. I don't currently have access to any CentOS 5 machines, so I can't easily check.
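One way to look for the kind of rule Jason describes is to grep the firewall's rule dump for the ports Hadoop uses. A sketch (the port list assumes the classic 0.18-era defaults from this thread -- 54310/54311 for the NameNode/JobTracker RPC, 500xx for the datanode and web UI ports; the helper name is made up):

```shell
# Filter iptables-save output (read from stdin) down to any rule that
# mentions a port a 0.18-era Hadoop cluster listens on.
hadoop_rule_hits() {
    grep -E '(--dport|--sport) (5001[0-9]|5003[0-9]|5006[0-9]|5007[0-9]|54310|54311)'
}

# On a CentOS/Red Hat style box you would run, as root:
#   /sbin/service iptables status
#   /sbin/iptables-save | hadoop_rule_hits
# Any REJECT or DROP line that shows up here is a prime suspect.
```

A rule blocking "ports in the 50k range", as in Jason's story, would surface immediately in this output.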
On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop <jason.had...@gmail.com> wrote:

> The kickstart script was something that the operations staff was using to initialize new machines. I never actually saw the script; I just figured out that there was a firewall in place.
>
> On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <mnage...@asu.edu> wrote:
>
>> Jason: the kickstart script - was it something you wrote, or is it run when the system turns on?
>> Mithila
>>
>> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>
>>> Thanks Jason! Will check that out.
>>> Mithila
>>>
>>> On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <jason.had...@gmail.com> wrote:
>>>
>>>> Double-check that there is no firewall in place. At one point a bunch of new machines were kickstarted and placed in a cluster, and they all failed with something similar. It turned out the kickstart script had enabled the firewall with a rule that blocked ports in the 50k range. It took us a while to even think to check that, since it was not part of our normal machine configuration.
>>>>
>>>> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>
>>>>> Hi Aaron,
>>>>> I will look into that, thanks!
>>>>>
>>>>> I spoke to the admin who oversees the cluster. He said that the gateway comes into the picture only when one of the nodes communicates with a node outside of the cluster. But in my case the communication is carried out between the nodes, which all belong to the same cluster.
>>>>>
>>>>> Mithila
>>>>>
>>>>> On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I wrote a blog post a while back about connecting nodes via a gateway. See
>>>>>> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
>>>>>>
>>>>>> This assumes that the client is outside the gateway and all datanodes/namenode are inside, but the same principles apply. You'll just need to set up ssh tunnels from every datanode to the namenode.
>>>>>>
>>>>>> - Aaron
>>>>>>
>>>>>> On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <rphul...@yahoo-inc.com> wrote:
>>>>>>
>>>>>>> Looks like your NameNode is down.
>>>>>>> Verify whether the Hadoop processes are running (jps should show you all the running Java processes). If your Hadoop processes are running, try restarting them. I guess this problem is due to your fsimage not being correct; you might have to format your namenode.
>>>>>>> Hope this helps.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> --
>>>>>>> Ravi
>>>>>>>
>>>>>>> On 4/15/09 10:15 AM, "Mithila Nagendra" <mnage...@asu.edu> wrote:
>>>>>>>
>>>>>>> The log file runs into thousands of lines, with the same message being displayed every time.
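Ravi's checklist (jps, restart, format as a last resort) can be spelled out as commands. A sketch only: the paths assume the stock 0.18 tarball layout, run from the Hadoop install directory on the master, and note that formatting the namenode destroys the existing HDFS metadata:

```shell
# Ravi's checklist as shell functions (stock 0.18 layout assumed).
check_and_restart() {
    jps                  # should list NameNode, DataNode, JobTracker, ...
    bin/stop-all.sh      # stop all Hadoop daemons...
    bin/start-all.sh     # ...and bring them back up
}

# Last resort only -- this wipes the HDFS filesystem image:
format_namenode() {
    bin/hadoop namenode -format
}
```

If jps shows no NameNode process on the master, the datanode retry loop in the logs below is exactly what you would expect to see.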
>>>>>>> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>
>>>>>>>> The log file hadoop-mithila-datanode-node19.log.2009-04-14 has the following in it:
>>>>>>>>
>>>>>>>> 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
>>>>>>>> /************************************************************
>>>>>>>> STARTUP_MSG: Starting DataNode
>>>>>>>> STARTUP_MSG:   host = node19/127.0.0.1
>>>>>>>> STARTUP_MSG:   args = []
>>>>>>>> STARTUP_MSG:   version = 0.18.3
>>>>>>>> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
>>>>>>>> ************************************************************/
>>>>>>>> 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>>>>>>>> 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>>>>>>>> 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>>>>>>>> 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>>>>>>>> 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>>>>>>>> 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>>>>>>>> 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>>>>>>>> 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>>>>>>>> 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>>>>>>>> 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>>>>>>>> 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
>>>>>>>> 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>>>>>>>> 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>>>>>>>> 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>>>>>>>> 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>>>>>>>> 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>>>>>>>> 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>>>>>>>> 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>>>>>>>> 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>>>>>>>> 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>>>>>>>> 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>>>>>>>> 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
>>>>>>>> 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>>>>>>>> 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>>>>>>>> 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>>>>>>>>
>>>>>>>> Hmmm, I still can't figure it out..
>>>>>>>>
>>>>>>>> Mithila
>>>>>>>>
>>>>>>>> On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>>
>>>>>>>>> Also, would the way the port is accessed change if all these nodes are connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu systems we worked with earlier didn't have a gateway.
>>>>>>>>> Mithila
>>>>>>>>>
>>>>>>>>> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Aaron: Which log file do I look into - there are a lot of them. Here's what the error looks like:
>>>>>>>>>> [mith...@node19:~]$ cd hadoop
>>>>>>>>>> [mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
>>>>>>>>>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>>>>>>>>>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>>>>>>>>>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>>>>>>>>>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>>>>>>>>>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>>>>>>>>>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>>>>>>>>>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>>>>>>>>>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>>>>>>>>>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>>>>>>>>>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>>>>>>>>>> Bad connection to FS. command aborted.
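The retries above only say that the connection fails; one quick way to tell a dead namenode from a blocked port is to probe the port directly from a slave. A sketch using bash's /dev/tcp redirection (node18 and 54310 are the master host and RPC port from this thread):

```shell
# Return 0 if a TCP connection to host:port succeeds, nonzero otherwise.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open node18 54310; then
    echo "namenode port reachable -- look at the namenode logs instead"
else
    echo "connection refused/blocked -- namenode down or a firewall in the way"
fi
```

If the port is reachable from the master but not from the slaves, that points at a firewall rather than at the namenode process itself.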
>>>>>>>>>>
>>>>>>>>>> Node19 is a slave and Node18 is the master.
>>>>>>>>>>
>>>>>>>>>> Mithila
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Are there any error messages in the log files on those nodes?
>>>>>>>>>>> - Aaron
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I've drawn a blank here! Can't figure out what's wrong with the ports. I can ssh between the nodes but can't access the DFS from the slaves - it says "Bad connection to DFS". The master seems to be fine.
>>>>>>>>>>>> Mithila
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes I can..
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <jim.twen...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you ssh between the nodes?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -jim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Aaron.
>>>>>>>>>>>>>>> Jim: The three clusters I set up had Ubuntu running on them, and the dfs was accessed at port 54310. The new cluster which I've set up has Red Hat Linux release 7.2 (Enigma) running on it. Now when I try to access the dfs from one of the slaves, I get the following response: dfs cannot be accessed. When I access the DFS through the master there's no problem, so I feel there's a problem with the port. Any ideas? I did check the list of slaves; it looks fine to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mithila
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <jim.twen...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mithila,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You said all the slaves were being utilized in the 3 node cluster. Which application did you run to test that, and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, then some of your nodes in the 15 node cluster may not be running at all. Generally, one map job is assigned to each input split, and if you are running your cluster with the defaults, the splits are 64 MB each. I got confused when you said the Namenode seemed to do all the work. Can you check conf/slaves and make sure you put the names of all the task trackers there? I also suggest comparing both clusters with a larger input size, say at least 5 GB, to really see a difference.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jim
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In hadoop-*-examples.jar, use "randomwriter" to generate the data and "sort" to sort it.
>>>>>>>>>>>>>>>>> - Aaron
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <forpan...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Your data is too small, I guess, for 15 nodes, so the per-job overhead may be what makes your total MR jobs more time consuming. I guess you will have to try with a larger set of data..
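Jim's point about input splits can be made concrete: with the default 64 MB split size, the 516 MB input mentioned in this thread yields only about nine map tasks, so at most nine of the 15 nodes can receive one. A quick sketch of the arithmetic:

```shell
# One map task per input split (0.18 defaults: 64 MB splits).
input_mb=516
split_mb=64
maps=$(( (input_mb + split_mb - 1) / split_mb ))   # ceiling division
echo "$maps map tasks for ${input_mb} MB"          # 9 -- fewer than 15 nodes
```

So even with conf/slaves correct, six of the 15 nodes would sit idle on this input, which is consistent with the advice to benchmark with at least several GB.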
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Pankil
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Aaron,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> That could be the issue; my data is just 516 MB - but wouldn't even this see a bit of speedup? Could you guide me to the example? I'll run my cluster on it and see what I get. Also, for my program I had a Java timer running to record the time taken to complete execution. Does Hadoop have an inbuilt timer?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Mithila
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <aa...@cloudera.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one to each node, some of the nodes will also just go unused.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up performance from 3 nodes to 15, then you've definitely got something strange going on.
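Aaron's randomwriter-then-sort benchmark, spelled out as commands. A sketch only: the jar name matches the 0.18.3 release seen in the logs earlier in the thread, and the HDFS paths are made up for illustration:

```shell
# randomwriter fills HDFS with random data (several GB across the cluster),
# then sort shuffles all of it -- a job big enough to exercise every node.
run_sort_benchmark() {
    bin/hadoop jar hadoop-0.18.3-examples.jar randomwriter rand-data
    bin/hadoop jar hadoop-0.18.3-examples.jar sort rand-data rand-sorted
    bin/hadoop dfs -ls rand-sorted    # sanity-check that output was written
}
```

While the sort runs, the JobTracker web UI shows how many map and reduce tasks land on each node, which answers the "are the slaves being used?" question directly.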
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - Aaron
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <mnage...@asu.edu> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I recently set up a three node Hadoop cluster and ran an example on it. It was pretty fast, and all three nodes were being used (I checked the log files to make sure that the slaves were utilized).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Now I've set up another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has lower per-node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing any work is the master node.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Do 15 nodes in a cluster increase the network cost? What can I do to set up the cluster to function more efficiently?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Mithila Nagendra
>>>>>>>>>>>>>>>>>>>>> Arizona State University
>>>>
>>>> --
>>>> Alpha Chapters of my book on Hadoop are available
>>>> http://www.apress.com/book/view/9781430219422