> I'm using the machine running the namenode to run maps as well.
Please do not run maps on the machine that is running the namenode. They
cause CPU contention and slow the namenode down, which makes it much more
likely that you will see SocketTimeoutExceptions.
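
For what it's worth, with the stock start scripts the usual way to keep map
slots off the master is to list only the worker hosts in conf/slaves; the
start-*.sh scripts only launch DataNodes and TaskTrackers on the hosts named
there, so the namenode/jobtracker box runs no tasks. A minimal sketch, with
made-up host names:

    # conf/slaves -- worker hosts only; the namenode/jobtracker machine is
    # deliberately not listed, so no TaskTracker (or DataNode) starts on it
    ec2-worker-01.compute-1.amazonaws.com
    ec2-worker-02.compute-1.amazonaws.com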

Hairong 

On 5/9/08 11:24 AM, "James Moore" <[EMAIL PROTECTED]> wrote:

> On Wed, May 7, 2008 at 2:45 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
>> Hi James
>> 
>> Were you able to start all the nodes in the same 'availability zone'? Are
>> you using the new AMI kernels?
> 
> After I saw your note, I restarted new instances with the new kernels
> (aki-b51cf9dc and ari-b31cf9da) and made sure everything was in the
> same availability zone.
> 
>> If you are using the contrib/ec2 scripts, you might upgrade (just the
>> scripts) to
>> 
>> http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/src/contrib/ec2/
> 
> I'll take a look at these - I've been doing it by hand.
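
In case it helps, the flow with those scripts is roughly the following (command
names recalled from the 0.17-era contrib/ec2 scripts, so treat this as a sketch
and check the script's usage output; the cluster name is made up):

    # fill in AWS credentials, AMI, instance type, etc. in hadoop-ec2-env.sh first
    bin/hadoop-ec2 launch-cluster my-cluster 19    # one master + 19 workers
    bin/hadoop-ec2 login my-cluster                # ssh to the master node
    bin/hadoop-ec2 terminate-cluster my-cluster    # shut the whole cluster down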
> 
> Hairong wrote:
>> Taking the timeout out is very dangerous. It may cause your application to
>> hang. You could change the timeout parameter to a larger number.
> 
> Thanks - removing the timeout did seem like a bad idea.  With the new
> kernels, I'm seeing timeout errors like this:
> 
> java.net.SocketTimeoutException: timed out waiting for rpc response
> at org.apache.hadoop.ipc.Client.call(Client.java:514)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
> at org.apache.hadoop.dfs.$Proxy5.mkdirs(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at org.apache.hadoop.dfs.$Proxy5.mkdirs(Unknown Source)
> at org.apache.hadoop.dfs.DFSClient.mkdirs(DFSClient.java:550)
> at org.apache.hadoop.dfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:184)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:982)
> at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:1429)
> at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.launchTask(TaskTracker.java:1493)
> at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:700)
> at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:693)
> at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1282)
> at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:923)
> at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1318)
> at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2210)
> 
> I'll experiment with increasing the timeout.
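
For reference, the timeout parameter Hairong mentions should be
ipc.client.timeout (in milliseconds; 60000 by default, if I remember right),
which you can override in hadoop-site.xml. A sketch with an illustrative value
only:

    <!-- hadoop-site.xml: allow RPC calls to a heavily loaded namenode more
         time before the client throws SocketTimeoutException; 120000 ms is
         just an example, not a recommendation -->
    <property>
      <name>ipc.client.timeout</name>
      <value>120000</value>
    </property>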
> 
> I'm using the machine running the namenode to run maps as well - could
> that be a source of my problem?  The load is fairly high, essentially
> no idle time.  8 cores per machine, so I've got 8 maps running.  I'm
> guessing I'd be better off running 80 smaller machines instead of 20
> larger ones for the same price, but we haven't been approved for more
> than 20 instances yet.  Given that I'm not seeing any idle time, I'm
> assuming that I'm CPU-bound, not IO-bound.
> 
> Cpu(s): 89.6%us,  5.7%sy,  0.0%ni,  0.6%id,  0.0%wa,  0.0%hi,  0.1%si,  4.0%st
> Mem:  15736360k total, 14935708k used,   800652k free,   237980k buffers
> Swap:        0k total,        0k used,        0k free,  7545100k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 28955 james     21   0 1308m 750m 9440 S  121  4.9   0:36.61 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
> 28989 james     18   0 1298m 725m 9376 S  120  4.7   0:30.48 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
> 29029 james     18   0 1349m 504m 9376 S  117  3.3   0:24.55 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
> 29059 james     18   0 1301m 313m 9428 S   81  2.0   0:16.51 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
> 25658 james     20   0 1293m 277m 9204 S    8  1.8   0:29.98 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
> 25756 james     19   0 1286m 412m 9204 S    3  2.7   0:30.66 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
> 25688 james     19   0 1286m 332m 9204 S    2  2.2   0:28.69 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
>  1141 james     24   0 2332m 281m 8932 S    1  1.8   3:56.17 /usr/lib/jvm/java-6-sun-1.6.0.03/bin/java -Xmx2000m -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/home/james/dev/hadoop/logs -Dhadoop.log.file=hadoop-james-jobtracker-domU-1
> 25724 james     19   0 1286m 386m 9204 S    1  2.5   0:28.96 /usr/lib/jvm/java-6-sun-1.6.0.03/jre/bin/java -Djava.library.path=/home/james/dev/hadoop/lib/native/Linux-amd64-64:/home/james/dfsTmp/mapred/local/taskTracker/jobcache/job_2008
>   822 james     24   0 2306m  91m 8912 S    0  0.6   3:15.12 /usr/lib/jvm/java-6-sun-1.6.0.03/bin/java -Xmx2000m -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/home/james/dev/hadoop/logs -Dhadoop.log.file=hadoop-james-namenode-domU-12-
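
On the 8 maps per 8-core box: the per-node concurrency is governed by
mapred.tasktracker.map.tasks.maximum (and mapred.tasktracker.reduce.tasks.maximum
for reduces) in hadoop-site.xml, so one option is to cap it slightly below the
core count and leave headroom for the TaskTracker and DataNode daemons
themselves. Example value only:

    <!-- hadoop-site.xml on each worker: limit concurrent map tasks per
         TaskTracker; 7 is just an example for an 8-core instance -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>7</value>
    </property>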
> 
> FYI, I'm using JRuby to do the work in the map tasks.  It's working well so
> far.
