Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Christopher Douglas
That section of code is unmodified in MR-1182. See the patches/svn  
log. -C


Sent from my iPhone

On Mar 9, 2010, at 7:44 PM, "Ted Yu"  wrote:


I just downloaded hadoop-0.20.2 tar ball from cloudera mirror.
This is what I see in ReduceTask (line 999):
  public synchronized boolean reserve(int requestedSize, InputStream in)
      throws InterruptedException {
    // Wait till the request can be fulfilled...
    while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182.

That's why I suggested to Andy that he manually apply MR-1182.

Cheers

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins wrote:




Thanks Christopher.

The heap size for reduce tasks is configured to be 640M (
mapred.child.java.opts set to -Xmx640m ).

Andy
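
For context, a minimal sketch (plain Java against the old JobConf API) of how
these knobs could be set at job-submission time. The shuffle buffer property
name is taken from the 0.20 defaults, and the larger heap / lower fraction /
fewer copiers are illustrative assumptions, not a recommendation made in this
thread:

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuningSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Larger reduce-task heap than the 640m above (illustrative value only).
        conf.set("mapred.child.java.opts", "-Xmx1024m");

        // Fraction of the heap the shuffle may buffer in memory
        // (0.70 is the default discussed in this thread; 0.50 is just an example).
        conf.set("mapred.job.shuffle.input.buffer.percent", "0.50");

        // Fewer parallel copiers means fewer concurrent in-memory reservations.
        conf.setInt("mapred.reduce.parallel.copies", 2);

        System.out.println("mapred.child.java.opts = " + conf.get("mapred.child.java.opts"));
    }
}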

-Original Message-
From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, "Ted Yu"  wrote:


Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <

andy.saut...@returnpath.net

wrote:




Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
the same job scenarios, running with mapred.reduce.parallel.copies set to 1,
and we continue to have the same Java heap space error.



-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, March 09, 2010 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

This issue has been resolved in
http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch
M1182-1v20.patch<


http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch




On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <

andy.saut...@returnpath.net

wrote:




Thanks Ted.  Very helpful.  You are correct that I misunderstood  
the

code

at ReduceTask.java:1535.  I missed the fact that it's in a
IOException

catch

block.  My mistake.  That's what I get for being in a rush.

For what it's worth I did re-run the job with
mapred.reduce.parallel.copies set with values from 5 all the way down to 1.
All failed with the same error:

Error: java.lang.OutOfMemoryError: Java heap space
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)



So from that it does seem like something else might be going on, yes?
I need to do some more research.

I appreciate your insights.

Andy

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

My observation is based on this call chain:
MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
ramManager.canFitInMemory(decompressedLength)

Basically ramManager.canFitInMemory() makes decision without considering
the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total
heap may be used in shuffling if default parameters were used.
Of course, you should check the value for mapred.reduce.parallel.copies to
see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
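
As a rough illustration of that arithmetic, a stand-alone sketch. The 0.70 and
0.25 fractions are assumptions inferred from the "1.25 * 0.7" figure above,
not numbers read out of the ReduceTask source itself:

public class ShuffleHeapEstimate {
    public static void main(String[] args) {
        long heapBytes = 640L << 20;          // -Xmx640m, as configured in this thread

        double bufferPercent = 0.70;          // assumed shuffle input buffer fraction of the heap
        double singleShuffleFraction = 0.25;  // assumed single-segment fraction of that buffer
        int parallelCopies = 5;               // default mapred.reduce.parallel.copies

        long maxSize = (long) (heapBytes * bufferPercent);
        long maxSingleShuffleLimit = (long) (maxSize * singleShuffleFraction);

        // canFitInMemory() checks one request against maxSingleShuffleLimit only,
        // so with several copiers in flight the total reservation can overshoot
        // maxSize by roughly one extra segment: 1.25 * 0.7 of the heap, as above.
        long worstCase = maxSize + maxSingleShuffleLimit;

        System.out.println("shuffle buffer (0.70 * heap) : " + (maxSize >> 20) + " MB");
        System.out.println("single shuffle limit         : " + (maxSingleShuffleLimit >> 20) + " MB");
        System.out.println("worst case with " + parallelCopies + " copiers    : ~"
            + (worstCase >> 20) + " MB of a " + (heapBytes >> 20) + " MB heap");
    }
}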

About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2 only
has 2731 lines. So I have to guess the location of the code snippet you
provided.
I found this around line 1535:
      } catch (IOException ioe) {
        LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(),
                 ioe);

        // Inform the ram-manager
        ramManager.closeInMemoryFile(mapOutputLength);
        ramManager.unreserve(mapOutputLength);

        // Discard the map-output
        try {
          mapOutput.discard();
        } catch (IOException ignored) {
          LOG.info("Failed to discard map-output from " +
                   mapOutputLoc.getTaskAttemptId(), ignored);
        }
Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we can
improve it. First, I assume IOException shouldn't happen that often.
Second, mapOutput.discard() just sets:
    data = null;
for the in-memory case. Even if we call mapOutput.discard() before
ramManager.unreserve(), we don't know when GC would kick in and make more
memory available.
Of course, given the large number of map outputs in your system, it became
more likely that the root cause from my reasoning made OOME happen sooner.


Re: (Strange!)getFileSystem in JVM shutdown hook throws shutdown in progress exception

2010-03-09 Thread Ted Yu
By the time run() gets executed, the JVM has already started shutting down.
Can you perform some action in main()? Right now it's empty.
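
One way to read that suggestion, as a minimal sketch (untested against 0.20;
note that shutdown-hook ordering is not guaranteed, so FileSystem's own hook
could still close the cached instance before this one runs):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

@SuppressWarnings("deprecation")
public class Main {

    public static void main(String[] args) throws Exception {
        final Path path = new Path("/temp/hadoop-young");
        final Configuration conf = new JobConf();

        // Touch the FileSystem cache while the JVM is still running normally,
        // so FileSystem registers its own shutdown hook here rather than
        // inside ours (which is what throws "Shutdown in progress").
        final FileSystem fs = path.getFileSystem(conf);

        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                try {
                    System.out.println("Thread run : " + path);
                    if (fs.exists(path)) {
                        fs.delete(path);   // single-argument delete, as in the original post
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
}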

On Tue, Mar 9, 2010 at 9:39 PM, Silence  wrote:

>
> Hi fellows
> The code segment below adds a shutdown hook to the JVM, but I got a strange
> exception:
> java.lang.IllegalStateException: Shutdown in progress
>at
> java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
>at java.lang.Runtime.addShutdownHook(Runtime.java:192)
>at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1387)
>at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
>at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
>at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>at young.Main$1.run(Main.java:21)
> The Javadoc says this exception is thrown when the virtual machine is already
> in the process of shutting down (http://java.sun.com/j2se/1.5.0/docs/api/).
> What does this mean? Why does this happen? How can I fix it?
> I would really appreciate it if you could try this code and help me figure out
> what's going on here. Thank you!
>
> ---
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapred.JobConf;
>
> @SuppressWarnings("deprecation")
> public class Main {
>
>public static void main(String[] args) {
>Runtime.getRuntime().addShutdownHook(new Thread() {
>@Override
>public void run() {
>Path path = new Path("/temp/hadoop-young");
>System.out.println("Thread run : " + path);
>Configuration conf = new JobConf();
>FileSystem fs;
>try {
>fs = path.getFileSystem(conf);
>if(fs.exists(path)){
>fs.delete(path);
>}
>} catch (Exception e) {
>System.err.println(e.getMessage());
>e.printStackTrace();
>}
>};
>});
>}
> }
> --
> View this message in context:
> http://old.nabble.com/%28Strange%21%29getFileSystem-in-JVM-shutdown-hook-throws-shutdown-in-progress-exception-tp27845803p27845803.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


RE: Tracking Metrics in Hadoop by User

2010-03-09 Thread sagar_shukla
Hi Steve,
  I have observed issues with Ganglia in terms of refreshing data when
nodes go down or are removed from the cluster. It could be because of the
complexity of the environment, but I found Nagios useful on that front.

There is a Hadoop plugin available for Nagios which provides node-based
statistics. Though I have not used it, you could give it a try and see if
it provides the details that you want.
http://exchange.nagios.org/directory/Plugins/Others/check_hadoop%252Ddfs-2Esh/details

Thanks,
Sagar Shukla

-Original Message-
From: Stephen Watt [mailto:sw...@us.ibm.com]
Sent: Tuesday, March 09, 2010 11:37 PM
To: common-user@hadoop.apache.org
Subject: Tracking Metrics in Hadoop by User

I'm interested in the ability to track metrics (such as CPU time, storage
used per machine, across the cluster) in Hadoop by User. I've taken a look
at the Fair and Capacity Schedulers and they seem oriented towards
ensuring fair use between users' jobs rather than providing a feature
which also reports what resources the users actually used on the cluster.
The same goes for other tools like Ganglia, which appear to be concerned with
reporting metrics by machine (and not by job). I've also taken a look
through the common/metrics tickets in JIRA and there does not seem to be
any open work that addresses this requirement.

Have I missed something? Has anyone been able to do this? Is there a way
to capture metrics by job (which could be correlated back to a user)? If
not, is there any current or forecasted work in the project that addresses
this requirement?

Kind regards
Steve Watt
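
On the question of capturing metrics per job and correlating them back to a
user, here is a rough sketch against the old org.apache.hadoop.mapred client
API (method names as recalled from 0.20; treat it as a starting point rather
than a tested tool). It dumps each job's counters tagged with the submitting
user:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class PerUserJobCounters {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());

        // Walk the jobs the JobTracker still knows about and print their
        // counters (bytes read/written, records, etc.) keyed by submitting user.
        for (JobStatus status : client.getAllJobs()) {
            RunningJob job = client.getJob(status.getJobID());
            if (job == null) {
                continue;   // the job may already have been retired
            }
            Counters counters = job.getCounters();
            System.out.println(status.getUsername() + "\t" + job.getJobName());
            for (Counters.Group group : counters) {
                for (Counters.Counter counter : group) {
                    System.out.println("  " + group.getDisplayName() + "."
                        + counter.getDisplayName() + " = " + counter.getCounter());
                }
            }
        }
    }
}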



(Strange!)getFileSystem in JVM shutdown hook throws shutdown in progress exception

2010-03-09 Thread Silllllence

Hi fellows
The code segment below adds a shutdown hook to the JVM, but I got a strange
exception:
java.lang.IllegalStateException: Shutdown in progress
at 
java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
at java.lang.Runtime.addShutdownHook(Runtime.java:192)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1387)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at young.Main$1.run(Main.java:21)
The Javadoc says this exception is thrown when the virtual machine is already in
the process of shutting down (http://java.sun.com/j2se/1.5.0/docs/api/).
What does this mean? Why does this happen? How can I fix it?
I would really appreciate it if you could try this code and help me figure out
what's going on here. Thank you!
---
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

@SuppressWarnings("deprecation")
public class Main {

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                Path path = new Path("/temp/hadoop-young");
                System.out.println("Thread run : " + path);
                Configuration conf = new JobConf();
                FileSystem fs;
                try {
                    fs = path.getFileSystem(conf);
                    if (fs.exists(path)) {
                        fs.delete(path);
                    }
                } catch (Exception e) {
                    System.err.println(e.getMessage());
                    e.printStackTrace();
                }
            }
        });
    }
}
-- 
View this message in context: 
http://old.nabble.com/%28Strange%21%29getFileSystem-in-JVM-shutdown-hook-throws-shutdown-in-progress-exception-tp27845803p27845803.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Namenode problem

2010-03-09 Thread Eason.Lee
This is the datanode's log.
You'd better post the namenode's log (the filename contains "namenode").


2010/3/10 William Kang 

> Hi,
> I got the log dumped here:
>
> 2010-03-09 00:36:47,795 INFO
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
> succeeded for blk_6221934658367436050_1025
> 2010-03-09 00:46:49,155 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 12 blocks
> got processed in 11 msecs
> 2010-03-09 01:08:08,430 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down DataNode at weliam-desktop/127.0.1.1
> /
> 2010-03-09 22:45:54,715 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = weliam-desktop/127.0.1.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1
> STARTUP_MSG:   build =
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1 -r
> 810220; compiled by 'oom' on Tue Sep  1 20:55:56 UTC 2009
> /
> 2010-03-09 22:45:55,330 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call
> to localhost/127.0.0.1:9000 failed on local exception:
> java.io.IOException:
> Connection reset by peer
>  at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
> at org.apache.hadoop.ipc.Client.call(Client.java:742)
>  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy4.getProtocolVersion(Unknown Source)
>  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
>  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
> at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
>  at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
> at
>
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:269)
>  at
> org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:216)
> at
>
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
>  at
>
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
> at
>
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
>  at
> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:206)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>  at
>
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
> at
>
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>  at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>  at java.io.FilterInputStream.read(FilterInputStream.java:116)
> at
>
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>  at java.io.DataInputStream.readInt(DataInputStream.java:370)
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>
> 2010-03-09 22:45:55,334 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down DataNode at weliam-desktop/127.0.1.1
> /
>
> At this point, unless I format the Namenode, the web interface for hadoop
> at
> port 50070 is not coming back.
>
>
> William
>
> On Mon, Mar 8, 2010 at 10:59 PM, Eason.Lee  wrote:
>
> > It's usually in $HADOOP_HOME/logs
> >
> > 2010/3/9 William Kang 
> >
> > > Hi,
> > > If the namenode is not up, how can I get the logdir?
> > >
> > >
> > > William
> > >
> > > On Mon, Mar 8, 2010 at 10:39 PM, Eason.Lee 
> wrote:
> > >
> > > > 2010/3/9 William Kang 
> > > >
> > > > > Hi Eason,
> > > > > Thanks a lot for your reply. But I do have another folder which in
> > not
> > > > > inside /tmp. I did not use default settings.
> > > > >
> > > >
> > > > you'd better post your configuration in detail~~
> > > >
> > > >
> > > > > To make it clear, I will describe what happened:
> > > > > 1. hadoop namenode -format
> > > > > 2. start-all.sh
> > > > > 3. running fine, http://localhost:50070 is accessible
> > > > > 4. stop-all.sh
> > > > > 5. start-all.sh, http://localhost:50070 is NOT accessible
> > > > > Unles

Re: Namenode problem

2010-03-09 Thread William Kang
Hi,
I got the log dumped here:

2010-03-09 00:36:47,795 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_6221934658367436050_1025
2010-03-09 00:46:49,155 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 12 blocks
got processed in 11 msecs
2010-03-09 01:08:08,430 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at weliam-desktop/127.0.1.1
/
2010-03-09 22:45:54,715 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = weliam-desktop/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.1
STARTUP_MSG:   build =
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1 -r
810220; compiled by 'oom' on Tue Sep  1 20:55:56 UTC 2009
/
2010-03-09 22:45:55,330 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call
to localhost/127.0.0.1:9000 failed on local exception: java.io.IOException:
Connection reset by peer
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
at org.apache.hadoop.ipc.Client.call(Client.java:742)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy4.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
 at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:269)
 at
org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:216)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
 at
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
 at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
 at
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
 at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
 at java.io.FilterInputStream.read(FilterInputStream.java:116)
at
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

2010-03-09 22:45:55,334 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at weliam-desktop/127.0.1.1
/

At this point, unless I format the Namenode, the web interface for hadoop at
port 50070 is not coming back.


William

On Mon, Mar 8, 2010 at 10:59 PM, Eason.Lee  wrote:

> It's usually in $HADOOP_HOME/logs
>
> 2010/3/9 William Kang 
>
> > Hi,
> > If the namenode is not up, how can I get the logdir?
> >
> >
> > William
> >
> > On Mon, Mar 8, 2010 at 10:39 PM, Eason.Lee  wrote:
> >
> > > 2010/3/9 William Kang 
> > >
> > > > Hi Eason,
> > > > Thanks a lot for your reply. But I do have another folder which in
> not
> > > > inside /tmp. I did not use default settings.
> > > >
> > >
> > > you'd better post your configuration in detail~~
> > >
> > >
> > > > To make it clear, I will describe what happened:
> > > > 1. hadoop namenode -format
> > > > 2. start-all.sh
> > > > 3. running fine, http://localhost:50070 is accessible
> > > > 4. stop-all.sh
> > > > 5. start-all.sh, http://localhost:50070 is NOT accessible
> > > > Unless I format the namenode, the HDFS master
> > > > http://localhost:50070/dfshealth.jsp is not accessible.
> > > >
> > >
> > > Try "jps" to see if the namenode is up~~
> > > If the namenode is not up, maybe there is some error log in logdir, try
> > to
> > > post the error~~
> > >
> > >
> > > > So, I have to redo step 1, 2 again to gain access to
> > > > http://

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Ted Yu
I just downloaded hadoop-0.20.2 tar ball from cloudera mirror.
This is what I see in ReduceTask (line 999):
  public synchronized boolean reserve(int requestedSize, InputStream in)
      throws InterruptedException {
    // Wait till the request can be fulfilled...
    while ((size + requestedSize) > maxSize) {

I don't see the fix from MR-1182.

That's why I suggested to Andy that he manually apply MR-1182.

Cheers

On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins wrote:

>
>  Thanks Christopher.
>
>  The heap size for reduce tasks is configured to be 640M (
> mapred.child.java.opts set to -Xmx640m ).
>
>  Andy
>
> -Original Message-
> From: Christopher Douglas [mailto:chri...@yahoo-inc.com]
> Sent: Tuesday, March 09, 2010 5:19 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle In Memory OutOfMemoryError
>
> No, MR-1182 is included in 0.20.2
>
> What heap size have you set for your reduce tasks? -C
>
> Sent from my iPhone
>
> On Mar 9, 2010, at 2:34 PM, "Ted Yu"  wrote:
>
> > Andy:
> > You need to manually apply the patch.
> >
> > Cheers
> >
> > On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <
> andy.saut...@returnpath.net
> > >wrote:
> >
> >>
> >>  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included
> >> in the
> >> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and
> >> re-ran
> >> the same job scenarios.  Running with mapred.reduce.parallel.copies
> >> set to 1
> >> and continue to have the same Java heap space error.
> >>
> >>
> >>
> >> -Original Message-
> >> From: Ted Yu [mailto:yuzhih...@gmail.com]
> >> Sent: Tuesday, March 09, 2010 12:56 PM
> >> To: common-user@hadoop.apache.org
> >> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>
> >> This issue has been resolved in
> >> http://issues.apache.org/jira/browse/MAPREDUCE-1182
> >>
> >> Please apply the patch
> >> M1182-1v20.patch<
> >>
> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch
> >> >
> >>
> >> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <
> andy.saut...@returnpath.net
> >>> wrote:
> >>
> >>>
> >>> Thanks Ted.  Very helpful.  You are correct that I misunderstood the
> >> code
> >>> at ReduceTask.java:1535.  I missed the fact that it's in a
> >>> IOException
> >> catch
> >>> block.  My mistake.  That's what I get for being in a rush.
> >>>
> >>> For what it's worth I did re-run the job with
> >>> mapred.reduce.parallel.copies set with values from 5 all the way
> >>> down to
> >> 1.
> >>> All failed with the same error:
> >>>
> >>> Error: java.lang.OutOfMemoryError: Java heap space
> >>>   at
> >>>
> >> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> >> $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >>>   at
> >>>
> >> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> >> $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >>>   at
> >>>
> >> org.apache.hadoop.mapred.ReduceTask$ReduceCopier
> >> $MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >>>   at
> >>>
> >> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run
> >> (ReduceTask.java:1195)
> >>>
> >>>
> >>>  So from that it does seem like something else might be going on,
> >>> yes?
> >> I
> >>> need to do some more research.
> >>>
> >>> I appreciate your insights.
> >>>
> >>> Andy
> >>>
> >>> -Original Message-
> >>> From: Ted Yu [mailto:yuzhih...@gmail.com]
> >>> Sent: Sunday, March 07, 2010 3:38 PM
> >>> To: common-user@hadoop.apache.org
> >>> Subject: Re: Shuffle In Memory OutOfMemoryError
> >>>
> >>> My observation is based on this call chain:
> >>> MapOutputCopier.run() calling copyOutput() calling getMapOutput()
> >>> calling
> >>> ramManager.canFitInMemory(decompressedLength)
> >>>
> >>> Basically ramManager.canFitInMemory() makes decision without
> >>> considering
> >>> the
> >>> number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of
> >>> total
> >> heap
> >>> may be used in shuffling if default parameters were used.
> >>> Of course, you should check the value for
> >>> mapred.reduce.parallel.copies
> >> to
> >>> see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
> >>>
> >>> About ramManager.unreserve() call, ReduceTask.java from hadoop
> >>> 0.20.2
> >> only
> >>> has 2731 lines. So I have to guess the location of the code
> >>> snippet you
> >>> provided.
> >>> I found this around line 1535:
> >>>   } catch (IOException ioe) {
> >>> LOG.info("Failed to shuffle from " +
> >>> mapOutputLoc.getTaskAttemptId(),
> >>>  ioe);
> >>>
> >>> // Inform the ram-manager
> >>> ramManager.closeInMemoryFile(mapOutputLength);
> >>> ramManager.unreserve(mapOutputLength);
> >>>
> >>> // Discard the map-output
> >>> try {
> >>>   mapOutput.discard();
> >>> } catch (IOException ignored) {
> >>>   LOG.info("Failed to discard map-output from " +
> >>>mapOutputLoc.getTaskAttemptId(), ignored);
> >>> }
> >>> Plea

Re: Symbolic link in Hadoop HDFS?

2010-03-09 Thread jiang licht
Thanks, Eli. I actually just read HDFS-245 before I saw your reply. So, as I
understand it, symlinks are available by applying the relevant patches, right? From
what you mentioned, it seems to me that symlink support is quite stable as of
now. Just curious, why hasn't it been included in a recent Hadoop release yet, or
will it be considered for the next release?

Thanks,

Michael

--- On Tue, 3/9/10, Eli Collins  wrote:

From: Eli Collins 
Subject: Re: Symbolic link in Hadoop HDFS?
To: common-user@hadoop.apache.org
Date: Tuesday, March 9, 2010, 8:01 PM

Hey Michael,

Symbolic links have been implemented [1] but are not yet available in a
Hadoop release. The implementation is only available to clients that
use the new FileContext API so clients like Hive need to be migrated
from using FileSystem to FileContext. This is currently being done in
Hadoop itself [2].

Thanks,
Eli

[1] http://issues.apache.org/jira/browse/HDFS-245
[2] http://issues.apache.org/jira/browse/HADOOP-6446


On Tue, Mar 9, 2010 at 4:10 PM, jiang licht  wrote:
> Is there a way to create a symlink in HDFS? And does the LOAD function in Pig
> follow such a link? Thanks!
>
>
> Michael
>
>
>



  

Re: Symbolic link in Hadoop HDFS?

2010-03-09 Thread Eli Collins
Hey Michael,

Symbolic links have been implemented [1] but are not yet available in a
Hadoop release. The implementation is only available to clients that
use the new FileContext API so clients like Hive need to be migrated
from using FileSystem to FileContext. This is currently being done in
Hadoop itself [2].

Thanks,
Eli

[1] http://issues.apache.org/jira/browse/HDFS-245
[2] http://issues.apache.org/jira/browse/HADOOP-6446
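
For reference, a sketch of what client code can look like once FileContext is
available. The createSymlink signature here is taken from later releases that
carry HDFS-245, so treat it as an assumption for patched 0.20-era builds; the
paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class SymlinkSketch {
    public static void main(String[] args) throws Exception {
        FileContext fc = FileContext.getFileContext(new Configuration());

        Path target = new Path("/data/current");   // placeholder paths
        Path link = new Path("/data/latest");

        // createParent=true creates missing parent directories of the link.
        fc.createSymlink(target, link, true);

        // getFileLinkStatus() describes the link itself rather than its target.
        System.out.println(fc.getFileLinkStatus(link).getSymlink());
    }
}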


On Tue, Mar 9, 2010 at 4:10 PM, jiang licht  wrote:
> Is there a way to create a symlink in HDFS? And does the LOAD function in Pig
> follow such a link? Thanks!
>
>
> Michael
>
>
>


RE: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Andy Sautins

  Thanks Christopher.  

  The heap size for reduce tasks is configured to be 640M ( 
mapred.child.java.opts set to -Xmx640m ).

  Andy

-Original Message-
From: Christopher Douglas [mailto:chri...@yahoo-inc.com] 
Sent: Tuesday, March 09, 2010 5:19 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

No, MR-1182 is included in 0.20.2

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, "Ted Yu"  wrote:

> Andy:
> You need to manually apply the patch.
>
> Cheers
>
> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins  >wrote:
>
>>
>>  Thanks Ted.  My understanding is that MAPREDUCE-1182 is included  
>> in the
>> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and  
>> re-ran
>> the same job scenarios.  Running with mapred.reduce.parallel.copies  
>> set to 1
>> and continue to have the same Java heap space error.
>>
>>
>>
>> -Original Message-
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: Tuesday, March 09, 2010 12:56 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>
>> This issue has been resolved in
>> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>>
>> Please apply the patch
>> M1182-1v20.patch<
>> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch 
>> >
>>
>> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins >> wrote:
>>
>>>
>>> Thanks Ted.  Very helpful.  You are correct that I misunderstood the
>> code
>>> at ReduceTask.java:1535.  I missed the fact that it's in a  
>>> IOException
>> catch
>>> block.  My mistake.  That's what I get for being in a rush.
>>>
>>> For what it's worth I did re-run the job with
>>> mapred.reduce.parallel.copies set with values from 5 all the way  
>>> down to
>> 1.
>>> All failed with the same error:
>>>
>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>   at
>>>
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
>> $MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>   at
>>>
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
>> $MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>   at
>>>
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
>> $MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>   at
>>>
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run 
>> (ReduceTask.java:1195)
>>>
>>>
>>>  So from that it does seem like something else might be going on,  
>>> yes?
>> I
>>> need to do some more research.
>>>
>>> I appreciate your insights.
>>>
>>> Andy
>>>
>>> -Original Message-
>>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>>> Sent: Sunday, March 07, 2010 3:38 PM
>>> To: common-user@hadoop.apache.org
>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>
>>> My observation is based on this call chain:
>>> MapOutputCopier.run() calling copyOutput() calling getMapOutput()  
>>> calling
>>> ramManager.canFitInMemory(decompressedLength)
>>>
>>> Basically ramManager.canFitInMemory() makes decision without  
>>> considering
>>> the
>>> number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of  
>>> total
>> heap
>>> may be used in shuffling if default parameters were used.
>>> Of course, you should check the value for  
>>> mapred.reduce.parallel.copies
>> to
>>> see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
>>>
>>> About ramManager.unreserve() call, ReduceTask.java from hadoop  
>>> 0.20.2
>> only
>>> has 2731 lines. So I have to guess the location of the code  
>>> snippet you
>>> provided.
>>> I found this around line 1535:
>>>   } catch (IOException ioe) {
>>> LOG.info("Failed to shuffle from " +
>>> mapOutputLoc.getTaskAttemptId(),
>>>  ioe);
>>>
>>> // Inform the ram-manager
>>> ramManager.closeInMemoryFile(mapOutputLength);
>>> ramManager.unreserve(mapOutputLength);
>>>
>>> // Discard the map-output
>>> try {
>>>   mapOutput.discard();
>>> } catch (IOException ignored) {
>>>   LOG.info("Failed to discard map-output from " +
>>>mapOutputLoc.getTaskAttemptId(), ignored);
>>> }
>>> Please confirm the line number.
>>>
>>> If we're looking at the same code, I am afraid I don't see how we  
>>> can
>>> improve it. First, I assume IOException shouldn't happen that often.
>>> Second,
>>> mapOutput.discard() just sets:
>>> data = null;
>>> for in memory case. Even if we call mapOutput.discard() before
>>> ramManager.unreserve(), we don't know when GC would kick in and  
>>> make more
>>> memory available.
>>> Of course, given the large number of map outputs in your system, it
>> became
>>> more likely that the root cause from my reasoning made OOME happen
>> sooner.
>>>
>>> Thanks
>>>

>>> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <
>> andy.saut...@returnpath.net
 wrote:
>>>

  Ted,

  I'm trying to follow the logic in your mail and I'm not sure 

Re: where does jobtracker get the IP and port of namenode?

2010-03-09 Thread jiang licht
Thanks Scott. I guess what I tried to do here is to squeeze the cluster into 
our current system given some fixed network configuration. It is not flexible 
to modify some settings to accommodate the hadoop cluster. I checked hosts 
file, dig/host, resolv.conf, and netstat -r, etc for name resolution, didn't 
see any obvious violation against what hadoop cluster requires and tried tools 
such as tcpdump and telnet, etc. No quick solution found yet. So, I simply 
removed the problematic master box to make my world easier and will return to 
this problem later :)

Thanks,

Michael

--- On Tue, 3/9/10, Scott Carey  wrote:

From: Scott Carey 
Subject: Re: where does jobtracker get the IP and port of namenode?
To: "common-user@hadoop.apache.org" 
Date: Tuesday, March 9, 2010, 3:22 PM


On Mar 8, 2010, at 11:38 PM, jiang licht wrote:

> I guess my confusion is this:
> 
> I point "fs.default.name" to hdfs:A:50001 in core-site.xml (A is IP address). 
> I assume when tasktracker starts, it should use A:50001 to contact namenode. 
> But actually, tasktracker log shows that it uses B which is IP address of 
> another network interface of the  namenode box and because the tasktracker 
> box cannot reach address B, the tasktracker simply retries connection and 
> finally fails to start. I read some source code in 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize and it seems to me 
> the namenode address is passed in earlier from what is specified in 
> "fs.default.name". Is this correct that the namenode address used here by 
> tasktracker comes from "fs.default.name" in core-site.xml or somehow there is 
> another step in which this value is changed? Could someone elaborate this 
> process how tasktracker resolves namenode and contacts it? Thanks!
> 

Hadoop is rather annoyingly strict on how dns and reverse dns are aligned.  I'm 
not sure if it applies to your specific problem, but:
Even if configured to talk to A, if A is an IP address, in some places it will 
reverse-dns that IP, then dns resolve the resolved name.

So if IP A maps by reverse dns (via dns or a hosts file or whatever) to name 
FOO, and FOO resolves to IP address B, then that is likely your problem.
Datanodes/namenodes with multiple IP addresses often have problems like this.
I wish that if you configured it to 'talk to IP address A' all it did was try
to talk to IP address A, but that's not how it works.
I'm used to seeing this as a datanode network configuration problem, not a 
namenode problem.  But you mention that the server has more than one network 
interface, so it may be related.
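
A quick way to check for the reverse-then-forward lookup described above is a
stand-alone round trip with plain java.net (the IP literal is just a
placeholder for "A"):

import java.net.InetAddress;

public class DnsRoundTrip {
    public static void main(String[] args) throws Exception {
        String configuredIp = args.length > 0 ? args[0] : "192.168.1.10";  // placeholder for "A"

        // Reverse lookup: what name does IP "A" map back to?
        String reverseName = InetAddress.getByName(configuredIp).getCanonicalHostName();

        // Forward lookup of that name: if this does not come back as "A",
        // Hadoop may end up talking to the other interface ("B") instead.
        InetAddress forward = InetAddress.getByName(reverseName);

        System.out.println("configured IP: " + configuredIp);
        System.out.println("reverse name : " + reverseName);
        System.out.println("forward IP   : " + forward.getHostAddress());
    }
}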


> Thanks,
> 
> Michael
> 
> --- On Tue, 3/9/10, jiang licht  wrote:
> 
> From: jiang licht 
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: common-user@hadoop.apache.org
> Date: Tuesday, March 9, 2010, 12:20 AM
> 
> Sorry, that was a typo in my first post. I did use 'fs.default.name' in 
> core-site.xml.
> 
> BTW, the following is the list of error message when tasktracker was started 
> and shows that tasktracker failed to connect to namenode A:50001.
> 
> /
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = HOSTNAME/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1+169.56
> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3; compiled 
> by 'root' on Tue Feb  9 13:40:08 EST 2010
> /
> 2010-03-09 00:08:50,199 INFO org.mortbay.log: Logging to 
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
> org.mortbay.log.Slf4jLog
> 2010-03-09 00:08:50,341 INFO org.apache.hadoop.http.HttpServer: Port returned 
> by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening 
> the listener on 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: 
> listener.getLocalPort() returned 50060 
> webServer.getConnectors()[0].getLocalPort() returned 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
> to port 50060
> 2010-03-09 00:08:50,350 INFO org.mortbay.log: jetty-6.1.14
> 2010-03-09 00:08:50,707 INFO org.mortbay.log: Started 
> selectchannelconnec...@0.0.0.0:50060
> 2010-03-09 00:08:50,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=TaskTracker, sessionId=
> 2010-03-09 00:08:50,749 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> Initializing RPC Metrics with hostName=TaskTracker, port=52550
> 2010-03-09 00:08:50,799 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 1 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.ipc.Server: IPC Se

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Christopher Douglas

No, MR-1182 is included in 0.20.2

What heap size have you set for your reduce tasks? -C

Sent from my iPhone

On Mar 9, 2010, at 2:34 PM, "Ted Yu"  wrote:


Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins wrote:




 Thanks Ted.  My understanding is that MAPREDUCE-1182 is included  
in the
0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and  
re-ran
the same job scenarios.  Running with mapred.reduce.parallel.copies  
set to 1

and continue to have the same Java heap space error.



-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, March 09, 2010 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

This issue has been resolved in
http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch
M1182-1v20.patch<
http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch 
>


On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins 
wrote:




Thanks Ted.  Very helpful.  You are correct that I misunderstood the

code
at ReduceTask.java:1535.  I missed the fact that it's in a  
IOException

catch

block.  My mistake.  That's what I get for being in a rush.

For what it's worth I did re-run the job with
mapred.reduce.parallel.copies set with values from 5 all the way  
down to

1.

All failed with the same error:

Error: java.lang.OutOfMemoryError: Java heap space
  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)

  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
$MapOutputCopier.getMapOutput(ReduceTask.java:1408)

  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier 
$MapOutputCopier.copyOutput(ReduceTask.java:1261)

  at

org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run 
(ReduceTask.java:1195)



 So from that it does seem like something else might be going on,  
yes?

I

need to do some more research.

I appreciate your insights.

Andy

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Sunday, March 07, 2010 3:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

My observation is based on this call chain:
MapOutputCopier.run() calling copyOutput() calling getMapOutput()  
calling

ramManager.canFitInMemory(decompressedLength)

Basically ramManager.canFitInMemory() makes decision without  
considering

the
number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of  
total

heap

may be used in shuffling if default parameters were used.
Of course, you should check the value for  
mapred.reduce.parallel.copies

to

see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.

About ramManager.unreserve() call, ReduceTask.java from hadoop  
0.20.2

only
has 2731 lines. So I have to guess the location of the code  
snippet you

provided.
I found this around line 1535:
  } catch (IOException ioe) {
LOG.info("Failed to shuffle from " +
mapOutputLoc.getTaskAttemptId(),
 ioe);

// Inform the ram-manager
ramManager.closeInMemoryFile(mapOutputLength);
ramManager.unreserve(mapOutputLength);

// Discard the map-output
try {
  mapOutput.discard();
} catch (IOException ignored) {
  LOG.info("Failed to discard map-output from " +
   mapOutputLoc.getTaskAttemptId(), ignored);
}
Please confirm the line number.

If we're looking at the same code, I am afraid I don't see how we  
can

improve it. First, I assume IOException shouldn't happen that often.
Second,
mapOutput.discard() just sets:
data = null;
for in memory case. Even if we call mapOutput.discard() before
ramManager.unreserve(), we don't know when GC would kick in and  
make more

memory available.
Of course, given the large number of map outputs in your system, it

became

more likely that the root cause from my reasoning made OOME happen

sooner.


Thanks




On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <

andy.saut...@returnpath.net

wrote:




 Ted,

 I'm trying to follow the logic in your mail and I'm not sure I'm
following.  If you would mind helping me understand I would  
appreciate

it.


 Looking at the code maxSingleShuffleLimit is only used in  
determining

if

the copy _can_ fit into memory:

    boolean canFitInMemory(long requestedSize) {
      return (requestedSize < Integer.MAX_VALUE &&
              requestedSize < maxSingleShuffleLimit);
    }

  It also looks like the RamManager.reserve should wait until  
memory

is

available so it should hit a memory limit for that reason.

  What does seem a little strange to me is the following (

ReduceTask.java

starting at 2730 ):

// Inform the ram-manager
ramManager.closeInMemoryFile(mapOutputLength);
ramManager.unreserve(mapOutputLength);

// Discard the map-output
try {
  mapOutput.discard();
} catch

Symbolic link in Hadoop HDFS?

2010-03-09 Thread jiang licht
Is there a way to create a symlink in HDFS? And does the LOAD function in Pig
follow such a link? Thanks!


Michael


  

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread baleksan

-Original Message-
From: Ted Yu 
Date: Tue, 9 Mar 2010 14:33:28 
To: 
Subject: Re: Shuffle In Memory OutOfMemoryError

Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins wrote:

>
>   Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
> the same job scenarios.  Running with mapred.reduce.parallel.copies set to 1
> and continue to have the same Java heap space error.
>
>
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Tuesday, March 09, 2010 12:56 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle In Memory OutOfMemoryError
>
> This issue has been resolved in
> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>
> Please apply the patch
> M1182-1v20.patch<
> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch>
>
> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins  >wrote:
>
> >
> >  Thanks Ted.  Very helpful.  You are correct that I misunderstood the
> code
> > at ReduceTask.java:1535.  I missed the fact that it's in a IOException
> catch
> > block.  My mistake.  That's what I get for being in a rush.
> >
> >  For what it's worth I did re-run the job with
> > mapred.reduce.parallel.copies set with values from 5 all the way down to
> 1.
> >  All failed with the same error:
> >
> > Error: java.lang.OutOfMemoryError: Java heap space
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >
> >
> >   So from that it does seem like something else might be going on, yes?
>  I
> > need to do some more research.
> >
> >  I appreciate your insights.
> >
> >  Andy
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Sunday, March 07, 2010 3:38 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Shuffle In Memory OutOfMemoryError
> >
> > My observation is based on this call chain:
> > MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
> > ramManager.canFitInMemory(decompressedLength)
> >
> > Basically ramManager.canFitInMemory() makes decision without considering
> > the
> > number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total
> heap
> > may be used in shuffling if default parameters were used.
> > Of course, you should check the value for mapred.reduce.parallel.copies
> to
> > see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
> >
> > About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2
> only
> > has 2731 lines. So I have to guess the location of the code snippet you
> > provided.
> > I found this around line 1535:
> >} catch (IOException ioe) {
> >  LOG.info("Failed to shuffle from " +
> > mapOutputLoc.getTaskAttemptId(),
> >   ioe);
> >
> >  // Inform the ram-manager
> >  ramManager.closeInMemoryFile(mapOutputLength);
> >  ramManager.unreserve(mapOutputLength);
> >
> >  // Discard the map-output
> >  try {
> >mapOutput.discard();
> >  } catch (IOException ignored) {
> >LOG.info("Failed to discard map-output from " +
> > mapOutputLoc.getTaskAttemptId(), ignored);
> >  }
> > Please confirm the line number.
> >
> > If we're looking at the same code, I am afraid I don't see how we can
> > improve it. First, I assume IOException shouldn't happen that often.
> > Second,
> > mapOutput.discard() just sets:
> >  data = null;
> > for in memory case. Even if we call mapOutput.discard() before
> > ramManager.unreserve(), we don't know when GC would kick in and make more
> > memory available.
> > Of course, given the large number of map outputs in your system, it
> became
> > more likely that the root cause from my reasoning made OOME happen
> sooner.
> >
> > Thanks
> >
> > >
> > On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <
> andy.saut...@returnpath.net
> > >wrote:
> >
> > >
> > >   Ted,
> > >
> > >   I'm trying to follow the logic in your mail and I'm not sure I'm
> > > following.  If you would mind helping me understand I would appreciate
> > it.
> > >
> > >   Looking at the code maxSingleShuffleLimit is only used in determining
> > if
> > > the copy _can_ fit into memory:
> > >
> > > boolean canFitInMemory(long requestedSize) {
> > >return (requestedSize < Integer.MAX_VALUE &&
> > >requestedSize < m

RE: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Andy Sautins

   Ah.  My mistake.  We will apply the patch manually to 0.20.2 and re-run.  
Just out of curiosity, why do the release notes for 0.20.2 indicate that
MAPREDUCE-1182 is included in the release if the patch needs to be applied
manually?  Is there an additional part of the patch not included in the release?

   Thanks for your help.

   Andy

-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Tuesday, March 09, 2010 3:33 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins wrote:

>
>   Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
> the same job scenarios.  Running with mapred.reduce.parallel.copies set to 1
> and continue to have the same Java heap space error.
>
>
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Tuesday, March 09, 2010 12:56 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle In Memory OutOfMemoryError
>
> This issue has been resolved in
> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>
> Please apply the patch
> M1182-1v20.patch<
> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch>
>
> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins  >wrote:
>
> >
> >  Thanks Ted.  Very helpful.  You are correct that I misunderstood the
> code
> > at ReduceTask.java:1535.  I missed the fact that it's in a IOException
> catch
> > block.  My mistake.  That's what I get for being in a rush.
> >
> >  For what it's worth I did re-run the job with
> > mapred.reduce.parallel.copies set with values from 5 all the way down to
> 1.
> >  All failed with the same error:
> >
> > Error: java.lang.OutOfMemoryError: Java heap space
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >
> >
> >   So from that it does seem like something else might be going on, yes?
>  I
> > need to do some more research.
> >
> >  I appreciate your insights.
> >
> >  Andy
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Sunday, March 07, 2010 3:38 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Shuffle In Memory OutOfMemoryError
> >
> > My observation is based on this call chain:
> > MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
> > ramManager.canFitInMemory(decompressedLength)
> >
> > Basically ramManager.canFitInMemory() makes decision without considering
> > the
> > number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total
> heap
> > may be used in shuffling if default parameters were used.
> > Of course, you should check the value for mapred.reduce.parallel.copies
> to
> > see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
> >
> > About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2
> only
> > has 2731 lines. So I have to guess the location of the code snippet you
> > provided.
> > I found this around line 1535:
> >} catch (IOException ioe) {
> >  LOG.info("Failed to shuffle from " +
> > mapOutputLoc.getTaskAttemptId(),
> >   ioe);
> >
> >  // Inform the ram-manager
> >  ramManager.closeInMemoryFile(mapOutputLength);
> >  ramManager.unreserve(mapOutputLength);
> >
> >  // Discard the map-output
> >  try {
> >mapOutput.discard();
> >  } catch (IOException ignored) {
> >LOG.info("Failed to discard map-output from " +
> > mapOutputLoc.getTaskAttemptId(), ignored);
> >  }
> > Please confirm the line number.
> >
> > If we're looking at the same code, I am afraid I don't see how we can
> > improve it. First, I assume IOException shouldn't happen that often.
> > Second,
> > mapOutput.discard() just sets:
> >  data = null;
> > for in memory case. Even if we call mapOutput.discard() before
> > ramManager.unreserve(), we don't know when GC would kick in and make more
> > memory available.
> > Of course, given the large number of map outputs in your system, it
> became
> > more likely that the root cause from my reasoning made OOME happen
> sooner.
> >
> > Thanks
> >
> > >
> > On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <
> andy.saut...@returnpath.net
> > >wrote:
> >
> > >
> > >   Ted,
> > >
> > >   I'm trying to follow the logic in your mail and I'm not sure I'm
> > > following.  If you would mind helping me understand I would appreciate
> > i

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Ted Yu
Andy:
You need to manually apply the patch.

Cheers

On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins wrote:

>
>   Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
> the same job scenarios.  Running with mapred.reduce.parallel.copies set to 1
> and continue to have the same Java heap space error.
>
>
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Tuesday, March 09, 2010 12:56 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle In Memory OutOfMemoryError
>
> This issue has been resolved in
> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>
> Please apply the patch
> M1182-1v20.patch<
> http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch>
>
> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins  >wrote:
>
> >
> >  Thanks Ted.  Very helpful.  You are correct that I misunderstood the
> code
> > at ReduceTask.java:1535.  I missed the fact that it's in a IOException
> catch
> > block.  My mistake.  That's what I get for being in a rush.
> >
> >  For what it's worth I did re-run the job with
> > mapred.reduce.parallel.copies set with values from 5 all the way down to
> 1.
> >  All failed with the same error:
> >
> > Error: java.lang.OutOfMemoryError: Java heap space
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
> >
> >
> >   So from that it does seem like something else might be going on, yes?
>  I
> > need to do some more research.
> >
> >  I appreciate your insights.
> >
> >  Andy
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Sunday, March 07, 2010 3:38 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Shuffle In Memory OutOfMemoryError
> >
> > My observation is based on this call chain:
> > MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
> > ramManager.canFitInMemory(decompressedLength)
> >
> > Basically ramManager.canFitInMemory() makes decision without considering
> > the
> > number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total
> heap
> > may be used in shuffling if default parameters were used.
> > Of course, you should check the value for mapred.reduce.parallel.copies
> to
> > see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
> >
> > About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2
> only
> > has 2731 lines. So I have to guess the location of the code snippet you
> > provided.
> > I found this around line 1535:
> >} catch (IOException ioe) {
> >  LOG.info("Failed to shuffle from " +
> > mapOutputLoc.getTaskAttemptId(),
> >   ioe);
> >
> >  // Inform the ram-manager
> >  ramManager.closeInMemoryFile(mapOutputLength);
> >  ramManager.unreserve(mapOutputLength);
> >
> >  // Discard the map-output
> >  try {
> >mapOutput.discard();
> >  } catch (IOException ignored) {
> >LOG.info("Failed to discard map-output from " +
> > mapOutputLoc.getTaskAttemptId(), ignored);
> >  }
> > Please confirm the line number.
> >
> > If we're looking at the same code, I am afraid I don't see how we can
> > improve it. First, I assume IOException shouldn't happen that often.
> > Second,
> > mapOutput.discard() just sets:
> >  data = null;
> > for in memory case. Even if we call mapOutput.discard() before
> > ramManager.unreserve(), we don't know when GC would kick in and make more
> > memory available.
> > Of course, given the large number of map outputs in your system, it
> became
> > more likely that the root cause from my reasoning made OOME happen
> sooner.
> >
> > Thanks
> >
> > >
> > On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <
> andy.saut...@returnpath.net
> > >wrote:
> >
> > >
> > >   Ted,
> > >
> > >   I'm trying to follow the logic in your mail and I'm not sure I'm
> > > following.  If you would mind helping me understand I would appreciate
> > it.
> > >
> > >   Looking at the code maxSingleShuffleLimit is only used in determining
> > if
> > > the copy _can_ fit into memory:
> > >
> > > boolean canFitInMemory(long requestedSize) {
> > >return (requestedSize < Integer.MAX_VALUE &&
> > >requestedSize < maxSingleShuffleLimit);
> > >  }
> > >
> > >It also looks like the RamManager.reserve should wait until memory
> is
> > > available so it should hit a memory limit for that reason.
> > >
> > >What does seem a little strange to me is the fo

RE: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Andy Sautins

   Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the 
0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran the 
same job scenarios.  Running with mapred.reduce.parallel.copies set to 1 and 
continue to have the same Java heap space error.
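
   For reference, both knobs discussed in this thread can also be set per job from the
client side. Below is a minimal sketch against the 0.20 JobConf API; the values are
purely illustrative, and mapred.job.shuffle.input.buffer.percent is assumed (not
verified here) to be the key that caps the in-memory shuffle at a fraction of the
reduce heap, 0.70 by default:

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuningSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();   // picks up *-site.xml from the classpath
    // Fewer concurrent copier threads per reduce task (the default is 5).
    conf.setInt("mapred.reduce.parallel.copies", 1);
    // Assumed key: fraction of the reduce heap the in-memory shuffle may use.
    conf.set("mapred.job.shuffle.input.buffer.percent", "0.50");
    System.out.println("parallel copies  = " + conf.getInt("mapred.reduce.parallel.copies", 5));
    System.out.println("shuffle buffer % = " + conf.get("mapred.job.shuffle.input.buffer.percent"));
    // Mapper/reducer setup and JobClient.runJob(conf) are omitted; this only shows the knobs.
  }
}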



-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Tuesday, March 09, 2010 12:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle In Memory OutOfMemoryError

This issue has been resolved in
http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch
M1182-1v20.patch

On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins wrote:

>
>  Thanks Ted.  Very helpful.  You are correct that I misunderstood the code
> at ReduceTask.java:1535.  I missed the fact that it's in an IOException catch
> block.  My mistake.  That's what I get for being in a rush.
>
>  For what it's worth I did re-run the job with
> mapred.reduce.parallel.copies set with values from 5 all the way down to 1.
>  All failed with the same error:
>
> Error: java.lang.OutOfMemoryError: Java heap space
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
>
>   So from that it does seem like something else might be going on, yes?  I
> need to do some more research.
>
>  I appreciate your insights.
>
>  Andy
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Sunday, March 07, 2010 3:38 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle In Memory OutOfMemoryError
>
> My observation is based on this call chain:
> MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
> ramManager.canFitInMemory(decompressedLength)
>
> Basically ramManager.canFitInMemory() makes decision without considering
> the
> number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap
> may be used in shuffling if default parameters were used.
> Of course, you should check the value for mapred.reduce.parallel.copies to
> see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
>
> About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2 only
> has 2731 lines. So I have to guess the location of the code snippet you
> provided.
> I found this around line 1535:
>} catch (IOException ioe) {
>  LOG.info("Failed to shuffle from " +
> mapOutputLoc.getTaskAttemptId(),
>   ioe);
>
>  // Inform the ram-manager
>  ramManager.closeInMemoryFile(mapOutputLength);
>  ramManager.unreserve(mapOutputLength);
>
>  // Discard the map-output
>  try {
>mapOutput.discard();
>  } catch (IOException ignored) {
>LOG.info("Failed to discard map-output from " +
> mapOutputLoc.getTaskAttemptId(), ignored);
>  }
> Please confirm the line number.
>
> If we're looking at the same code, I am afraid I don't see how we can
> improve it. First, I assume IOException shouldn't happen that often.
> Second,
> mapOutput.discard() just sets:
>  data = null;
> for in memory case. Even if we call mapOutput.discard() before
> ramManager.unreserve(), we don't know when GC would kick in and make more
> memory available.
> Of course, given the large number of map outputs in your system, it became
> more likely that the root cause from my reasoning made OOME happen sooner.
>
> Thanks
>
> >
> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins  >wrote:
>
> >
> >   Ted,
> >
> >   I'm trying to follow the logic in your mail and I'm not sure I'm
> > following.  If you would mind helping me understand I would appreciate
> it.
> >
> >   Looking at the code maxSingleShuffleLimit is only used in determining
> if
> > the copy _can_ fit into memory:
> >
> > boolean canFitInMemory(long requestedSize) {
> >return (requestedSize < Integer.MAX_VALUE &&
> >requestedSize < maxSingleShuffleLimit);
> >  }
> >
> >It also looks like the RamManager.reserve should wait until memory is
> > available so it should hit a memory limit for that reason.
> >
> >What does seem a little strange to me is the following (
> ReduceTask.java
> > starting at 2730 ):
> >
> >  // Inform the ram-manager
> >  ramManager.closeInMemoryFile(mapOutputLength);
> >  ramManager.unreserve(mapOutputLength);
> >
> >  // Discard the map-output
> >  try {
> >mapOutput.discard();
> >  } catch (IOException ignored) {
> >LOG.info("Failed to discard map-output f

Re: where does jobtracker get the IP and port of namenode?

2010-03-09 Thread Scott Carey

On Mar 8, 2010, at 11:38 PM, jiang licht wrote:

> I guess my confusion is this:
> 
> I point "fs.default.name" to hdfs:A:50001 in core-site.xml (A is IP address). 
> I assume when tasktracker starts, it should use A:50001 to contact namenode. 
> But actually, tasktracker log shows that it uses B which is IP address of 
> another network interface of the  namenode box and because the tasktracker 
> box cannot reach address B, the tasktracker simply retries connection and 
> finally fails to start. I read some source code in 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize and it seems to me 
> the namenode address is passed in earlier from what is specified in 
> "fs.default.name". Is this correct that the namenode address used here by 
> tasktracker comes from "fs.default.name" in core-site.xml or somehow there is 
> another step in which this value is changed? Could someone elaborate this 
> process how tasktracker resolves namenode and contacts it? Thanks!
> 

Hadoop is rather annoyingly strict about how DNS and reverse DNS are aligned.  I'm 
not sure whether it applies to your specific problem, but:
Even if it is configured to talk to A, and A is an IP address, in some places it will 
reverse-DNS that IP to a hostname and then DNS-resolve that hostname.

So if IP A maps by reverse DNS (via DNS, a hosts file, or whatever) to name 
FOO, and FOO resolves to IP address B, then that is likely your problem.
Datanodes/namenodes with multiple IP addresses often have problems like this.  
I wish that if you configured it to 'talk to IP address A' all it did was try 
to talk to IP address A, but that's not how it works.
I'm used to seeing this as a datanode network configuration problem, not a 
namenode problem.  But you mention that the server has more than one network 
interface, so it may be related.
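
The reverse-then-forward lookup described above can be checked outside Hadoop with a 
few lines of plain Java. This is only a diagnostic sketch; run it on a slave and pass 
it the IP you put in fs.default.name:

import java.net.InetAddress;

public class ReverseDnsCheck {
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("usage: ReverseDnsCheck <ip-address>");
      return;
    }
    InetAddress configured = InetAddress.getByName(args[0]);   // e.g. the IP from fs.default.name
    String reverseName = configured.getCanonicalHostName();    // reverse lookup: IP -> name
    InetAddress forward = InetAddress.getByName(reverseName);  // forward lookup: name -> IP
    System.out.println("configured IP : " + configured.getHostAddress());
    System.out.println("reverse name  : " + reverseName);
    System.out.println("forward IP    : " + forward.getHostAddress());
    // If the forward IP differs from the configured IP, you are in the
    // "A reverse-resolves to FOO, FOO resolves to B" situation described above.
  }
}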


> Thanks,
> 
> Michael
> 
> --- On Tue, 3/9/10, jiang licht  wrote:
> 
> From: jiang licht 
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: common-user@hadoop.apache.org
> Date: Tuesday, March 9, 2010, 12:20 AM
> 
> Sorry, that was a typo in my first post. I did use 'fs.default.name' in 
> core-site.xml.
> 
> BTW, the following is the list of error message when tasktracker was started 
> and shows that tasktracker failed to connect to namenode A:50001.
> 
> /
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = HOSTNAME/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1+169.56
> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3; compiled 
> by 'root' on Tue Feb  9 13:40:08 EST 2010
> /
> 2010-03-09 00:08:50,199 INFO org.mortbay.log: Logging to 
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
> org.mortbay.log.Slf4jLog
> 2010-03-09 00:08:50,341 INFO org.apache.hadoop.http.HttpServer: Port returned 
> by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening 
> the listener on 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: 
> listener.getLocalPort() returned 50060 
> webServer.getConnectors()[0].getLocalPort() returned 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
> to port 50060
> 2010-03-09 00:08:50,350 INFO org.mortbay.log: jetty-6.1.14
> 2010-03-09 00:08:50,707 INFO org.mortbay.log: Started 
> selectchannelconnec...@0.0.0.0:50060
> 2010-03-09 00:08:50,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=TaskTracker, sessionId=
> 2010-03-09 00:08:50,749 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> Initializing RPC Metrics with hostName=TaskTracker, port=52550
> 2010-03-09 00:08:50,799 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 1 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 2 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker: 
> TaskTracker up at: HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker: Starting 
> tracker tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,802 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 3 on 52550: starting
> 2010-03-09 00:08:50,854 INFO org.apache.hadoop.mapred.TaskTracker:  Using 
> MemoryCalculatorPlugin : 
> org.apache.hadoop.util.linuxmemorycalculatorplu...@27b4c1d7
> 2010-03-09 00:08:50,856 INFO org.apache.hadoop.mapred.TaskTracker: Starting 
> thread: Map-events fetcher for all reduce tasks on 
> tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,858 WARN org.apache.hadoop.mapred.Tas

Re: Shuffle In Memory OutOfMemoryError

2010-03-09 Thread Ted Yu
This issue has been resolved in
http://issues.apache.org/jira/browse/MAPREDUCE-1182

Please apply the patch
M1182-1v20.patch

On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins wrote:

>
>  Thanks Ted.  Very helpful.  You are correct that I misunderstood the code
> at ReduceTask.java:1535.  I missed the fact that it's in an IOException catch
> block.  My mistake.  That's what I get for being in a rush.
>
>  For what it's worth I did re-run the job with
> mapred.reduce.parallel.copies set with values from 5 all the way down to 1.
>  All failed with the same error:
>
> Error: java.lang.OutOfMemoryError: Java heap space
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
>
>   So from that it does seem like something else might be going on, yes?  I
> need to do some more research.
>
>  I appreciate your insights.
>
>  Andy
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Sunday, March 07, 2010 3:38 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Shuffle In Memory OutOfMemoryError
>
> My observation is based on this call chain:
> MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
> ramManager.canFitInMemory(decompressedLength)
>
> Basically ramManager.canFitInMemory() makes decision without considering
> the
> number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of total heap
> may be used in shuffling if default parameters were used.
> Of course, you should check the value for mapred.reduce.parallel.copies to
> see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
>
> About ramManager.unreserve() call, ReduceTask.java from hadoop 0.20.2 only
> has 2731 lines. So I have to guess the location of the code snippet you
> provided.
> I found this around line 1535:
>} catch (IOException ioe) {
>  LOG.info("Failed to shuffle from " +
> mapOutputLoc.getTaskAttemptId(),
>   ioe);
>
>  // Inform the ram-manager
>  ramManager.closeInMemoryFile(mapOutputLength);
>  ramManager.unreserve(mapOutputLength);
>
>  // Discard the map-output
>  try {
>mapOutput.discard();
>  } catch (IOException ignored) {
>LOG.info("Failed to discard map-output from " +
> mapOutputLoc.getTaskAttemptId(), ignored);
>  }
> Please confirm the line number.
>
> If we're looking at the same code, I am afraid I don't see how we can
> improve it. First, I assume IOException shouldn't happen that often.
> Second,
> mapOutput.discard() just sets:
>  data = null;
> for in memory case. Even if we call mapOutput.discard() before
> ramManager.unreserve(), we don't know when GC would kick in and make more
> memory available.
> Of course, given the large number of map outputs in your system, it became
> more likely that the root cause from my reasoning made OOME happen sooner.
>
> Thanks
>
> >
> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins  >wrote:
>
> >
> >   Ted,
> >
> >   I'm trying to follow the logic in your mail and I'm not sure I'm
> > following.  If you would mind helping me understand I would appreciate
> it.
> >
> >   Looking at the code maxSingleShuffleLimit is only used in determining
> if
> > the copy _can_ fit into memory:
> >
> > boolean canFitInMemory(long requestedSize) {
> >return (requestedSize < Integer.MAX_VALUE &&
> >requestedSize < maxSingleShuffleLimit);
> >  }
> >
> >It also looks like the RamManager.reserve should wait until memory is
> > available so it should hit a memory limit for that reason.
> >
> >What does seem a little strange to me is the following (
> ReduceTask.java
> > starting at 2730 ):
> >
> >  // Inform the ram-manager
> >  ramManager.closeInMemoryFile(mapOutputLength);
> >  ramManager.unreserve(mapOutputLength);
> >
> >  // Discard the map-output
> >  try {
> >mapOutput.discard();
> >  } catch (IOException ignored) {
> >LOG.info("Failed to discard map-output from " +
> > mapOutputLoc.getTaskAttemptId(), ignored);
> >  }
> >  mapOutput = null;
> >
> >   So to me that looks like the ramManager unreserves the memory before
> the
> > mapOutput is discarded.  Shouldn't the mapOutput be discarded _before_
> the
> > ramManager unreserves the memory?  If the memory is unreserved before the
> > actual underlying data references are removed then it seems like another
> > thread can try to allocate memory ( R
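
   To illustrate the ordering question being discussed (unreserve() before discard()),
here is a toy model. It is not the real ShuffleRamManager; it only shows that the
accounting can report the memory as free while the old buffer is still strongly
referenced, which is why, as noted above, reordering the two calls would still leave
things at the mercy of GC timing:

public class UnreserveOrderingSketch {
  // Toy stand-in for the RamManager's byte accounting.
  static class ToyRamManager {
    private final long maxSize;
    private long size;
    ToyRamManager(long maxSize) { this.maxSize = maxSize; }
    synchronized boolean reserve(long n) {
      if (size + n > maxSize) return false;   // the real reserve() waits instead of failing
      size += n;
      return true;
    }
    synchronized void unreserve(long n) { size -= n; }
    synchronized long used() { return size; }
  }

  // Toy stand-in for an in-memory map output.
  static class ToyMapOutput {
    byte[] data;
    ToyMapOutput(int n) { data = new byte[n]; }
    void discard() { data = null; }           // the in-memory discard() just drops the reference
  }

  public static void main(String[] args) {
    ToyRamManager ram = new ToyRamManager(100);
    ToyMapOutput out = new ToyMapOutput(60);
    ram.reserve(60);

    // Order used in ReduceTask: unreserve first, discard afterwards.
    ram.unreserve(60);
    System.out.println("accounted used = " + ram.used()
        + ", old buffer still referenced = " + (out.data != null));

    out.discard();
    System.out.println("after discard(): old buffer still referenced = " + (out.data != null));
    // Even after discard(), the bytes are only reclaimed once GC runs, so another
    // copier can reserve and allocate before the old buffer is actually gone.
  }
}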

SEQ

2010-03-09 Thread Raymond Jennings III
Are there any examples that show how to create a SEQ file in HDFS?
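
A minimal writer sketch against the 0.20 SequenceFile API is below; the output path 
and the Text/IntWritable key/value types are arbitrary placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml, so FileSystem.get() returns HDFS
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/example.seq");   // placeholder output path
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
    try {
      for (int i = 0; i < 10; i++) {
        writer.append(new Text("key-" + i), new IntWritable(i));
      }
    } finally {
      writer.close();
    }
  }
}

Reading the file back is symmetric via SequenceFile.Reader.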


  


Re: How namenode/datanode and jobtracker/tasktracker negotiate connection?

2010-03-09 Thread jiang licht


--- On Tue, 3/9/10, Steve Loughran  wrote:

> From: Steve Loughran 
> Subject: Re: How namenode/datanode and jobtracker/tasktracker negotiate 
> connection?
> To: common-user@hadoop.apache.org
> Date: Tuesday, March 9, 2010, 7:05 AM
> jiang licht wrote:
> > What are the exact packets and steps used to establish
> a namenode/datanode connection and 
> jobtracker/tasktracker connection?
> > 
> > I am asking this due to a weird problem related to
> starting datanodes and tasktrackers. 
> > In my case, the namenode box has 2 ethernet interfaces
> combined as bond0 interface with IP address of IP_A and
> there is an IP alias IP_B for local loopback interface as
> lo:1. All slave boxes sit on the same network segment as
> IP_B.
> > 
> > The network is configured such that no slave box can
> reach namenode box at IP_A but namenode box can reach slave
> boxes (clearly can only routed from bond0). So, slave boxes
> always use "hdfs://IP_B:50001" as "fs.default.name" in
> "core-site.xml" and use IP_B:50002" for job tracker in
> mapred-site.xml to reach namenode box.
> > 
> > There are the following 2 cases how namenode (or
> jobtracker) is configured on namenode box.
> > 
> > Case #1: If I set "fs.default.name" to
> "hdfs://IP_B:50001", no slave boxes can join the cluster as
> data nodes because the request to IP_B:50001 failed. "telnet
> IP_B 50001" on slave boxes resulted in connection refused.
> So, on namenode box, I fired "tcpdump -i bond0 tcp port
> 50001" and then from a slave box did a "telnet IP_B 5001"
> and watched for incoming and outgoing packets on namenode
> box.
> > 
> > Case #2: If I set "fs.default.name" to
> "hdfs://IP_A:50001", slave boxes can join the cluster as
> data nodes. And I did the same thing to use tcpdump and
> telnet to watch the traffic. I compared these two cases and
> found some difference in the traffic. So, I want to know if
> there is a hand-shaking stage for namenode and datanode to
> establish a connection and what are the packets for this
> purpose so that I can figure out if packets exchanged in
> case #1 are correct or not, which may reveal why the
> connection request from data node to name node fails.
> > 
> > Also in Case #2, although all slave boxes can join the
> cluster as datanodes, no slave box can start as a
> tasktracker because at the beginning of starting a
> tasktracker, the tasktracker box uses IP_A:50001 to request
> connection to namenode and as mentioned above (slaves are
> not allowed to reach namenode at IP_A but reverse direction
> is ok), this cannot be done. But my confusion here is that
> on all slave boxes "fs.default.name" is set to use
> IP_B:50001, how come it ended up with contacting the
> namenode with IP_A:50001?
> > 
> > A bit complicated. But any thoughts?
> > 
> 
> the NN listens on the card given by the IP address of its
> hostname; it does not like people connecting to it using a
> different hostname than the one it is on (irritating,
> something to fix)

> It sounds like you have DNS problems. you should have a
> consistent mapping from hostname<-->IP Addr across the
> entire cluster, but the issues you have indicate this may
> not be the case.
> 

My case is more complicated. The network is configured such that the slave boxes 
cannot reach the master box via its "bond0" interface IP "A" (bond0 = eth0 + eth1, 
the only physical network cards on the master box). So the hostname has to be 
mapped to "B", the alias IP address on its local loopback interface, which is on 
the same network segment as the slave boxes. Because of this, all slaves have to 
use B to talk to the master box.

Then, if I run the namenode/jobtracker on B, slaves cannot join the cluster as 
datanodes because the connection to the namenode cannot be established, and that 
is why I want to know what information needs to be exchanged between namenode and 
datanode to establish the connection. Steve, you mentioned that the NN requires 
the IP of its hostname; what about the datanode? Does the DN also require that the 
return packet coming back from the NN is from the IP of the "fs.default.name" 
specified in its core-site.xml? If so, that might explain why the datanode cannot 
talk to the namenode, because the returning packets use A, the IP of the "bond0" 
interface.

If I run the namenode/jobtracker on A, the slaves are able to join the cluster as 
datanodes, BUT somehow the tasktrackers use address A to talk to the namenode, 
which simply fails because it is not allowed. So I am confused: why, in this case 
(with both "fs.default.name" and "mapred.job.tracker" set to B on the slaves), do 
the slaves use A?

Thanks,
Michael
 


  

Tracking Metrics in Hadoop by User

2010-03-09 Thread Stephen Watt
I'm interested in the ability to track metrics (such as CPU time and storage 
used per machine, across the cluster) in Hadoop by user. I've taken a look 
at the Fair and Capacity Schedulers, and they seem oriented towards 
ensuring fair use between users' jobs rather than reporting what resources 
the users actually consumed on the cluster. The same goes for other tools 
like Ganglia, which appear to report metrics by machine rather than by job. 
I've also looked through the common/metrics tickets in JIRA and there does 
not seem to be any open work that addresses this requirement. 

Have I missed something? Has anyone been able to do this? Is there a way 
to capture metrics by job (which could then be correlated back to a user)? If 
not, is there any current or forecasted work in the project that addresses 
this requirement? 
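
For the storage half of this, one crude approach that works today is to walk the 
per-user home directories with the FileSystem API. The sketch below assumes the 
usual /user/<name> layout and says nothing about CPU time, which as far as I know 
would have to come from job history or scheduler logs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePerUser {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] homes = fs.listStatus(new Path("/user"));   // assumes /user/<name> home directories
    if (homes == null) {
      System.err.println("/user not found");
      return;
    }
    for (FileStatus home : homes) {
      ContentSummary summary = fs.getContentSummary(home.getPath());
      System.out.println(home.getPath().getName() + "\t"
          + summary.getLength() + " bytes (before replication)");
    }
  }
}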

Kind regards
Steve Watt

Re: Cleaning jobcache manually

2010-03-09 Thread Marcus Herou
OK. I needed to cut it down to just one day. Scary, I know, but the cache grows
like crazy.

/M

On Wed, Feb 10, 2010 at 8:19 PM, Allen Wittenauer
wrote:

>
>
>
> On 2/10/10 12:15 AM, "Marcus Herou"  wrote:
> > We run hadoop-0.18.3 and it seems that the jobcache does not get cleaned
> out
> > properly.
> >
> > Would this cron script be to any harm to hadoop ?
> >
> > # Clean all files which are two or more days old
> > /usr/bin/find ${JOB_CACHE_PATH} -type f -mtime +2 -exec rm {} \;
> >
> > Need to start cleaning today so hoping for quick response.
>
> We do something similar, but we wait until files are 7 days old in case we
> have a particularly long-running job.
>
>


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/


Re: where does jobtracker get the IP and port of namenode?

2010-03-09 Thread Sonal Goyal
Hi Michael,

Please check:
http://hadoop.apache.org/common/docs/r0.20.1/cluster_setup.html#Logging

Then see your master and slave logs. The logs in your emails, as far as I can 
deduce, show that the connection is failing, but it is unclear what is causing 
it to fail.
Thanks and Regards,
Sonal


On Tue, Mar 9, 2010 at 3:53 PM, jiang licht  wrote:

> Thanks Sonal. How to set that debug mode? Actually I set
> "dfs.namenode.logging.level" to "all". Please see my first and previous
> posts for error messages.
>
> Thanks,
>
> Michael
>
> --- On Tue, 3/9/10, Sonal Goyal  wrote:
>
> From: Sonal Goyal 
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: common-user@hadoop.apache.org
> Date: Tuesday, March 9, 2010, 4:01 AM
>
> Can you turn logging level to debug to see what the logs say?
>
> Thanks and Regards,
> Sonal
>
>
> On Tue, Mar 9, 2010 at 1:08 PM, jiang licht  wrote:
>
> > I guess my confusion is this:
> >
> > I point "fs.default.name" to hdfs:A:50001 in core-site.xml (A is IP
> > address). I assume when tasktracker starts, it should use A:50001 to
> contact
> > namenode. But actually, tasktracker log shows that it uses B which is IP
> > address of another network interface of the  namenode box and because the
> > tasktracker box cannot reach address B, the tasktracker simply retries
> > connection and finally fails to start. I read some source code in
> > org.apache.hadoop.hdfs.DistributedFileSystem.initialize and it seems to
> me
> > the namenode address is passed in earlier from what is specified in "
> > fs.default.name". Is this correct that the namenode address used here by
> > tasktracker comes from "fs.default.name" in core-site.xml or somehow
> there
> > is another step in which this value is changed? Could someone elaborate
> this
> > process how tasktracker resolves namenode and contacts it? Thanks!
> >
> > Thanks,
> >
> > Michael
> >
> > --- On Tue, 3/9/10, jiang licht  wrote:
> >
> > From: jiang licht 
> > Subject: Re: where does jobtracker get the IP and port of namenode?
> > To: common-user@hadoop.apache.org
> > Date: Tuesday, March 9, 2010, 12:20 AM
> >
> > Sorry, that was a typo in my first post. I did use 'fs.default.name' in
> > core-site.xml.
> >
> > BTW, the following is the list of error message when tasktracker was
> > started and shows that tasktracker failed to connect to namenode A:50001.
> >
> > /
> > STARTUP_MSG: Starting TaskTracker
> > STARTUP_MSG:   host = HOSTNAME/127.0.0.1
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 0.20.1+169.56
> > STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
> > compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
> > /
> > 2010-03-09 00:08:50,199 INFO org.mortbay.log: Logging to
> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > org.mortbay.log.Slf4jLog
> > 2010-03-09 00:08:50,341 INFO org.apache.hadoop.http.HttpServer: Port
> > returned by webServer.getConnectors()[0].getLocalPort() before open() is
> -1.
> > Opening the listener on 50060
> > 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer:
> > listener.getLocalPort() returned 50060
> > webServer.getConnectors()[0].getLocalPort() returned 50060
> > 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: Jetty
> bound
> > to port 50060
> > 2010-03-09 00:08:50,350 INFO org.mortbay.log: jetty-6.1.14
> > 2010-03-09 00:08:50,707 INFO org.mortbay.log: Started
> > selectchannelconnec...@0.0.0.0:50060
> > 2010-03-09 00:08:50,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > Initializing JVM Metrics with processName=TaskTracker, sessionId=
> > 2010-03-09 00:08:50,749 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> > Initializing RPC Metrics with hostName=TaskTracker, port=52550
> > 2010-03-09 00:08:50,799 INFO org.apache.hadoop.ipc.Server: IPC Server
> > Responder: starting
> > 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> > listener on 52550: starting
> > 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 0 on 52550: starting
> > 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 1 on 52550: starting
> > 2010-03-09 00:08:50,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 2 on 52550: starting
> > 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker:
> > TaskTracker up at: HOSTNAME/127.0.0.1:52550
> > 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker:
> Starting
> > tracker tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> > 2010-03-09 00:08:50,802 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 3 on 52550: starting
> > 2010-03-09 00:08:50,854 INFO org.apache.hadoop.mapred.TaskTracker:  Using
> > MemoryCalculatorPlugin :
> > org.apache.hadoop.util.linuxmemorycalculatorplu...@27b4c1d7
> > 2010-03-09 00:08:50,856 INFO org.apache.hadoop.mapred.T

Re: How namenode/datanode and jobtracker/tasktracker negotiate connection?

2010-03-09 Thread Steve Loughran

jiang licht wrote:

What are the exact packets and steps used to establish a namenode/datanode 
connection and  jobtracker/tasktracker connection?

I am asking this due to a weird problem related to starting datanodes and tasktrackers. 


In my case, the namenode box has 2 ethernet interfaces combined as bond0 
interface with IP address of IP_A and there is an IP alias IP_B for local 
loopback interface as lo:1. All slave boxes sit on the same network segment as 
IP_B.

The network is configured such that no slave box can reach namenode box at IP_A but namenode box can reach slave 
boxes (clearly can only routed from bond0). So, slave boxes always use "hdfs://IP_B:50001" as 
"fs.default.name" in "core-site.xml" and use IP_B:50002" for job tracker in 
mapred-site.xml to reach namenode box.

There are the following 2 cases how namenode (or jobtracker) is configured on 
namenode box.

Case #1: If I set "fs.default.name" to "hdfs://IP_B:50001", no slave boxes can join the cluster as data nodes 
because the request to IP_B:50001 failed. "telnet IP_B 50001" on slave boxes resulted in connection refused. So, on 
namenode box, I fired "tcpdump -i bond0 tcp port 50001" and then from a slave box did a "telnet IP_B 5001" 
and watched for incoming and outgoing packets on namenode box.

Case #2: If I set "fs.default.name" to "hdfs://IP_A:50001", slave boxes can 
join the cluster as data nodes. And I did the same thing to use tcpdump and telnet to watch the 
traffic. I compared these two cases and found some difference in the traffic. So, I want to know if 
there is a hand-shaking stage for namenode and datanode to establish a connection and what are the 
packets for this purpose so that I can figure out if packets exchanged in case #1 are correct or 
not, which may reveal why the connection request from data node to name node fails.

Also in Case #2, although all slave boxes can join the cluster as datanodes, no slave box 
can start as a tasktracker because at the beginning of starting a tasktracker, the 
tasktracker box uses IP_A:50001 to request connection to namenode and as mentioned above 
(slaves are not allowed to reach namenode at IP_A but reverse direction is ok), this 
cannot be done. But my confusion here is that on all slave boxes 
"fs.default.name" is set to use IP_B:50001, how come it ended up with 
contacting the namenode with IP_A:50001?

A bit complicated. But any thoughts?



The NN listens on the card given by the IP address of its hostname; it 
does not like people connecting to it using a different hostname than 
the one it is on (irritating, something to fix).


It sounds like you have DNS problems. You should have a consistent 
mapping from hostname <--> IP address across the entire cluster, but the 
issues you describe indicate this may not be the case.
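
A quick way to see which address that rule picks on the namenode box is a plain-Java 
check, independent of Hadoop (a diagnostic sketch only):

import java.net.InetAddress;

public class NameNodeHostCheck {
  public static void main(String[] args) throws Exception {
    // Run on the namenode box: this is the hostname -> IP mapping the NN will see.
    InetAddress local = InetAddress.getLocalHost();
    System.out.println("hostname    = " + local.getHostName());
    System.out.println("resolves to = " + local.getHostAddress());
  }
}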




Re: where does jobtracker get the IP and port of namenode?

2010-03-09 Thread jiang licht
Thanks Sonal. How do I set that debug mode? Actually, I set 
"dfs.namenode.logging.level" to "all". Please see my first and previous posts 
for the error messages.

Thanks,

Michael

--- On Tue, 3/9/10, Sonal Goyal  wrote:

From: Sonal Goyal 
Subject: Re: where does jobtracker get the IP and port of namenode?
To: common-user@hadoop.apache.org
Date: Tuesday, March 9, 2010, 4:01 AM

Can you turn logging level to debug to see what the logs say?

Thanks and Regards,
Sonal


On Tue, Mar 9, 2010 at 1:08 PM, jiang licht  wrote:

> I guess my confusion is this:
>
> I point "fs.default.name" to hdfs:A:50001 in core-site.xml (A is IP
> address). I assume when tasktracker starts, it should use A:50001 to contact
> namenode. But actually, tasktracker log shows that it uses B which is IP
> address of another network interface of the  namenode box and because the
> tasktracker box cannot reach address B, the tasktracker simply retries
> connection and finally fails to start. I read some source code in
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize and it seems to me
> the namenode address is passed in earlier from what is specified in "
> fs.default.name". Is this correct that the namenode address used here by
> tasktracker comes from "fs.default.name" in core-site.xml or somehow there
> is another step in which this value is changed? Could someone elaborate this
> process how tasktracker resolves namenode and contacts it? Thanks!
>
> Thanks,
>
> Michael
>
> --- On Tue, 3/9/10, jiang licht  wrote:
>
> From: jiang licht 
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: common-user@hadoop.apache.org
> Date: Tuesday, March 9, 2010, 12:20 AM
>
> Sorry, that was a typo in my first post. I did use 'fs.default.name' in
> core-site.xml.
>
> BTW, the following is the list of error message when tasktracker was
> started and shows that tasktracker failed to connect to namenode A:50001.
>
> /
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = HOSTNAME/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1+169.56
> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
> compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
> /
> 2010-03-09 00:08:50,199 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2010-03-09 00:08:50,341 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
> Opening the listener on 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50060
> webServer.getConnectors()[0].getLocalPort() returned 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 50060
> 2010-03-09 00:08:50,350 INFO org.mortbay.log: jetty-6.1.14
> 2010-03-09 00:08:50,707 INFO org.mortbay.log: Started
> selectchannelconnec...@0.0.0.0:50060
> 2010-03-09 00:08:50,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=TaskTracker, sessionId=
> 2010-03-09 00:08:50,749 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=TaskTracker, port=52550
> 2010-03-09 00:08:50,799 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker:
> TaskTracker up at: HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> tracker tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,802 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 52550: starting
> 2010-03-09 00:08:50,854 INFO org.apache.hadoop.mapred.TaskTracker:  Using
> MemoryCalculatorPlugin :
> org.apache.hadoop.util.linuxmemorycalculatorplu...@27b4c1d7
> 2010-03-09 00:08:50,856 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> thread: Map-events fetcher for all reduce tasks on
> tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,858 WARN org.apache.hadoop.mapred.TaskTracker:
> TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is
> disabled.
> 2010-03-09 00:08:50,859 INFO org.apache.hadoop.mapred.IndexCache:
> IndexCache created with max memory = 10485760
> 2010-03-09 00:09:11,970 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /A:50001. Already tried 0 time(s).
> 2010-03-09 00:09:32,972 INFO org.apache.hadoop.ipc.Client: Re

How namenode/datanode and jobtracker/tasktracker negotiate connection?

2010-03-09 Thread jiang licht
What are the exact packets and steps used to establish a namenode/datanode 
connection and  jobtracker/tasktracker connection?

I am asking this due to a weird problem related to starting datanodes and 
tasktrackers. 

In my case, the namenode box has 2 ethernet interfaces combined as a bond0 
interface with IP address IP_A, and there is an IP alias IP_B on the local 
loopback interface as lo:1. All slave boxes sit on the same network segment as 
IP_B.

The network is configured such that no slave box can reach the namenode box at IP_A, 
but the namenode box can reach the slave boxes (traffic can clearly only be routed 
from bond0). So, slave boxes always use "hdfs://IP_B:50001" as "fs.default.name" in 
"core-site.xml" and use "IP_B:50002" for the job tracker in mapred-site.xml to reach 
the namenode box.

There are the following 2 cases how namenode (or jobtracker) is configured on 
namenode box.

Case #1: If I set "fs.default.name" to "hdfs://IP_B:50001", no slave boxes can 
join the cluster as data nodes because the request to IP_B:50001 failed. 
"telnet IP_B 50001" on slave boxes resulted in connection refused. So, on 
namenode box, I fired "tcpdump -i bond0 tcp port 50001" and then from a slave 
box did a "telnet IP_B 5001" and watched for incoming and outgoing packets on 
namenode box.

Case #2: If I set "fs.default.name" to "hdfs://IP_A:50001", slave boxes can 
join the cluster as data nodes. And I did the same thing to use tcpdump and 
telnet to watch the traffic. I compared these two cases and found some 
difference in the traffic. So, I want to know if there is a hand-shaking stage 
for namenode and datanode to establish a connection and what are the packets 
for this purpose so that I can figure out if packets exchanged in case #1 are 
correct or not, which may reveal why the connection request from data node to 
name node fails.

Also, in Case #2, although all slave boxes can join the cluster as datanodes, no 
slave box can start as a tasktracker: at the beginning of tasktracker startup, the 
tasktracker box uses IP_A:50001 to request a connection to the namenode, and as 
mentioned above (slaves are not allowed to reach the namenode at IP_A, but the 
reverse direction is ok), this cannot be done. But my confusion here is that on all 
slave boxes "fs.default.name" is set to use IP_B:50001, so how come it ended up 
contacting the namenode at IP_A:50001?

A bit complicated. But any thoughts?

Thanks,

Michael


  

Re: where does jobtracker get the IP and port of namenode?

2010-03-09 Thread Sonal Goyal
Can you turn logging level to debug to see what the logs say?

Thanks and Regards,
Sonal


On Tue, Mar 9, 2010 at 1:08 PM, jiang licht  wrote:

> I guess my confusion is this:
>
> I point "fs.default.name" to hdfs:A:50001 in core-site.xml (A is IP
> address). I assume when tasktracker starts, it should use A:50001 to contact
> namenode. But actually, tasktracker log shows that it uses B which is IP
> address of another network interface of the  namenode box and because the
> tasktracker box cannot reach address B, the tasktracker simply retries
> connection and finally fails to start. I read some source code in
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize and it seems to me
> the namenode address is passed in earlier from what is specified in "
> fs.default.name". Is this correct that the namenode address used here by
> tasktracker comes from "fs.default.name" in core-site.xml or somehow there
> is another step in which this value is changed? Could someone elaborate this
> process how tasktracker resolves namenode and contacts it? Thanks!
>
> Thanks,
>
> Michael
>
> --- On Tue, 3/9/10, jiang licht  wrote:
>
> From: jiang licht 
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: common-user@hadoop.apache.org
> Date: Tuesday, March 9, 2010, 12:20 AM
>
> Sorry, that was a typo in my first post. I did use 'fs.default.name' in
> core-site.xml.
>
> BTW, the following is the list of error message when tasktracker was
> started and shows that tasktracker failed to connect to namenode A:50001.
>
> /
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = HOSTNAME/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1+169.56
> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
> compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
> /
> 2010-03-09 00:08:50,199 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2010-03-09 00:08:50,341 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
> Opening the listener on 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50060
> webServer.getConnectors()[0].getLocalPort() returned 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 50060
> 2010-03-09 00:08:50,350 INFO org.mortbay.log: jetty-6.1.14
> 2010-03-09 00:08:50,707 INFO org.mortbay.log: Started
> selectchannelconnec...@0.0.0.0:50060
> 2010-03-09 00:08:50,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=TaskTracker, sessionId=
> 2010-03-09 00:08:50,749 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=TaskTracker, port=52550
> 2010-03-09 00:08:50,799 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker:
> TaskTracker up at: HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> tracker tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,802 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 52550: starting
> 2010-03-09 00:08:50,854 INFO org.apache.hadoop.mapred.TaskTracker:  Using
> MemoryCalculatorPlugin :
> org.apache.hadoop.util.linuxmemorycalculatorplu...@27b4c1d7
> 2010-03-09 00:08:50,856 INFO org.apache.hadoop.mapred.TaskTracker: Starting
> thread: Map-events fetcher for all reduce tasks on
> tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,858 WARN org.apache.hadoop.mapred.TaskTracker:
> TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is
> disabled.
> 2010-03-09 00:08:50,859 INFO org.apache.hadoop.mapred.IndexCache:
> IndexCache created with max memory = 10485760
> 2010-03-09 00:09:11,970 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /A:50001. Already tried 0 time(s).
> 2010-03-09 00:09:32,972 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /A:50001. Already tried 1 time(s).
> ...
>
> Thanks,
>
> Michael
>
> --- On Mon, 3/8/10, Arun C Murthy  wrote:
>
> From: Arun C Murthy 
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: common-user@hadoop.apache.org
> Date: Monday, March 8, 2010, 10:26 PM
>
> > Here's what is set in core-site.xml
> >
> > dfs.default.name=