Yarn AppMaster request for containers not working

2015-04-16 Thread Antonescu Andrei Bogdan
Hello,

I'm writing a YARN client for my distributed processing framework, and I'm
not able to request containers for workers via the ApplicationMaster's
addContainerRequest method.

Please find here a more detailed explanation:
http://stackoverflow.com/questions/29668132/yarn-appmaster-request-for-containers-not-working

Let me know if more information is needed about configuration, server logs
or client code.

Many thanks,

Best,
Andrei


Re: How to import custom Python module in MapReduce job?

2013-08-12 Thread Andrei
For some reason, using the -archives option leads to "Error in configuring
object" without any further information. However, I found out that the -files
option works pretty well for this purpose. I was able to run my example as
follows.

1. I put `main.py` and `lib.py` into an `app` directory.
2. In `main.py` I used `lib.py` directly, that is, the import statement is just

import lib

3. Instead of uploading to HDFS and using the -archives option, I just pointed
the -files option at the `app` directory:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -files app
-mapper "app/main.py map" -reducer "app/main.py reduce" -input input -output
output

It did the trick. Note that I tested with both CPython (2.6) and PyPy
(1.9), so I think it's safe to assume this approach works for Python
scripts.

Thanks for your help, Binglin; without it I wouldn't have been able to figure
it out.




On Mon, Aug 12, 2013 at 1:12 PM, Binglin Chang  wrote:

> Maybe you didn't specify a symlink name on your command line, so the symlink
> name will be just lib.jar, and I am not sure how you import the lib module in
> your main.py file.
> Please try this:
> put main.py and lib.py in the same archive, e.g. app.zip
> -archives hdfs://hdfs-namenode/user/me/app.zip#app -mapper "app/main.py
> map" -reducer "app/main.py reduce"
> in main.py:
> import app.lib
> or:
> import .lib
>
>


Re: How to import custom Python module in MapReduce job?

2013-08-12 Thread Andrei
Hi Binglin,

thanks for your explanation, now it makes sense. However, I'm not sure how
to implement the suggested method.

First of all, I found out that the `-cacheArchive` option is deprecated, so I
had to use `-archives` instead. I put my `lib.py` into a `lib` directory and
then zipped it to `lib.zip`. After that I uploaded the archive to HDFS and
referenced it in the Streaming call as follows:

  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -files main.py
-archives hdfs://hdfs-namenode/user/me/lib.jar -mapper "./main.py map"
-reducer "./main.py reduce" -combiner "./main.py combine" -input input
-output output

But the script failed, and from the logs I see that lib.jar hasn't been
unpacked. What am I missing?
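
In case it helps with debugging, this is the kind of snippet I'd drop at the
top of main.py to see what the task actually gets in its working directory and
on sys.path (just a sketch, I haven't run it on the cluster yet):

# Hypothetical debugging snippet for the top of main.py; it writes to stderr so
# the output lands in the task's stderr log instead of the streaming output.
import os
import sys

sys.stderr.write('cwd: %s\n' % os.getcwd())
sys.stderr.write('cwd contents: %s\n' % os.listdir('.'))
sys.stderr.write('sys.path: %s\n' % sys.path)
sys.stderr.write('__file__ resolves to: %s\n' % os.path.realpath(__file__))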




On Mon, Aug 12, 2013 at 11:33 AM, Binglin Chang  wrote:

> Hi,
>
> The problem seems to be caused by symlinks: Hadoop uses a file cache, so every
> file is in fact a symlink.
>
> lrwxrwxrwx 1 root root 65 Aug 12 15:22 lib.py ->
> /root/hadoop3/data/nodemanager/usercache/root/filecache/13/lib.py
> lrwxrwxrwx 1 root root 66 Aug 12 15:23 main.py ->
> /root/hadoop3/data/nodemanager/usercache/root/filecache/12/main.py
> [root@master01 tmp]# ./main.py
> Traceback (most recent call last):
>   File "./main.py", line 3, in ?
> import lib
> ImportError: No module named lib
>
> This seems to be a Python bug: when using import, it can't handle the symlink.
>
> You can try using a directory containing lib.py together with -cacheArchive,
> so the symlink actually links to a directory; Python may handle that case
> well.
>
> Thanks,
> Binglin
>
>
>
> On Mon, Aug 12, 2013 at 2:50 PM, Andrei  wrote:
>
>> (cross-posted from 
>> StackOverflow<http://stackoverflow.com/questions/18150208/how-to-import-custom-module-in-mapreduce-job?noredirect=1#comment26584564_18150208>
>> )
>>
>> I have a MapReduce job defined in file *main.py*, which imports module
>> lib from file *lib.py*. I use Hadoop Streaming to submit this job to
>> Hadoop cluster as follows:
>>
>> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar
>>
>> -files lib.py,main.py
>> -mapper "./main.py map" -reducer "./main.py reduce"
>> -input input -output output
>>
>> In my understanding, this should put both main.py and lib.py into the
>> *distributed cache folder* on each compute node and thus make module lib
>> available to main. But that doesn't happen: from the log files I see that the
>> files *are really copied* to the same directory, but main can't import lib,
>> throwing an *ImportError*.
>>
>> Adding current directory to the path didn't work:
>>
>> import sys
>> sys.path.append(os.path.realpath(__file__))
>> import lib  # ImportError
>>
>> though, loading module manually did the trick:
>>
>> import imp
>> lib = imp.load_source('lib', 'lib.py')
>>
>> But that's not what I want. So why can the Python interpreter see other .py
>> files in the same directory but not import them? Note that I have already
>> tried adding an empty __init__.py file to the same directory, with no effect.
>>
>>
>>
>


How to import custom Python module in MapReduce job?

2013-08-11 Thread Andrei
(cross-posted from StackOverflow)

I have a MapReduce job defined in file *main.py*, which imports module lib from
file *lib.py*. I use Hadoop Streaming to submit this job to the Hadoop cluster
as follows:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar
-files lib.py,main.py
-mapper "./main.py map" -reducer "./main.py reduce"
-input input -output output

In my understanding, this should put both main.py and lib.py into the *distributed
cache folder* on each compute node and thus make module lib available
to main. But that doesn't happen: from the log files I see that the files *are
really copied* to the same directory, but main can't import lib, throwing an
*ImportError*.

Adding the current directory to the path didn't work:

import sys
sys.path.append(os.path.realpath(__file__))
import lib  # ImportError

though loading the module manually did the trick:

import imp
lib = imp.load_source('lib', 'lib.py')

But that's not what I want. So why can the Python interpreter see other .py files
in the same directory but not import them? Note that I have already tried
adding an empty __init__.py file to the same directory, with no effect.
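
Strictly speaking, the snippet above appends the path of main.py itself (a
file), not the directory that holds the modules. A variant that appends the
task's current working directory, where the .py files are materialized, would
at least put a real directory on the search path; this is an untested sketch,
not something verified here:

# Untested variant of the sys.path workaround: append the working directory
# (which contains lib.py) instead of the resolved path of main.py.
import os
import sys

sys.path.append(os.getcwd())
import lib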


Re: Large-scale collection of logs from multiple Hadoop nodes

2013-08-05 Thread Andrei
We have similar requirements and built our log collection system around
RSyslog and Flume. It is not in production yet, but tests so far look
pretty good. We rejected the idea of using AMQP since it introduces a large
overhead for log events.

You can probably use Flume interceptors to do real-time processing on your
events, though I haven't tried anything like that before. Alternatively,
you can use Twitter Storm to handle your logs. Either way, I wouldn't recommend
using Hadoop MapReduce for real-time processing of logs, and there's at
least one important reason for this.

As you probably know, a Flume source obtains new events and puts them into a
channel, from which a sink then pulls them. The HDFS sink rolls files on an
interval (normally time-based, but you can also roll on the total size or
number of events in the channel). If this interval is large, you won't get
real-time processing. And if it is small, Flume will produce a large number of
small files in HDFS, say 10-100 KB each. HDFS cannot pack multiple files into a
single block (the default block size is 64 MB), so each of those tiny files
becomes its own block and its own entry that the NameNode has to track in
memory (and each is replicated!).

Of course, you can use some ad-hoc solution like deleting small files from
time to time or combining them into larger files, but monitoring such a
system becomes much harder and may lead to unexpected results. So processing
log events before they reach HDFS seems to be the better idea.



On Tue, Aug 6, 2013 at 7:54 AM, Inder Pall  wrote:

> We have been using a flume like system for such usecases at significantly
> large scale and it has been working quite well.
>
> Would like to hear thoughts/challenges around using ZeroMQ-like systems
> at good enough scale.
>
> inder
> "you are the average of 5 people you spend the most time with"
> On Aug 5, 2013 11:29 PM, "Public Network Services" <
> publicnetworkservi...@gmail.com> wrote:
>
>> Hi...
>>
>> I am facing a large-scale usage scenario of log collection from a Hadoop
>> cluster and examining ways as to how it should be implemented.
>>
>> More specifically, imagine a cluster that has hundreds of nodes, each of
>> which constantly produces Syslog events that need to be gathered and
>> analyzed at another point. The total amount of logs could be tens of
>> gigabytes per day, if not more, and the reception rate in the order of
>> thousands of events per second, if not more.
>>
>> One solution is to send those events over the network (e.g., using
>> Flume) and collect them in one or more (fewer than 5) nodes in the cluster,
>> or in another location, where the logs would be processed either by a
>> constantly running MapReduce job, or by non-Hadoop servers running some log
>> processing application.
>>
>> Another approach could be to deposit all these events into a queuing
>> system like ActiveMQ or RabbitMQ, or whatever.
>>
>> In all cases, the main objective is to be able to do real-time log
>> analysis.
>>
>> What would be the best way of implementing the above scenario?
>>
>> Thanks!
>>
>> PNS
>>
>>


Re: ConnectionException in container, happens only sometimes

2013-07-11 Thread Andrei
Here are logs of RM and 2 NMs:

RM (master-host): http://pastebin.com/q4qJP8Ld
NM where AM ran (slave-1-host): http://pastebin.com/vSsz7mjG
NM where slave container ran (slave-2-host): http://pastebin.com/NMFi6gRp

The only related error I've found in them is the following (from RM logs):

...
2013-07-11 07:46:06,225 ERROR
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
AppAttemptId doesnt exist in cache appattempt_1373465780870_0005_01
2013-07-11 07:46:06,227 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call org.apache.hadoop.yarn.api.AMRMProtocolPB.allocate from
10.128.40.184:47101: output error
2013-07-11 07:46:06,228 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8030 caught an exception
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:456)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2140)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:939)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1005)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1747)
2013-07-11 07:46:11,238 INFO org.apache.hadoop.yarn.util.RackResolver:
Resolved my_user to /default-rack
2013-07-11 07:46:11,283 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
NodeManager from node my_user(cmPort: 59267 httpPort: 8042) registered with
capability: 8192, assigned nodeId my_user:59267
...

Though from the stack trace it's hard to tell where this error came from.

Let me know if you need any more information.

On Thu, Jul 11, 2013 at 1:00 AM, Andrei  wrote:

> Hi Omkar,
>
> I'm out of the office now, so I'll post them as soon as I get back.
>
> Thanks
>
>
> On Thu, Jul 11, 2013 at 12:39 AM, Omkar Joshi wrote:
>
>> Can you post the RM/NM logs too?
>>
>> Thanks,
>> Omkar Joshi
>> *Hortonworks Inc.* <http://www.hortonworks.com>
>>
>>


Re: ConnectionException in container, happens only sometimes

2013-07-10 Thread Andrei
Hi Omkar,

I'm out of the office now, so I'll post them as soon as I get back.

Thanks


On Thu, Jul 11, 2013 at 12:39 AM, Omkar Joshi wrote:

> Can you post the RM/NM logs too?
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* 
>
>


Re: ConnectionException in container, happens only sometimes

2013-07-10 Thread Andrei
If it helps, the full AM log can be found here: http://pastebin.com/zXTabyvv


On Wed, Jul 10, 2013 at 4:21 PM, Andrei  wrote:

> Hi Devaraj,
>
> thanks for your answer. Yes, I suspected it could be because of host
> mapping, so I have already checked (and have just re-checked) settings in
> /etc/hosts of each machine, and they all are ok. I use both fully-qualified
> names (e.g. `master-host.company.com`) and their shortcuts (e.g.
> `master-host`), so it shouldn't depend on the notation either.
>
> I have also checked AM syslog. There's nothing about network, but there
> are several messages like the following:
>
> ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container 
> complete event for unknown container id container_1373460572360_0001_01_88
>
>
> I understand the container just doesn't get registered in the AM (probably
> because of the same issue), is that correct? So I wonder: who sends the
> "container complete event" to the ApplicationMaster?
>
>
>
>
>
> On Wed, Jul 10, 2013 at 3:19 PM, Devaraj k  wrote:
>
>> >1. I assume this is the task (container) that tries to establish
>> connection, but what it wants to connect to?
>>
>> It is trying to connect to the MRAppMaster for executing the actual task.
>>
>> >2. Why this error happens and how can I fix it?
>>
>> It seems the Container is not getting the correct MRAppMaster address for
>> some reason, or the AM is crashing before giving the task to the Container.
>> Probably it is coming from an invalid host mapping. Can you check that the
>> host mapping is proper on both machines, and also check the AM log around
>> that time for any clue.
>>
>> Thanks
>>
>> Devaraj k
>>
>> *From:* Andrei [mailto:faithlessfri...@gmail.com]
>> *Sent:* 10 July 2013 17:32
>> *To:* user@hadoop.apache.org
>> *Subject:* ConnectionException in container, happens only sometimes
>>
>> Hi,
>>
>> I'm running a CDH 4.3 installation of Hadoop with the following simple
>> setup:
>>
>> master-host: runs NameNode, ResourceManager and JobHistoryServer
>>
>> slave-1-host and slave-2-host: DataNodes and NodeManagers.
>>
>> When I run a simple MapReduce job (either using the streaming API or the Pi
>> example from the distribution) on the client, I see that some tasks fail:
>>
>> 13/07/10 14:40:10 INFO mapreduce.Job:  map 60% reduce 0%
>>
>> 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
>> attempt_1373454026937_0005_m_03_0, Status : FAILED
>>
>> 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
>> attempt_1373454026937_0005_m_05_0, Status : FAILED
>>
>> ...
>>
>> 13/07/10 14:40:23 INFO mapreduce.Job:  map 60% reduce 20%
>>
>> ...
>>
>> Every time a different set of tasks/attempts fails. In some cases the number
>> of failed attempts becomes critical and the whole job fails; in other cases
>> the job finishes successfully. I can't see any pattern, but I noticed the
>> following.
>>
>> Let's say the ApplicationMaster runs on _slave-1-host_. In this case on
>> _slave-2-host_ there will be a corresponding syslog with the following
>> contents:
>>
>> ...
>>
>> 2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client:
>> Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried
>> 0 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>
>> 2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client:
>> Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried
>> 1 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>
>> ...
>>
>> 2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client:
>> Retrying connect to server: slave-2-host/127.0.0.1:11812. Already tried
>> 9 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>
>> 2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild:
>> Exception running child : java.net.ConnectException: Cal

Re: ConnectionException in container, happens only sometimes

2013-07-10 Thread Andrei
Hi Devaraj,

thanks for your answer. Yes, I suspected it could be because of host
mapping, so I have already checked (and have just re-checked) the settings in
/etc/hosts on each machine, and they are all ok. I use both fully qualified
names (e.g. `master-host.company.com`) and their short forms (e.g.
`master-host`), so it shouldn't depend on the notation either.

I have also checked the AM syslog. There's nothing about the network, but there
are several messages like the following:

ERROR [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container
complete event for unknown container id
container_1373460572360_0001_01_88


I understand the container just doesn't get registered in the AM (probably
because of the same issue), is that correct? So I wonder: who sends the
"container complete event" to the ApplicationMaster?





On Wed, Jul 10, 2013 at 3:19 PM, Devaraj k  wrote:

> >1. I assume this is the task (container) that tries to establish
> connection, but what it wants to connect to?
>
> It is trying to connect to the MRAppMaster for executing the actual task.
>
> >2. Why this error happens and how can I fix it?
>
> It seems the Container is not getting the correct MRAppMaster address for
> some reason, or the AM is crashing before giving the task to the Container.
> Probably it is coming from an invalid host mapping. Can you check that the
> host mapping is proper on both machines, and also check the AM log around
> that time for any clue.
>
> Thanks
>
> Devaraj k
>
> *From:* Andrei [mailto:faithlessfri...@gmail.com]
> *Sent:* 10 July 2013 17:32
> *To:* user@hadoop.apache.org
> *Subject:* ConnectionException in container, happens only sometimes
>
> Hi,
>
> I'm running a CDH 4.3 installation of Hadoop with the following simple setup:
>
> master-host: runs NameNode, ResourceManager and JobHistoryServer
>
> slave-1-host and slave-2-host: DataNodes and NodeManagers.
>
> When I run a simple MapReduce job (either using the streaming API or the Pi
> example from the distribution) on the client, I see that some tasks fail:
>
> 13/07/10 14:40:10 INFO mapreduce.Job:  map 60% reduce 0%
>
> 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
> attempt_1373454026937_0005_m_03_0, Status : FAILED
>
> 13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
> attempt_1373454026937_0005_m_05_0, Status : FAILED
>
> ...
>
> 13/07/10 14:40:23 INFO mapreduce.Job:  map 60% reduce 20%
>
> ...
>
> Every time a different set of tasks/attempts fails. In some cases the number of
> failed attempts becomes critical and the whole job fails; in other cases the
> job finishes successfully. I can't see any pattern, but I noticed the
> following.
>
> Let's say the ApplicationMaster runs on _slave-1-host_. In this case on
> _slave-2-host_ there will be a corresponding syslog with the following
> contents:
>
> ...
>
> 2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: slave-2-host/127.0.0.1:11812. Already tried 0 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1 SECONDS)
>
> 2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: slave-2-host/127.0.0.1:11812. Already tried 1 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1 SECONDS)
>
> ...
>
> 2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client: Retrying
> connect to server: slave-2-host/127.0.0.1:11812. Already tried 9 time(s);
> retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
> sleepTime=1 SECONDS)
>
> 2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.net.ConnectException: Call From slave-2-host/
> 127.0.0.1 to slave-2-host:11812 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

ConnectionException in container, happens only sometimes

2013-07-10 Thread Andrei
Hi,

I'm running a CDH 4.3 installation of Hadoop with the following simple setup:

master-host: runs NameNode, ResourceManager and JobHistoryServer
slave-1-host and slave-2-host: DataNodes and NodeManagers.

When I run a simple MapReduce job (either using the streaming API or the Pi
example from the distribution) on the client, I see that some tasks fail:

13/07/10 14:40:10 INFO mapreduce.Job:  map 60% reduce 0%
13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
attempt_1373454026937_0005_m_03_0, Status : FAILED
13/07/10 14:40:14 INFO mapreduce.Job: Task Id :
attempt_1373454026937_0005_m_05_0, Status : FAILED
...
13/07/10 14:40:23 INFO mapreduce.Job:  map 60% reduce 20%
...

Every time a different set of tasks/attempts fails. In some cases the number of
failed attempts becomes critical and the whole job fails; in other cases the
job finishes successfully. I can't see any pattern, but I noticed the
following.

Let's say the ApplicationMaster runs on _slave-1-host_. In this case, on
_slave-2-host_ there will be a corresponding syslog with the following
contents:

...
2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: slave-2-host/127.0.0.1:11812. Already tried 0 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: slave-2-host/127.0.0.1:11812. Already tried 1 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
...
2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client: Retrying
connect to server: slave-2-host/127.0.0.1:11812. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.net.ConnectException: Call From slave-2-host/
127.0.0.1 to slave-2-host:11812 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
at org.apache.hadoop.ipc.Client.call(Client.java:1229)
at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
at com.sun.proxy.$Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:131)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:499)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:593)
at
org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:241)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1278)
at org.apache.hadoop.ipc.Client.call(Client.java:1196)
... 3 more


Notice several things:

1. This exception always happens on a different host than the one the
ApplicationMaster runs on.
2. It always tries to connect to localhost, not to another host in the cluster.
3. The port number (11812 in this case) is different every time.

My questions are:

1. I assume it is the task (container) that tries to establish the
connection, but what does it want to connect to?
2. Why does this error happen, and how can I fix it?

Any suggestions are welcome.

Thanks,
Andrei


unsubscribe

2012-08-09 Thread Andrei Krichevskiy



Sent from my iPhone

On Aug 9, 2012, at 12:46, Pankaj Misra wrote:



Thanks Ioan for the help and sharing the link, appreciate it.

The symlink as specified below already exists, and the response
of "$ which ld" is


[root@fedora-0 container-executor]# which ld
/bin/ld

Yes, I will surely raise a JIRA for this issue if it does not get
resolved, once I am sure that I am not missing anything.


Thanks and Regards
Pankaj Misra



From: Ioan Eugen Stan [stan.ieu...@gmail.com]
Sent: Thursday, August 09, 2012 4:06 PM
To: user@hadoop.apache.org
Subject: Re: Apache Hadoop 0.23.1 Source Build Failing

It seems that /bin/ld does not exist, so the compiler cannot perform
linking. Looking at the Fedora docs, it seems that ld is located at
/usr/bin/ld, so you may have to create a symlink to it:

$ ln -s /usr/bin/ld /bin/ld

First check that you have ld installed with: $ which ld

The scripts should also use `which ld` to find the proper path to ld.
So maybe you could raise an issue on JIRA with this.

http://docs.fedoraproject.org/en-US/Fedora/13/html/Release_Notes/sect-Release_Notes-The_GCC_Compiler_Collection.html

Cheers,

On Thu, Aug 9, 2012 at 1:05 PM, Pankaj Misra wrote:

Dear All,

I am building the hadoop 0.23.1 release from source with native
support. I have already built/installed the following prerequisites
for native support:

1. gcc-c++ 4.7.1
2. protoc 2.4.1
3. autotools chain
4. JDK 1.6.0_33
5. zlib 1.2.5-6
6. lzo 2.06-2

I have also set the following variables and exported them
export LD_LIBRARY_PATH=/usr/local/lib

Other variables are also set as given below.
export LD_LIBRARY_PATH=/usr/local/lib
export JAVA_HOME=/usr/java/jdk1.6.0_33
export ANT_HOME=/home/fedora/apache-ant-1.8.4
export MAVEN_HOME=/home/fedora/apache-maven-3.0.4
export PATH=$JAVA_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:$PATH
export HADOOP_COMMON_HOME=/home/fedora/hadoop/bin/release-0.23.1/hadoop-0.23.1

export HADOOP_HDFS_HOME=$HADOOP_COMMON_HOME
export YARN_HOME=$HADOOP_COMMON_HOME
export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME
export HADOOP_CONF_DIR=$HADOOP_COMMON_HOME/etc/hadoop


I am using the following to build the source with native support 
(using root user access).

mvn clean install -Pdist -Pnative -DskipTests=true

However, after building a number of sub-projects, the build fails
at the nodemanager with the following error:

[INFO] Compiling 129 source files to /home/fedora/hadoop/src/release-0.23.1/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/classes

[INFO]
[INFO] --- make-maven-plugin:1.0-beta-1:autoreconf (compile) @ hadoop-yarn-server-nodemanager ---

[INFO]
[INFO] --- make-maven-plugin:1.0-beta-1:configure (compile) @ hadoop-yarn-server-nodemanager ---

[INFO] checking for gcc... gcc
[INFO] checking whether the C compiler works... no
[INFO] configure: error: in `/home/fedora/hadoop/src/release-0.23.1/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/container-executor':

[INFO] configure: error: C compiler cannot create executables
[INFO] See `config.log' for more details


The config.log mentions the following issues.

configure:2562: checking whether the C compiler works
configure:2584: gcc -DHADOOP_CONF_DIR=/etc/hadoop -m32 conftest.c >&5

/bin/ld: cannot find crt1.o: No such file or directory
/bin/ld: cannot find crti.o: No such file or directory
/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-linux/4.7.0/libgcc_s.so when searching for -lgcc_s
/bin/ld: cannot find -lgcc_s
/bin/ld: skipping incompatible /usr/lib64/libc.so when searching for -lc
/bin/ld: cannot find -lc
/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-linux/4.7.0/libgcc_s.so when searching for -lgcc_s
/bin/ld: cannot find -lgcc_s
/bin/ld: cannot find crtn.o: No such file or directory
collect2: error: ld returned 1 exit status
configure:2588: $? = 1
configure:2626: result: no
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "linux-container-executor"
| #define PACKAGE_TARNAME "linux-container-executor"
| #define PACKAGE_VERSION "1.0.0"
| #define PACKAGE_STRING "linux-container-executor 1.0.0"
| #define PACKAGE_BUGREPORT "mapreduce-...@hadoop.apache.org"
| #define PACKAGE_URL ""
| /* end confdefs.h.  */
|
| int
| main ()
| {
|
|   ;
|   return 0;
| }
configure:2631: error: in `/home/fedora/hadoop/src/release-0.23.1/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/container-executor':

configure:2633: error: C compiler cannot create executables

The overall build summary is given below:

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main  SUCCESS [7.920s]
[INFO] Apache Hadoop Project POM . SUCCESS [1.405s]
[INFO] Apache Hadoop Annotations . SUCCESS [6.452s]
[INFO] Apache Hadoop Assembli