Re: node manager ports during mapreduce job

2015-01-13 Thread hitarth trivedi
Hi,



Yes, after 10 minutes the attempt expires and is relaunched, and this time I
can see it is on a different node manager.

Let me describe the configuration. It has 1 resource manager talking to 4
node managers. If I have only one node manager running, everything works
fine. If I have multiple node managers running, it works only if the firewall
is off on these node managers.

I have attached the logs from a run with 2 node managers to make debugging
easier. A typical mapreduce program takes about 30 sec with a single node
manager, or with multiple node managers and the firewall turned off.
The run in the attached logs, with 2 node managers, took 11 min.

If all 4 are running, it sometimes takes 40 minutes, or it times out
after about 45 minutes.



Let me know what we are doing wrong.


Thanks,

Hitarth

On Sun, Jan 11, 2015 at 11:27 PM, Rohith Sharma K S <
rohithsharm...@huawei.com> wrote:

>  Hi
>
>
>
> Could you give more information regarding the problem?
>
>
>
> I did not understand what you mean by this statement:
>
> >> Upon submitting the mapreduce job to the resource manager, it gets
> >> stuck at getResources() for 10 min, times out, and then tries another
> >> node manager.
>
> If the MRAppMaster does not communicate with the RM for 10 mins, the RM
> will expire that application attempt and try to relaunch it. But you
> mentioned that it is trying another node manager; which daemon is trying
> the other node manager?
>
>
>
> I suggest that whenever there is a problem like getting stuck, you take a
> thread dump using jstack; this would help analyze the issue faster.
>
>
>
> Any free port, i.e. 1024 <= x <= 65535, should work fine.
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> From: hitarth trivedi [mailto:t.hita...@gmail.com]
> Sent: 12 January 2015 07:01
> To: user@hadoop.apache.org
> Subject: node manager ports during mapreduce job
>
>
>
> Hi,
>
>
>
> We have a resource manager with 4 node managers. Upon submitting the
> mapreduce job to the resource manager, it gets stuck at getResources()
> for 10 min, times out, and then tries another node manager.
>
> When only one nodemanager is running, everything is fine. After turning
> off the firewall on all node managers, everything seems to work.
>
> Looking at netstat, the nodemanagers/resourcemanagers were communicating
> over a wide range of ports between 3 and 61000.
>
> So I opened the tcp ports in the range 3:61000 and turned on the
> firewall. But it does not seem to work.
>
> Any idea what needs to be done here?
>
>
>
> Thx
>
> -Hitarth
>


2nodemanagers
Description: Binary data


RE: node manager ports during mapreduce job

2015-01-11 Thread Rohith Sharma K S
Hi

Could you give more information regarding the problem?

I did not understand what you mean by this statement:
>> Upon submitting the mapreduce job to the resource manager, it gets
>> stuck at getResources() for 10 min, times out, and then tries another
>> node manager.
If the MRAppMaster does not communicate with the RM for 10 mins, the RM will
expire that application attempt and try to relaunch it. But you mentioned
that it is trying another node manager; which daemon is trying the other
node manager?

I suggest that whenever there is a problem like getting stuck, you take a
thread dump using jstack; this would help analyze the issue faster.
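
As a rough sketch of the thread-dump step: the helper names below are made up
for illustration and are not part of Hadoop. It assumes the JDK's jstack is on
the PATH and that you already know the pid of the stuck JVM (for example from
jps, run as the same user as that JVM).

```python
import subprocess


def jstack_command(pid, with_locks=True):
    """Build the jstack command line for a JVM pid (hypothetical helper)."""
    cmd = ["jstack"]
    if with_locks:
        # -l asks jstack for the long listing, which includes lock details.
        cmd.append("-l")
    cmd.append(str(pid))
    return cmd


def take_thread_dump(pid, out_path):
    """Write a thread dump of the JVM with the given pid to out_path.

    Assumes jstack is installed and the caller may attach to that JVM.
    Raises subprocess.CalledProcessError if jstack exits with an error.
    """
    with open(out_path, "w") as out:
        subprocess.run(
            jstack_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )
```

Taking two or three dumps a few seconds apart while the job is stuck makes it
easier to see which threads are not making progress.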

Any free port, i.e. 1024 <= x <= 65535, should work fine.
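
To check whether the firewall is blocking a particular port, a quick probe
from one node to another may help. This is only a sketch: the function name
is made up here, and whatever host and port you pass in are your own values,
not something taken from this cluster.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection, a timeout (often what a firewall that silently
    drops packets looks like) or any other socket error returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running this from the resource manager host against a node manager host, with
a port that netstat showed in use, tells you whether that port gets through
with the firewall turned on. A False result with the firewall on and True
with it off points at the firewall rules rather than at YARN itself.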

Thanks & Regards
Rohith Sharma K S

From: hitarth trivedi [mailto:t.hita...@gmail.com]
Sent: 12 January 2015 07:01
To: user@hadoop.apache.org
Subject: node manager ports during mapreduce job

Hi,

We have a resource manager with 4 node managers. Upon submitting the mapreduce
job to the resource manager, it gets stuck at getResources() for 10 min, times
out, and then tries another node manager.
When only one nodemanager is running, everything is fine. After turning off
the firewall on all node managers, everything seems to work.
Looking at netstat, the nodemanagers/resourcemanagers were communicating over
a wide range of ports between 3 and 61000.
So I opened the tcp ports in the range 3:61000 and turned on the firewall.
But it does not seem to work.
Any idea what needs to be done here?

Thx
-Hitarth