Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Rohith Sharma K S
Updated twitter message:

Apache Hadoop 3.2.1 is released: https://s.apache.org/96r4h

Announcement: https://s.apache.org/jhnpe
Overview: https://s.apache.org/tht6a
Changes: https://s.apache.org/pd6of
Release notes: https://s.apache.org/ta50b

Thanks to our community of developers, operators, and users.


-Rohith Sharma K S


On Wed, 25 Sep 2019 at 14:15, Sunil Govindan  wrote:

> Here the link of Overview URL is old.
> We should ideally use https://hadoop.apache.org/release/3.2.1.html
>
> Thanks
> Sunil
>
> On Wed, Sep 25, 2019 at 2:10 PM Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
>> Can someone help to post this in twitter account?
>>
>> Apache Hadoop 3.2.1 is released: https://s.apache.org/mzdb6
>> Overview: https://s.apache.org/tht6a
>> Changes: https://s.apache.org/pd6of
>> Release notes: https://s.apache.org/ta50b
>>
>> Thanks to our community of developers, operators, and users.
>>
>> -Rohith Sharma K S
>>
>> On Wed, 25 Sep 2019 at 13:44, Rohith Sharma K S <
>> rohithsharm...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> It gives us great pleasure to announce that the Apache Hadoop
>>> community has
>>> voted to release Apache Hadoop 3.2.1.
>>>
>>> Apache Hadoop 3.2.1 is the stable release of Apache Hadoop 3.2 line,
>>> which
>>> includes 493 fixes since Hadoop 3.2.0 release:
>>>
>>> - For major changes included in Hadoop 3.2 line, please refer Hadoop
>>> 3.2.1 main page[1].
>>> - For more details about fixes in 3.2.1 release, please read
>>> CHANGELOG[2] and RELEASENOTES[3].
>>>
>>> The release news is posted on the Hadoop website too, you can go to the
>>> downloads section directly[4].
>>>
>>> Thank you all for contributing to the Apache Hadoop!
>>>
>>> Cheers,
>>> Rohith Sharma K S
>>>
>>>
>>> [1] https://hadoop.apache.org/docs/r3.2.1/index.html
>>> [2]
>>> https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/CHANGELOG.3.2.1.html
>>> [3]
>>> https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/RELEASENOTES.3.2.1.html
>>> [4] https://hadoop.apache.org
>>>
>>


Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Rohith Sharma K S
Updated announcement


Hi all,

It gives us great pleasure to announce that the Apache Hadoop community has
voted to release Apache Hadoop 3.2.1.

Apache Hadoop 3.2.1 is the stable release of the Apache Hadoop 3.2 line, which
includes 493 fixes since the Hadoop 3.2.0 release:
  - For major changes included in the Hadoop 3.2 line, please refer to the
Hadoop 3.2.1 main page [1].
  - For more details about fixes in the 3.2.1 release, please read the
CHANGELOG [2] and RELEASENOTES [3].

The release news is also posted on the Hadoop website; you can go directly to
the downloads section [4].

This announcement itself is also up on the website [0].

Thank you all for contributing to Apache Hadoop!

Cheers,
Rohith Sharma K S

[0] Announcement: https://hadoop.apache.org/release/3.2.1.html
[1] Overview of major changes:
https://hadoop.apache.org/docs/r3.2.1/index.html
[2] Detailed change-log:
https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/CHANGELOG.3.2.1.html
[3] Detailed release-notes:
https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/RELEASENOTES.3.2.1.html
[4] Project Home: https://hadoop.apache.org

On Wed, 25 Sep 2019 at 13:44, Rohith Sharma K S 
wrote:

> Hi all,
>
> It gives us great pleasure to announce that the Apache Hadoop
> community has
> voted to release Apache Hadoop 3.2.1.
>
> Apache Hadoop 3.2.1 is the stable release of Apache Hadoop 3.2 line, which
> includes 493 fixes since Hadoop 3.2.0 release:
>
> - For major changes included in Hadoop 3.2 line, please refer Hadoop 3.2.1
> main page[1].
> - For more details about fixes in 3.2.1 release, please read CHANGELOG[2]
> and RELEASENOTES[3].
>
> The release news is posted on the Hadoop website too, you can go to the
> downloads section directly[4].
>
> Thank you all for contributing to the Apache Hadoop!
>
> Cheers,
> Rohith Sharma K S
>
>
> [1] https://hadoop.apache.org/docs/r3.2.1/index.html
> [2]
> https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/CHANGELOG.3.2.1.html
> [3]
> https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/RELEASENOTES.3.2.1.html
> [4] https://hadoop.apache.org
>


Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Rohith Sharma K S
Can someone help post this to the Twitter account?

Apache Hadoop 3.2.1 is released: https://s.apache.org/mzdb6
Overview: https://s.apache.org/tht6a
Changes: https://s.apache.org/pd6of
Release notes: https://s.apache.org/ta50b

Thanks to our community of developers, operators, and users.

-Rohith Sharma K S

On Wed, 25 Sep 2019 at 13:44, Rohith Sharma K S 
wrote:

> Hi all,
>
> It gives us great pleasure to announce that the Apache Hadoop
> community has
> voted to release Apache Hadoop 3.2.1.
>
> Apache Hadoop 3.2.1 is the stable release of Apache Hadoop 3.2 line, which
> includes 493 fixes since Hadoop 3.2.0 release:
>
> - For major changes included in Hadoop 3.2 line, please refer Hadoop 3.2.1
> main page[1].
> - For more details about fixes in 3.2.1 release, please read CHANGELOG[2]
> and RELEASENOTES[3].
>
> The release news is posted on the Hadoop website too, you can go to the
> downloads section directly[4].
>
> Thank you all for contributing to the Apache Hadoop!
>
> Cheers,
> Rohith Sharma K S
>
>
> [1] https://hadoop.apache.org/docs/r3.2.1/index.html
> [2]
> https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/CHANGELOG.3.2.1.html
> [3]
> https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/RELEASENOTES.3.2.1.html
> [4] https://hadoop.apache.org
>


[ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Rohith Sharma K S
Hi all,

It gives us great pleasure to announce that the Apache Hadoop community
has
voted to release Apache Hadoop 3.2.1.

Apache Hadoop 3.2.1 is the stable release of the Apache Hadoop 3.2 line, which
includes 493 fixes since the Hadoop 3.2.0 release:

- For major changes included in the Hadoop 3.2 line, please refer to the Hadoop
3.2.1 main page[1].
- For more details about fixes in the 3.2.1 release, please read the
CHANGELOG[2] and RELEASENOTES[3].

The release news is also posted on the Hadoop website; you can go directly to
the downloads section[4].

Thank you all for contributing to Apache Hadoop!

Cheers,
Rohith Sharma K S


[1] https://hadoop.apache.org/docs/r3.2.1/index.html
[2]
https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/CHANGELOG.3.2.1.html
[3]
https://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-common/release/3.2.1/RELEASENOTES.3.2.1.html
[4] https://hadoop.apache.org


Re: Question about Yarn rolling upgrade

2019-02-07 Thread Rohith Sharma K S
The JIRAs mentioned above did introduce breaks, but those were fixed in 2.6
itself. The only JIRA I see is YARN-8310, which is fixed in 2.10. Looking at the
stack trace you mentioned, it doesn't seem related to your issue. Maybe try
applying the patch and running a job.
Otherwise, let's create a JIRA and discuss it there in detail.

-Rohith Sharma K S

On Thu, 7 Feb 2019 at 22:52, Aihua Xu  wrote:

> Hi Rohith,
>
> Thanks for your suggestion. I was tracing the issue and found out it's
> caused by the incompatibility from these two changes. The tokens have been
> changed.
>
> YARN-668. Changed 
> NMTokenIdentifier/AMRMTokenIdentifier/ContainerTokenIdentifier to use 
> protobuf object as the payload. Contributed by Junping Du.
>
> YARN-2615. Changed 
> ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use 
> protobuf as payload. Contributed by Junping Du
>
>
> I was testing new RM with old NM.
>
> Followup on the the order of Yarn upgrade. I checked the HWX blog
> <https://hortonworks.com/blog/introducing-rolling-upgrades-downgrades-apache-hadoop-yarn-cluster/>
>  about
> rolling upgrade and it's suggesting to upgrade RM first.  But you are
> saying we should NM first and RM second? Can you confirm?
>
> Thanks,
> Aihua
>
>
>
> On Wed, Feb 6, 2019 at 8:26 PM Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
>> Hi Aihua,
>>
>> Could you give more clarity on when job is submitted like a) before
>> starting upgrade b) after RM upgrade and before NM upgrade c) after YARN
>> upgrade fully?
>> Typically, order of upgrade suggested is NM's first and RM second.
>>
>> Reg the NM warn messages you might be hitting
>> https://issues.apache.org/jira/browse/HADOOP-11692.
>>
>> Doesn't any subsequent jobs succeeded post upgrade?
>> -Rohith Sharma K S
>>
>> On Thu, 7 Feb 2019 at 03:20, Aihua Xu  wrote:
>>
>>> Hi all,
>>>
>>> I'm investigating the rolling upgrade process from Hadoop 2.6 to Hadoop
>>> 2.9.1. I'm trying to upgrade ResourceManager first and then try to upgrade
>>> NodeManager. When I submit a yarn job, RM fails with the following
>>> exception:
>>>
>>>  Application application_1549408943468_0001 failed 2 times due to Error 
>>> launching appattempt_1549408943468_0001_02. Got exception: 
>>> java.io.IOException: Failed on local exception: java.io.IOException: 
>>> java.io.EOFException; Host Details : local host is: 
>>> "hadoopbenchaqjm01-sjc1/10.67.2.171"; destination host is: 
>>> "hadoopbencha22-sjc1.prod.uber.internal":8041;
>>> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:805)
>>> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1349)
>>> at 
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>>> at 
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>>> at com.sun.proxy.$Proxy87.startContainers(Unknown Source)
>>> at 
>>> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at 
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at 
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>>> at 
>>> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>>> at 
>>> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>>> at 
>>> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>>> at 
>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>>> at com.sun.proxy.$Proxy88.startContainers(Unknown Source)
>>> at 
>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
>>> at 
>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
>>> at 
>>> java.util.concurrent.Thr

Re: Question about Yarn rolling upgrade

2019-02-06 Thread Rohith Sharma K S
Hi Aihua,

Could you give more clarity on when the job is submitted: a) before starting
the upgrade, b) after the RM upgrade and before the NM upgrade, or c) after the
YARN upgrade has fully completed?
Typically, the suggested order of upgrade is NMs first and RM second.

Regarding the NM warn messages, you might be hitting
https://issues.apache.org/jira/browse/HADOOP-11692.

Do any subsequent jobs succeed post-upgrade?
-Rohith Sharma K S

On Thu, 7 Feb 2019 at 03:20, Aihua Xu  wrote:

> Hi all,
>
> I'm investigating the rolling upgrade process from Hadoop 2.6 to Hadoop
> 2.9.1. I'm trying to upgrade ResourceManager first and then try to upgrade
> NodeManager. When I submit a yarn job, RM fails with the following
> exception:
>
>  Application application_1549408943468_0001 failed 2 times due to Error 
> launching appattempt_1549408943468_0001_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.io.EOFException; Host Details : local host is: 
> "hadoopbenchaqjm01-sjc1/10.67.2.171"; destination host is: 
> "hadoopbencha22-sjc1.prod.uber.internal":8041;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:805)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
> at org.apache.hadoop.ipc.Client.call(Client.java:1349)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy88.startContainers(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.io.EOFException
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:757)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:720)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:813)
> at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:411)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> ... 20 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1798)
> at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:365)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:615)
> at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:411)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:796)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:795)
> ... 23 more
>
>
> and NM with
>
> 2019-02-06 00:29:20,214 WARN SecurityLogger.org.apache.hadoop.ipc.Server: 
> Auth failed for 10.67.2.171:54588:null (DIGEST-MD5: IO error acquiring 
> password) with true cause: (null)
>
>
> I'm wondering if it's a known issue and anybody has an insight for it.
>
> Thanks,
> Aihua
>
>
>


Re: Get information of containers - running/killed/completei

2016-11-15 Thread Rohith Sharma K S
Hi Ajay

For running containers, you can get the container report from the
ResourceManager. For completed/killed containers, you need to start the
ApplicationHistoryServer daemon and use the same API, i.e.
yarnClient.getContainerReport(), to get the container report. Basically, this
API contacts the RM first for the container report. If the RM does not have
that container ID, then YarnClient contacts the ApplicationHistoryServer to get
the container report.
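
As a minimal sketch of that flow (assuming Hadoop 2.8+ so that
ContainerId.fromString is available, and that the timeline/history service
address is configured on the client side; the class name is only for
illustration):

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerReportLookup {
    public static void main(String[] args) throws Exception {
        // args[0] is a container id string, e.g. container_1479183523421_0001_01_000002
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        try {
            ContainerId containerId = ContainerId.fromString(args[0]);
            // Asks the RM first; if the RM no longer knows the container,
            // the client falls back to the ApplicationHistoryServer.
            ContainerReport report = yarnClient.getContainerReport(containerId);
            System.out.println("State:       " + report.getContainerState());
            System.out.println("Exit status: " + report.getContainerExitStatus());
            System.out.println("Finish time: " + report.getFinishTime());
        } finally {
            yarnClient.stop();
        }
    }
}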

Thanks & Regards
Rohith Sharma K S

On 15 November 2016 at 11:14, AJAY GUPTA  wrote:

> Hi
>
> For monitoring purposes, I need to capture some container information for
> my application deployed on Yarn, specially for containers getting killed.
> This also included the finishTime of the container i.e., the time when the
> container got killed. Is there any API which will provide this information.
> Currently, I am able to get information of only RUNNING containers via
> yarnClient.getContainerReport().
>
>
> Thanks,
> Ajay
>
>


Re: ACCEPTED: waiting for AM container to be allocated, launched and register with RM

2016-08-19 Thread Rohith Sharma K S
Hi

From the discussion below and the AM logs, I see that the AM container has
launched but is not able to connect to the RM.

This looks like a configuration issue. Would you check in your job.xml whether
yarn.resourcemanager.scheduler.address has been configured?

Essentially, this address is required by the MRAppMaster for connecting to the
RM for heartbeats. If you do not configure it, the default value will be taken,
i.e. 0.0.0.0:8030.
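
As a quick check of what the job actually picks up, a minimal sketch (run
against the same configuration directory the job client uses; the class name is
only for illustration):

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerAddressCheck {
    public static void main(String[] args) {
        // Loads yarn-site.xml (and yarn-default.xml) from the classpath.
        YarnConfiguration conf = new YarnConfiguration();
        // If this prints 0.0.0.0:8030, the scheduler address was never set and
        // the MRAppMaster will try to heartbeat to the default address.
        System.out.println(conf.get("yarn.resourcemanager.scheduler.address",
                "0.0.0.0:8030"));
    }
}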


Thanks & Regards
Rohith Sharma K S

> On Aug 20, 2016, at 7:02 AM, rammohan ganapavarapu  
> wrote:
> 
> Even if  the cluster dont have enough resources it should connect to "
> /0.0.0.0:8030" right? it should connect to my , not sure why its trying to
> connect to 0.0.0.0:8030.
> I have verified the config and i removed traces of 0.0.0.0 still no luck.
> org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at
> /0.0.0.0:8030
> 
> If an one has any clue please share.
> 
> Thanks,
> Ram
> 
> 
> On Fri, Aug 19, 2016 at 2:32 PM, rammohan ganapavarapu 
> mailto:rammohanga...@gmail.com>> wrote:
> When i submit a job using yarn its seems working only with oozie its failing 
> i guess, not sure what is missing.
> 
> yarn jar 
> /uap/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 20 
> 1000
> Number of Maps  = 20
> Samples per Map = 1000
> .
> .
> .
> Job Finished in 19.622 seconds
> Estimated value of Pi is 3.1428
> 
> Ram
> 
> On Fri, Aug 19, 2016 at 11:46 AM, rammohan ganapavarapu 
> mailto:rammohanga...@gmail.com>> wrote:
> Ok, i have used yarn-utils.py to get the correct values for my cluster and 
> update those properties and restarted RM and NM but still no luck not sure 
> what i am missing, any other insights will help me.
> 
> Below are my properties from yarn-site.xml and map-site.xml.
> 
> python yarn-utils.py -c 24 -m 63 -d 3 -k False
>  Using cores=24 memory=63GB disks=3 hbase=False
>  Profile: cores=24 memory=63488MB reserved=1GB usableMem=62GB disks=3
>  Num Container=6
>  Container Ram=10240MB
>  Used Ram=60GB
>  Unused Ram=1GB
>  yarn.scheduler.minimum-allocation-mb=10240
>  yarn.scheduler.maximum-allocation-mb=61440
>  yarn.nodemanager.resource.memory-mb=61440
>  mapreduce.map.memory.mb=5120
>  mapreduce.map.java.opts=-Xmx4096m
>  mapreduce.reduce.memory.mb=10240
>  mapreduce.reduce.java.opts=-Xmx8192m
>  yarn.app.mapreduce.am.resource.mb=5120
>  yarn.app.mapreduce.am.command-opts=-Xmx4096m
>  mapreduce.task.io.sort.mb=1024
> 
> 
> <!-- mapred-site.xml -->
> <property>
>   <name>mapreduce.map.memory.mb</name>
>   <value>5120</value>
> </property>
> <property>
>   <name>mapreduce.map.java.opts</name>
>   <value>-Xmx4096m</value>
> </property>
> <property>
>   <name>mapreduce.reduce.memory.mb</name>
>   <value>10240</value>
> </property>
> <property>
>   <name>mapreduce.reduce.java.opts</name>
>   <value>-Xmx8192m</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.resource.mb</name>
>   <value>5120</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.command-opts</name>
>   <value>-Xmx4096m</value>
> </property>
> <property>
>   <name>mapreduce.task.io.sort.mb</name>
>   <value>1024</value>
> </property>
> 
> <!-- yarn-site.xml -->
> <property>
>   <name>yarn.scheduler.minimum-allocation-mb</name>
>   <value>10240</value>
> </property>
> <property>
>   <name>yarn.scheduler.maximum-allocation-mb</name>
>   <value>61440</value>
> </property>
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>61440</value>
> </property>
> 
> Ram
> 
> On Thu, Aug 18, 2016 at 11:14 PM, tkg_cangkul  <mailto:yuza.ras...@gmail.com>> wrote:
> maybe this link can be some reference to tune up the cluster:
> 
> http://jason4zhu.blogspot.co.id/2014/10/memory-configuration-in-hadoop.html
> 
> 
> On 19/08/16 11:13, rammohan ganapavarapu wrote:
>> Do you know what properties to tune?
>> 
>> Thanks,
>> Ram
>> 
>> On Thu, Aug 18, 2016 at 9:11 PM, tkg_cangkul > <mailto:yuza.ras...@gmail.com>> wrote:
>> i think that's because you don't have enough resource.  u can tune your 
>> cluster config to maximize your resource.
>> 
>> 
>> On 19/08/16 11:03, rammohan ganapavarapu wrote:
>>> I dont see any thing odd except this not sure if i have to worry about it 
>>> or not.
>>> 
>>> 2016-08-19 03:29:26,621 INFO [main] org.apache.hadoop.yarn.client.RMProxy: 
>>> Connecting to ResourceManager at /0.0.0.0:8030
>>> 2016-08-19 03:29:27,646 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>&

Re: Issue with Hadoop Job History Server

2016-08-18 Thread Rohith Sharma K S
MR jobs and the JHS should have the same configuration for done-dir if it is
configured. Otherwise, the staging-dir should be the same for both. Make sure
both the job and the JHS have the same configuration values.

Usually what happens is that the MR AM writes the job history files to one
location and the HistoryServer tries to read them from a different location.
This causes the JHS to display empty jobs.
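
A minimal sketch for comparing the two sides (run it once with the job client's
configuration on the classpath and once with the JHS's configuration, then
compare the output; the class name is only for illustration):

import org.apache.hadoop.mapred.JobConf;

public class JhsDirCheck {
    public static void main(String[] args) {
        // JobConf loads mapred-site.xml / yarn-site.xml from the classpath.
        JobConf conf = new JobConf();
        System.out.println("mapreduce.jobhistory.done-dir              = "
                + conf.get("mapreduce.jobhistory.done-dir"));
        System.out.println("mapreduce.jobhistory.intermediate-done-dir = "
                + conf.get("mapreduce.jobhistory.intermediate-done-dir"));
        System.out.println("yarn.app.mapreduce.am.staging-dir          = "
                + conf.get("yarn.app.mapreduce.am.staging-dir"));
    }
}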

Thanks & Regards
Rohith Sharma K S

> On Aug 18, 2016, at 12:35 PM, Gao, Yunlong  wrote:
> 
> To whom it may concern,
> 
> I am using Hadoop 2.7.1.2.3.6.0-3796, with the Hortonworks distribution of 
> HDP-2.3.6.0-3796. I have a question with the Hadoop Job History sever. 
> 
> After I set up everything, the resource manager/name nodes/data nodes seem to 
> be running fine. But the job history server is not working correctly.  The 
> issue with it is that the UI of the job history server does not show any 
> jobs.  And all the rest calls to the job history server do not work either. 
> Also notice that there is no logs in HDFS under the directory of 
> "mapreduce.jobhistory.done-dir"
> 
> I have tried with different things, including restarting the job history 
> server and monitor the log -- no error/exceptions is observed. I also rename 
> the /hadoop/mapreduce/jhs/mr-jhs-state for the state recovery of job history 
> server, and then restart it again, but no particular error happens. I tried 
> with some other random stuff that I borrowed from online blogs/documents but 
> got no luck.
> 
> 
> Any help would be very much appreciated.
> 
> Thanks,
> Yunlong
> 



Re: Connecting JConsole to ResourceManager

2016-08-09 Thread Rohith Sharma K S
Hi

Have you enabled the JMX remote connection parameters for the RM start-up? If
you are trying a remote connection, these parameters are supposed to be passed
in the Hadoop opts.
You need to enable remote JMX by adding these parameters to the RM JVM
start-up options:
-Dcom.sun.management.jmxremote.port=<port> \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -Dcom.sun.management.jmxremote.ssl=false
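
Once the RM has been restarted with those flags, a small probe like the sketch
below can confirm that remote JMX is reachable before pointing JConsole at it
(rm-host and port 9999 are placeholders for your RM host and the port chosen
above):

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RmJmxProbe {
    public static void main(String[] args) throws Exception {
        // Same URL format that JConsole uses for "Remote Process".
        String url = "service:jmx:rmi:///jndi/rmi://rm-host:9999/jmxrmi";
        try (JMXConnector connector =
                     JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // If this prints a count, JConsole should be able to connect too.
            System.out.println("MBeans visible: " + mbsc.getMBeanCount());
        }
    }
}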

 


-Regards
Rohith Sharma K S

> On Aug 9, 2016, at 12:32 PM, Atri Sharma  wrote:
> 
> Hi All,
> 
> I am trying to connect to a running ResourceManager process on Windows. I ran 
> jconsole and it shows the ResourceManager process. When I try connecting, it 
> immediately fails saying that it cannot connect.
> 
> I verified that the cluster is running fine by running the wordcount example.
> 
> Please advise.
> 
> Regards,
> 
> Atri
> 



RE: Securely discovering Application Master's metadata or sending a secret to Application Master at submission

2016-06-09 Thread Rohith Sharma K S
Hi

Basically, I see you have multiple questions:

1.   How to get the AM RPC port?

>>> You can get it via YarnClient#getApplicationReport(), which gives the
>>> common/generic application details (a sketch follows after this list). Note
>>> that the RM does not maintain any custom details for applications.

2.   How can you get the metadata of the AM?

>>> Basically, the AM should be designed so that it binds a client interface to
>>> its RPC server. The AM RPC host and port can be obtained from the
>>> ResourceManager. Using the AM's host:port, the application submitter can
>>> connect to the AM and get the required details from the AM itself. YARN
>>> does not provide any interface for this, since AMs are written by users.
>>> Essentially, users can design the AM to expose a client interface to their
>>> clients. For a better understanding, see the MapReduce framework's
>>> MRAppMaster.

3.   About the authenticity of the job submitter to the AM

>>> Use a secured Hadoop cluster with Kerberos enabled. Note that the AM also
>>> should be implemented to handle Kerberos.
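
For point 1, a minimal sketch of the lookup (assuming Hadoop 2.8+ for
ApplicationId.fromString; older releases have an equivalent in ConverterUtils;
the class name is only for illustration):

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmEndpointLookup {
    public static void main(String[] args) throws Exception {
        // args[0] is an application id string, e.g. application_1465443790905_0001
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        try {
            ApplicationReport report =
                    yarnClient.getApplicationReport(ApplicationId.fromString(args[0]));
            // Host and RPC port that the AM registered with the RM.
            System.out.println("AM RPC endpoint: " + report.getHost() + ":" + report.getRpcPort());
            System.out.println("Tracking URL:    " + report.getTrackingUrl());
        } finally {
            yarnClient.stop();
        }
    }
}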


Thanks & Regards
Rohith Sharma K S

From: Mingyu Kim [mailto:m...@palantir.com]
Sent: 10 June 2016 03:47
To: Rohith Sharma K S; user@hadoop.apache.org
Cc: Matt Cheah
Subject: Re: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

Hi Rohith,

Thanks for the pointers. I checked the Hadoop documentation you linked, but 
it’s not clear how I can expose client interface for providing metadata. By 
“YARN internal communications”, I was referring to the endpoints that are 
exposed by AM on the RPC port as reported in ApplicationReport. I assume either 
RM or containers will communicate with AM through these endpoints.

I believe your suggestion is to expose additional endpoints to the AM RPC port. 
Can you clarify how I can do that? Is there an interface/class I need to 
extend? How can I register the extra endpoints for providing metadata on the 
existing AM RPC port?

Mingyu

From: Rohith Sharma K S 
mailto:rohithsharm...@huawei.com>>
Date: Wednesday, June 8, 2016 at 11:15 PM
To: Mingyu Kim mailto:m...@palantir.com>>, 
"user@hadoop.apache.org<mailto:user@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Cc: Matt Cheah mailto:mch...@palantir.com>>
Subject: RE: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

Hi

Do you know how I can extend the client interface of the RPC port?
>>> YARN provides YARNClIent library that uses ApplicationClientProtocol. For 
>>> your more understanding refer 
>>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client<https://urldefense.proofpoint.com/v2/url?u=https-3A__hadoop.apache.org_docs_stable_hadoop-2Dyarn_hadoop-2Dyarn-2Dsite_WritingYarnApplications.html-23Writing-5Fa-5Fsimple-5FClient&d=DQMGaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=ennQJq47pNnObsDh-88a9YUrUulcYQoV8giPASqXB84&m=5pHc0M-1BOxtbvvaoT6ahycddGtWm-uq9f5JW_FJRQM&s=S9H5l9wo0JK9Oet5_GiN-lW4lQBxkaC1mxPDRY1kGpk&e=>

I know AM has some endpoints exposed through the RPC port for internal YARN 
communications, but was not sure how I can extend it to expose a custom 
endpoint.
>>> I am not sure what you mean here internal YARN communication? AM can 
>>> connect to RM only via AM-RM interface for register/unregister and 
>>> heartbeat and details sent to RM are limited.  It is up to the AM’s to 
>>> expose client interface for providing metadata.
Thanks & Regards
Rohith Sharma K S
From: Mingyu Kim [mailto:m...@palantir.com]
Sent: 09 June 2016 11:21
To: Rohith Sharma K S; user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Cc: Matt Cheah
Subject: Re: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

Hi Rohith,

Thanks for the quick response. That sounds promising. Do you know how I can 
extend the client interface of the RPC port? I know AM has some endpoints 
exposed through the RPC port for internal YARN communications, but was not sure 
how I can extend it to expose a custom endpoint. Any pointer would be 
appreciated!

Mingyu

From: Rohith Sharma K S 
mailto:rohithsharm...@huawei.com>>
Date: Wednesday, June 8, 2016 at 10:39 PM
To: Mingyu Kim mailto:m...@palantir.com>>, 
"user@hadoop.apache.org<mailto:user@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Cc: Matt Cheah mailto:mch...@palantir.com>>
Subject: RE: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

Hi

Apart from AM address and tracking URL, no other meta data of applicationMaster 
are stored in YARN. May be AM can expose client interface so th

RE: Securely discovering Application Master's metadata or sending a secret to Application Master at submission

2016-06-08 Thread Rohith Sharma K S
Hi

Do you know how I can extend the client interface of the RPC port?
>>> YARN provides the YarnClient library, which uses ApplicationClientProtocol.
>>> For more background, refer to
>>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client

I know AM has some endpoints exposed through the RPC port for internal YARN 
communications, but was not sure how I can extend it to expose a custom 
endpoint.
>>> I am not sure what you mean by internal YARN communication here. The AM can
>>> connect to the RM only via the AM-RM interface for register/unregister and
>>> heartbeats, and the details sent to the RM are limited. It is up to the AMs
>>> to expose a client interface for providing metadata.
Thanks & Regards
Rohith Sharma K S
From: Mingyu Kim [mailto:m...@palantir.com]
Sent: 09 June 2016 11:21
To: Rohith Sharma K S; user@hadoop.apache.org
Cc: Matt Cheah
Subject: Re: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

Hi Rohith,

Thanks for the quick response. That sounds promising. Do you know how I can 
extend the client interface of the RPC port? I know AM has some endpoints 
exposed through the RPC port for internal YARN communications, but was not sure 
how I can extend it to expose a custom endpoint. Any pointer would be 
appreciated!

Mingyu

From: Rohith Sharma K S 
mailto:rohithsharm...@huawei.com>>
Date: Wednesday, June 8, 2016 at 10:39 PM
To: Mingyu Kim mailto:m...@palantir.com>>, 
"user@hadoop.apache.org<mailto:user@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Cc: Matt Cheah mailto:mch...@palantir.com>>
Subject: RE: Securely discovering Application Master's metadata or sending a 
secret to Application Master at submission

Hi

Apart from AM address and tracking URL, no other meta data of applicationMaster 
are stored in YARN. May be AM can expose client interface so that AM clients 
can interact with Running AM to retrieve specific AM details.

RPC port of AM can be get from YARN client interface such as 
ApplicationClientProtocol# getApplicationReport() OR ApplicationClientProtocol 
#getApplicationAttemptReport().

Thanks & Regards
Rohith Sharma K S

From: Mingyu Kim [mailto:m...@palantir.com]
Sent: 09 June 2016 10:36
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Cc: Matt Cheah
Subject: Securely discovering Application Master's metadata or sending a secret 
to Application Master at submission

Hi all,

To provide a bit of background, I’m trying to deploy a REST server on 
Application Master and discover the randomly assigned port number securely. I 
can easily discover the host name of AM through YARN REST API, but the port 
number needs to be discovered separately. (Port number is assigned within a 
specified range with retries to avoid port conflicts) An easy solution would be 
to have Application Master make a callback with the port number, but I’d like 
to design it such that YARN nodes don’t talk back to the node that submitted 
the YARN application. So, this problem reduces to securely discovering a small 
metadata of Application Master. To be clear, by being secure, I’m less 
concerned about exposing the information to others, but more concerned about 
the integrity of data (e.g. the metadata actually originated from the 
Application Master.)

I was hoping that there is a way to register some Application Master metadata 
to Resource Manager, but there doesn’t seem to be a way. Another option I 
considered was to write the information to a HDFS file, but in order to verify 
the integrity of the content, I need a way to securely send a private key to 
Application Master, which I’m not sure what the best is.

To recap, does anyone know if there is a way

• To register small metadata securely from Application Master to 
Resource Manager so that it can be discovered by the YARN application submitter?

• Or, to securely send a private key to Application Master at the 
application submission time?

Thanks a lot,
Mingyu


RE: Securely discovering Application Master's metadata or sending a secret to Application Master at submission

2016-06-08 Thread Rohith Sharma K S
Hi

Apart from the AM address and tracking URL, no other metadata of the
ApplicationMaster is stored in YARN. Maybe the AM can expose a client interface
so that AM clients can interact with the running AM to retrieve specific AM
details.

The RPC port of the AM can be obtained from a YARN client interface such as
ApplicationClientProtocol#getApplicationReport() or
ApplicationClientProtocol#getApplicationAttemptReport().

Thanks & Regards
Rohith Sharma K S

From: Mingyu Kim [mailto:m...@palantir.com]
Sent: 09 June 2016 10:36
To: user@hadoop.apache.org
Cc: Matt Cheah
Subject: Securely discovering Application Master's metadata or sending a secret 
to Application Master at submission

Hi all,

To provide a bit of background, I’m trying to deploy a REST server on 
Application Master and discover the randomly assigned port number securely. I 
can easily discover the host name of AM through YARN REST API, but the port 
number needs to be discovered separately. (Port number is assigned within a 
specified range with retries to avoid port conflicts) An easy solution would be 
to have Application Master make a callback with the port number, but I’d like 
to design it such that YARN nodes don’t talk back to the node that submitted 
the YARN application. So, this problem reduces to securely discovering a small 
metadata of Application Master. To be clear, by being secure, I’m less 
concerned about exposing the information to others, but more concerned about 
the integrity of data (e.g. the metadata actually originated from the 
Application Master.)

I was hoping that there is a way to register some Application Master metadata 
to Resource Manager, but there doesn’t seem to be a way. Another option I 
considered was to write the information to a HDFS file, but in order to verify 
the integrity of the content, I need a way to securely send a private key to 
Application Master, which I’m not sure what the best is.

To recap, does anyone know if there is a way

• To register small metadata securely from Application Master to 
Resource Manager so that it can be discovered by the YARN application submitter?

• Or, to securely send a private key to Application Master at the 
application submission time?

Thanks a lot,
Mingyu


RE: Leak in RM Capacity scheduler leading to OOM

2016-03-23 Thread Rohith Sharma K S
I think you might be hitting YARN-2997. That issue fixes the sending of
duplicated completed-container statuses to the RM.

Thanks & Regards
Rohith Sharma K S

-Original Message-
From: Sharad Agarwal [mailto:sha...@apache.org] 
Sent: 24 March 2016 08:58
To: Sharad Agarwal
Cc: yarn-...@hadoop.apache.org; user@hadoop.apache.org
Subject: Re: Leak in RM Capacity scheduler leading to OOM

Ticket for this is here ->
https://issues.apache.org/jira/browse/YARN-4852

On Wed, Mar 23, 2016 at 5:50 PM, Sharad Agarwal  wrote:

> Taking a dump of 8 GB heap shows about 18 million 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto
>
> Similar counts are there for ApplicationAttempt, ContainerId. All 
> seems to be linked via 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerStatusProto, the 
> count of which is also about 18 million.
>
> On further debugging, looking at the CapacityScheduler code:
>
> It seems to add duplicated entries of UpdatedContainerInfo objects for 
> the completed containers. In the same dump seeing about 0.5 
> UpdatedContainerInfo million objects
>
> This issue only surfaces if the scheduler thread is not able to drain 
> fast enough the UpdatedContainerInfo objects, happens only in a big cluster.
>
> Has anyone noticed the same. We are running hadoop 2.6.0
>
> Sharad
>


RE: Concurrency control

2015-09-29 Thread Rohith Sharma K S
Hi Laxman,

In Hadoop 2.8 (not released yet), the CapacityScheduler provides configuration
for the ordering policy. By configuring the fair ordering policy in the
CapacityScheduler, you should probably be able to achieve your goal, i.e.
avoiding starvation of applications waiting for resources.


org.apache.hadoop.yarn.server.resourcemanager.scheduler.policy.FairOrderingPolicy:
An OrderingPolicy which orders SchedulableEntities for fairness (see 
FairScheduler FairSharePolicy), generally, processes with lesser usage are 
lesser. If sizedBasedWeight is set to true then an application with high demand 
may be prioritized ahead of an application with less usage. This is to offset 
the tendency to favor small apps, which could result in starvation for large 
apps if many small ones enter and leave the queue continuously (optional, 
default false)


Community Issue Id :  https://issues.apache.org/jira/browse/YARN-3463

Thanks & Regards
Rohith Sharma K S

From: Laxman Ch [mailto:laxman@gmail.com]
Sent: 29 September 2015 13:36
To: user@hadoop.apache.org
Subject: Re: Concurrency control

Bouncing this thread again. Any other thoughts please?

On 17 September 2015 at 23:21, Laxman Ch 
mailto:laxman@gmail.com>> wrote:
No Naga. That wont help.

I am running two applications (app1 - 100 vcores, app2 - 100 vcores) with same 
user which runs in same queue (capacity=100vcores). In this scenario, if app1 
triggers first occupies all the slots and runs longs then app2 will starve 
longer.

Let me reiterate my problem statement. I wanted "to control the amount of 
resources (vcores, memory) used by an application SIMULTANEOUSLY"

On 17 September 2015 at 22:28, Naganarasimha Garla 
mailto:naganarasimha...@gmail.com>> wrote:
Hi Laxman,
For the example you have stated may be we can do the following things :
1. Create/modify the queue with capacity and max cap set such that its 
equivalent to 100 vcores. So as there is no elasticity, given application will 
not be using the resources beyond the capacity configured
2. yarn.scheduler.capacity..minimum-user-limit-percent   so that 
each active user would be assured with the minimum guaranteed resources . By 
default value is 100 implies no user limits are imposed.

Additionally we can think of 
"yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" which 
will enforce strict cpu usage for a given container if required.

+ Naga

On Thu, Sep 17, 2015 at 4:42 PM, Laxman Ch 
mailto:laxman@gmail.com>> wrote:
Yes. I'm already using cgroups. Cgroups helps in controlling the resources at 
container level. But my requirement is more about controlling the concurrent 
resource usage of an application at whole cluster level.

And yes, we do configure queues properly. But, that won't help.

For example, I have an application with a requirement of 1000 vcores. But, I 
wanted to control this application not to go beyond 100 vcores at any point of 
time in the cluster/queue. This makes that application to run longer even when 
my cluster is free but I will be able meet the guaranteed SLAs of other 
applications.

Hope this helps to understand my question.

And thanks Narasimha for quick response.

On 17 September 2015 at 16:17, Naganarasimha Garla 
mailto:naganarasimha...@gmail.com>> wrote:
Hi Laxman,
Yes if cgroups are enabled and "yarn.scheduler.capacity.resource-calculator" 
configured to DominantResourceCalculator then cpu and memory can be controlled.
Please Kindly  furhter refer to the official documentation
http://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html

But may be if say more about problem then we can suggest ideal configuration, 
seems like capacity configuration and splitting of the queue is not rightly 
done or you might refer to Fair Scheduler if you want more fairness for 
container allocation for different apps.

On Thu, Sep 17, 2015 at 4:10 PM, Laxman Ch 
mailto:laxman@gmail.com>> wrote:
Hi,

In YARN, do we have any way to control the amount of resources (vcores, memory) 
used by an application SIMULTANEOUSLY.

- In my cluster, noticed some large and long running mr-app occupied all the 
slots of the queue and blocking other apps to get started.
- I'm using Capacity schedulers (using hierarchical queues and preemption 
disabled)
- Using Hadoop version 2.6.0
- Did some googling around this and gone through configuration docs but I'm not 
able to find anything that matches my requirement.

If needed, I can provide more details on the usecase and problem.

--
Thanks,
Laxman




--
Thanks,
Laxman




--
Thanks,
Laxman



--
Thanks,
Laxman


RE: How to auto relaunch a YARN Application Master on a failure?

2015-08-19 Thread Rohith Sharma K S
It is possible. You can set the number of attempts to be launched in case of AM
failures via yarn.resourcemanager.am.max-attempts. The default is 2; you can
increase it. This is at the global level.
At the per-application level, you need to set it in
ApplicationSubmissionContext#setMaxAppAttempts.
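
A minimal sketch of the per-application route for a client that submits its own
YARN application (only the max-attempts part is shown; the AM container spec,
resource and queue still have to be filled in as usual, and the class name is
only for illustration):

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmRetrySubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        // Allow up to 4 AM attempts for this application; the effective limit
        // is capped by yarn.resourcemanager.am.max-attempts on the RM side.
        ctx.setMaxAppAttempts(4);

        // ... set the AM ContainerLaunchContext, Resource, queue, etc., then:
        // yarnClient.submitApplication(ctx);

        yarnClient.stop();
    }
}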

Thanks & Regards
Rohith Sharma K S

From: Sridhar Chellappa [mailto:schellap2...@gmail.com]
Sent: 19 August 2015 14:55
To: user@hadoop.apache.org
Subject: How to auto relaunch a YARN Application Master on a failure?

Is this possible? If yes, can someone get back to me as to how?


RE: Confusing Yarn RPC Configuration

2015-08-19 Thread Rohith Sharma K S
>>> I believe it is the same issue for the node manager connection
This is probably related to the issues below:
https://issues.apache.org/jira/i#browse/YARN-3944
https://issues.apache.org/jira/i#browse/YARN-3238


Thanks & Regards
Rohith Sharma K S

From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: 18 August 2015 09:11
To: user@hadoop.apache.org
Subject: Confusing Yarn RPC Configuration


I use 
yarn.resourcemanager.connect.max-wait.ms<http://yarn.resourcemanager.connect.max-wait.ms>
 to control how much time to wait for setting up RM connection. But the weird 
thing I found that this configuration is not the real max wait time. Actually 
Yarn will convert it to retry count with configuration 
yarn.resourcemanager.connect.retry-interval.ms<http://yarn.resourcemanager.connect.retry-interval.ms>.
Let's say 
yarn.resourcemanager.connect.max-wait.ms<http://yarn.resourcemanager.connect.max-wait.ms>=1
 and  
yarn.resourcemanager.connect.retry-interval.ms<http://yarn.resourcemanager.connect.retry-interval.ms>=2000,
 then yarn will create RetryUpToMaximumCountWithFixedSleep with max count = 5 
(1/2000)
Because for each RM connection, there's retry policy inside of hadoop RPC. 
Let's say ipc.client.connect.retry.interval=1000 and 
ipc.client.connect.max.retries=10, so for each RM connection it will try 10 
times and totally cost 10 seconds (1000*10).  So overall for the RM connection 
it would cost 50 seconds (10 * 5), and this number is not consistent with 
yarn.resourcemanager.connect.max-wait.ms<http://yarn.resourcemanager.connect.max-wait.ms>
 which confuse users. I am not sure the purpose of 2 rounds of retry policy 
(Yarn side and RPC internal side), should it be only 1 round of retry policy 
and yarn related configuration is just for override the RPC configuration ?

BTW, I believe it is the same issue for node manage connection.

--
Best Regards

Jeff Zhang


RE: Remotely submit a job to Yarn on CDH5.4

2015-08-18 Thread Rohith Sharma K S
Are you trying to submit the job from Windows to a Linux server? If yes, try to
submit the job with mapreduce.app-submission.cross-platform=true.
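
In client code this is one extra property on the Configuration used to build
the Job; a minimal sketch (class name for illustration only, the rest of the
job setup stays as in the quoted code below):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CrossPlatformSubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "yarn");
        // On a Windows client submitting to a Linux cluster, set the
        // cross-platform flag; otherwise the generated classpath and launch
        // command use Windows-style paths/variables that fail on the Linux NMs.
        conf.set("mapreduce.app-submission.cross-platform", "true");
        Job job = Job.getInstance(conf, "word count");
        // ... set mapper/reducer/input/output as in the quoted code below,
        // then job.waitForCompletion(true).
    }
}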


Thanks & Regards
Rohith Sharma K S

From: Fei Hu [mailto:hufe...@gmail.com]
Sent: 18 August 2015 21:11
To: user@hadoop.apache.org
Subject: Remotely submit a job to Yarn on CDH5.4

Hi,

I want to remotely submit a job to Yarn on CDH5.4. The following is the code 
about the WordCount and the error report. Any one knows how to solve it?

Thanks in advance,
Fei



INFO: Job job_1439867352386_0025 failed with state FAILED due to: Application 
application_1439867352386_0025 failed 2 times due to AM Container for 
appattempt_1439867352386_0025_02 exited with  exitCode: 1
For more detailed output, check application tracking 
page:http://compute-04:8088/proxy/application_1439867352386_0025/Then, click on 
links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1439867352386_0025_02_01
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
   at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
   at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.


public static void main(String[] args) throws Exception {
   Configuration conf = new Configuration();
   System.setProperty("HADOOP_USER_NAME","hdfs");
   conf.set("hadoop.job.ugi", "supergroup");

   conf.set("mapreduce.framework.name", "yarn");
   conf.set("fs.defaultFS", "hdfs://compute-04:8020");
   conf.set("mapreduce.map.java.opts", "-Xmx1024M");
   conf.set("mapreduce.reduce.java.opts", "-Xmx1024M");

   conf.set("fs.hdfs.impl", 
org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
   conf.set("fs.file.impl", 
org.apache.hadoop.fs.LocalFileSystem.class.getName());
   conf.set("yarn.resourcemanager.address", "199.25.200.134:8032");

   conf.set("yarn.resourcemanager.resource-tracker.address", 
"199.25.200.134:8031");
   conf.set("yarn.resourcemanager.scheduler.address", 
"199.25.200.134:8030");
   conf.set("yarn.resourcemanager.admin.address", 
"199.25.200.134:8033");


   conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");

   conf.set("yarn.application.classpath", 
"/etc/hadoop/conf.cloudera.hdfs,"
 + "/etc/hadoop/conf.cloudera.yarn,"
 + 
"/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*,"
 + 
"/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*,"
 + 
"/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*,"
 + 
"/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*,"
 + 
"/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*,"
 + 
"/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*”);


   GenericOptionsParser optionParser = new GenericOptionsParser(conf, 
args);
   String[] remainingArgs = optionParser.getRemainingArgs();
   if (!(remainingArgs.length != 2 || remainingArgs.length != 4)) {
 System.err.println("Usage: wordcount   [-skip 
skipPatternFile]");
 System.exit(2);
   }
   Job job = Job.getInstance(conf, "word count");
   job.setJarByClass(WordCount2.class);
   job.setMapperClass(TokenizerMapper.class);
   job.setCombinerClass(IntSumReducer.class);
   job.setReducerClass(IntSumReducer.class);
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(IntWritable.class);

   List<String> otherArgs = new ArrayList<String>();
   for (int i=0; i < remainingArgs.length; ++i) {
   

RE: Application Master waits a long time after Mapper/Reducers finish

2015-07-20 Thread Rohith Sharma K S
Hi

From the thread dump, it seems to be waiting on an HDFS operation. Can you
attach the AM logs, and do you see any client retries for connecting to HDFS?

"CommitterEvent Processor #4" prio=10 tid=0x0199a800 nid=0x18df in 
Object.wait() [0x7f4f12aa4000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
….
at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)


Maybe you can check whether HDFS is healthy.

Thanks & Regards
Rohith Sharma K S

From: Ashish Kumar Singh [mailto:ashish23...@gmail.com]
Sent: 20 July 2015 14:16
To: user@hadoop.apache.org
Subject: Application Master waits a long time after Mapper/Reducers finish

Hello Users ,

I am facing a problem running Mapreduce jobs on Hadoop 2.6.
I am observing that the Applocation Master  waits for a long time after all the 
Mappers and Reducers are completed before the job is completed .

This wait time sometimes exceeds 20-25 mins which is very strange as our 
mappers and reducers complete in less than 10 minutes for the job .

Below are some observations:
a) Job completion status stands at 95% when the wait begins

b)JOB_COMMIT is initiated just before this wait time ( logs: 2015-07-14 
01:54:46,636 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job 
Transitioned from RUNNING to COMMITTING )

c) job success happens after 20-25 minutes ( logs: 2015-07-14 02:15:06,634 INFO 
[AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job 
Transitioned from COMMITTING to SUCCEEDED )


Appreciate any help on this .

Thread dump while the Application master hangs is attached.
Regards,
Ashish


RE: Lost mapreduce applications displayed in UI

2015-05-12 Thread Rohith Sharma K S
Hi,

Do you remember the steps after which applications stopped being displayed in
the RM web UI? I mean, after which actions in the RM web UI are applications no
longer displayed?

Is there any filtering applied in the UI, like “Showing 0 to 0 of 0 entries
(filtered from 4 total entries)” at the bottom of the RM applications page?

Thanks & Regards
Rohith Sharma K S

From: Zhijie Shen [mailto:zs...@hortonworks.com]
Sent: 13 May 2015 05:00
To: user@hadoop.apache.org
Subject: Re: Lost mapreduce applications displayed in UI


​Maybe you have hit the completed app limit (1 by default). Once the limit 
hits, the oldest completed app will be removed from cache.



- Zhijie


From: hitarth trivedi mailto:t.hita...@gmail.com>>
Sent: Tuesday, May 12, 2015 3:32 PM
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Lost mapreduce applications displayed in UI

Hi,

My cluster suddenly stopped displaying application information in UI 
(http://localhost:8088/cluster/apps). Although the counters like 'Apps 
Submitted' , 'Apps Completed', 'Apps Running'  etc, all seems to increment 
accurately and display right information, whnever I start new mapreduce job.

Any help is appreciated.

Thanks,
Hitrix


RE: YARN Exceptions

2015-04-26 Thread Rohith Sharma K S
Are you running a secured Hadoop cluster (Kerberos), and is the YARN container
executor configured as LinuxContainerExecutor?

Thanks & Regards
Rohith Sharma K S
From: Kumar Jayapal [mailto:kjayapa...@gmail.com]
Sent: 25 April 2015 20:10
To: user@hadoop.apache.org
Subject: Re: YARN Exceptions

Yes Here is the complete log and sqoop import command to get the data from 
oracle.

[root@sqpcdh01094p001 ~]# sqoop import  --connect 
"jdbc:oracle:thin:@lorct101094t01a.qat.np.costco.com:1521/CT1<http://jdbc:oracle:thin:@lorct101094t01a.qat.np.costco.com:1521/CT1>"
 --username "edhdtaesvc" --password "" --table "SAPSR3.AUSP"  
--target-dir "/data/crmdq/CT1" --table "SAPSR3.AUSP" --split-by PARTNER_GUID 
--as-avrodatafile --compression-codec org.apache.hadoop.io.compress.SnappyCodec 
--m 1

Warning: 
/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p654.326/bin/../lib/sqoop/../accumulo
 does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/04/25 13:37:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.2
15/04/25 13:37:19 WARN tool.BaseSqoopTool: Setting your password on the 
command-line is insecure. Consider using -P instead.
15/04/25 13:37:20 INFO oracle.OraOopManagerFactory: Data Connector for Oracle 
and Hadoop is disabled.
15/04/25 13:37:20 INFO manager.SqlManager: Using default fetchSize of 1000
15/04/25 13:37:20 INFO tool.CodeGenTool: Beginning code generation
15/04/25 13:37:20 INFO manager.OracleManager: Time zone has been set to GMT
15/04/25 13:37:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM SAPSR3.AUSP t WHERE 1=0
15/04/25 13:37:20 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is 
/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/dbe5b6d69507ee60c249062c54813557/SAPSR3_AUSP.java 
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/04/25 13:37:22 INFO orm.CompilationManager: Writing jar file: 
/tmp/sqoop-root/compile/dbe5b6d69507ee60c249062c54813557/SAPSR3.AUSP.jar
15/04/25 13:37:22 INFO mapreduce.ImportJobBase: Beginning import of SAPSR3.AUSP
15/04/25 13:37:22 INFO Configuration.deprecation: mapred.jar is deprecated. 
Instead, use mapreduce.job.jar
15/04/25 13:37:22 INFO manager.OracleManager: Time zone has been set to GMT
15/04/25 13:37:23 INFO manager.OracleManager: Time zone has been set to GMT
15/04/25 13:37:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* 
FROM SAPSR3.AUSP t WHERE 1=0
15/04/25 13:37:23 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: 
/tmp/sqoop-root/compile/dbe5b6d69507ee60c249062c54813557/sqoop_import_SAPSR3_AUSP.avsc
15/04/25 13:37:23 INFO Configuration.deprecation: mapred.map.tasks is 
deprecated. Instead, use mapreduce.job.maps
15/04/25 13:37:23 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 
14047 for edhdtaesvc on ha-hdfs:nameservice1
15/04/25 13:37:23 ERROR hdfs.KeyProviderCache: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
15/04/25 13:37:23 INFO security.TokenCache: Got dt for hdfs://nameservice1; 
Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: 
(HDFS_DELEGATION_TOKEN token 14047 for edhdtaesvc)
15/04/25 13:37:23 ERROR hdfs.KeyProviderCache: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
15/04/25 13:37:23 ERROR hdfs.KeyProviderCache: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
15/04/25 13:37:25 ERROR hdfs.KeyProviderCache: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
15/04/25 13:37:25 INFO db.DBInputFormat: Using read commited transaction 
isolation
15/04/25 13:37:25 INFO mapreduce.JobSubmitter: number of splits:1
15/04/25 13:37:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1429968417065_0004
15/04/25 13:37:26 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, 
Service: ha-hdfs:nameservice1, Ident: (HDFS_DELEGATION_TOKEN token 14047 for 
edhdtaesvc)
15/04/25 13:37:26 INFO impl.YarnClientImpl: Submitted application 
application_1429968417065_0004
15/04/25 13:37:26 INFO mapreduce.Job: The url to track the job: 
http://yrncdh01094p001.corp.costco.com:8088/proxy/application_1429968417065_0004/
15/04/25 13:37:26 INFO mapreduce.Job: Running job: job_1429968417065_0004
15/04/25 13:37:40 INFO mapreduce.Job: Job job_1429968417065_0004 running in 
uber mode : false
15/04/25 13:37:40 INFO mapreduce.Job:  map 0% reduce 0%
15/04/25 13:37:40 INFO mapreduce.Job: Job job_1429968417065_0004 failed with 
state FAILED due to: Application application_1429968417065_0004 failed 2 times 
due to AM Container for appattempt_1429968417065_0004_02 exited with  
exitCode: -1000 due to: Application application_1429968417065_0004 
initialization failed (exitCode=255) with output: User edhdtaesvc not found

.Failing this a

RE: YARN HA Active ResourceManager failover when machine is stopped

2015-04-26 Thread Rohith Sharma K S
Hi

 I have seen this issue in my cluster without HA configured when the process is halted. I assume your scenario hits a similar issue when the active RM machine is shut down abruptly. You could verify by taking a thread dump of the NM and comparing it with the JIRAs below.

The open JIRAs in the community regarding this problem are:
https://issues.apache.org/jira/i#browse/YARN-1061 (Without HA)
https://issues.apache.org/jira/i#browse/YARN-2578 (With HA)


Thanks & Regards
Rohith Sharma K S

From: Matt Narrell [mailto:matt.narr...@gmail.com]
Sent: 24 April 2015 23:28
To: user@hadoop.apache.org
Subject: Re: YARN HA Active ResourceManager failover when machine is stopped

Also, another observation: when the VMs are halted, it seems the NodeManagers do not consider this a scenario in which to round-robin among the configured ResourceManagers. Is there some timeout I have missed that instructs the NodeManagers to do this round-robining when the machine is not responding (to distinguish it from a network blip)?

mn

On Apr 24, 2015, at 1:50 AM, Drake민영근 <drake@nexr.com> wrote:

Hi, Matt

The second log file looks like node manager's log, not the standby resource 
manager.

Thanks.

Drake 민영근 Ph.D
kt NexR

On Fri, Apr 24, 2015 at 11:39 AM, Matt Narrell <matt.narr...@gmail.com> wrote:
Active ResourceManager:  http://pastebin.com/hE0ppmnb
Standby ResourceManager: http://pastebin.com/DB8VjHqA

Oppressively chatty and not much valuable info contained therein.


On Apr 23, 2015, at 4:25 PM, Vinod Kumar Vavilapalli <vino...@hortonworks.com> wrote:

I have run into this offline with someone else too but couldn't root-cause it.

Will you be able to share your active/standby ResourceManager logs via pastebin 
or something?

+Vinod

On Apr 23, 2015, at 9:41 AM, Matt Narrell <matt.narr...@gmail.com> wrote:


I’m using Hadoop 2.6.0 from HDP 2.2.4 installed via Ambari 2.0

I’m testing the YARN HA ResourceManager failover. If I STOP the active 
ResourceManager (shut the machine off), the standby ResourceManager is elected 
to active, but the NodeManagers do not register themselves with the newly 
elected active ResourceManager. If I restart the machine (but DO NOT resume the 
YARN services) the NodeManagers register with the newly elected ResourceManager 
and my jobs resume. I assume I have some bad configuration, as this produces a 
SPOF, and is not HA in the sense I’m expecting.

Thanks,
mn






RE: how to delete logs automatically from hadoop yarn

2015-04-19 Thread Rohith Sharma K S
That's an interesting use case!

>>>> let’s say I want to delete container logs which are older than week or so. 
>>>> So is there any configuration to do that?
I don't think such a configuration exists in YARN currently. I think it should be possible to handle this through log4j properties; a hedged sketch follows below.
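For illustration only, the kind of log4j-side rotation being hinted at might look like the fragment below for the long-running application's own logger. This is an assumption-laden sketch, not an existing YARN setting: the appender name, file name, size and retention counts are made up, and it assumes the standard yarn.app.container.log.dir system property is visible to the container JVM.

# Hypothetical log4j.properties fragment: rotate the container's own log file
# instead of letting it grow (and accumulate) forever.
log4j.rootLogger=INFO, app
log4j.appender.app=org.apache.log4j.RollingFileAppender
log4j.appender.app.File=${yarn.app.container.log.dir}/app.log
log4j.appender.app.MaxFileSize=100MB
# Keep only the newest 7 rotated files; older ones are deleted automatically.
log4j.appender.app.MaxBackupIndex=7
log4j.appender.app.layout=org.apache.log4j.PatternLayout
log4j.appender.app.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n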

But by enabling log aggregation, the disk-filling issue can be overcome. I think in Hadoop 2.6 or later (yet to be released), the handling of long-running services on YARN is being done in JIRA https://issues.apache.org/jira/i#browse/YARN-2443 .

>>> Because of these continuous logs, we are running out of Linux file limit 
>>> and thereafter containers are not launched because of exception while 
>>> creating log directory inside application ID directory
I could not follow how the continuous logs cause the Linux resource limit to be exceeded. How many containers are running in the cluster and per machine? As I understand it, each container holds only one resource (open file) for logging.


Thanks & Regards
Rohith Sharma K S

From: Smita Deshpande [mailto:smita.deshpa...@cumulus-systems.com]
Sent: 20 April 2015 10:23
To: user@hadoop.apache.org
Subject: RE: how to delete logs automatically from hadoop yarn

Hi Rohith,
Thanks for your solution. The actual problem we are looking at is this: we have a long-running (lifelong) application, so configurations by which logs are deleted right after the application finishes will not help us.
Because of these continuous logs, we are running out of the Linux file limit, and thereafter containers are not launched because of an exception while creating the log directory inside the application ID directory.
During job execution itself, let's say I want to delete container logs which are older than a week or so. Is there any configuration to do that?

Thanks,
Smita


From: Rohith Sharma K S [mailto:rohithsharm...@huawei.com]
Sent: Monday, April 20, 2015 10:09 AM
To: user@hadoop.apache.org
Subject: RE: how to delete logs automatically from hadoop yarn

Hi

With the configuration below, log deletion should be triggered. You can see from the NM log that a deletion delay has been scheduled, as in the line below; check the NM logs for this message, which gives useful debug information.
"INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1428298081702_0008, with delay of 10800 seconds"

But there is another configuration that affects the deletion task: "yarn.nodemanager.delete.debug-delay-sec". Its default value is zero, which means deletion is triggered immediately. Check whether this is configured:

<property>
  <description>
    Number of seconds after an application finishes before the nodemanager's
    DeletionService will delete the application's localized file directory
    and log directory.

    To diagnose Yarn application problems, set this property's value large
    enough (for example, to 600 = 10 minutes) to permit examination of these
    directories. After changing the property's value, you must restart the
    nodemanager in order for it to have an effect.

    The roots of Yarn applications' work directories is configurable with
    the yarn.nodemanager.local-dirs property (see below), and the roots
    of the Yarn applications' log directories is configurable with the
    yarn.nodemanager.log-dirs property (see also below).
  </description>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>


Thanks & Regards
Rohith Sharma K S
From: Sunil Garg [mailto:sunil.g...@cumulus-systems.com]
Sent: 20 April 2015 09:52
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: how to delete logs automatically from hadoop yarn


How do I delete logs from Hadoop YARN automatically? I have tried the following settings but they are not working.
Is there any other way we can do this, or am I doing something wrong?


<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>

<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>3600</value>
</property>

Thanks
Sunil Garg


RE: how to delete logs automatically from hadoop yarn

2015-04-19 Thread Rohith Sharma K S
Hi

With the configuration below, log deletion should be triggered. You can see from the NM log that a deletion delay has been scheduled, as in the line below; check the NM logs for this message, which gives useful debug information.
"INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1428298081702_0008, with delay of 10800 seconds"

But there is another configuration that affects the deletion task: "yarn.nodemanager.delete.debug-delay-sec". Its default value is zero, which means deletion is triggered immediately. Check whether this is configured:

<property>
  <description>
    Number of seconds after an application finishes before the nodemanager's
    DeletionService will delete the application's localized file directory
    and log directory.

    To diagnose Yarn application problems, set this property's value large
    enough (for example, to 600 = 10 minutes) to permit examination of these
    directories. After changing the property's value, you must restart the
    nodemanager in order for it to have an effect.

    The roots of Yarn applications' work directories is configurable with
    the yarn.nodemanager.local-dirs property (see below), and the roots
    of the Yarn applications' log directories is configurable with the
    yarn.nodemanager.log-dirs property (see also below).
  </description>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>


Thanks & Regards
Rohith Sharma K S
From: Sunil Garg [mailto:sunil.g...@cumulus-systems.com]
Sent: 20 April 2015 09:52
To: user@hadoop.apache.org
Subject: how to delete logs automatically from hadoop yarn


How do I delete logs from Hadoop YARN automatically? I have tried the following settings but they are not working.
Is there any other way we can do this, or am I doing something wrong?


<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>

<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>3600</value>
</property>

Thanks
Sunil Garg


RE: Mapreduce job got stuck

2015-04-15 Thread Rohith Sharma K S
Hi,

On the master machine, the NodeManager is not running because of "Caused by: java.net.BindException: Problem binding to [kirti:8040]", taken from the logs.

Port 8040 is already in use. Configure an available port number; an illustrative snippet follows below.
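For illustration only: assuming the conflict is on the NodeManager localizer address (whose default port is 8040), the port could be moved in yarn-site.xml. The value 8041 below is just an example, not a recommendation.

<property>
  <!-- Hypothetical fix: move the NM localizer off the busy port 8040 -->
  <name>yarn.nodemanager.localizer.address</name>
  <value>0.0.0.0:8041</value>
</property>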


Thanks & Regards
Rohith Sharma K S

From: Vandana kumari [mailto:kvandana1...@gmail.com]
Sent: 15 April 2015 16:29
To: user@hadoop.apache.org; Rohith Sharma K S
Subject: Re: Mapreduce job got stuck

When I made the changes specified by Rohith, my job runs, but it runs only on the slave nodes (amit & yashbir), not on the master node (kirti), and still no NodeManager is running on the master node.

On Wed, Apr 15, 2015 at 6:39 AM, Vandana kumari <kvandana1...@gmail.com> wrote:
i had attached nodemanager log of master file and modified yarn-site.xml file

On Wed, Apr 15, 2015 at 6:21 AM, Rohith Sharma K S <rohithsharm...@huawei.com> wrote:
Hi Vandana

From the configurations, it looks like none of the NodeManagers are registered with the RM because of an issue with the "yarn.resourcemanager.resource-tracker.address" configuration. Maybe you can confirm whether any NMs are registered with the RM.

In the value below there is a space after "resource-", but "resource-tracker" is a single token without any space. Check again after removing the space.
yarn.resourcemanager.resource- tracker.address

Similarly, I see the same issue in "yarn.nodemanager.aux- services.mapreduce.shuffle.class", where there is a space after "aux-".

Hope it helps you to resolve issue

Thanks & Regards
Rohith Sharma K S

From: Vandana kumari [mailto:kvandana1...@gmail.com]
Sent: 15 April 2015 15:33
To: user@hadoop.apache.org
Subject: Mapreduce job got stuck

I had set up a 3-node Hadoop cluster on CentOS 6.5, but the NodeManager is not running on the master and is running only on the slave nodes. Also, when I submit a job, the job gets stuck. The same job runs well on a single-node setup. I am unable to figure out the problem. Attaching all the configuration files.
Any help will be highly appreciated.

--
Thanks and regards
  Vandana kumari



--
Thanks and regards
  Vandana kumari



--
Thanks and regards
  Vandana kumari


RE: Change in fair-scheduler.xml

2015-04-15 Thread Rohith Sharma K S
Hi


1 - Is there a document on what should be the default settings in the XML file 
for say 96 GB.. 48 core system with say 4/queues?
You can refer to the doc below for configuring the Fair Scheduler; an illustrative allocation file follows after the link.
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
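Purely as an illustration for question 1 (the queue names, minResources and weights below are made-up numbers for a 96 GB / 48 vcore cluster, not recommended defaults), a four-queue fair-scheduler.xml could look like this:

<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical queues; sizes and weights are examples only -->
  <queue name="etl">
    <minResources>24576 mb, 12 vcores</minResources>
    <weight>2.0</weight>
  </queue>
  <queue name="adhoc">
    <minResources>12288 mb, 6 vcores</minResources>
    <weight>1.0</weight>
  </queue>
  <queue name="reporting">
    <minResources>12288 mb, 6 vcores</minResources>
    <weight>1.0</weight>
  </queue>
  <queue name="default">
    <minResources>8192 mb, 4 vcores</minResources>
    <weight>1.0</weight>
  </queue>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
</allocations>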


2 - When we change the file does the yarn service need to be bounced for the 
changed values to get reflected?
YARN admin supports refreshing queues at runtime without restarting the ResourceManager. It can be achieved using the "$HADOOP_HOME/bin/yarn rmadmin -refreshQueues" CLI command.


Thanks & Regards
Rohith Sharma K S

From: Manish Maheshwari [mailto:mylogi...@gmail.com]
Sent: 15 April 2015 15:43
To: user@hadoop.apache.org
Subject: Change in fair-scheduler.xml


Hi, We are trying to change properties of fair scheduler settings.

1 - Is there a document on what should be the default settings in the XML file 
for say 96 GB.. 48 core system with say 4/queues?

2 - When we change the file does the yarn service need to be bounced for the 
changed values to get reflected?

Thanks
Manish


RE: Mapreduce job got stuck

2015-04-15 Thread Rohith Sharma K S
Hi Vandana

From the configurations, it looks like none of the NodeManagers are registered with the RM because of an issue with the "yarn.resourcemanager.resource-tracker.address" configuration. Maybe you can confirm whether any NMs are registered with the RM.

In the value below there is a space after "resource-", but "resource-tracker" is a single token without any space. Check again after removing the space.
yarn.resourcemanager.resource- tracker.address

Similarly, I see the same issue in "yarn.nodemanager.aux- services.mapreduce.shuffle.class", where there is a space after "aux-". An illustrative corrected snippet follows below.
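For illustration, a corrected yarn-site.xml fragment might look like the following. The host name "kirti" is taken from this thread, and the port and class values are assumptions about this particular cluster, not universal defaults:

<!-- Property names must not contain spaces -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>kirti:8031</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>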

Hope it helps you to resolve issue

Thanks & Regards
Rohith Sharma K S

From: Vandana kumari [mailto:kvandana1...@gmail.com]
Sent: 15 April 2015 15:33
To: user@hadoop.apache.org
Subject: Mapreduce job got stuck

I had set up a 3-node Hadoop cluster on CentOS 6.5, but the NodeManager is not running on the master and is running only on the slave nodes. Also, when I submit a job, the job gets stuck. The same job runs well on a single-node setup. I am unable to figure out the problem. Attaching all the configuration files.
Any help will be highly appreciated.

--
Thanks and regards
  Vandana kumari


RE: How to stop a mapreduce job from terminal running on Hadoop Cluster?

2015-04-12 Thread Rohith Sharma K S
In addition to the options below, Hadoop 2.7 (yet to be released, in a couple of weeks) provides a user-friendly option for killing applications from the web UI.

In the application page, a 'Kill Application' button has been provided for killing applications.

Thanks & Regards
Rohith Sharma K S
From: Pradeep Gollakota [mailto:pradeep...@gmail.com]
Sent: 12 April 2015 23:41
To: user@hadoop.apache.org
Subject: Re: How to stop a mapreduce job from terminal running on Hadoop 
Cluster?

Also, mapred job -kill 

On Sun, Apr 12, 2015 at 11:07 AM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
You can kill it by using the following yarn command:

yarn application -kill 
https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html

Or use old hadoop job command
http://stackoverflow.com/questions/11458519/how-to-kill-hadoop-jobs

Regards,
Shahab

On Sun, Apr 12, 2015 at 2:03 PM, Answer Agrawal <yrsna.tse...@gmail.com> wrote:
To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it in the middle, which command is used? Or is there any other way to do that?

Thanks,






RE: Pin Map/Reduce tasks to specific cores

2015-04-06 Thread Rohith Sharma K S
Hi George

In MRv2, YARN supports a CGroups implementation. Using CGroups it is possible to run containers on specific cores; an illustrative configuration follows after the links below.

For your detailed reference, some of the useful links
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_system-admin-guide/content/ch_cgroups.html
http://blog.cloudera.com/blog/2013/12/managing-multiple-resources-in-hadoop-2-with-yarn/
http://riccomini.name/posts/hadoop/2013-06-14-yarn-with-cgroups/
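For illustration only, enabling the CGroups-based container executor generally involves settings along these lines in yarn-site.xml; the hierarchy path below is an assumption for a typical install, and the linked guides above are the authoritative reference:

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- Assumed cgroup hierarchy under which YARN creates per-container groups -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>false</value>
</property>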

P.S.: I could not find any related document in the Hadoop YARN docs. I will raise a ticket for this in the community.

Hope the above information will help your use case!!!

Thanks & Regards
Rohith Sharma K S

From: George Ioannidis [mailto:giorgio...@gmail.com]
Sent: 07 April 2015 01:55
To: user@hadoop.apache.org
Subject: Pin Map/Reduce tasks to specific cores

Hello. My question, which can also be found on Stack Overflow (http://stackoverflow.com/questions/29283213/core-affinity-of-map-tasks-in-hadoop), regards pinning map/reduce tasks to specific cores, on either Hadoop v1.2.1 or Hadoop v2.
Specifically, I would like to know whether the end user has any control over which core executes a specific map/reduce task.

To pin an application on linux, there's the "taskset" command, but is anything 
similar provided by hadoop? If not, is the Linux Scheduler in charge of 
allocating tasks to specific cores?

--
Below I am providing two cases to better illustrate my question:
Case #1: 2 GiB input size, HDFS block size of 64 MiB and 2 compute nodes 
available, with 32 cores each.
As follows, 32 map tasks will be called; let's suppose that 
mapred.tasktracker.map.tasks.maximum = 16, so 16 map tasks will be allocated to 
each node.
Can I guarantee that each Map Task will run on a specific core, or is it up to 
the Linux Scheduler?

--

Case #2: The same as case #1, but now the input size is 8 GiB, so there are not 
enough slots for all map tasks (128), so multiple tasks will share the same 
cores.
Can I control how much "time" each task will spend on a specific core and if it 
will be reassigned to the same core in the future?
Any information on the above would be highly appreciated.
Kind Regards,
George


RE: Does Hadoop 2.6.0 have job level blacklisting?

2015-03-29 Thread Rohith Sharma K S
Hi Chris

Is there still job level blacklisting as there was in earlier versions?
>> Yes, job-level blacklisting support is there. The ApplicationMaster has to identify the nodes it wants to blacklist and send those node details to the ResourceManager via an ApplicationMasterProtocol#allocate request. Thereafter, containers will not be assigned on the blacklisted nodes. (A rough sketch follows after the Javadoc links.)

   Java Doc
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/api/ApplicationMasterProtocol.html#allocate(org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest)
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/yarn/api/protocolrecords/AllocateRequest.html
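As a rough sketch of that flow (not code from this thread: the node names are made up, registration arguments and error handling are omitted, and it uses the AMRMClient convenience library rather than the raw protocol):

import java.util.Arrays;
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class BlacklistSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    amrmClient.init(new YarnConfiguration());
    amrmClient.start();
    // registerApplicationMaster(...) would normally be called here.

    // Ask the RM not to allocate containers for this application on these nodes.
    amrmClient.updateBlacklist(
        Arrays.asList("badnode1.example.com", "badnode2.example.com"),
        Collections.<String>emptyList());

    // The blacklist is piggy-backed on the next allocate() heartbeat.
    AllocateResponse response = amrmClient.allocate(0.1f);
    System.out.println("Allocated containers: "
        + response.getAllocatedContainers().size());

    amrmClient.stop();
  }
}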

Thanks & Regards
Rohith Sharma K S
From: Chris Mawata [mailto:chris.maw...@gmail.com]
Sent: 29 March 2015 01:10
To: user@hadoop.apache.org
Subject: Does Hadoop 2.6.0 have job level blacklisting?

At 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Monitoring_Health_of_NodeManagers
is a description of how you can have a script check the health of a node and 
indicate to the ResourceManager that it is unhealthy. This seems to be at the 
cluster level. Is there still job level blacklisting as there was in earlier 
versions?

Chris Mawata


RE: How to troubleshoot failed or stuck jobs

2015-03-01 Thread Rohith Sharma K S
Hi


1. For failed jobs, you can directly check the MRAppMaster logs. There you will find the reason the job failed.

2. For a stuck job, you need to do some groundwork to identify what is going wrong. It can be either a YARN issue or a MapReduce issue.

2.1 Recently I have seen jobs get stuck many times when the headroom calculation goes wrong. The headroom is sent by the RM to the ApplicationMaster, and the AM uses it as a deciding factor (https://issues.apache.org/jira/i#browse/YARN-1680). The corresponding parent JIRA is https://issues.apache.org/jira/i#browse/YARN-1198.

2.2 When the job is stuck:
YARN - check cluster memory used, cluster memory reserved, total memory, how many NodeManagers there are, and what headroom is sent to the AM.
MapReduce - check whether any NMs are blacklisted and whether all the reducer tasks are using the cluster memory. By default reducers start before mapper completion; if a mapper fails because of an unstable node, the reducers can take over the cluster. In that case the reducers are expected to be preempted, so identify whether they are actually getting preempted.
The MRAppMaster log will help to some extent in analyzing the issue.

Thanks & Regards
Rohith Sharma K S

From: Krish Donald [mailto:gotomyp...@gmail.com]
Sent: 02 March 2015 11:09
To: user@hadoop.apache.org
Subject: Re: How to troubleshoot failed or stuck jobs

Thanks for the links, Ted.

However, I wanted to understand the approach that should be taken when troubleshooting failed or stuck jobs.


On Sun, Mar 1, 2015 at 8:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
Here are some related discussions and JIRA:

http://search-hadoop.com/m/LgpTk2gxrGx
http://search-hadoop.com/m/LgpTk2YLArE

https://issues.apache.org/jira/browse/MAPREDUCE-6190

Cheers

On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <gotomyp...@gmail.com> wrote:
Hi,

Wanted to understand,  How to troubleshoot failed or stuck jobs ?

Thanks
Krish




RE: about the jobid

2015-03-01 Thread Rohith Sharma K S
Hi

YARN application id allocation is based on the start time of the ResourceManager daemon (assuming the cluster is MR2; otherwise the JobTracker start time). Say you have 3 job clients submitting jobs to YARN; then the application ids are application_<cluster-timestamp>_0001, application_<cluster-timestamp>_0002 and application_<cluster-timestamp>_0003, and the corresponding job ids are job_<cluster-timestamp>_0001, job_<cluster-timestamp>_0002 and job_<cluster-timestamp>_0003 respectively.

>>>> Is the jobid should be job_201502281500_ ? what is the problem?
No, this is the expected behaviour. In your case, 201502271057 is the start time of the ResourceManager, so all applications submitted to YARN start with application_201502271057_<counter>, and the corresponding job id is job_201502271057_<counter>. The counter increments for every job submission.


Thanks & Regards
Rohith Sharma K S

-Original Message-
From: lujinhong [mailto:lujinh...@yahoo.com] 
Sent: 01 March 2015 19:40
To: User Hadoop
Subject: about the jobid

Hi, all.

   I ran Nutch in deploy mode at about 3 pm, 02/28/2015, but the job id is job_201502271057_0251. I found that 201502271057 is the time I started Hadoop (by start-all.sh).
   Shouldn't the job id be job_201502281500_ ? What is the problem?

system date:
  [jediael@master history]$ date
Sat Feb 28 15:39:00 CST 2015

log files of hadoop:
/mnt/jediael/hadoop-1.2.1/logs/history
[jediael@master history]$ ls
donejob_201502271057_0245_conf.xml  
job_201502271057_0248_conf.xml 
job_201502271057_0251_1425107493248_jediael_%5BFeb2815%5Dfetch
job_201502271057_0243_conf.xml  job_201502271057_0246_conf.xml 
job_201502271057_0249_conf.xml  job_201502271057_0251_conf.xml 
job_201502271057_0244_conf.xml  job_201502271057_0247_conf.xml 
job_201502271057_0250_conf.xml

stdout of fetcher job:
15/02/28 15:11:32 INFO zookeeper.ClientCnxn: EventThread shut down
15/02/28 15:11:32 INFO zookeeper.ZooKeeper: Session: 0x4bc8f7c30a031b closed
15/02/28 15:11:33 INFO mapred.JobClient: Running job: job_201502271057_0251
15/02/28 15:11:34 INFO mapred.JobClient:  map 0% reduce 0%
15/02/28 15:11:51 INFO mapred.JobClient:  map 100% reduce 0%
15/02/28 15:12:00 INFO mapred.JobClient:  map 100% reduce 16%
15/02/28 15:12:03 INFO mapred.JobClient:  map 100% reduce 53%


RE: YarnClient to get the running applications list in java

2015-02-26 Thread Rohith Sharma K S
A simple way to meet your goal is to add the Hadoop jars to the project classpath, i.e. if you have the Hadoop package, extract it and add all of its jars to the project classpath.

Then change the Java code as below:

YarnConfiguration conf = new YarnConfiguration();
conf.set("yarn.resourcemanager.address", "rm-ip:port"); // address of the running RM

YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start(); // you need to start the YarnClient service

// code to getApplications()


Thanks & Regards
Rohith Sharma K S
From: Mouzzam Hussain [mailto:monibab...@gmail.com]
Sent: 26 February 2015 16:23
To: user@hadoop.apache.org
Subject: YarnClient to get the running applications list in java


I am working with YarnClient for the 1st time. My goal is to get and display 
the applications running on Yarn using Java. My project setup is as follows:

public static void main(String[] args) throws IOException, YarnException {

// Create yarnClient

YarnConfiguration conf = new YarnConfiguration();

YarnClient yarnClient = YarnClient.createYarnClient();

yarnClient.init(conf);



try {

List<ApplicationReport> applications = yarnClient.getApplications();

System.err.println("yarn client : " + applications.size());

} catch (YarnException e) {

e.printStackTrace();

} catch (IOException e) {

e.printStackTrace();

}



}

I get the following exception when i run the program:

java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:483)

at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.NoClassDefFoundError: 
org/apache/hadoop/HadoopIllegalArgumentException

at projects.HelloWorld.main(HelloWorld.java:16)

... 6 more

Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.HadoopIllegalArgumentException

at java.net.URLClassLoader$1.run(URLClassLoader.java:372)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:360)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The POM file is as follows:



<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>BigContent</groupId>
    <artifactId>ManagementServer</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>war</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoop.version>2.4.0</hadoop.version>
        <spark.version>1.2.1</spark.version>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-war-plugin</artifactId>
                <version>2.3</version>
                <executions>
                    <execution>
                        <id>default-war</id>
                        <phase>none</phase>
                    </execution>
                    <execution>
                        <id>war-exploded</id>
                        <phase>prepare-package</phase>
                        <goals>
                            <goal>exploded</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>custom-war</id>
                        <phase>package</phase>
                        <goals>
                            <goal>war</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <webXml>src/main/webapp/WEB-INF/web.xml</webXml>
                    <webResources>
                        <resource>
                            <directory>resource2</directory>
                        </resource>
                    </webResources>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.sun.jersey</groupId>
            <artifactId>jersey-core</artifactId>
            <version>1.9.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-yarn-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>

RE: Node manager contributing to one queue's resources

2015-02-26 Thread Rohith Sharma K S
Hi

If you are using the CapacityScheduler, you can try using the DominantResourceCalculator, i.e. configuring the property value below in the capacity-scheduler.xml file:

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

The basic idea of how it works is as follows: 'if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, it attempts to equalize the CPU share of user A with the memory share of user B'.

See Java Doc
https://apache.googlesource.com/hadoop-common/+/60e3b885ba8344d9f448202f5f2c290b5606ff8f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/resource/DominantResourceCalculator.java

I think this may help you!!!

Thanks & Regards
Rohith Sharma K S

From: twinkle sachdeva [mailto:twinkle.sachd...@gmail.com]
Sent: 26 February 2015 14:05
To: USers Hadoop
Subject: Node manager contributing to one queue's resources

Hi,

I have to run two kinds of applications: one requiring fewer cores but more memory (Application_High_Mem) and another requiring more cores but less memory (Application_High_Core).

I can use specific queues to submit them to, but that can lead to one node contributing to only one such application and having some of its resources sit idle.

Is there a way, say by extending the concept of queues to the NodeManager level or by some other means, in which I can achieve this in YARN?

Thanks,
Twinkle


RE: Time out after 600 for YARN mapreduce application

2015-02-11 Thread Rohith Sharma K S
Looking at the attempt ID, this is a mapper task getting timed out in the MapReduce job. The configuration that can be used to increase the value is 'mapreduce.task.timeout'; an illustrative snippet follows below.

The task is timed out because there is no heartbeat from the mapper task (YarnChild) to the MRAppMaster for 10 minutes. Is the MR job a custom job? If so, are you doing any operation in the cleanup() of the Mapper? It is possible that if the cleanup() of the Mapper takes longer than the configured timeout, the task will time out.
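For illustration only (the 30-minute value is an arbitrary example; the default of 600000 ms is what produces the "Timed out after 600 secs" message), the timeout could be raised per job with -D or in mapred-site.xml:

<property>
  <name>mapreduce.task.timeout</name>
  <!-- milliseconds; setting 0 disables the timeout entirely -->
  <value>1800000</value>
</property>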


Thanks & Regards
Rohith Sharma K S
From: Alexandru Pacurar [mailto:alexandru.pacu...@propertyshark.com]
Sent: 11 February 2015 15:34
To: user@hadoop.apache.org
Subject: Time out after 600 for YARN mapreduce application

Hello,

I keep encountering an error when running nutch on hadoop YARN:

AttemptID:attempt_1423062241884_9970_m_09_0 Timed out after 600 secs

Some info on my setup. I'm running a 64 nodes cluster with hadoop 2.4.1. Each 
node has 4 cores, 1 disk and 24Gb of RAM, and the namenode/resourcemanager has 
the same specs only with 8 cores.

I am pretty sure one of these parameters is tied to the threshold I'm hitting:

yarn.am.liveness-monitor.expiry-interval-ms
yarn.nm.liveness-monitor.expiry-interval-ms
yarn.resourcemanager.nm.liveness-monitor.interval-ms

but I would like to understand why.

The issue usually appears under heavier load, and most of the time the next attempts succeed. Also, if I restart the Hadoop cluster, the error goes away for some time.

Thanks,
Alex


RE: Error with winutils.sln

2015-02-10 Thread Rohith Sharma K S
Download patch from jira : https://issues.apache.org/jira/i#browse/HADOOP-9922

Thanks & Regards
Rohith Sharma K S

From: Venkat Ramakrishnan [mailto:venkat.archit...@gmail.com]
Sent: 10 February 2015 17:06
To: user@hadoop.apache.org
Subject: Re: Error with winutils.sln

Thank you Rohit.

Could you please point me to the documentation/information/location
related to Hadoop's 9922 patch?

Thx,
Venkat.


On Tue, Feb 10, 2015 at 4:51 PM, Rohith Sharma K S <rohithsharm...@huawei.com> wrote:
There are some issues with compiling Hadoop on the Win32 platform. I am facing the same issues; I think the support was explicitly removed.

But It is possible to compile successfully by  tweaking some of the files. 
Follow the below instructions

1. Apply the patch HADOOP-9922.patch to your 2.6 version

patch -p1 < HADOOP-9922.patch

2. Replace "Release|x64" with "Release|Win32" in
$HADOOP_HOME\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln

3.   Replace “x64” with “Win32” in

$HADOOP_HOME\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj
 and

$HADOOP_HOME\hadoop-common-project\hadoop-common\src\main\winutils\libwinutils.vcxproj

If native compilation does not happen on your machine because cmake is not installed or for any other reason, then you will face an issue while compiling the hdfs project. So, for the sake of compiling, you can skip native compilation for hdfs.

4.  To skip native compilation, add "${skipTests}" or "true" in $HADOOP_HOME\hadoop-hdfs-project\hadoop-hdfs\pom.xml.

   

  ${skipTests}

  



Note: there are 2 occurrences; add it at both.

Then compile using "mvn clean install -DskipTests".

Hope this helps you compile successfully. Enjoy Hadoop!

Thanks & Regards
Rohith Sharma K S

From: Venkat Ramakrishnan [mailto:venkat.archit...@gmail.com]
Sent: 10 February 2015 16:22
To: user@hadoop.apache.org
Subject: Error with winutils.sln

Hello,

I'm getting the following error while compiling with Windows 7 (32 bit). I
have set the Platform as Win32. The error complains about solution
configuration being different from winutils.sln:

.
.
.
.
[DEBUG] Configuring mojo org.codehaus.mojo:exec-maven-plugin:1.2:exec from 
plugin realm ClassRealm[plugin>org.codehaus.mojo:exec-maven-plugin:1.2, parent: 
sun.misc.Launcher$AppClassLoader@647e05]
[DEBUG] Configuring mojo 'org.codehaus.mojo:exec-maven-plugin:1.2:exec' with 
basic configurator -->
[DEBUG]   (f) arguments = 
[D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common/src/main/winutils/winutils.sln,
 /nologo, /p:Configuration=Release, 
/p:OutDir=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/bin/,
 
/p:IntermediateOutputPath=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/winutils/,
 /p:WsceConfigDir=../etc/hadoop, /p:WsceConfigFile=wsce-site.xml]
[DEBUG]   (f) basedir = 
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common
[DEBUG]   (f) classpathScope = runtime
[DEBUG]   (f) executable = msbuild
[DEBUG]   (f) longClasspath = false
[DEBUG]   (f) project = MavenProject: org.apache.hadoop:hadoop-common:2.6.0 @ 
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\pom.xml
[DEBUG]   (f) session = 
org.apache.maven.execution.MavenSession@157dc72
[DEBUG]   (f) skip = false
[DEBUG] -- end configuration --
[DEBUG] Executing command line: msbuild 
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common/src/main/winutils/winutils.sln
 /nologo /p:Configuration=Release 
/p:OutDir=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/bin/ 
/p:IntermediateOutputPath=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/winutils/
 /p:WsceConfigDir=../etc/hadoop /p:WsceConfigFile=wsce-site.xml
Build started 07-02-2015 09:55:21.
Project 
"D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln"
 on node 1 (default targets).
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln.metaproj
 : error MSB4126: The specified solution configuration "Release|Win32" is 
invalid. Please specify a valid solution configuration using the Configuration 
and Platform properties (e.g. MSBuild.exe Solution.sln /p:Configuration=Debug 
/p:Platform="Any CPU") or leave those properties blank to use the default 
solution configuration. 
[D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln]
Done Building Project 
"D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln"
 (default targets) -- FAILED.

Build FAILED.

"D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\mai

RE: Error with winutils.sln

2015-02-10 Thread Rohith Sharma K S
There are some issues with compiling Hadoop on the Win32 platform. I am facing the same issues; I think the support was explicitly removed.

But It is possible to compile successfully by  tweaking some of the files. 
Follow the below instructions

1. Apply the patch HADOOP-9922.patch to your 2.6 version

patch -p1 < HADOOP-9922.patch

2. Replace "Release|x64" with "Release|Win32" in
$HADOOP_HOME\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln

3.   Replace “x64” with “Win32” in

$HADOOP_HOME\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj
 and

$HADOOP_HOME\hadoop-common-project\hadoop-common\src\main\winutils\libwinutils.vcxproj

If native compilation does not happen on your machine because cmake is not installed or for any other reason, then you will face an issue while compiling the hdfs project. So, for the sake of compiling, you can skip native compilation for hdfs.

4.  To skip native compilation, add "${skipTests}" or "true" in $HADOOP_HOME\hadoop-hdfs-project\hadoop-hdfs\pom.xml.

   

  ${skipTests}

  



Note: there are 2 occurrences; add it at both.

Then compile using "mvn clean install -DskipTests".

Hope this helps you compile successfully. Enjoy Hadoop!

Thanks & Regards
Rohith Sharma K S

From: Venkat Ramakrishnan [mailto:venkat.archit...@gmail.com]
Sent: 10 February 2015 16:22
To: user@hadoop.apache.org
Subject: Error with winutils.sln

Hello,

I'm getting the following error while compiling with Windows 7 (32 bit). I
have set the Platform as Win32. The error complains about solution
configuration being different from winutils.sln:

.
.
.
.
[DEBUG] Configuring mojo org.codehaus.mojo:exec-maven-plugin:1.2:exec from 
plugin realm ClassRealm[plugin>org.codehaus.mojo:exec-maven-plugin:1.2, parent: 
sun.misc.Launcher$AppClassLoader@647e05]
[DEBUG] Configuring mojo 'org.codehaus.mojo:exec-maven-plugin:1.2:exec' with 
basic configurator -->
[DEBUG]   (f) arguments = 
[D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common/src/main/winutils/winutils.sln,
 /nologo, /p:Configuration=Release, 
/p:OutDir=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/bin/,
 
/p:IntermediateOutputPath=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/winutils/,
 /p:WsceConfigDir=../etc/hadoop, /p:WsceConfigFile=wsce-site.xml]
[DEBUG]   (f) basedir = 
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common
[DEBUG]   (f) classpathScope = runtime
[DEBUG]   (f) executable = msbuild
[DEBUG]   (f) longClasspath = false
[DEBUG]   (f) project = MavenProject: org.apache.hadoop:hadoop-common:2.6.0 @ 
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\pom.xml
[DEBUG]   (f) session = 
org.apache.maven.execution.MavenSession@157dc72
[DEBUG]   (f) skip = false
[DEBUG] -- end configuration --
[DEBUG] Executing command line: msbuild 
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common/src/main/winutils/winutils.sln
 /nologo /p:Configuration=Release 
/p:OutDir=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/bin/ 
/p:IntermediateOutputPath=D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\target/winutils/
 /p:WsceConfigDir=../etc/hadoop /p:WsceConfigFile=wsce-site.xml
Build started 07-02-2015 09:55:21.
Project 
"D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln"
 on node 1 (default targets).
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln.metaproj
 : error MSB4126: The specified solution configuration "Release|Win32" is 
invalid. Please specify a valid solution configuration using the Configuration 
and Platform properties (e.g. MSBuild.exe Solution.sln /p:Configuration=Debug 
/p:Platform="Any CPU") or leave those properties blank to use the default 
solution configuration. 
[D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln]
Done Building Project 
"D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln"
 (default targets) -- FAILED.

Build FAILED.

"D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln"
 (default target) (1) ->
(ValidateSolutionConfiguration target) ->
  
D:\h\hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\winutils\winutils.sln.metaproj
 : error MSB4126: The specified solution configuration "Release|Win32" is 
invalid. Please specify a valid solution configuration using the Configuration 
and Platform properties (e.g. MSBuild.exe Solution.sln /p:Configuration=Debug 
/p:Platform="Any CPU") or leave those properties blank to use the default 
solution configuration. 
[D:\h\hadoop-2.6.0-src\hadoop-com

RE: Can not execute failover for RM HA

2015-02-10 Thread Rohith Sharma K S
Currently, automatic failover is not supported by YARN. This is an open issue in YARN.
Refer : https://issues.apache.org/jira/i#browse/YARN-1177

Thanks & Regards
Rohith Sharma K S

From: 郝东 [mailto:donhof...@163.com]
Sent: 10 February 2015 16:12
To: user@hadoop.apache.org
Subject: Can not execute failover for RM HA

I just set up ResourceManager HA. Both of the ResourceManagers started correctly. When I killed the active one, the other became active. But when I used the following command to do a manual failover, I got exceptions. I don't know what caused this problem. Could anyone help me? Many thanks!

Command:
yarn rmadmin -failover rm1 rm2

Exceptions:
Exception in thread "main" java.lang.UnsupportedOperationException: 
RMHAServiceTarget doesn't have a corresponding ZKFC address
at 
org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
at 
org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
at 
org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:315)
at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:286)
at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:453)
at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:382)
at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:434)



RE: QueueMetrics.AppsKilled/Failed metrics and failure reasons

2015-02-03 Thread Rohith Sharma K S
There are several ways to confirm from YARN the total number of killed/failed applications in the cluster:
1. Get the application lists from the RM web UI, OR
2. From the CLI, get the number of failed and killed applications: ./yarn application -list -appStates FAILED,KILLED
3. Use the client APIs.

Since the metrics values displayed in Ganglia look incorrect, I have a few doubts:
1. Is Ganglia pointing at the correct RM cluster?
2. What method does Ganglia use to retrieve the QueueMetrics?
3. Is there a client program you have written that retrieves the apps and calculates the numbers?


Thanks & Regards
Rohith Sharma K S

-Original Message-
From: Suma Shivaprasad [mailto:sumasai.shivapra...@gmail.com] 
Sent: 04 February 2015 11:03
To: user@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org
Subject: Re: QueueMetrics.AppsKilled/Failed metrics and failure reasons

Using Hadoop 2.4.0. The number of applications running on average is small, ~40-60.
The metrics in Ganglia show around 10-30 apps killed every 5 minutes, which is very high relative to the apps running at any given time (40-60). The RM logs, though, show 0 failed apps in the audit logs during that hour.
The RM UI also doesn't show any apps in the Applications -> Failed tab. The logs are getting rolled over at a slower rate, every 1-2 hours. I am searching for "Application Finished - Failed" to find the failed apps. Please let me know if I am missing something here.

Thanks
Suma




On Wed, Feb 4, 2015 at 10:03 AM, Rohith Sharma K S < rohithsharm...@huawei.com> 
wrote:

>  Hi
>
>
>
> Could you give more information, which version of hadoop are you using?
>
>
>
> >> QueueMetrics.AppsKilled/Failed metrics shows much higher nos i.e ~100.
> However RMAuditLogger shows 1 or 2 Apps as Killed/Failed in the logs.
>
> May be I suspect that Logs might be rolled out. Does more applications 
> are running?
>
>
>
> All the applications history will be displayed  on RM web UI (provided 
> RM is not restarted or RM recovery enabled). May be you can check 
> these applications lists.
>
>
>
> For finding reasons for application killed/failed, one way is you can 
> check in NodeManager logs also. Here  you need to check using 
> container_id for corresponding application.
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* Suma Shivaprasad [mailto:sumasai.shivapra...@gmail.com]
> *Sent:* 03 February 2015 21:35
> *To:* user@hadoop.apache.org; yarn-...@hadoop.apache.org
> *Subject:* QueueMetrics.AppsKilled/Failed metrics and failure reasons
>
>
>
> Hello,
>
>
> Was trying to debug reasons for Killed/Failed apps and was checking 
> for the applications that were killed/failed in RM logs - from RMAuditLogger.
>
>  QueueMetrics.AppsKilled/Failed metrics shows much higher nos i.e ~100.
> However RMAuditLogger shows 1 or 2 Apps as Killed/Failed in the logs. 
> Is it possible that some logs are missed by AuditLogger or is it the 
> other way round and metrics are being reported higher ?
>
> Thanks
>
> Suma
>


RE: QueueMetrics.AppsKilled/Failed metrics and failure reasons

2015-02-03 Thread Rohith Sharma K S
Hi

Could you give more information, which version of hadoop are you using?


>> QueueMetrics.AppsKilled/Failed metrics shows much higher nos i.e ~100. 
>> However RMAuditLogger shows 1 or 2 Apps as Killed/Failed in the logs.
I suspect the logs might have been rolled over. Are more applications running?

The history of all applications will be displayed on the RM web UI (provided the RM has not been restarted, or RM recovery is enabled). Maybe you can check those application lists.

To find the reason an application was killed/failed, one way is to also check the NodeManager logs. There you need to search using the container_id of the corresponding application.

Thanks & Regards
Rohith Sharma K S

From: Suma Shivaprasad [mailto:sumasai.shivapra...@gmail.com]
Sent: 03 February 2015 21:35
To: user@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: QueueMetrics.AppsKilled/Failed metrics and failure reasons

Hello,

Was trying to debug reasons for Killed/Failed apps and was checking for the 
applications that were killed/failed in RM logs - from RMAuditLogger.
QueueMetrics.AppsKilled/Failed metrics shows much higher nos i.e ~100. However 
RMAuditLogger shows 1 or 2 Apps as Killed/Failed in the logs. Is it possible 
that some logs are missed by AuditLogger or is it the other way round and 
metrics are being reported higher ?
Thanks
Suma


RE: hadoop yarn

2015-01-19 Thread Rohith Sharma K S
Refer below link,
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

Thanks & Regards
Rohith  Sharma K S

From: siva kumar [mailto:siva165...@gmail.com]
Sent: 20 January 2015 11:24
To: user@hadoop.apache.org
Subject: hadoop yarn

Hi All,
   Can anyone suggest a few links for writing an MR2 program on YARN?





Thanks and regrads,
siva


RE: node manager ports during mapreduce job

2015-01-11 Thread Rohith Sharma K S
Hi

Could you give more information about the problem?

I did not get what you mean by this statement:
>> Upon submitting the mapreduce job to the resource manager, it is getting 
>> stuck while at getResources() for 10 min, timing out and then it is trying 
>> other node manager.
If the MRAppMaster does not communicate with the RM for 10 minutes, the RM will expire that application attempt and try to relaunch it. But you mentioned that it is trying another node manager; which daemon is trying the other node manager?

I suggest that whenever there is a problem like this where something gets stuck, take a thread dump using jstack; this helps analyze the issue faster.

Any free port, i.e. 1024 <= x <= 65535, should work fine.

Thanks & Regards
Rohith Sharma K S

From: hitarth trivedi [mailto:t.hita...@gmail.com]
Sent: 12 January 2015 07:01
To: user@hadoop.apache.org
Subject: node manager ports during mapreduce job

Hi,

We have a ResourceManager with 4 NodeManagers. Upon submitting the MapReduce job to the ResourceManager, it gets stuck at getResources() for 10 minutes, times out, and then tries another NodeManager.
When only one NodeManager is running, everything is fine. Upon turning off the firewall on all NodeManagers, everything seems to work.
Looking at netstat, there was a wide range of ports between 3 to 61000 that the NodeManagers/ResourceManager were communicating on.
So I opened the TCP ports in the range 3:61000 and turned the firewall back on, but it does not seem to work.
Any idea what needs to be done here?

Thx
-Hitarth


RE: Question about shuffle/merge/sort phrase

2014-12-21 Thread Rohith Sharma K S
whose responsibility is it that brings each key with all its values together
>> You can set a combiner class in your job; a rough sketch follows below the link. For more information, refer to
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
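As a rough sketch only: the driver below follows the linked tutorial (TokenizerMapper and IntSumReducer are the mapper/reducer classes defined there, so they are assumed rather than shown). The combiner is optional local aggregation on map output; the grouping of each key with all of its values for the reducer is done by the framework's sort/shuffle.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(TokenizerMapper.class);   // assumed, from the tutorial
    job.setCombinerClass(IntSumReducer.class);   // optional local aggregation
    job.setReducerClass(IntSumReducer.class);    // receives each key with all its values
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}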

Thanks & Regards
Rohith Sharma K S


From: Todd [mailto:bit1...@163.com]
Sent: 21 December 2014 19:29
To: user@hadoop.apache.org
Subject: Question about shuffle/merge/sort phrase

Hi, Hadoopers,
I got a question about shuffle/sort/merge phrase related..
My understanding is that shuffle is used to transfer the mapper 
output(key/value pairs) from mapper node to reducer node, and merge phrase is 
used to merge all the mapper output from all mapper nodes, and sort phrase is 
used to sort the key/value pair by key,
Then my question, whose responsibility is it that brings each key with all its 
values together (The reducer's input is a key and an iterative values).

Thanks.


RE: How do I enable debug mode

2014-12-04 Thread Rohith Sharma K S
You can use the configuration below at the client to change the log level in MR; an example of passing these on the command line follows after the list.

ApplicationMaster :yarn.app.mapreduce.am.log.level=DEBUG
Mapper : mapreduce.map.log.level=DEBUG
Reducer : mapreduce.reduce.log.level=DEBUG
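For illustration only (the jar, class and path names are placeholders, and this assumes the job driver uses ToolRunner/GenericOptionsParser so that -D options are picked up):

hadoop jar my-app.jar com.example.MyJob \
  -Dyarn.app.mapreduce.am.log.level=DEBUG \
  -Dmapreduce.map.log.level=DEBUG \
  -Dmapreduce.reduce.log.level=DEBUG \
  /input /output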

Thanks & Regards
Rohith Sharma K S


From: Gino Gu01 [mailto:gino_g...@infosys.com]
Sent: 04 December 2014 13:32
To: user@hadoop.apache.org
Subject: How do I enable debug mode

Hello,

I have below code in mapreduce program.
if(logger.isDebugEnabled()){
logger.info("Mapper value =" + value);
}

How do I enable debug mode to print "Mapper value =" in the logs?
I tried modifying hadoop-2.5.1/etc/hadoop/log4j.properties, and it still doesn't work.

Thanks





RE: Job object toString() is throwing an exception

2014-11-25 Thread Rohith Sharma K S
Could you give the error message or stack trace?

From: Corey Nolet [mailto:cjno...@gmail.com]
Sent: 26 November 2014 07:54
To: user@hadoop.apache.org
Subject: Job object toString() is throwing an exception

I was playing around in the Spark shell and newing up an instance of Job that I 
could use to configure the inputformat for a job. By default, the Scala shell 
println's the result of every command typed. It throws an exception when it 
printlns the newly created instance of Job because it looks like it's setting a 
state upon allocation and it's not happy with the state that it's in when 
toString() is called before the job is submitted.

I'm using Hadoop 2.5.1. I don't see any tickets for this for 2.6. Has anyone 
else ran into this?


RE: Hadoop Installation Path problem

2014-11-24 Thread Rohith Sharma K S
The problem is with the JAVA_HOME setting. There is a '.' (dot) before /usr, which causes the path to be resolved relative to the current directory:
export JAVA_HOME=./usr/lib64/jdk1.7.0_71/jdk7u71

Do not use the '.' (dot) before /usr; a corrected sketch follows below.
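Assuming the JDK really is installed at /usr/lib64/jdk1.7.0_71/jdk7u71 (the path from the original mail), the hadoop-env.sh lines would become something like the following; note it also puts $JAVA_HOME/bin rather than $JAVA_HOME itself on the PATH:

export JAVA_HOME=/usr/lib64/jdk1.7.0_71/jdk7u71
export HADOOP_HOME=/home/anand_vihar/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin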

Thanks & Regards
Rohith Sharma K S


From: Anand Murali [mailto:anand_vi...@yahoo.com]
Sent: 24 November 2014 17:44
To: user@hadoop.apache.org; user@hadoop.apache.org
Subject: Hadoop Installation Path problem

Hi All:


I have done the follwoing in hadoop-env.sh

export JAVA_HOME=./usr/lib64/jdk1.7.0_71/jdk7u71
export HADOOP_HOME=/home/anand_vihar/hadoop
export PATH=:$PATH:$JAVA_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Now when I run hadoop-env.sh and type hadoop version, I get this error.

/home/anand_vihar/hadoop/bin/hadoop: line 133: 
/home/anand_vihar/hadoop/etc/hadoop/usr/lib64/jdk1.7.0_71/jdk7u71/bin/java: No 
such file or directory
/home/anand_vihar/hadoop/bin/hadoop: line 133: exec: 
/home/anand_vihar/hadoop/etc/hadoop/usr/lib64/jdk1.7.0_71/jdk7u71/bin/java: 
cannot execute: No such file or directory


Can somebody advise? I have asked many people; they all say it is the obvious path problem, but I cannot work out where to debug it. This has become a show-stopper for me. Help is most welcome.

Thanks

Regards


Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


RE: Resource Manager's container allocation behavior !

2014-11-23 Thread Rohith Sharma K S
Hi Hamza Zafar

 I would like to point out first that ApplicationMasterProtocol#allocate() is not only for requesting containers; it also doubles as a heartbeat to let the ResourceManager know that the ApplicationMaster is alive. So basically your ApplicationMaster should keep sending heartbeats to the RM via allocate() calls (a rough polling sketch follows below).

Container allocation happens when a NodeManager sends its heartbeat to the RM. This is the reason the allocation time was reduced when you decreased heartbeat-interval-ms.

Why the application is not provided with all requested containers in first allocate call?
>> On the first call, the RM only records the request; allocation happens when the NMs heartbeat to the RM. So from the second call onwards, containers will be received by the AM.
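A minimal polling sketch of that heartbeat loop, assuming an AMRMClient<ContainerRequest> named amrmClient that has already been initialized, started and registered, with the 32 container requests added (the sleep interval and target count are illustrative):

import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;

static int waitForContainers(AMRMClient<AMRMClient.ContainerRequest> amrmClient)
    throws Exception {
  int received = 0;
  while (received < 32) {
    // Each allocate() call doubles as the AM -> RM heartbeat; containers arrive
    // incrementally as NodeManagers heartbeat in, not all on the first call.
    AllocateResponse response = amrmClient.allocate(received / 32.0f);
    List<Container> newContainers = response.getAllocatedContainers();
    received += newContainers.size();
    Thread.sleep(1000); // illustrative polling interval
  }
  return received;
}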

Thanks & Regards
Rohith Sharma K S

From: Hamza Zafar [mailto:11bscshza...@seecs.edu.pk]
Sent: 22 November 2014 00:45
To: user@hadoop.apache.org
Subject: Resource Manager's container allocation behavior !

My Hadoop Cluster has 52GB memory , 56 virtual cores

Scenario: I submit an application to a default queue while there is no other 
application running on the cluster. I create a request for 32 containers with 
same priority, 512MB memory and 1 virtual core . In the first allocate call I 
receive 0 containers from RM, in further allocate calls I start receiving 
containers. I keep on sending allocate calls until all the containers have been 
allocated.

Why the application is not provided with all requested containers in first 
allocate call?


I changed the configuration property 
"yarn.resourcemanager.nodemanagers.heartbeat-interval-ms" from 1000ms to 100ms 
.Now at 100ms heatbeat interval the container allocation time has reduced, but 
still the AM has to make the same number of allocate calls as it was done 
before when the heartbeat interval was 1000ms.


RE: Change the blocksize in 2.5.1

2014-11-20 Thread Rohith Sharma K S
It seems HADOOP_CONF_DIR is pointing to a different location. Maybe you can check whether your hdfs-site.xml is on the classpath when you execute the hdfs command; one way to check is sketched below.
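For illustration, two quick checks (assuming the hdfs/hadoop commands come from the same installation you configured):

# Show which value the client actually resolves for dfs.blocksize
hdfs getconf -confKey dfs.blocksize

# Show which configuration directory the command is picking up
echo $HADOOP_CONF_DIR
hadoop classpath | tr ':' '\n' | grep etc/hadoop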


Thanks & Regards
Rohith Sharma K S

-Original Message-
From: Tomás Fernández Pena [mailto:tf.p...@gmail.com] On Behalf Of Tomás 
Fernández Pena
Sent: 20 November 2014 15:41
To: user@hadoop.apache.org
Subject: Change the blocksize in 2.5.1

Hello everyone,

I've just installed Hadoop 2.5.1 from source code, and I have problems changing the default block size. In my hdfs-site.xml file I've set the property

<property>
  <name>dfs.blocksize</name>
  <value>67108864</value>
</property>

to have blocks of 64 MB, but it seems the system ignores this setting. When I copy a new file, it uses a block size of 128 MB. Only if I specify the block size when the file is created (i.e. hdfs dfs -Ddfs.blocksize=$((64*1024*1024)) -put file .) does it use a block size of 64 MB.

Any idea?

Best regards

Tomas
--
Tomás Fernández Pena
Centro de Investigacións en Tecnoloxías da Información, CITIUS. Univ.
Santiago de Compostela
Tel: +34 881816439, Fax: +34 881814112,
https://citius.usc.es/equipo/persoal-adscrito/?tf.pena
Pubkey 1024D/81F6435A, Fprint=D140 2ED1 94FE 0112 9D03 6BE7 2AFF EDED
81F6 435A



RE: MR job fails with too many mappers

2014-11-18 Thread Rohith Sharma K S
If log aggregation is enabled, the local log folder is deleted after aggregation. So I suggest disabling "yarn.log-aggregation-enable" and running the job again; all the logs then remain in the local log folder, where you can find the container logs. (Alternatively, with aggregation left on, aggregated logs can be fetched as sketched below.)
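For illustration (using the application id from this thread; this works only once log aggregation is enabled and the application has finished):

yarn logs -applicationId application_1416304409718_0032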

Thanks & Regards
Rohith Sharma K S


From: francexo83 [mailto:francex...@gmail.com]
Sent: 18 November 2014 22:15
To: user@hadoop.apache.org
Subject: Re: MR job fails with too many mappers

Hi,

thank you for your quick response, but I was not able to see the logs for the 
container.

I get a  "no such file or directory" when I try to access the logs of the 
container from the shell:

cd /var/log/hadoop-yarn/containers/application_1416304409718_0032


It seems that the container has never been created.



thanks





2014-11-18 16:43 GMT+01:00 Rohith Sharma K S <rohithsharm...@huawei.com>:
Hi

Could you get the syserr and sysout logs for the container? These logs will be 
available in the same location as the syslog for the container:
${yarn.nodemanager.log-dirs}/<application-id>/<container-id>
This helps to find the problem!!


Thanks & Regards
Rohith Sharma K S

From: francexo83 [mailto:francex...@gmail.com]
Sent: 18 November 2014 20:53
To: user@hadoop.apache.org
Subject: MR job fails with too many mappers

Hi All,

I have a small  hadoop cluster with three nodes and HBase 0.98.1 installed on 
it.

The hadoop version is 2.3.0 and below my use case scenario.

I wrote a map reduce program that reads data from an hbase table and does some 
transformations on that data.
The jobs are very simple, so they don't need the reduce phase. I also wrote a 
TableInputFormat extension in order to maximize the number of concurrent maps 
on the cluster.
In other words, each row should be processed by a single map task.

Everything goes well until the number of rows, and consequently mappers, exceeds 
30.

This is the only exception I see when the job fails:

Application application_1416304409718_0032 failed 2 times due to AM Container 
for appattempt_1416304409718_0032_02 exited with exitCode: 1 due to:


Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
at org.apache.hadoop.util.Shell.run(Shell.java:424)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1


Cluster configuration details:
Node1: 12 GB, 4 core
Node2: 6 GB, 4 core
Node3: 6 GB, 4 core

yarn.scheduler.minimum-allocation-mb=2048
yarn.scheduler.maximum-allocation-mb=4096
yarn.nodemanager.resource.memory-mb=6144



Regards



RE: Starting YARN in HA mode Hadoop 2.5.1

2014-11-18 Thread Rohith Sharma K S
You need to start it manually. YARN does not support starting all the RMs in the 
cluster from a single script.

Thanks & Regards
Rohith Sharma K S

From: Jogeshwar Karthik Akundi [mailto:ajkart...@gmail.com]
Sent: 18 November 2014 19:15
To: user@hadoop.apache.org
Subject: Starting YARN in HA mode Hadoop 2.5.1

Hi,
I am using Hadoop 2.5.1 and trying to enable HA mode for NN and RM.

the start-dfs.sh script provided starts both the namenodes and all the 
datanodes.
However, the start-yarn.sh starts only one RM and all the NodeManagers.

Till now, these scripts allowed me to start up the entire cluster from a single 
machine (the primary node). But now the secondary RM is not starting up.

I tried to google around but couldn't find any information. I tried reading 
through the yarn-daemon*.sh scripts, but I don't find a hint on how to start both 
RMs in one shot.

Any pointers? Am I missing something?

--
There is no charge for awesomeness


RE: MR job fails with too many mappers

2014-11-18 Thread Rohith Sharma K S
Hi

Could you get the syserr and sysout logs for the container? These logs will be 
available in the same location as the syslog for the container:
${yarn.nodemanager.log-dirs}/<application-id>/<container-id>
This helps to find the problem!!


Thanks & Regards
Rohith Sharma K S

From: francexo83 [mailto:francex...@gmail.com]
Sent: 18 November 2014 20:53
To: user@hadoop.apache.org
Subject: MR job fails with too many mappers

Hi All,

I have a small  hadoop cluster with three nodes and HBase 0.98.1 installed on 
it.

The hadoop version is 2.3.0 and below my use case scenario.

I wrote a map reduce program that reads data from an hbase table and does some 
transformations on that data.
The jobs are very simple, so they don't need the reduce phase. I also wrote a 
TableInputFormat extension in order to maximize the number of concurrent maps 
on the cluster.
In other words, each row should be processed by a single map task.

Everything goes well until the number of rows, and consequently mappers, exceeds 
30.

This is the only exception I see when the job fails:

Application application_1416304409718_0032 failed 2 times due to AM Container 
for appattempt_1416304409718_0032_02 exited with exitCode: 1 due to:


Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
at org.apache.hadoop.util.Shell.run(Shell.java:424)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1


Cluster configuration details:
Node1: 12 GB, 4 core
Node2: 6 GB, 4 core
Node3: 6 GB, 4 core

yarn.scheduler.minimum-allocation-mb=2048
yarn.scheduler.maximum-allocation-mb=4096
yarn.nodemanager.resource.memory-mb=6144



Regards


RE: How to set job-priority on a hadoop job

2014-11-03 Thread Rohith Sharma K S
Hi Sunil

In MR2v, there is no job priority. There is open Jira for ApplicationPriority 
that is still in progress.
https://issues.apache.org/jira/browse/YARN-1963
https://issues.apache.org/jira/browse/MAPREDUCE-5870

You need to wait untill this feature comes up!!

Thanks & Regards
Rohith Sharma K S


From: Sunil S Nandihalli [mailto:sunil.nandiha...@gmail.com]
Sent: 03 November 2014 10:02
To: user@hadoop.apache.org
Subject: How to set job-priority on a hadoop job

Hi Everybody,
 I see that we can set job priority on a hadoop job. I have been trying to do 
it using the following command.

hadoop job -set-priority job-id VERY_LOW

It does not seem to be working. After that I noticed that
http://archive.cloudera.com/cdh/3/hadoop/capacity_scheduler.html

says that job priority on a queue is disabled by default. I would like to 
enable it. No amount of googling gave me an actionable way to enable priorities on 
job queues. Can somebody help?
Thanks,
Sunil


RE: YarnChild didn't be killed after running mapreduce

2014-10-31 Thread Rohith Sharma K S
This is strange!! Can you get ps -aef | grep  fro this process?
What is the application status in RM UI?

Thanks & Regards
Rohith Sharma  K S


From: dwld0...@gmail.com [mailto:dwld0...@gmail.com]
Sent: 31 October 2014 13:05
To: user@hadoop.apache.org
Subject: YarnChild didn't be killed after running mapreduce

All
I ran the mapreduce example successfully, but an invalid process always appears 
on the nodemanager nodes, as follows:


27398 DataNode
27961 Jps
13669 QuorumPeerMain
27822 -- process information unavailable
18349 ThriftServer
27557 NodeManager
I deleted this invalid process under /tmp/hsperfdata_yarn, but it is there again 
after running mapreduce (yarn).
I have modified many parameters in yarn-site.xml and mapred-site.xml.
   yarn-site.xml
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>256</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
  </property>

   mapred-site.xml
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>2</value>
  </property>
None of them worked. The problem has been there for a long time.

There were no error logs; I only found some suspicious logs, as follows:
2014-10-31 14:35:59,306 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Starting resource-monitoring for container_1414736576842_0001_01_08
2014-10-31 14:35:59,350 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Memory usage of ProcessTree 27818 for container-id 
container_1414736576842_0001_01_08: 107.9 MB of 1 GB physical memory used; 
1.5 GB of 2.1 GB virtual memory used
2014-10-31 14:36:01,068 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting 
connection close header...
2014-10-31 14:36:01,702 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Stopping container with container Id: container_1414736576842_0001_01_08
2014-10-31 14:36:01,702 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root 
IP=192.168.200.128 OPERATION=Stop Container Request TARGET=ContainerManageImpl 
RESULT=SUCCESS APPID=application_1414736576842_0001 
CONTAINERID=container_1414736576842_0001_01_08
2014-10-31 14:36:01,703 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1414736576842_0001_01_08 transitioned from RUNNING to 
KILLING
2014-10-31 14:36:01,703 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_1414736576842_0001_01_08
2014-10-31 14:36:01,724 WARN 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
from container container_1414736576842_0001_01_08 is : 143
2014-10-31 14:36:01,791 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1414736576842_0001_01_08 transitioned from KILLING to 
CONTAINER_CLEANEDUP_AFTER_KILL
2014-10-31 14:36:01,791 INFO 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
absolute path : 
/hadoop/yarn/local/usercache/root/appcache/application_1414736576842_0001/container_1414736576842_0001_01_08
2014-10-31 14:36:01,792 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root 
OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS 
APPID=application_1414736576842_0001 
CONTAINERID=container_1414736576842_0001_01_08
2014-10-31 14:36:01,792 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1414736576842_0001_01_08 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2014-10-31 14:36:01,792 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Removing container_1414736576842_0001_01_08 from application 
application_1414736576842_0001
2014-10-31 14:36:01,792 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 Considering container container_1414736576842_0001_01_08 for 
log-aggregation
2014-10-31 14:36:01,793 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
event CONTAINER_STOP for appId application_1414736576842_0001




dwld0...@gmail.com


RE: mapred job pending at "Starting scan to move intermediate done files"

2014-10-22 Thread Rohith Sharma K S
Hi,

This is a problem with the memory configuration in your cluster. You have configured 
"yarn.nodemanager.resource.memory-mb" as 64MB, which is too low.


1.   The ApplicationMaster requires 2GB to launch its container, but the cluster 
memory itself is only 64MB. So the container never gets assigned.

2.   Further, the map memory is 64MB, but map.opts is 1024MB in 
mapred-site.xml. Again, this is contradictory.

Change the NodeManager memory to 8GB and the map/reduce memory to 2GB, then try running the job.

Thanks & Regards
Rohith Sharma K S


From: mail list [mailto:louis.hust...@gmail.com]
Sent: 23 October 2014 07:55
To: user@hadoop.apache.org
Subject: mapred job pending at "Starting scan to move intermediate done files"

Hi all,

I am new to hadoop, and I installed hadoop-2.5.1 on ubuntu in 
pseudo-distributed mode.
When I run a mapred job, the job outputs the following logs:

louis@ubuntu:~/src/hadoop-book$ hadoop jar hadoop-examples.jar 
v3.MaxTemperatureDriver input/ncdc/all max-temp
14/10/22 19:09:56 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
14/10/22 19:09:57 INFO input.FileInputFormat: Total input paths to process : 2
14/10/22 19:09:58 INFO mapreduce.JobSubmitter: number of splits:2
14/10/22 19:09:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1414030015373_0001
14/10/22 19:09:58 INFO impl.YarnClientImpl: Submitted application 
application_1414030015373_0001
14/10/22 19:09:58 INFO mapreduce.Job: The url to track the job: 
http://localhost:8088/proxy/application_1414030015373_0001/
14/10/22 19:09:58 INFO mapreduce.Job: Running job: job_1414030015373_0001

As you can see, the job hangs. Then I checked the jps output:

louis@ubuntu:~/src/hadoop-2.5.1$ jps
22433 SecondaryNameNode
22716 NodeManager
22240 DataNode
22577 ResourceManager
23083 JobHistoryServer
23148 Jps
22080 NameNode

Nothing seems wrong, so I checked mapred-louis-historyserver-ubuntu.log:

2014-10-22 19:09:03,831 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
History Cleaner started
2014-10-22 19:09:03,837 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
History Cleaner complete
2014-10-22 19:11:33,830 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
Starting scan to move intermediate done files
2014-10-22 19:14:33,830 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
Starting scan to move intermediate done files
2014-10-22 19:17:33,832 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
Starting scan to move intermediate done files
2014-10-22 19:20:33,830 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: 
Starting scan to move intermediate done files

Then I checked the web UI:
it seems the job is pending.

The attachment contains some configuration files from etc/hadoop/.
Any idea will be appreciated!






RE: Reduce fails always

2014-10-06 Thread Rohith Sharma K S
Hi

How much data is the wordcount job processing?
What is the disk space ("df -h") available on the node where it always fails?

The point I didn't understand is why it uses only one datanode's disk space?
>> For running reduce tasks, containers can be allocated on any node. I think 
>> one of the machines in your cluster has very low disk space, so whichever 
>> task runs on that particular node fails.


Thanks & Regards
Rohith Sharma K S


From: Abdul Navaz [mailto:navaz@gmail.com]
Sent: 06 October 2014 08:21
To: user@hadoop.apache.org
Subject: Reduce fails always

Hi All,

I am running a sample word count job in a 9-node cluster and I am getting the 
below error message.


hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/file1.txt 
/user/hduser/getty/out10 -D mapred.reduce.tasks=2

14/10/05 18:08:45 INFO mapred.JobClient:  map 99% reduce 26%

14/10/05 18:08:48 INFO mapred.JobClient:  map 99% reduce 28%

14/10/05 18:08:51 INFO mapred.JobClient:  map 100% reduce 28%

14/10/05 18:08:57 INFO mapred.JobClient:  map 98% reduce 0%

14/10/05 18:08:58 INFO mapred.JobClient: Task Id : 
attempt_201410051754_0003_r_00_0, Status : FAILED

FSError: java.io.IOException: No space left on device

14/10/05 18:08:59 WARN mapred.JobClient: Error reading task 
outputhttp://pcvm1-10.utahddc.geniracks.net:50060/tasklog?plaintext=true&attemptid=attempt_201410051754_0003_r_00_0&filter=stdout

14/10/05 18:08:59 WARN mapred.JobClient: Error reading task 
outputhttp://pcvm1-10.utahddc.geniracks.net:50060/tasklog?plaintext=true&attemptid=attempt_201410051754_0003_r_00_0&filter=stderr

14/10/05 18:08:59 INFO mapred.JobClient: Task Id : 
attempt_201410051754_0003_m_15_0, Status : FAILED

FSError: java.io.IOException: No space left on device

14/10/05 18:09:02 INFO mapred.JobClient:  map 99% reduce 0%

14/10/05 18:09:07 INFO mapred.JobClient:  map 99% reduce 1%


I can see it uses all the disk space on one of the datanodes when shuffling starts. 
As soon as disk space on that node becomes nil it throws this error and the job 
aborts. The point I didn't understand is why it uses only one datanode's disk 
space. I have changed the number of reducers to 4, but it still uses only one 
datanode's disk and throws the above error.


How can I fix this issue?


Thanks & Regards,

Navaz




RE: Cannot fine profiling log file

2014-09-23 Thread Rohith Sharma K S
Hi


Have you enabled log aggregation?


1.   If log aggregation is enabled, then you can get the logs from HDFS at the below 
path. Both the aggregated logs and the profiler output will be in the same file.
 ${yarn.nodemanager.remote-app-log-dir}/${user}/logs/<application-id>/


2.   If it is not enabled, then check inside
${yarn.nodemanager.log-dirs}/<application-id>/<container-id>/profile.out (default 
name)


Thanks & Regards
Rohith Sharma K S

From: Jakub Stransky [mailto:stransky...@gmail.com]
Sent: 23 September 2014 16:27
To: user@hadoop.apache.org
Subject: Cannot fine profiling log file

Hello experienced users,

I did try to use profiling of tasks during mapreduce

<property>
  <name>mapreduce.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.task.profile.maps</name>
  <value>0-5</value>
</property>
<property>
  <name>mapreduce.task.profile.params</name>
  <value>-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
</property>

The file got generated, I can see that through the Resource Manager console, but I 
can't find where to download it from.

Where can I find that file, or how do I download it?

Thanks for any advices!

Jakub


RE: About extra containers being allocated in distributed shell example.

2014-09-23 Thread Rohith Sharma K S
This looks to be an open issue 
https://issues.apache.org/jira/i#browse/YARN-1902.
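
Until that is fixed, a common workaround on the AM side (a sketch only, with 
illustrative names, not code from the distributed shell itself) is to remove the 
matched ContainerRequest once a container arrives and to release any surplus 
containers instead of letting them expire:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

// Used inside an RMCallbackHandler of a distributed-shell style AM (sketch).
class AllocationHandler {
  private final AMRMClientAsync<ContainerRequest> amRMClient;
  private final AtomicInteger stillNeeded;          // e.g. starts at 9
  private final ContainerRequest templateRequest;   // the request that was added 9 times

  AllocationHandler(AMRMClientAsync<ContainerRequest> client, int needed, ContainerRequest template) {
    this.amRMClient = client;
    this.stillNeeded = new AtomicInteger(needed);
    this.templateRequest = template;
  }

  public void onContainersAllocated(List<Container> allocatedContainers) {
    for (Container container : allocatedContainers) {
      if (stillNeeded.getAndDecrement() > 0) {
        // Tell the client this outstanding ask is satisfied so it is not re-sent.
        amRMClient.removeContainerRequest(templateRequest);
        // ... launch the container here ...
      } else {
        // Surplus container: give it back instead of letting it expire.
        amRMClient.releaseAssignedContainer(container.getId());
      }
    }
  }
}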

Thanks & Regards
Rohith Sharma K S

From: Smita Deshpande [mailto:smita.deshpa...@cumulus-systems.com]
Sent: 22 September 2014 10:45
To: user@hadoop.apache.org
Subject: RE: About extra containers being allocated in distributed shell 
example.

Any suggestion/workaround on this one?

-Smita

From: Smita Deshpande
Sent: Tuesday, September 16, 2014 3:00 PM
To: 'user@hadoop.apache.org'
Subject: About extra containers being allocated in distributed shell example.

Hi,
In YARN distributed shell example, I am setting up my request 
for containers to the RM using the following call   (I am asking for 9 
containers here)
  private ContainerRequest setupContainerAskForRM(Resource capability) {}
But when the RMCallbackHandler actually receives allocated containers in the 
following call, I am getting 23 containers:
  @Override
  public void onContainersAllocated(List<Container> allocatedContainers) {}

I am getting extra containers which expire after 600 seconds.
Will these extra allocated containers, which are not doing anything, have any 
performance impact on my application?

At one point in my application, out of 19K containers, 12K containers expired 
because they were not used. Can anybody suggest a workaround for this, or is it 
a bug?

-Smita


Why 2 different approach for deleting localized resources and aggregated logs?

2014-08-12 Thread Rohith Sharma K S
Hi

I see two different approaches for deleting localized resources and 
aggregated logs.

1.   Localized resources are deleted based on the size of the localizer cache, 
per local directory.

2.   Aggregated logs are deleted based on time (if enabled).

   Are there any specific thoughts on why there are 2 different implementations?

   Can aggregated logs also be deleted based on size?

Thanks & Regards
Rohith Sharma K S




RE: change yarn application priority

2014-05-30 Thread Rohith Sharma K S
Hi

   Currently there is no provision for changing application priority within the 
same queue.  Follow the Jira https://issues.apache.org/jira/i#browse/YARN-1963 
for this new feature.

One way you can achieve this is by enabling scheduler monitors for the 
CapacityScheduler.
The steps to follow are:

1.   Configure 2 queues, following 
http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

2.   Enable the scheduler monitor:

yarn.resourcemanager.scheduler.monitor.enable = true

Submit the job that runs for 2 hours to queue 1, and submit the other job to queue 2.

Hope this will help you.
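
For an MRv2 job, picking the queue is just a per-job setting (a minimal sketch; the 
queue name "fast" is hypothetical and must match a queue defined in 
capacity-scheduler.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapreduce.job.queuename", "fast");   // hypothetical queue for short jobs
    Job job = Job.getInstance(conf, "short-running-job");
    // ... set mapper/reducer/input/output as usual ...
    // job.waitForCompletion(true);
  }
}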

Thanks & Regards
Rohith Sharma K S



From: Henry Hung [mailto:ythu...@winbond.com]
Sent: 30 May 2014 11:53
To: user@hadoop.apache.org
Subject: change yarn application priority

HI All,

I have an application that consumes all of the nodemanager capacity (30 Maps and 1 
Reducer) and will need 4 hours to finish.
Let's say I need to run another application that will finish much quicker (30 
minutes) and only needs 1 Map and 1 Reducer.
If I just execute the new application, it will sit in the queue waiting for the 1st 
application to finish.
Is there a way to change the 2nd application's priority to be higher than the 1st 
and let the resourcemanager immediately execute the 2nd application?

I'm using Hadoop-2.2.0.

Best regards,
Henry


The privileged confidential information contained in this email is intended for 
use only by the addressees as indicated by the original sender of this email. 
If you are not the addressee indicated in this email or are not responsible for 
delivery of the email to such a person, please kindly reply to the sender 
indicating this fact and delete all copies of it from your computer and network 
server immediately. Your cooperation is highly appreciated. It is advised that 
any unauthorized use of confidential information of Winbond is strictly 
prohibited; and any information in this email irrelevant to the official 
business of Winbond shall be deemed as neither given nor endorsed by Winbond.


RE: Cleanup activity on YARN containers

2014-04-08 Thread Rohith Sharma K S
  Is there something like shutdown hook for containers?
>> There is no container-specific shutdown hook.

I was talking about the Java shutdown hook, i.e. 
'Runtime.getRuntime().addShutdownHook(Thread hook)', registered during start of the 
container JVM. In the hook, clean up can be done.
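
A minimal sketch of that idea (assuming the cleanup is only deleting a 
container-local scratch file; the file name and class name are illustrative):

import java.io.File;

public class ContainerMain {
  public static void main(String[] args) {
    // Register the hook early in the container JVM. It runs on normal exit or on
    // SIGTERM (which the NodeManager sends first), but not if the JVM is SIGKILLed.
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      public void run() {
        File scratch = new File("scratch.tmp");   // illustrative container-local file
        if (scratch.exists() && !scratch.delete()) {
          System.err.println("Could not delete " + scratch.getAbsolutePath());
        }
      }
    }));

    // ... actual container work ...
  }
}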

Thanks & Regards
Rohith Sharma K S

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 09 April 2014 10:49
To: user@hadoop.apache.org
Subject: Re: Cleanup activity on YARN containers

Hi Rohith,

  Is there something like shutdown hook for containers? Can you please also 
tell me how to use that?

Thanks,
Kishore

On Wed, Apr 9, 2014 at 8:34 AM, Rohith Sharma K S <rohithsharm...@huawei.com> wrote:
Local container clean up can be done in a ShutdownHook. !!??

Thanks & Regards
Rohith Sharma K S

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 08 April 2014 20:01
To: user@hadoop.apache.org
Subject: Re: Cleanup activity on YARN containers

Hi Rohith,

   Thanks for the reply.

  Mine is a YARN application. I have some files that are local to the nodes where the 
containers run, and I want to clean them up at the end of the container 
execution. So, I want to do this cleanup on the same node my container ran on. 
With what you are suggesting, I can't delete the files local to the container.

   Is there any other way?

Thanks,
Kishore

On Tue, Apr 8, 2014 at 8:55 AM, Rohith Sharma K S <rohithsharm...@huawei.com> wrote:
Hi Kishore,

   Are the jobs submitted through MapReduce, or is it a YARN application?


1.  For the MapReduce framework, the framework itself provides a facility to clean up 
at the per-task level.
Is there any callback kind of facility, in which I can write some 
code to be executed on my container at the end of my application or at the end 
of that particular container execution?
>>>  You can override setup() and cleanup() for doing initialization and 
>>> cleanup of your task. This facility is provided by the MapReduce framework.

The call flow of task execution is:
 The framework first calls 
setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, 
Object, Context) / reduce(Object, Iterable, Context) for each key/value pair. 
Finally cleanup(Context) is called.

Note: In clean up, do not hold the container for more than 
"mapreduce.task.timeout". Once map/reduce is completed, progress will no longer 
be sent to the ApplicationMaster (a ping is not considered a status update). If 
your cleanup takes more than the value configured for 
"mapreduce.task.timeout", then the ApplicationMaster considers the task as timed 
out. In such a case, you need to increase the value of "mapreduce.task.timeout" 
based on your cleanup time.



2.   For a YARN application, the list of completed containers is sent to the 
ApplicationMaster in the heartbeat.  Here you can do clean up activities for the 
containers.

Hope this will help you.  :)!!


Thanks & Regards
Rohith Sharma K S

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 07 April 2014 16:41
To: user@hadoop.apache.org
Subject: Cleanup activity on YARN containers

Hi,

  Is there any callback kind of facility, in which I can write some code to be 
executed on my container at the end of my application or at the end of that 
particular container execution?

 I want to do some cleanup activities at the end of my application, and the 
clean up is not related to the localized resources that are downloaded from 
HDFS.

Thanks,
Kishore




RE: Cleanup activity on YARN containers

2014-04-08 Thread Rohith Sharma K S
Local container clean up can be done in a ShutdownHook. !!??

Thanks & Regards
Rohith Sharma K S

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 08 April 2014 20:01
To: user@hadoop.apache.org
Subject: Re: Cleanup activity on YARN containers

Hi Rohith,

   Thanks for the reply.

  Mine is a YARN application. I have some files that are local to the nodes where the 
containers run, and I want to clean them up at the end of the container 
execution. So, I want to do this cleanup on the same node my container ran on. 
With what you are suggesting, I can't delete the files local to the container.

   Is there any other way?

Thanks,
Kishore

On Tue, Apr 8, 2014 at 8:55 AM, Rohith Sharma K S <rohithsharm...@huawei.com> wrote:
Hi Kishore,

   Are the jobs submitted through MapReduce, or is it a YARN application?


1.  For the MapReduce framework, the framework itself provides a facility to clean up 
at the per-task level.
Is there any callback kind of facility, in which I can write some 
code to be executed on my container at the end of my application or at the end 
of that particular container execution?
>>>  You can override setup() and cleanup() for doing initialization and 
>>> cleanup of your task. This facility is provided by the MapReduce framework.

The call flow of task execution is:
 The framework first calls 
setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, 
Object, Context) / reduce(Object, Iterable, Context) for each key/value pair. 
Finally cleanup(Context) is called.

Note: In clean up, do not hold the container for more than 
"mapreduce.task.timeout". Once map/reduce is completed, progress will no longer 
be sent to the ApplicationMaster (a ping is not considered a status update). If 
your cleanup takes more than the value configured for 
"mapreduce.task.timeout", then the ApplicationMaster considers the task as timed 
out. In such a case, you need to increase the value of "mapreduce.task.timeout" 
based on your cleanup time.



2.   For a YARN application, the list of completed containers is sent to the 
ApplicationMaster in the heartbeat.  Here you can do clean up activities for the 
containers.

Hope this will help you.  :)!!


Thanks & Regards
Rohith Sharma K S

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 07 April 2014 16:41
To: user@hadoop.apache.org
Subject: Cleanup activity on YARN containers

Hi,

  Is there any callback kind of facility, in which I can write some code to be 
executed on my container at the end of my application or at the end of that 
particular container execution?

 I want to do some cleanup activities at the end of my application, and the 
clean up is not related to the localized resources that are downloaded from 
HDFS.

Thanks,
Kishore



RE: Cleanup activity on YARN containers

2014-04-07 Thread Rohith Sharma K S
Hi Kishore,

   Are the jobs submitted through MapReduce, or is it a YARN application?


1.  For the MapReduce framework, the framework itself provides a facility to clean up 
at the per-task level.
Is there any callback kind of facility, in which I can write some 
code to be executed on my container at the end of my application or at the end 
of that particular container execution?
>>>  You can override setup() and cleanup() for doing initialization and 
>>> cleanup of your task. This facility is provided by the MapReduce framework.

The call flow of task execution is:
 The framework first calls 
setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, 
Object, Context) / reduce(Object, Iterable, Context) for each key/value pair. 
Finally cleanup(Context) is called.

Note: In clean up, do not hold the container for more than 
"mapreduce.task.timeout". Once map/reduce is completed, progress will no longer 
be sent to the ApplicationMaster (a ping is not considered a status update). If 
your cleanup takes more than the value configured for 
"mapreduce.task.timeout", then the ApplicationMaster considers the task as timed 
out. In such a case, you need to increase the value of "mapreduce.task.timeout" 
based on your cleanup time.



2.   For a YARN application, the list of completed containers is sent to the 
ApplicationMaster in the heartbeat.  Here you can do clean up activities for the 
containers.

Hope this will help you.  :)!!
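
For the MapReduce case (point 1), a minimal sketch of those hooks with the new API 
(an illustrative mapper, not your actual job):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanupMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // one-time initialization per task, e.g. create a local temp file
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(value, new LongWritable(1));
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // called once after all map() calls; keep it shorter than mapreduce.task.timeout
  }
}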


Thanks & Regards
Rohith Sharma K S

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 07 April 2014 16:41
To: user@hadoop.apache.org
Subject: Cleanup activity on YARN containers

Hi,

  Is there any callback kind of facility, in which I can write some code to be 
executed on my container at the end of my application or at the end of that 
particular container execution?

 I want to do some cleanup activities at the end of my application, and the 
clean up is not related to the localized resources that are downloaded from 
HDFS.

Thanks,
Kishore


RE: Job fails if I change HADOOP_USER_NAME

2014-03-23 Thread Rohith Sharma K S
Hi Ashwin,

How do I enable debug for AM container logs?
>>  Set the below configurations to change the log level for the AM, map and reduce 
>> tasks. Default values are INFO.
 yarn.app.mapreduce.am.log.level
 mapreduce.map.log.level
 mapreduce.reduce.log.level
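
For example, they can be set per job when the job is built programmatically (a small 
sketch, assuming DEBUG is the level you want):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DebugLogLevels {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("yarn.app.mapreduce.am.log.level", "DEBUG");
    conf.set("mapreduce.map.log.level", "DEBUG");
    conf.set("mapreduce.reduce.log.level", "DEBUG");
    Job job = Job.getInstance(conf, "job-with-debug-logs");
    // ... rest of the job setup ...
  }
}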

and to which location are they written to ?
>> These are written into {yarn.nodemanager.log-dirs}/<application-id>/<container-id> 
>> while the job is executing.
   Once the application is finished,

1.   If log aggregation is enabled, then all container logs are aggregated to 
HDFS. The log path in HDFS is {yarn.nodemanager.remote-app-log-dir}/${user}

2.   If log aggregation is disabled, then all container logs remain on the local 
machine where the containers ran, i.e. 
{yarn.nodemanager.log-dirs}/<application-id>

From: Ashwin Shankar [mailto:ashwinshanka...@gmail.com]
Sent: 22 March 2014 03:38
To: user@hadoop.apache.org
Subject: Re: Job fails if I change HADOOP_USER_NAME

Hi Rohit,
How do I enable debug for AM container logs? And to which location are they 
written?
I tried changing log4j.prop and can see DEBUG output for the RM, NM etc., but I don't 
see AM-related debug logs.

Thanks,
Ashwin

On Fri, Mar 21, 2014 at 3:05 AM, Rohith Sharma K S <rohithsharm...@huawei.com> wrote:
Hi

The below stack trace is generic for any AM whose launch failed. Can you debug 
using the AM container logs, to get a proper stacktrace?


Thanks & Regards
Rohith Sharma K S

From: Ashwin Shankar [mailto:ashwinshanka...@gmail.com]
Sent: 21 March 2014 14:02
To: user@hadoop.apache.org
Subject: Job fails if I change HADOOP_USER_NAME

Hi,
I'm writing a new feature in Fair scheduler and wanted to test it out
by running jobs submitted by different users from my laptop.

My sleep job runs fine as long as the user name is my mac user name.
If I change my hadoop user name by setting HADOOP_USER_NAME,
my jobs fail with the exception org.apache.hadoop.util.Shell$ExitCodeException.
I also tried creating a new user account on my laptop and running a job as that 
user but I get the same exception.

Please let me know if any of you have come across this.
I tried changing the ulimit max processes (to 1024), but it doesn't solve the problem.

Here is the stack trace :

Job job_1395389889916_0001 failed with state FAILED due to: Application 
application_1395389889916_0001 failed 3 times due to AM Container for 
appattempt_1395389889916_0001_03 exited with  exitCode: 1 due to: Exception 
from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

--
Thanks,
Ashwin



--
Thanks,
Ashwin



RE: Job fails if I change HADOOP_USER_NAME

2014-03-21 Thread Rohith Sharma K S
Hi

The below stack trace is generic for any AM whose launch failed. Can you debug 
using the AM container logs, to get a proper stacktrace?


Thanks & Regards
Rohith Sharma K S

From: Ashwin Shankar [mailto:ashwinshanka...@gmail.com]
Sent: 21 March 2014 14:02
To: user@hadoop.apache.org
Subject: Job fails if I change HADOOP_USER_NAME

Hi,
I'm writing a new feature in Fair scheduler and wanted to test it out
by running jobs submitted by different users from my laptop.

My sleep job runs fine as long as the user name is my mac user name.
If I change my hadoop user name by setting HADOOP_USER_NAME,
my jobs fail with the exception org.apache.hadoop.util.Shell$ExitCodeException.
I also tried creating a new user account on my laptop and running a job as that 
user but I get the same exception.

Please let me know if any of you have come across this.
I tried changing the ulimit max processes (to 1024), but it doesn't solve the problem.

Here is the stack trace :

Job job_1395389889916_0001 failed with state FAILED due to: Application 
application_1395389889916_0001 failed 3 times due to AM Container for 
appattempt_1395389889916_0001_03 exited with  exitCode: 1 due to: Exception 
from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)

--
Thanks,
Ashwin



RE: NodeHealthReport local-dirs turned bad

2014-03-19 Thread Rohith Sharma K S
Hi

There is no relation to the NameNode format.

Is the NodeManager started with the default configuration? If not, is any NodeManager 
health script configured?

Suspects can be:
1. /hadoop does not have the right permissions, or
2. the disk is full

Thanks & Regards
Rohith Sharma K S


-Original Message-
From: Margusja [mailto:mar...@roo.ee] 
Sent: 19 March 2014 17:04
To: user@hadoop.apache.org
Subject: NodeHealthReport local-dirs turned bad

Hi

I have one node in unhealthy status:




Total Vmem allocated for Containers 4.20 GB
Vmem enforcement enabled    false
Total Pmem allocated for Container  2 GB
Pmem enforcement enabled    false
NodeHealthyStatus   false
LastNodeHealthTime  Wed Mar 19 13:31:24 EET 2014
NodeHealthReport1/1 local-dirs turned bad: /hadoop/yarn/local;1/1 
log-dirs turned bad: /hadoop/yarn/log
Node Manager Version:   2.2.0.2.0.6.0-101 from 
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum
82bd166aa0ada92b44f8a154836b92 on 2014-01-09T05:24Z
Hadoop Version: 2.2.0.2.0.6.0-101 from 
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum
704f1e463ebc4fb89353011407e965 on 2014-01-09T05:18Z



I tried:
deleting /hadoop/* and doing namenode -format again, then restarting the nodemanager, 
but it is still in unhealthy mode.

Is there any guideline what I should do?

--
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE "(serialNumber=37303140314)"



RE: How to configure nodemanager.health-checker.script.path

2014-03-18 Thread Rohith Sharma K S
Hi

The health script itself should execute successfully. If you want your health check 
to fail the node, just add the ERROR print to the console. This is because the health 
script may fail because of a syntax error, command not found (IOException) or several 
other reasons, and a failing exit code on its own is not treated as an unhealthy node.

In order for the health script to work,
do not add "exit -1":

#!/bin/bash
echo "ERROR disk full"

Thanks & Regards
Rohith Sharma K S

From: Anfernee Xu [mailto:anfernee...@gmail.com]
Sent: 19 March 2014 10:32
To: user
Subject: How to configure nodemanager.health-checker.script.path

Hello,

I'm running MR with the 2.2.0 release. I noticed we can configure 
"nodemanager.health-checker.script.path" in yarn-site.xml to customize NM 
health checking, so I added the below properties to yarn-site.xml:

 
  <property>
    <name>yarn.nodemanager.health-checker.script.path</name>
    <value>/scratch/software/hadoop2/hadoop-dc/node_health.sh</value>
  </property>

  <property>
    <name>yarn.nodemanager.health-checker.interval-ms</name>
    <value>1</value>
  </property>
   

To get a feel for this, the 
/scratch/software/hadoop2/hadoop-dc/node_health.sh script simply prints an ERROR 
message, as below:

#!/bin/bash
echo "ERROR disk full"
exit -1

But it seems it's not working; the node is still in a healthy state. Did I miss 
something?

Thanks for your help.
--
--Anfernee


RE: issue of "Log aggregation has not completed or is not enabled."

2014-03-18 Thread Rohith Sharma K S
Just for confirmation,

1.   Was the NodeManager restarted after enabling log aggregation? If yes, check 
the NodeManager startup logs to confirm that the Log Aggregation Service started 
successfully.


Thanks & Regards
Rohith Sharma K S

From: ch huang [mailto:justlo...@gmail.com]
Sent: 18 March 2014 13:09
To: user@hadoop.apache.org
Subject: issue of "Log aggregation has not completed or is not enabled."

Hi maillist:
 I tried to look at an application's logs using the following process:

# yarn application -list
Application-Id  Application-Name  User   
Queue   State Final-State   
  Tracking-URL
application_1395126130647_0014  select user_id as userid, 
adverti...stattime(Stage-1) hivehive
FINISHED   SUCCEEDED  
ch18:19888/jobhistory/job/job_1395126130647_0014
# yarn logs -applicationId application_1395126130647_0014
Logs not available at 
/var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
Log aggregation has not completed or is not enabled.

but I did enable the log aggregation function. Here is my yarn-site.xml 
configuration for log aggregation:

  
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>
  

The application logs are not put on HDFS successfully; why?

# hadoop fs -ls 
/var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
ls: `/var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014': No 
such file or directory


RE: ResourceManager shutting down

2014-03-13 Thread Rohith Sharma K S
Hi Hitesh,

  Yes, it is an issue. This is handled in 
https://issues.apache.org/jira/i#browse/YARN-713, which fixes the DNS issue. The fix 
is available in hadoop-2.4 (unreleased).


Thanks & Regards
Rohith Sharma K S

-Original Message-
From: Hitesh Shah [mailto:hit...@apache.org] 
Sent: 14 March 2014 09:03
To: user@hadoop.apache.org
Subject: Re: ResourceManager shutting down

Hi John

Would you mind filing a jira with more details. The RM going down just because 
a host was not resolvable or DNS timed out is something that should be 
addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind... we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:john.lil...@redpoint.net]
> Sent: Thursday, March 13, 2014 2:52 PM
> To: user@hadoop.apache.org
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shutdown with 
> an UnknownHostException.  The odd thing is, the host it complains about have 
> been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl 
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State 
> change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(449)) - Error in handling event type 
> NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> skitzo.office.datalever.com
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - 
> Stopped selectchannelconnec...@metallica.office.datalever.com:8088
> 2014-03-13 14:38:16,013 ERROR 
> delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(557)) - 
> InterruptedExcpetion recieved for ExpiredTokenRemover thread 
> java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
> shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher 
> (ApplicationMasterLauncher.java:run(98)) - 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8050 ... and so on, it shuts down
>  



RE: NodeManager health Question

2014-03-13 Thread Rohith Sharma K S
Hi ,

  As troubleshooting, a few things you can verify:

1. Check the RM web UI for whether there are any 'Active Nodes' in the YARN cluster: 
http://<yarn.resourcemanager.webapp.address>/cluster.

Also check "Lost Nodes", "Unhealthy Nodes" and "Rebooted Nodes".
 If there are any active nodes, then cross-verify "Memory Total". This should be 
"Memory Total = Number of Active Nodes * value of {yarn.nodemanager.resource.memory-mb}".


2. The NodeManager logs give more information; check the NM logs as well.

>>> In Yarn, my Hive queries are "Accepted" but are "Unassigned" and do not run
 This may be because your YARN cluster does not have enough memory to 
launch containers. Possible reasons could be:

1. None of the NMs are sending heartbeats to the RM (check the RM web UI for 
Unhealthy Nodes).

2. All the NMs are lost/unhealthy.

3. The full cluster capacity is used, so the YARN scheduler is waiting for some 
container to finish before it can assign the released memory to other containers.

Looking at your DataNode socket timeout exception (and that too 8 
minutes!!!), I suspect the Hadoop cluster network is unstable. Better to debug 
the network.


Thanks & Regards
Rohith Sharma K S

From: Clay McDonald [mailto:stuart.mcdon...@bateswhite.com]
Sent: 14 March 2014 01:30
To: 'user@hadoop.apache.org'
Subject: NodeManager health Question

Hello all, I have laid out my POC in a project plan and have HDP 2.0 installed. 
HDFS is running fine and have loaded up about 6TB of data to run my test on. I 
have a series of SQL queries that I will run in Hive ver. 0.12.0. I had to 
manually install Hue and still have a few issues I'm working on there. But at 
the moment, my most pressing issue is with Hive jobs not running. In Yarn, my 
Hive queries are "Accepted" but are "Unassigned" and do not run. See attached.

In Ambari, the datanodes all have the following error; NodeManager health CRIT 
for 20 days CRITICAL: NodeManager unhealthy

From the datanode logs I found the following;

ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
dc-bigdata1.bateswhite.com:50010:DataXceiver error processing READ_BLOCK 
operation  src: /172.20.5.147:51299 dest: /172.20.5.141:50010
java.net.SocketTimeoutException: 48 millis timeout while waiting for 
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
local=/172.20.5.141:50010 remote=/172.20.5.147:51299]
at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)

Also, in the namenode log I see the following;

2014-03-13 13:50:57,204 WARN  security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1355)) - No groups available for user 
dr.who


If anyone can point me in the right direction to troubleshoot this, I would 
really appreciate it!

Thanks! Clay


RE: class org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto overrides final method getUnknownFields

2014-03-03 Thread Rohith Sharma K S
Hi

  The reason for 
"org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto overrides final 
method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet" is that hadoop is 
compiled with protoc 2.5.0, but a lower version of protobuf is present in the 
classpath.

1. Check the MRAppMaster classpath to see which version of protobuf is on it. It is 
expected to have version 2.5.0.
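
One quick way to see which protobuf jar a JVM actually picks up (a small sketch; run 
it with the same classpath as the failing container):

import com.google.protobuf.UnknownFieldSet;

public class ProtobufVersionCheck {
  public static void main(String[] args) {
    // Prints the jar providing com.google.protobuf; it should be protobuf-java-2.5.0.jar
    System.out.println(UnknownFieldSet.class
        .getProtectionDomain().getCodeSource().getLocation());
  }
}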
   

Thanks & Regards
Rohith Sharma K S



-Original Message-
From: Margusja [mailto:mar...@roo.ee] 
Sent: 03 March 2014 22:45
To: user@hadoop.apache.org
Subject: Re: class org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto 
overrides final method getUnknownFields

Hi

2.2.0 and 2.3.0 gave me the same container log.

A little bit more details.
I am using an external java client that submits the job.
Some lines from the maven pom.xml file:
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
  </dependency>

lines from external client:
...
2014-03-03 17:36:01 INFO  FileInputFormat:287 - Total input paths to process : 1
2014-03-03 17:36:02 INFO  JobSubmitter:396 - number of splits:1
2014-03-03 17:36:03 INFO  JobSubmitter:479 - Submitting tokens for job: 
job_1393848686226_0018
2014-03-03 17:36:04 INFO  YarnClientImpl:166 - Submitted application
application_1393848686226_0018
2014-03-03 17:36:04 INFO  Job:1289 - The url to track the job: 
http://vm38.dbweb.ee:8088/proxy/application_1393848686226_0018/
2014-03-03 17:36:04 INFO  Job:1334 - Running job: job_1393848686226_0018
2014-03-03 17:36:10 INFO  Job:1355 - Job job_1393848686226_0018 running in uber 
mode : false
2014-03-03 17:36:10 INFO  Job:1362 -  map 0% reduce 0%
2014-03-03 17:36:10 INFO  Job:1375 - Job job_1393848686226_0018 failed with 
state FAILED due to: Application application_1393848686226_0018 failed 2 times 
due to AM Container for
appattempt_1393848686226_0018_02 exited with  exitCode: 1 due to: 
Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
 at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
...

Lines from namenode:
...
14/03/03 19:12:42 INFO namenode.FSEditLog: Number of transactions: 900 Total 
time for transactions(ms): 69 Number of transactions batched in
Syncs: 0 Number of syncs: 542 SyncTimes(ms): 9783
14/03/03 19:12:42 INFO BlockStateChange: BLOCK* addToInvalidates: 
blk_1073742050_1226 90.190.106.33:50010
14/03/03 19:12:42 INFO hdfs.StateChange: BLOCK* allocateBlock: 
/user/hduser/input/data666.noheader.data. 
BP-802201089-90.190.106.33-1393506052071
blk_1073742056_1232{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[90.190.106.33:50010|RBW]]}
14/03/03 19:12:44 INFO hdfs.StateChange: BLOCK* InvalidateBlocks: ask
90.190.106.33:50010 to delete [blk_1073742050_1226]
14/03/03 19:12:53 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap
updated: 90.190.106.33:50010 is added to 
blk_1073742056_1232{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[90.190.106.33:50010|RBW]]} size 0
14/03/03 19:12:53 INFO hdfs.StateChange: DIR* completeFile: 
/user/hduser/input/data666.noheader.data is closed by
DFSClient_NONMAPREDUCE_-915999412_15
14/03/03 19:12:54 INFO BlockStateChange: BLOCK* addToInvalidates: 
blk_1073742051_1227 90.190.106.33:50010
14/03/03 19:12:54 INFO hdfs.StateChange: BLOCK* allocateBlock: 
/user/hduser/input/data666.noheader.data.info. 
BP-802201089-90.190.106.33-1393506052071
blk_1073742057_1233{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[90.190.106.33:50010|RBW]]}
14/03/03 19:12:54 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap
updated: 90.190.106.33:50010 is added to 
blk_1073742057_1233{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1,
replicas=[ReplicaUnderConstruction[90.190.106.33:50010|RBW]]} size 0
14/03/03 19:12:54 INFO hdfs.StateChange: DIR* completeFile: 
/user/hduser/input/data666.noheader.data.info is closed by
DFSClient_NONMAPREDUCE_-915999412_15
14/03/03 19:12:55 INFO hdfs.StateChange: BLOCK* allocateBlock: 
/user/hduser/.staging/job_1393848686226_0019/job

RE: Problem in Submitting a Map-Reduce Job to Remote Hadoop Cluster

2014-03-02 Thread Rohith Sharma K S
One more configuration to be added

config.set("mapreduce.framework.name<http://mapreduce.framework.name>","yarn");

Thanks
Rohith

From: Rohith Sharma K S [mailto:rohithsharm...@huawei.com]
Sent: 03 March 2014 09:02
To: user@hadoop.apache.org
Subject: RE: Problem in Submitting a Map-Reduce Job to Remote Hadoop Cluster

Hi,

 Set below configuration in your word count job.

Configuration config= new Configuration();
config.set("fs.default.name<http://fs.default.name>","hdfs://xyz-hostname:9000");
config.set("mapred.job.tracker","xyz-hostname:9001");
config.set("yarn.application.classpath ","$HADOOP_CONF_DIR, 
$HADOOP_COMMON_HOME/share/hadoop/common/*, 
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*, 
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $YARN_HOME/share/hadoop/mapreduce/*, 
$YARN_HOME/share/hadoop/mapreduce/lib/*, $YARN_HOME/share/hadoop/yarn/*,  
$YARN_HOME/share/hadoop/yarn/lib/*");



<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,
  $HADOOP_COMMON_HOME/share/hadoop/common/*,
  $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
  $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
  $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
  $YARN_HOME/share/hadoop/mapreduce/*,
  $YARN_HOME/share/hadoop/mapreduce/lib/*,
  $YARN_HOME/share/hadoop/yarn/*,
  $YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>


Thanks & Regards
Rohith Sharma K S

From: Senthil Sekar [mailto:senthil...@gmail.com]
Sent: 01 March 2014 19:41
To: user@hadoop.apache.org
Subject: Problem in Submitting a Map-Reduce Job to Remote Hadoop Cluster

Hi,

 I have a remote server (CentOS 6.3) with CDH 4.0.1 installed.

 I also have a Windows 7 machine from which I am trying to submit a simple 
WordCount MapReduce job (I have included the Hadoop 2.0.0 lib jars in my 
Eclipse environment).

I am getting the exception below when I try to run it from Eclipse on my 
Windows 7 machine:
//---
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. 
Please check your configuration for mapreduce.framework.name and the 
correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:487)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:466)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:879)
at com.pss.WordCount.main(WordCount.java:79)

//-

Please find the code below

//-
public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration config = new Configuration();

        config.set("fs.default.name", "hdfs://xyz-hostname:9000");

RE: Problem in Submitting a Map-Reduce Job to Remote Hadoop Cluster

2014-03-02 Thread Rohith Sharma K S
Hi,

 Set the configuration below in your WordCount job.

Configuration config = new Configuration();
config.set("fs.default.name", "hdfs://xyz-hostname:9000");
config.set("mapred.job.tracker", "xyz-hostname:9001");
config.set("yarn.application.classpath", "$HADOOP_CONF_DIR, 
$HADOOP_COMMON_HOME/share/hadoop/common/*, 
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*, 
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*, 
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $YARN_HOME/share/hadoop/mapreduce/*, 
$YARN_HOME/share/hadoop/mapreduce/lib/*, $YARN_HOME/share/hadoop/yarn/*, 
$YARN_HOME/share/hadoop/yarn/lib/*");



<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,
    $YARN_HOME/share/hadoop/mapreduce/lib/*,
    $YARN_HOME/share/hadoop/yarn/*,
    $YARN_HOME/share/hadoop/yarn/lib/*
  </value>
</property>


Thanks & Regards
Rohith Sharma K S

From: Senthil Sekar [mailto:senthil...@gmail.com]
Sent: 01 March 2014 19:41
To: user@hadoop.apache.org
Subject: Problem in Submitting a Map-Reduce Job to Remote Hadoop Cluster

Hi,

 I have a remote server (CentOS 6.3) with CDH 4.0.1 installed.

 I also have a Windows 7 machine from which I am trying to submit a simple 
WordCount MapReduce job (I have included the Hadoop 2.0.0 lib jars in my 
Eclipse environment).

I am getting the exception below when I try to run it from Eclipse on my 
Windows 7 machine:
//---
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. 
Please check your configuration for mapreduce.framework.name and the 
correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:487)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:466)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:879)
at com.pss.WordCount.main(WordCount.java:79)

//-

Please find the code below

//-
public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration config = new Configuration();

        config.set("fs.default.name", "hdfs://xyz-hostname:9000");
        config.set("mapred.job.tracker", "xyz-hostname:9001");

        JobConf conf = new JobConf(config);

        conf.setJarByClass(WordCount.class);
        //conf.setJar(jar);

        conf.setJobName("WordCount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

RE: RM AM_RESYNC signal to AM

2014-02-27 Thread Rohith Sharma K S
Hi Gaurav

If the NodeManager is killed, the containers running on that NM are not killed 
immediately. The RM holds the node information for 10 minutes (the default node 
expiry interval). Two things can then happen:

1.   After 10 minutes, the containers are killed.

2.   The NM is killed and restarted before the 10 minutes elapse.
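
The 10-minute window is the NodeManager liveness expiry; if a different window 
is needed it can be tuned in yarn-site.xml. A sketch, assuming the standard 
property name (value in milliseconds, 600000 = 10 minutes):

<property>
  <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>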


1.   In what scenarios does the RM send the AM_RESYNC signal to the AM?
>>>  The RM sends AM_RESYNC to the AM in two scenarios:

a.   When there is a responseId mismatch. The AM sends a responseId to the RM 
during registration and in every heartbeat, and the RM validates the responseId 
in every heartbeat sent by the AM.

b.   When the application attempt does not exist in the RM cache. In your case 
this scenario is probably occurring: when the NM is killed, all the attempt data 
is removed from the RM, but the ApplicationMaster still keeps trying to connect 
to the RM.



2.   Should the RM not send the AM_SHUTDOWN signal to the AM when the node 
manager is killed?

>> As such, AM_SHUTDOWN is NOT sent by the RM. The community may be planning an 
>> improvement on this.
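
For what it is worth, an AM that talks to the RM over ApplicationMasterProtocol 
can watch for the resync command in the allocate response and re-register 
instead of failing. A rough sketch; the helper method and its arguments below 
are illustrative and not part of any Hadoop API, and details such as resetting 
the responseId after re-registration are simplified:

import java.io.IOException;

import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.records.AMCommand;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ResyncAwareHeartbeat {

    // One heartbeat to the RM. If the RM answers with AM_RESYNC (responseId
    // mismatch, or the attempt is missing from the RM cache), re-register and
    // retry once instead of treating it as a fatal error.
    static AllocateResponse heartbeat(ApplicationMasterProtocol rm,
            AllocateRequest request, String amHost, int amRpcPort,
            String trackingUrl) throws YarnException, IOException {
        AllocateResponse response = rm.allocate(request);
        if (response.getAMCommand() == AMCommand.AM_RESYNC) {
            rm.registerApplicationMaster(
                RegisterApplicationMasterRequest.newInstance(amHost, amRpcPort, trackingUrl));
            // After re-registering, the AM should also resend any outstanding
            // resource requests it still needs, and may need to reset the
            // responseId carried in the AllocateRequest (not shown here).
            response = rm.allocate(request);
        }
        return response;
    }
}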



Thanks & Regards
Rohith Sharma K S


From: Gaurav Gupta [mailto:gau...@datatorrent.com]
Sent: 28 February 2014 00:03
To: user@hadoop.apache.org
Subject: RM AM_RESYNC signal to AM

Hi,

I killed the node manager on the node where the AM was running, and the AM got 
the AM_RESYNC command signal from the RM. I have the following questions:

1.   In what scenarios does the RM send the AM_RESYNC signal to the AM?

2.   Should the RM not send the AM_SHUTDOWN signal to the AM when the node 
manager is killed?

Thanks
-Gaurav



JobHistoryEventHandler failed with AvroTypeException.

2014-02-21 Thread Rohith Sharma K S
Hi all,

I am using Hadoop 2.3 for a YARN cluster.

While running a job, I encountered the exception below in the MRAppMaster. Why 
is this error being logged?

2014-02-21 22:10:33,841 INFO [Thread-355] 
org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
failed in state STOPPED; cause: org.apache.avro.AvroTypeException: Attempt to 
process a enum when a string was expected.
org.apache.avro.AvroTypeException: Attempt to process a enum when a string was 
expected.
at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
at 
org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:217)
at 
org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:54)
at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:106)
at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at 
org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:66)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:870)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:517)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:332)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at 
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at 
org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
at 
org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1386)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:550)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:602)

Thanks & Regards
Rohith Sharma K S


RE: what happens to a client attempting to get a new app when the resource manager is already down

2014-02-05 Thread Rohith Sharma K S
   The default retry time period is 15 minutes. Setting the configuration 
"yarn.resourcemanager.connect.max-wait.ms" to a smaller value reduces the retry 
period on the client side.
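
A minimal client-side sketch, assuming the YarnClient API and the property 
names from yarn-default.xml (the 30-second figure is only an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ShortRetryClient {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Give up after roughly 30 seconds instead of the 15-minute default
        // (both values are in milliseconds).
        conf.setLong("yarn.resourcemanager.connect.max-wait.ms", 30000L);
        conf.setLong("yarn.resourcemanager.connect.retry-interval.ms", 2000L);

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        // With these settings, yarnClient.createApplication() fails after about
        // 30 seconds when the ResourceManager is down, instead of retrying for
        // the full 15 minutes.
    }
}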

Thanks  & Regards
Rohith Sharma K S

From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com] On Behalf Of 
Vinod Kumar Vavilapalli
Sent: 05 February 2014 22:43
To: user@hadoop.apache.org; REYANE OUKPEDJO
Subject: Re: what happens to a client attempting to get a new app when the 
resource manager is already down

Is this on trunk or a released version?

I think the default behavior (when RM HA is not enabled) shouldn't have the 
client loop forever. Let me know and we can see if this needs fixing.

Thanks,
+vinod


On Jan 31, 2014, at 7:52 AM, REYANE OUKPEDJO <r.oukpe...@yahoo.com> wrote:


Hi there,

I am trying to solve a problem. My client runs as a server, and I was trying to 
make it aware of the fact that the resource manager is down, but I could not 
figure out how. The reason is that the call yarnClient.createApplication() never 
returns when the resource manager is down. It just stays in a loop, sleeps after 
10 iterations, and then continues the same loop. Below you can find the logs. 
Any idea how to leave this loop? Is there any parameter that controls the number 
of seconds before giving up?

Thanks

Reyane OUKPEDJO







logs
14/01/31 10:48:05 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 8 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:06 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:37 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:38 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:39 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:40 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 3 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:41 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 4 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:42 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 5 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:43 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 6 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:44 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 7 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:45 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 8 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:48:46 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:49:17 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:49:18 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:49:19 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:49:20 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 3 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:49:21 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 4 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/01/31 10:49:22 INFO ipc.Client: Retrying connect to server: 
isblade2/9.32.160.125:8032. Already tried 5

Reducers are launched after jobClient is exited.

2014-01-28 Thread Rohith Sharma K S
Hi all,

 I ran a job with 1 map and 1 reducer 
(mapreduce.job.reduce.slowstart.completedmaps=1). The map failed (because of an 
error in the Mapper implementation), but reducers were still launched by the 
ApplicationMaster. These reducers were then killed by the ApplicationMaster 
while stopping the RMCommunicator service.


1.   Why are reducers launched after the job is finished? (Is this a bug in MR?)



Our use case is that when the job is finished (succeeded/failed), the client 
program deletes the job output directory. Here, the job client exits immediately 
after the job status is set (in the log below, at 2014-01-23 07:34:43,166).



But, as mentioned, in the log below the reducers are launched later, and the 
reducer temporary directory and files are created (_temporary). These files are 
left in HDFS, undeleted, forever.

Kindly suggest your thoughts on how we can handle this situation.
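
For context, a minimal sketch of the client side of this use case (the job 
setup and the output path are placeholders, not taken from the actual job): 
waitForCompletion() returns as soon as the job status is set, which is the 
window in which late reducer containers can recreate _temporary under the 
already-deleted output directory.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CleanupAfterJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example-job");
        // ... mapper/reducer classes and input/output paths are set here ...

        Path output = new Path("/user/hduser/job-output");  // placeholder path

        // Returns as soon as the job status is set to SUCCEEDED/FAILED.
        job.waitForCompletion(true);

        // The client deletes the output directory right away; if the AM still
        // ramps up reducers after this point, they can recreate an empty
        // _temporary directory under the same path.
        FileSystem fs = FileSystem.get(conf);
        fs.delete(output, true);
    }
}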



2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1389970937094_0047_m_00 Task Transitioned from RUNNING to FAILED
2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2014-01-23 07:34:43,151 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Job failed as tasks 
failed. failedMaps:1 failedReduces:0
2014-01-23 07:34:43,153 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1389970937094_0047Job 
Transitioned from RUNNING to FAIL_ABORT
2014-01-23 07:34:43,153 INFO [CommitterEvent Processor #0] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the 
event EventType: JOB_ABORT
2014-01-23 07:34:43,166 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1389970937094_0047Job 
Transitioned from FAIL_ABORT to FAILED
...
...
2014-01-23 07:34:43,707 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: 
PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 
CompletedMaps:1 CompletedReds:0 ContAlloc:4 ContRel:0 HostLocal:1 RackLocal:0
2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
schedule, headroom=12288
2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start 
threshold reached. Scheduling reduces.
2014-01-23 07:34:43,709 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: All maps assigned. 
Ramping up all remaining reduces:1
...
...
2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated 
containers 1
2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2014-01-23 07:34:45,714 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container 
container_1389970937094_0047_01_06 to attempt_1389970937094_0047_r_00_0
...
...
2014-01-23 07:34:45,724 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1389970937094_0047_r_00_0 TaskAttempt Transitioned from UNASSIGNED 
to ASSIGNED
2014-01-23 07:34:45,725 INFO [ContainerLauncher #8] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
the event EventType: CONTAINER_REMOTE_LAUNCH for container 
container_1389970937094_0047_01_06 taskAttempt 
attempt_1389970937094_0047_r_00_0
2014-01-23 07:34:45,725 INFO [ContainerLauncher #8] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching 
attempt_1389970937094_0047_r_00_0
2014-01-23 07:34:45,727 INFO [ContainerLauncher #8] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port 
returned by ContainerManager for attempt_1389970937094_0047_r_00_0 : 11234
2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: 
[attempt_1389970937094_0047_r_00_0] using containerId: 
[container_1389970937094_0047_01_06 on NM: [linux85:11232]
2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1389970937094_0047_r_00_0 TaskAttempt Transitioned from ASSIGNED to 
RUNNING
2014-01-23 07:34:45,728 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1389970937094_0047_r_00 Task Transitioned from SCHEDULED to RUNNING
...
.
2014-01-23 07:34:48,178 INFO [Thread-59] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING 
attempt_1389970937094_0047_r

RE: unable to compile hadoop source code

2014-01-06 Thread Rohith Sharma K S
You can read the build instructions for Hadoop:
http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt

For your problem, protoc is not set in the PATH. After setting it, re-check that 
the protobuf version is 2.5.

From: nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com]
Sent: 07 January 2014 09:18
To: user@hadoop.apache.org
Subject: unable to compile hadoop source code

Hi,
I checked out the source code from 
https://svn.apache.org/repos/asf/hadoop/common/trunk/

I tried to compile the code with mvn.

I am compiling this on Mac OS X Mavericks. Any help is appreciated.

It failed at the following stage:



[INFO] Apache Hadoop Auth Examples ... SUCCESS [5.017s]

[INFO] Apache Hadoop Common .. FAILURE [1:39.797s]

[INFO] Apache Hadoop NFS . SKIPPED

[INFO] Apache Hadoop Common Project .. SKIPPED








[INFO] 

[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) 
on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
'protoc --version' did not return a version -> [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.


Thanks,
Nagarjuna K