Re: Question about Yarn rolling upgrade

2019-02-07 Thread Rohith Sharma K S
The JIRAs mentioned above are breaking changes, but they are fixed in 2.6
itself. The only other JIRA I see is YARN-8310, which is fixed in 2.10.
Judging from the stack trace you posted, it doesn't seem related to your
issue. Maybe try applying the patch and running a job.
Otherwise, let's create a JIRA and discuss it there in detail.

-Rohith Sharma K S

On Thu, 7 Feb 2019 at 22:52, Aihua Xu wrote:

> Hi Rohith,
>
> Thanks for your suggestion. I was tracing the issue and found out it's
> caused by the incompatibility from these two changes. The tokens have been
> changed.
>
> YARN-668. Changed 
> NMTokenIdentifier/AMRMTokenIdentifier/ContainerTokenIdentifier to use 
> protobuf object as the payload. Contributed by Junping Du.
>
> YARN-2615. Changed 
> ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use 
> protobuf as payload. Contributed by Junping Du
>
>
> I was testing new RM with old NM.
>
> Following up on the order of the YARN upgrade: I checked the HWX blog about
> rolling upgrade, and it suggests upgrading RM first. But you are saying we
> should upgrade NM first and RM second? Can you confirm?
>
> Thanks,
> Aihua
>
>
>
> On Wed, Feb 6, 2019 at 8:26 PM Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
>> Hi Aihua,
>>
>> Could you clarify when the job was submitted: a) before starting the
>> upgrade, b) after the RM upgrade and before the NM upgrade, or c) after
>> the full YARN upgrade?
>> Typically, the suggested upgrade order is NMs first and RM second.
>>
>> Regarding the NM warn messages, you might be hitting
>> https://issues.apache.org/jira/browse/HADOOP-11692.
>>
>> Don't any subsequent jobs succeed post upgrade?
>> -Rohith Sharma K S
>>
>> On Thu, 7 Feb 2019 at 03:20, Aihua Xu wrote:
>>
>>> Hi all,
>>>
>>> I'm investigating the rolling upgrade process from Hadoop 2.6 to Hadoop
>>> 2.9.1. I'm trying to upgrade ResourceManager first and then try to upgrade
>>> NodeManager. When I submit a yarn job, RM fails with the following
>>> exception:
>>>
>>>  Application application_1549408943468_0001 failed 2 times due to Error 
>>> launching appattempt_1549408943468_0001_02. Got exception: 
>>> java.io.IOException: Failed on local exception: java.io.IOException: 
>>> java.io.EOFException; Host Details : local host is: 
>>> "hadoopbenchaqjm01-sjc1/10.67.2.171"; destination host is: 
>>> "hadoopbencha22-sjc1.prod.uber.internal":8041;
>>> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:805)
>>> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>>> at org.apache.hadoop.ipc.Client.call(Client.java:1349)
>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>>> at com.sun.proxy.$Proxy87.startContainers(Unknown Source)
>>> at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>>> at com.sun.proxy.$Proxy88.startContainers(Unknown Source)
>>> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
>>> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by: java.io.IOException: java.io.EOFException
>>> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:757)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
>>> at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:720)
>>> at 

[ANNOUNCE] Apache Hadoop 3.1.2 release

2019-02-07 Thread Wangda Tan
It gives us great pleasure to announce that the Apache Hadoop community has
voted to release Apache Hadoop 3.1.2.

IMPORTANT NOTES

3.1.2 is the second stable release of the 3.1 line, which is considered to
be production-ready.

The Hadoop community fixed 325 JIRAs [1] in total as part of the 3.1.2
release. A few of the significant features and enhancements are noted below.

- Nvidia-docker-plugin v2 support for GPUs on YARN.
- YARN service upgrade improvements and bug fixes.
- YARN UIv2 improvements and bug fixes.
- AliyunOSS related improvements and bug fixes.
- Docker on YARN support related improvements and bug fixes.

Please see the Hadoop 3.1.2 CHANGES for the detailed list of issues
resolved. The release news is also posted on the Apache Hadoop website,
where you can go to the downloads section.

Many thanks to everyone who contributed to the release, and to everyone in
the Apache Hadoop community! The release is the result of direct and
indirect efforts from many contributors. Listed below are those who
contributed directly by submitting patches and/or reporting issues (148
contributors, sorted by ID).

BilwaST, Charo Zhang, GeLiXin, Harsha1206, Huachao, Jim_Brennan, LiJinglun,
Naganarasimha, OrDTesters, RANith, Rakesh_Shah, Ray Burgemeestre, Sen Zhao,
SoumyaPN, SouryakantaDwivedy, Tao Yang, Zian Chen, abmodi, adam.antal,
ajayydv, ajisakaa, akhilpb, akhilsnaik, amihalyi, arpitagarwal, aw,
ayushtkn, banditka, belugabehr, benlau, bibinchundatt, billie.rinaldi,
boky01, bolke, borisvu, botong, brahmareddy, briandburton, bsteinbach,
candychencan, ccondit-target, charanh, cheersyang, cltlfcjin, collinma,
crh, csingh, csun, daisuke.kobayashi, daryn, dibyendu_hadoop,
dineshchitlangia, ebadger, eepayne, elgoiri, erwaman, eyang, fengchuang,
ferhui, fly_in_gis, gabor.bota, gezapeti, gsaha, haibochen, hexiaoqiao,
hfyang20071, hgadre, jeagles, jhung, jiangjianfei, jianliang.wu,
jira.shegalov, jiwq, jlowe, jojochuang, jonBoone, kanwaljeets, karams,
kennethlnnn, kgyrtkirk, kihwal, knanasi, kshukla, laszlok, leftnoteasy,
leiqiang, liaoyuxiangqin, linyiqun, ljain, lukmajercak, maniraj...@gmail.com,
masatana, nandakumar131, oliverhuh...@gmail.com, oshevchenko, pbacsko,
peruguusha, photogamrun, pj.fanning, prabham, pradeepambati, pranay_singh,
revans2, rkanter, rohithsharma, shaneku...@gmail.com, shubham.dewan,
shuzirra, shv, simonprewo, sinago, smeng, snemeth, sodonnell,
sreenivasulureddy, ssath...@hortonworks.com, ssulav, ste...@apache.org,
study, suma.shivaprasad, sunilg, surendrasingh, tangzhankun, tarunparimi,
tasanuma0829, templedf, thinktaocs, tlipcon, tmarquardt, trjianjianjiao,
uranus, varun_saxena, vinayrpet, vrushalic, wilfreds,
write2kish...@gmail.com, wujinhu, xiaochen, xiaoheipangzi, xkrogen,
yangjiandan, yeshavora, yiran, yoelee, yuzhih...@gmail.com, zichensun,
zvenczel

Wangda Tan and Sunil Govind

[1] JIRA query: project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution =
Fixed AND fixVersion = 3.1.2 ORDER BY key ASC, updated ASC, created DESC,
priority DESC
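
For readers who want to run the query in [1] programmatically, here is a
minimal sketch that only builds a search URL and does not contact any
server. The base URL https://issues.apache.org/jira and the use of JIRA's
REST search resource (/rest/api/2/search with the jql and maxResults
parameters) are assumptions of this sketch, as are the class and method
names, which are purely illustrative. The Charset overload of
URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of Hadoop or JIRA.
final class ReleaseQuerySketch {

    // The JQL text from footnote [1], joined onto one logical line.
    static final String JQL =
        "project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed "
        + "AND fixVersion = 3.1.2 ORDER BY key ASC, updated ASC, created DESC, "
        + "priority DESC";

    // Builds a GET URL for the search resource; the JQL is form-encoded,
    // so spaces become '+' and '=' becomes "%3D".
    static String searchUrl(String baseUrl, String jql, int maxResults) {
        return baseUrl + "/rest/api/2/search?jql="
            + URLEncoder.encode(jql, StandardCharsets.UTF_8)
            + "&maxResults=" + maxResults;
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust for a different JIRA instance.
        System.out.println(searchUrl("https://issues.apache.org/jira", JQL, 50));
    }
}
```

The printed URL can be pasted into a browser or passed to any HTTP client.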


Re: Question about Yarn rolling upgrade

2019-02-07 Thread Aihua Xu
Hi Rohith,

Thanks for your suggestion. I was tracing the issue and found out it's
caused by the incompatibility from these two changes. The tokens have been
changed.

YARN-668. Changed
NMTokenIdentifier/AMRMTokenIdentifier/ContainerTokenIdentifier to use
protobuf object as the payload. Contributed by Junping Du.

YARN-2615. Changed
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use
protobuf as payload. Contributed by Junping Du


I was testing new RM with old NM.
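
To make this kind of payload incompatibility concrete, here is a minimal
sketch in plain Java. It does not use Hadoop or protobuf classes, and it is
not the real YARN token layout: the class TokenFormatSketch, its methods,
and the two-field "legacy" layout are hypothetical stand-ins. It only shows
how a reader that expects fields written one by one through DataOutput can
fail when it is handed a tag-and-length style payload instead. Whether this
is what actually happens between these two versions is an assumption that
the sketch does not establish.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: NOT Hadoop classes, NOT the real token layouts.
final class TokenFormatSketch {

    // "Old" style: fields are written one by one with DataOutput.
    static byte[] writeLegacy(String user, int containerId) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(user);        // 2-byte length, then the bytes
            out.writeInt(containerId); // 4 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // "New" style: one tag byte (0x0A), a one-byte length, then the bytes.
    // This mimics a length-delimited field; it is not a full protobuf encoder.
    static byte[] writeTagged(String user) {
        byte[] name = user.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[name.length + 2];
        out[0] = 0x0A;
        out[1] = (byte) name.length; // adequate for names shorter than 128 bytes
        System.arraycopy(name, 0, out, 2, name.length);
        return out;
    }

    // A reader that only understands the legacy layout.
    static String readLegacyUser(byte[] payload) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(payload))) {
            String user = in.readUTF();
            in.readInt(); // container id, ignored here
            return user;
        }
    }

    // True if the legacy reader cannot parse the payload.
    static boolean legacyReaderRejects(byte[] payload) {
        try {
            readLegacyUser(payload);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLegacyUser(writeLegacy("alice", 1)));
        System.out.println(legacyReaderRejects(writeTagged("alice")));
    }
}
```

With the tagged payload, the legacy reader interprets the tag and length
bytes as a 2-byte string length (0x0A05 = 2565), runs out of input, and
fails with an EOFException, so main prints "alice" followed by "true".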

Following up on the order of the YARN upgrade: I checked the HWX blog about
rolling upgrade, and it suggests upgrading RM first. But you are saying we
should upgrade NM first and RM second? Can you confirm?

Thanks,
Aihua



On Wed, Feb 6, 2019 at 8:26 PM Rohith Sharma K S wrote:

> Hi Aihua,
>
> Could you clarify when the job was submitted: a) before starting the
> upgrade, b) after the RM upgrade and before the NM upgrade, or c) after
> the full YARN upgrade?
> Typically, the suggested upgrade order is NMs first and RM second.
>
> Regarding the NM warn messages, you might be hitting
> https://issues.apache.org/jira/browse/HADOOP-11692.
>
> Don't any subsequent jobs succeed post upgrade?
> -Rohith Sharma K S
>
> On Thu, 7 Feb 2019 at 03:20, Aihua Xu wrote:
>
>> Hi all,
>>
>> I'm investigating the rolling upgrade process from Hadoop 2.6 to Hadoop
>> 2.9.1. I'm trying to upgrade ResourceManager first and then try to upgrade
>> NodeManager. When I submit a yarn job, RM fails with the following
>> exception:
>>
>>  Application application_1549408943468_0001 failed 2 times due to Error 
>> launching appattempt_1549408943468_0001_02. Got exception: 
>> java.io.IOException: Failed on local exception: java.io.IOException: 
>> java.io.EOFException; Host Details : local host is: 
>> "hadoopbenchaqjm01-sjc1/10.67.2.171"; destination host is: 
>> "hadoopbencha22-sjc1.prod.uber.internal":8041;
>> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:805)
>> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1349)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>> at com.sun.proxy.$Proxy87.startContainers(Unknown Source)
>> at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>> at com.sun.proxy.$Proxy88.startContainers(Unknown Source)
>> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
>> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.io.IOException: java.io.EOFException
>> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:757)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
>> at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:720)
>> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:813)
>> at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:411)
>> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1554)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
>> ... 20 more
>> Caused by: java.io.EOFException
>> at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1798)
>> at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:365)
>> at 
>>